Vol. 78, No. 4 (Mar. 31, 1938), pp. 551-572. Published by: American Philosophical Society. Stable URL: http://www.jstor.org/stable/984802. Accessed: 08/02/2014 11:22
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
PROCEEDINGS OF THE AMERICAN PHILOSOPHICAL SOCIETY, VOL. 78, NO. 4, MARCH 1938

THE LAW OF ANOMALOUS NUMBERS

FRANK BENFORD

It has been observed that the first pages of a table of common logarithms show more wear than do the last pages, indicating that more used numbers begin with the digit 1 than with the digit 9. A compilation of some 20,000 first digits taken from widely divergent sources shows that there is a logarithmic distribution of first digits when the numbers are composed of four or more digits. An analysis of the numbers from different sources shows that the numbers taken from unrelated subjects, such as a group of newspaper items, show a much better agreement with a logarithmic distribution than do numbers from mathematical tabulations or other formal data. There is here the peculiar fact that numbers that individually are without relationship are, when considered in large groups, in good agreement with a distribution law; hence the name "Anomalous Numbers."

A further analysis of the data shows a strong tendency for bodies of numerical data to fall into geometric series. If the series is made up of numbers containing three or more digits, the first digits form a logarithmic series. If the numbers contain only single digits, the geometric relation still holds but the simple logarithmic relation no longer applies. An equation is given showing the frequencies of first digits in the different orders of numbers 1 to 10, 10 to 100, etc. The equation also gives the frequency of digits in the second, third ... place of a multi-digit number, and it is shown that the same law applies to reciprocals.

There are many instances showing that the geometric series, or the logarithmic law, has long been recognized as a common phenomenon in factual literature and in the ordinary affairs of life. The wire gauge and drill gauge of the mechanic, the magnitude scale of the astronomer and the sensory response curves of the psychologist are all particular examples of a relationship that seems to extend to all human affairs. The Law of Anomalous Numbers is thus a general probability law of widespread application.

PART I: STATISTICAL DERIVATION OF THE LAW

It has been observed that the pages of a much used table of common logarithms show evidences of a selective use of the natural numbers. The pages containing the logarithms of the low numbers 1 and 2 are apt to be more stained and frayed by use than those of the higher numbers 8 and 9. Of course, no one could be expected to be greatly interested in the condition of a table of logarithms. However, the matter may be considered more worthy of study when we recall that the table is used in building up our scientific, engineering, and general factual literature. The relative cleanliness of the pages of a logarithm table may hold data on how we think and how we react when dealing with things that can be described by means of numbers.

Methods and Terms

Before presenting the data, a few terms should be defined and the method of attack outlined. The data were collected while investigating the possible existence of a distribution law that applies to numerical data in general, and to random data in particular.

First, a distinction is made between a digit, which is one of the nine natural numbers 1, 2, 3, ... 9, and a number, which is composed of one or more digits and may contain a 0 as a digit in any position after the first. The method of study consists of two steps. One selects any tabulation of data that is not too restricted in numerical range, or too sharply conditioned in some way. One then counts the number of times the natural numbers 1, 2, 3, ... 9 occur as first digits. A decimal point or zero occurring before the first natural number is ignored, for no attention is to be paid to magnitude other than that indicated by the first digit.

The Law of Large Numbers

An effort was made to collect data from as many fields as possible and to include a variety of widely different types. At one extreme are purely random numbers that have no relation other than appearing within the covers of the same magazine. At the other are formal mathematical tabulations that admit of no variation from fixed laws. Between these limits one will recognize various degrees of randomness, and in general the title of each line of data in Table I will suggest the nature of the source. In every group the count was continuous from the beginning to the end. For long tabulations it was continued until a sufficient number of observations had been made to insure a fair average.
The count of numbers in each group is given in the last column of Table I.
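TABLE I
PERCENTAGE OF TIMES THE NATURAL NUMBERS 1 TO 9 ARE USED AS FIRST DIGITS IN NUMBERS, AS DETERMINED BY 20,229 OBSERVATIONS

                                               First Digit
     Title                 1      2      3      4      5      6      7      8      9    Count
A    Rivers, Area        31.0   16.4   10.7   11.3    7.2    8.6    5.5    4.2    5.1     335
B    Population          33.9   20.4   14.2    8.1    7.2    6.2    4.1    3.7    2.2    3259
C    Constants           41.3   14.4    4.8    8.6   10.6    5.8    1.0    2.9   10.6     104
D    Newspapers          30.0   18.0   12.0   10.0    8.0    6.0    6.0    5.0    5.0     100
E    Spec. Heat          24.0   18.4   16.2   14.6   10.6    4.1    3.2    4.8    4.1    1389
F    Pressure            29.6   18.3   12.8    9.8    8.3    6.4    5.7    4.4    4.7     703
G    H.P. Lost           30.0   18.4   11.9   10.8    8.1    7.0    5.1    5.1    3.6     690
H    Mol. Wgt.           26.7   25.2   15.4   10.8    6.7    5.1    4.1    2.8    3.2    1800
I    Drainage            27.1   23.9   13.8   12.6    8.2    5.0    5.0    2.5    1.9     159
J    Atomic Wgt.         47.2   18.7    5.5    4.4    6.6    4.4    3.3    4.4    5.5      91
K    n^-1, √n, ...       25.7   20.3    9.7    6.8    6.6    6.8    7.2    8.0    8.9    5000
L    Design              26.8   14.8   14.3    7.5    8.3    8.4    7.0    7.3    5.6     560
M    Digest              33.4   18.5   12.4    7.5    7.1    6.5    5.5    4.9    4.2     308
N    Cost Data           32.4   18.8   10.1   10.1    9.8    5.5    4.7    5.5    3.1     741
O    X-Ray Volts         27.9   17.5   14.4    9.0    8.1    7.4    5.1    5.8    4.8     707
P    Am. League          32.7   17.6   12.6    9.8    7.4    6.4    4.9    5.6    3.0    1458
Q    Black Body          31.0   17.3   14.1    8.7    6.6    7.0    5.2    4.7    5.4    1165
R    Addresses           28.9   19.2   12.6    8.8    8.5    6.4    5.6    5.0    5.0     342
S    n^1, n^2 ... n!     25.3   16.0   12.0   10.0    8.5    8.8    6.8    7.1    5.5     900
T    Death Rate          27.0   18.6   15.7    9.4    6.7    6.5    7.2    4.8    4.1     418
     Average             30.6   18.5   12.4    9.4    8.0    6.4    5.1    4.9    4.7    1011
     Probable Error      ±0.8   ±0.4   ±0.4   ±0.3   ±0.2   ±0.2   ±0.2   ±0.2   ±0.3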
At the foot of each column of Table I the average percentage is given for each first digit, together with the probable error of the average. These averages can be better studied if the decimal point is moved two places to the left, making the sum of all the averages unity. The frequency of first 1's is then seen to be 0.306, which is about equal to the common logarithm of 2. The frequency of first 2's is 0.185, which is slightly greater than the logarithm of 3/2. The difference here, log 3 - log 2, is called the logarithmic interval. These resemblances persist throughout, and finally there is 0.047 to be compared with log 10/9, or 0.046.
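The frequency of first digits thus follows closely the logarithmic relation

$$F_a = \log\left(\frac{a+1}{a}\right), \tag{1}$$

where F_a is the frequency of the digit a in the first place of used numbers.

First   Logarithmic   Observed    Logarithmic   Difference   Probable
Digit   Interval      Frequency   Frequency                  Error
1       1 to 2        0.306       0.301         +0.005       ±0.008
2       2 to 3        0.185       0.176         +0.009       ±0.004
3       3 to 4        0.124       0.125         -0.001       ±0.004
4       4 to 5        0.094       0.097         -0.003       ±0.003
5       5 to 6        0.080       0.079         +0.001       ±0.002
6       6 to 7        0.064       0.067         -0.003       ±0.002
7       7 to 8        0.051       0.058         -0.007       ±0.002
8       8 to 9        0.049       0.051         -0.002       ±0.002
9       9 to 10       0.047       0.046         +0.001       ±0.003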
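The comparison above can be reproduced with a few lines of Python. This is a minimal sketch: it recomputes the logarithmic column from Eq. (1) and subtracts it from the average observed frequencies at the foot of Table I. The names `OBSERVED` and `benford_first_digit` are ours, not the author's.

```python
import math

# Average observed first-digit frequencies from the foot of Table I, as fractions.
OBSERVED = [0.306, 0.185, 0.124, 0.094, 0.080, 0.064, 0.051, 0.049, 0.047]

def benford_first_digit(a: int) -> float:
    """Eq. (1): frequency of first digit a, F_a = log10((a + 1) / a)."""
    return math.log10((a + 1) / a)

for a, obs in enumerate(OBSERVED, start=1):
    theory = benford_first_digit(a)
    print(f"{a}  {a} to {a + 1}  observed {obs:.3f}  log {theory:.3f}  diff {obs - theory:+.3f}")

# The nine logarithmic intervals telescope to log10(10) = 1.
print(round(sum(benford_first_digit(a) for a in range(1, 10)), 12))
```

Because the intervals log((a + 1)/a) telescope, the nine theoretical frequencies add to exactly log 10 = 1. That is why the observed averages can be compared with them directly once the decimal point is moved.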
There is a qualification to be noted immediately, for Table I was compiled from numbers composed in general of four, five and six digits. It will be shown later that Eq. (1) is a distribution law for large numbers. A more general equation applies when considering numbers of one, two or three significant digits.

If we may assume the accuracy of Eq. (1), we then have a probability law of the most general nature. It is a probability derived from "events" through the medium of their descriptive numbers; it is not a law of numbers in themselves. The range of subjects studied and tabulated was as wide as time and energy permitted, and no definite exceptions have ever been observed among true variables. The logarithmic law for large numbers therefore evidently goes deeper among the roots of primal causes than our number system unaided can explain.

Frequency of Digits in the qth Position

The second-place digits are ten in number, for here we must take 0 into account. Also, in considering the frequency of a second-place digit b we must take into account the digit a that preceded it. The logarithmic interval belonging to the first digit a is now to be divided into ten parts corresponding to the ten digits 0, 1, 2, ... 9. Let a be the first digit of a number and b the second digit. Using the customary meaning of position and order in our decimal system, a two-digit number is then written ab (its value is 10a + b), and the next greater number is written ab + 1. The logarithmic interval between ab and ab + 1 is log(ab + 1) - log ab, while the interval covered by the ten possible second-place digits is log(a + 1) - log a. Therefore the frequency F_b of a second-place digit b following a first-place digit a is

$$F_b = \frac{\log\left(\dfrac{ab+1}{ab}\right)}{\log\left(\dfrac{a+1}{a}\right)}. \tag{2}$$
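A short worked example may make Eq. (2) concrete. Here ab stands for the two-digit number 10a + b, as in Eq. (2). Multiplying Eq. (2) by Eq. (1) cancels the denominator, so the share of all used numbers that begin with the digit pair ab is log((ab + 1)/ab). Summing over the ten second digits telescopes back to Eq. (1), so for a fixed a the ten values of F_b add to unity. The choices a = 1 with b = 0 and b = 9 are only illustrative.

$$F_a F_b = \log\frac{a+1}{a}\cdot\frac{\log\frac{ab+1}{ab}}{\log\frac{a+1}{a}} = \log\frac{ab+1}{ab}, \qquad \sum_{b=0}^{9}\log\frac{10a+b+1}{10a+b} = \log\frac{10a+10}{10a} = \log\frac{a+1}{a}$$

$$a = 1:\qquad F_0 = \frac{\log(11/10)}{\log 2} = \frac{0.04139}{0.30103} \approx 0.1375, \qquad F_9 = \frac{\log(20/19)}{\log 2} = \frac{0.02228}{0.30103} \approx 0.0740$$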
It follows that the probability for a digit in the qth position is

$$F_q = \frac{\log\left(\dfrac{abc\ldots p(q+1)}{abc\ldots pq}\right)}{\log\left(\dfrac{abc\ldots (p+1)}{abc\ldots p}\right)}. \tag{3}$$
Here the frequency of q depends upon all the digits that precede it. When all possible combinations of these digits are taken into account, however, F_q approaches equality for all the digits 0, 1, 2, ... 9, or

$$F_q \approx 0.1. \tag{4}$$

As a result of this approach to uniformity in the qth place, the distribution of digits in all places in an extensive tabulation of multi-digit numbers will also be nearly uniform.
Digit   First Place   Second Place
0       0.000         0.120
1       0.301         0.114
2       0.176         0.109
3       0.125         0.104
4       0.097         0.100
5       0.079         0.097
6       0.067         0.093
7       0.058         0.090
8       0.051         0.088
9       0.046         0.085
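The two columns above can be recomputed from Eqs. (1) and (2). Multiplying the two equations gives the share log((10a + b + 1)/(10a + b)) for the digit pair a, b. Summing over the first digit a then gives the second-place column. The sketch below does this; the function names are hypothetical. It also chains the same product over longer prefixes, our reading of Eq. (3), to show the third and fourth places crowding toward the value 0.1 of Eq. (4).

```python
import math

def second_place(b: int) -> float:
    """Second-place frequency of digit b, summed over first digits a.
    F_a * F_b from Eqs. (1) and (2) equals log10((10a + b + 1) / (10a + b))."""
    return sum(math.log10((10 * a + b + 1) / (10 * a + b)) for a in range(1, 10))

def qth_place(d: int, q: int) -> float:
    """Frequency of digit d in position q (q >= 2): the same product, chained as in
    Eq. (3), summed over every prefix n made of the q - 1 preceding digits."""
    return sum(math.log10((10 * n + d + 1) / (10 * n + d))
               for n in range(10 ** (q - 2), 10 ** (q - 1)))

print("digit  first  second")
for d in range(10):
    first = math.log10((d + 1) / d) if d else 0.0   # Eq. (1); 0 cannot come first
    print(f"{d}      {first:.3f}  {second_place(d):.3f}")

# Third and fourth places lie close to the uniform value of Eq. (4).
for q in (3, 4):
    values = [qth_place(d, q) for d in range(10)]
    print(q, round(min(values), 4), round(max(values), 4))
```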
Reciprocals

Some tabulations of engineering and scientific data are given in reciprocal form, such as candles per watt and watts per candle. If one form of tabulation follows a logarithmic distribution, then the reciprocal tabulation will also have the same distribution. A little consideration will show that this must follow. Dividing unity by a given set of numbers by means of logarithms merely leads to identical logarithms with a negative sign prefixed.

The Law of Anomalous Numbers

A study of the items of Table I shows a distinct tendency for those of a random nature to agree better with the logarithmic law than those of a formal or mathematical nature. The best agreement was found in the arabic numbers (not spelled out) of consecutive front page news items of a newspaper. Dates were barred as not being variable, and the omission of spelled-out numbers restricted the counted digits to numbers 10 and over. The first 342 street addresses given in the current American Men of Science (Item R, Table IV) gave excellent agreement. A complete count of an issue of the Readers' Digest, except for dates and page numbers, was also in agreement. On the other hand, the greatest variations from the logarithmic relation were found in the first digits of mathematical tables from engineering handbooks. They were also large in tabulations of such closely knit data as Molecular Weights, Specific Heats, Physical Constants and Atomic Weights.
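The reciprocal argument rests on log(1/x) = -log x. Whenever the fractional part of the logarithm of x is not zero, that of 1/x is one minus it. The first digit depends only on that fractional part, so a set spread uniformly in it stays so spread after inversion. The sketch below illustrates this with hypothetical log-uniform data standing in for a tabulation that already follows Eq. (1). The seed, sample size and range are arbitrary choices, and the helper names are ours.

```python
import math
import random

def first_digit(x: float) -> int:
    """Leading significant digit of a positive number; decimal point and zeros are ignored."""
    return int(10 ** (math.log10(x) % 1.0))   # 10**frac lies in [1, 10)

def digit_shares(values):
    """Share of each first digit 1..9 among the values."""
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    return [c / len(values) for c in counts[1:]]

random.seed(1938)
# Hypothetical tabulation: log-uniform over six decades, so it follows Eq. (1).
data = [10 ** random.uniform(0, 6) for _ in range(100_000)]
reciprocals = [1 / v for v in data]

for a, (p, q) in enumerate(zip(digit_shares(data), digit_shares(reciprocals)), start=1):
    print(a, round(p, 3), round(q, 3), round(math.log10((a + 1) / a), 3))
```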
TABLE IV
SUMMATION OF DIFFERENCES BETWEEN FREQUENCIES OBSERVED AND THEORETICAL

Rank   Item   Nature                          Sum of Differences
 1     D      Newspaper Items                  2.8
 2     F      Pressure Lost, Air Flow          3.2
 3     G      H.P. Lost in Air Flow            4.8
 4     R      Street Addresses, A.M.S.         5.4
 5     P      Am. League, 1936                 6.6
 6     Q      Black Body Radiation             7.2
 7     O      X-Ray Voltage                    7.4
 8     M      Readers' Digest                  8.4
 9     A      Area, Rivers                     9.8
10     T      Death Rates                     11.2
11     N      Cost Data, Concrete             12.4
12     S      n^1 ... n^8, n!                 13.8
13     L      Design Data, Generators         16.6
14     B      Population, U.S.A.              16.6
15     I      Drainage Rate of Rivers         21.6
16     K      n^-1, √n, ...                   22.8
17     H      Molecular Wgts.                 23.2
18     E      Specific Heats                  24.2
19     C      Physical Constants              34.9
20     J      Atomic Weights                  35.4
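Table IV ranks the groups by how far they depart from Eq. (1). The text does not spell out how the sums are formed. The sketch below assumes each one is the total of the absolute differences, in percentage points, between the observed percentages of Table I and the theoretical values 100 log((a + 1)/a). Under that assumption it reproduces the tabulated sums for three rows, D, F and R, to one decimal place. The row values are copied from Table I.

```python
import math

# Observed first-digit percentages for three rows of Table I.
ROWS = {
    "D Newspapers": [30.0, 18.0, 12.0, 10.0, 8.0, 6.0, 6.0, 5.0, 5.0],
    "F Pressure":   [29.6, 18.3, 12.8, 9.8, 8.3, 6.4, 5.7, 4.4, 4.7],
    "R Addresses":  [28.9, 19.2, 12.6, 8.8, 8.5, 6.4, 5.6, 5.0, 5.0],
}

def summation_of_differences(observed_pct):
    """Assumed rule: sum over the nine digits of |observed - theoretical|, in per cent,
    with the theoretical values taken from Eq. (1)."""
    return sum(abs(obs - 100 * math.log10((a + 1) / a))
               for a, obs in enumerate(observed_pct, start=1))

for name, row in ROWS.items():
    print(f"{name:14s} {summation_of_differences(row):.1f}")
```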
These facts lead to the conclusion that the logarithmic law applies particularly to those outlaw numbers that are without known relationship, rather than to those that individually follow an orderly course. The logarithmic relation is therefore essentially a Law of Anomalous Numbers.
PART II: GEOMETRIC BASIS OF THE LAW
The data so far considered have been composed entirely of used numbers, that is, numbers as they are used in everyday affairs. Some underlying causes must distort what we call the "natural" number system into a logarithmic distribution. Perhaps we can best get at these causes by first examining briefly the frequency of the natural numbers themselves, arranged in the infinite arithmetic series 1, 2, 3, ... n, where n is as large as any number encountered in use. Let us assume that each individual number in the natural number system up to n is used exactly as often as every other individual number.

Starting with 1 and counting up to 10,000, for example, 1 would have been used as a first digit 1,112 times, or 11.12 per cent of all uses. If the count is extended to 19,999, there are 9,999 1's added, and first 1's occur in 55.55 per cent of the 19,999 numbers. When number 20,000 is reached there is a temporary stop in the addition of first 1's. Another 80,000 numbers beginning with the other digits are added to the series before 1's are again brought into the series, at 100,000. At this point the percentage of 1's is again reduced to 11.112 per cent, as illustrated in curve A of Fig. 2. This curve is plotted to a semi-logarithmic scale of F_n and log n. The equations for A can be written for three discontinuous but connected sections: 10,000-20,000, 20,000-99,999 and 99,999-100,000. With the entire frame of coordinates taken as having an area of 1, the area under the curve will then be very closely 0.30103.

An integration by the methods of the calculus is merely a quick way of adding up an infinite number of equally spaced ordinates. From this addition it finds the average height of the ordinates, and hence the area under the curve. If we are satisfied with a result somewhat short of the perfection of the integral calculus, we may take a finite number of equally spaced ordinates and by plain arithmetic come to practically the same answer. By definition, each point of A represents the frequency of first 1's from 1 up to that point. An integration (by calculus or arithmetic) under curve A therefore gives the average frequency of first 1's up to 100,000. The finite number of equally spaced ordinates now corresponds to a geometric series of numbers from 10,000 to 100,000. It is substantially this series of numbers, in this and other orders of the natural number scale, that leads to the numerical frequencies already presented.

[Figure: Frequency of first-place digits (o = observed).]

[Fig. 2. Linear frequencies of the natural number system between 10,000 and 100,000. Curve A is for 1, curve B for 9; abscissa: natural number.]

Curve B of Fig. 2 is for 9 as a first digit. The frequency of 9's decreases in the number range from 10,000 to 89,999 and then increases as 9's are added from 90,000 to 99,999. An integration under curve B leads to a good numerical approximation to the logarithmic interval log 10 - log 9, as called for by the previous statistical study.

Geometric and Logarithmic Series

The close relationship of a geometric series and a logarithmic series is easily seen and hardly needs formal demonstration. The uniformly spaced ordinates of Fig. 2 form a geometric series of numbers, for these numbers have a constant factor between adjacent terms. This constant factor is determined in size by the constant logarithmic increment.

Semi-Log Curves

A geometric series of numbers plotted to a semi-logarithmic scale gives a straight line. In the original tabulation of observed numbers, the line of data marked "R" is designated simply as "street addresses." These are the street addresses of the first 342 people mentioned in the current American Men of Science. The randomness of such a list is hardly to be disputed, and it should therefore be useful for illustrative purposes.

In Fig. 3 these addresses are first indicated by the height of the lines at the base of the diagram. The height of a line, measured on the scale at the left, indicates the number of addresses at, or near, that street number. Thus there were five addresses at No. 29 on various streets. To make the trend clearer, the heights of these lines were summed, beginning at the left and proceeding across to the right. It was found that four straight lines could be drawn among these summation points with fair fidelity of trend. These four lines represent four geometric series, each with a different factor between terms. Each line will give the observed frequency over the numerical range it covers, and hence satisfies the logarithmic relationship.

The Natural Numbers and Nature's Numbers

In natural events, and in events of which man considers himself an originator, there are plenty of examples of geometric or logarithmic progressions. We are so accustomed to labeling things 1, 2, 3, 4, ... and then saying they are in natural order that the idea of 1, 2, 4, 8, ... being a more natural arrangement is not easily accepted. Yet it is in this latter manner that a surprisingly large number of phenomena occur, and the evidence for this is available to everyone.

First, let us consider the physiological and psychological reaction to external stimuli. The growth of the sensation of brightness with increasing illumination is a logarithmic function, as illustrated by Fechner's Law. The growth of sensation is slow at first, while the rods of the retina are alone responsive. In this region a straight line on semi-logarithmic paper (the stimulus being on the logarithmic scale) can represent the intensity-brightness function. When the cones come into action there is a sharp change in the rate of growth, and another straight line represents our working range of vision. When over-excitation and fatigue set in, a third line is needed. Three geometric series could thus be used to state the relation between illumination and the sensation of brightness. If the literature contained sufficient numerical references, the brightness function should give an extremely close approximation of the logarithmic law of distribution.

The sense of loudness follows the same rules, as does the sense of weight. Perhaps the same laws operate to make the sense of elapsed time seem so different at ages ten and fifty.

Our music scales are irregular geometric series that repeat rigidly every octave.

In the field of medicine, the response of the body to medicine or radiation is often logarithmic, as are the killing curves under toxins and radiation.
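The arithmetic behind curves A and B of Fig. 2 can be checked directly. The sketch below counts, for each n, the share of the integers 1 to n that begin with a given digit. It then averages these ordinates at equally spaced points on the log n axis between 10,000 and 100,000, which is the same as sampling a geometric series of numbers. The number of ordinates (2,000), the midpoint placement and the function names are our choices. On these assumptions the averages should come out close to log 2 for the digit 1 and log 10/9 for the digit 9.

```python
import math

def running_frequency(digit: int, n: int) -> float:
    """Fraction of the integers 1, 2, ..., n whose first digit is `digit`.
    This is the ordinate of curve A (digit 1) or curve B (digit 9) of Fig. 2."""
    count, power = 0, 1
    while digit * power <= n:
        start = digit * power                    # e.g. 1, 10, 100, ... for digit 1
        stop = min(n, (digit + 1) * power - 1)   # e.g. 1, 19, 199, ... (capped at n)
        count += stop - start + 1
        power *= 10
    return count / n

def average_over_log_ordinates(digit: int, lo: int = 10_000, hi: int = 100_000,
                               points: int = 2_000) -> float:
    """Average the ordinates at equally spaced positions on the log n axis,
    i.e. at the terms of a geometric series of numbers from lo to hi."""
    ratio = (hi / lo) ** (1 / points)
    total = 0.0
    for k in range(points):
        n = round(lo * ratio ** (k + 0.5))       # midpoint of each logarithmic step
        total += running_frequency(digit, n)
    return total / points

print(running_frequency(1, 10_000), running_frequency(1, 19_999), running_frequency(1, 100_000))
print(round(average_over_log_ordinates(1), 4), round(math.log10(2), 4))
print(round(average_over_log_ordinates(9), 4), round(math.log10(10 / 9), 4))
```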
In the mechanical arts, standard sizes have arisen from years of practical experience, and the final results are often geometric series. Witness our standards of wire diameters and drill sizes, and the issued lists of "preferred numbers." The astronomer lists stars on a geometric brightness scale that multiplies by 100 every five steps. The illuminating engineer adopts the same type of series in choosing the wattage of incandescent lamps.

In the field of experimental atomic physics, the results represent what occurs among groups of the building units of nature, and the unit itself is known only by mass action. The test data there are statistical averages. The action of a single atom or electron is a random and unpredictable event, and a statistical average of a group of such events would show a statistical relationship to the results and laws here presented. The frequent use of semi-log paper in plotting the test data is evidence that this is so, and the test points often fall on one or more straight lines. The analogy is complete. One is tempted to think that the 1, 2, 3, ... scale is not the natural scale, but that, invoking the base e of the natural logarithms, Nature counts e^0, e^x, e^(2x), e^(3x), ...
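The suggestion that Nature counts e^0, e^x, e^(2x), ... can be illustrated numerically. The first digit of a term depends only on the fractional part of its common logarithm. For a geometric series those fractional parts advance by the fixed step log r. When that logarithm is irrational, they spread evenly over the unit interval and the first digits approach Eq. (1). The ratio e^0.1, the 100,000 terms and the function name below are hypothetical choices.

```python
import math
from collections import Counter

def first_digit_shares(ratio: float, terms: int):
    """Share of each first digit 1..9 among 1, ratio, ratio**2, ..., ratio**(terms - 1)."""
    log_r = math.log10(ratio)
    counts = Counter()
    for k in range(terms):
        # Only the fractional part of k*log10(ratio) fixes the first digit,
        # so the huge powers themselves are never formed.
        mantissa = 10 ** ((k * log_r) % 1.0)   # lies in [1, 10)
        counts[int(mantissa)] += 1
    return [counts[a] / terms for a in range(1, 10)]

# Hypothetical example: e**0, e**x, e**2x, ... with x = 0.1 (ratio e**0.1).
shares = first_digit_shares(math.exp(0.1), 100_000)
for a, s in enumerate(shares, start=1):
    print(a, round(s, 4), round(math.log10((a + 1) / a), 4))
```

By contrast, a ratio of exactly 10 keeps every fractional part at zero, so `first_digit_shares(10.0, 5)` reports only first 1's.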
PART III: DIGITAL ORDERS OF NUMBERS
The natural number system is an array of numbers in simple arithmetic series, but on top of this we have imposed an idea taken from a geometric series. Numbers composed of many digits are ordinarily separated into groups of three digits by interposing commas, and here we unknowingly give evidence of the use of these numbers on a geometric scale. For convenience of description, the natural numbers 1 to 10 are called the first digital order numbers, those from 10 to 100 the second digital order, and so on. Note that 10 is both the last number of the first order and the first number of the second order. When an integration is carried out, as will be done later, 10 appears as both an upper and a lower limit. In this case it is thus used as a boundary line rather than as a unit zone in the natural number system.

In Fig. 4 the curves show the frequency with which the natural numbers occur in the Natural Number System. They begin at the left edge, where 1 is the only number and its frequency is 1; that is, until a second number is added, 1 is the entire number system. When 2 is reached the frequency is 0.50 for 1 and 0.50 for 2. At 5, for example, the frequency for each of the first five digits is 0.20, and the equal division continues until 9 is reached. At 10 the digit 1 has appeared twice and has a frequency of 0.20, against 0.10 for each of the other eight digits that have appeared but once. The curve rising from 9 on the scale of abscissae is for the digit 1 only, while the curve continuing downward from 9 is for the digits 2 to 9 inclusive. At 19 the frequency of 1 reaches its peak, and the curve for 2 rises to join the curve for 1 at 29. From there, 1 and 2 have a common curve until 99 is reached and a third 1 is about to be added to the series. At any ordinate the curves therefore tell the frequency of each digit within the total number of natural numbers up to that point.

[Fig. 4. Linear frequencies of the natural numbers.]
The curves are drawn as if we were dealing with continuous functions in place of a discontinuous number system. The justification for using a continuous form is that the things we use the number system to represent are nearly always perfectly continuous functions. When we confine ourselves to single-digit numbers, the number given to any phenomenon, say 9, will be used in some degree for all the infinite sizes of phenomena between 8 and 10.
An enlarged sketch of the linear frequency curves at the junction of the first and second orders is given in Fig. 5. The lines h-b and b-j are the computed ratios of 1 in this region. The line 8-b for the ratio of 9 begins at 8, for as soon as size 8 is passed there is a possibility of our using a 9. For size 8½ the chances are about equal for calling it either 8 or 9. The summation of area under the curve 8-b-c is taken as the probability of using a 9 for phenomena in this region. This is about equivalent to knowing accurately the size of all phenomena in this region and deciding to call everything between 8.5 and 9.5 by the number 9. Once 9 is passed, the curve for 1, b-j, begins to rise in anticipation of the phenomena between 9 and 10 that will be called 10.

[Figure: Frequency of single digits 1 to 9. +, theoretical; o, observed frequency of footnotes in 10 books each having at least one page with ten footnotes (2,968 observed).]

It has been noted that, for high orders of numbers, the areas under the curves of Fig. 2 are proportional to the frequency of use of the first digit. The same demonstration will now be made with the aid of the calculus in regions that are markedly discontinuous. Selecting the third digital order of Fig. 4, the area under the 1-curve can be written
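$$A_1''' = \int_{100}^{199} y_1\,\frac{dx}{x} + \int_{199}^{999} y_2\,\frac{dx}{x} + \int_{999}^{1000} y_3\,\frac{dx}{x}, \tag{5}$$

where, counting the first 1's among the numbers up to x,

$$y_1 = \frac{x - 88}{x}, \tag{6}$$

$$y_2 = \frac{111}{x}, \tag{7}$$

$$y_3 = \frac{x - 888}{x}. \tag{8}$$

Carrying out the integrations,

$$A_1''' = \log_e\frac{199}{100} - 88\left(\frac{1}{100} - \frac{1}{199}\right) + 111\left(\frac{1}{199} - \frac{1}{999}\right) + \log_e\frac{1000}{999} - 888\left(\frac{1}{999} - \frac{1}{1000}\right), \tag{9}$$

$$A_1''' = \log_e\frac{1990}{999} + \frac{8}{1000}. \tag{10}$$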
A similar operation yields for the 1-curve in the second digital order

$$A_1'' = \log_e\frac{190}{99} + \frac{8}{100}.$$
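Eqs. (5) to (10) and the second-order result can be checked by numerical integration. The sketch assumes, as in those equations, that the ordinate y of the 1-curve is integrated against d(log_e x) = dx/x. It writes y piecewise with constants generated from the counts of first 1's: 88, 111 and 888 in the third order, and 8, 11 and 88 in the second. The helper names are hypothetical, and the midpoint rule with 100,000 steps is our choice.

```python
import math

def y_ones(x: float, order: int) -> float:
    """Ordinate of the 1-curve at x inside the given digital order (Fig. 4):
    the share of 1, 2, ..., x whose first digit is 1, written piecewise."""
    lo, hi = 10 ** (order - 1), 10 ** order
    below = (lo - 1) // 9              # first 1's in the lower orders: 0, 1, 11, 111, ...
    if x <= 2 * lo - 1:                # e.g. 100..199: y1 = (x - 88) / x
        return (x - (lo - 1 - below)) / x
    if x <= hi - 1:                    # e.g. 199..999: y2 = 111 / x
        return (below + lo) / x
    return (x - (hi - 1 - below - lo)) / x   # e.g. 999..1000: y3 = (x - 888) / x

def area_numeric(order: int, steps: int = 100_000) -> float:
    """Midpoint rule for the area under the 1-curve, using dx/x = d(log_e x)."""
    t0, t1 = math.log(10 ** (order - 1)), math.log(10 ** order)
    h = (t1 - t0) / steps
    return h * sum(y_ones(math.exp(t0 + (k + 0.5) * h), order) for k in range(steps))

def area_closed(order: int) -> float:
    """Closed form: log_e(190/99) + 8/100 for order 2, log_e(1990/999) + 8/1000 for order 3."""
    lo, hi = 10 ** (order - 1), 10 ** order
    return math.log(10 * (2 * lo - 1) / (hi - 1)) + 8 / hi

for r in (2, 3):
    a_num, a_closed = area_numeric(r), area_closed(r)
    # Dividing by N = log_e 10 turns the area into the frequency of first 1's.
    print(r, round(a_num, 6), round(a_closed, 6), round(a_closed / math.log(10), 4))
```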
From the symmetry running through these solutions, we can write the solution for the digit 1 in any digital order r:

$$F_1^r = \frac{1}{N}\left[\log_e\frac{10\,(2\cdot 10^{r-1} - 1)}{10^r - 1} + \frac{8}{10^r}\right].$$

We can also write the solutions for the eight other first digits from the general equation for the Law of Anomalous Numbers:

$$F_a^r = \frac{1}{N}\left[\log_e\frac{(a+1)\,10^{r-1} - 1}{a\cdot 10^{r-1} - 1} - \frac{1}{10^r}\right], \qquad a = 2, 3, \ldots, 9. \tag{11}$$

Here r is the digital order, and N = log_e 10 is the factor that converts the expressions from the natural logarithm system, base e, to the common logarithm system, base 10. Suppose high orders of r are considered, as was unwittingly done in the original statistical work. These expressions then simplify by dropping the terms -1 in both numerator and denominator, and the numerical terms having 10^r in the denominator become negligible. Hence the general equations become
$$F_1^r = \log_{10}\frac{1+1}{1} = \log_{10} 2, \tag{12}$$

$$F_a^r = \log_{10}\frac{a+1}{a}, \qquad a \neq 1, \tag{13}$$
but these two expressions no longer differ in form, and they may be merged into
$$F_a^r = \log_{10}\frac{a+1}{a}, \tag{14}$$
which was the relationship originally observed for multi-digit numbers. In Table V, numerical values are given for the theoretical frequencies of used numbers in the first, second, third and limiting digital orders.
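TABLE V

Digit   First Order   Second Order   Third Order    Limiting Order
        1 to 10       10 to 100      100 to 1000
1       0.393         0.318          0.303          0.301
2       0.258         0.179          0.176          0.176
3       0.133         0.124          0.125          0.125
4       0.082         0.095          0.097          0.097
5       0.053         0.076          0.079          0.079
6       0.036         0.064          0.067          0.067
7       0.024         0.054          0.058          0.058
8       0.015         0.047          0.051          0.051
9       0.008         0.042          0.045          0.046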
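The theoretical frequencies for each digital order can be regenerated in a few lines. This sketch rests on our reading of the general equations. For the digit 1 it uses F_1^r = (1/N)[log_e(10(2·10^(r-1) - 1)/(10^r - 1)) + 8/10^r], the general form of the second- and third-order results. For the digits 2 to 9 it uses Eq. (11), F_a^r = (1/N)[log_e(((a + 1)10^(r-1) - 1)/(a·10^(r-1) - 1)) - 1/10^r]. In both, N = log_e 10. The script prints the first three orders beside the limiting values of Eq. (14) and confirms that each order sums to unity. The function name is ours.

```python
import math

N = math.log(10)  # N = log_e 10, converts natural logarithms to common logarithms

def first_digit_frequency(a: int, r: int) -> float:
    """Theoretical frequency of first digit a among numbers of digital order r
    (r = 1 for 1 to 10, r = 2 for 10 to 100, ...)."""
    p = 10 ** (r - 1)
    if a == 1:
        # F_1^r = (1/N) [log_e(10 (2*10^(r-1) - 1) / (10^r - 1)) + 8 / 10^r]
        return (math.log(10 * (2 * p - 1) / (10 * p - 1)) + 8 / (10 * p)) / N
    # Eq. (11): F_a^r = (1/N) [log_e(((a+1)*10^(r-1) - 1) / (a*10^(r-1) - 1)) - 1 / 10^r]
    return (math.log(((a + 1) * p - 1) / (a * p - 1)) - 1 / (10 * p)) / N

print("digit  r=1    r=2    r=3    limit")
for a in range(1, 10):
    cols = [first_digit_frequency(a, r) for r in (1, 2, 3)]
    print(a, *(f"{c:.3f}" for c in cols), f"{math.log10((a + 1) / a):.3f}")

# Each order sums to unity, as required for the frequencies of any one order.
for r in (1, 2, 3, 6):
    print(r, round(sum(first_digit_frequency(a, r) for a in range(1, 10)), 12))
```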
The frequencies of the single digits 1 to 9 vary enough from the frequencies of the limiting order to allow a statistical test, if a source of digits used singly can be found. The footnotes so commonly used in technical literature are an excellent source, consisting of units that are indicated by numbers, letters or symbols. The procedure for collecting data on the first-order numbers began with a cursory examination of a volume. The purpose was to see if it contained as many as 10 footnotes to a page, for obviously no test of the range 1 to 9 could be made if the maximum number fell short of the full range. The numbers recorded in Table VI are the numbers of footnotes observed on consecutive pages. The count began on page 1 and continued to the end of the book, or until it seemed that a fair sample of the book had been obtained. The books used were the Standard Handbook for Electrical Engineers, Smithsonian Physical Tables, Handbuch der Physik and Glazebrook's Dictionary of Applied Physics. In Table VI the observed percentages of single digits 1 to 9 are given, along with the number of pages used in each volume and the number of footnotes observed. The frequency for 1 is seen to be 43.2 per cent, as against the theoretical frequency of 39.3 per cent. For the digit 9 the observations agree with theory, with F_9' = 0.8 per cent.
In general the agreement with theory is as good as the computed probable errors of the observations.
TABLE VI
COUNT OF FOOTNOTES
Pages Used    1    2    3    4    5    6    7    8    9
Volume Volume
Frequencies, in Per Cent 22.7 22.1 12.3 6.6 5.0 6.1 2.4 5.0 1.7 2.2 0.3 1.1 0.3 0.6 0.3 0.0
Total Count
3. 4. 5. 6. 7. 8. 9. 10.
II. derPhy..
H. der Phy.. H. der Phy.. H. der Phy...
All All
360 360
365
55.1 56.3
27.5 11.8 10.7 23.2 6.7 7.6 22.3 13.7 6.9 25.2 13.4 9.1
8.5
5.5
3.2
0.8
2.6 1.8 6.1
0.0
2.2 1.0 5.8
1.6
127
11.8 8.3 4.9 3.9 1.9 1.6 0.8 5.3 3.6 2.4 13.3 8.1 1.5 0.8 -1.5 +0.2 -0.4 +0.3 -0.5 +0.1 0.0 40.7 ?0.5 ?0.6 ?0.5 ?0.4 ?0.4 ?0.4
2968
Summation of Frequencies

One of the conditions that must be met by these expressions for the frequencies of the integers is that, in any one order, the sum of the frequencies must equal unity; that is, the sum of their probabilities must equal certainty. Select the first-order digits, Eq. 11, and recall the logarithmic rule that the sum of the logarithms of a group of numbers is equal to the logarithm of their combined product. We then have the probability

$$P' = \log_{10}\frac{10\cdot 2\cdot 3\cdot 4\cdot 5\cdot 6\cdot 7\cdot 8\cdot 9}{9\cdot 1\cdot 2\cdot 3\cdot 4\cdot 5\cdot 6\cdot 7\cdot 8} + \frac{1}{N}\left(\frac{8}{10} - 8\cdot\frac{1}{10}\right),$$

which reduces to

$$P' = \log_{10} 10 + 0 = 1.$$
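Similarly, for the second-order digits,

$$P'' = \log_{10}\left(\frac{190}{99}\cdot\frac{29}{19}\cdot\frac{39}{29}\cdots\frac{99}{89}\right) + \frac{1}{N}\left(\frac{8}{100} - 8\cdot\frac{1}{100}\right) = \log_{10} 10 + 0 = 1,$$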
and a similar proof can be worked out for the other orders.

Summary of Part III

Single digits have a specific natural frequency that varies sharply from the logarithmic ratios. This holds regardless of their relation to the decimal point, and also regardless of preceding or following zeros. The second digital order, composed of two adjacent significant digits, has a specific frequency approximating the logarithmic frequency. For three or more associated digits, the variation from the latter frequency would be extremely difficult to find statistically.

The basic operation

$$F = \int F_a\,\frac{da}{a}$$

converts the linear frequency of the natural numbers to the logarithmic frequency of natural phenomena and human events. It can be interpreted as meaning that, on the average, these things proceed on a logarithmic or geometric scale. Another way of interpreting this relation is to say that small things are more numerous than large things. There is also a tendency for the step between sizes to be equal to a fixed fraction of the last preceding phenomenon or event. There is no necessity or implication of limits at either the upper or the lower regions of the series.

Suppose the view is accepted that phenomena fall into geometric series. It then follows that the observed logarithmic relationship is not a result of the particular numerical system, with its base 10, that we have elected to use. Consider any other base, such as 8, 12 or 20, to select some of the numbers that have been suggested at various times. It would lead to similar relationships, for the logarithmic scales of the new numerical system would be covered by equally spaced steps by the march of natural events. As has been pointed out before, the theory of anomalous numbers is really the theory of phenomena and events, and the numbers but play the poor part of lifeless symbols for living things.
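The claim that another base would lead to similar relationships can be illustrated with the same kind of geometric series. In base b the analogue of Eq. (1) is log_b((d + 1)/d) for d = 1, ..., b - 1. The sketch below uses a hypothetical ratio e^0.1 and 100,000 terms. For bases 8, 10, 12 and 20 it reports the share of first 1's beside log_b 2, and the largest departure from the logarithmic law over all digits. The function name is ours.

```python
import math

def shares_in_base(base: int, ratio: float, terms: int = 100_000):
    """Share of each first digit 1..base-1, with numbers written in `base`,
    among the geometric series 1, ratio, ratio**2, ..., ratio**(terms - 1)."""
    step = math.log(ratio, base)          # logarithm of the ratio to the chosen base
    counts = [0] * base
    for k in range(terms):
        mantissa = base ** ((k * step) % 1.0)   # leading part, in [1, base)
        counts[int(mantissa)] += 1
    return [c / terms for c in counts[1:]]

ratio = math.exp(0.1)  # hypothetical ratio, as in e**0, e**x, e**2x, ... with x = 0.1
for base in (8, 10, 12, 20):
    shares = shares_in_base(base, ratio)
    worst = max(abs(s - math.log((d + 1) / d, base)) for d, s in enumerate(shares, start=1))
    print(base, round(shares[0], 4), round(math.log(2, base), 4), round(worst, 5))
```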