Digitized by the Internet Archive
in 2008 with funding from
Microsoft Corporation
http://www.archive.org/details/firstcourseinstaOOjoneuoft
BELL'S MATHEMATICAL SERIES
ADVANCED SECTION
General Editor, WILLIAM P. MILNE, M.A., D.Sc.
Professor of Mathematics, University of Leeds
A FIRST COURSE IN STATISTICS
AN ELEMENTARY TREATISE ON DIFFERENTIAL
EQUATIONS AND THEIR APPLICATIONS. By
H. T. H. Piaggio, M.A., D.Sc., Professor of
Mathematics, University College, Nottingham.
Demy 8vo. 12s.
A FIRST COURSE IN NOMOGRAPHY. By S.
Brodetsky, M.A., B.Sc., Ph.D., Reader in Applied
Mathematics at Leeds University. Demy 8vo. 10s.
AN INTRODUCTION TO THE STUDY OF
VECTOR ANALYSIS. By C. E. Weatherburn,
M.A., D.Sc., Professor of Mathematics and Natural
Philosophy, Ormond College, University of
Melbourne. Demy 8vo. 12s. net.
A FIRST COURSE IN STATISTICS. By D.
Caradog Jones, M.A., F.S.S., formerly Lecturer
in Mathematics at Durham University. Demy 8vo.
15s. net.
THE ELEMENTS OF NON-EUCLIDEAN
GEOMETRY. By D. M. Y. Sommerville,
M.A., D.Sc., Professor of Mathematics, Victoria
University College, Wellington, N.Z. Crown 8vo.
7s. 6d. net.
LONDON : G. BELL AND SONS, LTD.
A
FIRST COURSE
IN
STATISTICS
BY
D. CARADOG JONES, M.A., F.S.S.
FORMERLY LECTURER IN MATHEMATICS
AT DURHAM UNIVERSITY
LONDON
G. BELL AND SONS, LTD.
1921
PREFACE
Fifty years ago a large section of the general public were not
only uninterested in what we now call the social problem, but they
scarcely gave a thought to the existence of such a problem. They
felt vaguely perhaps, during periods of acute distress due to lack of
employment, that all was not well and they thought the Government
or possibly the big landowner was to blame, but only the
more enlightened realized the complexity of the body politic and
how fearfully and wonderfully it is made. Today all this is changed,
and comparatively few imagine that a single panacea — the prohibition
of drink, the nationalization of land, or a levy on capital —
will cure all evils.
The very fact that nearly the whole civilized world has given
itself up for over four years to the destruction of life and the dragging
down of the social fabric in all countries on so vast a scale has
led to a surfeit and a reaction in which thoughtful men are eager
to take part in proclaiming again a common brotherhood and in
building a better world. Those who have always been interested
in this kind of architecture welcome the change of spirit, but they
also recognize the difficulty of the task undertaken and the need
for no little mental effort to second the good will, which is the first
essential for success. To pull down no teacher is needed, but we
must learn to build.
This leads one to the subject of the present book. The man who
wishes his work to stand must make sure of its foundations. He
cannot afford to rest satisfied, as too often the politician and social
worker do, with wild and ill-informed generalizations where more
exact knowledge is possible, and there are few human problems in
the discussion of which some acquaintance with the proper treatment
of statistics is not in the highest degree necessary.
Most people, however, are suspicious of figures. They imagine
that quantitative considerations must of necessity deaden all
feeling for the purely aesthetic or qualitative spirit which is the
very life of the phenomena observed or measured. But this surely
need not be the case. Kepler, when he succeeded in translating
the motions of the planets into the language of number was not, we
believe, the less but rather the more enamoured of the beauty and
order with which the whole of creation is clothed.
A second reason for suspicion is that partisans of one school or
another with more push than principle sometimes trade upon the
general ignorance of statistics to ' prove ' their own pet theories,
while others no less enthusiastic lead the credulous public into the
ditch, not with malice intent, but because they are really blind
themselves to the right interpretation of the figures they so glibly
quote.
Although a concern in social questions led the present writer in
the first instance to study the theory of statistics, there is no reason
why this bias should prevent the book being of service to those who
wish to know something of its application in other directions, seeing
that the general principles underlying the theory are the same in
all cases, and illustrations have been taken from any field, biological,
economic, medical, etc., just as they suited the immediate purpose
in view.
The author makes no claim to any originality : he is no more
than a student seeking to put together, with some kind of system
and as he understands them, the simpler and more important ideas
he has gathered from other sources. The matter is entirely the
work of others, the manner only is his own, and he will be happy
to receive criticism if thereby he may learn more. His chief qualification
for writing is that he has had to worry through most of
his difficulties alone, and consequently he knows where another
student is likely to be in trouble better perhaps than the kind of
writer who is so quick as to be able to see through things at a glance
or, failing that, so fortunate as to be able to borrow immediate
light from others.
The book is divided into two parts. Practically all the first part
should be well within the understanding of the ordinary person.
Part II. is more mathematical, but an effort has been made throughout
to explain results in such a way that the reader shall gain a
general idea of the theory and be able to apply it without needing
to master all the actual proofs. The whole is meant, not as an
exhaustive treatise, but merely as a first course introducing the
reader to more serious works, and, since real inspiration is to be
found nowhere so surely as at the source, it is intended to encourage
and fit him to pursue the subject further by consulting at least the
most important original papers referred to in the text, only enough
references being given to awaken curiosity. With the same intention
a short chapter is inserted after the Appendix by way of suggesting
a few of the sources of statistics likely to be of interest to
the social student.
Some living writers, notably Professor Karl Pearson, have
contributed so largely to the development and application of
statistics that it is impossible to write upon the subject at all without
incorporating large parts of their work, and the least one can do
is gladly to record the benefit and pleasure one has received from
them. The author's indebtedness to the two most important
English textbooks — Yule's Theory of Statistics and Bowley's
Elements of Statistics — will be evident also to any one who knows
these books, for they became so familiar through constant study
that he fears he may have drawn upon them unconsciously even
to the point of plagiarism in places.
Finally, he wishes specially to acknowledge the kindness of four
friends — Mr. Peter Fraser, Lecturer in Mathematics at Bristol University,
without whose encouragement in the early stages the work
would never have been attempted ; Professor H. T. H. Piaggio,
University College, Nottingham, and Mr. A. W. Young, sometime
Lecturer at the Sir John Cass Technical Institute, London, whose
criticisms and suggestions were most valuable ; and Professor
W. P. Milne, of Leeds University, who, both as a practical teacher
and as Editor of this series, ungrudgingly gave his help and advice.
D. C. J.
CONTENTS
PART I
I. Introductory — Early Historical Beginnings : Logical Development
II. Measurement, Variables, and Frequency Distribution
III. Classification and Tabulation
IV. Averages
V. Averages (continued) — Applications of Weighted Mean
VI. Dispersion or Variability
VII. Frequency Distribution : Examples to illustrate Calculating and Plotting : Skewness
VIII. Graphs — Correlation suggested by Graphical means
IX. Graphs (continued) — Graphical Ideas as a Basis for Interpolation : Reasoning made Clear with the help of Graphs or Curves
X. Correlation
XI. Correlation — Examples
PART II
XII. Introduction to Probability and Sampling
XIII. Sampling (continued) — Formulae for Probable Errors
XIV. Further Applications of Sampling Formulae
XV. Curve Fitting — Pearson's Generalized Probability Curve
XVI. Curve Fitting (continued) — The Method of Moments for connecting Curve and Statistics
XVII. Applications of Curve Fitting
XVIII. The Normal Curve of Error
XIX. Frequency Surface for Two Correlated Variables
APPENDIX
Certain Current Sources of Social Statistics
A Note on Tables to aid Calculation
PART I
CHAPTER I
INTRODUCTORY
Early Historical Beginnings. Statistics, more or less valuable,
have been compiled in most civilized countries from very early
times. One reason for doing this on a large scale has been to
ascertain the manpower and material strength of the nation for
military or fiscal purposes, and we read in the Old Testament of
such censuses being taken in the case of the Jews, while among the
Romans also it was a common practice.
In England, as economic terms began to be used and their meanings
analysed, and especially during the period when the mercantile
system prevailed, and the Government endeavoured so far as was
practicable to direct industry into channels such that it would add
most to the power of the realm, men tried frequently to base arguments
for social and political reform upon the results of figures
collected. A distinct advance had been made in the seventeenth
century when mortality tables were drawn up and discussed by Sir
William Petty and Halley, the famous astronomer, among others,
and their labours prepared the way for a more scientific treatment
of statistical methods, especially at the hands of one, Süssmilch, a
Prussian clergyman, who published an important work in 1761.
It is almost true to say, however, that until the time of the great
Belgian, Quetelet (1796-1874), no substantial theory of statistics
existed. The justice of this claim will be recognized when we
remark that it was he who really grasped the significance of one
of the fundamental principles — sometimes spoken of as the constancy
of great numbers — upon which the theory is based. A simple illustration
will explain the nature of this important idea : Imagine
100,000 Englishmen, all of the same age and living under the same
normal conditions — ruling out, that is, such abnormalities as are
occasioned by wars, famines, pestilence, etc. Let us divide these
men at random into ten groups, containing 10,000 each, and note
the age of every man when he dies. Quetelet's principle lays
down that, although we cannot foretell how long any particular
individual will live, the ages at death of the 10,000 added together,
whichever group we consider, will be practically the same. Depending
upon this fact insurance companies calculate the premiums
they must charge, by a process of averaging mortality results recorded
in the past, and so they are able to carry on business without
serious risk of bankruptcy.
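The constancy of great numbers can be illustrated by a short simulation. The sketch below is an illustration only: the distribution of ages at death (a normal curve with mean 65 and standard deviation 15, clipped to the range 0 to 105) is a hypothetical assumption and is not taken from any mortality table, and the function name `group_totals` is our own.

```python
import random

def group_totals(n_men=100_000, n_groups=10, seed=1):
    """Split simulated ages at death at random into equal groups and total each group."""
    rng = random.Random(seed)
    # Hypothetical ages at death: normal(65, 15), clipped to 0..105 (an assumption).
    ages = [min(105.0, max(0.0, rng.gauss(65, 15))) for _ in range(n_men)]
    rng.shuffle(ages)  # random division of the men into groups
    size = n_men // n_groups
    return [sum(ages[i * size:(i + 1) * size]) for i in range(n_groups)]

totals = group_totals()
# Difference between the largest and smallest group total, relative to the mean total.
spread = (max(totals) - min(totals)) / (sum(totals) / len(totals))
print(f"relative spread of the ten group totals: {spread:.4f}")
```

Although single ages in such a simulation differ by many years, the ten totals should differ from one another by only a small fraction of their size, which is the point of the principle.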
As a distinguished statistician once said, ' By the use of statistics
we obtain from milliards of facts the grand average of the world.'
But if the average resulting from our observations were subject to
violent fluctuation as we passed from one set of facts to another
cognate set there would be little satisfaction in finding it. It is
the comparative constancy of the average, if the number of our
observations is large enough, which makes it so important, as
Quetelet observed, for although the idea was not altogether new he
first realized how wide an application it had and how fruitful of
practical results it might prove.
Quetelet was born in Ghent, and taught mathematics in the
College there in his early youth. After graduating as Doctor of
Science he became Professor of Mathematics in Brussels Athenaeum
when only twenty-three years old, and later he was made Director
of the Brussels Observatory, in the foundation of which he had
taken a leading part. In 1841 he was appointed President of the
Central Commission of Statistics, where he was in a position to
render valuable assistance to the Belgian Government by his advice
on important social questions. He initiated the International
Statistical Congress, which has served to bring together the leading
statisticians of all countries, and the first meeting was held in 1853
at Brussels. His death occurred at the ripe age of seventy-eight.
Some idea of the extent of Quetelet's statistical researches may
be gathered from the titles of his chief works : (1) Sur l'homme et
le développement de ses facultés, ou essai de physique sociale (1835) ;
(2) Lettres . . . sur la théorie des probabilités appliquée aux sciences
morales et politiques (1846) ; (3) Du système social et des lois qui le
régissent (1848) ; (4) L'Anthropométrie, ou mesure des différentes
facultés de l'homme (1871).
In his writings he visualizes a man with qualities of average
measurement, physical and mental (l'homme moyen), and shows
how all other men, in respect of any particular organ or character,
can be ranged about the mean or average man, just as in Physics
a number of observations of the same thing are ranged about
the mean of all the observations. Hence he concluded that the
methods of Probability, which are so effective in discussing errors
of observation, could be used also in Statistics, and that deviations
from the mean in both cases would be subject to the binomial law.
Hain in Vienna put some of Quetelet's ideas to good service in
1852, employing a superior method for the calculation of statistical
variability. Knapp and Lexis in Germany, also following up
Quetelet's principles, made an exhaustive investigation several years
later of the statistics of mortality, and their work has been extended
in many directions, and in our own time notably by Galton, Karl
Pearson, and Edgeworth.
The name of Sir Francis Galton (1822-1911), to whose work as
a pioneer the science of Statistics owes so much, is deserving of
even greater honour than it has yet received. Founder of the School
of Eugenics, Galton himself came of famous stock, being grandson
of Erasmus Darwin and a cousin to Charles Darwin. He studied
medicine in early youth, but after graduating at Cambridge his
attention was turned to exploration, and the Royal Geographical
Society awarded him a gold medal on the results of his investigations
in South West Africa. His first great work on heredity was
not published till 1869, after he had already earned distinction in
other directions, for he was elected a Fellow of the Royal Society
in 1860. Alive with new ideas, marvellously patient and persistent
in bringing them to the test of observation — qualities essential for
real scientific research — he set himself to inquire into the laws
governing the transmission of characteristics, physical and mental,
from one generation to another. Large tracts of this ground have
since been carefully explored and mapped out by the school of
his great successor, Karl Pearson, who has originated formulae for
testing the extensive anthropometrical and biological data collected.
Largely as a result of their work it is now widely recognized
that ' the whole problem of evolution,' as Professor Pearson himself
has well said, ' is a problem in vital statistics — a problem of longevity,
of fertility, of health, and of disease, and it is as impossible for the
evolutionist to proceed without statistics as it would be for the
Registrar-General to discuss the national mortality without an
enumeration of the population, a classification of deaths, and a
knowledge of statistical theory.'
Logical Development. The best way to approach the study of
any subject, if one had time, would be along the lines of its historical
development, but these lines seem so often to diverge from the
main theme, like branches from the parent stem of a tree, that
when one tries to describe them the general effect is apt to be somewhat
confusing. It is therefore usually the custom to adopt a
logical rather than a historical sequence, but it may assist the reader
to see the connection between the two and the unity which embraces
the whole if we now briefly trace the natural growth of the subject,
suggesting the steps we might expect it logically to take. This we
have tried to keep in view as nearly as possible in the succeeding
chapters, except that the order may have been altered here and
matter may have been omitted or inserted there as reason and
the elementary nature of the work dictated : —
1. Owing to the difficulty which the mind experiences in grasping
a large mass of figures, the necessity for an average arises to sum
up shortly the character of the mass, and various kinds of averages
are proposed.
2. An average proves insufficient alone to define the whole scheme
of observations, and other constants are invented to measure their
spread or dispersion about the average.
3. Considerations of space and the desire for some kind of system
lead further to the formation of tables with the observations classified
in ordered groups.
4. The formation of these tables suggests the possibility of a
graphical representation of the numbers in the different groups to
bring out the nature of their distribution.
5. The impossibility of dealing with a whole population results
in the selection of samples, and the comparison of one sample with
another introduces the subject of random errors.
6. The closer examination of this subject leads us into the domain
of mathematical probability and discovers the probability curve, or
normal curve of error, first formulated in connection with the study
of errors of observation.
7. This same curve serves in the sequel to describe a certain
important type of statistical distribution, in which each observation
is determined by a multitude of so-called chance causes pulling this
way and that, so that it is impossible to foretell what the resultant
effect will be.
8. The failure of the normal curve to describe other common distributions,
especially those which are unsymmetrical in character,
leads to the development of skew varieties of curves which will
fit them.
9. The extent of connection between one set of data and a possibly
related set is a natural subject for inquiry giving rise to the
theory of correlation.
CHAPTER II
MEASUREMENT, VARIABLES, AND FREQUENCY DISTRIBUTION
Measurement. There are two fundamental characteristics which
pertain to nearly all measurement : it is (1) relative : it involves
a comparison between one magnitude and another of the same kind,
and (2) approximate : the comparison in practice cannot be made
with absolute exactness.
A man's height, for example, is stated to be 5 ft. 8 in., but this
would convey little to one who did not know how long a foot was
and how long an inch was. The first step in the measurement is
made by comparing the man's length with a certain constant
length previously agreed upon as a standard or unit, namely, a
' foot ' ; he is placed to stand up against a scale which is divided
up into feet, and the highest point of his head is seen to come
somewhere between the 5 ft. line and the 6 ft. line : he is therefore
longer than five of these units, set end to end, but not so long
as six of them. To carry the measurement a stage further a smaller
unit has to be introduced ; each foot length of the scale is subdivided
into twelve equal parts called inches, and the top of the
man's head is found to come somewhere between the 5 ft. 8 in.
line and the 5 ft. 9 in. line : he is therefore over 5 ft. 8 in., but
not quite 5 ft. 9 in. in height. For the next stage in the measurement
each inch of the scale has to be further subdivided into quarter
inches, and the top of the man's head is found to come somewhere
between the 5 ft. 8 in. 3 qu. in. line and the 5 ft. 9 in. line ; moreover
it is nearer, let us suppose, to the former line than to the latter.
In this case, then, we say that the man's height or length is 5 ft.
8¾ in., measured to the nearest quarter inch.
In measurement the decimal notation has very obvious advantages,
because each unit is always divided into ten equal parts to
get the next smaller unit. Thus a weight of 7 kilogr. 5 hectogr.
3 decagr. 8 gr. 4 decigr. 3 centigr. can be expressed at once in
grammes, namely 7538.43 gr. ; hence if we were measuring to the
nearest decagramme, the result would be expressed as 754 decagr. ;
to the nearest decigramme, it would be 75384 decigr., etc.
Similarly, a length of 12 kilom. 7 metres 2 centim. can be written
12007.02 metres, or, in kilometres, 12.00702 kilom., or, to the nearest
decametre, 1201 decam., and so on.
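The weight example above can be checked mechanically. The following is a minimal sketch; the table `GRAMMES_PER_UNIT` and the helper names `to_grammes` and `to_nearest` are our own and are introduced only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Grammes contained in each metric unit of weight used in the example.
GRAMMES_PER_UNIT = {
    "kilogr": Decimal("1000"), "hectogr": Decimal("100"), "decagr": Decimal("10"),
    "gr": Decimal("1"), "decigr": Decimal("0.1"), "centigr": Decimal("0.01"),
}

def to_grammes(parts):
    """Total of a compound weight such as {'kilogr': 7, 'hectogr': 5}, in grammes."""
    return sum((GRAMMES_PER_UNIT[u] * n for u, n in parts.items()), Decimal(0))

def to_nearest(grammes, unit):
    """Express a weight as a whole number of `unit`, rounded to the nearest one."""
    return (grammes / GRAMMES_PER_UNIT[unit]).quantize(Decimal("1"), rounding=ROUND_HALF_UP)

weight = to_grammes({"kilogr": 7, "hectogr": 5, "decagr": 3,
                     "gr": 8, "decigr": 4, "centigr": 3})
print(weight)                        # 7538.43
print(to_nearest(weight, "decagr"))  # 754
print(to_nearest(weight, "decigr"))  # 75384
```

Because every unit is ten times the next, each conversion is only a shift of the decimal point, which is the advantage the text describes.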
The mere act of counting things of a like kind is, in a sense,
measurement of a primitive type, one thing being the unit, though
the measurement may in many such cases be exact ; for example,
we may count the number of persons in a room exactly. Even in
this type of case, however, the counting or measuring cannot
always be done accurately, but the inaccuracy arises from lack of
precision and uniformity in definition rather than from want of
power in the measuring instrument itself : e.g. in determining the
population of a city, inaccuracies may arise because of failure to
define exactly the boundaries of the city, or the time at which the
census is to be taken, or how to deal with the migration of the inhabitants
from or into the city, and with births and deaths during
the actual time of numbering.
Variables. By a variable is meant any organ or character which
is capable of variation or difference in size or kind. The difference
may be measurable as in the case of head-length, height, temperature,
etc., or not directly measurable as in the case of colour, intelligence,
occupation, etc. Further, the variation, when measurable, may
be continuous, or it may take place only by integral steps, omitting
intermediate values : population, for example, can never go up or down
by less than one, but if temperature is to change from 60 degrees to
61 degrees it must pass continuously through every intermediate
state of temperature between 60 degrees and 61 degrees.
In dealing with a measurable variable sometimes we are interested
not so much in its actual value at a particular instant as in
the change which has taken place in its value during some specified
interval, but to gauge fairly the amount of this change it is necessary
to measure it relative to the original value of the variable. For
example, if we are told that the wages of a certain person have
gone up during the year to the extent of 3d. an hour, we cannot
say whether this is much or little to him until we know what his
wages were originally. The addition would be relatively much less
if he were a skilled patternmaker earning 1s. 6d. an hour than it
would be if he were a chainmaker earning only 6d. an hour.* This
point can be met by stating, not simply the change in the value of
the variable, but the ratio of the new value to the old. For instance,
the patternmaker in the above instance has had his wages increased
[* Wages today are, of course, much higher — the above figures are only hypothetical.]
in the ratio of 1s. 9d. to 1s. 6d. It is important to notice that
this form of measurement is quite independent of the particular
units used ; if we take 1d. as unit, the ratio = 21/18 = 7/6, and if
we take 1s. as unit, the ratio = 1¾/1½ = 7/6 just as before.
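That the ratio does not depend on the unit chosen can be confirmed with exact fractions. This is a small sketch; the function name `wage_ratio` is ours.

```python
from fractions import Fraction

def wage_ratio(new, old):
    """Ratio of the new wage to the old; both must be stated in the same unit."""
    return Fraction(new) / Fraction(old)

# 1s. 9d. and 1s. 6d. expressed in pence.
in_pence = wage_ratio(21, 18)
# The same two wages expressed in shillings (1 3/4 s. and 1 1/2 s.).
in_shillings = wage_ratio(Fraction(7, 4), Fraction(3, 2))
print(in_pence, in_shillings)  # 7/6 7/6
```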
There are other ways of measuring this change in the value of
a variable. One of the commonest is to express it as a percentage
of the original value ; thus the patternmaker's increase is at the
rate of 3/18 × 100, or 16⅔ per cent., which is simply the ratio of
increase in wage to previous wage multiplied by 100. The multiplier,
100, is quite an arbitrary factor, but it has obvious advantages : among
others, it works well with the decimal notation and it often serves
to put the result into a form which is greater than unity instead of
leaving it as a fraction. Again, a man who gets a dividend of £25
on an investment of £500 receives interest at the rate of 25/500 × 100,
or 5 per cent. ; in other words, this is the rate at which his capital
accumulates if the interest is added to it instead of being spent.
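The percentages in this paragraph follow from one rule, the change divided by the original value and multiplied by 100. A brief sketch, with a helper name (`percentage_of`) of our own choosing:

```python
from fractions import Fraction

def percentage_of(part, whole):
    """Express `part` as a percentage of `whole`, kept exact as a fraction."""
    return Fraction(part, whole) * 100

patternmaker = percentage_of(3, 18)  # rise of 3d. on a wage of 1s. 6d. (18d.)
chainmaker = percentage_of(3, 6)     # the same rise on a wage of 6d.
dividend = percentage_of(25, 500)    # 25 pounds received on 500 pounds invested
print(float(patternmaker), float(chainmaker), float(dividend))
```

The same rise of 3d. is one-sixth of the patternmaker's wage but one-half of the chainmaker's, which is why the change must be measured relative to the original value.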
Annual birth rates and death rates, on the other hand, are best
expressed per thousand of the population, as estimated, say, at
the middle of the year in question ; e.g. the birth rate of the United
Kingdom in 1911 was 24.4 per thousand, and the death rate was
14.8 per thousand, which is equivalent to 244 and 148 per 10,000
of the population respectively. If we could assume the birth
and death rates to remain constant from year to year, and if we
could afford to leave migration out of account, the population
would be subject to exactly the same law of increase as capital
accumulating at compound interest [see Appendix, Note 1], thus : —
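1. If P be the original population, and if the annual net increase
be at the rate of 25 per thousand, then
the population in 1 year's time $= P \times (1.025)$
the population in 2 years' time $= P \times (1.025)^2$
the population in 3 years' time $= P \times (1.025)^3$
the population in n years' time $= P \times (1.025)^n$.
2. If £P be the original capital, and if the annual increase be at
the rate of 2½ per cent., then
the capital in 1 year's time $= P \times (1.025)$
the capital in 2 years' time $= P \times (1.025)^2$
the capital in 3 years' time $= P \times (1.025)^3$
the capital in n years' time $= P \times (1.025)^n$.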
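The two cases above obey one and the same law, which may be sketched as follows. The starting figure of 1,000,000 is hypothetical and serves only to make the output concrete; the function name `grow` is ours.

```python
def grow(initial, rate_per_thousand, years):
    """Value after `years` of growth at a constant annual rate, compounded yearly."""
    return initial * (1 + rate_per_thousand / 1000) ** years

P = 1_000_000  # hypothetical starting population or capital (not from the text)
for n in (1, 2, 3, 10):
    # 25 per thousand is the same rate as 2 1/2 per cent.
    print(n, round(grow(P, 25, n)))
```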
Lest we may seem to have laboured to make plain what is really
a simple idea, it may be remarked that quite frequently confusion
arises with regard to percentage even in reputable quarters. As an
illustration of the kind of mistake which, without thinking, is easily
made, the following argument has been taken from a monthly
circular sent out a little while ago to the members of the Boilermakers'
Society by their Secretary : Since July 1914, wages have
risen 15 per cent., the cost of living has gone up 45 per cent., therefore
the workers' real wages have fallen 30 per cent. This same argument
was quoted shortly after in one of the leading articles of The Manchester
Guardian under the heading ' Prices and Wages,' and again
in The Labour Leader tersely as truth ' In a Nutshell,' but in
neither instance did it seem to have occurred to the writer that it
was inaccurate. It may be worth while for the sake of clearness to
show what the statement should have been : —
                   Wages.   Cost of    Ratio of Wages to   Same Ratio
                            Living.    Cost of Living.     multiplied by 100.
July 1914 .         100       100         1                   100
October 1916 .      115       145         115/145              79
Since 115/145 × 100 is roughly 79, this calculation shows that ' real
wages ' had fallen only about 21 per cent. (100 − 79 = 21), and not
30 per cent. as stated, between the two dates.
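The corrected calculation can be set beside the mistaken one in a few lines. This is a sketch only; the function name `real_wage_index` is ours.

```python
def real_wage_index(wage_index, cost_index):
    """Real wages: money wages relative to the cost of living, on a base of 100."""
    return wage_index / cost_index * 100

real_1916 = real_wage_index(115, 145)
fall = 100 - round(real_1916)   # correct fall in real wages, per cent.
mistaken = 45 - 15              # the circular's figure, got by subtracting percentages
print(round(real_1916), fall, mistaken)  # 79 21 30
```

The error lies in subtracting two percentages that are reckoned on different bases; the ratio of the two index numbers is what measures real wages.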
Index Numbers. A very important case of variables changing
with time appears in the discussion of changes in the value of
money as measured by the movement of prices of commodities,
introducing the notion of an index number. For example, supposing
the wholesale price of beef was 6d. a lb. at one date, 8d. a lb. at
another date, and 5½d. a lb. at a third date, the change might be
exhibited as in the following table : —
                  1st Date.   2nd Date.   3rd Date.
Price of beef       6d.         8d.         5½d.
Index number        100         133         92
Here 100, 133, and 92 are called index numbers, the price at the
first date being taken as a standard and denoted by 100, while
the prices at the other two dates are altered proportionally, so that
6 : 8 : 5½ = 100 : 133 : 92.
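The index numbers for beef can be reproduced as follows. This is a minimal sketch, and the function name `index_numbers` is ours.

```python
from fractions import Fraction

def index_numbers(prices, base=0):
    """Each price as a percentage of the price at position `base`, to the nearest unit."""
    return [round(Fraction(p) / Fraction(prices[base]) * 100) for p in prices]

# Wholesale price of beef in pence per lb. at the three dates: 6d., 8d., 5 1/2 d.
beef = [Fraction(6), Fraction(8), Fraction(11, 2)]
print(index_numbers(beef))  # [100, 133, 92]
```

Choosing a different date as the standard changes every index number but leaves their proportions unaltered.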
Index numbers calculated on this principle have been published
systematically for several years by Mr. A. Sauerbeck (in the Journal
of the Royal Statistical Society up to January 1913, and continued
afterwards in The Statist under the supervision of Sir George Paish)
and in The Economist.
In Sauerbeck's index numbers the average wholesale prices of
forty-five commodities for the eleven years 1867-77 are taken as
the standard, being denoted each by 100 as above, and the prices
of the same commodities for any other year are then written as
percentages of these standard prices. The commodities chosen are
various — food of all kinds (cereals, meat, potatoes, rice, butter,
sugar, coffee, tea), minerals (including coal), textiles, and sundries
(including hides, leather, tallow, palm oil, olive oil, linseed,
petroleum, soda, soda nitrate, indigo, timber). Articles of similar
character are grouped together ; naturally no class is exhaustive,
but the selection is a fairly representative one. A sort of general
average is then formed by combining all the results, and the movement
of this average is taken to measure changes in the value of
money. An example will make clear the way in which an index
number for each group and the general average are obtained.
The index number for each separate commodity may be first
calculated thus : —
Price of English Wheat.
Years.        Price per Qr.     Index Number.
              s.   d.
1867-77       54   6              100
1912 .        34   9               64
Now forming similar index numbers for each of the eight vegetable
and cereal foods and combining them together, we have : —
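Index Numbers for Vegetable and Cereal Foods.
[Column headings for the eight separate commodities are not recoverable.]
Years.       (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   Total.   Average.
1867-77 .    100   100   100   100   100   100   100   100    800      100
1912 .        64    68    70    79    83    85    74   101    624       78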
The figures in the last column but one are obtained by simply
adding the figures in the eight previous columns, and, dividing these
results by eight, we get the average index number for the group
in 1912 as a percentage of that in the standard years 1867-77.
Treating all the other commodities in the same way we ultimately
get index numbers for all the different groups and for all commodities
combined as follows : —
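Index Numbers for different Groups and
for all Commodities.
[Group headings are only partly recoverable ; those given are inferred from the text.]
Group.                          No. of Commodities.   1867-77.   1912.
Vegetable and Cereal Foods              8                100        78
[heading not recoverable]               7                100        96
[heading not recoverable]               4                100        62
All Food                               19                100        81
Minerals                                7                100       110
Textiles                                8                100        76
Sundries                               11                100        82
All Commodities                        45                100        85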
The index number for ' All Food ' is obtained by summing the
nineteen index numbers for the separate commodities which are
included in this class and dividing the result by 19. Similarly the
general index number for all commodities is obtained, not by
adding the numbers for the different groups and dividing by the
number of groups, but by adding the forty-five index numbers of
all the separate commodities and dividing the result by 45.
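The distinction drawn here, between averaging the separate commodities and averaging the groups, can be seen from the 1912 figures. The sketch below rests on assumptions: the individual index numbers are given in the text only for the vegetable and cereal foods, so each group's rounded average multiplied by its number of commodities stands in for the sum of that group's individual numbers; the minerals figure is read as 110; and the group labels are our own shorthand.

```python
# Index numbers for 1912 of the eight vegetable and cereal foods.
vegetable_foods = [64, 68, 70, 79, 83, 85, 74, 101]
vegetable_average = round(sum(vegetable_foods) / len(vegetable_foods))

# (number of commodities, group index for 1912) for the six basic groups.
# Labels are our shorthand; the minerals value of 110 is our reading of the table.
groups = {
    "vegetable_food": (8, 78),
    "food_group_of_7": (7, 96),
    "food_group_of_4": (4, 62),
    "minerals": (7, 110),
    "textiles": (8, 76),
    "sundries": (11, 82),
}

def combined_index(selected):
    """Approximate mean of the individual index numbers, from rounded group means."""
    total = sum(n * idx for n, idx in selected)
    count = sum(n for n, _ in selected)
    return total / count

food = [groups[k] for k in ("vegetable_food", "food_group_of_7", "food_group_of_4")]
all_food = round(combined_index(food))
all_commodities = round(combined_index(groups.values()))
mean_of_groups = round(sum(idx for _, idx in groups.values()) / len(groups))
print(vegetable_average, all_food, all_commodities, mean_of_groups)  # 78 81 85 84
```

Averaging the six group numbers directly would give 84 instead of 85, because it treats a group of four commodities as equal in weight to a group of eleven.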
In The Economist the average prices of twenty-two commodities
for the years 1901-5 are taken as the standard, being denoted
each by 100, and the prices of the same commodities for any other
year are then written as percentages of these standard prices ; the
sum of these percentages is taken as the index number, and it is
a simple matter to divide by 22 if we wish to get the average percentage
change. The following table explains the method of
calculation : —
Index Numbers representing Prices of Commodities.
Date.                Cereals     Other    Textiles.   Minerals.   Miscellaneous.   Total.   Index No.
                     and Meat.   Foods.                                                     (Total ÷ 22).
1901-5 .               500        300       500          400          500          2200       100
End of Dec. 1916      1294        553      1124.5        824.5       1112          4908       223
In this table five commodities are included under the head of
' Cereals and Meat,' three under ' Other Foods,' and so on. The
numbers in the last column are obtained by dividing those in the
previous column by 22.
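The arithmetic of The Economist's method may be sketched as below. The decimal points in 1124.5 and 824.5 are our reading of the table; they are the values that make the five group sums add up to the stated total of 4908.

```python
# Sums of the percentage prices in each group at the end of December 1916.
# The values 1124.5 and 824.5 are our reading of the source table (an assumption).
group_sums_1916 = {
    "cereals_and_meat": 1294,
    "other_foods": 553,
    "textiles": 1124.5,
    "minerals": 824.5,
    "miscellaneous": 1112,
}
N_COMMODITIES = 22  # commodities included in the index

total = sum(group_sums_1916.values())
average_index = round(total / N_COMMODITIES)
print(total, average_index)  # 4908.0 223
```

For the standard years the same division gives 2200 / 22 = 100, so the final column is directly comparable with an ordinary index number.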
It is clear that what is at bottom the same principle may be
applied in any case of a variable changing with time when we wish
to measure the extent of the change, so that the use of index numbers
is not confined to the problem of prices. We shall return again to
discuss one or two further points in connection with the same
subject in the Chapter on ' Averages.'
Frequency Distribution. So far we have been thinking more
particularly of the change which an individual variable, or a collection
of such variables, may undergo in the course of time, or the
difference between two values which the same variable may have
at two different instants of time, and how to measure it. Now
the science of Statistics is based upon the study of the crowd
rather than of the individual, although observations on individuals
have to be made before they can be combined together to produce
the crowd, just as individual income-tax schedules have to be
completed and combined before the balance-sheet of the State can
be drawn up. As we pass from one individual to another there
may be great differences in the organ or character observed (hence
the word variable already introduced), but in the mass these
differences are merged together and lose their individual importance :
it is rather their resultant effect we seek to measure. In order
therefore to discover this effect it is necessary to make a collection
of individual observations and to analyse the results. Now if our
ultimate conclusions are to be safe the number of observations
must be considerable, and in order to be able to cope with them
and reduce them to some sort of system the first step in the analysis
consists in arranging them in different classes according to the
value of the variable under consideration.
It is to be noted that now we are concerned with changes in the
value of a variable as we pass from one individual to another at the
same period of time and under the same general conditions, and not
with the change in a variable in the same individual occurring with
the lapse of time. We wish, for example, to draw a distinction
between (1) the change in wages as we pass from one man to another
at the same time in the same trade, and (2) the change in wages of
the same man, or class of men, in the same trade occurring in a
given period of time ; in the first case we want to find the amount
of diversity within the trade at some stated time, and in the second
our object is to discover whether an improvement has taken place
in the wages of a particular individual or a particular trade with
the passage of time.
In picturing variation of the first type the conception arises of a
frequency distribution where the observations are distributed in
ordered groups, with a number corresponding to each showing
how many, or how frequent, are the individuals possessing the type
of variable or character which defines that group. More generally,
if a series of measurements or observations of a variable y are
made corresponding to a selected series of another variable x we
get a distribution, which becomes a frequency distribution when y
represents the frequency of events happening in a particular way,
or of individuals corresponding to a particular value of some
common variable or character, represented by x. Thus (1) the
boys in a school might be grouped according to their intelligence :
so many, dull ; so many, of ordinary intelligence ; and so many,
bright or above the ordinary. Again (2) in an inquiry into the
housing of the people in any town or district it would be necessary
to draw up a table showing the number or frequency of existing
tenements with one room, the frequency of tenements with two
rooms, the frequency of tenements with three rooms, and so on.
Once more (3) a zoologist, wishing to discover whether crabs of a
certain species caught in one locality differ in any remarkable way
from members of the same species caught in another locality, might
start by making measurements of the length of carapace or upper
shell for crabs of like sex in the two places and then proceed to
form frequency tables for each, setting out the frequency of crabs
for which the carapace length lies, say, between 5 and 6 millimetres,
the frequency with length between 6 and 7 millimetres, the frequency
with length between 7 and 8 millimetres, and so on. He would
then have in these tables some basis for comparing the specimens
caught in the two localities.
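The third illustration, the zoologist's frequency table, might be formed as in the sketch below. The carapace lengths are invented for illustration; only the grouping into classes of one millimetre follows the text.

```python
from collections import Counter
from math import floor

# Hypothetical carapace lengths in millimetres (invented for illustration).
lengths_mm = [5.2, 5.9, 6.1, 6.4, 6.8, 7.0, 7.3, 7.7, 6.5, 5.5]

# Each class is labelled by its lower limit: 5 stands for
# "5 and under 6 millimetres", 6 for "6 and under 7", and so on.
frequency = Counter(floor(x) for x in lengths_mm)

for lower in sorted(frequency):
    print(f"{lower} and under {lower + 1} mm: {frequency[lower]}")
```

A second table built in the same way for the other locality would give the basis of comparison the text describes.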
The three illustrations just used give three different types of
distribution corresponding to the three types of variable to which
attention has been drawn before. In the first, where the variable
or character observed is not measurable, doubt will sometimes
arise as to the appropriate class in which individuals should be
placed who seem to be on the border line between dulness and
mediocrity or between mediocrity and brilliance, so that accurate
classification will greatly depend upon what is called the ' personal
equation ' of the observer. The second illustration corresponds
to the case where the variable changes not continuously but by
unit stages ; the choice of classes in such a case depends little
upon the observer unless the unit is very small compared to the
total range of variability ; for example, a tenement might either
definitely have two rooms or it might have three rooms, but it
clearly could not be put down as having 2¼ rooms or 2½ rooms :
in other words, the only natural classification is so many tenements
with two rooms, so many with three rooms, so many with four
rooms, and so on, though here too some confusion might arise
through failure to define clearly what is ' a room.' In the third
type, where we can conceive of the continuous variation of the
character under observation, there would be nothing surprising in
the appearance of any value of the variable between the lowest
and highest values observed ; the choice of suitable limits for the
several groups becomes therefore in this case rather a delicate
matter which requires careful judgment.
We shall begin the next chapter with some general remarks
upon the subject of classification and tabulation.
CHAPTER III
CLASSIFICATION AND TABULATION
No part of Statistics is of more importance than that which deals
with classification and tabulation, and it is the one part for which
no very precise rules can be given. A neat arrangement of ideas
in the mind, capacity to express them clearly, and patience are
indispensable, but experience alone will convince one of the extreme
care which must be exercised if blunders are to be avoided and
time is to be saved in the long run. This has to be emphasized
because most people, until they have tried and failed, imagine
that to arrange things in classes and in tables is a straightforward
proceeding involving no great thought or trouble.
Abundant matter of a statistical character is published
periodically in Blue-books, Government Reports, Reports of Local
Authorities, Directors of Education, Medical Officers of Health, Chief
Constables, Employers' Associations, Trade Unions, Co-operative
Societies, etc., but it needs a trained intelligence as a rule to
assimilate it and turn it to further advantage. The larger the scale upon
which any inquiry is made, the more valuable should the results
be, granted that equal accuracy is possible on the large as on the
small scale, but it is fairly clear that mistakes of various kinds
have also much more chance of creeping into a large work than into
a small one. To appreciate the various and numerous possibilities
of error when the scope is wide it is enough to read the
introductions to the Registrar-General's Reports on the Census from decade
to decade ; this should also impress the student with the care that
is necessary if he proposes to use such material for the investigation
of some other problem. It may seem a comparatively simple task
to abstract two sets of figures from a Census Report, to establish
a one-to-one correspondence between them, and to make deductions
therefrom, but such figures when taken from their context will
sometimes lead to absolutely unsafe, if not false, conclusions. The
exact meaning and limitations of any data can only be properly
appreciated by one who has been closely in touch with the persons
who have collected them, and it is therefore important, before
attempting to re-classify or re-tabulate any old statistics for a new
purpose, to read very carefully through the notes made by the
original compilers.
Perhaps the best advice that can be given to any one in this
connection is that he should embark upon some small inquiry
which will necessitate the collection of statistics for himself ; the
final result of his efforts may seem disappointing, but the
experience he will gain will be invaluable. Ideas for such an inquiry will
occur to him if he reads through some authoritative work on social
questions, e.g. Beveridge's Unemployment, the decennial Census
Reports, or The Minority Report on the Poor Law (1905). But he
must read with an open and critical mind, questioning particularly
the foundation for all statements as to cause and effect which may
be made. A few simple hints may be useful as to method of
procedure.
When he thinks he has discovered some subject of interest which
would appear to deserve examination, it will be well to put it
down on paper in order to get it clearly defined, because a precise
written statement is likely to carry one further than a shadowy
idea somewhere at the back of the mind which is hardly
formulated at all. When the actual collection of statistics is begun
it will almost certainly be found that it is impossible to solve the
original problem contemplated ; but that need not prevent further
progress — what is important is that the limitations should be
exactly realized, and this will be impossible unless the original
problem is clearly presented side by side with the nearest solution
obtainable.
The problem stated, the next thing is to set down categorically
a number of questions, the answers to which are to be the raw
material for the solution of the given problem. For the answers
let us assume the inquirer is dependent upon the goodwill of others,
either employers, or trade union secretaries, or public officials.
The questions in that case must be clearly, concisely, and courteously
phrased, and must not be capable of more than one interpretation.
In number they should be few and in character not inquisitorial ;
moreover, the replies should be obtainable without any great labour
on the part of the persons approached. Here again it will be found
that the questions first set down are not all satisfactory : one will
be too vague ; another, though clear enough, may involve a
considerable search through a mass of other matter before it can be
properly answered ; while to another it might be impossible to give
an exact reply in any case. Revision and amendment may therefore
be necessary in the light of the first replies received, and the
inquirer will begin to see at this stage how far the solution to his
original problem is really possible.
When the bulk of the returns have come in they should be critically
examined one by one. A number will, for one reason or another,
be worthless, and they must be discarded ; as for the remainder,
if the questions were well chosen, the answers should not be difficult
to interpret and classify ; the most successful questions are those
to which a simple ' yes ' or ' no ' in reply gives all the information
required ; numerical answers are less easy to deal with, especially
if there is the least chance of misunderstanding on either side as
there often is, for example, in the case of observations which are
on the border line between two classes.
Tables should then be drawn up and the headings to the different
columns of the tables should state concisely and exactly what the
figures below represent. So far as possible any one should be able
readily to grasp their general meaning without being obliged to
wade through a page or two of written explanation ; if any heading
cannot be clearly expressed in a few words it may be helped out
by a further note at the bottom of the page, but too many such
notes are to be avoided.
Finally, a summary should be made of the various conclusions
suggested by a study of the tables. Some of the points raised in
the course of the inquiry will perhaps be only incidental to the
main problem under discussion, but may still deserve a passing
reference. It will also be of advantage to follow up the summary
by any recommendations which can be fairly based on the
conclusions obtained, when the problem is such that recommendations
are expedient, and, if ultimately the whole is of sufficient value to
be printed, emphasis can be introduced where necessary by suitable
variations in type.
For this part of the work considerable judgment is necessary
which can only be acquired by long training — a faculty to pick out
the real from the false and an eye to distinguish the important from
the trivial. A sense of numerical proportion too is desirable
incidentally ; one of our leading exponents on finance in a book dealing
with the meaning of money uses a very interesting illustration which
is perhaps worth quoting here to show how even an acute mind
may on occasion prove itself curiously lacking in such a sense.
He is seeking to show how the credit system of the country is built
upon a foundation composed of a little gold and a lot of paper ;
for this purpose he amalgamates together the balance-sheets of half
a dozen big banks, and proves that their liabilities on current and
deposit account amounted at a certain date prior to 1914 to 249
million pounds, while the cash in hand and at the Bank of England
was 43 millions. Of the 43 millions he estimates that roughly
20 millions would be cash in the Bank of England, and further
that about two-thirds of this 20 millions would be represented really
by securities and not by gold. Hence he concludes that to support
this vast erection of credit there would only be £6,666,666 of actual
gold. Thus after talking throughout in millions the author closes
by giving his result true apparently to a pound !
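The point about numerical proportion can be put in a small sketch: a figure derived from rough estimates deserves no more significant figures than the estimates themselves. The helper `round_sig` is a hypothetical name introduced here, and the choice of one significant figure is an assumption made for illustration.

```python
from math import floor, log10

def round_sig(x: float, sig: int) -> float:
    """Round x to `sig` significant figures (illustrative helper)."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

cash_at_bank = 20_000_000     # "roughly 20 millions", as quoted in the text
gold = cash_at_bank / 3       # one-third taken to be actual gold

print(gold)                   # a long string of sixes, true "to a pound"
print(round_sig(gold, 1))     # about 7 millions, in keeping with the data
```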
Much may be learnt as to methods of classification and the
drawing up of tables by a careful study of those which appear in
various official reports, and a few such tables are reproduced in
the pages which follow.
Table (1). Condition as to Cleanliness of
School Children in Surrey.

5 years, 1908-12. 79,070 children inspected.

Cleanliness.                 Per cent.
Above the average . .          15.4
Average . . . . . . .          76.5
Below average . . . .           7.6
Much below average  .           0.5
Table (2). Condition as to Infectious Diseases of
School Children at Different Ages in Surrey (1913).

Age Groups inspected            5-6        8-9       13-14     Total at
                                                               All Ages.
Numbers inspected              5,191      5,151      4,962      15,304

Proportion who before
inspection had suffered from  per cent.  per cent.  per cent.  per cent.
Diphtheria                       1.3        3.5        5.4        3.4
Scarlet fever                    2.7        7.2       10.9        6.9
Measles                         55.3       79.3       84.6       72.9
Whooping cough                  41.8       56.4       54.3       50.9
German measles                   2.9        5.1        7.5        5.1
Chicken pox                     26.1       40.1       38.6       34.9
Mumps                           10.6       22.0       29.8       20.7
No infectious diseases          18.9        6.1        4.7       10.0
No definite information          3.3        2.2        0.9        2.2
Table (3). Height of School Children according to
District, Age, and Sex (1913).

                           Boys.                                        Girls.
Age       Nos.        Average    Average Height in cms.    Nos.        Average    Average Height in cms.
Groups.   measured.   Height in  Surrey.   England and     measured.   Height in  Surrey.   England and
                      inches.              Wales.                      inches.              Wales.
5-6         2724        41.4      105.2      103.4           2467        41.3      104.9      102.6
8-9         2578        47.8      121.4      120.4           2573        47.5      120.7      119.4
13-14       2529        57.0      144.8      142.4           2433        57.9      147.1      144.2
The first four are taken from the Annual Report of the School
Medical Officer for the County of Surrey, 1913. The first is an
example of single tabulation showing the distribution according to
cleanliness of children inspected in the elementary schools. The
second is an example of double tabulation, showing the
distribution according to age of school children who at some period before
the date of inspection had suffered from certain infectious diseases.
The third is an example of quadruple tabulation, showing the
distribution of school children according to height, district, sex, and
age. Thus in the first case we have one factor brought into relief,
viz. cleanliness ; in the second case we have two factors, age and
disease ; in the third case we have four factors, height, district,
sex, and age.
When we have two or more factors tabulated together as in cases
(2) and (3), we may be sometimes led to discover a connection of
some kind, possibly causal, between them, and the search for such
a connection, or correlation as it is called, represents one very useful
purpose to which tabulation may be put. Table (4) is an
illustration of this. It is the result of certain measurements carried out in
order to discover the effect of employment out of school hours upon
the physical condition of boys. The particular factor examined as
the possible cause of evil in this connection is lack of sleep, and
the figures given certainly seem to warrant a closer examination
into the matter.
Table (4). Physical Condition of certain Boys according
to Hours of Sleep Obtained.

                                                                  Nutrition.
No. of Hours      No. of Boys   Average     Average     Percentage   Percentage   Percentage
Sleep obtained.   examined.     Height in   Weight in   above        average.     below
                                inches.     lbs.        average.                  average.
7 to 8 . . .          14          54.5        71.3         7.1         35.8         57.1
8 to 9 . . .          80          55.4        73.9        10.1         65.9         24.0
9 to 10  . .         296          56.4        79.3        15.3         64.5         20.2
10 to 11 . .         280          57.9        83.2        22.8         66.5         10.7
11 to 12 . .          50          59.0        87.0        22.0         68.0         10.0
Tables (5) and (6) are two illustrations of neat tables, containing
a large amount of information in a small space, set out in such a
form that the eye can easily take it in — and that is the main purpose
of tabulation. These examples are selected from the Sixteenth
Abstract of Labour Statistics of the United Kingdom, Cd. 7131.
In Table (6) note the classification of age groups : it is not ' 5 to
10 years,' ' 10 to 15 years,' and so on, but ' 5 and under 10 years,'
' 10 and under 15 years,' and so on. This removes difficulties at
the border lines between two classes ; the difficulties are not
completely removed, however, unless there is some understanding as
to what shall constitute under any particular age. Shall it be six
months under, or one day under, or one hour under ? This sort
of ambiguity has more importance in some cases than in others.
Suppose, for example, we were classifying men according to their
height : a group of the type ' 60 inches and under 62 inches,'
assuming that measurements were made to the nearest half-inch,
would really include all men who were ' 59¾ inches and under
61¾ inches ' ; because one who measured anything from 59¾ in.
to 60¼ in., being nearer to 60 in. than to 59½ in. measuring to
the nearest half-inch, would be registered as 60 in. in height, while
one who measured anything from 61¾ in. to 62¼ in., being nearer
to 62 in. than to 61½ in., would be registered as 62 in. in height.
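The effect of measuring to the nearest half-inch can be traced in a short sketch. The function names are illustrative, and the rule that an exact tie is rounded upward is an assumption, since the text does not say how ties are settled.

```python
import math

def register(height_in: float) -> float:
    """Record a height to the nearest half-inch (ties assumed to go upward)."""
    return math.floor(height_in * 2 + 0.5) / 2

def in_class(height_in: float, lower: float = 60.0, upper: float = 62.0) -> bool:
    """True if the registered height falls in the class 'lower and under upper'."""
    return lower <= register(height_in) < upper

# True heights just inside and just outside the real limits of the class.
for true_height in (59.7, 59.8, 61.7, 61.8):
    print(true_height, register(true_height), in_class(true_height))
```

A man of 59.8 in. is thus counted in the class although he is under 60 in., while a man of 61.8 in. is excluded although he is under 62 in.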
Another point to be noted is that in general people making
returns seem to have a psychological weakness for round figures,
so that a man in the neighbourhood of 40 years of age, for example,
is apt to record himself as actually 40 although he may really
Table (5). Classification of Overcrowded Tenements.*
England and Wales (1911).

                    Urban Districts.                      Rural Districts.                      Total.
Tenements     No. of        Occupants thereof.      No. of        Occupants thereof.      No. of        Occupants thereof.
with          Overcrowded   No.        Percentage   Overcrowded   No.        Percentage   Overcrowded   No.          Percentage
              Tenements.               of total     Tenements.               of total     Tenements.                 of total
                                       population.                           population.                             population.
1 room . .       56,290      206,022      0.7          1,545         5,748      0.1          57,835       211,770      0.6
2 rooms  .      119,695      712,613      2.5         15,397        91,458      1.2         135,092       804,071      2.2
3 rooms  .      107,892      847,937      3.0         22,380       175,988      2.2         130,272     1,023,925      2.8
4 rooms  .       64,470      624,747      2.2         17,341       167,969      2.1          81,811       792,716      2.2
5 or more
rooms  . .       21,200      251,405      0.9          4,700        55,585      0.7          25,900       306,990      0.8
Table (6). Population grouped according to Age.
England and Wales (1911).
Males.

                            Urban Districts.          Rural Districts.          All Districts.
Age Groups.                 Number.     Percentage.   Number.     Percentage.   Number.      Percentage.
Under 5 years              1,517,432      11.3          418,681     10.6        1,936,113      11.1
5 and under 10 years       1,431,900      10.6          415,395     10.5        1,847,295      10.6
10 and under 15 years      1,341,586       9.9          406,045     10.3        1,747,631      10.0
15 and under 20 years      1,267,500       9.4          387,395      9.8        1,654,895       9.5
  (Under 20 years, together)              41.2                      41.2                       41.2
20 and under 30 years      2,332,135      17.3          626,300     15.9        2,958,435      17.0
30 and under 40 years      2,094,934      15.5          542,370     13.7        2,637,304      15.1
40 and under 50 years      1,556,818      11.6          444,360     11.3        2,001,178      11.5
  (20 and under 50 years, together)       44.4                      40.9                       43.6
50 and under 60 years      1,042,868       7.7          333,368      8.4        1,376,236       7.9
60 and under 70 years        612,741       4.5          230,306      5.8          843,047       4.8
70 and upwards               296,246       2.2          147,228      3.7          443,474       2.5
  (50 years and upwards, together)        14.4                      17.9                       15.2
Total                     13,494,160     100.0        3,951,448    100.0       17,445,608     100.0
[* For the purpose of the Census Reports 'ordinary tenements which have more
than two occupants per room, bedrooms and sitting-rooms included,' are considered
overcrowded.]
be 39 or 41 years old. To diminish the error arising from this fact
it is usual, when not otherwise inconvenient, to fix the centres
of the class-intervals at round figures : e.g. to take ' 15 and under
25 years,' ' 25 and under 35 years,' etc., in preference to ' 20 and
under 30 years,' ' 30 and under 40 years,' etc. Where there is
any known bias in the data, as, for instance, in the familiar case
of certain women who consistently register themselves as younger
than they really are, a correction can be made in the final figures.
In any frequency distribution where we wish to group a number
of observations according to the magnitude of some common
variable, as in Table (6) a number of males grouped according to
age, the question arises — ' How many groups should there be ? '
With this question is involved also the size of the corresponding
class-interval, and this should be so large that, with possible
exceptions at either extremity of the table, there are a fair proportion of
observations to each class or group ; and, contrariwise, it should
be so small that all the observations in any one group may be
treated practically as if they were located at the centre of the group
so far as the variable in question is concerned, e.g. it should be
possible to treat males recorded in class ' 50 and under 60 years,'
where the interval is 10 years, as if they were all of age 55 years. It
will be found in general that a number of groups somewhere in the
neighbourhood of 20 is the most satisfactory, granted that the
number of observations is reasonably large, although in some cases
it is impossible to split up the unit of class-interval, and we are
obliged to be satisfied with a smaller number of groups on this
account : Table (5) is a case in point where we are tied down to
one room as the class-interval. In Table (6) the class-interval
varies, being only 5 years at first, and afterwards 10 years, but
as a rule the labour of calculation of the different statistical constants
we require is considerably simplified if it is possible to keep the
size of the class-interval the same for each group.
CHAPTER IV
AVERAGES
Common Average or Arithmetic Mean. Let us consider one of the
commonest meanings of the term average. If a train travels a
distance of 180 miles in 3 hours we say that it has been moving
at 60 miles an hour. By this we do not mean that its speed is
always 60 m/h, never more, never less, but that if it had moved
always at that uniform speed it would have accomplished its
journey in exactly the same time. As a matter of fact, during
some instants it may have been moving at a much slower rate
than 60 m/h, but, if so, it must have made up for this slackness
by travelling at a much faster rate than 60 m/h during other
instants, so that on the whole a balance was effected, and, as we
say, the speed averaged out at 60 m/h.
Again, suppose the wages of three men are : A, 27s. a week ;
B, 18s. a week ; C, 30s. a week. We should say that the average
wage of the three was equivalent to
⅓(27+18+30)s. = 25s. a week.
In other words, if A, B, and C were all under the same employer,
and if, instead of paying them different amounts, he wanted to
pay them all equally, he would have to give each man 25s. a week,
assuming that his total wages bill was to remain unaltered. This
method of measurement gives what is known as the arithmetic
mean, or, more simply, the mean.
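The wages example can be verified directly. The sketch below uses the three wages given in the text; the variable names are illustrative.

```python
from statistics import mean

# Weekly wages in shillings, from the example in the text.
wages = {"A": 27, "B": 18, "C": 30}

average_wage = mean(list(wages.values()))
print(average_wage)  # 25

# Paying every man the mean leaves the total wages bill unaltered.
equal_pay_bill = average_wage * len(wages)
print(equal_pay_bill == sum(wages.values()))  # True
```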
Once more, in discussing the state of the labour market as regards
different trades, when we wish to compare one with another, it is
not the actual numbers unemployed in each trade that are quoted,
but these numbers expressed as percentages of the total numbers
employable in each trade.
In each of these three cases we reduce our observations or
measurements to a sort of common denominator, so that they may be
mentally compared or contrasted more readily with other
observations of a similar character. Thus we have in mind a certain mean
train speed per hour, or mean wage per week, or mean percentage
out of work, as the case may be.
An average then in general we may regard as one of a class
of statistical constants (others of which we are to meet later) which
concisely label a set of observations or measurements pertaining
to a common family. It is designed to describe the family type
more nearly than is possible by observing any chance member, and in
value it should therefore come somewhere near the middle of the
family group, so that if the individual members of the family
chance to be equal each to each in respect to the organ or character
observed it should have the same value as they have. This
constitutes a test for the validity of any formula giving the average of a
set of observations : e.g. we might, if we wish, define the average
of three numbers, p, q, r to be, not ⅓(p+q+r) but
for (1) this formula, too, can be shown to give a number intermediate
in value between the greatest and least of the numbers p, q, r ;
also (2) if we put p=q=r=k (say), the formula reduces to
√{⅓(k²+k²+k²)} = √k² = k.
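The two conditions of this test can be tried numerically on any proposed formula, as in the sketch below. The root-mean-square is used purely as an assumed example of an alternative formula, the function names are hypothetical, and a check on one set of numbers is of course an illustration and not a proof.

```python
from math import isclose, sqrt
from typing import Callable, Sequence

def passes_average_test(formula: Callable[[Sequence[float]], float],
                        values: Sequence[float], k: float = 7.0) -> bool:
    """Try the two conditions of the text on one sample of values."""
    # (1) the result lies between the least and the greatest value
    between = min(values) <= formula(values) <= max(values)
    # (2) when every value equals k, the result is k
    reduces_to_k = isclose(formula([k] * len(values)), k)
    return between and reduces_to_k

def root_mean_square(values: Sequence[float]) -> float:
    """Assumed example formula: square root of the mean of the squares."""
    return sqrt(sum(v * v for v in values) / len(values))

def plain_sum(values: Sequence[float]) -> float:
    """Not an average at all; included to show a formula that fails."""
    return sum(values)

print(passes_average_test(root_mean_square, [27, 18, 30]))  # True
print(passes_average_test(plain_sum, [27, 18, 30]))         # False
```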
Clearly the range of choice for the definition of an average is
infinite, though only a few definitions give averages which have
proved their utility and come into general use. Of these the most
important is the common mean already introduced, with its
extension, the weighted mean, but at least two others deserve special
consideration, the median and the mode.
Median. In any observed distribution if all the individuals
can be arranged in order of magnitude of the character or organ
observed, which may be conveniently done when they are not very
numerous, the median organ or character will be that pertaining to
the individual halfway along the series, so that there are in general
an equal number of individuals above and below the median.
For instance, if seven boys of different heights be placed to stand in
a row, the tallest first, the next tallest next, and so on, the median
height is the height of the fourth boy from either end. If there
are an even number of boys, say eight, it would be natural to take
as median the height midway between that of the fourth and that
of the fifth boy.
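The rule for seven boys and for eight boys may be illustrated as follows. The heights are invented for illustration; `statistics.median` in the Python standard library follows the same convention of taking the midway value when the number of items is even.

```python
from statistics import median

# Hypothetical heights in inches (invented for illustration).
seven_boys = [58, 52, 61, 55, 57, 54, 60]
eight_boys = seven_boys + [63]

print(sorted(seven_boys))
print(median(seven_boys))  # 57, the height of the fourth boy from either end
print(median(eight_boys))  # 57.5, midway between the fourth and the fifth
```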
When the items are numerous they are frequently grouped into
classes, as we have seen, such that all in the same class are reckoned
to have some value lying between the extreme limits of that class.
We should then, as before, halve the total number of observations
to fix the particular individual which defines the median organ or
character. This would enable us to pick out the group in which
the median lies, and on reference to the original record of
observations, assuming it was at hand, it would be a simple matter to
identify the median.
If the original record be not available, however, it will be
necessary to proceed to get the best value we can for the median in some
other way. Consider, for example, Table (7), showing the
distribution of marks obtained by 514 candidates in a certain examination.
We begin by rearranging the data in the manner shown below
Table (7). Now in accordance with the definition the median in
marks should, strictly speaking, be midway between the marks
assigned to the 257th candidate and the marks assigned to the
258th candidate : in fact, the marks corresponding to candidate
number 257.5, if it were possible for such a candidate to exist.
But we are ignorant so far as Table (7) goes of the marks gained
by either the 257th or the 258th candidate, though it is possible,
by the simple proportional process known as ' interpolation,' to
calculate approximately the marks we require. We think of all
the candidates as forming an ordered sequence, ranged one after
the other according to their marks just like the boys of different
heights, and the table shows that in this mental picture
the 231st candidate gets approximately 30 marks, while
the 318th candidate gets approximately 35 marks.
Hence candidate number 257.5, if one existed, ought to get a
number of marks somewhere between 30 and 35. But, in this
neighbourhood of the sequence,
a difference of (318 − 231) candidates corresponds to a difference
of 5 marks, therefore
a difference of (257.5 − 231) candidates corresponds to a difference
of (5/87 × 26.5) marks.
Thus the marks obtained by candidate number 257.5 are
approximately = 30 + 5/87 × 26.5
= 31.523,
and this may be taken as the median.
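The interpolation just carried out can be written as a small routine. This is a sketch under the same assumption as the text, namely that candidates are spread evenly through the class containing the median; the function name and argument layout are illustrative, and the figures are the running totals of Table (7).

```python
def interpolated_median(upper_limits, cumulative):
    """Median by simple proportion within the class that contains it.

    upper_limits[i] is the highest mark of class i, and cumulative[i] is
    the number of candidates obtaining not more than that mark.
    """
    target = (cumulative[-1] + 1) / 2        # candidate number 257.5 here
    if cumulative[0] >= target:
        raise ValueError("median lies in the first class; no lower point given")
    for i in range(1, len(cumulative)):
        if cumulative[i] >= target:
            low_mark, high_mark = upper_limits[i - 1], upper_limits[i]
            low_n, high_n = cumulative[i - 1], cumulative[i]
            step = (high_mark - low_mark) / (high_n - low_n)
            return low_mark + step * (target - low_n)

upper_limits = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
cumulative = [5, 14, 42, 91, 149, 231, 318, 397, 447, 484, 505, 511, 514]

print(round(interpolated_median(upper_limits, cumulative), 3))  # 31.523
```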
On examining the actual marks-sheet it was found that 252
candidates obtained 31 marks or less, and 273 candidates obtained
32 marks or less, so that the real median was 32, because this was
the number of marks gained by both the 257th and the 258th
candidates. The number 31.523 found above, however, would be
a good approximation to take for the median when all the
information at our disposal was that shown in Table (7).
Table (7). Marks obtained by 514 Candidates in a
certain Examination.

Marks Obtained.   No. of          Marks Obtained.   No. of
                  Candidates.                       Candidates.
1 to 5                 5          36 to 40              79
6 to 10                9          41 to 45              50
11 to 15              28          46 to 50              37
16 to 20              49          51 to 55              21
21 to 25              58          56 to 60               6
26 to 30              82          61 to 65               3
31 to 35              87
                                  Total                514
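The table is to be read as follows :
5 candidates obtained 1, 2, 3, 4, or 5 marks,
9 candidates obtained 6, 7, 8, 9, or 10 marks,
and so on.
By straightforward addition it can evidently be rearranged so
as to read thus :
  5 candidates obtained not more than  5 marks.
 14 candidates obtained not more than 10 marks.
 42 candidates obtained not more than 15 marks.
 91 candidates obtained not more than 20 marks.
149 candidates obtained not more than 25 marks.
231 candidates obtained not more than 30 marks.
318 candidates obtained not more than 35 marks.
397 candidates obtained not more than 40 marks.
447 candidates obtained not more than 45 marks.
484 candidates obtained not more than 50 marks.
505 candidates obtained not more than 55 marks.
511 candidates obtained not more than 60 marks.
514 candidates obtained not more than 65 marks.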
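The rearrangement by straightforward addition is a running total of the class frequencies. A minimal sketch, using the frequencies of Table (7) and the standard-library function `itertools.accumulate`:

```python
from itertools import accumulate

# Class frequencies from Table (7), in order from "1 to 5" to "61 to 65".
frequencies = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]
upper_marks = range(5, 70, 5)   # upper limit of each class: 5, 10, ..., 65

running_totals = list(accumulate(frequencies))

for mark, n in zip(upper_marks, running_totals):
    print(f"{n:>3} candidates obtained not more than {mark:>2} marks")
```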
It will be noted that in calculating the median no use is made of
the marks of any of the candidates except those in the two groups
in the immediate neighbourhood of the median, and it is one of
the great advantages of this average that it can be found when an
exact knowledge of the characters of the more extreme individuals
in the series is not in our possession, and even when their
measurement is impossible : it is enough if they can be roughly located.
The arithmetic mean on the other hand is often unduly influenced
by abnormal individuals which are not really typical of the
population in which they appear.
Mode. If we measure or observe some organ or character for
each individual in a given population, the mode, as its name
suggests, is simply the organ or character of most fashionable or most
frequent size. A large draper, for example, will have collars of
several different shapes and sizes in his shop, but the fashionable
shape and the predominant size correspond to the mode : it is the
mode that sells most readily, and the intelligent draper will always
have it in stock. Again, in Table (2), the disease mode or
fashionable disease among certain school children inspected in Surrey in
1913 was measles, for a greater percentage of children had suffered
from measles than from any other of the diseases recorded.
Now when the variable in which we are interested is ' discrete,'
that is, when it changes by unit steps, leading to classes like
' tenements with 1 room,' ' tenements with 2 rooms,' ' tenements with
3 rooms,' and so on, it is an easy matter to pick out the class of
greatest frequency : thus, in Table (5) there are more overcrowded
tenements with 2 rooms than with any other number of rooms
in the urban districts, so that 2 is the mode so far as this character
(number of rooms) is concerned, whereas in the rural districts 3 is
the mode, for there are more overcrowded tenements with 3 rooms
than with any other number. There may be ambiguity, however,
in determining the mode in this way for a grouped frequency distribution
when we are dealing with an organ or character subject
to 'continuous variation.' To cover such cases the modal value
has been defined as that value for which the frequency per unit
variation of the organ or character is a maximum. The precise
significance of this wording will only be appreciated after discussing
frequency curves : at present it must suffice to give a practical
illustration of how the ambiguity arises and calls for some more
refined treatment.
For this purpose turn again to the examination marks in Table (7),
from which it appears that the mode, if it is to be the marks obtained
by the greatest number of candidates, should lie in the group
(31 to 35), since there are 87 candidates with marks between these
limits, and this number exceeds that in any other group. But
how are we to decide the exact point in the interval (31 to 35) which
is to correspond to the mode ? Shall it be 33 ? We might say
' yes ' if the distribution were perfectly symmetrical on either side
of the (31 to 35) group, but if we examine the neighbouring groups
we see that the balance leans rather more heavily to the (26 to 30)
group with a frequency of 82 than to the (36 to 40) group with a
frequency of 79, and we might allow for this by interpolating in
some way — ignoring, of course, any errors which may occur in the
frequencies themselves owing to the observations being generally
limited in number. But the pull in the direction of lower marks
becomes still more pronounced to our minds when we contrast
also the frequencies in the next groups on either side, namely
58 and 50. So we might go on until the influence of the whole
field of observations comes into action.
Now it so happened that in this particular case the original
marks-sheet was to be seen, and a regrouping of the candidates as
in Table (8) makes it clear that the value found in this way for the
mode may be artificially displaced, sometimes to a serious extent,
by the particular method of grouping adopted. Thus, according
to this new arrangement, the mode would seem to lie in the interval
(28 to 32), the mid-value of which, 30, differs materially from 33, the
mid-value of the previous maximum frequency group.
Table (8). Marks obtained by 514 Candidates in a
certain Examination (Alternative Grouping).

Marks Obtained.   No. of Candidates.     Marks Obtained.   No. of Candidates.
   3 to  7               10                 38 to 42              73
   8 to 12               17                 43 to 47              45
  13 to 17               35                 48 to 52              31
  18 to 22               56                 53 to 57              12
  23 to 27               47                 58 to 62               3
  28 to 32              108                 63 to 67               3
  33 to 37               74
                                            Total                514
[It should be observed that while an alteration of the grouping
may also affect the median, it does not affect it nearly to the same
extent: e.g. the median determined from Table (8) is 31.3, which
differs little from 31.5, the value obtained by the first grouping.]
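
The effect of regrouping on the modal group and on the median can be checked with a short sketch. It is only an illustration under stated assumptions: the frequencies of the first grouping are taken as the differences of the cumulative totals 5, 14, 42, ..., 514, the lower limit of its first group (1 mark) is assumed, and the median is located by simple interpolation at half the total frequency, counting each group as starting at the upper limit of the group before it. The function names are ours, not the book's.

```python
# Each group is (lowest mark, highest mark, number of candidates).
# First grouping: differences of the cumulative totals; the lower limit 1
# of the first group is an assumption and does not affect the results.
first_grouping = [
    (1, 5, 5), (6, 10, 9), (11, 15, 28), (16, 20, 49), (21, 25, 58),
    (26, 30, 82), (31, 35, 87), (36, 40, 79), (41, 45, 50),
    (46, 50, 37), (51, 55, 21), (56, 60, 6), (61, 65, 3),
]
# Second grouping: the alternative arrangement of Table (8).
second_grouping = [
    (3, 7, 10), (8, 12, 17), (13, 17, 35), (18, 22, 56), (23, 27, 47),
    (28, 32, 108), (33, 37, 74), (38, 42, 73), (43, 47, 45),
    (48, 52, 31), (53, 57, 12), (58, 62, 3), (63, 67, 3),
]


def modal_group(groups):
    """Return the group with the greatest frequency."""
    return max(groups, key=lambda g: g[2])


def grouped_median(groups):
    """Interpolate the median inside the group containing half the total."""
    half = sum(freq for _, _, freq in groups) / 2
    below = 0  # candidates in all earlier groups
    for low, high, freq in groups:
        if below + freq >= half:
            width = high - low + 1
            # The group is taken to start at the previous upper limit, low - 1.
            return (low - 1) + (half - below) / freq * width
        below += freq
    raise ValueError("no groups given")


for name, groups in [("first", first_grouping), ("second", second_grouping)]:
    low, high, freq = modal_group(groups)
    print(name, "grouping: modal group", (low, high),
          "mid-value", (low + high) / 2,
          "median", round(grouped_median(groups), 1))
```

On these figures the modal mid-value moves from 33 to 30 while the median moves only from about 31.5 to about 31.3, which is the contrast drawn in the text.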
If, again, we combine the results of our two groupings to find
the mode we might be tempted to conclude that it lies somewhere
between the limits 31 and 32, but on examining the original records
it was discovered that the real mode was 28. The frequency
distribution of candidates in this neighbourhood was in fact very
interesting ; it ran as follows : —
Number of candidates who obtained 25 marks = 14
   „          „       „     „     26   „   = 10
   „          „       „     „     27   „   =  6
   „          „       „     „     28   „   = 33
   „          „       „     „     29   „   = 17
   „          „       „     „     30   „   = 16
The explanation of this peculiar distribution seemed to be that
28 marks were required for a candidate to pass, and apparently as
many candidates as possible were pushed over the pass line : if,
on the first marking, a candidate was found to want only one mark
to pass, the examiner presumably looked through his paper again
and did his best to find an answer which by kindly treatment
might be granted an extra mark. The effect of this leniency was
ultimately to leave only 6 candidates in the division immediately
below the pass line, and to swell the number immediately above
to 33, which thus made 28 easily the ' most fashionable ' mark of
any, the next largest group of candidates being only 21. It will
be observed that even a candidate who wanted 2 marks to pass
was treated in the same tolerant fashion, although it is not so
easy, of course, for a conscientious examiner to discover two extra
marks as it is to discover one ; and if the candidate is 3 marks
below the pass line it is still harder to give him the necessary lift
to carry him over. Thus in the final list we find more candidates
with 26 marks than with 27, and still more with 25 than with 26.
If the above diagnosis is correct, and all marks-sheets tell the same
tale, who shall again say that examiners do not temper justice with
mercy?
This example has illustrated fairly clearly the difficulty of fixing
the mode with any great precision by mere inspection when the
individuals are arranged in groups, the value of the variable under
discussion lying between prescribed limits for each group. While
it is possible to get a rough approximation to its value in this way,
we conclude that for a really satisfactory determination we require
some method which makes use of the whole distribution, as in the
determination of the mean, and not merely of the portion in the
supposed neighbourhood of the mode. This must be left to a later
chapter ; we shall only point out before passing on that there
may sometimes be more than one mode in a given frequency distribution,
just as there may be more than one fashionable type of
collar which it is expedient for the draper to stock in large quantities.
The second grouping in the examination example suggests
such a possibility, for it will be noticed that the frequencies of
candidates do not rise steadily to a single maximum at 108 for
class (28 to 32), and then fall steadily : there is a previous rise and
fall in the neighbourhood of class (18 to 22).
Weighted Mean. Let us suppose a farmer employs for the
harvest 5 men, 3 women, and 4 boys. In estimating the amount
of work they can do in a given time it is clear that in general a
woman or boy cannot be reckoned as equal to a man. He must
therefore decide what ' weight ' must be given to each in proportion
to a man. If a woman's work be taken, for example, to be three-quarters
as effective and a boy's work to be half as effective as
that of a man, we have as the appropriate proportional weights
\[ 1 : \tfrac{3}{4} : \tfrac{1}{2}, \quad\text{or}\quad 4 : 3 : 2. \]
Hence 5 men, 3 women, and 4 boys would on the average be equivalent
in output to
\[ \left(5 + 3\times\tfrac{3}{4} + 4\times\tfrac{1}{2}\right)\ \text{men}
 = \frac{4\times 5 + 3\times 3 + 2\times 4}{4}\ \text{men}
 = 9\tfrac{1}{4}\ \text{men}. \]
An average of this type is called a weighted mean, 1, 3/4, and
1/2 being the weights, because they tell us what weight to give to
each separate worker in calculating the average.
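
The farmer's reckoning can be reproduced in a few lines. This is a minimal sketch using exact fractions; the function name `weighted_total` is ours and is introduced only for illustration.

```python
from fractions import Fraction


def weighted_total(counts, weights):
    """Sum of count x weight over the classes of worker."""
    return sum(c * w for c, w in zip(counts, weights))


counts = [5, 3, 4]                                        # men, women, boys
weights = [Fraction(1), Fraction(3, 4), Fraction(1, 2)]   # weights from the text

man_equivalents = weighted_total(counts, weights)         # equivalent number of men
mean_weight = man_equivalents / sum(counts)               # average weight per worker

print(man_equivalents, float(man_equivalents))
print(mean_weight)
```

Replacing the weights by 4, 3, 2 and dividing the total by 4 gives the same figure, since only the proportions of the weights matter.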
Let us consider the effect such weighting has in general upon a
mean, and for this purpose we shall test it on a set of index numbers
measuring rents in certain groups of towns in 1912, as given in a
Report on the Cost of Living of the Working Classes issued by the
Board of Trade (Cd. 6955).
Table (9). Mean Index Numbers of Rents for certain
Geographical Groups of Towns in 1912 (with reference
to Middle Zone of London as standard = 100).

(1)                                 (2)      (3)          (4)         (5)         (6)
Geographical Group.                 Rents.   No. of       Each Group  Arbitrary   Approximate
                                             Towns        counting    Weights.    submultiples
                                             included in  as 1.                   of Nos. in
                                             the Group.                           previous column.

Northern Counties and Cleveland     66.0        9            1           27           3
Yorkshire (except Cleveland)        58.5       10            1           54           6
Lancashire and Cheshire             56.9       17            1           45           5
Midlands                            52.3       14            1          125          14
Eastern and East Midland Cos.       53.4        7            1           63           7
Southern Counties                   63.7       10            1           14           2
Wales and Monmouth                  64.8        4            1           22           2
Scotland                            62.0       10            1          178          20
Ireland                             51.7        6            1           55           6

Average                              ..        58.4         58.8        57.6         57.6
The first mean in the above table, 58.4, is obtained by multiplying
(or weighting) the mean rent of each geographical group by the
number of towns in the group, given in col. (3), adding the numbers
so obtained, and dividing the total by the total number of towns,
thus:
\[ \frac{9(66.0)+10(58.5)+\dots+6(51.7)}{9+10+\dots+6}. \]
This is simply the arithmetic mean treating each town as unit.
The second mean, 58.8, is obtained by adding the mean rents of
all the groups and dividing by the total number of groups, thus:
\[ \frac{66.0+58.5+\dots+51.7}{1+1+\dots+1}. \]
This is the arithmetic mean treating each geographical group as
unit.
The third mean, 57.6, is obtained by multiplying, or weighting,
the mean rent of each group by a perfectly arbitrary number given
in col. (5); the numbers selected were taken quite at random from
another column of figures in another Blue-book, and had no connection
whatever with the subject of rents; this gives:
\[ \frac{27(66.0)+54(58.5)+\dots+55(51.7)}{27+54+\dots+55}. \]
The last mean, 57.6, is obtained by choosing as weights any
numbers (and for simplicity we choose the smallest) as in col. (6)
which are very roughly proportional to the arbitrary weights used
in the last instance; we thus get:
\[ \frac{3(66.0)+6(58.5)+\dots+6(51.7)}{3+6+\dots+6}. \]
Now the first of these means is clearly the most satisfactory, since
it is the result of very properly weighting the mean rent of each
group of towns according to the number of towns the group con
tains. But the second result shows that if we are ignorant of the
number of the towns in each group we shall not be very far out in
our calculation if we treat them all as of equal importance, and find
the simple arithmetic mean of the mean rents in the nine groups.
We can even go further, for we find, from the third and fourth results,
that by weighting the mean rents in the various groups on quite a
random basis, the mean we get still does not differ very greatly from
the best value first found.
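
The four means can be recomputed directly from the columns of Table (9). The sketch below assumes the rents are read with one decimal place (66.0, 58.5, and so on), which is the reading that reproduces the four averages quoted; the helper `weighted_mean` is our own name.

```python
def weighted_mean(values, weights):
    """Sum of weight x value divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Columns of Table (9), in the order of the nine geographical groups.
rents     = [66.0, 58.5, 56.9, 52.3, 53.4, 63.7, 64.8, 62.0, 51.7]  # col. (2)
towns     = [9, 10, 17, 14, 7, 10, 4, 10, 6]                        # col. (3)
equal     = [1] * 9                                                 # col. (4)
arbitrary = [27, 54, 45, 125, 63, 14, 22, 178, 55]                  # col. (5)
reduced   = [3, 6, 5, 14, 7, 2, 2, 20, 6]                           # col. (6)

means = {
    "towns as weights": weighted_mean(rents, towns),
    "each group as 1": weighted_mean(rents, equal),
    "arbitrary weights": weighted_mean(rents, arbitrary),
    "reduced weights": weighted_mean(rents, reduced),
}
for label, value in means.items():
    print(f"{label}: {value:.1f}")
```

The spread between the largest and smallest of the four results is little more than one unit of the index, although the weights in col. (5) bear no relation to the number of towns.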
The important principle of which the above example is an illustration
is perfectly general, and may be stated as follows: If the
total number of measurements or observations be not very small,
and if the resulting values of the organ or character measured
(rent in our case) be not very unequal, any reasonable selection of
multipliers or weights (as, for instance, the first two adopted above)
will give means which differ from one another by but little; and
even an apparently unreasonable selection of multipliers (as, for
instance, the third adopted above), assuming they are not so
wildly chosen as to give any particular group a very unfair weight
in comparison with the others, will not throw the mean out badly.
Further, in place of a set of large multipliers we may substitute
small numbers which are roughly proportional to them (as we have
done in the fourth case above), and the mean will again be very
little affected. [See Appendix, Note 2.]
CHAPTER V
AVERAGES (continued)
Applications of Weighted Mean. In determining the weighted mean
of a set of observations it is usual, of course, to weight each observation
according to its importance, though what number should be
chosen as a measure of its importance may sometimes be a matter
of doubt. It is not a very difficult matter to decide when we
wish, for example, to compare birth, marriage, or death rates in
two districts, if we know how the constitution of the population
in the one district differs from that in the other, for the weighting
in each of these cases must be in proportion to the population
concerned, and it is too important to ignore.
Death rate, crude and corrected. Imagine a city in which the
total number of deaths in a certain year is N out of a population
numbering P.
The ordinary or crude death rate for that city will then be
\[ \frac{N}{P}\times 1000, \quad\text{by definition.} \]
Now this number N may be analysed according to the ages of
the people who have died; let us suppose it is made up of
$n_1$ people between limits 0 and less than 5 years of age,
$n_2$    „       „       „    5   „    „    „  15    „
$n_3$    „       „       „   15   „    „    „  25    „
and so on, where
\[ n_1+n_2+n_3+\dots = N. \]
Again the number P may be analysed according to the ages of
the people who compose the total population, giving, say,
$p_1$ of the population between limits 0 and less than 5 years of age,
$p_2$    „       „       „       „    5   „    „    „  15    „
$p_3$    „       „       „       „   15   „    „    „  25    „
and so on, where
\[ p_1+p_2+p_3+\dots = P. \]
Thus we may write for the crude death rate
\[ \begin{aligned}
D &= \frac{N}{P}\times 1000 \\
  &= \frac{n_1+n_2+n_3+\dots}{P}\times 1000 \\
  &= \frac{n_1}{P}\,1000+\frac{n_2}{P}\,1000+\frac{n_3}{P}\,1000+\dots \\
  &= \frac{p_1}{P}\left(\frac{n_1}{p_1}\,1000\right)+\frac{p_2}{P}\left(\frac{n_2}{p_2}\,1000\right)+\frac{p_3}{P}\left(\frac{n_3}{p_3}\,1000\right)+\dots \\
  &= (p_1 d_1+p_2 d_2+p_3 d_3+\dots)/P,
\end{aligned} \]
where $d_1 = 1000\,n_1/p_1$ is the death rate between limits 0 and less than 5 years of age,
$d_2$   „    „     „     „      „     5   „    „    „  15    „
$d_3$   „    „     „     „      „    15   „    „    „  25    „
and so on.
Now if we compare this expression with the corresponding one for
another city, say,
\[ D' = (p'_1 d'_1+p'_2 d'_2+p'_3 d'_3+\dots)/P', \]
it is quite conceivable that the death rates in the various age groups
might be equal,
\[ d_1 = d'_1,\quad d_2 = d'_2,\quad d_3 = d'_3,\ \dots \]
and yet D might exceed D' because in the first city there are a
greater proportion of infants or old people, on which classes the
hand of death falls heaviest, that is, because the p's or weights
which multiply the biggest d's are greater in the first case than in
the second. But so long as the d's in the two cities are equal, age
group by age group, it would be reasonable to regard the cities as
equally healthy, or unhealthy as the case might be, and therefore
to ensure a fair comparison it is usual in the Reports of the Registrar
General to give a corrected death rate in place of the crude death
rate defined above.
This is done by weighting the death rate for each age group, not
in proportion to the actual number of persons in that group in
the city itself, but in proportion to the corresponding number in
the country at large. Thus, if we denote the proportion of the
population, Q,
between limits 0 and less than 5 in the country at large by $q_1/Q$,
   „       „    5   „    „    „  15   „    „      „     „   „   $q_2/Q$,
   „       „   15   „    „    „  25   „    „      „     „   „   $q_3/Q$,
and so on, we get as the corrected death rate
\[ (q_1 d_1+q_2 d_2+q_3 d_3+\dots)/Q, \]
a form which has the effect of making the results agree in two
cities which have equal d's throughout.
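
A small numerical sketch may make the point concrete. All the figures below are hypothetical (four age groups, invented populations and rates) and serve only to show that two cities with the same age-group death rates can have different crude rates yet the same corrected rate; the function name is ours.

```python
def weighted_rate(rates, populations):
    """Death rate per 1000 obtained by weighting each age-group rate d_i
    by the population p_i (or q_i) in that age group."""
    return sum(p * d for p, d in zip(populations, rates)) / sum(populations)


# Hypothetical death rates per 1000 in four age groups
# (under 5, 5 to 15, 15 to 25, 25 and over), identical in both cities.
rates = [40.0, 4.0, 5.0, 60.0]

city_a = [20000, 30000, 25000, 25000]   # hypothetical p_i: many infants
city_b = [10000, 30000, 30000, 30000]   # hypothetical p'_i: fewer infants
country = [12, 20, 18, 50]              # hypothetical q_i (any common unit)

crude_a = weighted_rate(rates, city_a)
crude_b = weighted_rate(rates, city_b)
corrected_a = weighted_rate(rates, country)   # same d's, standard weights
corrected_b = weighted_rate(rates, country)

print("crude:", crude_a, crude_b)
print("corrected:", corrected_a, corrected_b)
```

Because the corrected rate uses the same weights q for every city, any difference that remains between two corrected rates must come from the d's themselves.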
A similar method of correction is clearly applicable in considering
the incidence of the death rate when we are concerned not
with a difference of district but with a difference of sex, occupation,
religious profession, wage-earning capacity, or any other well-defined
character. Further, it may be used also in comparing birth
rates, marriage rates, heights, weights, chest measurements, or any
similar attributes, when it is necessary to refer the observations
or measurements to a standard population in order to avoid
complications due to age variation.
There is another method of correction, equally general in application,
which is useful when the death rates in the various age groups
are not known. In this case D, the crude death rate for the whole
population of the district, is known, also $p_1/P, p_2/P, p_3/P, \dots$, the
proportions of the population between the various age limits, but
$d_1, d_2, d_3, \dots$ are supposed unknown.
Now if the population in the country as a whole were the same in
corresponding age groups as it is in the district under consideration,
we should get as the death rate for the whole country
\[ (p_1\delta_1+p_2\delta_2+p_3\delta_3+\dots)/P, \]
where $\delta_1, \delta_2, \delta_3, \dots$ are the death rates in the various age groups in
the country at large, and these would in practice as a rule be known.
The actual death rate for the whole country is, however,
\[ (q_1\delta_1+q_2\delta_2+q_3\delta_3+\dots)/Q, \]
where $q_1/Q, q_2/Q, q_3/Q, \dots$ denote, as before, the real proportions
of the population in the various age groups in the country at large.
We take as the corrected death rate required for the district a
number bearing to the crude death rate the same ratio as
\[ (q_1\delta_1+q_2\delta_2+\dots)/Q \ \text{ bears to } \ (p_1\delta_1+p_2\delta_2+\dots)/P. \]
Hence we have
\[ \frac{\text{corrected death rate}}{D}
   = \frac{q_1\delta_1+q_2\delta_2+\dots}{p_1\delta_1+p_2\delta_2+\dots}\cdot\frac{P}{Q}. \]
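
The second method can be sketched in the same way. The populations and rates below are hypothetical, and the function name is ours; the only thing taken from the text is the ratio itself. A useful check on the formula is that a district whose (unknown) age-group rates happen to equal the national ones receives exactly the national death rate as its corrected rate.

```python
import math


def indirect_corrected_rate(crude_rate, district_pop, country_pop, country_rates):
    """Corrected rate = crude rate x (actual national rate) / (national rate
    recomputed with the district's age distribution)."""
    expected = (sum(p * s for p, s in zip(district_pop, country_rates))
                / sum(district_pop))          # (p1*delta1 + p2*delta2 + ...)/P
    actual = (sum(q * s for q, s in zip(country_pop, country_rates))
              / sum(country_pop))             # (q1*delta1 + q2*delta2 + ...)/Q
    return crude_rate * actual / expected


# Hypothetical figures for four age groups.
country_rates = [40.0, 4.0, 5.0, 60.0]        # delta_i per 1000, assumed known
country_pop = [12, 20, 18, 50]                # q_i
district_pop = [20000, 30000, 25000, 25000]   # p_i

national_rate = (sum(q * s for q, s in zip(country_pop, country_rates))
                 / sum(country_pop))

# A district that dies at exactly the national age-group rates has this crude rate:
crude_if_typical = (sum(p * s for p, s in zip(district_pop, country_rates))
                    / sum(district_pop))

corrected = indirect_corrected_rate(crude_if_typical, district_pop,
                                    country_pop, country_rates)
print(crude_if_typical, corrected, national_rate)
```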
Index Numbers to compare Household Budgets. Another highly
important illustration of a weighted mean occurs in the search for a
satisfactory measure of the change in the cost of living from year
to year. We have already introduced the subject of variation in
wholesale prices, and we have seen that Sauerbeck, in forming his
index numbers, treats as one each of the forty-five commodities
he uses to measure this variation: the observations, that is to
say, are not weighted.
But, confining our attention to food alone, supposing we have
five items, such as bacon, bread, tea, sugar, milk, for which the
index numbers of prices at two different dates are : —
                Bacon.   Bread.   Tea.   Sugar.   Milk.
First date       100      100     100     100      100
Second date      117       95      94     102      109
Is it really right to treat each of these items as of equal importance
with the rest, or ought we to regard bread and tea, say, as of more
weight than bacon, and count bread perhaps five times and tea
three times while counting bacon only once ? It is clear that, in
order to select a reasonable set of multipliers in this case, we should
need to know the standard of living of the class of people under
consideration, and how much in the aggregate they spend upon
bacon and how much upon bread, etc.
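
The difference the multipliers can make with so few items is easily seen numerically. In the sketch below the multipliers 5 for bread, 3 for tea, and 1 for bacon are those suggested in the question above; the multipliers of 1 for sugar and milk are our own assumption, made only to complete the illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight x value divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


items = ["bacon", "bread", "tea", "sugar", "milk"]
second_date = [117, 95, 94, 102, 109]   # index numbers, first date = 100

# bread x5 and tea x3 as in the text; sugar and milk x1 are assumed.
multipliers = [1, 5, 3, 1, 1]

unweighted = weighted_mean(second_date, [1] * len(items))
weighted = weighted_mean(second_date, multipliers)

print(round(unweighted, 1), round(weighted, 1))
```

With only five items the unweighted index shows a rise in prices while the weighted one shows a fall, so that the choice of multipliers cannot here be treated as immaterial.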
A partial answer to these questions can be obtained by making
a collection of household budgets as was done, for example, by two
Government Committees which recently reported (1918-19) on the
Cost of Living among the Urban and the Agricultural Working Classes
respectively. If the number of commodities employed is large,
even an arbitrary set of multipliers, as we have indicated, will not
displace the mean any great distance from the value when reasonable
weights are chosen, but unfortunately in collecting such household
budgets we are confined to the comparatively limited variety
of foodstuffs which are in general use.
Different principles may be followed in making the comparison
between one year and another which may be illustrated by a few
figures from the Urban Classes Report (1918) : —
Table (10). Household Budgets showing Prices of each Commodity
and Quantities Purchased at Two Different
Dates by Typical Family.

                 First year (1914).               Second year (1918).
Commodity.    Price (pence   No. of lb.        Price (pence   No. of lb.
              per lb.)       bought.           per lb.)       bought.
                 x_1            n_1               x_2            n_2

Sugar            2.2            5.9               7.07           2.83
Tea             21.3            0.68             33.3            0.57
Potatoes         0.7           15.6               1.26          20.0
Let $x_1$ be the price, in pence per unit, of any one commodity
at the first date, and let $n_1$ be the number of units of this commodity
bought per week by a typical family ($n_1$ may be estimated in different
ways, e.g. (1) by dividing the total number of units bought by
all families by the total number of those families, or (2) by ranging
the different amounts bought by different families in order of
magnitude and picking out the median amount, or (3) by choosing
the mode, i.e. the amount most commonly purchased). Also let $x_2$
be the price, in pence per unit, of the same commodity at the second
date, and let $n_2$ be the number of units of the commodity then
bought per week by the typical family, estimated in the same way
as before.
The actual expenditure, measured in pence, at the two dates
will then be
\[ S(x_1 n_1) \quad\text{and}\quad S(x_2 n_2) \]
respectively, where $S(x_1 n_1)$ simply denotes the sum of expressions
like $(x_1 n_1)$ for all the commodities recorded and $S(x_2 n_2)$ denotes the
sum of expressions like $(x_2 n_2)$ for all the commodities recorded,
S, the old English S, being a well-known conventional abbreviation
for 'Sum of expressions like.' Thus, with the numbers in Table (10),
we should have
\[ S(x_1 n_1) = (2.2)(5.9)+(21.3)(0.68)+(0.7)(15.6)+\dots \]
\[ S(x_2 n_2) = (7.07)(2.83)+(33.3)(0.57)+(1.26)(20.0)+\dots \]
Taking 100 as the index number to represent expenditure at the
first date, the index number measuring expenditure at the second
date may be formed in any of the following different ways,* which
as a rule, of course, lead to different results:
(1) $100\,S(x_2 n_2)/S(x_1 n_1)$;
(2) $100\,S(x_2 n_1)/S(x_1 n_1)$ or $100\,S(x_2 n_2)/S(x_1 n_2)$;
(3) $100\,S(x_1 n_2)/S(x_1 n_1)$ or $100\,S(x_2 n_2)/S(x_2 n_1)$.
The first of these expressions compares the actual expenditure at
the second date to that at the first date.
The next two expressions take into account directly only the
change in prices; they compare not actual expenditures but the
expenditures at the two dates as they would be if the amounts
purchased at the two dates were the same: the first supposing
these amounts to equal those actually bought at the first date,
and the second supposing them to equal those actually bought
at the second date.
The last two expressions, on the other hand, take into account
directly only the change in amounts purchased ; they compare
the expenditures at the two dates as they would be if the prices
ruling at the two dates were the same : the first supposing these
prices to equal those actually charged at the first date, and the
second supposing them to equal those actually charged at the
second date.
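
The five expressions can be evaluated for the three commodities shown in Table (10). This is a sketch only: the table in the Report covers more commodities than these three, and the figures are read here as 2.2, 5.9, 21.3, 0.68, and so on (the placing of the decimal points is our assumption). The variable names are ours.

```python
import math

# (x1, n1, x2, n2): price and quantity at the first and second dates.
budget = {
    "sugar":    (2.2, 5.9, 7.07, 2.83),
    "tea":      (21.3, 0.68, 33.3, 0.57),
    "potatoes": (0.7, 15.6, 1.26, 20.0),
}


def total(price_index, quantity_index):
    """Sum of price x quantity over the commodities; each argument selects
    the date (1 or 2) from which the price or the quantity is taken."""
    return sum(row[2 * (price_index - 1)] * row[2 * (quantity_index - 1) + 1]
               for row in budget.values())


s11, s22 = total(1, 1), total(2, 2)   # actual expenditures at the two dates
s21, s12 = total(2, 1), total(1, 2)   # mixed: new prices/old amounts, old prices/new amounts

expenditure_index = 100 * s22 / s11   # expression (1)
price_index_old_n = 100 * s21 / s11   # expression (2), first form
price_index_new_n = 100 * s22 / s12   # expression (2), second form
amount_index_old_x = 100 * s12 / s11  # expression (3), first form
amount_index_new_x = 100 * s22 / s21  # expression (3), second form

for name in ("expenditure_index", "price_index_old_n", "price_index_new_n",
             "amount_index_old_x", "amount_index_new_x"):
    print(name, round(globals()[name], 1))
```

It may be noted that expression (1) is always the product of a price index and an amount index taken on opposite bases (divided by 100), so the five numbers are not independent of one another.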
The particular method of weighting adopted must naturally
depend upon the circumstances of the period under discussion
and the nature of the inquiry one is making ; it is a nice question
to decide how far emphasis should be laid upon the old standard
of life (measured by food, lighting, rent, recreation, etc.) with the
expense required to maintain it, and upon the new standard of life
and the cost necessary to reach it.
It may be useful here to summarize a few of the questions of
interest which present themselves in connection with the formation
of index numbers of prices designed to measure changes in the
value of money in general without reference to any particular class
of the community : —
1. What years should be selected in fixing our standard prices ?
2. What commodities should be chosen as a basis for our
average ?
[* See also The Measurement of Changes in the Cost of Living, by A. L. Bowley, Sc.D.,
in the Journal of the Royal Statistical Society, May 1919, for a more complete
discussion of the subject.]
3. What weight should be given to each commodity in relation
to the rest ?
4. How should the prices of the several commodities be determined,
bearing in mind that 'price' itself frequently varies from
place to place?
5. Finally, how should these prices be combined to give the
average required ? Should we use the simple arithmetic mean, the
geometric mean [see Appendix, Note 3], the median, or some other
measure ?
While we are not prepared to attempt to answer these questions
fully, seeing that authorities are not altogether agreed as to what
the answers should be, one or two points may be worth noting.
Generally speaking we may say that : —
1. The years selected in fixing our standard prices should be
years in which economic conditions were normal rather than
abnormal.
2. The commodities chosen should be articles of general consumption,
and as wide a field as possible should be covered in their
choice.
3. Many consider that little is gained by weighting, but, if
weights are introduced, the greater the importance of any commodity
in relation to the rest, judged for example by the relative
quantity consumed, the greater should be the weight assigned
to it.
4. The practical difficulty of assessing retail prices when they
are uncontrolled compels us in general to fall back upon wholesale
quotations, on which some light may be thrown by keeping
under observation the important markets for the sale of each
commodity.
5. The average commonly used is the simple arithmetic or the
weighted mean, though arguments can be adduced in favour of
other averages such as the median.
Leaving index numbers now on one side and returning to the
general subject of averages, we may remark that the question
which average is correct in any given case, the mean (weighted or
otherwise), the median, or the mode, does not arise : no one average
is more correct than another, because they are all entirely conventional
and represent different ideas; they correspond in fact
to so many different ways of summing up a set of observations or
measurements in a single numerical statement, and the real question
to determine is which statement, which kind of average, brings the
set of observations before us to the best focus.
For this purpose one average will clearly be best in one case and
another in another, but it may be stated without hesitation that
the arithmetic mean is certainly the most useful of the three and
it is the most frequently used. Other averages have been suggested,
such as the geometric and the harmonic means [see Appendix,
Note 3] familiar to students of Algebra, but they are only suitable
in a comparatively small class of problems.
In a reasonably symmetrical distribution of observations, one in
which the variables of medium size are the most frequent and the
frequency diminishes about equally on either side towards the
largest and the least of the variables, the values of the mean, the
median, and the mode will be found to lie all very close together ;
and a useful practical rule to remember is that the median comes
in general between the mean and the mode, the difference between the
mean and the mode being about three times the difference between the
mean and the median. This rule, for lack of a better, might be used
to determine the mode in suitable cases, or it might be used to test
the value found in some other way.
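
Written as an equation, the rule just stated gives the mode directly in terms of the other two averages. The rearrangement below is ours and is only as good as the empirical rule itself, which holds roughly for distributions that are not far from symmetrical.

```latex
\[
\text{mean} - \text{mode} \approx 3\,(\text{mean} - \text{median})
\quad\Longrightarrow\quad
\text{mode} \approx 3\,\text{median} - 2\,\text{mean}.
\]
```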
The general term 'average' is frequently used when the particular
denomination 'arithmetic mean' is implied, but the context
will usually prevent misunderstanding.
In order to get a clear impression of the outstanding features
presented by the three chief averages discussed, let us go over them
once more in the case of marks awarded to a number of students
in a class. All three may be regarded as in a sense measures of
the standard reached by the class as a whole in the examination,
but the measures are made in different ways : —
1. The Arithmetic Mean is found by merely dividing the aggregate
marks of the class by the number of the students, and it gives the
marks earned by each student if we conceive them all to be of
equal merit.
2. The Median is found by ranging the students in order of merit
from top to bottom, and picking out the marks awarded to the one
who comes halfway down the list.
3. The Mode is the most fashionable number of marks, i.e. the
marks obtained by the greatest number of candidates.
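
The three definitions can be tried on a small class. The marks below are hypothetical, chosen only so that each average is easy to verify by hand; the functions come from Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical marks of nine students in a class.
marks = [12, 15, 15, 18, 20, 20, 20, 23, 27]

mean = statistics.mean(marks)       # aggregate marks / number of students
median = statistics.median(marks)   # marks of the student half-way down the list
mode = statistics.mode(marks)       # marks obtained by the greatest number

print(mean, median, mode)
```

Here the aggregate is 170 marks among nine students, the fifth student in order has 20 marks, and 20 is also the mark most often obtained.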
The advantages and disadvantages of the three types may be
set out broadly as follows, although the boundary lines must not
be too strictly drawn:

Mean.                         Median.                       Mode.

Easy to calculate when        Easy to pick out when the     Not easy to determine with
the values of the variable    individuals can be ranged     precision, when the
can be summed and their       in order according to the     observations fall into
number is known.              value or degree of the        groups of different ranges,
                              variable observed.            without fitting a frequency
                                                            curve to the distribution
                                                            as a whole.

Well designed for             Unsuited for algebraical      Unsuited for algebraical
algebraical manipulation,     work.                         work.
as, for example, when we
wish to combine different
sets of observations [see
Appendix, Note 4, for two
illustrations].

Affected sometimes too        Determined merely by its      Unaffected by abnormal
much by abnormal              position in the               individuals, and owes its
individuals among the         distribution, and its         importance to the fact that
observations.                 actual value is thus quite    it is located in the region
                              unaffected by abnormal        where the frequency is most
                              individuals.                  dense.
The reader should test his grasp of the principles so far introduced
by applying them himself to a concrete case. For example,
he might use the data in Table (11), with regard to wages earned
by certain women, taken from Tawney's Minimum Wages in the
Tailoring Trade, and based upon the 1906 Wages Census. Let him
begin by roughly estimating the mean, the median, and the mode
from an inspection of the distribution. He might then proceed
to calculate the mean wage : —
(1) taking the actual frequencies given in the table ;
(2) taking simple submultiples of these frequencies, roughly one
hundredth part of each : 2, 4, 6, 7, 9, 11, etc. ;
(3) assuming unit frequency in place of that given in the table for
each wage group.
Finally, he might determine the median and the mode in the
manner explained in the text, deducing the latter from the relation
(mean - mode) = 3(mean - median).
The results obtained should be
(1) 13.08s.; (2) 13.10s.; (3) 15.59s.
Median = 12.53s. Mode = 11.43s.
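
Two of these results can be checked without the individual frequencies. The sketch assumes that each wage group is represented by its mid-value (5.5s., 6.5s., ..., 24.5s., and 27.5s. for the group 25s. to 30s.), and it takes the mean and median as 13.08s. and 12.53s.; the variable names are ours.

```python
# Mid-values of the 21 wage groups: twenty groups one shilling wide
# from 5s. to 25s., and one group from 25s. to 30s. (assumed mid-value 27.5).
mid_values = [5.5 + k for k in range(20)] + [27.5]

# (3) Unit frequency for every group: the plain mean of the mid-values.
unit_frequency_mean = sum(mid_values) / len(mid_values)

# Mode deduced from (mean - mode) = 3 (mean - median).
mean_wage, median_wage = 13.08, 12.53
mode_wage = mean_wage - 3 * (mean_wage - median_wage)

print(unit_frequency_mean)
print(round(mode_wage, 2))
```

The unit-frequency mean lies well above the other two because it gives the thinly populated high-wage groups as much weight as the crowded groups near 11s. or 12s.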
Table (11). Distribution of Wages of certain
Women Tailors.

         (1)                  (2)                   (3)                  (4)
Wages between limits      No. of Women      Wages between limits      No. of Women
                          earning wages                               earning wages
                          as shown in                                 as shown in
                          Column (1).                                 Column (3).

 5s. and less than  6s.   [illegible]       16s. and less than 17s.       642
 6s.  „    „    „   7s.       384           17s.  „    „    „  18s.       453
 7s.  „    „    „   8s.       553           18s.  „    „    „  19s.       401
 8s.  „    „    „   9s.       690           19s.  „    „    „  20s.       272
 9s.  „    „    „  10s.       900           20s.  „    „    „  21s.       251
10s.  „    „    „  11s.      1145           21s.  „    „    „  22s.       138
11s.  „    „    „  12s.      1201           22s.  „    „    „  23s.       124
12s.  „    „    „  13s.      1138           23s.  „    „    „  24s.        64
13s.  „    „    „  14s.       930           24s.  „    „    „  25s.        5[?]
14s.  „    „    „  15s.       885           25s.  „    „    „  30s.       122
15s.  „    „    „  16s.       790
CHAPTER VI
DISPERSION OR VARIABILITY
Let us suppose that two men set out separately on walking tours
and that they walk as follows : —
                First Man walks     Second Man walks

First day          20 miles.           15 miles.
Second „           20   „              20   „
Third  „           25   „              25   „
Fourth „           25   „              25   „
Fifth  „           30   „              30   „
Sixth  „           30   „              35   „

6 days            150 miles.          150 miles.
The total distance covered in six days, namely 150 miles, and
therefore also the mean rate of walking, 25 miles a day, are thus
exactly the same in both cases, but the dispersion of the values of
the variable (the variable being in this instance the number of
miles walked per day) round about their mean value, the variability,
is different in the two cases. The greatest deviation from the
average in the first case is five and in the second case it is ten miles.
Thus, besides knowing the average of a set of values of a variable
it is important to measure the dispersion of the distribution. Are
the observations crowded in a dense mass around the average,
or do they tail off above and below it, and to what extent ?
In other words, what is the variability from the average of the
distribution ?
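
The comparison of the two walkers takes only a few lines to reproduce. The sketch uses the daily distances given above; the helper name is ours.

```python
first_man = [20, 20, 25, 25, 30, 30]
second_man = [15, 20, 25, 25, 30, 35]


def mean_and_greatest_deviation(miles):
    """Return the mean daily distance and the greatest departure from it."""
    mean = sum(miles) / len(miles)
    return mean, max(abs(m - mean) for m in miles)


print(mean_and_greatest_deviation(first_man))
print(mean_and_greatest_deviation(second_man))
```

The means agree at 25 miles a day, while the greatest deviation is 5 miles for the first man and 10 for the second.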
Mean Deviation. Now we are not concerned here with the signs
of the separate deviations, with the question, that is, whether any
particular value of the variable lies above or below the average :
it is only of their amount we wish to take cognizance, and perhaps
the most obvious way to measure the total variability and at the
same time to ignore the signs of the separate deviations from the
average is to add up these deviations, treating them all as signless,
and to divide the result by their total number. This gives what
is known as the mean deviation of the system of observations — it
is the ordinary arithmetic mean of the separate deviations, treated
as if they are all in the same direction, and, in measuring them, we
may use either the mean or the median as the average, but it
would seem preferable to take the latter because the mean deviation
is least when the median is chosen as the origin, or zero point, from
which the differences are measured. The proof of this fact will
be found in Note 6 in the Appendix, but we may readily test it in
a given case.
Let us adapt the 'walking' illustration used above, slightly
extending the figures and making them unsymmetrical, i.e. of
unequal variability on either side of the average, so as to prevent
the median coinciding with the mean. We then have an amended
table setting out the number of miles walked by a certain man on
successive days during, say, a fortnight's tour, as follows : —
Table (12). Number of Miles walked on Successive Days.

 (1)      (2)       (3)         (4)          (5)         (6)         (7)             (8)
No. of   Miles    Deviation   Deviation    Deviation   Deviation   [No. in         [No. in
days.    walked.  from 25.    from 24.64.  from 24.    from 26.    Col. (1)] x     Col. (1)] x
                                                                   [No. in         [No. in
                                                                   Col. (3)].      Col. (4)].
           X        x_1         x_2          x_3         x_4

  1       10        15         14.64         14          16          15             14.64
  2       15        10          9.64          9          11          20             19.28
  3       20         5          4.64          4           6          15             13.92
  3       25         0          0.36          1           1           0              1.08
  2       30         5          5.36          6           4          10             10.72
  2       35        10         10.36         11           9          20             20.72
  1       40        15         15.36         16          14          15             15.36

 14       ..        ..          ..           ..          ..          95             95.72
The first two columns show that 10 miles was the distance walked
on the first day, 15 miles on each of the next two days, 20 miles
on each of the next three days, and so on until the last day, when
40 miles was the distance walked.
The median in this case, being the number of miles walked on
the middle day when the days are ranged in order of mileage from
the least to the greatest, is 25, for this is the distance covered on
both the seventh and the eighth days which come halfway along
the series.
Col. (3) shows the deviations from the median, 25, of the distances
covered each day as recorded in col. (2), and col. (7) enables us to
sum these deviations when each is multiplied by the number of
days to which it corresponds, since these numbers, given in col. (1),
show how many times each deviation is repeated. Hence the mean
deviation, regardless of sign, measured from the median
= [(1 x 15)+(2 x 10)+(3 x 5)+(2 x 5)+(2 x 10)+(1 x 15)]/14
= (15+20+15+10+20+15)/14
= 95/14
= 6.79 miles.
We may compare this with the corresponding deviations measured
from (1) the arithmetic mean, (2) the number 24, and (3) the
number 26 as origin respectively.
1. The arithmetic mean of the distribution is obtained at once
by multiplying the corresponding numbers in cols. (1) and (2),
adding the results, and dividing the total by 14, thus
= (10+30+60+75+60+70+40)/(1+2+3+3+2+2+1)
= 345/14
= 24.64 miles,
and the deviations from 24.64 are shown in col. (4) ; the mean
deviation from 24.64, obtained by combining cols. (1) and (4) and
adding as shown in col. (8)
= [1(14.64)+2(9.64)+ . . . ]/14
= 95.72/14
= 6.84 miles.
2. Similarly, the mean deviation from 24, making use of col. (5),
= [1(14)+2(9)+ . . . ]/14
= 6.93 miles.
3. And the mean deviation from 26, making use of col. (6),
= [1(16)+2(11)+ . . . ]/14
= 7.07 miles.
The original determination gives a value which is less than any
of these three results, as was anticipated.
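The comparison just made can be checked mechanically. The following is a minimal sketch in Python (a language chosen here for illustration and not used in the original text); the names days, miles and mean_deviation are our own labels for the columns of Table (12).

```python
# Table (12): number of days (frequency) on which each mileage was walked.
days = [1, 2, 3, 3, 2, 2, 1]
miles = [10, 15, 20, 25, 30, 35, 40]


def mean_deviation(values, freqs, origin):
    """Mean of the signless deviations from `origin`, weighted by frequency."""
    total = sum(f * abs(v - origin) for v, f in zip(values, freqs))
    return total / sum(freqs)


# Arithmetic mean of the distribution, 345/14.
mean = sum(v * f for v, f in zip(miles, days)) / sum(days)

# Mean deviation about the median (25), the mean, 24 and 26 in turn.
for origin in (25, mean, 24, 26):
    print(round(origin, 2), round(mean_deviation(miles, days, origin), 2))
```

Of the four origins tried, the median 25 gives the smallest result, in agreement with the figures worked out by hand above.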
The mean deviation from the median is, however, difficult to
calculate with exactness when the observations are recorded in
groups between different limits : for this and other reasons we
shall not spend much time upon it, and we shall as a rule choose
the mean as origin of reference rather than the median. It
may be as well to explain the source of the difficulty by a small
hypothetical illustration.
Let us suppose that in making measurements of some organ or
character in 13 individuals we get a result lying between 4 and 6
units on six occasions, between 6 and 8 units on four occasions, and
between 8 and 10 units on three occasions. Here, assuming that all
the individuals in any group have the mid value measurement for
that group, i.e. treating the distribution as one of 6 individuals
with a variable measuring 5 units, 4 individuals with a variable
measuring 7 units, and 3 individuals with a variable measuring
9 units, we get 18/13 as the mean deviation with 7 as origin and
18.5/13 for the mean deviation with 6.5 as origin, as the following
table shows :

                                     x            y
 Measurement.        Frequency.  Deviation    Deviation     fx      fy
                                  from 7.     from 6.5.

 4 and less than  6      6           2           1.5        12       9
 6  "    "    "   8      4          ..           0.5        ..       2
 8  "    "    "  10      3           2           2.5         6       7.5

                        13          ..           ..         18      18.5

Now the result obtained is in agreement with the minimum
mean deviation theory, granted that 7 is the median measurement,
as it might certainly be. But it is not so of necessity, and in that
case the assumption italicized might lead, in the above calculation,
to appreciable inaccuracy unless the number of observations is
large and the class-interval is small. For example, the actual
distribution might, without contradicting the previous data,
conceivably run :

                               x'           y'
 Measurement.  Frequency.  Deviation    Deviation     fx'     fy'
                            from 7.     from 6.5.

     5             6           2           1.5        12       9
     6.5           2           0.5         ..          1      ..
     7.5           2           0.5         1           1       2
     9             3           2           2.5         6       7.5

                  13          ..           ..         20      18.5

But in this case the median, the measurement for the seventh
individual from either end of the series, is 6.5, and according to the
first calculation the mean deviation referred to 6.5 as origin appears
to be greater than that referred to 7 as origin. If, however, we
recalculate, using the more detailed table, we find that the mean
deviation referred to 6.5 as origin (18.5/13) is really less than the
mean deviation with reference to 7 as origin, as it should be, for the
latter now turns out to be 20/13.
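The effect of the mid-value assumption can be seen by running both tables through the same calculation. This is a small illustrative sketch only; the variable names grouped and detailed are ours, and the figures are those of the two hypothetical tables above.

```python
def mean_deviation(values, freqs, origin):
    """Mean of the signless deviations from `origin`, weighted by frequency."""
    total = sum(f * abs(v - origin) for v, f in zip(values, freqs))
    return total / sum(freqs)


# Grouped table: every individual is given the mid-value of its class.
grouped = ([5, 7, 9], [6, 4, 3])
# Detailed table: one possible set of actual measurements.
detailed = ([5, 6.5, 7.5, 9], [6, 2, 2, 3])

for label, (values, freqs) in (("grouped", grouped), ("detailed", detailed)):
    about_7 = mean_deviation(values, freqs, 7)
    about_6_5 = mean_deviation(values, freqs, 6.5)
    print(label, round(about_7, 3), round(about_6_5, 3))
```

With the grouped figures the origin 7 appears to give the smaller mean deviation, whereas with the detailed figures the true median 6.5 does so.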
Standard Deviation. An alternative method of avoiding the
signs of the deviations from the average in order to estimate the
amount of variability of the distribution is to square each separate
deviation, sum the squares, divide by their number, and take the
square root of the result. This gives the root-mean-square deviation,
and it is least when the arithmetic mean of the variables is chosen
as origin from which to measure the deviations, when it is known
as the standard deviation. For proof of this minimum principle
see Appendix, Note 5, but it is worth while testing it also with the
data given in Table (12).
The numbers in cols. (3) to (6) in Table (13) are obtained simply
by squaring the corresponding numbers in the same cols. (3) to (6)
in Table (12). Col. (7) is formed in order to enable us to calculate
the mean-square deviation referred to 25 as origin ; the numbers
in col. (3) show the squares of the deviations for each individual
observation, and the numbers in col. (1), by which they are
multiplied, show how frequently the same values are repeated. Hence
we get the mean-square deviation with reference to 25
= [1(225)+2(100)+3(25)+2(25)+2(100)+1(225)]/14
= 975/14
= 69.64.
Thus the root-mean-square deviation referred to 25
= 8.345.
Similarly, by means of col. (8), formed on exactly the same
principle, we find that the root-mean-square deviation referred to
24.64 as origin
= √[(214.33 + 185.86 + . . . )/14]
= √(973.22/14)
= 8.338.
But 24.64 is the mean of the distribution, hence 8.338 is the standard
deviation.
With the help of cols. (5) and (6) the student may himself calculate
the root-mean-square deviation with regard to 24 and 26
respectively as origin ; the results should be 8.36 and 8.45. Of
the four values thus obtained for the root-mean-square deviation,
the least is that referred to the mean as origin, the standard
deviation, now proposed as a measure of variability or dispersion
suitable for most general purposes.
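The four root-mean-square deviations may also be verified by a short computation. The sketch below is illustrative only, written in Python with names of our own choosing (rms_deviation, days, miles), and uses the data of Table (12).

```python
import math

# Table (12): frequencies (days) and values (miles walked).
days = [1, 2, 3, 3, 2, 2, 1]
miles = [10, 15, 20, 25, 30, 35, 40]


def rms_deviation(values, freqs, origin):
    """Square root of the frequency-weighted mean of squared deviations."""
    mean_square = sum(f * (v - origin) ** 2 for v, f in zip(values, freqs))
    return math.sqrt(mean_square / sum(freqs))


mean = sum(v * f for v, f in zip(miles, days)) / sum(days)

# Root-mean-square deviation about 25, the mean, 24 and 26 in turn.
for origin in (25, mean, 24, 26):
    print(round(origin, 2), round(rms_deviation(miles, days, origin), 3))
```

The value referred to the mean is the smallest of the four, and is the standard deviation of the distribution.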
This measure possesses several decided advantages over the
mean deviation ; among others it lends itself more easily to certain
algebraical processes (see, for example, p. 158), a fact of importance
when we wish, for instance, to discuss two sets of observations in
combination, and it is in general less affected by ' fluctuations of
sampling ' — errors which arise owing to the fact that we cannot as
a rule survey the whole field of operations, but have to be content
with a sample.
Table (13). Number of Miles walked on Successive Days.

  (1)     (2)       (3)         (4)         (5)        (6)        (7)         (8)
 No. of  Miles      x²          x1²         x2²        x3²        fx²        fx1²
 days.   walked.  Square of  Square of   Square of  Square of   [No. in     [No. in
                  Deviation  Deviation   Deviation  Deviation   Col. (1)]   Col. (1)]
                  from 25.   from 24.64.  from 24.   from 26.   x [No. in   x [No. in
                                                                Col. (3)].  Col. (4)].

   1       10       225       214.33       196        256        225        214.33
   2       15       100        92.93        81        121        200        185.86
   3       20        25        21.53        16         36         75         64.59
   3       25        ..         0.13         1          1         ..          0.39
   2       30        25        28.73        36         16         50         57.46
   2       35       100       107.33       121         81        200        214.66
   1       40       225       235.93       256        196        225        235.93

  14       ..        ..         ..          ..         ..        975        973.22

Quartile Deviation or Semi-interquartile Range. There is a third
measure of dispersion, based upon the determination of the quartiles,
and to introduce them we may refer again to Table (7) in order to
show how the idea of the median may be extended.
We define the individual occupying a position one quarter the
way along any series of observations, arranged in ascending order
of magnitude of some organ or character common to all the
individuals of the series, as the lower quartile ; and we define the
individual occupying a position three-quarters the way along the
series as the upper quartile.
When the distribution of observations is divided up into groups
lying between different limits of the variable under consideration
the quartiles may, like the median, be calculated by interpolation.
Thus, in the examination example, the total number of candidates
is 514 and ¼(514) = 128.5.
But the 91st candidate from the bottom gets approximately 20
marks, and the 149th candidate from the bottom gets approximately
25 marks. Hence the imaginary candidate, No. 128.5,
should get a number of marks lying somewhere between 20 and
25. But if, in this neighbourhood, a difference of
(149 − 91) candidates corresponds to a difference of 5 marks,
(128.5 − 91) candidates should correspond to a difference of
5 x 37.5/58 marks.
Thus, the marks assigned to the lower quartile candidate are
approximately
20 + 5 x 37.5/58 = 20 + 3.23.
Hence the lower quartile = 23.23.
Again, ¾(514) = 385.5.
But the 318th candidate from the bottom gets approximately 35
marks, and the 397th candidate from the bottom gets approximately
40 marks. Therefore, the imaginary candidate, No. 385.5,
should get approximately a number of marks
= 35 + 5 x 67.5/79
= 39.27.
Hence the upper quartile = 39.27.
It is clear that the quartiles together with the median divide the
whole series of observations into approximately four equal groups, so
that the quartile marks give a rough idea of the distribution on
either side of the average.
[Diagram: scale of marks showing the lower quartile Q at 23.23, the
median at 31.52, and the upper quartile Q' at 39.27.]
For this reason half the difference between the quartiles provides a
convenient measure of the dispersion, and it is called the quartile
deviation or semi-interquartile range ; thus, if Q be the lower and
Q' the upper quartile, we have
the quartile deviation = ½(Q' − Q).
In the above example, this measure
= ½(39.27 − 23.23)
= ½(16.04)
= 8.02.
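The interpolation used for the two quartiles is simple proportion between two known points. As a hedged sketch (the function name interpolate_marks is our own, and only the candidate numbers and marks quoted in the text are used), it might be written in Python thus:

```python
def interpolate_marks(position, lower, upper):
    """Marks of the candidate numbered `position`, by simple proportion
    between two known (candidate number, marks) pairs."""
    (n0, m0), (n1, m1) = lower, upper
    return m0 + (m1 - m0) * (position - n0) / (n1 - n0)


total = 514  # number of candidates

# Lower quartile: candidate No. 128.5 lies between the 91st and 149th.
q_lower = interpolate_marks(total / 4, (91, 20), (149, 25))
# Upper quartile: candidate No. 385.5 lies between the 318th and 397th.
q_upper = interpolate_marks(3 * total / 4, (318, 35), (397, 40))

quartile_deviation = (q_upper - q_lower) / 2
print(round(q_lower, 2), round(q_upper, 2), round(quartile_deviation, 2))
```

The same function serves for the median or for any decile, provided the two bracketing candidates and their marks are read off from the cumulative table.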
If a more minute analysis of the distribution of variables is
desired, we may range them in order of magnitude as before, and
divide up the series into ten equal parts, recording every tenth along
the line ; these tenths are called deciles.
Thus, the deciles in the examination example correspond to the
marks assigned to imaginary candidates numbered as follows : —
51.4, 102.8, 154.2, 205.6, 257.0, 308.4, 359.8, 411.2, 462.6,
and they can be calculated by the interpolation method used in
finding the median and quartiles.
This way of representing the chief features of a distribution, by
quartiles, etc., was much used by Galton in his researches and
writings.
The student may be perplexed as to which should be used of so
many different measures of dispersion or variability, but there
need be no real confusion. If a rough estimate only is wanted the
quartile deviation is a convenient measure, assuming that the
variables observed or measured can be ranged in order of magnitude
so as to admit of the quartiles being readily picked out. Also the
measure thus obtained is not unsatisfactory when the distribution
of values of the variable is fairly symmetrical and uniform in its
gradation from greatest frequency to least. If, however, it is
conspicuously skew (unsymmetrical) and there are erratic differ
ences in frequency between successive values of the variable, it
is better to choose a measure which gives the magnitude and
the position of each recorded observation its due weight in the
deviation sum.
Then again the choice as between the standard deviation and the
mean deviation may be sometimes determined by the particular
kind of average which suits the problem best. But as the arith
metic mean is the most important and the most commonly used
average, so the standard deviation is certainly the most important
measure of dispersion.
It will be shown later that the following relations are approximately
true when the distribution of variables is not very far from
being symmetrical :
(1) Quartile deviation = 2/3 (Standard deviation).
(2) Mean deviation = 4/5 (Standard deviation).
In (2) the mean deviation should be measured from the mean.
Also (3) a range of two or three times the standard deviation
will be found to include the majority of the observations which
make up the distribution.
Coefficient of Variation. Before we pass on to illustrate the
subject of averages and variability by means of a few examples
it is necessary to introduce one more constant known as the
coefficient of variation. It is a measure of variability but it differs
from the chief measures already discussed in that they are absolute
measures, whereas the coefficient of variation, written C. of V. for
short, is a ratio or relative measure. The need for it arises when
we reflect that in order to gauge fairly the amount of variability we
ought to have in mind also the size of the mean from which the
variation is measured ; just as a difference of 1 foot between the
heights of two men is a conspicuous difference when the normal
height is between 5 and 6 feet, whereas the same difference of 1 foot
between two measured miles would be trifling because the standard
mile contains over 5000 feet.
The coefficient of variation has been defined by Karl Pearson
(Phil. Trans., vol. 187a p. 277), who first suggested its use, as ' the
percentage variation in the mean, the standard deviation (S.D.)
being treated as the total variation in the mean,' so that
C. of V. = 100 S.D./Mean.
He pointed out that it would be idle, in dealing with the variation
of men and women (or indeed very often of the two sexes of any
animal), to compare the absolute variation of the larger male organ
directly with that of the smaller female organ, because several of
these organs, as well as the height, the weight, brain capacity, etc.,
are greater in man than in woman in the approximate proportion
of 13 : 12.
As an example of the use of the C. of V., figures may be quoted
from a paper by R. Pearl and F. J. Dunbar (Biometrika, vol. ii.
pp. 321 et seq.), On Variation and Correlation in Arcella.
Measurements in mikrons were made of the outer and inner diameters of
504 specimens of a shelled rhizopod belonging to the group
Imperforata, family Arcellina, with the following results, to two
decimal places :

                      Mean.     S.D.     C. of V.

 Outer diameter .     55.79     5.73     10.27 per cent.
 Inner    "           15.91     2.17     13.66    "

Thus, judging by the S.D. column, giving the absolute size of
deviation, the outer diameter would appear to be more variable
than the inner, but the C. of V. column shows that, if we take the
sizes of the two diameters into account, the inner is really the
more variable of the two. To turn aside the edge of possible
criticism it should be added that the authors also give the errors to
which the above measures are subject, as unless these are known
we cannot tell whether the differences observed in variation are
significant or not of a real difference in fact, but that question
must be left until the theory of errors due to sampling has been
developed in a later chapter.
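The coefficient is a one-line calculation. The sketch below, in Python, applies the definition C. of V. = 100 S.D./Mean to the rounded means and standard deviations quoted for Arcella; the function name is our own. Because the inputs are already rounded to two places, the recomputed figure for the inner diameter may differ slightly in the last place from the published 13.66.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean


# Rounded figures quoted from Pearl and Dunbar's measurements.
outer = coefficient_of_variation(5.73, 55.79)
inner = coefficient_of_variation(2.17, 15.91)
print(round(outer, 2), round(inner, 2))
```

Although the outer diameter has the larger standard deviation, the inner diameter has the larger coefficient, which is the point of the comparison.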
The C. of V. varies considerably for different characters. W. R.
Macdonell states that '3 to 5.5 are representative values for
variability in man, while in plants it may run to 40,' and Pearson and
others have shown that for stature in man it varies from about 3 to 4
and for the length of long bones from 4 to 6.
CHAPTER VII
FREQUENCY DISTRIBUTION : EXAMPLES TO ILLUSTRATE
CALCULATING AND PLOTTING : SKEWNESS
Calculation of Mean and Standard Deviation. Example (1). — We
return now to the examination example in order to show how the
labour of calculation in finding the arithmetic mean and standard
deviation of a frequency distribution may be somewhat lessened.
The various steps in the process appear in Table (14). In the
first column the marks at the middle of each class-interval have
been written down, and we make the assumption that all the
candidates in any one class have the same number of marks, namely, the
marks at the middle of the class-interval. In any case where the
number of observations is large, and where the class-intervals are
reasonably small, the errors resulting from such an assumption will
be insignificant, because the individuals in each class are just as
likely to have values above as below the value at the middle of the
class-interval, and they will therefore compensate for one another.
We now seek to alter the scale of marking so as to produce a
simpler set of marks than the original, which will make the work
of finding the mean also simpler, but we must not forget at the
end to change back again to the original scale. We choose a number
from col. (1), somewhere near the required mean, to act as a kind
of origin from which to measure the other numbers in the column.
This choice is only a rough guess, and it is really immaterial which
number is selected as origin, except that the nearer it is to the
mean the lighter will be the calculation to follow ; the number 33
has been selected in this instance.
In col. (2) are written down the deviations of the marks in each
class from 33, so that now some candidates appear as if they were
5, 10, 15 . . . marks to the bad, and others as if they were 5, 10,
15 ... to the good. So long as we remember to add 33 at the
end we can content ourselves therefore by finding the mean of the
marks as given in col. (2). But these again can be further simplified
by dividing each candidate's marks by 5, and we then only need
to find the mean of the marks as shown in col. (3), so long as we
remember to multiply by 5 at the first step back to the old scale
of marking. The addition of col. (5) makes it easy to calculate
this mean, for it gives the result of multiplying each value of the
variable (the number of marks in each class) by its appropriate
weight (the number of candidates who obtained that number of
marks).
Table (14). Marks obtained by 514 Candidates in a certain
Examination (Analysis of Method for Calculating
Mean and Standard Deviation).

     (1)            (2)           (3)          (4)           (5)            (6)
 Marks on old   Deviation of    Marks on    Frequency     Product of     Product of
    scale.      Nos. in Col.   new scale.      of          Nos. in        Nos. in
                (1) from 33.               Candidates.  Cols. (3) & (4). Cols. (3) & (5).
                                  (x)          (f)          (fx)           (fx²)

  3 = 33 − 30      − 30           − 6           5          −  30            180
  8 = 33 − 25      − 25           − 5           9          −  45            225
 13 = 33 − 20      − 20           − 4          28          − 112            448
 18 = 33 − 15      − 15           − 3          49          − 147            441
 23 = 33 − 10      − 10           − 2          58          − 116            232
 28 = 33 −  5      −  5           − 1          82          −  82             82
 33 = 33            ..             ..          87            ..              ..
 38 = 33 +  5      +  5           + 1          79          +  79             79
 43 = 33 + 10      + 10           + 2          50          + 100            200
 48 = 33 + 15      + 15           + 3          37          + 111            333
 53 = 33 + 20      + 20           + 4          21          +  84            336
 58 = 33 + 25      + 25           + 5           6          +  30            150
 63 = 33 + 30      + 30           + 6           3          +  18            108

                                              514          − 110           2814

Thus, on this new scale, the mean marks obtained are
= [5(−6)+9(−5)+28(−4)+ . . . +87(0)+ . . . +6(+5)+3(+6)]/514
= (−532 + 422)/514
= −110/514
= −0.214.
This, then, is the mean of the marks obtained by the candidates on
the scale indicated in col. (3). If the marks are on the scale given
in col. (2), the mean is 5(−0.214), i.e. −1.070. To bring them back
to the original scale as in col. (1) we must add 33 to this result, so
that the required arithmetic mean
= 33 + 5(−0.214)
= 33 − 1.070
= 31.93.
To find the Standard Deviation, or the root-mean-square deviation
from the arithmetic mean, it is convenient as before to work with
the simplified scale, to measure the deviations from the arbitrary
origin (33) associated with that scale, and to make the necessary
corrections at the end of the work.
Col. (5) in Table (14) gives the deviation multiplied by the
frequency in each class, the frequency denoting the number of
times the particular deviation occurs. Hence, if these numbers be
multiplied again by the numbers in col. (3), we shall have each
separate deviation squared and multiplied by its frequency. The
results are shown in col. (6), and they must be added, and their
sum divided by the sum of the frequencies (514), to give the
mean-square deviation, which we may represent by s².
Thus s² = 2814/514
= 5.475,
and this is the mean-square deviation referred to 33 as origin.
We require the corresponding expression referred to the mean,
31.93, as origin. If we denote this by s_m² there is a simple relation
connecting the two, namely,
s_m² = s² − x̄²,
where x̄ is the deviation of the mean itself from 33 [see Appendix,
Note 5] ; of course s_m, s, and x̄ are all to be measured on the same
scale, the simplified scale adopted with 5 marks as unit.
Now we have already shown that the deviation of the mean from
33 = −0.214, and this is therefore the value of x̄.
Hence s_m² = 5.475 − (0.214)²
= 5.475 − 0.046
= 5.429
= (2.33)².
And, returning to the old scale, the standard deviation, usually
denoted by σ,
= 5(2.33)
= 11.65.
We notice that 3σ = 34.95, and this range on either side of the
mean amply takes in all the observations.
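The whole of the short method (arbitrary origin, simplified unit, and correction at the end) can be followed in a few lines of code. This is an illustrative sketch in Python with variable names of our own; the frequencies are those of col. (4) of Table (14), and the relation used for the standard deviation is the one quoted above from Appendix, Note 5.

```python
import math

origin, unit = 33, 5              # arbitrary origin and class-interval
steps = list(range(-6, 7))        # col. (3): marks on the new scale
freqs = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]  # col. (4)

n = sum(freqs)                                            # 514 candidates
sum_fx = sum(f * x for x, f in zip(steps, freqs))         # total of col. (5)
sum_fx2 = sum(f * x * x for x, f in zip(steps, freqs))    # total of col. (6)

x_bar = sum_fx / n                # mean on the new scale, measured from 33
s2 = sum_fx2 / n                  # mean-square deviation about 33

mean = origin + unit * x_bar                  # back to the old scale
sd = unit * math.sqrt(s2 - x_bar ** 2)        # standard deviation, old scale
print(round(mean, 2), round(sd, 2))
```

Any other origin in col. (1) would lead to the same mean and standard deviation; only the size of the intermediate numbers would change.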
The mean deviation is readily found from Table (14) by adding up
the numbers in col. (5) regardless of sign and dividing by the sum
of frequencies, 514.
Thus, on the new scale, the mean deviation
= 954/514
= 1.856,
which, on the old scale, becomes 5(1.856) or 9.28. This, however,
is the mean deviation measured from 33 as origin, and a correction
has to be applied to get the mean deviation measured from the
median or from the mean.
To get the mean deviation from the mean we note that the
difference between the mean, 31.93, and 33 is 1.07. Hence it
should be clear from Table (14) that, by measuring from 33 instead
of from 31.93, we have made the deviations of all the marks from
33 upwards too little by 1.07, and we have made the deviations of
all the marks from 28 downwards too much by 1.07. Hence, to
get the deviation required we must add to 9.28 an amount
= (1/514)[1.07(87+79+ . . . +3) − 1.07(82+58+ . . . +5)]
= (1.07/514)(283 − 231)
= (1.07/514) x 52
= 0.108.
Therefore, the mean deviation measured from the mean = 9.39.
This may be compared with 4/5 (standard deviation) = 9.32.
Also the quartile deviation for this distribution has been shown
to be = 8.02, and it may be compared with 2/3 (standard deviation)
= 7.77.
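The correction can be checked against a direct calculation of the mean deviation from the mean. The following Python sketch is offered only as an illustration; the names are our own, and the correction as written holds because the mean lies between the class marks 28 and 33.

```python
marks = list(range(3, 64, 5))     # col. (1): 3, 8, 13, ..., 63
freqs = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]  # col. (4)
n = sum(freqs)

mean = sum(f * m for m, f in zip(marks, freqs)) / n

# Mean deviation measured from the arbitrary origin 33.
md_from_33 = sum(f * abs(m - 33) for m, f in zip(marks, freqs)) / n

# Frequencies from 33 upwards and from 28 downwards.
above = sum(f for m, f in zip(marks, freqs) if m >= 33)
below = sum(f for m, f in zip(marks, freqs) if m < 33)

# Deviations from 33 upwards were too little, the rest too much, by (33 - mean).
md_corrected = md_from_33 + (33 - mean) * (above - below) / n

# Direct calculation, for comparison.
md_direct = sum(f * abs(m - mean) for m, f in zip(marks, freqs)) / n
print(round(md_from_33, 2), round(md_corrected, 2), round(md_direct, 2))
```

The corrected and the direct values agree, which confirms that the short method loses nothing by starting from a convenient origin.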
Plotting of a Frequency Distribution. The data for the two
examples which follow are taken from the Quarterly Return of
Marriages, Births, and Deaths, No. 261, issued by the
Registrar-General.
The first shows the proportion to population of cases of infectious
disease notified in 241 large towns of England and Wales for the
thirteen weeks ended 4th April 1914. This proportion was given
for each town separately in the Return, but, in order to bring out
the distinctive features of the distribution, the several towns have
been, in Table (15), represented by dots and put into different classes
according to the proportion of infectious cases notified in each,
with a separate line for each class : e.g. if the proportion for any
town was 5.37 a dot was placed in the line corresponding to the
class of towns for which the rate was '4 and less than 6.' Every
fifth dot in each line was ticked off, so as to make them easy to
count up and also to keep the lines, down the paper as well as
across, straight. The frequency, i.e. the number of dots in each
class, was then recorded in a column at the extreme right-hand
side of the paper.

Table (15). Proportion to Population of Cases of Infectious
Disease notified in 241 Large Towns of England and
Wales during the Thirteen Weeks ended 4th April 1914.

 Case Rate per 1000        Total No. of Towns
 persons living.           with given Rate.

  0 and less than  2              5
  2  "    "    "   4             39
  4  "    "    "   6             69
  6  "    "    "   8             41
  8  "    "    "  10             29
 10  "    "    "  12             22
 12  "    "    "  14             16
 14  "    "    "  16              7
 16  "    "    "  18              5
 18  "    "    "  20              3
 20  "    "    "  22              4
 22  "    "    "  24             ..
 24  "    "    "  26             ..
 26  "    "    "  28              1

                                241

[In the original table a middle column shows one dot for each town
with a notified rate between the limits given in the first column.]
[Fig. (1). Dot diagram of the distribution. Horizontal axis: Rate of
Disease per 1000 persons living.]
It will be at once seen that this procedure, without calculating
any averages, etc., ultimately gives to the eye a very good picture
of the distribution, and indeed it is the basis of the graphical method
of studying statistics. In drawing a proper graph we use a specially
ruled sheet of paper which is divided up into a large number of
equal small squares by 'horizontal' (cross) and 'vertical'
(up-and-down) lines. This merely enables us to place our dots accurately
in position, as shown in fig. (1), where the numbers 0, 5, 10 . . .
have been marked off along the line Ox to correspond to 'case
rates' of these magnitudes : thus rates of '4 and less than 6'
were recorded by 69 successive dots along a vertical line at a
distance 5 (the centre of the class-interval 4-6) from the axis Oy.
[Fig. (2). Frequency of towns plotted against Rate of Disease per
1000 persons living, adjacent points connected ; the Modal Line is
marked.]
The final configuration in fig. (1), when turned half round, is
exactly the same as that of Table (15). If desired the frequency
may be recorded, dot by dot, on a side piece of paper and then
only the topmost dot in each class need be marked on the graph
sheet. In order, however, to enable the eye to measure the height
of each frequency in relation to the rest, it is advisable in that
case to connect up adjacent dots as in fig. (2) or as in fig. (3).
[Fig. (3). Histogram of the same distribution ; the Median Line and
Mean Line are marked.]
The last method of representation (fig. (3)), to which the name
histogram has been given by Professor Karl Pearson, is particularly
useful and should be carefully studied. It is formed in this case by
erecting a succession of rectangles with the lines 0-2, 2-4, 4-6 . . .
along Ox as their bases, corresponding to the successive classes of
the given distribution, and with heights proportional to the
frequencies proper to those classes. It is not necessary to complete
the sides of the rectangles, but, if they were completed, each would
enclose a number of squares proportional to the frequency of towns
with the rate of disease defined by its base : e.g. the first rectangle
would enclose 10 squares, the second 78, the third 138, and so on,
numbers respectively proportional to 5, 39, 69, and so on. It
follows that the total area enclosed between the histogram and the
axis Ox is proportional to the aggregate frequency of towns observed.
Now we might conceive a step further taken and a smoothed
curve drawn freehand so as to agree as closely as possible with
fig. (2) or fig. (3), but with all the sharp corners smoothed out, and
so nicely adjusted as to make the area enclosed between the curve,
the axis Ox, and lines parallel to Oy defining the limits of any class,
proportional to the frequency of towns in that class. To this
fig. (2) and fig. (3) might be regarded as approximating if only a
sufficient number of observations were recorded, and only in that
case would it be possible to draw it with any accuracy. Such a
curve is called a frequency curve, measuring as it does the frequency
of the observations in different classes.
[Assuming that corresponding to a given frequency distribution a curve
of this kind does really exist (and the assumption turns upon the frequency
being continuous), the reader who is acquainted with the notation of the
Calculus will recognise that, if (x, y) represents any point on the curve, y δx
measures the frequency of observations or measurements of an organ or
character lying between the values x and (x + δx), when the total frequency
comprises a large number of observations, say 500 to 1000.
Further, it will appear later that the mean, the median, and the mode
have a geometrical interpretation of no small importance associated with the
curve.
The mean x corresponds to the particular ordinate y which passes through
the centroid or centre of gravity of the area between the frequency curve
and axis Ox, because
the mean = Lt Σ(x · y δx) / Lt Σ(y δx),
where the summation extends throughout the distribution,
= ∫ x y dx / ∫ y dx,
where the integral extends throughout the curve.
The median x corresponds to the ordinate y which bisects this same area ;
e.g. in fig. (3), the number of small squares on either side of the median in the
space bounded by the histogram and the axis represents half the total number
of observations, two small squares corresponding to each observation.
The mode x corresponds to the maximum ordinate of the curve, measuring
the greatest frequency in the whole distribution.]
Skewness. There is one feature of a frequency distribution which
catches the eye sooner almost than any other, and that is its
symmetry or lack of symmetry. It is important therefore that we
should have some means of measuring it.
In a symmetrical distribution the mean, mode, and median
coincide, and we have, as it were, a perfect balance between the
frequency of observations on either side of the mode or ordinate of
maximum frequency. In a skew distribution the centre of gravity
is displaced and the balance thrown to one side : the amount of this
displacement measures the skewness. But there is another factor
to be taken into account, for when the variability of the
distribution is great the balance is more sensitive than when it is
small, and the difference between mean and mode is consequently more
pronounced though it may not be significant of any greater
skewness. This will be clear in the light of the analogy of the swing
of a pendulum. If OPP' denote the pendulum in the accompanying
figure, OAA' its mean position, and OBB' an extreme position, the
displacement in the position OPP' from the mean, if measured
along the scale AB, is AP, and, if measured along the scale A'B',
is A'P'. But, since the amount of swing in either case is the same,
it would be more appropriate to write the linear displacement as a
fraction of the full swing so as to make these two measures also the
same, thus
AP/AB = A'P'/A'B'.
So, in the case of a frequency distribution, Professor Karl Pearson
has suggested as a suitable measure for skewness, not the difference
between mean and mode, but the ratio of this difference to the
variability. Thus
skewness = (mean − mode)/S.D.
or, approximately,
= 3(mean − median)/S.D. (see p. 39),
a form which is sometimes useful.
According to this convention the skewness is regarded as positive
when the mean is greater than the mode, and as negative when
the mode is greater than the mean.
[Diagram: two frequency curves, x increasing to the right. Skewness +:
the mode lies to the left of the mean. Skewness −: the mean lies to
the left of the mode.]
Illustrations of frequency curves, with the position of mode and
mean marked, will be found in Chapter xvii.
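As a small illustration of the approximate form, the skewness of the walking data of Table (12) may be computed from its mean, median and standard deviation. The sketch is in Python, with names of our own choosing; it uses the formula 3(mean − median)/S.D. and takes the median, 25, from the earlier discussion of that table.

```python
import math

# Table (12): frequencies (days) and values (miles walked).
days = [1, 2, 3, 3, 2, 2, 1]
miles = [10, 15, 20, 25, 30, 35, 40]
median = 25                       # found earlier for this table

n = sum(days)
mean = sum(f * v for v, f in zip(miles, days)) / n
sd = math.sqrt(sum(f * (v - mean) ** 2 for v, f in zip(miles, days)) / n)


def skewness_from_median(mean, median, sd):
    """Approximate skewness, 3(mean - median)/S.D."""
    return 3 * (mean - median) / sd


print(round(skewness_from_median(mean, median, sd), 2))
```

The result is small and negative, the mean lying slightly below the median in that table.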
We proceed to the detailed calculations necessary in the infectious
diseases example.
Table (16). Proportion to Population of Cases of Infectious
Disease notified in 241 Large Towns of England and
Wales during the Thirteen Weeks ended 4th April 1914.

        (1)               (2)          (3)            (4)             (5)
 Case Rate per         Deviation   Frequency of    Product of      Product of
 1000 persons living.   from 7.    Towns with       Nos. in         Nos. in
                                   given Rate.   Cols. (2) & (3). Cols. (2) & (4).
                          (x)          (f)           (fx)            (fx²)

  0 and less than  2     −  3           5            − 15              45
  2  "    "    "   4     −  2          39            − 78             156
  4  "    "    "   6     −  1          69            − 69              69
  6  "    "    "   8       ..          41              ..              ..
  8  "    "    "  10     +  1          29            + 29              29
 10  "    "    "  12     +  2          22            + 44              88
 12  "    "    "  14     +  3          16            + 48             144
 14  "    "    "  16     +  4           7            + 28             112
 16  "    "    "  18     +  5           5            + 25             125
 18  "    "    "  20     +  6           3            + 18             108
 20  "    "    "  22     +  7           4            + 28             196
 26  "    "    "  28     + 10           1            + 10             100

                                      241            + 68            1172

Example (2). The various averages and measures of variability of the distribution can be calculated just as in the case of the last example, and the data required to determine the mean and the standard deviation are set out in Table (16). We can afford now to miss out some of the more obvious steps in explanation.
On the scale of col. (2), where a difference of 2 in the case rate, per 1000 persons living, is the unit and where a case rate of 7 is taken as origin, the mean, by the result of col. (4),
= 68/241
= 0.282.
Hence, on the original scale, the mean
= 7 + 2(0.282)
= 7.564.
Again, the mean-square deviation, on the scale of col. (2), measured from 7 as origin is, by the result of col. (5),
= 1172/241
= 4.863;
and x̄, the deviation of the mean from 7 as origin, on the scale of col. (2), = 0.282. Thus the mean-square deviation measured from the mean
= 4.863 − (0.282)²
= 4.783.
Therefore, the standard deviation σ, on the original scale,
= 2√4.783
= 4.374.
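The arithmetic above can be checked with a short sketch. It assumes only the columns (x) and (f) of Table (16); the variable names are illustrative and not part of the original text.

```python
import math

# Columns (2) and (3) of Table (16): step deviations from the working
# origin 7 (in units of 2 on the case-rate scale) and class frequencies.
x = [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 10]
f = [5, 39, 69, 41, 29, 22, 16, 7, 5, 3, 4, 1]

origin, unit = 7.0, 2.0
n = sum(f)                                            # total of col. (3)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))         # total of col. (4)
sum_fx2 = sum(fi * xi * xi for fi, xi in zip(f, x))   # total of col. (5)

xbar = sum_fx / n                    # mean on the working scale
mean = origin + unit * xbar          # mean on the original scale
msd_origin = sum_fx2 / n             # mean-square deviation about 7
msd_mean = msd_origin - xbar ** 2    # mean-square deviation about the mean
sd = unit * math.sqrt(msd_mean)      # standard deviation, original scale

print(n, sum_fx, sum_fx2)
print(round(mean, 3), round(sd, 3))
```

Working from a convenient origin and unit keeps the products small, which is the reason for the layout of the table; the final two lines of the sketch merely undo that change of scale.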
Since 3σ = 13.122, the range '(mean − 3σ) to (mean + 3σ)' includes all but one or two observations.
To determine the median, we conceive the towns ranged in order according to the proportion of infectious cases notified in each, from the least to the greatest, and the town with the median rate is the 121st from either end.
But the 113th town has a notified case rate of approximately 6 per 1000, and the 154th town has a notified case rate of approximately 8 per 1000.
Thus a difference of 41 towns corresponds to a difference of 2 in the rate, hence a difference of 8 towns corresponds to a difference of 0.39 in the rate; therefore the median rate = 6.39 approximately.
By referring to the original records and writing down the rate for each town in the group 'rate 6 and less than 8' in which the median lay, the accurate value of the median turned out to be 6.30.
The lower quartile or case rate of the imaginary town, No. ¼(241), or 60.25, one-quarter way along the ordered sequence of towns, is readily shown to be 4.47, and the upper quartile or case rate of town No. ¾(241), or 180.75, is 9.84.
Hence the quartile deviation
= ½(9.84 − 4.47)
= 2.69.
With this may be compared ⅔(S.D.) = ⅔(4.37) = 2.92.
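The interpolation used for the median and the quartiles can be sketched as follows. The function name `grouped_quantile` is hypothetical, and the sketch assumes, as the text does, that the towns are spread evenly within each class of Table (16).

```python
def grouped_quantile(lower_limits, width, freqs, rank):
    """Interpolate linearly inside the class that contains the given rank."""
    cumulative = 0.0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= rank:
            return lower + width * (rank - cumulative) / f
        cumulative += f
    raise ValueError("rank exceeds the total frequency")

# Lower class limits and frequencies from Table (16); each class has width 2.
lower_limits = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 26]
freqs = [5, 39, 69, 41, 29, 22, 16, 7, 5, 3, 4, 1]
n = sum(freqs)

median = grouped_quantile(lower_limits, 2, freqs, 121)      # 121st town
q1 = grouped_quantile(lower_limits, 2, freqs, n / 4)        # town No. 60.25
q3 = grouped_quantile(lower_limits, 2, freqs, 3 * n / 4)    # town No. 180.75
quartile_deviation = (q3 - q1) / 2

print(round(median, 2), round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
```

The same function serves for any other rank, so that deciles or percentiles could be read off in the same way if they were wanted.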
Again, the mean deviation measured from 7
= 2(392/241)
= 3.253.
Measured from the mean, it becomes
= 3.253 + (0.564)[(41+69+39+5) − (29+22+16+7+5+3+4+1)]/241
= 3.253 + (0.564)(67)/241
= 3.41,
and this may be compared with ⅘(S.D.) = ⅘(4.374) = 3.50.
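As a check on the correction just applied, the mean deviation about the mean may be computed both directly and by the shortcut. This is a sketch only: it assumes every town sits at the mid-point of its class, and the names are illustrative.

```python
# Step deviations and frequencies from Table (16).
x = [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 10]
f = [5, 39, 69, 41, 29, 22, 16, 7, 5, 3, 4, 1]
origin, unit = 7.0, 2.0
n = sum(f)

mids = [origin + unit * xi for xi in x]            # class mid-points 1, 3, 5, ...
mean = sum(fi * m for fi, m in zip(f, mids)) / n   # about 7.564

# Mean deviation about the working origin 7.
md_origin = unit * sum(fi * abs(xi) for fi, xi in zip(f, x)) / n

# Shortcut: each town below the mean gains (mean - 7) in absolute
# deviation when the origin moves from 7 to the mean, and each town
# above the mean loses the same amount.
below = sum(fi for fi, m in zip(f, mids) if m < mean)
above = n - below
md_shortcut = md_origin + (mean - origin) * (below - above) / n

# Direct calculation for comparison.
md_direct = sum(fi * abs(m - mean) for fi, m in zip(f, mids)) / n

print(round(md_origin, 3), below, above, round(md_shortcut, 2), round(md_direct, 2))
```

The two results agree because no class mid-point falls between the origin 7 and the mean, so every town moves wholly to one side or the other.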
If we estimate the mode by inspection of the frequency graphs in figs. (2) and (3), we should say it comes between 5 and 6; supposing we call it 5.5, very roughly.
In this case, taking the values actually calculated for mean and median,
(mean − mode) = 7.56 − 5.50
= 2.06,
and 3(mean − median) = 3(7.56 − 6.39)
= 3(1.17)
= 3.51;
so that the rule
(mean − mode) = 3(mean − median)
is far from being true according to these results; this is partly due, of course, to the very unsymmetrical character of the distribution.
The relative positions of the mean, median, and modal points as calculated are indicated in figs. (2) and (3) by three lines drawn parallel to OY through these points to meet the graph.
Finally, skewness = (mean − mode)/S.D. = 2.06/4.37 = 0.47.
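The two forms of the skewness measure can be set side by side in a few lines. The figures are those found above for the infectious diseases example (the mode being only a rough estimate by eye); the variable names are illustrative.

```python
# Values calculated in Example (2).
mean, median, mode, sd = 7.56, 6.39, 5.50, 4.37

skew_from_mode = (mean - mode) / sd            # Pearson's measure
skew_from_median = 3 * (mean - median) / sd    # the approximate form

print(round(skew_from_mode, 2), round(skew_from_median, 2))
```

The wide gap between the two values is another way of seeing that the rule (mean − mode) = 3(mean − median) fits this very unsymmetrical distribution badly.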
Example (3). The next example deals with the deaths of infants under one year, out of every thousand born, in 100 great towns in the United Kingdom during the thirteen weeks ended 4th April 1914. The details of the calculation may be left in this case to the reader, who is recommended to follow the method shown in the last example so far as possible throughout, including the plotting of the distribution in different ways. The statistics are as follows:
Table (17). Death Rate of Infants under 1 Year
per 1000 Births.

        (1)                 (2)                  (3)                 (4)
                        No. of Towns                             No. of Towns
    Death Rate.         with Death Rate      Death Rate.         with Death Rate
                        as in Col. (1).                          as in Col. (3).
   30 and under  40          1              120 and under 130        16
   50  "    "    60          3              130  "    "   140        11
   60  "    "    70          2              140  "    "   150        10
   70  "    "    80          6              150  "    "   160         8
   80  "    "    90          7              160  "    "   170         3
   90  "    "   100          6              170  "    "   180         1
  100  "    "   110         11              200  "    "   210         1
  110  "    "   120         13              240  "    "   250         1

The more important results are:
Arithmetic mean = 118.9; S.D. = 32.2;
median = 120.9; quartile deviation = 19.5.
Example (4). As another example corresponding details may be worked out for the following temperature records taken at noon at a certain spot in Chester week by week during a period of time covering five years, the results in this case being:
mean = 55.10; S.D. = 10.33;
median = 54.88; quartile deviation = 7.94.

Table (18). 257 Weekly Records of Temperature (Fahrenheit).

       (1)                (2)                   (3)                (4)
   Temperature       No. of Records         Temperature       No. of Records
   Limits in         between Limits         Limits in         between Limits
   Degrees.          shown in Col. (1).     Degrees.          shown in Col. (3).
   25.5 to 29.5           1                 53.5 to 57.5          30.5
   29.5 to 33.5           1                 57.5 to 61.5          31.5
   33.5 to 37.5           9                 61.5 to 65.5          30
   37.5 to 41.5          11.5               65.5 to 69.5          26
   41.5 to 45.5          28                 69.5 to 73.5          13.5
   45.5 to 49.5          31.5               73.5 to 77.5           4
   49.5 to 53.5          36.5               77.5 to 81.5           3
Before closing the chapter a slightly different manner of graphing the statistics is worth noticing, as it provides us with a fairly quick though rough alternative method of determining the mode and median.
Take, for example, the examination marks data which for this purpose must first be thrown into the second form shown below Table (7). We mark off on some convenient scale along OX distances 5, 10, 15, 20 ... 65 from O to represent these numbers of marks respectively, and at the points obtained we erect lines parallel to OY of lengths 5, 14, 42, 91 ... 514 to represent the numbers of candidates who obtained not more than 5, 10, 15, 20 ... 65 marks respectively. A freehand curve is then drawn through the summits of these lines in the manner indicated in fig. (4), starting from a height 5 and rising to a height 514 above the axis OX.
By means of this curve we can approximately state at once how many candidates obtained any given number of marks or less. Suppose, for example, we wish to know how many candidates obtained 22 marks or less; we have only to measure off a distance 22 from O, represented by ON, and erect a perpendicular NP to meet the curve at P. Since NP = 110 we infer from the manner in which the curve has been formed that 110 candidates obtained 22 marks or less, so that, incidentally, the 110th candidate from the bottom must have obtained approximately 22 marks. This suggests that by working backwards we can also read off roughly the number of marks gained by any particular candidate when his order in the list is known. Thus, to find the median, i.e. the marks due to candidate No. 257.5, we merely draw a line parallel to OX at a height 257.5 above it and the portion of this line cut off between the curve and OY measures the median. The value given by this method is approximately 31.5. Similarly the quartiles are found by drawing lines parallel to OX at heights 128.5 and 385.5 above it with results about 23.3 and 39.2 respectively.
Again, as we gradually increase the number of marks, the number of candidates getting that number of marks or less must increase also, but the rate of this second increase is variable. The reader will perceive that where the height above OX changes slowly the gradient of the curve is small, but where it changes by big steps the gradient is steep, and it is at its steepest just in the neighbourhood where the greatest addition is being made to the height as the marks increase, i.e. where the frequency of additional candidates is at its greatest, so determining the mode: this should be clear on a comparison of the two arrangements of the data in and below Table (7). By sliding a straight-edge along the contour of the curve we can estimate approximately where the curve is steepest, for at this point the direction of turning of the ruler or straight-edge must change. This gives for the mode a value in the neighbourhood of 32.
[Fig. (4). Graph showing the Number of Candidates who obtained not more than any given Number of Marks. Horizontal axis: Number of Marks. The Lower Quartile Line, Median Line, and Upper Quartile Line are marked.]
It might be advisable to treat the other examples by this method
also, so as to compare results.
CHAPTER VIII
GRAPHS
From the mathematical point of view graphs may be regarded as the alphabet of Algebraical Geometry.
We can locate a point, P, in a plane, relative to two perpendicular lines or axes as they are called, OX, OY, which serve as boundaries of measurement, when we know y and x, its shortest distances from these boundaries. This fact serves to connect up Geometry, in which points are elements, with Algebra, in which x's and y's, standing always for numbers, are elements. The names abscissa (ab, from, and scindo, I cut) and ordinate are given to x and y, or, when we refer to them together, they may be spoken of as the co-ordinates of P.
The celebrated French philosopher, Descartes (1596-1650), was the founder of Cartesian Geometry, and if we may venture to compress the essence of his system into a single statement, it is this:
When a point P is free to take up any position in a given plane, its x and y are quite independent: they may be allotted any values irrespective of one another. Suppose, however, that P is constrained to lie somewhere on an assigned curve, such as APB in the figure, then x and y are no longer independent, for, so soon as x is fixed, y is fixed also; it follows that in this case some relation, algebraical or otherwise, such as y = x² − 2x + 1, must exist between x and y, and the relation may be called the equation of the curve which gives rise to it.
Now, if to every curve there corresponds in this way some equation and to every equation some curve, it seems likely that the simpler the curve the simpler will be the corresponding equation, and vice versa. In fact, the student who does not know it already need only refer to the most elementary treatise on graphs to find that every equation of the first degree in x and y, i.e. one which does not involve any x², y², xy, or higher powers, represents some straight line. Any such equation, e.g.
x − 3y + 12 = 0,
can be at once thrown into either the form
(1) x/(−12) + y/4 = 1,
where −12 and 4 are intercepts made by the line on the axes OX and OY; or
(2) y = ⅓x + 4,
where ⅓, i.e. 1 in 3, is the measure of its gradient and 4 the height above the origin at which it cuts the axis OY.
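For the reader who wishes to see the intermediate steps, the two forms may be reached from the equation of the line as follows. This is offered only as a worked sketch of the algebra, not as part of the original argument.

```latex
\begin{aligned}
x - 3y + 12 &= 0 \\
x - 3y &= -12 \\
\frac{x}{-12} + \frac{y}{4} &= 1
  && \text{(dividing by } -12\text{: intercepts } -12 \text{ on } OX,\ 4 \text{ on } OY\text{)} \\[4pt]
3y &= x + 12 \\
y &= \tfrac{1}{3}x + 4
  && \text{(gradient } \tfrac{1}{3}\text{, cutting } OY \text{ at height } 4\text{)}
\end{aligned}
```

Putting y = 0 in the original equation gives x = −12, and putting x = 0 gives y = 4, which agrees with the intercepts read off from form (1).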
Further, every equation of the second degree in x and y, which may involve x², y², and xy, but no higher powers, represents geometrically some conic, a family of curves comprising the parabola, the ellipse, and the hyperbola, with the circle and two straight lines as particular cases. The earth and other planets, likewise comets, in their journeys through space travel along curves belonging to the same family, one of ancient and historical connections.
These conics need not, however, detain us, and we pass on at once to an example of a cubic graph to show how a very little knowledge of the theory may be put to some practical use. Suppose a box manufacturer has a large number of rectangular sheets of cardboard, 3 ft. long by 2 ft. broad, and he wishes to make open boxes with them by cutting a square piece of the same size out of each corner and turning up the flaps that are left. How big should the squares be if this is to be done with as little waste as possible? Clearly this is commercially an important type of problem to solve.
[Figure: a rectangular sheet with a square cut out of each corner. The shaded flaps are bent upwards along the dotted lines.]
Let us denote a side of the square to be cut out of each corner by x feet. Then the bottom of the required box will have dimensions
(3 − 2x) ft. by (2 − 2x) ft.
and its depth will be x ft.
Hence the capacity of the box when completed will be
x(3 − 2x)(2 − 2x) cu. ft.,
and he makes best use of the material who produces the most capacious box. Call this expression y and let us find the values of y corresponding to different values of x so as to be able to draw roughly the curve of which the equation is
y = x(3 − 2x)(2 − 2x) . . . (1)
Table (19). Table of Corresponding Values of x and y
in the Curve y = x(3 − 2x)(2 − 2x).

     x        2x      (3 − 2x)   (2 − 2x)   x(3 − 2x)(2 − 2x)        y
    −1        −2         5          4            −20                −20
    −½        −1         4          3             −6                 −6
    −¼        −½         3½         2½           −35/16              −2.19
     0         0         3          2              0                  0
    +¼        +½         2½         1½           +15/16              +0.94
    +½        +1         2          1             +1                 +1
    +¾        +1½        1½         ½            +9/16               +0.56
    +1        +2         1          0              0                  0
    +1¼       +2½        ½         −½            −5/16               −0.31
    +1½       +3         0         −1              0                  0
    +2        +4        −1         −2             +4                 +4
    +2½       +5        −2         −3            +15                +15

    0.2       0.4       2.6        1.6       (0.2)(2.6)(1.6)          0.83
    0.4       0.8       2.2        1.2       (0.4)(2.2)(1.2)          1.06
    0.6       1.2       1.8        0.8       (0.6)(1.8)(0.8)          0.86
    0.8       1.6       1.4        0.4       (0.8)(1.4)(0.4)          0.45

    0.38      0.76      2.24       1.24      (0.38)(2.24)(1.24)       1.055
    0.39      0.78      2.22       1.22      (0.39)(2.22)(1.22)       1.056
    0.40      0.80      2.20       1.20      (0.40)(2.20)(1.20)       1.056
    0.41      0.82      2.18       1.18      (0.41)(2.18)(1.18)       1.055
We get a tolerably good idea of the shape of the curve by plotting
the points (x, y) shown in Table (19) from x— — \ to x—\2 as in
fig. (5). It is simply a matter of practice to be able to determine
the whole curve from a few points in this way, and the greater the
number of points plotted the more accurately will it be possible
to draw the curve. It should be noticed that the points for which y = 0 are in a sense key-points to the curve: they are readily found by making the factors separately zero in the right-hand side of equation (1), namely x = 0, 3 − 2x = 0, and 2 − 2x = 0, and by plotting them first they serve as a guide to the position of points subsequently plotted.
[Fig. (5). Graph of y = x(3 − 2x)(2 − 2x), with a magnified inset. Horizontal axis: Length of Side of Square cut out (x).]
We want to know for what value of x the capacity of the box, y, is greatest and the preliminary plotting is enough to indicate a maximum value for y between x = 0 and x = 1, for the curve first rises and then falls between these two limits. In order to discover more exactly where the maximum is located we therefore plot in addition the points corresponding to x = 0.2, 0.4, 0.6, 0.8 respectively, and this is done on a larger scale than that used in the first diagram because the accuracy is thereby increased (see fig. (5) inset).
The calculations and figure suggest that the maximum required is very near the point for which x = 0.4, so we next work out values of y in this neighbourhood, corresponding, say, to x = 0.38, 0.39, 0.40, 0.41, with the results shown at the foot of Table (19). From these we conclude that to a fair degree of accuracy the maximum value of y is given by taking x = 0.395. It would be possible in the same way to calculate more decimal places, but we have gone far enough to make the method clear.
Hence the side of each square cut out should be of length 0.395 ft., or about 4¾ in.
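The method of successive tabulation lends itself to a mechanical check. The sketch below tabulates y on a fine grid between x = 0 and x = 1 and, as an additional check that is not part of the method described above, locates the turning point from the derivative dy/dx = 12x² − 20x + 6. All names are illustrative.

```python
import math

def volume(x):
    """Capacity in cu. ft. of the box when a square of side x ft. is cut out."""
    return x * (3 - 2 * x) * (2 - 2 * x)

# Tabulate on a grid of step 0.001 between x = 0 and x = 1 and take the
# largest value, which is the tabular method carried one decimal further.
grid = [i / 1000 for i in range(0, 1001)]
best_x = max(grid, key=volume)

# Check by the calculus (an added step): y = 4x^3 - 10x^2 + 6x, so
# dy/dx = 12x^2 - 20x + 6, whose smaller root gives the maximum.
exact_x = (20 - math.sqrt(20 ** 2 - 4 * 12 * 6)) / (2 * 12)

print(best_x, round(exact_x, 4), round(volume(exact_x), 4))
```

The curve is so flat near its summit that x = 0.39 and x = 0.40 both give y = 1.056 to three decimal places, which is why the tabulated values alone cannot fix the third decimal of x with any confidence.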
Whenever the value of one variable, y, depends upon that of another variable, x, in such a way that when x is given y is known, so that y may be termed a function of x, corresponding values of x and y can be plotted (as was done in the example just discussed) and a curve drawn by joining up the points obtained, the relation which connects x and y being the equation of this curve. Moreover, it is possible, by calculating enough points from the equation and plotting them, to get the curve as accurately as we please.
In Statistics, however, we usually have to start the other way round and reach the equation, if at all, last. We make observations of two sets of variables, a set of x's, and a set of y's, one of which is dependent in some way upon the other (e.g. y, the dependent variable, might denote the number of individuals observed to have a certain organ of length x, the independent variable) and thus we get pairs of corresponding values like (x₁, y₁), (x₂, y₂), (x₃, y₃) ...
We met with examples of this method of recording results in the last chapter, and we need only repeat here that its chief virtue is suggested in the root of the word itself: it is more graphic than a long table of figures and, by means of it, many of the essential features of a problem are immediately seized upon.
Now for some purposes it may be necessary to go further and to find what curve would best fit the points plotted, assuming they were numerous enough, and what equation between x and y would best describe the curve. But the graphs we meet in Statistics, bearing, for instance, upon sociological or biological problems, are in general much more wayward than the mathematical kind we have referred to in the present chapter: it is impossible to set down simple equations to which they can be rigidly confined, and when we are unable to find any relation which accurately and uniquely defines y as a function of x we must rest satisfied with the most manageable equation and the best fit we can get.
In sciences such as Engineering and Physics it is often possible to fix upon two mutually dependent variables, x and y, and to observe enough corresponding values of each to enable us to draw a graph which answers very closely to the true relationship between them, so that a connecting equation can be determined; e.g. we may plot the amount of elastic stretch, y, in a wire when different weights, x, are hung from the end of it, and it is found that y is directly proportional to x. If we deal in this way with some simple figures which are amenable to our purpose it may help to make clear the nature of the same problem in Statistics.
The following corresponding values of x and y were given in a Board of Education Examination (1911):
x = 1.00, 1.50, 2.00, 2.30, 2.50, 2.70, 2.80;
y = 0.77, 1.05, 1.50, 1.77, 2.03, 2.25, 2.42.
Allowing for errors of observation, it was desired to test if there was a relation between y and x of the type
y = a + bx² . . . (1)
In the first place, the shape of the curve obtained by plotting y against x, as in fig. (6), would, to the initiated, probably suggest a parabola, the equation of which is of type (1). In order to test its suitability we proceed to plot y against x², or, putting x² = ξ, we plot y against ξ. If equation (1) holds, then, in that case
y = a + bξ . . . (2)
should also hold, and this, in (ξ, y) co-ordinates, represents a straight line. The result of plotting y against ξ should therefore be a number of points approximately in a straight line; we say 'approximately' to allow for errors of observation in the original data.
Now from the given statistics corresponding values of ξ and y are, since ξ = x²:
ξ = 1.00, 2.25, 4.00, 5.29, 6.25, 7.29, 7.84;
y = 0.77, 1.05, 1.50, 1.77, 2.03, 2.25, 2.42;
and the resulting graph, fig. (7), is very approximately a straight line. To determine its equation, choose two points (not too close together) on the line, which has been drawn so as to run as fairly as possible through the middle of the points plotted, and, in choosing, take points which lie at the intersections of horizontal and vertical cross lines (the printed lines of the graph paper) if such can be found, because their x's and y's can be read off with ease and accuracy.
[Fig. (6). Graph of y against x.]
[Fig. (7). Graph of y against ξ.]
Two such points are
(2.8, 1.2) and (6.0, 2.0),
and since each of these points lies on the line whose equation is
y = a + bξ,
we have
1.2 = a + b(2.8)
2.0 = a + b(6.0).
Subtracting, we get
0.8 = b(3.2).
Therefore b = ¼.
Hence a = 2 − 3/2 = ½.
Thus the equation of the line is
y = ¼ξ + ½,
i.e. 4y = ξ + 2,
and the law connecting x and y is therefore
4y = x² + 2.
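The two-point determination of a and b can be repeated numerically, and compared with a least-squares line through all seven points. The least-squares fit is an addition for comparison only and is not the graphical method described above; the names used are illustrative.

```python
# Observed values from the Board of Education question.
xs = [1.00, 1.50, 2.00, 2.30, 2.50, 2.70, 2.80]
ys = [0.77, 1.05, 1.50, 1.77, 2.03, 2.25, 2.42]
xi = [x * x for x in xs]          # the transformed variable, xi = x squared

# Two-point method: the points (2.8, 1.2) and (6.0, 2.0) read off the line.
(p1, q1), (p2, q2) = (2.8, 1.2), (6.0, 2.0)
b = (q2 - q1) / (p2 - p1)         # gradient of the line
a = q1 - b * p1                   # height at which it cuts the y axis

# Least-squares line of y on xi (a different technique, for comparison).
n = len(xi)
mean_xi = sum(xi) / n
mean_y = sum(ys) / n
b_ls = (sum((u - mean_xi) * (v - mean_y) for u, v in zip(xi, ys))
        / sum((u - mean_xi) ** 2 for u in xi))
a_ls = mean_y - b_ls * mean_xi

print(round(a, 3), round(b, 3))
print(round(a_ls, 3), round(b_ls, 3))
```

The two lines differ only slightly, as might be expected when the plotted points lie so nearly straight; the graphical method has the advantage that a and b come out as simple fractions.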
The following statistics, the result of an experiment in Physics to verify Boyle's Law, may be treated in the same way. x is a number proportional to the volume of a constant weight of gas in a closed space, and y is a number proportional to its absolute pressure. Corresponding values of x and y observed were:

  x =  46.89   41.96   40.33   38.88   37.37   36.06   34.71   33.47
  y =  76.32   85.38   88.93   92.36   96.09   99.61  103.51  107.51

  x =  32.39   31.08   29.97   28.76   27.26   25.32   24.04
  y = [illegible]  115.69  120.05  125.08  131.99  142.09  149.81

Boyle's Law states that the product xy is constant, and this may be tested by putting ξ = 1/x and plotting y against ξ; the points obtained should be approximately in a straight line.
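Before plotting, the constancy of the product xy can be examined directly. The sketch below takes the readings as 46.89, 76.32, and so on (the placing of the decimal point is an assumption, and a common scale factor would not affect the test), and it omits the pair with x = 32.39, whose y value cannot be read.

```python
# Volume (x) and pressure (y) readings; the pair whose pressure is
# illegible has been left out rather than guessed.
x = [46.89, 41.96, 40.33, 38.88, 37.37, 36.06, 34.71, 33.47,
     31.08, 29.97, 28.76, 27.26, 25.32, 24.04]
y = [76.32, 85.38, 88.93, 92.36, 96.09, 99.61, 103.51, 107.51,
     115.69, 120.05, 125.08, 131.99, 142.09, 149.81]

products = [xv * yv for xv, yv in zip(x, y)]
mean_product = sum(products) / len(products)

# Relative spread of the products: small if Boyle's Law holds.
spread = (max(products) - min(products)) / mean_product

# The transformed variable used for the straight-line plot of y against 1/x.
xi = [1 / xv for xv in x]

print(round(min(products), 1), round(max(products), 1), round(spread, 4))
```

A spread of well under one per cent. between the smallest and largest product is what an approximately straight line through the origin in the (ξ, y) plot would lead us to expect.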
Now in Statistics, as we have already explained, the exact connection between the variables, x and y, is rarely so clear, though the absence of law is not so complete as it might seem at first sight. At this stage, however, we need not enter into the difficult question of curve fitting: if drawn with care and used with judgment much that is of value may be learnt by simple plotting and by connecting up the resulting points by straight lines or a freehand curve. We shall briefly explain or illustrate by examples how graphs and graphical ideas may be used to serve three distinct purposes, namely:
(1) to suggest correlation or connection between two different factors or events;
(2) to supply a basis for finding by interpolation some values of a variable when others are known;
(3) as pictorial arguments appealing to the reason through the eye.
We reserve (2) and (3) for the next chapter and proceed at present with an example of (1).
Correlation suggested by Graphical means. Consider the index numbers, col. (2) Table (20), showing the variation from year to year in wholesale prices between the years 1871 and 1912. It is not an easy matter to take in satisfactorily the meaning of such a mass of bare figures, but they are much easier to grasp when plotted in a graph.
In this case the numbers x, representing years, and the numbers y, representing prices, are measures of things of quite a different character, so that it is not necessary to take the x and y units of the same size. Moreover they need not, in a case of this kind, necessarily vanish at the origin, but it is convenient to draw the graph in such a way that it shall occupy the greater part of the space at our disposal. Thus, we have roughly 80 small squares across the breadth of our graph paper, and between 1871 and 1912 we have roughly 40 years; we therefore take two sides of a square to 1 year and mark off the years 1870, 1875, 1880, . . ., along an axis or base line parallel to the breadth of the paper, as shown in fig. (8).
Again we have roughly 70 small squares in the available space from this base line to the top of our graph paper, and the wholesale price index numbers vary from 88.2 to 151.9, a range of 63.7; we therefore take one side of a square to correspond to a difference of 1 in the price index number, and mark off the prices 90, 100, 110, . . ., along an axis parallel to the length of the paper, as shown in the figure.
We then plot points to represent the numbers in col. (2) of Table (20). Thus, in 1880 wholesale prices stood at 129.0; we therefore travel along the width of the paper till we reach 1880 and then upwards until we are opposite the 129 level on the axis of prices, inserting a dot to mark the position. Similarly for all other points, and the required graph is given by joining them up in succession.
Table (20). Marriage Rate and Wholesale Prices
Index Numbers.

  (1)      (2)        (3)            (4)              (5)         (6)              (7)
                   Nine Years'   Difference be-                Nine Years'     Difference be-
  Year.   Prices.  Average       tween Nos. in     Marriage    Average of      tween Nos. in
                   of Prices.    Cols. (2) & (3).  rate.       Marriage rate.  Cols. (5) & (6).
  1871    135.6      ..             ..              16.7         ..               ..
  1872    145.2      ..             ..              17.4         ..               ..
  1873    151.9      ..             ..              17.6         ..               ..
  1874    146.9      ..             ..              17.0         ..               ..
  1875    140.4     139.3          +1.1             16.7        16.4             +0.3
  1876    137.1     138.6          −1.5             16.5        16.2             +0.3
  1877    140.4     136.5          +3.9             15.7        15.9             −0.2
  1878    131.1     133.8          −2.7             15.2        15.7             −0.5
  1879    125.0     131.5          −6.5             14.4        15.5             −1.1
  1880    129.0     128.5          +0.5             14.9        15.3             −0.4
  1881    126.6     125.2          +1.4             15.1        15.1              ..
  1882    127.7     120.8          +6.9             15.5        14.9             +0.6
  1883    125.9     117.2          +8.7             15.5        14.8             +0.7
  1884    114.1     114.7          −0.6             15.1        14.8             +0.3
  1885    107.0     111.8          −4.8             14.5        14.9             −0.4
  1886    101.0     109.2          −8.2             14.2        14.9             −0.7
  1887     98.8     106.9          −8.1             14.4        14.9             −0.5
  1888    101.8     104.2          −2.4             14.4        14.9             −0.5
  1889    103.4     102.5          +0.9             15.0        14.9             +0.1
  1890    103.3     101.0          +2.3             15.5        14.9             +0.6
  1891    106.9      99.9          +7.0             15.6        15.0             +0.6
  1892    101.1      98.7          +2.4             15.4        15.1             +0.3
  1893     99.4      97.4          +2.0             14.7        15.3             −0.6
  1894     93.5      96.3          −2.8             15.0        15.5             −0.5
  1895     90.7      95.0          −4.3             15.0        15.6             −0.6
  1896     88.2      94.3          −6.1             15.7        15.6             +0.1
  1897     90.1      93.8          −3.7             16.0        15.7             +0.3
  1898     93.2      93.4          −0.2             16.2        15.8             +0.4
  1899     92.2      93.8          −1.6             16.5        15.9             +0.6
  1900    100.0      94.7          +5.3             16.0        15.9             +0.1
  1901     96.7      95.7          +1.0             15.9        15.9              ..
  1902     96.4      96.9          −0.5             15.9        15.8             +0.1
  1903     96.9      98.3          −1.4             15.7        15.8             −0.1
  1904     98.2      99.5          −1.3             15.3        15.6             −0.3
  1905     97.6     100.0          −2.4             15.3        15.5             −0.2
  1906    100.8     101.3          −0.5             15.7        15.4             +0.3
  1907    106.0     102.8          +3.2             15.9        15.3             +0.6
  1908    103.0     104.8          −1.8             15.1        15.3             −0.2
  1909    104.1      ..             ..              14.7         ..               ..
  1910    108.8      ..             ..              15.0         ..               ..
  1911    109.4      ..             ..              15.2         ..               ..
  1912    114.9      ..             ..              15.5         ..               ..
It is comparatively easy from this graph to trace the change in prices from year to year and from decade to decade: for example, we note that from 1873 to 1896 the tendency of prices was on the whole downward, and from 1896 to 1910 the tendency was upward. Also on the assumption (not necessarily valid) that prices have varied continuously, or at least consistently, during the intervals between the dates to which the records refer, it is possible to read off intermediate values from the graph: e.g. midway between 1883 and 1884 we get the figure 120 as the index number for prices.
On the same graph sheet we have also plotted the marriage rate from year to year during the same period. The numbers are given in col. (5) of Table (20). This rate varies from 14.2 to 17.6, a range of 3.4, and we have a range of 40 small squares at our disposal in plotting; a difference of 0.1 in the marriage rate has therefore been taken to correspond to one side of a square, and the marriage rates 14.0, 15.0, 16.0 . . . are accordingly marked along the axis perpendicular to the same base line as before, which is used again to measure the passage of years, but the second graph is drawn below the line whereas the first was drawn above it. In this way we are able to compare the two graphs, namely, the one registering the change in prices and the one registering the change in marriage rate from year to year.
It is interesting to observe that the two seem to be not unconnected: they go up and down almost in the same time, and mountains and valleys in the one correspond roughly to mountains and valleys in the other; in other words, there is some kind of correlation or reciprocal relation between them. Now these mountains and valleys are largely the result of what may be called short-time fluctuations, and it is important to distinguish between these changes which are transient and the more permanent or long-time changes.
In order to get rid of the former, which sometimes conceal the latter, the following device has been adopted: noticing that the wave period, the length of time taken for each complete up-and-down motion, is one of about nine years, nine-yearly averages have been taken of the figures for wholesale prices right down col. (2) of Table (20); thus 139.3 is the average of the index numbers from 1871 to 1879 inclusive, 138.6 is the average of the numbers from 1872 to 1880 inclusive, and so on, the results being recorded in col. (3). When the points corresponding to these numbers are plotted we get the broken line in fig. (8) passing through the body of the original graph of prices and indicating its general trend in the course of years as separated from the temporary fluctuations.
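The nine-yearly averaging can be sketched in a few lines. The list below is col. (2) of Table (20) read with one decimal place; each average is entered against the middle year of its nine, and the difference between a year's figure and its average gives the short-time fluctuation. The names are illustrative, and an entry may occasionally differ from the printed table by a unit in the last place through rounding.

```python
first_year = 1871
prices = [135.6, 145.2, 151.9, 146.9, 140.4, 137.1, 140.4, 131.1, 125.0,
          129.0, 126.6, 127.7, 125.9, 114.1, 107.0, 101.0, 98.8, 101.8,
          103.4, 103.3, 106.9, 101.1, 99.4, 93.5, 90.7, 88.2, 90.1, 93.2,
          92.2, 100.0, 96.7, 96.4, 96.9, 98.2, 97.6, 100.8, 106.0, 103.0,
          104.1, 108.8, 109.4, 114.9]          # index numbers, 1871 to 1912

half = 4                                        # four years on either side
trend = {}
for i in range(half, len(prices) - half):
    window = prices[i - half:i + half + 1]      # nine consecutive years
    trend[first_year + i] = round(sum(window) / len(window), 1)

# Short-time fluctuation: the year's figure less its nine-yearly average.
fluctuation = {year: round(prices[year - first_year] - avg, 1)
               for year, avg in trend.items()}

print(trend[1875], trend[1876], fluctuation[1875])
```

No average can be formed for the first four or the last four years, which is why cols. (3) and (4) of the table are blank at both ends.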
[Fig. (8). Graph showing Variation in Wholesale Prices Index Numbers. Horizontal axis: years 1870 to 1910.]
[Fig. (9). Graph showing Variation in Marriage Rate Index Numbers. Horizontal axis: years 1870 to 1910.]
The same procedure has been followed with the marriage rate statistics; the nine-yearly averages are shown in col. (6) of Table (20), and their graph appears as a broken line passing through the body of the original marriage rate graph in fig. (9).
[Fig. (10). Upper panel: Graph showing Fluctuations from their Nine-yearly Averages of the Index Numbers of Wholesale Prices. Lower panel: Graph showing Fluctuations from their Nine-yearly Averages of Marriage-rates. Horizontal axis: years 1880 to 1910.]
Suppose we wish on the other hand to study the short-time fluctuations as distinct from the long-time changes, we may do so by forming the differences between the numbers for each year and the corresponding nine-yearly averages, and plotting these differences on convenient scales.
The numbers obtained in this way are recorded, with their proper signs (positive if above the average, negative if below) in cols. (4) and (7) of Table (20), and the graphs of these differences are drawn, one below the other for comparison, on the same graph sheet (fig. 10). The agreement in fluctuation from the average between the two factors, marriage rate and prices, is more easily remarked now than it was in the original graphs. High prices go as a rule hand-in-hand with prosperous times, and such times lead to more frequent marriages. This statement must not be taken to imply that when prices are high the times are always necessarily prosperous for the community as a whole: the lie direct would be given to such an implication by any one who had experienced abnormal war conditions.
After about 1892, while the fluctuations continue to be similar, a tendency appears for the marriage rate graph to reach each extreme point about a year in advance of the other, as though an increase in marriages raised prices and a decrease lowered them. There is no doubt that any economic change, especially if it takes place on a large scale, will set up a system of corresponding forces, sometimes in unexpected directions, actions and reactions succeeding one another at intervals like tidal waves producing each a backwash as it breaks, but such effects, even when anticipated in theory, are not always easy to unravel in practice.
The comparison we have been discussing between changes in prices and marriages is suggested in Sir W. H. Beveridge's Unemployment. The whole book will repay careful study, but it contains one particularly illuminating chapter on 'Cyclical Fluctuation' with a chart labelled 'The Pulse of the Nation,' because of the remarkable picture it gives of the ebb and flow of the tide of national prosperity. It consists of a series of curves representing respectively:
(1) bank rate of discount per cent.;
(2) foreign trade as measured by imports and exports per head of the population;
(3) percentage of trade union members not returned as unemployed;
(4) number of marriages per 1000 of the population;
(5) number of indoor paupers per 1000 of the population;
(6) gallons of beer consumed per head of the population;
(7) nominal capital of new companies registered in pounds per head of the population.
The interesting thing about these curves is to see the way in which they move in waves of varying size up and down almost together, showing a connection between such phenomena more intimate than one might at first have suspected. A note of caution must be inserted here however: causal connection must not be too confidently inferred in discussing the correlation of characters changing simultaneously with time; because two events happen together, one is not necessarily caused by the other.
An instructive article bearing on this point appeared recently in a periodical well known to students of social problems. It was there stated that high positive correlation exists between birth rate and infantile death rate: in general the two rise or fall together, whence Neo-Malthusians argue that the way to lower a death rate is to lower the birth rate. The writer then contrasts Bradford, the last word in the scientific care of infants, with Roscommon, where conditions as to wealth and child welfare are the very reverse, and points out that Bradford has a birth rate of 13 and an infant death rate of 135, while Roscommon has a birth rate of 45 and an infant death rate of 35. These figures, he suggests, prove instantaneously that the Neo-Malthusians are guilty of the commonest of all fallacies: they confound correlation with causation.
As an exercise in plotting the reader may see whether he can
discover any suggestion of correlation between crime and unemployment
by comparing the following statistics, showing the number
of indictable offences tried in the United Kingdom and the trade
union unemployed percentages respectively from 1861 to 1905:
Table (21). Number of tried Indictable Offences and
Trade Union Unemployed Percentages (1861-1905).

Year    No. of Indictable Offences    Trade Union Unemployed
        tried (in thousands)          percentages
1861          56.0                        3.7
1862          61.3                        6.0
1863          61.4                        4.7
1864          58.4                        1.9
1865          59.9                        1.8
1866          57.6                        2.6
1867          59.5                        6.3
1868          62.4                        6.7
1869          61.3                        5.9
1870          56.1                        3.7
1871          53.1                        1.6
1872          51.9                        0.9
1873          53.5                        1.2
1874          53.5                        1.7
1875          50.0                        2.4
1876          51.9                        3.7
1877          53.8                        4.7
1878          56.0                        6.8
1879          55.0                       11.4
1880          60.7                        5.5
1881          60.6                        3.5
1882          63.3                        2.3
1883          60.8                        2.6
1884          59.6                        8.1
1885          56.4                        9.3
1886          56.2                       10.2
1887          56.2                        7.6
1888          58.5                        4.9
1889          57.6                        2.1
1890          55.0                        2.1
1891          54.1                        3.5
1892          58.3                        6.3
1893          57.4                        7.5
1894          56.3                        6.9
1895          50.8                        5.8
1896          50.7                        3.3
1897          50.7                        3.3
1898          52.5                        2.8
1899          50.5                        2.0
1900          53.6                        2.5
1901          55.5                        3.3
1902          57.1                        4.0
1903          58.4                        4.7
1904          60.0                        6.0
1905          61.5                        5.0
The chief point of difficulty in plotting such graphs is the initial
one of fixing upon the most convenient scales to use, and in this
matter hints only can be given; facility will come by practice. An
examination of Table (21) shows that the data cover a period of forty-five
years, which can be marked off horizontally along a base line so
as just to fit comfortably into the available space across the graph
paper. The unemployed percentages vary between 0.9 and 11.4,
giving a range of 10.5. Similarly the indictable offences recorded
(in thousands) present a range of 13.3. We might therefore very
well choose the same vertical scale for the measurement of indictable
offences and unemployment, but, in order that the graphs
may run more or less together (without exactly overlapping) for
the sake of comparison, only the unemployment zero need be taken
actually on the base line, whereas the indictable offences may have,
say, the number 50 (thousand) at that level; also it will be convenient
to show the scale for unemployment on the right side
and the scale for offences on the left side of the paper.
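The choice of base line and scale described above can be checked with a little arithmetic. The following is only a sketch: the function name `enclosing_window` and the rounding step of 5 units are assumptions made for illustration, and the extreme values are those quoted for Table (21).

```python
import math

def enclosing_window(lowest, highest, step):
    """Return (baseline, top): the multiples of `step` that just enclose the data."""
    baseline = math.floor(lowest / step) * step
    top = math.ceil(highest / step) * step
    return baseline, top

# Extremes of Table (21): unemployment 0.9 to 11.4 per cent.;
# offences 50.0 to 63.3 thousand (a range of 13.3).
unemployment_window = enclosing_window(0.9, 11.4, step=5)
offences_window = enclosing_window(50.0, 63.3, step=5)

print(unemployment_window)  # (0, 15)
print(offences_window)      # (50, 65)
```

Both windows are 15 units high, which is why one vertical scale can serve both graphs, with zero unemployment and 50 thousand offences sharing the base line.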
An example dealing with matters somewhat different is provided
by a comparison of changes from week to week in:
(1) the mean air temperature ;
(2) the percentage of possible sunshine ; and
(3) the rainfall.
The following is a record of observations taken at Greenwich in
1912 [data from London Statistics, vol. xxiii.] : —
Table (22). Weekly Meteorological Observations
at Greenwich (1912).

Week ended   Mean Air Temperature   Percentage of        Rainfall
             (degrees Fahrenheit)   possible Sunshine    (inches)
Jan. 6            45.7                   7                0.76
Jan. 13           41.9                  15                0.45
Jan. 20           40.2                   1                0.93
Jan. 27           38.9                   8                0.88
Feb. 3            30.0                  21                0.02
Feb. 10           39.5                  15                0.52
Feb. 17           45.5                  11                0.44
Feb. 24           47.4                   6                0.65
Mar. 2            49.8                  21                0.52
Mar. 9            44.6                  31                0.79
Mar. 16           45.1                  16                0.19
Mar. 23           42.7                  15                1.08
Mar. 30           51.0                  46                0.05
Apr. 6            48.0                  43                0.07
Apr. 13           45.6                  43                0.02
Apr. 20           50.0                  50                0.00
Apr. 27           52.6                  76                0.00
May 4             50.1                  32                0.21
May 11            59.7                  29                0.06
May 18            55.2                  49                0.69
May 25            54.1                  38                0.19
June 1            57.0                  47                0.17
June 8            54.2                  35                0.99
June 15           58.1                  48                0.39
June 22           61.7                  56                0.65
June 29           60.2                  45                0.30
July 6            58.7                  15                0.36
July 13           67.0                  46                0.20
July 20           65.8                  44                0.04
July 27           64.8                  31                0.16
Aug. 3            57.8                  33                0.54
Aug. 10           57.6                  28                1.26
Aug. 17           56.2                  14                0.23
Aug. 24           57.2                  24                1.27
Aug. 31           56.9                  27                1.33
Sept. 7           54.8                  36                0.21
Sept. 14          52.4                  14                0.02
Sept. 21          53.6                  22                0.00
Sept. 28          51.5                  59                0.02
Oct. 5            48.8                  36                2.30
Oct. 12           46.0                  53                0.00
Oct. 19           49.8                  38                0.13
Oct. 26           45.4                  23                0.98
Nov. 2            49.1                  31                0.55
Nov. 9            47.2                   6                0.18
Nov. 16           43.3                   3                0.17
Nov. 23           46.2                   6                0.31
Nov. 30           40.4                  13                1.06
Dec. 7            42.4                   9                0.31
Dec. 14           49.0                   2                0.62
Dec. 21           44.4                  19                0.59
Dec. 28           48.1                   8                1.22
The rainfall graph here should be drawn reversed (i.e. so that
it goes up as the rainfall goes down in amount, and vice versa),
because one would expect in general much rain to go with little
sun and low temperature.
The range of temperature during the year is 37 degrees, of sunshine
75 per cent., and of rainfall 2.30 in. Hence the vertical
scales for these three graphs might be chosen so that, roughly,
40 units of temperature should correspond to 80 units of sunshine
and 2 units of rainfall. Also the zeros of the three variables should
be so placed, relative to the horizontal base line registering the
weeks, that the three graphs may be conveniently compared without
causing confusion by too closely overlapping.
CHAPTER IX
GRAPHS (continued)
Graphical Ideas as a Basis for Interpolation. It frequently happens
in statistical records that awkward gaps occur which require to be
filled in ; this may be due to the fact that no record has been
made, or that it has been made with insufficient detail, or that it
has been lost or destroyed. Cases in point arise in connection with
returns like that of the Census which can only be undertaken every
few years, so that if figures are wanted for any intervening year,
as they are in very many instances, an estimate has to be made
from the known results of the years recorded. It is imperative, for
example, for many purposes of local or national government, to
be able to find with a fair degree of accuracy the population of
county boroughs and urban or rural districts at any given time,
to know the number of workers engaged in different occupations,
the amount of land in pasture and under various crops, the condition
of the people as to housing, of the children as to education,
and so on indefinitely.
Symbolically, with the same notation as we have used before,
we conceive the statistics in tabular form, like
$$x_1,\ x_2,\ x_3,\ \ldots,\ x_n$$
$$y_1,\ y_2,\ y_3,\ \ldots,\ y_n$$
each y denoting the frequency corresponding to the character
measured by its companion x, e.g. the x's may stand for successive
dates and the y's for the frequencies of the population of a certain
district at those dates. If it happens that one or more of the y's, in
between the first and the last recorded, are missing, the problem is
to estimate the missing values by some method of interpolation, as
it is called. Various methods of arriving at such estimates are used,
but we shall only refer to the more elementary here.
A rough way of making the estimate, but one which is often as
accurate as the data will allow, is to plot the observations, each
(x, y) being represented by a point, and connect them up, if there
be enough of them, by a smooth curve drawn freehand, $P_1 P_2 P_3 \ldots P_n$
[see fig. (11)]; to find the y proper to any other x we have then
only to draw the ordinate through the point (x, 0) and measure the
y at the point where it cuts the curve. This is a not unreasonable
principle to follow, for in effect it gives due weight to each of the
observations actually recorded, and it assumes an even course
from each one to the next, a justifiable assumption in the
absence of any evidence that some sudden discontinuity of
value has taken place.
If only two observations are given, represented by the points
$P_1(x_1, y_1)$ and $P_2(x_2, y_2)$, the
curve connecting them is a straight line, and the y corresponding to
any other x is at once given geometrically, as fig. (12) shows, by
$$\frac{PM}{P_2M_2} = \frac{P_1M}{P_1M_2},$$
i.e.
$$\frac{y - y_1}{y_2 - y_1} = \frac{x - x_1}{x_2 - x_1},$$
or
$$y = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1),$$
the familiar proportional relation which is employed in this simple
case.
[Fig. (12).]
Example. Given
$$\log 5.82673 = 0.7654249,$$
$$\log 5.82674 = 0.7654257.$$
Required log 5.826736.
Here
$$x_1 = 5.826730,\quad y_1 = 0.7654249,$$
$$x_2 = 5.826740,\quad y_2 = 0.7654257,$$
$$x = 5.826736.$$
Therefore, by means of the above relation,
$$y = 0.7654249 + \frac{0.0000008}{0.000010}(0.000006)$$
$$= 0.7654249 + 0.00000048$$
$$= 0.7654254.$$
The logarithmic curve y = log x is, of course, not a straight line,
and the value obtained for y only represents a first approximation
to the true value.
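The proportional relation is easily turned into a short routine. The following is a minimal sketch; the function name `interpolate_linear` is an illustrative choice, and the figures are those of the logarithm example.

```python
def interpolate_linear(x1, y1, x2, y2, x):
    """Proportional-parts estimate of y at x from two known points."""
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

estimate = interpolate_linear(5.82673, 0.7654249, 5.82674, 0.7654257, 5.826736)
print(round(estimate, 7))  # 0.7654254
```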
When more than two points are given there is bound to be a
margin of inaccuracy, more or less according to the data, introduced
in drawing the curve. For an example of this method the
reader may refer back to the curve on p. 67, which was used to
determine the median and quartiles. We may, as we saw, read
off from it the number of candidates who obtained not more than
any stated number of marks : e.g. 300 candidates obtained not
more than 34 marks ; or we may use it the other way round and
find the number of marks obtained by a stated number of candidates:
e.g. 10 per cent. of the candidates got less than 17 marks.
Such examples might be multiplied endlessly, and the method will
be found extremely useful when a high degree of accuracy is not
looked for. But greater confidence will be felt perhaps in such
results — though the foundation for it may be no more secure in many
cases — if we can translate them from geometrical to algebraical
form, if we can find, that is to say, some formula, like the simple
proportional relation already introduced above, which will give
one y when others are known.
In order to make the argument as general as possible we shall
speak of x and y as variables, and we shall think of the value of y
as depending upon that of x in such a way that when x is given,
y is known or it can be estimated * (in the sense that when the
year is given the population is known or can be estimated).
Suppose
$$y = c_0 + c_1 x + c_2 x^2 + \ldots + c_n x^n,$$
where the c's are constants to be determined, and their number
can be made to depend upon the number of known values of y
which are used in the estimate.
[* This is equivalent to assuming that y is some function of x, say y = f(x), and
clearly some such assumption is necessary if any estimate from the known values
to the unknown is to be possible. Further, for simplicity we assume f(x) can
be expanded in a Maclaurin's converging series of ascending powers of x, which
simply means that we take the relation between x and y to be of the form
adopted above.]
Geometrically, the equation
$$y = c_0 + c_1 x + c_2 x^2 + \ldots + c_n x^n$$
represents a curve called a parabola of the nth order, and such
a curve could be employed (and uniquely found, for there is only one
parabola of the kind which will go through all the points) if we
based our estimate upon a knowledge of (n+1) y's corresponding
to given x's, for we could readily make it pass through the (n+1)
known points $(x_0, y_0), (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ by choosing
the (n+1) c's so as to satisfy the (n+1) simple linear relations:
$$y_0 = c_0 + c_1 x_0 + c_2 x_0^2 + \ldots + c_n x_0^n$$
$$y_1 = c_0 + c_1 x_1 + c_2 x_1^2 + \ldots + c_n x_1^n$$
$$\ldots$$
$$y_n = c_0 + c_1 x_n + c_2 x_n^2 + \ldots + c_n x_n^n$$
When the curve is determined, in other words when the c's are
known, we can find any other y required by substituting the corresponding
x in the equation
$$y = c_0 + c_1 x + c_2 x^2 + \ldots + c_n x^n,$$
i.e. by supposing this point (x, y) to lie on the same curve that goes
through the known points.
It is well to mention here that the parabola is by no means always
the best curve for fitting any given statistics, and when the number
of observations is adequate it is possible often to make a more
satisfactory choice. Once the equation of a suitable curve has
been determined the subsequent interpolation or calculation of y
for any given x is not as a rule a very difficult matter. The larger
question of curve fitting in general is reserved for a later chapter.
Example of First Method (fitting with a parabolic curve). Let us
illustrate this process of interpolation by fitting a parabolic curve
to the following figures, extracted from Porter's The Progress of
the Nation, giving the annual cost of Poor Relief (excluding insane
and casual) at five-yearly intervals, but with the amount for the
year 1845 omitted:

Year             1835    1840    1845    1850    1855
Cost in £1000    5526    4577      ?     5395    5890

Assuming that no extraordinary conditions prevailed in 1845 to
cause abnormality in expenditure, let us estimate what the figure
would be for that year judging from the given records just before
and after. Since there are four known points in this case, we take
as the curve through them a parabola of the 3rd order, namely:
$$y = c_0 + c_1 x + c_2 x^2 + c_3 x^3; \qquad (1)$$
the four known points will then just suffice to determine uniquely
the four arbitrary constants $c_0, c_1, c_2, c_3$. Also, since the x class
intervals are equal, it will simplify the algebra if we measure from
the year 1845 as origin, taking five years as unit for x and £1000
as unit for y, so that we get

x      -2      -1       0      +1      +2
y     5526    4577    y_0     5395    5890

where $y_0$ is the number to be determined.
Since all five points are to lie on the curve with equation as in
(1), we have by substituting in that equation:
$$5526 = c_0 - 2c_1 + 4c_2 - 8c_3$$
$$4577 = c_0 - c_1 + c_2 - c_3$$
$$y_0 = c_0$$
$$5395 = c_0 + c_1 + c_2 + c_3$$
$$5890 = c_0 + 2c_1 + 4c_2 + 8c_3$$
Adding the first and last of these equations,
$$2c_0 + 8c_2 = 5526 + 5890. \qquad (2)$$
Adding the second and last but one,
$$2c_0 + 2c_2 = 4577 + 5395,$$
or
$$8c_0 + 8c_2 = 4(4577 + 5395). \qquad (3)$$
Subtracting (2) from (3),
$$6c_0 = 4(4577 + 5395) - (5526 + 5890) \qquad (4)$$
$$= 4(9972) - (11416)$$
$$= 39888 - 11416$$
$$= 28472.$$
Therefore $y_0 = c_0$ = £4,745,000.
If we only wish to make use of the records for the years 1840
and 1850, the appropriate fitting curve reduces to a straight line
$$y = c_0 + c_1 x,$$
on which we assume the points
$$(-1,\ 4577),\quad (0,\ y_0),\quad (+1,\ 5395)$$
to lie, so that
$$4577 = c_0 - c_1$$
$$y_0 = c_0$$
$$5395 = c_0 + c_1.$$
Therefore, adding the first and last of these equations,
$$2c_0 = 4577 + 5395,$$
so that $y_0 = c_0$ = £4,986,000.
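The elimination above can be reproduced mechanically by solving the linear relations for the c's. The following is a sketch only: the helper `fit_polynomial` is a hypothetical name, and it uses plain Gauss-Jordan elimination with exact fractions in place of the book's pairing of equations. The points are the Poor Relief figures with 1845 as origin.

```python
from fractions import Fraction

def fit_polynomial(points):
    """Coefficients [c0, c1, ...] of the parabola through the given points."""
    n = len(points)
    # Each row holds 1, x, x^2, ..., x^(n-1) followed by the known y.
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)]
            for x, y in points]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

cubic = fit_polynomial([(-2, 5526), (-1, 4577), (1, 5395), (2, 5890)])
line = fit_polynomial([(-1, 4577), (1, 5395)])

print(round(cubic[0]))  # 4745, i.e. about £4,745,000
print(line[0])          # 4986, i.e. £4,986,000
```

Since x = 0 stands for 1845, the constant term c0 is itself the estimate in each case.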
* Second Method (using a formula connecting the ordinates). When,
as above, the steps from each x to the next are equal, as commonly
happens in practice, it is possible to write down a simple relation
between the y's, known and unknown, without introducing the c's
at all. At bottom the method is the same as the last, inasmuch as
the elimination of the c constants by the first method really results
in the same formula for the unknown y.
Let us represent the given statistics in this case by
$$x_0,\ x_0 + h,\ x_0 + 2h,\ \ldots,\ x_0 + nh$$
$$y_0,\ y_1,\ y_2,\ \ldots,\ y_n$$
so that, if the fitting curve be
$$y = c_0 + c_1 x + c_2 x^2 + \ldots + c_n x^n,$$
we have, by substituting the coordinates of the first two points
in this equation,
$$y_1 = c_0 + c_1(x_0 + h) + c_2(x_0 + h)^2 + \ldots + c_n(x_0 + h)^n$$
and
$$y_0 = c_0 + c_1 x_0 + c_2 x_0^2 + \ldots + c_n x_0^n.$$
Hence
$$y_1 - y_0 = c_1 h + c_2(2x_0 h + h^2) + \ldots + c_n(n x_0^{n-1} h + \ldots).$$
Now this result, which we call the 1st difference between the y's,
is of (n-1)th degree in $x_0$, so that by subtracting two of the y's
we have reduced the degree in $x_0$ by 1. Similarly,
$$y_2 - y_1 = c_1 h + c_2(2x_0 h + 3h^2) + \ldots + c_n(n x_0^{n-1} h + \ldots).$$
Thus we get a series of 1st differences, each with the highest
term of the (n-1)th degree in $x_0$. Treating them as a series of new
ordinates and forming their differences in the same way, we get
what may be called the 2nd differences between the y's, a series
of ordinates each with the highest term of degree (n-2) in $x_0$.
Proceeding in this way the 3rd differences between the y's are a
series of ordinates of degree (n-3) in $x_0$, the 4th differences are of
degree (n-4), and so on, until ultimately we reach the nth differences,
which are of zero degree in $x_0$, and consequently involve only h.
It follows that the nth differences must all be equal in value and
therefore, if we go one step further and write down the (n+1)th
differences, these must vanish altogether.
[* The non-mathematical reader will do well to omit the rest of this section on
interpolation.]
If the reader finds any difficulty in following the argument
he should test it step by step for himself in the simple case of a
parabola of the third order when it should be perfectly clear.
The formation of the successive differences is conveniently shown
in Table (23).
Table (23). Successive Differences of Ordinates.

Ordinates, y:
    y_0,  y_1,  y_2,  y_3,  y_4,  y_5
First differences, Δ:
    y_1 - y_0,  y_2 - y_1,  y_3 - y_2,  y_4 - y_3,  y_5 - y_4
Second differences, Δ²:
    y_2 - 2y_1 + y_0,  y_3 - 2y_2 + y_1,  y_4 - 2y_3 + y_2,  y_5 - 2y_4 + y_3
Third differences, Δ³:
    y_3 - 3y_2 + 3y_1 - y_0,  y_4 - 3y_3 + 3y_2 - y_1,  y_5 - 3y_4 + 3y_3 - y_2
Fourth differences, Δ⁴:
    y_4 - 4y_3 + 6y_2 - 4y_1 + y_0,  y_5 - 4y_4 + 6y_3 - 4y_2 + y_1
Fifth difference, Δ⁵:
    y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0
The law of formation should be apparent from this table, for it
is precisely that which we meet in the binomial expansion, e.g. the
nth difference is of type
$$y_n - n\,y_{n-1} + \frac{n(n-1)}{1 \cdot 2}\,y_{n-2} - \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\,y_{n-3} + \ldots + (-1)^n y_0,$$
and by equating to zero the (n+1)th difference we have the relation
required between the y's.
Example. Let us apply this method to the 'Poor Relief' example
already considered. Since there are four known points the relation
between x and y must be of the form
$$y = c_0 + c_1 x + c_2 x^2 + c_3 x^3$$
as before. Hence the 4th differences must vanish, and taking the
points in order from years 1835 to 1855 as $(x_0, y_0), (x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)$, we get
$$y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0$$
as the formula connecting five y's, four known and one ($y_2$) unknown.
Therefore
$$6y_2 = 4(y_1 + y_3) - (y_0 + y_4)$$
$$= 4(4577 + 5395) - (5526 + 5890),$$
which is equivalent to equation (4) obtained by the first method.
Thus $y_2$ = £4,745,000.
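The vanishing of the higher differences may be tested numerically. The sketch below uses illustrative names (`difference_table`, `missing_middle`); it first checks the claim on the cubic y = x³ and then recovers the missing Poor Relief figure from the vanishing fourth difference.

```python
from fractions import Fraction

def difference_table(ys):
    """Row 0 holds the y's; row k holds the kth differences."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def missing_middle(y0, y1, y3, y4):
    """Solve y4 - 4*y3 + 6*y2 - 4*y1 + y0 = 0 for the unknown y2."""
    return Fraction(4 * (y1 + y3) - (y0 + y4), 6)

# For a parabola of the 3rd order the 3rd differences are all equal
# and the 4th differences vanish.
cubes = [x ** 3 for x in range(6)]
table = difference_table(cubes)
print(table[3])  # [6, 6, 6]
print(table[4])  # [0, 0]

y2 = missing_middle(5526, 4577, 5395, 5890)
print(float(y2))  # about 4745.33, i.e. £4,745,000 to the nearest thousand
```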
Third Method (by means of advancing differences). In the last
method we employed a relation connecting $y_n$ with all the preceding
y's, but it is possible also to express $y_n$ in terms of $y_0$ and the successive
differences, which may be written $\Delta_0, \Delta_0^2, \Delta_0^3, \ldots, \Delta_0^n$;
we have, in fact, with the notation of Table (23):
$$\Delta_0 = y_1 - y_0,\quad \Delta_0^2 = y_2 - 2y_1 + y_0,\quad \Delta_0^3 = y_3 - 3y_2 + 3y_1 - y_0,\ \ldots$$
Thus
$$y_1 = y_0 + \Delta_0,$$
$$y_2 = 2y_1 - y_0 + \Delta_0^2 = y_0 + 2\Delta_0 + \Delta_0^2,$$
$$y_3 = 3y_2 - 3y_1 + y_0 + \Delta_0^3$$
$$= 3(y_0 + 2\Delta_0 + \Delta_0^2) - 3(y_0 + \Delta_0) + y_0 + \Delta_0^3$$
$$= y_0 + 3\Delta_0 + 3\Delta_0^2 + \Delta_0^3,$$
$$y_4 = 4y_3 - 6y_2 + 4y_1 - y_0 + \Delta_0^4$$
$$= 4(y_0 + 3\Delta_0 + 3\Delta_0^2 + \Delta_0^3) - 6(y_0 + 2\Delta_0 + \Delta_0^2) + 4(y_0 + \Delta_0) - y_0 + \Delta_0^4$$
$$= y_0 + 4\Delta_0 + 6\Delta_0^2 + 4\Delta_0^3 + \Delta_0^4.$$
Here again the law of formation is clear, and it is readily established
by induction that, for all positive integral values of n,
$$y_n = y_0 + n\Delta_0 + \frac{n(n-1)}{1 \cdot 2}\Delta_0^2 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\Delta_0^3 + \ldots \qquad (5)$$
a series which automatically comes to an end at the term $\Delta_0^n$.
An extension of this formula is obtained by writing $\theta$ in place
of n, where $0 < \theta < 1$. We then get
$$y_\theta = y_0 + \theta\Delta_0 + \frac{\theta(\theta-1)}{1 \cdot 2}\Delta_0^2 + \frac{\theta(\theta-1)(\theta-2)}{1 \cdot 2 \cdot 3}\Delta_0^3 + \ldots \qquad (6)$$
which enables us to interpolate for a y in between any two of a series
of y's corresponding to x's advancing by equal steps. This relation
is no longer identically true as was (5), for the series on the right
in (6) is unending, but its application in practice is justified when,
as the differences advance, the numbers obtained tend to grow
smaller and smaller, so that the remainder after a certain number
of terms can be treated as negligible. Unless this tendency is
realized without carrying the differences far the formula is not
very satisfactory.
To illustrate the method of procedure the following figures may
be used from Table (7), p. 25:

Table (24). Marks obtained by certain Candidates
in an Examination

No. of Marks         No. of          First        Second       Third
                     Candidates      difference   difference   difference
                     y               Δ            Δ²           Δ³
Not more than 45     447
                                      37
Not more than 50     484                          -16
                                      21                         1
Not more than 55     505                          -15
                                       6                        12
Not more than 60     511                           -3
                                       3
Not more than 65     514
Suppose now we wish to know the number of candidates who
obtained a number of marks not more than 48. In that case, in
applying formula (6), we have
$$y_0 = 447,\quad \theta = (48 - 45)/(50 - 45) = 3/5,$$
$$\Delta_0 = 37,\quad \Delta_0^2 = -16,\quad \Delta_0^3 = 1,$$
and hence, up to this order of differences, the required number of
candidates is given by
$$447 + \tfrac{3}{5}(37) + \frac{\tfrac{3}{5}(\tfrac{3}{5} - 1)}{1 \cdot 2}(-16) + \frac{\tfrac{3}{5}(\tfrac{3}{5} - 1)(\tfrac{3}{5} - 2)}{1 \cdot 2 \cdot 3}(1)$$
$$= 447 + 22.2 + 1.92 + 0.06$$
$$= 471, \text{ approximately.}$$
Also, number of candidates obtaining more than 48 marks, but not
more than 50
$$= 484 - 471$$
$$= 13, \text{ approximately.}$$
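Formula (6) lends itself to a short routine. The following is a sketch under stated assumptions: the function names and the `order` argument, which fixes how many differences are carried, are illustrative and not part of the text. Exact fractions are used so that each term can be compared with the working above.

```python
from fractions import Fraction

def leading_differences(ys):
    """Return [y0, first difference, second difference, ...] at the head of the table."""
    leads, row = [], list(ys)
    while row:
        leads.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leads

def advancing_differences(ys, theta, order):
    """Sum the series of formula (6) up to the difference of the given order."""
    leads = leading_differences(ys)
    total, coeff = Fraction(0), Fraction(1)
    for k in range(order + 1):
        total += coeff * leads[k]
        # Next coefficient: theta(theta-1)...(theta-k) / (1.2...(k+1)).
        coeff *= (Fraction(theta) - k) / (k + 1)
    return total

candidates = [447, 484, 505, 511, 514]   # not more than 45, 50, 55, 60, 65 marks
estimate = advancing_differences(candidates, Fraction(3, 5), order=3)
print(float(estimate))   # 471.176
print(round(estimate))   # 471
```

With a whole-number value such as theta = 2 the series ends of its own accord and returns the tabulated 505, in keeping with formula (5).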
Fourth Method (by means of Lagrange's Formula). We shall
consider one more formula, due to the famous French mathematician
Lagrange (1736-1813), which is useful when the recorded y's correspond
to x's which advance by unequal stages.
Let the given statistics be represented as before by
$$(x_0, y_0),\ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n),$$
and consider the equation
$$y = y_0\,\frac{(x - x_1)(x - x_2) \ldots (x - x_n)}{(x_0 - x_1)(x_0 - x_2) \ldots (x_0 - x_n)}
+ y_1\,\frac{(x - x_0)(x - x_2) \ldots (x - x_n)}{(x_1 - x_0)(x_1 - x_2) \ldots (x_1 - x_n)}
+ \ldots
+ y_n\,\frac{(x - x_0)(x - x_1) \ldots (x - x_{n-1})}{(x_n - x_0)(x_n - x_1) \ldots (x_n - x_{n-1})}. \qquad (7)$$
It is of the nth degree in x, and it is identically satisfied by the
(n+1) pairs of values
$$(x = x_0,\ y = y_0),\ (x = x_1,\ y = y_1),\ \ldots,\ (x = x_n,\ y = y_n).$$
It will therefore clearly serve as the fitting curve
$$y = c_0 + c_1 x + c_2 x^2 + \ldots + c_n x^n,$$
being exactly of this type, and in order to get the y corresponding
to any other x we have only to substitute that value of x in (7).
Example. The following figures, based upon data from Porter's
The Progress of the Nation, show the age distribution of criminals
in the year 1842.

Percentage of criminals up to age 25 = 52.0 (y_0).
Percentage of criminals up to age 30 = 67.3 (y_1).
Percentage of criminals up to age 40 = 84.1 (y_2).
Percentage of criminals up to age 50 = 92.4 (y_3).

Let us employ Lagrange's formula to find the approximate
percentage of criminals up to 35 years of age, making use of the
four ordinates given, and taking x = 35. We have
$$y = 52.0\,\frac{(35-30)(35-40)(35-50)}{(25-30)(25-40)(25-50)} + 67.3\,\frac{(35-25)(35-40)(35-50)}{(30-25)(30-40)(30-50)}$$
$$\quad + 84.1\,\frac{(35-25)(35-30)(35-50)}{(40-25)(40-30)(40-50)} + 92.4\,\frac{(35-25)(35-30)(35-40)}{(50-25)(50-30)(50-40)}$$
$$= -10.4 + 50.475 + 42.05 - 4.62$$
$$= 77.5.$$
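Lagrange's formula can be written as a pair of nested loops. The following is a minimal sketch; the function name `lagrange` is an illustrative choice, and the data are the four ordinates of the example.

```python
def lagrange(points, x):
    """Evaluate at x the parabola through the given (x_i, y_i) points, as in formula (7)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

criminals = [(25, 52.0), (30, 67.3), (40, 84.1), (50, 92.4)]
estimate = lagrange(criminals, 35)
print(round(estimate, 1))  # 77.5
```

Because the x's enter only through their differences, unequal stages (here 5, 10 and 10 years) need no special treatment.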
Reasoning made Clear with the Help of Graphs or Curves. The
graphical method not only produces an instructive picture of a
scheme of observations, but it may also be used effectively on
occasion to pilot one through the intricacies of economic or similar
argument. The eye is a very ready pupil and is quick to pass on
what it sees to the mind; it acts, that is to say, as an ally to the
understanding, which might get on without it, but which certainly
gets on faster with it.
To illustrate this we shall consider the first principles of an
interesting class of curves relating to supply and demand.*
Curve of Demand. Conceive a smoker who buys cigarettes at
the rate of x per day, and pays for them at the rate of y pence each.
Altogether they cost him therefore a sum of xy pence per day,
which is conveniently measured by the rectangle OABC in fig. (13).
Notice that the cost price of each single cigarette is here represented
by the area (y × 1), while the total expenditure is represented by the
area (y × x).
[Fig. (13). Horizontal axis: number of cigarettes bought.]
Now let us suppose his country is at war and that the smoker,
to put himself in a position to discourage luxuries, decides to give
up smoking. Let us try to measure in terms of pence the
cost of this great sacrifice to him on the first day.
The first cigarette is probably the hardest to do without, and
the desire for it is so strong that, if it were a mere matter
of money and not of patriotism, he would be willing to give as
many pence as are represented, say, by the rectangle 11 in
fig. (14) in order to have it to smoke. If he went on to bargain
with himself in imagination, he would not be ready to offer quite
so much for the satisfaction of a second smoke soon after the
first: he would perhaps only give a number of pence represented
by the rectangle 22 in the figure for this second cigarette. And
if it came to a third he would offer less still, only '33' pence
perhaps, for the fourth '44' pence, and so on. The rectangles
here are of varying height, but each stands on a base of unit length.
Thus we find that the total sum he would be prepared to offer,
bargaining for cigarette after cigarette in this way, would be represented
by the sum of the rectangles 11, 22, 33 ... in fig. (14),
where the addition of each unit length along OX means one more
cigarette in imagination smoked, and a diminution of unit length
in an ordinate parallel to OY means a reduction of 1d. per cigarette
in the price the smoker would be prepared to pay.
[Fig. (14). Horizontal axis: number of cigarettes bought.]
[* A fuller account of these curves will be found in Cunynghame's Geometrical
Political Economy, where a rather more accurate interpretation of "surplus
value" is given, involving the introduction of subordinate curves. The
simplified statement here adopted seemed sufficient in an introductory course.
Marshall's Principles of Economics also contains many fascinating illustrations
of the use of such curves, mainly in footnotes.]
But if he fell a prey to his persistent craving and actually bought
a number of cigarettes represented by OA in the figure, each would
cost him in the ordinary way only a number of pence represented
by AB, say, i.e. area (AB × 1), and his total expenditure would thus
be measured by the area of the rectangle OABC. He would get
them, that is to say, for less than he would be prepared to give
rather than go without them. The difference, the area of the
rectangles making up the portion BCDE of fig. (14), represents the
measure in pence of surplus enjoyment which he would obtain free
of charge, or it represents the measure of free sacrifice he
makes if he is true to his patriotic principles.
Let us now take an example on a larger scale. Imagine a
small community of people, producers and consumers, buying
and selling among themselves. Some of them are
coal-owners and sell coal to the others in the open market,
where competition is supposed free and unrestricted in any way. This
last condition is emphasized, because it is seldom perfectly satisfied
in the real world of commerce.
Just as in the previous case we may represent the number of
cwts. of coal bought by a length OA measured along OX in fig. (15),
and the price actually paid in shillings per cwt. by the area of a
rectangle on unit base and of height OC along OY. Thus the
total cost to the consumers in shillings is measured by the area of
the rectangle OABC.
[Fig. (15). Horizontal axis: number of cwts. of coal bought.]
But here again we may picture the consumers during a coal
shortage, when, rather than go without the first cwt. of coal, some
one among them would be ready to offer for it as many shillings as
are represented by the rectangle 11 in fig. (15), and for the second
cwt. some one would be ready to offer ' 22 ' shillings, for the third
' 33 ' shillings, and so on. The demand for coal could thus be
measured in shillings by the sum of the rectangles 11, 22, 33
. . . and, if OA runs into thousands of units of coal, the lengths
01, 12, 23 . . . along OX, corresponding to additions of 1 cwt.
in the quantity bought, would in the limit be so small that the
sum of the rectangles would become practically equivalent to the
curvilinear area OAED in the figure, where DE is a curve drawn
through the summits of the rectangles, namely the curve of demand.
The consumers' surplus in this case would be measured in shillings
by the area BCDE, this being the difference between the measures
of the sum actually paid for the coal bought and the sum consumers
would have been willing to pay rather than go without it.
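The measurement of consumers' surplus by rectangles can be imitated with a few invented figures. This is a sketch only: the offers below are hypothetical numbers chosen for illustration, not data from the text, and `consumers_surplus` is an assumed name.

```python
def consumers_surplus(offers, price):
    """Sum that buyers would have offered, less the sum actually paid."""
    bought = [offer for offer in offers if offer >= price]
    willing_to_pay = sum(bought)          # sum of the rectangles 11, 22, 33, ...
    actually_paid = price * len(bought)   # the rectangle OABC
    return willing_to_pay - actually_paid

# Hypothetical offers, in shillings, for the 1st, 2nd, 3rd ... cwt.
hypothetical_offers = [10, 8, 6, 5, 4]
print(consumers_surplus(hypothetical_offers, price=4))  # 13
```

Here five cwts. are bought at 4s. each for 20s., against offers totalling 33s., leaving a surplus of 13s., the counterpart of the area BCDE.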
Curve of Supply. Now let us consider the question from the
point of view of the coal-owners. We shall assume that the average
cost of production per cwt. of coal increases steadily as the
number of cwts. produced increases; this would not be an
unreasonable assumption in most cases after passing a certain point,
since the richer coal measures known are likely to be mined
before the poorer ones, and the cost of mining near the surface
is bound to be less than when deep shafts have to be bored.
If, then, OA, fig. (16), represents the number of cwts. of coal
sold, and if the price in shillings per cwt. at which it is sold is denoted
by the area of a rectangle on unit base and of height OC
along OY, the total payment received by the coal-owners will be
measured in shillings by the area of the rectangle OABC.
But the cost of producing the first cwt. is perhaps measured
by the rectangle 11, that of producing the second cwt. by the
rectangle 22, the third by the rectangle 33, and so on, each rectangle
being drawn on unit base representing an advance of 1 cwt. (The
advance in the cost of production would not in reality be measured
by so much the cwt. of course, but the assumption is inaccurate
in degree only, not in principle, and, by making it, the argument
is rendered clearer.) Thus the actual cost of production is, in the
limit when OA is very large and divided up into relatively very
small parts, measured in shillings by the curvilinear area OAED,
where DE is a curve drawn through the summits of the rectangles,
namely, the curve of supply.
[Fig. (16). Horizontal axis: number of cwts. of coal sold.]
The difference, BCDE, between the areas OABC and OAED
represents what is known as producers' surplus, for it measures the
profit made by the owners in selling the coal at a higher price than
the cost price of production.
Now let us combine the curve of supply (S.C.) and the curve of
demand (D.C.) in the same figure, fig. (17). Their meeting point
P determines the number of cwts. of coal bought (x), and the selling
price in shillings per cwt. (y).
For it is clear that under normal conditions it would not be profitable
to coal producers to pass this point, because beyond it the demand
on the part of coal consumers measured in money is less than
the cost of production: they are not willing on the average to pay
so much as y s. per cwt. for it, and it costs more than y s. per cwt.
on the average to produce. If,
on the other hand, the amount of coal produced decreases below
x cwts., the greater this decrease the higher does the profit become
on the sale of it, because the greater is the difference between the
cost price and the selling price; hence, as profits become more
pronounced, recruits will be attracted into the coal-producing
business, and, if this goes on, deeper shafts will have to be bored
and poorer fields worked until profits begin to decrease again and
the supply once more approaches x cwts. Thus sooner or later
the production of coal and its market price will tend to the level
determined by the equilibrium point P where the supply and
demand curves meet.
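The equilibrium point can be found by calculation once the two curves are given formulae. The sketch below assumes, purely for illustration, that both curves are straight lines (demand price a - b·x falling, supply price c + d·x rising) with invented whole-number constants; the text itself makes no such assumption.

```python
from fractions import Fraction

def equilibrium(a, b, c, d):
    """Meeting point P of the demand line a - b*x and the supply line c + d*x."""
    x = Fraction(a - c, b + d)   # quantity at which the two prices agree
    y = a - b * x                # the common price at that quantity
    return x, y

# Hypothetical constants: demand price 20 - 2x, supply price 5 + x.
x, y = equilibrium(20, 2, 5, 1)
print(x, y)  # 5 10
```

Beyond x = 5, say at x = 6, the demand price (8) falls short of the cost of production (11), which is the reason given above why producers will not pass the point P.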
Endless varieties of problems may be discussed by altering the
conditions and observing the effect produced in the standard
diagram. Three examples will suffice to illustrate the method.
[Fig. (17). Horizontal axis: number of cwts. of coal bought or sold. S.C. = supply curve; D.C. = demand curve.]
1. Effect of a Change in Normal Demand. Here we suppose the
normal conditions of supply are unaltered — it costs just as much
as before to produce the same amount of the commodity in question ;
but a more eager demand on the part of consumers shows itself in a
readiness to purchase more at any given price than would have
been purchased under the old conditions : this may conceivably
be due to a general increase in the purchasing power of these consumers,
or it may be the result of a shortage of some other commodity
which causes this one to be more widely used, just as
margarine, for instance, has been known to take the place of butter;
whatever the reason may be, the effect is that the demand curve
now occupies a higher level throughout its length, D'.C. in place of
D.C. in the figures.
When we turn to the supply side of the question, there are three
stages which, although they shade into one another in practice, it
is well to separate clearly in theory: (1) the only supplies immediately
available are those actually in the hands of dealers; (2) to
meet the increased demand, and so earn for themselves increased
profits, manufacturers will speed up production, by working overtime,
etc., with the help possibly of any disengaged labour or
capital they may be able to secure, and the resulting extra supplies
will be available after a short time; (3) if the demand continues
unabated, manufacturers, by offering higher wages and interest,
will seek to attract fresh labour and capital from other engagements
into their business, and, by renewing their machinery and generally
improving their organization, they will produce on a larger and
relatively more economical scale. Moreover, other manufacturers,
seeing the profits to be earned, will be attracted into the same line
of business also, so that by this time the current available supplies
of the commodity may exceed very appreciably their old figure.
But all this happens only in the long run, and the economist has
always to bear this extremely important element of time carefully
in mind when he seeks to estimate the effects of any proposed
action.
[Figure: Decreasing Return. Points marked on the horizontal axis: N, N'.]
We assume then that the new demand remains long enough at
its higher level to allow for the gradual adjustment in this way of
supply to the changed conditions, and for the economic forces called
into play once again to arrive at a balance between them, most likely
at a new equilibrium point. The two figures illustrate the difference
in effect according as the production of the commodity is subject to
a decreasing or an increasing return, i.e. according as the cost of
production rises or falls when the amount produced is increased. In
both cases it will be noted that more of the commodity is produced
(ON' in place of ON) in answer to the keener demand, but the difference
is much greater in the second case than in the first. Also the price has
gone up in the first case, while in the second it has gone down,
the difference being measured by the change in PN.
[Figure: Increasing Return (axes OX, OY; quantities ON, ON').]
2. Effect of a Tax. If the tax is at the rate of so much per unit
(say 1s. per unit, if the price is measured in shillings) of the
commodity produced, this will raise the supply curve, S.C., bodily up
a distance of 1 unit into the position S'.C., fig. (18), because the
effect is the same as if 1s. were added to the cost of each unit in
production. The production will thus be diminished by N'N units, for
P' is the new equilibrium point; the selling price will be increased
by P'M s. per unit (by less, it should be noted, than P'Q or K'K, the
amount of the tax); producers' surplus, which is analogous to what
economists term rent, is diminished by (area KPL - area K'P'L') s.;
consumers' surplus is diminished by (area PLL'P') s.; finally, the tax
produces for the Treasury a number of shillings represented by a
rectangle with sides of length ON' and KK'.
[Fig. (18).]
[Fig. (19).]
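The effect of the tax can be checked numerically. The following is only a rough sketch: the two straight-line curves and the tax of 3s. per unit are hypothetical figures chosen for easy arithmetic, not data from the text.

```python
# Hypothetical straight-line curves (assumed for illustration only).
# Price is in shillings per unit, quantity in units.

def demand_price(q):
    """Price consumers will pay for the q-th unit (demand curve D.C.)."""
    return 20.0 - 0.5 * q

def supply_price(q, tax=0.0):
    """Price producers require for the q-th unit (S.C.), raised bodily by the tax."""
    return 2.0 + 1.0 * q + tax

def equilibrium(tax=0.0):
    """Solve 20 - 0.5q = 2 + q + tax for q, then read the price off D.C."""
    q = (18.0 - tax) / 1.5
    return q, demand_price(q)

tax = 3.0
q0, p0 = equilibrium()           # before the tax: point P
q1, p1 = equilibrium(tax=tax)    # after the tax: point P'

price_rise = p1 - p0             # rise in selling price
fall_in_output = q0 - q1         # the length N'N
revenue = tax * q1               # rectangle with sides ON' and KK'

print(q0, p0, q1, p1, price_rise, revenue)
```

With these assumed curves the output falls from 12 to 10 units and the selling price rises by 1s., which is less than the tax of 3s., the remainder falling on the producers.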
3. Effect of a Monopoly. A monopolist has the power to stop
production short of the true equilibrium point, so that ON' cwts.,
fig. (19), are produced in place of the ON cwts. which free competition
would demand. The selling price is thus raised by Q'S s. per
cwt.; producers' surplus is increased by (area KP'Q'M' - area
KPL) s.; while consumers' surplus is diminished by
(area PLD - area DM'Q') s.
A word of explanation is necessary before leaving the subject of
these supply and demand curves.
It is probable that the reader will
have questioned the possibility of
drawing such curves for any commodity
with sufficient accuracy to
be of any value, but it would be
enough as a rule to be able to estimate what would happen
if a slight variation occurred in price or in production, and such
an estimate may sometimes be made by actual trial : e.g. a good
practical farmer most likely knows nothing about supply and
demand curves as such, yet from past experience he has a pretty
shrewd notion as to how far it may be profitable to spend an extra
pound here in rearing calves and a pound less there in cultivating
crops, bearing in mind the prices which cattle and corn might be
expected to fetch. From his point of view the interest of the
curves, if he knew anything of them, would be centred in those
portions which correspond to normal conditions, i.e. somewhere in
the neighbourhood of the equilibrium point under the free play of
ordinary competition.
Their real value, however, as suggested at the beginning, does
not consist in the practical assistance which they afford to the producer
or consumer, by way of foretelling the actual measure of
consumption or production, so much as in the light they throw
upon general tendencies which are rather apt to be obscured if they
are ponderously presented with elaborate economic argument.
They make plain in a moment to the eye what can only be stated
in two or three pages of writing.
CHAPTER X
CORRELATION
One of the most important questions which can be discussed by
statistical methods is that of possible connection, or correlation, as
it is called, between two sets of phenomena. If some factor in
each can be isolated and measured numerically, our object is to
discover if the size of either is sympathetically affected when a
change occurs in the size of the other ; or, to put the matter in
another way, do large values of the one factor go with large values
of the other factor and small with small, or vice versa ? And, if
some mutual dependence of this kind exists, can an estimate of
its extent be made ?
Consider, for example, the factor or character of height in husband
and wife. Is there any connection between stature of husband (x)
and stature of wife (y) ? Do tall men tend on the average to wed
tall women, or do we find tall men choosing short women for wives
just about as often as they choose tall women? When correlation
exists we shall want some measure for it which will tell us
the amount of change or deviation from the average in either
character associated with a given change or deviation from the
average in the other.
In studying graphs we saw how
some hint of the existence of
correlation might be discovered,
but we wish now to go a little
more deeply into the subject.
The first step is to measure an
adequate number of pairs of values, x and y, of the characters
concerned in order to find what values are associated together,
and how frequently the same values are repeated. When this is
done we can draw up a table of double entry, see fig. (20), setting
out in rows and columns the frequencies observed. An examination
of Table (25), showing the variation of brain weight with age
in the case of 197 Bohemian women, will make clear what is meant.
The x's from $x_1$ to $x_p$, and likewise the y's from $y_1$ upwards, are
supposed to ascend in magnitude, and when, for example, the pair of values
$(x_2, y_3)$ is observed to be repeated nine times, the number 9 is placed
in the second column and third row of the table, so that the frequency
of each class is found recorded in the square proper to it: thus,
out of the sample in Table (25), there are 10 women between the
ages of 40 and 50 with brains weighing between 1300 and 1400
grams.
[Fig. (20): table of double entry, with columns headed $x_1, x_2, x_3, \ldots, x_p$ and rows headed $y_1, y_2, y_3, \ldots$]
Table (25). Variation of Brain Weight with Age in the
Case of certain Bohemian Women.
[Data from Biometrika, vol. iv. pp. 13 et seq., Variation and Correlation
in Brain Weight, by Raymond Pearl.]

                                 Age in years
  Brain weight        x1      x2      x3
  in grams          20-30   30-40   40-50   50-60   60-70   70-80   Totals
  y1  1000-1100        1       -       1       1       -       -       3
  y2  1100-1200        2       2       4       2       5       4      19
  y3  1200-1300       28       9       8      14      10       4      73
      1300-1400       26      14      10       6       5       4      65
      1400-1500       13       7       7       2       -       2      31
      1500-1600        2       3       -       1       -       -       6
  Totals              72      35      30      26      20      14     197
  Mean y            1325    1350    1310    1285    1250    1279
When each class interval, as in this table, includes a small range
of values, the x and y may, as an approximation, be taken as the
mid values of their class intervals: $y_3$ would be taken, for instance,
as 1250, though it really includes all values between 1200 and
1300 grams. Strictly in such cases each single observation is not,
geometrically speaking, located at a definite point, but lies somewhere
within a small area, though it is treated as if it had the values
x and y which apply to the centre point of the area. It is sometimes
possible to correct for this assumption by what is known as
Sheppard's adjustment, but we shall not concern ourselves with
the correction in the present discussion, so as to avoid complications,
because the difference made is not generally large.
The table, when drawn up, may immediately suggest some
intimate connection between x and y. It may indicate that as
x increases y also in general increases, or that y tends to fall in
value as x grows bigger. But a more refined analysis is necessary.
It would be instructive perhaps to travel along the row of x's, finding
what mean value of y is associated with $x_1$, what mean value
of y is associated with $x_2$, and so on. This would give a sounder
basis for judging whether, as x increased, y in general increased or
decreased as the case might be: for example, in Table (25) the
mean values of y associated with the several types of x are shown
in their proper columns at the foot of the table and clearly, as
x increases, y tends to decrease, apart from conflicting readings at
the beginning and end of the table, and the latter of these may not
be significant of any real difference in brain weight at the end of
life, for it is only based on fourteen observations ; generally, the
inference from this table would be that the weight of the brain
decreases as the age increases after maturity is once reached,
although, of course, it would be rash to make more than a tentative
statement with so small a sample at our disposal.
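The means at the foot of Table (25) can be reproduced in a few lines. This is a sketch only: blank cells of the table are entered as zero, each class is represented by its mid value, and the variable names are our own.

```python
# Frequencies from Table (25): one row per brain-weight class,
# one column per age class (20-30, 30-40, ..., 70-80). Blank cells are taken as 0.
midpoints = [1050, 1150, 1250, 1350, 1450, 1550]   # mid values of the y classes, grams
freq = [
    [1, 0, 1, 1, 0, 0],
    [2, 2, 4, 2, 5, 4],
    [28, 9, 8, 14, 10, 4],
    [26, 14, 10, 6, 5, 4],
    [13, 7, 7, 2, 0, 2],
    [2, 3, 0, 1, 0, 0],
]

def array_means(freq, midpoints):
    """Mean y of each column (array), weighting each mid value by its frequency."""
    means = []
    for j in range(len(freq[0])):
        total = sum(row[j] for row in freq)
        weighted = sum(row[j] * y for row, y in zip(freq, midpoints))
        means.append(weighted / total)
    return means

column_totals = [sum(row[j] for row in freq) for j in range(6)]
rounded_means = [round(m) for m in array_means(freq, midpoints)]
print(column_totals)
print(rounded_means)
```

The rounded means agree with the row "Mean y" of the table, and they fall from the second age class to the fifth, as remarked above.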
Let us suppose $\bar{y}_1$ to be the mean value of y associated with $x_1$,
$\bar{y}_2$ the mean value of y associated with $x_2$, $\bar{y}_3$ with $x_3$, and so on.
If these values $(x_1, \bar{y}_1)$, $(x_2, \bar{y}_2)$, $(x_3, \bar{y}_3)$, etc., are plotted, it is very
often found that they cluster more or less closely about a straight
line, see fig. (21), so that we are led to ask whether there is not
some line which will very fairly describe the run of the points;
the equation of such a line would be
$$y = mx + c,$$
and if m and c were known we could find from this equation the best
average value of y corresponding to any given x.
But, on reflection, $\bar{y}_1, \bar{y}_2, \bar{y}_3, \ldots$ are themselves only the best
y's corresponding to the particular values $x_1, x_2, x_3, \ldots$ of x, so
that the problem is really the same as that of finding the relation
$$y = mx + c,$$
based on all the observations, which will enable us to estimate the
best y corresponding to any given x.
[Fig. (21): mean brain weight associated with each age class, plotted against age in years.]
Now for any value $x_1$ of x the value of y given by this relation
is $(mx_1 + c)$, while by observation we may find more than one value
of y corresponding to the value $x_1$ of x. If $y_1$ be one such value
the difference between it and the value given by the above relation is
$$(mx_1 + c) - y_1.$$
This difference we may regard as the error made in estimating y
from the relation instead of taking the value given by observation,
which for the moment we think of as the true value. The best
relation will then clearly be the one which makes all such errors of
estimate as small as possible. But, algebraically, some of these
errors are positive, i.e. the value of y given by the relation is greater
than that given by observation, and some are negative, and it is
only their magnitudes that we wish to take into account. Accordingly
we follow the method used in finding the standard deviation
in order to get rid of the ambiguities of sign : we form, that is to
say, the sum of the squares of the errors, because the expression so
formed will clearly be least when each separate error is as small as
possible in absolute magnitude.
To find, then, the values of m and c which will make
$$(mx_1 + c - y_1)^2 + (mx_2 + c - y_2)^2 + \ldots + (mx_n + c - y_n)^2$$
a minimum (see Note 7 in the Appendix), where n is the total
number of pairs of observations.
The required values are given by differentiating, first with regard
to c treating m as constant, and then with regard to m treating c
as constant, putting each result equal to zero. Thus
$$(mx_1 + c - y_1) + \ldots + (mx_n + c - y_n) = 0,$$
$$x_1(mx_1 + c - y_1) + \ldots + x_n(mx_n + c - y_n) = 0.$$
Therefore
$$m(x_1 + \ldots + x_n) + nc - (y_1 + \ldots + y_n) = 0,$$
$$m(x_1^2 + \ldots + x_n^2) + c(x_1 + \ldots + x_n) - (x_1y_1 + \ldots + x_ny_n) = 0.$$
The first of these equations gives
$$m(n\bar{x}) + nc - (n\bar{y}) = 0,$$
i.e.
$$m\bar{x} + c - \bar{y} = 0,$$
where $\bar{x}$ is the mean of all the x's and $\bar{y}$ is the mean of all the y's,
and it expresses the fact that the line $y = mx + c$ passes through
the point $(\bar{x}, \bar{y})$.
This might have been expected, for, graphically, each pair of
observations $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3), \ldots$ corresponds to some point,
and if we look for the line $y = mx + c$ passing through the region
where they cluster most thickly together we should certainly expect
it to pass through their mean or centre of gravity $(\bar{x}, \bar{y})$. This
suggests how the values of m and c may be considerably simplified.
If we measure all the x's from $\bar{x}$, their mean, and all the y's from $\bar{y}$,
their mean, which is equivalent to taking the point $(\bar{x}, \bar{y})$ as origin
and replacing every x by its deviation $\xi$ from $\bar{x}$ and every y by
its deviation $\eta$ from $\bar{y}$, the first of the above relations is reduced
to c=0, and therefore the second becomes
$$m(\xi_1^2 + \ldots + \xi_n^2) - (\xi_1\eta_1 + \ldots + \xi_n\eta_n) = 0.$$
Hence
$$m = (\xi_1\eta_1 + \ldots + \xi_n\eta_n)/(\xi_1^2 + \ldots + \xi_n^2) = np/n\sigma_x^2 = p/\sigma_x^2,$$
where p is the mean of all the product pairs $\xi\eta$, and $\sigma_x$ is the standard
deviation of all the x's.
Thus the required equation for estimating the best $\eta$ corresponding
to any particular $\xi$ is
$$\eta = \frac{p}{\sigma_x^2}\,\xi,$$
whence
$$y - \bar{y} = \frac{p}{\sigma_x^2}(x - \bar{x}). \qquad (1)$$
The coefficient $p/\sigma_x^2$ in this equation evidently gives the deviation
in y from the mean $\bar{y}$ corresponding to unit deviation in x from
the mean $\bar{x}$, for when $(x - \bar{x}) = 1$, $(y - \bar{y}) = p/\sigma_x^2$. Hence the greater
this coefficient is, the greater will be the change in y resulting from,
or at all events coexistent with, unit change in x.
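The result m = p/σ_x², with the line passing through the mean, may be verified on a small set of figures. The five pairs below are invented purely for illustration, and the function name is our own; the sketch also checks that the two conditions obtained by differentiation are satisfied.

```python
def fit_line(xs, ys):
    """Least-squares line y = m x + c, using m = p / sigma_x^2 and c = y_bar - m x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # p: mean of the products of deviations from the means
    p = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # sigma_x^2: mean square deviation of the x's
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    m = p / var_x
    c = y_bar - m * x_bar   # the line passes through (x_bar, y_bar)
    return m, c

# Invented observations (an assumption, not data from the text)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, c = fit_line(xs, ys)

# The two conditions for a minimum: sum of errors and sum of x times error both vanish
sum_errors = sum(m * x + c - y for x, y in zip(xs, ys))
sum_x_errors = sum(x * (m * x + c - y) for x, y in zip(xs, ys))
print(m, c)
```

For these figures the mean is (3, 4), p = 1.2 and σ_x² = 2, so that m = 0.6 and c = 2.2.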
Thus $p/\sigma_x^2$ would seem to supply a not unreasonable measure of
the correlation between x and y. But there is something very
unsymmetrical about this result. Why should the correlation be
measured by $p/\sigma_x^2$ any more than by $p/\sigma_y^2$? In fact, we might
repeat the whole of the previous argument, interchanging x and y
throughout wherever they appear. In that case we should first
travel down the column of y's and calculate the mean values of x
associated with $y_1, y_2, y_3, \ldots$ respectively. This would give a set
of points $(\bar{x}_1, y_1)$, $(\bar{x}_2, y_2)$, $(\bar{x}_3, y_3), \ldots$, which, when plotted, would
perhaps lie approximately in a straight line. We should thus be
led to look for some relation
$$x = m'y + c'$$
which would enable us to estimate the best average x corresponding
to a y of given type, and, proceeding just as before, we should
ultimately obtain the equation
$$\xi = \frac{p}{\sigma_y^2}\,\eta,$$
or
$$(x - \bar{x}) = \frac{p}{\sigma_y^2}(y - \bar{y}), \qquad (2)$$
in which the coefficient $p/\sigma_y^2$ gives now the deviation in x from the
mean $\bar{x}$ corresponding to unit deviation in y from the mean $\bar{y}$.
Hence $p/\sigma_y^2$ has, seemingly, just as much claim as $p/\sigma_x^2$ to measure
the correlation between x and y. The one gives the change in x
corresponding to unit change in y: the other gives the change in y
corresponding to unit change in x; and the only reason why they
differ is because unit change in x does not mean the same thing as
unit change in y: their standards of changeableness or variability
are not equal. If then we could alter the scales of measurement
so that unit change in each were of the same magnitude, the two
coefficients obtained ought to become identical, and we should then
have a really satisfactory measure for the correlation required.
With this object let us examine the variability of the x's and
compare it with the variability of the y's. Now the total dispersion
of the different x's on either side of $\bar{x}$, the mean x, is conveniently
measured by $\sigma_x$, their standard deviation. And similarly the
dispersion of the y's on either side of $\bar{y}$, the mean y, is measured
by $\sigma_y$. The bigger $\sigma_x$ is, the greater is the variability of the x's,
and the bigger $\sigma_y$ is, the greater is the variability of the y's. Hence,
in equations (1) and (2), $(x - \bar{x})$ should be divided by $\sigma_x$ and $(y - \bar{y})$
by $\sigma_y$ if we want to work with the same unit of change or variability
in each case. The equations then become
$$\frac{y - \bar{y}}{\sigma_y} = \frac{p}{\sigma_x\sigma_y}\left(\frac{x - \bar{x}}{\sigma_x}\right)$$
and
$$\frac{x - \bar{x}}{\sigma_x} = \frac{p}{\sigma_x\sigma_y}\left(\frac{y - \bar{y}}{\sigma_y}\right).$$
Write $r = p/\sigma_x\sigma_y$; then r is taken to be the coefficient of correlation,
for it measures the change in either character corresponding to
unit change in the other when the units are made comparable.
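That the two coefficients do become identical when each variable is measured in units of its own standard deviation can be seen in a short calculation. The observations are the same invented pairs as might be used for any illustration, not data from the text, and the function name is our own.

```python
from math import sqrt

def moments(xs, ys):
    """Return p (mean product of deviations), sigma_x and sigma_y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    p = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return p, sigma_x, sigma_y

# Invented observations (an assumption for illustration)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
p, sigma_x, sigma_y = moments(xs, ys)

# Coefficients of equations (1) and (2) in the original units: they differ
coeff_y_on_x = p / sigma_x ** 2
coeff_x_on_y = p / sigma_y ** 2

# The same coefficients after dividing each deviation by its standard deviation
std_y_on_x = coeff_y_on_x * sigma_x / sigma_y
std_x_on_y = coeff_x_on_y * sigma_y / sigma_x

r = p / (sigma_x * sigma_y)
print(coeff_y_on_x, coeff_x_on_y, r)
```

Here the unstandardized coefficients are 0.6 and 1.0, while both standardized coefficients reduce to the single value r, about 0.77.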
The lines giving the best y for a given x and the best x for a
given y may now be written
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
and
$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}),$$
and they are called lines of regression. The term regression was
first used by Sir Francis Galton in a paper entitled Regression
towards Mediocrity in Hereditary Stature, though the root idea
is not by any means confined to characters affected by heredity :
it holds for any pair of correlated variables. Galton found that
if a number of tall fathers are selected and their heights measured,
the mean height being calculated, and if, further, the heights of the
sons of these fathers are measured, their mean height being likewise
calculated, the latter is not equal to the mean height of the
selected fathers, but is rather nearer the mean height of the population
as a whole. There is, that is to say, a regression or stepping
back of the variable towards the general average. Professor Karl
Pearson has remarked that ' in the existing state of our knowledge
the recognition that the true method of approaching the problem
of heredity is from the statistical side, and that the most we can
hope at present to do is to give the probable character of the offspring
of a given ancestry, is one of the great services of Francis Galton to
Biometry.'
The expressions $r\sigma_y/\sigma_x$ and $r\sigma_x/\sigma_y$ are called coefficients of regression,
and they register in the above particular case the amount of abnormality
to be expected in the height of the sons when the amount of
abnormality in the height of the fathers is known, and vice versa.
The regression of the sons' height, y, on the fathers' height, x, is,
in fact, defined as the ratio of the average deviation of the heights
of the sons from the mean height of all sons to the deviation of the
heights of the fathers from the mean height of all fathers, and hence
it may be written
$$\frac{y - \bar{y}}{x - \bar{x}} = r\frac{\sigma_y}{\sigma_x}.$$
To make the definition more general, instead of speaking merely in
terms of height, we refer to any row or column (for there is no
intrinsic difference between row and column) in a table like
Table (25) as an array of y's or of x's, and selecting a particular
type, say a particular value of x (like fathers of height x), we define
the regression of the corresponding array of y's (like heights of sons
of these fathers) on the type x to be the ratio of the average deviation
of the array of y's from the mean y to the deviation of the
selected type x from the mean x.
Example. To illustrate, let us take some figures due to Professor
Pearson and Dr. Alice Lee [Biometrika, vol. ii. pp. 357 et seq., On
the Laws of Inheritance in Man]. Suppose the mean stature of all
observed fathers, based on a sample of over 1000 observations
= 67.68 in., with S.D. = 2.70 in.
Also suppose the mean stature of all sons = 68.65 in., with S.D.
= 2.71 in., and that the correlation r between stature of father
and stature of son = 0.514.
The regression of son on father as regards stature is then given by
$$(y - 68.65) = (0.514)\frac{2.71}{2.70}(x - 67.68),$$
where x is the height of selected fathers and y the mean height of
their sons.
Hence
$$y = 0.516x + 33.73,$$
so that if we selected fathers of height 70 in., for example, the
mean height of their sons would not be 70 in., but
$$(0.516)(70) + 33.73 = 69.85 \text{ in.},$$
i.e. there is a regression towards the general mean, 68.65 in., of
all sons.
Also the coefficient of regression
$$r\frac{\sigma_y}{\sigma_x} = (0.514)(2.71)/(2.70) = 0.516.$$
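The arithmetic of this example is easily repeated. The sketch below uses only the figures quoted above (means 67.68 in. and 68.65 in., standard deviations 2.70 in. and 2.71 in., r = 0.514); the names are our own.

```python
# Figures quoted in the example (statures in inches)
mean_father, sd_father = 67.68, 2.70
mean_son, sd_son = 68.65, 2.71
r = 0.514

# Coefficient of regression of son on father: r * sigma_y / sigma_x
coefficient = r * sd_son / sd_father

def expected_son_height(father_height):
    """Mean height of sons of fathers of the given height, from the regression line."""
    return mean_son + coefficient * (father_height - mean_father)

intercept = mean_son - coefficient * mean_father
print(round(coefficient, 3), round(intercept, 2), round(expected_son_height(70), 2))
```

Fathers of 70 in. are 2.32 in. above the mean of fathers, while their sons average only about 1.20 in. above the mean of sons, which is the regression towards the general mean described above.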
It is not difficult to show that the greatest numerical value r
can in general take is unity, for consider the expression for the
sum of the squares of the differences between the observed deviations
of the y characters from their mean and the corresponding
deviations as deduced from the best fitting regression line,
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$$
If, with our previous notation, $\eta$ denote the observed deviation of
the one character y, associated with a particular deviation, $\xi$, of
the other character, x, then, since $(r\sigma_y/\sigma_x)\xi$ denotes the best value
given by the line, the sum of the squares of the differences between
these values
$$= \sum\left(\eta - r\frac{\sigma_y}{\sigma_x}\xi\right)^2 = \sum\eta^2 - 2r\frac{\sigma_y}{\sigma_x}\sum\xi\eta + r^2\frac{\sigma_y^2}{\sigma_x^2}\sum\xi^2$$
$$= n\sigma_y^2 - 2r\frac{\sigma_y}{\sigma_x}(nr\sigma_x\sigma_y) + r^2\frac{\sigma_y^2}{\sigma_x^2}(n\sigma_x^2) = n\sigma_y^2(1 - r^2).$$
Since the sum of a number of squared quantities cannot be negative,
it follows that $r^2$ cannot exceed 1, and hence r lies between -1
and +1.
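The identity on which this argument rests, namely that the sum of the squared differences equals nσ_y²(1 - r²), may be checked directly. The observations below are invented for the purpose and the names are our own.

```python
from math import sqrt

# Invented observations (an assumption for illustration)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
xi = [x - x_bar for x in xs]      # deviations of x from its mean
eta = [y - y_bar for y in ys]     # deviations of y from its mean

sigma_x = sqrt(sum(d * d for d in xi) / n)
sigma_y = sqrt(sum(d * d for d in eta) / n)
p = sum(a * b for a, b in zip(xi, eta)) / n
r = p / (sigma_x * sigma_y)

# Left side: sum of squared differences between observed and fitted deviations
sum_sq = sum((e - r * sigma_y / sigma_x * x) ** 2 for x, e in zip(xi, eta))
# Right side: n * sigma_y^2 * (1 - r^2)
closed_form = n * sigma_y ** 2 * (1 - r ** 2)
print(sum_sq, closed_form)
```

Both sides come to 2.4 for these figures, and since the left side is a sum of squares the right side can never be negative, whatever the data.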
Further, $n\sigma_y^2(1 - r^2)$ can only vanish if every one of the squared
quantities on the other side vanishes independently of the rest,
so that we only get $r = \pm 1$ when
$$\frac{\eta}{\sigma_y} = \pm\frac{\xi}{\sigma_x}.$$
In this case the deviation of the one character from its mean is
always exactly proportional to the deviation of the other character
from its mean, and the correlation is then said to be perfect, for
it is equivalent to causation. In perfect correlation a one-to-one
correspondence thus exists between the values of the two characters,
for to one value of either there corresponds one and only
one value of the other, and the standard deviation of the array
(measuring its variability) corresponding to any selected type
vanishes.
Zero correlation is at the opposite extreme where, no matter
what the type selected in the one character may be, the mean
value of the array in the second character is unaffected, because
the two characters are quite independent or uncorrelated; the
deviation of y from its mean bears no relation at all to the deviation
of x from its mean, and unit change in either is associated with no
particular change in the other, so that r must in this case be zero.
When r is negative, since $(y - \bar{y})/(x - \bar{x}) = r\sigma_y/\sigma_x$ and the $\sigma$'s are
necessarily positive, corresponding to any value of x above the
mean of all the x's the best value of $(y - \bar{y})$ is negative, that is, the
best value of y is below the mean of all the y's, and vice versa.
This means that in general high values of x would be associated
with low values of y, and vice versa.
If we take the mean as origin so that the regression lines become
$$y = r\frac{\sigma_y}{\sigma_x}\,x,$$
$$x = r\frac{\sigma_x}{\sigma_y}\,y,$$
these lines coincide with the axes when the correlation is zero,
and with one another when $r = \pm 1$ and the correlation is perfect,
fig. (22). Given two equally variable characters ($\sigma_x = \sigma_y$) and
perfect correlation, the regression lines coincide with one of
the bisectors of the angle formed by the axes.
[Fig. (22): Regression Lines when Correlation is perfect ($r = \pm 1$).]
It may be helpful to look back again now at the graphical view
of the argument leading up to the determination of the coefficient
of correlation. For successive values of x we calculated the means
of the several y's observed, these being presumably the best available
y's corresponding to the particular x's selected, and we assumed that,
when plotted, the points so obtained, $(x_1, \bar{y}_1)$, $(x_2, \bar{y}_2)$, $(x_3, \bar{y}_3), \ldots$,
lay roughly in a straight line. In the same way we calculated the
means of the several x's observed to correspond to particular y's
selected, and again we assumed that the resulting points, $(\bar{x}_1, y_1)$,
$(\bar{x}_2, y_2)$, $(\bar{x}_3, y_3), \ldots$ lay roughly in a straight line. These assumptions
are justified in very many cases, but when they fail recourse
must be had to other methods beyond the scope of this book. [See,
for example, Pearson's paper in Drapers' Company Research Memoirs,
Biometric Series ii., On the Theory of Skew Correlation and Non-linear
Regression, introducing the correlation ratio, $\eta$, which is
equal to r in the particular case when the regression is linear.]
Sometimes, again, although the observations are so scattered that
the assumption of a straight line to describe the best fit seems
somewhat wide of the mark, it may be justified on the ground that
no better graphical result would be given by using any other curve
in place of the line. Moreover the linear expression, $y = mx + c$,
is simple and may serve to give at all events the first two terms of
some more complex relation supplying an estimate for the most
probable y corresponding to a given x.
If we had plotted all the original pairs of observations, instead
of plotting certain x's and the mean y's associated with them, or
certain y's and the associated mean x's, the two lines of regression
would not have stood out so clearly: they would have lacked
definition, like an optical image which is not strictly in focus, but
there would have been a concentration of observations, as of
light, in the neighbourhood where the lines of regression intersect,
namely at $(\bar{x}, \bar{y})$, the mean of all the x's and
all the y's. When, however, the lines of regression lie close together
they become more clearly defined, all the observations being centred
then more nearly in one line, and the correlation tends towards
perfection. Such cases are frequent in Physics but rare, if found at
all, in that class of Statistics into which the element of human
impulse enters. When r is less than 1 the lines of regression, if the
regression is of linear type, will be inclined to one another at some
angle between 0 and 90 degrees.
If only a rough value of r, the correlation coefficient, is required,
that may be obtained by merely estimating the gradient of each
regression line and multiplying the results together, one measured
relative to the axis of x and the other relative to the axis of y,
for this product
= (regression of y on x)(regression of x on y)
$$= \left(r\frac{\sigma_y}{\sigma_x}\right)\left(r\frac{\sigma_x}{\sigma_y}\right) = r^2.$$
Such an estimate may also be useful, though it may not be very
dependable, when the complete distribution of characters is not
known, for either regression line can be drawn when any two points
on it are known and a single array of values of either character
corresponding to a given type of the other is sufficient to fix one
such point ; also the mean {x, y), if it were known, would at once
give a point common to both regression lines. When all the facts
are available, however, the method of calculation is to be preferred
to that of simply graphing the observations and their means, as there
is bound to be a certain amount of guesswork and consequent error
in deciding from a graph how the best regression lines run.
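The rough method just described amounts to taking the square root of the product of the two gradients, with the sign common to both. A minimal sketch follows; the function name is hypothetical, and the two gradients used to try it are those implied by the father and son figures quoted earlier (0.514 multiplied by 2.71/2.70 and by 2.70/2.71), rounded to three places.

```python
from math import copysign, sqrt

def rough_r(gradient_y_on_x, gradient_x_on_y):
    """Estimate r from the two regression gradients, whose product is r squared.

    The first gradient is measured relative to the axis of x, the second
    relative to the axis of y; both must carry the same sign as r.
    """
    product = gradient_y_on_x * gradient_x_on_y
    if product < 0:
        raise ValueError("the two gradients must have the same sign")
    return copysign(sqrt(product), gradient_y_on_x)

estimate = rough_r(0.516, 0.512)
print(round(estimate, 3))
```

Gradients read off a graph will rarely be known to three places, so the estimate obtained in practice is correspondingly rough.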
It is frequently convenient to refer the deviations of the given
variables to some point other than the mean $(\bar{x}, \bar{y})$ as origin, and,
when this is done, a correction must be applied to the resulting
value of r. We have already explained how, in such a case,
to correct for standard deviations, and, as $r = p/\sigma_x\sigma_y$, it only
remains to explain how to correct for p.
Now p is given by
$$np = \xi_1\eta_1 + \xi_2\eta_2 + \ldots + \xi_n\eta_n,$$
where the $\xi$'s and $\eta$'s denote deviations from $\bar{x}$ and $\bar{y}$ respectively.
Fig. (23) indicates the changes necessary in transferring from some
origin O to the mean G. The coordinates of P (representing a
typical observation) referred to O are (x, y) and referred to G are
$(\xi, \eta)$. Also the point G itself referred to O is $(\bar{x}, \bar{y})$. Thus
$$\xi = x - \bar{x}, \qquad \eta = y - \bar{y},$$
and np becomes
$$(x_1 - \bar{x})(y_1 - \bar{y}) + \ldots + (x_n - \bar{x})(y_n - \bar{y})$$
$$= (x_1y_1 - \bar{x}y_1 - \bar{y}x_1 + \bar{x}\bar{y}) + \ldots + (x_ny_n - \bar{x}y_n - \bar{y}x_n + \bar{x}\bar{y})$$
$$= (x_1y_1 + \ldots + x_ny_n) - \bar{x}(y_1 + \ldots + y_n) - \bar{y}(x_1 + \ldots + x_n) + n\bar{x}\bar{y}$$
$$= (x_1y_1 + \ldots + x_ny_n) - \bar{x}\cdot n\bar{y} - \bar{y}\cdot n\bar{x} + n\bar{x}\bar{y}$$
$$= \Sigma(xy) - n\bar{x}\bar{y},$$
where $\Sigma(xy)$ denotes the sum of expressions of the type xy.
Hence the corrected value of p
$$= (\Sigma(xy)/n) - \bar{x}\bar{y}.$$
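The correction for p can be tried on any small set of figures. In the sketch below the observations and the alternative origins are invented, and the function names are our own; the point is only that the corrected value is the same whatever origin is used.

```python
def p_about_mean(xs, ys):
    """Mean product of deviations measured from the mean (x_bar, y_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def p_corrected(xs, ys, origin=(0.0, 0.0)):
    """Sum(xy)/n - x_bar*y_bar, with x and y measured from an arbitrary origin O."""
    a, b = origin
    n = len(xs)
    u = [x - a for x in xs]          # x referred to O
    v = [y - b for y in ys]          # y referred to O
    u_bar = sum(u) / n               # coordinates of the mean G referred to O
    v_bar = sum(v) / n
    return sum(ui * vi for ui, vi in zip(u, v)) / n - u_bar * v_bar

# Invented observations (an assumption for illustration)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

direct = p_about_mean(xs, ys)
from_zero = p_corrected(xs, ys)
from_other = p_corrected(xs, ys, origin=(10.0, -2.0))
print(direct, from_zero, from_other)
```

All three values come to 1.2, so the origin may be chosen purely for convenience of arithmetic.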
[Fig. (23): axes OX, OY through the origin O and parallel axes through the mean G, with a typical point P.]
We proceed to a few applications of these results in the next
chapter.
[As early as 1846 a French physicist, Auguste Bravais, had conceived the
surface of error as a means of describing in space the path of a point whose x
and y coordinates are subject to errors which are not independent. It is
astonishing that although his work really embraces the fundamentals of the
theory of correlation as afterwards developed, it lay dormant for nearly forty
years until Sir Francis Galton introduced on graphical lines an improved notation
(Galton's function, or the coefficient of correlation) and gave practical
examples of its use.
A little later Edgeworth (1892), using Galton's function, independently
reached some of Bravais' results for the correlation of three variables, and
showed how they could be extended. Karl Pearson, in 1896, contributed
to the Royal Society Transactions a fundamental paper on the subject, with
special reference to the problem of heredity, drawing attention to the best
value of the correlation coefficient, and how it should be calculated. (See
Appendix, Note 11.) Yule, returning in the following year to Bravais' formulae,
showed their significance also in the case of skew correlation.
Pearson afterwards developed a method of determining the correlation of
characters not quantitatively measurable, and in a discussion of the general
theory of skew correlation in another paper he proposed a new function, the
correlation ratio, applicable to the case of nonlinear regression.]
CHAPTER XI
CORRELATION — EXAMPLES
Example (1). To find the correlation between Differences in Wholesale
Price Index Numbers and in the Marriage Rate from their corresponding
Nine-yearly Averages during the twenty years, 1889-1908,
using the data given on p. 77.
Table (26). Correlation between Differences in Wholesale
Prices and Marriage Rate from their respective Nine-yearly
Averages.

  (1)        (2)             (3)            (4)             (5)            (6)
  Year   Difference in    Square of    Difference in     Square of    Product of Nos.
         Prices from      No. in       Marriage-rate     No. in       in Col. (2) and
         9-yearly         Col. (2).    from 9-yearly     Col. (4).    Col. (4).
         Average.                      Average.
            (x)            (x^2)           (y)            (y^2)           (xy)
  1889     + 0.9            0.81           + 1               1           +  0.9
  1890     + 2.3            5.29           + 6              36           + 13.8
  1891     + 7.0           49.00           + 6              36           + 42.0
  1892     + 2.4            5.76           + 3               9           +  7.2
  1893     + 2.0            4.00           - 6              36           - 12.0
  1894     - 2.8            7.84           - 5              25           + 14.0
  1895     - 4.3           18.49           - 6              36           + 25.8
  1896     - 6.1           37.21           + 1               1           -  6.1
  1897     - 3.7           13.69           + 3               9           - 11.1
  1898     - 0.2            0.04           + 4              16           -  0.8
  1899     - 1.6            2.56           + 6              36           -  9.6
  1900     + 5.3           28.09           + 1               1           +  5.3
  1901     + 1.0            1.00            ..              ..              ..
  1902     - 0.5            0.25           + 1               1           -  0.5
  1903     - 1.4            1.96           - 1               1           +  1.4
  1904     - 1.3            1.69           - 3               9           +  3.9
  1905     - 2.4            5.76           - 2               4           +  4.8
  1906     - 0.5            0.25           + 3               9           -  1.5
  1907     + 3.2           10.24           + 6              36           + 19.2
  1908     - 1.8            3.24           - 2               4           +  3.6
  Totals  +24.1 -26.6     197.17         +41 -25           306        +141.9 -41.6
The arithmetic is comparatively simple in this case because
there is only one value of each variable corresponding to each year,
so that there is no weighting or grouping to complicate the analysis.
The variables x and y, between which we wish to find the correlation,
appear in col. (2) and col. (4) in Table (26), and the positive and
negative differences are separated from one another in each case
so as to make their summation easier.
Thus for the arithmetic mean of the numbers in col. (2), we have
$$\bar{x} = (+24.1 - 26.6)/20 = -0.125;$$
and for the mean of the numbers in col. (4), we have
$$\bar{y} = (+41 - 25)/20 = +0.8.$$
The straightforward procedure would now be to get the twenty
corresponding values of $\xi$ and $\eta$, the deviations of the twenty x's
in col. (2) and of the twenty y's in col. (4) from $\bar{x}$ and $\bar{y}$ respectively,
and, having found $\sigma_x$ and $\sigma_y$, we could immediately deduce r from
the formula
$$r = p/\sigma_x\sigma_y.$$
But it is simpler to measure the deviations from (0, 0) as origin
rather than from the mean (-0.125, +0.8), because $x^2$, $y^2$, and $xy$
involve fewer significant figures than would $\xi^2$, $\eta^2$, and $\xi\eta$, and,
of course, it will be necessary to correct for this at the end in the
usual way.
The mean square deviation of x referred to zero as origin
= 197.17/20, by col. (3).
Therefore,
$$\sigma_x^2 = 197.17/20 - (-0.125)^2 = 9.843,$$
$$\sigma_x = 3.14.$$
Again, the mean square deviation of y referred to zero as origin
= 306/20, by col. (5).
Therefore,
$$\sigma_y^2 = 306/20 - (0.8)^2 = 14.66,$$
$$\sigma_y = 3.83.$$
Also the corrected p
$$= \Sigma(xy)/n - \bar{x}\bar{y}$$
$$= 100.3/20 - (-0.125)(+0.8), \text{ by col. (6)}$$
$$= 5.015 + 0.100$$
$$= 5.115.$$
Hence
$$r = p/\sigma_x\sigma_y = 5.115/(3.14)(3.83) = 0.43.$$
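The whole calculation may be repeated mechanically from cols. (2) and (4) of Table (26). The sketch below measures deviations from (0, 0) and corrects at the end, as in the text; the blank entry for 1901 in col. (4) is taken as zero, and the variable names are our own.

```python
from math import sqrt

# Col. (2): differences in prices (x); col. (4): differences in marriage rate (y), 1889-1908
xs = [0.9, 2.3, 7.0, 2.4, 2.0, -2.8, -4.3, -6.1, -3.7, -0.2,
      -1.6, 5.3, 1.0, -0.5, -1.4, -1.3, -2.4, -0.5, 3.2, -1.8]
ys = [1, 6, 6, 3, -6, -5, -6, 1, 3, 4,
      6, 1, 0, 1, -1, -3, -2, 3, 6, -2]   # 1901 is blank in the table, taken as 0
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Mean squares and mean product referred to zero, then corrected to the mean
var_x = sum(x * x for x in xs) / n - x_bar ** 2
var_y = sum(y * y for y in ys) / n - y_bar ** 2
p = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar

sigma_x = sqrt(var_x)
sigma_y = sqrt(var_y)
r = p / (sigma_x * sigma_y)
print(round(x_bar, 3), round(y_bar, 1), round(sigma_x, 2), round(sigma_y, 2),
      round(p, 3), round(r, 2))
```

Carrying more decimal places than the text does not alter the result to two figures: r is still 0.43.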
It is necessary to be careful with the signs in forming the numbers
in col. (6), but otherwise the actual calculation should present no
difficulty.
The regression equation giving the best marriage rate difference,
Y, for a given wholesale price difference, X, from their respective
nine-yearly averages is
$$(Y - 0.8) = r\frac{\sigma_y}{\sigma_x}(X + 0.125) = (0.43)\frac{3.83}{3.14}(X + 0.125),$$
i.e.
$$Y = 0.52X + 0.86.$$
The regression equation giving the best wholesale price difference,
X, for a given marriage rate difference, Y, from their respective
nine-yearly averages is
$$(X + 0.125) = r\frac{\sigma_x}{\sigma_y}(Y - 0.8) = 0.35(Y - 0.8),$$
i.e.
$$X = 0.35Y - 0.40.$$
We noted that fig. (10), p. 80, suggested a closer correlation
between the two factors we have been considering during the
earlier years of the period 1875-1908 than during the later years.
It might be worth while as an exercise to see if this is borne out
by calculating r for the years 1875-1889, and comparing it with
the value found for the years 1889-1908.
Example (2). — To find the correlation between Overcrowding and
Infant Mortality in London Districts. [Data taken from London
Statistics, vol. 23, published by the London County Council.]
The figures are apparently based upon the Census Report of
1911. The numbers in col. (2), Table (27), show what percentage of
the total population occupying private houses in each district were
living in overcrowded conditions, any ordinary tenement which
has more than two occupants to a room, including bedrooms and
sitting-rooms, being defined as overcrowded. The numbers in
col. (5) show the infantile mortality in each district, that is, the
number of infants who died under one year out of every 1000
born, including both sexes.
For the sake of comparison these numbers have been plotted
together on the same graph sheet. The districts, arranged in
alphabetical order, were numbered from 1 to 29 so as to form a horizontal
scale corresponding to the scale of years in discussing prices
and marriages. The scale in this case is, of course, purely artificial,
and the only reason for joining up neighbouring points is that we are
better able by so doing to see whether or not high values of the one
variable go with high values of the other variable, and low with low.
In calculating the mean and standard deviation for overcrowding
we have measured deviations from 17.0 as origin, and in making the
same calculations for infant mortality we have measured deviations
from 125 as origin. It is convenient, therefore, to use the
point (17.0, 125) as origin in working out also the product deviation
sum, col. (8) of Table (27), instead of using the mean (17.86, 126).
Table (27). Correlation between Overcrowding and
Infant Mortality in London Districts (1911).

Col. (2): percentage of population overcrowded.
Col. (3): deviation of No. in col. (2) from 17.0 (x).
Col. (4): square of No. in col. (3).
Col. (5): infant mortality.
Col. (6): deviation of No. in col. (5) from 125 (y).
Col. (7): square of No. in col. (6).
Col. (8): product of Nos. in col. (3) and col. (6).

(1) District                (2)      (3)       (4)     (5)     (6)      (7)       (8)
 (1) Battersea             13.3     −3.7     13.69    124      −1        1      +3.7
 (2) Bermondsey            23.4     +6.4     40.96    156     +31      961    +198.4
 (3) Bethnal Green         33.2    +16.2    262.44    151     +26      676    +421.2
 (4) Camberwell            13.5     −3.5     12.25    109     −16      256     +56.0
 (5) Chelsea               14.9     −2.1      4.41    109     −16      256     +33.6
 (6) City of London        12.3     −4.7     22.09    124      −1        1      +4.7
 (7) Deptford              12.2     −4.8     23.04    142     +17      289     −81.6
 (8) Finsbury              39.8    +22.8    519.84    156     +31      961    +706.8
 (9) Fulham                14.6     −2.4      5.76    125       0        0       0
(10) Greenwich             12.1     −4.9     24.01    128      +3        9     −14.7
(11) Hackney               12.4     −4.6     21.16    119      −6       36     +27.6
(12) Hammersmith           14.2     −2.8      7.84    146     +21      441     −58.8
(13) Hampstead              7.1     −9.9     98.01     78     −47     2209    +465.3
(14) Holborn               25.6     +8.6     73.96    115     −10      100     −86.0
(15) Islington             20.0     +3.0      9.00    127      +2        4      +6.0
(16) Kensington            17.1     +0.1      0.01    133      +8       64      +0.8
(17) Lambeth               13.6     −3.4     11.56    123      −2        4      +6.8
(18) Lewisham               3.9    −13.1    171.61    104     −21      441    +275.1
(19) Paddington            16.2     −0.8      0.64    127      +2        4      −1.6
(20) Poplar                20.6     +3.6     12.96    157     +32     1024    +115.2
(21) St. Marylebone        20.7     +3.7     13.69    108     −17      289     −62.9
(22) St. Pancras           25.5     +8.5     72.25    112     −13      169    −110.5
(23) Shoreditch            36.6    +19.6    384.16    170     +45     2025    +882.0
(24) Southwark             25.8     +8.8     77.44    144     +19      361    +167.2
(25) Stepney               35.0    +18.0    324.00    144     +19      361    +342.0
(26) Stoke Newington        8.8     −8.2     67.24    102     −23      529    +188.6
(27) Wandsworth             6.3    −10.7    114.49    122      −3        9     +32.1
(28) Westminster           12.9     −4.1     16.81    103     −22      484     +90.2
(29) Woolwich               6.3    −10.7    114.49     97     −28      784    +299.6

Totals                            +119.3   2519.81           +256    12748   +4322.9
                                   −94.4                     −226             −416.1
For overcrowding,
    mean = 17 + 24.9/29 = 17.86 ;
    $\sigma_x = \sqrt{(2519.81/29) - (0.86)^2} = \sqrt{86.15} = 9.3$.
For infant mortality,
    mean = 125 + 30/29 = 126.03 ;
    $\sigma_y = \sqrt{(12748/29) - (1.03)^2} = \sqrt{438.5} = 20.9$.
Also p, referred to (17.0, 125), = (4322.9 − 416.1)/29 = 3907/29, and,
referred to the mean (17.86, 126.03), this becomes
    = 3907/29 − (0.86)(1.03)
    = 133.8.
Hence r = 133.8/((9.3)(20.9)) = 0.69,
so that the correlation between overcrowding and infant mortality
is fairly marked.
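The value of r can also be checked directly from the 29 pairs of figures, without any change of origin. The sketch below assumes the district figures as read from cols. (2) and (5) of Table (27), with decimal points restored; `pearson_r` is our own helper name.

```python
from math import sqrt

# Percentage of population overcrowded, districts (1) to (29).
overcrowding = [13.3, 23.4, 33.2, 13.5, 14.9, 12.3, 12.2, 39.8, 14.6, 12.1,
                12.4, 14.2, 7.1, 25.6, 20.0, 17.1, 13.6, 3.9, 16.2, 20.6,
                20.7, 25.5, 36.6, 25.8, 35.0, 8.8, 6.3, 12.9, 6.3]
# Infant deaths under one year per 1000 births, same order.
mortality = [124, 156, 151, 109, 109, 124, 142, 156, 125, 128,
             119, 146, 78, 115, 127, 133, 123, 104, 127, 157,
             108, 112, 170, 144, 144, 102, 122, 103, 97]

def pearson_r(xs, ys):
    """Coefficient of correlation from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(overcrowding, mortality)
print(round(r, 2))
```

The divisor n cancels between numerator and denominator, so the sums of deviations can be used as they stand.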
[Fig. (24): overcrowding and infant mortality plotted together; horizontal scale, numbers representing the various London districts.]
The regression equation giving the average infant mortality, Y,
for districts in which the extent of overcrowding, X, is known is
    $Y - 126.03 = r(\sigma_y/\sigma_x)(X - 17.86)$
                $= (0.69)(20.9/9.3)(X - 17.86)$,
i.e. Y = 1.55X + 98.4.
Similarly, the regression equation giving the average percentage
of overcrowding, X, for districts with a known amount of infant
mortality, Y, is
    $X - 17.86 = r(\sigma_x/\sigma_y)(Y - 126.03)$
               $= 0.31(Y - 126.03)$,
i.e. X = 0.31Y − 21.2.
Example (3). — The reader might apply the same method to the
determination of the correlation between Ratio of Indoor Paupers
and Ratio of Outdoor Paupers, each measured per 1000 of the estimated
Population in England and Wales, excluding casuals and
insane, during the years 1900–1914. The following are the statistics
required for the purpose : —
Table (28). Correlation between Ratio of Indoor and Ratio
of Outdoor Paupers, each measured per 1000 of the
Population.

Year     Indoor Paupers,     Outdoor Paupers,
         Rate per 1000       Rate per 1000
1900          5.9                 15.8
1901          5.8                 15.3
1902          6.0                 15.3
1903          6.2                 15.4
1904          6.3                 15.4
1905          6.6                 16.1
1906          6.8                 16.0
1907          6.8                 15.6
1908          6.8                 16.4
1909          7.1                 15.6
1910          7.2                 15.1
1911          7.2                 14.1
1912          6.9                 11.2
1913          6.7                 11.1
1914          6.4                 10.4
The coefficient of correlation in this case comes out negative
and equal to −0.15, but it is very small and probably not significant.
If it were, it would imply that as indoor pauperism diminishes
outdoor pauperism increases, and vice versa.
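A reader working Example (3) by machine rather than by hand might proceed as in the sketch below. The figures are those of Table (28) as they can be read from this copy, with decimal points restored, so a misread digit would shift the result a little; the code is meant to show the method and the sign and rough size of r, not to fix its last decimal.

```python
from math import sqrt

# Rates per 1000 of population, years 1900 to 1914 in order.
indoor = [5.9, 5.8, 6.0, 6.2, 6.3, 6.6, 6.8, 6.8,
          6.8, 7.1, 7.2, 7.2, 6.9, 6.7, 6.4]
outdoor = [15.8, 15.3, 15.3, 15.4, 15.4, 16.1, 16.0, 15.6,
           16.4, 15.6, 15.1, 14.1, 11.2, 11.1, 10.4]

def pearson_r(xs, ys):
    """Coefficient of correlation from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(indoor, outdoor)
print(round(r, 2))
```

With only fifteen pairs, a coefficient of this size is well within what chance alone could produce, which is the point made in the text.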
Example (4). — To find the correlation between the Number of
Cattle and the Number of Acres of Permanent Grassland in the Coal
Producing Counties of England (1915).
A Government Report was consulted giving the acreage under
crops and grass and the number of live stock in each petty sessional
division in the country, as returned on 4th June 1915, and the
counties included were those which appear in the coal-mining
reports published monthly in the Labour Gazette.
In each county the petty sessional divisions with the greatest
and the least numbers of cattle and of acres of grassland were
noted, the numbers being written down to the nearest 1000, and,
after a rough examination of the range of these variables from
county to county, suitable class intervals were chosen and a table
of double entry was drawn up, Table (29), with an empty square
ready for each possible pair of variables.
Table (29). Correlation between the Number of Cattle
and the Number of Acres of Permanent Grassland in
the Coal-Producing Counties of England (1915).

[Correlation table of double entry. Columns: total head of cattle, x (expressed to nearest thousand), in classes 0–5, 5–10, ..., 35–40. Rows: acres of permanent grassland, y (expressed to nearest thousand), in classes 0–5, 5–10, ..., 80–85. The entries in the individual squares are not legible in this copy; the marginal totals and means are as follows.]

Rows (y arrays):
Acres of grassland     Total     Mean x
     0–5                 15       2.50
     5–10                30       3.00
    10–15                48       4.37
    15–20                33       7.04
    20–25                30       8.33
    25–30                26       9.81
    30–35                31      12.02
    35–40                23      15.33
    40–45                 8      16.87
    45–50                10      19.00
    50–55                 9      20.83
    55–60                 5      26.50
    60–65                 1      27.50
    65–70                 1      27.50
    70–75                 1      27.50
    75–80                 3      32.5
    80–85                 2      20.0
    Total               276

Columns (x arrays):
Head of cattle         Total     Mean y
     0–5                 76       9.14
     5–10                97      20.13
    10–15                54      33.24
    15–20                24      43.33
    20–25                14      50.00
    25–30                 5      59.50
    30–35                 5      67.50
    35–40                 1      57.5
    Total               276
Each petty sessional division was then considered in turn and a
dot was inserted in the particular square applicable to it : e.g. a
petty sessional division with 42,000 acres of grassland and feeding
19,000 cattle would be represented by a dot in the square defined
by row (40–45) and col. (15–20) in Table (29) ; x was used to represent
the number of cattle and y the number of acres of grassland
in any division, each expressed to the nearest 1000 units. All the
dots were ultimately added in each square giving the frequency
for each corresponding pair of variables, and these frequencies were
recorded in the centres of the squares to which they applied : e.g.
the frequency of petty sessional divisions stocking 10 to 15 thousand
cattle and with 30 to 35 thousand acres under permanent grass
was 22. The total frequency for each row, i.e. each array of
selected y type, was also noted, in the column at the end of the
rows : e.g. altogether 31 petty sessional divisions were observed of
the type having 30 to 35 thousand acres of land under permanent
grass. Likewise the total frequency for each column, i.e. each
array of selected x type, was noted in the row at the foot of the
columns : e.g. altogether 54 divisions were observed of the type
stocking 10 to 15 thousand head of cattle.
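The dotting procedure just described amounts to counting how many pairs fall in each square. A minimal sketch follows; the four divisions listed are hypothetical, the function names are ours, and we assume (the text does not say) that a value lying exactly on a class boundary is placed in the upper class.

```python
from collections import Counter

WIDTH = 5  # class interval, in thousands

def class_of(value, width=WIDTH):
    """Return the (lower, upper) limits of the class containing value."""
    lower = int(value // width) * width
    return (lower, lower + width)

def correlation_table(pairs):
    """Count the (cattle, acres) pairs falling in each (row, column) square."""
    return Counter((class_of(acres), class_of(cattle)) for cattle, acres in pairs)

# Hypothetical divisions: (thousands of cattle, thousands of acres).
divisions = [(19, 42), (12, 33), (13, 31), (3, 4)]
table = correlation_table(divisions)

# Marginal totals: one for each row (y array) and each column (x array).
row_totals, col_totals = Counter(), Counter()
for (row, col), frequency in table.items():
    row_totals[row] += frequency
    col_totals[col] += frequency

print(table[((40, 45), (15, 20))], row_totals[(30, 35)], col_totals[(10, 15)])
```

The first hypothetical division reproduces the book's example: 42,000 acres and 19,000 cattle fall in row (40–45) and col. (15–20).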
It was possible now to treat each column separately and to
calculate the mean y's associated with different types of x, namely
$x_1, x_2, x_3, \ldots$, and the means so obtained were inserted in
the bottom row of Table (29) : e.g. when x lies between 20 and 25
thousand, the mean value of y is 50 thousand. The resulting
points $(x_1, \bar{y}_1), (x_2, \bar{y}_2), (x_3, \bar{y}_3), \ldots$, in the notation of Chapter x.,
are plotted together in fig. (25), and they are seen to lie approximately
in a straight line. The successive rows were treated in
precisely the same way and the mean x's calculated corresponding
to y's of different types, namely $y_1, y_2, y_3, \ldots$, the means
obtained being recorded in the extreme right-hand column of
Table (29) : e.g. when y lies between 45 and 50 thousand, the mean
value of x is 19 thousand. The resulting points $(\bar{x}_1, y_1), (\bar{x}_2, y_2),
(\bar{x}_3, y_3), \ldots$, are also plotted in fig. (25), and, excepting for values
which depend upon only one or two records, they too lie roughly
in a straight line which is not far from coinciding with the previous
one, so that we shall expect on calculation to get a high value for
the coefficient of correlation.
In order to calculate r we need first to find the mean and standard
deviation for each variable. For this let us take as origin the
point (12.5, 27.5). The essential details are shown immediately
below the relative Tables (30) and (31).
Table (30). Distribution of Petty Sessional Divisions according
to the Head of Cattle (expressed to nearest 1000) stocked.

   (1)              (2)            (3)             (4)               (5)
No. of Cattle    Deviation     No. of Petty     Product of        Product of
stocked (in      from 12.5     Sessional        Nos. in           Nos. in
thousands)       (class        Divisions        Cols. (2) & (3)   Cols. (2) & (4)
   (x)           units)
   0–5              −2             76             −152               304
   5–10             −1             97              −97                97
  10–15              0             54                0                 0
  15–20             +1             24              +24                24
  20–25             +2             14              +28                56
  25–30             +3              5              +15                45
  30–35             +4              5              +20                80
  35–40             +5              1               +5                25
  Totals                          276             −157               631

Mean number of cattle = 12.5 − (157/276) × 5 = 9.66, since $\bar{x} = -157/276$ class
units referred to 12.5 as origin ;
and $\sigma_x = 5\sqrt{631/276 - (157/276)^2} = 5\sqrt{1.963} = 7.00$.
[The numbers in col. (4) may be spoken of as the first moments
of the totals of x arrays and the numbers in col. (5) as the second
moments.]
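The mean and standard deviation beneath Table (30) follow from the first and second moments in class units. A short sketch, using the frequencies of cols. (2) and (3) and a helper name of our own choosing:

```python
from math import sqrt

def grouped_mean_sd(deviations, frequencies, origin, width):
    """Mean and S.D. of a grouped distribution, working in class units
    about an arbitrary origin and converting back at the end."""
    n = sum(frequencies)
    first = sum(d * f for d, f in zip(deviations, frequencies)) / n
    second = sum(d * d * f for d, f in zip(deviations, frequencies)) / n
    mean = origin + width * first
    sd = width * sqrt(second - first ** 2)
    return mean, sd

cattle_dev = [-2, -1, 0, 1, 2, 3, 4, 5]        # col. (2)
cattle_freq = [76, 97, 54, 24, 14, 5, 5, 1]    # col. (3)

mean_x, sd_x = grouped_mean_sd(cattle_dev, cattle_freq, origin=12.5, width=5)
print(round(mean_x, 2), round(sd_x, 2))
```

The same function applies unchanged to the acreage distribution, with 27.5 as origin and deviations running from −5 to +11.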
In order to calculate easily the product deviation with reference
to (12.5, 27.5) as origin, the value proper to each square was inserted
just above the frequency and the product of the deviation by the
frequency was inserted just below the frequency in different type of
print to prevent confusion : e.g. the row (50–55) is +5 class intervals
distant from the row (25–30) containing the origin, and the column
(20–25) is +2 class intervals distant from the column (10–15) containing
the origin ; hence, for the particular square defined by this
row and this column, the product deviation = 5 × 2 = 10 ; also
the frequency recorded in this square = 4, so that it supplies a
term 10 × 4 to the product deviation ; the numbers 10, 4, and 40
are therefore the numbers which appear in the square. It is necessary
to be careful with the signs ; if the product deviation is to
be positive, the separate deviations must be of like sign, both
positive or both negative : hence they must either be both above
or both below the numbers 12.5 and 27.5 respectively from which
they are measured. In this instance there are only two negative
terms among the product deviations in the whole table.
Table (31). Distribution of Petty Sessional Divisions according
to the Number of Acres of Land (expressed to
nearest 1000) under Permanent Grass.

   (1)              (2)            (3)             (4)               (5)
No. of Acres     Deviation     No. of Petty     Product of        Product of
under Grass      from 27.5     Sessional        Nos. in           Nos. in
(in thousands)   (class        Divisions        Cols. (2) & (3)   Cols. (2) & (4)
   (y)           units)
   0–5              −5             15              −75               375
   5–10             −4             30             −120               480
  10–15             −3             48             −144               432
  15–20             −2             33              −66               132
  20–25             −1             30              −30                30
  25–30              0             26                0                 0
  30–35             +1             31              +31                31
  35–40             +2             23              +46                92
  40–45             +3              8              +24                72
  45–50             +4             10              +40               160
  50–55             +5              9              +45               225
  55–60             +6              5              +30               180
  60–65             +7              1               +7                49
  65–70             +8              1               +8                64
  70–75             +9              1               +9                81
  75–80            +10              3              +30               300
  80–85            +11              2              +22               242
  Totals                          276             −143              2945

Mean number of acres = 27.5 − (143/276) × 5 = 24.91, since $\bar{y} = -143/276$
class units ; and $\sigma_y = 5\sqrt{2945/276 - (143/276)^2} = 5\sqrt{10.402} = 16.12$.
[The numbers in col. (4) are the first moments of the totals of y
arrays, and the numbers in col. (5) are the second moments.]
It is now a simple matter to sum the product deviation terms,
taking each column (or each row) in turn : e.g. the first column
gives
    150 + 216 + 180 + 12 = 558 ;
the second column gives
    12 + 54 + 60 + 25 − 6 − 2 = 143,
and so on ; and, summing these results together, we get
    558 + 143 + 76 + 126 + 96 + 160 + 30 = 1189.
But this is the sum of all the product deviations referred to
(12.5, 27.5) as origin. Transferring now to the mean, we have
    $p = 1189/276 - (-157/276)(-143/276)$
      = 4.013, expressed in class units.
Hence $r = p/(\sigma_x \sigma_y)$,
where $\sigma_x$ and $\sigma_y$ are also to be expressed in class units,
    $= 4.013/(\sqrt{1.963}\,\sqrt{10.402})$
    = 0.89,
a result not far from unity, so that the correlation is high.
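The whole calculation of r for a correlation table needs only six numbers: the total frequency, the first and second moments of each margin, and the product sum, all in class units. The following sketch uses the totals of Tables (30) and (31) and the product sum 1189; the variable names are ours.

```python
from math import sqrt

n = 276
sum_dx, sum_dx2 = -157, 631      # totals of cols. (4) and (5), Table (30)
sum_dy, sum_dy2 = -143, 2945     # totals of cols. (4) and (5), Table (31)
sum_dxdy = 1189                  # product deviations about (12.5, 27.5)

mean_dx, mean_dy = sum_dx / n, sum_dy / n

# Transfer each moment from the arbitrary origin to the mean.
var_x = sum_dx2 / n - mean_dx ** 2
var_y = sum_dy2 / n - mean_dy ** 2
p = sum_dxdy / n - mean_dx * mean_dy

r = p / sqrt(var_x * var_y)
print(round(var_x, 3), round(var_y, 3), round(p, 3), round(r, 2))
```

Because p and both standard deviations are kept in class units, the class width of 5 never enters the calculation of r.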
The regression of 'acreage of grassland' (Y) on 'head of cattle'
(X) is given by
    $(Y - 24.91) = r(\sigma_y/\sigma_x)(X - 9.66)$
                 $= (0.89)(16.12/7.00)(X - 9.66)$,
i.e. Y = 2.05X + 5.11.
The points representing the mean y's for x's of different types
should lie close to this line which is shown in fig. (25). This equation
enables us to predict the acreage under permanent grass to be
found on the average in petty sessional divisions with a given total
head of cattle in each. The words ' on the average,' to be tacitly
understood even if not stated in all such cases, are emphasised
because the prediction relates to the whole array of divisions of a
particular type, and as it only professes to give the mean or most
likely result it is not to be pronounced worthless if it fails in an
individual trial with a selected division.
Again, the regression of X on Y is given by
    $(X - 9.66) = r(\sigma_x/\sigma_y)(Y - 24.91)$,
i.e. X = 0.39Y + 0.05,
which tells us the total head of cattle (X) to be found on the average
in petty sessional divisions when the acreage under permanent
grass (Y) is known. This line is also drawn in fig. (25).
[Fig. (25): the two regression lines and the points representing the array means; horizontal axis X, Total Head of Cattle (expressed to nearest thousand); vertical axis Y.]

Example (5). — The data for this example are taken from an
exceedingly interesting Government Report on the Cost of Living
of the Working Classes (Report of an Inquiry by the Board of Trade
into Working Class Rents and Retail Prices, together with the Rates
of Wages in certain Occupations in Industrial Towns of the United
Kingdom in 1912 in continuation of a similar Inquiry in 1905,
Cd. 6955). Some further particulars concerning this Report will
be found on p. 281.
The towns included in the inquiry numbered 93, but in five
instances it was found desirable to consider closely adjacent municipalities
as single towns thus reducing the number of town-units
to 88, namely 72 in England, 10 in Scotland, and 6 in Ireland. In
the example which follows the three zones of London, middle,
inner, and outer, have been treated as separate towns, so making
the net number of town-units 90. This number is too small to
allow any real value to be attached to our results, but the fewness
of the observations makes them easier to deal with as an illustration
of method.
We begin as before by choosing convenient class intervals for
the two factors we propose to consider, namely, Increment of Unskilled
Wages and Increment of Rents (by increment in each case
is meant the percentage increase (+) or decrease (−) between
1905 and 1912), and then form a correlation table. In the last
example separate tables were drawn up to find means and S.D.'s,
but that was only done in order to keep the argument clear at its
first presentment : generally we may dispense with these additional
tables and show all the working in one (see Table (32)).
The increment of wages runs from (−2.5) per cent. to (+11.5)
per cent., so that, if we take (−0.5) as origin and a difference of
2 per cent. as unit, the classes run from (−1) to (+6), these numbers
being shown in different type in the table, but in the same compartments
as the others. In the fourth row from the bottom
are shown the total frequencies for x arrays from class (−1) to
class (+6), and in the row just below it these several frequencies
are shown multiplied by their corresponding deviations measured
from (−0.5) as origin in terms of the class unit ; the resulting
numbers give the first moments of the totals of x arrays. These
numbers, multiplied again by their corresponding deviations, give
the second moments of the totals of x arrays, and appear in the
last row but one of the table.
We deal in exactly the same way with increment of rents : a
percentage increment of (−1) is taken as origin from which deviations
are measured, a difference of 3 per cent. is taken as unit,
and the different classes then have deviations running from (−3)
to (+6). The totals of y arrays, the first moments, and the
second moments of these totals appear in the last three columns
on the right-hand side of Table (32).
To calculate the deviation products, numbers were inserted in
each square on the same principle as in the last example, and the
sums of these products for each x array, that is for each column,
are given in the bottom row of the table (1, 0, 14, 6, etc.), making
in all a total of 126.
Table (32). Correlation between Increment of Unskilled
Wages and Increment of Rents in certain Industrial
Towns of the United Kingdom.

[Correlation table. Columns: x = percentage increment of wages; rows: y = percentage increment of rents. The frequencies and deviation products in the individual squares are not legible in this copy; the marginal rows and columns are as follows.]

x = Percentage Increment of Wages
Class mid-value             −2.5   −0.5    1.5    3.5    5.5    7.5    9.5   11.5    Total
Deviation (class units)       −1      0     +1     +2     +3     +4     +5     +6
Totals of x arrays             2     45      8     12      7      7      8      1       90
1st moments of x arrays       −2      0      8     24     21     28     40      6      125
2nd moments of x arrays        2      0      8     48     63    112    200     36      469
Product sums                   1      0     14      6      9     52     50     −6      126

y = Percentage Increment of Rents
Class        Deviation       Totals of    1st moments    2nd moments
mid-value    (class units)   y arrays     of y arrays    of y arrays
  −10            −3              1            −3              9
   −7            −2              4            −8             16
   −4            −1             10           −10             10
   −1             0             30             0              0
    2            +1             18            18             18
    5            +2             11            22             44
    8            +3             11            33             99
   11            +4              3            12             48
   14            +5              1             5             25
   17            +6              1             6             36
  Totals                        90            75            305
The necessary calculations are as follows:
1. Mean $x = -0.5 + 2(125)/90 = 2.28$,
   $\sigma_x = 2\sqrt{469/90 - (125/90)^2} = 2\sqrt{26585}/90$.
2. Mean $y = -1 + 3(75)/90 = 1.50$,
   $\sigma_y = 3\sqrt{305/90 - (75/90)^2} = 3\sqrt{21825}/90$.
3. $p = 126/90 - (125/90)(75/90) = 1965/(90)^2$,
   expressed in class units.
Hence
   $r = p/(\sigma_x \sigma_y) = \frac{1965}{(90)^2} \times \frac{90}{\sqrt{26585}} \times \frac{90}{\sqrt{21825}} = 0.08$.
[Fig. (26): the two regression lines, intersecting at M (2.28, 1.5); horizontal axis x, Percentage Increment of Wages; vertical axis y.]

In substituting for $\sigma_x$ and $\sigma_y$ to find r we have omitted the factors
2 and 3 respectively, because the S.D.'s have to be expressed in
the same units as p. Alternatively, if we worked with a difference
of 1 per cent. as unit, instead of taking a difference of 2 per cent.
as unit for x deviations, and a difference of 3 per cent. as unit for
y deviations, each individual product of x and y deviations would
have to be multiplied by 2 × 3. Thus p would then be $6 \times 1965/(90)^2$,
and we should get the same result for r as before by taking $\sigma_x$
and $\sigma_y$ as in (1) and (2) above. In this case r is so small as to be
quite insignificant of any correlation between the two factors discussed,
and the regression lines should therefore be not far from
perpendicular to one another.
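That the choice of unit does not affect r can be verified numerically. The sketch below computes r twice from the marginal totals of Table (32): once in class units, and once with a difference of 1 per cent. as unit for both variables. The function and its arguments are our own illustration.

```python
from math import sqrt

n = 90
sum_dx, sum_dx2 = 125, 469    # 1st and 2nd moments of x arrays (class units)
sum_dy, sum_dy2 = 75, 305     # 1st and 2nd moments of y arrays (class units)
sum_dxdy = 126                # total product sum (class units)

def correlation(unit_x, unit_y):
    """r with deviations scaled by the given units (1 means class units)."""
    sd_x = unit_x * sqrt(sum_dx2 / n - (sum_dx / n) ** 2)
    sd_y = unit_y * sqrt(sum_dy2 / n - (sum_dy / n) ** 2)
    p = unit_x * unit_y * (sum_dxdy / n - (sum_dx / n) * (sum_dy / n))
    return p / (sd_x * sd_y)

r_class_units = correlation(1, 1)
r_per_cent = correlation(2, 3)   # class widths of 2 and 3 per cent.
print(round(r_class_units, 2), round(r_per_cent, 2))
```

The factor 2 × 3 multiplies p and the product of the two S.D.'s alike, so it cancels in the ratio.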
The regression of y on x, or the equation giving the most probable
y for a given type x, is
    $(y - 1.50) = r(\sigma_y/\sigma_x)(x - 2.28)$,
i.e. y = 0.11x + 1.25.
Similarly, the regression of x on y is
    x = 0.06y + 2.2.
To draw the first line we note that it passes through the points
(0, 1.25) and (5, 1.8) ; also the second line goes through the points
(2.2, 0) and (2.5, 5). The two lines intersect at M (2.28, 1.5), the
mean of the distribution. They are drawn together in fig. (26).
Table (33). Correlation between Unskilled Wages
and Rents in certain Industrial Towns of the
United Kingdom.

[Correlation table. Columns: x = Index Number for Wages of Unskilled Labour, class mid-values 45.5, 51.5, 57.5, 63.5, 69.5, 75.5, 81.5, 87.5, 93.5, 99.5. Rows: index number for rents, class mid-values 40.5, 48.5, 56.5, 64.5, 72.5, 80.5, 88.5, 96.5, 104.5, 112.5. The frequencies in the individual squares are not legible in this copy.]
Example (6). — Instead of discussing the Changes in Wages and
Rents between 1905 and 1912, it might be of interest to find the
correlation between index numbers representing Actual Wages and
Rents in October 1912, taken from the same Report. The necessary
data for this purpose appear in Table (33) showing the distribution
of frequency between the different classes : e.g. seven towns were
observed in which the index number for wages was between the
limits (79–84) and the index number for rents was between the
limits (53–60). The wages figures quoted in Table (33) refer only
to unskilled labour in the building trade ; the inquiry actually
embraced certain occupations in the building, engineering, and
printing trades, these having been selected as industries which are
found in most industrial towns, and in which the time rates of
wages are largely standardised.
Table (34). Correlation between Increment of Working
Class Prices and Increment of Working Class Rents
in certain Industrial Towns of the United Kingdom.

[Correlation table. Columns: x = Percentage Increment of Prices, class mid-values 7.5, 9.5, 11.5, 13.5, 15.5, 17.5, 19.5. Rows: percentage increment of rents, class mid-values −10, −7, −4, −1, 2, 5, 8, 11, 14, 17. The frequencies in the individual squares are not legible in this copy.]
The coefficient of correlation turns out to be 0.46, distinctly larger
than in the previous case. Also the lines of regression are :
(1) y = 0.47x + 21. (2) x = 0.45y + 56.
Example (7). — The Report also furnishes data for evaluating the
correlation between the Increment of Working Class Prices and
Increment of Working Class Rents, again meaning by increment the
percentage increase (+) or decrease (— ) between 1905 and 1912
(see Table (34)).
The correlation in this case is very small, being only 0.13. The
regression equations are :
(1) y = 0.22x − 1.5, (2) x = 0.07y + 13.
PART II
CHAPTER XII
INTRODUCTION TO PROBABILITY AND SAMPLING
Suppose we wish to know the average measurement of some organ
or character, e.g. length of forearm or weight or anything similar,
in a large population containing several thousand individuals. The
mean obtained by actual measurement, if it were practicable to
carry it out on so large a scale, would evidently depend to some
extent upon the sex, the race, the age, the social class, and so on,
of the individuals selected, and we shall accordingly assume our
population to be composed of individuals of the same race and sex,
at about the same age, taken from the same class, etc. ; it would be
impossible in practice no doubt to secure that all conditions should
be identically the same for all the individuals observed, but the
population may be as homogeneous as we care to make it in theory.
Now suppose that, instead of attempting to measure every single
individual, a random sample of 1000 from among the population
be taken and that the mean and variability of the measurements
for this sample be calculated, giving results $m_1$ and $\sigma_1$. With
these may be compared $m_2$ and $\sigma_2$, the results of measuring a second
sample of 1000 individuals, $m_3$ and $\sigma_3$, the results of a third sample,
and so on. It is extremely unlikely that the values obtained for
the m's in this way will equal one another, neither will the σ's
be equal ; but, if we have succeeded at the beginning in avoiding
all unbalanced influences when we tried to make the field of
observation as homogeneous as possible, the resulting m's and σ's
will only differ from the values of the mean and variability for the
whole population, assuming they could be measured, within a
comparatively small range.
Differences of this kind, which arise merely owing to the fact
that we are often obliged in practice, for lack of time or means, to
deal with a comparatively small sample instead of with the whole
population of which it forms a part, are said to be due to random
sampling. Granted that the samples themselves are adequate in
size (containing, say, from 500 to 1000 individuals each) an estimate
of differences to be expected between one and another can be
made, and unless the observed differences fall outside recognized
limits it is said that they are not significant of any difference other
than such as might quite well be accounted for by random sampling
alone.
In theory, then, we can imagine a large number of such random
samples selected, and by determining the S.D. of their means,
$m_1, m_2, m_3, \ldots$, we should have a fair measure of the deviation
which might quite well occur from the true value, that is, from the
mean of the population as a whole, through working only with a
sample. Further, a range of two or three times the S.D. on either
side of the true mean ought to take in the majority of the sample
means observed.
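The thought experiment of drawing many samples can be imitated by simulation. In the sketch below the population is artificial: 20,000 hypothetical forearm lengths generated from a normal law with mean 25 and S.D. 2 (figures chosen only for illustration), from which 200 random samples of 1000 are drawn.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so that the run can be repeated

# Hypothetical population of 20,000 measurements.
population = [random.gauss(25.0, 2.0) for _ in range(20000)]
true_mean = mean(population)

# Mean of each of 200 random samples of 1000 individuals.
sample_means = [mean(random.sample(population, 1000)) for _ in range(200)]
sd_of_means = pstdev(sample_means)

def share_within(k):
    """Proportion of sample means within k S.D.'s of the true mean."""
    inside = sum(abs(m - true_mean) <= k * sd_of_means for m in sample_means)
    return inside / len(sample_means)

print(round(sd_of_means, 3), share_within(2), share_within(3))
```

The S.D. of the sample means comes out far smaller than the S.D. of the individual measurements, and nearly all the sample means lie within two or three times that S.D. of the true mean.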
Exactly the same principle holds good in dealing with the proportion
of individuals in a given population which can be assigned
to a particular class, or in discussing the S.D. of the distribution, or
the C. of V., or a coefficient of correlation, or any other statistical
constant, no matter what the nature of the character may be which
is measured or observed, or whether it relates to animate or inanimate
objects. Take, for instance, the variability : by selecting
several samples from a given population we get a series of values
$\sigma_1, \sigma_2, \sigma_3, \ldots$, and in the S.D. of this distribution of variabilities
we have a measure to which we can compare the deviation of any
sample variability, $\sigma_r$, from the true variability of the whole population,
while a range two or three times the S.D. might be expected
to include the majority of the different variabilities met with in
the samples.
Although the S.D., as we have explained, provides quite a suitable
measure of the extent of deviation of a sample constant from
its true value in the population as a whole, in practice, owing to
the historical development of the theory having followed the track
of the normal curve of error [see Chapter xviii.] a measure known
as the probable error and equal roughly to two-thirds of the S.D.
is not seldom employed in its place. The main, if not the sole,
justification for retaining this measure is that it has established its
position by long usage, and in any case it is very easily deduced
from the S.D. by the relation
    p.e. = 0.6745 S.D.,
which follows at once from the normal curve and is only strictly
justified when the distribution is normal (see p. 246). Let it suffice
here that instead of simply using the S.D., as might now seem
the obvious course, some writers prefer to multiply the S.D. by a
certain fraction, in which there is no particular virtue except that
which arises through honourable descent, and to work with the
'probable error.'
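The factor 0.6745 can be recovered from the normal curve itself: the probable error is the deviation which is as likely to be exceeded as not, so that half of a normal distribution lies within it on either side of the mean. A brief sketch using the standard library's `statistics.NormalDist` (Python 3.8 or later):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Half the area lies within +/- p.e., so p.e. is the upper quartile.
pe_factor = standard_normal.inv_cdf(0.75)

# Check: the area between -p.e. and +p.e. should be one half.
area_within = standard_normal.cdf(pe_factor) - standard_normal.cdf(-pe_factor)
print(round(pe_factor, 4), round(area_within, 4))
```

For a distribution that is not normal the same multiple of the S.D. will not in general enclose half the observations, which is why the relation is strictly justified only in the normal case.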
Since we do not know how much weight to assign to any result
unless the magnitude of its p.e. is also given, results are frequently
stated in the following manner : in a study of the Variation and
Correlation in the Earthworm, by R. Pearl and W. N. Fuller [Biometrika,
vol. iv. pp. 213–229] :
    Mean length of worm = 19.171 ± 0.094 cms.,
    S.D. = 3.077 ± 0.067 cms.,
    C. of V. = 16.049 ± 0.356 per cent.,
meaning that the mean length of the worms measured was 19.171
cms., subject to a probable error of 0.094 cms. which might be in
excess or defect, in other words the mean length lay probably somewhere
between
    19.077 cms. and 19.265 cms. ;
similar remarks apply to the variability, absolute (S.D.) or relative
(C. of V.).
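The arithmetic behind such a statement is slight, but a sketch may make the convention plain. The figures are those quoted from Pearl and Fuller; the conversion of the probable error to the corresponding S.D. uses the relation p.e. = 0.6745 S.D. and is our own addition for illustration.

```python
mean_length = 19.171    # cms.
probable_error = 0.094  # cms.

# Limits within which the true mean is as likely as not to lie.
lower = mean_length - probable_error
upper = mean_length + probable_error

# The same uncertainty expressed as an S.D. of the sample mean.
sd_of_mean = probable_error / 0.6745

print(round(lower, 3), round(upper, 3), round(sd_of_mean, 3))
```

The coefficient of variation quoted is simply 100 × S.D./mean, and 100 × 3.077/19.171 does agree with the 16.049 per cent. given.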
When the standard deviation (p.e./0.6745) is used as the measure
of error due to simple sampling, the fact is generally recorded, and
it is sometimes spoken of as the standard error in that connection,
but, as it seems unnecessary to multiply names for ideas which are
not really new, only that they appear in a new setting, we shall
not employ the term.
It must be clearly understood that no outstanding and predictable
cause exists, by our hypothesis, for such differences as occur
in the statistical constants between one sample and another : they
are the resultant effect of a complex of forces which cannot be
properly traced, still less measured, apart from one another, and
which have been happily described as that 'mass of floating causes
generally known as chance.' Since therefore the forces coming
into play, under the ideal conditions formulated, are of the same
chance nature as those affecting the spin of a well-balanced coin
or the selection of a card from a smooth and well-shuffled pack,
it may be expected that the resulting distribution of means,
$m_1, m_2, m_3, \ldots$, of S.D.'s, $\sigma_1, \sigma_2, \sigma_3, \ldots$, and of all the other
constants will likewise be subject to the same laws of probability
which serve to describe within limits what happens in the case of
coin or card. It follows that some acquaintance with the first
elements of mathematical probability is essential if one is to understand
the theory of sampling, and a short digression must here
be made in order to introduce that subject. This will be found
to lead directly to a solution, under certain prescribed conditions,
in the simple case when the character observed is an attribute like
complexion, fair or dark, or like birth, male or female, which can
only fall into one of two definite classes and when every one observation
in the sample is independent of every other. In the more
general case where the character observed is capable of direct
measurement and may lie in magnitude anywhere along a scale
of values divided up into a number of different classes, it is not
so easy to determine the effect of random sampling, because it is
not possible, as it is in the previous case, actually to draw up a
frequency table describing in detail the character of the distribution
to be expected from theory in any given sample.
The idea contained in the word probability is one familiar to us
in our everyday talk, but if we seek to analyse it as used we find
it as elusive as the personality of the user. A remarks : 'Wars
will probably be stamped out, like duelling, in the course of time.'
B replies : 'No ! fighting will probably go on as long as the world
lasts ; you can't change human nature.' Now the amount of
credence we are prepared to give to each of these statements is
vague and uncertain until we know something about A and B
themselves and the value of their judgment, quite apart from the
influence of our own opinion upon the matter ; perhaps A is an
optimist or B is a pessimist, and in estimating the 'probably'
used by each we must allow for these facts. Probability, then, in
ordinary conversation, is something largely subjective : it has a
varying significance according to the person who uses the word
and, unless we could get rid of this personal element, it would be
hopeless to try and approach it along scientific lines.
Mathematical probability is unlike colloquial probability in that
all the uncertainty is taken out of it, or at least the uncertainty is
confined within defined limits. We shall only touch the fringe of
the subject in this book, and what we have to say may be best
introduced by considering some examples which may appear trivial,
but they possess the merit that no personal bias can enter into
their discussion to distort the results. The reader must not be
impatient at their artificial character : in many, if not in all,
branches of science, before tackling any particular problem as it
1S6 Statistics
actually exists, it is helpful to examine what can be deduced in a
simple case free from all complication, and, having settled that,
we try to see how the results are affected when we come to allow
one by one for the various complicating factors which exist. For
example, in Astronomy, the track of a planet in space may first be
found on the hypothesis that the sun alone is the compelling influence.
Then we may proceed to discuss how it is deflected from its path
when the gravitational influence of neighbouring planets also is
taken into account.
Let us start with an ordinary pack of playing cards, and, after
shuffling, turn up one card. Can we measure the probability that
this card shall be (1) the 7 of spades ? (2) some spade ?
Altogether there are 52 cards, and we will suppose that the
cards are so cut and so smooth that each of the 52 has an equal
chance of being turned up: for instance, there is to be no stickiness
or anything to help any particular card to evade us by sticking
fast to its neighbour. Now we are certain to turn up some card
and there are 52 different possibilities, each of them by hypothesis
equally probable. If, then, we agree to denote certainty by unity,
we must divide 1 into 52 equal parts and assign one part to each
card as the probability of its appearance.
1. The probability (or chance as it is sometimes called) of turning
up any stated card, such as the 7 of spades, is therefore 1 out of 52,
i.e. 1/52.
2. Again, since there are 13 spades in all, the chance of turning
up some spade is 13 out of 52, i.e. 13/52=1/4.
These results may be put in another way which is often useful.
If the experiment is repeated a great number of times, a return to
the initial conditions of the problem being made after each trial
by replacing the card drawn and reshuffling the pack, we should
expect to turn up the 7 of spades on the average about once in
every 52 experiments, and we should expect to turn up some spade
on the average about once in every 4 experiments. This must
not be taken to mean that in 4 experiments we are sure to turn
up just one spade (a trial will readily prove such a statement to
be untrue), but that, if we went on performing experiment after
experiment, we should in the long run get a proportion of about
1 spade to every 4 experiments and a trial will likewise prove the
truth of this statement.
Generally, when an event can happen in n different ways altogether,
and among these different ways there are a which give
what might be called successful events, the probability of success
at any single happening is a out of n, i.e. $a/n$, and is usually denoted
by the letter p, and the probability of failure is $(n-a)$ out of n,
i.e. $(n-a)/n$, and is usually denoted by the letter q.
Clearly $p+q=1$, and this is reasonable because we are certain
to get either a success or a failure at a single trial and unity was
fixed as the measure of certainty. In k trials, the probable number
of successes would be kp and of failures kq, because in n trials, on
the average, there are a, or np, successes and $(n-a)$, or nq, failures.
Example (1). In the second case considered above, the probability
of success (turning up a spade) is a out of n
$= a/n = 13/52 = 1/4 = p$,
and the probability of failure (not turning up a spade, i.e. turning
up one of 39 other cards) is $(n-a)$ out of n
$= (n-a)/n = 39/52 = 3/4 = q$.
And $p+q = 1/4 + 3/4 = 1$.
Example (2). — What is the chance of drawing either a picture
card or an ace from the pack at a single trial ?
Altogether there are 12 picture cards, and the chance of drawing
any one of them is thus 12 out of 52
= 12/52=3/13;
and the chance of drawing any one of the 4 aces is 4 out of 52
=4/52=1/13.
Hence the total probability required
=3/13+1/13=4/13.
Generally, if the probability of one type of event is $p_1$, and the
probability of a second type of event is $p_2$, and if either type is
reckoned a success, then the total probability of success is $(p_1+p_2)$.
This evidently holds good however many different types there
may be, and even if there is only one event of each type.
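The definition of p as a/n, and the addition rule of Example (2), can be checked by counting cards directly. The following is a minimal sketch in Python; the names `PACK` and `chance` are illustrative and not from the text, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

# A 52-card pack: 13 ranks in each of 4 suits (labels are illustrative).
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
PACK = list(product(RANKS, SUITS))

def chance(is_success):
    """Return a/n: the successful cards out of all equally likely cards."""
    a = sum(1 for card in PACK if is_success(card))
    return Fraction(a, len(PACK))

p_seven_of_spades = chance(lambda c: c == ("7", "spades"))
p_spade = chance(lambda c: c[1] == "spades")
q_spade = 1 - p_spade                      # probability of failure
p_picture = chance(lambda c: c[0] in {"J", "Q", "K"})
p_ace = chance(lambda c: c[0] == "A")
p_picture_or_ace = chance(lambda c: c[0] in {"J", "Q", "K", "A"})
```

Adding `p_picture` and `p_ace` gives the same fraction as counting the 16 favourable cards in one pass, which is the addition rule for two types of event that cannot occur together.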
Consider now the simultaneous happening of two events, one of
which can happen in n different ways, a among which are to be
regarded as successful, and the second can happen in n' different
ways, a' among which are to be regarded as successful. Further,
the two events are to be absolutely independent of one another
in the sense that neither is to influence the success or failure of
the other. What is the probability of a double success occurring ?
The total number of different combinations of the two events
possible is $nn'$, for any one of the n possible happenings for the
first event can be combined with any one of the $n'$ possible happenings
for the second event. Also the total number of different
combinations of two successes possible is $aa'$, for any one of the
a possible successes for the first event can be combined with any
one of the $a'$ possible successes for the second event. Hence,
according to our definition of probability, the probability of a double
success is $aa'$ out of $nn'$ $= aa'/nn' = (a/n)(a'/n')$.
Thus to get the probability of a double success for a combination
of two independent events we must multiply together the separate
probabilities for the success of each event taken by itself.
Similarly, in the above case, the probability of a double failure
$= (n-a)(n'-a')/nn'$; and the probability of one success and one
failure
$$= \frac{a}{n}\cdot\frac{n'-a'}{n'} + \frac{n-a}{n}\cdot\frac{a'}{n'},$$
for the first event can be a success and the second a failure or the
first a failure and the second a success.
Here, again, if we take all the different possibilities into account,
and add the probabilities corresponding to each case, we arrive
at certainty, the measure of which is unity, thus : —
probability of 2 successes $= aa'/nn'$,
probability of 1 success and 1 failure $= a(n'-a')/nn' + a'(n-a)/nn'$,
probability of 2 failures $= (n-a)(n'-a')/nn'$.
Therefore total probability, all cases,
$$= \frac{aa'}{nn'} + \frac{a(n'-a')}{nn'} + \frac{a'(n-a)}{nn'} + \frac{(n-a)(n'-a')}{nn'}$$
$$= (aa' + an' - aa' + a'n - a'a + nn' - na' - an' + aa')/nn'$$
$$= nn'/nn'$$
$$= 1.$$
Example. — Take two packs of cards. What is the probability
of drawing an ace from the first pack and a king, queen, or knave
from the second pack ?
Here a=4, n=52, a'=12, n'=52; hence the required probability
$= aa'/nn' = 4/52 \times 12/52 = 3/169 = 1/56\tfrac{1}{3}$.
Thus we might expect to succeed on the average about once in
56 trials.
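The multiplication rule in this example can be confirmed by listing every pairing of a card from the first pack with a card from the second. This is only a sketch; the variable names are illustrative, and the suits are represented by the numbers 0 to 3 for brevity.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
first_pack = [(rank, suit) for rank in RANKS for suit in range(4)]
second_pack = list(first_pack)

# Every combination of one card from each pack: n * n' = 52 * 52 of them.
pairs = list(product(first_pack, second_pack))

# Double success: an ace from the first pack and a king, queen or knave
# (here "K", "Q", "J") from the second.
double_successes = [
    (c1, c2) for c1, c2 in pairs
    if c1[0] == "A" and c2[0] in {"J", "Q", "K"}
]

p_by_counting = Fraction(len(double_successes), len(pairs))
p_by_product = Fraction(4, 52) * Fraction(12, 52)
```

Counting gives aa' = 48 double successes out of nn' = 2704 pairings, the same fraction as the product of the two separate probabilities.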
We proceed to discuss the case of a coin spun a number of times
in succession, and we shall find the probabilities of the appearance
of so many heads (H) and so many tails (T) in so many spins on the
hypothesis that the coin is perfectly balanced and equally likely
to fall on either side.
In 1 spin there are 2 possible events, namely H or T, which
we shall write simply as
(H, T).
In 2 spins there are 4 possible events, because we can combine
the H or T of the first with an H or T at the second spin, and we
may express the result thus
(H, T)(H, T)=(HH, HT, TH, TT) ;
the interpretation of which is that we may get either head followed
by head, or head followed by tail, or tail followed by head, or tail
followed by tail.
In 3 spins there are 8 possible events, because we can combine
the 4 events previously possible with an H or T at the third spin,
thus getting
(H, T)(H, T)(H, T)
= (H, T)(HH, HT, TH, TT)
= (HHH, HHT, HTH, HTT, THH, THT, TTH, TTT) ;
the interpretation of which is that we may get either 3 heads in
succession, or 2 heads followed by 1 tail, or head followed by tail
followed by head, and so on.
In 4 spins there are 16 possible events, because we can combine
the 8 events previously possible with an H or T at the fourth spin,
thus
(H, T)(HHH, HHT, HTH, HTT, THH, THT, TTH, TTT)
= (HHHH, HHHT, HHTH, HHTT, HTHH, HTHT,
HTTH, HTTT, THHH, THHT, THTH, THTT,
TTHH, TTHT, TTTH, TTTT).
But the method here adopted to get the possible events at each
stage is precisely the same as that which gives the successive terms
in the ordinary algebraical expansions of
$(H+T)$, $(H+T)(H+T)$, $(H+T)(H+T)(H+T)$, etc.
Also each new spin has the effect of doubling the number of possible
events obtained at the previous spin, and we conclude that in
n spins, there are
$(2 \times 2 \times 2 \times \ldots$ to n factors$)$,
or $2^n$, possible events, and these events are given by the successive
terms in the expansion of
$(H+T)(H+T)(H+T) \ldots$ to n factors.
Let us now consider the probabilities of the different events
obtainable. The important point to notice is that at any stage
each possible event has exactly the same probability, for there is
no reason why any particular spin should give H rather than T,
or T rather than H : for example, in 3 spins there are 8 possible
events, each by itself equally probable, and we therefore divide
the unity of certainty into 8 equal parts and assign one part to each
event, thus
probability of 3 heads: HHH = 1/8
probability of 2 heads and 1 tail: HHT = 1/8, HTH = 1/8, THH = 1/8
probability of 1 head and 2 tails: HTT = 1/8, THT = 1/8, TTH = 1/8
probability of 3 tails: TTT = 1/8.
It is clear from this arrangement that, if the order of the appearance
of H and T is indifferent, some events are of the same type
and some types are likely to appear oftener than others, e.g. the
probability of getting ' 2 heads and 1 tail ' (or ' 1 head and 2 tails ')
is three times as great as the probability of getting ' 3 heads '
or ' 3 tails.' Hence for conciseness it is convenient to adopt the
ordinary index notation and write
$HHH = H^3$, $HHT = H^2T$, $HTH = H^2T$, etc.,
so that the possible events in 3 spins are
$H^3, 3H^2T, 3HT^2, T^3$;
in 4 spins they are
$H^4, 4H^3T, 6H^2T^2, 4HT^3, T^4$;
and so on.
The probability of any particular type is now readily written
down: e.g. in 4 spins, the probability of getting 2 heads and 2 tails
= (number of successful events possible)/(total number of events
possible)
$= 6/2^4 = 6/16 = 3/8$.
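The grouping of the 16 equally likely sequences into types can be reproduced by brute force. The sketch below lists every sequence of H and T for a given number of spins and counts how many contain each number of heads; the function name `head_counts` is illustrative.

```python
from collections import Counter
from itertools import product
from math import comb

def head_counts(n):
    """Count the 2**n equally likely sequences by their number of heads."""
    return Counter(seq.count("H") for seq in product("HT", repeat=n))

counts4 = head_counts(4)
# Number of sequences with 4, 3, 2, 1, 0 heads respectively.
coefficients4 = [counts4[k] for k in range(4, -1, -1)]
p_two_heads_two_tails = counts4[2] / 2**4
```

For 4 spins the counts are 1, 4, 6, 4, 1, and the chance of 2 heads and 2 tails is 6/16. The same counts are returned by `math.comb(n, k)`, which avoids listing the sequences when n is large.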
But the binomial expansion always sums together terms of the
same type for us in just the manner wanted, and we have the
possible events in n spins given by the successive terms in the
expansion of
(H+T)(H+T)(H+T) ... to n factors,
i.e. $(H+T)^n$,
i.e. $H^n + {}^nC_1 H^{n-1}T + {}^nC_2 H^{n-2}T^2 + \ldots + T^n$,
and therefore again the probability of any particular combination
is readily written down: e.g. probability of '$(n-2)$ heads, 2 tails'
= (number of successful events possible)/(total number of events
possible)
$= {}^nC_2/2^n$.
Another way of stating the result obtained is to say that we
might expect to get
n heads appearing on the average about once in every $2^n$ trials,
$(n-1)$ heads, 1 tail appearing on the average about ${}^nC_1$ times in every $2^n$ trials,
$(n-2)$ heads, 2 tails appearing on the average about ${}^nC_2$ times in every $2^n$ trials,
and so on.
If, in accord with our previous notation, we call the appearance
of, say, H at any spin a 'success,' and label its probability 1/2 by the
letter p, and if consequently the appearance of T at any spin is a
'failure,' its probability, 1/2, to be labelled by the letter q, we have the
probabilities of the different combinations of events in $(H+T)^n$, or
$H^n + {}^nC_1 H^{n-1}T + {}^nC_2 H^{n-2}T^2 + \ldots + T^n$,
given by the corresponding terms in $(p+q)^n$, or
$p^n + {}^nC_1 p^{n-1}q + {}^nC_2 p^{n-2}q^2 + \ldots + q^n$,
where $p = q = 1/2$.
After each spin of the coin in the case considered the distribution
of probabilities was symmetrical, e.g. after the fourth spin the probabilities
were
1/16, 4/16, 6/16, 4/16, 1/16.
We pass on now to a case where the distribution is not symmetrical,
owing to the fact that p and q are no longer equal for any isolated
event.
Consider the throw of an ordinary die in which each of the six
faces is assumed to have an equal chance of appearing uppermost.
The probability of throwing, say, a 3 is 1/6, since we are certain
to throw either 1, 2, 3, 4, 5, or 6 ; and the probability of failing to
throw a 3 is 5/6, since we are certain either to throw a 3 or not
to throw a 3.
If we represent the probability of success (say, in this case,
throwing a 3) by p (i.e. 1/6), and failure (i.e. in this case, failing
to throw a 3) by q (i.e. 5/6), we have
$p + q = 1/6 + 5/6 = 1$.
Bearing in mind then that the probability for a combination of two
independent events is determined by multiplying together the
separate probabilities for each, we have the following table showing
what might be expected when 1, 2, or 3 dice are thrown up together,
where s stands for success and f for failure:
No. of Dice thrown.   Different Possibilities.                  Different Probabilities.
1                     s, f                                      p, q
2                     ss, sf, fs, ff                            pp, pq, qp, qq
3                     sss, ssf, sfs, sff, fss, fsf, ffs, fff    ppp, ppq, pqp, pqq, qpp, qpq, qqp, qqq
The table is easily extended on the same principle, and at each
step, it will be noticed, a fresh pair of possibilities, s or f, is introduced,
with corresponding p or q, to be combined with what has
gone before.
If the order of appearance of s and f is a matter of indifference,
e.g. if it does not matter whether the first die shows s and the
second f, or vice versa, so that results of the type sff and fsf may
be regarded as equivalent, we may use the index notation, as in
the coin case, to render the table more concise, thus:
No. of Dice thrown.   Different Possibilities.        Corresponding Probabilities.
1                     $s, f$                          $p, q$
2                     $s^2, 2sf, f^2$                 $p^2, 2pq, q^2$
3                     $s^3, 3s^2f, 3sf^2, f^3$        $p^3, 3p^2q, 3pq^2, q^3$
When, therefore, n dice are thrown we again recognize the
different possibilities as given by the successive terms in the expansion
of $(s+f)^n$, namely
$s^n + {}^nC_1 s^{n-1}f + {}^nC_2 s^{n-2}f^2 + \ldots + f^n$,
and the corresponding probabilities by the successive terms in the
expansion of $(p+q)^n$, namely
$p^n + {}^nC_1 p^{n-1}q + {}^nC_2 p^{n-2}q^2 + \ldots + q^n$.
Hence the probability of throwing
n threes $= p^n = 1/6^n$;
$(n-1)$ threes $= {}^nC_1 p^{n-1}q = n \cdot \frac{1}{6^{n-1}} \cdot \frac{5}{6} = 5n/6^n$;
$(n-2)$ threes $= {}^nC_2 p^{n-2}q^2 = \frac{n(n-1)}{1 \cdot 2} \cdot \frac{1}{6^{n-2}} \cdot \frac{5^2}{6^2} = 25n(n-1)/(2 \cdot 6^n)$;
and so on.
The result we have just obtained is of perfectly general application.
Whether we spin n coins, in which the probability, p, of
success (say 'heads') for each is 1/2, or throw n dice, in which the
probability, p, of success (say 'to get a 3') for each is 1/6, or have
any n similar but independent events happening in which the
probability of success for each is p, the different resulting possibilities
as to success are given by the successive terms in the expansion
of $(s+f)^n$, and their corresponding probabilities are given by
the successive terms in the expansion of $(p+q)^n$.
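The general rule lends itself to a short computation. The sketch below returns the successive terms of the expansion, beginning with n successes and ending with none; the function name `binomial_terms` is an illustrative choice, and exact fractions are used so the coin and dice cases can be compared with the figures in the text.

```python
from fractions import Fraction
from math import comb

def binomial_terms(n, p):
    """Successive terms of (p + q)**n: n successes first, no successes last."""
    q = 1 - p
    return [comb(n, r) * p ** (n - r) * q ** r for r in range(n + 1)]

# Four coins, success = heads: the symmetrical case p = q = 1/2.
coins = binomial_terms(4, Fraction(1, 2))

# Three dice, success = throwing a 3: p = 1/6, q = 5/6.
dice = binomial_terms(3, Fraction(1, 6))
```

With three dice the terms are 1/216, 15/216, 75/216 and 125/216, agreeing with $1/6^n$, $5n/6^n$ and $25n(n-1)/(2 \cdot 6^n)$ for n = 3, and the terms of either expansion add up to unity.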
We are thus in a position to form a frequency table, like that on
p. 53, showing the probabilities of getting 0, 1, 2 ... n successes
(in other words, the proportional frequencies of these different
numbers of successes) at the occurrence of n similar independent
events, where p is the probability of success for each and q is the
probability of failure : —
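Table (35). Binomial Distribution.
$$\begin{array}{cccc}
(1) & (2) & (3) & (4) \\
\text{Number of Successes } (x) & \text{Frequency } (f) & \text{Product of Cols. (1) \& (2) } (fx) & \text{Product of Cols. (1) \& (3) } (fx^2) \\
0 & q^n & 0 & 0 \\
1 & nq^{n-1}p & nq^{n-1}p & nq^{n-1}p \\
2 & \frac{n(n-1)}{1 \cdot 2}q^{n-2}p^2 & n(n-1)q^{n-2}p^2 & 2n(n-1)q^{n-2}p^2 \\
3 & \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}q^{n-3}p^3 & \frac{n(n-1)(n-2)}{1 \cdot 2}q^{n-3}p^3 & \frac{3n(n-1)(n-2)}{1 \cdot 2}q^{n-3}p^3 \\
\ldots & \ldots & \ldots & \ldots \\
n & p^n & np^n & n^2p^n \\
\text{Total} & 1 & np & np[1+p(n-1)]
\end{array}$$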
Col. (1) gives the deviations from the origin of measurement,
which in this case is taken as 'no successes,' the class interval
being equal to a difference of 1 in the number of successes.
The summations of the last three columns are effected as
follows:
Col. (2).
$$q^n + nq^{n-1}p + \frac{n(n-1)}{1 \cdot 2}q^{n-2}p^2 + \ldots + p^n = (q+p)^n = 1,$$
because $p+q=1$.
Col. (3).
$$nq^{n-1}p + n(n-1)q^{n-2}p^2 + \frac{n(n-1)(n-2)}{1 \cdot 2}q^{n-3}p^3 + \ldots + np^n$$
$$= np\left[q^{n-1} + (n-1)q^{n-2}p + \frac{(n-1)(n-2)}{1 \cdot 2}q^{n-3}p^2 + \ldots + p^{n-1}\right]$$
$$= np(q+p)^{n-1}$$
$$= np.$$
Col. (4).
$$nq^{n-1}p + 2n(n-1)q^{n-2}p^2 + \frac{3n(n-1)(n-2)}{1 \cdot 2}q^{n-3}p^3 + \ldots + n^2p^n$$
$$= np\left[(q+p)^{n-1} + (n-1)p(q+p)^{n-2}\right]$$
$$= np[1 + p(n-1)].$$
The arithmetic mean of the distribution
= sum of terms in col. (3)/sum of terms in col. (2)
$= \Sigma(fx)/\Sigma(f)$
$= np$.
The mean-square deviation referred to zero as origin, zero in this
case corresponding to 'no successes'
= sum of terms in col. (4)/sum of terms in col. (2)
$= \Sigma(fx^2)/\Sigma(f)$
$= np[1 + p(n-1)]$.
Thus the standard deviation, $\sigma$, is given by
$\sigma^2 = np[1 + p(n-1)] - \bar{x}^2$,
where $\bar{x}$ is the deviation of the mean from the origin of measurement,
so that $\bar{x} = np$.
Therefore $\sigma^2 = np[1 + p(n-1)] - n^2p^2$
$= np(1-p) + n^2p^2 - n^2p^2$
$= npq$.
Hence $\sigma = \sqrt{npq}$,
and p.e. $= 0.6745\sqrt{npq}$.
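The two results can be verified numerically by building the columns of Table (35) for a particular n and p and summing them. This is a sketch only: the function names are illustrative, and the values n = 25, p = 1/5 are chosen merely as an instance.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_table(n, p):
    """Rows (x, f) of the binomial table: x successes with frequency f."""
    q = 1 - p
    return [(x, comb(n, x) * q ** (n - x) * p ** x) for x in range(n + 1)]

def mean_and_variance(rows):
    """Mean = sum(fx)/sum(f); variance = sum(fx^2)/sum(f) - mean^2."""
    total = sum(f for _, f in rows)
    mean = sum(f * x for x, f in rows) / total
    mean_square = sum(f * x * x for x, f in rows) / total
    return mean, mean_square - mean ** 2

n, p = 25, Fraction(1, 5)
mean, variance = mean_and_variance(binomial_table(n, p))
sd = sqrt(variance)               # standard deviation
probable_error = 0.6745 * sd      # p.e. as defined in the text
```

Summing the columns gives a mean of exactly np = 5 and a variance of exactly npq = 4, so the S.D. is 2 and the probable error about 1.35.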
These two results are exceedingly important, and it is essential
to understand what it is they measure. An example may help
to make this clear.
If we spin 300 coins, counting 'head' for each a success, the
number of heads we shall get will be unlikely to differ very greatly
from the average or mean number of successes, np, i.e. 150 if p=1/2
for each coin, and in the long run, if we repeat the experiment a
great number of times, we shall get a proportion of about 150 heads
to every one experiment. Again, if we throw 300 dice, counting
every throw of the number 5, say, for each die a success, so that
p in this case =1/6, the number of fives we shall get will be unlikely
to differ much from np, i.e. 50, and in the long run, if we repeat the
experiment a great number of times, we shall get on the average
a proportion of about 50 fives to every experiment ; we should
find, for example, something like 5000 fives if we threw 300 dice
100 times in succession. The arithmetic mean of the distribution
tells us therefore about what number of successes to expect in one
experiment with n events if n is fairly large, though we should be
unlikely to get exactly this number if we confined ourselves to the
one experiment.
The second result, the S.D., supplies us with a measure of the
unlikelihood of getting the exact number of successes expected at
any single experiment, for it defines the dispersion of the different
numbers of possible successes about their average. Clearly the
greater the dispersion, the greater is the likelihood of missing the
average. The mean number of successes when an experiment is
repeated a great number of times is np, but at any single experiment
it is not unlikely that the number of successes obtained may
differ from np by as much as $0.6745\sqrt{npq}$ in excess or in defect;
it is, however, unlikely, as we shall see later (p. 244), that the
number will differ from np by more than $3\sqrt{npq}$ in excess or
defect when the distribution is not very skew, or unsymmetrical,
especially if n be large. The probable error in the case above when
we throw a sample of 300 dice is
$= 0.6745\sqrt{300 \times 1/6 \times 5/6} = 0.6745\sqrt{41.67} = 4.4$,
and it is therefore quite likely that the number of fives obtained
at one experiment will differ from the expected number, 50, by as
much as 4 or 5 in excess or defect, but it is unlikely that the number
will fall outside the limits $50 \pm 3\sqrt{41.67}$, say 30 to 70.
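The arithmetic for the 300 dice may be reproduced as follows. The helper name `binomial_spread` is illustrative; it simply evaluates np, $\sqrt{npq}$ and $0.6745\sqrt{npq}$.

```python
from math import sqrt

def binomial_spread(n, p):
    """Return the mean np, the S.D. sqrt(npq) and the probable error."""
    q = 1 - p
    sd = sqrt(n * p * q)
    return n * p, sd, 0.6745 * sd

mean, sd, probable_error = binomial_spread(300, 1 / 6)

# Limits of three times the S.D. on either side of the mean.
lower, upper = mean - 3 * sd, mean + 3 * sd
```

The S.D. comes to about 6.45 and the probable error to about 4.35, which the text rounds to 4.4; the three-S.D. limits are roughly 30.6 and 69.4, quoted in round figures as 30 to 70.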
It is sometimes more convenient to refer to the proportion of
successes, etc., expected at any experiment rather than to the
actual number expected. In that case, since with n events the
expected number of successes is pn, but the number obtained may
quite likely differ from this by $\pm 0.6745\sqrt{npq}$, therefore with
n events the expected proportion of successes is $pn/n$, i.e. p, with
quite possibly an error $= \pm 0.6745\sqrt{npq}/n$, i.e. $\pm 0.6745\sqrt{pq/n}$.
Thus, with the 300 dice, the expected proportion of successes at
one experiment lies between
$[1/6 - 0.6745\sqrt{1/6 \times 5/6 \div 300}]$ and $[1/6 + 0.6745\sqrt{1/6 \times 5/6 \div 300}]$
i.e. $(1/6 - 0.6745/46.5)$ and $(1/6 + 0.6745/46.5)$
i.e. 1/6.6 and 1/5.5;
and it is unlikely that the proportion will differ from 1/6 by more
than 3/46.5, i.e. 1/15.5.
To illustrate how the binomial distribution might be directly
applied, an experiment was made with 900 digits selected at random
by taking in succession the digits in the seventh decimal place in
the logarithms of the following numbers : —
10054, 10154, 10254, . . . 99954,
as given in Chambers's Mathematical Tables. In this way each of
the 10 digits, 0, 1, 2, 3 ... 9, may be supposed to have stood an
equal chance of selection each time one was written down. Gaps
of 100 were left between the numbers selected so as to avoid runs
of the same figure which sometimes occur even in the seventh
decimal place owing to lack of independence.
The digits were arranged in 36 columns, each column containing
25 digits, and in this way we obtained what was equivalent to
36 separate but like experiments with 25 events each. If we agree
to regard the appearance of a 7 or an 8 as a successful event, and
the appearance of any other digit as a failure, the chance of success
at any appearance is 2/10, and the chance of failure is 8/10. The
case is thus of exactly the same kind as that of throwing 25 dice
36 times in succession, and if the probability of success, namely 1/5,
for each independent event, be denoted by p, and the probability
of failure, namely 4/5, by q, the distribution of successes and failures
should approximately conform to that given by the expansion of
$(p+q)^{25}$
for any particular experiment, and since the experiment was repeated
36 times, the total numbers of successes and failures of
different orders obtained should approximately conform to
$36(p+q)^{25}$,
for if the probability of an event is p the number of events to be
expected in N trials is Np.
The actual distribution observed is compared with that given
by the binomial expansion in Table (36). Col. (2) is obtained by
picking out the appropriate terms in the expansion of $36(p+q)^{25}$,
where p=1/5, q=4/5; this expansion is
$$36\left[p^{25} + 25p^{24}q + \frac{25 \cdot 24}{1 \cdot 2}p^{23}q^2 + \ldots + q^{25}\right].$$
Thus, 5 successes occur
$$36 \times \frac{25 \cdot 24 \ldots 6}{1 \cdot 2 \cdot 3 \ldots 20}\,p^5q^{20}$$
times, and this equals 7.06, or approximately 7.
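The terms of $36(p+q)^{25}$ are tedious to work by hand but easily computed. The sketch below evaluates each of them; the function name `expected_frequencies` is illustrative, and the rounding to whole experiments is done here with Python's built-in `round`, which is an assumption about how the calculated column was prepared.

```python
from math import comb

def expected_frequencies(experiments, n, p):
    """Expected number of experiments showing x successes, for x = 0..n."""
    q = 1 - p
    return [experiments * comb(n, x) * p ** x * q ** (n - x)
            for x in range(n + 1)]

freq = expected_frequencies(36, 25, 0.2)

# Whole-number frequencies for 0 to 10 successes; higher counts round to 0.
rounded = [round(f) for f in freq[:11]]
```

The term for 5 successes is about 7.06, as stated, and the rounded frequencies for 1 to 9 successes add up to the 36 experiments.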
The mean number of successes by theory $= np = 25/5 = 5$. The
mean by trial, since it is measured from zero as origin, the numbers
in col. (1) being the deviations,
$= \Sigma(fx)/\Sigma(f) = 162/36 = 4.5$.
The standard deviation by theory
$= \sqrt{npq} = \sqrt{25 \times 1/5 \times 4/5} = 2$.
Table (36). Distribution of Successes (getting a 7 or 8) in
the Random Choice of 25 digits 36 times in succession.
(1)            (2)            (3)            (4)                (5)
No. of         Frequency      Frequency      Product of Nos.    Product of Nos.
Successes.     Calculation.   Experiment.    in Cols. (1)&(3).  in Cols. (1)&(4).
(x)                           (f)
1              1              1              1                  1
2              3              5              10                 20
3              5              5              15                 45
4              7              7              28                 112
5              7              9              45                 225
6              6              4              24                 144
7              4              3              21                 147
8              2              0              0                  0
9              1              2              18                 162
Total          36             36             162                856
By trial, the mean square deviation, measured from zero as origin
= 856/36.
Thus the S.D. by trial $= \sqrt{856/36 - \bar{x}^2}$,
where $\bar{x}$ is the deviation of the mean from the origin,
$= \sqrt{856/36 - (4.5)^2}$
$= 1.88$.
It will be seen that not one of the 36 experiments gave a number
of successes differing from 5, the theoretical mean, by more than
twice the S.D., for the number ranges only between 1 and 9.
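The mean and S.D. by trial follow from the experimental column of Table (36). In the sketch below the frequencies are copied from that column; the zero against 8 successes is not printed legibly in the table and is inferred here from the column totals, so it should be read as an assumption.

```python
from math import sqrt

# Observed frequencies: number of successes -> number of experiments.
# The 0 for 8 successes is inferred from the totals of the table.
observed = {1: 1, 2: 5, 3: 5, 4: 7, 5: 9, 6: 4, 7: 3, 8: 0, 9: 2}

total = sum(observed.values())
sum_fx = sum(f * x for x, f in observed.items())
sum_fx2 = sum(f * x * x for x, f in observed.items())

mean = sum_fx / total
sd = sqrt(sum_fx2 / total - mean ** 2)

# Theoretical mean 5 and S.D. 2: does every result lie within 5 +/- 2*2?
within_two_sd = all(abs(x - 5) <= 2 * 2 for x, f in observed.items() if f)
```

The sums reproduce the totals 36, 162 and 856, giving a mean of 4.5 and an S.D. of about 1.88, against 5 and 2 by theory.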
If we treat the 900 digits as 900 separate experiments with one
event each, instead of treating them as 36 experiments containing
25 events each, we have 1/10 as the chance for the appearance of
any particular digit, and hence the number of times any digit may
be expected to appear
$= np \pm \tfrac{2}{3}\sqrt{npq}$, approximately
$= (900)\tfrac{1}{10} \pm \tfrac{2}{3}\sqrt{900 \times \tfrac{1}{10} \times \tfrac{9}{10}}$
$= 90 \pm 6$.
The actual number of occurrences of each digit was as follows : —
Digit                0    1    2    3    4    5    6    7    8    9
No. of Occurrences   95   96   93   105  91   80   82   72   90   96
so that the digit 7 showed the greatest divergence from 90 of any,
and this was only just three times the probable error.
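The comparison of each digit with the expected 90 can be set out in a few lines. The counts below are those recorded for the digits 0 to 9; the variable names are illustrative, and the probable error is taken as $0.6745\sqrt{npq}$ rather than the rounded figure of 6.

```python
from math import sqrt

# Occurrences of the digits 0, 1, ..., 9 among the 900 selected.
occurrences = [95, 96, 93, 105, 91, 80, 82, 72, 90, 96]

n, p = sum(occurrences), 1 / 10
expected = n * p
sd = sqrt(n * p * (1 - p))
probable_error = 0.6745 * sd

# Each digit's divergence from 90, measured in probable errors.
divergence = [abs(count - expected) / probable_error for count in occurrences]
worst_digit = max(range(10), key=lambda d: divergence[d])
```

The S.D. is 9 and the probable error about 6.07. The digit 7, with 72 occurrences, diverges most, by very nearly three probable errors.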
[The Theory of Probability is older than that of Statistics. Todhunter, in
his History, states that ' writers on the subject have shown a justifiable pride
in connecting its true origin with the great name of Pascal.' The well-known
story of the latter being found, as a lad of twelve, tracing out on the hall floor
geometrical propositions which he had evolved in his own head is not to be
wondered at, nor yet that at sixteen he wrote a small work on Conic Sections,
when one reflects upon the fame he was to win as a philosopher and writer,
as well as a mathematician, in his too brief life of thirty-nine years. He was
born in 1623 of a distinguished French family, and for the last half of his
life he suffered from the effects of a serious disease which contributed to turn
his attention from mathematics to religion and philosophy.
We learn from Todhunter how a certain gentleman of repute at the gaming
tables set Pascal pondering on a question of probability concerning the fair
division of stakes between two players who give up their game before its
conclusion, an old problem cited in a work by Luca Pacioli as early as 1494. A
correspondence followed between him and Fermat, then probably the two most
distinguished mathematicians in Europe, and so began a science which has
fascinated at one time or another all great mathematicians from that day to
this.
The illustrious family of the Bernoullis, friends of Leibnitz, who championed
his claim against that made by English mathematicians on behalf of Newton
to the invention of the Calculus ; De Moivre, an exile in England, owing to
the revocation of the Edict of Nantes ; Euler, Lagrange, and Laplace, who
worked out in algebraical form Newton's theory of gravitation for the motion
of the planets: all these had a share in building up the science of Probability,
often by investigating problems in games of chance, where the conditions can
be made mathematically perfect, so by careful analysis preparing the way for
the use later of the same principles in matters of greater importance.
It has been said that the development of the subject owes more to Laplace
(1749-1827) than to any other mathematician; nor did he confine himself to
its theory: he would have earned fame by his astronomical applications alone.
His method was to take certain observations, and to determine by means of
probability whether the abnormalities present were merely the results of chance
or whether there was some as yet undiscovered but constantly acting cause
behind the phenomena observed. In this way he was led to highly interesting
and important results such as those relating to the theory of the tides, the
effect of the spheroidal shape of the earth on the motion of the moon, the
irregularities of Jupiter and Saturn, and the laws which govern the motion
of Jupiter's moons. It needs but a step in thought to pass from the discussion
of such physical data to the statistics of social phenomena and the
causes which determine abnormalities met with in that field. Professor Edgeworth,
in making reference to books that have been written on Probability at
the end of his excellent article under that heading in the Encyclopædia
Britannica, remarks that 'as a comprehensive and masterly treatment of
the subject as a whole, in its philosophical as well as mathematical character,
there is nothing similar or second to Laplace's Théorie analytique des
probabilités.']
CHAPTER XIII
SAMPLING (continued): FORMULAE FOR PROBABLE ERRORS
So far we have only considered the most simple case of random
sampling when we take a sample of n independent events each of
which falls into one of two classes according to its nature, the
chance of entering either class being the same for every event:
we have dealt, that is to say, more particularly with non-measurable
characters. We pass on now to measurable characters which are distributed
among several classes according to their size, so that a frequency distribution
table can be set up for each sample; and assuming that the population from which
the samples are drawn is homogeneous, the samples themselves containing each
an adequate number of individuals, there should not be greater differences between
one table and another than can be accounted for by random sampling. It is
our object to discover how great such differences may be.
Given a homogeneous population of N individuals which we will suppose could
be distributed into a number of groups, $Y_1$ individuals in the first group,
$Y_2$ in the second group, $Y_3$ in the third, and so on, according to the size
of the organ or character under observation. Suppose a random sample of n
individuals be taken from this population, and when they are assigned to their
several groups let the frequency table now take the form shown, with $y_1$
individuals in the first group, $y_2$ in the second, and so on. To find the
probable error of $y_k$, the frequency observed in the kth group.
GENERAL POPULATION.
Class.        Frequency.
1st Group     $Y_1$
2nd Group     $Y_2$
. . .         . . .
kth Group     $Y_k$
. . .         . . .
Total         N
SAMPLE.
Class.        Frequency.
1st Group     $y_1$
2nd Group     $y_2$
. . .         . . .
kth Group     $y_k$
. . .         . . .
Total         n
Consider the selection of the n individuals, one by one in succession,
to form the sample. When the first choice is made the probability
that we shall get an individual falling into the kth group is, by definition,
$Y_k/N$, and the probability will remain practically the same for
each successive choice granted that N is considerable. We have thus
n independent events, the chance of success (falling into the kth
group) for each being $p\,(= Y_k/N)$ and the chance of failure being
$q = 1 - Y_k/N$. The case is therefore analogous to the one previously
considered to which the binomial distribution is applicable,
so that the frequency to be expected in the kth group is np
with S.D., $\sigma_{y_k} = \sqrt{npq}$,
i.e. $y_k = np$ with a p.e. $= 0.6745\sqrt{npq}$.
Now in practice the numbers $Y_1, Y_2, Y_3 \ldots$ would not be known,
and hence the true value of p would also be unknown, but since
$y_k = np$, approximately, when the sample is of adequate size, we
shall get a fair idea of the probable error involved by taking
$p = y_k/n$, where $y_k$ is the actual frequency observed in the kth group.
Hence, $\sigma^2_{y_k} = npq = y_k(1-p) = y_k\left(1 - \frac{y_k}{n}\right)$, . . . (1)
and the frequency in the kth group
$= y_k \pm 0.6745\sqrt{y_k\left(1 - \frac{y_k}{n}\right)}$. . . . (2)
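Formula (2) requires only the observed group frequency and the size of the sample. A minimal sketch follows; the function name and the figures (50 individuals in the group out of a sample of 300) are hypothetical, chosen to match the proportion 1/6 used in the dice example.

```python
from math import sqrt

def group_frequency_error(y_k, n):
    """S.D. and probable error of an observed group frequency y_k,
    taking p = y_k / n as the estimate of the unknown proportion."""
    sd = sqrt(y_k * (1 - y_k / n))
    return sd, 0.6745 * sd

# Hypothetical sample: 50 of 300 individuals fall in the kth group.
sd, probable_error = group_frequency_error(y_k=50, n=300)
```

With these figures the S.D. is about 6.45 and the probable error about 4.35, so the group frequency would be stated as 50 with a probable error of roughly 4.4.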
The size of the S.D. is under ordinary conditions a test of the
adequacy of the sample, for the frequency in the kth group, if due
simply to random sampling, should not differ from its
expected value by more than $\pm 3\sigma_{y_k}$,
and $\sigma_{y_k}$ should therefore be small compared with $y_k$
itself.
To find the correlation between the frequencies in any two
groups of a sample distribution.
Let the expected frequencies in the various groups of the
sample be denoted by $y_1, y_2, \ldots, y_k, \ldots, y_s, \ldots$, and suppose an
error $\delta y_k$ in $y_k$ is associated with errors $\delta y_1, \delta y_2, \ldots, \delta y_s, \ldots$
in $y_1, y_2, \ldots, y_s, \ldots$. We require then the correlation between $y_k$ and $y_s$.
Class.        Expected Frequency.    Observed Frequency.
1st Group     $y_1$                  $y_1 + \delta y_1$
2nd Group     $y_2$                  $y_2 + \delta y_2$
. . .         . . .                  . . .
kth Group     $y_k$                  $y_k + \delta y_k$
. . .         . . .                  . . .
sth Group     $y_s$                  $y_s + \delta y_s$
. . .         . . .                  . . .
Total         n                      n
Now although the group frequencies may change relative to one
another, the total sum of frequencies in all groups is not affected,
because the n individuals of the sample make up its composition in
each case : to keep n constant the group frequencies must adjust
themselves accordingly, which explains the correlation between
them. Hence to compensate for an excess, $\delta y_k$ (assuming $\delta y_k$ positive),
of frequency in any one group there must be a defect $(-\delta y_k)$ shared
among the other groups, and the fairest way of sharing will be in
proportion to the expected frequencies in the several groups.
But the total frequency divided between groups other than the
kth is $(n - y_k)$, so that the proportion of $(-\delta y_k)$ due to the sth group
is $y_s/(n - y_k)$, thus
$$\delta y_s = -\frac{y_s}{n - y_k}\,\delta y_k.$$
Therefore,
$$\delta y_k \cdot \delta y_s = -\frac{y_s}{n} \cdot \frac{(\delta y_k)^2}{1 - y_k/n} = -\frac{y_k y_s}{n} \cdot \frac{(\delta y_k)^2}{\sigma^2_{y_k}}, \qquad (3)$$
by (1).
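FIRST SAMPLE.
Size of Organ or        Frequency of       First Moment.    Second Moment.
Character observed.     Observations.
$x_1$                   $y_1$              $x_1 y_1$        $x_1^2 y_1$
$x_2$                   $y_2$              $x_2 y_2$        $x_2^2 y_2$
. . .                   . . .              . . .            . . .
$x_k$                   $y_k$              $x_k y_k$        $x_k^2 y_k$
. . .                   . . .              . . .            . . .
Total                   n                  $\Sigma(xy)$     $\Sigma(x^2y)$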
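This gives the product moment of the deviations from $y_k$ and $y_s$
in one particular sample; summing for all such samples, remembering
that by definition the coefficient of correlation between $y_k$
and $y_s$ is $r_{y_k y_s} = \Sigma(\delta y_k \cdot \delta y_s)/\nu\sigma_{y_k}\sigma_{y_s}$, where $\nu$ is the total number
of samples, also $\sigma^2_{y_k} = \Sigma(\delta y_k)^2/\nu$, we have
$$\nu\, r_{y_k y_s}\,\sigma_{y_k}\sigma_{y_s} = -\frac{y_k y_s}{n} \cdot \frac{\nu\sigma^2_{y_k}}{\sigma^2_{y_k}} = -\frac{y_k y_s}{n}\,\nu.$$
Therefore,
$$r_{y_k y_s} = -\frac{1}{n} \cdot \frac{y_k y_s}{\sigma_{y_k}\sigma_{y_s}} \qquad (4)$$
gives the correlation required.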
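Formula (4) rests on the argument of 'fair sharing', and it is reassuring to compare it with a direct calculation. The sketch below assumes the n individuals are drawn independently with fixed chances of entering each group (that is, N is very large), and sums over every possible sample to obtain the exact correlation of two group frequencies. The function names and the figures n = 12, p_k = 1/4, p_s = 1/2 are illustrative only.

```python
from math import comb, sqrt

def formula_r(y_k, y_s, n):
    """Correlation by formula (4), with sigma^2 = y(1 - y/n) for each group."""
    sd_k = sqrt(y_k * (1 - y_k / n))
    sd_s = sqrt(y_s * (1 - y_s / n))
    return -y_k * y_s / (n * sd_k * sd_s)

def exact_r(n, p_k, p_s):
    """Exact correlation of two group counts, summing over all samples."""
    p_rest = 1 - p_k - p_s
    e_k = e_s = e_kk = e_ss = e_ks = 0.0
    for k in range(n + 1):
        for s in range(n - k + 1):
            # Chance of k individuals in one group, s in the other.
            prob = (comb(n, k) * comb(n - k, s)
                    * p_k ** k * p_s ** s * p_rest ** (n - k - s))
            e_k += prob * k
            e_s += prob * s
            e_kk += prob * k * k
            e_ss += prob * s * s
            e_ks += prob * k * s
    covariance = e_ks - e_k * e_s
    return covariance / sqrt((e_kk - e_k ** 2) * (e_ss - e_s ** 2))

n, p_k, p_s = 12, 0.25, 0.5
r_formula = formula_r(n * p_k, n * p_s, n)   # expected frequencies 3 and 6
r_exact = exact_r(n, p_k, p_s)
```

Under these assumptions the two values agree, both being about −0.577, so the correlation between group frequencies is always negative, as the constancy of n requires.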
To find the p.e. of the mean of a sample of n observations. Let a
frequency table be drawn up in the usual manner showing the
number of observations $y_1, y_2, \ldots$ corresponding to organs of
different sizes $x_1, x_2, \ldots$
The mean referred to some fixed point as origin is then given by
$$M = (x_1y_1 + x_2y_2 + \ldots)/n = \Sigma(xy)/n,$$
also the mean square deviation of the sample referred to the same
fixed point is $\mu'_2$, say, given by
$$\mu'_2 = (x_1^2y_1 + x_2^2y_2 + \ldots)/n = \Sigma(x^2y)/n,$$
and
$$\mu'_2 - M^2 = \sigma^2,$$
where $\sigma$ is the S.D. of the sample.
SECOND SAMPLE.
Size of Organ or        Frequency of            First Moment.
Character observed.     Observations.
$x_1$                   $y_1 + \delta y_1$      $x_1(y_1 + \delta y_1)$
$x_2$                   $y_2 + \delta y_2$      $x_2(y_2 + \delta y_2)$
. . .                   . . .                   . . .
$x_k$                   $y_k + \delta y_k$      $x_k(y_k + \delta y_k)$
. . .                   . . .                   . . .
Total                   n                       $\Sigma x(y + \delta y)$
For another sample of the same size the frequency distribution
may be slightly different, say, $y_1 + \delta y_1, y_2 + \delta y_2, \ldots$, and consequently
the mean will also be different, say,
$$M + \delta M = [x_1(y_1 + \delta y_1) + x_2(y_2 + \delta y_2) + \ldots]/n,$$
154 STATISTICS
and, by subtraction,
SM=(a;i82/i+a;282/2+ . . .)M • . . (5)
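Now we want to determine the S.D. of the different values of M
found among the different samples, and that is given by
$$\sigma_M^2=\Sigma(\delta M^2)/\nu,$$
where $\Sigma$ denotes summation for all samples and $\nu$ is the number of
samples. This suggests that we should square both sides of
equation (5), getting
$$n^2\,\delta M^2=x_1^2\,\delta y_1^2+\ldots+2x_1x_2\,\delta y_1\,\delta y_2+\ldots$$
Therefore,
$$n^2\cdot\nu\sigma_M^2=x_1^2\,\nu\sigma_{y_1}^2+\ldots+2x_1x_2\left(-\frac{y_1y_2}{n}\right)\nu+\ldots,$$
by (3). Hence, making use also of (1),
$$n^2\sigma_M^2=x_1^2\,y_1\left(1-\frac{y_1}{n}\right)+\ldots-\frac{2x_1x_2\,y_1y_2}{n}-\ldots$$
$$=(x_1^2y_1+\ldots)-\frac{x_1^2y_1^2+\ldots+2x_1y_1\cdot x_2y_2+\ldots}{n}$$
$$=n\mu_2'-\frac{(x_1y_1+\ldots)^2}{n}=n\mu_2'-nM^2,$$
since $\Sigma(xy)=nM$.
Thus $\sigma_M^2=(\mu_2'-M^2)/n=\sigma^2/n$,
and the probable error of the mean $=0.6745\,\sigma/\sqrt{n}$ . . . (6)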
The p.e. in the arithmetic mean found by taking a random sample
of n events is a measure, so to speak, of the failure to hit the absolute
mean, and it follows that the precision of the sample, the accuracy
of aim at the mean, would be not unfairly measured by some
quantity proportional to the reciprocal of the above expression,
namely, $\sqrt{n}/0.6745\,\sigma$. With such a measure the precision would
evidently be increased if the number of observations in the sample
were increased, being proportional to the square root of their
number.
[* We do not know the true mean for the population as a whole, but we take
in place of it M, the value given by the sample, which we may do with little
error if n is large. Similarly $\sigma$ is the S.D. of the sample.]
[It is desirable to draw a distinction here between what have been
termed biassed errors and unbiassed errors; errors due to random
sampling are of the second class for there is, by hypothesis, no
reason why they should be in one direction rather than in another.
Biassed errors, however, all tend to be in the same direction and
they may arise in different ways, e.g. they may be due to faults of
omission or commission on the part of the observer himself : he
observes either carelessly or badly, omitting certain factors which
ought to be taken into account, or so measuring or classifying his
results that they appear always larger or less than they really are
in fact.
Sometimes, although the bias is known to exist, it may be impossible
to correct it: the most one can do is to bear it in mind
and allow for it in using the results. A familiar example of this
occurs in the collection of household budgets from the poor to find
their standard of living, where it is only possible to get particulars
from the more intelligent and thrifty class among them.
Whereas in the case of unbiassed errors due to random sampling
we can diminish the probable error of the average by increasing
the number of observations, the same is not true of errors which
are biassed, for suppose an error $e$ in excess be made in each of
$n$ observations $x_1, x_2, \ldots, x_n$, the effect upon the average is to
increase it from
$$\frac{x_1+x_2+\ldots+x_n}{n}\quad\text{to}\quad\frac{(x_1+e)+(x_2+e)+\ldots+(x_n+e)}{n},$$
i.e. from
$$\frac{x_1+x_2+\ldots+x_n}{n}\quad\text{to}\quad\frac{x_1+x_2+\ldots+x_n}{n}+e,$$
so that the average is overestimated by precisely the same amount.
If, therefore, we know that bias exists, it is well, if possible, to
correct it in each observation, for by so doing we change biassed
into unbiassed errors, and though our corrections may be somewhat
wide of the mark, the resultant error will then be diminished by
increasing the number of observations : e.g. a farmer offers 400
sheep for sale and, being anxious to make a good bargain, he asks
a higher figure for them than he is in reality prepared to take ;
let us suppose that this excess is 2s. 6d. for each sheep, then clearly
the average price per sheep at which he is prepared to sell will be
less than the amount he asks by 2s. 6d. also. But now suppose the
buyer, a simple person knowing little of the prices of sheep and
less of the ways of men, goes through the flock one by one and
makes the error of offering a price either much above or much below
what the seller is prepared to take; even if his unbiassed offers
differ by as much as 10s. for each sheep from the seller's reserve
price, so long as they are random in direction, i.e. sometimes too
much and sometimes too little, the resultant difference in the
average from what the seller is prepared to take will probably not
greatly exceed $\tfrac{2}{3}\times 10\text{s.}/\sqrt{400}$, or 4d. per sheep.
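The arithmetic of the sheep illustration may be set out as follows. This is a small sketch only; it works in pence (12 to the shilling), treats the buyer's 10s. discrepancy as the S.D. of a single offer, and uses the factor 0.6745 where the text rounds to two-thirds. The variable names are hypothetical.

```python
from math import sqrt

PENCE_PER_SHILLING = 12
n_sheep = 400

# Seller's biassed excess of 2s. 6d. on every sheep.
bias_pence = 2 * PENCE_PER_SHILLING + 6
# Buyer's unbiassed error of 10s. per sheep, taken here as an S.D.
spread_pence = 10 * PENCE_PER_SHILLING

# A constant bias passes unchanged into the average, whatever n may be.
biased_error_in_average = bias_pence
# A random error in the average shrinks as 1/sqrt(n).
unbiased_pe_of_average = 0.6745 * spread_pence / sqrt(n_sheep)

print(biased_error_in_average, round(unbiased_pe_of_average, 2))
```

The biassed error remains 30d. (2s. 6d.) per sheep however many sheep are offered, while the probable error of the average of the random offers comes to about 4d.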
We can sometimes diminish the effect of bias, even when its
extent is unknown, by working with the ratios of the quantities
affected instead of with the quantities themselves : e.g. suppose
biassed errors, $e_1$ and $e_2$, enter into the measurement of the variables
$x_1$ and $x_2$, both in excess, the ratio of the variables then
$$=(x_1+e_1)/(x_2+e_2)$$
$$=x_1\left(1+\frac{e_1}{x_1}\right)\Big/\,x_2\left(1+\frac{e_2}{x_2}\right)$$
$$=\frac{x_1}{x_2}\left(1+\frac{e_1}{x_1}\right)\left(1-\frac{e_2}{x_2}+\text{higher powers of }e_2\right)$$
$$=\frac{x_1}{x_2}\left(1+\frac{e_1}{x_1}-\frac{e_2}{x_2}\right),$$
if we omit higher powers of $e_1$ and $e_2$ than the first on the understanding
that they are both comparatively small. Suppose, for
example, there was an error of 5 per cent. made in measuring $x_1$
and an error of 3 per cent. of like sign in measuring $x_2$, then the
resulting error in $x_1/x_2$ would be 5 per cent. - 3 per cent. = 2 per cent.
Clearly the same holds good also if the errors are both in defect.
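How close the first-order rule is to the exact result can be seen from a short calculation. The function name `ratio_error` is hypothetical, and the errors are expressed as fractions of the true values.

```python
def ratio_error(e1_rel, e2_rel):
    """Relative error of x1/x2 when x1 is overstated by the fraction
    e1_rel and x2 by the fraction e2_rel: exact value and first-order value."""
    exact = (1 + e1_rel) / (1 + e2_rel) - 1
    first_order = e1_rel - e2_rel
    return exact, first_order


exact, first_order = ratio_error(0.05, 0.03)
print(round(exact, 4), round(first_order, 4))
```

For errors of 5 and 3 per cent. the exact error in the ratio is about 1.94 per cent., against the 2 per cent. given by neglecting the higher powers.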
This explains why a comparison of results arranged, say, on the
index number principle may be trustworthy, although the method
of formation of the numbers themselves may be in some respects
faulty, granted that the same faults are repeated each year so as
to produce like errors, i.e. the bias is to be unchanged in character.
To correct the faults in one case and not in the other would prejudice
the success of the method, since it depends upon the errors counteracting
one another.]
Example (1). To illustrate the important result we have obtained
for the p.e. of the mean of n observations let us return to the experiment
of selecting 900 random digits. The distribution actually
obtained, and the theoretical distribution to be expected in the
long run if the experiment were repeated several hundred times and
the average taken, are shown in the following table:
Table (37). Distribution of 900 Random Digits.

Digit.   Frequency   Theoretical     Digit.   Frequency   Theoretical
         Observed.   Frequency.               Observed.   Frequency.
  0         95          90             5         80          90
  1         96          90             6         82          90
  2         93          90             7         72          90
  3        105          90             8         90          90
  4         91          90             9         96          90

It is a simple matter to calculate the mean and S.D. for the distribution
from this table in the usual way; the results are:
Observed mean = 4.38        S.D. = 2.911
Theoretical mean = 4.50     S.D. = 2.872.
Thus the p.e. of the mean based on the sample
$$=\pm 0.6745\times 2.911/\sqrt{900}$$
$$=\pm 0.065,$$
and 4.38 differs from 4.50 by less than three times the p.e.
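The constants quoted for Table (37) can be recomputed directly from the frequencies. The following is a minimal sketch; the function name `mean_and_sd` is hypothetical, and the S.D. is the root mean square deviation about the mean, as in the derivation of (6).

```python
from math import sqrt

# Observed frequencies of the digits 0, 1, ..., 9 from Table (37).
observed = [95, 96, 93, 105, 91, 80, 82, 72, 90, 96]


def mean_and_sd(freqs):
    """Total, mean and S.D. of a frequency table whose values are 0, 1, 2, ..."""
    n = sum(freqs)
    mean = sum(x * f for x, f in enumerate(freqs)) / n
    mean_square = sum(x * x * f for x, f in enumerate(freqs)) / n
    return n, mean, sqrt(mean_square - mean ** 2)


n, mean, sd = mean_and_sd(observed)
pe_mean = 0.6745 * sd / sqrt(n)              # result (6)
_, t_mean, t_sd = mean_and_sd([90] * 10)     # theoretical distribution

print(n, round(mean, 2), round(sd, 3), round(pe_mean, 3))
print(round(t_mean, 2), round(t_sd, 3))
```

The observed mean falls short of the theoretical mean by about 0.12, which is less than twice the probable error so found.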
The 36 averages of samples of 25 events apiece were also calculated,
and the following were the results obtained:
2.76, 3.32, 3.68, 3.72, 3.72, 3.72, 3.76, 3.80, 3.92, 3.92, 4.08, 4.12,
4.16, 4.16, 4.16, 4.28, 4.36, 4.40, 4.40, 4.40, 4.44, 4.60, 4.64, 4.68,
4.72, 4.72, 4.76, 4.88, 4.96, 5.00, 5.00, 5.00, 5.08, 5.28, 5.40, 5.72.
The mean of this distribution = 157.72/36 = 4.381, and the
S.D. = 0.612. But the S.D. of the whole distribution of 900 digits
= 2.911, and therefore the S.D. of the distribution of averages of
samples of 25 digits should be $2.911/\sqrt{25}=0.582$, differing from
0.612 by about 5 per cent.
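The same comparison can be reproduced from the 36 averages themselves. This sketch assumes the averages are read with two decimal places (2.76, 3.32, and so on), which is the reading that gives the stated total of 157.72; the variable names are hypothetical.

```python
from math import sqrt

averages = [
    2.76, 3.32, 3.68, 3.72, 3.72, 3.72, 3.76, 3.80, 3.92, 3.92, 4.08, 4.12,
    4.16, 4.16, 4.16, 4.28, 4.36, 4.40, 4.40, 4.40, 4.44, 4.60, 4.64, 4.68,
    4.72, 4.72, 4.76, 4.88, 4.96, 5.00, 5.00, 5.00, 5.08, 5.28, 5.40, 5.72,
]

count = len(averages)
total = sum(averages)
mean = total / count
# S.D. of the averages actually found among the 36 samples.
sd = sqrt(sum((a - mean) ** 2 for a in averages) / count)
# S.D. to be expected from sigma / sqrt(n) with sigma = 2.911, n = 25.
expected_sd = 2.911 / sqrt(25)
relative_gap = (sd - expected_sd) / expected_sd

print(round(mean, 3), round(sd, 3), round(expected_sd, 3), round(relative_gap, 3))
```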
To find the p.e. of the sum or difference of two variables. Let the
mean values of the two variables be denoted by $y$ and $z$, so that
deviations from these values found in a particular sample may be
denoted by $\delta y$ and $\delta z$. If then we write
$$u=y+z$$
we have
$$\delta u=\delta y+\delta z \qquad (7)$$
To find the S.D. of $u$ we therefore require $\Sigma(\delta u^2)/\nu$, where the
$\Sigma$ denotes summation for all samples and $\nu$ is the number of samples.
But, squaring both sides of equation (7), we have
$$\delta u^2=\delta y^2+\delta z^2+2\,\delta y\,\delta z.$$
Thus
$$\Sigma\,\delta u^2=\Sigma\,\delta y^2+\Sigma\,\delta z^2+2\Sigma(\delta y\,\delta z),$$
where the summation extends to all samples. Hence
$$\nu\sigma_u^2=\nu\sigma_y^2+\nu\sigma_z^2+2\nu\,r_{yz}\sigma_y\sigma_z,$$
or
$$\sigma_u^2=\sigma_y^2+\sigma_z^2+2r_{yz}\sigma_y\sigma_z,$$
where $r_{yz}$ is the correlation between the variables. And the
p.e. $=0.6745\,\sigma_u$.
The p.e. of the difference of two variables follows at once by
changing the sign of $z$ throughout; for, if
$$v=y-z,$$
we have
$$\delta v^2=\delta y^2+\delta z^2-2\,\delta y\,\delta z,$$
and
$$\sigma_v^2=\sigma_y^2+\sigma_z^2-2r_{yz}\sigma_y\sigma_z.$$
Generally, if $x_1, x_2, \ldots, x_n$ be the mean values of $n$ variables,
and if $\delta x_1, \delta x_2, \ldots, \delta x_n$ denote deviations from these values in
a particular sample, we may write
$$u=x_1+x_2+\ldots+x_n$$
and
$$\delta u=\delta x_1+\delta x_2+\ldots+\delta x_n.$$
Thus
$$\Sigma\,\delta u^2=\Sigma\,\delta x_1^2+\ldots+2\Sigma(\delta x_1\,\delta x_2)+\ldots,$$
whence
$$\sigma_u^2=\sigma_1^2+\ldots+2r_{12}\sigma_1\sigma_2+\ldots$$
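The relations for the S.D. of a sum and of a difference hold exactly for any set of paired deviations, as a small numerical check shows. The paired values below are hypothetical and serve only to exercise the formulae; the function name `moments` is likewise an assumption.

```python
from math import sqrt


def moments(a, b):
    """Mean square deviations of a and b, and their coefficient of correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((x - mean_b) ** 2 for x in b) / n
    cov = sum((x - mean_a) * (w - mean_b) for x, w in zip(a, b)) / n
    return var_a, var_b, cov / sqrt(var_a * var_b)


y = [3.0, 5.0, 4.0, 6.0, 2.0]    # hypothetical values of y in five samples
z = [1.0, 4.0, 2.0, 5.0, 3.0]    # hypothetical values of z in the same samples

var_y, var_z, r = moments(y, z)
u = [a + b for a, b in zip(y, z)]
v = [a - b for a, b in zip(y, z)]
var_u = moments(u, u)[0]
var_v = moments(v, v)[0]

predicted_u = var_y + var_z + 2 * r * sqrt(var_y * var_z)
predicted_v = var_y + var_z - 2 * r * sqrt(var_y * var_z)
print(round(var_u, 4), round(predicted_u, 4), round(var_v, 4), round(predicted_v, 4))
```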
Important Corollary. If $y$ and $z$ are quite independent so that
$r_{yz}$ is zero, the p.e. of their sum and the p.e. of their difference
have the same value, namely, the square root of the sum of the
squares of the p.e.'s of $y$ and $z$ themselves, which
$$=0.6745\sqrt{\sigma_y^2+\sigma_z^2} \qquad (8)$$
This result is exceedingly important, because it can be directly
used to test whether a difference between two samples is accidental,
i.e. whether it is such as might arise through sampling, or whether
it implies a real difference between the two populations from which
the samples are selected. An example will illustrate the procedure:
Example (2). In a study of Minimum Rates in the Tailoring
Industry by R. H. Tawney, a table is given (p. 114) which suggests
that 'in the north of England women work in the tailoring trade
when they are young ... in London and Colchester they have
to work when they are older.' Taking some figures from that
table we find:
District.                  Workers over     Workers at     Proportion
                           35 years old.    all ages.      over 35.
London and Essex              11,718          35,316         0.332
Manchester and Leeds           4,029          21,822         0.185

The difference between the proportions over 35 years of age
$=(0.332-0.185)=0.147$.
Let us suppose for the moment that this difference is not significant
of any real difference in conditions between the two districts, but
is merely due to random sampling. In that case the most natural
value to assign to the true proportion of women workers over 35
for the trade as a whole, as given by these figures, would be
$$p=\frac{11{,}718+4{,}029}{35{,}316+21{,}822}=\frac{15{,}747}{57{,}138}=0.276.$$
The S.D. for the first sample (London and Essex) would then be
$$\sigma_1=\sqrt{pq/n}=\sqrt{0.276\times 0.724/35{,}316},$$
and for the second sample (Manchester and Leeds) would be
$$\sigma_2=\sqrt{0.276\times 0.724/21{,}822}.$$
Hence the p.e. for the difference between the proportions in the
two samples would be roughly
$$=\tfrac{2}{3}\sqrt{\sigma_1^2+\sigma_2^2},\ \text{by (8)},$$
$$=\tfrac{2}{3}\sqrt{0.276\times 0.724\,(1/35{,}316+1/21{,}822)}$$
$$=\tfrac{2}{3}\sqrt{0.276\times 0.724/13{,}500}$$
$$=0.0026.$$
The actual difference between the proportions, 0.147, being much
more than 3(0.0026), is certainly significant of a greater difference
between the two populations than can be explained by random
sampling alone.
Another method of attack would be to assume a real difference
between the two populations, if other considerations led us to
suspect such a difference, and to find whether such a difference could
be disguised by random sampling. In that case the proper proportion
to assume for the first sample would be 0.332, giving
$$\sigma_1=\sqrt{0.332\times 0.668/35{,}316}=\sqrt{6.28/10^6},$$
and for the second sample the proportion would be 0.185, giving
$$\sigma_2=\sqrt{0.185\times 0.815/21{,}822}=\sqrt{6.91/10^6}.$$
Hence the p.e. for the difference between these two proportions
due to random sampling would be
$$=\tfrac{2}{3}\sqrt{\sigma_1^2+\sigma_2^2},\ \text{by (8)},$$
$$=\tfrac{2}{3}\cdot\frac{1}{10^3}\sqrt{6.28+6.91}$$
$$=0.0024.$$
The actual difference is 0.147, which certainly could not be outbalanced
by an error in the opposite direction due to random
sampling, because it is much more than three times the probable
error due to sampling.
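Both methods of attack in Example (2) can be followed in a few lines. The sketch uses the factor 0.6745 in place of the two-thirds of the text, which does not alter the results to the figures quoted; the function name `pe_difference` is hypothetical.

```python
from math import sqrt


def pe_difference(p1, n1, p2, n2):
    """Probable error of the difference of two independent proportions:
    result (8) with sigma^2 = pq/n for each sample."""
    return 0.6745 * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


over_35 = (11718, 4029)      # London and Essex; Manchester and Leeds
all_ages = (35316, 21822)

p1 = over_35[0] / all_ages[0]
p2 = over_35[1] / all_ages[1]
difference = p1 - p2

# First method: no real difference, one pooled proportion for both districts.
pooled = sum(over_35) / sum(all_ages)
pe_pooled = pe_difference(pooled, all_ages[0], pooled, all_ages[1])

# Second method: a real difference, each sample keeping its own proportion.
pe_separate = pe_difference(p1, all_ages[0], p2, all_ages[1])

print(round(difference, 3), round(pe_pooled, 4), round(pe_separate, 4))
```

On either supposition the observed difference is more than fifty times its probable error.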
Sometimes we have to test the difference, not between two
simple proportions, but between two sample distributions. In
that case the mean of each sample may be calculated so that the
difference $(M_1-M_2)$ between the means is known; to find out
whether or not it is significant of some real difference between the
two populations from which the samples are drawn, $(M_1-M_2)$
is compared with its p.e., namely
$$0.6745\sqrt{\sigma_{M_1}^2+\sigma_{M_2}^2},$$
or
$$0.6745\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2} \qquad (9)$$
where $n_1$ and $n_2$ are the numbers of observations in the two samples
respectively, and $\sigma_1$, $\sigma_2$ are the S.D.'s of the samples. Unless
$(M_1-M_2)$ is definitely greater than some two or three times this
expression we cannot be very sure that the difference between $M_1$
and $M_2$ may not have arisen merely through random sampling,
and it may quite likely not be significant * of any real difference
between the two populations as regards the organ or character
which is under consideration.
[* It should be observed that the S.D. provides a wider margin for significance
than the p.e., because a range of approximately 3 p.e. $=3\cdot\tfrac{2}{3}\sigma=2\sigma$ only. It is
quite safe therefore to attach no great significance to a difference which does
not exceed two or three times the p.e.]
Example (3). Statistics have been collected to test whether there
is any significant difference between the eggs laid in general by
cuckoos and those laid by them in the nests of particular species
of foster parents. Results of the following kind were obtained
[see Biometrika, vol. iv., pp. 363-373, The Egg of Cuculus Canorus
(2nd Memoir), by O. H. Latter]:
Group.                       Number     Mean Length    S.D.       Significance    Remarks.
                             of Eggs.   (mms.)         (mms.)     Test.
Eggs of the Cuckoo
  race in general             1572       22.3          0.9642      . .            . .
Eggs laid in nests of:
  Garden Warbler                91       21.9          0.7860      7.0            Significant.
  White Wagtail                115       22.4          0.7606      1.6            Not significant.
  Hedge Sparrow                 58       22.6          0.8759      3.75           Probably significant.

The difference between the mean lengths of eggs laid in the nests
of garden warblers and those laid by cuckoos in general
$=22.3-21.9=0.4$ mms.
The p.e. of this difference
$$=0.6745\sqrt{(0.7860)^2/91+(0.9642)^2/1572},\ \text{by (9)},$$
$$=0.6745\sqrt{0.007380}$$
$$=0.058.$$
Hence the significance test
$=0.4/0.058=7.0$,
and we conclude that the difference in length between the two
classes of eggs is certainly significant. Similarly the other cases
may be tested.
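The garden warbler case may be worked through as follows, using result (9). This is a sketch with hypothetical names; since the means in the table are given to one decimal place only, the ratio obtained agrees with the tabulated test only roughly.

```python
from math import sqrt


def pe_diff_means(sd1, n1, sd2, n2):
    """Result (9): probable error of the difference of two sample means."""
    return 0.6745 * sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# (mean length in mms., S.D. in mms., number of eggs)
general = (22.3, 0.9642, 1572)
warbler = (21.9, 0.7860, 91)

difference = general[0] - warbler[0]
pe = pe_diff_means(warbler[1], warbler[2], general[1], general[2])
ratio = difference / pe

print(round(difference, 1), round(pe, 3), round(ratio, 1))
```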
In the example just given, to find out whether one population
differed from another, the arithmetic means have been compared ;
but the mean alone will scarcely serve to establish the identity of
any population. For example, we can conceive of two distinct
races of men, both of the same mean height, but one race embracing
a number of giants and dwarfs. Of course if we agreed to define
two races as identical when they have the same mean heights, there
would be nothing more to be said, but that would certainly only
be a very rough-and-ready attempt at classification.
Taking into consideration only the character of height, a further
step in definition would be to measure the mode or most fashionable
height, and the dispersion or variability (absolute: the standard
deviation, and relative: the coefficient of variation) of the two
races. Then, after comparing heights with sufficient detail, the
attention could be turned to innumerable other characters, skull
and body measurements, physical, mental, and even moral
attributes.
Clearly the difficulty of definition and of establishment of identity
grows as we pass along the scale from physical to moral. Moreover,
other statistical constants must be requisitioned when the question
of the existence and degree of relationship between two organs or
characters is to be determined. As the S.D. and the C. of V. serve
to measure the amount of variability, so the coefficient of correlation
comes in to measure the amount of likeness or association. Further,
and especially in problems of inheritance, the coefficient of regression
must be measured. It might seem at first sight hopeless to
try and measure the correlation between two such characters as
athletic capacity and health in the same boy, or between the
truthfulness of one boy and that of his brother; but the genius of
Karl Pearson has gone some way to solve even this difficult problem
by means of a system of adjectival instead of numerical classification
[see Phil. Trans., vol. 195A, pp. 1-47, On the Correlation of
Characters not Quantitatively Measurable, and, as an exceptionally
interesting application of the method, see Pearson, On the Laws of
Inheritance in Man, ii.; On the Inheritance of the Mental and Moral
Characters in Man and its Comparison with the Inheritance of the
Physical Characters; Biometrika, vol. iii. pp. 131-190]. In short,
for a full and exact definition of a population of any kind, human
or otherwise, it is necessary to measure not only the means, but all
the more important statistical constants, modes, medians, S.D.'s,
C.'s of V., coefficients of correlation and regression, and so on, and
it is no less necessary to calculate also their probable errors if we
are to test the real significance of such differences as are observed
in these constants between two samples from the same or from
different populations.
The probable errors for the more important constants, some of
which are only introduced later in the book, are collected together
in Table (38) for reference. The proofs in general are a little intricate
and would be lacking in interest to the ordinary person, who is
satisfied to take algebraical analysis on trust so long as he understands
the nature of the results he uses, but the more mathematical
reader who is anxious to see proofs may refer for some of them to
Biometrika, vol. ii., pp. 273-281, Editorial, On the Probable Errors
of Frequency Constants, which has been freely consulted on the
subject here.
The usual notation is adopted, $n$ being the total number of
observations in the given distribution, supposed normal in general,
$\sigma$ the S.D., etc.
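Table (38). Probable Errors of Statistical Constants.

Statistical Constant.                                     Probable Error (= 0.6745 × S.D.).
Any observed group frequency, $y$                         $0.6745\sqrt{y(1-y/n)}$
The mean of a distribution of any type                    $0.6745\,\sigma/\sqrt{n}$
The S.D. of a normal distribution, $\sigma$               $0.6745\,\sigma/\sqrt{2n}$
The second moment about the mean, $\mu_2$                 $0.6745\,\sigma^2\sqrt{2/n}$
The third moment about the mean, $\mu_3$                  $0.6745\,\sigma^3\sqrt{6/n}$
The fourth moment about the mean, $\mu_4$                 $0.6745\,\sigma^4\sqrt{96/n}$
The coefficient of variation, $v$                         $0.6745\,(v/\sqrt{2n})\,[1+2(v/100)^2]^{1/2}$
The coefficient of correlation, $r$                       $0.6745\,(1-r^2)/\sqrt{n}$
The correlation ratio, $\eta$                             $0.6745\,(1-\eta^2)/\sqrt{n}$, nearly
X, as determined from $(X-\bar{X})=r\frac{\sigma_x}{\sigma_y}(Y-\bar{Y})$,
  when Y is given                                         $0.6745\,\sigma_x\sqrt{1-r^2}$
Y, as determined from $(Y-\bar{Y})=r\frac{\sigma_y}{\sigma_x}(X-\bar{X})$,
  when X is given                                         $0.6745\,\sigma_y\sqrt{1-r^2}$
Distance between mode and mean in a skew
  distribution                                            $0.6745\,\sigma\sqrt{3/2n}$
Skewness                                                  $0.6745\sqrt{3/2n}$
$\beta_2$ (which should = 3 for a normal distribution)    $0.6745\sqrt{24/n}$
$\beta_1$ (which should = 0 for a normal distribution)    . .
$\sqrt{\beta_1}$                                          $0.6745\sqrt{6/n}$
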
Example (4). In the example which follows are given data
necessary for testing the significance of differences in variability
as well as in mean values. They represent an attempt made to
find whether members of a particular species of crab caught in
shallow water differed with regard to certain characteristics from
those caught in comparatively deep water [see Biometrika, vol. ii.,
pp. 191 et seq., Variation in Eupagurus Prideauxi, by E. H. J.
Schuster]. Only a few of the results are recorded here, to two
decimal places; the reader will find it a valuable exercise to verify
for himself the p.e.'s given in each case.
Measurement Made: Carapace length.

Sex.      Locality.        Mean (mm.).     S.D. (mm.).     C. of V. per cent.
Male      Deep water       8.59 ± 0.05     1.67 ± 0.04     19.45 ± 0.44
Male      Shallow water    8.41 ± 0.04     1.49 ± 0.03     17.75 ± 0.37
Female    Deep water       7.54 ± 0.03     0.94 ± 0.02     12.49 ± 0.28
Female    Shallow water    7.12 ± 0.02     0.86 ± 0.02     12.12 ± 0.25

Sex.      Difference of Means        Difference of S.D.'s       Difference of C.'s of V.
          (mm.).                     (mm.).                     per cent.
Male      0.18 ± 0.07 (poss. sig.)   0.18 ± 0.05 (prob. sig.)   1.70 ± 0.58 (poss. sig.)
Female    0.42 ± 0.04 (sig.)         0.08 ± 0.03 (poss. sig.)   0.37 ± 0.37 (not sig.)

The significance or otherwise of differences between variabilities
in the case of cuckoos' eggs, Example (3) above, might be tested in the same way.
CHAPTER XIV
FURTHER APPLICATIONS OF SAMPLING FORMULÆ
We have been discussing in the last chapter how to test two samples,
supposed each to contain homogeneous material, to find out whether
they belong to the same or to different types of population, but
the further question often arises as to whether a sample is or is not
homogeneous.
Example (1). — To this we may obtain a partial answer by working
out the statistical constants of the sample and their p.e.'s in order
to compare them with the corresponding constants for a sample or
series of samples believed to be homogeneous and of the same
type. For example, Professor Karl Pearson has measured the
skulls of skeletons of the Naqada race, excavated in Upper Egypt
by Professor Flinders Petrie and presumed to be some 8000 years
old, and he places his results for comparison alongside those
for certain other races admittedly homogeneous [see Biometrika,
vol. ii., p. 345, Homogeneity and Heterogeneity in Collections of
Crania] : —
                                                     Variability (mm.).
Series.                               Number of        Skull       Skull
                                      Observations.    Length.     Breadth.
Skulls:
  Ainos                                    76          5.936       3.897
  Bavarians                               100          6.088       5.849
  Parisians                                77          5.942       5.214
  Naqadas                                 139          5.722       4.612
  English                                 136          6.085       4.976
Living heads:
  Cambridge undergraduates               1000          6.161       5.055
  English criminals                      3000          6.046       5.014
  Oraons of Chota Nagpur                  100          5.916       4.397
Mean Variability                                       5.987       4.877

The S.D. of the variability of skull length calculated from this
series = 0.129 mm. and of the variability of skull breadth = 0.545 mm.,
and these supply standards for valuing the differences between the
Naqada and the mean variabilities.
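The mean variability and its S.D. can be recomputed from the eight series, and the Naqada values then expressed in terms of that S.D. The sketch below reads the tabulated variabilities with three decimal places and takes the Bavarian breadth as 5.849, which is the reading that reproduces the stated mean of 4.877; the names used are hypothetical.

```python
from math import sqrt

# Variability (S.D. in mm.) of skull length and breadth for the eight series.
length = {"Ainos": 5.936, "Bavarians": 6.088, "Parisians": 5.942,
          "Naqadas": 5.722, "English": 6.085, "Cambridge": 6.161,
          "Criminals": 6.046, "Oraons": 5.916}
breadth = {"Ainos": 3.897, "Bavarians": 5.849, "Parisians": 5.214,
           "Naqadas": 4.612, "English": 4.976, "Cambridge": 5.055,
           "Criminals": 5.014, "Oraons": 4.397}


def mean_and_sd(values):
    """Mean and S.D. (root mean square deviation) of a list of values."""
    m = sum(values) / len(values)
    return m, sqrt(sum((v - m) ** 2 for v in values) / len(values))


mean_len, sd_len = mean_and_sd(list(length.values()))
mean_br, sd_br = mean_and_sd(list(breadth.values()))

# Naqada differences from the mean variability, in units of the S.D.
naqada_len = (length["Naqadas"] - mean_len) / sd_len
naqada_br = (breadth["Naqadas"] - mean_br) / sd_br

print(round(mean_len, 3), round(sd_len, 3), round(naqada_len, 2))
print(round(mean_br, 3), round(sd_br, 3), round(naqada_br, 2))
```

On these figures the Naqada variability of skull length lies about twice the S.D. below the mean, and that of skull breadth about half the S.D. below it.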
Another method of procedure is to take a random sample out of
the sample itself, assuming the latter is large enough to admit of
an adequate subsample, and to compare the constants of the
whole and part. When they do not differ beyond the limits allowed
by random sampling the inference is that the whole may be treated
as a homogeneous class if judged by this test alone.
Example (2). In an interesting and important memoir, On
Criminal Anthropometry and the Identification of Criminals, by W. R.
Macdonell [Biometrika, vol. i., pp. 177 et seq.], the author uses this
method to test the homogeneity of a class of 3000 criminals by
measuring also a random sample of 1306 criminals out of the 3000.
He obtained, for example,
S.D. of head length = 6.04593 ± 0.05265 mm., for the 3000 criminals;
                    = 6.00247 ± 0.07922 mm., for the 1306 criminals.
The difference between the variabilities in the sample and subsample,
by result (8),
$$=0.04346\pm\sqrt{(0.05265)^2+(0.07922)^2}$$
$$=0.04346\pm 0.09512,$$
which is certainly not significant. If the same holds good with
regard to the means and other constants, then the whole may be
said to be homogeneous so far as this test goes.
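The probable errors quoted in this example follow from the formula for the p.e. of a S.D., $0.6745\,\sigma/\sqrt{2n}$, and from result (8). The following short sketch, with hypothetical names, reproduces them.

```python
from math import sqrt


def pe_sd(sd, n):
    """Probable error of the S.D. of a normal distribution of n observations."""
    return 0.6745 * sd / sqrt(2 * n)


pe_whole = pe_sd(6.04593, 3000)     # the 3000 criminals
pe_sub = pe_sd(6.00247, 1306)       # the subsample of 1306

difference = 6.04593 - 6.00247
# Result (8): square root of the sum of the squares of the two p.e.'s.
pe_of_difference = sqrt(pe_whole ** 2 + pe_sub ** 2)

print(round(pe_whole, 5), round(pe_sub, 5))
print(round(difference, 5), round(pe_of_difference, 5))
```

The difference of 0.04346 mm. is less than half its probable error.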
Example (3). Another example may be given from the memoir
on Variation and Correlation in Brain Weight, by Raymond Pearl
[Biometrika, vol. iv., pp. 13 et seq.]. The author wished particularly
to investigate the change of brain weight with age; on the hypothesis
that the weight of the brain reaches a maximum between
the ages of 15 and 20, remains unchanged from 20 to 50, and then
begins to decline and so continues till death, the material was
divided into a 'Young' series, ages 20 to 50, and a 'Total' series
including all between 20 and 80. The 'Young' series thus formed
a selection from the 'Total' series, but a selection based on age
and not on brain weight. If there were no correlation between
age and brain weight, this selection, based as it is on age, would,
of course, be random as regards brain weight. Now correlation
does exist between the two, but it is so slight that, within the limits
of error, the 'Young' series does form practically a random sample
of the 'Total' series, as is shown by the following figures:
Difference in Variation Constants between Young and
Total Series (written with a positive sign when the
Young Series gives the greater value).

                        Male.                                Female.
              S.D.              C. of V.            S.D.               C. of V.
Swedes       +2.851 ± 4.066    +0.122 ± 0.291     +4.786 ± 5.465     +0.271 ± 0.435
Bavarians    -1.888 ± 3.556    -0.173 ± 0.234    -10.357 ± 3.909     -0.941 ± 0.320

Thus in only one case, that of the Bavarian females, is the difference
between the variabilities, S.D. or C. of V., of the two series as
great as its probable error, and even in that case the differences,
10.357 and 0.941, are not three times as large as their respective
p.e.'s, 3.909 and 0.320. Dr. Pearl concludes from these and similar
results that 'the series are reasonably homogeneous in other respects
than age.'
The reader is recommended to test his knowledge of the formulae
for probable errors by applying them to the following examples.
Dr. Alice Lee, in a note on Dr. Ludwig on Variation and Correlation
in Plants [Biometrika, vol. i., p. 316] makes use of the statistics
relating to Ficaria Verna in Example (4). Those in Example (5)
are taken from among a large number of others in the highly
interesting memoir, On the Laws of Inheritance in Man, by Professor
Karl Pearson and Dr. Alice Lee [Biometrika, vol. ii., pp. 357 et seq.]
cited once before.
Example (4). Variation and Correlation in Ficaria Verna.

No. of Observations.    Mean No. of       Mean No. of       Correlation between
                        Petals; S.D.      Sepals; S.D.      No. of Sepals and
                                                            No. of Petals.
1000 (Greiz A)          8.286; 1.3382     3.695; 0.8524     0.2439 ± 0.0201
1000 (Greiz G)          8.232; 0.9954     3.437; 0.7033     0.2480 ± 0.0200

We have here all the data necessary to find the p.e.'s of the
means, variabilities, and correlations, and we wish to know whether
the differences between the means and variabilities of the A and G
plants can be accounted for by random sampling alone.
For example, the difference between the petal means
$$=(8.286-8.232)\pm\tfrac{2}{3}\sqrt{\frac{(1.3382)^2}{1000}+\frac{(0.9954)^2}{1000}}$$
$$=0.054\pm 0.035.$$
Clearly this difference, being not so great as twice its p.e., is not
significant and may quite well be due to random sampling.
Again, the difference between the petal variabilities
$$=(1.3382-0.9954)\pm\tfrac{2}{3}\sqrt{\frac{(1.3382)^2}{2000}+\frac{(0.9954)^2}{2000}}$$
$$=0.3428\pm 0.025,$$
which is certainly much too great to be explained away by random
sampling merely.
Similarly the differences between the sepal means, between the
sepal variabilities, and between the correlations, may be tested for
significance by comparison with their p.e.'s.
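The petal calculations, and the p.e.'s of the two correlations, may be verified as below. To reproduce the figures as printed, the sketch uses the rounded factor two-thirds for the differences and 0.6745 for the correlations; which factor was used for each figure is an assumption, and the names are hypothetical.

```python
from math import sqrt

TWO_THIRDS = 2 / 3       # rounded value of 0.6745 used for the differences
n = 1000                 # observations in each of Greiz A and Greiz G

mean_a, sd_a = 8.286, 1.3382     # petals, Greiz A
mean_g, sd_g = 8.232, 0.9954     # petals, Greiz G

mean_difference = mean_a - mean_g
pe_mean_difference = TWO_THIRDS * sqrt(sd_a ** 2 / n + sd_g ** 2 / n)

sd_difference = sd_a - sd_g
pe_sd_difference = TWO_THIRDS * sqrt(sd_a ** 2 / (2 * n) + sd_g ** 2 / (2 * n))


def pe_correlation(r, n):
    """Probable error of a coefficient of correlation."""
    return 0.6745 * (1 - r * r) / sqrt(n)


pe_r_a = pe_correlation(0.2439, n)
pe_r_g = pe_correlation(0.2480, n)

print(round(mean_difference, 3), round(pe_mean_difference, 3))
print(round(sd_difference, 4), round(pe_sd_difference, 3))
print(round(pe_r_a, 4), round(pe_r_g, 4))
```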
Example (5). Size and Variability of Stature in the
Two Generations.

                       Father.          Mother.          Son.             Daughter.
Mean height (in.)      67.68 ± 0.06     62.48 ± 0.05     68.65 ± 0.05     63.87 ± 0.05
S.D. (in.)              2.70 ± 0.04      2.39 ± 0.04      2.71 ± 0.04      2.61 ± 0.03
C. of V. (per cent.)    3.99 ± 0.06      3.83 ± 0.06      3.95 ± 0.06      4.09 ± 0.05

The student in this case might use one of the formulae for the
p.e.'s to find the number of fathers, mothers, sons, or daughters
observed when the p.e.'s are known, and then the remaining p.e.'s
might be verified when the numbers of observations are found.
As evidence of 'assortative mating,' the tendency of like to
mate with like, the following particulars are given, based on 1000
to 1050 cases of husband and wife:
Correlation between stature of husband and stature of wife = 0.2804 ± 0.0189
Correlation between span of husband and span of wife       = 0.1989 ± 0.0204
Correlation between forearm of husband and forearm of wife = 0.1977 ± 0.0205
To measure the average intensity of inheritance, the extent of
resemblance between parents and children in any character,
coefficients of correlation are calculated such as the following:
Coefficient of Correlation
between stature of father and stature of son      = 0.514 ± 0.015
between stature of father and stature of daughter = 0.510 ± 0.016
between stature of mother and stature of son      = 0.494 ± 0.016
between stature of mother and stature of daughter = 0.507 ± 0.016
[In verifying the p.e.'s for this case take the number of observations
to be 1024.]
One more extract may be quoted, a prediction table, giving the
probable mean stature of sons of fathers of given stature, and
so on:
Son's probable stature      = 33.73 + 0.516 (father's stature) ± 1.56
Daughter's probable stature = 30.50 + 0.493 (father's stature) ± 1.51
Son's probable stature      = 33.65 + 0.560 (mother's stature) ± 1.59
Daughter's probable stature = 29.28 + 0.554 (mother's stature) ± 1.52.
All values given in this example for the p.e.'s should be
verified.
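As a help towards the verification suggested, the first line of the prediction table and the p.e. of the father and son correlation may be rebuilt from the means, S.D.'s and correlation already given. This is a sketch under the assumption that the prediction line is the ordinary regression line, with p.e. $0.6745\,\sigma\sqrt{1-r^2}$; the function names are hypothetical, and small discrepancies in the last figure are to be expected because the published constants are rounded.

```python
from math import sqrt


def prediction_line(child_mean, child_sd, parent_mean, parent_sd, r):
    """Intercept, slope and probable error for predicting the child's
    stature from the parent's (ordinary regression line assumed)."""
    slope = r * child_sd / parent_sd
    intercept = child_mean - slope * parent_mean
    pe = 0.6745 * child_sd * sqrt(1 - r * r)
    return intercept, slope, pe


def pe_correlation(r, n):
    """Probable error of a coefficient of correlation."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# Son predicted from father: means 68.65 and 67.68, S.D.'s 2.71 and 2.70.
intercept, slope, pe = prediction_line(68.65, 2.71, 67.68, 2.70, 0.514)
pe_r = pe_correlation(0.514, 1024)

print(round(intercept, 2), round(slope, 3), round(pe, 2), round(pe_r, 4))
```

The remaining three lines of the prediction table may be treated in the same way by changing the arguments.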
Before we consider further applications of these principles to
questions of a somewhat different kind, let us imagine a very
simple though artificial illustration. Suppose we have 999 sheep,
each one ticketed, the numbers on the tickets running from 1 to
999. Also suppose 666 of these sheep are white and 333 are black,
so that, if we pick out any one at random, the chance of it being
black is 333/999 or 1/3. Let us call picking a black sheep a 'success,'
then $p=1/3$, $q=2/3$.
We proceed now to select 99 sheep in succession at random
from the flock with the understanding that each sheep is returned
into the flock before the next is picked out. This insures that
the chance of a success at each selection remains equal to 1/3 and,
of course, there is nothing to prevent the same sheep being picked
more than once. The selection might practically be made by
placing in a box 999 tickets, numbered from 1 to 999, one to correspond
to each sheep, then picking out 99 of them in succession,
being careful to replace each and to shake up the box before picking
out the next; if there were absolutely no difference between the
tickets, such as would cause one to be picked more easily than
another, the selection made in this way would be random in the
sense required, and the tickets so chosen would determine which
sheep were to be taken and which left.
The proportion of black sheep to be expected in such a random
selection of 99 is 1/3, but, if we only perform the experiment once,
it is quite likely that the proportion we actually get will differ from
1/3 by an amount
$$=0.6745\sqrt{pq/n}$$
$$=0.6745\sqrt{\tfrac{1}{3}\cdot\tfrac{2}{3}\cdot\tfrac{1}{99}}$$
$$=1/31,\ \text{about},$$
while it is unlikely that the proportion will differ from 1/3 by much
more than 3/31, or 1/10.
Conversely (and it is really the converse which is useful in practice),
if we do not know the proportion of black sheep in the whole
flock, we may get a fair estimate of it by taking a random sample
of 99 sheep (any other number will serve the purpose, but the
larger the better for accuracy), and if we find that in this sample
there are 33 black sheep, i.e. $p=33/99=1/3$, it will appear that
the value of $p$ for the whole flock is 1/3, subject to a probable error
$0.6745\sqrt{pq/n}$ in excess or defect, i.e. the true proportion for the
whole flock may quite likely differ from 1/3 by as much as 1/31,
but it is unlikely to differ by much more than 1/10. It should be
noticed that the calculation of the probable error in this converse
case is based upon the value of $p$ given by the sample taken, for
that is the only value of which we have knowledge.
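The figures 1/31 and 1/10 come from the following short calculation (the function name `pe_proportion` is hypothetical).

```python
from math import sqrt


def pe_proportion(p, n):
    """Probable error of a proportion p found in a random sample of n."""
    return 0.6745 * sqrt(p * (1 - p) / n)


pe = pe_proportion(1 / 3, 99)
print(round(pe, 4), round(1 / pe, 1), round(3 * pe, 3))
```

The probable error is a little under 0.032, or about 1/31, and three times that amount is a little under 1/10.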
Too much stress can scarcely be laid on the fact that the samples
chosen must be absolutely unbiassed, otherwise the use of the
formulae $np$ and $\sqrt{npq}$, or the corresponding proportional formulae,
cannot be justified: each sheep in our illustration must have the
same chance of being picked, and no one selection is to have any
influence on another. The failure to appreciate this essential
point has led to no little waste of time and effort in the collection
of valueless statistics.
The method of sampling has been employed in a way at once
interesting and useful by Dr. A. L. Bowley, and, as some of this
work has barely received the attention it deserves, it may be well
to explain two of his experiments in some detail.
The first was of interest because its results could be tested by
an examination of the original record from which the sample was
taken. The details concerning it are abstracted from the Journal
of the Royal Statistical Society, September 1906.
Example (6). Bowley sampled the dividends paid by 3878
companies as quoted in the Investors' Record. His sample consisted
of 400 of these companies, i.e. about 10 per cent., selected in
a purely arbitrary fashion thus: the investigator took a Nautical
Almanac and noted down the last digits of one of the tables, recording
them in groups of four, but if any particular group gave a
number bigger than 3878 he rejected it. In this way each of the
numbers between 1 and 3878 had an equal chance of selection (for
numbers under four figures would appear like 0327, 0042, 0009,
which would be taken to represent 327, 42, 9 respectively), and the
selection of one had no influence on that of any other. The companies
in the Investors' Record were numbered consecutively, and
the dividends corresponding to the 400 arbitrary numbers obtained
formed the sample with which Bowley worked.
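The rule of selection just described may be sketched in code. The digit groups below are invented for illustration and are not Bowley's; the function name `select_numbers` is hypothetical, and the treatment of the group 0000 (rejected here, since no company bears the number 0) is an assumption not stated in the text.

```python
def select_numbers(digits, population=3878, group=4):
    """Read a string of digits in successive groups of four and keep each
    group that gives a number from 1 to `population`; reject the rest."""
    chosen = []
    for start in range(0, len(digits) - group + 1, group):
        number = int(digits[start:start + group])
        if 1 <= number <= population:    # assumption: 0000 is also rejected
            chosen.append(number)
    return chosen


# Invented last digits, written in groups of four for clarity.
groups = ["0327", "9135", "0042", "3879", "0009", "3878", "0000", "1234"]
sample = select_numbers("".join(groups))
print(sample)
```

Because every group of four digits from 0001 to 3878 is equally likely to turn up, each company has the same chance of selection, and a rejected group has no influence on the groups that follow.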
After making some interesting deductions with regard to the
average for the whole distribution, to which we shall return presently,
he proceeded to forecast the grouping of the original companies
as to their dividends by setting out the grouping discovered
in the sample 400, as follows, using the standard deviation in place
of the probable error as the error due to random sampling:
Table (39). Distribution of Dividends paid by a Sample of 400 Companies.

(1)                        (2)          (3)                          (4)
Dividend.                  Sample of    Percentage of Sample         Percentage of
                           400          Companies in each Class.     all Companies
                           Companies.                                in each Class.

Nil                          28          7    with S.D. = 1.27          6
£1 to £2, 19s. 9d.            6          1½                             1.5
£3 to £3, 9s. 9d.            37          9¼   with S.D. = 1.46          8.4
£3, 10s. to £3, 19s. 9d.     71         17¾   with S.D. = 1.90         18.8
£4 to £4, 9s. 9d.            64         16    with S.D. = 1.83         17.3
£4, 10s. to £4, 19s. 9d.     53         13¼   with S.D. = 1.68         13.8
£5 to £5, 19s. 9d.           60         15    with S.D. = 1.78         17.7
£6 to £7, 19s. 9d.           48         12    with S.D. = 1.63         10.8
£8 to £10, 19s. 9d.          29          7¼   with S.D. = 1.29          3.8
Above £11                     4          1                              1.9
In col. (3) the S.D. for each group was calculated as follows:
for the first group, out of 400 possible events we have 28 successful
events, meaning by 'successful' here 'a company paying no
dividend,' thus
$$p = 28/400, \qquad q = 372/400.$$
Hence the S.D. of the frequency in the first group
$$= \sqrt{npq} = \sqrt{28 \times 372}/20 = 5.1.$$
Since this is for a sample of 400, the S.D. of the percentage* frequency
in the first group
$$= \tfrac{1}{4}(5.1) = 1.27.$$
The other S.D.'s are calculated in the same way, but when the
number in a class is very small the forecast can scarcely be relied
upon and consequently the S.D. is not inserted.
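The arithmetic of col. (3) is easily repeated for every class. The sketch below takes the sample counts from col. (2) of Table (39); the function name `sd_of_percentage` is merely illustrative.

```python
from math import sqrt

def sd_of_percentage(count, sample_size):
    """S.D. of the percentage frequency of a class holding `count` of `sample_size` items."""
    p = count / sample_size
    q = 1 - p
    sd_of_frequency = sqrt(sample_size * p * q)   # sqrt(npq), the S.D. of the count
    return 100 * sd_of_frequency / sample_size    # turn a count into a percentage

sample_counts = [28, 6, 37, 71, 64, 53, 60, 48, 29, 4]   # col. (2) of Table (39)
for count in sample_counts:
    percentage = 100 * count / 400
    print(f"{count:3d}  {percentage:6.2f}  S.D. = {sd_of_percentage(count, 400):.2f}")
```

For a sample of 400 the factor 100/400 is the quarter by which the S.D. of the frequency is multiplied in the text.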
It will be noted, by comparing with the numbers in col. (4),
showing the corresponding percentages for all the 3878 companies,
that every forecast was remarkably good except one, class £8 to
£10, 19s. 9d., where the error approaches three times the S.D., and
the exception will serve as a warning that, in working with samples,
the unexpected sometimes happens. Professor Edgeworth, in his
Presidential Address to the Royal Statistical Society (1912), points
out that the method appears to be a permanent institution in
the Statistical Bureau at Christiania, where it has given very good
results. These can be checked or ' controlled ' for safety if complete
statistics are obtainable under some heads. He fairly sums up the
utility of sampling when he says that ' we may obtain from samples
a general outline of the facts, often sufficient for the initiation of
a project like that of insurance, rather than the features in detail.'
Bowley also divided up his 400 random samples into 40 groups
of 10 companies each, and calculated the average for each group.
The S.D. for these 40 averages was found in the usual way, giving
0.775. But since this was the S.D. for averages of 10, we conclude
that
$$\text{(the S.D. for the distribution of the 400 companies)}/\sqrt{10} = 0.775,$$
i.e. the S.D. for the distribution of the 400 companies $= 0.775\sqrt{10}$.
Hence, applying the same principle again,
the S.D. of the average of the 400 sample companies
$$= 0.775\sqrt{10}/\sqrt{400} = \text{£}0.122.$$
[* It would not be correct to take $\sqrt{7(1-\tfrac{7}{100})}$ as the S.D. of the percentage
frequency in the first group; this value would be double the true value, namely,
$\tfrac{1}{4}\sqrt{28(1-\tfrac{28}{400})} = \tfrac{1}{2}\sqrt{7(1-\tfrac{7}{100})}$, because the accuracy is increased by increasing
the number of events in a sample, and the sample here is really 400 and not 100.]
Now the average of the 400 samples turned out to be £4.7435.
Hence it was judged that, if this was a fair selection (and the random
method adopted was such as to make it fair in all reasonable likelihood),
the average for the 3878 companies should certainly lie
between
$$\text{£}[4.7435 \pm 3(0.122)].$$
The true average was found by actual calculation to be £4.779,
well within the above limits, although the original items varied from
nil to £103, being grouped according to the nature of the security
(Government, Railways, Mines, etc.), and the averages and
S.D.'s on successive pages differed materially. This aggregation,
Bowley remarks, is very similar to that found in wages in different
occupations and localities, and in many other practical examples.
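The chain of reasoning from the S.D. of the forty group averages to the limits for the true average may be checked as follows. The figures are those quoted in the text; the variable names are illustrative only.

```python
from math import sqrt

sd_of_group_means = 0.775   # S.D. of the 40 averages, each an average of 10 companies
group_size = 10
sample_size = 400
sample_mean = 4.7435        # average dividend of the 400 sample companies, in pounds
true_mean = 4.779           # average of all 3878 companies, in pounds

# The S.D. of an average of k items is (S.D. of items) / sqrt(k), applied twice.
sd_of_items = sd_of_group_means * sqrt(group_size)
sd_of_sample_mean = sd_of_items / sqrt(sample_size)

lower = sample_mean - 3 * sd_of_sample_mean
upper = sample_mean + 3 * sd_of_sample_mean
print(round(sd_of_sample_mean, 3), round(lower, 3), round(upper, 3))
print(lower < true_mean < upper)
```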
The value of the second experiment due to Dr. Bowley lies in the
suggestion that similar means can be applied with good results to
the investigation of many social phenomena.
If out of a large group a comparatively small sample of statistics
is collected in the purely random manner already described, we are
able by such means to estimate what is the average, and even to
obtain limits between which the average will almost certainly lie,
in the large group based upon values found for the average and
S.D. in the small sample.
Example (7). With the collaboration of Mr. Burnett-Hurst and
a number of other workers, Dr. Bowley conducted an inquiry into
the conditions of working-class households in four representative
towns (Northampton, Warrington, Stanley, and Reading), the
results of which are published by Messrs. Bell and Sons under the
title of Livelihood and Poverty. They are similar in character to
those obtained by Rowntree in his study of conditions in York,
but what is peculiar to Bowley's inquiry is that only a sample,
about 1 in 20, of the working-class houses in each town was
examined, and the conditions in the towns as a whole were deduced
from these samples.
We are not concerned here with the actual facts disclosed by the
investigation, striking as they are, but with the explanation of the
sampling method adopted, and as to that it may be remarked that
the foundation on which it rests is precisely the same as that which
underlay the example of the 999 black and white sheep. The
main point to notice here again is that Bowley was careful to select
his samples in unbiassed fashion as follows: 'For each town a list
of all houses . . . was obtained, and without reference to anything
except the accidental order (alphabetical by streets or otherwise)
in the list, one entry in twenty was ticked. The buildings so
marked, other than shops, institutions, factories, etc., formed the
sample.' It will be evident that this method of choice is not quite
on the same level of randomness as that followed, for example, in
drawing cards from a well-shuffled pack, each card to be replaced
and the pack reshuffled before the next is drawn; but, for that
very reason, the results of the experiment are all the more likely
to be well within the limits of error provided by the formulae of
the ideal case. The deliberate selection of every twentieth house
in each street is likely, that is to say, to give a more representative
picture of the town as a whole than would be obtained by selecting
the same number of houses in a purely random fashion which might
by chance give too much emphasis to some street or district.
A practical test of the goodness of the sample was possible by
comparing the results in a few instances with information available
from other sources. In order to make the method of working
quite clear, let the guiding principle first be recalled : —
'If, in a random sample of $n$ items, the proportion of successes
is $p$, then the proportion of successes in the universe from which the
sample is selected will not be likely to fall outside the limits
$$p \pm 3(0.6745)\sqrt{pq/n},$$
and, if that universe contains altogether $N$ items, the number of
successes will not be likely to fall outside the limits
$$Np \pm 3(0.6745)\,N\sqrt{pq/n}.\text{'}$$
In Reading the total number of all inhabited houses in the
borough was 18,000 at the time of the inquiry, i.e. $N = 18{,}000$.
The total number of houses visited was 840, i.e. $n = 840$. If we
call a house assessed at £8 or less a 'success,' the number of such
houses found in the sample was 206.
Thus $$p = 206/840, \qquad q = 634/840,$$
and the number of houses rented at £8 or less in the whole borough
should be
$$Np \text{ with a p.e.} = 0.6745\,N\sqrt{pq/n},$$
i.e. $4414 \pm 180$.
The actual number of houses so rented was known from other sources
to be 4380, well within the limits forecasted.
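The Reading forecast can be reproduced in a few lines. This is a sketch under the formula quoted above; the function name `estimate_universe_count` is an assumption made for illustration.

```python
from math import sqrt

def estimate_universe_count(successes, sample_size, universe_size):
    """Return (Np, probable error) for the number of successes in the universe."""
    p = successes / sample_size
    q = 1 - p
    estimate = universe_size * p
    probable_error = 0.6745 * universe_size * sqrt(p * q / sample_size)
    return estimate, probable_error

estimate, probable_error = estimate_universe_count(206, 840, 18000)
print(round(estimate), round(probable_error))

actual = 4380   # number known from other sources
print(abs(actual - estimate) <= 3 * probable_error)
```

The known figure differs from the forecast by about 34 houses, a small fraction of one probable error.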
The value used for $p$ in the above is that given by the sample,
but when we know the actual number of successes in the universe
as a whole, as in this case we do, we might use the true value of
$p$, i.e. the value for the universe in place of that for the sample.
The argument might also be put in another way without affecting
the principle employed, thus:
The number of houses rented at £8 or less in the whole borough
was 4380.
But the proportion of houses sampled in the whole borough was
840/18000, i.e. 1/21.43.
Hence the number of houses at the above rental to be expected
in the sample $= 4380/21.43 = 204$.
The number actually found in the sample was 206, with a probable
error
$$= 0.6745\sqrt{npq} = 0.6745\sqrt{840 \times \tfrac{4380}{18000} \times \tfrac{13620}{18000}} = 8, \text{ approximately.}$$
Again, the number of persons engaged in a certain occupation at
Reading was known to be 761 in the borough as a whole. Hence
the number of persons so engaged to be expected in the sample
was 761/21.43, i.e. 35.
The number actually found in the sample was 29 with a probable
error
$$= 0.6745\sqrt{npq} = 0.6745\sqrt{840 \times \tfrac{761}{18000} \times \tfrac{17239}{18000}} = 4, \text{ approximately.}$$
Further examples of the method are here given, in each of which
the total number of events is small so that the number in each
sample is also small, and since, as we have seen, the accuracy or
precision of the proportion of successes discovered in any sample
varies directly as the square root of the number of events the sample
contains, the results cannot be expected to be so good when this
number is small.
Example (8). — 514 candidates sat a certain examination paper ;
their marks ranged from 3 to 64. The candidates were numbered
consecutively from 1 to 514, and a random sample of 90 (17½ per
cent.) was selected from among them by writing down the 90
numbers formed by the digits in the seventh decimal place, taken
in groups of three, in the logs of the numbers 10104, 10204,
10304, . . . , as given in Chambers's Tables, neglecting all numbers
greater than 514 and calling such numbers as 005, 037, etc. — 5,
37, etc. In this way each of the numbers between 1 and 514 stood
an equal chance of inclusion.
The distribution of candidates in the sample is compared with
that for all 514 together in the following table:

No. of Marks Obtained.     Percentage of All           Percentage of Candidates in
                           Candidates who obtained     Sample who obtained
                           these Marks.                these Marks (with p.e.).

Less than 15                      8                         8 ± 1.9
15 but less than 25              19                        17 ± 2.6
25 but less than 30              16                        18 ± 2.7
30 but less than 35              18                        13 ± 2.4
35 but less than 40              15                        17 ± 2.6
40 but less than 50              19                        18 ± 2.7
50 and over                  (illegible)                   10 ± 2.1
The reader might verify the p.e.'s given in the last column:
e.g. proportion in the sample obtaining less than 15 marks $= 7/90$;
therefore $p = 7/90$, $q = 83/90$.
Hence the S.D. for this group
$$= \sqrt{7(1 - \tfrac{7}{90})} = 2.54,$$
and the S.D. for the percentage
$$= \tfrac{100}{90} \times 2.54 = 2.8.$$
Thus the p.e. for the percentage
$$= 0.6745\,\sigma = 1.9, \text{ approximately.}$$
Example (9) deals in a similar way with the data concerning
infectious diseases in 241 towns in England and Wales previously
recorded on p. 62.
A sample of 60 towns, i.e. about 25 per cent., was chosen in a
random fashion as in the last example, and the sample distribution
is compared below with that of the 241 towns as a whole.
The verification of the probable errors in this and the next case
is left to the reader.
Case Rate per 1000        Actual No. of        No. as suggested by
of the Population.        Towns so rated.      the Sample (with p.e.).

1 and under 5                  85                  92 ± 10
5 and under 9                  86                  96 ± 10
9 and under 13                 42                  28 ± 7
13 and over                    28                  24 ± 6
Example (10) is concerned with the annual output per head in
142 different types of employment as given in 1907 by the Census
of Production [data from Sixteenth Abstract of Labour Statistics of
the United Kingdom, Cd. 7131]. The distribution suggested by a
random sample of 50 different occupations is compared with that of
the complete list of 142 occupations.
Output per head.        No. of Occupations     No. in Complete       Actual No.
                        in Sample with         List as deduced       found in
                        this Output.           from Sample           Complete List.
                                               (with p.e.).

Under £60                      4                 11 ± 3.6                12
£60 and under £80             16                 45 ± 6.2                42
£80 and under £100             6                 17 ± 4.3                25
£100 and under £120           10                 28 ± 5.3                20
£120 and under £190            8                 23 ± 4.9                27
£190 and over                  6                 17 ± 4.3                16
The S.D. in each of the last three examples has been calculated
by using the value for p given by the sample, which is the value
one must fall back upon in practice when the true p for the whole
distribution is unknown. In any case where we are able to test
our sample by comparison with the whole distribution, however,
it is possible to use the true value of $p$, e.g. in Example (10),
output £100 to £120, $p = 20/142$ as opposed to $10/50$.
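The effect of replacing the sample value of $p$ by the true value may be seen numerically for the class £100 to £120 of Example (10). The sketch below scales the probable error up to the complete list of 142 occupations; the helper name `probable_error_of_count` is an assumption for illustration.

```python
from math import sqrt

def probable_error_of_count(p, sample_size, universe_size):
    """Probable error of the number in the universe deduced from a sample."""
    return 0.6745 * universe_size * sqrt(p * (1 - p) / sample_size)

# Class "£100 and under £120": 10 of the 50 sampled, 20 of the 142 in the full list.
pe_with_sample_p = probable_error_of_count(10 / 50, 50, 142)
pe_with_true_p = probable_error_of_count(20 / 142, 50, 142)
print(round(pe_with_sample_p, 1), round(pe_with_true_p, 1))
```

The first figure comes out close to the value tabulated for this class; the second is somewhat smaller, because the true proportion lies farther from one-half than the sample proportion does.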
CHAPTER XV
CURVE FITTING: PEARSON'S GENERALIZED
PROBABILITY CURVE
It may be recalled that in the introductory chapter an outline was
given of the manner in which the theory of Statistics might be
conceived to develop. It was shown how the desire for simplification
and the need for compression leads to the division of a large
mass of figures dealing with any given matter into groups ; indeed,
it may well be that the statistics have been so arranged at the
source in the act of collecting : e.g. we may have to deal with
so many males of height 54 in. and less than 55 in., so many of
height 55 in. and less than 56 in., so many of height 56 in. and less
than 57 in., and so on. Here corresponding to each given height,
which we may label $x$, or each range of height, such as $x_1$ to $x_2$,
we have a certain frequency of males of that height or range,
which frequency we may label y, and hence a frequency table can
be formed showing the variation of y with x. Further we have
seen how such pairs of corresponding values of x and y can be
plotted so as to picture the complete observed frequency distribution
to the eye.
Now the representation thus made, though helpful up to a point,
is not entirely satisfactory. Whether we simply join up successive
points $(x, y)$, or set up rectangles of varying height $y$ on bases
spanning the successive ranges of $x$, or erect ordinates ($y$'s) at the
mid-points of these bases, joining the summits in the manner
previously described, the connection so established between each
observation and the next is too superficial, depending merely on
the fact of casual neighbourship, and may sometimes give a false
impression of frequency and changes in frequency in the population
of which the observations are but a sample. And this is necessarily
so if we confine ourselves strictly to the data observed.
One difficulty which has to be faced is that only within certain
broad limits can we trust our observations to give us information
which is truly representative of the population in which we are
interested. We seldom if ever deal with the whole population :
in fact it may be so large that it is impracticable even to reckon it ;
instead we make a random or unbiassed selection of a smaller but
adequate number of individuals belonging to the population, and
classify them according to the size or nature of the character which
concerns us. But, granted that our sample is adequate in size
and unbiassed, the numbers obtained in the different groups of the
frequency distribution will still be subject to the errors of random
sampling, and it is only after these errors have been calculated that
we can lay down the probable limits within which our sample may
be regarded as really representative of the population as a whole.
Another difficulty arises owing to the fact that our observations
in general do not cover the whole field of values of the variables x
and y ; we may quite likely want to know the percentage frequency,
$y$, of individuals with a character (height or whatever it may be) $x$
which does not chance to be any one of the $x$'s observed, if the
observations are only recorded according to discrete (separately
distinct, like 5 ft., 6 ft., 7 ft.) values of $x$; on the other hand, if
the observations have been classed in groups, the frequency in
which we are interested may refer to an x which does not coincide
with the centre of any group or which is even outside the range
altogether. We have therefore further to inquire whether such
information can be deduced in any way from the statistics collected.
Now it so happens that both these difficulties disappear if we
can only attain the ideal already outlined in discussing graphs,
and find a suitable curve to ' fit ' the statistics observed. Such a
curve would not necessarily pass through all or any of the points
$(x, y)$ representing the observations, for these, as we have remarked,
are subject to errors of random sampling and the observed frequency
$y$ of any $x$ may be greater or less than the corresponding $y$ in the
population at large to which the curve is presumed to approximate.
The curve in short must remove the roughnesses which are inseparable
from ordinary observation. Moreover, given any $x$, not
merely one of the x's observed, it must be possible to read off from
it the corresponding y, the frequency appropriate to that x.
It is not always accurate enough for our purpose to draw a curve
by eye, passing as evenly as possible through the middle of the
points observed in the manner conceived in an earlier chapter. It
is necessary in some way to find an algebraical formula, possibly
even a trigonometrical, exponential, or more complex expression,
which will give the y corresponding to any x desired. This formula
or equation must depend upon the statistics collected : i.e. the
constants involved in it must be directly and fairly easily computed
from the $y$'s observed, and the results of all the observations should
enter into the equations which determine the constants in order to
make use of the full information at our disposal. In addition, the
method of determining the equation and its constants should be as
general as possible, so relieving us of the trouble of discovering a
new method owing to the failure of the original one at nearly every
trial. Finally, the equation should not be so intricate as to make
the labour of calculating y for any given x too heavy to be attempted
with the ordinary equipment at the statistician's disposal. Once
such an equation is found it is a fairly straightforward proceeding
to trace the curve for which it stands, and it will remain afterwards
to test the goodness of fit in some more refined way than by seeing
how closely it passes through the observed points by eye.
When we come to review the shapes of the frequency polygons
or histograms most commonly met, we find that the majority
of them start from low frequency, rise to a maximum
as $x$, the character observed, increases, then fall again
towards zero very likely at a different rate. In fact the
statistics suggest a shape something like that shown in fig. (27)
for the corresponding frequency curve, though we cannot be sure
that it would coincide with the axis at either extremity. [Cases
do occur where the curve has two or even more humps (maxima),
but we purposely restrict ourselves to the simpler and more frequent
type described.]
Now the simplest shape to deal with from the algebraical point
of view would certainly be symmetrical in character, corresponding
to statistics which rise and fall at the same rate, though this would
not necessarily be the most common shape among the records of
actual life. In order to simplify our problem, therefore, we might
start by making up for ourselves an ideally simple set of statistics
which are perfectly symmetrical, and see whether we can discover
a process for fitting a curve in a case of that kind. If this prove
successful it might be possible afterwards to adapt the same process
to an unsymmetrical or ' skew ' set of statistics made up in a similar
way. Then finally we should inquire whether actual observations
conform to any of the types of curve discovered, and, if so, how
they can be fitted together.
Now in manufacturing our statistics we must keep before us the
object at which we are aiming. Given the statistics, what we
want is a formula, algebraical or of some other kind, to fit them.
This raises the possibility of choosing the statistics themselves in
some algebraical form, and such a form is at hand in the binomial
expansion, which is, in fact, one of the first examples of a general
symmetrical expression one meets. Thus
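$$(a+b)^1 = a + b$$
$$(a+b)^2 = a^2 + 2ab + b^2$$
$$(a+b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$$
$$(a+b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$$
$$(a+b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5$$
$$(a+b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{1\cdot 2}a^{n-2}b^2 + \dots + b^n.$$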
Clearly all these expressions become perfectly symmetrical if we
put a=6, for they read the same whether we run from left to right
or from right to left.
We have already seen what an important part the binomial
expansion plays in the early stages of the theory of probability:
e.g. $(\tfrac{1}{2}+\tfrac{1}{2})^{10}$, when expanded, tells us at once the proportion of times
on the average we may expect 10 heads, 9 heads and 1 tail, 8 heads
and 2 tails, and so on, when we toss an evenly-balanced coin ten
times in succession; or again, if $p$ is the probability that a certain
event will happen, and $q$ the probability that it will fail to happen
at one trial, then the probabilities that it will happen $n$ times,
$(n-1)$ times, $(n-2)$ times, . . . in $n$ trials are given by the successive
terms in the expansion of $(p+q)^n$. However, we make no
assumption for the moment as to the values of $a$ and $b$, except
that in the symmetrical case with which we begin they are equal,
and we have as the successive terms of $(a+a)^n$:
$$a^n,\quad na^n,\quad \frac{n(n-1)}{1\cdot 2}a^n,\quad \dots,\quad \frac{n(n-1)}{1\cdot 2}a^n,\quad na^n,\quad a^n.$$
Let us suppose that our observed statistics take the above form
so that these terms may be plotted as a succession of ordinates,
$y_1, y_2, y_3, \dots, y_{n+1}$, associated with abscissae, $x_1, x_2, x_3, \dots, x_{n+1}$,
at equal distances apart measured, say, by $c$; for convenience we
may place the origin as in fig. (28), so that
$$x_1 = c,\quad x_2 = 2c,\quad x_3 = 3c,\quad \dots,\quad x_{n+1} = (n+1)c,$$
and we can then form a frequency polygon, where
$$x_r = rc, \qquad y_r = a^n\,\frac{n(n-1)(n-2)\dots(n-r+2)}{1\cdot 2\cdot 3\dots(r-1)}$$
are typical values of a pair of the variables $x$ and $y$, each such
pair defining a vertex of the polygon.
Now in this case, since the statistics have been artificially built
up by ourselves and are not in reality a random selection, they are
not subject to errors of sampling and the fitting curve should,
therefore, pass through the summits of all the $y$'s, or, perhaps
better, touch each of the lines joining adjacent summits. The
curve only differs from the neighbouring outline of the polygon in
that the latter is discontinuous, it alters its direction relative to the
axis of $x$ by jerks at equal intervals $c$ measured along OX, whereas
the former must rise gradually and continuously and then fall in
the same way. This is one sense in which we mean that the fitting
curve removes the roughness of the observation statistics: it gets
rid of jerks besides filling gaps in the observations.
[Fig. (28): frequency polygon with ordinates $y_1, \dots, y_r, \dots, y_{n+1}$ at intervals $c$ along OX.]
It will be clear that as $n$ increases and $c$ diminishes (and this is
what we aim at in collecting statistics, though it has not been assumed
in what immediately follows) the discontinuity in the polygon
becomes less and less pronounced and the outline of the figure
approximates more and more closely to the curve. Moreover this
approximation gains in intensity if we make the slope of the curve at
each appropriate point the same as the slope obtained by joining up
the summits of adjacent ordinates of the polygon.
Now the expression
$$(y_{r+1} - y_r)/c$$
is the measure of the gradient from the $r$th ordinate to the $(r+1)$th,
and
$$\frac{y_{r+1}-y_r}{c} = \frac{a^n}{c}\left[\frac{n(n-1)\dots(n-r+1)}{1\cdot 2\dots r} - \frac{n(n-1)\dots(n-r+2)}{1\cdot 2\dots(r-1)}\right]$$
$$= \frac{a^n}{c}\,\frac{n(n-1)\dots(n-r+2)}{1\cdot 2\dots(r-1)}\left[\frac{n-r+1}{r} - 1\right]$$
$$= y_r\,\frac{n-2r+1}{rc}.$$
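If this be also taken as the gradient of the tangent to the curve at
the point midway between $(x_r, y_r)$ and $(x_{r+1}, y_{r+1})$, calling this point
$(x, y)$ we have, since, in the notation of the differential calculus,
$\dfrac{dy}{dx}$ is the measure of the gradient of the curve at this point,
$$\frac{dy}{dx} = \frac{y_{r+1}-y_r}{c} = y_r\,\frac{n-2r+1}{rc}.$$
And
$$x = \tfrac{1}{2}(x_r + x_{r+1}) = \tfrac{1}{2}[rc + (r+1)c] = \tfrac{1}{2}(2r+1)c,$$
$$y = \tfrac{1}{2}(y_r + y_{r+1}) = \frac{a^n}{2}\,\frac{n(n-1)\dots(n-r+2)}{1\cdot 2\dots(r-1)}\left[\frac{n-r+1}{r} + 1\right] = y_r\,\frac{n+1}{2r}.$$
Hence
$$\frac{dy}{dx} = \frac{n-2r+1}{rc}\cdot\frac{2ry}{n+1} = \frac{(n+2)-(2r+1)}{rc}\cdot\frac{2ry}{n+1} = \frac{2y}{(n+1)c}\left(n+2-\frac{2x}{c}\right).$$
Thus
$$\frac{dy}{dx} = \frac{2y}{(n+1)c}\left(n+2-\frac{2x}{c}\right).$$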
But if we had started with any other two adjacent ordinates
instead of $y_r$ and $y_{r+1}$ we should have been led to exactly the same
relation connecting the corresponding x and y of the required
curve, for r, which serves to particularize the ordinates, does not
appear in the relation at all — their individuality has been eliminated.
The above equation may thus, if we please, be taken as holding
good for, and therefore defining, all points {x, y) of the fitting curve :
it is, in short, the differential equation of that curve.
The equation may be slightly simplified by transferring the
origin to the point $\left(\dfrac{(n+2)c}{2},\ 0\right)$, evidently the point O' in fig. (28)
corresponding to the maximum ordinate of the polygon or curve.
Algebraically, this merely means that for $x$ we must write
$x + \dfrac{(n+2)c}{2}$ in the equation, which then becomes
$$\frac{dy}{dx} = \frac{2y}{(n+1)c}\left(-\frac{2x}{c}\right) = -\frac{4xy}{(n+1)c^2}.$$
We may pass to the equation proper of the curve by integration.
Thus, separating the variables,
$$\int\frac{dy}{y} + \frac{4}{(n+1)c^2}\int x\,dx = 0.$$
Therefore,
$$\log y + \frac{2x^2}{(n+1)c^2} + A = 0,$$
where A is a constant.
Hence
$$y = y_0\,e^{-2x^2/(n+1)c^2},$$
where $y_0$ is a new constant.
This may be written
$$y = y_0\,e^{-x^2/2\sigma^2}, \qquad (1)$$
where $\sigma^2 = (n+1)c^2/4$, and it is called the probability curve or normal
curve of error.*
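How closely curve (1) follows the symmetrical polygon may be judged by a direct comparison. The sketch below takes $a = 1$, $c = 1$, $n = 20$, so that the ordinates of the polygon are simply the binomial coefficients. The choice of $y_0$ is an assumption not made in the text: it is fixed here by requiring the area under the curve, $y_0\sigma\sqrt{2\pi}$, to equal $c$ times the sum of the ordinates.

```python
from math import comb, exp, pi, sqrt

n = 20
c = 1.0
sigma = sqrt((n + 1) * c * c / 4)           # sigma^2 = (n + 1) c^2 / 4
# Assumption: y0 is chosen so that the area under the curve equals c * 2**n.
y0 = c * 2 ** n / (sigma * sqrt(2 * pi))

def polygon_ordinate(r):
    """y_r with a = 1: the r-th term of (1 + 1)^n, r = 1, ..., n + 1."""
    return comb(n, r - 1)

def curve_ordinate(x):
    """Ordinate of curve (1), x being measured from the maximum ordinate."""
    return y0 * exp(-x * x / (2 * sigma * sigma))

relative_errors = []
for r in range(9, 14):
    x = r * c - (n + 2) * c / 2             # abscissa after the change of origin
    exact = polygon_ordinate(r)
    fitted = curve_ordinate(x)
    relative_errors.append(abs(fitted - exact) / exact)
    print(f"r = {r:2d}  x = {x:5.1f}  polygon = {exact:7d}  curve = {fitted:10.1f}")
```

Near the maximum the two agree to within about one per cent.; the agreement is less close in the tails for so small a value of $n$.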
Let us now see whether the procedure so far followed is applicable
in the case of an unsymmetrical or skew distribution of statistics.
With this object we will suppose the frequencies of observations in
successive groups to be represented by the corresponding terms in
the expansion
$$(p+q)^n = p^n + np^{n-1}q + \frac{n(n-1)}{1\cdot 2}p^{n-2}q^2 + \dots,$$
and as before we can form a frequency polygon by joining the
summits of the ordinates
$$p^n,\quad np^{n-1}q,\quad \frac{n(n-1)}{1\cdot 2}p^{n-2}q^2,\quad \dots$$
erected on the axis of $x$ at distances from the origin given by
$$x_1 = c,\quad x_2 = 2c,\quad x_3 = 3c,\quad \dots,\quad x_{n+1} = (n+1)c,$$
the figure being very similar to that in the symmetrical case.
[* Karl Pearson's method of getting the normal curve equation has been
adopted as the basis of the above discussion, in preference to that usually
followed, which develops the curve also from the binomial expression but somewhat
on the lines of Laplace and Poisson. They showed that the sum of all the
terms lying within a range $t$ on either side of the maximum term in the expansion
of $(p+q)^n$ is approximately
$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-t}^{t} e^{-x^2/2\sigma^2}\,dx,$$
where $\sigma = \sqrt{npq}$, whence the equation of the curve is derived. (See Historical
Note at the end of Chapter xviii.)]
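The gradient of the fitting curve where it touches the join of
$(x_r, y_r)$ to $(x_{r+1}, y_{r+1})$ is given by
$$\frac{dy}{dx} = \frac{y_{r+1}-y_r}{c},$$
and we must try and express the right-hand side as before in
terms of $(x, y)$, the co-ordinates of the mid-point of the line joining
$(x_r, y_r)$ to $(x_{r+1}, y_{r+1})$.
We have
$$\frac{dy}{dx} = \frac{1}{c}\left[\frac{n(n-1)\dots(n-r+1)}{1\cdot 2\dots r}\,p^{n-r}q^{r} - \frac{n(n-1)\dots(n-r+2)}{1\cdot 2\dots(r-1)}\,p^{n-r+1}q^{r-1}\right]$$
$$= \frac{p^{n-r+1}q^{r-1}}{c}\,\frac{n(n-1)\dots(n-r+2)}{1\cdot 2\dots(r-1)}\left[\frac{n-r+1}{r}\cdot\frac{q}{p} - 1\right].$$
Also
$$2x = x_r + x_{r+1} = rc + (r+1)c = (2r+1)c,$$
$$2y = y_r + y_{r+1} = p^{n-r+1}q^{r-1}\,\frac{n(n-1)\dots(n-r+2)}{1\cdot 2\dots(r-1)}\left[\frac{n-r+1}{r}\cdot\frac{q}{p} + 1\right].$$
Thus
$$\frac{dy}{dx} = \frac{2y}{c}\left(\frac{n-r+1}{r}\cdot\frac{q}{p} - 1\right)\bigg/\left(\frac{n-r+1}{r}\cdot\frac{q}{p} + 1\right)$$
$$= \frac{2y}{c}\,[(n+1)q - r(p+q)]\,/\,[(n+1)q + r(p-q)]$$
$$= \frac{2y}{c}\,[2(n+1)qc - (p+q)(2x-c)]\,/\,[2(n+1)qc + (p-q)(2x-c)].$$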
This, being true for all such pairs of values of $x$ and $y$, is now in a
form independent of any particular point on the curve we seek;
in other words, it may be taken as the differential equation of the
curve, and it is evidently of the type
$$\frac{dy}{dx} = \frac{y(\alpha - x)}{\beta + \gamma x},$$
where $\alpha$, $\beta$, $\gamma$ involve only $p$, $q$, $n$, etc., the constants of the
distribution we set out to fit.
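The relation between the gradient of each side of the skew polygon and the co-ordinates of its mid-point is exact, and may be verified with rational arithmetic. In the sketch below the values $n = 8$, $p = \tfrac{1}{3}$, $q = \tfrac{2}{3}$, $c = 1$ and the function names are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def ordinates(n, p, q):
    """y_r for r = 1, ..., n + 1: the successive terms of (p + q)^n."""
    return [comb(n, r - 1) * p ** (n - r + 1) * q ** (r - 1) for r in range(1, n + 2)]

def gradient_relation_holds(n, p, q, c):
    """Check dy/dx = (2y/c)[2(n+1)qc - (p+q)(2x-c)] / [2(n+1)qc + (p-q)(2x-c)]."""
    y = ordinates(n, p, q)
    for r in range(1, n + 1):
        y_r, y_next = y[r - 1], y[r]
        slope = (y_next - y_r) / c                  # gradient of the r-th side
        x_mid = Fraction(2 * r + 1, 2) * c          # mid-point abscissa
        y_mid = (y_r + y_next) / 2                  # mid-point ordinate
        numerator = 2 * (n + 1) * q * c - (p + q) * (2 * x_mid - c)
        denominator = 2 * (n + 1) * q * c + (p - q) * (2 * x_mid - c)
        if slope != 2 * y_mid / c * numerator / denominator:
            return False
    return True

skew_ok = gradient_relation_holds(8, Fraction(1, 3), Fraction(2, 3), Fraction(1))
symmetric_ok = gradient_relation_holds(8, Fraction(1, 2), Fraction(1, 2), Fraction(1))
print(skew_ok, symmetric_ok)
```

With $p = q$ the term in $(p-q)$ drops out of the denominator and the relation reduces to that found in the symmetrical case.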
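The equation is simplified if we transfer the origin to the point
$(\alpha, 0)$, when it becomes
$$\frac{dy}{dx} = -\frac{xy}{\gamma x + \delta},$$
where $\delta = \beta + \gamma\alpha$.
To integrate, separate the variables as before:
$$\int\frac{dy}{y} + \int\frac{x\,dx}{\gamma x + \delta} = 0.$$
Therefore,
$$\log y + \frac{1}{\gamma}\int\frac{(\gamma x + \delta) - \delta}{\gamma x + \delta}\,dx = 0,$$
$$\log y + \frac{x}{\gamma} - \frac{\delta}{\gamma^2}\log(\gamma x + \delta) + A = 0,$$
where A is a constant,
or
$$y = B\,e^{-x/\gamma}(\gamma x + \delta)^{\delta/\gamma^2},$$
where B is a constant.
It may be written
$$y = y_0\,e^{-kx}\left(1 + \frac{x}{a}\right)^{ka}, \qquad (2)$$
where $k = 1/\gamma$, $a = \delta/\gamma$, and $y_0$ is a new constant.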
This, then, may prove a suitable type of curve to fit a set of
statistics forming a skew frequency distribution, but the question
now arises whether equations (1) and (2) are the most general
types possible. Clearly (1) is only a particular case of (2) obtained
by making p=q, and, this being so, (2) may itself be a particular
case of some still more general type.
Light may be thrown on this if we consider the geometrical
bearing of the differential equation obtained in the last case:
$$\frac{dy}{dx} = \frac{y(\alpha - x)}{\beta + \gamma x}. \qquad (3)$$
The presence of $y$ and $(\alpha - x)$ in the numerator of the right-hand
side of (3) shows that $\dfrac{dy}{dx}$ vanishes when $y = 0$ and when $x = \alpha$, i.e. the
curve touches the axis of $x$ where the two meet and there is a
maximum point on the curve at $x = \alpha$. (Since $\alpha$ is the particular
value of the organ or character $x$ for which the frequency is a
maximum, $\alpha$ is of course the mode.) Now these two characteristics
are the very ones to which we wished to give symbolical expression
since they serve to describe in broad outline what was agreed to
be the trend of the majority of frequency distributions — the rise
from zero to a maximum, at first gradually, then faster, and, after
passing through the maximum, the fall to zero again, generally at
a different rate.
As to the denominator of equation (3), the corresponding equation
for type (1), before the origin was changed, was similar to equation (3),
except that it contained no x term in the denominator, and that is
readily understood when we note that $\gamma$ is a multiple of $(p-q)$
and thus vanishes when $p = q$. Now, if from (3) we get a less
general type of curve by dropping the $x$ term in the denominator,
we may perhaps get a more general type by adding an $x^2$ term, and
even an $x^3$ term, an $x^4$ term, and so on. In fact there seems no
reason why the denominator should not be any function of x, say
f{x), which, however, we shall suppose for simplicity capable of
expansion in a Maclaurin's series of ascending powers of x which
converges quickly.
We are led to propose, therefore, as more general than (3), the
differential equation
$$\frac{dy}{dx} = \frac{y(x+b)}{px^2 + qx + r}. \qquad (4)$$
We stop at $x^2$ in the denominator because it has been found, if we
may anticipate results to save needless labour, that beyond this
point the heaviness of the calculation involved and the decreasing
accuracy of the higher moments that have to be introduced outweigh
any other advantage gained. The curve or set of curves
resulting from the integration of equation (4) is known as Karl
Pearson's Generalized Probability Curve, and their author has
stated that, while it comprises the two other types as special cases,
it practically covers all homogeneous statistics he has had to deal
with.
Just as the differential equations in the first two cases considered
were related respectively to the symmetrical and the skew binomial
expansions, so is equation (4) related to the hypergeometrical
expansion
$$\frac{pn(pn-1)\dots(pn-r+1)}{n(n-1)\dots(n-r+1)}\left[1 + r\,\frac{qn}{pn-r+1} + \frac{r(r-1)}{1\cdot 2}\,\frac{qn(qn-1)}{(pn-r+1)(pn-r+2)} + \dots\right],$$
the successive terms of which express the probability that $r$ black
balls, $(r-1)$ black balls and 1 white ball, $(r-2)$ black balls and
2 white balls, . . ., $r$ white balls, will be drawn from a bag containing
$pn$ black balls and $qn$ white ones, where $(p+q) = 1$, when $r$ balls
are drawn in all, none being replaced before the next is drawn.
If the terms of this expansion are represented by ordinates of
which the summits determine a polygon as in the binomial cases,
the corresponding expression for the gradient of the curve at any
point is given by an equation of type (4). We need not go over
the detailed proof of this statement since it follows precisely the
same lines as in the previous cases.
The method of integration of the equation
$$\frac{dy}{dx}=\frac{y(x+b)}{px^2+qx+r}$$
depends upon the nature of the roots of the quadratic in the
denominator, which may be written
$$px^2+qx+r=p\left[\left(x+\frac{q}{2p}\right)^2-\frac{q^2-4pr}{4p^2}\right]=p\left[\left(x+\frac{q}{2p}\right)^2-\frac{4r^2}{q^2}\,\kappa(\kappa-1)\right],$$
where $\kappa=q^2/4pr$, and it is evident that the quadratic splits up into
real factors if $\kappa(\kappa-1)$ is positive. This is the case when $\kappa$ has any
negative value, or when it is positive and greater than 1, the truth of
which may be seen more effectively if the curve
$$y=\kappa(\kappa-1),$$
a parabola symmetrical about the line $\kappa=\tfrac12$, be drawn, fig. (29), by
plotting $y$ against $\kappa$.
[Fig. (29): the parabola $y=\kappa(\kappa-1)$ plotted against $\kappa$.]
Further, the product of the roots of the quadratic
$$px^2+qx+r=0$$
is
$$\frac{r}{p}=\frac{q^2}{4p^2}\cdot\frac{4pr}{q^2}=\frac{q^2}{4p^2\kappa},$$
so that the roots when real will be of the same sign if $\kappa$ is positive
and of opposite signs if $\kappa$ is negative. The boundary lines
$\kappa=0$ and $\kappa=1$
thus divide the whole field into three parts, as shown in fig. (30), in
one of which the roots are real and of opposite sign, in the next
the roots are imaginary, and in the third the roots are real and of
the same sign. At the boundaries we get particular cases as
follows :
$\kappa=0$ : this requires $q=0$, since $\kappa=q^2/4pr$, which makes the
roots of the quadratic equal but of opposite sign, unless $p=0$ also,
and in that case both roots are infinite ;
$\kappa=1$ : the roots are real and equal and of the same sign ;
$\kappa=\infty$ : this requires $p=0$ or $r=0$ ; in the former case one root of
the quadratic is infinite, and in the latter one root is zero.
Thus, returning to the differential equation, the curves which result
from the integration
$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{px^2+qx+r}$$
are of different types according to the value of $\kappa$, which is therefore
called the criterion.
[Fig. (30): the axis of $\kappa$ divided by the boundary values $\kappa=0$ and $\kappa=1$
into three regions: roots real and of opposite sign, roots imaginary,
roots real and of the same sign.]
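Type I. $\kappa$ negative. Roots of $px^2+qx+r=0$ real and of opposite sign.
In this case we may write
$$px^2+qx+r=p(x+\alpha')(x-\beta')$$
and so get
$$\int\frac{dy}{y}+\int\frac{(x+b)\,dx}{p(\alpha'+x)(\beta'-x)}=0,$$
or, transferring the origin to the point $(-b, 0)$, the mode, we have
$$\int\frac{dy}{y}+\int\frac{x\,dx}{p(\alpha'-b+x)(\beta'+b-x)}=0,$$
or
$$\int\frac{dy}{y}+\int\frac{x\,dx}{p(\alpha+x)(\beta-x)}=0,$$
where $\alpha=\alpha'-b$, $\beta=\beta'+b$.
Therefore,
$$\log y-\frac{1}{p}\int\frac{\alpha}{\alpha+\beta}\,\frac{dx}{\alpha+x}+\frac{1}{p}\int\frac{\beta}{\alpha+\beta}\,\frac{dx}{\beta-x}+A=0,$$
where A is a constant.
Thus
$$\log y=\frac{1}{p(\alpha+\beta)}\left[\alpha\log(\alpha+x)+\beta\log(\beta-x)\right]+\log B,$$
where B is a constant, whence
$$y=B(\alpha+x)^{\nu\alpha}(\beta-x)^{\nu\beta}=y_0\left(1+\frac{x}{\alpha}\right)^{\nu\alpha}\left(1-\frac{x}{\beta}\right)^{\nu\beta}, \quad\ldots\quad (5)$$
where $\nu=1/\{p(\alpha+\beta)\}$ and $y_0$ is a new constant.
This is a skew curve of limited range, bounded by the lines $x=-\alpha$
and $x=+\beta$, with the mode at the origin.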
Type II. $\kappa=0$. $q=0$, but not $p=0$. Roots of $px^2+qx+r=0$
equal and of opposite sign.
This curve is just a particular case of type I., which reduces to
$$y=y_0\left(1-\frac{x^2}{\alpha^2}\right)^{\nu\alpha}, \quad\ldots\quad (6)$$
symmetrical about the axis of $y$ (because for any value of $y$ there
are two values of $x$, equal and of opposite sign) and of limited
range bounded by $x=-\alpha$ and $x=+\alpha$, with the mode at the origin.
Type III. $\kappa=\infty$.* $p=0$, but not $r=0$. One root of $px^2+qx+r=0$
infinite.
This is the skew binomial case over again. It may be also deduced
from type I. by making one root, say $\beta'$, tend to infinity.
The curve then takes the form
$$y=y_0\left(1+\frac{x}{\alpha}\right)^{\lambda\alpha}\lim_{\beta\to\infty}\left(1-\frac{x}{\beta}\right)^{\lambda\beta},$$
because $\beta=\beta'+b$, so that $\beta$ tends to infinity with $\beta'$. Hence
$$y=y_0\left(1+\frac{x}{\alpha}\right)^{\lambda\alpha}e^{-\lambda x}, \quad\ldots\quad (7)$$
where $\lambda$ is the limiting value of $\nu$, namely $-1/q$,
a skew curve limited in one direction by the line $x=-\alpha$, with the
mode at the origin.
[* Although theoretically this type corresponds to an infinite value for $\kappa$, in
practice it will as a rule give a reasonable fit provided $\kappa$ is numerically greater
than 4. (See W. P. Elderton's Frequency Curves and Correlation, p. 50.)]
Type IV. $\kappa$ positive and $<1$. Roots of $px^2+qx+r=0$ imaginary.
Put $\kappa(\kappa-1)=-\lambda^2$, and the differential equation then leads to
$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{p\left[\left(x+\frac{q}{2p}\right)^2+\frac{4r^2\lambda^2}{q^2}\right]}.$$
Transfer the origin to the point $\left(-\frac{q}{2p},\ 0\right)$ ; then
$$\log y=A+\frac{1}{2p}\log\left(x^2+\frac{4r^2\lambda^2}{q^2}\right)+\frac{1}{p}\left(b-\frac{q}{2p}\right)\frac{q}{2r\lambda}\tan^{-1}\frac{qx}{2r\lambda},$$
where A is a constant.
Therefore,
$$y=y_0\left(1+\frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}(x/a)}, \quad\ldots\quad (8)$$
where $a=\dfrac{2r\lambda}{q}$, $m=-\dfrac{1}{2p}$, $\nu=-\dfrac{1}{ap}\left(b-\dfrac{q}{2p}\right)$,
and $y_0$ is a constant.
This is a skew curve of unlimited range in both directions. The
position of the mode is found by putting $dy/dx=0$ in (8) after
differentiation, or, what comes to the same thing, is seen by direct
reference to the differential equation itself. Thus the distance of the
mode from the origin
$$=-\left(b-\frac{q}{2p}\right)=\nu pa=-\nu a/2m.$$
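Type V. $\kappa=1$. Roots of $px^2+qx+r=0$ real and equal.
The equation to integrate becomes
$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{p\left(x+\frac{q}{2p}\right)^2}.$$
Transfer the origin to the point $\left(-\frac{q}{2p},\ 0\right)$, and this becomes
$$\int\frac{dy}{y}=\int\frac{\left(x+b-\frac{q}{2p}\right)dx}{px^2},$$
$$\log y=A+\frac{1}{p}\log x-\frac{1}{p}\left(b-\frac{q}{2p}\right)\frac{1}{x},$$
where A is a constant.
Therefore,
$$y=y_0\,x^{1/p}\,e^{-\frac{1}{p}\left(b-\frac{q}{2p}\right)/x},$$
or
$$y=y_0\,x^{-s}e^{-\gamma/x}, \quad\ldots\quad (9)$$
where $s=-1/p$, $\gamma=\dfrac{1}{p}\left(b-\dfrac{q}{2p}\right)$, and $y_0$ is a constant.
Here $x$ cannot become negative, so that the curve is skew and
limited in one direction. The distance of the mode from the origin
$$=-\left(b-\frac{q}{2p}\right)=\gamma/s.$$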
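Type VI. $\kappa$ positive and $>1$. Roots of $px^2+qx+r=0$ real and of the
same sign.
Equation becomes
$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{p(x+\alpha)(x+\beta)},$$
$$\log y=\int\left[\frac{b-\alpha}{p(\beta-\alpha)}\cdot\frac{1}{x+\alpha}+\frac{b-\beta}{p(\alpha-\beta)}\cdot\frac{1}{x+\beta}\right]dx$$
$$=A+\frac{1}{p(\beta-\alpha)}\left[(b-\alpha)\log(x+\alpha)-(b-\beta)\log(x+\beta)\right],$$
where A is a constant ;
or, transferring the origin to $(-\beta, 0)$,
$$y=y_0\{x-(\beta-\alpha)\}^{(b-\alpha)/p(\beta-\alpha)}\,x^{-(b-\beta)/p(\beta-\alpha)},$$
or
$$y=y_0(x-a)^{q_2}x^{-q_1}, \quad\ldots\quad (10)$$
where $a=\beta-\alpha$, $q_2=(b-\alpha)/p(\beta-\alpha)$, $q_1=(b-\beta)/p(\beta-\alpha)$, and $y_0$ is a
constant.
This is a skew curve bounded by $x=a$ in one direction. The
distance of the mode from the origin $=-(b-\beta)=aq_1/(q_1-q_2)$.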
Type VII. $\kappa=0$, $p=0$, $q=0$. Roots of the quadratic $px^2+qx+r=0$
both infinite.
This is the symmetrical binomial case over again and the integration
reduces to
$$\int\frac{dy}{y}=\int\frac{(x+b)\,dx}{r},$$
or, transferring the origin to $(-b, 0)$,
$$\int\frac{dy}{y}=\int\frac{x\,dx}{r},$$
$$\log y=A+\frac{x^2}{2r},$$
where A is a constant.
Therefore
$$y=y_0e^{-x^2/2\sigma^2}, \quad\ldots\quad (11)$$
where $y_0$ is a constant and $\sigma^2=-r$.
This curve, the normal curve of error, is symmetrical about the
axis of $y$, where mean and mode coincide, and it is of unlimited
range on either side of it.
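The way the criterion sorts the seven types may be summarised in a few lines of Python. This is only an illustrative sketch: the function name `pearson_type` and the tolerance `tol` are assumptions introduced here, and the boundary value $\kappa=0$ cannot by itself separate Type II from Type VII, which differ in whether $p$ vanishes.

```python
import math


def pearson_type(kappa, tol=1e-9):
    """Return the type suggested by the criterion kappa = q^2 / (4 p r).

    Hypothetical helper: boundary values are matched within `tol`.
    """
    if math.isinf(kappa):
        return "III"          # p = 0: one root infinite
    if abs(kappa) <= tol:
        return "II or VII"    # q = 0; Type VII if p = 0 as well
    if abs(kappa - 1.0) <= tol:
        return "V"            # roots real and equal
    if kappa < 0:
        return "I"            # roots real, opposite sign
    if kappa < 1:
        return "IV"           # roots imaginary
    return "VI"               # roots real, same sign


for k in (-0.5, 0.0, 0.5, 1.0, 2.0, math.inf):
    print(k, pearson_type(k))
```

In practice a computed $\kappa$ is never exactly 0, 1 or infinite, so the boundary types are adopted when $\kappa$ lies sufficiently near those values.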
CHAPTER XVI
CURVE FITTING (continued): THE METHOD OF MOMENTS
FOR CONNECTING CURVE AND STATISTICS
We have now completed the first stage of the discussion upon which
we embarked : we have found by the application of general principles
various types of curve, represented by different equations,
which are said to fit more or less satisfactorily a considerable number
at all events of frequency distributions composed of homogeneous
material.
Our next task is to pass from the general to the particular, to
see how to set up a connection between an actually observed
frequency distribution and the appropriate theoretical curve. This
again seems to break up into two parts : (1) to find a way of deciding
which type of curve to adopt in a particular case ; (2) to determine
the constants of the curve in terms of the observed statistics ; but
since the criterion, $\kappa$, which distinguishes one type of curve from
another is itself a function of the constants of the curve before
integration, it follows that the solution of the first part is incidental
to that of the second.
The general method proposed for determination of the constants
of the curve in terms of the observed statistics is the now well-known
method of moments due to Karl Pearson, whereby the area and
moments of the fitting curve are equated to the area and moments,
calculated from the statistics, of the observation curve.
If a frequency table be drawn up (see Table (40)) showing the
number $f$ of observations corresponding to the deviation $x$ of each
value, or group mid-value, $X$ of the character observed from some
fixed value, the expression
$$x_1f_1+x_2f_2+\ldots+x_rf_r+\ldots$$
is called the first moment of the distribution with reference to the
fixed value, which may be termed the origin. Similarly, $\Sigma x^2f$
is called the second moment, $\Sigma x^3f$, the third moment, $\Sigma x^4f$, the
fourth moment, and so on. The following notation will be found
convenient for working purposes :
$$\frac{N'_1}{N}=\frac{\Sigma xf}{N}=\nu'_1,\qquad \frac{N'_2}{N}=\frac{\Sigma x^2f}{N}=\nu'_2,\qquad\ldots$$
Undashed letters are reserved for use when the distribution is
referred to its mean as origin, in other words when the deviations of
the $X$'s are measured from the mean $\bar X$.

Table (40).

Deviation.  Frequency.  First     Second    Third     Fourth
                        Moment.   Moment.   Moment.   Moment.
   x_r         f_r      x_r f_r   x_r^2 f_r x_r^3 f_r x_r^4 f_r
Totals .       N        N'_1      N'_2      N'_3      N'_4
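The column totals of Table (40) are easily formed mechanically. The following Python sketch is offered only as an illustration; the function name `raw_moments` and the small data set are hypothetical and are not taken from the text.

```python
def raw_moments(deviations, frequencies, max_order=4):
    """Return N, the totals N'_1..N'_k, and the moments nu'_1..nu'_k.

    `deviations` are the x's measured from the chosen origin and
    `frequencies` the corresponding f's (hypothetical argument names).
    """
    n_total = sum(frequencies)
    totals = [
        sum(f * x ** k for x, f in zip(deviations, frequencies))
        for k in range(1, max_order + 1)
    ]
    moments = [t / n_total for t in totals]
    return n_total, totals, moments


# Illustrative data only.
n_total, totals, moments = raw_moments([-1, 0, 1, 2], [1, 2, 3, 4])
print(n_total, totals, moments)
```

Dividing each total by N gives the dashed moments referred to the arbitrary origin; the transfer to the mean is a separate step.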
Now each N in the frequency table is the sum of a number of
discrete quantities which only tend to form a continuous series as
the class intervals are made very small and the number of observations
is made very large. The corresponding frequency polygon
or histogram, if we drew it, would at the same time tend to become
a continuous curve, the observation curve. If that limiting stage
were attainable, if we could actually get an infinitely large sample
of observations in which the character observed changed by infinitesimal
amounts, we could then replace the isolated $f$'s of observation
by the corresponding $y'$'s, the ordinates of this observation curve,
and to get the moments we could write instead of the discrete
sums
$$\Sigma f,\quad \Sigma xf,\quad \Sigma x^2f,\ \ldots,$$
the continuous integral expressions
$$\int y'dx,\quad \int xy'dx,\quad \int x^2y'dx,\ \ldots,$$
taking in the whole sweep of the curve by integrating throughout
the range of deviation $x$. We should then have, if areas and
moments are equated according to Pearson's method,
$$\int y\,dx=\int y'dx,\quad \int xy\,dx=\int xy'dx,\quad \int x^2y\,dx=\int x^2y'dx,\ \ldots,\ \int x^ny\,dx=\int x^ny'dx,$$
where $y$ is the ordinate of the fitting curve corresponding to the
ordinate $y'$ of the observation curve.
In practice, however, it is impossible to go to this limit : we
cannot deal with an infinitely large sample, so we take as large a
sample as is convenient, calculate the rough moments, $N, N'_1, N'_2, \ldots$,
and find approximately what corrections or adjustments are necessary
to obtain the moments of the observation curve, a procedure
which is really equivalent to the determination of the area of a
curve when only a number of isolated points thereon are known.
For the full analytical justification of the method of moments
the reader is referred to Professor Pearson's original paper, On
the Systematic Fitting of Curves to Observations and Measurements
[Biometrika, vol. i., pp. 265 et seq. ; also vol. ii., pp. 123], where
it is shown that ' with due precautions as to quadrature, it
gives, when one can make a comparison, sensibly as good results
as the method of least squares.' The latter, which is the traditional
way of approaching all such problems, is shown to be impracticable
in a large number of cases, either because the resulting equations
cannot be solved, or, when they are capable of solution, because
the labour involved would be colossal.
Let us consider next how to deduce the area and moments of the
observation curve from the statistics, in other words how to get
$$\int y'dx,\quad \int xy'dx,\quad \int x^2y'dx,\ \ldots,$$
the integrals being taken throughout the range of the curve, when
we know the frequencies corresponding to only a certain number
of values or elementary ranges of the deviation $x$.
Now the character observed may be capable of the deviations
actually recorded and of no values in between, e.g. measuring
deviations from ' no rooms ' as origin, we might have $f_1$ one-roomed
tenements, $f_2$ two-roomed tenements, $f_3$ three-roomed tenements, but
there could be no such thing as a two-and-a-half or a three-and-a-quarter-roomed
tenement ; on the other hand, any recorded deviation,
$x_r$, may be only the mid-value (used as a convenient and
concise approximation) of a group of observations including all in
the continuous range from $(x_r-\tfrac12)$ to $(x_r+\tfrac12)$, where unit deviation
is the class interval : thus we might have $f_1$ males deviating by
+6 in. from 5 ft. (comprising all the males observed between 5 ft.
5½ in. and 5 ft. 6½ in.), $f_2$ males deviating by +5 in. from 5 ft.
(comprising all males between 5 ft. 4½ in. and 5 ft. 5½ in.), and so on.
These two cases must be discussed separately.
(1) When the observations are centred at definite but isolated values
of x.
The problem is to find
$$\int x^ny'\,dx$$
(the nth moment) when we have no definite curve given but we
know the values of $x$ and $y'$ at a number of isolated points, say
$$(x_0, y'_0),\ (x_1, y'_1),\ (x_2, y'_2),\ \ldots\ (x_p, y'_p).$$
This is equivalent to discovering a suitable ' quadrature formula,'
i.e. a good approximation to
$$\int z\,dx$$
in terms of known points
$$(x_0, z_0),\ (x_1, z_1),\ (x_2, z_2),\ \ldots\ (x_p, z_p),$$
where we have written $z$ in place of $x^ny'$, and we may generally
take the ordinates to be at equal distances, $h$, apart. Several
such formulae have been suggested and they vary according as the
$z$'s are situated at the ends (fig. (31)) or at the centres (fig. (32))
of the $h$ intervals. The second type is perhaps the more useful of
the two, and we shall work out one formula in illustration of it.
[Fig. (31): ordinates $z$ at the ends of the $h$ intervals. Fig. (32): ordinates $z$ at the centres of the $h$ intervals.]
Consider the first five of the given points, namely,
$$(x_0, z_0),\ (x_1, z_1),\ \ldots\ (x_4, z_4).$$
As a simple ' curve of closest contact ' let us find the parabola of
type
$$z=c_0+c_1x/h+c_2x^2/h^2+c_3x^3/h^3+c_4x^4/h^4 \quad\ldots\quad (1)$$
which goes through these five points, where the $c$'s are constants to
be determined. We may without loss of generality take the axis
of $z$ to coincide with the middle one of the five ordinates, so that
the known points on the curve become
$$(-2h, z_0),\ (-h, z_1),\ (0, z_2),\ (+h, z_3),\ (+2h, z_4),$$
and on substitution in (1) we get
$$z_0=c_0-2c_1+4c_2-8c_3+16c_4,$$
$$z_1=c_0-c_1+c_2-c_3+c_4,$$
$$z_2=c_0,$$
$$z_3=c_0+c_1+c_2+c_3+c_4,$$
$$z_4=c_0+2c_1+4c_2+8c_3+16c_4.$$
[Fig. (33): the curve through the five ordinates at $-2h, -h, 0, +h, +2h$, with the strip from $-h/2$ to $+h/2$ shaded.]
These equations are just sufficient uniquely to determine the $c$'s, and
hence the parabolic curve of closest contact, in terms of the five given
points, but for our purpose it is not necessary to find all the $c$'s.
Suppose our object is to find the area of the shaded portion of
fig. (33) in terms of the co-ordinates of the five given points. This area
$$=\int_{-h/2}^{+h/2}z\,dx=\int_{-h/2}^{+h/2}(c_0+c_1x/h+c_2x^2/h^2+c_3x^3/h^3+c_4x^4/h^4)\,dx=c_0h+c_2h/12+c_4h/80.$$
But the equations between the $z$'s and $c$'s at once give
$$z_2=c_0,\quad z_0+z_4=2(c_0+4c_2+16c_4),\quad z_1+z_3=2(c_0+c_2+c_4).$$
Thus
$$2c_2+2c_4=(z_1+z_3)-2z_2,\qquad 8c_2+32c_4=(z_0+z_4)-2z_2.$$
Therefore
$$24c_2=16(z_1+z_3)-(z_0+z_4)-30z_2,$$
$$24c_4=(z_0+z_4)-4(z_1+z_3)+6z_2.$$
Hence, by substitution, the shaded area becomes
$$\int_{-h/2}^{+h/2}z\,dx=h\left[z_2+\tfrac{1}{288}\{16(z_1+z_3)-(z_0+z_4)-30z_2\}+\tfrac{1}{1920}\{(z_0+z_4)-4(z_1+z_3)+6z_2\}\right]$$
$$=\frac{h}{5760}\left[5178z_2-17(z_0+z_4)+308(z_1+z_3)\right], \quad\ldots\quad (2)$$
these particular ordinates being appropriate when the axis of $z$
coincides with the $z_2$ ordinate.
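The coefficients in formula (2) can be checked independently by integrating, over the central strip, the polynomial of closest contact through the five ordinates. The Python sketch below does this with exact fractions, taking $h=1$; the helper names (`poly_mul`, `integrate`, `quadrature_weights`) are assumptions made for the illustration, and the Lagrange form of the interpolating polynomial is used in place of the elimination carried out above.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (ascending powers)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def integrate(poly, lo, hi):
    """Exact integral of a polynomial between lo and hi."""
    return sum(
        c * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)
        for k, c in enumerate(poly)
    )


def quadrature_weights(nodes, lo, hi):
    """Weight of each ordinate: the integral of its Lagrange basis polynomial."""
    lo, hi = Fraction(lo), Fraction(hi)
    weights = []
    for i, xi in enumerate(nodes):
        basis = [Fraction(1)]
        for j, xj in enumerate(nodes):
            if j != i:
                denom = Fraction(xi - xj)
                # multiply by (x - xj) / (xi - xj)
                basis = poly_mul(basis, [Fraction(-xj) / denom, Fraction(1) / denom])
        weights.append(integrate(basis, lo, hi))
    return weights


w = quadrature_weights([-2, -1, 0, 1, 2], Fraction(-1, 2), Fraction(1, 2))
print([wi * 5760 for wi in w])
```

Multiplying the weights by 5760 should reproduce the coefficients of $z_0, z_1, z_2, z_3, z_4$ in (2).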
Similarly, it can be shown that
$$\int_{-h/2}^{+3h/2}z\,dx=\frac{h}{24}\left[27z_0+17z_1+5z_2-z_3\right], \quad\ldots\quad (3)$$
by finding the parabolic curve of closest contact through $(0, z_0)$,
$(h, z_1)$, $(2h, z_2)$, $(3h, z_3)$, the axis of $z$ coinciding now with $z_0$.
Now we require
$$\int_{-h/2}^{(p+\frac12)h}z\,dx$$
(see fig. (32)), and this may be obtained by splitting up the integral
thus
$$\int_{-h/2}^{+3h/2}+\int_{3h/2}^{5h/2}+\int_{5h/2}^{7h/2}+\ldots+\int_{(p-\frac52)h}^{(p-\frac32)h}+\int_{(p-\frac32)h}^{(p+\frac12)h}$$
and applying the formulae (2) and (3) to evaluate these sub-integrals.
The first and last come under head (3), while all the rest come
under (2). In fact, we fit together portions of curves of parabolic
type based on the successive groups of points
(0, 1, 2, 3), (0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), . . .
(p−4, p−3, p−2, p−1, p), (p−3, p−2, p−1, p),
and as the points overlap, in the sense that neighbouring groups
have points in common, the curves dovetail into one another and
so provide a fairly good approximation to what we want in the way
of integral expressions giving areas based upon the positions of
certain known points.
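We have, then :
$$\int_{-h/2}^{3h/2}z\,dx=\frac{h}{24}\left[27z_0+17z_1+5z_2-z_3\right]$$
$$\int_{3h/2}^{5h/2}z\,dx=\frac{h}{5760}\left[5178z_2-17(z_0+z_4)+308(z_1+z_3)\right]$$
$$\int_{5h/2}^{7h/2}z\,dx=\frac{h}{5760}\left[5178z_3-17(z_1+z_5)+308(z_2+z_4)\right]$$
. . . . . .
$$\int_{(p-\frac52)h}^{(p-\frac32)h}z\,dx=\frac{h}{5760}\left[5178z_{p-2}-17(z_{p-4}+z_p)+308(z_{p-3}+z_{p-1})\right]$$
$$\int_{(p-\frac32)h}^{(p+\frac12)h}z\,dx=\frac{h}{24}\left[27z_p+17z_{p-1}+5z_{p-2}-z_{p-3}\right].$$
Hence, by addition,
$$\int_{-h/2}^{(p+\frac12)h}z\,dx=\frac{h}{5760}\left[6463z_0+4371z_1+6669z_2+5537z_3+5760(z_4+z_5+\ldots+z_{p-4})+5537z_{p-3}+6669z_{p-2}+4371z_{p-1}+6463z_p\right]$$
$$=h\left[1.1220(z_0+z_p)+0.7588(z_1+z_{p-1})+1.1578(z_2+z_{p-2})+0.9613(z_3+z_{p-3})+(z_4+z_5+\ldots+z_{p-4})\right].$$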
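The addition just performed may be verified by accumulating the coefficients of formulae (2) and (3) strip by strip. The sketch below, in Python with exact fractions and $h=1$, is illustrative only; the name `composite_weights` and the requirement $p\ge 4$ are assumptions of the sketch.

```python
from fractions import Fraction

# Coefficients of formula (2), for a strip centred on the middle of five ordinates.
MID = [Fraction(w, 5760) for w in (-17, 308, 5178, 308, -17)]
# Coefficients of formula (3), for an end strip based on four ordinates.
END = [Fraction(w, 24) for w in (27, 17, 5, -1)]


def composite_weights(p):
    """Weights on z_0 .. z_p for the integral from -1/2 to p + 1/2 (h = 1)."""
    if p < 4:
        raise ValueError("the sketch assumes at least five ordinates (p >= 4)")
    w = [Fraction(0)] * (p + 1)
    for k, c in enumerate(END):
        w[k] += c          # first strip, from -1/2 to 3/2
        w[p - k] += c      # last strip, the mirror image of the first
    for centre in range(2, p - 1):      # strips centred on z_2 .. z_{p-2}
        for k, c in enumerate(MID):
            w[centre - 2 + k] += c
    return w


w = composite_weights(10)
print([float(x) for x in w])
```

With eleven ordinates the first four and last four weights should agree with the multipliers found above, and the three central ones should be unity.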
In effect, since $z=x^ny'$, this means that to calculate the moments
from the given statistics we may work simply with the observed
ordinates or frequencies, as drawn up in Table (40), so long as we
modify the first four and the last four by multiplying them by
suitable factors. In particular, when the frequencies at the beginning
and end of the distribution are very small, that is to say,
when there is high contact at each end of the frequency curve,
we may dispense even with the modifying factors also since we
may assume that before the first and after the last ordinate observed
there are others which are so small as to be negligible.
Thus, given high contact at each extremity of the observation
curve, we may write
$$\int_{-h/2}^{(p+\frac12)h}z\,dx=h\,\Sigma z,$$
or, if we take the class interval as unit in measuring $x$ so that $h=1$,
this gives
$$\int yx^n\,dx=\Sigma fx^n,$$
where the integral may now be taken as referring to the fitted
curve, since the moments of the theoretical and of the observational
curves are to be equal, and the integration traverses the
extent of the curve. When, however, there is not high contact at
the extremities the same equation holds good if we multiply the
first and last of the observed $f$'s by 1.1220, the second and the last
but one by 0.7588, the third and last but two by 1.1578, and the
fourth and last but three by 0.9613.
In particular, when $n=0$, integrating throughout the curve,
$$\int y\,dx=\Sigma f=N, \quad\ldots\quad (4)$$
which, being interpreted, means that the area contained between
the fitting curve and the axis of $x$ measures the total frequency of
observations, modified if necessary.
Also, when the observation moments have been adjusted, if we
write $\mu$ and $\mu'$ in place of $\nu$ and $\nu'$ in the notation previously
proposed (see Table (40)), integrating again throughout the curve,
$$\int xy\,dx\Big/\int y\,dx=\Sigma xf/N=\mu'_1, \quad\ldots\quad (5)$$
and the geometrical interpretation of this is that the foot of the
ordinate passing through the centre of gravity of the area between
the fitting curve and the axis registers the deviation of the mean $\bar X$
from the fixed origin.
If deviations are measured from the mean of the distribution
as origin $\Sigma(xf)$ vanishes (see also Appendix, Note (5)) so that $\mu_1=0$.
Generally, we have, with the same limits of integration,
$$\int x^ny\,dx\Big/\int y\,dx=\Sigma x^nf/N=\mu'_n,$$
and when the distribution is referred to its mean as origin the
right-hand side is written $\mu_n$.
We now pass to the second case.
(2) When the observations appear in groups ranging between
definite values of x, the range of each group as a rule being the same
in extent.
Since the usual procedure here is to treat each member of a group
as though it were centred at the $x$ at the middle of that group
(e.g. a group of school girls each of some weight between 7 stone and
7 stone 5 lbs. would be treated as if all its members were of
weight 7 stone 2.5 lbs.), this case evidently reduces
to that already considered. It is necessary, however, to
examine what correction must be made for assuming that all the
members of the same group have the same $x$.
Consider again the expression
$$\int x^ny'\,dx.$$
The contribution to the nth moment coming from the $f_r$ group of
observations (see fig. (34)) may be taken as the portion of the
above integral between limits $\{x_0+(r-\tfrac12)h\}$ and $\{x_0+(r+\tfrac12)h\}$, where
$x_0$ is the distance of the centre of the first group from the
origin O.
[Fig. (34).]
if they had the same x, by (2) this integral may be written
Mr
5760'
[5llS{x^{rhrl'7{{x^\r2hr\{Xo+r}2h)
H308{(a;o+rU)"+(a:o+r+Ur}],
where /^ is the frequency of observations in the group, and this, on
expansion in powers of {xQ]rh) and h,
^hMxo+rh)+^[2^0n{nlW(Xo+rhr'^
57 bO
+3n(nl)(n2){n3)h^{xQ\rh)»*+ . . .].
When we sum for all groups, the expression
X^"'hMxo+rh)
r«=0
gives evidently the nth moment of a set of isolated variables,
/o, fi, /g, . . . fp, and by Case (1) it may therefore be taken as
being practically equivalent to the required nth moment of the
observation curve, assuming that there is high contact at each end oj
the curve.
The remaining terms,
^^r^o 5760
+Sn(nl){n2)(nZ)hSxo\rh)^^i . . .],
may accordingly be taken as the correction required.
When $n=0$, these terms vanish, so we infer, just as in Case (1),
that, when the integration is taken throughout the curve,
$$\int y\,dx=\Sigma f=N, \quad\ldots\quad (4)\ \text{bis},$$
or, the area between the fitting curve and the axis of $x$ measures
the total frequency of observations when the class interval $h$ is
treated as the unit in measuring $x$.
Again, when $n=1$, the corrective terms vanish, so we likewise
infer, as in Case (1), that, with the same limits of integration,
$$\int xy\,dx\Big/\int y\,dx=\Sigma xf/N=\mu'_1, \quad\ldots\quad (5)\ \text{bis},$$
and that $\mu_1=0$.
When $n=2$, the reduction of the corrective terms gives
$$\text{second unadjusted moment}=\text{second adjusted moment}+\frac{h^2}{12}\,\Sigma hf_r,$$
or, dividing throughout by $\Sigma hf$ and bearing in mind the notation
adopted with the mean as origin,
$$\nu_2=\mu_2+\tfrac{1}{12},\quad\text{i.e.}\quad \mu_2=\nu_2-\tfrac{1}{12}, \quad\ldots\quad (6)$$
when $h=1$ as before.
When $n=3$,
$$\text{third unadjusted moment}=\text{third adjusted moment}+\frac{h^2}{4}\,\Sigma hf_r(x_0+rh)\ ;$$
but, if we refer the deviations to the mean of the distribution as
origin, $\Sigma f_r(x_0+rh)$ vanishes.
Therefore, $\mu_3=\nu_3 \quad\ldots\quad (7)$
When $n=4$,
$$\text{fourth unadjusted moment}=\text{fourth adjusted moment}+\frac{h^2}{2}\,\Sigma hf_r(x_0+rh)^2+\frac{h^4}{80}\,\Sigma hf_r.$$
Hence, dividing through as before by $\Sigma hf$ and taking $h$ as 1,
$$\nu_4=\mu_4+\tfrac12\mu_2+\tfrac{1}{80}.$$
Therefore, $\mu_4=\nu_4-\tfrac12\nu_2+\tfrac{7}{240} \quad\ldots\quad (8)$
To sum up, the general procedure in Case (2) is to calculate
$N, N'_1, N'_2, N'_3, N'_4$ directly from the statistics and so deduce
$\nu'_1, \nu'_2, \nu'_3, \nu'_4$. Then, transferring the origin to the mean, the $\nu'$'s
become $\nu_1, \nu_2, \nu_3, \nu_4$ (see Appendix, Note 5), and finally the
corrected $\mu$'s are given by
$$\mu_2=\nu_2-\tfrac{1}{12},\qquad \mu_3=\nu_3,\qquad \mu_4=\nu_4-\tfrac12\nu_2+\tfrac{7}{240}.$$
These adjustments, originally due to Dr. W. F. Sheppard * [Proceedings
of the Lond. Mathl. Socy., vol. xxix., pp. 353 et seq.], are
applicable only when the curve of distribution has high contact at
each extremity, as very frequently happens. To this case
we shall confine ourselves, and when it does not hold the unadjusted
moments may be used as a rough approximation failing a more refined but
also a more intricate adjustment.
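Sheppard's adjustments to the moments about the mean amount to three lines of arithmetic. The Python sketch below is illustrative; the function name `sheppard_adjust` is an assumption, and the optional argument `h` (the class interval) is included only so that the familiar general forms $h^2/12$ and $7h^4/240$ reduce to the values used in the text when $h=1$.

```python
def sheppard_adjust(nu2, nu3, nu4, h=1.0):
    """Adjust grouped moments about the mean for high contact at both ends.

    nu2, nu3, nu4 are the unadjusted moments; h is the class interval,
    taken as the unit (h = 1) throughout the text.
    """
    mu2 = nu2 - h ** 2 / 12
    mu3 = nu3
    mu4 = nu4 - (h ** 2 / 2) * nu2 + 7 * h ** 4 / 240
    return mu2, mu3, mu4


# Illustrative call with hypothetical unadjusted moments.
print(sheppard_adjust(2.0, 0.1, 12.0))
```

The adjustment presupposes high contact at each extremity; where that fails the unadjusted moments are retained.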
[* To obtain Sheppard's adjustments we have followed the method indicated
in Elderton's Frequency Curves and Correlation, pp. 28, 29.]
The way in which the three chief kinds of average are related to
the fitting curve is of interest and deserves recapitulation. Whether
the observations are classed as in Case (1) or as in Case (2) :
(1) the ordinate drawn through the highest point of the curve,
since the frequency there is a maximum, fixes the modal
value of X ;
(2) the median X is determined by the ordinate bisecting the
area between the curve and axis, since there are an equal
number of observations on either side of it ; and
(3) the mean is determined by the ordinate through the centre
of gravity of the area between the curve and axis.
We have still to show how to express the constants of the fitting
curve in terms of the moments calculated from the given statistics, and
it will be convenient now to make our approach from the other end.
Take the general equation of the fitting curve, express its constants
in terms of its moments, and substitute for the latter the
values determined from the statistics, since the basis of the fitting
is the equalization of the moments of the observational curve and
of the theoretical curve. This will enable us to determine $\kappa$, the
criterion for fixing the type of curve suitable to the given distribution.
When the type has been fixed it is, as a rule, not a very
difficult matter to express the constants of the particular type
again in terms of the observational moments.
Now the general differential equation of the fitting curve was
$$\frac{dy}{dx}=\frac{y(x+b)}{px^2+qx+r},$$
hence
$$\int(px^2+qx+r)\,dy=\int y(x+b)\,dx,$$
where the integration is to traverse the complete curve.
Therefore, multiplying both sides by $x^n$,
$$\int(px^{n+2}+qx^{n+1}+rx^n)\,dy=\int(yx^{n+1}+byx^n)\,dx\ ;$$
or, if we integrate the left-hand side by parts,
$$\left[(px^{n+2}+qx^{n+1}+rx^n)y\right]-\int y\{(n+2)px^{n+1}+(n+1)qx^n+nrx^{n-1}\}\,dx=\int(yx^{n+1}+byx^n)\,dx.$$
But the expression in square brackets vanishes at both limits if
we suppose $y$ to be zero at each end of the curve, so that the equation
reduces to
$$\{1+p(n+2)\}\int yx^{n+1}dx+\{b+q(n+1)\}\int yx^n\,dx+rn\int yx^{n-1}dx=0. \quad\ldots\quad (9)$$
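Now if deviations are measured from the mean of the distribution,
we have
$$\int yx\,dx=N\mu_1=0,\quad \int yx^2dx=N\mu_2,\quad \int yx^3dx=N\mu_3,\ \text{etc.},$$
and therefore, putting $n=3$ in the above relation,
$$(1+5p)N\mu_4+(b+4q)N\mu_3+3rN\mu_2=0\ ;$$
put $n=2$, $\quad(1+4p)N\mu_3+(b+3q)N\mu_2=0$ ;
put $n=1$, $\quad(1+3p)N\mu_2+rN=0$ ;
put $n=0$, $\quad(b+q)N=0$.
Thus $b=-q$, and, on substitution in the other three equations, we get
$$5\mu_4p+3\mu_3q+3\mu_2r+\mu_4=0,$$
$$4\mu_3p+2\mu_2q+\mu_3=0,$$
$$3\mu_2p+r+\mu_2=0,$$
three simple linear equations to find $p$, $q$, $r$, the solution of which
leads to
$$p=-(2\mu_2\mu_4-3\mu_3^2-6\mu_2^3)/(10\mu_2\mu_4-18\mu_2^3-12\mu_3^2),$$
$$q=-b=-\mu_3(\mu_4+3\mu_2^2)/(10\mu_2\mu_4-18\mu_2^3-12\mu_3^2),$$
$$r=-\mu_2(4\mu_2\mu_4-3\mu_3^2)/(10\mu_2\mu_4-18\mu_2^3-12\mu_3^2).$$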
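We have thus expressed $p$, $q$, $r$, and $b$, the constants of the fitting
curve, in terms of the moments of the observed distribution, but the
results may be rendered more concise by writing
$$\beta_1=\mu_3^2/\mu_2^3,\qquad \beta_2=\mu_4/\mu_2^2, \quad\ldots\quad (10)$$
whence
$$p=-(2\beta_2-3\beta_1-6)/2(5\beta_2-6\beta_1-9), \quad\ldots\quad (11)$$
$$q=-b=-\sqrt{(\mu_2\beta_1)}\,(\beta_2+3)/2(5\beta_2-6\beta_1-9), \quad\ldots\quad (12)$$
$$r=-\mu_2(4\beta_2-3\beta_1)/2(5\beta_2-6\beta_1-9). \quad\ldots\quad (13)$$
And $\kappa$, the criterion for fixing the type of curve suitable to the
statistics given, is immediately deduced from
$$\kappa=q^2/4pr=\beta_1(\beta_2+3)^2/4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6). \quad\ldots\quad (14)$$
Also, since $dy/dx$ vanishes when $x=-b$, this fixes the mode relative
to the origin. But the origin is now at the mean, so that
$$\text{mode}-\text{mean}=-b=-\sqrt{(\mu_2\beta_1)}\,(\beta_2+3)/2(5\beta_2-6\beta_1-9). \quad\ldots\quad (15)$$
And
$$\text{skewness}=(\text{mean}-\text{mode})/\text{S.D.}=b/\sqrt{\mu_2}=\sqrt{\beta_1}\,(\beta_2+3)/2(5\beta_2-6\beta_1-9). \quad\ldots\quad (16)$$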
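The passage from the adjusted moments to the constants and the criterion may be sketched in Python as follows. The function name `pearson_constants` and the returned dictionary are assumptions made for illustration. Two conventions of the sketch go beyond the text: the square root of $\mu_2\beta_1$ is given the sign of $\mu_3$, and when the denominator of $\kappa$ vanishes the value returned is 0 if $\beta_1=0$ (the normal case) and infinity otherwise.

```python
import math


def pearson_constants(mu2, mu3, mu4):
    """Return beta1, beta2, p, q, r, b, kappa and skewness from moments about the mean."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    denom = 2 * (5 * beta2 - 6 * beta1 - 9)
    p = -(2 * beta2 - 3 * beta1 - 6) / denom
    # sqrt(mu2 * beta1) carries the sign of mu3 (assumption of this sketch)
    root = math.copysign(math.sqrt(mu2 * beta1), mu3)
    q = -root * (beta2 + 3) / denom
    r = -mu2 * (4 * beta2 - 3 * beta1) / denom
    b = -q
    kappa_den = 4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    if kappa_den == 0:
        kappa = 0.0 if beta1 == 0 else math.inf   # convention of this sketch
    else:
        kappa = beta1 * (beta2 + 3) ** 2 / kappa_den
    return {
        "beta1": beta1, "beta2": beta2, "p": p, "q": q, "r": r, "b": b,
        "kappa": kappa, "skewness": b / math.sqrt(mu2),
    }


# Illustrative moments only.
print(pearson_constants(2.0, 0.5, 11.0))
```

For a normal distribution ($\beta_1=0$, $\beta_2=3$) the sketch gives $p=q=0$ and $r=-\mu_2$, in agreement with $\sigma^2=-r$ for Type VII.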
CHAPTER XVII
APPLICATIONS OF CURVE FITTING
We are in a position now to test the application of these principles
to given frequency distributions and we shall start by trying to
find a curve to fit the record of marks obtained by 514 candidates
in a certain examination (see p. 25).
Example (1). This example is chosen because it turns out,
when we come to evaluate $\kappa$, that it is well fitted by the normal
curve, Type VII., which is one of the simplest and at the same time
the most important of all the types discussed. Before we start
the numerical part of the work it will be well to express the
constants $y_0$ and $\sigma$ of this curve in terms of the moments of the
distribution.
The equation of the normal curve is
$$y=y_0e^{-x^2/2\sigma^2}.$$
If N be the total frequency, we have by equation (4) bis, p. 202,
$$N=\int_{-\infty}^{+\infty}y\,dx=y_0\int_{-\infty}^{+\infty}e^{-x^2/2\sigma^2}dx.$$
Put $x^2/2\sigma^2=\xi^2$, so that $dx/d\xi=\sigma\sqrt2$ and when $x=\infty$, $\xi=\infty$ also.
Thus
$$N=y_0\sigma\sqrt2\int_{-\infty}^{+\infty}e^{-\xi^2}d\xi=y_0\sigma\sqrt2\,\sqrt\pi\ \ \text{(see Appendix, Note 8)}$$
$$=\sqrt{(2\pi)}\,\sigma y_0. \quad\ldots\quad (1)$$
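Again
$$\mu_2=\int_{-\infty}^{+\infty}yx^2dx\Big/\int_{-\infty}^{+\infty}y\,dx=\frac{2\sqrt2\,\sigma^3y_0}{N}\int_{-\infty}^{+\infty}\xi^2e^{-\xi^2}d\xi$$
$$=\frac{2\sqrt2\,\sigma^3y_0}{N}\left\{\left[-\tfrac12\xi e^{-\xi^2}\right]_{-\infty}^{+\infty}+\tfrac12\int_{-\infty}^{+\infty}e^{-\xi^2}d\xi\right\}=\frac{2\sqrt2\,\sigma^3y_0}{N}\cdot\frac{\sqrt\pi}{2},$$
since $\left[\xi e^{-\xi^2}\right]$ vanishes at both limits.
Therefore
$$\mu_2=\sqrt2\,\sigma^3y_0\sqrt\pi/N=\sigma^2,\ \text{by (1)}.$$
In fact, $\sigma$ is simply the S.D. of the distribution.
And $y_0=N/\{\sqrt{(2\pi)}\,\sigma\}$.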
Table (41). Distribution of Marks obtained by 514 Candidates
in a certain Examination.

Mean No.   Deviation  Frequency of  First     Second    Third     Fourth
of Marks.  from 33.   Candidates.   Moment.   Moment.   Moment.   Moment.
           (x)        (f)           (fx)      (fx^2)    (fx^3)    (fx^4)
    3        -6           5           -30       180      -1080      6480
    8        -5           9           -45       225      -1125      5625
   13        -4          28          -112       448      -1792      7168
   18        -3          49          -147       441      -1323      3969
   23        -2          58          -116       232       -464       928
   28        -1          82           -82        82        -82        82
   33         0          87             0         0          0         0
   38        +1          79           +79        79        +79        79
   43        +2          50          +100       200       +400       800
   48        +3          37          +111       333       +999      2997
   53        +4          21           +84       336      +1344      5376
   58        +5           6           +30       150       +750      3750
   63        +6           3           +18       108       +648      3888
Totals                  514          -110      2814      -1646    41,142
The first 4 moments referred to 33 as origin and with the class
interval, 5 marks, as unit of deviation, are
$$-110/514,\quad 2814/514,\quad -1646/514,\quad 41142/514.$$
The arithmetic mean of the distribution
$$=33+5(-110/514)$$
$$=33-5(0.214008)$$
$$=31.92996.$$
The second, third, and fourth moments referred to the mean as
origin, and retaining five marks as unit of deviation, are given
(see Appendix, Note 5) by
$$\nu_2=2814/514-\nu_1'^2=5.42891$$
$$\nu_3=-1646/514-3\nu'_1\nu_2-\nu_1'^3=0.29296$$
$$\nu_4=41142/514-4\nu'_1\nu_3-6\nu_1'^2\nu_2-\nu_1'^4=78.79964.$$
After making Sheppard's adjustments
$$\mu_2=\nu_2-\tfrac{1}{12},\quad \mu_3=\nu_3,\quad \mu_4=\nu_4-\tfrac12\nu_2+\tfrac{7}{240},$$
these become
$$\mu_2=5.34558,\quad \mu_3=0.29296,\quad \mu_4=76.11436.$$
Thus $\beta_1=\mu_3^2/\mu_2^3=0.00056$, $\beta_2=\mu_4/\mu_2^2=2.66365$.
Hence
$$\kappa=\beta_1(\beta_2+3)^2/4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)$$
$$=(0.00056)(5.66365)^2/4(10.65292)(-0.67438)$$
$$=-0.00063.$$
Since $\kappa$ and $\beta_1$ are small and $\beta_2$ does not differ greatly from 3, making
$p$ and $q$ small, we may fit a normal curve to this distribution.
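The arithmetic of this example may be repeated mechanically from the deviations and frequencies of Table (41). The Python sketch below follows the same order of working (moments about 33, transfer to the mean, Sheppard's adjustments, then $\beta_1$, $\beta_2$ and $\kappa$); the variable names are assumptions made for the illustration, and the results, being carried to full machine precision, may differ from the printed figures in the last decimal place.

```python
# Deviations from 33 (unit: 5 marks) and observed frequencies, from Table (41).
deviations = list(range(-6, 7))
frequencies = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]

n_total = sum(frequencies)
# Moments about the arbitrary origin 33.
nu1p, nu2p, nu3p, nu4p = [
    sum(f * x ** k for x, f in zip(deviations, frequencies)) / n_total
    for k in range(1, 5)
]

mean_marks = 33 + 5 * nu1p

# Transfer to the mean as origin, in the form used in the text.
nu2 = nu2p - nu1p ** 2
nu3 = nu3p - 3 * nu1p * nu2 - nu1p ** 3
nu4 = nu4p - 4 * nu1p * nu3 - 6 * nu1p ** 2 * nu2 - nu1p ** 4

# Sheppard's adjustments with the class interval as unit.
mu2 = nu2 - 1 / 12
mu3 = nu3
mu4 = nu4 - nu2 / 2 + 7 / 240

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
kappa = beta1 * (beta2 + 3) ** 2 / (
    4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
)

print(n_total, mean_marks, mu2, mu3, mu4, beta1, beta2, kappa)
```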
The appropriate normal curve is
$$y=y_0e^{-x^2/2\sigma^2},$$
where $\sigma^2=\mu_2=5.34558$ (5 marks as unit),
$$y_0=N/\sqrt{(2\pi\mu_2)}=514/\sqrt{\{2\pi(5.34558)\}}=88.6903.$$
Hence the required curve has for its equation, writing results to
three significant figures,
$$y=88.7\,e^{-x^2/10.7}.$$
Now the mean of the distribution is at 31.92996, where the
central ordinate of the normal curve is erected, and the distance
of any $x$, say $x_{33}$, from this point
$$=(33-31.92996)/5\quad\text{(expressed with 5 marks as unit)}$$
$$=0.214008.$$
Any other $x$ may be found in the same way and $y$ can then be
deduced from the equation of the curve by taking logs, thus
$$\log_{10}y=\log_{10}88.6903-\frac{x^2}{2\sigma^2}\log_{10}e$$
$$=1.9478762-(0.0406218)x^2.$$
This enables us to calculate the ordinates of the normal curve and
thence we could evaluate the areas by successive applications of a
suitable quadrature formula.
We can, however, get the areas direct by using a table of the
probability integral, such as that due to Dr. W. F. Sheppard (see
pp. 284, 285). In that case the corresponding abscissae have first
to be expressed in terms of the standard deviation as unit, e.g.
$$x_{40.5}=40.5-31.92996=8.57004,$$
and $\sigma=5\sqrt{(5.34558)}=11.56025$,
where the factor 5 is introduced because 5 marks was the unit in
the calculation of $\mu_2$ (a process equivalent in effect to that previously
adopted).
Thus $x_{40.5}/\sigma=0.741336=\xi$, say.
The area of the normal curve up to the abscissa $x/\sigma$ or $\xi$
$$=\int_{-\infty}^{x}y\,dx$$
$$=\int_{-\infty}^{x}y_0e^{-x^2/2\sigma^2}dx$$
$$=\frac{N}{\sqrt{(2\pi)}}\int_{-\infty}^{\xi}e^{-\frac12\xi^2}d\xi$$
$$=N\int_{-\infty}^{\xi}z\,d\xi$$
$$=N\cdot\tfrac12(1+\alpha),$$
where $\tfrac12\alpha$ represents the area of the curve $z=\dfrac{1}{\sqrt{(2\pi)}}e^{-\frac12\xi^2}$ between
0 and $\xi$.
Sheppard's Tables give the values of $\tfrac12(1+\alpha)$ for different values
of $\xi$, and when
$$\xi=0.74,\quad \tfrac12(1+\alpha)=0.7703500$$
$$\xi=0.75,\quad \tfrac12(1+\alpha)=0.7733726.$$
Therefore, by interpolation, when
$$\xi=0.741336,\quad \tfrac12(1+\alpha)=0.7707538.$$
Thus the frequency of candidates with marks lying between $-\infty$ and
40.5
$$=514(0.7707538)=396.17.$$
Similarly the frequency of candidates with marks lying between
$-\infty$ and 45.5 $=452.20$.
[Fig. (35): histogram of the observed frequencies with the fitted normal curve; horizontal axis, marks obtained.]
Hence the normal frequency for the group with 43 as mean
number of marks $=56.0$, and the same method gives the area for
any other group.
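Where a table of the probability integral is not at hand, the same areas may be had from the error function. The Python sketch below uses the identity $\tfrac12(1+\alpha)=\tfrac12\{1+\operatorname{erf}(\xi/\sqrt2)\}$; the function names are assumptions made for the illustration, and the sketch assumes that the two extreme groups are made to absorb the tails of the curve, which is consistent with the area frequencies summing very nearly to the total.

```python
import math


def normal_cdf(xi):
    """Area of the unit normal curve from minus infinity up to xi, i.e. (1 + alpha)/2."""
    return 0.5 * (1.0 + math.erf(xi / math.sqrt(2.0)))


def normal_group_frequencies(mid_values, width, mean, sigma, total):
    """Normal frequency of each group; the end groups absorb the tails (assumption)."""
    freqs = []
    last = len(mid_values) - 1
    for i, mid in enumerate(mid_values):
        lower = -math.inf if i == 0 else mid - width / 2
        upper = math.inf if i == last else mid + width / 2
        area = normal_cdf((upper - mean) / sigma) - normal_cdf((lower - mean) / sigma)
        freqs.append(total * area)
    return freqs


mids = list(range(3, 64, 5))          # mean number of marks of each group
freqs = normal_group_frequencies(mids, 5, 31.92996, 11.56025, 514)
for mid, f in zip(mids, freqs):
    print(mid, round(f, 1))
```

Since no interpolation in a table is involved, the figures so obtained may differ slightly in the last place from those found by hand.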
The histogram of the observations and the curve plotted from the
ordinates are shown together in fig. (35).
In Table (42) are set out the calculated normal frequency (col. (4))
for each group alongside the corresponding observed frequency
(col. (2)), and the differences between the two are shown in col. (5).
We want to know whether the fit is a good one.
Table (42). Comparison of Observed and Normal
Frequencies in Examination Example.

  (1)        (2)         (3)         (4)        (5)         (6)         (7)
Mean No.   Observed     Normal Frequency.     Deviation.   Sq. of     Ratio of No.
of Marks.  Frequency.  Ordinates.   Areas.                Deviation.  in Col. (6) to
                                                                      No. in Col. (4).
    3          5          3.9         5.7       +0.7        0.49        0.09
    8          9         10.4        10.7       +1.7        2.89        0.27
   13         28         23.2        23.5       -4.5       20.25        0.86
   18         49         42.9        43.1       -5.9       34.81        0.81
   23         58         65.8        65.6       +7.6       57.76        0.88
   28         82         83.7        83.1       +1.1        1.21        0.01
   33         87         88.3        87.6       +0.6        0.36        0.00
   38         79         77.3        76.8       -2.2        4.84        0.06
   43         50         56.1        56.0       +6.0       36.00        0.64
   48         37         33.7        34.0       -3.0        9.00        0.26
   53         21         16.8        17.1       -3.9       15.21        0.89
   58          6          7.0         7.2       +1.2        1.44        0.20
   63          3          2.4         3.5       +0.5        0.25        0.07
   ..        514        511.5       513.9        ..       184.51     χ² = 5.04
Now with this object we might square each difference as in
col. (6), sum the squares, and find the mean square deviation by
dividing by the total frequency ; this, after extracting the square
root, would give what might be called the root-mean-square error,
regarding the theoretical values as the true ones. In the above
example it
$$=\sqrt{(184.51/514)}=0.599.$$
But this form of result, while it may be useful in some cases,
e.g. in comparing two distributions of the same kind to some
theoretical series, is open to objection ; for one thing it treats all
the differences as if they were of equal importance in absolute
magnitude, but a difference of 2, say, in a normal frequency of 10
is clearly more serious than a like difference in a frequency of 60.
The objection, however, goes deeper than that ; even when the
root-mean-square deviation is found we are at a loss to estimate
its precise relationship to the quality of fit, as there seems to be no
definite connection between one distribution and another of a
different kind : there is no standard case, so to speak, to which we
can always appeal, where the fit is agreed to be good and supplying
therefore a suitable root-mean-square deviation for comparison.
This leads us to the question: What constitutes goodness of
fit? Suppose by some means we have selected a theoretical or
empirical formula to describe a certain frequency distribution in a
given population; if the frequency values observed do not differ
from the theoretical frequencies by more than the deviations we
might expect owing to random sampling, then clearly the fit may be
regarded as a good one. And we have a measure of the fit if we
can find the proportion of random samples, of the same size as the
given distribution, showing greater deviations from the distribution
given by theory than those which are actually observed.
Now Professor Karl Pearson has shown how this proportion can
be calculated [Phil. Mag., vol. l., pp. 157-175 (1900)]; he finds the
probability that a random sample should give a frequency distribution
differing from that which theory proposes by as much as or by
more than the distribution actually observed. This probability, P,
is a function of $\chi^2$, where
$$\chi^2 = \sum \frac{(y-y')^2}{y},$$
y and y' representing the theoretical and observed frequencies for
any particular group and the summation is to include all groups.
It will be noted that this expression gives each difference (y - y')
its appropriate importance by relating it to the frequency y of its
own group.
A table in Biometrika (vol. i., pp. 155 et seq.) gives the values of P
corresponding to different values of $\chi^2$ (including all integral values
from 1 to 30) and to values of n', the total number of frequency
groups, from 3 to 30 (see also p. 285). The mathematics involved
in finding P is difficult, and the reader who wishes to enter
into it must consult the original memoir, but the utility of the
function has been proved by experience and it is readily applied
in a particular case.
In the above example $\chi^2$ is found from col. (7): it equals 5.04,
and from the table of values of P, when n' = 13, we have
P = 0.957979 when $\chi^2 = 5$
and P = 0.916082 when $\chi^2 = 6$.
Therefore, by proportional interpolation, when $\chi^2 = 5.04$,
P = 0.956303. Thus, supposing our data to follow the normal curve,
in 956 random samples out of 1000 we should expect to get a
worse-fitting distribution than that given by the sample actually
observed. We may therefore conclude without hesitation that
the normal curve provides an excellent fit in this particular instance.
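The arithmetic of this test is easily checked by machine. The following is only an illustrative sketch in Python, not part of the original text: the function names are hypothetical, the frequencies are those of cols. (2) and (4) of the table above, and P is computed from the series for the tail of the $\chi^2$ distribution with n' - 1 degrees of freedom instead of being read from the Biometrika table.

```python
import math

# Observed frequencies and normal-curve areas, cols. (2) and (4) of the table above.
observed = [5, 9, 28, 49, 58, 82, 87, 79, 50, 37, 21, 6, 3]
theoretical = [5.7, 10.7, 23.5, 43.1, 65.6, 83.1, 87.6, 76.8, 56.0, 34.0, 17.1, 7.2, 3.5]


def rms_error(obs, theo):
    """Root-mean-square deviation, dividing by the total observed frequency."""
    return math.sqrt(sum((t - o) ** 2 for o, t in zip(obs, theo)) / sum(obs))


def chi_square(obs, theo):
    """Sum of (y - y')^2 / y, with y theoretical and y' observed."""
    return sum((t - o) ** 2 / t for o, t in zip(obs, theo))


def pearson_p(chi2, n_groups):
    """Chance of a chi-square at least as large as chi2 for n_groups groups."""
    dof = n_groups - 1
    half = chi2 / 2.0
    if dof % 2 == 0:
        # Even case: exp(-chi2/2) * sum of (chi2/2)^k / k! for k < dof/2.
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd case: normal tail plus a finite series in odd powers of chi.
    chi = math.sqrt(chi2)
    term, total = chi, 0.0
    for k in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * k + 1)
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


rms = rms_error(observed, theoretical)
chi2 = chi_square(observed, theoretical)
p_fit = pearson_p(chi2, len(observed))
```

Because col. (7) adds ratios already rounded to two places, the unrounded sum comes out slightly above 5.04 (about 5.06), and the resulting P agrees with the interpolated value to about two decimal places.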
We pass on now to fresh distributions to illustrate some of the
other types of frequency curve.
Example (2) deals with the percentage of trade union members
unemployed at the end of each month for the years 1898 to 1912
[data from the Sixteenth Abstract of Labour Statistics of the United
Kingdom, Cd. 7131]. Table (43) shows the distribution of the
180 records according to the percentage unemployed.
The deviations are measured from the centre of the group (3.9 to 5.2)
as origin, and the class interval (1.3 per cent.) is taken as unit of
deviation as usual.
The first four moments are:
$-29/180\ (=\bar{x})$, 425/180, 397/180, 3053/180;
i.e.
-0.1611111, 2.3611111, 2.2055556, 16.9611111.
Table (43). Distribution of Unemployed Percentages
of Trade Union Members

Percentage    Deviation.  Frequency.   First     Second    Third     Fourth
Unemployed.                           Moment.   Moment.   Moment.   Moment.
   0-            -3          ..          ..        ..        ..        ..
   1.3-          -2          33         -66       132      -264       528
   2.6-          -1          57         -57        57       -57        57
   3.9-           0          41          ..        ..        ..        ..
   5.2-          +1          24         +24        24       +24        24
   6.5-          +2          10         +20        40       +80       160
   7.8-          +3          11         +33        99      +297       891
   9.1-          +4           3         +12        48      +192       768
  10.4-          +5           1          +5        25      +125       625
   ..            ..         180         -29       425      +397      3053
Referred to the mean,
$$4.55 + 1.3\bar{x} = 4.3405556,$$
the second, third, and fourth moments are (see Appendix, Note 5),
$$\nu_2 = 2.3611111 - \bar{x}^2 = 2.3351543,$$
$$\nu_3 = 2.2055556 - 3\bar{x}\nu_2 - \bar{x}^3 = 3.338395,$$
$$\nu_4 = 16.9611111 - 4\bar{x}\nu_3 - 6\bar{x}^2\nu_2 - \bar{x}^4 = 18.74817.$$
Owing to the very doubtful contact at the beginning of the curve
Sheppard's adjustments were not made in this case, but the rough
moments as calculated above were used.
Thus $\beta_1 = \nu_3^2/\nu_2^3 = 0.875242$,
$\beta_2 = \nu_4/\nu_2^2 = 3.43817$,
and $\kappa = \dfrac{\beta_1(\beta_2+3)^2}{4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)} = -0.466.$
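The moment work for this example can be retraced from Table (43). The sketch below is an illustration only (Python, with hypothetical function names); it takes the deviations and frequencies of the table, forms the rough moments about the arbitrary origin, transfers them to the mean by the relations quoted above, and then forms $\beta_1$, $\beta_2$ and $\kappa$.

```python
# Table (43): deviations from the centre of group (3.9 to 5.2), in class-interval units.
deviations = [-2, -1, 0, 1, 2, 3, 4, 5]
frequencies = [33, 57, 41, 24, 10, 11, 3, 1]


def raw_moments(devs, freqs, order=4):
    """First `order` moments about the arbitrary origin."""
    n = sum(freqs)
    return [sum(f * d ** k for d, f in zip(devs, freqs)) / n
            for k in range(1, order + 1)]


def central_from_raw(m1, m2, m3, m4):
    """Transfer moments to the mean, as in Appendix, Note 5."""
    nu2 = m2 - m1 ** 2
    nu3 = m3 - 3 * m1 * nu2 - m1 ** 3
    nu4 = m4 - 4 * m1 * nu3 - 6 * m1 ** 2 * nu2 - m1 ** 4
    return nu2, nu3, nu4


def pearson_criteria(nu2, nu3, nu4):
    """beta1, beta2 and the criterion kappa that decides the type of curve."""
    beta1 = nu3 ** 2 / nu2 ** 3
    beta2 = nu4 / nu2 ** 2
    kappa = beta1 * (beta2 + 3) ** 2 / (
        4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6))
    return beta1, beta2, kappa


m1, m2, m3, m4 = raw_moments(deviations, frequencies)
nu2, nu3, nu4 = central_from_raw(m1, m2, m3, m4)
beta1, beta2, kappa = pearson_criteria(nu2, nu3, nu4)
mean = 4.55 + 1.3 * m1   # centre of the origin group plus 1.3 times the first moment
```

A negative `kappa`, as here, points to a Type I. curve.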
Since $\kappa$ is negative the fitting curve should be of Type I., the equation
of which is
$$y = y_0\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2},$$
where $m_1/a_1 = m_2/a_2$, and $(a_1+a_2) = b$, say.
It is therefore necessary before going further to determine $y_0$, $a_1$,
$a_2$, $b$, $m_1$ and $m_2$ in terms of $\nu_2$, $\nu_3$, $\nu_4$, or $\beta_1$ and $\beta_2$, the constants of
the distribution.
The value of $y_0$ is found to be most conveniently expressed as a
Gamma function which is defined, with the usual notation, thus:
$$\Gamma(k) = \int_0^\infty e^{-x}x^{k-1}\,dx,$$
whence it follows that $\Gamma(k+1) = k\Gamma(k)$. [See Appendix, Note 9,
also p. 285.]
Also, if
$$B(m, n) = \int_0^1 x^{m-1}(1-x)^{n-1}\,dx,$$
it may be easily shown that
$$B(m, n) = \Gamma(m)\Gamma(n)/\Gamma(m+n).$$ [See Appendix, Note 9.]
The general method of procedure in determining the constants
for all the different types is : —
1. Express the fact that the area of the curve is a measure of
the total frequency of the distribution — this enables us to
find 2/0.
2. Find the nth moment of the curve with regard to some fixed
origin; giving n particular values, 1, 2, 3, 4, this leads to
the determination of $\mu_2$, $\mu_3$, $\mu_4$, $\beta_1$, $\beta_2$ in terms of the
constants of the curve, and thence to formulae for calculating
the constants.
Once found, the same formulae may be used, of course, in all
cases of the same type : we have only to replace letters by the
numbers for which they stand.
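Applying this method to the Type I. curve, we have
$$N = \int_{-a_1}^{+a_2} y\,dx = \frac{y_0}{a_1^{m_1}a_2^{m_2}}\int_{-a_1}^{a_2}(a_1+x)^{m_1}(a_2-x)^{m_2}\,dx.$$
Put $(a_1+x) = (a_1+a_2)z$, so that $(a_2-x) = (a_1+a_2)(1-z)$ and
$dx/dz = (a_1+a_2) = b$; therefore
$$N = \frac{y_0\,b^{m_1+m_2+1}}{a_1^{m_1}a_2^{m_2}}\int_0^1 z^{m_1}(1-z)^{m_2}\,dz \qquad (2)$$
$$= \frac{y_0\,b^{m_1+m_2+1}}{a_1^{m_1}a_2^{m_2}}\,B(m_1+1,\ m_2+1).$$
Hence
$$y_0 = \frac{N\,a_1^{m_1}a_2^{m_2}}{b^{m_1+m_2+1}\,B(m_1+1,\ m_2+1)} = \frac{N}{b}\cdot\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\cdot\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\Gamma(m_2+1)}.$$
Again,
$$N\mu'_n = \int_{-a_1}^{a_2} y\,(a_1+x)^n\,dx$$
is the nth moment of the distribution referred to $(-a_1, 0)$, the
point where the curve starts from the axis on the left-hand side,
as origin.
Therefore, as above,
$$N\mu'_n = \frac{y_0\,b^{m_1+m_2+n+1}}{a_1^{m_1}a_2^{m_2}}\int_0^1 z^{m_1+n}(1-z)^{m_2}\,dz
= b^n N\int_0^1 z^{m_1+n}(1-z)^{m_2}\,dz \Big/ \int_0^1 z^{m_1}(1-z)^{m_2}\,dz, \text{ by (2).}$$
Hence,
$$\mu'_n = \frac{b^n\,\Gamma(m_1+n+1)\,\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_1+m_2+n+2)}
= \frac{b^n(m_1+n)(m_1+n-1)\ldots(m_1+1)}{(m_1+m_2+n+1)(m_1+m_2+n)\ldots(m_1+m_2+2)},$$
by repeated application of the relation $\Gamma(k+1) = k\Gamma(k)$.
Putting n = 1, 2, 3, 4 in succession, we have
$$\mu'_1 = \frac{b(m_1+1)}{m_1+m_2+2},$$
$$\mu'_2 = \frac{b^2(m_1+2)(m_1+1)}{(m_1+m_2+3)(m_1+m_2+2)},$$
$$\mu'_3 = \frac{b^3(m_1+3)(m_1+2)(m_1+1)}{(m_1+m_2+4)(m_1+m_2+3)(m_1+m_2+2)},$$
$$\mu'_4 = \frac{b^4(m_1+4)(m_1+3)(m_1+2)(m_1+1)}{(m_1+m_2+5)(m_1+m_2+4)(m_1+m_2+3)(m_1+m_2+2)}.$$
These relations are rendered more concise if we write
$$m_1+1 = m'_1,\quad m_2+1 = m'_2,\quad m_1+m_2+2 = r;$$
thus
$$\mu'_1 = b\,m'_1/r,$$
$$\mu'_2 = \frac{b^2 m'_1(m'_1+1)}{r(r+1)},$$
$$\mu'_3 = \frac{b^3 m'_1(m'_1+1)(m'_1+2)}{r(r+1)(r+2)},$$
$$\mu'_4 = \frac{b^4 m'_1(m'_1+1)(m'_1+2)(m'_1+3)}{r(r+1)(r+2)(r+3)}.$$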
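To get the corresponding moments referred to the mean as
origin we have the relations:
$$\mu_1 = 0,\qquad \mu_3 = \mu'_3 - 3\mu'_1\mu_2 - \mu'^3_1,$$
$$\mu_2 = \mu'_2 - \mu'^2_1,\qquad \mu_4 = \mu'_4 - 4\mu'_1\mu_3 - 6\mu'^2_1\mu_2 - \mu'^4_1,$$
which, after some straightforward reduction, give
$$\mu_2 = \frac{b^2 m'_1 m'_2}{r^2(r+1)},$$
$$\mu_3 = \frac{2b^3 m'_1 m'_2(m'_2-m'_1)}{r^3(r+1)(r+2)},$$
$$\mu_4 = \frac{3b^4 m'_1 m'_2\,[m'_1 m'_2(r-6)+2r^2]}{r^4(r+1)(r+2)(r+3)}.$$
Thus
$$\beta_1 = \mu_3^2/\mu_2^3 = \frac{4b^6 m'^2_1 m'^2_2(m'_2-m'_1)^2}{r^6(r+1)^2(r+2)^2} \Big/ \frac{b^6 m'^3_1 m'^3_2}{r^6(r+1)^3}
= \frac{4(m'_2-m'_1)^2(r+1)}{m'_1 m'_2(r+2)^2}.$$
Therefore, since $(m'_2-m'_1)^2 = r^2 - 4m'_1 m'_2$,
$$\frac{r^2}{m'_1 m'_2} = 4 + \frac{\beta_1(r+2)^2}{4(r+1)} \qquad (3)$$
Again
$$\beta_2 = \mu_4/\mu_2^2 = \frac{3b^4 m'_1 m'_2[m'_1 m'_2(r-6)+2r^2]}{r^4(r+1)(r+2)(r+3)} \Big/ \frac{b^4 m'^2_1 m'^2_2}{r^4(r+1)^2}
= \frac{3[m'_1 m'_2(r-6)+2r^2](r+1)}{m'_1 m'_2(r+2)(r+3)}.$$
Therefore,
$$\frac{2r^2}{m'_1 m'_2} = -r + 6 + \frac{\beta_2(r+2)(r+3)}{3(r+1)} \qquad (4)$$
Combining (3) and (4),
$$(r+2) + \frac{\beta_1(r+2)^2}{2(r+1)} = \frac{\beta_2(r+2)(r+3)}{3(r+1)},$$
whence
$$r = 6(\beta_2-\beta_1-1)/(3\beta_1-2\beta_2+6) \qquad (5)$$
Again, since $\mu_2 = b^2 m'_1 m'_2/r^2(r+1)$,
therefore $b^2 = \mu_2(r+1)\,[\beta_1(r+2)^2+16(r+1)]/4(r+1)$, by (3),
i.e.
$$b = \tfrac{1}{2}\sqrt{\mu_2}\,\sqrt{\beta_1(r+2)^2+16(r+1)} \qquad (6)$$
And $m'_1 m'_2 = 4r^2(r+1)/[\beta_1(r+2)^2+16(r+1)]$,
while $m'_1+m'_2 = r$; hence $m'_1$ and $m'_2$ are roots of
$$m'^2 - rm' + \frac{4r^2(r+1)}{\beta_1(r+2)^2+16(r+1)} = 0,$$
the solution of which quadratic is
$$\tfrac{1}{2}\left[r \pm r(r+2)\sqrt{\frac{\beta_1}{\beta_1(r+2)^2+16(r+1)}}\right];$$
therefore, $m_1$ and $m_2$* are respectively equal to
$$\tfrac{1}{2}\left[r-2 \mp r(r+2)\sqrt{\frac{\beta_1}{\beta_1(r+2)^2+16(r+1)}}\right] \qquad (7)$$
and $a_1$ and $a_2$ follow from
$$\frac{a_1}{m_1} = \frac{a_2}{m_2} = \frac{b}{m_1+m_2} \qquad (8)$$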
Applying these formulae to the 'unemployed' example, we find
r = 5.36048.  $m_1$ = 0.169185.  $m_2$ = 3.191295.
b = 9.33236.  $a_1$ = 0.469842.  $a_2$ = 8.86252.
Also $y_0$ = 58.1282, and the equation of the curve is therefore
$$y = 58.1\left(1+\frac{x}{0.470}\right)^{0.169}\left(1-\frac{x}{8.86}\right)^{3.19}.$$
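Formulae (5) to (8), together with the Gamma-function expression for $y_0$, lend themselves to a direct check. What follows is a minimal sketch in Python, offered as an illustration and not as the author's procedure: the function name is hypothetical, $\mu_3$ is assumed positive (so that $m_2$ takes the larger root), and $m_1$, $m_2$ are assumed positive so that their logarithms exist.

```python
import math


def type1_constants(n_total, nu2, beta1, beta2):
    """Constants of y = y0 (1 + x/a1)^m1 (1 - x/a2)^m2 from formulae (5) to (8).

    Assumes mu3 > 0 (m2 takes the larger root) and m1, m2 > 0.
    """
    r = 6 * (beta2 - beta1 - 1) / (3 * beta1 - 2 * beta2 + 6)      # (5)
    d = beta1 * (r + 2) ** 2 + 16 * (r + 1)
    b = 0.5 * math.sqrt(nu2 * d)                                    # (6)
    spread = r * (r + 2) * math.sqrt(beta1 / d)
    m1 = 0.5 * (r - 2 - spread)                                     # (7)
    m2 = 0.5 * (r - 2 + spread)
    a1 = b * m1 / (m1 + m2)                                         # (8)
    a2 = b * m2 / (m1 + m2)
    # y0 from the area condition, worked in logarithms to avoid overflow.
    log_y0 = (math.log(n_total / b)
              + m1 * math.log(m1) + m2 * math.log(m2)
              - (m1 + m2) * math.log(m1 + m2)
              + math.lgamma(m1 + m2 + 2)
              - math.lgamma(m1 + 1) - math.lgamma(m2 + 1))
    return {"r": r, "b": b, "m1": m1, "m2": m2,
            "a1": a1, "a2": a2, "y0": math.exp(log_y0)}


# The 'unemployed' example: N = 180, with nu2, beta1, beta2 as found above.
constants = type1_constants(180, 2.3351543, 0.875242, 3.43817)
```

With the rounded inputs shown, the constants agree with those printed above to within a unit or two in the last figures.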
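The position of the origin, which is at the mode, is given by
$$(\text{mean} - \text{mode}) = \mu'_1 - a_1 = \frac{b\,m'_1}{r} - \frac{b\,m_1}{m_1+m_2}
= b\left[\frac{m'_1}{r} - \frac{m'_1-1}{r-2}\right] = \frac{b(m'_2-m'_1)}{r(r-2)}
= \frac{1}{2}\cdot\frac{\mu_3}{\mu_2}\cdot\frac{r+2}{r-2}; \qquad (9)$$
thus, in this particular case,
$$\text{mode} = 4.3405556 - 1.3\cdot\frac{1}{2}\cdot\frac{\nu_3}{\nu_2}\cdot\frac{r+2}{r-2} = 2.3052009.$$
[* When $\mu_3$ is positive $m_2$ goes with the positive root of the quadratic, and
vice versa.]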
This enables us to write down any x, and thence y by substituting
for x in the equation of the curve, which, by taking logs, may be
written
$$\log y = \log y_0 + m_1\log\left(1+\frac{x}{a_1}\right) + m_2\log\left(1-\frac{x}{a_2}\right);$$
e.g. for the x of the group (2.6 to 3.9), bearing in mind that 1.3 is the
unit of measurement for x, we have
$$x_{3.25} = (3.25 - 2.3052009)/1.3 = 0.9447991/1.3.$$
Hence $\left(1+\dfrac{x_{3.25}}{a_1}\right) = 2.546835$;  $\left(1-\dfrac{x_{3.25}}{a_2}\right) = 0.9179953$;
$m_1\log\left(1+\dfrac{x_{3.25}}{a_1}\right) = 0.0686892$;  $m_2\log\left(1-\dfrac{x_{3.25}}{a_2}\right) = -0.118587$;
so that $\log y = 1.714489$,
and $y_{3.25} = 51.82$.
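The same substitution may be carried out for any group centre. The short sketch below (Python, illustrative only; the names are hypothetical and the constants are simply those printed above) evaluates the logarithmic form of the Type I. curve at a given percentage.

```python
import math

# Constants of the fitted Type I. curve, as printed above.
Y0, M1, M2, A1, A2 = 58.1282, 0.169185, 3.191295, 0.469842, 8.86252
MODE, UNIT = 2.3052009, 1.3   # origin of x (the mode) and the class interval


def type1_ordinate(percentage):
    """Ordinate of the curve at a given percentage unemployed.

    The percentage must lie inside the range of the curve,
    otherwise a logarithm of a negative number is attempted.
    """
    x = (percentage - MODE) / UNIT
    log_y = (math.log10(Y0)
             + M1 * math.log10(1 + x / A1)
             + M2 * math.log10(1 - x / A2))
    return 10 ** log_y


y_325 = type1_ordinate(3.25)   # centre of the group (2.6 to 3.9)
```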
Similarly the ordinates at the centre points of the other groups
may be calculated, but it must be remembered that the resulting
values are only a first approximation to the observed frequencies,
and a better series is obtained if, by using some good quadrature
formula, we calculate the areas for the successive groups between
the curve, the bounding ordinates, and the axis of x. Indeed in
the case of the group (1.3 to 2.6) it is essential to do this, because
(1) the rise of the curve is so very abrupt as to render the
determination of the single ordinate at the centre quite inadequate for
an accurate measure of the frequency in that group, and (2) a
portion of the group falls outside the range of the curve which only
starts at 1.6944063 (i.e. mode $- 1.3a_1$), and this has to be allowed
for in finding the frequency as represented by the area between the
curve and axis.
The base of the required area, range (1.6944063 to 2.6), was
therefore divided into eight equal parts and the ordinates at the
points of division were determined. The area was then found by
using Simpson's well-known formula:
$$\text{Area} = \tfrac{1}{3}h\left[(y_0+y_{2p}) + 2(y_2+y_4+\ldots+y_{2p-2}) + 4(y_1+y_3+\ldots+y_{2p-1})\right],$$
where h denotes the length of one of the equal parts into which
the base is divided and 2p is their number; in our case p = 4 and
h = 1/8, the class interval being the unit, and the result is to be
reduced in the ratio
0.9055937 : 1.3
in order to allow for the smaller range of this group; we thus get
as the area for the group
$$\frac{0.9055937}{1.3}\times\frac{1}{24}\left[(y_0+y_8) + 2(y_2+y_4+y_6) + 4(y_1+y_3+y_5+y_7)\right] = 37.39.$$
The observed and calculated frequencies for the whole series are
compared in Table (44), the remaining areas in col. (4) being
calculated by the simpler but somewhat less accurate form of Simpson's
formula, when only three ordinates are used, namely,
$$\int_{-1/2}^{+1/2} y\,dx = \tfrac{1}{6}\left(y_{-1/2} + 4y_0 + y_{+1/2}\right).$$
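The quadrature for the first group can be retraced as follows. This is a sketch under stated assumptions rather than the author's own working: it is written in Python with hypothetical names, it uses the constants of the fitted curve as printed earlier, and it applies Simpson's rule with eight parts over the range from the start of the curve to 2.6, measured in class-interval units.

```python
# Constants of the fitted Type I. curve, as printed earlier.
Y0, M1, M2, A1, A2 = 58.1282, 0.169185, 3.191295, 0.469842, 8.86252
MODE, UNIT = 2.3052009, 1.3


def curve(x):
    """Type I. ordinate at x class intervals from the mode (zero outside the range)."""
    left = max(0.0, 1 + x / A1)
    right = max(0.0, 1 - x / A2)
    return Y0 * left ** M1 * right ** M2


def simpson(f, lo, hi, parts):
    """Simpson's rule over [lo, hi] divided into an even number of equal parts."""
    if parts % 2:
        raise ValueError("Simpson's rule needs an even number of parts")
    h = (hi - lo) / parts
    ys = [f(lo + i * h) for i in range(parts + 1)]
    return h / 3 * (ys[0] + ys[-1] + 2 * sum(ys[2:-1:2]) + 4 * sum(ys[1::2]))


# The curve starts at x = -a1; the group ends at 2.6 per cent.
upper = (2.6 - MODE) / UNIT
area_first_group = simpson(curve, -A1, upper, 8)
```

Taking h as one-eighth of the actual base, as here, is equivalent to the book's device of using h = 1/8 and then reducing the result in the ratio 0.9055937 : 1.3.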
Table (44). Comparison of Observed and Theoretical
Frequencies of Unemployed Percentages

    (1)          (2)          (3)          (4)        (5)         (6)            (7)
Percentage     Observed    Theoretical Frequency.   Deviation.  Square of   Ratio of No. in
Unemployed.   Frequency.   Ordinates.   Areas.                  Deviation.  Col. (6) to No.
                                                                            in Col. (4).
   1.3-          33          55.3*       37.4        +4.4        19.36          0.52
   2.6-          57          51.8        51.6        -5.4        29.16          0.57
   3.9-          41          37.8        37.8        -3.2        10.24          0.27
   5.2-          24          24.9        25.0        +1.0         1.00          0.04
   6.5-          10          14.8        14.9        +4.9        24.01          1.61
   7.8-          11           7.7         7.8        -3.2        10.24          1.31
   9.1-           3           3.3         3.4        +0.4         0.16          0.05
  10.4-           1           1.0         1.2        +0.2         0.04          0.03
   ..           180           ..        179.1         ..          ..        $\chi^2 = 4.40$
To test the goodness of fit we have n' = 8, $\chi^2$ = 4.40, whence, by
means of the P table, P = 0.731852. Thus, roughly, we may say that
three out of every four random samples of 180 records would give a
worse fit with the proposed curve than is given by the actual distribution
observed, so that the fit may be regarded as quite a reasonably
good one. This conclusion is also supported by an examination of
the curve which has been drawn, fig. (36), with the histogram of
the given statistics.
Example (3). The data for this example concerning infectious
diseases will be found in Table (16), p. 62 (or, see p. 224); the
reader should work out the moments for himself and verify the
following results:
[* The ordinate in this case cannot be accepted as an approximation to the
frequency given by the curve.]
The first four moments referred to 7 as origin are
0.282158, 4.86307, 17.4855, 129.394.
Referred to the mean, 7.564316, the three latter become
$\nu_2$ = 4.78346, $\nu_3$ = 13.4140, $\nu_4$ = 111.964.
If we do not assume high contact at the terminals, and certainly at
the lower end it is doubtful, we deduce from the above values of
the moments that
$\beta_1$ = 1.64396, $\beta_2$ = 4.89321, $\kappa$ = -1.53.
Thus the fitting curve is of Type I. and its constants, when
calculated, are
r = 11.7819.  $m_1$ = 0.31171.  $m_2$ = 9.47020.
$a_1$ = 0.79216.  $a_2$ = 24.0671.  $y_0$ = 60.363.
[Fig. (36). Histogram of the unemployed percentages with the fitted
Type I. curve; horizontal axis: Percentage Unemployed.]
The equation of the curve is therefore, retaining three significant
figures throughout,
$$y = 60.4\left(1+\frac{x}{0.792}\right)^{0.312}\left(1-\frac{x}{24.1}\right)^{9.47}.$$
The curve starts at 2.02904 (so that the first group of observations
lies wholly outside its range) and ends at 51.7475. It is drawn,
together with the corresponding histogram, in fig. (37).
Suppose, just for the sake of comparison, we assume high
contact at the terminals and attempt to fit the given distribution
with a Type III. curve, to which Type I. is closely related.
We then have, after making Sheppard's adjustments,
$\mu_2$ = 4.70013, $\mu_3$ = 13.4140, $\mu_4$ = 109.601,
whence $\beta_1$ = 1.73295, $\beta_2$ = 4.96129, $\kappa$ = -1.47.
It will be noted that the theoretically correct type to take here
again is Type I., but this was discarded because, when attempted,
it led to a curve starting at a point corresponding to a disease rate
of 3.385, so that the central ordinates of each of the first two
observed groups lay outside the curve altogether.
Type III. curve is of the form
$$y = y_0\,e^{-\gamma x}\left(1+\frac{x}{a}\right)^{\gamma a}.$$
[Fig. (37). Histogram of the disease rates with the fitted Type I. and
Type III. curves; horizontal axis: Disease Rate per 1000 persons living.]
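To express the constants in terms of the moments, noting that the
curve starts from x = -a on one side and goes off to infinity on the
other, we have
$$N = \int_{-a}^{\infty} y\,dx
= y_0\int_{-a}^{\infty} e^{-\gamma x}\left(1+\frac{x}{a}\right)^{\gamma a}dx
= \frac{y_0}{a^p}\int_{-a}^{\infty} e^{-\gamma x}(a+x)^p\,dx \quad (\text{where } \gamma a = p)$$
$$= \frac{y_0}{p^p}\int_{-a}^{\infty} e^{-\gamma x}(\gamma a+\gamma x)^p\,dx
= \frac{y_0\,e^p}{p^p}\int_{-a}^{\infty} e^{-(\gamma a+\gamma x)}(\gamma a+\gamma x)^p\,dx$$
$$= \frac{y_0\,e^p}{\gamma p^p}\int_0^{\infty} e^{-z}z^p\,dz \quad (\text{where } \gamma a+\gamma x = z)
= \frac{y_0\,e^p\,\Gamma(p+1)}{\gamma p^p}.$$
Therefore,
$$y_0 = N p^{p+1}/a\,e^p\,\Gamma(p+1) \qquad (10)$$
Again, the nth moment of the distribution referred to (-a, 0)
as origin is
$$N\mu'_n = \int_{-a}^{\infty} y\,(a+x)^n\,dx
= \frac{y_0}{a^p}\int_{-a}^{\infty} e^{-\gamma x}(a+x)^{p+n}\,dx
= \frac{y_0\,e^p}{\gamma^{n+1}p^p}\int_0^{\infty} e^{-z}z^{p+n}\,dz
= \frac{y_0\,e^p\,\Gamma(p+n+1)}{\gamma^{n+1}p^p}.$$
Therefore, by (10),
$$\mu'_n = \Gamma(p+n+1)/\gamma^n\,\Gamma(p+1).$$
Hence,
$$\mu'_1 = \Gamma(p+2)/\gamma\,\Gamma(p+1) = (p+1)/\gamma,$$
$$\mu'_2 = \Gamma(p+3)/\gamma^2\,\Gamma(p+1) = (p+2)(p+1)/\gamma^2,$$
$$\mu'_3 = \Gamma(p+4)/\gamma^3\,\Gamma(p+1) = (p+3)(p+2)(p+1)/\gamma^3.$$
Transferring to the mean as origin we have for the moments, since
$\bar{x} = \mu'_1$,
$$\mu_2 = \mu'_2 - \bar{x}^2 = (p+1)/\gamma^2,$$
$$\mu_3 = \mu'_3 - 3\bar{x}\mu_2 - \bar{x}^3 = 2(p+1)/\gamma^3.$$
Hence, combining these last two equations,
$$\gamma = 2\mu_2/\mu_3,\qquad p = (4\mu_2^3/\mu_3^2) - 1 \qquad (11)$$
In our particular case these equations give
$\gamma$ = 0.700780, p = 1.30820, a = 1.86678,
and, therefore, by (10),
$y_0$ = 55.3323.
Hence the curve is
$$y = 55.3\,e^{-0.701x}\left(1+\frac{x}{1.87}\right)^{1.31}.$$
The equation of the curve, on taking logs, gives
$$\log y = \log y_0 - \gamma\log_{10}e\cdot x + p\log\left(1+\frac{x}{a}\right)
= 1.742979 - 0.304345x + 1.30820\log(1+x/1.86678).$$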
Before we can go on to calculate the ordinates of the curve we
need to know where the origin lies, and since it coincides with the
mode it may be found from
$$\text{mean} - \text{mode} = \mu'_1 - a = (p+1)/\gamma - p/\gamma = 1/\gamma \qquad (12)$$
Thus, mode = 7.564316 - 2.853960 = 4.71036.
Suppose now we wish to calculate the ordinate corresponding to
the x of the centre point of group (6 to 8), we have
$$x_7 = \tfrac{1}{2}(7 - 4.71036) = 1.14482,$$
bearing in mind that the unit is a rate of 2 per 1000.
Hence, substituting this value in the equation for log y,
$$\log y_7 = 1.666278,\qquad y_7 = 46.374,$$
and similarly any other y may be found.
The curve starts at
mode - 2a = 4.71036 - 2(1.86678) = 0.97680,
so that the range of the first group as determined from the curve is
(0.9768 to 2), and not (0 to 2) as in the observations.
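Equations (10) to (12) are simple enough to verify numerically. The sketch below is illustrative only (Python, hypothetical function names); it takes N = 241 and the adjusted moments $\mu_2$, $\mu_3$ quoted above, and recovers $\gamma$, p, a, $y_0$, the mode and the ordinate at the centre of the group (6 to 8).

```python
import math


def type3_constants(n_total, mu2, mu3):
    """Constants of y = y0 exp(-gamma x) (1 + x/a)^(gamma a)."""
    gamma = 2 * mu2 / mu3                       # (11)
    p = 4 * mu2 ** 3 / mu3 ** 2 - 1             # (11)
    a = p / gamma                               # because gamma * a = p
    # (10): y0 = N p^(p+1) / (a e^p Gamma(p+1)), worked in logarithms.
    log_y0 = (math.log(n_total) + (p + 1) * math.log(p)
              - math.log(a) - p - math.lgamma(p + 1))
    return gamma, p, a, math.exp(log_y0)


def type3_ordinate(x, y0, gamma, a):
    """Ordinate at x units (of 2 per 1000) from the mode."""
    return y0 * math.exp(-gamma * x) * (1 + x / a) ** (gamma * a)


gamma, p, a, y0 = type3_constants(241, 4.70013, 13.4140)
mode = 7.564316 - 2 / gamma          # (12), the unit being a rate of 2 per 1000
start = mode - 2 * a                 # point at which the curve leaves the axis
y7 = type3_ordinate((7 - mode) / 2, y0, gamma, a)
```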
The ordinates and afterwards the areas, calculated by a method
somewhat similar to that indicated in Example (2), were determined
for each separate group of observations, and the results for both
Type I. and Type III. curves are compared in Table (45).
Type III. curve is drawn on the same diagram, fig. (37), as Type I.
curve and the observation histogram, and the result lends emphasis
to an important point, namely, the necessity for replacing ordinates
by areas to obtain the frequency proper to any group.
In order to get a measure of the goodness of fit in each case,
the function P was calculated, but in the Type I. comparison the
first group had to be omitted to avoid the infinite term which would
have resulted in $\chi^2$ owing to this group falling right outside the
curve, that is to say, the test had to be confined to towns in which
the observed case rate was not less than 2. The values found for
P were:
Type I.: P = 0.34307,
Type III.: P = 0.46298,
so that in every 100 samples containing 241 observations each, we
should get, roughly, 34 deviating from the Type I. curve and 46
deviating from the Type III. curve, at least as widely as the given
distribution. In neither case can the fit be regarded as a very
good one, but the failure is only marked in one or two groups, such
as that of maximum frequency, where there may be other than
random causes to account for it; e.g. where isolation is inefficient
the disease is likely to spread, one case infects another: in other
words, the events are not independent.
Table (45). Comparison of Observed Distribution of Infectious
Disease Rates, notified in 241 large Towns of
England and Wales, with Theoretical Distribution.

   (1)          (2)           (3)          (4)            (5)              (6)
Case Rate.    Observed     Theoretical Frequency.    $(f_1-f)^2/f_1$.  $(f_3-f)^2/f_3$.
             Frequency.    Type I.     Type III.
               $(f)$       $(f_1)$      $(f_3)$
   0-            5           ..           6.6            ..              0.39
   2-           39          52.6         43.7           3.52             0.51
   4-           69          55.4         54.3           3.34             3.98
   6-           41          43.2         46.2           0.11             0.59
   8-           29          31.2         33.6           0.15             0.63
  10-           22          21.5         22.4           0.01             0.01
  12-           16          14.2         14.1           0.23             0.26
  14-            7           9.1          8.6           0.48             0.30
  16-            5           5.6          5.1           0.06             0.00
  18-            3           3.3          2.9           0.03             0.00
  20-            4           1.9          1.7           2.32             3.11
  22-            0           1.0          0.9           1.00             0.90
  24-            0           0.5          0.5           0.50             0.50
  26-            1           0.3          0.3           1.63             1.63
   ..          241         239.8        240.9     $\chi_1^2 = 13.38$  $\chi_3^2 = 12.81$
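The two totals in cols. (5) and (6) can be re-formed from the frequencies of Table (45). The sketch below is an illustration in Python with hypothetical names; it assumes that the two blank observed entries (groups 22 and 24) stand for zero, which is what the printed total of 241 and the ratios in those rows require, and it marks the group lying outside the Type I. curve with `None` so that it is left out of the sum.

```python
# Table (45): observed frequencies and the two sets of theoretical frequencies.
# The blank observed entries for groups 22- and 24- are taken as zero (assumption).
observed = [5, 39, 69, 41, 29, 22, 16, 7, 5, 3, 4, 0, 0, 1]
type_1 = [None, 52.6, 55.4, 43.2, 31.2, 21.5, 14.2, 9.1, 5.6, 3.3, 1.9, 1.0, 0.5, 0.3]
type_3 = [6.6, 43.7, 54.3, 46.2, 33.6, 22.4, 14.1, 8.6, 5.1, 2.9, 1.7, 0.9, 0.5, 0.3]


def chi_square(obs, theo):
    """Sum of (theoretical - observed)^2 / theoretical over the groups covered by the curve."""
    return sum((t - o) ** 2 / t for o, t in zip(obs, theo) if t is not None)


def groups_used(theo):
    """Number of groups n' entering the test."""
    return sum(t is not None for t in theo)


chi2_type_1 = chi_square(observed, type_1)   # first group omitted
chi2_type_3 = chi_square(observed, type_3)
```

The unrounded sums differ from the printed 13.38 and 12.81 only in the second decimal place, the book having added ratios already rounded to two places.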
Example (4) refers to the wages of certain women tailors previously
recorded in Table (11), p. 41. The data as given in the
original suffered a disadvantage common to such statistics: at
either end the grouping differed from that in the centre, two or three
classes being lumped together owing to the smallness of frequency
in each. The figures ran thus: Under 5s., 19; 5s. and under 6s.,
180; 6s. and under 7s., 384; ... ; 23s. and under 24s., 64;
24s. and under 25s., 54; 25s. and under 30s., 122; 30s. and over,
36. They were recast in the form shown in Table (46), suggested
by an examination of the histogram, in order to make the fitting
simpler.
The first four moments calculated from this adapted table and
referred to 12s. as origin are:
$\nu'_1$ = 0.556718, $\nu'_2$ = 5.056373, $\nu'_3$ = 16.70163, $\nu'_4$ = 123.7691.
When referred to the mean, 13.113436, the last three become
$\nu_2$ = 4.746438, $\nu_3$ = 8.60179, $\nu_4$ = 95.6914;
or, after making Sheppard's adjustments,
$\mu_2$ = 4.663105, $\mu_3$ = 8.60179, $\mu_4$ = 93.3474;
therefore, $\beta_1$ = 0.729713, $\beta_2$ = 4.29291, $\kappa$ = 1.63.
The curve is thus of Type VI.,
$$y = y_0\,(x-a)^{q_2}/x^{q_1}.$$
To calculate the constants, the nth moment about the origin is
given by
$$N\mu'_n = \int_a^{\infty} y\,x^n\,dx
= y_0\int_a^{\infty}(x-a)^{q_2}x^{n-q_1}\,dx
= y_0\,a^{q_2-q_1+n+1}\int_0^1 z^{q_1-q_2-n-2}(1-z)^{q_2}\,dz \quad \left(\text{where } z = \frac{a}{x}\right)$$
$$= y_0\,a^{q_2-q_1+n+1}\,B(q_1-q_2-n-1,\ q_2+1).$$
Thus, putting n = 0,
$$N = y_0\,a^{q_2-q_1+1}\,B(q_1-q_2-1,\ q_2+1), \qquad (13)$$
and
$$\mu'_n = a^n\,\Gamma(q_1-q_2-n-1)\,\Gamma(q_1)/\Gamma(q_1-n)\,\Gamma(q_1-q_2-1);$$
therefore,
$$\mu'_1 = a\,\Gamma(q_1-q_2-2)\,\Gamma(q_1)/\Gamma(q_1-1)\,\Gamma(q_1-q_2-1) = a(q_1-1)/(q_1-q_2-2).$$
Also
$$\mu'_n/\mu'_{n-1} = a\,\Gamma(q_1-q_2-n-1)\,\Gamma(q_1-n+1)/\Gamma(q_1-n)\,\Gamma(q_1-q_2-n) = a(q_1-n)/(q_1-q_2-n-1).$$
Hence
$$\mu'_2 = \frac{a^2(q_1-1)(q_1-2)}{(q_1-q_2-2)(q_1-q_2-3)},\qquad
\mu'_3 = \frac{a^3(q_1-1)(q_1-2)(q_1-3)}{(q_1-q_2-2)(q_1-q_2-3)(q_1-q_2-4)}.$$
But these relations are precisely the same as those of Type I. with a
in place of b, $-q_1$ in place of $m_1$, and $q_2$ in place of $m_2$, so that
$(1+q_2)$, $(1-q_1)$* are the roots of
$$q^2 - rq + \frac{4r^2(r+1)}{\beta_1(r+2)^2+16(r+1)} = 0 \qquad (14)$$
where
$$r = 6(\beta_2-\beta_1-1)/(6+3\beta_1-2\beta_2) \qquad (15)$$
Also
$$y_0 = N a^{q_1-q_2-1}\,\Gamma(q_1)/\Gamma(q_1-q_2-1)\,\Gamma(q_2+1), \text{ by (13)} \qquad (16)$$
and a is given by
$$\mu_2 = a^2(1-q_1)(1+q_2)/r^2(r+1), \qquad (17)$$
$\mu_2$ being the second moment of the given distribution referred to
its mean as origin.
The distance of the mean from the origin is
$$\mu'_1 = a(q_1-1)/(q_1-q_2-2),$$
and this fixes the origin, for the mean is known directly from the
statistics.
To get the mode, use the equation of the curve, putting $dy/dx = 0$,
and we have
origin = mode $- a\,q_1/(q_1-q_2)$.
Combining this with
origin = mean $- a(q_1-1)/(q_1-q_2-2)$
we have
$$\text{mean} - \text{mode} = a(q_1+q_2)/(q_1-q_2)(q_1-q_2-2) \qquad (18)$$
Applying these formulae to the case of the women tailors,
r = -38.7698, $q_1$ = 51.5269, $q_2$ = 10.7571, a = 21.11018,
and the equation of the curve is
$$y = y_0\,(x-21.1)^{10.8}/x^{51.5},$$
where $\log y_0$ = 68.8254.
Also the origin is at -41.9104, the mode at 11.4498, and the
maximum theoretical frequency is 2299.
[* When $\mu_3$ is positive $(1+q_2)$ goes with the positive root of the quadratic, and
vice versa.]
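Relations (14) to (18) may be tried on the figures of this example. The sketch below is a hedged illustration in Python, not the author's working: the function name is hypothetical, $\mu_3$ is assumed positive so that $(1+q_2)$ takes the positive root of (14), and the class interval of 2s. is passed in as `unit` to convert distances back to shillings.

```python
import math


def type6_constants(n_total, mean, unit, mu2, beta1, beta2):
    """Constants of y = y0 (x - a)^q2 / x^q1 from (14) to (18), assuming mu3 > 0."""
    r = 6 * (beta2 - beta1 - 1) / (6 + 3 * beta1 - 2 * beta2)          # (15)
    d = beta1 * (r + 2) ** 2 + 16 * (r + 1)
    disc = math.sqrt(r * r - 16 * r * r * (r + 1) / d)
    q2 = 0.5 * (r + disc) - 1      # (1 + q2) is the positive root of (14)
    q1 = 1 - 0.5 * (r - disc)      # (1 - q1) is the other root
    a = math.sqrt(mu2 * r * r * (r + 1) / ((1 - q1) * (1 + q2)))       # (17)
    log10_y0 = (math.log(n_total) + (q1 - q2 - 1) * math.log(a)
                + math.lgamma(q1) - math.lgamma(q1 - q2 - 1)
                - math.lgamma(q2 + 1)) / math.log(10)                  # (16)
    origin = mean - unit * a * (q1 - 1) / (q1 - q2 - 2)
    mode = origin + unit * a * q1 / (q1 - q2)
    return {"r": r, "q1": q1, "q2": q2, "a": a,
            "log10_y0": log10_y0, "origin": origin, "mode": mode}


# Women tailors: N = 11,372, mean 13.113436s., class interval 2s.
tailors = type6_constants(11372, 13.113436, 2, 4.663105, 0.729713, 4.29291)
```

Since r depends on the small difference $6+3\beta_1-2\beta_2$, the last figures of the constants are sensitive to the rounding of $\beta_1$ and $\beta_2$.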
Table (46). Distribution of Wages of certain
Women Tailors, Actual and Theoretical.

 Wages.          Frequency.
             Actual.   Theoretical.
  1s.-           5           1
  3s.-          14          52
  5s.-         564         462
  7s.-        1243        1332
  9s.-        2045        2096
 11s.-        2339        2255
 13s.-        1815        1898
 15s.-        1432        1353
 17s.-         854         859
 19s.-         523         503
 21s.-         262         278
 23s.-         118         147
 25s.-          64          75
 27s.-          43          38
 29s.-          27          19
 31s.-          15           9
 33s.-           9           5
  ..         11,372      11,372
The theoretical and actual frequencies are compared in Table (46)
and the curve is drawn with the histogram in fig. (38).
[Fig. (38). Histogram of the wages with the fitted Type VI. curve;
horizontal axis: Rate of Wages.]
Example (5) discusses the distribution of frequencies of specimens
of Anemone nemorosa with different numbers of sepals, recorded by
G. U. Yule (Biometrika, vol. i., p. 307).
The first four moments referred to 6 as origin are
$\nu'_1$ = 0.508, $\nu'_2$ = 1.012, $\nu'_3$ = 2.476, $\nu'_4$ = 9.124.
Referred to the mean, 6.508, the last three become
$\nu_2$ = 0.7539360, $\nu_3$ = 1.195905, $\nu_4$ = 5.459941.
The contact, at one extremity certainly, being doubtful, Sheppard's
adjustments were not made in this case. Hence,
$\beta_1$ = 3.337259, $\beta_2$ = 9.605476, $\kappa$ = 1.46.
Since $\kappa$ does not differ greatly from unity an attempt was made to
fit the observations with a Type V. curve, namely,
$$y = y_0\,x^{-p}e^{-\gamma/x}.$$
The nth moment about the origin is given by
$$N\mu'_n = \int_0^{\infty} y\,x^n\,dx$$
(since, p and $\gamma$ being positive, y vanishes at x = 0 and at x = $\infty$)
$$= y_0\,\gamma^{n-p+1}\int_0^{\infty} z^{p-n-2}e^{-z}\,dz \quad (\text{where } z = \gamma/x)
= y_0\,\gamma^{n-p+1}\,\Gamma(p-n-1).$$
Thus $N = y_0\,\gamma^{-p+1}\,\Gamma(p-1)$.
And $\mu'_n/\mu'_{n-1} = \gamma/(p-n-1)$.
Hence
$$\mu'_1 = \gamma/(p-2),$$
$$\mu'_2 = \gamma^2/(p-2)(p-3),$$
$$\mu'_3 = \gamma^3/(p-2)(p-3)(p-4).$$
Referred to the mean as origin, the last two moments become
$$\mu_2 = \gamma^2/(p-2)^2(p-3),$$
$$\mu_3 = 4\gamma^3/(p-2)^3(p-3)(p-4),$$
whence
$$\beta_1 = \mu_3^2/\mu_2^3 = 16(p-3)/(p-4)^2;$$
this gives a quadratic for (p - 4), one solution of which is
$$p - 4 = [8 + 4\sqrt{4+\beta_1}]/\beta_1, \qquad (19)$$
the positive root being taken in order to get a real $\gamma$.
Thus
$$\gamma^{*} = (p-2)\sqrt{(p-3)\mu_2} \qquad (20)$$
and
$$y_0 = N\gamma^{p-1}/\Gamma(p-1) \qquad (21)$$
Since $\mu'_1 = \gamma/(p-2)$, the position of the origin is given by
$$\text{Origin} = \text{Mean} - \gamma/(p-2) \qquad (22)$$
Also the distance of the mode from the origin is $\gamma/p$, so that all
the constants of the curve are readily determined.
[* The sign of $\gamma$ is taken to be the same as that of $\mu_3$.]
In our particular case, we get
p = 9.643840, $\gamma$ = 17.10768,
and the curve is
$$y = y_0\,x^{-9.64}e^{-17.1/x},$$
where $\log y_0$ = 9.38179. The origin is at 4.27 and the mode at 6.04.
The greatest frequency is 620 approximately, and the frequency
distribution, calculating areas for the several groups as if they ranged
between (4.5 to 5.5), (5.5 to 6.5), etc., is shown alongside the observed
distribution in Table (47). The curve is plotted in fig. (39) from the
ordinates which were calculated at the centre and extremities of
each group so as to enable Simpson's simple quadrature formula
to be used to get the areas.
[Fig. (39). Histogram of the numbers of sepals with the fitted Type V.
curve; horizontal axis: Number of Sepals.]
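Formulae (19) to (22) give the Type V. constants in a few lines. The following sketch is offered only as an illustration (Python, hypothetical names); it takes N = 1000, the mean 6.508 and the values of $\mu_2$ and $\beta_1$ quoted above, assumes $\mu_3$ positive so that $\gamma$ is positive, and also evaluates the ordinate at the mode.

```python
import math


def type5_constants(n_total, mean, mu2, beta1):
    """Constants of y = y0 x^(-p) exp(-gamma / x), assuming mu3 > 0."""
    p = 4 + (8 + 4 * math.sqrt(4 + beta1)) / beta1                  # (19)
    gamma = (p - 2) * math.sqrt((p - 3) * mu2)                      # (20)
    log10_y0 = (math.log(n_total) + (p - 1) * math.log(gamma)
                - math.lgamma(p - 1)) / math.log(10)                # (21)
    origin = mean - gamma / (p - 2)                                 # (22)
    mode = origin + gamma / p
    return p, gamma, log10_y0, origin, mode


def type5_ordinate(x, p, gamma, log10_y0):
    """Ordinate at distance x (> 0) from the origin of the curve."""
    return 10 ** (log10_y0 - p * math.log10(x) - gamma / x * math.log10(math.e))


p, gamma, log10_y0, origin, mode = type5_constants(1000, 6.508, 0.7539360, 3.337259)
peak = type5_ordinate(mode - origin, p, gamma, log10_y0)   # greatest frequency
```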
Table (47). Distribution of Sepals of Anemone
Nemorosa, observed and calculated.

 No. of           Frequency.
 Sepals.    Observed.   Calculated.
    5           34           51
    6          576          544
    7          276          296
    8           92           81
    9           14           22
   10            4            6
   11            0            2
   12            4            1
   ..         1000         1003
[Examples have been given above of five out of the seven different types
of frequency curve that have been enumerated. For further examples of
all the types and a complete account of the method reference should be
made to Professor Pearson's memoirs, especially the following : —
Roy. Soc. Phil. Trans., vol. 186a, pp. 343-414 (1895), On Skew Variation
in Homogeneous Material; and a Supplementary Memoir in vol. 197a, pp. 443-459
(1901).
Biometrika, vol. i., pp. 265 et seq., On the Systematic Fitting of Curves to
Observations and Measurements, continued in vol. ii., pp. 1-23. Also vol. iv.,
pp. 169-212, which discusses various historical hypotheses made to generalize
the Gaussian Law, the basis of the symmetrical normal curve.
A large number of highly interesting practical illustrations of Pearsonian
curve fitting occur throughout the pages of Biometrika, while W. P. Elderton's
Frequency Curves and Correlation contains an admirably concise treatment of
the theory, with applications to meet more particularly the actuarial point
of view.
It should be stated that rival curves and methods have been proposed as
suitable for fitting certain types of frequency distribution, some of which have
scarcely received the attention and the trial they deserve. Among the most
interesting are those developed by Professor Edgeworth ; for some account of
his voluminous work upon the subject the reader may refer to several memoirs
in the Journal of the Royal Statistical Society, beginning December 1898
(the Method of Translation), among which the following are important as
giving more recent results of his researches : —
Vol. lxix. (1906), The Generalized Law of Error or Law of Great Numbers.
Vol. lxxvii. (1914), On the Use of Analytical Geometry to Represent Certain
Kinds of Statistics.
Vol. lxxix. (1916), On the Mathematical Representations of Statistical Data;
continued in vol. lxxx. (1917).
Two memoirs may be cited as of particular interest — those of May 1917
and March 1918 — because they reply to criticism and draw a comparison from
their author's point of view between his curves and those of Professor Pearson.]
CHAPTER XVIII
THE NORMAL CURVE OF ERROR
Let us return for a moment to the general statement on p. 143,
that 'whenever we have n similar but independent events happening
in which the probability of success for each is p, the different
resulting possibilities as to success are given by the successive
terms in $(s+f)^n$, namely,
$$s^n,\ s^{n-1}f,\ s^{n-2}f^2,\ \ldots,\ f^n,$$
and their correspondent probabilities by the successive terms in
$(p+q)^n$, namely,
$$p^n,\ np^{n-1}q,\ \frac{n(n-1)}{2!}p^{n-2}q^2,\ \ldots,\ q^n.'$$
When we come to try and apply this theory directly to cases
other than those of random sampling in artificial experiments with
coins, dice, etc., we are faced at once with difficulties because of
the limiting character of the assumption on which the theory rests,
namely, that all the events are to be similar and independent. The
similarity demanded is of the same radical type as that existing
when we throw the same die or spin the same coin twice running,
and the test for it is that p, the chance of success, is to be the same
for every individual event. The independence is to be such that
no single event and no combination of events is to have any influence
upon any of the rest.
Now for most classes of events it is impossible to assign any
a priori value to p at all, still less can we be sure that p does not
change from one event to the next. For example, the chance of
death for soldiers in war-time varies from regiment to regiment
according to where they happen to be located; for the same regiment
it varies from battalion to battalion according to whether
they are in the trenches or behind the lines; and from individual
to individual according to innumerable little accidents of time, place,
and condition. Also, where the shells burst thickest, p increases
for any soldier there, but it increases also for his neighbour. Thus
the events in such a case are not similar, neither are they
independent.
Moreover, as it stands, the theory cannot be applied to any
distribution in which the character observed is capable of continuous
variation. This difficulty, however, has been overcome, as we
have seen, by replacing the histogram representative of the binomial
by a continuous curve which at the same time serves to describe
the discontinuous series to a high degree of accuracy.
To illustrate how close this description can be, even when n
is comparatively small, we will fit with its appropriate normal
curve the symmetrical binomial polygon formed by joining up
the summits of the ordinates representing successive terms of
the series
$$2^{10}\left(\tfrac{1}{2}+\tfrac{1}{2}\right)^{10},$$
erected at unit distance apart.
The total area bounded by the polygon, the extreme ordinates,
and the axis of x is practically
$$= (y_0+y_1+y_2+\ldots+y'_1+y'_2+\ldots)\times(1)$$
= sum of the given ordinates
= 1024.
The equation of the normal curve is
$$y = Y_0\,e^{-x^2/2\sigma^2},$$
where
$$2\sigma^2 = 5.5$$
and
$$Y_0 = N/\sqrt{2\pi}\cdot\sigma = 1024/\sqrt{5.5\pi}.$$
Hence, taking logs, we have
$$\log y = \log Y_0 - \frac{x^2}{2\sigma^2}\log_{10}e = 2.3915437 - x^2(0.0789626).$$
It is easy from this equation to calculate the normal curve ordinates
corresponding to x = 0, 1, 2, 3, 4, 5, and the results, compared with
the polygon ordinates, are as follows:
   x      Ordinate of Polygon.   Normal Curve Ordinate.
   0             252                    246.3
  ±1             210                    205.4
  ±2             120                    119.0
  ±3              45                     48.0
  ±4              10                     13.4
  ±5               1                      2.6
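The comparison in this table is readily reproduced. The sketch below is illustrative (Python 3.8 or later, hypothetical names): the polygon ordinates are the binomial coefficients of the tenth power, and the normal ordinates use $2\sigma^2 = 5.5$, a value taken here from the printed constants $1024/\sqrt{5.5\pi}$ and 0.0789626 and not derived afresh.

```python
import math

N_TRIALS = 10
TOTAL = 2 ** N_TRIALS      # sum of the polygon ordinates, 1024
TWO_SIGMA_SQ = 5.5         # implied by the printed constants (assumption of this sketch)


def polygon_ordinate(x):
    """Term of 2^10 (1/2 + 1/2)^10 at x units from the centre."""
    return math.comb(N_TRIALS, N_TRIALS // 2 + x)


def normal_ordinate(x):
    """Ordinate of the fitted normal curve at x."""
    y0 = TOTAL / math.sqrt(TWO_SIGMA_SQ * math.pi)
    return y0 * math.exp(-x * x / TWO_SIGMA_SQ)


rows = [(x, polygon_ordinate(x), round(normal_ordinate(x), 1)) for x in range(6)]
```

Each entry of `rows` holds x, the polygon ordinate and the normal curve ordinate to one decimal place; by symmetry the same values serve for negative x.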
Now although the circumstances in which the series
may be taken to represent the frequency distribution resulting
from a particular kind of experiment were so stringently defined,
there is no reason why the normal curve itself to which the theory
led should be subjected to precisely the same limitations. After
all, the real and only justification for choosing one curve rather
than another to fit any given observations is that it does succeed
in fitting them better. But when the further question is asked
why the normal curve should succeed in describing some results
so well, we must not be tempted by analogy to rush to the con
clusion that the causes at work are necessarily independent, and
equal, and so on. In short, the theoretical justification and the
empirical use of the normal curve are two quite different matters.
Experience shows that the normal curve suffices to fit certain
types of distribution, besides those which arise in tossing coins and
in similar experiments, with remarkable accuracy; among these
may be noted:
1. Certain biological statistics; for instance, the proportions of
male to female births taken over a series of years for a large
community such as the population of a country; also the proportions
of different types of plants and animals resulting from
cross-fertilization.
2. Certain anthropometrical, particularly craniometrical and allied
statistics, such as the height, weight, lengths of various bones, skull
measurements, etc., of a large group of persons, and the agreement
is the closer if the group be reasonably homogeneous, i.e. composed
of individuals of the same nationality and sex between the same
narrow age limits, etc.; also measurements of a similar character
in animals and plants.
3. Errors of observation in experimental work; for example,
several measurements of the same quantity (length, weight, speed,
temperature, or whatever it be) will contain errors of this kind
which are equally liable to be above or below the true value.
4. The marks of shots upon a given target, assuming that the
shots are equally liable to err in any given direction. This is an
interesting case of the normal law in two dimensions, for the north
and south line and the east and west line through the centre of
the target may both be regarded as axes of normal curves of error.*
5. Certain sociological statistics of a comparatively stationary
character; for example, rates of birth, marriage, or death at
neighbouring times or like places; also the wages (and possibly the output
if it could be satisfactorily measured) of large numbers of workers
engaged in the same occupation under the same general conditions.
6. Any statistics or quantities that are individually compounded of
a large number of elements, mostly independent of one another, which
themselves vary between limits not very widely divergent, and none
of which exert a preponderating influence upon their resultant
statistic. The latter may be simply the sum of its elements, or,
more generally, it may be any function of the elements which, to
the first degree of approximation, can be expressed in linear form.
Now it would be a difficult matter in most of these cases to satisfy
ourselves as to the fulfilment or nonfulfilment of conditions like
those on which the binomial distribution rests. It is not easy
indeed to visualize them perfectly, except in artificial experiments
where they are largely under control. If anything, the chances
seem almost hopelessly against their fulfilment in ordinary life,
so closely must we hedge round our sample to keep out unequal
influences. For example, to use a frequently quoted illustration,
if p measures the chance of death for an individual, the death rate
varies, as we know, considerably from place to place according to
the age and sex constitution of the population ; it is influenced by
differences in class, and occupation, and manner of life ; it is
altered from time to time, violently by the ravages of war or disease,
more gradually by improvement in general sanitation, housing
conditions, etc. We should only expect to get the binomial distribution
(and consequently the normal law if it depended upon the
same postulates) exactly verified if we were dealing with the same
stationary population existing under the same stable conditions
over a long period of time; moreover, since $p$ is to be identical for
each individual event in the ideal case, it would be further necessary
that every family and every individual in our population should
also remain in the same stationary and stable state. This is manifestly
impossible, especially after the industrial revolution which
the advent of machine power created.
[* Sir John Herschel published in the Edinburgh Review (1850) an a priori
proof of the normal law from a consideration of this problem. Taking $\phi(x^2)$ as
the expression of the law for one dimension and $\phi(x^2+y^2)$ for two dimensions,
the independence of errors in perpendicular directions leads to the functional
equation $\phi(x^2+y^2)=\phi(x^2)\times\phi(y^2)$, the solution of which is of the form
$\phi(x^2)=\dfrac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2}$. It should be added that the assumptions underlying the proof
are not entirely above criticism.]
These considerations suggest the interesting question whether the
various types of statistics we have enumerated, as being approximately
subject to the normal law, could not, if we knew more
about them, really all be included under heading number (6), representing
a further development from the binomial theory and an
enlargement of the field in which it holds good.
In an earlier chapter, when we were discussing the connection
between marriage rate and prices, we showed how it was possible
by a method of averaging to differentiate between long-time and
short-time effects. The more transient fluctuations, only superficial
in character, were removed and the real nature of any permanent
change in the figures was revealed. In much the same
way, when we have a group of statistics which do not perhaps fit
a normal curve of error at all closely, it may be possible by random
averaging to get rid of some of the fluctuations which cause the
badness of fit and to obtain a new group of statistics which more
nearly obey the normal law. Averaging, that is to say, tends
to smooth away the rough outstanding abnormalities; and we shall
presently show that if two variables, $X_1$, $X_2$, which are independent,
obey the normal law, any linear function of the variables,
$(w_1X_1+w_2X_2)$, obeys the same law. This may throw some light
on Class (6) where each statistic represents a compound, that is,
in a broad sense, a kind of average of a large number of elements
which partially neutralize one another's influence, or rub the corners
off one another, so to speak, since no single element is, by hypothesis,
to exert an overwhelming influence upon the compound itself.
But although the normal curve does serve to describe a considerable
number of frequency distributions within reasonable limits,
there are many more cases in which it fails : for example, the
greater part of those bearing on economic matters ; also statistics
relating to the incidence of disease and degree of fertility are, as
a rule, very markedly skew. Hence arose the necessity for an
extension from the symmetrical normal to some kind of skew
variation curves to fit such distributions.
The normal curve, however, has an importance of its own to
which we must now draw special attention. It is the foundation
of the theory of errors and provides us with an invaluable method
of estimating the importance of one error in comparison with
another, or of determining the probability that an error shall lie
between stated limits. Upon it we depend for several most
important approximations which are in constant use.
The term 'error' is used here in the sense that if we take the
mean of a number of observations, the deviation of any one of
them from the mean may be termed its error. When such deviations
can be satisfactorily fitted, that is, within the limits of random
sampling, by means of a normal curve, they are said to be subject
to the normal law of error.
This law is expressed, as we have seen, by the equation
$$y=\frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2},$$
where $y\,\delta x$ measures the frequency with which an observed
organ or character deviates from the mean by an amount lying
between $x$ and $(x+\delta x)$ in a large population, i.e. $y\,\delta x$ registers
the frequency of an error of size $x$ to $(x+\delta x)$, and $N$ and $\sigma$ are
constants dependent upon the particular application of the law.
The probability curve or normal curve of error. As a guide to the
drawing of the above curve it may be worth while plotting
$$y=e^{-x^2}.$$
This is readily done by writing the equation in the form
$$-x^2=\log_e y.$$
Giving now to $y$ the values 0, 0.1, 0.2, etc., we can find values of
$\log_e y$ as shown in Table (48), and, by means of a square root table,
$x$ is then determined.
Table (48). Corresponding Values of x and y to plot $y=e^{-x^2}$.

    y      log_e y       x
   0.0      -∞          ±∞
   0.1    -2.3026      ±1.52
   0.2    -1.6096      ±1.27
   0.3    -1.2040      ±1.10
   0.4    -0.9163      ±0.96
   0.5    -0.6932      ±0.83
   0.6    -0.5108      ±0.71
   0.7    -0.3567      ±0.60
   0.8    -0.2232      ±0.47
   0.9    -0.1054      ±0.32
   1.0     0             0
This enables us to plot the graph as shown in fig. (40). Since
$\log_e 1=0$, and the logarithm of any number greater than 1 is
positive and thus cannot be equal to $-x^2$, it follows that $y$ cannot be
greater than 1. Moreover $y$ cannot be less than 0, for the logarithm
of a negative quantity is meaningless, but, as $y$ approaches 0,
$x$ approaches $\pm\infty$.
Also the curve is symmetrical about OY because for any possible
value of $y$ there are two values of $x$, equal and opposite.
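The construction of Table (48) can be retraced numerically. The following is a minimal sketch in Python, not part of the original text; the function name `x_for_ordinate` is an assumption chosen for illustration. It solves $-x^2=\log_e y$ for the positive root, the negative root following by symmetry.

```python
import math

def x_for_ordinate(y):
    """Positive x satisfying exp(-x**2) == y, valid for 0 < y <= 1."""
    # log(y) <= 0 on this range, so abs() gives -log(y) and avoids a negative zero.
    return math.sqrt(abs(math.log(y)))

# Rebuild the rows of the table for y = 0.1, 0.2, ..., 1.0.
for k in range(1, 11):
    y = k / 10
    print(f"y = {y:.1f}   log_e y = {math.log(y):8.4f}   x = +/-{x_for_ordinate(y):.2f}")
```

The last digit of some logarithms may differ slightly from the printed table, which was presumably computed from four-figure tables.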
Returning now to the curve
$$y=\frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2},$$
it must be of the same general shape as $y=e^{-x^2}$ because the two
only differ in their constants. It is clearly symmetrical, for
instance, about the axis of $y$, because, in this case also, to any value
of $y$ there are two values of $x$ equal and opposite. Moreover it
tails off to the right and left from OY, the axis of $x$ being an
asymptote, for as $x$ tends to $\pm\infty$, $y$ tends to zero as before.
[Fig. (40). The graph of $y=e^{-x^2}$.]
When $x=0$, $y=N/\sqrt{2\pi}\,\sigma$,
giving the point B, fig. (41), where the curve cuts the axis of $y$.
This is evidently the highest point on the curve, for
$$\frac{dy}{dx}=-\frac{Nx}{\sqrt{2\pi}\,\sigma^3}\,e^{-x^2/2\sigma^2},$$
and this vanishes when $x=0$.
Again,
$$\frac{d^2y}{dx^2}=\frac{N}{\sqrt{2\pi}\,\sigma^5}\,(x^2-\sigma^2)\,e^{-x^2/2\sigma^2},$$
which vanishes when $x=\pm\sigma$, and at these two points, H, H', we
therefore have 'points of inflexion' where the bend of the curve
changes its direction.
The axis of y about which there is symmetry evidently locates
the mean error, in this case zero ; in fact the mean and mode
coincide, so that the mean or zero error is also the one which most
frequently occurs, and any two other errors which are equal in
magnitude but above and below the mean respectively occur with
equal frequency : i.e. the frequency of positive errors is balanced
by the equal frequency of negative errors on the other side of the
mean, making the median error likewise zero.
Again, the area $\int_{x_1}^{x_2}y\,dx$ measures the frequency of errors lying
between $x_1$ and $x_2$ above the mean; $\int_{-x}^{+x}y\,dx$ registers the frequency
of errors between 0 and $x$, or of deviations up to this magnitude,
on either side of the mean; and, in particular, for all errors
$$\text{the total frequency}=\int_{-\infty}^{+\infty}y\,dx=\frac{N}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}e^{-x^2/2\sigma^2}\,dx=\frac{N}{\sqrt{2\pi}\,\sigma}\,(\sqrt{2\pi}\,\sigma)\ \text{(as on p. 206)}=N.$$
[Fig. (41).]
This enables us, by means of the fundamental definition, at once
to write down the probability of errors between any stated limits
and explains the origin of the name, the probability curve, which
is sometimes given to the equation. Thus we have the probability
of an error between $+x_1$ and $+x_2$
$$=\frac{\text{frequency of errors between the given limits}}{\text{frequency of all errors}}=\int_{x_1}^{x_2}y\,dx\Big/\int_{-\infty}^{+\infty}y\,dx. \qquad (1)$$
Incidentally, the probability of an error between $x$ and $(x+\delta x)$
$$=\frac{y\,\delta x}{N}=\frac{\delta x}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}. \qquad (2)$$
[Fig. (42).]
Geometrically, the area represented by the shaded portion of
fig. (42) measures the frequency of errors between $+x_1$ and $+x_2$,
while the complete area between the curve and axis X'OX measures
the total frequency, so that the probability of an error between
$+x_1$ and $+x_2$ is measured by the proportion which the area of the
shaded portion bears to the whole area.
If in the above expression (1) we put $x/\sigma=\xi$, so that $\dfrac{dx}{d\xi}=\sigma$,
it becomes
$$\frac{1}{\sqrt{2\pi}}\int_{\xi_1}^{\xi_2}e^{-\xi^2/2}\,d\xi, \qquad (3)$$
which is known as the probability integral, $\xi_1$ and $\xi_2$ being the
values of $\xi$ which correspond to the values $x_1$ and $x_2$ of $x$. But
this integral measures the area of the shaded portion of the curve
$$y=\frac{1}{\sqrt{2\pi}}\,e^{-\xi^2/2} \qquad (4)$$
shown in fig. (43), which is really the normal curve over again, but
drawn on a different scale, namely, with the ordinates reduced in
the ratio $N:\sigma$ and with the standard deviation $\sigma$ taken as the
unit of measurement for $x$, for $\xi=1, 2, 3, \ldots$ when $x=\sigma, 2\sigma, 3\sigma, \ldots$
This has the effect of making the total area unity and the area
given by
$$\frac{1}{\sqrt{2\pi}}\int_{\xi_1}^{\xi_2}e^{-\xi^2/2}\,d\xi \qquad (3)\ \text{bis}$$
now directly measures the probability of an error between $\sigma\xi_1$ and $\sigma\xi_2$.
Tables have been prepared (see pp. 284, 285) which enable
us to write down the value of this integral for different values
of $\xi_1$ and $\xi_2$ between certain limits (see Appendix, Note 10).
[Fig. (43).]
Let us take an example to show how the curve may be
used, and we choose one leading to a binomial distribution, so
giving an expression for the probability by first principles,
in order to compare the two methods.
Example. Suppose we toss simultaneously 100 coins, and suppose
the chance of success, say 'heads,' is the same for each coin
and equal to 1/2. In that case, according to the binomial theory,
the probability of 100 heads $=(1/2)^{100}$,
the probability of 99 heads and 1 tail $={}^{100}C_1\,(1/2)^{99}(1/2)$,
the probability of 98 heads and 2 tails $={}^{100}C_2\,(1/2)^{98}(1/2)^2$, and so on.
The most probable number of heads $=np=(100)(1/2)=50$. This
does not mean, as explained before, that if we perform the
experiment once we are sure on that one occasion to get exactly
50 heads and 50 tails, but that if we go on repeating the experiment
we shall in the long run get 50 heads and 50 tails turning up more
often than any other combination.
Let it be required to find the probability of getting at least 55
heads, that is, we want the probability of getting 55 heads or
more, and this is given by
$$(1/2)^{100}\left[1+{}^{100}C_1+{}^{100}C_2+\ldots+{}^{100}C_{45}\right],$$
a sum not very readily calculated if we have to go at it in a straightforward
manner.
Now let us turn to the curve of error method. The standard
deviation for the distribution is given by
$$\sigma=\sqrt{npq}=\sqrt{100\times\tfrac{1}{2}\times\tfrac{1}{2}}=5.$$
Since the mean number of heads to be expected if the experiment
is repeated a considerable number of times $=50$, we want to find
the probability of an error equal to or greater than 5, i.e. an error
lying between $\sigma$ and $+\infty$, because $\sigma=5$.
But the probability of an error between $\sigma\xi_1$ and $\sigma\xi_2$
$$=\frac{1}{\sqrt{2\pi}}\int_{\xi_1}^{\xi_2}e^{-\xi^2/2}\,d\xi.$$
Hence the required probability
$$=\frac{1}{\sqrt{2\pi}}\int_{1}^{\infty}e^{-\xi^2/2}\,d\xi=0.15866,\ \text{by the probability integral tables.}$$
In other words, if we repeated the experiment 100 times, we might
expect 55 or more heads about 16 times.
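The two methods in this example can be compared directly by machine. The sketch below is an illustration only and not part of the original text: the function names are assumptions, and the half-unit adjustment (starting the area at 54.5 instead of 55) is a standard refinement that the text itself does not apply.

```python
import math

def binomial_tail(n, k_min, p=0.5):
    """Exact probability of at least k_min successes in n trials."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

def normal_upper_tail(xi):
    """Area of the unit normal curve from xi to +infinity."""
    return 0.5 * math.erfc(xi / math.sqrt(2))

n, p = 100, 0.5
mean = n * p                          # 50
sd = math.sqrt(n * p * (1 - p))       # sqrt(npq) = 5

exact = binomial_tail(n, 55, p)                        # sum of binomial terms
approx = normal_upper_tail((55 - mean) / sd)           # curve of error method, as in the text
approx_half = normal_upper_tail((54.5 - mean) / sd)    # assumption: half-unit adjustment

print(f"exact binomial sum      : {exact:.5f}")
print(f"curve of error (text)   : {approx:.5f}")
print(f"with half-unit adjusted : {approx_half:.5f}")
```

The exact sum comes out somewhat larger than the tabulated 0.15866, because the curve method as used above begins the area at 55 and so omits half of the block belonging to exactly 55 heads.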
We can now show that if $X_1$, $X_2$ are two uncorrelated variables
obeying the normal law, then $(w_1X_1+w_2X_2)$ also obeys the same law.
Suppose $x_1$, $x_2$ are observed deviations from the mean values
$\bar X_1$, $\bar X_2$ in one particular record, $\sigma_1$, $\sigma_2$ being the respective S.D.'s.
Let $X=w_1X_1+w_2X_2$, and let $x$ be the deviation in $X$ corresponding
to deviations $x_1$, $x_2$ in the given variables.
Thus
$$\bar X+x=w_1(\bar X_1+x_1)+w_2(\bar X_2+x_2)=(w_1\bar X_1+w_2\bar X_2)+(w_1x_1+w_2x_2).$$
Therefore, $x=w_1x_1+w_2x_2$.
But the same error $x$ may be obtained by giving $x_1$, $x_2$ many different
values provided their weighted sum is unaltered. Let us first
keep $x_1$ constant, so that the corresponding value of $x_2$ required
to produce an error lying between $x$ and $(x+\delta x)$, where $\delta x$ is small,
must be such that
$$x<w_1x_1+w_2x_2<x+\delta x,$$
i.e.
$$x-w_1x_1<w_2x_2<x-w_1x_1+\delta x,$$
i.e. $x_2$ lies between $(x-w_1x_1)/w_2$ and $(x-w_1x_1+\delta x)/w_2$, and the
probability for this
$$=\frac{\delta x}{w_2}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_2}\,e^{-(x-w_1x_1)^2/2\sigma_2^2w_2^2}.$$
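Now this is in a form which only involves $\delta x$, $x$, and $x_1$, and we
get the total probability for an error lying between $x$ and $(x+\delta x)$
by giving all possible values to the error $x_1$.
But the probability for $x_1$ itself to lie between $x_1$ and $(x_1+\delta x_1)$
$$=\frac{1}{\sqrt{2\pi}\,\sigma_1}\int_{x_1}^{x_1+\delta x_1}e^{-x^2/2\sigma_1^2}\,dx=\frac{\delta x_1}{\sqrt{2\pi}\,\sigma_1}\,e^{-x_1^2/2\sigma_1^2},\ \text{by (2)},$$
and the probability for this to concur with a suitable $x_2$ to produce
an error in the weighted sum lying between $x$ and $(x+\delta x)$, on the
assumption that $X_1$ and $X_2$ are independent, is therefore
$$\left[\frac{\delta x_1}{\sigma_1\sqrt{2\pi}}\,e^{-x_1^2/2\sigma_1^2}\right]\left[\frac{\delta x}{w_2\sigma_2\sqrt{2\pi}}\,e^{-(x-w_1x_1)^2/2\sigma_2^2w_2^2}\right]
=\frac{\delta x}{w_2\cdot 2\pi\sigma_1\sigma_2}\,e^{-\frac{x_1^2}{2\sigma_1^2}-\frac{(x-w_1x_1)^2}{2\sigma_2^2w_2^2}}\,\delta x_1.$$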
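Hence the total probability for an error lying between $x$ and $(x+\delta x)$
is obtained by integrating this result, that is, summing all possible
probabilities, between $x_1=-\infty$ and $x_1=+\infty$. This gives
$$\frac{\delta x}{w_2\cdot 2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{x_1^2}{2\sigma_1^2}-\frac{(x-w_1x_1)^2}{2\sigma_2^2w_2^2}}\,dx_1
=\frac{\delta x}{w_2\cdot 2\pi\sigma_1\sigma_2}\int_{-\infty}^{+\infty}e^{-\frac{x^2}{2\sigma_2^2w_2^2}+\frac{w_1xx_1}{\sigma_2^2w_2^2}-\frac{\sigma^2x_1^2}{2\sigma_1^2\sigma_2^2w_2^2}}\,dx_1,$$
where $\sigma^2=w_1^2\sigma_1^2+w_2^2\sigma_2^2$,
$$=\frac{\delta x}{w_2\cdot 2\pi\sigma_1\sigma_2}\,e^{-x^2/2\sigma^2}\int_{-\infty}^{+\infty}e^{-t^2}\,dx_1,$$
where
$$t=\frac{\sigma}{\sqrt{2}\,\sigma_1\sigma_2w_2}\,x_1-\frac{w_1\sigma_1}{\sqrt{2}\,\sigma_2w_2\sigma}\,x,$$
$$=\frac{\delta x}{w_2\cdot 2\pi\sigma_1\sigma_2}\,e^{-x^2/2\sigma^2}\cdot\frac{\sqrt{2}\,\sigma_1\sigma_2w_2}{\sigma}\int_{-\infty}^{+\infty}e^{-t^2}\,dt
=\frac{\delta x}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2},$$
which proves that the error $x$ obeys the normal law with
$$\text{S.D.}=\sqrt{w_1^2\sigma_1^2+w_2^2\sigma_2^2}. \qquad (5)$$
The above principle is readily extended, for if
$$X=w_1X_1+w_2X_2+\ldots+w_nX_n,$$
$X_1, X_2, \ldots X_n$ being independent variables obeying the normal
law, then $X$ also obeys the normal law and its
$$\text{S.D.}=\sqrt{w_1^2\sigma_1^2+w_2^2\sigma_2^2+\ldots+w_n^2\sigma_n^2}. \qquad (6)$$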
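Results (5) and (6) lend themselves to a quick numerical check. The following is a rough sketch under stated assumptions, not the author's method: the function names, the seed, and the trial count are all illustrative, and the simulation draws normal deviates from Python's standard `random` module.

```python
import math
import random

def sd_of_linear_combination(weights, sds):
    """S.D. of w1*X1 + ... + wn*Xn for independent variables, as in (6)."""
    return math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sds)))

def simulated_sd(weights, sds, trials=200_000, seed=1):
    """Empirical S.D. of the weighted sum of independent normal deviations."""
    rng = random.Random(seed)
    values = [sum(w * rng.gauss(0.0, s) for w, s in zip(weights, sds))
              for _ in range(trials)]
    mean = sum(values) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in values) / trials)

weights, sds = [2.0, -1.0], [1.5, 2.0]
print("formula   :", sd_of_linear_combination(weights, sds))
print("simulation:", simulated_sd(weights, sds))
```

The simulated figure should agree with the formula to within ordinary sampling fluctuation; it only checks the standard deviation, not the normal form itself.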
In discussing the results of random sampling we worked upon
the principle that, given a number of sample observations of any
statistical constant, a mean or a percentage or a coefficient of
regression or anything else, an error or deviation as large as $\sigma$,
the standard deviation, from the true value for the whole population
might quite likely occur, but that an error exceeding $3\sigma$ would be
unlikely, and we explained that, as a result of convention, the
probable error, equal to $\tfrac{2}{3}\sigma$ roughly, was largely used in place of $\sigma$
by many writers. We have now to examine the basis of this
principle, and the first point to notice is that it only strictly applies
to a normal distribution.
To find the probability of an error lying between $-\sigma$ and $+\sigma$ in a
normal distribution.
The required probability
$$=\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\sigma}^{+\sigma}e^{-x^2/2\sigma^2}\,dx
=\frac{1}{\sqrt{2\pi}}\int_{-1}^{+1}e^{-\xi^2/2}\,d\xi\ \ (\text{where } x=\sigma\xi)
=\frac{2}{\sqrt{2\pi}}\int_{0}^{1}e^{-\xi^2/2}\,d\xi$$
$=0.6827$, by means of the tables.
This then is the probability that the error in a given sample shall
not exceed the S.D., $\sigma$. The probability that the error shall exceed
$\sigma$ is accordingly $(1-0.68)=0.32$. It therefore appears that the
odds against an error exceeding this amount are 68 to 32, or about
2 to 1.
The probability of an error between $-2\sigma$ and $+2\sigma$
$$=\frac{1}{\sqrt{2\pi}}\int_{-2}^{+2}e^{-\xi^2/2}\,d\xi=0.9545,$$
and the probability of an error outside these limits $=0.0455$.
Hence the odds against an error exceeding $2\sigma$ are about 21 to 1.
The probability of an error between $-3\sigma$ and $+3\sigma$
$$=\frac{1}{\sqrt{2\pi}}\int_{-3}^{+3}e^{-\xi^2/2}\,d\xi=0.9973.$$
Hence the odds against an error exceeding $3\sigma$ are about 370 to 1.
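The three tabulated probabilities can be reproduced without tables. This is a minimal sketch, assuming only the standard identity that the area of the unit normal curve between $-k$ and $+k$ equals $\operatorname{erf}(k/\sqrt{2})$; the function name `prob_within` is chosen here for illustration.

```python
import math

def prob_within(k):
    """Probability that a normal error lies between -k and +k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    p = prob_within(k)
    odds = p / (1 - p)   # odds against an error exceeding k times the S.D.
    print(f"within {k} S.D.: {p:.4f}   odds against exceeding: about {odds:.0f} to 1")
```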
[Fig. (44).]
That these results are reasonable can be seen by an examination
of the curve of error
$$y=\frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2},$$
the graph of which is drawn, fig. (44), in the particular case when
$\sigma=5$, $N=100$. The maximum ordinate is thus $=20/\sqrt{2\pi}=7.98$,
and the curve becomes
$$y=7.98\,e^{-x^2/50}.$$
When $x=\sigma=5$, $y=(7.98)(0.606)=4.84$, $P_1N_1$ in the figure.
When $x=2\sigma=10$, $y=(7.98)(0.135)=1.08$, $P_2N_2$ in the figure.
When $x=3\sigma=15$, $y=(7.98)(0.011)=0.09$, $P_3N_3$ in the figure.
There is a point of inflexion where the curve changes its
direction at $P_1$, also at the companion point $P'_1$ on the other side
of OB.
The areas $ON_1P_1B$, $ON_2P_2B$, $ON_3P_3B$, $P_3N_3X$ represent respectively
the frequencies of errors 0 to $\sigma$, 0 to $2\sigma$, 0 to $3\sigma$, $3\sigma$ and over
(considering only errors on the positive side, that is, deviations
above the mean), and the figure shows how very improbable is a
deviation from the mean exceeding $3\sigma$, for the area between the
curve and axis beyond this limit is negligible. Put in another
way, a range of $6\sigma$ should include practically all the observations
in the sample.
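The ordinates quoted for fig. (44) follow directly from the equation of the curve. A short sketch, with the function name `normal_ordinate` assumed for illustration:

```python
import math

def normal_ordinate(x, n, sigma):
    """Ordinate y = N / (sqrt(2*pi) * sigma) * exp(-x^2 / (2 * sigma^2))."""
    return n / (math.sqrt(2 * math.pi) * sigma) * math.exp(-x * x / (2 * sigma * sigma))

N, SIGMA = 100, 5   # the particular case drawn in fig. (44)
for multiple in (0, 1, 2, 3):
    x = multiple * SIGMA
    print(f"x = {x:2d}   y = {normal_ordinate(x, N, SIGMA):.2f}")
```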
The probable error has in the past received various names, such
as mean error, median error, quartile deviation, and although some
of these may seem more applicable and less confusing than the
name to which it has settled down, there is perhaps not sufficient
excuse for unsettling it again, even had we the power to do so,
by attempting a return to one of these old names.
If its magnitude be r it is defined to be such that the chance
of an error falling within the limits —r and +r is exactly equal to
the chance of an error falling outside these limits, in fact it is an
even chance whether a particular error falls within these limits
or not.
Since area measures frequency it follows that the ordinates
drawn through the probable errors divide both halves of the normal
curve (above and below the mean) into two equal parts ; the one
above the mean, QR, is shown in fig. (44), and consequently the
area OBQR=the area QRX, in that figure. These ordinates therefore
coincide with the quartiles, and the probable error is precisely
the same measure as the quartile deviation.
The magnitude of the probable error is readily calculated from the
probability integral table, for, by definition, we have
$$\frac{1}{2}=\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-r}^{+r}e^{-x^2/2\sigma^2}\,dx
=\frac{1}{\sqrt{2\pi}}\int_{-r/\sigma}^{+r/\sigma}e^{-\xi^2/2}\,d\xi\ \ (\text{where } x=\sigma\xi),$$
and the probability integral table at once gives
$$r=0.6745\,\sigma=\text{approximately } \tfrac{2}{3}\sigma.$$
Thus we have the frequently quoted rule that the
quartile deviation $=\tfrac{2}{3}$(standard deviation), . . . (7)
or probable error $=0.6745$ (S.D.)
The probability of an error lying between $-3r$ and $+3r$
$$=\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-3r}^{+3r}e^{-x^2/2\sigma^2}\,dx
=\frac{2}{\sqrt{2\pi}}\int_{0}^{3(0.6745)}e^{-\xi^2/2}\,d\xi\ \ (\text{where } x=\sigma\xi \text{ as before})$$
$=0.9570$.
Thus the odds against a deviation exceeding three times the probable
error occurring in a single trial are about 22 to 1, or much the same
as the odds against a deviation exceeding twice the S.D.
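In place of a table look-up, the ratio $r/\sigma$ can be found by solving the defining equation numerically. The sketch below uses simple bisection; this choice of method and the function names are assumptions made for illustration, not something taken from the text.

```python
import math

def prob_within(k):
    """Probability that a normal error lies between -k and +k standard deviations."""
    return math.erf(k / math.sqrt(2))

def probable_error_ratio(tol=1e-10):
    """Value of r / sigma for which an error is as likely to fall inside +/-r as outside."""
    lo, hi = 0.0, 1.0            # prob_within(0) = 0 and prob_within(1) > 1/2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_within(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ratio = probable_error_ratio()
print(f"probable error = {ratio:.4f} x S.D.")
print(f"probability within three probable errors = {prob_within(3 * ratio):.4f}")
```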
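There remains one other standard of measurement in connection
with errors which is at least deserving of mention, namely, what
we have previously called the mean deviation, which may be denoted
by $\eta$. It is simply the mean of all errors without regard to sign;
thus, since $y\,\delta x$ measures the frequency of an error lying between
$x$ and $(x+\delta x)$,
$$\eta=2\int_0^\infty xy\,dx\Big/2\int_0^\infty y\,dx
=\int_0^\infty xe^{-x^2/2\sigma^2}\,dx\Big/\int_0^\infty e^{-x^2/2\sigma^2}\,dx$$
$$=\sigma\int_0^\infty \xi e^{-\xi^2/2}\,d\xi\Big/\int_0^\infty e^{-\xi^2/2}\,d\xi\ \ (\text{where } x=\sigma\xi)$$
$$=\sqrt{2}\,\sigma\int_0^\infty te^{-t^2}\,dt\Big/\int_0^\infty e^{-t^2}\,dt\ \ (\text{where } \xi^2=2t^2)$$
$$=\sqrt{2}\,\sigma\left[-\tfrac{1}{2}e^{-t^2}\right]_0^\infty\Big/\frac{\sqrt{\pi}}{2}
=\sigma\sqrt{2/\pi}=0.7979\,\sigma,$$
hence the rough rule that the
mean deviation $=\tfrac{4}{5}$(standard deviation) . . . (8)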
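The value $0.7979\,\sigma$ can be confirmed by evaluating the two integrals numerically. This is an illustrative sketch only; the midpoint rule, the cut-off at ten standard deviations, and the function name are assumptions made here.

```python
import math

def mean_deviation_numeric(sigma, steps=200_000, span=10.0):
    """Mean of |x| under the normal law, by the midpoint rule on 0 .. span*sigma."""
    h = span * sigma / steps
    numerator = denominator = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        weight = math.exp(-x * x / (2 * sigma * sigma))
        numerator += x * weight      # integrand of the upper integral
        denominator += weight        # integrand of the lower integral
    return numerator / denominator   # the common step h cancels

sigma = 5.0
print("numerical   :", mean_deviation_numeric(sigma))
print("sigma*sqrt(2/pi):", sigma * math.sqrt(2 / math.pi))
```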
It must be borne in mind that all the above rules relating to
errors (using the term as synonymous with the deviations of single
or sample observations from the mean of a considerable number of
the same character) strictly apply, as we said before, to the normal
curve of error and are only approximately true for other distributions,
the approximation being the closer the nearer they approach
to the normal form and the larger the number of observations
involved. They have been tested in some cases in earlier chapters
(see, for example, Chapter VII.), and the results obtained, even
with very skew distributions of comparatively small numbers of
observations, are at all events close enough to suggest the utility
of the rules in more favourable cases.
The effect of variability on errors. The probability of an error
lying between 0 and $t$
$$=\frac{1}{\sqrt{2\pi}\,\sigma}\int_0^t e^{-x^2/2\sigma^2}\,dx.$$
Put $x=x'/m$, and this becomes
$$\frac{1}{\sqrt{2\pi}\,(m\sigma)}\int_0^{mt}e^{-x'^2/2(m\sigma)^2}\,dx'.$$
[Fig. (45).]
Thus, if the variability be increased $m$-fold the range of error (of
equal probability) is increased $m$-fold, so that if we have two sets
of $N$ observations, with the variability of one set double that of
the other, the range of error also in the one set is double that which
is equally likely to occur in the other. This is brought out fairly
clearly in fig. (45), which is the result of plotting the curve
$$y=\frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}$$
in the two cases. The variability $\sigma$ of curve (1) is double that
of curve (2); if then we measure along OX in the figure
$$ON_1=2\,ON_2=2t,$$
the area $B_1ON_1P_1$ will be equal to the area $B_2ON_2P_2$, showing that
the probability for an error between 0 and $2t$ in the one case is equal
to the probability for an error between 0 and $t$ in the other case.
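The scaling property can be illustrated numerically. A minimal sketch, with the function name and the sample values of $t$, $\sigma$ and $m$ assumed purely for illustration:

```python
import math

def prob_between_zero_and(t, sigma):
    """Probability of an error lying between 0 and t when the S.D. is sigma."""
    return 0.5 * math.erf(t / (sigma * math.sqrt(2)))

t, sigma, m = 1.3, 2.0, 2   # hypothetical values
print(prob_between_zero_and(t, sigma))
print(prob_between_zero_and(m * t, m * sigma))   # same probability, range and S.D. both doubled
```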
[James Bernoulli (1654-1705), the eldest of three remarkable brothers,
showed how the binomial theorem could be used to estimate the probability
that the ratio of the number of successes to the number of failures under
defined conditions should lie between set limits, where success means that a
certain event happens and failure means that it fails to happen.
It was Gauss who first actually published a proof (1809) of the equation of the
normal curve, although Laplace had suggested as early as 1783 the utility
of a probability integral table, $\int e^{-t^2}\,dt$. Gauss's proof depended upon certain
axioms which cannot be established and are not necessarily true, one of which
was that ' errors above and below the mean are equally probable.' Laplace
and Poisson improved upon Gauss and succeeded without assuming this
axiom, but with the aid of theorems due to Euler and Stirling, in developing
the continuous probability integral from the discontinuous binomial series.
Further extensions of the normal curve applicable to skew distributions
have been worked out by other writers, such as Galton and McAlister, Fechner,
Lipps, Werner, Charlier, Kapteyn, and finally by Edgeworth, who has contributed
materially to the development of the idea of 'the Law of Great
Numbers.' Karl Pearson, approaching the subject of skew variation from
the same point but by an original route, has discovered a complete system of
curves suitable for fitting almost all kinds of distributions in homogeneous
material, especially such as are met with in the biological world.
(See Todhunter, History of Probability.
Edgeworth, Law of Error in the Encyclopaedia Britannica (10th edition).
Pearson, Das Fehlergesetz und seine Verallgemeinerungen durch Fechner
und Pearson : A Rejoinder ; Biometrika, vol. iv., pp. 169-212).]
CHAPTER XIX
FREQUENCY SURFACE FOR TWO CORRELATED VARIABLES
It may serve at this stage to widen the outlook upon the subject
of correlation for those who are able to follow it up on mathematical
lines if we briefly consider the algebraical expression for
the combined distribution of two variables.
Let the variables be $X_1$, $X_2$. They may be absolutely independent
or they may be related in some way, but in either case we shall
assume it possible to set up a one-to-one correspondence between
them: thus, $X_1$ might represent the marriage rate and $X_2$ the
index number for wholesale prices, and we might always pair
together the $X_1$ and the $X_2$ which refer to the same year, as in the
correlation example in a previous chapter; moreover this pairing
might still be effected even if there were really no other connection
at all between $X_1$ and $X_2$.
If then $x_1$, $x_2$ typify the deviations of $X_1$, $X_2$ from their respective
means (the means in the above case being derived by averaging
the figures for a number of years), it is possible to write down an
expression of the form
$$y=F(x_1, x_2)$$
for determining the probability of deviations between $x_1$ and
$(x_1+\delta x_1)$, $x_2$ and $(x_2+\delta x_2)$, occurring simultaneously (in the same
year, in the above case); or, to put the same thing in another way,
$y\,\delta x_1\,\delta x_2$ would represent the proportional frequency with which
such deviations might be expected to occur together in a large
number of observations.
The frequency curve $y=f(x)$, where $y\,\delta x$ denotes the frequency
with which a variable with deviation lying between $x$ and $(x+\delta x)$
from its mean value is observed in a given distribution, was represented
by plotting corresponding pairs of values of $x$ and $y$ as
points in a plane. In the expression $y=F(x_1, x_2)$, however, we have
three variables to consider, $x_1$ and $x_2$, and $y$ which measures the
frequency of the simultaneous appearance of $x_1$ and $x_2$. Such a
trio may geometrically be represented by a point P $(x_1, x_2, y)$ in
space of three dimensions, for $(x_1, x_2)$ can first be located as a point
in a fixed plane and a height $y$ may then be measured above this
plane as in fig. (46). Clearly as $x_1$ and $x_2$ vary, $y$ also varies, and
consequently the point P moves about in space, but it moves always
in obedience to the relation
$$y=F(x_1, x_2).$$
[Fig. (46).]
This relation is called the equation of the surface along which
P travels, showing that it holds good for
the coordinates $(x_1, x_2, y)$ of any position
which the point can take up on that surface.
It is convenient, however, to use the notation
$$z=F(x, y)$$
in preference to $y=F(x_1, x_2)$ for the 'frequency surface,'
because OX, OY are nearly
always taken to represent the axes of reference
in space of two dimensions (i.e. in a plane), and by a natural
extension OX, OY, OZ are taken to represent the axes of reference
in space of three dimensions, fig. (47).
We proceed to discuss the frequency surface for two variables,
and we shall start with the comparatively simple case when the
variables are completely independent.
[Fig. (47).]
Frequency surface showing distribution of
two completely independent variables each
subject to the normal law.
Let X, Y be the variables, and let $x$, $y$ denote
deviations from their means $\bar X$, $\bar Y$, the
point $(\bar X, \bar Y)$ being taken as origin of coordinates
and the usual notation being adopted.
Thus the probability of a deviation between $x$ and $(x+\delta x)$ occurring
$$=\frac{\delta x}{\sqrt{2\pi}\,\sigma_x}\,e^{-x^2/2\sigma_x^2},$$
and the probability of a deviation between $y$ and $(y+\delta y)$ occurring
$$=\frac{\delta y}{\sqrt{2\pi}\,\sigma_y}\,e^{-y^2/2\sigma_y^2}.$$
Therefore the probability of such deviations occurring together,
since the variables are supposed completely independent,
$$=\left(\frac{\delta x}{\sqrt{2\pi}\,\sigma_x}\,e^{-x^2/2\sigma_x^2}\right)\left(\frac{\delta y}{\sqrt{2\pi}\,\sigma_y}\,e^{-y^2/2\sigma_y^2}\right)
=\frac{\delta x\,\delta y}{2\pi\sigma_x\sigma_y}\,e^{-\left(\frac{x^2}{2\sigma_x^2}+\frac{y^2}{2\sigma_y^2}\right)}.$$
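Hence the frequency with which such pairs of deviations are
observed together, if $n$ be the total number of observations,
$$=\frac{n\,\delta x\,\delta y}{2\pi\sigma_x\sigma_y}\,e^{-\left(\frac{x^2}{2\sigma_x^2}+\frac{y^2}{2\sigma_y^2}\right)}.$$
Denoting this by $z\,\delta x\,\delta y$, we get for the required frequency surface
$$z=\frac{n}{2\pi\sigma_x\sigma_y}\,e^{-\left(\frac{x^2}{2\sigma_x^2}+\frac{y^2}{2\sigma_y^2}\right)}. \qquad (1)$$
If we give $y$ some particular value, $y_1$, we find from the above
equation that the law of frequency for the corresponding $x$ is
$$z=\left[\frac{n}{2\pi\sigma_x\sigma_y}\,e^{-y_1^2/2\sigma_y^2}\right]e^{-x^2/2\sigma_x^2}
=\frac{n_1}{\sqrt{2\pi}\,\sigma_x}\,e^{-x^2/2\sigma_x^2},$$
where $n_1$ has been written in place of
$$\frac{n}{\sqrt{2\pi}\,\sigma_y}\,e^{-y_1^2/2\sigma_y^2}.$$
But this is evidently a normal curve in the plane $X_1OZ_1$, having
the same mean, $\bar X$, and the same S.D., $\sigma_x$, whatever be the value
of $y_1$.
Hence all arrays of $x$ are similar, having the same mean and the
same standard deviation, and this, by symmetry, also applies to
all arrays of $y$.
Now put $z$ equal to some constant, $k$, in equation (1), so that
$$\frac{2\pi\sigma_x\sigma_y\,k}{n}=e^{-\left(\frac{x^2}{2\sigma_x^2}+\frac{y^2}{2\sigma_y^2}\right)}.$$
Since the left-hand side of this equation is constant for different
values of $(x, y)$, it follows that the right-hand side is also constant,
and hence
$$\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}=c, \qquad (2)$$
where $c$ is a constant.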
We conclude that the values of $x$ and $y$ which can occur together
with a given frequency, $k$, are such that the point $(x, y)$ always lies
somewhere on the ellipse (2) in the plane $z=k$, fig. (48); e.g. values
in the neighbourhood of $x_1$ and $y_1$ occur with the same frequency as
values in the neighbourhood of $x_2$ and 0, because in the figure the
points $(x_1, y_1, k)$ and $(x_2, 0, k)$ both lie on the ellipse defined by
$$z=k,\qquad \frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}=c.$$
The different ellipses which can be obtained by varying the
frequency, and consequently varying $c$, are clearly concentric,
similar, and similarly situated if they are orthogonally projected
on to the plane $z=0$, for the effect of such projection is that any
point $(x, y, z)$ drops down on to the point $(x, y, 0)$ which stands
immediately below it in the plane XOY.
[Fig. (48).]
The general shape of the surface can be gathered from fig. (48)
where the ellipse $z=k$, and the normal curves $x=0$, $y=0$, and $y=y_1$
have been drawn.
It will also be noted that if the scales of $x$ and $y$ are altered by
writing $\dfrac{x}{\sigma_x}=x'$ and $\dfrac{y}{\sigma_y}=y'$, so that unit change in each may be the
same, the ellipse (2) becomes a circle
$$x'^2+y'^2=c.$$
This change of scales is equivalent geometrically to projecting
orthogonally the ellipse into a circle; of course the planes of projection
are not the same as in the previous orthogonal projection
mentioned.
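That the frequency is constant along an ellipse of type (2) can be checked by evaluating surface (1) at several points of one ellipse. The sketch below is illustrative only; the function names and the numerical values of $n$, $\sigma_x$, $\sigma_y$ and $c$ are assumptions.

```python
import math

def frequency_surface(x, y, n, sx, sy):
    """Surface (1): frequency density for two independent normal variables."""
    return n / (2 * math.pi * sx * sy) * math.exp(
        -(x * x / (2 * sx * sx) + y * y / (2 * sy * sy)))

def point_on_ellipse(c, theta, sx, sy):
    """A point (x, y) satisfying x^2/sx^2 + y^2/sy^2 = c."""
    root = math.sqrt(c)
    return sx * root * math.cos(theta), sy * root * math.sin(theta)

n, sx, sy, c = 1000, 2.0, 3.0, 1.5   # hypothetical values
heights = [frequency_surface(*point_on_ellipse(c, th, sx, sy), n, sx, sy)
           for th in (0.0, 0.7, 1.9, 3.1, 4.4)]
print(heights)   # all entries should agree: the ellipse is a contour z = k
```

On the ellipse the height is $k = \dfrac{n}{2\pi\sigma_x\sigma_y}\,e^{-c/2}$, which is the relation between $k$ and $c$ implied by (1) and (2).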
Frequency surface for two correlated variables. Let the variables
be X and Y, and let us work as before with their deviations $x$ and $y$,
which is equivalent to taking the mean point $(\bar X, \bar Y)$ of all the observations
as origin.
Now the line of regression giving the best $y$, or the $y$ of greatest
frequency, corresponding to any $x$ is
$$y=r\frac{\sigma_y}{\sigma_x}x,$$
with the usual notation, $r$ being the coefficient of correlation
between X and Y.
Hence the error made in estimating any $y$ from this equation
instead of taking the $y$ given by observation is
$$\eta=y\ (\text{observed})-y\ (\text{estimated})=y-r\frac{\sigma_y}{\sigma_x}x.\quad [\text{See fig. (49).}]$$
[Fig. (49).]
Thus, corresponding to every pair of observations $(x, y)$ there is
an $\eta$, and the same $\eta$ will be repeated
just as often as the same pair of
observations $(x, y)$ is repeated.
Therefore the frequency distribution
of $(x, \eta)$ must exactly correspond
to that of $(x, y)$.
Further, the correlation of the
variables $x$ and $\eta$ is zero, for positive
and negative errors $\eta$ are equally likely to occur for different
values of $x$; in fact, this coefficient of correlation is $\Sigma(x\eta)/n\sigma_x\sigma_\eta$, and
$$\Sigma(x\eta)=\Sigma\left[x\left(y-r\frac{\sigma_y}{\sigma_x}x\right)\right]
=\Sigma(xy)-r\frac{\sigma_y}{\sigma_x}\Sigma(x^2)
=np-r\frac{\sigma_y}{\sigma_x}\,n\sigma_x^2
=np-np=0,$$
since $p=\Sigma(xy)/n=r\sigma_x\sigma_y$.
Assuming then that the variables $x$ and $\eta$ are quite independent,
the probability of them occurring together is readily written down,
for it is simply the product of their separate probabilities.
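The vanishing of $\Sigma(x\eta)$ is an algebraic identity and holds for any set of paired observations. The sketch below checks it on a small made-up data set; the figures and the function name are hypothetical and serve only as illustration.

```python
import math

def regression_errors(xs, ys):
    """Deviations x from the mean and errors eta = y - r*(sy/sx)*x for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    sx = math.sqrt(sum(d * d for d in dx) / n)
    sy = math.sqrt(sum(d * d for d in dy) / n)
    r = sum(a * b for a, b in zip(dx, dy)) / (n * sx * sy)
    eta = [b - r * sy / sx * a for a, b in zip(dx, dy)]
    return dx, eta, r

# Hypothetical paired observations, not taken from the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

dx, eta, r = regression_errors(xs, ys)
print("r =", r)
print("sum of x*eta =", sum(a * e for a, e in zip(dx, eta)))   # zero up to rounding
```

A zero product sum shows only that $x$ and $\eta$ are uncorrelated; treating them as fully independent, as the argument goes on to do, is a further assumption.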
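But the probability of a deviation between $x$ and $(x+\delta x)$ occurring,
if we consider this variable alone, is
$$\frac{\delta x}{\sqrt{2\pi}\,\sigma_x}\,e^{-x^2/2\sigma_x^2},$$
and the probability of a deviation between $\eta$ and $(\eta+\delta\eta)$ occurring,
if we consider this variable alone, is
$$\frac{\delta\eta}{\sqrt{2\pi}\,\sigma_\eta}\,e^{-\eta^2/2\sigma_\eta^2}.$$
Hence the probability of a combined occurrence of such deviations
$$=\left(\frac{\delta x}{\sqrt{2\pi}\,\sigma_x}\,e^{-x^2/2\sigma_x^2}\right)\left(\frac{\delta\eta}{\sqrt{2\pi}\,\sigma_\eta}\,e^{-\eta^2/2\sigma_\eta^2}\right)
=\frac{\delta x\,\delta\eta}{2\pi\sigma_x\sigma_\eta}\,e^{-\left(\frac{x^2}{2\sigma_x^2}+\frac{\eta^2}{2\sigma_\eta^2}\right)}.$$
But
$$n\sigma_\eta^2=\Sigma\left(y-r\frac{\sigma_y}{\sigma_x}x\right)^2
=\Sigma(y^2)-2r\frac{\sigma_y}{\sigma_x}\Sigma(xy)+r^2\frac{\sigma_y^2}{\sigma_x^2}\Sigma(x^2)
=n\sigma_y^2-2r\frac{\sigma_y}{\sigma_x}\cdot nr\sigma_x\sigma_y+r^2\frac{\sigma_y^2}{\sigma_x^2}\cdot n\sigma_x^2
=n\sigma_y^2(1-r^2).$$
Similarly, $n\sigma_\xi^2=n\sigma_x^2(1-r^2)$,
where $\xi$ is the error made in estimating $x$ from $x=r\dfrac{\sigma_x}{\sigma_y}y$;
$$\therefore\ \frac{\sigma_\eta^2}{\sigma_y^2}=\frac{\sigma_\xi^2}{\sigma_x^2}=(1-r^2).$$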
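Thus
$$\frac{r\sigma_y}{\sigma_x\sigma_\eta^2}=\frac{r}{\sigma_x\sigma_y}\cdot\frac{\sigma_y^2}{\sigma_\eta^2}=\frac{r}{(1-r^2)\,\sigma_x\sigma_y},$$
and
$$\frac{1}{\sigma_x^2}+\frac{r^2\sigma_y^2}{\sigma_x^2\sigma_\eta^2}=\frac{1}{\sigma_x^2}\left(1+r^2\frac{\sigma_y^2}{\sigma_\eta^2}\right)=\frac{1}{(1-r^2)\,\sigma_x^2}.$$
Hence the probability of the combined occurrence of deviations
$x$ to $(x+\delta x)$, $\eta$ to $(\eta+\delta\eta)$
$$=\frac{\delta x\,\delta\eta}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}}\,\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2}-\frac{2rxy}{\sigma_x\sigma_y}+\frac{y^2}{\sigma_y^2}\right)\right\};$$
thus, since $\delta\eta=\delta y$ for any given $x$, if we denote by $z\,\delta x\,\delta y$ the frequency of the combined occurrence
of deviations $x$ to $(x+\delta x)$, $y$ to $(y+\delta y)$, when $n$ is the total
number of observations, we have *
$$z=\frac{n}{2\pi\sqrt{1-r^2}\,\sigma_x\sigma_y}\,\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2}-\frac{2rxy}{\sigma_x\sigma_y}+\frac{y^2}{\sigma_y^2}\right)\right\}.$$
When the variables X and Y are completely independent, so that
$r$ is zero, this reduces, as it should, to our previous result
$$z=\frac{n}{2\pi\sigma_x\sigma_y}\,e^{-\left(\frac{x^2}{2\sigma_x^2}+\frac{y^2}{2\sigma_y^2}\right)}.$$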
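In the surface
$$z=\mu\,\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2}-\frac{2rxy}{\sigma_x\sigma_y}+\frac{y^2}{\sigma_y^2}\right)\right\}, \qquad (3)$$
where $\mu=\dfrac{n}{2\pi\sqrt{1-r^2}\,\sigma_x\sigma_y}$, if we give $y$ some particular value $y_1$,
we find that the law of frequency for the corresponding $x$ is
$$z=\mu\,\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_x^2}-\frac{2rxy_1}{\sigma_x\sigma_y}+\frac{y_1^2}{\sigma_y^2}\right)\right\}$$
$$=\mu\,\exp\left\{-\frac{1}{2(1-r^2)}\left[\left(\frac{x}{\sigma_x}-\frac{ry_1}{\sigma_y}\right)^2+(1-r^2)\frac{y_1^2}{\sigma_y^2}\right]\right\}$$
$$=\mu\,e^{-y_1^2/2\sigma_y^2}\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x}{\sigma_x}-\frac{ry_1}{\sigma_y}\right)^2}. \qquad (4)$$
[* For an outline of Karl Pearson's method of reaching the Law of Frequency
for two correlated variables, and certain deductions from it, see Appendix,
Note 11.]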
But just as

$$y=\frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-(x-a)^2/2\sigma_x^2}$$

represents exactly the same normal curve as

$$y=\frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-x^2/2\sigma_x^2}$$

shifted through a distance $a$ along the axis of $x$, fig. (50), so we conclude that the curve (4) in $x$ and $z$, in the plane $y=y_1$, is exactly the same as the normal curve

$$z=\mu\,e^{-\frac{y_1^2}{2\sigma_y^2}}\,e^{-\frac{x^2}{2\sigma_x^2(1-r^2)}}$$

shifted through a distance $ry_1\dfrac{\sigma_x}{\sigma_y}$ along an axis parallel to OX.

[Fig. (50).]

In fact (4) represents a normal distribution for $x$, the mean, corresponding to greatest frequency when $z=\mu\,e^{-y_1^2/2\sigma_y^2}$, being determined by the intersection with the surface (3) of the planes

$$y=y_1,\qquad\frac{x}{\sigma_x}=r\frac{y}{\sigma_y},$$

and the standard deviation being $\sigma_x\sqrt{1-r^2}$, which we note is independent of $y_1$, fig. (51). To put the same thing in another way, the array of $x$'s corresponding to a particular value $y_1$ of $y$ have a mean deviating from $\bar{X}$ by $r\dfrac{\sigma_x}{\sigma_y}y_1$, and a standard deviation $\sigma_x\sqrt{1-r^2}$.

In particular, when $y=0$, $z=\mu\,e^{-x^2/2\sigma_x^2(1-r^2)}$, a normal distribution for $x$, the mean, corresponding to greatest frequency with $z=\mu$, being determined by the intersection with the surface (3) of the planes $y=0$, $\dfrac{x}{\sigma_x}=r\dfrac{y}{\sigma_y}$, and the standard deviation being $\sigma_x\sqrt{1-r^2}$ as before.
Similarly, when $x=x_1$, we get as in (4) a normal distribution for $y$, the mean, corresponding to greatest frequency when $z=\mu\,e^{-x_1^2/2\sigma_x^2}$, being determined by the intersection with the surface (3) of the planes

$$x=x_1,\qquad\frac{y}{\sigma_y}=r\frac{x}{\sigma_x},$$

and the standard deviation being $\sigma_y\sqrt{1-r^2}$, which is independent of $x_1$. In other words, the array of $y$'s corresponding to a particular value $x_1$ of $x$ have a mean deviating from $\bar{Y}$ by $r\dfrac{\sigma_y}{\sigma_x}x_1$, and a standard deviation $\sigma_y\sqrt{1-r^2}$.

In particular, when $x=0$, $z=\mu\,e^{-y^2/2\sigma_y^2(1-r^2)}$, a normal distribution for $y$, the mean, corresponding to greatest frequency with $z=\mu$, being determined by the intersection with the surface (3) of the planes $x=0$, $\dfrac{y}{\sigma_y}=r\dfrac{x}{\sigma_x}$, and the standard deviation being $\sigma_y\sqrt{1-r^2}$.
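The statement that a section of the surface by a plane $y=y_1$ is a normal curve with mean $r\frac{\sigma_x}{\sigma_y}y_1$ and standard deviation $\sigma_x\sqrt{1-r^2}$ can be tested with a few lines of Python. This is a sketch under assumed values; `surface`, `section_ratio` and `normal_ratio` are names invented for the illustration.

```python
import math

def surface(x, y, sx, sy, r, n=1.0):
    """Frequency surface z for deviations x, y with S.D.s sx, sy and correlation r."""
    q = (x / sx) ** 2 - 2 * r * (x / sx) * (y / sy) + (y / sy) ** 2
    scale = n / (2 * math.pi * math.sqrt(1 - r * r) * sx * sy)
    return scale * math.exp(-q / (2 * (1 - r * r)))

sx, sy, r, y1 = 2.0, 3.0, 0.6, 1.5         # hypothetical values
array_mean = r * sx / sy * y1               # mean of the array of x's for y = y1
array_sd = sx * math.sqrt(1 - r * r)        # S.D. of that array, independent of y1

def section_ratio(x):
    """Height of the section y = y1 at x, relative to its height at the array mean."""
    return surface(x, y1, sx, sy, r) / surface(array_mean, y1, sx, sy, r)

def normal_ratio(x):
    """The same ratio for a normal curve with the stated mean and S.D."""
    return math.exp(-(x - array_mean) ** 2 / (2 * array_sd ** 2))

for x in (-3.0, 0.0, 0.6, 2.5):
    print(x, section_ratio(x), normal_ratio(x))
```

The two ratios printed on each line should coincide, whatever value of $y_1$ is chosen.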
By putting $z$ = some constant, $k$, and arguing just as we did in the case of two independent variables, we find that all values of $x$ and $y$ which occur together with the same frequency define points $(x, y)$ which lie on the ellipse

$$\frac{x^2}{\sigma_x^2}-\frac{2rxy}{\sigma_x\sigma_y}+\frac{y^2}{\sigma_y^2}=c.$$

[Fig. (51).]

The different ellipses which can be obtained by varying the frequency, and consequently varying $c$, are concentric, similar, and similarly situated, if they are orthogonally projected on to the plane $z=0$. The planes giving the means of the $x$'s, or the most frequent $x$'s, corresponding to particular values of $y$, and the means of the $y$'s, or the most frequent $y$'s, corresponding to particular values of $x$, meet $z=0$ in the lines of regression

$$\frac{x}{\sigma_x}=r\frac{y}{\sigma_y},\qquad\frac{y}{\sigma_y}=r\frac{x}{\sigma_x}.$$
If we alter the scales of $x$ and $y$ by writing $\dfrac{x}{\sigma_x}=x'$ and $\dfrac{y}{\sigma_y}=y'$, so that unit change in each shall be of the same magnitude, the frequency surface takes the form

$$z=\mu\,e^{-\frac{1}{2(1-r^2)}\left(x'^2+y'^2-2rx'y'\right)}.$$

When $y'=0$, $z=\mu\,e^{-x'^2/2(1-r^2)}$, a normal distribution, the mean being on the plane $x'=ry'$, and the standard deviation being $\sqrt{1-r^2}$. Similarly for $x'=0$. When $y'=y'_1$, $z=\mu\,e^{-y_1'^2/2}\,e^{-(x'-ry'_1)^2/2(1-r^2)}$, a normal distribution, the mean being on the plane $x'=ry'$, and the standard deviation being $\sqrt{1-r^2}$ as before. Similarly for $x'=x'_1$.
Again the ellipse which is the locus of the points $(x', y')$ obtained by putting $z$ = constant, $k$, corresponding to variables which occur with the same frequency, is (in the plane $z=k$) now

$$x'^2+y'^2-2rx'y'=c,$$

and, projecting on to the plane $z=0$, the lines of regression are

$$x'=ry',\qquad y'=rx'.$$

These lines are the intersections with $z=0$ of the planes containing the means of the $x'$'s, or the most frequent $x'$'s, corresponding to particular $y'$'s, and vice versa.

Since, geometrically, the transformation $\dfrac{x}{\sigma_x}=x'$, $\dfrac{y}{\sigma_y}=y'$, is equivalent to an orthogonal projection, we may learn something about the more general ellipse by considering properties of the simpler projected curve which are not changed by projection.
Let us first, however, find the magnitude and direction of the axes of

$$x'^2+y'^2-2rx'y'=c.$$

By turning the axes through some angle $\theta$ this equation is reducible to the form

$$\frac{x''^2}{a^2}+\frac{y''^2}{b^2}=1,$$

which is the ordinary form for an ellipse when its axes lie along the axes of coordinates. But the equation in $x'$, $y'$ is clearly symmetrical about the lines $y'=x'$ and $y'=-x'$, because $y'$ and $x'$ or $y'$ and $-x'$ can be interchanged without the equation being affected. Hence these lines must give the directions of the major and minor axes.
To turn the axes of coordinates through an angle of 45°, fig. (52), we must write

$$x'=x''\cos 45°-y''\sin 45°=\frac{x''-y''}{\sqrt2},\qquad y'=x''\sin 45°+y''\cos 45°=\frac{x''+y''}{\sqrt2}.$$

[Fig. (52).]

The equation of the ellipse thus becomes

$$\frac{(x''-y'')^2}{2}+\frac{(x''+y'')^2}{2}-2r\,\frac{(x''-y'')(x''+y'')}{\sqrt2\,\sqrt2}=c,$$

i.e. $$x''^2+y''^2-r(x''^2-y''^2)=c,$$

i.e. $$x''^2(1-r)+y''^2(1+r)=c,$$

i.e. $$\frac{x''^2}{c/(1-r)}+\frac{y''^2}{c/(1+r)}=1.$$

Hence the semi-major axis is $a=\sqrt{\dfrac{c}{1-r}}$, and the semi-minor axis is $b=\sqrt{\dfrac{c}{1+r}}$. We note that as $r$ increases from 0 to 1, $a$ increases from $\sqrt{c}$ to $\infty$, while $b$ decreases from $\sqrt{c}$ to $\sqrt{c/2}$. Also, as $r$ decreases from 0 to $-1$, $a$ decreases from $\sqrt{c}$ to $\sqrt{c/2}$, while $b$ increases from $\sqrt{c}$ to $\infty$.
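A short numerical check of these semi-axes may be helpful. In the Python sketch below (the function names and the values $c=4$, $r=0.5$ are assumptions made for illustration) the ends of the axes along $y'=x'$ and $y'=-x'$ are substituted back into $x'^2+y'^2-2rx'y'$, which should return $c$ in each case.

```python
import math

def semi_axes(c, r):
    """Semi-axes along y' = x' and y' = -x' of x'^2 + y'^2 - 2 r x' y' = c."""
    return math.sqrt(c / (1 - r)), math.sqrt(c / (1 + r))

def ellipse_lhs(xp, yp, r):
    """Left-hand side of the ellipse equation in the scaled variables."""
    return xp * xp + yp * yp - 2 * r * xp * yp

c, r = 4.0, 0.5                                      # hypothetical values, r positive
a, b = semi_axes(c, r)
end_major = (a / math.sqrt(2), a / math.sqrt(2))     # end of the axis along y' = x'
end_minor = (b / math.sqrt(2), -b / math.sqrt(2))    # end of the axis along y' = -x'
print(a, b, ellipse_lhs(*end_major, r), ellipse_lhs(*end_minor, r))
```

For a negative $r$ the same formulae hold, but the axis along $y'=x'$ is then the shorter of the two.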
The ellipses, $x'^2+y'^2-2rx'y'=c$, corresponding to different values of $r$ all pass through the points of intersection of

$$x'^2+y'^2=c\quad\text{and}\quad x'y'=0.$$

But $x'^2+y'^2=c$ is what the equation of the ellipse becomes when $r$, the coefficient of correlation, vanishes. The connection between these curves is shown in fig. (53), which represents their projection on to the plane $z=0$. A positive correlation between $x$ and $y$ might be expected to increase the $y$ corresponding to a particular positive $x$, if the frequency be fixed beforehand, and that is the effect which the figure also would suggest.

[Fig. (53).]
Now, in $x'^2+y'^2-2rx'y'=c$, the lines of regression are

$$y'=rx',\qquad y'=\frac{1}{r}x',$$

and the axes of the ellipse are

$$y'=x',\qquad y'=-x'.$$

Hence the lines of regression are equally inclined to the axes of the ellipse as well as to the axes of coordinates, fig. (54).

Further, the pair of lines

$$y'=x',\qquad y'=-x'$$

form a harmonic pencil with the pair

$$x'=0,\qquad y'=0,$$

and also with the pair

$$y'=rx',\qquad y'=\frac{1}{r}x'.$$

This is obvious from fig. (54).
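Now project back to the ellipse

$$\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}-2r\frac{xy}{\sigma_x\sigma_y}=\text{constant}.$$

The algebraical transformation for this is merely

$$x'=\frac{x}{\sigma_x},\qquad y'=\frac{y}{\sigma_y}.$$

[Fig. (54).]

Since the harmonic property is unaltered by projection we then have the pair of lines

$$\frac{y}{\sigma_y}=\frac{x}{\sigma_x},\qquad\frac{y}{\sigma_y}=-\frac{x}{\sigma_x}$$

harmonic with the pair

$$x=0,\qquad y=0,$$

and also with the pair

$$\frac{y}{\sigma_y}=r\frac{x}{\sigma_x},\qquad\frac{y}{\sigma_y}=\frac{1}{r}\,\frac{x}{\sigma_x}.$$

Hence the two lines of regression corresponding to maximum correlation ($r=+1$ and $r=-1$) are harmonic with
(1) the axes of coordinates ;
(2) the lines of regression for any $r$.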
Again it may be easily seen that the lines

$$y'=rx'\quad\text{and}\quad x'=0$$

are conjugate diameters of the ellipse

$$x'^2+y'^2-2rx'y'=c,\qquad\ldots(5)$$

for they may be written as one equation thus :

$$rx'^2-x'y'=0,$$

and this represents a pair of lines harmonic with the (imaginary) asymptotes of (5), namely, with

$$x'^2+y'^2-2rx'y'=0.$$

[The criterion for $ax^2+2hxy+by^2=0$ to be harmonic with $a'x^2+2h'xy+b'y^2=0$ is $ab'+ba'=2hh'$.]

But it is a well-known property of conics that any pair of lines harmonic with the asymptotes are conjugate diameters of the conic.

Similarly it may be shown that the lines

$$y'=\frac{1}{r}x'\quad\text{and}\quad y'=0$$

are conjugate diameters of the ellipse (5).
But, on projection, the conjugate property also is unaltered.

Hence the lines $\dfrac{y}{\sigma_y}=r\dfrac{x}{\sigma_x}$, $x=0$,

and the lines $\dfrac{y}{\sigma_y}=\dfrac{1}{r}\,\dfrac{x}{\sigma_x}$, $y=0$

are conjugate pairs of diameters of the ellipse

$$\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}-2r\frac{xy}{\sigma_x\sigma_y}=c.$$

But for conjugate diameters the mid-points of all chords parallel to either lie on the other.

Thus we come back again by another route to the familiar line of regression theorems that, for a given $r$, all arrays parallel to $x=0$ have their means on $\dfrac{y}{\sigma_y}=r\dfrac{x}{\sigma_x}$, and all arrays parallel to $y=0$ have their means on $\dfrac{x}{\sigma_x}=r\dfrac{y}{\sigma_y}$.
APPENDIX
1. Compound Interest Law. If the capital increases continuously, instead of going up by jumps at the end of stated periods, the connection between the original principal $S_0$, the rate per cent. per annum $r$, and the amount $S_t$ at the end of $t$ years is given by

$$S_t=S_0\,e^{rt/100},$$

for the rate of increase is measured by

$$\frac{dS}{dt}=\frac{rS}{100},$$

which leads at once to the above equation on integrating.

Other instances of the same law are : —

(1) A particle moving against a resistance proportional to its velocity, $v_t=v_0e^{-ct}$, where $v_t$ is the velocity at time $t$, $v_0$ is the original velocity, and $c$ is some constant.

(2) The variation of the pressure of the atmosphere with height, $p_h=p_0e^{-ch}$, where $p_h$ is the pressure at height $h$ above a surface level, $p_0$ is the pressure at the surface, and $c$ is some constant.

(3) The rate of cooling, $\theta_t=\theta_0e^{-ct}$, where $\theta_t$ is the excess of temperature at time $t$ of the hot body over that of surrounding bodies, $\theta_0$ is the excess when the measurement begins, and $c$ is some constant.
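The passage from interest added "by jumps" to continuous growth can be illustrated numerically. The following Python sketch is an illustration only; the function names and the figures (principal 100, rate 5 per cent., 10 years) are assumed for the purpose. It compounds the interest $k$ times a year and compares the result with $S_0e^{rt/100}$.

```python
import math

def continuous_amount(s0, r, t):
    """Amount after t years at r per cent. per annum, growing continuously."""
    return s0 * math.exp(r * t / 100.0)

def stepwise_amount(s0, r, t, k):
    """Amount when interest is added k times a year, i.e. by jumps."""
    return s0 * (1.0 + r / (100.0 * k)) ** (k * t)

s0, r, t = 100.0, 5.0, 10          # hypothetical principal, rate and term
for k in (1, 12, 365, 10**6):
    print(k, stepwise_amount(s0, r, t, k))
print("continuous", continuous_amount(s0, r, t))
```

As $k$ is increased the stepwise amount rises towards the continuous one without exceeding it.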
2. Weighted Mean. Let the observations be represented by the different values, $x_1, x_2, \ldots x_n$, of the variable concerned, and let the respective weights attached to these observations be $f_1, f_2, \ldots f_n$, so that the average, by definition,

$$=\frac{x_1f_1+x_2f_2+\ldots+x_nf_n}{f_1+f_2+\ldots+f_n}.$$

Now, suppose a different set of weights be chosen, namely, $f'_1, f'_2, \ldots f'_n$, giving a new average

$$\frac{x_1f'_1+x_2f'_2+\ldots+x_nf'_n}{f'_1+f'_2+\ldots+f'_n}.$$

The difference between these two expressions

$$=\frac{x_1f_1+x_2f_2+\ldots}{f_1+f_2+\ldots}-\frac{x_1f'_1+x_2f'_2+\ldots}{f'_1+f'_2+\ldots}$$

$$=\frac{(x_1f_1+x_2f_2+\ldots)(f'_1+f'_2+\ldots)-(x_1f'_1+x_2f'_2+\ldots)(f_1+f_2+\ldots)}{(f_1+f_2+\ldots)(f'_1+f'_2+\ldots)}$$

$$=\frac{\{f_1f'_2(x_1-x_2)-f_2f'_1(x_1-x_2)\}+\{f_1f'_3(x_1-x_3)-f_3f'_1(x_1-x_3)\}+\ldots}{(f_1+f_2+\ldots)(f'_1+f'_2+\ldots)}$$

$$=\frac{f'_1f'_2(x_1-x_2)\left(\dfrac{f_1}{f'_1}-\dfrac{f_2}{f'_2}\right)+f'_1f'_3(x_1-x_3)\left(\dfrac{f_1}{f'_1}-\dfrac{f_3}{f'_3}\right)+\ldots}{(f_1+f_2+\ldots)(f'_1+f'_2+\ldots)}.$$

Hence this difference is very small and the averages are very nearly equal if the weights $f_1, f_2, f_3 \ldots$ are replaced by others $f'_1, f'_2, f'_3 \ldots$ very nearly proportional to them, so that $f_1/f'_1$, $f_2/f'_2$, $f_3/f'_3 \ldots$ are not far from equality, and this is the more pronounced if the observations $x_1, x_2, x_3 \ldots$ themselves are all of the same order of magnitude and the sums of their weights, $\Sigma f$ and $\Sigma f'$, are large so that the expressions of type $(x_1-x_2)/(\Sigma f)(\Sigma f')$ are small.
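A small numerical instance may make the conclusion concrete. In the Python sketch below the observations and the three sets of weights are invented for illustration: one set of weights is exactly proportional to the original, another only nearly so.

```python
def weighted_mean(values, weights):
    """Weighted average: sum of x*f divided by sum of f."""
    return sum(x * f for x, f in zip(values, weights)) / sum(weights)

observations = [10.0, 12.0, 15.0]       # hypothetical observations
weights = [2, 3, 5]
proportional = [20, 30, 50]             # exactly proportional to the first set
nearly_proportional = [21, 30, 49]      # only nearly proportional

print(weighted_mean(observations, weights))               # 13.1
print(weighted_mean(observations, proportional))          # the same average
print(weighted_mean(observations, nearly_proportional))   # close to it
```

With exactly proportional weights every bracket $f_1/f'_1-f_2/f'_2$ vanishes and the averages agree; with nearly proportional weights the difference is small compared with the spread of the observations.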
3. Geometric and Harmonic Means. Given $n$ numbers

$$a,\ b,\ c\ \ldots$$

their geometric mean, $g$, is defined by the formula

$$g=\sqrt[n]{(abc\ldots)},$$

and their harmonic mean, $h$, is defined by

$$\frac{1}{h}=\frac{1}{n}\left(\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\ldots\right).$$

We note that when $a=b=c=\ldots=k$, say, then

$$g=\sqrt[n]{(k\cdot k\cdot k\ldots)}=\sqrt[n]{(k^n)}=k$$

and

$$\frac{1}{h}=\frac{1}{n}\left(\frac{1}{k}+\frac{1}{k}+\frac{1}{k}+\ldots\right)=\frac{1}{k},$$

so that $h=k$.

It is worthy of remark that if the geometric mean be adopted as average in discussing the index numbers of prices it possesses an interesting property which does not hold for any of the other means in common use.

Suppose the prices of $n$ standard commodities at three successive dates be represented by $(a_1, b_1, c_1 \ldots)$, $(a_2, b_2, c_2 \ldots)$, $(a_3, b_3, c_3 \ldots)$. Then the index numbers of the separate commodity prices at the third date, taking the prices at the first date as standard, are

$$100\frac{a_3}{a_1},\quad 100\frac{b_3}{b_1},\quad 100\frac{c_3}{c_1}\ \ldots$$

Hence the geometric mean of these $n$ index numbers together

$$=\sqrt[n]{\left(100\frac{a_3}{a_1}\times 100\frac{b_3}{b_1}\times 100\frac{c_3}{c_1}\times\ldots\right)}=100\,\frac{g_3}{g_1},$$

where $g_1$, $g_3$ denote the geometric means of the $n$ prices at the two dates.

It follows that the ratio

$$\frac{\text{index number of prices at 3rd date with prices at 1st date as standard}}{\text{index number of prices at 2nd date with prices at 1st date as standard}}=\frac{100\,g_3/g_1}{100\,g_2/g_1}=g_3/g_2.$$

It is therefore quite independent of the particular date chosen as standard.
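This property is easily verified on figures. The Python sketch below uses invented prices for three commodities at three dates (the dictionary `prices` and the function names are assumptions for the illustration) and forms the ratio of the two geometric-mean index numbers with each date in turn as standard.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n numbers."""
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical prices of three commodities at dates 1, 2 and 3.
prices = {1: [10.0, 4.0, 25.0], 2: [12.0, 5.0, 24.0], 3: [15.0, 4.5, 30.0]}

def index_number(date, base):
    """Geometric mean of the separate index numbers 100 * price / base price."""
    relatives = [100.0 * p / q for p, q in zip(prices[date], prices[base])]
    return geometric_mean(relatives)

def ratio_3_to_2(base):
    """Index at 3rd date divided by index at 2nd date, same standard date."""
    return index_number(3, base) / index_number(2, base)

for base in (1, 2, 3):
    print(base, ratio_3_to_2(base))
print(geometric_mean(prices[3]) / geometric_mean(prices[2]))
```

All four printed values should agree, being equal to $g_3/g_2$ whichever date is taken as standard.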
4. The Mean of Combined Sets of Observations. (1) Suppose one variable $x$ is expressed as the sum of a number of other variables, thus

$$x=a+b+c+\ldots,$$

and suppose that we have $n$ different values of the variables, giving equations of the type

$$x_1=a_1+b_1+c_1+\ldots,\quad x_2=a_2+b_2+c_2+\ldots,\quad\ldots\quad x_n=a_n+b_n+c_n+\ldots$$

Hence, by addition,

$$\Sigma x=\Sigma a+\Sigma b+\Sigma c+\ldots,$$

so that

$$n\bar{x}=n\bar{a}+n\bar{b}+n\bar{c}+\ldots,$$

$$\bar{x}=\bar{a}+\bar{b}+\bar{c}+\ldots,$$

where $\bar{x}, \bar{a}, \bar{b} \ldots$ denote the means of the $n$ values of the respective variables.

Thus the mean of a sum equals the sum of the means, and, if some of the positive signs in $(a+b+c+\ldots)$ are made negative, there will evidently be a corresponding change of sign in $(\bar{a}+\bar{b}+\ldots)$.
Example. — Suppose 100 family budgets are collected and the items in each are separated under five heads — rent, food, clothes, coals and light, sundries. The expenditure, $x$, in each budget would thus be expressed as the sum of five variables, $a, b, c, d, e$, and the mean of the 100 different $x$'s would equal the sum of the means of the $a$'s, the $b$'s, the $c$'s, the $d$'s, and the $e$'s.
(2) Sets of observations are made which differ in locality or time or some other respect. To find the resultant mean.

Let $l$ observations of the variable $x$ refer, say, to one date, $m$ observations to a second date, $n$ observations to a third date, and so on, and let the means of these successive groups of observations be $\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots$, so that we may write

$$\bar{x}_1=\Sigma x_1/l,\quad\bar{x}_2=\Sigma x_2/m,\quad\bar{x}_3=\Sigma x_3/n,\ \ldots$$

If then $\bar{x}$ be the resultant mean, we have

$$\bar{x}=\frac{\Sigma x_1+\Sigma x_2+\ldots}{l+m+\ldots}=\frac{l\bar{x}_1+m\bar{x}_2+\ldots}{l+m+\ldots}.$$
Example. — If the school children in the different schools of a county are weighed, $l$ children in one school, $m$ in another, $n$ in another, and so on, giving mean weights $\bar{x}_1, \bar{x}_2, \bar{x}_3 \ldots$, the resultant mean weight for the children in all the schools combined is then given by the above expression.
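The school example can be worked through in a few lines of Python. The weights below are invented for illustration; the point is only that the mean of the group means, each weighted by the size of its group, equals the mean of all the observations pooled together.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical weights of children in three schools.
schools = [[30.0, 32.5, 31.0], [28.0, 29.5], [33.0, 34.0, 35.5, 31.5]]

sizes = [len(group) for group in schools]        # l, m, n
group_means = [mean(group) for group in schools]

# Resultant mean from the group means and group sizes alone.
combined = sum(k * m for k, m in zip(sizes, group_means)) / sum(sizes)

# Mean of all the observations pooled together, for comparison.
pooled = mean([x for group in schools for x in group])
print(combined, pooled)
```

An unweighted average of the three group means would in general differ from this, since it ignores the unequal sizes of the groups.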
5. Mean and Standard Deviation of a Distribution of Variables. Let $x_1, x_2, x_3 \ldots x_n$ denote the deviations of each value, or group mid-value, of the observed organ or character when measured from some fixed value, and let $f_1, f_2, f_3 \ldots f_n$ denote the observed frequencies of these respective deviations.

The arithmetic mean of the variables is thus given by

$$\bar{x}=(f_1x_1+f_2x_2+\ldots+f_nx_n)/(f_1+f_2+\ldots+f_n),$$

referred to the fixed value as origin.

We may conveniently represent the deviations $x_1, x_2, x_3 \ldots$ by lengths measured from an arbitrary origin O along a straight line, in which case the point O defines the position of the fixed value from which the variables are measured.

Let P mark the position corresponding to a typical variable and let G mark the position corresponding to the mean, $\bar{x}$. Thus $OP=x$, $OG=\bar{x}$, and if we denote the distance of P from G by $\xi$, we have

$$x=\bar{x}+\xi.$$

Hence

$$\bar{x}=(f_1x_1+f_2x_2+\ldots+f_nx_n)/(f_1+f_2+\ldots+f_n)$$
$$=[f_1(\bar{x}+\xi_1)+f_2(\bar{x}+\xi_2)+\ldots+f_n(\bar{x}+\xi_n)]/(f_1+f_2+\ldots+f_n)$$
$$=[\bar{x}(f_1+f_2+\ldots+f_n)+(f_1\xi_1+f_2\xi_2+\ldots+f_n\xi_n)]/(f_1+f_2+\ldots+f_n)$$
$$=\bar{x}+(f_1\xi_1+f_2\xi_2+\ldots+f_n\xi_n)/(f_1+f_2+\ldots+f_n).$$

Therefore

$$(f_1\xi_1+f_2\xi_2+\ldots+f_n\xi_n)=0\qquad\ldots(1)$$

The expression $(f_1x_1+f_2x_2+\ldots+f_nx_n)$ is called the first moment of the distribution referred to O as origin. We conclude that when the distribution is referred to G as origin, i.e. when deviations are measured from the mean of the distribution, the first moment vanishes.
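Frequency Distribution Table.

Col. (1): Deviations of variables from some fixed value.
Col. (2): Frequency of deviations.
Col. (3): Product of nos. in col. (1) and col. (2).
Col. (4): Product of nos. in col. (1) and col. (3).

$$\begin{array}{cccc}
(1)&(2)&(3)&(4)\\
x_1&f_1&f_1x_1&f_1x_1^2\\
x_2&f_2&f_2x_2&f_2x_2^2\\
\vdots&\vdots&\vdots&\vdots\\
x_n&f_n&f_nx_n&f_nx_n^2\\
\text{Totals}&N&N'_1&N'_2
\end{array}$$

In the notation of the above table, where the dashes are omitted in $N_1$, $N_2$ when the mean is origin, we have

$$\bar{x}=N'_1/N\quad\text{and}\quad N_1=0.$$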
Again, the root-mean-square deviation, $s$, measured from the arbitrary origin O, is given by

$$s^2=(f_1x_1^2+f_2x_2^2+\ldots+f_nx_n^2)/(f_1+f_2+\ldots+f_n)=N'_2/N,$$

and $N'_2$ is called the second moment of the distribution referred to O as origin.

Substituting as before we have

$$s^2=\frac{\bar{x}^2(f_1+\ldots+f_n)+2\bar{x}(f_1\xi_1+\ldots+f_n\xi_n)+(f_1\xi_1^2+\ldots+f_n\xi_n^2)}{(f_1+\ldots+f_n)}$$
$$=\bar{x}^2+(f_1\xi_1^2+\ldots+f_n\xi_n^2)/(f_1+\ldots+f_n),$$

since $f_1\xi_1+\ldots+f_n\xi_n=0$.

Hence

$$s^2=\bar{x}^2+\sigma^2,\qquad\ldots(2)$$

where $\sigma$ is the root-mean-square deviation measured from G as origin, or the standard deviation as it is called.

From this result it is clear that $\sigma$ is never greater than $s$, or the root-mean-square deviation is least when measured from the arithmetic mean.
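Generally, if we write

$$\nu'_k=(f_1x_1^k+\ldots+f_nx_n^k)/(f_1+\ldots+f_n),$$
$$\nu_k=(f_1\xi_1^k+\ldots+f_n\xi_n^k)/(f_1+\ldots+f_n),$$

where $\Sigma(fx^k)$ and $\Sigma(f\xi^k)$ may be called the $k$th moments referred to O and to the mean as origins respectively, so that $\nu_1=0$, $\nu_2=\sigma^2$, $\nu'_2=s^2$, we have

$$\nu'_k=\nu_k+k\,\nu_{k-1}\bar{x}+\frac{k(k-1)}{2!}\,\nu_{k-2}\bar{x}^2+\ldots+\bar{x}^k.$$

For example, when $k=2$, since $\nu_0=1$ and $\nu_1=0$,

$$\nu_2=\nu'_2-\bar{x}^2\qquad\ldots(2)\ \text{bis}$$

Again, when $k=3$, $\nu_3=\nu'_3-3\nu_2\bar{x}-\bar{x}^3\qquad\ldots(3)$

and, when $k=4$, $\nu_4=\nu'_4-4\nu_3\bar{x}-6\nu_2\bar{x}^2-\bar{x}^4\qquad\ldots(4)$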
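The relations (2) bis, (3) and (4) between moments about an arbitrary origin and moments about the mean may be verified on a small frequency distribution. The Python sketch below uses invented deviations and frequencies; `moment` is a helper name assumed for the illustration.

```python
def moment(values, freqs, k, origin=0.0):
    """k-th moment per unit frequency about the given origin."""
    total = sum(freqs)
    return sum(f * (x - origin) ** k for x, f in zip(values, freqs)) / total

# Hypothetical deviations from a fixed value, with their frequencies.
xs = [-2.0, -1.0, 0.0, 1.0, 3.0]
fs = [1, 4, 6, 3, 2]

xbar = moment(xs, fs, 1)                                          # mean, from the fixed origin
raw = {k: moment(xs, fs, k) for k in range(1, 5)}                 # moments about the origin
central = {k: moment(xs, fs, k, origin=xbar) for k in range(1, 5)}  # moments about the mean

print(central[2], raw[2] - xbar ** 2)
print(central[3], raw[3] - 3 * central[2] * xbar - xbar ** 3)
print(central[4], raw[4] - 4 * central[3] * xbar - 6 * central[2] * xbar ** 2 - xbar ** 4)
```

Each printed pair should agree, and the first moment about the mean, `central[1]`, should vanish to rounding error.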
There are interesting statical analogues to the above results concerning the mean and standard deviation.

Let us imagine a set of weights, $f_1, f_2, f_3 \ldots$ suspended at $P_1, P_2, P_3 \ldots$ from a straight horizontal bar, and let the distance of any typical weight $f$ from some arbitrary origin O on the bar be $x$. Then the first moment,

$$f_1x_1+f_2x_2+\ldots+f_nx_n$$

(where some of the $x$'s may be negative corresponding to weights suspended to the left of O) measures the total turning effect of all the given weights about O, and if we further imagine all these weights replaced by a single weight equal to their sum $(f_1+f_2+\ldots+f_n)$, then, in order to produce the same turning effect, it would have to be placed at a point G, the distance $\bar{x}$ of which from O is given by

$$\bar{x}(f_1+f_2+\ldots+f_n)=(f_1x_1+f_2x_2+\ldots+f_nx_n).$$

Thus

$$\bar{x}=(f_1x_1+f_2x_2+\ldots+f_nx_n)/(f_1+f_2+\ldots+f_n),$$

and, statically, this defines the position of the centre of gravity of the given weights, $f_1, f_2, \ldots f_n$, relative to O.

As before, $\bar{x}=\Sigma f(\bar{x}+\xi)/\Sigma f$; hence

$$f_1\xi_1+f_2\xi_2+\ldots+f_n\xi_n=0,$$

and, statically, this means that the turning effect of $f_1, f_2 \ldots f_n$ about G is zero, in other words, the bar would balance freely about G.

Again, the second moment,

$$f_1x_1^2+f_2x_2^2+\ldots+f_nx_n^2,$$

measures the moment of inertia of the weights $f_1, f_2 \ldots f_n$ about O, and, if we imagine these different weights replaced by a single weight $(f_1+f_2+\ldots+f_n)$ as before, the moment of inertia will be unaltered if the latter be located at a distance $s$ from O, where

$$(f_1+f_2+\ldots+f_n)s^2=(f_1x_1^2+f_2x_2^2+\ldots+f_nx_n^2);$$

therefore

$$s^2=(f_1x_1^2+\ldots+f_nx_n^2)/(f_1+\ldots+f_n)=\Sigma f(\bar{x}+\xi)^2/\Sigma f=\bar{x}^2+\sigma^2,$$

as before, and the interpretation of this is that the square of the radius of gyration of the system of weights about O equals the square of the radius of gyration about G, the centre of gravity of the system, together with the square of the distance of G from O. Also, $s$ is clearly least when it is measured from G.
6. The Mean Deviation a Minimum when measured from the Median. Consider first the case when only two different values of the variable are observed, $X_1$, $X_2$, and let their deviations from an arbitrary value, O, chosen as origin, be respectively $x_1$, $x_2$.

If $f_1$, $f_2$ be the observed frequencies of these values, the sum of their deviations from O is

$$f_1x_1+f_2x_2,$$

which is clearly less when the value O lies between $X_1$, $X_2$ than when it is smaller or greater than both of them.

Choosing O, therefore, between $X_1$, $X_2$, if $f_1$ be the greater frequency we write the deviation sum

$$=f_2(x_1+x_2)+(f_1-f_2)x_1=f_2x+(f_1-f_2)x_1,$$

where $x$ is the deviation of either of the values $X_1$, $X_2$ from the other, and $(f_1-f_2)$ is positive since $f_1>f_2$.

Now this is evidently least when $(f_1-f_2)x_1$ vanishes, i.e. when (1) $x_1=0$, in which case O coincides with $X_1$, the more frequent of the two variables, or, when (2) $f_1=f_2$, and in this case, when the two observed values occur equally often, the deviation sum is constant for any origin between $X_1$ and $X_2$.

When several different values of the variable are observed, they may be arranged in order of magnitude, $X_1, X_2, X_3 \ldots X_n$, from the least to the greatest, with frequencies $f_1, f_2, f_3 \ldots f_n$.

If $f_1>f_n$ we pair off $f_n$ of the $X_1$'s with $f_n$ of the $X_n$'s ; the deviation sum for this pair is least and remains constant when measured from any origin between $X_1$ and $X_n$. We next pair off some or all of the $X_1$'s which remain against an equal number of $X_{n-1}$'s and the deviation sum for this pair is least and remains constant when measured from any origin between $X_1$ and $X_{n-1}$. If some $X_1$'s still remain, we pair them off so far as we can against an equal number of $X_{n-2}$'s but, if it be $X_{n-1}$'s that remain, we pair them off against an equal number of $X_2$'s.

This process can evidently be continued until ultimately we reach the origin from which the mean deviation of the whole distribution is a minimum, for if any X be left unpaired the origin will coincide with that X. Otherwise, the deviation is least when measured from any value between the last two X's paired off together, and within that range it is constant.
Since, by definition, the median is the value of the variable half-way along the series of given observations, ranged in order of their magnitude and assigning each its due weight or frequency, it is clearly such that a balance can be effected by pairing off the values on either side of it against one another in the manner explained above ; it therefore follows that the mean deviation of a frequency distribution is a minimum when the deviations are measured from the median.

The statical analogy to the median also is worth noting. With the same notation as before, the moment or turning effect of two forces, $f_1$, $f_2$, about O is

$$f_1x_1+f_2x_2.$$

But in this case, if O be taken at some point in between $X_1$ and $X_2$, since the mean deviation sums the separate deviations without regard to sign, we must imagine one of the forces reversed so as to produce a turning effect in the same direction as before. The moment will then be still $(f_1x_1+f_2x_2)$ and it is less when O occupies such a position than when it is on $X_1X_2$ produced in either direction.

Taking O, therefore, somewhere in between $X_1$ and $X_2$, the moment may be written

$$=f_2(x_1+x_2)+x_1(f_1-f_2);$$

and, if $f_1>f_2$, this is least when $x_1$ vanishes, that is, when O coincides with $X_1$, but if $f_1=f_2$, the two forces constitute a couple, and the moment is the same whatever position O occupies between $X_1$ and $X_2$.
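The minimum property of the median can be seen on a small example. The Python sketch below uses an invented distribution of five values with their frequencies; it computes the deviation sum, taken without regard to sign, from the median, from the mean, and from a range of other origins.

```python
import statistics

# Hypothetical observed values and their frequencies.
values = [1, 2, 4, 7, 11]
freqs = [3, 1, 2, 1, 2]

def deviation_sum(origin):
    """Sum of deviations from the origin, taken without regard to sign."""
    return sum(f * abs(x - origin) for x, f in zip(values, freqs))

# Write every observation out as often as it occurs, then take the middle one.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
median = statistics.median(expanded)
mean = sum(expanded) / len(expanded)

print("median", median, deviation_sum(median))
print("mean", mean, deviation_sum(mean))
for origin in (1, 2, 3, 5, 7, 11):
    print(origin, deviation_sum(origin))
```

With these figures the median is 4 and no other origin gives a smaller deviation sum; the arithmetic mean gives a slightly larger one.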
7. The Method of Least Squares. To the student who is unacquainted with the differential calculus, the following descriptive argument, the basis of the principle of least squares, for determining the values of $m$ and $c$ which make

$$(mx_1+c-y_1)^2+(mx_2+c-y_2)^2+\ldots+(mx_n+c-y_n)^2\qquad\ldots(1)$$

a minimum, may prove instructive.

Let us call the above expression E and let us suppose that different values are given to $m$ while $c$ remains unchanged ; in that case E will vary with $m$, and we might imagine the different values obtained for E plotted against the corresponding values of $m$ giving a curve of some type. Such a curve may rise and fall in wave-like fashion as in the figure, resulting in maximum points like A and C, and minimum points like B, where we define a maximum point to be such that, as we move away from it along the curve, whether to left or right, the size of the ordinate (and therefore the value of E) decreases ; likewise, a minimum point is such that, as we move away from it, the ordinate (and therefore also E) increases. In the neighbourhood of such points it is clear that the size of the ordinate, such as A$a$ or B$b$, changes so slowly as to be practically stationary.

Suppose then that $m$ and $(m+\mu)$, $\mu$ being very small, are two values of $m$ respectively at and near a minimum position on the curve, i.e. a position like B corresponding to a minimum value for E. Since E near such a point does not differ appreciably from E at such a point, we may practically equate the two expressions obtained for E by substituting $(m+\mu)$ and $m$ respectively for $m$ in (1), thus

$$((m+\mu)x_1+c-y_1)^2+((m+\mu)x_2+c-y_2)^2+\ldots=(mx_1+c-y_1)^2+(mx_2+c-y_2)^2+\ldots$$
$$(mx_1+c-y_1+\mu x_1)^2+\ldots=(mx_1+c-y_1)^2+\ldots$$
$$[(mx_1+c-y_1)^2+2\mu x_1(mx_1+c-y_1)+\mu^2x_1^2]+\ldots=(mx_1+c-y_1)^2+\ldots$$

Thus

$$[2x_1(mx_1+c-y_1)+\mu x_1^2]+\ldots=0.$$

Now, the smaller we take $\mu$, the nearer to the truth does this result become. Hence, by making $\mu$ tend to zero, we are led to the strictly true relation

$$x_1(mx_1+c-y_1)+\ldots=0.$$

This is one of the equations in the text. To obtain the second, we keep $m$ constant and vary $c$.

Suppose $c$ and $(c+\gamma)$ are two values of $c$ at and near a minimum position on the curve ; then, equating the two corresponding values of E, we have as before

$$(mx_1+(c+\gamma)-y_1)^2+\ldots=(mx_1+c-y_1)^2+\ldots$$
$$(mx_1+c-y_1+\gamma)^2+\ldots=(mx_1+c-y_1)^2+\ldots$$
$$[(mx_1+c-y_1)^2+2\gamma(mx_1+c-y_1)+\gamma^2]+\ldots=(mx_1+c-y_1)^2+\ldots$$

Thus

$$[2(mx_1+c-y_1)+\gamma]+\ldots=0,$$

and, proceeding to the limit when $\gamma$ tends to zero, we reach the other equation in the text, namely,

$$(mx_1+c-y_1)+\ldots=0.$$
[The Method of Least Squares came first into prominence in
Astronomy in connection with the determination of the best value
to take when a number of observations, apparently equally reliable,
give results not quite in agreement. If, for instance, $x$ be the true value of some variable, and if $x_1, x_2, x_3 \ldots x_n$ be the results of $n$ observations, the method of least squares assumes $x$ to be given by making

$$y=(x-x_1)^2+(x-x_2)^2+\ldots+(x-x_n)^2$$

a minimum.

Now $\dfrac{dy}{dx}=2(x-x_1)+2(x-x_2)+\ldots+2(x-x_n)$, and this vanishes when

$$(x-x_1)+(x-x_2)+\ldots+(x-x_n)=0,$$

i.e. $$x=(x_1+x_2+\ldots+x_n)/n,$$

so that in this case we are led to the ordinary arithmetic mean of the $n$ observations as the best value.
The method was used by Gauss as early as 1795.]
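The two equations reached in this Note, $\Sigma x(mx+c-y)=0$ and $\Sigma(mx+c-y)=0$, can be solved directly for $m$ and $c$. The Python sketch below does so for four invented pairs of observations and then confirms that a small change in either $m$ or $c$ increases E; the function names are assumptions for the illustration.

```python
def fit_line(xs, ys):
    """Solve sum x(mx + c - y) = 0 and sum (mx + c - y) = 0 for m and c."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c

def squared_error(m, c, xs, ys):
    """The expression E of the Note: sum of (mx + c - y)^2."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

m, c = fit_line(xs, ys)
print(m, c)
print(squared_error(m, c, xs, ys))
print(squared_error(m + 0.01, c, xs, ys), squared_error(m, c + 0.01, xs, ys))
```

With these figures $m$ comes to about 1.94 and $c$ to about 0.15, and both perturbed values of E exceed the first.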
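8. To prove

$$\int_{-\infty}^{+\infty}e^{-x^2}\,dx=\sqrt{\pi}.$$

Let

$$I=\int_{-\infty}^{+\infty}e^{-x^2}\,dx;$$

thus, also,

$$I=\int_{-\infty}^{+\infty}e^{-y^2}\,dy;$$

therefore,

$$I^2=\int_{-\infty}^{+\infty}e^{-x^2}\,dx\int_{-\infty}^{+\infty}e^{-y^2}\,dy
=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}e^{-(x^2+y^2)}\,dx\,dy
=\int_{r=0}^{\infty}\int_{\theta=0}^{2\pi}e^{-r^2}\,r\,dr\,d\theta$$

(by changing to polar coordinates)

$$=\int_0^{\infty}e^{-r^2}\,r\,dr\int_0^{2\pi}d\theta
=\left[-\tfrac12e^{-r^2}\right]_0^{\infty}\left[\theta\right]_0^{2\pi}
=(\tfrac12)(2\pi).$$

Hence

$$I=\int_{-\infty}^{+\infty}e^{-x^2}\,dx=\sqrt{\pi}.$$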
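The result of this Note is readily checked by direct summation. The Python sketch below applies the trapezoidal rule to $e^{-x^2}$ between $-8$ and $+8$, outside which the ordinates are negligible; the limits, the number of strips and the function name are choices made here for illustration.

```python
import math

def gaussian_area(limit=8.0, steps=1600):
    """Trapezoidal estimate of the area under exp(-x^2) from -limit to +limit."""
    h = 2.0 * limit / steps
    total = 0.5 * (math.exp(-limit * limit) + math.exp(-limit * limit))
    for i in range(1, steps):
        x = -limit + i * h
        total += math.exp(-x * x)
    return total * h

area = gaussian_area()
print(area, math.sqrt(math.pi))
```

The two printed numbers should agree to many decimal places.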
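9. To prove : —

$$(1)\ \Gamma(n+1)=n\Gamma(n).\qquad(2)\ B(m,n)=\frac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)}.$$

(1)

$$\Gamma(n+1)=\int_0^{\infty}x^ne^{-x}\,dx=-\int_{x=0}^{\infty}x^n\,d(e^{-x})
=\left[-x^ne^{-x}\right]_0^{\infty}+n\int_0^{\infty}x^{n-1}e^{-x}\,dx=n\Gamma(n),$$

because the expression in square brackets vanishes at both limits.

(2)

$$\Gamma(m)\Gamma(n)=\int_0^{\infty}e^{-\xi}\xi^{m-1}\,d\xi\int_0^{\infty}e^{-\eta}\eta^{n-1}\,d\eta
=\int_0^{\infty}e^{-x^2}x^{2m-2}\,2x\,dx\int_0^{\infty}e^{-y^2}y^{2n-2}\,2y\,dy,$$

where $x^2=\xi$, $y^2=\eta$.

Hence

$$\Gamma(m)\Gamma(n)=4\int_0^{\infty}\int_0^{\infty}e^{-(x^2+y^2)}x^{2m-1}y^{2n-1}\,dx\,dy
=4\int_{r=0}^{\infty}\int_{\theta=0}^{\pi/2}e^{-r^2}r^{2m+2n-2}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\;r\,dr\,d\theta$$

(by changing to polar coordinates).

Thus

$$\Gamma(m)\Gamma(n)=\int_0^{\infty}e^{-r^2}r^{2(m+n)-2}\,2r\,dr\int_0^{\pi/2}2\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta
=\int_0^{\infty}e^{-\rho}\rho^{m+n-1}\,d\rho\int_0^1(1-k)^{m-1}k^{n-1}\,dk,$$

where $\rho=r^2$ and $k=\sin^2\theta$; therefore,

$$\Gamma(m)\Gamma(n)=\Gamma(m+n)\,B(n,m)=\Gamma(m+n)\,B(m,n)$$

by symmetry.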
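Both results may be tested numerically. The Python sketch below uses `math.gamma` from the standard library for $\Gamma$, and estimates $B(n,m)=\int_0^1k^{n-1}(1-k)^{m-1}\,dk$ by the mid-ordinate rule; the values of $m$ and $n$ and the function name `beta_numeric` are chosen here purely for illustration.

```python
import math

def beta_numeric(m, n, steps=100000):
    """Mid-ordinate estimate of the integral of k^(n-1) (1-k)^(m-1) from 0 to 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        k = (i + 0.5) * h
        total += k ** (n - 1) * (1 - k) ** (m - 1)
    return total * h

m, n = 2.5, 3.5                                   # hypothetical values
beta_from_gamma = math.gamma(m) * math.gamma(n) / math.gamma(m + n)
print(beta_numeric(m, n), beta_from_gamma)        # result (2)
print(math.gamma(n + 1), n * math.gamma(n))       # result (1)
```

Each printed pair should agree closely, the first to the accuracy of the summation and the second to rounding error.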
10. Elementary Method of Testing the Probability Integral Table.
The reader may find more satisfaction in using the probability
integral table if he tests for himself one or two of its results by
means of squared paper or in some other way.
We have seen that the probability of an error between 0 and $\xi$ is given by the expression

$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_0^{\xi}e^{-\xi^2/2\sigma^2}\,d\xi.$$

Put $\xi=\sqrt2\,\sigma x$ and this becomes

$$\frac{1}{\sqrt{\pi}}\int_0^{x}e^{-x^2}\,dx=\int_0^{x}e^{-x^2}\,dx\bigg/\int_{-\infty}^{+\infty}e^{-x^2}\,dx,\ \text{by Note (8)}$$

= area OBPN/area A'BA, in the figure.

Now the graph of $y=e^{-x^2}$ is drawn in fig. (40) of the text, and it is possible therefore to get an approximation to the above result for any value of $x$ by counting the number of small squares in that figure enclosed by the areas corresponding to OBPN and A'BA respectively. Each complete small square may be reckoned as 1, and each portion of a square may be reckoned as 1 if it exceeds half a square and as zero if it is less than half a square.

This gives, for example,

$$\frac{1}{\sqrt{\pi}}\int_0^{0.25}e^{-x^2}\,dx=98/707=0.139,$$

whereas the tables give 0.138.

For a value like $x=0.71$, count the squares in the usual way between curve, axes, and ordinate $x=0.70$ ; then add to the result one-fifth of the number of squares in the small slice of area between curve, axis, and ordinates $x=0.70$ and $x=0.75$. We get

$$\frac{1}{\sqrt{\pi}}\int_0^{0.71}e^{-x^2}\,dx=240/707=0.339$$

as compared with 0.342 from the tables.
These results are not unsatisfactory considering the rough nature
of the method followed to obtain them.
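A reader with a computer at hand may compare these figures with values obtained from the error function, since $\frac{1}{\sqrt\pi}\int_0^xe^{-x^2}dx=\tfrac12\,\mathrm{erf}(x)$. The Python sketch below uses `math.erf` from the standard library; the function name `probability_integral` is an assumption made for the illustration.

```python
import math

def probability_integral(x):
    """(1 / sqrt(pi)) times the integral of exp(-t^2) from 0 to x."""
    return 0.5 * math.erf(x)

for x, counted in ((0.25, 98 / 707), (0.71, 240 / 707)):
    print(x, round(probability_integral(x), 3), round(counted, 3))
```

The first column of results reproduces the tabular values 0.138 and 0.342 quoted above; the second gives the values found by counting squares.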
11. Bravais' Law of Frequency in the case of two Correlated Variables with certain Deductions therefrom — [based on Professor Karl Pearson's memoir, Regression, Heredity and Panmixia (Phil. Trans., vol. 187a, pp. 253-318)].

Consider two variables whose deviations, $x$ and $y$, from their respective means are due to a number of independent causes, the deviations in which from their means can be quantitatively denoted by $\epsilon_1, \epsilon_2, \ldots \epsilon_m$.

We assume that each $\epsilon$ deviation is so small compared to the mean value from which it is measured that $x$ and $y$ can be sensibly expressed as linear functions, thus

$$x=a_1\epsilon_1+a_2\epsilon_2+\ldots+a_m\epsilon_m\qquad\ldots(1)$$
$$y=b_1\epsilon_1+b_2\epsilon_2+\ldots+b_m\epsilon_m\qquad\ldots(2)$$

(Some of the $a$'s and $b$'s may be zero, and if $x$ only involved, say, $\epsilon_1, \epsilon_2 \ldots \epsilon_k$, and $y$ only involved $\epsilon_{k+1} \ldots \epsilon_m$, then it would be natural to expect no correlation between $x$ and $y$.)

We further assume that each $\epsilon$ varies according to the normal law with S.D. $\sigma$ with appropriate suffix.

Equations (1) and (2) show that the same $x$ and $y$ may arise in a multitude of different ways obtained by varying the $\epsilon$'s so that their weighted sums (the $a$'s and $b$'s being the weights) remain unaltered. The probability that the particular deviations lying between

$$\epsilon_1\ \text{and}\ (\epsilon_1+\delta\epsilon_1),\quad\epsilon_2\ \text{and}\ (\epsilon_2+\delta\epsilon_2),\quad\ldots\quad\epsilon_m\ \text{and}\ (\epsilon_m+\delta\epsilon_m)$$

shall concur, since they are all independent, is

$$z=\left(\frac{1}{\sigma_1\sqrt{2\pi}}\,e^{-\epsilon_1^2/2\sigma_1^2}\,\delta\epsilon_1\right)\ldots\left(\frac{1}{\sigma_m\sqrt{2\pi}}\,e^{-\epsilon_m^2/2\sigma_m^2}\,\delta\epsilon_m\right).$$
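But, writing

$$a_3\epsilon_3+\ldots+a_m\epsilon_m=\alpha,\qquad b_3\epsilon_3+\ldots+b_m\epsilon_m=\beta,$$

equations (1) and (2) become

$$a_1\epsilon_1+a_2\epsilon_2+(\alpha-x)=0,$$
$$b_1\epsilon_1+b_2\epsilon_2+(\beta-y)=0.$$

Therefore

$$\frac{\epsilon_1}{a_2(\beta-y)-b_2(\alpha-x)}=\frac{\epsilon_2}{b_1(\alpha-x)-a_1(\beta-y)}=\frac{1}{a_1b_2-a_2b_1}.$$

And, for any function $v$,

$$\iint v\,dx\,dy=\iint v\left(\frac{\partial x}{\partial\epsilon_1}\frac{\partial y}{\partial\epsilon_2}-\frac{\partial x}{\partial\epsilon_2}\frac{\partial y}{\partial\epsilon_1}\right)d\epsilon_1\,d\epsilon_2=(a_1b_2-a_2b_1)\iint v\,d\epsilon_1\,d\epsilon_2.$$

Hence

$$z=\frac{\delta x\,\delta y}{(2\pi)^{m/2}\,\sigma_1\sigma_2\ldots\sigma_m\,(a_1b_2-a_2b_1)}\;e^{-\frac{[a_2(\beta-y)-b_2(\alpha-x)]^2}{2\sigma_1^2(a_1b_2-a_2b_1)^2}-\frac{[b_1(\alpha-x)-a_1(\beta-y)]^2}{2\sigma_2^2(a_1b_2-a_2b_1)^2}-\left(\frac{\epsilon_3^2}{2\sigma_3^2}+\ldots+\frac{\epsilon_m^2}{2\sigma_m^2}\right)}\;\delta\epsilon_3\ldots\delta\epsilon_m.$$

The total probability for deviations between $x$ and $(x+\delta x)$ and $y$ and $(y+\delta y)$ is obtained by integrating $z$ between limits $-\infty$ and $+\infty$ for all the $\epsilon$'s from $\epsilon_3$ to $\epsilon_m$, and it is not very difficult to see that this will ultimately lead to an expression of the form

$$C\,.\,\delta x\,\delta y\,.\,e^{-(ax^2+2hxy+by^2)}.$$

This is Bravais' Law of Frequency.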
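To find the meanings of the constants $a$, $b$, $h$. The total probability for a deviation between $x$ and $(x+\delta x)$ associated with any deviation $y$ is

$$=C\,\delta x\int_{-\infty}^{+\infty}e^{-(ax^2+2hxy+by^2)}\,dy=C\sqrt{\frac{\pi}{b}}\;e^{-\frac{(ab-h^2)}{b}x^2}\,\delta x.$$

But if $x$ be subject to the normal law, the probability for a deviation between $x$ and $(x+\delta x)$ is

$$\frac{1}{\sqrt{2\pi}\,.\,\sigma_x}\,e^{-x^2/2\sigma_x^2}\,\delta x,$$

where $\sigma_x$ is the S.D. of $x$ independent of $y$.

Comparing these two results, we have

$$1/2\sigma_x^2=(ab-h^2)/b=a(1-r^2),$$

if $r=-h/\sqrt{ab}$.

Similarly, $1/2\sigma_y^2=(ab-h^2)/a=b(1-r^2)$,

so that $h=-r\sqrt{ab}=-r/2\sigma_x\sigma_y(1-r^2)$.

Again, we may integrate $z$ for all values of $x$ and $y$, and so get the total frequency, N, of the $(x, y)$ pair.

$$N=C\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}e^{-(ax^2+2hxy+by^2)}\,dx\,dy
=C\sqrt{\frac{\pi}{b}}\int_{-\infty}^{+\infty}e^{-\frac{(ab-h^2)}{b}x^2}\,dx
=C\sqrt{\frac{\pi}{b}}\,\sqrt{\frac{\pi b}{ab-h^2}}.$$

Hence

$$C=\frac{N\sqrt{ab-h^2}}{\pi}=\frac{N}{\pi}\sqrt{[ab(1-r^2)]}=\frac{N}{2\pi\sigma_x\sigma_y\sqrt{(1-r^2)}}.$$

Thus

$$z=C\,e^{-\frac{1}{2(1-r^2)}\left[\frac{x^2}{\sigma_x^2}-\frac{2rxy}{\sigma_x\sigma_y}+\frac{y^2}{\sigma_y^2}\right]},$$

where C has the above value.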
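It still remains to interpret $r$ and to see that it is really the coefficient of correlation as defined in Chapter x. For this purpose let us suppose we have observed $n$ pairs of associated $x$'s and $y$'s, namely

$$(x_1y_1),\ (x_2y_2)\ \ldots\ (x_ny_n).$$

The probability for such a concurrence, taken along with a given value for $r$ and assuming the observations independent, is proportional to

$$\frac{1}{\sqrt{(1-r^2)}}\,e^{-\frac{1}{2(1-r^2)}\left[\frac{x_1^2}{\sigma_x^2}-\frac{2rx_1y_1}{\sigma_x\sigma_y}+\frac{y_1^2}{\sigma_y^2}\right]}\times\ldots\times\frac{1}{\sqrt{(1-r^2)}}\,e^{-\frac{1}{2(1-r^2)}\left[\frac{x_n^2}{\sigma_x^2}-\frac{2rx_ny_n}{\sigma_x\sigma_y}+\frac{y_n^2}{\sigma_y^2}\right]}$$

$$=\frac{1}{(1-r^2)^{n/2}}\,e^{-\frac{1}{2(1-r^2)}\left[\frac{\Sigma x^2}{\sigma_x^2}-\frac{2r\Sigma xy}{\sigma_x\sigma_y}+\frac{\Sigma y^2}{\sigma_y^2}\right]}$$

$$=(1-r^2)^{-n/2}\,e^{-\frac{n(1-\kappa r)}{1-r^2}},\quad\text{where }\kappa=\Sigma xy/n\sigma_x\sigma_y,$$

$$=e^{-n\left[\frac12\log(1-r^2)+\frac{1-\kappa r}{1-r^2}\right]}.$$

Now the probability of this particular distribution is greatest when

$$\tfrac12\log(1-r^2)+\frac{1-\kappa r}{1-r^2}$$

is least, and, differentiating with respect to $r$, this leads to

$$-\frac{r}{1-r^2}+\frac{(1-r^2)(-\kappa)+2r(1-\kappa r)}{(1-r^2)^2}=0,$$

i.e. $$-r(1-r^2)-\kappa(1-r^2)+2r(1-\kappa r)=0,$$

i.e. $$-r+r^3-\kappa+\kappa r^2+2r-2\kappa r^2=0,$$

i.e. $$(r-\kappa)(1+r^2)=0.$$

It is not difficult to show that $r=\kappa$ gives a minimum ; hence the required probability is a maximum and we get the best value for the coefficient by taking

$$r=\Sigma xy/n\sigma_x\sigma_y.$$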
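The conclusion that the expression $\tfrac12\log(1-r^2)+\dfrac{1-\kappa r}{1-r^2}$ is least when $r=\kappa$ can be confirmed by trial. The Python sketch below takes an assumed value $\kappa=0.6$ and searches a grid of values of $r$ between $-0.999$ and $+0.999$ for the one giving the least value; the names `exponent` and `best_r` are invented for the illustration.

```python
import math

def exponent(r, kappa):
    """The expression to be made least: (1/2) log(1 - r^2) + (1 - kappa r) / (1 - r^2)."""
    return 0.5 * math.log(1 - r * r) + (1 - kappa * r) / (1 - r * r)

kappa = 0.6                                   # hypothetical value of sum(xy) / (n sx sy)
grid = [i / 1000 for i in range(-999, 1000)]  # trial values of r
best_r = min(grid, key=lambda r: exponent(r, kappa))
print(best_r)
```

The value printed is 0.6, the assumed $\kappa$; trying other values of $\kappa$ between $-1$ and $+1$ gives the corresponding result.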
CERTAIN CURRENT SOURCES OF SOCIAL STATISTICS
Any one who is anxious to get reliable figures bearing upon some social matter is somewhat at a pause unless he is thoroughly conversant with all the statistical ramifications of Government authorities, local and national, of trade unions, friendly societies, and hosts of other bodies of a public or semi-public character.
While recognizing the lavish outpouring of statistics of all kinds
upon a multitude of diverse topics every year, and appreciating the
immense care and patience shown by those who are responsible for
their collection and preparation, one cannot but deplore the lack
of any coordinating principle in general between one body and
another either in deciding what statistics shall be collected, by
whom and when they shall be collected, or how afterwards they
shall be tabulated and presented to the public. Too often a narrow-minded
jealousy prevents one authority from consulting with
another, and such cooperation as does exist is due largely to the
efforts of able and enlightened individuals. The result is that a
vast amount of labour and expense goes waste and the loss to the
public is incalculable, but the public do not care, and they do not
care because they do not know.
At present, to quote from an influential petition on the subject
recently presented to His Majesty's Government, ' It is almost
universally the case that any serious investigation is reduced to
roughly approximate estimates in relation to some factor which is
essential for its result. ... It is not too much to say that there is
hardly any reform, financial, social, or commercial, for which adequate
information can be provided with our present machinery.' But
this state of things would be partly remedied by adequate control
such as might be secured by the establishment of a central statistical
office with a minister in charge who should be responsible for
unification so far as possible in the collection, tabulation, and issue
of all public statistics.
It is scarcely possible for a single private individual to make
a quantitative investigation of any social question on a large enough
scale to produce results of real value ; conspicuous instances like
Booth and Rowntree might seem to be exceptions to this rule, but
even they had a number of workers acting under their direction,
without whose aid their task would have seemed almost hopeless.
For such statistics as we have we are therefore dependent upon
Government departments, local authorities, public officials, trade
associations representing employers or labour, public companies,
and so on. The reader who wishes to get some idea of the extent
and the limitations of official British statistics is referred to the
admirable introductory chapters of Bowley's Elements of Statistics.
Here we cannot do more than mention a very few of the most
important sources whence such statistics are derived.
The most voluminous of all our records is probably the Census
of the Population which is taken every ten years. Its scope is but
faintly realized by enumerating the chief subjects on which the
Registrar General asked information from each householder in 1911,
namely :
(1) Numbers and Geographical Distribution of the Population.
(2) Nationality and Birthplace.
(3) Numbers at Different Ages, Male and Female.
(4) Numbers Single, Married, and Widowed.
(5) Sizes of Families, including Children Dead.
(6) Numbers engaged in different Professions and Occupations.
(7) Numbers Blind, Deaf, Dumb, not in their Right Mind.
(8) Numbers occupying Dwellings of Different Sizes as measured
by the Number of Rooms.
This may seem an ambitious scheme when it is stated that the
mere enumeration of the people was successfully opposed less than
two hundred years ago as ' subversive of the last remains of English
liberty and likely to result in some public misfortune or an epidemical
disorder,' and the first census was only taken in 1801. [See
Article in the Encyclopaedia Britannica on the subject.]
The results of each census are published in bulky volumes as
soon as they can be reduced and tabulated, a process which, of
course, takes a considerable time even for an army of workers
with calculating machines and every modern device to facilitate
their progress. It is to be regretted that more is not done to
advertise so valuable a record of work by publication in a cheap
and attractive form of a summary of matters which vitally affect
the good of the commonwealth. As it is, the census volumes tend
to be purchased only by public authorities and officials who require
to use them occasionally as books of reference.
Neglect of the blandishments of advertisement — to be commended
in general because such neglect is somehow associated with the
presentation of all truth — may be perhaps carried too far in the
issue of statistics.
It will be noted that in the periodical census no mention is made
of wages though the people are classified as regards occupation,
and for information upon this point we must turn to another source.
The last general census of wages was taken in 1906, following
and improving upon an earlier inquiry twenty years before, but,
in connection with an inquiry by the Board of Trade into the cost
of living of the working classes, information was collected as to
rates of wages in 1912 of workpeople in certain occupations in the
building, engineering, and printing trades, these being selected as
industries common to most towns, and because the time rates of
wages paid in them are largely standardized.
The 1906 inquiry into earnings and hours of labour, unlike the
decennial census, was conducted on a voluntary basis and was
never wholly completed. In brief it set out to discover from
employers : —
(1) The Numbers of Working-people Employed in Various
Occupations, distinguishing Men, Women, Lads, and Girls.
(2) The Nature of the Work done and the Rates of Wages Paid,
distinguishing Time Rates from Piece Rates.
(3) The Hours Worked, distinguishing Under or Overtime from
Normal Time.
The ground actually covered by the inquiry embraces the following
trades: Textiles, Clothing, Building and Woodworking, Public
Utility Services, Metal, Engineering, and Shipbuilding (in 1906);
also Agriculture, and Railway Service (in 1907); the reports upon
these trades were published separately at different dates between
1909 and 1912, and the following trades were bulked together in
one volume, published in 1913: Paper and Printing; Pottery,
Brick, Glass, and Chemicals ; Food, Drink, and Tobacco ; and
Miscellaneous Trades.
The Cost of Living Inquiry of 1912 was in continuation of a
similar inquiry in 1905, which in addition compared conditions in
the United Kingdom and certain foreign countries. It dealt not
only with wages but also with rents and retail prices.
The report states that ' particulars as to the rent and accommodation
of typical working-class dwellings were obtained from
officials of local authorities, surveyors of taxes, house owners and
agents, and by house-to-house inquiry.' Also ' returns of the
prices most generally paid by working-class customers for a number
of specified commodities were obtained in each town by personal
inquiry from a number of retailers engaged in working-class trade.'
Since then Lord Sumner's Committee and a Committee of the
Agricultural Wages Board have examined the change in the cost of
living between 1914 and 1919, as evidenced by a number of household
budgets collected from among urban working classes and
workers in rural districts respectively.
One other highly important inquiry carried out by the Board of
Trade deserves notice, namely, the First Census of Production of the
United Kingdom (1907).
The published report shows : —
(1) The total Net Output in Money Value for each Trade Group
in each Industry.
(2) The Number of Persons Employed in each Trade Group
(salaried persons and wage-earners exclusive of outworkers).
(3) The Net Output per Person Employed in each Trade Group
as deduced from (1) and (2).
(4) The Horsepower of Engines in Mines, Quarries, or Factories
Employed in each Trade Group.
It is explained that the term ' net output ' here represents the
value of the aggregate output of the factories, etc., from which
returns were received in each trade group, after deducting the cost
of materials purchased from factories, etc., not included in the
group, or supplied by merchants or others not making returns to
the Census of Production Office.
Valuable as the results of these inquiries undoubtedly are, they
would be of still more value were it only possible satisfactorily to
collate the various returns of population, wages, and production.
No record of wages was included, for example, in the Census of
Production statistics, and it is quite impossible to deduce the number
of wage-earners and those dependent upon them in any trade at
any given time.
Apart, however, from such special inquiries as we have instanced,
and the ten-yearly census of the people, there are other periodical
records issued which provide us with valuable information. The
Ministry of Labour, until recently a special branch of the Board
of Trade, charged with the duty of keeping in touch with labour
conditions, issues each month a Labour Gazette giving particulars
relating to the state of employment in the principal trades in the
United Kingdom based on returns from employers, trade unions,
and employment exchanges, besides information concerning trade
disputes, changes in wages and hours, the course of prices, railway
traffic receipts, foreign trade, etc. The Board of Trade also publishes
weekly a Journal and Commercial Gazette dealing with matters
of interest to all who are engaged in commerce or finance ; while a
Monthly Bulletin of Statistics of production, trade, finance, employment,
etc., at present issued under the name of the Supreme
Economic Council, is an important recent addition to our knowledge
of international statistics.
Again the Registrar General makes a quarterly return and annual
summary of births, marriages, and deaths in the different counties
of England and Wales, and of births, deaths, and infectious diseases
in certain large towns. In each public health area the medical officer
reports periodically upon the hygienic condition of the district and
the health of the people under his care. The Board of Education
is answerable for conditions in the schools, and the Home Office
in factories and prisons ; they report from time to time. The
Ministry of Health similarly issues returns relating to pauperism
and to housing, while the Board of Agriculture and Fisheries registers
the acreage under crops and the number of live stock in the United
Kingdom, and the Commissioners of Customs record the expansion
or contraction of foreign trade.
In addition we have the endless accounts and statistics supplied,
some voluntarily and some compulsorily, by municipal bodies,
public companies, banks, trade associations, cooperative societies,
insurance companies, trade unions, etc.
And yet, in spite of all this wealth of statistics, some surprising
gaps occur, as we have already seen, in important particulars
which cannot be traced. We shall quote only one more instance
of such a hiatus: the income-tax returns provide a basis for measuring
that part of the national income which is subject to taxation,
some idea also can be formed of what the wage-earners receive,
but as to the earnings of the portion of the community falling in
between these two classes we are entirely ignorant. It is possible
that war conditions during the years 1914-19 may have vastly
increased the knowledge of the Government as to some matters
such as internal resources and inland trade, of which little was
known before, but, if so, the public, whom it concerns so closely,
have not yet been permitted fully to share in this advantage.
For an excellent summary of labour statistics compiled or collected
by the Government the reader is recommended to consult
the Annual Abstract of Labour Statistics of the United Kingdom,
published in the past by the Labour Department of the Board of
Trade.
A NOTE ON TABLES TO AID CALCULATION
The short tables which follow are only inserted as specimens, as
it is expected that the reader who wishes to make extensive use
of such tables will have access to the fuller ones to which reference
is made below.
[Fig. (55).]
Probability Integral Table, giving area of curve
$z = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}$ in terms of corresponding
abscissa, see fig. (55):

    x       ½(1 + α)      α
   .00       .50000     .00000
   .10       .53983     .07966
   .20       .57926     .15852
   .30       .61791     .23582
   .40       .65542     .31084
   .45       .67364     .34728
   .50       .69146     .38292
   .55       .70884     .41768
   .60       .72575     .45150
   .65       .74215     .48430
   .70       .75804     .51608
   .71       .76115     .52230
   .72       .76424     .52848
   .73       .76730     .53460
   .74       .77035     .54070
   .75       .77337     .54674
   .76       .77637     .55274
   .77       .77935     .55870
   .78       .78230     .56460
   .79       .78524     .57048
   .80       .78814     .57628
   .85       .80234     .60468
   .90       .81594     .63188
   .95       .82894     .65788
  1.00       .84134     .68268
  1.05       .85314     .70628
  1.10       .86433     .72866
  1.50       .93319     .86638
  2.00       .97725     .95450
  2.50       .99379     .98758
  3.00       .99865     .99730
  3.50       .99977     .99954
Fig. (56), the result of plotting α against the abscissa, enables us to estimate
the probability of an error lying between any two limits.
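The two tabulated columns can be recomputed from the error function, which gives a way of filling in abscissae the specimen table omits. The sketch below is a minimal illustration using Python's standard `math.erf`; the function names are assumptions made here, and the abscissa is taken to be measured in units of the standard deviation, as the curve quoted in the table heading suggests.

```python
import math

def half_one_plus_alpha(x):
    """Area under the normal curve to the left of abscissa x, i.e. (1 + alpha) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def alpha(x):
    """Area under the normal curve between -x and +x."""
    return math.erf(x / math.sqrt(2.0))

def prob_between(lower, upper):
    """Probability of an error lying between two limits (in standard-deviation units)."""
    return half_one_plus_alpha(upper) - half_one_plus_alpha(lower)

# Recompute a few rows for comparison with the printed table.
rows = [(x, half_one_plus_alpha(x), alpha(x)) for x in (0.50, 1.00, 2.00)]
```

The printed α column appears to have been formed as twice the first column less one, so its last figure can differ by a unit from a direct five-place evaluation (for example at abscissa 1.00).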
Table giving P, to test ' goodness of fit,' corresponding to certain
values of n' and χ²:

  n'   χ²=4      5       6       7       8       9      10      11      12      13      14      15
   7  .67668  .54381  .42319  .32085  .23810  .17358  .12465  .08838  .06197  .04304  .02964  .02026
   8  .77978  .65996  .53975  .42888  .33259  .25266  .18857  .13862  .10056  .07211  .05118  .03600
   9  .85712  .75758  .64723  .53663  .43347  .34230  .26503  .20170  .15120  .11185  .08176  .05914
  10  .91141  .83431  .73992  .63712  .53415  .43727  .35048  .27571  .21331  .16261  .12232  .09094
  11  .94735  .89118  .81526  .72544  .62884  .53210  .44049  .35752  .28506  .22367  .17299  .13206
  12  .96992  .93117  .87336  .79907  .71330  .62189  .53039  .44326  .36264  .29333  .23299  .18250
  13  .98344  .95798  .91608  .85761  .78513  .70293  .61596  .52892  .44568  .36904  .30071  .24144
  14  .99119  .97519  .94615  .90215  .84360  .77294  .69393  .61082  .52764  .44781  .37384  .30735
  15  .99547  .98581  .96649  .93471  .88933  .83105  .76218  .68604  .60630  .52652  .44971  .37815
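Entries of this kind can be recomputed with elementary functions, which is useful for checking a doubtful figure or for values of n' and χ² outside the specimen. The sketch below assumes that P is the chance of a value of χ² at least as great as that observed, with n' − 1 degrees of freedom; this reading reproduces the printed entries tested here, but the function name and the choice of the finite-series formulae are assumptions of this illustration and not taken from the text.

```python
import math

def goodness_of_fit_p(n_groups, chi_sq):
    """P that chi-square exceeds chi_sq, taking n_groups - 1 degrees of freedom."""
    df = n_groups - 1
    half = chi_sq / 2.0
    if df % 2 == 0:
        # Even degrees of freedom: exp(-half) * sum_{k < df/2} half^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd degrees of freedom: normal tail plus a finite series in odd powers of chi.
    chi = math.sqrt(chi_sq)
    tail = math.erfc(chi / math.sqrt(2.0))  # equals 2 * (1 - Phi(chi))
    term, total = chi, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term                   # adds chi^(2k-1) / (1 * 3 * ... * (2k-1))
        term *= chi_sq / (2 * k + 1)    # prepares the next odd-power term
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

# Recompute one even and one odd case for comparison with the table.
p_even = goodness_of_fit_p(7, 4)   # 6 degrees of freedom
p_odd = goodness_of_fit_p(8, 4)    # 7 degrees of freedom
```

Both branches use only finite sums, so no numerical integration is needed for whole-number n'.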
[Fig. (56).]
One of the earliest tables of the probability integral appeared in
Kramp's Analyse des Refractions (Strasbourg, 1798), where the
calculation of $\int e^{-x^2}\,dx$ was given to eight places from x = 0 to x = 3
at intervals of 0.01. Tables more recent and extensive are those
due to J. Burgess (Trans. Roy. Soc. Edin., 1900) and to W. F.
Sheppard (Biometrika, vol. ii., pp. 174-190). Of these the latter
is reproduced in the admirable Tables for Statisticians and Biometricians,
edited by Karl Pearson (Camb. Univ. Press, 1914), and
the same volume also contains Palin Elderton's P Tables for testing
' goodness of fit ' which first appeared in Biometrika, vol. i., and
Duffell's Tables of the Logarithms of the Γ Function from Biometrika,
vol. vii., besides a large number of other valuable tables.
It should be remarked in connection with the last-named table that
the formula $\Gamma(x+1) = x\,\Gamma(x)$ enables us to reduce the calculation
of any Γ function to one in which x lies between 1 and 2, by repeated
applications of the logarithmic relation, thus
$$
\log\Gamma(x+1) = \log x + \log\Gamma(x)
= \log x + \log(x-1) + \log\Gamma(x-1),
$$
and so on. When x is large, however, say greater than 10, the
well-known approximate formula
(see, for instance, Whittaker's Analysis, § 110) will be found useful,
and it may also be written
$$
\log\frac{\Gamma(x+1)}{x^{x}e^{-x}} = 0.3990899 + \frac{0.0361912}{x} + \tfrac{1}{2}\log x,
$$
a form often convenient.
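Both devices can be sketched in a few lines. The illustration below reduces log Γ(x) to an argument between 1 and 2 by the relation Γ(x + 1) = xΓ(x), and evaluates the large-x approximation on the assumption that it reads log₁₀ Γ(x + 1) = 0.3990899 + 0.0361912/x + (x + ½) log₁₀ x − x log₁₀ e, the two constants being log₁₀ √(2π) and (log₁₀ e)/12. The function names are hypothetical, and Python's `math.lgamma` stands in for the printed table of log Γ between 1 and 2.

```python
import math

LOG10_SQRT_2PI = 0.3990899    # log10 of sqrt(2*pi)
LOG10_E_OVER_12 = 0.0361912   # log10(e) / 12
LOG10_E = 0.4342945           # log10(e)

def log10_gamma_by_reduction(x):
    """log10 Gamma(x) for x >= 1, reduced to an argument in [1, 2)."""
    total = 0.0
    while x >= 2.0:
        x -= 1.0
        total += math.log10(x)  # one application of Gamma(x + 1) = x * Gamma(x)
    # math.lgamma (natural log) replaces a table look-up on [1, 2).
    return total + math.lgamma(x) / math.log(10.0)

def log10_gamma_stirling(x):
    """Approximate log10 Gamma(x + 1) for large x (assumed reading of the formula)."""
    return (LOG10_SQRT_2PI + LOG10_E_OVER_12 / x
            + (x + 0.5) * math.log10(x) - x * LOG10_E)

exact = math.log10(math.factorial(10))        # Gamma(11) = 10!
by_reduction = log10_gamma_by_reduction(11.0)
by_stirling = log10_gamma_stirling(10.0)
```

At x = 10 the approximation agrees with the exact logarithm of 10! to about the sixth decimal place, and the agreement improves as x grows.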
It may be of service to record here the values of a few constants
which frequently recur for speedy reference :
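$e = 2.718\,2818$
$\pi = 3.141\,5926$
$\log_{10} 2 = 0.301\,0300$
$\dfrac{1}{e} = 0.367\,8794$
$\log_{10} \pi = 0.497\,1499$
$\log_{10} 3 = 0.477\,1213$
$\log_{10} e = 0.434\,2945$
$\log_{10}(\log_{10} e) = \bar{1}.637\,7843$
$\log_{10} \dfrac{1}{\sqrt{2\pi}} = \bar{1}.600\,9101$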
The statistician who has Pearson's Tables, Barlow's Tables of
Squares, etc., together with a good set of Tables of Logarithms
(unless he is so fortunate as to have a mechanical calculator, for
instance a Brunsviga, at his disposal) and of Trigonometrical
Functions such as Chambers's Seven-Figure Tables, may consider
himself amply provided for serious research and decidedly better
off than his predecessors who prepared the way for him by doing
great work with much poorer tools.
Edinburgh: Printed by T. and A. Constable Ltd.