DEPARTMENT OF APPLIED STATISTICS
UNIVERSITY COLLEGE, UNIVERSITY OF LONDON

DRAPERS' COMPANY RESEARCH MEMOIRS
BIOMETRIC SERIES. VIII.

MATHEMATICAL CONTRIBUTIONS TO THE THEORY OF EVOLUTION. XVIII.

ON A NOVEL METHOD OF REGARDING THE ASSOCIATION OF TWO VARIATES CLASSED SOLELY IN ALTERNATE CATEGORIES

BY KARL PEARSON, F.R.S.

WITH TWO ABACS CONSTRUCTED BY G. H. SOPER, M.A.

Published by the Cambridge University Press, Fetter Lane, E.C. 4
1912

The original of this book is in the Cornell University Library. There are no known copyright restrictions in the United States on the use of the text.
http://www.archive.org/details/cu31924013993179

ON A NOVEL METHOD OF REGARDING THE ASSOCIATION OF TWO VARIATES CLASSED SOLELY IN ALTERNATIVE CATEGORIES.

By Karl Pearson, F.R.S.

In a memoir published twelve years ago in the Phil. Trans. I have shewn that in the case of the fourfold table for the correlation of two variates, i.e.

              A_1      A_2      Totals
    B_1       a        b        a+b
    B_2       c        d        c+d
    Totals    a+c      b+d      N

the correlation between the means of the two variates, when each is measured in terms of its standard deviation, is*

\[ r_{hk} = \frac{ad - bc}{\sqrt{(b+d)(a+c)(c+d)(a+b)}} \qquad \text{(i)}. \]

This correlation naturally vanishes with the transfer, i.e. $\epsilon = (ad - bc)/N$, or if the two variates are absolutely independent. Further if $r_{xy}$ be the correlation between the two variates x and y concerned, $r_{xy}$ must of course vanish with $r_{hk}$, but it is very far from equal to it, or proportional to it, as has been apparently assumed by certain recent writers on correlation, the multiplying factor varying with the values of h and k, i.e. with the positions where the dividing classifications are made. The above statements depend upon the assumption that the distribution of frequency is normal or Gaussian in character.
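As a numerical aside, formula (i) can be evaluated directly from the four cell counts. The sketch below is an editorial illustration only and not part of the memoir; the function name `r_hk` is hypothetical, and the counts used are those of the stature and head-breadth table quoted later in the memoir as Illustration III.

```python
import math

def r_hk(a, b, c, d):
    """Formula (i): (ad - bc) / sqrt((b+d)(a+c)(c+d)(a+b))."""
    numerator = a * d - b * c
    denominator = math.sqrt((b + d) * (a + c) * (c + d) * (a + b))
    return numerator / denominator

# Cell counts a, b, c, d of the Illustration III table (N = 3000).
r = r_hk(455, 622, 599, 1324)
print(round(r, 4))
```

A table whose cells are exactly proportional to its margins, such as 10, 20, 30, 60, gives zero, which is the vanishing of the "transfer" referred to in the text.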
Quite apart from any assumption as to the nature of the distribution, I have shewn that the mean square contingency† $\phi^2$ of a fourfold table is

\[ \phi^2 = \frac{(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = r_{hk}^2 \qquad \text{(ii)}, \]

and that, if we take $\chi^2 = N\phi^2$, we can from the general theory of the deviations from the probable in a correlated system of variables‡ reach a quantity P giving the probability that the system is really a random sample from material in which the two variates are independent.

* Phil. Trans. Vol. 195 A, p. 12, 1900. It is well to take as our standard arrangement of the table one in which a+b > c+d and a+c > b+d.
† "On the Theory of Contingency and its relation to Association and Normal Correlation," Drapers' Company Research Memoirs, I. p. 21.
‡ Phil. Mag. July 1900, pp. 157–75.

Tables for finding P from $\chi^2$ have been calculated by Mr Palin Elderton, the well-known actuary*. By determining the value of P, we are always in a position to ascertain the improbability of independence, or 1 − P is a proper measure of the grade of relationship. Unfortunately we do not think in millions, and to say that P = 718/10^[exponent illegible] gives us a very poor mental estimate of the interrelationship of two quantities, compared with the simple statement that their coefficient of correlation is .60. We do not think on such an extended scale of figures as the improbability scale provides us with, and we are bound to ask ourselves whether it is not possible to translate it into the simpler ideas of correlation.

We might ask: what is the probability P′ that in a population of N individuals an observed correlation r has arisen not from real association but from random sampling? We should then reduce our correlation to a probability scale. There is no difficulty about such a process at all; it depends solely on the distribution of frequency of r in random sampling.
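The passage from a fourfold table to $\chi^2 = N\phi^2$ and thence to P can be sketched numerically. The following is an editorial illustration, not the memoir's own procedure (which reads P from Elderton's table): it assumes, as the memoir states further on of its Table II, that P is taken for n′ = 4 groups, i.e. the upper tail of the chi-square distribution with three degrees of freedom, for which a closed form exists. This is not the one-degree-of-freedom test now customary for a fourfold table. The function names are hypothetical.

```python
import math

def chi_square_fourfold(a, b, c, d):
    """chi^2 = N * phi^2, with phi^2 as in formula (ii)."""
    n = a + b + c + d
    phi2 = (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return n * phi2

def p_independence(chi2):
    """Upper tail of chi-square with 3 degrees of freedom (n' = 4 groups).

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    In double precision this underflows to 0.0 for chi2 above roughly 1400.
    """
    half = chi2 / 2.0
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * chi2 / math.pi) * math.exp(-half)

# Values of chi^2 quoted later in the memoir (Illustrations IV and IX).
print(p_independence(3.12))
print(-math.log10(p_independence(1000.0)))
```

The two printed quantities may be compared with the figures the memoir itself quotes for these values of $\chi^2$, namely P = .3733 and −log P = 215.745.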
By simply equating the above value of P to P′, we could then determine on a correlation scale, that is on a scale readily appreciable, the improbability of a given deviation being due to random sampling and not to true association. We should say it is as unreasonable (or as reasonable) to suppose this contingency has arisen by random sampling in a population of N individuals as to suppose that a correlation coefficient of magnitude r could arise solely from random sampling. Thus r would not be used in any way to represent features of linear or other regression lines, but solely as an artifice for transferring to an adequate mental scale improbabilities often sensible only in the 30th or 40th decimal place.

Now the improbability of r, arising from a random sampling of material having its variates unassociated, depends on the size of the standard deviation of r, and this size depends on the method by which r is determined. It is not the same when found from (i) a product moment table assumed to represent a Gaussian frequency†, or (ii) from a fourfold table representing the same frequency divided at its means‡, or again (iii) from a fourfold table of Gaussian frequency divided very far from its means§, or lastly (iv) from a product moment table for a frequency which is very far from Gaussian‖. Hence to obtain a scale of correlations by which to represent contingency improbabilities, we must select the nature of the method by which r is supposed to be reached as well as the size of the population. It will not do to say that $.67449\,(1 - r^2)/\sqrt{N}$ or, for zero real association, $.67449/\sqrt{N}$, is the probable error of r, because this probable error depends on the determination of r by a method which is never applicable to a fourfold table.

* Biometrika, Vol. I. p. 155.
† Pearson and Filon, Phil. Trans. Vol. 191 A, p. 242, 1898.
‡ Sheppard, Phil. Trans. Vol. 192 A, p. 148.
§ Pearson, Phil. Trans. Vol. 195 A, p. 14.
‖ Sheppard, Phil. Trans. Vol. 192 A, p. 128. See also Pearson, Drapers' Company Research Memoirs, II. p. 20.

It seems needful to select our correlation scale to be such:

(i) that the standard deviation of our correlation will vary with the relative frequency of the two pairs of groups into which our categories divide the variates. This has no relation at all to the assumption of a Gaussian frequency. If a group contain $n_s$ individuals, its probable error is $.67449\sqrt{n_s(1 - n_s/N)}$, whether the variate be Gaussian or not, and the ratio of probable error to the number in the group contains the factor $1/\sqrt{n_s}$, and increases rapidly as $n_s$ becomes small. Any categories which contain only small percentages of the total in a fourfold division, even if the variates be purely categorical and not quantitatively measurable (e.g. divorced women and married women), involve increased probable error of our conclusions, and no scale of r will be satisfactory which does not recognise this;

(ii) beyond this when we suppose our fourfold table to represent Gaussian material the value of r obtained by our selected scale of correlation deduced from "equality of improbability" ought to be reasonably close to the r given by the usual process on a Gaussian fourfold table.

The probable error of the coefficient of correlation for a fourfold table on the supposition that it is a random sample from uncorrelated material of Gaussian distribution is

\[ .67449\,\frac{\sqrt{(a+b)(a+c)(d+b)(d+c)}}{N^2\sqrt{N}\,HK} \qquad \text{(iii)}, \]

where

\[ H = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}h^2}, \qquad K = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}k^2}, \]

h and k corresponding to the ratios of the distances of the means from the dividing lines of the categories to their respective standard deviations. It is clear that this value increases rapidly with h or k, since

\[ \frac{\sqrt{(a+c)(b+d)}}{NH} \quad\text{and}\quad \frac{\sqrt{(a+b)(c+d)}}{NK} \]

increase to infinite values with h and k. Now the value of r from a Gaussian fourfold table has to be determined from an equation of the form

\[ \frac{ad - bc}{N^2HK} = r\Big\{1 + \sum_{n=2}^{\infty}\frac{r^{\,n-1}}{n!}\,v_{n-1}w_{n-1}\Big\} \qquad \text{(iv)}, \]

where $v_{n-1}$ and $w_{n-1}$ are known converging factors in h and k respectively*. It follows that the ratio of the standard deviation of r on the supposition that it is truly zero to its observed value is approximately

\[ \frac{{}_0\sigma_r}{r} = \frac{1}{\sqrt{N}}\,\frac{\sqrt{(a+b)(a+c)(d+b)(d+c)}}{ad - bc} \qquad \text{(v)}, \]

or approximately:

\[ \frac{{}_0\sigma_r}{r} = \frac{1}{\sqrt{N}\,r_{hk}} = \frac{1}{\chi} \qquad \text{(vi)}. \]

* Phil. Trans. Vol. 195, p. 6.

Now let us determine the probable error of $r_{hk}$ for uncorrelated material,

\[ r_{hk} = \frac{y}{\sqrt{(b+d)(a+c)(c+d)(a+b)}}, \]

where y is zero for uncorrelated material. We have, using differentials,

\[ \delta r_{hk} = \frac{\delta y}{\sqrt{(b+d)(a+c)(c+d)(a+b)}}. \]

But for uncorrelated material we may put y = 0 after the variation due to random sampling has been allowed for, i.e. y = 0, but not $\delta y = 0$. Hence

\[ (\delta r_{hk})^2 = \frac{(\delta y)^2}{(b+d)(a+c)(c+d)(a+b)}, \]

or, summing and dividing by the number of random samples,

\[ \sigma^2_{r_{hk}} = \frac{\sigma_y^2}{(b+d)(a+c)(c+d)(a+b)}. \]

To find $\sigma_{r_{hk}}$, therefore for uncorrelated material, we require to find $\sigma_y$. Now

\[ y = ad - bc; \qquad \delta y = a\,\delta d + d\,\delta a - b\,\delta c - c\,\delta b. \]

Square, sum for all random samples and remember that

\[ \sigma_a^2 = a\Big(1 - \frac{a}{N}\Big) \quad\text{and}\quad \sigma_a\sigma_b r_{ab} = -\frac{ab}{N}, \]

we find

\[ \sigma_y^2 = a^2 d\Big(1 - \frac{d}{N}\Big) + d^2 a\Big(1 - \frac{a}{N}\Big) + b^2 c\Big(1 - \frac{c}{N}\Big) + c^2 b\Big(1 - \frac{b}{N}\Big) - \frac{2a^2d^2}{N} - \frac{2b^2c^2}{N} + \frac{2abcd}{N} + \frac{2abcd}{N} + \frac{2abcd}{N} + \frac{2abcd}{N} \]

[...]

                     Factory   Charing   Work taken in   Unemployed   Totals
    Factory            540        —            —              —         540
    Charing             58       20           10              —          88
    Work taken in       70       30           22              —         122
    Unemployed           —        —            —            250         250
    Totals             668       50           32            250        1000

Now, if the division here is between "employed" and "unemployed" mothers only, the coefficient of correlation on the assumption of a Gaussian frequency is unity, and the coefficient of association is also unity. Against these we find that the coefficient of contingency is only .707. There cannot be I think a doubt that this is the better estimate. It leaves .293 over in reserve until we know something of the sub-classifications of mothers' employment before and after the birth of the children. In the example we see that the degree of employment changes after the birth and that a number of the factory workers tend to take in work or to go charing.
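The two contingency figures quoted for the mothers' employment table can be checked with a short calculation. The sketch below is an editorial illustration under one stated assumption: the coefficient of mean square contingency is taken in its usual form, $C = \sqrt{\phi^2/(1+\phi^2)}$ with $\phi^2 = \sum n_{ij}^2/(n_{i\cdot}\,n_{\cdot j}) - 1$, a definition that does not appear in the surviving text of this section. The function and variable names are hypothetical.

```python
import math

def contingency_coefficient(table):
    """C = sqrt(phi^2 / (1 + phi^2)), phi^2 = sum n_ij^2 / (row_i * col_j) - 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij:  # empty cells contribute nothing
                total += n_ij * n_ij / (row_totals[i] * col_totals[j])
    phi2 = total - 1.0
    return math.sqrt(phi2 / (1.0 + phi2))

# "Employed" against "unemployed" only: every case lies on the diagonal.
fourfold = [[750, 0], [0, 250]]
# The full employment table of the text.
full = [[540, 0, 0, 0], [58, 20, 10, 0], [70, 30, 22, 0], [0, 0, 0, 250]]

print(round(contingency_coefficient(fourfold), 3))
print(round(contingency_coefficient(full), 3))
```

The fourfold division gives the .707 of the text, and the finer classification raises the coefficient, which is the sense in which contingency "leaves something in reserve" for the sub-classification.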
The coefficient of contingency is now $C_1 = .755$. It will be clear from such an illustration that we should have got a poorer result had we tried to correct contingency on the basis of tables with diagonal cells only occupied being representable by a coefficient of value unity. Contingency very properly allows for the extent of our ignorance, the coefficient of association does not. The correlation coefficient assumes a knowledge of the exact character of the distribution, and even if a Gaussian distribution be adopted, still no trained statistician would dream of applying it to a table of the form

    a   0
    0   d

and argue that the correlation was therefore perfect.

According to the view taken here we should anticipate that when a fourfold table really represents approximately Gaussian material, then the value of $r/{}_0\sigma_r$ will give a probability fairly closely approaching that deduced from the contingency; on the other hand when the material has no approach to a Gaussian distribution the value of r found by equality of improbabilities will be higher than that deduced by a simple fourfold correlation table.

To sum up then, the present paper proposes to deal with tables of few cells by using the probability P determined from the square contingency $\chi^2$. I can see absolutely no valid theoretical objection to this method of reckoning the relationship of two characters. Practically it suffers from the transcendent difficulty of mentally appreciating the relative differences of indefinitely large improbabilities. In order to surmount this difficulty I propose to think in a scale of correlations; I ask what r would have equal improbability if it arose from a random sampling of Gaussian material at the same dividing lines. In a paper now in type I have given tables from which the probable error of r can be readily found for a given division. I use these tables to find ${}_0\sigma_r$. From an extended Elderton's Table for "Goodness of Fit," I find log P.
I then determine on what appears to be a reasonable hypothesis a value of the correlation coefficient which would be equally improbable; and thus reduce my improbability of independence to a mentally apprehensible scale. Thus the coefficient of correlation is merely used as a standard of improbability, and we pledge ourselves to no hypothesis as to frequency distribution, of which in many cases we know nothing. Still as we wish to approach fairly closely to the actual value of the correlation when the distribution is Gaussian, we select by preference a standard scale of correlation improbabilities, which will not contradict Gaussian results, when the fourfold table is of that character. We should not anticipate absolute agreement, for the reasons already stated, and it would be a sufficient justification of our method, if the results obtained by it, when the material is truly Gaussian, lie within a range limited by twice the probable error taken on either side of the Gaussian value.

We have next to consider how the frequency of correlation coefficients for an actual value zero is to be distributed in large random samples. We cannot use a normal curve of standard deviation ${}_0\sigma_r$; for it is quite obvious that the tails of this will extend beyond the limits −1 to +1, and although such a curve is quite legitimate for ordinary probabilities in the neighbourhood of r = 0, it is wholly inadequate when we have to ask for example what is the probability that r will equal 0.8, when its actual value is zero, for say a sample of 1000. The curve of distribution of r must be symmetrical about r = 0, and vanish for r = ±1. The only one of my generalised frequency curves* which satisfies these conditions is Type II, i.e.

\[ y = y_0\,(1 - x^2)^m. \]

If N be the total frequency, $N = y_0\int_{-1}^{+1}(1 - x^2)^m\,dx$, and

\[ N\mu_2 = N\sigma^2 = y_0\int_{-1}^{+1}(1 - x^2)^m x^2\,dx = \frac{y_0}{2(m+1)}\int_{-1}^{+1}(1 - x^2)^{m+1}\,dx = \frac{1}{2(m+1)}\,(N - N\mu_2); \]

hence

\[ \sigma^2 = \frac{1}{2m+3}, \quad\text{or}\quad m = \tfrac{1}{2}\Big(\frac{1}{\sigma^2} - 3\Big) \qquad \text{(xii)}. \]

* Phil. Trans. Vol. 186 A, p. 372.

We now want $y_0$. Remembering that the range is 2, we have*

\[ y_0 = \frac{N}{\sqrt{\pi}}\,\frac{\Gamma(m+1.5)}{\Gamma(m+1)}. \]

But σ is small, i.e. .07 would be large for σ and therefore $\sigma^2$ large if equal to .005. Thus $1/\sigma^2$ will be small if it is 200 and m small if only of order 100. We may therefore safely use Stirling's theorem to obtain the Γ-functions. Thus

\[ \frac{\Gamma(m+1.5)}{\Gamma(m+1)} = \frac{\sqrt{2\pi}\,e^{-(m+1.5)}\,(m+1.5)^{m+1}}{\sqrt{2\pi}\,e^{-(m+1)}\,(m+1)^{m+\frac{1}{2}}} = \sqrt{m+1}\;e^{-\frac{1}{2}}\Big(1 + \frac{1}{2(m+1)}\Big)^{m+1} = \sqrt{m+1}\,\Big(1 - \frac{1}{8(m+1)} + \dots\Big), \]

and

\[ y_0 = \frac{N\sqrt{m+1}}{\sqrt{\pi}} \ \text{nearly} \qquad \text{(xiii)}. \]

Hence we can take for our frequency curve for r

\[ y = \frac{N}{\sqrt{2\pi}}\,\sqrt{\frac{1}{\sigma^2} - 1}\;(1 - x^2)^{\frac{1}{2}\left(\frac{1}{\sigma^2} - 3\right)} \qquad \text{(xiv)}. \]

When the sample is very small we must retain the full value

\[ y = \frac{N}{\sqrt{\pi}}\,\frac{\Gamma\big(\tfrac{1}{2}\cdot\tfrac{1}{\sigma^2}\big)}{\Gamma\big(\tfrac{1}{2}\big(\tfrac{1}{\sigma^2} - 1\big)\big)}\,(1 - x^2)^{\frac{1}{2}\left(\frac{1}{\sigma^2} - 3\right)}. \]

This agrees in the special case for product-moment r's when $\sigma^2 = 1/(n-1)$ with the form proposed by "Student" in Biometrika, Vol. VI. p. 306 and experimentally justified by him.

We have next to find the area of the tails of this curve beyond a given value r†, to measure the improbability that with no correlation a random sample could give a value r. We clearly have

\[ P = \frac{2y_0}{N}\int_r^1 (1 - x^2)^m\,dx = \frac{2\sqrt{m+1}}{\sqrt{\pi}}\Big\{\frac{(1 - r^2)^{m+1}}{2(m+1)\,r} - \frac{1}{2(m+1)}\int_r^1 \frac{(1 - x^2)^{m+1}}{x^2}\,dx\Big\}, \]

and continuing the integration in this way by parts we have

\[ P = \frac{2\sqrt{m+1}}{\sqrt{\pi}}\,\frac{(1 - r^2)^{m+1}}{2(m+1)\,r}\Big\{1 - \frac{1}{2(m+2)}\,\frac{1 - r^2}{r^2} + \frac{1\cdot 3}{2^2(m+2)(m+3)}\Big(\frac{1 - r^2}{r^2}\Big)^2 - \frac{1\cdot 3\cdot 5}{2^3(m+2)(m+3)(m+4)}\Big(\frac{1 - r^2}{r^2}\Big)^3 + \dots\Big\} \qquad \text{(xv)}. \]

* Phil. Trans. Vol. 186 A, p. 372.
† r is to be treated as a quantity without sign, a mere numerical quantity, and therefore both tails of the frequency distribution are taken; this is the origin of the first factor 2 in the value of P.

This series converges with considerable rapidity if σ be not greater than .07 and r moderately large. If $1/\sigma^2 = s$, $2m = s - 3$, and if we write $\lambda = (1 - r^2)/r^2$, we have

\[ P = \sqrt{\frac{2}{\pi}}\;\frac{(1 - r^2)^{\frac{1}{2}(s-1)}}{r\sqrt{s-1}}\Big\{1 - \frac{\lambda}{s+1} + \frac{1\cdot 3\,\lambda^2}{(s+1)(s+3)} - \frac{1\cdot 3\cdot 5\,\lambda^3}{(s+1)(s+3)(s+5)} + \dots\Big\} \qquad \text{(xvi)}, \]

whence P may be fairly easily calculated. The series is a semi-converging one, which is satisfactory enough until r gets small and therefore λ large.
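The series (xvi) lends itself to a short numerical sketch. What follows is an editorial illustration rather than the computation actually used for the memoir's tables: it works in common logarithms to avoid underflow, and, since the series is only semi-convergent, it stops at the smallest term. The function name and the stopping rule are assumptions; the check values are the figures quoted later in the memoir for ${}_0\sigma_r = .064557$.

```python
import math

def neg_log10_p_series(r, sigma0, max_terms=200):
    """-log10 P from series (xvi), with s = 1/sigma0^2 and lam = (1 - r^2)/r^2."""
    s = 1.0 / sigma0 ** 2
    lam = (1.0 - r * r) / (r * r)
    total, term = 1.0, 1.0
    for k in range(max_terms):
        # Successive terms: -lam/(s+1), +1*3*lam^2/((s+1)(s+3)), ...
        nxt = -term * (2 * k + 1) * lam / (s + 2 * k + 1)
        if abs(nxt) >= abs(term):
            break  # semi-convergent series: stop once the terms start to grow
        term = nxt
        total += term
    log10_p = (0.5 * math.log10(2.0 / math.pi)
               + 0.5 * (s - 1.0) * math.log10(1.0 - r * r)
               - math.log10(r)
               - 0.5 * math.log10(s - 1.0)
               + math.log10(total))
    return -log10_p

print(round(neg_log10_p_series(0.995, 0.064557), 2))
print(round(neg_log10_p_series(0.99, 0.064557), 2))
```

For these two arguments the memoir gives −log P = 240.360 and 204.525 respectively, so the sketch may be compared against the printed working.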
When r is small or σ very large (xvi) fails to give the result closely enough*. In these cases the value of the integral $\int_r^1 (1 - x^2)^m\,dx$ must be found from other considerations. Thus

\[ \int_r^1 (1 - x^2)^m\,dx = \int_r^1 e^{m\log(1 - x^2)}\,dx = \int_r^1 e^{-m\left(x^2 + \frac{1}{2}x^4 + \frac{1}{3}x^6 + \dots\right)}\,dx. \]

Let $mx^2 = \tfrac{1}{2}z^2$, then

\[ \int_r^1 (1 - x^2)^m\,dx = \frac{1}{\sqrt{2m}}\int_{\sqrt{2m}\,r}^{\sqrt{2m}} e^{-\frac{1}{2}z^2}\Big(1 - \frac{z^4}{8m} - \frac{z^6}{24m^2} + \frac{z^8}{128m^2} + \dots\Big)\,dz. \]

Now the integrals

\[ \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2}\,dz \quad\text{and}\quad \mu_n(z) = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2} z^n\,dz \]

are tabled integrals, the first being the usual probability integral (Biometrika, Vol. II. p. 182), and the second being the incomplete normal moment function which has been calculated for n = 1 to 10 (Biometrika, Vol. VI. p. 66). Thus

\[ P = 2\sqrt{\frac{m+1}{m}}\Big[\{\mu_0(\sqrt{2m}) - \mu_0(\sqrt{2m}\,r)\} - \frac{1}{8m}\{\mu_4(\sqrt{2m}) - \mu_4(\sqrt{2m}\,r)\} - \frac{1}{24m^2}\{\mu_6(\sqrt{2m}) - \mu_6(\sqrt{2m}\,r)\} + \frac{1}{128m^2}\{\mu_8(\sqrt{2m}) - \mu_8(\sqrt{2m}\,r)\} - \text{etc.}\Big] \qquad \text{(xvii)}, \]

where

\[ \mu_n(x) = \frac{1}{\sqrt{2\pi}}\int_0^x e^{-\frac{1}{2}z^2} z^n\,dz. \]

* For most values of σ for r = 0.2 the two formulae coincide for practical purposes, so that the formula to be given below may be used for values of r = 0.2 or under. For σ = .08, we cannot use (xvi) for r = 0.3.

As a matter of fact $\sqrt{2m}$ may be treated as infinite compared with $\sqrt{2m}\,r$ for values of r = 0.2 and less, in which case

\[ \mu_{2n}(\sqrt{2m}) = (2n-1)(2n-3)(2n-5)\dots 1 \times .5. \]

It will be found as a rule unnecessary to go beyond the terms in μ [subscript illegible]*.

From (xvi) and (xvii) Table I has been constructed; this gives the value of −log P for each value of r and ${}_0\sigma_r$. In other words it expresses the probability that with a given probable error of a zero coefficient (i.e. $.67449\,{}_0\sigma_r$) a given value of the correlation will arise from this uncorrelated material in a random sample of definite size. The size of the sample and the nature of the process by which r is obtained are indifferent, provided regard is paid to them in determining ${}_0\sigma_r$.

Table II is the required extension of Palin Elderton's Table of Goodness of Fit (see Biometrika, Vol. I. p. 159). It gives the value of −log P for n′ = 4, i.e. for four groups, from $\chi^2 = 1$ to $\chi^2 = 25{,}000$.
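The small-r expansion (xvii) can be sketched in code by obtaining the incomplete normal moments from the recurrence that integration by parts supplies, $\int_a^b z^n\varphi\,dz = a^{n-1}\varphi(a) - b^{n-1}\varphi(b) + (n-1)\int_a^b z^{n-2}\varphi\,dz$, in place of the published tables. This is an editorial illustration under stated assumptions: only the four terms in $\mu_0$, $\mu_4$, $\mu_6$, $\mu_8$ are kept, and the leading factor $2\sqrt{(m+1)/m}$ is the one that follows from the normalising constant (xiii), the printed factor not being fully legible in this copy. All function names are hypothetical.

```python
import math

def _phi(z):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mu_diff(n, lo, hi):
    """mu_n(hi) - mu_n(lo) for even n, by repeated integration by parts."""
    # n = 0: difference of the two upper-tail areas of the normal curve.
    t = 0.5 * (math.erfc(lo / math.sqrt(2.0)) - math.erfc(hi / math.sqrt(2.0)))
    for k in range(2, n + 1, 2):
        t = _phi(lo) * lo ** (k - 1) - _phi(hi) * hi ** (k - 1) + (k - 1) * t
    return t

def neg_log10_p_small_r(r, sigma0):
    """-log10 P from the first four terms of (xvii)."""
    m = 0.5 * (1.0 / sigma0 ** 2 - 3.0)
    hi = math.sqrt(2.0 * m)
    lo = hi * r
    bracket = (mu_diff(0, lo, hi)
               - mu_diff(4, lo, hi) / (8.0 * m)
               - mu_diff(6, lo, hi) / (24.0 * m * m)
               + mu_diff(8, lo, hi) / (128.0 * m * m))
    p = 2.0 * math.sqrt((m + 1.0) / m) * bracket
    return -math.log10(p)

print(round(neg_log10_p_small_r(0.1, 0.03), 3))
```

For r = 0.1 and ${}_0\sigma_r = .03$ the memoir's footnote quotes −log P = 3.076 from (xvii), against 3.066 from the plain Gaussian; the sketch may be compared with the former.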
This enables us to ascertain the improbability of a given $\chi^2$, even when that improbability is only significant in the 5000th place of figures. Table III gives the $\chi^2$ which would have the same improbability as the r of Table I, and is obtained by simple interpolation from Table II. Table IV replaces $\chi^2$ by $\log\chi^2$ and forms a reasonable working table. Given $\log\chi^2$ from the fourfold table and ${}_0\sigma_r$, we can find the value of r which expresses the same improbability.

* For example for r = 0.1 and ${}_0\sigma_r = .03$, −log P = 3.072 by (xvi), it equals 3.076 from (xvii) and 3.066 from the Gaussian, or probability integral. The most troublesome values were those for ${}_0\sigma_r = .08$, r = 0.2 and 0.3. They were finally determined as 1.924 and 3.903, but they cannot be guaranteed to a unit in the last figure.

Thus far we have not even selected our scale of correlation, which is wholly determined by the choice of ${}_0\sigma_r$. We might take ${}_0\sigma_r$ simply equal to $1/\sqrt{N}$, but this would not be a really satisfactory scale of correlation improbabilities. The reason is obvious; it supposes a knowledge never conveyed by a fourfold table, i.e. the knowledge involved in our having the material in a large number of equal-ranged cells. Very naturally, therefore, we avoid this scale, for it certainly would not give at all comparable values of r for those cases of fourfold table where the material is known, or may legitimately be supposed, to be Gaussian. Accordingly we adopt for ${}_0\sigma_r$ the value as given by a fourfold Gaussian table, i.e.

\[ {}_0\sigma_r = \frac{1}{\sqrt{N}}\,\chi_{\alpha_1}\chi_{\alpha_2}, \]

where $\chi_{\alpha_1}$ and $\chi_{\alpha_2}$ are respectively

\[ \chi_{\alpha_1} = \frac{\sqrt{\tfrac{1}{2}(1+\alpha_1)\cdot\tfrac{1}{2}(1-\alpha_1)}}{H} \quad\text{and}\quad \chi_{\alpha_2} = \frac{\sqrt{\tfrac{1}{2}(1+\alpha_2)\cdot\tfrac{1}{2}(1-\alpha_2)}}{K}, \]

a table of which function has been recently published by me, and is reproduced here as Table V. It enables us at once to determine ${}_0\sigma_r$.

Finally, Table IV has been converted into an "abac," upon which the value of r, the "equal improbability correlation," can be read off as soon as $\log\chi^2$ and ${}_0\sigma_r$ are determined to two figures.
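The whole chain, from a fourfold table through $\chi^2$, P and ${}_0\sigma_r$ to the "equal improbability correlation," can be sketched without the tables or the abac. The following is an editorial illustration, not the memoir's procedure: P for $\chi^2$ is taken as the three-degree-of-freedom tail (n′ = 4), the tail of the Type II curve is found by Simpson's rule with the exact Γ-function normalisation rather than from series (xvi) or (xvii), and r is found by bisection. Because the printed tables rest on interpolation, agreement with the memoir's figures should be expected to about two decimals only. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def chi_alpha(p):
    """Table V function: sqrt(p(1-p)) / H, H being the normal ordinate at the division."""
    h = NormalDist().inv_cdf(p)
    ordinate = math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)
    return math.sqrt(p * (1.0 - p)) / ordinate

def sigma0(a, b, c, d):
    """Standard deviation of a zero r for the given marginal divisions."""
    n = a + b + c + d
    return chi_alpha((a + b) / n) * chi_alpha((a + c) / n) / math.sqrt(n)

def log10_p_chi2(chi2):
    """log10 of the chi-square tail for 3 degrees of freedom (chi2 below about 1400)."""
    p = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
    return math.log10(p)

def log10_p_type2(r, s0, steps=4000):
    """log10 of both tails beyond r of the curve (1 - x^2)^m, by Simpson's rule."""
    m = 0.5 * (1.0 / s0 ** 2 - 3.0)
    base = 1.0 - r * r
    h = (1.0 - r) / steps

    def g(x):  # integrand divided by its value at x = r, to avoid underflow
        return (max(0.0, 1.0 - x * x) / base) ** m

    total = g(r) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(r + i * h)
    integral = total * h / 3.0
    log_norm = ((math.lgamma(m + 1.5) - math.lgamma(m + 1.0)) / math.log(10)
                - 0.5 * math.log10(math.pi))
    return math.log10(2.0) + log_norm + m * math.log10(base) + math.log10(integral)

def equal_improbability_r(a, b, c, d):
    """Bisection for the r whose tail probability equals that of chi^2."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    target, s0 = log10_p_chi2(chi2), sigma0(a, b, c, d)
    lo, hi = 1e-6, 1.0 - 1e-9
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if log10_p_type2(mid, s0) > target:
            lo = mid  # r still too small: tail probability too large
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Stature and head-breadth table of the memoir's Illustration III.
print(round(equal_improbability_r(455, 622, 599, 1324), 3))
```

For this table the memoir reaches .165 by interpolation and the abac, and the sketch may be set against that figure.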
This abac was constructed in the following manner: Accurate curves were drawn, for each value of ${}_0\sigma_r$, of the values of $\log\chi^2$. From these curves the values of $\log\chi^2$ were read off for each value of r proceeding by .01, from .05 to .95. It was then possible to plot the family of curves which for each r give the relationship of $\log\chi^2$ and ${}_0\sigma_r$. I owe this excellent diagram to Mr G. H. Soper. The bulk of the laborious calculating work on the tables has been carried out by Miss Julia Bell.

In the course of our investigations two additional tables have been calculated. In the first place it was needful to extend Sheppard's Tables far beyond the limits of published work on the probability integral in order to compare how far it was possible to trust the Gaussian to give the distribution of frequency of correlation coefficients obtained by sampling independent material. As we have seen, the Gaussian is of no service for this purpose except for very low values of r (e.g. 0.1 or less). But the table has independently considerable value, and most statisticians will remember cases when they have had laboriously to calculate

\[ F = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-\frac{1}{2}x^2}\,dx \]

beyond the usual limit of x = 5. I reproduce this table here as Table VI. It was calculated by aid of Schlömilch's formula*, i.e.

\[ F = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\Big\{\frac{1}{x} - \frac{1}{x(x^2+2)} + \frac{1}{x(x^2+2)(x^2+4)} - \frac{5}{x(x^2+2)(x^2+4)(x^2+6)} + \frac{9}{x(x^2+2)(x^2+4)(x^2+6)(x^2+8)} - \frac{129}{x(x^2+2)(x^2+4)(x^2+6)(x^2+8)(x^2+10)} + \dots\Big\}, \]

and the table gives −log F, for ease of interpolation. It was calculated to seven decimal places, but only five are retained, as the seventh figure was not trustworthy and occasionally the sixth is doubtful.

It remains to examine some of the correlations found for fourfold tables by this novel process, and to compare the value as found by the assumption of Gaussian frequency.

Illustration I. The following table is given by me for good and bad temper in pairs of brothers (Phil. Trans. Vol. 195 A, p. 147).
                                     First Brother
                             Good temper   Bad temper   Totals
    Second     Good temper       330           255         585
    Brother    Bad temper        255           454         709
               Totals            585           709        1294

* Compendium der höheren Analysis, Bd. II. S. 270, Braunschweig, 1879.

Here $\log\chi^2 = 1.9614$, $\tfrac{1}{2}(1+\alpha_1) = \tfrac{1}{2}(1+\alpha_2) = .5479$. Hence, by Table V, $\chi_{\alpha_1} = \chi_{\alpha_2} = 1.2566$, and

\[ {}_0\sigma_r = \frac{1}{\sqrt{1294}}\,\chi_{\alpha_1}\chi_{\alpha_2} = .0439. \]

Interpolating from Table IV we have for ${}_0\sigma_r = .0439$,

    r = 0.3,   log χ² = 1.7596,
    r = 0.4,   log χ² = 2.0025.

Hence for $\log\chi^2 = 1.9614$, r = .38. Had we treated "Temper" as a continuous variate of Gaussian distribution, we find r = .32 ± .03. Mr Soper's abac gives us at once r = .38, and saves the labour of the second interpolation.

Illustration II. The following table gives the relation between deaths or recoveries from small-pox and the presence of a vaccination cicatrix (Phil. Trans. Vol. 195 A, p. 48).

                                  Small-pox
                          Recoveries   Deaths   Totals
    Cicatrix   Present       1562         42     1604
               Absent         383         94      477
               Totals        1945        136     2081

Here $\log\chi^2 = 2.2549$, $\tfrac{1}{2}(1+\alpha_1) = .9346$, $\tfrac{1}{2}(1+\alpha_2) = .7708$. Hence, by Table V, $\chi_{\alpha_1} = 1.9437$, $\chi_{\alpha_2} = 1.3869$, and

\[ {}_0\sigma_r = \frac{1}{\sqrt{2081}}\,\chi_{\alpha_1}\chi_{\alpha_2} = .0590. \]

Interpolating from Table IV we have for ${}_0\sigma_r = .0590$,

    r = 0.6,   log χ² = 2.1090,
    r = 0.7,   log χ² = 2.3088.

Hence for $\log\chi^2 = 2.2549$, r = .67, precisely the value read off beforehand from the abac. Had we treated Cicatrix and Recovery or Death as Gaussian variates, the correlation would be r = .60 ± .03.

Illustration III. The following fourfold table is given by Macdonell (Biometrika, Vol. I. p. 193) as connecting stature and head breadth in criminals.

                                                    Stature
                                  5' 4 [fraction illegible]" and under   Over 5' 4 [fraction illegible]"   Totals
    Head       14.8 cm. and under              455                              622                      1077
    breadth    Over 14.8 cm.                   599                             1324                      1923
               Totals                         1054                             1946                      3000

Here $\log\chi^2 = 1.5718$, $\tfrac{1}{2}(1+\alpha_1) = .6487$, $\tfrac{1}{2}(1+\alpha_2) = .6410$. Hence, by Table V, $\chi_{\alpha_1} = 1.2871$, $\chi_{\alpha_2} = 1.2791$, and

\[ {}_0\sigma_r = \frac{1}{\sqrt{3000}}\,\chi_{\alpha_1}\chi_{\alpha_2} = .0301. \]
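The Table IV step in Illustrations I and II above is an ordinary linear interpolation between two tabulated values of $\log\chi^2$ at the required ${}_0\sigma_r$. The small sketch below is an editorial illustration; the function name is hypothetical, and the tabular entries are exactly those quoted in the two illustrations.

```python
def interpolate_r(log_chi2, r_lo, log_lo, r_hi, log_hi):
    """Linear interpolation of r between two tabulated values of log chi^2."""
    fraction = (log_chi2 - log_lo) / (log_hi - log_lo)
    return r_lo + fraction * (r_hi - r_lo)

# Illustration I: entries for sigma = .0439 at r = 0.3 and r = 0.4.
print(round(interpolate_r(1.9614, 0.3, 1.7596, 0.4, 2.0025), 2))  # -> 0.38
# Illustration II: entries for sigma = .0590 at r = 0.6 and r = 0.7.
print(round(interpolate_r(2.2549, 0.6, 2.1090, 0.7, 2.3088), 2))  # -> 0.67
```

The entries quoted in the text are already given at the required ${}_0\sigma_r$, so only the interpolation in r is shown; the interpolation in ${}_0\sigma_r$ is the "second interpolation" which the abac saves.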
Either by interpolation from tables or the abac we reach .165 (actually .1654), so that r = .17 is the nearest second place figure. Macdonell, assuming Gaussian frequencies, finds r = .18 ± .02.

Illustration IV. The following table is taken from my memoir on the relation of intelligence to other mental characters (Biometrika, Vol. V. p. 146). It gives the relationship of self-consciousness to intelligence in 2054 boys. The intelligent group cover the quick intelligent and intelligent, the other group the slow intelligent to very dull.

                                  Intellectual Grade
                          Intelligent   Slow Intelligent to Dull   Totals
    Self-conscious           447.5               544                991.5
    Unself-conscious         438.5               624               1062.5
    Totals                   886                1168               2054

Here $\log\chi^2 = 0.4942$, $\tfrac{1}{2}(1+\alpha_1) = .5686$, $\tfrac{1}{2}(1+\alpha_2) = .5173$. Hence, by Table V, $\chi_{\alpha_1} = 1.2601$, $\chi_{\alpha_2} = 1.2538$, and

\[ {}_0\sigma_r = \frac{1}{\sqrt{2054}}\,\chi_{\alpha_1}\chi_{\alpha_2} = .0349. \]

Interpolating from Table IV we have, for ${}_0\sigma_r = .0349$,

    r = .05,   log χ² = 0.7308.

Thus the correlation is less than .05, and an inspection of the abac and rough extrapolation indicates that it must be about .03. To test this, remember that for such low values of r, the Gaussian curve gives the area closely. Now $\log\chi^2 = 0.4942$ corresponds to $\chi^2 = 3.12$, and this from Table II to −log P = 0.428, or $\log P = \bar{1}.572$, i.e. P = .3733, which gives a single tail of .1866, or we must enter the probability integral table with $\tfrac{1}{2}(1+\alpha) = .8134$. This gives $x = .89 = r/{}_0\sigma_r = r/(.0349)$, or r = .031, agreeing excellently with the value read from the abac by extrapolation.

Using Everitt's Tables we have for a Gaussian distribution

\[ .009{,}631 = .156{,}650\,r + .000{,}621\,r^2 + .025{,}271\,r^3 + .000{,}461\,r^4, \]

which gives r = .061 ± .024, a result within the limits of random sampling of the r obtained by the previous method, i.e. .031.

Illustration V. I take for this illustration absolutely Gaussian material for a population of 1000 destined to give .80 correlation.
                A      Not-A    Totals
    B          704      160       864
    Not-B       22      114       136
    Totals     726      274      1000

Such a table is easily constructed from Everitt's Supplementary Tables of the Tetrachoric Functions (Biometrika, Vol. VIII. p. 385). We find $\log\chi^2 = 2.4013$, $\tfrac{1}{2}(1+\alpha_1) = .726$, $\tfrac{1}{2}(1+\alpha_2) = .864$. Hence, by Table V, $\chi_{\alpha_1} = 1.3391$, $\chi_{\alpha_2} = 1.5713$, and

\[ {}_0\sigma_r = \frac{1}{\sqrt{1000}}\,\chi_{\alpha_1}\chi_{\alpha_2} = .06654. \]

Interpolating from Table IV, we have

    r = .8,   log χ² = 2.3828,
    r = .9,   log χ² = 2.5873,

whence we find for $\log\chi^2 = 2.4013$, r = .809. From the abac r = .81, as against the r = .800 ± .022 actual value.

Illustration VI. I take another illustration of truly Gaussian material, namely 1000 cases distributed so as to give r = .50.

                A      Not-A    Totals
    B          629      184       813
    Not-B       80      107       187
    Totals     709      291      1000

Here $\log\chi^2 = 1.9452$. We have $\tfrac{1}{2}(1+\alpha_1) = .709$, $\tfrac{1}{2}(1+\alpha_2) = .813$, giving $\chi_{\alpha_1} = 1.3248$, $\chi_{\alpha_2} = 1.4512$, and ${}_0\sigma_r = .06080$. From Table IV, by interpolation for ${}_0\sigma_r = .0608$,

    r = 0.5,   log χ² = 1.9366,
    r = 0.6,   log χ² = 2.1138,

whence r = .506, when $\log\chi^2 = 1.9452$. The abac gives us .51. We have to set these values against r = .500 ± .033, and we see that the difference is again not significant.

Illustration VII. As a last Gaussian material case, I take the following table, namely 2000 distributed so as to give r = .25.

                A      Not-A    Totals
    B         1115       85      1200
    Not-B      685      115       800
    Totals    1800      200      2000

Here $\log\chi^2 = 1.4526$, $\tfrac{1}{2}(1+\alpha_1) = .9$, $\tfrac{1}{2}(1+\alpha_2) = .6$. These give by Table V

\[ {}_0\sigma_r = \frac{1}{\sqrt{2000}} \times 1.7094 \times 1.2680 = .048{,}467. \]

Whence by interpolation we find r = .226 and by the abac .23. We have then to compare the Gaussian r = .25 ± .03, with r = .23, and we see that the difference is again less than the probable error of random sampling.

Illustrations V, VI and VII seem to show that, for truly Gaussian material, the two methods lead to closely similar results.

Illustration VIII. The following table represents in a fourfold form the correlation between the length of left foot and left middle finger in 3000 criminals. It is taken from Macdonell's paper (Biometrika, Vol. I. p. 226).

                                          Left Foot
  Left Middle Finger       25.5 cms. and under   Over 25.5 cms.   Totals
  11.5 cms. and under            1103                 411          1514
  Over 11.5 cms.                  274                1212          1486
  Totals                         1377                1623          3000

Here $\log \chi^2 = 2.9514$, $\tfrac12(1+\alpha_1) = .5410$, $\tfrac12(1+\alpha_2) = .5047$. Hence

$\chi_{\alpha_1} = 1.2557, \quad \chi_{\alpha_2} = 1.2534, \quad {}_0\sigma_r = \frac{1}{\sqrt{3000}} \times \chi_{\alpha_1} \times \chi_{\alpha_2} = .028735.$

Interpolation and the abac give us $r = .72$, and Macdonell has .76. The probable error is only .01. These last two numbers rest on the assumption of a Gaussian distribution.

Illustration IX. I consider lastly a table which would by many be considered to represent perfect correlation.

             A     Not-A   Totals
  B         800       0      800
  Not-B       0     200      200
  Totals    800     200     1000

Here $\log \chi^2 = 3$, $\tfrac12(1+\alpha_1) = \tfrac12(1+\alpha_2) = .8$, $\chi_{\alpha_1} = \chi_{\alpha_2} = 1.4288$; thus ${}_0\sigma_r = .064{,}557$.

The point on the abac is outside the contour $r = .95$, and some might be prepared on this account to consider the correlation as perfect. We must however proceed, as the value lies outside Table IV, by a slightly different method. By Table II, $\chi^2 = 1000$ gives us

$-\log P = 215.745, \quad \text{or} \quad \log P = \overline{216}.255.$

We now turn back to equation (xvi) and note that

$\frac{1}{{}_0\sigma_r^2} = 239.946.$

Suppose $r = .995$, $r^2 = .990025$, $1 - r^2 = .009975$ and $\lambda = .01007$,

[expression for $P$ from equation (xvi), with its series factor, illegible in this copy]

It is clear that for such high values we may treat the series factor as unity. We find $-\log P = 240.360$. Now putting $r = .99$, we have $-\log P = 204.525$, and thence by interpolation $-\log P = 215.745$, where $r = .992$. The correlation is thus very high, but not perfect, and this seems reasonable because we are not really making the assumption of a Gaussian frequency, and the value of $P$ if very small is still not zero.

Conclusions.
Without laying too much stress on a short series of numerical illustrations, which were merely taken at random for various values of the correlation and for various divisions of the categories*, we may, I think, conclude that our correlation scale of the improbability of independence in variates gives quite reasonable results when tested on fourfold tables treated as, or really representing, Gaussian distributions. In these cases we shall rarely get a divergence amounting to twice the probable error, and usually a result well within the probable error.

* I have worked out numerous other examples since the illustrations here given, and on the basis of them might have very reasonably supposed the divergences in I and II to be due to arithmetical errors. But I have not found such errors. For example take, to compare with II, the Table given by Macdonell (Biometrika, I. p. 222) for stature and left middle finger length (Table XII of his paper): it gives .68 as against Macdonell's $.663 \pm .013$, as found by Gaussian methods.

In the following table I have put together the chief results for the present series of illustrations. In the first column we have the value of $\chi^2$; in the second column the resulting probability of independence, $P$; in the third column we have $\phi^2$, the mean square contingency; in the fourth column $C_1$, the coefficient of mean square contingency; in the fifth column $r_{hk} = \phi$; in the sixth column Yule's coefficient of association, $Q_2$; in the seventh column the coefficient of correlation, $r_t$, as found by a fourfold table on the assumption of Gaussian distribution of frequency; and lastly, in the eighth column, the value of the coefficient of correlation, $r_P$, on the equal improbability scale discussed in the present memoir.

Several interesting results are at once manifest:

(i) The reader will at once appreciate the difficulty of mental apprehension of the relative probabilities involved in the column $P$.

(ii) The order of the probabilities is not the same as that of the coefficients of correlation $r_t$ found by assuming a Gaussian frequency. Nor should we anticipate that they would be; for by increasing the total population, and still distributing it among the four categories in the same proportions, we increase in the same ratio $\chi^2$, and also decrease $P$, but we do not modify $r_t$. Our scale therefore of appreciation ought to allow for this factor in $P$, and this is done when we reckon $r_P$ by considering the relation of $r$ to ${}_0\sigma_r$, which varies with the size of the population.

Analysis of Results of Illustrations.

  Illustration     χ²         P            φ²        C₁     r_hk    Q₂      r_t            r_P
  IV   (2054)       3.12     37/10^2      .0015     .04     .04     .08     .06 ± .02      .03
  III  (3000)      37.31     40/10^9      .0124     .11     .11     .24     .18 ± .02      .17
  VII  (2000)      28.35     31/10^7      .0142     .12     .12     .38     .25 ± .03      .23
  I    (1294)      91.50     10/10^20     .0707     .26     .27     .39     .32 ± .03      .38
  VI   (1000)      88.15     55/10^20     .0882     .28     .30     .64     .50 ± .03      .51
  II   (2081)     179.85     95/10^40     .0864     .28     .29     .80     .60 ± .03      .67
  VIII (3000)     894.13     38/10^[?]    .2780     .47     .53     .84     .76 ± .01      .72
  V    (1000)     251.94     25/10^55     .2519     .45     .50     .92     .80 ± .02      .81
  IX   (1000)    1000.00     18/10^217   1.000      .71    1.00    1.00    1.00 ± .00      .992
  X    (1000)       0.1423   99/10^2      .000142   .012    .012   1.00    -.31 ± .26*     .008

(iii) $C_1$ and $r_{hk}$ are of course free from this objection, but they are absolutely incomparable with true coefficients of correlation; the former because the coefficient of contingency must be based at least on a 4 × 4, and better a 4 × 5 or 5 × 5, table, before it approaches $r$, and the latter because $r_{hk} = \phi$ is never equal to $r_t$ by its very definition and nature. I pointed this out many years ago when first dealing with $r_{hk}$†. Quite recently Mr G. U. Yule has reintroduced $r_{hk}$ under the novel name of a "theoretical value" for the correlation coefficient of a fourfold table.
I am unable to see why it should be a "theoretical value," as it seems, so far as I can follow Mr Yule's deduction, to involve, when deduced by his method, a very arbitrary relationship between the standard deviation and the position of the mean of each subrange in the case of both frequencies. Like $C_1$, $r_{hk}$ may even displace the true order of relationship in the series. I do not think that $r_{hk}$ can be used, as Mr Yule suggests, as a measure of association; at any rate it is a measure wholly incomparable with true correlation, and it is quite possible, out of the indefinitely large number of measures of association, to select one practically as easy of determination and which does approximate to the true correlation‡.

* This value is insignificant as compared to its probable error.

† An Introduction to the Theory of Statistics, p. 212.

‡ Given

      a | b
      --+--
      c | d

as our fourfold table, the correlation is not necessarily perfect in actual practice, if either $b$ alone or $c$ alone be zero. This is quite clear if the distribution be Gaussian, and the dividing lines of the classification be taken so as to meet on the elliptic contour of the frequency surface which contains inside itself the whole volume of the population. Thus in practice it is quite possible to obtain $Q_2 = 1$, where the correlation is small or even zero.

(iv) The coefficient of association comes out badly from these tables: it gives a difference of mean square = .109 against that of $r_P$ = .036. But its value here is by no means as bad as it can be. Its chief evil is that it gives wildly different values according to the position of the dividing lines, and when for Gaussian material $r = 0$, $Q_2$ may take any value from 0 to 1 according to the position of the dividing lines, i.e. the percentages of the two variates in their sub-categories.
When we know this is so, for material the distribution of which we can measure, what confidence can we have that the result has any significance when we know nothing at all of the frequency distribution? This may be exemplified as follows:

Illustration X.

             A     Not-A   Totals
  B          23     971      994
  Not-B       0       6        6
  Totals     23     977     1000

Here $\chi^2 = .14225$ and $P = .986$*, $\tfrac12(1+\alpha_1) = .977$, $\tfrac12(1+\alpha_2) = .994$, which give

$\chi_{\alpha_1} = 2.7377, \quad \chi_{\alpha_2} = 4.5419,$

and thus

${}_0\sigma_r = \frac{1}{\sqrt{1000}} \times \chi_{\alpha_1} \times \chi_{\alpha_2} = .3932,$

and this gives $r$ for equal probability = .008, i.e. sensibly zero.

* Calculated from

$P = \sqrt{\frac{2}{\pi}} \int_{\chi}^{\infty} e^{-\frac12 x^2}\,dx + \sqrt{\frac{2}{\pi}}\,\chi\,e^{-\frac12 \chi^2}:$

see Pearson, Phil. Mag. Vol. L. pp. 157-75, or Biometrika, Vol. I. p. 156.

Actually the table was obtained from material having zero correlation. The same material divided at the mean gave

      250 | 250 |  500
      250 | 250 |  500
      500 | 500 | 1000

for which the correlation is absolutely zero. In the above table, however, the association coefficient of Mr Yule is unity; in this second table it is zero! Clearly such a coefficient, when it is liable to swing over from zero to unity, can be of no real service for accurate work, such as the determination of the relationships between deformities and in other cases to which Mr Yule, and (I regret to say) continental anthropologists and economists on his authority, are now applying it*.

(v) The value of $r_P$ seems to me based on a scientific conception. We agree to measure the closeness of association by the improbability that the material could have arisen from a random sample of unassociated variates. But this probability offers no easily apprehensible mental scale. Accordingly we determine to replace our improbabilities by correlations which would have been equally unlikely to arise from a random sample of uncorrelated material. The choice then to be made is one of a correlation scale. The probability of any $r$ can be determined in terms of the standard deviation of $r$ for uncorrelated material. But what standard deviation shall we select?
In order that our results shall agree fairly closely with the results for Gaussian distributions we select our arbitrary standard deviation, and so our scale, to be that of a zero correlation for a fourfold Gaussian table with its variates divided in the same proportions as in the actual material. If we estimate our probability of independence on this correlation scale, we see that the values of $r_P$, the probability correlation, never differ very widely from those which would be obtained by supposing the fourfold table to represent a Gaussian frequency distribution. In other words, even when a table is non-Gaussian, or cannot be thought of as representing continuously varying material at all, so that $r_t$ ceases to have any meaning as connected with regression or array variation, still its value has a perfectly definite and new significance: it measures reasonably closely the improbability that the sample could have arisen from non-associated material; it is a measure of association on a probability scale.

(vi) By aid of Table V and of Table IV, or the accompanying abac, $r_P$, or approximately $r_t$, this measure of the improbability of independence on a standard correlation scale can be found for any fourfold table in a few minutes. The extension of the fundamental idea of this paper to 3 × 3 tables suggests itself, and I hope shortly to publish a supplementary paper on that point.

* I am sorry to animadvert thus strongly on the work of an old pupil and colleague, but I consider that the association coefficient never had more than formal logical interest, and that to try to resuscitate it in practical statistics is to check the advance of modern scientific methods. Since this paper was printed, I have seen a memoir by Mr Yule in type, which is shortly to be issued in the Journal of the Royal Statistical Society.
In that paper he defends, on what appear to me to be wholly inadequate grounds, the use of his Coefficient of Association and introduces what he terms a "Colligation Coefficient", a very old friend with a new name. A reply, at length, to that memoir will appear in the forthcoming number of Biometrika.

TABLE I.

Values of $(-\log P)$, entering with $r$ and ${}_0\sigma_r$.

                                        Values of ₀σ_r
  r          .01        .02        .03        .04        .05        .06        .07       .08
  0.05       6.248      1.907      1.020      0.675      0.498      0.392      0.322     0.273
  0.075     13.228      3.760      1.908      1.217      0.874      0.674      0.545     0.456
  0.1       22.924      6.267      3.076      1.910      1.343      1.019      0.814     0.675
  0.15      50.687     13.329      6.298      3.784      2.586      1.916      1.498     1.218
  0.2       90.035     23.254     10.771      6.343      4.259      3.100      2.384     1.924
  0.3      206.348     52.453     23.836     13.758      9.057      6.478      4.906     3.903
  0.4      380.266     96.013     43.254     24.726     16.112     11.407      8.552     6.686
  0.5      626.428    157.607     70.669     40.177     26.025     18.312     13.642    10.597
  0.6      970.879    243.753    108.980     61.747     39.845     27.922     20.713    16.020
  0.7     1463.946    367.033    163.781     92.579     59.584     41.634     30.792    23.740
  0.8     2220.267    556.100    247.801    139.832     89.819     62.625     46.209    35.539
  0.9     3607.924    902.949    401.907    226.479    145.241    101.085     74.442    57.134
  0.95    5056.547   1265.013    562.757    316.904    203.069    141.207    103.886    79.671

TABLE II.

Values of $(-\log P)$ corresponding to given values of $\chi^2$ in a 2 × 2 table. (Extension of Palin Elderton's Table for $n' = 4$.)

  χ²   -log P  |  χ²   -log P  |   χ²   -log P   |   χ²   -log P   |    χ²    -log P   |    χ²    -log P
   1    0.096  |  26    5.021  |    50   10.097  |  1100  237.439  |   2600   562.973  |  13500  2929.521
   2    0.242  |  27    5.230  |    60   12.231  |  1150  248.287  |   2700   584.680  |  14000  3038.086
   3    0.407  |  28    5.440  |    70   14.370  |  1200  259.135  |   2800   606.387  |  14500  3146.652
   4    0.583  |  29    5.650  |    80   16.513  |  1250  269.983  |   2900   628.094  |  15000  3255.219
   5    0.765  |  30    5.860  |    90   18.659  |  1300  280.832  |   3000   649.801  |  15500  3363.785
   6    0.952  |  31    6.071  |   100   20.809  |  1350  291.681  |   3500   758.341  |  16000  3472.352
   7    1.143  |  32    6.281  |   150   31.579  |  1400  302.531  |   4000   866.886  |  16500  3580.919
   8    1.337  |  33    6.492  |   200   42.375  |  1450  313.381  |   4500   975.434  |  17000  3689.486
   9    1.533  |  34    6.703  |   250   53.184  |  1500  324.231  |   5000  1083.995  |  17500  3798.053
  10    1.731  |  35    6.914  |   300   64.002  |  1550  335.081  |   5500  1192.538  |  18000  3906.621
  11    1.931  |  36    7.126  |   350   74.826  |  1600  345.931  |   6000  1301.092  |  18500  4015.188
  12    2.132  |  37    7.337  |   400   85.655  |  1650  356.782  |   6500  1409.649  |  19000  4123.756
  13    2.334  |  38    7.549  |   450   96.487  |  1700  367.633  |   7000  1518.206  |  19500  4232.324
  14    2.537  |  39    7.761  |   500  107.321  |  1750  378.484  |   7500  1626.765  |  20000  4340.892
  15    2.741  |  40    7.972  |   550  118.158  |  1800  389.335  |   8000  1735.324  |  20500  4449.461
  16    2.945  |  41    8.184  |   600  128.997  |  1850  400.187  |   8500  1843.885  |  21000  4558.029
  17    3.151  |  42    8.397  |   650  139.837  |  1900  411.038  |   9000  1952.446  |  21500  4666.597
  18    3.357  |  43    8.609  |   700  150.678  |  1950  421.890  |   9500  2061.008  |  22000  4775.166
  19    3.564  |  44    8.821  |   750  161.520  |  2000  432.742  |  10000  2169.570  |  22500  4883.735
  20    3.770  |  45    9.034  |   800  172.364  |  2050  443.594  |  10500  2278.133  |  23000  4992.304
  21    3.978  |  46    9.246  |   850  183.208  |  2100  454.446  |  11000  2386.697  |  23500  5100.873
  22    4.186  |  47    9.459  |   900  194.053  |  2200  476.151  |  11500  2495.261  |  24000  5209.442
  23    4.394  |  48    9.672  |   950  204.899  |  2300  497.856  |  12000  2603.825  |  24500  5318.011
  24    4.602  |  49    9.885  |  1000  215.745  |  2400  519.561  |  12500  2712.390  |  25000  5426.580
  25    4.811  |  50   10.097  |  1050  226.592  |  2500  541.267  |  13000  2820.955  |
  26    5.021  |               |  1100  237.439  |  2600  562.973  |  13500  2929.521  |

TABLE III.

Values of $\chi^2$ corresponding to the values of $(-\log P)$ in Table I.

                                        Values of ₀σ_r
  r           .01         .02        .03        .04       .05       .06       .07       .08
  0.05        31.84       10.88       6.36       4.51      3.52      2.91      2.48      2.19
  0.075       64.66       19.95      10.89       7.38      5.58      4.51      3.78      3.28
  0.1        109.82       31.93      16.64      10.90      8.03      6.35      5.26      4.51
  0.15       238.45       65.13      32.08      20.07     14.24     10.93      8.82      7.39
  0.2        422.29      111.35      53.16      32.29     22.35     16.75     13.25     10.97
  0.3        956.68      246.62     114.05      67.14     45.11     32.93     25.45     20.64
  0.4       1758.21      447.81     204.07     118.18     78.13     56.14     42.73     33.92
  0.5       2892.33      731.95     330.80     189.82    124.22     88.38     66.60     52.34
  0.6       4479.02     1129.10     507.65     289.58    188.28    133.02     99.55     77.70
  0.7       6750.09     1697.24     760.43     431.96    279.58    196.57    146.35    113.61
  0.8      10233.49     2568.34    1147.76     649.98    419.22    293.64    217.74    168.34
  0.9      16624.37     4166.12    1857.93    1049.48    674.92    471.22    348.23    268.26
  0.95     23295.86     5833.82    2599.00    1466.24    941.56    656.32    484.15    372.37

TABLE IV.

Values of $\log \chi^2$ corresponding to values of $r$ and ${}_0\sigma_r$ in Tables I and II. Columns: values of ${}_0\sigma_r$. Rows: values of $r$.

  r          .01       .02       .03       .04       .05       .06       .07       .08
  0.05     1.5030    1.0366    0.8035    0.6542    0.5465    0.4639    0.3945    0.3404
  0.075    1.8106    1.2999    1.0370    0.8681    0.7466    0.6542    0.5775    0.5159
  0.1      2.0407    1.5042    1.2212    1.0374    0.9047    0.8028    0.7210    0.6522
  0.15     2.3774    1.8138    1.5062    1.3025    1.1535    1.0386    0.9455    0.8686
  0.2      2.6256    2.0467    1.7256    1.5091    1.3493    1.2240    1.1222    1.0400
  0.3      2.9808    2.3920    2.0571    1.8270    1.6543    1.5176    1.4057    1.3148
  0.4      3.2451    2.6511    2.3098    2.0725    1.8928    1.7493    1.6307    1.5305
  0.5      3.4612    2.8645    2.5196    2.2783    2.0942    1.9464    1.8235    1.7188
  0.6      3.6512    3.0527    2.7056    2.4618    2.2748    2.1239    1.9980    1.8904
  0.7      3.8293    3.2297    2.8811    2.6354    2.4465    2.2935    2.1654    2.0554
  0.8      4.0100    3.4097    3.0598    2.8129    2.6224    2.4678    2.3379    2.2262
  0.9      4.2207    3.6197    3.2690    3.0210    2.8293    2.6732    2.5419    2.4286
  0.95     4.3673    3.7660    3.4148    3.1662    2.9738    2.8171    2.6850    2.5710

TABLE V.*

Values of $\chi_\alpha$ for values of $\tfrac12(1+\alpha)$.

  ½(1+α)    χ_α    |  ½(1+α)    χ_α    |  ½(1+α)    χ_α    |  ½(1+α)    χ_α
   .50    1.2533   |   .65    1.2877   |   .80    1.4288   |   .95    2.1132
   .51    1.2535   |   .66    1.2928   |   .81    1.4457   |   .96    2.2740
   .52    1.2539   |   .67    1.2984   |   .82    1.4641   |   .97    2.5071
   .53    1.2546   |   .68    1.3044   |   .83    1.4844   |   .98    2.8915
   .54    1.2556   |   .69    1.3109   |   .84    1.5067   |   .985   3.2097
   .55    1.2569   |   .70    1.3180   |   .85    1.5315   |   .990   3.7333
   .56    1.2585   |   .71    1.3256   |   .86    1.5590   |   .991   3.8854
   .57    1.2604   |   .72    1.3338   |   .87    1.5897   |   .992   4.0639
   .58    1.2626   |   .73    1.3427   |   .88    1.6245   |   .993   4.2784
   .59    1.2652   |   .74    1.3523   |   .89    1.6640   |   .994   4.5419
   .60    1.2680   |   .75    1.3626   |   .90    1.7094   |   .995   4.8779
   .61    1.2712   |   .76    1.3738   |   .91    1.7623   |   .996   5.3278
   .62    1.2748   |   .77    1.3859   |   .92    1.8249   |   .997   5.9776
   .63    1.2787   |   .78    1.3990   |   .93    1.9003   |   .998   7.0465
   .64    1.2830   |   .79    1.4133   |   .94    1.9937   |   .999   9.3870

* Reprinted from Biometrika, Vol. IX.

TABLE VI.

Extension of Sheppard's Table of the Probability Integral

$F = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-\frac12 x^2}\,dx,$

giving $(-\log F)$ for $x$.

   x     -log F     |   x     -log F     |    x      -log F
   5     6.54265    |  30   197.30921    |   50     544.96634
   6     9.00586    |  31   210.56940    |   60     783.90743
   7    11.89285    |  32   224.26344    |   70    1066.26576
   8    15.20614    |  33   238.39135    |   80    1392.04459
   9    18.94746    |  34   252.95315    |   90    1761.24604
  10    23.11805    |  35   267.94888    |  100    2173.87154
  11    27.71882    |  36   283.37855    |  150    4888.38812
  12    32.75044    |  37   299.24218    |  200    8688.58977
  13    38.21345    |  38   315.53979    |  250   13574.49960
  14    44.10827    |  39   332.27139    |  300   19546.12790
  15    50.43522    |  40   349.43701    |  350   26603.48018
  16    57.19458    |  41   367.03664    |  400   34746.55970
  17    64.38658    |  42   385.07032    |  450   43975.36860
  18    72.01140    |  43   403.53804    |  500   54289.90830
  19    80.06919    |  44   422.43983    |
  20    88.56010    |  45   441.77568    |
  21    97.48422    |  46   461.54561    |
  22   106.84167    |  47   481.74964    |
  23   116.63253    |  48   502.38776    |
  24   126.85686    |  49   523.45999    |
  25   137.51475    |  50   544.96634    |
  26   148.60624    |
  27   160.13139    |
  28   172.09024    |
  29   184.48283    |
  30   197.30921    |

N.B. To obtain anything but a rough appreciation after $x = 50$, the table would require much extension, but for many practical problems it suffices to take after $x = 50$:

$F = \frac{1}{\sqrt{2\pi}}\,\frac{1}{x}\,e^{-\frac12 x^2}.$

To each of the values in this table .30103 must be added, if we wish to obtain the probability that the value is greater than $x$, without regard to sign.

DESCRIPTION OF PLATES.

This memoir is accompanied by two abacs for which I have heartily to thank my assistant Mr G. H. Soper.

The first abac, Plate I, gives ${}_0\sigma_r$. It has not been used in the text because interpolation from the Tables gave slightly more accurate values, but its readings are quite sufficient for most practical purposes. The maximum difference we found in determining ${}_0\sigma_r$ for the ten illustrations of this paper was .0006, or a unit in the value of ${}_0\sigma_r$ read to two significant figures. The method of using it is as follows: Run along the horizontal giving the size of the population (left-hand scale), until you meet the vertical giving the value $\tfrac12(1+\alpha_1)$
(bottom scale); then follow the 45° line through that point till you reach the left-hand scale, take the horizontal through this point and follow it till you meet the vertical through $\tfrac12(1+\alpha_2)$ (bottom scale), then again follow the 45° line to the left-hand side, and from the point reached traverse the horizontal to the right-hand side of the diagram, where the scale gives the proper value of ${}_0\sigma_r$. If, in traversing the 45° line, we meet the top of the diagram instead of the left-hand side, we follow the usual rule of such abacs, i.e. drop by a vertical through the point to the bottom scale and run up the 45° line through that point to the left-hand scale and continue as before; the final value read for ${}_0\sigma_r$ on the right-hand scale has for each such drop to be multiplied by 10.

The second abac, Plate II, is entered by the value of $\log \chi^2$ on the left-hand scale and the value of ${}_0\sigma_r$ on the bottom scale; the meet of the horizontal and vertical lines through these points determines a contour, the value of which in "probability correlation" $r_P$ is recorded on the right-hand vertical scale.

CAMBRIDGE: PRINTED BY JOHN CLAY, M.A. AT THE UNIVERSITY PRESS

NOTE

In the course of the present memoir it is shewn that

[equation illegible in this copy: the relations connecting $\chi$ with $\phi$, $r_{hk}$ and $Q$ and their standard deviations for zero association]

It may then be asked why not measure the improbability of $\phi$ exceeding ${}_0\sigma_\phi$ by the ordinary theory of the probability integral? We know that $\phi^2$ is by its essence positive and we should probably have to take $\phi$ positive also. The distribution of $\phi^2$ for samples from a population in which $\phi^2$ is zero has not been studied; we know the mean value of $\phi^2$ for such a population, but we do not know the frequency of $\phi$ in terms of ${}_0\sigma_\phi$, and we have no reason to suppose that it can be expressed by aid of a Gaussian distribution in terms of the constant ${}_0\sigma_\phi$. Similar remarks apply to $Q/{}_0\sigma_Q$ and $r_{hk}/{}_0\sigma_{r_{hk}}$; the latter will clearly be a limited range frequency, and a normal curve distribution, especially for large values of $r_{hk}$, will certainly be inadmissible.
As a matter of fact the probability that a sample occurs with a $\chi$ over a given value is

$P = \sqrt{\frac{2}{\pi}} \int_{\chi}^{\infty} e^{-\frac12 \chi^2}\,d\chi + \sqrt{\frac{2}{\pi}}\,\chi\,e^{-\frac12 \chi^2},$

which connotes a frequency distribution

$y = \sqrt{\frac{2}{\pi}}\,\chi^2\,e^{-\frac12 \chi^2},$

and this is not a normal distribution*.

The above relations between $\chi$, $r_{hk}$, $Q$, $\phi$ and their standard deviations are interesting, but they do not provide us, by aid of a Gaussian probability table, with the requisite "equality in probability," which we are seeking in this memoir. There is very grave danger when, having found in some case the value of the standard deviation of a statistical quantity, we then assume that for this case the Gaussian distribution must apply, and deduce thereby a measure of the significance of the quantity, e.g. estimate the significance of $Q$ from a knowledge of $\sigma_Q$.

* It also is only a close approximation; we have actually assumed the Gaussian may be used to describe the frequency binomials, but it is a far closer approximation than using a Gaussian to describe the frequency of $\chi$, or of $Q/{}_0\sigma_Q$.

[Plates I and II: the two abacs are not reproduced here. One is captioned "Abac to determine ${}_0\sigma_r$", with a scale of $n$, the size of population (100 to 10000), a scale of $\tfrac12(1+\alpha_1)$ and $\tfrac12(1+\alpha_2)$ (.50 to .999), and a scale of ${}_0\sigma_r$ (.01 to .09); the other is captioned "Abac to determine $r_P$", with an axis giving the value of ${}_0\sigma_r$.]
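The probability formula of the NOTE can be checked numerically against Table II. The following Python sketch is not part of the memoir: the function names `prob_exceeding`, `frequency` and `neg_log10_p` are illustrative assumptions, and the integral term is evaluated with the standard library's complementary error function rather than from Sheppard's tables.

```python
import math


def prob_exceeding(chi2: float) -> float:
    """P = sqrt(2/pi) * (integral from chi to infinity of exp(-x^2/2) dx
    + chi * exp(-chi^2/2)), the form quoted in the NOTE."""
    chi = math.sqrt(chi2)
    # erfc(chi / sqrt(2)) equals sqrt(2/pi) times the integral term.
    integral_term = math.erfc(chi / math.sqrt(2.0))
    ordinate_term = math.sqrt(2.0 / math.pi) * chi * math.exp(-0.5 * chi2)
    return integral_term + ordinate_term


def frequency(chi: float) -> float:
    """Frequency curve y = sqrt(2/pi) * chi^2 * exp(-chi^2/2), i.e. -dP/dchi."""
    return math.sqrt(2.0 / math.pi) * chi * chi * math.exp(-0.5 * chi * chi)


def neg_log10_p(chi2: float) -> float:
    """The quantity tabulated in Table II, as a function of chi^2."""
    return -math.log10(prob_exceeding(chi2))


if __name__ == "__main__":
    # Print a few values for comparison with the Table II entries.
    for chi2 in (1.0, 10.0, 100.0, 1000.0):
        print(chi2, round(neg_log10_p(chi2), 3))
```

Working with $-\log_{10} P$ rather than $P$ itself mirrors the layout of Table II; for $\chi^2$ much beyond a thousand the exponential underflows in double precision, so larger entries of the table would need an asymptotic form instead.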
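The arithmetic of the illustrations can be retraced in a few lines. The sketch below is an assumption-laden reading, not Pearson's own procedure: the closed form used for $\chi_\alpha$ (the square root of $p(1-p)$ divided by the normal ordinate at the division point, with $p = \tfrac12(1+\alpha)$) is not stated in this section and is adopted only because it reproduces the entries of Table V, and all function names are hypothetical. It recomputes ${}_0\sigma_r$ and $\chi^2$ for Illustrations IV and VII and repeats the low-correlation Gaussian check of Illustration IV.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def chi_alpha(p: float) -> float:
    """Assumed form of the Table V function for p = (1 + alpha) / 2."""
    h = STANDARD_NORMAL.inv_cdf(p)  # division point cutting off a fraction p
    return math.sqrt(p * (1.0 - p)) / STANDARD_NORMAL.pdf(h)


def sigma0_r(n: float, p1: float, p2: float) -> float:
    """Standard deviation of r for zero correlation: chi_a1 * chi_a2 / sqrt(n)."""
    return chi_alpha(p1) * chi_alpha(p2) / math.sqrt(n)


def chi_square(a: float, b: float, c: float, d: float) -> float:
    """chi^2 of the fourfold table [[a, b], [c, d]] against independence."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def prob_exceeding(chi2: float) -> float:
    """P for a fourfold table, in the form quoted in the NOTE."""
    chi = math.sqrt(chi2)
    return (math.erfc(chi / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * chi * math.exp(-0.5 * chi2))


def low_r_check(chi2: float, sigma0: float) -> float:
    """Gaussian approximation of Illustration IV, meant only for small r:
    halve P to a single tail, find the normal deviate x, and take r = x * sigma0."""
    single_tail = prob_exceeding(chi2) / 2.0
    x = STANDARD_NORMAL.inv_cdf(1.0 - single_tail)
    return x * sigma0


if __name__ == "__main__":
    # Illustration IV: 2054 boys, marginal fractions .5686 and .5173.
    s = sigma0_r(2054, 0.5686, 0.5173)
    print("sigma0_r:", round(s, 4))
    print("chi^2:", round(chi_square(447.5, 544, 438.5, 624), 2))
    print("r by low-r check:", round(low_r_check(3.12, s), 3))
```

The check in `low_r_check` stands in for the interpolation in Table IV only where the correlation is small; for the higher correlations of the other illustrations the memoir relies on Table IV or the abac, which this sketch does not attempt to reproduce.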