Reprint from the Proceedings of the Royal Society of Edinburgh. Session 1909-1910. Vol. XXX., Part VI. (No. 34.)

XXXIV. The Significance of the Correlation Coefficient when applied to Mendelian Distributions. By John Brownlee, M.D., D.Sc.

Edinburgh: Published by Robert Grant & Son, 107 Princes Street, and Williams & Norgate, 14 Henrietta Street, Covent Garden, London. MDCCCCX. Price Two Shillings.

(MS. received February 22, 1910. Read January 24, 1910.)

1. At the present moment there is much discussion regarding the means by which properties are hereditarily transmitted from a parent organism to its offspring, and of the extent to which the Mendelian theory is capable of accounting for the facts. In this note it is not proposed to discuss the general question, but to investigate the conditions under which the theory of correlation may be applied to Mendelian groupings. Two important papers on this subject have already been published: one by Professor Pearson, entitled “A Generalized Theory of Mendelian Inheritance”;* the other, which is largely a criticism of this, by Professor Udny Yule.† In Professor Pearson’s paper the results produced when two organisms with any number of pairs of different zygotes mate indiscriminately are fully considered. He finds that such a population once established is stable, and he then deduces the parental and fraternal correlation coefficients. He finds that the parental correlations are independent of the number of zygotes, and also that the coefficients are considerably inferior in value to the numbers actually found by observation. Professor Yule, in criticism, says that the observed value of the coefficients can be obtained if a certain amount of weight is given to the effect of the hybrid and recessive elements, and he gives a formula in which this result is exhibited.

2.
Professor Yule’s criticism suggests that, if the Mendelian theory is true, great care will be required in interpreting the meaning of a correlation coefficient, and the purpose of this paper is to investigate how far values of the latter can be taken as representations of real relationships. As Professor Pearson has shown that the simplest Mendelian formula has the same regression as the more complex, it is unnecessary for me to repeat his mathematical proofs, the case of the mating of two organisms differing in one particular giving the information required.

* Royal Soc. Trans., 1903, p. 53.
† “On the Theory of Inheritance of Quantitatively Compound Character on the Basis of Mendel’s Laws,” by G. Udny Yule. Report of Conference on Genetics, published by Royal Horticultural Society of London.

3. Professor Yule has pointed out that there are several varieties of correlation possible on a Mendelian basis. The chief, however, are (1) where the hybrid has properties of its own differentiating it from either of its parents; and (2) where the dominant includes the hybrid. It is obvious that the correlation between parent and offspring will be much greater in the former case than in the latter. This argument will be made clearer if the elementary Mendelian formula is examined. In the first place, consider a population consisting of two pure races. Let them be denoted by (a, a) and (b, b) respectively, and let (a, b) be the hybrid between them. Then the whole population may be expressed by a parentage of both sexes, each represented by

$x^2$ (a, a) + $2xy$ (a, b) + $y^2$ (b, b),

where $x^2$, $2xy$ and $y^2$ denote the numbers respectively of each type.
If mating is random and fertility equal, we have offspring in the following proportions:

$x^2$ (a, a) mating with $x^2$ (a, a) gives $x^4$ (a, a)
$x^2$ (a, a) mating with $2xy$ (a, b) gives $x^3y$ (a, a) + $x^3y$ (a, b)
$x^2$ (a, a) mating with $y^2$ (b, b) gives $x^2y^2$ (a, b)
$2xy$ (a, b) mating with $x^2$ (a, a) gives $x^3y$ (a, a) + $x^3y$ (a, b)
$2xy$ (a, b) mating with $2xy$ (a, b) gives $x^2y^2$ (a, a) + $2x^2y^2$ (a, b) + $x^2y^2$ (b, b)
$2xy$ (a, b) mating with $y^2$ (b, b) gives $xy^3$ (a, b) + $xy^3$ (b, b)
$y^2$ (b, b) mating with $x^2$ (a, a) gives $x^2y^2$ (a, b)
$y^2$ (b, b) mating with $2xy$ (a, b) gives $xy^3$ (a, b) + $xy^3$ (b, b)
$y^2$ (b, b) mating with $y^2$ (b, b) gives $y^4$ (b, b)

Adding together and arranging the terms, we have the population of offspring given by

$x^2(x+y)^2$ (a, a), $2xy(x+y)^2$ (a, b), $y^2(x+y)^2$ (b, b),

or the numbers of the offspring are in the same proportions as those of the parents; that is, the population is stable. Stability, then, depends on the number of the hybrid being equal to twice the geometric mean of the numbers of the pure races. It is also easily shown that, even though these proportions are not originally present, they at once appear.

4. When these figures are arranged so as to show the correlation from parent to child the following table is formed:

Number of Parents          Number of Offspring of each Type.
of each Type.     (a, a).            (a, b).                   (b, b).
(a, a)            $x^4+x^3y$         $x^3y+x^2y^2$             0
(a, b)            $x^3y+x^2y^2$      $x^3y+2x^2y^2+xy^3$       $x^2y^2+xy^3$
(b, b)            0                  $x^2y^2+xy^3$             $xy^3+y^4$

Dividing by the common factor $x+y$ this becomes

                          Offspring.
Parents.          (a, a).     (a, b).       (b, b).
(a, a)            $x^3$       $x^2y$        0
(a, b)            $x^2y$      $xy(x+y)$     $xy^2$
(b, b)            0           $xy^2$        $y^3$

In this table the regression is linear, and therefore the correlation between parent and offspring may be determined by the product method and is given by r = 0.5. This shows that in a stable population the correlation is independent of the relative proportions of the pure races.
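The random-mating argument above can be checked numerically. The following is only a sketch under stated assumptions: genotypes are scored by their number of a elements (2, 1, 0), and the function names `child_distribution`, `parent_offspring_table`, `offspring_proportions` and `product_moment` are illustrative and do not come from the paper.

```python
from fractions import Fraction
from math import sqrt

def child_distribution(father, mother):
    """Offspring probabilities, each parent passing on one element at random."""
    pf, pm = Fraction(father, 2), Fraction(mother, 2)  # chance of transmitting "a"
    return {2: pf * pm, 1: pf * (1 - pm) + (1 - pf) * pm, 0: (1 - pf) * (1 - pm)}

def parent_offspring_table(x, y):
    """Weights {(father, child): w} when both sexes are x^2 : 2xy : y^2."""
    population = {2: x * x, 1: 2 * x * y, 0: y * y}
    table = {}
    for father, wf in population.items():
        for mother, wm in population.items():
            for child, pc in child_distribution(father, mother).items():
                key = (father, child)
                table[key] = table.get(key, 0) + wf * wm * pc
    return table

def offspring_proportions(table):
    """Share of each genotype among the offspring."""
    total = sum(table.values())
    return {g: sum(w for (_, c), w in table.items() if c == g) / total
            for g in (2, 1, 0)}

def product_moment(table):
    """Product-moment correlation between parental and filial scores."""
    n = sum(table.values())
    mx = sum(w * p for (p, _), w in table.items()) / n
    my = sum(w * c for (_, c), w in table.items()) / n
    cov = sum(w * (p - mx) * (c - my) for (p, c), w in table.items()) / n
    vx = sum(w * (p - mx) ** 2 for (p, _), w in table.items()) / n
    vy = sum(w * (c - my) ** 2 for (_, c), w in table.items()) / n
    return float(cov) / sqrt(float(vx * vy))

if __name__ == "__main__":
    for x, y in [(1, 1), (3, 1), (1, 5)]:
        t = parent_offspring_table(x, y)
        print(x, y, offspring_proportions(t), product_moment(t))
```

Under these assumptions the offspring proportions reproduce $x^2 : 2xy : y^2$ and the correlation comes out at 0.5 whatever values of x and y are tried, which is the independence stated in the text.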
Now in ascertaining the correlation when the hybrid can be distinguished from the dominant the process given above is correct, but when the hybrid has no points of special distinction and must therefore be included in the dominant, the table is condensed to the following:

                              Offspring.
Parents.              (a, a) + (a, b).         (b, b).
(a, a) + (a, b)       $x^3+3x^2y+xy^2$         $xy^2$
(b, b)                $xy^2$                   $y^3$

Here the regression is linear as shown by Professor Pearson, so that by the product method

$r = \frac{y}{x+2y}$,

or 0.333 when $x = y$.

5. By repeating the above process the correlation of offspring with remoter ancestors can be easily evaluated. The first hypothesis, namely, that the hybrid is independent of the dominant, leads to correlations of 0.5, 0.25, 0.125, etc., or, in other words, they are those given by Galton’s Law of Ancestral Inheritance.* On the second hypothesis, the one investigated by Professor Pearson, the same correlation coefficients are represented by 1/3, 1/6, 1/12, etc. The well-known correlations found by observation have no obvious relation to either of these sets of figures, and if Mendel’s law is proved to be efficient, some means of reconciling theory and observation must be found. In the subsequent pages the various factors which influence correlation will be considered under different heads.

* Professor Pearson, Royal Soc. Trans., vol. cxcv. p. 119, Table IX., “Exclusive Inheritance.”

Influence of the Different Methods of Calculating Correlation Coefficients on the Values Deduced if Mendelian Principles hold.

6. When the typical correlation table for parent and offspring given by Mendelian theory is considered it is evident that it shows several properties.
If, say, the population consist of (a, a), (a, b), (b, b), then it may be tabulated in two ways:

Pure (a, a) containing two a elements;
Hybrid (a, b) containing one a element;
Pure (b, b) containing no a element;

or, if the hybrid (a, b) resemble (a, a) in appearance, we have

(a, a) + (a, b) not having a pair of b zygotes, and
(b, b) possessing a pair of b zygotes.

Both these forms have linear regression, and in consequence the product method of determining correlation is valid. The case already given may be repeated. Taking x equal to y the correlation of parent and offspring reduces to the following simple form:

                     Offspring.
Parent.     (a, a).   (a, b).   (b, b).   Totals.
(a, a)        1         1         0         2
(a, b)        1         2         1         4
(b, b)        0         1         1         2
Totals        2         4         2         8

This table shows obvious symmetry, has evidently linear regression, and gives a correlation coefficient between parent and offspring of r = 0.5. But if the table is further condensed, that is, if (a, a) and (a, b) are considered as one class, we have instead:

                              Offspring.
Parent.              (a, a) + (a, b).   (b, b).   Totals.
(a, a) + (a, b)            5              1          6
(b, b)                     1              1          2
Totals                     6              2          8

Here again the regression is linear, and as the result we have r = 0.333. So far all is clear. In the last case, however, the distribution is markedly skew, and while the product method is applicable it is only applicable because the regression is linear.

7. It is therefore specially important to consider what happens when other methods of obtaining the correlation are employed. The chief of these is the fourfold division method. In a Mendelian instance such as this, the fourfold table seems specially applicable, but it assumes normality of distribution, so that the fourfold table should give a higher correlation than r = 0.3333. As a matter of fact it does. The equation for determining r is

$0.62035 = r + 0.22747r^2 + 0.04951r^3 + 0.12279r^4 + 0.001898r^5 + \ldots$

which gives r = 0.53.
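The two figures just quoted can be reproduced approximately. The sketch below is an illustration and not the author's procedure: it computes the product-method coefficient of the condensed table (5, 1; 1, 1), and solves the series exactly as printed above, truncated after the fifth power, by bisection. The names `phi`, `series` and `solve_fourfold` are hypothetical.

```python
from math import sqrt

def phi(a, b, c, d):
    """Product-moment correlation of the fourfold table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Coefficients of r, r^2, ..., r^5 as printed in the text; later terms are dropped.
COEFFS = [1.0, 0.22747, 0.04951, 0.12279, 0.001898]
TARGET = 0.62035

def series(r):
    """Truncated right-hand side of the fourfold equation."""
    return sum(c * r ** (k + 1) for k, c in enumerate(COEFFS))

def solve_fourfold(target=TARGET, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection; the truncated series is increasing on [0, 1]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if series(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    print(round(phi(5, 1, 1, 1), 4))
    print(round(solve_fourfold(), 3))
```

With only five terms retained the root lies a little above 0.53; it is offered as a rough check on the printed value rather than as a full fourfold computation.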
That is to say, the correlation is even higher than that obtained when the hybrid is distinguishable from the dominant, and in applying the fourfold method we have returned to or even gone beyond the uncondensed table. The higher coefficients are likewise increased and the series becomes

Parental.   Grandparental.   Great-grandparental.   Great-great-grandparental.
0.53,       0.29,            0.15,                  and 0.073
as against
0.5,        0.25,            0.125,                 and 0.063.

8. If the simple Mendelian table be again considered, and if for the moment the distinguishing character of the hybrid and the dominant be assumed somewhat indefinite, we can make several tentative divisions, either bisecting the hybrid or dividing it into such divisions that one-fourth resembles the recessive, as follows:

M. (hybrid bisected)
                          Offspring.
Parent.          (a, a).   (a, b).   (a, b).   (b, b).
(a, a)             4         2         2         0
(a, b)             2         2         2         2
(a, b)             2         2         2         2
(b, b)             0         2         2         4

N. (one-fourth of the hybrid resembling the recessive)
                          Offspring.
Parent.          (a, a).   (a, b).   (a, b).   (b, b).
(a, a)             4         3         1         0
(a, b)             3         5         1         3
(a, b)             1         1         1         1
(b, b)             0         3         1         4

giving fourfold distributions,

M.   10    6        N.   15    5
      6   10              5    7

leading to correlations

M. r = 0.441,    N. r = 0.501,

when calculated by the fourfold method. Thus, again, Mendelian principles do not lead to low correlations but to figures approximately equal to those found by observation.

9. When more complex formulae are taken the result is nearly the same. Supposing that instead of one pair of zygotes the parents possess two or three, that is, we have

Dominant (Father).   Recessive (Mother).
(a, a)               (b, b)
(c, c)               (d, d)
(e, e)               (f, f)

and let mating be random, then the correlation table in the case of two pairs of zygotes becomes

                              Offspring.
Parents.       Two Pairs of Dominants.   One Pair of Dominants.   No Dominants.
Two pairs              25                        10                     1
One pair               10                        12                     2
None                    1                         2                     1

admitting of two fourfold divisions, namely:

B.   25   11        C.   57    3
     11   17              3    1

The former of these gives r = 0.45, and the latter r = 0.45, both values much in excess of the 0.333 given by the product method. When the three pairs are involved we have:

                          Offspring.
Parent.         Three Pairs.   Two Pairs.   One Pair.   None.
Three pairs        125            75           15         1
Two pairs           75           105           33         3
One pair            15            33           21         3
None                 1             3            3         1

This form is capable of three different fourfold divisions, namely:

A.   125    91      B.   380   52      C.   497    7
      91   205            52   28             7    1

giving A. r = 0.42, B. r = 0.42, C. r = 0.45.

10. It is evident that when two and three pairs of zygotes are condensed we do not go straight back to the normal distribution. The reason of this is that the normal surface, obtained when the elements are considered separately, represents something different from the surface which is condensed into the last tables. If the parents be

(a, a)         (b, b)
(c, c)   and   (d, d)

then the offspring having two elements from the same parents are

P.         Q.         R.
(a, a)     (c, c)     (a, b)
(d, d)     (b, b)     (c, d)

which represent different things according as dominance exists or not; for if dominance exist R is included among those having apparently two pairs of dominant zygotes, while if the hybrid is distinct it is grouped with P and Q as containing two units from the same parent.

11. In addition to the methods just given Professor Pearson has also discovered two methods of determining correlation by means of what he calls contingency. It is not necessary to go fully into this part of the question. The manner in which the results given by these methods differ from those just considered is illustrated in the subjoined table. They are not in general suitable for simple Mendelian cases, as they depend for success on the number of divisions being much more numerous than these tables give.

Table showing the Correlation Coefficients calculated by different Methods where One, Two, or Three Dominant Zygotes occur in one Parent and a like Number of Recessive in the other.

                 Product    Mean Square     Mean            Fourfold Table
                 Method.    Contingency.    Contingency.    (divisions A, B, C).*
One zygote       0.333      0.32            0.37            0.5
Two zygotes      0.333      0.33            0.41            0.46, 0.46
Three zygotes    0.333      0.32            0.39            0.42, 0.42, 0.45

* See par. 9.

Results of Assortive Mating.

12. With the same notation as just used the most general form of correlation under a Mendelian system for assortive mating between husband and wife, if the standard deviation of each is equal, is the following:

                          Wives.
Husbands.     (a, a).      (a, b).     (b, b).      Totals.
(a, a)          m            2r          n          m + 2r + n
(a, b)          2r           4p          2r         4(r + p)
(b, b)          n            2r          m          m + 2r + n
Totals        m + 2r + n   4(r + p)    m + 2r + n

When the hybrid is distinct from the dominant the value of the correlation coefficient depends only on the value of m, n, or r, though in the case when the hybrid is not distinct the value of p exercises an influence on the result. In a typical simple Mendelian distribution of the population m + n will be equal to 2r and p to r. Those values, however, do not give an immediately stable population, the standard deviation of the offspring being higher than that of the parents. This population, however, quickly tends to stability. On the other hand, if the population is immediately stable it is easily seen that p must be equal to n, for the first generation gives a parentage and offspring as below:

                          Parent.
Offspring.    (a, a).      (a, b).     (b, b).      Totals.
(a, a)        m + r        r + p       0            m + 2r + p
(a, b)        r + n        2(r + p)    r + n        4r + 2p + 2n
(b, b)        0            r + p       m + r        m + 2r + p
Totals        m + 2r + n   4(r + p)    m + 2r + n   2m + 2n + 8r + 4p

and as the total is the same whether the addition is made by columns or by rows, the sum of each row must be equal to the sum of the corresponding column if the standard deviation remains the same. Or, m + 2r + p = m + 2r + n, which requires that n shall be equal to p.

13. In the first place, the varieties of the correlation coefficients when m + n = 2p will be considered.
In this case, changing the letters for convenience, the initial correlation table between husband and wife may be taken to be:

                          Wives.
Husbands.     (a, a).   (a, b).   (b, b).   Totals.
(a, a)        n - a       n         a         2n
(a, b)          n         2n        n         4n
(b, b)          a         n       n - a       2n
Totals          2n        4n        2n        8n        (a)

If the hybrid is distinct from the dominant the correlation of husband and wife is given by

$r = 0.5 - \frac{a}{n}$.

If the dominant include the hybrid, then the table condenses to

                              Wives.
Husbands.            (a, a) + (a, b).   (b, b).   Totals.
(a, a) + (a, b)          5n - a          n + a      6n
(b, b)                   n + a           n - a      2n
Totals                   6n              2n         8n

giving a correlation

$r = \frac{2}{3}\left(0.5 - \frac{a}{n}\right)$.

14. If the parentage be as in (a) the correlation table for parent and offspring is

                          Parent.
Offspring.    (a, a).             (a, b).   (b, b).             Totals.
(a, a)        $\frac{3}{2}n - a$    n       0                   $\frac{5}{2}n - a$
(a, b)        $\frac{1}{2}n + a$    2n      $\frac{1}{2}n + a$  3n + 2a
(b, b)        0                     n       $\frac{3}{2}n - a$  $\frac{5}{2}n - a$
Totals        2n                    4n      2n                  8n

giving a correlation*

$r_{f.o.} = \frac{3n - 2a}{\sqrt{4n(5n - 2a)}}$,

reducing, if $a = \frac{1}{2}n$, i.e. if there is no assortive mating, to r = 0.5, or to $r_{f.o.}$ = 0.596 if $r_{f.m.}$ = 0.25.

* $r_{f.o.}$ signifies correlation of father and offspring; $r_{f.m.}$ signifies correlation of father and mother.

15. The population given by the parentage (a) is evidently represented by offspring in the proportion

$(\frac{5}{2}n - a)$ (a, a) + $(3n + 2a)$ (a, b) + $(\frac{5}{2}n - a)$ (b, b),

which has a higher squared standard deviation than the parentage, this being equal in the latter case to 0.5 and in the former to $\frac{5n - 2a}{8n}$, or, if $a = \frac{1}{4}n$, to 0.5625. This latter value is not, however, constant with such mating, but increases gradually up to a limit.

16. In addition to the correlation coefficients the contingency coefficients have also been calculated in some instances to show the degree of correspondence of the two. It is seen that for the parental correlations they fall short of the former, but approach them closely when they arrive at great-grandparental correlations. Three sets of figures have been calculated for each case.
Case 1. That when there is no assortive mating.
Case 2. That when there is assortive mating with equal fertility and population not immediately stable.
Case 3. That when there is assortive mating with equal fertility and an immediately stable population.

17. Tables of Parental, etc., Correlations based on Different Hypotheses.

i. The hybrid separate:

                               No Assortive Mating.       Assortive Mating, r = 0.25.   Assortive Mating: Immediately Stable Population.
                                                                                        r = 0.125.    r = 0.25.
                               Product    Contingency     Product    Contingency        Product       Product
                               Method.    Method.         Method.    Method.            Method.       Method.
Parental                       0.5        0.487           0.589      0.576              0.563         0.625
Grandparental                  0.25       0.242           0.366      0.343              0.316         0.391
Great-grandparental            0.125      0.124           0.234      0.223              0.178         0.244
Great-great-grandparental      0.0625     0.0624          0.143      0.142              0.100         0.153

ii. The dominant including the hybrid:

                               No Assortive    Assortive Mating,   Assortive Mating: Immediately
                               Mating.         r = 0.25.           Stable Population, r = 0.1875.
Parental                       0.3333          0.495               0.548
Grandparental                  0.1667          0.307               0.351
Great-grandparental            0.0833          0.203               0.236
Great-great-grandparental      0.0417          0.141               0.161

It is to be noted that column 2 in Table ii. gives almost exactly the figures found by observation and would thus appear a possible expression of the facts, though it is more probably a mere coincidence, as will be shown later.

18. Two more cases of importance remain to be considered: that where like mates unlike, and that where the dominant includes the hybrid. Taking that where like mates unlike and reversing the mating given in par. 13, we have:

                          Wives.
Husbands.     (a, a).   (a, b).   (b, b).
(a, a)          a         n       n - a
(a, b)          n         2n        n
(b, b)        n - a       n         a

If the population of offspring be then found and the correlation calculated we find that

$r = \frac{1 + 2\frac{a}{n}}{2\sqrt{3 + 2\frac{a}{n}}}$.

Table of Values. (Hybrid distinct.)

Value of a/n.   Correlation,          Correlation,
                Husband and Wife.     Parent and Offspring.
0.000           -0.500                0.289
0.125           -0.375                0.347
0.250           -0.250                0.401
0.375           -0.125                0.452
0.500            0                    0.500
0.625            0.125                0.546
0.750            0.250                0.593
0.875            0.375                0.631
1.000            0.500                0.671

This table also gives the effect of assortive mating when it is positive as well as negative.

19. When the dominant includes the hybrid and the assortive mating is confined to the mixture we have then a correlation table as the following:

                          Wives.
Husbands.     (a, a).   (a, b).   (b, b).
(a, a)          m         2m        n
(a, b)          2m        4m        2n
(b, b)          n         2n        m

which gives the parent and offspring table:

                          Parent.
Offspring.    (a, a).   (a, b).    (b, b).
(a, a)          2m        2m         0
(a, b)        m + n     3m + n       2n
(b, b)          0       m + n      m + n

reducing to

                              Parent.
Offspring.           (a, a) + (a, b).   (b, b).
(a, a) + (a, b)         8m + 2n           2n
(b, b)                  m + n            n + m

or

                              Parent.
Offspring.           (a, a) + (a, b).   (b, b).
(a, a) + (a, b)         8 + 2a            2a
(b, b)                  1 + a            1 + a

where $a = \frac{n}{m}$.

Table of Values of the Correlation of Parent and Offspring for Different Values of n/m by Fourfold Table Method.

Values of a.   Assortive Mating.   Parent-Offspring Correlation.
0.500           0.454              0.666
0.667           0.287              0.621
0.750                              0.593
1.000           0.000              0.539
1.5            -0.315              0.454
2              -0.525              0.397

20. The values of the grandparental coefficients can likewise be evaluated, but the labour is somewhat greater than in the previous sections, and does not seem to promise any results beyond what can be surmised from the previous argument. In this case a moderate degree of assortive mating in the parents has apparently little effect on the correlation coefficients.

21.
In general it is to be noted that a large variety of different values of the correlation coefficients arises on different hypotheses, and also that the correlation of parent and offspring differs greatly according to the kind of assortive mating of the parents, so that the value of the coefficient of assortive mating gives very little guide to the value of correlation between parent and offspring. It is also to be noted that the successive heredity correlation coefficients are not in an exact geometrical progression.

Effect of Parental Selection on the Correlation Coefficient.

22. The effect of parental selection has been investigated by Professor Pearson on the basis of the normal curve of error. On this basis it is shown that the higher the parental selection the lower the correlation coefficients. This, however, does not seem to follow on a Mendelian mechanism. Three cases occur on this basis which require to be considered separately: (1) where the dominant is present in excess or defect; (2) where the hybrid is present in excess or defect; (3) where the recessive is present in excess or defect. These are very easily evaluated. The correlation tables here, however, are different from those which go before. Regression is not linear, so that the product method does not give an exact but only an approximate value of the correlation coefficient.

23. Case I. Let

m (a, a) + 2 (a, b) + (b, b)

be the population of the selected parent, and

p{(a, a) + 2 (a, b) + (b, b)}

that of the non-selected parent. These are equal if m + 3 = 4p; but p may be neglected as occurring in every term and therefore not affecting the result. The correlation table for the selected parents and offspring, if mating be random, is then the following:

                          Selected Parents.
Offspring.    (a, a).   (a, b).   (b, b).   Totals.
(a, a)          m         1         0       m + 1
(a, b)          m         2         1       m + 3
(b, b)          0         1         1       2
Totals          2m        4         2       2m + 6

This gives*

$r_{s.o.} = \sqrt{\frac{6m + 2}{m^2 + 14m + 17}}$

if the hybrid be distinct, or

$r'_{s.o.} = \frac{m + 1}{2(m + 2)}$

if the dominant include the hybrid; reducing, if m = 1, to r = 0.5 and r' = 0.333 respectively, as before seen to be the case.

* $r_{s.o.}$ signifies the correlation of the selected parent and offspring; $r_{n.o.}$ signifies the correlation of the non-selected parent and offspring.

The correlation table for the non-selected parent and the offspring is:

                          Non-Selected Parent.
Offspring.    (a, a).   (a, b).   (b, b).   Totals.
(a, a)        m + 1     m + 1       0       2m + 2
(a, b)          2       m + 3     m + 1     2m + 6
(b, b)          0         2         2       4
Totals        m + 3     2m + 6    m + 3     4m + 12

This gives

$r_{n.o.} = \frac{m + 3}{\sqrt{2(m^2 + 14m + 17)}}$

if the hybrid be distinct;

$r'_{n.o.} = \frac{1}{\sqrt{3(m + 2)}}$

if the dominant includes the hybrid; reducing, if m = 1, to r = 0.5 and r' = 0.333.

24. Case II. In like manner, if the parentage be such that the hybrid is in excess or defect we have, if the population of selected parents be

(a, a) + 2m (a, b) + (b, b),

and the population of non-selected parents

p{(a, a) + 2 (a, b) + (b, b)},

$r_{s.o.} = \frac{1}{\sqrt{2(m + 1)}}$

if the hybrid be distinct;

$r'_{s.o.} = \frac{1}{\sqrt{3(2m + 1)}}$

if the hybrid be included in the dominant. The correlations in this case of the non-selected parents and offspring are constant and identical with those where there is no selection, namely, r = 0.5 and r' = 0.333, for the correlation table for the non-selected parents when written out is as follows:

                          Non-Selected Parent.
Offspring.    (a, a).   (a, b).    (b, b).
(a, a)        m + 1     m + 1        0
(a, b)        m + 1     2m + 2     m + 1
(b, b)          0       m + 1      m + 1

and m + 1 being a factor throughout, the result is not affected.

25. Case III. If the recessive be in excess or defect we take again the population of the selected parent,

(a, a) + 2 (a, b) + m (b, b),

and the population of the non-selected parent,

p{(a, a) + 2 (a, b) + (b, b)}.
From this the correlations, if the hybrid is distinct, are obviously the same as in Case I., but if the hybrid be included in the dominant then we have

$r'_{s.o.} = \sqrt{\frac{4m}{3(m + 1)(m + 5)}}$,    $r'_{n.o.} = \sqrt{\frac{m + 1}{3(m + 5)}}$.

26. The values of the correlation coefficients on these bases as m varies are given in the following tables.

Table I. Correlation of Parents (Selected and Non-selected) and Offspring where the Hybrid is distinct from the Dominant.

Column formulae: Dominant or Recessive in Excess or Defect, $r_{s.o.} = \sqrt{\frac{6m+2}{m^2+14m+17}}$ and $r_{n.o.} = \frac{m+3}{\sqrt{2(m^2+14m+17)}}$; Hybrid in Excess or Defect, $r_{s.o.} = \frac{1}{\sqrt{2(m+1)}}$ and $r_{n.o.} = 0.5$.

          Dominant or Recessive           Hybrid
          in Excess or Defect.            in Excess or Defect.
m.        r_s.o.       r_n.o.             r_s.o.       r_n.o.
0         0.343        0.515              0.702        0.5
0.25      0.413        0.507              0.632        0.5
0.5       0.454        0.503              0.577        0.5
0.75      0.481        0.501              0.534        0.5
1.00      0.500        0.500              0.500        0.5
1.5       0.523        0.502              0.447        0.5
2.0       0.536        0.505              0.408        0.5
2.5       0.540        0.510              0.378        0.5
3.0       0.542        0.516              0.342        0.5
4         0.541        0.525              0.316        0.5
5         0.534        0.534              0.289        0.5
6         0.526        0.544              0.267        0.5
∞         0            0.702              0

Table II. Correlation of Parents (Selected and Non-selected) and Offspring where the Hybrid is included in the Dominant.

Column formulae: Dominant in Excess or Defect, $r'_{s.o.} = \frac{m+1}{2(m+2)}$ and $r'_{n.o.} = \frac{1}{\sqrt{3(m+2)}}$; Hybrid in Excess or Defect, $r'_{s.o.} = \frac{1}{\sqrt{3(2m+1)}}$ and $r'_{n.o.} = 0.333$; Recessive in Excess or Defect, $r'_{s.o.} = \sqrt{\frac{4m}{3(m+1)(m+5)}}$ and $r'_{n.o.} = \sqrt{\frac{m+1}{3(m+5)}}$.

          Dominant                 Hybrid                   Recessive
          in Excess or Defect.     in Excess or Defect.     in Excess or Defect.
m.        r'_s.o.    r'_n.o.       r'_s.o.    r'_n.o.       r'_s.o.    r'_n.o.
0         0.250      0.408         0.578      0.333         0          0.258
0.25      0.278      0.385         0.471      0.333         0.225      0.281
0.50      0.300      0.365         0.408      0.333         0.284      0.301
0.75      0.318      0.348         0.365      0.333         0.319      0.320
1.00      0.333      0.333         0.333      0.333         0.333      0.333
1.50      0.357      0.308         0.289      0.333         0.350      0.357
2.00      0.375      0.289         0.258      0.333         0.356      0.378
2.50      0.388      0.273         0.235      0.333         0.356      0.394
3.00      0.400      0.257         0.218      0.333         0.353      0.408
4.00      0.411      0.236         0.192      0.333         0.344      0.430
5.00      0.429      0.218         0.179      0.333         0.333      0.447
6.00      0.437      0.204         0.160      0.333         0.236      0.460
∞         0.5        0             0                        0          0.577

27. Considering the values of the correlation coefficients in these tables, we see that uni-parental selection, except when large, makes little difference in the correlation. Selection may raise or lower the correlation.
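The Case I entries for the distinct hybrid can be regenerated from the populations defined in par. 23. The sketch below is illustrative only: it assumes genotypes scored 2, 1, 0 by their number of a elements, random mating between a selected parent drawn from m (a, a) + 2 (a, b) + (b, b) and a non-selected parent drawn from (a, a) + 2 (a, b) + (b, b), and the names `correlation` and `case_one_tables` are hypothetical.

```python
from math import sqrt

def correlation(table):
    """Product-moment correlation for weights {(parent, child): w}."""
    n = sum(table.values())
    mx = sum(w * x for (x, _), w in table.items()) / n
    my = sum(w * y for (_, y), w in table.items()) / n
    cov = sum(w * (x - mx) * (y - my) for (x, y), w in table.items()) / n
    vx = sum(w * (x - mx) ** 2 for (x, _), w in table.items()) / n
    vy = sum(w * (y - my) ** 2 for (_, y), w in table.items()) / n
    return cov / sqrt(vx * vy)

def case_one_tables(m):
    """Return (selected-parent table, non-selected-parent table) for Case I."""
    selected = {2: float(m), 1: 2.0, 0: 1.0}
    unselected = {2: 1.0, 1: 2.0, 0: 1.0}
    sel, non = {}, {}
    for s, ws in selected.items():
        for u, wu in unselected.items():
            ps, pu = s / 2, u / 2  # chance that each parent transmits "a"
            child = {2: ps * pu,
                     1: ps * (1 - pu) + (1 - ps) * pu,
                     0: (1 - ps) * (1 - pu)}
            for c, pc in child.items():
                w = ws * wu * pc
                sel[(s, c)] = sel.get((s, c), 0.0) + w
                non[(u, c)] = non.get((u, c), 0.0) + w
    return sel, non

if __name__ == "__main__":
    for m in (0, 0.25, 1, 2, 6):
        sel, non = case_one_tables(m)
        print(m, round(correlation(sel), 3), round(correlation(non), 3))
```

Under these assumptions the computed values agree with the closed forms $\sqrt{(6m+2)/(m^2+14m+17)}$ and $(m+3)/\sqrt{2(m^2+14m+17)}$, and both reduce to 0.5 when m = 1.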
In some cases there is a maximum and in others a minimum, these points being in general not far distant from the points of normal Mendelian distribution of the population. Selective mating is not, then, likely to interfere with the correlation coefficients to any appreciable extent except when the selection is stringent.

28. There are a few other cases which demand attention, some of which will be referred to when the actual figures are discussed, while some others are added in this place.

29. Case (A). If both parents be equally selected, the parentage being

m (a, a) + 2 (a, b) + (b, b)
m (a, a) + 2 (a, b) + (b, b),

we have as the correlation of either parent and offspring, when the hybrid is distinct,

$r = \frac{1}{2}\sqrt{\frac{3m + 1}{2(m + 1)}}$.

Table of Values.

m.      r.          m.      r.
0       0.353       1.5     0.522
0.5     0.456       2       0.577
1.0     0.500       ∞       0.612

30. Case (B). If the hybrid be present in normal numbers but the recessive present in defect and the dominant in corresponding excess; in other words, both parental populations consist of

(1 + m) (a, a) + 2 (a, b) + (1 - m) (b, b).

This gives a correlation coefficient when the hybrid is distinct of

$r = \sqrt{\frac{2 - m^2}{2(4 - m^2)}}$,

m being always less than unity.

Table of Values.

m.      r.          m.      r.
0       0.500       0.6     0.474
0.2     0.498       0.8     0.450
0.4     0.490

31. Case (C). Let the race be made up of such a population that a part only of the hybrid assumes dominant characters, that is, let it consist of parental populations of (1 + m) (apparently dominant), (2 - m) (hybrid), (1) (recessive), and let it mate indiscriminately.

Case (a). Let the hybrid offspring be distinguishable at birth, developing the resemblance to the dominant later, a condition frequently seen. The correlation table is as follows:

                          Parent.
Offspring.    (a, a).    (a, b).    (b, b).
(a, a)        2 + m      2 - m        0
(a, b)        2 + 2m     4 - 2m       2
(b, b)          m        2 - m        2

which gives a correlation coefficient between parents and offspring, at birth of the latter, of

$r = \frac{\sqrt{2}}{(8 + 4m - m^2)^{1/2}}$.

Table of Values.

m = 0      r = 0.500        m = 1      r = 0.426
m = 0.5    r = 0.453        m = 1.5    r = 0.412

32. Case (b). Let a normal population (a, a), 2 (a, b), (b, b) mate at random, and let dominance appear among the offspring later. The normal correlation table,

                          Offspring.
Parent.       (a, a).   (a, b).   (b, b).
(a, a)          2         2         0
(a, b)          2         4         2
(b, b)          0         2         2

then becomes

                          Offspring.
Parent.       (a, a).    (a, b).    (b, b).
(a, a)        2 + m      2 + 2m       m
(a, b)        2 - m      4 - 2m     2 - m
(b, b)          0          2          2

giving the same correlation coefficient as before, namely,

$r = \frac{\sqrt{2}}{(8 + 4m - m^2)^{1/2}}$.

With the increase of m the correlation becomes less. The values of r are given under Case (a).

Correlation Coefficients when more than two Races Mix.

33. So far, a mixture of two races alone has been considered. Many stocks of cattle, etc., are supposed to be derived from more than two, so that a brief consideration of how this affects the correlation values is necessary. With the same notation let the original races be

(a, a)   (b, b)   (c, c).

Then the stable population with random mating is, as before,

(a, a) + (b, b) + (c, c) + 2 (a, b) + 2 (a, c) + 2 (b, c).

A correlation table is then easily written down and is as follows:

                                Offspring.
Parent.     (a, a).   (a, b).   (b, b).   (b, c).   (c, c).   (c, a).
(a, a)        3         3         0         0         0         3
(a, b)        3         6         3         3         0         3
(b, b)        0         3         3         3         0         0
(b, c)        0         3         3         6         3         3
(c, c)        0         0         0         3         3         3
(c, a)        3         3         0         3         3         6

To evaluate the correlation the product method hitherto used is inapplicable, and the method of contingency must be employed. In the first place, on the supposition that all hybrids are distinct, we have r = 0.597, which is considerably higher than the value r = 0.487 found by contingency when only two types of parent are considered.

Secondly:

34.
If a be dominant over b, b over c, and c over a (indicated in table by dotted lines), the coefficient when estimated by mean square contingency falls in value to 0.425. This case is very suitable for a fourfold division, and if a and b be gathered against c, allowing for dominance, the correlation coefficient rises to a value of r = 0.51, and when the mean contingency is used to r = 0.56.

Thirdly:

35. If b and c are both dominant over a and the hybrid (b, c) is distinct, the correlation becomes 0.460. Thus in all cases we have a higher figure than in the case where only two types intermingle. As Professor Pearson has shown, the figures in the latter case are quite independent of the number of zygotes, and the like will probably hold here.

Table showing the Correlation between Parent and Offspring in two and three Races.

                                 Two Races.                     Three Races.
                                 Correlation.   Contingency.    Contingency.   Fourfold Division.
Hybrid distinct                  0.500          0.487           0.597
Hybrid included in Dominant      0.333          0.316           0.425          0.51

The same effects will also be produced in this case by assortive mating and parental selection as in the previous cases.

Fraternal Correlation.

36. The question of fraternal correlation remains to be considered. As we have seen, uni-parental selection does not in general affect seriously the values of the correlation coefficients. Assortive mating is more powerful. The effect of the latter on fraternal correlation can be estimated as follows. Consider a parentage of the following arrangement:

                          Wives.
Husbands.     (a, a).   (a, b).   (b, b).
(a, a)          3         4         1
(a, b)          4         8         4
(b, b)          1         4         3

This gives a correlation of 0.25 between husbands and wives. With Professor Pearson let the average families be 4n, and we get a family grouping as follows:

                   Number of Times each              Children.
                   Fraternal Group occurs.   (a, a).   (a, b).   (b, b).
Father (a, a)             3                    4n        0         0
                          4                    2n        2n        0
                          1                    0         4n        0
Father (a, b)             4                    2n        2n        0
                          8                    n         2n        n
                          4                    0         2n        2n
Father (b, b)             1                    0         4n        0
                          4                    0         2n        2n
                          3                    0         0         4n

On re-arranging we have each group occurring as in the table.

Number of Times                Brethren.
a Group occurs.      (a, a).   (a, b).   (b, b).
      3                4n        0         0
      8                2n        2n        0
      2                0         4n        0
      8                n         2n        n
      8                0         2n        2n
      3                0         0         4n

So that we can write the correlation table for brothers as follows:

First Brother (a, a): Second Brother (a, a), $3\cdot 4n(4n-1) + 8\cdot 2n(2n-1) + 8n(n-1)$; (a, b), $8(4n^2 + 2n^2)$; (b, b), $8n^2$.
First Brother (a, b): Second Brother (a, a), $8(4n^2 + 2n^2)$; (a, b), $2\cdot 4n(4n-1) + 24\cdot 2n(2n-1)$; (b, b), $8(4n^2 + 2n^2)$.
First Brother (b, b): Second Brother (a, a), $8n^2$; (a, b), $8(4n^2 + 2n^2)$; (b, b), $3\cdot 4n(4n-1) + 8\cdot 2n(2n-1) + 8n(n-1)$.

Or, dividing by 4n,

                          Second Brother.
First Brother.    (a, a).     (a, b).      (b, b).
(a, a)            22n - 9     12n          2n
(a, b)            12n         32n - 14     12n
(b, b)            2n          12n          22n - 9

This gives a correlation as below:

Size of Family.    n.     Assortive Mating,       No Assortive Mating.
                          r_f.m. = 0.25.
4                  1      0.407                   0.333
8                  2      0.508                   0.428
12                 3      0.515                   0.454
∞                  ∞      0.555                   0.500

That is, if the hybrid be distinct from the dominant, and an assortive mating of the parents equivalent to 0.25 is assumed, the correlation coefficients quickly approach the figures given by observation.

37. Taking the dominant to include the hybrid we require a different parental grouping to give the necessary correlation, namely:

                          Wives.
Husbands.     (a, a).   (a, b).   (b, b).
(a, a)          7         8         1
(a, b)          8         16        8
(b, b)          1         8         7

This has a correlation of $\frac{2}{3}(0.5 - \frac{1}{8}) = 0.25$ when the dominant includes the hybrid. Proceeding as before, we obtain the table of fraternal correlation:

                          Second Brother.
First Brother.    (a, a).      (a, b).      (b, b).
(a, a)            48n - 19     24n          4n
(a, b)            24n          56n - 26     24n
(b, b)            4n           24n          48n - 19

Or, condensing,

                              Second Brother.
First Brother.       (a, a) + (a, b).    (b, b).
(a, a) + (a, b)        152n - 45          28n
(b, b)                 28n                48n - 19

which gives the correlation coefficients as in the following table:

Size of Family.    n.     Correlation Coefficients      Correlation as calculated by
                          with Assortive Mating,        Prof. Pearson with no
                          r = 0.25.                     Assortive Mating.
4                  1      0.317
8                  2      0.401                         0.333
12                 3      0.429                         0.364
∞                  ∞      0.476                         0.407

The fraternal correlation is not therefore increased so much by assortive mating as the parental-offspring correlation is. The resulting figure is still in defect of observation.

38. The same process may be applied to ascertain the correlation coefficients when three races mix. If a standard population is taken and the method just outlined applied we get the following correlation table:

                                   Second Brother.
First Brother.   (a, a).     (a, b).      (b, b).     (b, c).      (c, c).     (c, a).
(a, a)           16n - 9     8n           n           2n           n           8n
(a, b)           8n          36n - 18     8n          9n           2n          9n
(b, b)           n           8n           16n - 9     8n           n           2n
(b, c)           2n          9n           8n          36n - 18     8n          9n
(c, c)           n           2n           n           8n           16n - 9     8n
(c, a)           8n          9n           2n          9n           8n          36n - 18

Then if n = 1 the contingency coefficient is r = 0.449, and when n = 2, r = 0.569, much higher values, which will be further increased if assortive mating exists in addition; and even when reduced by the inclusion of the hybrid with the dominant they must approach those given by observation.

39. The effect of parental selection on fraternal correlation remains to be considered. Referring to the parentages before given with reference to the correlation of offspring and parent, the two chief cases are given.

Case I. Let the parentage on both sides be m (a, a) + 2 (a, b) + (b, b), and let the pure zygotes (a, a) be in excess or defect; and,

Case II. Let the parentage on both sides be (a, a) + 2m (a, b) + (b, b), and let the hybrid (a, b) be in excess or defect.

Then if n = 2, i.e. if the family be 8 on an average, we have the fraternal correlation as in the accompanying table:

Fraternal Correlation. (Hybrid distinct.)

Value of m.    Case I.    Case II.
0.5            0.333      0.514
1.0            0.428      0.428
2.0            0.523      0.347
3.0            0.572      0.314

40.
Thus selection which leaves the dominant in excess or the hybrid in defect tends to raise the correlation, while the opposite condition tends to lower it. If both conditions exist, and if the parentage be such that the dominant is twice as numerous and the hybrid half as numerous as in the stable population, we have r = .70 when the hybrid is distinct and r = .40 (product method) when the dominant includes the hybrid.

Consideration of Actual Cases.

We have seen that many different factors affect the value of the correlation coefficients. What effect these have practically can only be estimated in a few cases. Professor Pearson has considered three cases of colour inheritance, namely:

1. Coat colour in horses.*
2. Coat colour in cattle.†
3. Coat colour in greyhounds.‡

Each of these cases will be briefly discussed and the divergences of value in the correlation coefficients explained as far as possible on the basis of what has gone before.

* Roy. Soc. Trans., vol. cxcv. p. 92. Biometrika, vol. i. p. 361; vol. ii. p. 230 et seq.
† Biometrika, vol. iii. p. 245 et seq.
‡ Ibid., vol. iv. p. 427 et seq.

Coat Colour in Horses.

This case may well be considered first, as the data are large and probably accurate. Stud books giving the colour and pedigree of the horse have been in existence for many years, while the value of the animals and the great interest which exists in breeding combine to give the facts authority. To find the correlation Professor Pearson has divided the parents and offspring into groups of Bay and Darker, and Chestnut and Lighter, and calculated the coefficients by the fourfold method; the coefficients as determined by him are as follows:

Inheritance of Coat Colour in Horses.

Parental                       .5216
Grandparental                  .2976
Great-grandparental            .1922
Great-great-grandparental      .1469

Now brown and bay seem both dominant over chestnut and white,* at least to all intents and purposes.
Chestnut with chestnut breeds true, and brown or bay mating with chestnut breeds in the first instance dark. The relations of brown and bay do not concern us, since both are dominant. The number of pale horses not chestnut is so small that it may be neglected as not affecting the result to any appreciable extent. The proportion of these colours present is roughly that of three dark horses to one chestnut, though it must be borne in mind that this has nothing directly to do with Mendelism, but represents simply the proportions which find favour at present among those who breed horses.

It is worth while reproducing the fourfold tables. That of parent and offspring is as follows:†

                        Bay or Darker.   Chestnut or Lighter.   Totals.
Bay or darker           631              125                    756
Chestnut or lighter     147              147                    294
Totals                  778              272                    1050

This table at once reminds us of that already found from Mendel's theory, namely (pars. 6 and 7):

                      Parent.
Offspring.    Dominant.   Recessive.   Totals.
Dominant      5           1            6
Recessive     1           1            2
Totals        6           2            8

which when evaluated by the fourfold method gives r = .53 as the correlation. As a matter of fact the table just quoted gives r = .54.

* Bateson, Mendel's Principles of Heredity, p. 124.
† Roy. Soc. Trans., vol. clxxv. p. 35.

If the highest ancestral coefficient is now examined we find some difference. The table for great-great-grandparental inheritance*

                                 Great-great-grandparents.
Offspring.               Bay and Darker.   Chestnut and Lighter.   Totals.
Bay and darker           497               252                     749
Chestnut and lighter     130               99                      229
Totals                   627               351                     978

is marked by the presence of a great excess in chestnut horses. As before shown (par. 25),† this tends to raise the correlation of parent and offspring. The effect of this, however, on succeeding generations may be here inquired into. In the case in point we have approximately one-third of the parentage recessive.
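The fourfold tables quoted above can be checked numerically. The sketch below is not part of the original paper: it assumes that the "fourfold method" is Pearson's tetrachoric coefficient, that is, the correlation of a bivariate normal surface whose two dividing lines reproduce the marginal proportions and the cell of double recessives. The function names (bvn_density, upper_orthant, fourfold_r) are illustrative only, and the numerical scheme (Simpson's rule with bisection) is one convenient choice, not the procedure used by Professor Pearson.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution (Python 3.8+)


def bvn_density(h, k, t):
    """Standard bivariate normal density at the point (h, k) with correlation t."""
    q = (h * h - 2.0 * t * h * k + k * k) / (2.0 * (1.0 - t * t))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - t * t))


def upper_orthant(h, k, r, steps=400):
    """P(X > h, Y > k) for correlation r.

    Uses the identity dP/dr = density(h, k; r), integrated from 0 to r
    by Simpson's rule (steps must be even).
    """
    base = (1.0 - _STD.cdf(h)) * (1.0 - _STD.cdf(k))  # value when r = 0
    if r == 0.0:
        return base
    width = r / steps
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, r)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(h, k, i * width)
    return base + total * width / 3.0


def fourfold_r(a, b, c, d):
    """Tetrachoric r for the table [[a, b], [c, d]].

    d is taken as the cell in the second row and second column
    (recessive with recessive in the tables of the text).
    """
    n = a + b + c + d
    h = _STD.inv_cdf((a + b) / n)  # dividing line between the rows
    k = _STD.inv_cdf((a + c) / n)  # dividing line between the columns
    target = d / n
    lo, hi = -0.999, 0.999
    for _ in range(60):  # bisection: the orthant probability increases with r
        mid = (lo + hi) / 2.0
        if upper_orthant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Mendelian table of pars. 6 and 7, then the parent and offspring table for horses.
print(round(fourfold_r(5, 1, 1, 1), 3))
print(round(fourfold_r(631, 125, 147, 147), 3))
```

Under these assumptions the two values should come out close to the .53 and .54 quoted in the text, which suggests that the theoretical and the observed table agree about as well as the author states.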
The remaining two-thirds may be divided in two ways: it may be taken as of pure Mendelian composition, that is, we have one case of pure dominant and two of hybrid dominant; on the other hand, considering that the pure horse may be a better animal than the hybrid, and therefore more likely to be chosen for breeding purposes, we may assume that the number of pure and of hybrid dominants is equal. The parentages on these hypotheses will then be:

2 (a, a) + 4 (a, b) + 3 (b, b)     (A.)
2 (a, a) + 2 (a, b) + 2 (b, b)     (B.)

The former (A) will probably give the dominant in defect and the latter (B) in excess, so that some value between the results obtained on these two hypotheses may be taken as true.

* Biometrika, vol. ii. p. 255.
† Cf. also par. 4.

The first generation of parentage (A) mating freely gives offspring in the following proportions:

                              Parent.
Offspring.    (a, a).         (a, b).         (b, b).         Totals.
(a, a)        8               8               ...             16
(a, b)        10              18              12              40
(b, b)        ...             10              15              25
Totals        18, or 2 x 9    36, or 4 x 9    27, or 3 x 9    81

This shows that the hybrid offspring are in number twice the geometric mean of the pure races, as they should be (par. 3). To obtain the next generation with a like parentage an increase in the number of the pure races is required, so that the last table becomes:

                        Parents.
Offspring.    (a, a).   (a, b).   (b, b).
(a, a)        10        10        ...       20 (2 x 10)
(a, b)        10        18        12        40 (4 x 10)
(b, b)        ...       12        18        30 (3 x 10)

Let these mate freely and we have for the correlation table of grandparents and grand-offspring the following distribution:

                             Grandparents.
Grand-offspring.   (a, a).   (a, b).   (b, b).   Totals.
(a, a)             300       380       120       800
(a, b)             475       895       630       2000