EMPIRICAL STUDIES IN THE THEORY OF MEASUREMENT

BY EDWARD L. THORNDIKE, Professor of Educational Psychology in Teachers College, Columbia University

ARCHIVES OF PSYCHOLOGY, EDITED BY R. S. WOODWORTH, No. 3, APRIL, 1907

Columbia University Contributions to Philosophy and Psychology, Vol. XV., No. 3

NEW YORK, THE SCIENCE PRESS

CONTENTS

MEASUREMENTS OF TYPE AND VARIABILITY
§ 1. The Comparative Accuracy of the Average and the Median
§ 2. The Comparative Accuracy of the Mean Square Deviation and the Average Deviation
§ 3. The Divergencies of the Obtained from the True Measures by Theory and by Experiment
§ 4. The Relation between the Amount of a Central Tendency and the Amount of the Variability of the Group about the Central Tendency

MEASUREMENTS OF RELATIONSHIPS
§ 5. The Meaning of Typical Measures of Relationship
§ 6. The Presuppositions of Measures of Relationship
§ 7. The Advantages of the Different Measures
§ 8. The Attenuation of Measurements of Relationship
§ 9. Minor Advice to Students of Mental and Social Relationships

In the present condition of psychology, sociology and education, convenience, economy and directness are as important desiderata in methods of measurement as refinement with respect to precision. The results of these studies justify certain methods which have the decided advantage of giving measures which are direct functions of the data, independent of any hypothesis about the prevalence of the so-called 'normal' distribution, but which have been somewhat discountenanced or at least neglected in both the theory and the practice of statistics.

The section on correlation attempts also to make clear just what is measured by a coefficient of correlation and what the dangers are in the application of correlation formulas without constant supervision by an adequate sense for the concrete individual facts to be related.

MEASUREMENTS OF TYPE AND VARIABILITY

§ 1.
The Comparative Accuracy of the Average and the Median

The median as a measure of the central tendency of a series of measures has the advantages of greater quickness of calculation, freedom from the influence of erroneous measurements, ease of interpretation and often greater practical significance. It is, therefore, important to know whether the accuracy with which the median actually obtained from a small sampling of a series conforms to the true median of the total series is much less than the similar accuracy in the case of the more commonly used measure, the average.

It is possible with any given form of distribution to calculate on the basis of the theory of probability the accuracy in either case. Trusting that some one will soon do this for typical forms of distribution other than the so-called 'normal,' I have chosen to get empirical data on the same question from actual experiments with random samplings from certain large series of measures.

The median was calculated for each sampling by regarding the total series as measures of a continuous variable, quantity 61, for instance, equalling from 60.0 up to 62.0, quantity 63 equalling from 62.0 up to 64.0, etc. Where the median fell within a unit of the scale, as of course it usually did, the fractional part was taken which would be correct, supposing the cases within that unit of the scale to be equally frequent in all equal subdivisions of that unit of scale.

The series used were the four presented in Table I. A is an almost perfect representative of the so-called 'normal' surface of frequency, limited at about +3.2σ and −3.2σ. B is also a symmetrical distribution following, but not so closely, the so-called 'normal' type. C is a skewed distribution of the kind so frequently found in mental and social measurements. D is a flattened and rather sharply cut-off type of distribution, such as occurs often in facts subject to conventional regulation.
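The rule for placing the median inside a unit of the scale can be illustrated with a short sketch. It is not part of the original study: the function name `interpolated_median` and the dictionary layout are assumptions made for illustration, each recorded quantity is taken as the midpoint of its interval, and the interval width of 1 for Series C is inferred from the median printed in Table I rather than stated in the text. The frequencies are those of Series C and D.

```python
def interpolated_median(freqs, width):
    """Median of grouped data, interpolating inside the median interval.

    freqs maps each recorded quantity (taken as an interval midpoint)
    to its frequency; width is the length of each interval.
    """
    total = sum(freqs.values())
    if total == 0:
        raise ValueError("no cases")
    half = total / 2
    below = 0  # number of cases lying wholly below the current interval
    for quantity in sorted(freqs):
        count = freqs[quantity]
        if count > 0 and below + count >= half:
            lower = quantity - width / 2
            # Suppose the cases to be equally frequent in all equal
            # subdivisions of the interval.
            return lower + width * (half - below) / count
        below += count


# Frequencies of Series C and Series D as listed in Table I.
SERIES_C = {1: 30, 2: 80, 3: 140, 4: 175, 5: 200, 6: 160, 7: 120,
            8: 95, 9: 80, 10: 60, 11: 45, 12: 35, 13: 20, 14: 10}
SERIES_D = {-7: 20, -5: 80, -3: 100, -1: 100, 1: 110, 3: 90, 5: 70, 7: 30}

print(interpolated_median(SERIES_C, width=1))  # 5.5
print(interpolated_median(SERIES_D, width=2))  # 0.0
```

Both results agree with the medians recorded for C and D in Table I. The same function could be applied to any drawing of 10 or 50 cases by first tallying the drawn cards into a frequency dictionary.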
The number of cases was for A 1,000, for B 1,307, for C 1,250 and for D 600. The mechanical arrangement of each series was simply so many small cards or slips of paper each with a number written on it. In each series these cards were approximately of the same size, shape and weight. From such a series, properly shuffled in a large bowl, drawings were made.

The total number of cases in any series is of course of no significance. Whether a series contains 1,000, 1,100, 1,426, 13,982 or 160,000 cases makes no appreciable difference to any of the matters to be investigated here, and in the case of a distribution of the type of D, drawings of 100 from 6,000 cases would not differ appreciably from drawings from 600. The reason for the particular sizes of the total series was economy of time.

It is most convenient to arrange series for such experiments with measures + and − from the central tendency, as in B and D; the time of recording the results of draws is lessened and also the likelihood of errors. Thus in A −39, −37, −35, etc., would be better than 61, 63, 65, etc. I give the series, however, in just the way they were made and used.

Every drawing of 10 or 50 or 55 or whatever number of cases was made from the full series. However, a draw of 10 having been made and recorded, a draw of 50 was obtained by adding 40 to the 10 and one of 100 by adding 50 to the 50. The 100 is thus from the full series, but is obtained with a saving of time.

As a rule drawings of 10 or 11, 50 or 55, 100 or 110 and 275 were made, but, with the larger drawings, if not exactly 50 or 100 were drawn, the drawing was still utilized. Of course exact similarity in the size of the drawings is of no consequence whatever to any of the conclusions drawn.

TABLE I.
        A                 B                 C                 D
Quantity  Freq.    Quantity  Freq.    Quantity  Freq.    Quantity  Freq.
    61      1        −27       1          1      30        −7      20
    63      1        −25       0          2      80        −5      80
    65      1        −23       2          3     140        −3     100
    67      2        −21       2          4     175        −1     100
    69      3        −19       8          5     200        +1     110
    71      5        −17      10          6     160        +3      90
    73      6        −15      26          7     120        +5      70
    75      9        −13      28          8      95        +7      30
    77     12        −11      58          9      80
    79     15         −9      62         10      60
    81     20         −7      98         11      45
    83     26         −5     102         12      35
    85     31         −3     128         13      20
    87     37         −1     129         14      10
    89     43         +1     132
    91     50         +3     125
    93     54         +5     102
    95     59         +7      98
    97     62         +9      64
    99     63        +11      56
   101     63        +13      28
   103     62        +15      26
   105     59        +17      11
   107     54        +19       7
   109     50        +21       2
   111     43        +23       1
   113     37        +25       1
   115     31        +27       0
   117     26
   119     20
   121     15
   123     12
   125      9
   127      6
   129      5
   131      3
   133      2
   135      1
   137      1
   139      1

              A        B        C        D
Av.         100        0       6.0       .0
Med.        100        0       5.5       .0
A.D.       10.0      6.2       2.3     3.13
σ          12.4      7.8       2.9     3.68
Q.          8.4      5.4       2.0     2.94

A.D. = the average deviation from the average.
σ = the mean square deviation from the average.
Q. = one half the difference between the 25 percentile and 75 percentile measures.

The results of these drawings are summarized in Table II. In Table II., Ns = the number of sets drawn; Nc = the number of cases in each set; Av. = the average divergence of the obtained from the true average; Med. = the average divergence of the obtained from the true median; A.D. = the average divergence of the obtained from the true average deviation;

[The remainder of this list, Table II. itself and most of the records of the separate drawings are not legible in this copy. The legible portions follow; where column headings are wanting they could not be read.]

Obt. − Tr.     .06   .12   .15   .21   .35     average .18
Obt. − Tr.     .12   .13   .16   .17   .35     average .18
Obt. − Tr.     .10   .13   .14   .27   .44     average .22

In 3 sets of 110 cases
                       −.19   −.12    .00    .00    .10
                       −.03   −.06    .02    .06    .20
                       +.12   +.33    .09    .06    .33
Av. Div. from true      .11    .17    .04    .04    .21

In 2 sets of 275 cases
                       −.07   −.05    .04    .07    .04
                       +.02   +.06    .06    .08    .07
Av. Div. from true     .045   .055    .05   .075   .055

[A further record of sets of 10 cases is legible only in part; it closes with Av. Div. from true = .77.]

SERIES D
In 16 sets of 10 cases

Av. Obt. − Av. True     M. Obt. − M. True
       −1.6                  −2.5
       −1.4                  −2.0
       −1.4                  −2.0
       −1.2                  −1.0
       −1.2                  −1.0
       −1.0                  −1.0
        −.8                  −1.0
        −.6                  −1.0
        −.4                   0
        −.4                   0
         0                    0
         0                    0
        +.6                  +1.0
        +.6                  +1.0
        +.8                  +1.0
       +1.8                  +2.0
Av. Div. from true = .86     1.04

In 6 sets of 50 cases
[The entries for Av., M. and A.D. are not legible.]

§ 4. The Relation Between the Amount of a Central Tendency and the Amount of the Variability of the Group about the Central Tendency

In comparing groups with respect to variability allowance must be made for the fact that, in certain cases at least, the amounts of the central tendency influence the amounts of the variabilities. Thus the A.D. of men in weight is hundreds of times that of butterflies, yet the former are of course not really a hundred times as variable. Thus the A.D. of a group in a test of addition was, for trials of 40 seconds, 2.18; for trials of 80 seconds, 3.41; and for trials of 120 seconds, 5.18. It would obviously be silly if we had tested men with trials of 80 seconds and women with trials of 40 seconds, and obtained these results, to infer that men are 50 per cent. more variable in ability to add than are women.

In using the so-called coefficient of variation (proposed by Pearson) one makes allowance for the possible influence of the central tendencies' amounts by dividing through the gross variabilities each by the amount of its corresponding central tendency. I have elsewhere shown that for mental and social measurements no one such rule can be always or even often right and suggested that in any case a division through by the square root of the corresponding central tendency is more in accord with both theory and facts.1

In this section enough data will be presented to practically demonstrate both of these assertions. It is not important to investigate the matter exhaustively for the very reason that no one general rule for comparing groups with respect to variability can be found.
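The three ways of comparing two variabilities named here (no allowance, division by the central tendency, division by the square root of the central tendency) can be set side by side in a short sketch. The function name `variability_ratios` and the two groups are hypothetical, chosen only so that the arithmetic is easy to follow; they are not data from this study.

```python
import math


def variability_ratios(low_ct, low_var, high_ct, high_var):
    """Variability of the high group as a percentage of that of the low group.

    Three allowances for the amount of the central tendency (C.T.):
    none, division by the C.T. (the coefficient of variation),
    and division by the square root of the C.T.
    """
    gross = high_var / low_var
    by_ct = (high_var / high_ct) / (low_var / low_ct)
    by_sqrt_ct = ((high_var / math.sqrt(high_ct))
                  / (low_var / math.sqrt(low_ct)))
    return {
        "gross": 100 * gross,
        "gross / C.T.": 100 * by_ct,
        "gross / sqrt(C.T.)": 100 * by_sqrt_ct,
    }


# Hypothetical groups: central tendencies 100 and 121, A.D.'s 10 and 11.
ratios = variability_ratios(low_ct=100, low_var=10, high_ct=121, high_var=11)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
# gross: 110.0
# gross / C.T.: 90.9
# gross / sqrt(C.T.): 100.0
```

In this made-up case the gross comparison overstates the difference, the coefficient of variation deducts too much, and the square-root allowance returns 100. With other figures any of the three could come nearest to 100, which is the point at issue.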
All that is needed is a clear enough proof of the inadequacy of the practice of comparing groups after dividing through the gross variabilities by the corresponding means, a proof clear enough to stop the spread of the practice and to warn readers against conclusions based on such comparisons.

If we take the arrays of y in a case where y is positively correlated with x we have a series of groups with central tendencies varying from lower to higher which are selected at random so far as concerns any influence on the variability except the influence of the amount of the mean. The differences in variability found for these arrays give, then, in connection with the differences in the amounts of their central tendencies, the answer to our problem for the case of comparisons of groups with respect to their variability in the same trait. If we find that even in such cases there is no constant relation of difference in central tendency to difference in variability, but that one law obtains for stature and another for span or finger length, then a fortiori no constant relation can be presupposed when the variability of a group in one trait is to be compared with its variability (or that of a second group) in a different trait.

1 Mental and Social Measurements, pp. 102-103.

The first facts to which I call the reader's attention are the comparison of arrays of y corresponding to very low values of x with arrays of y corresponding to very high values of x in the case of ten correlations chosen at random (so far as this issue is concerned) from Vols. I. and II. of Biometrika. The number of cases ranged from 49 to 319. The results are given in Table V.
in the form of (1) the variability of arrays related to high central tendencies of x (and consequently having high central tendencies of y) divided by the variability of arrays related to low central tendencies of x, under the heading 'Gross'; (2) the Pearson coefficient of variability for the former divided by the Pearson coefficient of variability for the latter under the heading 'Gross/C.T.'; and (3) the similar ratio for the two variabilities each having been divided by the square root of the amount of the corresponding central tendency, under the heading 'Gross/√C.T.' A perfect method would give values of 100 throughout.

TABLE V. (a)

                                            Gross
Width of head                               101.5
Length of left middle finger                 94.4
Number of stamens                           112.3
Number of stamens                           164.7
Frontal breadth                             106.8
Length of right antenna (aphis)              88.2
Number of stamens                           111.6
Number of stamens (lesser celandine)        132.6
Span                                        115
Forearm length                               95.7
Median                                      109.2

[The columns 'Gross/C.T.' and 'Gross/√C.T.' are legible only in part, and their entries cannot be assigned to rows with certainty. In the order printed they read: Gross/C.T.: 95.7, 89.5, 110, 86, 90.8, 106, 99; median 90.1. Gross/√C.T.: 98.5, 96.3, 135, 96.1, 100.8, 119, 107; median 97.5. The column 'Nearest to Equality' is not legible.]

The detailed facts from which these ratios come are given in Table V. (b).

TABLE V. (b)
VARIABILITIES OF ARRAYS OF y RELATED TO LOW AND HIGH VALUES OF x. IN TERMS OF A.D.

(Each case measured is recorded in three lines: the first line gives the values of x; the second line gives the variabilities of the related arrays of y; the third line gives the numbers of cases in the arrays. The volume and page numbers refer always to Biometrika.)

I., 214
x = Head length                             18.0  18.1  18.2  20.1  20.2  20.3  20.4
y = Head breadth                             3.2   3.1   3.0   3.2   3.9   3.5   3.0
No. of cases                                  35    38    51    53    54    33    30

I., 216
x = Height                                  58.6  59.6  60.6  69.6  70.6  71.6
y = Left middle finger length                3.6   4.0   3.0   3.2   3.3   3.2
No. of cases                                  23    48    90    97    46    16

I., 126, Table I.
x = No. of pistils                            12    13    14    20    21    22    23
y = No. of stamens                           2.1   2.3   2.4   2.8   2.6   2.3   2.4
No. of cases                                  13    12    22    19    13    15    10

I., 126, Table II.
x = No. of pistils                             6     7     8    16    17    18
y = No. of stamens                           1.8   1.1   1.1   2.0   1.8   2.2
No. of cases                                   6    16    35    23    16    11

I., 152
x = Frontal breadth (aphis), 1st brother    13.5  19.5
y = Frontal breadth (aphis), 2d brother     1.46  1.56
No. of cases                                  57    50

I., 153
x = Length of antenna (aphis), 1st brother    26    28    48    50
y = Length of antenna (aphis), 2d brother    1.6   1.6   1.2   3.3
No. of cases                                  14    71    43    12

II., 161
x = No. of pistils (lesser celandine)         13    22
y = No. of stamens (lesser celandine)        1.8   2.3
No. of cases                                  24    25

II., 162
x = No. of pistils (lesser celandine)         14    15    16    17    28    29    30    31    32    33
y = No. of stamens (lesser celandine)        1.9   2.6   2.1   4.7   3.6   3.4   3.2   3.4   4.0   3.8
No. of cases                                  10    17    16    28    20    16    13    11    19    15

II., 399
x = Height                                    61    62    72    73
y = Span                                     1.2   1.4   1.7   1.5
No. of cases                                 8.5  32.5    33    13

II., 403
x = Span                                      63    64    75    76
y = Forearm                                  .83  1.28  1.03  1.28
No. of cases                                  13    32    28  11.5

The gross variabilities often increase as we would expect with higher central tendencies, though by no means always. Seven out of ten do so, giving a median value of 109.2 instead of 100. The Pearson coefficient of variation makes too much of a deduction for an increase in the amount of the central tendency in all but three cases, giving a median value of 90.1 instead of 100. The square root deduction, with a median value of 97.5, makes the least error of any one single method. These facts alone disqualify the so-called 'coefficient of variation' as a means of comparing variabilities. But more detailed studies of the cases of length of finger, span and stature will be still clearer.

The facts for length of left middle finger are as given in Table VI.

TABLE VI.
RELATION OF AMOUNT OF VARIABILITY TO AMOUNT OF CENTRAL TENDENCY. FINGER LENGTH.
(Biometrika, Vol. I., p. 216)

          Value of x to      No. of Cases    Central Tendency    Variability (A.D.)
Array.    Which the Array    in the Array.   of the Array.       of the Array.
          is Related.
   1          581                  6              103                 167
   2          591                 23              107                 357
   3          601                 48              108                 404
   4          611                 90              109                 309
   5          621                175              111                 325
   6          631                317              112                 347
   7          641                393              114                 312
   8          651            [entries not legible]
   9          661                920              116                 331
  10          671                413              118                 339
  11          681                264              119                 345
  12          691                177              120                 334
  13          701                 97              122                 322
  14          711                 46              124                 333
  15          721                 17              126                 318
  16          731                  7              128                 386
  17          741                  4              128                 275

In the case of finger length increase in the amount of the central tendency does not imply an appreciable increase in the amount of variability. No allowance is needed.

In the case of span it would be equally absurd not to make an allowance and one as great or nearly as great as the Pearson method makes. For the preliminary study of the variability of span reported in Table V. is confirmed by the facts in the case of three other span series. These facts (given in Table VII.) abundantly prove that the influence of the amount of the central tendency on the amount of the variability follows totally different laws in the case of span and of finger length.

TABLE VII.
RELATION OF AMOUNT OF VARIABILITY TO AMOUNT OF CENTRAL TENDENCY. SPAN.
(Biometrika, Vol. II., pp. 399-401)

      Fathers.              Sons.               Mothers.             Daughters.
   N.    C.T.  Var.     N.    C.T.  Var.     N.    C.T.  Var.     N.    C.T.  Var.
  32.5   642   578     31     657   519     18     571   428     15.5   585   471
  42.5   651   587     56     660   505     34.5   582   596     52     595   447
  71.5   658   560     78.5   670   516     79.5   593   600    101     600   466
 122.5   667   666    127     677   580    135.5   600   524    150     613   515
 142.5   675   662    178.5   687   608    163     609   608    199     620   485
 136.5   687   593    189     700   600    183     619   573    438     625   510
 154.5   692   574    137     707   636    163     627   554    169.5   650   492
 118.5   702   658    137     715   505    114.5   637   542    151.5   660   605
 102.5   713   698     93     720   601     78.5   640   624     81.5   660   601
  56.5   720   601     52.5   735   503     41     647   588     40.5   665   481
  33     735   678     39     745   595     16     655   881     19.5   680   436

As a final case let us take stature. Here the variability is slightly less as the amount of the central tendency increases. The facts are given in Table VIII. constructed on the same plan as Table VI.

TABLE VIII.
RELATION OF AMOUNT OF VARIABILITY TO AMOUNT OF CENTRAL TENDENCY IN GROUPS DIFFERING IN CENTRAL TENDENCY.
STATURE.
(Biometrika, Vol. I., p. 216)

[The rows of this table cannot be realigned from this copy; the columns are given in the order printed, as far as legible.]

Related to x: 10.0, 10.1, 10.2 and so on by tenths to 13.0.
No. of cases: 44, 74, 177, 315, 347, 461, 458, 346, 289, 180, 44, 52, 35, 31, 25, 7 (?), 8.
C.T.: 61.1, 61.1, 60.6, 60.7, 62.3, 62.8, 62.8, 62.1, 63.6, 64.6, 65.1, 66.1, 66.1, 67.1, 67.1, 67.6, 69.6, 69.6, 68.6, 70.1, 69.1, 69.1.
A.D.: 286, 146, 190, 179, 170, 183, 132, 153, 137, 155, 152, 158, 156, 147, 157, 170, 158, 147, 127, 148, 136, 264.

MEASUREMENTS OF RELATIONSHIPS

The importance to any science of exact and convenient methods of measuring the relationships of the facts it studies should be obvious. It is therefore unfortunate that students of psychology and the social sciences have with few exceptions neglected both the theoretical problem of correlated variations and the careful measurement of such relationships as they have in fact found.

The failure to utilize the methods devised by Galton, Pearson, Sheppard, Spearman and others is due partly to an ignorant and partly to an intelligent suspicion aroused by the mathematical derivations of these methods. Ignorance of the rationale of their derivations cooperating with ignorance of the conditions which require their use and of the necessity of some such refined methods has caused the stupid suspicion and aversion. Inability to follow the mathematics of the derivation of formulae, at least in detail, cooperating with the rational expectation that too abstract methods will fit the concrete cases imperfectly and with the equally rational confidence that proofs resting upon the assumption of close approximation of actual variations in mental and social facts to the probability curve distribution are always unsafe and, perhaps, usually misleading, has caused the intelligent suspicion.
It is probable that unless these methods are soon subjected to a review by some one who can both make perfectly clear their presuppositions to the rank and file of investigators in psychology and the social sciences and prove their applicability to actual cases of relations to be measured, there will be damage done in two ways. Many investigators will as in the past use hopelessly crude methods and misinterpret relationships; and also many investigators will learn off the formulas of the mathematical statisticians and apply them to cases where they are out of place and give inadequate and misleading results. To both of these errors the writer, for instance, confesses himself guilty in the past.

I am unable to make such a review but as no one of those who are able seems willing,1 I have made a partial and inferior substitute for it which I hope may, in so far as it is sound, be instructive to students of mental measurements and, in so far as it is unsound, may provoke some capable student to give the adequate review that is so much needed.

1 Perhaps Mr. C. Spearman's article on 'The Proof and Measurement of Association between Two Things' (in the Am. J. of Psy., Vol. XV.) may be considered as filling the need, but I fear that it is too technical in parts and not inquisitive enough concerning the actual relations between (1) the individual relationships, from which all our computations ought to start, and (2) the general expressions or summaries of them. At all events I am not trying to do over again, for better or worse, what Mr. Spearman has done, but something which is needed as introductory and accessory to his work.

This report will presuppose in the reader knowledge of the bare elements of the theory of measurement of variable facts such as is given for instance in the writer's Introduction to the Theory of Mental and Social Measurements. It will deal in order with the following topics:

I.
What is actually measured by typical measures of the relationship between first and second member of a pair in a series of pairs of values, each first-member value being a deviation from the central tendency of one series and each second-member value being a related deviation from the central tendency of a second series?

II. What are the respective presuppositions of each of these typical measures?

III. What are the advantages and disadvantages of each of these typical measures?

The only original contributions which this discussion contains are (1) the investigation of certain artificially constructed cases of correlation, (2) a laborious but not very important experimental testing of the comparative reliability of different measures of relationship, and (3) a similar experimental testing of methods for correcting measures of relationship for the 'attenuation' due to inaccurate original data.

§ 5. I. What is actually measured by typical measures of the relationship between first and second member of a pair in a series of pairs of values, each first-member value being a deviation from the central tendency of one series and each second-member value being a related deviation from the central tendency of a second series

Consider the following series of paired values of A and B:

[The table of the 30 paired values of A and B is not legible in this copy.]

Pearson Coefficient = .634.
Median Ratio B/A = .65.
Average of Ratios = .902. The average of ratios is valueless because it overweights positive values of 2 pairs, etc.
Per cent. unlike signs = .267, r as calculated therefrom being .665.
Each of these pairs represents a relationship, the entire series reading: A deviation in A of −7 from the central tendency of A brought with it a deviation in B of −5 from the central tendency of B; a deviation of −5 brought in one case a deviation in B of −5, in a second case one of −3, and in a third case of −1, etc.

Consider now two measures each expressing an important fact concerning this series of 30 individual relationships. The first is .634. The second is: the median of the 30 B/A ratios = .65. The former is of course the Pearson Coefficient of correlation for A and B; the latter is the Median or Mid Ratio B/A.

What the former measures can not be stated except in terms not yet given by the individual relationships themselves. Professor Pearson's own statements for instance are in terms of certain facts of a correlation diagram such as Fig. 1, not in terms of the individual relationships.

It is clear that in the case of Fig. 1, which represents our 30 relationships graphically, the slope of the straight line LL1 through O so drawn that the sum of the deviations of the individual dots from it is zero (measuring deviations in the direction of the B line and calling deviations above the line in the left hand half of the surface and below the line in the right hand half of the surface +, and calling deviations below the line in the left half and above the line in the right hand half −) is a measure of an important fact about the series of relationships.1

[FIG. 1.]

1 In this case the slope is roughly 73 per cent. of 45°, the slope which would be found were correlation perfect. The slope for the A's taken as dependent on the B's is roughly 64 per cent. of 45°.

The Pearson Coefficient does not, however, measure the slope of just such a line as we have supposed to be drawn in Fig. 1 and described in the last paragraph. Its line is not so calculated as to
make the deviations from it toward closer correlation equal to the deviations from it towards less correlation, but is so calculated as to make the sum of the squares of the deviations from it least. This of course weights the extreme deviations much more than those near the center of the surface, for the same change in the slope of the line alters the sum of the squares of the deviations from the line near the center of the surface far less than that of the remote deviations. This is a possibly questionable feature of the Pearson Coefficient.

Moreover it is calculated as the slope of this line of so-called 'regression' as found when the two traits are reduced to equivalence of variability and double entries are made in the correlation table, i. e., B's as related to A's and A's as related to B's, the two sets of entries being so superposed that the intersection of the means in the one case coincides with the intersection of the means in the other case.

Professor Pearson gives many readers the impression that his coefficient of correlation is calculated as the slope of the straight line through O to fit the points in the correlation diagram that represent the means of the arrays1 (the two related series being reduced to an equivalence in variability and entered doubly), but in fact it is the slope of the line from which the sum of the squares of the deviations of all the dots each representing one relationship is least, not the slope of the line from which the sum of the squares of the deviations of the dots representing each the mean of one array is least. It is in our illustration a line to fit the dots of Fig. 3, not those of Fig. 2. That is, an array of 100 cases is (quite properly) given greater weight than one of 2 cases.

[FIG. 2. FIG. 3.]

1 See, for instance, 'Grammar of Science,' 2d edition, 1900, p. 393 and p. 396.
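The distinction between a line fitted to all the dots and a line fitted to the means of the arrays can be checked numerically. The sketch below is an illustration only: the ten pairs are hypothetical, the helper names are assumptions, and both series are reduced to equal variability by dividing each by its own root-mean-square deviation. It fits least-squares lines through the origin to (a) every dot, (b) the array means counted once each, and (c) the array means weighted by the number of cases in the array.

```python
import math
from collections import defaultdict


def centered(values):
    """Deviations of the values from their average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope_through_origin(xs, ys, weights=None):
    """Least-squares slope of y on x for a line through the origin."""
    if weights is None:
        weights = [1] * len(xs)
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs))
    return num / den


def pearson_r(xs, ys):
    """Pearson coefficient from product sums of deviations."""
    dx, dy = centered(xs), centered(ys)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical pairs; the extreme arrays hold one case each.
xs = [-2, -1, -1, -1, 0, 0, 1, 1, 1, 2]
ys = [0, -1, -1, 0, 0, 0, 1, 1, 0, 0]

# Reduce both series to equal variability.
dx, dy = centered(xs), centered(ys)
sx = math.sqrt(sum(a * a for a in dx) / len(dx))
sy = math.sqrt(sum(b * b for b in dy) / len(dy))
zx = [a / sx for a in dx]
zy = [b / sy for b in dy]

# (a) Line fitted to every dot.
slope_all_dots = slope_through_origin(zx, zy)

# Group the y's into arrays by their x value.
arrays = defaultdict(list)
for x, y in zip(zx, zy):
    arrays[x].append(y)
array_x = sorted(arrays)
array_mean = [sum(arrays[x]) / len(arrays[x]) for x in array_x]
array_size = [len(arrays[x]) for x in array_x]

# (b) Array means counted once each; (c) weighted by cases per array.
slope_means_unweighted = slope_through_origin(array_x, array_mean)
slope_means_weighted = slope_through_origin(array_x, array_mean, array_size)

print(f"Pearson r:               {pearson_r(xs, ys):.3f}")
print(f"all dots:                {slope_all_dots:.3f}")
print(f"array means, weighted:   {slope_means_weighted:.3f}")
print(f"array means, unweighted: {slope_means_unweighted:.3f}")
```

For these made-up pairs the first three figures coincide and the fourth does not. Fitting the array means reproduces the coefficient only when each mean is weighted by the number of cases behind it.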
Consider now the Pearson Coefficient from another point of view. Let us for the present restrict relationships to those between two series of the same form of distribution, and also define perfect correlation as a relationship such that any deviation of A from its central tendency will imply a deviation of B from B's central tendency which shall be the same fraction of B's variability that the deviation of A is of A's variability. That is,

B1 = A1 × (Var. of B series / Var. of A series), B2 = A2 × (Var. of B series / Var. of A series), etc.

If then all values of B are divided by (Var. of B series / Var. of A series), we should in perfect correlation find each deviation of A accompanied by an identical deviation of B. The sum of the AB products would be equal to the sum of the A², or to the sum of the B², or to √(ΣA²) × √(ΣB²). In the case of two series of the same form of distribution and of equal variability the Pearson Coefficient formula then measures the proportion which the sum of the series A1B1, A2B2, etc., is of what it would be with perfect correlation as defined.

It can be shown that without reducing B or A to equivalence in variability perfect correlation as defined would give for the sum of the AB products √(ΣA²) × √(ΣB²), provided the form of distribution of A is the same as that of B.

The Pearson Coefficient measures, then, in cases where the form of distribution of the two facts to be related is the same, the proportion which the sum of the AB products is of what it would be were correlation perfect.

There is no ambiguity as to what is measured by the median of the B/A ratios. Whatever the distributions may be or the ratios, the median means always a definite thing: the ratio B/A which is exceeded in magnitude by as many of the ratios as it exceeds.
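Both measures can be computed directly from a list of paired deviations. The following sketch uses hypothetical deviations and assumed function names; it treats the Pearson Coefficient as the sum of the AB products divided by √(ΣA²) × √(ΣB²), and the median ratio as the plain median of the individual ratios, with no deviation equal to zero.

```python
import math
from statistics import median


def pearson_from_products(a, b):
    """Sum of the AB products as a proportion of its value
    under perfect correlation, sqrt(sum A^2) * sqrt(sum B^2)."""
    sum_ab = sum(x * y for x, y in zip(a, b))
    perfect = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum_ab / perfect


def median_ratio(numer, denom):
    """Median of the individual ratios numer/denom.
    Every denominator must be non-zero."""
    return median(n / d for n, d in zip(numer, denom))


# Hypothetical deviations from the central tendency of A.
a = [-5, -3, -1, 1, 3, 5]

# Perfect correlation as defined: each B is the same multiple of its A.
b_perfect = [2 * x for x in a]
print(pearson_from_products(a, b_perfect))  # 1.0, up to rounding error
print(median_ratio(b_perfect, a))           # 2.0

# A looser, still hypothetical, relationship.
b = [-3, -1, -1, 1, 1, 3]
print(round(pearson_from_products(a, b), 3))
print(median_ratio(b, a))   # median of the B/A ratios
print(median_ratio(a, b))   # median of the A/B ratios
```

In the second example the median of B/A is 0.6 while the median of A/B is about 1.67. One is the reciprocal of the other here only because the made-up data are symmetrical; in general the two medians are distinct measures and neither can be derived from the other.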
We have only to note that the median of the B/A's and the median of the A/B's are two different things and that if we are interested in representing in one number both what a given A deviation implies with respect to B and what a given B deviation implies with respect to A, we must use both the B/A and the A/B median.

Certain other measures deserve mention. The directly calculated average of all the individual relationships B/A or A/B is a perfectly comprehensible measure but rather a useless one. The Modal Ratio B/A or A/B is also a perfectly clear conception and, in cases where it can be easily and accurately determined, a very valuable one. The per cent. of direct or the per cent. of inverse relationships is equally comprehensible and is an important function of the closeness of relationship.

TABLE IX.
[Correlation table of two series, each running from −39 to +39. The entries of the cells are not legible in this copy.]

When the individual values of A and B are not measured as amounts of deviation from their central tendencies, but only as so many A1's known to be less than Z and so many A2's greater than Z, and as so many B1's less than W and so many B2's greater than W, the per cents. of A1B1 pairs, A1B2 pairs, A2B1 pairs and A2B2 pairs give important information.
The number and amount of the divergences of the ranks of the second members from the ranks of their related first members also give important information.

If the two related facts are of the so-called normal distribution and the relationship is uniform for all amounts of A and each array is also a normal distribution, the Median Ratio, the Modal Ratio and the Pearson Coefficient will, if the two series are reduced to equivalence in variability, coincide and will equal cosine πU.1 This is the case of so-called normal correlation approximated in many organic and hereditary anatomical relationships. It is of course only one of many possible types of relationship. The extent to which it prevails in mental and social relationships is not known. Its prevalence in the case of anatomical facts has probably been overestimated.

1 U equalling the per cent. of unlike-signed pairs.

Table IX. gives the facts of the relationship between two series both of the same form of distribution, almost exactly the so-called normal, and of the same variability, the relationship being devised artificially so that the average of each array of y is .5 × the corresponding value of x. This regression of y on x is shown graphically in Fig. 4, which gives the average of each array of the y's. The regression of x on y is shown graphically in Fig. 5, which gives the average of each array of the x's.

[FIG. 4.]

The Pearson Coefficient for this case is .53. The Median Ratio is much higher (.60 for the y/x and x/y ratios together) because the correlation is much closer for mediocre values of x and y than for extreme values (see especially the regression of x on y). U is .292 and r from cos πU is accordingly .61. This case illustrates the fact that the relation of y to x may not be the same as that of x to y even when the form of distribution and variability is the same for both cases.
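The figure of .61 follows from the cosine rule by direct arithmetic, as a brief sketch shows. The function names are assumptions for illustration, and the four pairs used to demonstrate the counting of unlike signs are hypothetical. A pair in which either deviation is zero is counted here as not unlike-signed, which is a choice of this sketch and not a rule taken from the text.

```python
import math


def fraction_unlike_signs(a, b):
    """U: the proportion of pairs whose two deviations differ in sign.
    Pairs containing a zero deviation are not counted as unlike."""
    unlike = sum(1 for x, y in zip(a, b) if x * y < 0)
    return unlike / len(a)


def r_from_unlike_signs(u):
    """r = cos(pi * U); appropriate only under so-called normal correlation."""
    return math.cos(math.pi * u)


# U = .292, as reported for the artificial case of Table IX.
print(round(r_from_unlike_signs(0.292), 2))  # 0.61

# Hypothetical pairs: one of the four has unlike signs, so U = .25.
u = fraction_unlike_signs([-1, 1, 3, -3], [-1, 1, -1, -1])
print(u, round(r_from_unlike_signs(u), 3))
```

The rule gives 1 when no pairs are unlike-signed and 0 when half are, which matches what one would expect of a measure of closeness of relationship under the stated conditions.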
It also illustrates a rather close approach to the so-called 'normal' correlation.

FIG. 5.

Table X. gives graphically the correlation in the case of age at death of husband with age at death of wife in 935 pairs from records of the Society of Friends. This is taken from the table on p. 498 of Vol. I. of Biometrika, the table being due to Mary Beeton in cooperation with Karl Pearson. This case shows a relationship between two series neither of which is anything like normal in form of distribution, which are not of the same form of distribution and which therefore are in strictness incomparable in variability.

TABLE X. (Age of Husband with Age of Wife.)

Fig. 6 gives the regression of y (wife's age) on x (husband's age) in terms of averages of arrays of y and also of medians of arrays of y. To give the regression by single modes for the arrays would be fallacious, for each array is more or less clearly a bimodal distribution. This is shown in Fig. 7, where the y's are grouped in four large arrays.

It should be clear that any single figure is inadequate to express this relationship. The Pearson Coefficient of correlation is .20 and the regression of y (wife's age) on x (husband's age) calculated from it is .25. But this would lead one far astray concerning the real regression, as we see by Fig. 6. The relationship is closer for early deaths than for late. The form of distribution of the relationship is, apart from this, skewed in general from a mode of close resemblance toward very great diversity, and is in the third place complicated by the submodal tendency of a wife to die at about 35 more often than at 30 or 40.

Such a case illustrates the fact that
each type of measure of a relationship measures some particular aspect thereof and also the fact of the extreme abstractness from reality of the Pearson Coefficient, which in this case measures neither a uniform tendency nor a central tendency of the series of individual relationships.

The reader will obtain concrete information about the meaning of the different measures of relationship and of their merits in actual practise if he will calculate them for a score of representative relationships and examine them in the light of the entire correlation tables. I have done this for the cosine πU and Median Ratio (or rather, in order to have the resulting figure comparable directly with the cosine πU and the Pearson r, for the median of all the ratios $\frac{y}{x}\cdot\frac{\text{var. } x}{\text{var. } y}$ and $\frac{x}{y}\cdot\frac{\text{var. } y}{\text{var. } x}$) in the case of nine relationships representing organic and hereditary and conjugal relations, relations in animals and in plants, relations of definite structural features and complex properties. The results are given in Table XI. They show that the median ratio method gives results as close to the unlike-signs method as does the Pearson method.

FIG. 6. (Abscissa: Age of Husband.) The dotted line is from averages; the continuous line from medians. The dash line is the regression as calculated from the Pearson Coefficient.

The reader who will examine Table XI. in connection with the original¹ correlation tables in Biometrika will find also that where the Pearson Coefficient r and the Median Ratio r diverge at all widely it is the latter which better fulfils Pearson's criterion of telling how much nearer the most probable value of a second member of a pair is to the value of the first member than it would be with no relationship at all.

FIG. 7.

TABLE XI.

Mutual Relationship by Pearson Coefficient, by Median Ratio and by Cosine πU.

Vol.  Page  Traits to be Related                                                          x        y        N      Pearson Coef.  Median Ratio  Cosine πU
I.    84    Longevity of adult brothers                                                   -        -        2000   .2853          .479          .3763
I.    126   No. of stamens with No. of pistils in late flowers of Ficaria ranunculoides   Pistils  Stamens  373    .7489          .80           .7815
I.    214   Human head length with head width                                             Length   Width    3000   .4016          .415          .3875
I.    216   Human height with left middle finger length                                   Height   L.M.F.   3000   .6608          .69           .6747
II.   97    Capsule height of brother plants (Shirley poppies)                            -        -        13800  .3782          .48           .5030
II.   97    Stigmata of brother plants (Shirley poppies)                                  -        -        4716   .2561          .253          .2160
II.   163   No. of stamens with No. of pistils in lesser celandine from Surrey            Pistils  Stamens  500    .6601          .55           .5570
II.   498   Longevity of husband with longevity of wife. Friends' records                 Husband  Wife     935    .1999          .41           .2560
III.  170   Cephalic index of brothers                                                    -        -        1982   .49            .53           .5090

Average difference of r by Pearson Coefficient from r by cos πU, .055.
Average difference of r by Median Ratio from r by cos πU, .045.

¹ These examples are all taken from the first three volumes of Biometrika, the 'Vol.' and 'Page' of the table referring to that journal.

§ 6. The Presuppositions of Measures of Relationship

The Pearson Coefficient.

Taken at its mere face value, $\frac{\Sigma xy}{\sqrt{\Sigma x^2}\,\sqrt{\Sigma y^2}}$ or $\frac{\Sigma xy}{n\sigma_1\sigma_2}$, the Pearson Coefficient has of course no presuppositions, but if it means the proportion that the Σ(xy) is of what it would be with perfect correlation it presupposes sameness of form of distribution in the two series. If it means the proportion which the slope of a certain straight line is of the slope of the line of perfect correlation, the certain line being so drawn that the sum of the squares of the divergences from it of the given y values (in double entry) toward greater correlation equals the sum of the squares of those toward less, it presupposes the 'normal' distribution in the case of both series.

The Median Ratio.

The Median Ratio need have no presuppositions. It is simply one of the obtained individual relationships.
When, however, we come to draw inferences from it about the entire series of relationships, we must state certain additional facts or use certain presuppositions.

The Modal Ratio and the Percentage of Like-signed or of Unlike-signed pairs are also directly drawn from the series of individual relationships themselves. In calculating the general trend of relationship, r, from r = cosine πU (U being the per cent. of unlike-signed pairs) we presuppose (if I understand Mr. Sheppard correctly) that the correlation surface is transformable into a surface of revolution by a slide and two stretches.

§ 7. The Advantages of the Different Measures

The two previous sections are preliminary to the main topic which forms the title of this section.

I shall first compare the conventional measure, the Pearson Coefficient, with the Median Ratio and later deal very briefly with some of the other measures.

The main desiderata in any measure are that it measure some real fact and that this fact be important. Other desiderata in the case of a measure of relationship are that the measure be comparable with other measures of other relationships, that it be conveniently and easily calculated and that it diverge little from the corresponding measure of the total series from a random sampling of which it is calculated. These desiderata we will consider in the above order.

Reality.

The Median Ratio is a clear statement of a real fact, an observed relationship, such that the number of relationships closer than it equals the number less close. It gives the amount of y's difference from its central tendency implied by such difference in x for this mid-case.

The Pearson r is not an observed relationship but a measure inferred from certain features of the observed relationships on the basis of certain presuppositions about them and the distribution of the facts from which they come.
It is of course real in the sense of being the most probable real central tendency of the relationships if these various presuppositions are true, but in fact they never are except by chance more than approximately true, and in the majority of the cases in which students of the mental and social sciences need to measure relationships, they are far from true.

The 'regression,' that is the relation between actual amounts of y and actual amounts of x, is the reality at the basis of all measures of the relationship. The Median Ratio expresses it directly. It can be ascertained from the Pearson r only indirectly and on the hypothesis that certain very questionable conditions are realized.

Importance of the Fact Measured.

There is no great advantage either way in this respect. Neither the Pearson Coefficient nor the Median Ratio gives the entire fact of the relationship. Only the total distribution of the relationship does that. For 'normal' correlation where the relationship is the same regardless of the amount of x and where all of the arrays are distributed in normal surfaces of frequency the Pearson Coefficient and the Median Ratio both give the central tendency of the relationship. In other cases than this the Median Ratio is a trifle more important because less misleading and because it is nearer the modal relationship if the distribution of the relationship is skewed.

It is also worthy of note that our thinking about relationships should for practical reasons usually be in terms of the actual y/x or x/y ratios, that is the 'regressions,' since what we usually need to know is the implication of some actual deviation of one concerning the related deviation of the other. It seems better then to calculate the y/x or x/y ratio directly and when necessary to infer the r (that is the ratio when both traits are reduced to an equivalence in variability and the correlation table is one of double entry) rather than to calculate the r and infer the y/x or x/y ratio.
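The distinction between the directly observed y/x ratio and the inferred r can be made concrete with a minimal sketch in Python. It is only an illustration under stated assumptions: the function names, the rule of skipping zero deviations, and the four made-up pairs are not from the text, and "variability" is taken here to be the average deviation.

```python
import math
from statistics import median


def pearson_r(x, y):
    """Product-moment coefficient; x and y are deviations from their central tendencies."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_ratio(x, y):
    """Median of the observed y/x ratios (assumption: pairs with x == 0 are skipped)."""
    return median(b / a for a, b in zip(x, y) if a != 0)


def double_entry_median_ratio(x, y, var_x, var_y):
    """Median of all y/x and x/y ratios with each trait in units of its own variability."""
    xs = [a / var_x for a in x]
    ys = [b / var_y for b in y]
    ratios = [b / a for a, b in zip(xs, ys) if a != 0]
    ratios += [a / b for a, b in zip(xs, ys) if b != 0]
    return median(ratios)


# Made-up deviations in which every y is exactly half of its x.
x = [-3, -1, 1, 3]
y = [-1.5, -0.5, 0.5, 1.5]

print(pearson_r(x, y))                             # product-moment coefficient
print(median_ratio(x, y))                          # the observed regression of y on x
print(double_entry_median_ratio(x, y, 2.0, 1.0))   # average deviations of x and y are 2 and 1
```

In this artificial case the raw Median Ratio reports the regression itself, while the double-entry figure reduced to equal variability is the quantity comparable with r.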
Comparability.

To compare the relationship between A and B with that between C and D adequately, we must compare the total distribution of the relationship A-B with the total distribution of the relationship C-D. The Pearson Coefficients of A-B and C-D are perfectly fit to compare only when the form of distribution of the relationship A-B is the same as that of the relationship C-D. So also of the median B/A and median D/C, or of the median A/B and median C/D, or of any measure of the central tendency of relationship which may be inferred from them. In so far as what we wish to compare is the modal relationship, however, there is a smaller error as a rule in inferring from the comparison of the Median Ratios of unlike distributions of relationships than in inferring from the comparison of their Pearson Coefficients.

Convenience of Calculation.

Provided the original measures are on a sufficiently fine scale, as they ought for every reason to be where relationships are to be measured by a Pearson Coefficient or a Median Ratio or a Modal Ratio, the Median Ratio is of course far more convenient than the Pearson Coefficient. Once a correlation table is written out the Median Ratios can be obtained with very little computation or eye strain. Inspection of the correlation table will tell about what they will be and only a few of the ratios will need to be ranged in order.

I append a sample calculation (Fig. 8). First one makes an exact median sectioning of the x's and the y's and then counts the cases that give negative ratios. By inspection one then chooses for the y/x ratios an approximate median (here of about .25) and for convenience draws a line to include these cases and counts them. One then increases their number by adding the cases of the next smallest ratios not included or by taking away the cases of the largest ratios included until one reaches the Median Ratio (here .333).
One then repeats the process of guessing at an approximate median for the x/y ratios and correcting it.

FIG. 8. (Sample calculation of the Median Ratios from a correlation table.)

In making comparisons on the basis of the median ratios we must of course bear in mind the variabilities of our A, B, C and D. In the Pearson Coefficients the series concerned are reduced to an equivalence in variability in the process of calculation. With the Median Ratios, if we wish to make this reduction to terms of the variability as a unit we must do it as a separate operation. For instance let A, B, C and D be series with variabilities 1, 2, 4 and 5. If then the Median Ratios found are B/A = 1.00, A/B = .25, D/C = .625 and C/D = .40, the Median Ratios that would be found if the differences in variability were eliminated would be B/A ÷ 2/1, A/B ÷ 1/2, etc., that is .50, .50, .50 and .50. If we wish to compare the mutual implication of A and B with the mutual implication of C and D we must go further still and combine the median $\frac{B}{A}\cdot\frac{\text{var. } A}{\text{var. } B}$ with the median $\frac{A}{B}\cdot\frac{\text{var. } B}{\text{var. } A}$, and similarly the median $\frac{D}{C}\cdot\frac{\text{var. } C}{\text{var. } D}$ with the median $\frac{C}{D}\cdot\frac{\text{var. } D}{\text{var. } C}$. This combination would be made by taking the median of both the $\frac{B}{A}\cdot\frac{\text{var. } A}{\text{var. } B}$ and $\frac{A}{B}\cdot\frac{\text{var. } B}{\text{var. } A}$ ratios, an equivalent of the double entering involved in the Pearson Coefficient, or more easily still by taking [expression illegible in the source, involving $\frac{B}{A}\cdot\frac{\text{var. } A}{\text{var. } B}$ and $\frac{A}{B}\cdot\frac{\text{var. } B}{\text{var. } A}$].

Comparison is thus more awkward with the Median Ratios than with the Pearson Coefficients, because the latter method automatically both divides through by the variability and gets a measure of mutual implication. The superiority of the Pearson Coefficients is to some extent specious for it makes comparison easy not by removing difficulties but by presupposing that they do not exist. The obvious additional steps needed in the case of comparison of Median Ratios witness and emphasize the hypotheses on the basis of which we do compare. They may also prevent us from inadequate comparison. For instance from the facts that the Pearson r for adult brother's longevity with adult brother's longevity is .2853 and that the Pearson r for stature with left middle finger length is .6608, we have no right to conclude that the latter relationship is 2.3 times as close. Any one who will study the individual relationships in these two cases¹ will see that no single ratio can express the comparison of the two relationships.

Speed of Calculation.

Once the correlation table is written out the Median Ratio can be calculated in from one tenth to one hundredth of the time taken for the Pearson Coefficient.

Divergence of Results Obtained from a Partial Sampling from the Results from the Entire Series Sampled.

The Pearson Coefficient is for normal correlation by the theory of error the more reliable. Whether in the actual cases of relationship with which we work, where the distributions and correlations are not exactly normal and where the theory of error does not apply without modification, it is more reliable, is a matter to be determined.
Its use of the exact amount of every case of the relationships makes for superior reliability, but its weighting of extreme cases may somewhat counterbalance this. The reason given by Professor Pearson for replacing Galton's method of obtaining the Median Ratio by this product-moment method was this superior reliability. No other reason has so far as I am aware ever been advanced. It is doubtful if Professor Pearson now would lay so much stress on greater reliability in the case of normal correlation of normal distributions, since he has so emphatically shown the rarity of both of these, and has been at some pains to test empirically certain measures which are valid regardless of the normality of distribution of the two facts.

¹ See Biometrika, Vol. I., p. 84, and Vol. I., p. 216.

Since in almost every other respect the Median Ratio is a more advantageous measure, it seems worth while to determine empirically, for some typical relationships, the comparative freedom from chance error of it and of the Pearson Coefficient. I have also tested the influence of the number of cases on the per cent. of unlike-signed pairs (which I have called U) because at least for preliminary investigations of mental and social relationships the formula r = cosine πU (where U = the per cent. of unlike-signed pairs, deviations being calculated from an exact median sectioning, with no zero deviations) will often possess great advantages.

TABLE XII.

—27 etc. _5 _3 _i -j-i +3 4-5 +27 —27 1 1 -25 -23 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 !8 1 1 2 2 2 2 2 1 1 1 3 2 1 21 ••S 1 1 1 2 2 2 3 1 3 2 1 1 1 432 2 32 I 1 1 243 4 6 5 6 5 5 3 3 1 1 51 f 1 2 2 223 7 5 6 6 6 5 5 4 2 1 1 61 2 3 345 9 9 11 10 9 9 6 5 3 3 92 — 5 3 3 366 10 10 11 12 10 9 6 7 4 4 1 1 1 108 — 3 1 2 3 286 18 12 15 15 14 13 9 9 2 2 1 1 129 - 1 2 2 379 13 14 15 15 15 15 9 9 3 3 11 11 139 + 1 113 365 9 11 15 15 16 15 13 13 763 3 2 1 148 + 3 1 12 245 8 8 15 13 14 15 13 12 552 2 1 1 129 + 5 1 2 223 6 7 9 9 12 11 11 10 653 2 2 1 104 1 232 5 6 9 9 11 11 9 10 562 211 1 96 2 2 3 4 6 6 6 6 5 4 2 1 2 2 1 1 53 1 2 3 2 5 5 6 5 5 4 1 1 2 211 46 1 1 2 1 1 2 2 2 2 2 761 1 1 32 1 1 1 1 1 1 2 2 441 2 1 22 1 1 1 232 2 12 1 1 1 1 1 1 6 1 1 1 1 1 3 + 27 1 1 1 2 2 81026285862 97103128129 132125102 986456282611 72211

TABLE XIII.

-11 -10 —9 —8 —7 —6 —5 —4 —3 —2 —1 1 1 1 1 1 +1 +2 +3 +4 +5 4-6 +7 +8 +9 +10+11 —5 —4 -3 —2 — 1 +1 +2 +3 +4 +5 1 1 1 2 1 2 4 1 3 5 1 3 2 1 3 1 1 2 2 1 2 1 333 5 12 2 8 17 19 5 20 30 3 18 20 1 2 1 4 17 20 24 1 5 11 17 20 1 1 6 8 12 1 4 4 11 1 1 3 5 1 1 1 2 11 16 18 28 21 17 10 3 1 1 1 4 11 15 18 30 '20 1C, 6 1 1 1 1 4 11 13 17 '24 '24 17 3 1 2 1 2 3 11 16 U 21 30 5 1 1 1 1 2 3 3 10 11 19 18 9 7 1 2 1 2 5 7 6 9 4 5 1 2 4 2 4 4 4 2 2 2 1 2 3 1 1 2 1 1 1 1 2 2 7 11 20 1 1 1 90110120130 130120110 90 30 20 11 7 2 11 18 39 89 112 118 128 124 118 107 86 39 23 11 1 7 2 2 1

The accuracy with which the Pearson r, the Median Ratio and the cosine πU calculated from a random sampling of a series of individual relationships approximate the true r, the true Median Ratio, and the true cosine πU of the entire series was experimentally determined in the case of the series A, B and C (shown in Tables IX., XII. and XIII.).¹ These reliabilities could, I suppose, be calculated by theory for any given series of relationships but it seemed wise to determine them also by experiments with real cases.

¹ Table IX. is on page 19.

In calculating the results for each draw of 200, 100 or of 50 cases the deviations were reckoned always from the true central tendencies of the total series, not from the obtained central tendencies of the draw itself. This saves much time and introduces no error relevant to the problem. The Median Ratio was taken simply as the observed ratio of which it was a case. That is, if the distribution of ratios was: less than 1.00, 49; 1.00, 12; over 1.00, 39; the Median Ratio would be taken as 1.00. If one took as the Median Ratio the average of this observed ratio and the ratio halfway between the 40 and 60 percentiles, the divergences for the Median Ratio would be reduced.

The results are given in Table XIV. In every case the Median Ratio means the median of all the ratios (y/x and x/y), the two series being reduced to an equivalence in variability. The relationships as calculated from the entire series are:

                      Series A   Series B   Series C
Pearson Coefficient   .51        .27        .73
Median Ratio          .60        .33        .83
Cosine πU             .61        .30        .79

It is clear from Table XIV. that if N is as great as 100, there is no great loss in precision from the use of the Median Ratio method or even of the unlike-signed pairs method.

TABLE XIV.

AVERAGE DIVERGENCE OF OBTAINED FROM TRUE MEASURE OF RELATIONSHIP¹

(Figures in parentheses give the ranks of the three methods in freedom from chance error.)

           No. of   No. of   Pearson       Median Ratio
           Trials   Cases    Coefficient   (Double-entry)   Cosine πU
Series A   10       200      .039 (1)      .053 (2)         .058 (3)
           10       100      .065 (2)      .062 (1)         .101 (3)
           10       50       .100 (1)      .155 (3)         .135 (2)
Series B   5        200      .064 (2)      .063 (1)         .082 (3)
           5        100      .105 (3)      .072 (1)         .075 (2)
           10       50       .153 (1)      .192 (2)         .197 (3)
Series C   3        200      .044 (2)      .072 (3)         .013 (1)
           3        100      .032 (1-2)    .050 (3)         .032 (1-2)
           5        50       .119 (2)      .120 (3)         .077 (1)

The Advantages of Certain Other Measures.

The Average Ratio has no advantage over the Median Ratio and suffers from the disadvantage of taking an enormous amount of time and being influenced so much by extreme ratios. No experienced worker with relationships would favor its use.
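The sampling experiment behind Table XIV. (repeated random draws of 200, 100 or 50 pairs, each compared with the value from the entire series) can be sketched in Python for the unlike-signed pairs method. This is only an illustration under assumptions: the function names, the synthetic total series of 2000 pairs and the fixed random seed are hypothetical, and the figures it prints are not the author's.

```python
import math
import random
from statistics import median


def unlike_signed_fraction(pairs):
    """U: share of pairs whose two deviations differ in sign (zero deviations are skipped)."""
    signed = [(x, y) for x, y in pairs if x != 0 and y != 0]
    return sum(1 for x, y in signed if x * y < 0) / len(signed)


def cos_pi_u(u):
    """r inferred from the fraction of unlike-signed pairs, r = cos(pi * U)."""
    return math.cos(math.pi * u)


def r_from_unlike_signs(pairs):
    return cos_pi_u(unlike_signed_fraction(pairs))


def average_divergence(population, measure, n, trials, rng):
    """Mean absolute gap between the measure on draws of n pairs and on the whole series."""
    true_value = measure(population)
    gaps = [abs(measure(rng.sample(population, n)) - true_value) for _ in range(trials)]
    return sum(gaps) / trials


# Hypothetical total series of 2000 pairs (not the author's Series A, B or C).
rng = random.Random(0)
raw = []
for _ in range(2000):
    x = rng.gauss(0, 1)
    raw.append((x, 0.5 * x + rng.gauss(0, 1)))

# Deviations are reckoned from the medians of the total series, not of each draw.
mx = median(p[0] for p in raw)
my = median(p[1] for p in raw)
population = [(x - mx, y - my) for x, y in raw]

for n in (200, 100, 50):
    print(n, round(average_divergence(population, r_from_unlike_signs, n, 10, rng), 3))
```

Passing a different `measure` function (for instance a product-moment or median-ratio routine) to `average_divergence` would give the corresponding column of such a comparison.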
The Modal Ratio is in some respects the most important single feature of the entire series of relationships, and is probably a better basis of comparison between different relationships when either is not normally distributed than the Pearson Coefficient or the Median Ratio. The observed Modal Ratio from a small sampling diverges so much from the true Modal Ratio of the total series, however, that it can not be well used alone unless the number of ratios is 500 or more. The scale should also be fine. The most probable true Modal Ratio inferred from a large part of the total distribution of the relationship is a very valuable measure but one the calculation of which takes a long time and involves presuppositions about the form of distribution of the relationship.

¹ It is hardly worth while to compare the empirical divergences of Table XIV. for the Pearson Coefficients with the divergences to be expected from the formula A.D. (true r − obtained r) $= \frac{.7979(1 - r^2)}{\sqrt{n}}$, for this formula, calculated for 'normal' correlation, would not be expected to fit very closely any of the three sets, A, B and C, or to fit C at all closely. A certain interest does attach to the comparison from the fact that the formula A.D. (true r − obtained r) $= \frac{.7979(1 - r^2)}{\sqrt{n(1 + r^2)}}$ has also been proposed as the valid one. So far as my drawings go, the former is surely the better. They vary from it, moreover, with a constant deviation toward a larger divergence, the divergences by theory being:

No. of Cases   Series A   Series B   Series C
200            .042       .053       .027
100            .059       .074       .038
50             .083       .105       .054

In all cases the investigator of a relationship should be observant of the form of distribution of the individual relationships and of their approximate mode.
Where the correlation table shows any marked eccentricity in the distribution of the relationships the observed modal relationship at least should probably be stated, even though the more reliable Median Ratio or Pearson Coefficient has been calculated.

The correlation (in the sense of the slope of the line which the Pearson Coefficient measures) may be inferred from the frequencies of certain types of pairs, as in the case, r = cos πU (U equalling the percentage of unlike-signed pairs with median sectioning). The methods of making this inference are especially valuable when we wish to compare two relationships, one (or both) of which is measured very crudely, for instance, the relation between health and cheerfulness and the relation between intellect and morality. From such measures as the following:

Cheerfulness (Much, Little) with Health (Sickly, Healthy): 150, 150, 250, 450.
Morality (Inferior, Superior) with Intellect (Dull, Bright): 315, 285, 145, 2[?]5.
[The arrangement of these frequencies in the two fourfold tables is not recoverable from the source.]

one can not compare directly the closeness of relationship of health and cheerfulness with that of intellect and morality.

The following formulas, suggested by Pearson, are probably the best available for dealing with such cases. In all N = the total number of pairs; a, b, c and d mean respectively the numbers of x1y1, x2y1, x1y2 and x2y2 pairs where x1 means measures above any given degree of x and x2, measures below it, and similarly for y1 and y2 (see Fig. 9).

FIG. 9.

I. $r = \sin\left(\frac{\pi}{2}\cdot\frac{1}{\sqrt{1+\kappa^2}}\right)$, where $\kappa^2 = \frac{4abcdN^2}{(ad-bc)^2(a+d)(b+c)}$, cases being so chosen that ad > bc.

[Formulas II., III. and IV., and the relations by which h and k are defined from a, b, c, d and N, are illegible in the source.]

h and k are found from tables of the probability integral, a, b, c and d being known. H is taken as $\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}h^2}$. H and K are thus found from tables.

Of these formulas IV. is for 'normal' correlation the most accurate.
It presupposes 'normal' correlation: I., II. and III. do not.

When the facts to be related are measured on a fine scale but in terms of relative position only, not of amount, the relationship may be measured, as Spearman has shown, by the degree of conformity of the second member's position to that of the first member.¹ This method suffers from the disadvantage of giving results only with much difficulty comparable with other methods and of taking much more time without being much more reliable than the cosine πU method.

¹ See American Journal of Psychology, Vol. XV., p. 86 ff.

From the reduction in variability of an array of y related to a given value of x below the variability of the total series of y, the correlation may be inferred on the supposition that the correlation is 'normal' and that the variabilities of all arrays of y are equal. The infrequency of 'normal' correlation and the fact that, as shown in § 4, the variabilities of all arrays of y are usually not equal make this method of no great practical service except for the few cases where no better method can be used.¹

¹ Cases, that is, where we know the variability of a related array but lack the data needed for the use of the better methods. For instance, we may find the variability of 100 men eminent in engineering science in early liking for arithmetic to be only 30 per cent. as great as the variability of men in general and so infer the amount of relationship between early liking of arithmetic and engineering ability. The actual rating of a random sampling of men in both early liking for arithmetic and engineering ability would be hardly possible.

Section 4 tested the hypothesis of equal variability of all arrays of y and found it true in some cases and false for others. It is somewhat extraordinary that Professor Pearson should in support of his coefficient of variability argue that the gross variability depends on the size of the mean from which the variability is measured, being proportioned to it, and yet not recognize that, since the means of the arrays of y in positive correlation would then increase as we pass from arrays related to low values of x to arrays related to high values of x, the variability of one of the latter arrays should be greater than that of one of the former.

§ 8. The Attenuation of Measurements of Relationship

Chance inaccuracies in the original measures make the relationship obtained therefrom vary toward zero from the relationship that would be found with accurate measures. C. Spearman announced in the American Journal of Psychology, Vol. XV., pp. 89-91, that the following formulas gave the necessary correction:

(1) $r_{pq} = \frac{r_{p'q'}}{\sqrt{r_{p'p'}\, r_{q'q'}}}$

where $r_{p'q'}$ = the mean of the correlations between each series of values obtained for p with each series obtained for q;
$r_{p'p'}$ = the average correlation between one and another of these several independently obtained series of values of p;
$r_{q'q'}$ = the same as regards q;
and $r_{pq}$ = the required real correlation between the true objective values of p and q.

(2) $r_{pq} = \frac{\sqrt[4]{mn}\; r_{p''q''} - r_{p'q'}}{\sqrt[4]{mn} - 1}$

where m and n = the number of independent gradings for p and q respectively;
$r_{p'q'}$ = the mean correlation between the various gradings for p and those for q;
and $r_{p''q''}$ = the correlation of the amalgamated series for p with the amalgamated series for q.

He has been criticized with some venom by Karl Pearson (Biometrika, Vol. III., p. 160), who believes these formulas wrong, and concludes that "Perhaps the best thing at present would be for Mr.
Spearman to write a paper giving algebraical proofs of all the formulas he has used, and if he did not discover their erroneous character in the process, he would at least provide tangible material for definite criticism, which it is difficult to apply to mere unproven assertions."

These formulas of Spearman's, if correct, are of importance. They should be proved valid or replaced by formulas that are valid. The first formula may be replaced by

$r_{pq} = \frac{r_{p'q'}\,\sigma_{p'}\,\sigma_{q'}}{\sqrt{\sigma_{p'}^2 - \sigma_{p''}^2}\;\sqrt{\sigma_{q'}^2 - \sigma_{q''}^2}}$

where $r_{pq}$ and $r_{p'q'}$ are as above and
$\sigma_{p'}$ = the mean square deviation of the series of measures of p;
$\sigma_{q'}$ = the mean square deviation of the series of measures of q;
$\sigma_{p''}$ = the mean square deviation of the different measures of p in the same individuals;
$\sigma_{q''}$ = the mean square deviation of the different measures of q in the same individuals.¹

The presupposition of this formula and of Spearman's first formula is that the attenuation is due to chance errors. Dr. Clark Wissler has called attention to the fact that, where practise, fatigue and other constant influences help to cause the different observations of a fact to vary, these formulas will, therefore, give inaccurate results.²

Of these two formulas, Spearman's possesses the advantage of being usable in cases where the two traits are not measured in units of amount, such as allow the variabilities of the two traits to be calculated; the formula of Boas has the advantage of being more rapid and convenient in cases where the variabilities of the two traits can be calculated.

No active attention has so far as the writer knows been yet given to formula (2) above.³ Practical necessity seems to justify the labor of testing it (and in a measure the first formula also) inductively. This I have done to some extent for values of r where the r's from accurate measures are from .70 to .80 in connection with my 'Measurements of Twins' (Archives of Philosophy, Psychology and Scientific Methods, No. 1, September, 1905).

¹ This formula is due to Professor Franz Boas. See also the note by Dr. C. Wissler in Science, Vol. XXII., p. 309 ff.
² Loc. cit. in note 1.
³ Spearman's second formula has the advantage of measuring the probable true correlation by the actual changes produced in the obtained 'raw' correlation by a certain increase in accuracy. The nature and validity of the presuppositions upon which it is based I am not competent to discuss.

I had records from 50 pairs of twins in 5 tests of efficiency of perception: (1) in marking A's on a sheet of printed capitals, (2) in marking A's on a second sheet of printed capitals, (3) in marking words containing e and r on a page of Spanish, (4) in marking words containing a and t on a page of Spanish and (5) in marking misspelled words on a page of narrative, 100 of whose words were misspelled. I had also 6 tests in efficiency of controlled association, tests 6 and 7 being addition, 8 and 9 being multiplication and 10 and 11 being writing the opposites of two lists of words.

If we combine all 5 of the tests of efficiency of perception allowing approximately equal weight to each, we have a measure which is presumably close to the true measure of a child's capacity at a certain day and hour to pick out small details efficiently. The correlation between twin and twin is for this combined score .697. Similarly the combined measure for addition, multiplication and opposites gives a measure presumably close to the true measure of a child's ability at a certain day and hour to make proper mental connections. The correlation between twin and twin is .815. The .697 and .815 are presumably only slightly below the true r's.
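For readers who wish to try the corrections numerically, the two Spearman formulas may be sketched in Python as follows. This is a sketch under assumptions, not the author's procedure: it takes the first formula to be the obtained correlation divided by the square root of the product of the two average inter-series correlations, and the second to be the form usually attributed to Spearman's 1904 paper; the function names and all input values are hypothetical.

```python
import math


def spearman_first(r_obs, r_pp, r_qq):
    """Assumed form of formula (1): r_obs / sqrt(r_pp * r_qq).

    r_obs: mean correlation between the series obtained for p and those for q.
    r_pp, r_qq: average correlation among the repeated series of p and of q.
    """
    return r_obs / math.sqrt(r_pp * r_qq)


def spearman_second(r_single, r_amalgamated, m, n):
    """Assumed form of formula (2), with k the fourth root of m * n.

    r_single: mean correlation between single gradings of p and of q.
    r_amalgamated: correlation between the amalgamated series of p and of q.
    m, n: number of independent gradings of p and of q.
    """
    k = (m * n) ** 0.25
    return (k * r_amalgamated - r_single) / (k - 1)


# Hypothetical inputs, chosen only to show the direction of the correction.
print(round(spearman_first(0.50, 0.64, 1.00), 3))
print(round(spearman_second(0.50, 0.60, 4, 4), 3))
```

Both functions raise the obtained correlation, as a correction for chance inaccuracy should; neither allows for constant influences such as practise or fatigue.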
Now the correlations for twin and twin in tests 1-11 were in order .607, .633, .595, .428, .754, .645, .644, .653, .579, .734 and .560. Subjecting these values to correction by Spearman's formulas, taking, as he does, the mean of both corrected r's I obtained for the perception tests: Marking A's, true r=.69; marking letters in words, true r = .71 ; misspelled words, not corrected because only one test was given. The Spearman correction thus produced results in accord with the expectation derived from the value r— .697 for the combined mark. For the association tests J obtained after correction: Addition, true r=.75; multiplication, true r = .84 ; opposites, true r = .90. The average of these, .83, is again closely in accord with the .815 from the combined measure. In both cases the result by correction is slightly higher than the result empirically obtained from the more accurate data, as of course it should be. I have made a test ad hoc in the case of a series of 100 pairs drawn at random from Series B which give a true r of .281. These 100 pairs of accurate measures I made inaccurate artificially. I then calculated the r's obtained from such inaccurate measures, applied the Spearman formulas and in so far tested their validity. 38 EMPIRICAL STUDIES OF MEASUREMENT Special precautions were taken to have the errors artificially in- duced in the 200 measures such as would come in reality from variable errors of apparatus, observation and record. The errors were in fact a random sampling of the errors actually made by a psychologist in estimating areas. A series of 121 rectangles of approximately the same shape, 40, 41, 42 ... 160 sq. cm. with also many duplicates were used. The area of each was estimated, the slips being drawn in a random order, and the error -\- or — from the true area was recorded. 
The errors used by me were those made after from 3 to 5 trials with the series and were little influenced by practice (the sums of the errors regardless of signs were for successive repetitions of the series 605, 614, 563, 613, 587, 637, 531, 542, 578, 581). I used the deviation from the standard if the constant error for the given area was less than 1 sq. cm. and the approximate deviation from the subject's own average judgment if the constant error was over 1 sq. cm. The errors taken were those (10 in each case) made with areas 43 sq. cm. up through 122 sq. cm., four errors being taken for each of the 200 accurate measures. These errors were assigned to the accurate measures so that the magnitude of the area with which the error of estimation was made corresponded roughly to the magnitude of the measure to which the error was assigned. Thus errors from areas 43-53 would be put with measures -27, -25, -23 and the like, and errors from areas 110-122 would be put with measures +17, +19, +27 and the like. The true measures and the errors assigned to each are given in Table XV.

If now to each true measure is added (regarding signs) its assigned error, we have (four errors having been assigned to each) four series of inaccurate measures of two series whose true values and true correlation are known. These facts give the data for testing the Spearman formulas.1

1 These errors can of course be used with any series of 400 or less measures to test Spearman's formulae, as I have done for this series (r = .281 of Series B).

TABLE XV.

        True x    Errors Assigned
a        -19      -3    +2    +6    -2
b        -17      -2    -4    -2    -4
c        -15      +9    +5    +1    +1
d        -15      +6    -1    -1    -3
etc.     -15       0    +4    +2    -3
         -15      -8     0    +1    +2
         -13      +1    -2    -5    -3
         -13      -8    -2   +11    +4
         -13      +1    -1     0    +3
         -11      +1    -1    -6    -4
         -11      -2    +1    -2    +4
         -11       0    +2     0    +7

True y a -11 b -1 c -27 d -9 etc.
+ 3 + 7 —11 — 3 + 13 —13 — 5 — 5 o + 12 + 7 0 + 7 + 3 + 6 + 4 + 1 — 4 + 7 + 7 Errors Assigned _ 4 — 4 +5 —13 — 9 — 7 + 2 — 4 — 3 + 3—8 0 + 7 - 2 — 3 + 10 — 7 +7 -5+9 0 + 3 +3 +6 - 1 — 1 - 6 - 3 +7 — 5 - 4 +2 +4 — 1 0—6 —11 —12 — 1 0 — 3 — 3 - 1 — 6 — 8 — 9 — 9 Q + 7 + 9 0 — 5 + 3 0 0 + 9 — 9 + 13 + 8 + 3 —12 — 3 + 1 — 1 + 1 + 6 — 9 + 2 + 1 — 3 + 2 — 1 Q + 6 — 5 — 3 — 9 — 4 — 1 — 4 — 3 - 1 — 5 + 6 + 3 — 6 — 9 — 6 + 6 + 4 + 12 + 5 o + 3 — 6 + 2 — 9 — 3 — 2 — 2 0 + 13 — 8 - 1 + 6 + 4 — 7 0 + 2 — 2 + 5 - 1 + 5 — 5 + 1 — 5 - 7 — 2 — 6 + 3 + 2 + 7 + 1 + 2 — 4 + 3 — 7 + 7 + 4 + 5 — 2 + 9 + 2 . K — 3 K — 5 — 6 — 2 — 2 + 3 —13 + 1 + 1 + 4 + 6 — 5 + 13 — 4 — 4 + 3 — 9 — 6 0 + 3 — 3 — 5 + 8 — 3 — 7 0 — 7 — 6 + 2 — 2 + 2 — 5 — 3 + 9 — 4 — 3 — 7 + 1 + 3 — 5 + 2 — 5 — 7 — 3 — 5 + 2 — 3 o + 6 — 2 — 7 — 5 + 7 + 2 0 1 — 3 + 7 + 1 + 4 — 2 — 5 - 1 + 4 0 — 2 — 3 + 5 + 10 — 5 + 4 — 5 + 4 — 2 + 4 — 3 — 3 —14 + 6 — 3 — 5 — 5 + 1 — 3 — 3 — 3 3 - 1 —11 — 3 — 2 — 5 + 6 + 3 — 3 — 9 + 13 — 3 — 4 + 3 + 11 — 3 + 3 + 7 + 6 — 5 —11 + 11 + 7 O —11 — 3 + 7 + 3 + 6 — 2 — 9 + 5 — 2 — 3 + 3 — 3 - 1 — 7 — 2 — 7 — 9 — 6 — 3 — 5 + 5 — 3 + 3 — 9 + 6 — 2 AC- X4 J* Errors Assigned True y Errors Assigned — 1 — 9 + 9 — 1 0 + 1 — 5 + 5 + 9 + 3 - 1 + 8 — 5 + 5 0 + 3 0 + 1 + 1 0 - 1 + 1 — 5 — 5 - 1 + 5 0 + 1 + 2 — 3 - 1 + 2 + 7 - 1 + 4 + 7 + 3 — 2 + 14 — 6 -f- + 3 — 5 — 6 0 + 17 0 + 4 — 1 — 8 4- — 6 + 3 —15 — 2 + 11 —13 + 7 + 4 + 1 + + 1 — 2 + 6 + 1 + 11 + 2 — 4 + 4 + 12 -j- + 6 + 8 + 2 0 + 5 + 7 + 5 + 4 — 7 4. 
+ 1 0 + 2 0 + 1 — 7 —10 — 9 + 8 + + 1 + 3 + 2 — 8 + 1 + 2 + 6 — 9 — 3 + + 17 + 1 — 9 2 + 1 + 5 0 + 3 + 7 -f- — 3 + 4 — 5 — 6 — 3 — 9 — 1 + 4 — 6 + 0 + 7 + 2 + 4 — 5 + 3 + 7 - 1 + 6 + 2 + 1 + 7 — 9 — 7 +11 + 3 — 2 — 3 + + 3 — 6 + 4 - 1 A 1 — 7 — 2 — 3 + 0 + 3 — 5 — 4 — 9 + 3 — 1 + 10 — 8 + + 5 + 1 + 5 + 4 —13 — 6 + 6 + 6 — 1 + 3 — 3 — 9 — 3 + 6 + 1 — 4 + 3 — 3 0 + 3 + 1 + 3 + 1 - 1 + 1 0 — 3 — 3 — 2 + 3 o O — 2 — 2 — 1 + 4 0 0 + 7 + 3 + 7 0 + 3 + 5 — 5 + 1 2 + 3 — 2 + 5 + 3 + 2 + 4 - 1 + 11 + 1 - 1 + 1 + 2 + 5 + 5 + 3 + 5 —11 + 5 + 8 — 2 —12 A + 5 + 3 + 2 0 — 9 + 1 + 7 — 3 +16 + 4 + 5 — 4 — 8 — 4 + 3 + 1 - 1 — 2 — 2 — 6 + 5 + 10 + 9 — 2 + 3 — 7 — 2 — 7 + 6 + 5 + 5 — 5 — 5 + 4 + 3 — 9 + 1 — 3 — 4 + 1 + 5 + 10 — 6 — 8 + 12 —19 + 2 + 5 + 1 — 4 + 7 — 6 A + 2 — 4 + 11 — 4 — 3 — 8 — 5 + 7 + 5 + 10 — 5 — 3 + 7 + 7 + 15 — 7 - 7 + 7 — 2 — 7 + 2 — 5 + 5 — 3 — 3 — 3 + 14 + 7 + 7 + 1 + 15 1 + 1 + 2 + 1 — 4 — 4 + 7 — 8 + 6 — 6 +10 — 3 — 2 + 7 + 3 —11 + 7 + 8