STATISTICS IN PSYCHOLOGY AND EDUCATION

BY HENRY E. GARRETT
ASSISTANT PROFESSOR OF PSYCHOLOGY, COLUMBIA UNIVERSITY

WITH AN INTRODUCTION BY R. S. WOODWORTH
PROFESSOR OF PSYCHOLOGY, COLUMBIA UNIVERSITY

LONGMANS, GREEN AND CO.
55 FIFTH AVENUE, NEW YORK
CHICAGO, TORONTO, LONDON
1926

Copyright, 1926, by LONGMANS, GREEN AND CO.
First Edition, January, 1926
Reprinted, November, 1926
MADE IN THE UNITED STATES

INTRODUCTION

Modern problems and needs are forcing statistical methods and statistical ideas more and more to the fore. There are so many things we wish to know which cannot be discovered by a single observation, or by a single measurement. We wish to envisage the behavior of a man who, like all men, is rather a variable quantity, and must be observed repeatedly and not once for all. We wish to study the social group, composed of individuals differing one from another. We should like to be able to compare one group with another, one race with another, as well as one individual with another individual, or the individual with the norm for his age, race or class. We wish to trace the curve which pictures the growth of a child, or of a population. We wish to disentangle the interwoven factors of heredity and environment which influence the development of the individual, and to measure the similarly interwoven effects of laws, social customs and economic conditions upon public health, safety and welfare generally. Even if our statistical appetite is far from keen, we all of us should like to know enough to understand, or to withstand, the statistics that are constantly being thrown at us in print or conversation, much of it pretty bad statistics. The only cure for bad statistics is apparently more and better statistics.
All in all, it certainly appears that the rudiments of sound statistical sense are coming to be an essential of a liberal education.

Now there are different orders of statisticians. There is, first in order, the mathematician who invents the method for performing a certain type of statistical job. His interest, as a mathematician, is not in the educational, social or psychological problems just alluded to, but in the problem of devising instruments for handling such matters. He is the tool-maker of the statistical industry, and one good tool-maker can supply many skilled workers. The latter are quite another order of statisticians. Supply them with the mathematician's formulas, map out the procedure for them to follow, provide working charts, tables and calculating machines, and they will compute from your data the necessary averages, probable errors and correlation coefficients. Their interest, as computers, lies in the quick and accurate handling of the tools of the trade.

But there is a statistician of yet another order, in between the other two. His primary interest is psychological, perhaps, or it may be educational. It is he who has selected the scientific or practical problem, who has organized his attack upon the problem in such fashion that the data obtained can be handled in some sound statistical way. He selects the statistical tools to be employed, and, when the computers have done their work, he scrutinizes the results for their bearing upon the scientific or practical problem with which he started. Such an one, in short, must have a discriminating knowledge of the kit of tools which the mathematician has handed him, as well as some skill in their actual use.

The reader of the present book will quickly discern that it is intended primarily for statisticians of the last-mentioned type.
It lays out before him the tools of the trade; it explains very fully and carefully the manner of handling each tool; it affords practice in the use of each. While it has little to say of the tool-maker's art, it takes great pains to make clear the use and limitations of each tool. As any one can readily see who has tried to teach statistics to the class of students who most need to know the subject, this book is the product of a genuine teacher's experience, and is exceptionally well adapted to the student's use. To an unusual degree, it succeeds in meeting the student upon his own ground.

R. S. Woodworth
Columbia University

PREFACE

The present-day emphasis on measurement and the quantitative treatment of results has made a knowledge of statistical method not only extremely useful but almost necessary to the student of psychology, education, and the social sciences. To those who have been well trained in mathematics, the acquisition of statistical technique offers no particular difficulty. To many otherwise capable students, however, either because of inadequate preparation in mathematics or because their preparation is not very recent, the application of statistical method to data obtained from test and experiment is more than ordinarily difficult. It is for this last group of students, especially, that this book has been written. Its primary purpose is to present the subject in a simple and concise form understandable to those who have no previous knowledge of statistical method. With this end in view, theory has everywhere been subordinated to practical application, and numerous illustrations of the various statistical devices have been provided. References have been given, however, for the benefit of those interested in the mathematical theory underlying the methods introduced. The reader will note that in nearly all cases formulas have simply been stated without proof.
This has been done because the writer believes that most students of mental and social measurement are (and probably should be) more concerned with what a formula means and does than with how it is derived. There is considerable justification for such an attitude. In every science certain facts obtained from other fields must be taken on faith. We do not, to take a simple example, restrict the use of the radio or the microscope to those who understand the physical principles involved, and there seems to be no real reason why a student of psychology should not make good use of a correlation formula when he cannot derive it mathematically.

A chapter has been given to the subject of reliability (a topic too often passed over lightly), and considerable space has been devoted to correlation. An entire chapter, also, has been given to partial and multiple correlation. This method, while comparatively recent, is being widely used in educational research, and is probably destined in the near future to be more often used in the psychological laboratory. In the last chapter, the application of correlation and other statistical methods to tests and testing is shown.

Many have contributed to the making of this book, of whom only a few can be mentioned. To Professors R. S. Woodworth and Mark A. May, who read the manuscript, the writer is indebted for many useful and constructive criticisms. He is also grateful to Dr. M. R. Neifeld, to Mr. V. W. Lemmon, and to Miss Elizabeth Farber for computations and helpful suggestions.

Henry E. Garrett
Columbia University

CONTENTS

CHAPTER I
THE FREQUENCY DISTRIBUTION

SECTION ... PAGE
I. The Tabulation of Measures into a Frequency Distribution ... 1
   1. Measures in General: Continuous and Discrete ... 1
   2. Classification of Measures in Continuous Series ... 2
   3. Three Ways of Expressing the Limits of a Step-interval ... 5
   4. The Meaning of a Single Score in a Continuous Series ... 7
II. Measures of Central Tendency ... 8
   1. The Average, or Arithmetic Mean ... 8
   2. The Median ... 11
   3. The Mode ... 15
III. Measures of Variability ... 16
   1. The Range ... 17
   2. The Quartile Deviation, or Q ... 17
   3. The Average Deviation, or AD ... 22
   4. The Standard Deviation, or SD ... 26
IV. The Short Method of Finding the Average, AD, and SD (σ) ... 28
   1. The Calculation of the Average by the Short Method ... 28
   2. The Calculation of the AD by the Short Method ... 32
      A. The Calculation of the AD from the Average ... 32
      B. The Calculation of the AD from the Median ... 35
   3. The Calculation of the Standard Deviation by the Short Method ... 35
   4. The Short Method Applied to Discrete Series ... 36
V. The Comparison of Groups ... 40
   1. The Measurement of Relative Variability ... 40
   2. The Comparison of Two Groups in Terms of Central Tendency and Variability ... 42
   3. The Comparison of Two Groups in Terms of Overlapping ... 44
VI. The Calculation of the Percentiles in a Frequency Distribution ... 45
VII. When to Use the Different Measures of Central Tendency and Variability ... 50
VIII. Summary of Formulas for Finding the Measures of Central Tendency and Variability ... 51
IX. Illustrative Problems ... 53

CHAPTER II
GRAPHIC METHODS AND THE NORMAL CURVE

I. The Graphic Representation of the Frequency Distribution ... 59
   1. The Frequency Polygon ... 59
   2. The Histogram or Column Diagram ... 63
   3. The Ogive, or Cumulative Frequency Graph ... 66
II. Other Uses of Graphical Methods: the Comparative Line Graph ... 71
III. The Normal Probability Curve ... 74
   1. Elementary Principles of Probability ... 76
   2. Why the Probability Curve is Employed in Psychological Measurement ... 81
   3. Important Properties of the Normal Curve ... 84
   4. The Measurement of Skewness ... 86
IV. Some Practical Applications of the Normal Curve ... 89
   1. The Construction and Use of Tables X and XI ... 89
   2. A Variety of Problems Solved by Means of Tables X and XI ... 94
   3. The Arrangement of Problems or other Test Items into a Scale in Which the Difficulty of Each Item is Known with Reference to Each Other Item as Well as Some Selected Zero Point ... 101
   4. The Conversion of Judgments by Relative Position (or Relative Merit) into σ or PE Positions on the Scale ... 107
   5. The Scaling of Total Scores on a Test ... 109
V. The Transmutation of Measures by Relative Position (in Order of Merit) into Units of Amount on the Assumption of Normality in the Trait Measured ... 111

CHAPTER III
THE RELIABILITY OF MEASURES

I. What is Meant by the Reliability of a Measure ... 118
II. The Reliability of Measures of Central Tendency ... 120
   1. The Reliability of the Average or Mean ... 120
      A. In Terms of the Standard Error, σ(av) ... 120
      B. In Terms of the Probable Error, PE(av) ... 125
   2. The Reliability of the Median ... 126
III. The Reliability of Measures of Variability ... 127
   1. The Standard Deviation, or σ ... 127
   2. The Quartile Deviation, or Q ... 128
IV. The Reliability of the Difference between Two Measures ... 128
   1. The Reliability of the Difference between Two Averages ... 128
      A. In Terms of the σ(diff.) ... 128
      B. In Terms of the PE(diff.) ... 133
   2. The Reliability of the Difference between Two Medians ... 136
V. Some Problems which Involve Measures of Reliability ... 138
VI. Limitations to the Reliability Formulas, and Cautions to be Observed in Interpreting Them ... 142
VII. Summary of Reliability Formulas ... 145

CHAPTER IV
CORRELATION

I. What is Meant by Correlation ... 149
II. The Coefficient of Correlation: What it is, and what it Does ... 152
   1. The Coefficient of Correlation as a Ratio ... 152
   2. Graphical Representation of the Coefficient of Correlation ... 158
III. The Calculation of the Coefficient of Correlation by the Product-moment Method ... 163
   1. The Product-moment Formula when Deviations are Taken from the Guessed Averages of the Two Distributions ... 163
   2. The Product-moment Formula when Deviations are Taken from the Actual Averages of the Two Distributions ... 168
IV. The Probable Error of a Coefficient of Correlation ... 170
   1. The PE(r) ... 170
   2. The PE of the Difference between Two r's ... 171
V. The Regression Equations ... 173
   1. In Deviation Form ... 173
   2. The Regression Equations in Score Form ... 180
   3. The Reliability of the "Predictions" made from the Regression Equations ... 183
VI. The Complete Solution of a Correlation Problem ... 185
VII. Methods of Measuring Correlation which Take Account only of the Relative Position or Rank ... 189
   1. The Method of Rank-differences ... 190
   2. The Method of Gains, or the Spearman Footrule ... 192
   3. Summary of the Rank Methods ... 195
VIII. A Method of Measuring Relationship when the Data are Grouped into Classes or Categories. The Contingency Method ... 195
IX. Non-linear Relationship ... 203
   1. The Correlation Ratio ... 203
   2. The Correction of "raw" eta ... 209
   3. Test of Linearity of Regression ... 209
X. The Correction of a Coefficient of Correlation for "Attenuation" ... 211
XI. Summary of Formulas in Chapter IV ... 213

CHAPTER V
PARTIAL AND MULTIPLE CORRELATION

I. The Meaning of Partial and Multiple Correlation ... 221
II. A Correlation Problem Involving 3 Variables ... 223
III. General Formulas for Use in Partial and Multiple Correlation ... 231
   1. General Formulas for Partial r's ... 231
   2. General Formulas for Partial σ's of any Order ... 233
   3. General Formulas for the Regression Equation, and Coefficients of Regression ... 235
   4. General Formulas for Standard and Probable Errors of Estimate ... 237
   5. General Formula for R, the Coefficient of Multiple Correlation ... 238
   6. Outline of the Formulas Needed in Correlation Problems which Involve (a) Four Variables, (b) Five Variables ... 240
IV. A Multiple Correlation Problem Involving 4 Variables ... 244
V. The Value and Use of Partial and Multiple Correlation ... 251
VI. Spurious Correlation ... 258
   1. Spurious Correlation Due to Heterogeneity of Material ... 258
   2. Spurious Index Correlation ... 260
   3. Spurious Correlation of a Single Test with a Composite of which it is a Member ... 260
VII. Summary of Formulas in Chapter V ... 261

CHAPTER VI
SOME APPLICATIONS OF STATISTICAL METHOD AND TECHNIQUE TO TESTS AND TEST RESULTS

I. The Validity of Test Scores ... 266
   1. Validity Determined through Correlation with a Criterion ... 266
   2. Indirect Measures of Validity ... 267
II. The Reliability of Test Scores ... 268
   1. The Reliability of a Test as Measured by its Self-Correlation ... 268
      (A) The "Reliability Coefficient" ... 268
      (B) Effect on Reliability of Lengthening or Repeating the Test ... 269
      (C) Coefficient of Reliability from One Application of a Test ... 271
      (D) Dependence of the Reliability Coefficient on the Size and Variability of the Group ... 271
   2. The Index of Reliability ... 272
   3. The Standard Error and Probable Error of Measurement: σ(M) and PE(M) ... 274
III. Combining the Scores from Different Tests ... 277
   1. Combining Test Scores by Percentiles ... 278
   2. Combining Test Scores by the Method of Median Mental Age ... 279
   3. Combining Tests which have been Weighted According to the Variability of the Test Scores ... 279
   4. Combining Test Scores by Converting the Scores of Different Tests into Comparable Series ... 281
IV. The σ of the Sum or Difference of Corresponding Values of Two Series of Test Scores ... 286
V. How to Interpret the Coefficient of Correlation between Two Tests or Other Measures ... 288
   1. The Interpretation of a Coefficient of Correlation in Terms of σ(est.) ... 288
   2. The Interpretation of a Coefficient of Correlation in Terms of the Standard Error of Measurement, σ(M) ... 290
   3. Interpretation of a Coefficient of Correlation in Terms of the Percentage of Common (Overlapping) Elements or Factors ... 291

CHAPTER I
THE FREQUENCY DISTRIBUTION

I. The Tabulation of Measures into a Frequency Distribution

1. Measures in General: Continuous and Discrete Series

In the measurement of mental and social traits or capacities, most of the facts with which we deal fall into what are known as continuous series. A continuous series may be defined simply as a series which is theoretically capable of any degree of subdivision. IQ's, for example, are generally thought of as increasing by increments of 1 on a scale which extends from the idiot to the genius; however, there is no real reason, at least theoretically, why with more refined methods of measurement we should not be able to get IQ's of 100.8 or even 100.83. Nearly all capacities measured by mental and educational tests and scales, as well as such attributes as height, weight, cephalic index, etc., have been found to be continuous, so that within the range of the scale used, any measure, integral or fractional, may exist and have meaning. Whenever gaps occur in a truly continuous series, therefore, these are usually to be attributed to our failure to measure enough cases, or to the relative crudity of our measuring instruments, or to some other fact of the same sort, rather than to the fact that no measures exist within the gaps.

There are, however, measures which do not fall into continuous series. Thus a salary scale in a department store may run from $10 per week to $20 per week in units of 50 cents or $1.00; no one receives, let us say, $17.53 per week. Or, to take another example, the average family in a certain locality may work out mathematically to be 4.57 children, although there is obviously a real gap between four and five children. Series like these, which contain real gaps, are called discrete or discontinuous.

It is probably fortunate, at least from the standpoint of the beginner in statistics, that nearly all of the measures which we make in psychology are continuous or can be treated as continuous.
This considerably simplifies the problem, inasmuch as we may concern ourselves (for the present at least) almost entirely with methods of handling continuous data, postponing the discussion of discrete series to a later page.

2. The Classification of Measures in Continuous Series

Data collected from test or experiment are often merely a series of numbers or a mass of figures without meaning or significance until they have been rearranged or classified in some systematic way. The first task that confronts us, then, is the organization of our material, and this leads naturally to a grouping of the measures into classes or categories. The procedure in grouping falls under three main heads, which are given in order below:

(1) The determination of the range: the interval between the largest and the smallest measures. The range is easily found by subtracting the smallest from the largest measure.
(2) Deciding upon the number and size of the groups to be used in classification. The number and the size of these steps or class-intervals depend largely upon the range and the kind of measures with which we are dealing.
(3) The tabulation of the separate measures within their proper step- or class-intervals.

TABLE I
Army Alpha Scores Made by 54 Columbia College Men

1. THE ORIGINAL SCORES (UNGROUPED)

185 174 127 183 168 126 177 154 157 189 172 201
158 160 179 184 155 137 177 164 198 176 188 197
151 188 188 169 195 165 185 188 164 195 176 185
185 179 146 182 153 158 160 191 176 138 185 155
178 151 144 191 170 157

Maximum score = 201. Minimum score = 126.

2. THE SAME SCORES GROUPED INTO A FREQUENCY DISTRIBUTION BY THREE METHODS

(A)                               (B)                 (C)
Scores          Tabulation   F    Scores         F    Scores      F
200 up to 205   /            1    200-204.99     1    200-204     1
195 up to 200   ////         4    195-199.99     4    195-199     4
190 up to 195   //           2    190-194.99     2    190-194     2
185 up to 190   ///// /////  10   185-189.99     10   185-189     10
180 up to 185   ///          3    180-184.99     3    180-184     3
175 up to 180   ///// ///    8    175-179.99     8    175-179     8
170 up to 175   ///          3    170-174.99     3    170-174     3
165 up to 170   ///          3    165-169.99     3    165-169     3
160 up to 165   ////         4    160-164.99     4    160-164     4
155 up to 160   ///// /      6    155-159.99     6    155-159     6
150 up to 155   ////         4    150-154.99     4    150-154     4
145 up to 150   /            1    145-149.99     1    145-149     1
140 up to 145   /            1    140-144.99     1    140-144     1
135 up to 140   //           2    135-139.99     2    135-139     2
130 up to 135                0    130-134.99     0    130-134     0
125 up to 130   //           2    125-129.99     2    125-129     2
                             N = 54              N = 54           N = 54

These three principles of classification are illustrated in Table I. The figures in this table represent the Army Alpha scores received by 54 college men. Since the highest score is 201 and the lowest 126, the range is found at once to be exactly 75 points. In deciding upon the number of "steps" or class-intervals to be used in grouping, the best general rule is to select by trial a step-interval which will yield not more than 20 nor fewer than 10 steps. The number of steps which a given interval will yield can be determined approximately (within one step) by dividing the range by the step tentatively chosen. In the present problem, for example, 75 (the range) divided by 5 (the step-interval) gives 15, which is one less than the actual number of steps, namely 16. A step-interval of 3 points will yield approximately 25 steps, while a step-interval of 10 points will yield approximately 7.5 steps. (Actually, for the given data, a step-interval of 3 points yields 26 steps, and one of 10 points 8 steps.) The tabulation of the separate scores within their appropriate step- or class-intervals is shown in Table I (2A).
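The three grouping steps (range, choice of step-interval, tabulation) can be sketched in Python. This is a minimal illustration, not the author's procedure: the helper names `score_range` and `frequency_distribution` are hypothetical, and the sketch assumes integer scores and a first step that begins at a multiple of the step-interval, as the text recommends for the 5-point step. Because the exact step count depends on where the first step begins, the sketch checks only the 3- and 5-point intervals discussed above.

```python
from collections import Counter

# Army Alpha scores of 54 college men, in the order reproduced in Table I (1).
SCORES = [
    185, 174, 127, 183, 168, 126, 177, 154, 157, 189, 172, 201,
    158, 160, 179, 184, 155, 137, 177, 164, 198, 176, 188, 197,
    151, 188, 188, 169, 195, 165, 185, 188, 164, 195, 176, 185,
    185, 179, 146, 182, 153, 158, 160, 191, 176, 138, 185, 155,
    178, 151, 144, 191, 170, 157,
]


def score_range(scores):
    """Range: the largest measure minus the smallest."""
    return max(scores) - min(scores)


def frequency_distribution(scores, interval):
    """Tally scores into half-open steps [lower, lower + interval).

    Assumption: the first lower limit is the largest multiple of `interval`
    not exceeding the smallest score (125 here when interval = 5).
    Returns (lower_limit, frequency) pairs, highest step first, with
    empty steps included.
    """
    start = (min(scores) // interval) * interval
    tallies = Counter((s - start) // interval for s in scores)
    n_steps = (max(scores) - start) // interval + 1
    return [(start + k * interval, tallies[k]) for k in reversed(range(n_steps))]


print("range =", score_range(SCORES))                 # range = 75
print("rough step count =", score_range(SCORES) / 5)  # rough step count = 15.0
for interval in (3, 5):
    print(interval, "->", len(frequency_distribution(SCORES, interval)), "steps")
for lower, f in frequency_distribution(SCORES, 5):
    print(f"{lower}-{lower + 4}: {f}")  # labels in the style of Table I (2C)
```

The 5-point run reproduces the F column of Table I (2), including the empty step 130-134.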
In the first column of this table (the column marked "Scores") the step-intervals have been listed serially, with the smallest measures at the bottom of the column. The first interval, "125 up to 130," begins at 125 and ends at 130; the second interval, "130 up to 135," begins at 130 and ends at 135; and so on. The last interval, "200 up to 205," begins at 200 and ends at 205. In column 2, marked "Tabulation," the separate scores have been listed opposite their proper intervals. The first score, 185 [see Table I (1)], is represented by a tally placed opposite step-interval "185 up to 190"; the second score, 201, by a tally placed opposite step-interval "200 up to 205"; the third score, 188, by a tally placed opposite "185 up to 190"; and so on for the other scores. When all 54 scores have been listed, the total number of tallies on each step-interval (i.e., the frequency) is written in column 3, headed F (frequencies). The sum of the F column is called N. In the present case, of course, N = 54. When the total frequency of each step-interval has been tabulated opposite its proper step-interval, as shown in column 3, our 54 Alpha scores are arranged into what is known as a Frequency Distribution.

The reader will note that the lower limit of the first step in the distribution (i.e., 125 up to 130) has been taken at 125, although the lowest actual score in the series is 126. This is because, when the step-interval equals 5 units, both the tabulation and the computations which come later on are easier if the lower limit of the first step-interval (and accordingly of each succeeding step-interval) is a multiple of 5. A step-interval of 126 up to 131 is theoretically just as good as a step-interval of 125 up to 130; the second, however, is much easier to handle from the standpoint of the arithmetic involved.

3. Three Ways of Expressing the Limits of a Step-interval

Table I (2A, B, C) illustrates three ways of writing the limits of a step-interval. In (A) the interval "125 up to 130" means that all scores from 125 up to but not including 130 fall on this step. In (B) the step-interval 125-129.99 means exactly the same thing. The upper limit is written 129.99 simply to emphasize the fact that this step-interval includes score 129 plus fractional parts up to 130, but does not include score 130. (C) expresses the same facts more clearly than (A), though not so exactly as (B). Thus 125-129 means that this step-interval begins with score 125 and ends with score 129. A diagram will indicate how (A), (B), and (C) are simply three ways of expressing the same facts:

Step begins                                  Step ends
|   1    |   2    |   3    |   4    |   5    |
125      126      127      128      129      130

Either method (B) or method (C) is advised as preferable to (A). It is fairly easy, even when one is on guard, to let a score of, say, 160 slip into the step-interval 155 up to 160 simply because 160 appears as the upper limit of that step. The accurate tabulation of a frequency distribution depends on getting each score into its proper step-interval, and for this reason one cannot be too careful in defining the limits of the steps.

In any frequency distribution we always assume that the scores within a given interval (i.e., the frequency) are spread evenly over the entire interval, and this assumption holds whether the length of the step is 3, 5 or 10 units. If we wish to represent all of the scores within a given interval by some single value, however, the midpoint of the interval is taken as the most logical choice. To illustrate, in the step-interval 155-159 [see Table I (2C)] the six scores on this step are all represented by the same value, 157.50, the midpoint of the interval, although the scores are 155, 155, 157, 157, 158, 158.
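The rule for placing a score on its step, and the midpoint rule, can be checked with a short sketch. The helpers `step_lower_limit` and `midpoint` are hypothetical names; the sketch assumes steps that begin at multiples of the interval and are half-open, as in form (B): a step includes its lower limit but not its exact upper limit, so a score of 160 can never slip into 155-159.

```python
def step_lower_limit(score, interval):
    """Lower limit of the half-open step [lower, lower + interval) holding score.

    Assumes steps start at multiples of `interval` (125, 130, ... for
    interval 5). Floor division sends a score equal to an exact upper
    limit, such as 160, into the next step up rather than into 155-159.
    """
    return (score // interval) * interval


def midpoint(lower, interval):
    """Midpoint = lower limit + (exact upper limit - lower limit) / 2."""
    upper = lower + interval  # exact upper limit, e.g. 160 for step 155-159
    return lower + (upper - lower) / 2


for score in (155, 158, 159, 160):
    lower = step_lower_limit(score, 5)
    # Labels use form (C), which assumes integer scores and a 5-point step.
    print(score, "->", f"{lower}-{lower + 4}", "midpoint", midpoint(lower, 5))
```

Scores 155 through 159 all map to step 155-159 with midpoint 157.5, while 160 starts the next step.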
The reason why 157.50 is the midpoint of the step-interval can be shown graphically as follows:

Step begins                                  Step ends
|   1    |   2    |   3    |   4    |   5    |
155      156      157      158      159      160
                      ^
                    157.50

A simple rule for finding the midpoint of a step, where the upper limit is the exact upper limit of the step (160 for step 155-159), is

$$\text{Midpoint} = \text{lower limit of step} + \frac{\text{upper limit} - \text{lower limit}}{2}$$

For example, in the present case, $155 + \frac{160 - 155}{2} = 157.50$. Again, since the length of the step is 5, it follows that the midpoint must be 2.5 points from the lower limit of the step, i.e., at 155 + 2.5, or 157.50.

It is often a question whether the midpoint is a fair representative of all of the scores on a given step-interval. If we examine the six scores on step 155-159, two scores, the two 155's, are below the midpoint; two scores, the two 157's, are practically on the midpoint; and two scores, the two 158's, are above the midpoint. Also an examination of the step preceding and the step following 155-159 shows that on both of these steps there are 2 measures above and 2 below the midpoint. There seems good evidence, therefore, for assuming that the midpoint represents fairly the scores on these intervals, though it is true that the balancing of scores above and below the midpoint is not always as clear-cut as in the examples cited. In certain cases, in fact (e.g., when the distribution is considerably "skewed"¹), there are often many more scores on one side of the midpoint than on the other, and the midpoint assumption is then clearly untenable. The fact remains, however, that in most frequency distributions of mental and educational measurements, especially when the number of measures is large, the assumption that the midpoint represents all of the scores on the interval is a valid one, since in the long run about as many scores will fall above as below the midpoint value.

¹ When the scores are "piled up" at either the lower or the upper end of the scale, the distribution is said to be "skewed." See page 86.

4. The Meaning of a Single Score in a Continuous Series

So far we have discussed the classification of scores into step-intervals (the frequency distribution) and the necessity of defining carefully the upper and lower limits of our step-intervals. We shall now try to give a more precise notion of what is meant by a single score, for example, a score of 165 points on Army Alpha. If we think of the score 165 as occupying a certain interval or distance on a linear scale, then any fractional value from 165 up to (but not including) 166, e.g., 165.3, 165.8, etc., will fall within this interval and be scored simply as 165. See illustration:

    Score 165
|----------------|
165              166

A score of 165 may mean, therefore, that the person who made it was just barely through 165 items, or that he had nearly completed 166; in either case his score will be 165. In performance scales a score equal to or greater than 8, say, but less than 9 is placed on step 8-9 or 8-8.99 and scored 8. In most product scales, however (the Thorndike Handwriting Scale is an example), a score of 8 represents any value from 7.5 to 8.5: i.e., any value from a point one half step below 8 to a point one half step above. Thus scores 7.7, 8.0, 8.4, etc., would all be scored 8. If as before we think of a score on such a scale as a linear magnitude, 8 represents the midpoint of that interval which extends from 7.5 to 8.5. See illustration:

     Score 8
|-------+-------|
7.5     8       8.5

This method of scoring is employed in scales which measure handwriting, drawing, composition, etc. It is evident from the foregoing that the meaning of a single score in a continuous series will depend upon how the test is scored. If the score is not defined by the test, it is probably safer to assume that a score of 22, say, means 22-23, rather than 21.5-22.5.

II. Measures of Central Tendency

When scores or other measures have been tabulated into a frequency distribution, generally our next task is to find a measure of central tendency. The value of a measure of central tendency is twofold: in the first place, it is a single measure which represents all of the scores made by the group, and as such gives a concise description of the performance of the group as a whole; secondly, it enables us to compare two or more groups in terms of typical performance. There are three measures of central tendency in common use: (1) the average or arithmetic mean, (2) the median, and (3) the mode. We shall consider these three measures in order.

1. The Average, or Arithmetic Mean¹

The average is the best known of the measures of central tendency. It may be defined simply as the sum of the separate scores or measures in a series divided by their number. To illustrate, if a man makes $3.00, $4.00, $3.50, $5.00 and $4.50 on five successive days, his average daily wage ($4.00) is obtained by dividing the sum of his daily earnings ($20.00) by the number of days he has worked (5). The formula for the average of a series of ungrouped measures is simply

$$\text{Average} = \frac{\sum(\text{Measures})}{N} \qquad (1)$$

in which N is the number of measures in the series.²

¹ The term "average" is often used as a general expression to cover any measure of central tendency. It is here used in a more restricted sense.
² The symbol Σ means "sum of."

When measures have been grouped into a frequency distribution, it is necessary to calculate the average by a slightly different method from the one given above. The two illustrations in Table II will make this method clear. The first of these shows the calculation of the average for the 54 Army Alpha scores which we have already tabulated into a frequency distribution in Table I.
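Formula (1) translates directly into code. The following is a minimal sketch (the function name `average` is illustrative only), applied to the wage example above:

```python
def average(measures):
    """Formula (1): the sum of the measures divided by their number, N."""
    measures = list(measures)
    if not measures:
        raise ValueError("at least one measure is required")
    return sum(measures) / len(measures)


daily_wages = [3.00, 4.00, 3.50, 5.00, 4.50]  # five successive days, in dollars
print(average(daily_wages))  # 4.0
```

The same function serves for repeated measurements on one individual, since formula (1) does not care where the measures come from.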
Note that we first calculate the F×M column by multiplying the midpoint (M) of each step-interval by the number of scores (F) on it, and that the average (171.57) is then simply the sum of the F×M column (9265) divided by N (54). The use of the midpoint for all of the scores on the interval is made necessary by the fact that when scores have been grouped into step-intervals they lose their identity and are thereafter represented by the midpoint of the particular interval on which they happen to fall. Hence we must multiply or "weight" the midpoint of each step (M) by the frequency (F) on that step, add the F×M products, and divide by N to get the average. The formula may be written

$$\text{Average} = \frac{\sum(F \times M)}{N} \qquad (2)$$

Example (2), Table II, is a second illustration of the calculation of an average from grouped data. This frequency distribution represents 200 scores made by a group of adults on a cancellation test. These scores are classified into 9 steps; and since the step-interval is 4 points, the midpoint of each step is found by adding ½ of 4 to the beginning of each step (for example, 104 + 2 = 106). The F×M column (found as shown above) totals 23988, and N equals 200. Hence, applying formula (2), the average is found to be 119.94.

In both illustrations in Table II we have found the average of the scores made by a given group. There is no reason, however, why we cannot use either formula (1) or (2) to find the average of a number of measurements made on the same individual as well. Thus an individual's reaction time to light may be measured 100 times, the measures tabulated into a frequency distribution, and the average found in exactly the same way in which we find the average reaction time to light of 100 different observers.

TABLE II
To Illustrate the Calculation of the Average, Median, and Mode from Data Grouped into a Frequency Distribution

1. DATA FROM TABLE I (2): 54 ARMY ALPHA SCORES. STEP-INTERVAL = 5 POINTS

Scores        Midpoint (M)   F         F×M
200-204.99    202.5          1      202.50
195-199.99    197.5          4      790.00
190-194.99    192.5          2      385.00
185-189.99    187.5          10    1875.00
180-184.99    182.5          3      547.50
175-179.99    177.5          8     1420.00
170-174.99    172.5          3      517.50
165-169.99    167.5          3      502.50
160-164.99    162.5          4      650.00
155-159.99    157.5          6      945.00
150-154.99    152.5          4      610.00
145-149.99    147.5          1      147.50
140-144.99    142.5          1      142.50
135-139.99    137.5          2      275.00
130-134.99    132.5          0        0.00
125-129.99    127.5          2      255.00
                         N = 54    9265.00

(1) $$\text{Average} = \frac{\sum(F \times M)}{N} = \frac{9265}{54} = 171.57$$

(2) $$\text{Median} = 175 + \frac{1}{8} \times 5 = 175.625 \qquad (N/2 = 27)$$

(3) Crude mode falls on class-interval 185-189.99, or at 187.5.

2. SCORES MADE BY 200 ADULTS ON A CANCELLATION TEST. STEP-INTERVAL = 4 POINTS

Scores     Midpoint (M)   F          F×M
136-139    138            3         414
132-135    134            5         670
128-131    130            16       2080
124-127    126            23       2898
120-123    122            52       6344
116-119    118            49       5782
112-115    114            27       3078
108-111    110            18       1980
104-107    106            7         742
                      N = 200     23988

(1) $$\text{Average} = \frac{\sum(F \times M)}{N} = \frac{23988}{200} = 119.94$$

(2) $$\text{Median} = 116 + \frac{48}{49} \times 4 = 119.92 \qquad (N/2 = 100)$$

(3) Crude mode falls on class-interval 120-123, or at 122.

2. The Median

When scores or other measures are arranged in order of size, the median is defined as the midpoint of the series, that is, as the point above which and below which lie 50% of the measures. By definition, therefore, the median may be found by counting off one half of the measures, i.e., N/2, from either end of the series.

Let us first consider the calculation of the median for scores or measures in a simple ungrouped series. Two cases arise: Case I, when N is odd, and Case II, when N is even. As an illustration of the first case, take the following eleven consecutive scores: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24. Now since N equals 11, N/2 = 5.5; and counting off the first five scores, namely 14, 15, 16, 17, 18, we reach 19, since score 18 means "18 up to 19." (See page 7.)
The .5 left of our 5.5 then locates the median midway between 19 and 20, viz., at 19.5. To verify this result we may count off 5.5 scores beginning at the other end of the series. The five scores 24, 23, 22, 21, and 20 take us to 20 (the upper limit of score 19), and the .5 left puts the median at a point midway on the scale between 20 and 19, viz., at 19.5 again. (See diagram below.)

[Diagram, Case I (N is odd): a scale running from 14 to 25; 5.5 scores counted up from the beginning and 5.5 scores counted down from the end meet at the median, 19.5.]

To illustrate the procedure when N is even, let us drop the first score (14) from the series of eleven scores in Case I. N is now 10, and N/2 is 5.0. Counting off the first five scores from the small end of the series, i.e., 15, 16, 17, 18, 19, we reach 20 (the upper limit of "19 up to 20") as the median. Likewise, if we count down five scores from 24, i.e., 24, 23, 22, 21, 20, we again reach 20, the lower limit of the step "20 up to 21." See diagram below:

[Diagram, Case II (N is even): a scale running from 15 to 25; 5 scores counted up from the beginning and 5 scores counted down from the end meet at the median, 20.]

It will be noted that in the two cases just cited, the measures were taken to be in continuous series. If, instead of continuous, the eleven scores under Case I are taken as discrete or discontinuous, there is no value which fulfills the definition of the median as the midpoint of the series. When N is odd, however, the midscore or middle measure may be obtained by counting off (N + 1)/2 scores from either end of the series, after the scores have been arranged in order of size from least to greatest. Thus (Case I), (11 + 1)/2, or 6, scores counted off from either end of our series puts the midscore at 19, since there are 5 scores above and 5 scores below this score. A slightly different procedure is necessary when N is even.
If the ten scores under Case II, for example, are taken as discrete, there is clearly no median value and no midscore in this series. In such cases, however, it is customary to take the midscore arbitrarily at a point midway between the two middlemost scores. Thus, in our illustration, (N + 1)/2 = 5.5, which puts the midscore at 19.5, midway between 19 and 20, the two middlemost scores. (For a discussion of the median for discrete measures grouped into a frequency distribution, see page 36.)

The method of calculating the median for continuous data grouped into a frequency distribution is shown in the two examples in Table II. Since there are 54 scores in the first distribution, N/2 is 27. The median, therefore, is that point on the scale which has 27 scores on each side of it. If we begin at the small end of the distribution and add up the scores in order, the step-intervals 125-129.99 to 170-174.99, inclusive, are found to contain just 26 scores. The next step, 175-179.99, contains 8 scores (assumed to be spread evenly over the entire step; see page 5). To get the 1 extra score needed to make 27, therefore, we must take 1/8 × 5 (the length of the step) and add this amount (.625) to 175, the beginning of the step-interval 175-179.99. This puts the midpoint at 175 + .625, or 175.625, which is accordingly the median of the distribution. (See Diagram I (1).)

A second illustration of how the median is found when the data are grouped into a frequency distribution is given in Table II (2). This second example should aid in clearing up any doubtful points in the first problem. Since there are 200 scores in this distribution, one half of the scores is 100, and the median must lie at a point 100 scores distant from either end of the distribution. If we begin at the small end of the distribution, i.e., at 104-107, and add the scores in order, 52 scores will take us through step 112-115.
The 49 scores on the next step-interval (116-119) bring the total to 101 scores, one too many to give us the median. To get the 48 scores needed to make exactly 100, therefore, we must take 48/49 × 4 (the length of the step) and add this amount, 3.92, to 116, the beginning of the step-interval. This takes us exactly 100 scores into the distribution and locates the median at 119.92. Diagram I (2) shows graphically how this median is obtained.

[Diagram I (1), The Calculation of the Median: 26 scores go up to 175 on the scale and 34 scores up to 180. To find how far 27 scores will go, we take 1/8 of 5 (the step length) and add this to 175, which puts the median at 175 + (1/8) × 5 = 175.625.]

[Diagram I (2), The Calculation of the Median: 52 scores counted off take us to 116 on the scale and 101 scores take us to 120. To find how far 100 scores go, we take 48/49 of 4 (the step length) and add this amount (3.92) to 116, which locates the median at 116 + (48/49) × 4 = 119.92.]

Summary of the steps in computing the median from data tabulated in a frequency distribution:
(1) Find N/2 measures.*
(2) Begin at the small end of the distribution and count the measures serially up to the interval which contains the median.
(3) Divide the number of measures necessary to fill out N/2 by the frequency on the interval containing the median [reached in (2) above], and multiply the result by the length of the step-interval.
(4) Add the amount obtained in (3) to the lower limit of the step which contains the median. This will give the median point on the scale.

* While the median may be found equally well by counting in N/2 scores from the large end of the distribution, it is simpler to begin at the small end, and the student is advised to follow this plan.

3. The Mode

The mode is most simply defined as that measure which occurs most often in a series. In the series 10, 11, 11, 12, 12, 13, 13, 13, 14, 14, and 15, for example, the most often recurring measure is 13, and this measure may be taken as the mode. In Table I (1) we find from the ungrouped scores that 185 occurs 5 times, more often than any other single score, and hence 185 may be taken as the mode of this series.

When the scores or measures are continuous and have been grouped into a frequency distribution, the "crude mode" is often taken as the midpoint of the step-interval which contains the greatest frequency. In Table I, for example, if we did not know from the ungrouped scores that 185 is the modal score, the crude mode of the distribution given in (2) would be taken at 187.5, the midpoint of step 185-189.99, the step-interval containing the greatest frequency. Likewise, in Table II (2), the crude mode would be 122, the midpoint of the step which contains the greatest frequency.

It is clear that the crude mode will depend to a large extent upon the size of the step-interval selected (i.e., on whether the grouping is by large or small steps), and for this reason it is often an unstable measure of central tendency. This is not necessarily a serious drawback, however, as the mode is usually employed simply to indicate in a rough way the center of concentration in the distribution. For this purpose it is not necessary to define it as carefully as we do the median or the arithmetic mean.

III. Measures of Variability

In Section II we discussed the calculation of the so-called "measures of central tendency," measures typical or representative of the set of scores as a whole. Our next step is the calculation of the variability of the scores, i.e., of the "scatter" or "spread" of the separate scores or measures around their measure of central tendency. This will be the task of the present section.

The usefulness of some measure of variability can be shown by a simple example. Suppose that we have given a test of controlled association to a group of 50 boys and the same test to a group of 50 girls. The average scores are: boys, 34.6 secs.; girls, 34.5 secs. So far as the averages go, there is apparently no difference in the performance of the two groups. Suppose, however, that on examining the original scores we find the boys' scores ranging from 15 to 51 secs. and the girls' scores ranging from 19 to 45 secs. This discovery would make it evident at once that, in a general way, the boys "cover more territory" (are more variable) than the girls, and this greater variability may be of considerably more interest than the lack of difference in the average scores. If a group is homogeneous, i.e., made up of individuals of nearly the same ability, most of the scores will fall near the same point on the scale, the range will be relatively short, and the variability will be small. If, however, the group contains individuals of widely differing capacity, the scores will be strung out from high to low, the range will be relatively wide, and the variability will be large.

Four measures have been devised to take account of this factor of variability within a set of measures. These are (1) the range, (2) the quartile deviation, or Q, (3) the average deviation, or AD, and (4) the standard deviation, or SD.

1. The Range

In grouping the scores in Table I into a frequency distribution (page 3) we have already had occasion to use the range. It may be re-defined simply as the interval between the largest and the smallest measures. In the illustration given above, the range of the boys' scores is 51 - 15, or 36, and the range of the girls' scores is 45 - 19, or 26. The range is the most general measure of "spread" or "scatter." It includes 100% of the distribution, and is employed when we wish to make a rough comparison of two or more groups for variability, or when the number of measures is too small to justify the calculation of some more refined measure of variability. Since the range takes account only of the extremes of the series, it is obviously unreliable when frequent or large gaps occur in the distribution of scores.

2. The Quartile Deviation, or Q

The quartile deviation, or Q, may be defined as one half of the distance between the 75th and the 25th percentile points in the given distribution. The 25th percentile, or Q1, is the first quarter or quartile point on the scale, the point below which lie 25% of the measures. In like manner, the 75th percentile, or Q3, is the third quarter or quartile point on the scale, the point below which lie 75% of the measures. (By analogy, the median is Q2, the second quartile point.) In order to find Q, we must first calculate the 75th and 25th percentile points. These points are found in exactly the same way as the median: to find Q1 we count off 25% of the scores from the beginning of the distribution, and to find Q3 we count off 75% of the scores from the beginning of the distribution.

Table III illustrates the calculation of Q for the distribution of 54 Alpha scores tabulated in Table I. First, to find Q1, we must count off 1/4 of the total number of scores, i.e., 13.5, from the small end of the distribution.
When the scores (the F's) are added in order, the first six step-intervals (the steps 125-129.99 to 150-154.99 inclusive) are found to contain 10 scores. The next step, 155-159.99, contains 6 scores.* We need only 3.5 additional scores, however, to make up the necessary 13.5; hence we take 3.5/6 × 5 (the step length) and add this amount (2.92) to 155, the beginning of the step. This locates Q1 at 155 + 2.92, or 157.92. In like manner, we find Q3 by counting off 3/4 of the scores from the small end of the distribution. 3/4 of N = 40.5; and the F's on steps 125-129.99 to 180-184.99, inclusive, added in order, total 37. The next step, 185-189.99, contains 10 scores. To round out the necessary 40.5, therefore, we take 3.5/10 × 5 (the step length) and add this amount (1.75) to 185, the beginning of the step. This puts Q3 at 186.75, since 40.5 scores reach this point.

* Assumed to be spread evenly over the entire step. See page 5.

TABLE III
To Illustrate the Calculation of Q, AD, and SD from Data Grouped into a Frequency Distribution

1. DATA FROM TABLE I, 54 ARMY ALPHA SCORES

(1)            (2)        (3)    (4)        (5)        (6)
Scores         Midpoint   F      D          FD         FD²
200-204.99     202.50      1      30.93      30.93      956.66
195-199.99     197.50      4      25.93     103.72     2689.46
190-194.99     192.50      2      20.93      41.86      876.13
185-189.99     187.50     10      15.93     159.30     2537.65
180-184.99     182.50      3      10.93      32.79      358.39
175-179.99     177.50      8       5.93      47.44      281.32
170-174.99     172.50      3        .93       2.79        2.59
165-169.99     167.50      3      -4.07     -12.21       49.69
160-164.99     162.50      4      -9.07     -36.28      329.06
155-159.99     157.50      6     -14.07     -84.42     1187.79
150-154.99     152.50      4     -19.07     -76.28     1454.66
145-149.99     147.50      1     -24.07     -24.07      579.36
140-144.99     142.50      1     -29.07     -29.07      845.06
135-139.99     137.50      2     -34.07     -68.14     2321.53
130-134.99     132.50      0     -39.07       0.00        0.00
125-129.99     127.50      2     -44.07     -88.14     3884.33
                       N = 54               837.44    18353.68

Average = 171.57 (Table II)
N/4 = 13.5; therefore Q1 = 155 + (3.5/6) × 5 = 157.92
3N/4 = 40.5; therefore Q3 = 185 + (3.5/10) × 5 = 186.75
Q = (Q3 - Q1)/2 = (186.75 - 157.92)/2 = 14.42
AD = ΣFD/N (arithmetical) = 837.44/54 = 15.51
SD = √(ΣFD²/N) = √(18353.68/54) = √339.883 = 18.44

2. DATA FROM TABLE II (2), 200 CANCELLATION SCORES

(1)        (2)        (3)    (4)        (5)        (6)
Scores     Midpoint   F      D          FD         FD²
136-139    138          3     18.06      54.18      978.49
132-135    134          5     14.06      70.30      988.42
128-131    130         16     10.06     160.96     1619.26
124-127    126         23      6.06     139.38      844.64
120-123    122         52      2.06     107.12      220.67
116-119    118         49     -1.94     -95.06      184.42
112-115    114         27     -5.94    -160.38      952.66
108-111    110         18     -9.94    -178.92     1778.47
104-107    106          7    -13.94     -97.58     1360.27
                  N = 200              1063.88     8927.30

Average = 119.94 (Table II)
N/4 = 50; therefore Q1 = 112 + (25/27) × 4 = 115.70
3N/4 = 150; therefore Q3 = 120 + (49/52) × 4 = 123.77
Q = (Q3 - Q1)/2 = (123.77 - 115.70)/2 = 4.04
AD = ΣFD/N (arithmetical) = 1063.88/200 = 5.32
SD = √(ΣFD²/N) = √(8927.30/200) = 6.68

With Q1 and Q3 known, the quartile deviation, Q, is easily calculated from the formula

$$Q = \frac{Q_3 - Q_1}{2} \qquad (3)$$

In the present problem, Q = (186.75 - 157.92)/2, or 14.42.

A second illustration of the calculation of Q from a frequency distribution is given in Table III (2). Since the N of this distribution is 200, 1/4 of the measures equals 50. The steps 104-107 and 108-111 contain 25 scores, and the next step contains 27 scores. To find the point reached by 50 scores, therefore, we must take 25/27 × 4 (the step length) and add this amount (3.70) to 112, the lower limit of step 112-115. This locates Q1 at 115.70. To find Q3, we must count off 3/4 of N, or 150 scores, from the small end of the distribution. The first four steps include 101 scores, and the next step, 120-123, contains 52. To fill out 150, therefore, we take 49/52 × 4 (the length of step) and add this increment (3.77) to 120 to locate Q3 at 123.77. Substituting 115.70 for Q1 and 123.77 for Q3 in formula (3), we get a Q of 4.04 points.
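The four-step procedure for the median and the parallel steps for Q1 and Q3 are really one computation: count up from the small end until the interval holding the desired fraction of N is reached, then interpolate within that interval on the assumption that its scores are spread evenly over the step. A minimal Python sketch follows; the function names and the data layout (a list of frequencies, lowest step first, plus the lowest lower limit and the step length) are our own assumptions for illustration.

```python
def point_below(lowest_limit, step, freqs, fraction):
    """Return the scale point below which `fraction` of the N measures lie.

    lowest_limit: lower limit of the lowest step-interval (e.g. 125)
    step:         length of the step-interval (e.g. 5)
    freqs:        frequencies on the steps, listed from the small end upward
    fraction:     .50 for the median, .25 for Q1, .75 for Q3
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must lie between 0 and 1")
    target = fraction * sum(freqs)   # N/2, N/4 or 3N/4 measures
    counted = 0                      # measures counted so far from the small end
    for k, f in enumerate(freqs):
        if f > 0 and counted + f >= target:
            lower = lowest_limit + k * step          # lower limit of this step
            return lower + (target - counted) / f * step
        counted += f
    raise ValueError("no measures in the distribution")


def quartile_deviation(lowest_limit, step, freqs):
    """Formula (3): Q = (Q3 - Q1) / 2."""
    q1 = point_below(lowest_limit, step, freqs, 0.25)
    q3 = point_below(lowest_limit, step, freqs, 0.75)
    return (q3 - q1) / 2


# Frequencies from Table III, lowest step first.
ALPHA = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]  # 125-129.99 ... 200-204.99
CANCEL = [7, 18, 27, 49, 52, 23, 16, 5, 3]               # 104-107 ... 136-139

for name, low, step, freqs in [("Alpha", 125, 5, ALPHA), ("Cancellation", 104, 4, CANCEL)]:
    q1 = point_below(low, step, freqs, 0.25)
    median = point_below(low, step, freqs, 0.50)
    q3 = point_below(low, step, freqs, 0.75)
    q = quartile_deviation(low, step, freqs)
    print(f"{name}: Q1 = {q1:.2f}, Median = {median:.3f}, Q3 = {q3:.2f}, Q = {q:.2f}")
```

Run on the two distributions, the sketch reproduces the medians of Table II and the quartile points of Table III. Because it does not round Q1 and Q3 before subtracting, its Q for the cancellation scores comes out at about 4.03 rather than the 4.04 obtained from the rounded quartiles.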
The quartile points, Q1 and Q3, are of considerable importance in that they mark off the limits within which fall the middle 50% of the measures in the distribution. The distance between these two points is often called the interquartile range; hence Q is sometimes called the semi-interquartile range. Q actually measures the average distance of the two quartile points from the median, and because of the ease with which it can be found it is a valuable measure of the closeness with which the scores are grouped around the median point. If the scores of a distribution are closely packed together, the quartiles will be close together and Q will be small; if the scores are scattered, the quartiles will be relatively far apart, and Q will be large.

When the distribution is symmetrical or "normal" (see page 85), Q marks off exactly the limits of the 25% of the cases just above and the 25% of the cases just below the median, and accordingly the median lies just halfway between the two quartile points Q1 and Q3. Q is then commonly known as the PE (probable error). The terms Q and PE are often used interchangeably, although it is probably best to restrict the use of the latter term to normal distributions and to the measurement of reliability. The value of the PE as a measure of reliability will be discussed at length in Chapter III.

Summary of Steps in Calculation of Q (Data Grouped)

To find Q1:
1. Divide N by 4.
2. Begin at the small end of the distribution, and count the scores up to the interval which contains Q1.
3. Divide the number of measures necessary to locate Q1 (i.e., to complete N/4) by the frequency in the interval reached in (2) above, and multiply the result by the step-interval.
4. Add the amount obtained in (3) to the lower limit of the step-interval on which Q1 lies. The result is Q1.

To find Q3:
1. Find 3/4 of N.
2. Begin as before at the small end of the distribution, and count up the scores until the interval which contains Q3 is reached.
3. Divide the number of scores required to locate Q3 by the frequency in the interval reached in (2), and multiply the result by the step-interval.
4. Add the amount obtained in (3) to the lower limit of the step-interval on which Q3 lies. This locates Q3.

To find Q: substitute Q3 and Q1 in formula (3),

$$Q = \frac{Q_3 - Q_1}{2}$$

3. The Average Deviation, or AD

The average deviation or AD (also called the mean deviation, or MD) may be defined as the average of the deviations of all the separate measures in a series taken from their central tendency (usually the average, less frequently the median or mode). In averaging deviations to find the AD, no account is taken of signs, and all deviations, whether positive or negative, are treated as positive. An example will make the definition clearer. If we have five scores, 6, 8, 10, 12, and 14, the average is easily found to be 10. It is then a simple process to find the deviation of each measure from the average by subtracting the average from each measure. Thus 6, the first score, minus 10 equals -4 (calculation algebraic); 8 - 10 = -2; 10 - 10 = 0; 12 - 10 = 2; and 14 - 10 = 4. The five deviations measured from the average are -4, -2, 0, 2, and 4. Adding these deviations without regard to sign, the sum is 12; and dividing 12 by 5, we get 2.4 as the average of the 5 deviations from the average, or the AD. The formula for the AD with simple ungrouped numbers like these may be written

$$AD = \frac{\Sigma D}{N} \quad \text{(arithmetical)} \qquad (4)$$

in which ΣD = the sum of the deviations taken without regard to sign, and N is, as before, the number of cases or items in the series.

In Table III, the calculation of the AD for scores grouped into a frequency distribution is illustrated by two problems. The average of problem (1) has already been found in Table II to be 171.57.
Hence, to find the average deviation of the scores in this distribution from the average, we must take our deviations (D's) around this point. Note, however, that since the scores have been grouped into step-intervals, we are no longer able to get the D of each score from the average; hence we simply find the deviation (D) of the midpoint of each step from the average. The substitution of the midpoint value for all of the scores within the step is the only difference between the computation of D's with grouped and with ungrouped measures. For example, the D of step 200-204.99 is 30.93, found by subtracting 171.57 (the average) from 202.50 (the midpoint of the step). Likewise, the D of the next step is 25.93, found by subtracting 171.57 from 197.50. All of the D's are positive as far down the scale as 170-174.99, as in each case the midpoint is numerically larger than the average. From the step-interval 165-169.99 on down to the beginning of the series, however, the D's are negative, as the midpoints of these steps are all smaller than 171.57. Thus the D of step 165-169.99 is -4.07 (i.e., 167.50 - 171.57 = -4.07), and the D of the lowest step in the distribution, 125-129.99, is -44.07.

It will be helpful in finding deviations to remember that the average is always subtracted from the individual score or midpoint value. That is,

Deviation = Score or Midpoint - Average (calculation algebraic).

Hence, when the score or midpoint is numerically larger than the average, the deviation must be positive; when the score or midpoint is numerically smaller than the average, the deviation must be negative.

It is unnecessary, however, to subtract the average from each midpoint separately in order to obtain the different D's.
The reason, of course, is that each step-interval is 5 points; hence, after finding the D of step 200-204.99 to be 30.93, we need only subtract 5 points from this D in order to obtain 25.93, the D of the next step; then 5 again to obtain 20.93, the D of the next step; and so on.* The negative D's are obtained in exactly the same way as the positive D's: thus .93 - 5 = -4.07; -4.07 - 5 = -9.07; and so on to -44.07.

* The D's should be checked occasionally, to avoid carrying an error throughout our calculations.

Column 4 gives the deviation of each step-interval (as represented by its midpoint) from the average of the distribution. There are, however, more scores on some steps than on others, and for this reason each midpoint deviation (D) in column 4 must be "weighted" (multiplied) by the number of scores (F) which it represents. This gives the FD column, column 5. The first FD is 30.93, for since there is only 1 score on step 200-204.99, we need simply multiply the first D by 1. The next FD is 103.72, since each of the 4 scores on step 195-199.99 has a D of 25.93. In like manner, we obtain the other FD's by multiplying each D in column 4 by its corresponding frequency (F) in column 3. When all of the FD's have been calculated, we sum the column without regard to sign and divide by N to obtain the AD. In the present problem, the AD equals 837.44/54, or 15.51. The formula for the AD for measures grouped into a frequency distribution may now be written as follows:

$$AD = \frac{\Sigma FD}{N} \quad \text{(arithmetical)} \qquad (5)$$

This formula applies equally well to the AD found from the average, the median, or the mode.

The second problem in Table III shows the calculation of the AD for the 200 cancellation scores, grouped into a frequency distribution with a step of 4. The average for this distribution has been found to be 119.94 (see Table II (2)). Hence, the D of the first step, 136-139 (midpoint 138), from the average is 18.06.
The next D may be found by subtracting 4 (the step-interval) from 18.06, and each succeeding D in turn by subtracting 4 from the D just preceding it. The FD's in column 5 are found [as previously shown in (1)] by "weighting" each D by the F which it represents, i.e., by the F opposite it. The sum of the FD column is 1063.88; and since N is 200, from formula (5) we obtain 5.32 as the AD of the scores in this distribution from their average, 119.94.

In a perfectly symmetrical or normal distribution (page 85), the AD, when measured off above and below the average, marks the limits of the middle 57.5% of the measures. Thus the AD is seen to be slightly larger than Q, which in such a distribution marks off the middle 50%. In general, a large AD means that the scores in the distribution are widely scattered around the central tendency; a small AD means that they are concentrated within a relatively narrow range.

4. The Standard Deviation, or SD

The standard deviation or SD is the most reliable of the measures of variability, and for this reason is customarily used in research which requires great accuracy. The SD differs from the AD in several respects. In the first place, in calculating the AD we disregard signs and treat all deviations as positive; in finding the SD, on the other hand, we avoid this difficulty of signs by squaring the separate deviations. Again, the deviations used in computing the SD are always taken from the average, and never from the median or mode as is sometimes done in finding an AD. The conventional symbol used to denote the SD is the Greek letter sigma, σ. We may define the SD or σ as the square root of the mean (or average) of the squared deviations taken from the average of the distribution.
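The definitions of the AD and the SD for ungrouped measures can be sketched in Python as follows. Like the text, the sketch takes deviations from the average and divides by N (not N - 1); the function names are our own.

```python
import math


def average_deviation(measures):
    """AD, formula (4): mean of the deviations from the average, signs ignored."""
    n = len(measures)
    mean = sum(measures) / n
    return sum(abs(x - mean) for x in measures) / n


def standard_deviation(measures):
    """SD or sigma: square root of the mean of the squared deviations from the average."""
    n = len(measures)
    mean = sum(measures) / n
    return math.sqrt(sum((x - mean) ** 2 for x in measures) / n)


scores = [6, 8, 10, 12, 14]
print(average_deviation(scores))             # 2.4
print(round(standard_deviation(scores), 3))  # 2.828
```

Python's `statistics.pstdev` computes the same N-divisor standard deviation.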
To illustrate the calculation of the SD in a simple case, let us consider the example used to illustrate the calculation of the AD (see page 25), in which the deviations of the five measures 6, 8, 10, 12, and 14 from their average, 10, were found to be -4, -2, 0, 2, and 4, respectively. If we square each of these deviations we get 16, 4, 0, 4, and 16 (the minus signs become plus in squaring). Next, summing these five squares (40) and dividing by 5, we obtain the mean of the squares (8); extracting the square root of this result gives 2.828, the SD or σ of the series. The formula for the σ of a series of ungrouped numbers is

$$\sigma = \sqrt{\frac{\Sigma D^2}{N}} \qquad (6)$$

Table III illustrates the calculation of σ for scores grouped into a frequency distribution. The process is identical with that used for simple numbers except that, in addition to squaring the D of each midpoint from the average, we "weight" each of these squared deviations by the frequency which it represents, i.e., the frequency opposite it. This gives the FD² column. By simple algebra, D × FD = FD², and accordingly the easiest way to obtain the entries in this column is by multiplying the corresponding D's and FD's in columns 4 and 5. The first FD² entry, for example, is 956.66, the product of 30.93 × 30.93; the second is 2689.46, the product of 103.72 × 25.93; and so on to the end of the column. All of the FD²'s are necessarily positive, since each negative D is matched by a negative FD and consequently the product is positive. The sum of the FD² column (18,353.68) divided by N (54) gives the mean of the squared deviations as 339.883, and the square root of this result is 18.44, the standard deviation. The formula for the SD when the data are grouped into a frequency distribution is

$$\sigma = \sqrt{\frac{\Sigma FD^2}{N}} \qquad (7)$$

Problem (2) of Table III furnishes another illustration of the calculation of σ from grouped data.
Column 6, the FD² column, has been obtained, as in the previous problem, by multiplying each D by its corresponding FD. The sum of the FD² column is 8927.30, and N is 200. Hence, applying formula (7), we get 6.68 as the standard deviation [see Table III (2) for calculations].

The standard deviation is, in general, less affected by chance fluctuations than the AD, and is therefore a more stable measure of dispersion. In a "normal" distribution (page 85) the SD, when measured off above and below the average, marks the limits of the middle 68.26% (roughly the middle 2/3) of the distribution. This is approximately true also for less symmetrical distributions. For example, in the first problem in Table III, the middle two thirds of the scores will fall roughly between score 190 (171.57 + 18.44) and score 153 (171.57 - 18.44). The standard deviation is always at least as large as the AD taken from the average, and the AD is ordinarily larger than Q. This relation supplies a rough but simple check on the accuracy of calculated measures of variability.

IV. The Short Method of Finding the Average, the AD, and the SD (σ)

In Tables II and III, the average, the AD, and the SD have been calculated by what is often known as the Long Method. The reader will recall that the average in these tables was found by multiplying the midpoint of each step-interval by the number of scores on the step, summing this column (the F×M), and dividing by N, the number of cases (page 9). Besides, in finding the AD and the SD, all midpoint deviations were figured from the actual averages of the distributions. It is, no doubt, already apparent that the Long Method (LM) requires the handling of large numbers and decimals and that the calculations are often tedious.
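For reference before turning to the Short Method, the Long Method of Tables II and III can be condensed into one routine: it computes the average by formula (2), then takes each midpoint's deviation D from that average to obtain the AD by formula (5) and σ by formula (7). This is a minimal Python sketch; the function name and the data layout (frequencies from the lowest step upward, steps located by the lowest midpoint and the step length) are our assumptions.

```python
import math


def long_method(lowest_midpoint, step, freqs):
    """Average, AD and SD of grouped data by the Long Method.

    Every score on a step is represented by the step's midpoint M, and
    deviations D are taken from the actual average of the distribution.
    """
    mids = [lowest_midpoint + k * step for k in range(len(freqs))]
    n = sum(freqs)
    avg = sum(f * m for f, m in zip(freqs, mids)) / n                 # formula (2)
    devs = [m - avg for m in mids]                                    # column D
    ad = sum(f * abs(d) for f, d in zip(freqs, devs)) / n             # formula (5)
    sd = math.sqrt(sum(f * d * d for f, d in zip(freqs, devs)) / n)   # formula (7)
    return avg, ad, sd


# Frequencies from Table III, lowest step first.
ALPHA = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]  # midpoints 127.5 ... 202.5
CANCEL = [7, 18, 27, 49, 52, 23, 16, 5, 3]               # midpoints 106 ... 138

for name, mid, step, freqs in [("Alpha", 127.5, 5, ALPHA), ("Cancellation", 106, 4, CANCEL)]:
    avg, ad, sd = long_method(mid, step, freqs)
    print(f"{name}: Average = {avg:.2f}, AD = {ad:.2f}, SD = {sd:.2f}")
```

Rounded to two places, it prints 171.57, 15.51, and 18.44 for the Alpha scores and 119.94, 5.32, and 6.68 for the cancellation scores, in agreement with Table III. Note that every deviation is a decimal figure, which is exactly the labor the Short Method is designed to avoid.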
To save time and labor, therefore, the Guessed Average Method, or more simply the Short Method (SM), has been devised for the express purpose of cutting down the calculations involved in finding the average, the AD, and the SD. (The Short Method does not apply to the computation of the median and Q, which are always found by the methods with which we are already familiar.) The student of statistics should make a special effort to learn the Short Method to the point where he can use it with facility. Not only is it a great time and labor saver, but in the calculation of coefficients of correlation it is well-nigh indispensable.

Table IV (2) illustrates the calculation of the average, AD, and SD by the Short Method. To make comparison of the computations involved in the two methods easier, the calculations by the Long Method of the average, AD, and SD for the same data are also given in the table.

TABLE IV
To Illustrate the Calculation of the Average, AD, and SD by the Short Method. Data from Table II (1). Calculations by the Long Method Given for Comparison.

1. LONG METHOD

(1)        (2)        (3)    (4)       (5)        (6)        (7)
Scores     Midpoint   F      F×M       D          FD         FD²
200-204    202.5       1     202.5      30.93      30.93      956.66
195-199    197.5       4     790.0      25.93     103.72     2689.46
190-194    192.5       2     385.0      20.93      41.86      876.13
185-189    187.5      10    1875.0      15.93     159.30     2537.65
180-184    182.5       3     547.5      10.93      32.79      358.39
175-179    177.5       8    1420.0       5.93      47.44      281.32
170-174    172.5       3     517.5        .93       2.79        2.59
165-169    167.5       3     502.5      -4.07     -12.21       49.69
160-164    162.5       4     650.0      -9.07     -36.28      329.06
155-159    157.5       6     945.0     -14.07     -84.42     1187.79
150-154    152.5       4     610.0     -19.07     -76.28     1454.66
145-149    147.5       1     147.5     -24.07     -24.07      579.36
140-144    142.5       1     142.5     -29.07     -29.07      845.06
135-139    137.5       2     275.0     -34.07     -68.14     2321.53
130-134    132.5       0       0.0     -39.07       0.00        0.00
125-129    127.5       2     255.0     -44.07     -88.14     3884.33
                  N = 54    9265.0                 837.44    18353.68

Average = Σ(F×M)/N = 9265/54 = 171.57
AD = ΣFD/N (arithmetical) = 837.44/54 = 15.51
SD = √(ΣFD²/N) = √(18353.68/54) = 18.44

2. SHORT METHOD

(1)        (2)             (3)    (4)    (5)    (6)
Scores     Midpoint        F      D      FD     FD²
200-204    202.5            1      7      7      49
195-199    197.5            4      6     24     144
190-194    192.5            2      5     10      50
185-189    187.5           10      4     40     160
180-184    182.5            3      3      9      27
175-179    177.5            8      2     16      32
170-174    172.5            3      1      3       3
165-169    167.5 (GA)       3      0      0       0
160-164    162.5            4     -1     -4       4
155-159    157.5            6     -2    -12      24
150-154    152.5            4     -3    -12      36
145-149    147.5            1     -4     -4      16
140-144    142.5            1     -5     -5      25
135-139    137.5            2     -6    -12      72
130-134    132.5            0     -7      0       0
125-129    127.5            2     -8    -16     128
                       N = 54        +109     770
                                      -65

Fa (scores on the steps above the GA) = 31; Fb (scores on the GA step and below) = 23

GA = 167.50
c = ΣFD/N = (109 - 65)/54 = 44/54 = .8148; c² = .6639
C = c × step = .8148 × 5 = 4.07
Average = GA + C = 167.5 + 4.07 = 171.57
AD = [ΣFD (arithmetical) + c(Fb - Fa)]/N × step = [174 + .8148(23 - 31)]/54 × 5 = 15.51
SD = √(ΣFD²/N - c²) × step = √(770/54 - .6639) × 5 = 3.687 × 5 = 18.44

1. The Calculation of the Average by the Short Method

The first important fact to grasp in beginning a study of the calculation of the average by the Short Method is that we "guess" or assume an average at the outset, and later apply a correction to this guessed average (GA) in order to obtain the actual average. There is no set rule for guessing an average. The best plan is to take the midpoint of a step somewhere near the center of the distribution and, if possible, the midpoint of the step-interval which contains the greatest frequency. In our problem the greatest F is on step 185-189. However, the GA is taken at 167.5 instead of 187.5, since the former is closer to the center of the distribution. With the question of the GA settled, the correction which must be applied to it to get the average is determined as outlined in the following steps:

(1) First, we fill in the D column, column 4. Here are entered the deviations of the midpoints of the steps measured from the GA in units of step-interval. Thus 172.5, the midpoint of step 170-174, deviates from 167.5, the GA, by 1 step-interval; hence a figure 1 is placed in the D column opposite 172.5.
In like manner, 177.5 deviates 2 steps from 167.5, and accordingly a 2 goes in the D column opposite 177.5. Reading on up the column from 177.5, the succeeding D entries are found in the same way to be 3, 4, 5, 6, and 7. The last entry, 7, is the step deviation of 202.5 from 167.5 (the actual point deviation is, of course, 35). Returning to 167.5, we find that the D of this point, measured from the GA (that is, from itself), is 0; hence a 0 is placed in the D column opposite step 165-169. Below 167.5 all of the D entries are negative, as all of the midpoints are less than 167.5, the GA. So the D of 162.5 from 167.5 is -1 step-interval, and the D of 157.5 from 167.5 is -2 step-intervals. The other D's are -3, -4, -5, -6, -7, -8.

(2) The D column completed, we next compute the FD column, column 5. The FD entries are found in exactly the same way as in the Long Method [compare (1)]: each D in column 4 is multiplied, or "weighted," by the appropriate F in column 3. Note that in the Short Method we multiply each F by its deviation from the GA in units of step-interval, instead of by its actual deviation from the average of the distribution, and that for this reason the computation of the FD's is much simpler here than in the Long Method. All of the FD's above (greater than) the GA will be positive, and all below (smaller than) the GA negative, since the signs of the FD's depend on the signs of the D's.

(3) From the FD column the correction is obtained as follows. The sum of the plus FD's is 109; of the minus FD's, -65. The algebraic sum is therefore +44, and 44 divided by 54 (N) equals .8148, which is the correction, c, in units of step-interval. If we multiply c (.8148) by 5, the length of the step, the result is C (4.07), the score correction, or the correction in score units. When +4.07 is added to 167.5, the GA, the result is 171.57, the average.
(Compare this result with the average found by the Long Method.)

A summary of the steps in the calculation of the average by the Short Method may be outlined as follows (see Table IV, 2):

(1) Organize the scores or measures into a frequency distribution.
(2) Guess an average somewhere near the center of the distribution, and preferably on the step containing the greatest frequency.
(3) Find the deviation of the midpoint of each step-interval from the GA in units of step-interval.
(4) Multiply, or weight, each step-deviation (D) by its appropriate F, i.e., by the F opposite it.
(5) Find the algebraic sum of the plus and minus FD's, and divide this sum by N, the number of cases. This gives c, the correction in units of step-interval.
(6) Multiply c by the length of the step-interval to get C, the score correction.
(7) Add C algebraically to the guessed average to get the actual average. Sometimes C will be positive and sometimes negative, depending upon where the average has been guessed. The method applies equally well in either case.

If it seems to the reader that the Short Method belies its name, let him compare the calculations in columns 4 and 5 (SM) with the calculation of column 4 (LM). In spite of the extra column, the SM has a decided advantage over the LM: since all deviations from the GA are in units of step-interval (whole numbers), the arithmetic is considerably easier in the Short Method. In distributions containing large numbers, the calculation of the average by the LM becomes very laborious; and it is with such distributions, rather than with distributions containing small numbers, that the SM justifies itself as a time and labor saver.
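The seven steps can be checked with a short Python sketch. It assumes the Table IV frequencies as transcribed here by hand (the blank 130-134 step entered as F = 0); the variable names are illustrative only, and `big_c` stands for the score correction C. It computes the average once by the Long Method and once by steps (2) through (7), so the two results can be compared.

```python
# Table IV data (Army Alpha scores, Table II (1)); step-interval 5.
# Steps run from 200-204 at the top down to 125-129; step 130-134 is empty (F = 0).
midpoints = [202.5, 197.5, 192.5, 187.5, 182.5, 177.5, 172.5, 167.5,
             162.5, 157.5, 152.5, 147.5, 142.5, 137.5, 132.5, 127.5]
freqs = [1, 4, 2, 10, 3, 8, 3, 3, 4, 6, 4, 1, 1, 2, 0, 2]
step = 5
n = sum(freqs)

# Long Method: multiply each midpoint by its F, sum the products, divide by N.
lm_average = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Short Method, steps (2)-(7) of the summary.
ga = 167.5                                         # (2) guessed average
d = [round((m - ga) / step) for m in midpoints]    # (3) step deviations D (whole numbers)
fd = [f * dev for f, dev in zip(freqs, d)]         # (4) weight each D by its F
c = sum(fd) / n                                    # (5) algebraic sum / N = c, in step units
big_c = c * step                                   # (6) score correction C
sm_average = ga + big_c                            # (7) add C algebraically to the GA

print("plus FD's:", sum(x for x in fd if x > 0), " minus FD's:", sum(x for x in fd if x < 0))
print(f"c = {c:.4f}, C = {big_c:.2f}")
print(f"Short Method average = {sm_average:.2f}, Long Method average = {lm_average:.2f}")
```

Every D is a whole number, so the only division with awkward decimals is the one that produces c; both methods print 171.57.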
2. The Calculation of the AD by the Short Method

(A) The Calculation of the AD from the Average

The chief advantage in finding the AD by the Short Method instead of the Long Method lies in the fact (already noted in calculating the average) that in the Short Method deviations are taken from a GA in units of step-interval. This procedure eliminates fractions and cuts down multiplication; but at the same time it necessitates the application of a correction to the ΣFD, and as a result it complicates the AD formula. The formula for the AD by the Short Method is:¹

$$\mathrm{AD} = \frac{\Sigma FD + c\,(F_l - F_g)}{N} \times \text{length of step-interval} \qquad (8)$$

¹ This formula applies equally well to the AD calculated from the average, median, or mode.

The term F_l in the formula refers to the sum of the F's on those steps whose midpoints are less (the subscript "l" means less) than the average of the distribution. The term F_g refers to the sum of the F's on those steps whose midpoints are greater (the subscript "g" means greater) than the average. In Table IV, for example, all of the midpoints from 167.5 down to 127.5, inclusive, are less than 171.57, the average, and hence F_l is 23. All of the midpoints from 172.5 up to 202.5, inclusive, are greater than 171.57, and hence F_g is 31. It is important to remember that F_l and F_g are always calculated from the actual average of the distribution (never from the guessed average) as the reference point. In consequence, the 3 scores on step 165-169, whose midpoint, 167.5, is less than 171.57, are included in F_l. A simple check on the size of F_l and F_g is to make sure that F_l + F_g = N. (Note that in the present problem 23 + 31 = 54.)

The other terms in the formula require little explanation. The c is the correction in units of step-interval. It has already been found in calculating the average (page 31) and equals .8148. The ΣFD is the arithmetic sum of the FD column (the sum taken without regard to sign), and equals 174.
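Why formula (8) works, and why it needs a small c, can be seen from a short derivation, offered here as a supplement rather than as the author's own argument. Measured from the actual average, a step whose deviation from the GA is D lies D - c steps away. When c lies between -1 and +1, every step above the average has D ≥ 0 and every step below it has D ≤ 0, so summing F|D - c| over all steps gives Σ_above F(D - c) + Σ_below F(c - D) = ΣFD + c(F_l - F_g), which is the numerator of formula (8). When c is numerically greater than 1, some step lying between the GA and the average has its D counted with the wrong sign, and the identity fails. The Python sketch below (function names `ad_formula_8` and `ad_long_method` are hypothetical) checks this on the Table IV data: the GA of 167.5 from the text and a GA of 172.5 on the same step as the actual average both reproduce the Long Method AD, while an illustrative GA of 157.5, which makes c greater than 1, does not.

```python
# Table IV data (Table II (1)); step-interval 5.
midpoints = [202.5, 197.5, 192.5, 187.5, 182.5, 177.5, 172.5, 167.5,
             162.5, 157.5, 152.5, 147.5, 142.5, 137.5, 132.5, 127.5]
freqs = [1, 4, 2, 10, 3, 8, 3, 3, 4, 6, 4, 1, 1, 2, 0, 2]
step = 5


def ad_formula_8(midpoints, freqs, step, ga):
    """AD from the average by formula (8), taking D's from the guessed average `ga`.
    Returns (AD in score units, c in step units)."""
    n = sum(freqs)
    d = [round((m - ga) / step) for m in midpoints]
    c = sum(f * dev for f, dev in zip(freqs, d)) / n
    average = ga + c * step
    sum_fd = sum(f * abs(dev) for f, dev in zip(freqs, d))   # arithmetic sum, signs ignored
    f_l = sum(f for f, m in zip(freqs, midpoints) if m < average)
    f_g = sum(f for f, m in zip(freqs, midpoints) if m > average)
    return (sum_fd + c * (f_l - f_g)) / n * step, c


def ad_long_method(midpoints, freqs):
    """AD as the mean absolute deviation of the midpoints from the actual average."""
    n = sum(freqs)
    average = sum(f * m for f, m in zip(freqs, midpoints)) / n
    return sum(f * abs(m - average) for f, m in zip(freqs, midpoints)) / n


for guess in (167.5, 172.5, 157.5):
    ad, c = ad_formula_8(midpoints, freqs, step, guess)
    print(f"GA = {guess}: c = {c:+.4f}, AD by formula (8) = {ad:.2f}")
print(f"AD by the Long Method = {ad_long_method(midpoints, freqs):.2f}")
```

With GA = 157.5 the sketch gives c of about 2.81 and an "AD" of about 17.36 instead of 15.51, which is the failure the text warns about.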
If now we substitute for ΣFD, c, F_l, and F_g in formula (8), the numerator is 174 + .8148(23 - 31), or 167.482. Dividing this result by 54 (N), we obtain 3.102, the AD expressed in units of step-interval; and this value multiplied by 5 (the step) gives 15.51, the AD of the distribution. (Compare with the AD found by the Long Method.) Notice that it is always necessary to multiply the result given by the formula by the step-interval, since ΣFD and c are both in units of step.

Formula (8) is a relatively quick way of finding the AD of a frequency distribution. The value of the formula is somewhat limited, however, since it gives correct AD's only when c, the step-correction, is numerically less than 1.00 (whether c is positive or negative). In Table IV, c = .8148, which is less than 1.00, and in consequence the formula holds, as we find on comparing the AD's given by the Long and Short Methods. One method of circumventing this limitation in the AD formula is to make use of the fact that, no matter where the GA is taken, a correction can always be calculated by means of which we can obtain the actual average. If the c so found is less than 1.00, formula (8) may be applied directly; if, however, c is larger than 1.00, we must guess another average on the same step as the actual average (which is now known) and take deviations from this "new" GA. The formula will then hold. (There is another formula for the AD which avoids this difficulty: see Kelley, T. L., Statistical Method, p. 72 ff.)

A summary of the steps in the calculation of the AD from the average by the Short Method may be given as follows:

(1) Find c, the correction in step-units, as shown on page 31. If c is less than 1.00:
(2) Find the arithmetic sum of the FD's.
(3) Calculate F_l, the total number of scores on steps with midpoints less than the average. Next calculate F_g, the total number of scores on steps with midpoints greater than the average.
(4) Substitute for ΣFD, c, F_l, F_g, N, and the step length in formula (8) to find the AD.

TABLE V
To Illustrate the Calculation of the AD from the Median by the Short Method. Data from Table II (2).

(1)       (2)         (3)    (4)    (5)
Scores    Midpoint    F      D      FD
136-139   138           3     5      15
132-135   134           5     4      20
128-131   130          16     3      48
124-127   126          23     2      46
120-123   122          52     1      52
116-119   118 (GM)     49     0       0
112-115   114          27    -1     -27
108-111   110          18    -2     -36
104-107   106           7    -3     -21
          N = 200                   265 (sum without regard to sign)

F_g = 99 (steps 120-123 and above); F_l = 101 (steps 116-119 and below)

N/2 = 100; Median = 116 + (48/49) × 4 = 119.92
Guessed median (GM) = 118 (midpoint of step 116-119)
Correction, C = 119.92 - 118.00 = 1.92; c = 1.92/4 = .48
Applying formula (8):
AD = [ΣFD + c(F_l - F_g)]/N × step length = [265 + .48(101 - 99)]/200 × 4 = 1.33 × 4 = 5.32

(B) The Calculation of the AD from the Median

It is sometimes desirable to calculate the AD from the median instead of the average. The formula for the AD from the median is exactly the same as formula (8) for the AD from the average (see page 32). However, the scheme of the work differs in some respects from the calculation of the AD from the average, and hence it is illustrated in Table V for the 200 cancellation scores taken from Table II (2). First we find the true median, 119.92, by the method outlined on pages 13-14. Next, we assume or guess a median at the midpoint of the step-interval which contains the true median, viz., at 118. Since the true median is known, the score correction, C, is found directly to be 1.92 by subtracting 118 from 119.92 (true median minus assumed median). Then, dividing 1.92 by 4, the step-interval, we obtain .48, the correction in step-units (c). The D's are taken from 118, the guessed median, and the FD's are obtained (as shown in Table IV) by "weighting" each D by its corresponding F. The arithmetic sum of column 5, i.e., the ΣFD, is 265. F_l, the total number of scores on midpoints 118 down to 106 inclusive (those less than 119.92), equals 101.
And F_g, the total number of scores on midpoints 122 to 138 inclusive (those greater than 119.92), equals 99. With ΣFD, c, F_l, and F_g known, the AD is now easily found by substituting these values in formula (8). The numerator becomes 265 + .48(101 - 99), or 265.96; and dividing by 200 and multiplying by 4, the step-interval, we get 5.32 as the AD from 119.92, the median.

3. The Calculation of the Standard Deviation (σ) by the Short Method

The calculation of the standard deviation by the Short Method is considerably less complex than the calculation of the AD. The formula is:

$$\sigma = \sqrt{\frac{\Sigma FD^2}{N} - c^2} \times \text{the step-interval} \qquad (9)$$

in which ΣFD² is the sum of the squared deviations in units of step-interval, taken from the guessed average, and c is the correction in units of step-interval.

An illustration of the calculation of σ by the Short Method is given in Table IV. The first step is to fill in the FD² column (column 6) by multiplying each D in column 4 by its corresponding FD in column 5. The process is identical with that used in the Long Method, except that the D's are all expressed in units of step-interval. This, of course, considerably simplifies the multiplication. The calculation of c has already been described on page 31. The sum of the FD² column (ΣFD²) is 770, and c² is .6639. Applying formula (9), therefore, we get 3.687 × 5, or 18.44, as the σ of the distribution. The formula for σ by the Short Method, unlike the AD formula, holds good no matter what the size of the correction, c. This general applicability of formula (9) serves to increase its value.

4. The Short Method Applied to Discrete Series

We have defined a discrete series on page 2 as one in which there are real gaps. This means that in a truly discrete series each measure, instead of representing an interval on a scale as in a continuous series, is a separate and distinct value.
There is, for example, a real gap between one man and two men, or between one dollar and two dollars, provided the unit of measurement in the latter case is one dollar.

Table VI illustrates the method of finding the measures of central tendency and variability for discrete measures tabulated into a frequency distribution. The data consist of the records of the number of children in 44 families of a rural community. In the first column of the table is given the number of children in the family; in the second column, under F, the number of families of a given size. We find, for example, one family of 10 children, three of 9, four of 8, etc.

Since the measures (here the children) are discrete, each measure must be taken at face value, and there are, in consequence, no midpoint values for the different steps. As a result, the average being guessed at 5, D's are taken directly from this point. The FD and the FD² columns are calculated exactly as shown in Table IV for continuous series: the first is obtained by multiplying corresponding F and D values, and the second by multiplying corresponding D and FD values.

TABLE VI
To Illustrate the Calculation of the Average, Median, Q, AD, and SD When Measures Are Discrete
(The F column gives the number of families containing the number of children listed in the first column.)

No. of      F
Children    (Families)    D     FD     FD²
   10           1         5      5      25
    9           3         4     12      48
    8           4         3     12      36
    7           3         2      6      12
    6           5         1      5       5
    5           8         0      0       0
    4           7        -1     -7       7
    3           4        -2     -8      16
    2           4        -3    -12      36
    1           2        -4     -8      32
    0           3        -5    -15      75
N = 44                        +40     292
                              -50

F_g = 24; F_l = 20; ΣFD (without regard to sign) = 90
GA = 5; c = -10/44 = -.23; c² = .052
Average = 5 - .23 = 4.77; Median = 5.0; Mode = 5.0
N/2 = 22; since the 22nd measure falls on 5, Median = 5.
N/4 = 11; since the 11th measure falls on 3, Q1 = 3.
3N/4 = 33; since the 33rd measure falls between 6 and 7, Q3 = 6.5.
Q = (Q3 - Q1)/2 = (6.5 - 3)/2 = 1.75
AD = [ΣFD + c(F_l - F_g)]/N = [90 - .23(20 - 24)]/44 = 2.07
SD = √(ΣFD²/N - c²) = √(292/44 - .052) = 2.57
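The whole of Table VI can be reproduced in a small Python sketch. It assumes the 0-children row (F = 3), which is implied by the D = -5 entries in the FD and FD² columns; the helper `counted_point` is a hypothetical encoding of the counting rule used in the table (take the measure reached, without interpolation, and if the count exactly completes a step, take the point midway to the next step). It also illustrates why formula (9) does not depend on where the average is guessed: since c = ΣFD/N, the sum Σ F(D - c)² always equals ΣFD² - Nc², while formula (8) breaks once c exceeds 1.

```python
import math

# Table VI: children per family (discrete measures) and number of families.
# The 0-children row (F = 3) is inferred from the D = -5 entries of the table.
children = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
families = [1, 3, 4, 3, 5, 8, 7, 4, 4, 2, 3]
n = sum(families)


def short_method(ga):
    """Average, c, F_l, F_g, AD (formula 8) and SD (formula 9) with D's from `ga`.
    The step-interval is 1, so the correction c equals the score correction C."""
    d = [x - ga for x in children]
    c = sum(f * dev for f, dev in zip(families, d)) / n
    average = ga + c
    sum_fd = sum(f * abs(dev) for f, dev in zip(families, d))
    sum_fd2 = sum(f * dev * dev for f, dev in zip(families, d))
    f_l = sum(f for f, x in zip(families, children) if x < average)
    f_g = sum(f for f, x in zip(families, children) if x > average)
    ad = (sum_fd + c * (f_l - f_g)) / n
    sd = math.sqrt(sum_fd2 / n - c * c)
    return average, c, f_l, f_g, ad, sd


def counted_point(position):
    """Measure reached by counting `position` families in from the small end, with no
    interpolation. If the count exactly completes a step, take the point midway
    between that step and the next occupied step (the makeshift used for Q3)."""
    ordered = [(x, f) for x, f in sorted(zip(children, families)) if f > 0]
    cum = 0
    for i, (x, f) in enumerate(ordered):
        cum += f
        if cum > position:
            return x
        if cum == position:
            return (x + ordered[i + 1][0]) / 2 if i + 1 < len(ordered) else x
    raise ValueError("position exceeds N")


average, c, f_l, f_g, ad, sd = short_method(ga=5)
median = counted_point(n / 2)        # 22nd family
q1 = counted_point(n / 4)            # 11th family
q3 = counted_point(3 * n / 4)        # 33rd family
q = (q3 - q1) / 2
print(f"c = {c:.2f}, average = {average:.2f}, F_l = {f_l}, F_g = {f_g}")
print(f"AD = {ad:.2f}, SD = {sd:.2f}, median = {median}, Q1 = {q1}, Q3 = {q3}, Q = {q}")

# Guessing the average at 0 makes c = +4.77: formula (9) is unaffected,
# but formula (8) no longer gives the AD.
_, _, _, _, ad_far, sd_far = short_method(ga=0)
print(f"GA = 0: SD = {sd_far:.2f}, 'AD' by formula (8) = {ad_far:.2f}")
```

Carried to full precision, c is -.227 and c² is about .052; the rounded results (4.77, 2.07, 2.57) are unchanged.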
Note that since the step-interval is 1, the correction c equals C directly. If we apply the correction -.23 to 5, the guessed average, the average of the distribution, 4.77, is obtained. This result, while mathematically correct, is obviously a rather difficult one to interpret in a practical way, as it is impossible for a family to have four and a fraction children. Possibly the median is a more meaningful measure. One half of the measures is 22, and counting in from the small end of the series we find that the twenty-second measure falls on the frequency opposite step 5. Fractional values are, of course, really meaningless in a discrete series, and hence we must simply take 5 as being roughly the median of the distribution, without any interpolation. The median family, accordingly (and the modal family as well), may be said to contain 5 children, and on the face of it this result seems to be of more practical value than the statement that the average number of children to a family is 4.77.

It is worth while examining further, however, exactly what is meant by the statement that the average number of children per family is 4.77. In the first place, it means, of course, that the number of children in the N families examined, divided by N, gives us 4.77. But furthermore, if the families examined are actually a fair sample of all of the families in the "population" from which they are taken (see page 120), it means that if we had taken all of these families, or another fair sample of them, the average size of the family would have been (approximately) the same. The average, then, is a constant factor for the given population, such that, knowing the number of families in any fair sample of the population, we can multiply this number by the constant factor and obtain (approximately) the number of children in all of these families.
Good use may thus be made of the average even when the measures are necessarily discrete: exactly the same kind of use that can be made of the average in the case of continuous measures.

The median, on the other hand, together with the quartiles, really breaks down in the case of discrete measures. In the example of the families above, there is actually no value which fulfills the definition of the median as a point or value such that one half of the measures exceed it and one half fall below it. There are just 44 families in all; the median, then, would be a point such that 22 families exceeded it and 22 fell below it. Now there are 20 families falling below 5, 8 families at 5, and 16 families above 5. If we place the median exactly at 5, only 20 families instead of the required 22 fall below. And if we place the median even the least fraction above 5, the number falling below is increased by all of the families having 5 children, so that there are then 20 + 8, or 28, families falling below the median, which is more than half. There is, in short, no median value for this series under the definition of the median which we have been using. Sometimes, however, another definition of the median is given, namely, that it is the score or measure made by the middle individual when the individuals have been arranged in order of their scores, from least to greatest.¹ Strictly speaking, this definition also breaks down in the case of discrete measures, since there is really no sense in speaking of two or more individuals who have the same score as being arranged in order of magnitude. Thus the 8 families of 5 children each are all exactly equal as regards number of children. Of course, we might admit that in a sense some one (any one) of these 8 families is the middle of the whole series, and since it is a family of 5 children, the median, so defined, is just 5, no more nor less.
This is the median as we have used it. At best, however, it is a rough and unreliable measure.

¹ See discussion of midscore, page 12.

In computing the measures of variability in a discrete series, the Q is the only one which offers difficulties. In the present illustration, one fourth of the measures (N/4) is 11, and counting in 11 scores from the small end of the series, we put Q1 on step 3 (as in the case of the median, no interpolation is made). If we check this value of Q1 by counting in 33 scores from the large end of the distribution, we again obtain 3 as the value of Q1. Three fourths of the measures (3N/4) is 33; and counting in 33 scores from the small end of the series, we find that we complete, or count through, the frequency on step 6. If 11 scores are counted off from the other direction, we complete, or count through, the frequency on step 7. This puts Q3 at either 6 or 7, and the best way out of the difficulty is to take Q3 as roughly equal to 6.5, i.e., midway between 6 and 7. This is of course a makeshift, though even so probably as accurate as the median or quartiles ever are in discrete series. Taking Q1 equal to 3 and Q3 equal to 6.5, Q is (6.5 - 3)/2, or 1.75.

The AD and σ in a discrete series are found from formulas (8) and (9) in exactly the same way as in a continuous series. For example, F_l (the number of families less than 4.77) is 20, and F_g (the number of families greater than 4.77) is 24. The AD is, therefore, [90 + (-.23)(20 - 24)]/44 × 1 (the step-interval), or 2.07. The σ is √(292/44 - .052) × 1 (the step-interval), or 2.57.

V. The Comparison of Groups

1. The Measurement of Relative Variability. The Coefficient of Variation

Thus far we have been dealing entirely with measures of absolute variability within the distribution: the Q, the AD, and the SD.
It is sometimes desirable, however, to measure relative variability, as, for instance, to compare the variability of one group on two different tests, or of two or more groups on the same test. The measures of absolute variability are not sufficient in such cases unless the averages of the two distributions are equal or approximately equal. A problem will serve to make this clear. A group of 50 boys works for 6 minutes on an arithmetic test and makes an average score of 20.5 with a σ of 5.24. The same group works for 10 minutes on the same test and makes an average score of 34.8 with a σ of 9.62. If we compare the σ's of these two distributions, we should probably be inclined to say that the group was considerably more variable in the 10-minute period than in the 6-minute period. Although the σ in the second period is nearly twice as large as the σ in the first period, however, this does not necessarily mean that the variability of the group has doubled with the increased time allowance (or even increased at all), for the average score has also increased, from 20.5 to 34.8. In other words, the two σ's are not directly comparable, as they have been measured around different central tendencies. In order to compare the relative variability of this group in the two periods, therefore, we must have a measure which takes account of both the central tendency and the variability. Such a measure is Pearson's Coefficient of Variation, given by the formula

$$V = \frac{100\,\sigma}{\text{Average}} \qquad (10)$$

Applying this formula to the present problem, we find that

For the 6-minute period: V = (5.24 × 100)/20.5 = 25.56.
For the 10-minute period: V = (9.62 × 100)/34.8 = 27.64.
Instead of being only about 50% as variable in the 6-minute period as in the 10, therefore, the group is seen to be actually 25.56/27.64, or about 92%, as variable.

The coefficient of variation is especially useful in those problems in which the variability of the group under different conditions is the factor studied. As stated above, when the averages are equal the absolute variability may be compared directly.

2. The Comparison of Two Groups in Terms of Their Measures of Central Tendency and Variability

The existence of a difference between the averages or the medians of two groups does not necessarily indicate that there are any very marked differences in the performance of the various individuals within the two groups. An obtained difference in central tendency may mean that the person ranking lowest in the one group is better than the person ranking highest in the other; on the other hand, it may also mean that only a very small per cent of the better group is actually ahead of the poorer. For this reason, in comparing groups it is not sufficient simply to state the difference between their averages or medians, for any such difference will depend for its significance largely upon the variability, or spread, within the groups compared. Table VII will illustrate what is meant. A group of 300 boys and a group of 250 girls have been measured on the same test, and the average, median, Q, and σ of each group computed. Now if we compare the central tendencies, it is clear that the average girl is 2.19 points ahead of the average boy, and that the median girl is 2.25 points ahead of the median boy. Taken alone, this result might suggest a fairly definite sex difference in the given test; but before drawing this conclusion, we should compare the variability of the two groups. A comparison of the Q's and σ's shows that the girls tend to scatter somewhat more around their central tendency than the boys.
The range of scores is, however, practically the same in both groups: 100% of the boys and 92% of the girls score between 12 and 32 on the scale. Also, from the quartiles it is evident that the middle 50% of the boys scored between 19 and 24 (approximately), while the middle 50% of the girls scored between 20 and 27 (approximately).

TABLE VII
Comparison of Two Groups in Terms of Central Tendency, Variability, and Overlapping

BOYS
Scores    F     D    FD    FD²
28-32     15    2    30    60
24-28     68    1    68    68
20-24    128    0     0     0
16-20     79   -1   -79    79
12-16     10   -2   -20    40
N = 300             +98   247
                    -99

GA = 22.0; c = -1/300 = -.003; C = -.003 × 4 = -.01
Average = 22.0 - .01 = 21.99
Median [N/2 = 150] = 20 + (61/128) × 4 = 21.91
Q1 [N/4 = 75] = 16 + (65/79) × 4 = 19.29
Q3 [3N/4 = 225] = 24 + (8/68) × 4 = 24.47
Q = (24.47 - 19.29)/2 = 2.59
σ = √(247/300) × 4 = .907 × 4 = 3.63 (c² is negligible)

GIRLS
Scores    F     D    FD    FD²
32-36     20    2    40    80
28-32     35    1    35    35
24-28     73    0     0     0
20-24     68   -1   -68    68
16-20     41   -2   -82   164
12-16     13   -3   -39   117
N = 250             +75   464
                   -189

GA = 26.0; c = -114/250 = -.456; c² = .208; C = -.456 × 4 = -1.82
Average = 26.0 - 1.82 = 24.18
Median [N/2 = 125] = 24 + (3/73) × 4 = 24.16
Q1 [N/4 = 62.5] = 20 + (8.5/68) × 4 = 20.50
Q3 [3N/4 = 187.5] = 24 + (65.5/73) × 4 = 27.59
Q = (27.59 - 20.50)/2 = 3.55
σ = √(464/250 - .208) × 4 = 1.28 × 4 = 5.12

What per cent of the boys reach or exceed 24.16, the median of the girls? 217 boys score below 24. Step 24-28 contains 68 scores; hence there are 68/4, or 17, scores per scale unit on this step. 17 × .16 = 2.72. Thus 217 + 2.72, or 219.72, of the boys' scores fall below 24.16, the girls' median. 300 - 219.72 = 80.28. Accordingly, 80.28/300, or 26.76% (approximately 27%), of the boys reach or exceed the median score of the girls.

Again, we find from comparing the σ's that the middle 2/3 of the boys scored between 21.99 ± 3.63, i.e., between 18 and 25 (approximately), and that the middle 2/3 of the girls scored between 24.18 ± 5.12, i.e., between 19 and 29 (approximately) on the scale.
In spite of the difference in averages and medians, therefore, it is evident from the measures of variability that the boys and girls scored over almost exactly the same part of the scale. To compare the variability of the boys as a group with that of the girls, we must compute the coefficients of variation. These are

For Boys: V = (3.63 × 100)/21.99 = 16.5.
For Girls: V = (5.12 × 100)/24.18 = 21.2.

Expressed as a per cent, the boys are 16.5/21.2, or 78%, as variable as the girls.

3. The Comparison of Two Groups in Terms of Overlapping

A second way of showing how alike, or unlike, two groups are in their performance on a given test is to state the amount of overlapping in the distributions of scores made by the two groups. This information serves as a valuable supplement to that secured from a comparison of central tendencies and variabilities. Overlapping is usually measured by the per cent of the one group which reaches or exceeds the median of the other. In the present problem we may compute the per cent of boys who reach or exceed the median score of the girls. The calculation of this measure of overlapping is as follows. First, we add up the boys' scores from the small end of the distribution to find how many fall below 24.16, the girls' median. Two hundred and seventeen boys (10 + 79 + 128) score below 24, the lower limit of the step 24-28. To find how many score below 24.16, we divide the 68 scores on this step-interval by 4 (the length of step) and multiply the result (17) by .16, in order to find how many scores lie between 24 and 24.16. The result of this last calculation is 2.72, and accordingly a total of 217 + 2.72, or 219.72, of the boys' 300 scores fall below 24.16, the girls' median score. Subtracting 219.72 from 300, we find that 80.28 of the boys' scores lie above 24.16. It is clear, then, that 80.28/300, or 27%, of the boys score at or beyond the girls' median. (See Table VII.)
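The overlapping calculation and the coefficients of variation can be expressed as a brief Python sketch. It assumes, as the text does with its "17 scores per scale unit," that the scores on the step containing the girls' median are spread evenly over that step; the function names are illustrative, and the σ's and averages fed to formula (10) are the rounded values quoted in the text.

```python
# Boys' distribution from Table VII (step-interval 4), listed from the small end.
boys_lower = [12, 16, 20, 24, 28]
boys_f = [10, 79, 128, 68, 15]
step = 4


def per_cent_at_or_above(point, lower_limits, freqs, step):
    """Per cent of a grouped distribution reaching or exceeding `point`, assuming the
    scores on the step that contains `point` are spread evenly over that step."""
    n = sum(freqs)
    below = 0.0
    for lo, f in zip(lower_limits, freqs):
        if point >= lo + step:
            below += f                        # the whole step lies below the point
        elif point > lo:
            below += f * (point - lo) / step  # e.g. 68/4 = 17 per unit, times .16
    return 100 * (n - below) / n


def coefficient_of_variation(sd, average):
    """Pearson's coefficient of variation, formula (10)."""
    return 100 * sd / average


overlap = per_cent_at_or_above(24.16, boys_lower, boys_f, step)   # girls' median
v_boys = coefficient_of_variation(3.63, 21.99)
v_girls = coefficient_of_variation(5.12, 24.18)
v_6_min = coefficient_of_variation(5.24, 20.5)
v_10_min = coefficient_of_variation(9.62, 34.8)

print(f"Boys reaching or exceeding the girls' median: {overlap:.2f}%")
print(f"V boys = {v_boys:.1f}, V girls = {v_girls:.1f}, ratio = {100 * v_boys / v_girls:.0f}%")
print(f"V 6 min = {v_6_min:.2f}, V 10 min = {v_10_min:.2f}, ratio = {100 * v_6_min / v_10_min:.0f}%")
```

The sketch reproduces 26.76% overlap, V's of 16.5 and 21.2 (a ratio of 78%), and V's of 25.56 and 27.64 for the arithmetic test (a ratio of about 92%).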
Summarizing the results from Table VII and the discussion of the preceding paragraphs, we find that the difference between the average boy and the average girl is 2.19 points in favor of the girls, and that the difference between the median boy and the median girl is 2.25 points in favor of the girls. Twenty-seven per cent of the boys reach or exceed the median score of the girls; 100% of the boys and 92% of the girls score within the same limits on the scale; the middle 2/3 of the boys score between 18 and 25, and the middle 2/3 of the girls score between 19 and 29. The obvious conclusion from these data seems to be that individual differences within either group (between boy and boy, or between girl and girl) are probably of more importance (because greater) than the differences between boy and girl indicated by the averages or medians taken alone.

VI. The Calculation of the Percentiles in a Frequency Distribution

We have already found it necessary, in finding the quartile deviation, Q (see page 18), to calculate Q1, the first quartile or 25th percentile, and Q3, the third quartile or 75th percentile. It is often very useful to know, in addition to these points, the decile points in the distribution as well, viz., the 10th, the 20th, the 30th, the 40th, etc., percentile points. These values are calculated in exactly the same manner as the median and the quartiles. As the 25th percentile, for example, was found by counting off 1/4 of the scores from the small end of the distribution, and the 50th percentile (the median) by counting off 1/2 of the scores, in exactly the same way the 10th percentile is found by counting off 1/10, and the 20th percentile by counting off 2/10, of the scores from the small end of the distribution.
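The counting-off procedure shared by the median, the quartiles, and the percentiles can be written as one general Python function, sketched here under the usual assumption that the cases on the step reached are spread evenly over it. The name `grouped_point` is hypothetical; as a check, it is run on the Table VII distributions and on Table V, whose medians and quartiles are already known from the text. A fraction of 0.10 would give the 10th percentile in the same way.

```python
def grouped_point(lower_limits, freqs, step, fraction):
    """Score below which `fraction` of the N cases fall: count off fraction * N cases
    from the small end, then interpolate within the step that is reached, assuming
    the cases on that step are spread evenly over it."""
    n = sum(freqs)
    target = fraction * n
    below = 0
    for lo, f in sorted(zip(lower_limits, freqs)):
        if f > 0 and below + f >= target:
            return lo + (target - below) / f * step
        below += f
    raise ValueError("fraction must lie between 0 and 1")


# Table VII (step-interval 4): lower limits and frequencies, from the small end.
boys_lower, boys_f = [12, 16, 20, 24, 28], [10, 79, 128, 68, 15]
girls_lower, girls_f = [12, 16, 20, 24, 28, 32], [13, 41, 68, 73, 35, 20]
# Table V (step-interval 4): 200 cancellation scores.
t5_lower = [104, 108, 112, 116, 120, 124, 128, 132, 136]
t5_f = [7, 18, 27, 49, 52, 23, 16, 5, 3]

for name, lower, f in (("Boys", boys_lower, boys_f), ("Girls", girls_lower, girls_f)):
    q1, mdn, q3 = (grouped_point(lower, f, 4, p) for p in (0.25, 0.50, 0.75))
    print(f"{name}: Q1 = {q1:.2f}, median = {mdn:.2f}, Q3 = {q3:.2f}, Q = {(q3 - q1) / 2:.2f}")
print(f"Table V median = {grouped_point(t5_lower, t5_f, 4, 0.5):.2f}")
```

Carried to full precision, the girls' Q prints as 3.54 rather than the 3.55 obtained in Table VII from the rounded quartiles; the other values agree with the table.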
Percentiles are of considerable value in enabling us to compare the standing of different individuals in a number of tests, or to combine the standing of the same individual in different tests (see page 278 for a fuller discussion of this). Table VIII gives the method of calculating the percentiles in the distribution of 54 Army Alpha scores taken from Table I. The 10th percentile, 147, is located by finding 10% of 54 and counting off 5.4 scores from the small end of the distribution. In like manner the 20th percentile, which lies 2/10 of 54, or 10.8 scores, from the small end of the distribution, is located at 155.67. The 20th percentile score is taken as 155. This is because a score of 155 in a continuous series means "155 up to 156," and consequently 155.67 falls on score 155, just as 160.25, the 30th percentile point, falls on score 160.¹ The other percentile points, and their scores, are tabulated in Table VIII.

A word should be said with regard to the calculation of the 0 and 100th percentiles. These values are the lowest and the highest scores, respectively, in the distribution. For example, we find from the original scores in Table I that the lowest score is 126 and the highest 201. Therefore, the 0 percentile falls at 126 and the 100th at 201.

Note the column in the table marked Cum. F (cumulative frequency). The entries in this column were obtained by adding the frequencies (the F's) serially, beginning with those on step 125-129: e.g., 2 + 0 = 2; 2 + 2 = 4; 4 + 1 = 5, etc. From this column we can quickly tell how far we must count into the distribution in order to reach any percentile point. For example, the 70th percentile is 37.8 scores from the beginning of the distribution; hence it is clear from the Cum. F's that 37 scores will take us to 185, the upper limit of step 180-184, and that the 70th percentile lies on step 185-189.

¹ This applies also to the median and the quartiles in a distribution of scores in continuous series.

TABLE VIII
To Illustrate the Calculation of the Percentiles in a Frequency Distribution

1. Data from Table I

Scores      F    Cum. F      Percentile   Score
200-204     1      54            100        201
195-199     4      53             90        194
190-194     2      49             80        188
185-189    10      47             70        185
180-184     3      37             60        179
175-179     8      34             50        175
170-174     3      26             40        167
165-169     3      23             30        160
160-164     4      20             20        155
155-159     6      16             10        147
150-154     4      10              0        126
145-149     1       6
140-144     1       5
135-139     2       4
130-134     0       2
125-129     2       2
       N = 54

Calculations:
10% of 54 = 5.4     145 + (0.4/1) × 5 = 147
20% of 54 = 10.8    155 + (0.8/6) × 5 = 155.67 (155)
30% of 54 = 16.2    160 + (0.2/4) × 5 = 160.25 (160)
40% of 54 = 21.6    165 + (1.6/3) × 5 = 167.67 (167)
50% of 54 = 27      175 + (1/8) × 5 = 175.625 (175)
60% of 54 = 32.4    175 + (6.4/8) × 5 = 179
70% of 54 = 37.8    185 + (0.8/10) × 5 = 185.40 (185)
80% of 54 = 43.2    185 + (6.2/10) × 5 = 188.1 (188)
90% of 54 = 48.6    190 + (1.6/2) × 5 = 194

2. Data from "A Scale of Performance Tests," by Pintner and Paterson, page 133. Scores made by 72 nine-year-olds on the Substitution Test (in seconds).

Scores (sec.)   F    Cum. F      Percentile   Score
80-89           1       1            100         80
90-99           2       3             90        108
100-109         5       8             80        121
110-119         5      13             70        126
120-129        13      26             60        133
130-139         9      35             50        141
140-149         6      41             40        152
150-159        11      52             30        158
160-169         5      57             20        172
170-179         3      60             10        192
180-189         4      64              0        219
190-199         3      67
200-209         2      69
210-219         3      72
          N = 72

Calculations:
10% of 72 = 7.2  (90th percentile)    100 + (4.2/5) × 10 = 108.4 (108)
20% of 72 = 14.4 (80th percentile)    120 + (1.4/13) × 10 = 121
30% of 72 = 21.6 (70th percentile)    120 + (8.6/13) × 10 = 126.6 (126)
40% of 72 = 28.8 (60th percentile)    130 + (2.8/9) × 10 = 133
50% of 72 = 36   (50th percentile)    140 + (1/6) × 10 = 141.67 (141)
60% of 72 = 43.2 (40th percentile)    150 + (2.2/11) × 10 = 152
70% of 72 = 50.4 (30th percentile)    150 + (9.4/11) × 10 = 158.5 (158)
80% of 72 = 57.6 (20th percentile)    170 + (0.6/3) × 10 = 172
90% of 72 = 64.8 (10th percentile)    190 + (0.8/3) × 10 = 192.67 (192)
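In Table VIII (2) the scores are times, so a low score is the better performance and the percentile labels run the other way: counting 10% of N in from the fast end gives the 90th percentile. The sketch below reproduces that table's calculations. It is illustrative only; the helper name `time_percentile` is our own, and each step is labeled by its lower limit (80 for 80-89), as in the table's arithmetic.

```python
from itertools import accumulate

# Table VIII (2): seconds taken by 72 nine-year-olds on the Substitution Test.
# Steps run from the fastest (80-89) to the slowest (210-219).
lower = list(range(80, 220, 10))
freqs = [1, 2, 5, 5, 13, 9, 6, 11, 5, 3, 4, 3, 2, 3]
step = 10

# Cum. F column, accumulated from the fast (best) end of the scale.
cum_f = list(accumulate(freqs))


def time_percentile(p):
    """Score (in seconds) at the p-th percentile when a low time is best.

    Because the scale is reversed, the p-th percentile is reached by counting
    (100 - p) per cent of N in from the fastest step.
    """
    target = (100 - p) / 100 * cum_f[-1]
    below = 0  # cases on the steps already passed
    for lo, f, c in zip(lower, freqs, cum_f):
        if f > 0 and c >= target:
            return lo + (target - below) / f * step
        below = c
    raise ValueError("p must lie between 0 and 100")


for p in (90, 80, 50, 10):
    print(p, round(time_percentile(p), 2))
```

The values agree with the unrounded figures in the table (108.4, 121.08, 141.67, 192.67); the book then drops the fraction to give the whole-second score.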
When once the percentile table has been drawn up, it is a relatively simple matter to find the percentile corresponding to any given score. In our problem, for instance, the man who makes a score of 177 falls on the 55th percentile, midway between the 50th (175) and the 60th (179) percentiles; while the man who scores 158 has a percentile score of 26, since 158 lies six tenths of the way from the 20th percentile (155) to the 30th percentile (160). Other interpolations may easily be made in like manner.

In Table VIII (2) the percentiles have been calculated for the distribution of scores (in seconds) made by seventy-two 9-year-olds on the Woodworth-Wells Substitution Test.¹ As the scores are in time units, the lowest score is the best (the quickest) performance, while the highest score is the worst (the slowest) performance. Consequently, the percentile scale is reversed: we count from the 100th percentile down instead of from the 0 percentile up. To find the 90th percentile, for example, we count in 7.2 (10% of N) from step 80-89 until we reach 108.4 (score 108). Counting in two tenths of N from 80-89, we reach 121, the 80th percentile. The 100th percentile is taken at 80, theoretically the fastest record; the 0 percentile at 219, the poorest record. From the percentile table we may say that a 9-year-old who completes the Substitution Test in 141 secs. has a percentile score of 50, i.e., stands at the median of the group; while a child of 9 who takes 181 secs. to complete the test has a percentile score of about 15, roughly midway between the 10th percentile (192) and the 20th percentile (172).

¹ Pintner and Paterson: A Scale of Performance Tests, 1921, p. 133.

VII. When to Use the Various Measures of Central Tendency and Variability

The beginner in statistics is often at a loss to know which measure of central tendency or variability to use. The following summary will serve as a guide for most of the problems which the student will ordinarily meet:

1. When to Use the Average, Median, and Mode

   1. Use the Average:
      (1) When each score or measure should have equal weight in determining the central tendency.
      (2) When the highest reliability is sought.
      (3) When product-moment coefficients of correlation, or measures of reliability, are to be computed subsequently.
   2. Use the Median:
      (1) When a quick and easily computed measure of central tendency is necessary.
      (2) When there are extreme measures which would affect the average disproportionately.
      (3) When certain scores or measures should influence the central tendency, but all that is known about them is that they are above or below the central tendency.
   3. Use the Mode:
      (1) When a quick, approximate measure of concentration is desired.
      (2) When only the most often recurring score is sought.

2. When to Use the Range, Q, AD, and σ

   1. Use the Range:
      (1) When the data are too scant or scrappy to justify the calculation of another measure of variability.
      (2) When a knowledge of the total spread is all that is necessary.
   2. Use Q:
      (1) For a quick, inspectional measure of variability.
      (2) When there are scattered or extreme measures.
      (3) When only the concentration around the central tendency is sought.
   3. Use the AD:
      (1) When it is desired to weight all deviations according to their size.
      (2) When extreme deviations should not influence the measure of variability unduly.
   4. Use σ:
      (1) When the highest reliability is desired.
      (2) When it is desired that extreme deviations influence the measure of variability.
      (3) When coefficients of correlation or measures of reliability are later to be computed.

VIII. Summary of Formulas for Finding the Measures of Central Tendency and Variability

1. Measures of Central Tendency

   1. Average:
      A. Long Method:
         (a) data ungrouped:
         $$\text{Average} = \frac{\Sigma(\text{Measures})}{N} \qquad (1)$$
         (b) data grouped:
         $$\text{Average} = \frac{\Sigma(F \times \text{Midpoint})}{N} \qquad (2)$$
      B. Short Method:
         (a) data grouped:
         $$\text{Average} = GA + C \quad (\text{algebraic})$$
         where
         $$c = \frac{\Sigma FD\ (\text{algebraic})}{N}, \qquad C = c \times \text{length of step}$$

   2. Median: Arrange the measures in order of size, and count off 1/2 of the measures beginning at the small end of the series.
   3. Mode: For the crude mode, take the most frequent score, or the midpoint of the step with the largest frequency.

2. Measures of Variability

   1. Range = (largest measure) − (smallest measure).
   2. Quartile Deviation:
      $$Q = \frac{Q_3 - Q_1}{2} \qquad (3)$$
   3. Average Deviation:
      A. Long Method:
         (a) data ungrouped:
         $$AD = \frac{\Sigma D\ (\text{arithmetical})}{N} \qquad (4)$$
         (b) data grouped:
         $$AD = \frac{\Sigma FD\ (\text{arithmetical})}{N} \qquad (5)$$
      B. Short Method:
         (a) data grouped:
         $$AD = \frac{\Sigma FD\ (\text{arithmetical}) + c\,(F_1 - F_2)}{N} \times \text{length of step} \qquad (8)$$
   4. Standard Deviation:
      A. Long Method:
         (a) data ungrouped:
         $$\sigma = \sqrt{\frac{\Sigma D^2}{N}} \qquad (6)$$
         (b) data grouped:
         $$\sigma = \sqrt{\frac{\Sigma FD^2}{N}} \qquad (7)$$
      B. Short Method:
         (a) data grouped:
         $$\sigma = \sqrt{\frac{\Sigma FD^2}{N} - c^2} \times \text{length of step} \qquad (9)$$
   5. Coefficient of Variation:
      $$V = \frac{100\,\sigma}{\text{Average}} \qquad (10)$$

IX. Illustrative Problems

The following problems illustrate the calculation of the average, median, mode, Q, AD, and σ for continuous and discrete series. They are given as examples of the Short Method, and should be carefully reviewed by the student.

Example I
Calculation of the Average, Median, Mode, Q, AD, and SD. Step = 7.

Measures       Midpoint    F     D     FD    FD²
145-151.99      148.5      1     6      6     36
138-144.99      141.5      1     5      5     25
131-137.99      134.5      2     4      8     32
124-130.99      127.5      2     3      6     18
117-123.99      120.5      3     2      6     12
110-116.99      113.5     10     1     10     10    (+41)
103-109.99      106.5     15     0
96-102.99        99.5     14    -1    -14     14
89-95.99         92.5      6    -2    -12     24
82-88.99         85.5      3    -3     -9     27
75-81.99         78.5      2    -4     -8     32    (-43)
            N = 59       ΣFD = 84 (arithmetical)   ΣFD² = 230

F2 (measures on the GA step and above) = 34; F1 (measures below it) = 25.

N/2 = 29.5     GA = 106.5     c = −2/59 = −.034     c² = .001     C = −.034 × 7 = −.238
Average = 106.5 + (−.238) = 106.26
Median = 103 + (4.5/15) × 7 = 105.10
Mode = 106.50
N/4 = 14.75     Q1 = 96 + (3.75/14) × 7 = 97.875
3N/4 = 44.25    Q3 = 110 + (4.25/10) × 7 = 112.975
Q = (112.975 − 97.875)/2 = 7.55
AD = [84 + (−.034)(25 − 34)]/59 × 7 = 10.00
σ = √(230/59 − .001) × 7 = 1.97 × 7 = 13.79

Example II
Calculation of Average, Median, Q, and SD. Step = 1.

Scores        F       D        FD        FD²
22-22.9       1      12        12        144
21-21.9       7      11        77        847
20-20.9      16      10       160      1,600
19-19.9      35       9       315      2,835
18-18.9      81       8       648      5,184
17-17.9     172       7     1,204      8,428
16-16.9     330       6     1,980     11,880
15-15.9     600       5     3,000     15,000
14-14.9   1,031       4     4,124     16,496
13-13.9   1,793       3     5,379     16,137
12-12.9   2,572       2     5,144     10,288
11-11.9   2,951       1     2,951      2,951    (+24,994)
10-10.9   3,187       0
9-9.9     3,319      -1    -3,319      3,319
8-8.9     2,891      -2    -5,782     11,564
7-7.9     2,149      -3    -6,447     19,341
6-6.9     1,315      -4    -5,260     21,040
5-5.9       684      -5    -3,420     17,100
4-4.9       302      -6    -1,812     10,872
3-3.9       112      -7      -784      5,488
2-2.9        38      -8      -304      2,432
1-1.9        10      -9       -90        810    (-27,218)
     N = 23,596          ΣFD = −2,224       ΣFD² = 183,756

N/2 = 11,798     GA = 10.5     c = −2,224/23,596 = −.09     c² = .008
Average = 10.5 − .09 = 10.41
Median = 10 + (978/3,187) × 1 = 10.31
N/4 = 5,899      Q1 = 8 + (1,289/2,891) × 1 = 8.45
3N/4 = 17,697    Q3 = 12 + (739/2,572) × 1 = 12.29
Q = (12.29 − 8.45)/2 = 1.92
σ = √(183,756/23,596 − .008) × 1 = 2.79

Example III
Calculation of Average, Median, Mode, Q, AD, and SD for a Discrete Series. Step = 1.

Measures    F     D     FD    FD²
21          2    -4     -8     32
22          1    -3     -3      9
23          4    -2     -8     16
24          9    -1     -9      9    (-28)
25         21     0
26         11     1     11     11
27          6     2     12     24
28          1     3      3      9
29          1     4      4     16    (+30)
      N = 56        ΣFD = 58 (arithmetical)   ΣFD² = 126

F1 (measures 21 to 25) = 37; F2 (measures 26 to 29) = 19.

N/2 = 28     GA = 25     c = 2/56 = .036     c² = .001
Average = 25 + .036 = 25.036, or 25.04
Median = 25     Mode = 25
N/4 = 14, Q1 = 24     3N/4 = 42, Q3 = 26     Q = 1.0
AD = [58 + .036(37 − 19)]/56 × 1 = 1.05
σ = √(126/56 − .001) × 1 = 1.50

PROBLEMS

1. Tabulate the following scores into three frequency distributions, using class-intervals of 3, 5, and 10 units respectively. Scores made on the Thorndike Entrance Examination by 100 applicants for admission to Columbia College. (From Sommerville, R. C.: Physical, Motor and Sensory Traits, Archives of Psychology, 75, 1924.) Note: Fractions have been dropped.

 63   80   75   90   81   83   78   81   83   83
 89   98   46   90  103   81   71   93   82   78
 86   85   73   83   74   86   84   72   63   76
103   78   85   81  105   94   78  101   76   98
 74   75   88   65   80   81   98   56  103   90
 92   85   78   73   87   75  102   58   78   95
 73   73   73   96   83  110   95   90   87   86
 96   98   82   86   70   70   95   71   89   86
 85   72   94   92   73   84   79   74   88   72
 92   86   93   84   50   85   76   82   99   91

2. The following distributions represent the scores made on a logical memory test by two racial groups, A and B. (1) Find the average, median, Q, and SD of each distribution. (2) What per cent of group A reaches or exceeds the median of group B? (3) Compare the relative variability of the two groups by means of their coefficients of variation.

Scores    Group A   Group B
79-83        6         8
74-78        7         8
69-73        8         9
64-68       10        16
59-63       12        20
54-58       15        18
49-53       23        19
44-48       16        11
39-43       10        13
34-38       12         8
29-33        6         7
24-28        3         2
         N = 128   N = 139

3. Compare the 30th, 60th, and 90th percentile scores in Group A [problem (2)] with the corresponding percentile scores in Group B.

4. The following problems are given for the purpose of affording practice in finding measures of central tendency and measures of variability. In every case where the Average, AD, or SD is to be found, use the Short Method.

(1) Find the Average and SD.
Scores    F
70-71     2
68-69     2
66-67     3
64-65     4
62-63     6
60-61     7
58-59     5
56-57     4
54-55     2
52-53     3
50-51     1
      N = 39

(2) Find the Median and AD (from the Median).
Scores    F
90-94     2
85-89     2
80-84     4
75-79     8
70-74     6
65-69    11
60-64     9
55-59     7
50-54     5
45-49     0
40-44     2
      N = 56

(3) Find the Average, AD, and SD.
Scores     F
120-122    2
117-119    2
114-116    2
111-113    4
108-110    5
105-107    9
102-104    6
99-101     3
96-98      4
93-95      2
90-92      1
       N = 40

(4) Find the Average and SD. (Discrete Series.)
Scores    F
80        1
79        3
78        3
77        6
76        8
75        7
74        3
73        4
72        2
71        1
      N = 38

(5) Find the Median and Q.          (6) Find the Average, Median, and SD.
(5)                 (6)
Scores     F        Measures    F
100-109    5        80-84       8
90-99      9        75-79      14
80-89     14        70-74      19
70-79     19        65-69      24
60-69     21        60-64      29
50-59     30        55-59      27
40-49     25        50-54      26
30-39     15        45-49      28
20-29     10        40-44      20
10-19      8        35-39      15
0-9        6        30-34      10
      N = 162             N = 220

Answers

2. (1)
                 Group A   Group B
   Average        53.88     56.21
   Median         52.70     56.64
   Q               9.64      9.90
   SD             13.82     13.73
   (2) 39% of Group A reaches or exceeds the median of Group B.
   (3) Coefficient of Variation: Group A = 25.64; Group B = 24.43. Group B is 95.3% as variable as Group A.

3.                          Group A   Group B
   30th percentile score       46        49
   60th percentile score       56        60
   90th percentile score       74        75

4. (1) Average = 61.26     SD = 4.99
   (2) Median = 67.27      AD = 8.97
   (3) Average = 106.5     AD = 5.55     SD = 7.23
   (4) Average = 75.66     SD = 2.11
   (5) Median = 55.67      Q = 16.41
   (6) Average = 57.0      Median = 57.04     SD = 13.17

CHAPTER II
GRAPHIC METHODS AND THE NORMAL CURVE

I. The Graphic Representation of the Frequency Distribution

We learned in the last chapter how scores or other measures of capacity may be organized and condensed into the tabular arrangement called a frequency distribution. In addition, we found how such an arrangement aids us in calculating measures of central tendency and variability and, in general, gives us a better idea of the facts as a whole. Still further aid in analyzing numerical data may be secured by a graphic or pictorial treatment of our material. The advertiser has long recognized the power of the illustration to catch the eye and hold the attention where the most careful array of statistics fails. In like manner the statistician, through the medium of diagrams and graphs, attempts to utilize the attention-getting power of visual presentation, and at the same time to translate numerical facts, often abstract and difficult of interpretation, into a more concrete and understandable form.

There are three methods of representing graphically, i.e., of "plotting," measures which have been grouped into a frequency distribution.
The first method gives the Frequency Polygon; the second, the Histogram or Column Diagram; and the third, the Ogive, or cumulative frequency graph. These will be considered in order.

1. The Frequency Polygon

Before outlining the method of constructing a frequency polygon, it might be well to review briefly the simple algebraic principles which apply to all graphical representation of numerical data. Graphing or plotting is done with reference to two lines, or "coordinate axes": the one the vertical or Y-axis, the other the horizontal or X-axis. These basic lines are perpendicular to each other, the point where they intersect being called O, or the "origin" (see Diagram II). To locate or "plot" a point P whose coordinates are x = 4 and y = 3, we go out from the origin 4 units on the X-axis and up from the origin 3 units on the Y-axis, and, where the perpendiculars erected at these points intersect, locate the point P (see Diagram II). In like manner, any point whose x and y values are known can be located with reference to OY and OX, the coordinate axes. Distances measured along the X-axis are commonly called abscissas, and distances along the Y-axis, ordinates.

We may now show how these principles of graphing apply to the construction of the frequency polygon shown in Diagram III (1). This graph pictures the frequency distribution of Table I. The limits of the step-intervals (the abscissas) are laid off at regular intervals along the base line (the X-axis) from the origin, and the frequencies within each interval (the ordinates) are measured off on a scale along the Y-axis. There are 2 scores on the first step, 125-129 (see Table I). To represent these on our diagram, we go out on the X-axis to 127.5, midway between 125 and 130, and up 2 Y-units. Here we locate the first point.
The frequency on the next step-interval, 130-134, is 0; hence the second point falls midway between 130 and 135, directly on the X-axis. The 2 scores on step 135-139, the 1 score on step 140-144, and the frequency on each succeeding step are, in every case, represented by a point the specified number of scores (Y-units) above the X-axis, and midway between the upper and lower limits of the step on which it lies. It is important to remember in plotting a frequency polygon that the midpoint of the step is always taken to represent all of the scores within that interval. The heights of the ordinates at the different midpoints represent the frequencies within the intervals. When all of the points have been located, they are joined in regular order to give the outline of the frequency polygon shown in Diagram III (1). In order to complete the figure, note that the step next below the lowest (125-129) and the step next above the highest (200-204) are included on the X-scale. The frequency of each of these steps is taken as 0, and in consequence the frequency polygon begins and ends on the X-axis. The distance taken to represent a step-interval on the X-axis will usually depend on the width of the cross-section paper used and on the number of steps in the distribution.

DIAGRAM II. The Use of Coordinate Axes X and Y.

DIAGRAM III (1). Frequency Polygon Plotted from Distribution of 54 Scores in Table I. (Scores, 120 to 210, on the X-axis.)

DIAGRAM III (2). Histogram Plotted from Data in Table I. (Scores, 120 to 210, on the X-axis.)
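The plotting rule just described (midpoint of each step, height equal to its frequency, an empty step added at each end) can be sketched as a small Python function that only computes the vertices; any plotting tool could then join them. The helper names `polygon_points` and `area_under` are our own, and the data are the 54 scores of Table I.

```python
def polygon_points(lower_limits, freqs, step):
    """Vertices (x, y) of a frequency polygon.

    Each step is plotted at its midpoint with height equal to its frequency; an
    empty step is added below the lowest and above the highest, so the polygon
    begins and ends on the X-axis.
    """
    half = step / 2
    points = [(lower_limits[0] - step + half, 0)]
    points += [(lo + half, f) for lo, f in zip(lower_limits, freqs)]
    points.append((lower_limits[-1] + step + half, 0))
    return points


def area_under(points):
    """Area between the polygon and the X-axis, by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Table I: the 54 Alpha scores, steps 125-129 up to 200-204.
lower = list(range(125, 205, 5))
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

pts = polygon_points(lower, freqs, 5)
print(pts[:3])          # [(122.5, 0), (127.5, 2), (132.5, 0)]
print(area_under(pts))  # 270.0, i.e. N x step = 54 x 5
```

The area check illustrates the remark that the whole area of the polygon represents N: measured in score units it is N × step, which is N when one step is taken as the X-unit.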
No general rule can be given for the choice of an X-unit, nor for the choice of the unit taken to represent 1 score on the Y-axis. The length of the diagram and the maximum frequency on any given step (as, for example, the 10 scores on step 185-189) will generally serve to indicate within what practical limits the Y-unit must be selected. After plotting several polygons, the student will soon discover that a too-long Y-unit exaggerates the changes in the distribution from step to step, while a too-short Y-unit makes the graph too flat. In like manner, a too-long X-unit tends to stretch out the polygon, while a too-short X-unit crowds the separate points on the frequency surface and makes comparisons difficult.

The total frequency (N) of the distribution is represented by the area of the polygon, that is, by the area between the boundary or frequency surface and the base line. The area over any given interval cannot be taken as proportional to the number of cases within that interval, however, because of the numerous irregularities in the distribution, and consequently in the frequency surface.

To show the position of the average, median, and mode on the graph, we must first locate these values on the X-axis, and then erect perpendiculars as shown in the diagram. Note that the mode is easily located as the highest point on the frequency surface.

The steps involved in constructing a frequency polygon may be summarized as follows:

1. Draw two straight lines perpendicular to each other, the vertical line near the left side of the paper, the horizontal line near the bottom. Call the vertical line (the Y-axis) OY, and the horizontal line (the X-axis) OX. Put the O where the two lines intersect. This point is called the origin.
2. Lay off the step-intervals of the frequency distribution at regular intervals along the X-axis.
   Begin with the lower limit of the step next below the lowest as the origin, and end with the upper limit of the step next above the highest. Label the successive X-points with the step limits. Select as the X-unit a distance which will permit all of the steps to be represented on the one graph.
3. Mark off on the Y-axis successive unit distances to represent the scores on the different steps. Choose a scale which will permit the maximum frequency to be represented on the graph.
4. From the midpoint of each step-interval on the X-axis, go up in the Y direction a distance equal to the number of scores on the step. Place a point here.
5. Join the points plotted in (4) with straight lines to give the frequency polygon.

2. The Histogram or Column Diagram

A second method of representing a frequency distribution graphically is to construct a histogram or column diagram. This type of graph is illustrated in Diagram III (2), with the same distribution of scores represented by the frequency polygon in Diagram III (1). The two graphs are constructed in much the same way, with this important difference: whereas in a frequency polygon all of the scores within a given interval are represented by the midpoint of that interval, in the histogram the assumption is made that all of the scores within an interval are spread uniformly over the entire interval. For this reason, the measures within any given interval in a histogram are represented by a rectangle constructed with base equal to the length of the step-interval, and altitude equal to the number of measures within the interval. Thus [see Diagram III (2)] the 2 scores on step 125-129 are represented by a rectangle with base equal to the length of the step-interval on the X-axis, and altitude equal to 2 units measured off on the Y-axis. As there are no scores within the next interval, 130-134, no rectangle is drawn here.
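The rectangle rule can be sketched in Python as well: one bar per occupied step, with base equal to the step and altitude equal to the frequency, plus the corner points of the stepped boundary line. This is a sketch only; `histogram_bars` and `histogram_outline` are our own names, and the data are again the 54 scores of Table I.

```python
def histogram_bars(lower_limits, freqs, step):
    """One rectangle per occupied step: (left edge, base, altitude)."""
    return [(lo, step, f) for lo, f in zip(lower_limits, freqs) if f > 0]


def histogram_outline(lower_limits, freqs, step):
    """Corner points of the stepped boundary, starting and ending on the X-axis."""
    pts = [(lower_limits[0], 0)]
    for lo, f in zip(lower_limits, freqs):
        pts += [(lo, f), (lo + step, f)]
    pts.append((lower_limits[-1] + step, 0))
    return pts


lower = list(range(125, 205, 5))
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

bars = histogram_bars(lower, freqs, 5)
print(bars[:2])                        # [(125, 5, 2), (135, 5, 2)]
print(sum(b * h for _, b, h in bars))  # 270: each bar's area is f x step

outline = histogram_outline(lower, freqs, 5)
print(outline[:4])                     # [(125, 0), (125, 2), (130, 2), (130, 0)]
```

Because every bar's area is its frequency times the step, the areas add up to N × step, and each bar's share of that total is exactly its step's share of the cases. Where two adjacent steps have the same frequency (140-144 and 145-149 here), the outline runs straight across both, so the two rectangles read as one with a double base.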
The altitudes of the other rectangles vary with the number of scores on the intervals. When the same number of scores occurs on two (or more) adjacent steps, as in the intervals from 140 up to 145 and from 145 up to 150, the base of the rectangle covers two (or more) intervals on the X-axis. The highest rectangle is, of course, that which has the step 185-189 as its base and 10, the maximum frequency, as its altitude. In selecting scales for the X- and Y-axes, the same considerations as to number of intervals, size of paper, maximum frequency, etc., noted under the frequency polygon, must be observed.

Although in a histogram each step-interval is represented by a separate rectangle, it is not necessary to project the sides of these different rectangles to the base line, as shown in Diagram III (2), since the rise and fall of the boundary line, showing the increase or decrease in the number of scores from step to step, is usually the important fact to be brought out. As in the frequency polygon, the total frequency (N) is represented by the area of the histogram. In contrast to the frequency polygon, however, the area of each rectangle in a histogram is directly proportional to the number of measures in the interval, so that we have in the column diagram an accurate picture of the number of scores falling on each step.

In order to make easier a comparison of the two types of frequency graph, the distribution of Table III is plotted in Diagram IV, on the same coordinate axes, both as a frequency polygon and as a histogram. The increased number of cases and the more symmetrical distribution of scores make both of these graphs more regular in appearance than the graphs of Diagram III.¹

DIAGRAM IV. Plotting of Frequency Polygon and Histogram. [Data from Table III (2).] (Scores, 100 to 144, on the X-axis.)

¹ Other examples of frequency polygons and histograms may be found on page 75.

The question of when to use the frequency polygon and when to use the histogram cannot be answered, unfortunately, by giving a general rule which will cover all cases. The frequency polygon is less exact than the histogram in that it does not represent accurately, i.e., in terms of area, the number of measures on the successive step-intervals. For comparing two or more distributions plotted on the same diagram, however, the frequency polygon is probably the more useful, since the many vertical lines in the histogram often coincide. Both the histogram and the frequency polygon tell the same story, and both are useful in enabling us to show in a graphic fashion whether the scores of a group distribute uniformly over the scale, or whether they pile up at the low or the high end. Not only information with regard to the group but information with regard to the test may thus be secured. If a test is too easy, the scores will fall disproportionately at the high end of the scale; if too hard, at the low end. If the test is neither too hard nor too easy, the scores will tend to be symmetrically distributed, a few individuals scoring high, a few low, and the majority scoring somewhere near the middle of the scale. In this last case, the frequency polygon or histogram approximates the "ideal" or normal frequency distribution (see page 76).

3. The Ogive

The ogive, or cumulative frequency graph, is a third way of representing a frequency distribution by means of a diagram.
Before we can plot an ogive, the scores of the distribution must first be added serially, or cumulated, as shown in Table IX for the two distributions taken from Table II (1 and 2). (These two distributions have already been used to illustrate the frequency polygon and histogram in Diagrams III and IV.) Note that the first two columns in Table IX are exactly the same as in any frequency distribution, but that in the third column the scores have been "accumulated" successively from the low end of the distribution, as described on page 46. The last cumulative score is, of course, equal to N.¹

¹ Cumulative distributions are useful also in telling quickly how many in a group scored above or below a certain point on the scale. In Table IX, for example, we read that 10 men in the group made Alpha scores below 155, 47 below 190, etc.

The two ogives which represent the distributions of Table IX are shown in Diagram V (1 and 2). Consider first the ogive of the 54 Alpha scores shown in (1). The step-intervals of the distribution have been laid off along the X-axis, and a scale running from 0 up to the total number of scores in the distribution (here 54) has been laid off on the Y-axis. It will be remembered that in plotting the frequency polygon the frequency of each step was taken at the midpoint of the step-interval; in constructing an ogive, however, each cumulative frequency must be plotted at the upper limit of the step on which it falls. The first point on the curve, for example, is 2 Y-units (the cumulative frequency on step 125-129) above 130; the second point is 2 Y-units above 135, the third 4 Y-units above 140, and so on to the last point, which is 54 Y-units above 205. The plotted points are joined in order to give the ogive. Note that the curve begins at 125 on the X-axis, and ends at 205, just 54 Y-units above the X-axis.

TABLE IX
Cumulative Frequencies of the Two Distributions in Table II
(For Plotting the Ogives of Diagram V)

(1)                              (2)
Measures    F   Cum. F           Measures    F   Cum. F
200-204     1     54             136-139     3    200
195-199     4     53             132-135     5    197
190-194     2     49             128-131    16    192
185-189    10     47             124-127    23    176
180-184     3     37             120-123    52    153
175-179     8     34             116-119    49    101
170-174     3     26             112-115    27     52
165-169     3     23             108-111    18     25
160-164     4     20             104-107     7      7
155-159     6     16                    N = 200
150-154     4     10
145-149     1      6
140-144     1      5
135-139     2      4
130-134     0      2
125-129     2      2
       N = 54

DIAGRAM V (1). Ogive Curve. Data from Table II (1). (Step-intervals, 125 to 205, on the X-axis; cumulative frequencies on the Y-axis.)

DIAGRAM V (2). Ogive Curve. Data from Table II (2). (Step-intervals, 104 to 140, on the X-axis; cumulative frequencies, 0 to 200, and percentiles, 0 to 100, on the Y-axis.)

Because the sample is small and the distribution of scores unsymmetrical, the ogive in (1) is somewhat jagged in outline. To eliminate such irregularities and to facilitate later computations, we often "smooth" an ogive by sketching in a smooth curve through as many of its points as possible. The dotted line in Diagram V (1) shows the result of this smoothing process. If the sample is large and the measures well distributed, smoothing is often unnecessary [see Diagram V (2)].

The ogive in Diagram V (2) has been plotted from the distribution in Table IX (2), as described above. It offers no new difficulties and need not be considered in any detail. Note that the curve begins at 104, the lower limit of the first step, and ends at 140, the upper limit of the last step on the scale; also that the cumulative F's 7, 25, 52, etc., have all been plotted at the upper limits of their respective step-intervals. This ogive does not require any smoothing, as the distribution which it represents is very symmetrical.
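The points of an ogive follow directly from the Cum. F column: a zero at the lower limit of the first step, then each cumulative frequency at the upper limit of its step. A minimal Python sketch (the helper name `ogive_points` is ours) reproduces the points described for both parts of Table IX.

```python
from itertools import accumulate


def ogive_points(lower_limits, freqs, step):
    """Points of an ogive: each cumulative frequency at its step's upper limit,
    preceded by a zero at the lower limit of the first step."""
    cum = accumulate(freqs)
    return [(lower_limits[0], 0)] + [(lo + step, c) for lo, c in zip(lower_limits, cum)]


# Table IX (2): 200 cases, steps 104-107 up to 136-139.
lower_2 = list(range(104, 140, 4))
freqs_2 = [7, 18, 27, 49, 52, 23, 16, 5, 3]
print(ogive_points(lower_2, freqs_2, 4))

# Table IX (1): 54 Alpha scores, steps 125-129 up to 200-204.
lower_1 = list(range(125, 205, 5))
freqs_1 = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]
pts_1 = ogive_points(lower_1, freqs_1, 5)
print(pts_1[1:4], pts_1[-1])  # [(130, 2), (135, 2), (140, 4)] (205, 54)
```

The first list runs from (104, 0) through (108, 7), (112, 25), (116, 52) to (140, 200), matching the text's description of Diagram V (2).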
The ogive has been less frequently used by workers in experimental psychology and education than either the frequency polygon or the histogram, and is probably somewhat more difficult for the general reader to interpret. It has, however, several distinct advantages. In the first place, unlike the other frequency graphs, the shape of the ogive remains practically the same when the size of the step-interval varies. Furthermore, while frequency polygons and histograms cannot be compared unless the step-intervals are the same, this restriction does not apply to the ogive.

DIAGRAM VI. Another Way of Constructing an Ogive. The Individuals are Arranged in Order Along the Baseline, Each Man's Score Being Marked Off on the Ordinate Above Him. (X-axis: Individuals in Order.)

Probably the chief value of the ogive to the student of mental measurement lies in the relative ease with which percentile values may be calculated from the curve. The method of getting these values is illustrated in Diagram V (1 and 2). First, a perpendicular is erected on the X-axis at the upper limit of the last step-interval and continued until it reaches the curve. (In the first ogive this perpendicular will be erected at 205.) Next, this line between the curve and the X-axis is divided into 10 equal parts (by means of a compass or millimeter rule), and the points of division are labeled 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 (the 100 point lies on the curve, the 0 point on the X-axis). These points are used to locate the 10 decile points in the distribution. To find the second decile, or 20th percentile, for example, we draw a line from the second point, i.e., from 20, parallel to the X-axis, and where this line cuts the curve, drop a perpendicular to the X-axis. This perpendicular locates the 20th percentile on the X-scale. The other percentiles and quartiles may be found in the same way.
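The graphic construction above can be imitated by arithmetic: treat the unsmoothed ogive as straight between its plotted points, find the segment that rises through p per cent of N, and interpolate along it. This is a sketch under that assumption; the helper name `percentile_from_ogive` is ours, and readings from a smoothed curve, such as the dotted line of Diagram V (1), would differ slightly.

```python
from itertools import accumulate


def percentile_from_ogive(lower_limits, freqs, step, p):
    """Read the p-th percentile off an unsmoothed ogive.

    The ogive is taken as straight between its plotted points (cumulative
    frequency at each upper limit). We find the segment that rises through
    p per cent of N and interpolate along it.
    """
    xs = [lower_limits[0]] + [lo + step for lo in lower_limits]
    ys = [0] + list(accumulate(freqs))
    target = p / 100 * ys[-1]
    for x1, y1, x2, y2 in zip(xs, ys, xs[1:], ys[1:]):
        if y2 > y1 and y2 >= target:
            return x1 + (target - y1) / (y2 - y1) * (x2 - x1)
    raise ValueError("p must lie between 0 and 100")


lower_1 = list(range(125, 205, 5))
freqs_1 = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]
for p in (0, 20, 90, 100):
    print(p, round(percentile_from_ogive(lower_1, freqs_1, 5, p), 2))
```

The 20th and 90th percentiles come out as 155.67 and 194, the same as the calculated values of Table VIII (1), because interpolating along a straight segment of the ogive is the same arithmetic as counting off scores within a step. The 0 and 100 points come out at 125 and 205, the theoretical limits of the scale, rather than the actual lowest and highest scores.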
Notice in ogive (1) that the percentile is 125 — theo- retically the lowest score in the distribution — and that the 100th percentile is 205 — theoretically the highest score in the distribution. The student should compare the percentile values obtained from the ogive with the same values as calculated in Table VIII (1). Due to the greater smoothness of the curve, the GRAPHIC METHODS AND THE NORMAL CURVE 71 percentiles obtained from ogive (2) will be more accurate than those got from the ogive (1). The accuracy with which we are able to obtain the percentiles graphically will depend, in general, on the accuracy with which the points of the curve have been plotted, the fine- ness of the scale, the number of cases, and the symmetry of the distribution. Another way of constructing an ogive is shown in Diagram VI, with the data of Table IX (1). Imagine the 54 individuals in the distribution arranged along the baseline according to the size of their scores, the score of each man being marked off on the ordinate above him. When these points are joined by straight lines, we have a series of rectangles of the histogram type, the base of each rectangle representing the number of men making the given score, the height of each rectangle representing the size of the score. A smooth curve may be sketched through (or as near as possible to) the midpoint of the upper base of each rectangle — as shown in the diagram — to give an ogive curve. From this ogive, percentiles may be easily found. To get the median, for example, we erect a per- pendicular at 27 ( -d- J on the X-axis, and draw a line through the point where this perpendicular cuts the curve parallel to the X-axis to locate the median approximately at 175 on the F-scale. The quartiles and the percentile points may be found in exactly the same manner. II. 
Other Uses of Graphical Methods: the Comparative Line Graph

Many problems in mental measurement, especially those which involve the measurement of changes attributable to growth, learning, practice, etc., readily lend themselves to graphical treatment. Diagram VII illustrates several such problems, in which the data are represented by "line graphs." As in all graphs hitherto considered, the measures are plotted with reference to the coordinate axes, OY and OX, the coordinates of a plotted point being its abscissa, or X-distance, and its ordinate, or Y-distance.

DIAGRAM VII. Comparative Line Graphs.
Fig. 1. Logical memory. Age is represented on the X-line (horizontal); score, e.g., number of ideas remembered, on the Y-line (vertical). (After Pyle.)
Fig. 2. Improvement in telegraphy. Weeks of practice on the X-line; number of letters per minute on the Y-line. (After Bryan and Harter.)
Fig. 3. Hand dynamometer readings in kilograms for 25 successive grips at intervals of 10 seconds. Two subjects, a man and a woman.
Fig. 4. Curve of forgetting. The numbers on the baseline give hours elapsed from the time of learning; the numbers along the Y-axis give per cent retained. (After Ebbinghaus.)

Figure 1 illustrates the "age" or "growth" curve. It represents the growth in logical memory (for a connected passage) in boys and girls from 8 to 18 years old. Figure 2 illustrates the "learning" or "practice" curve. It shows the improvement in sending and receiving telegraphic messages, resulting from successive trials at the same task over a period of weeks. Improvement is measured in terms of the number of letters sent or received per minute. Figure 3 is a "performance" or "practice" curve. It represents 25 successive trials with the hand dynamometer by one man and one woman. Note that the successive trials are laid off on the X-axis, and the strength of grip (in kgs.) on the Y-axis. Graphs like these are useful in enabling us to compare individuals or groups at various stages in the test or performance. They also enable us to study the effect of fatigue with successive trials. Figure 4 shows the well-known "curve of forgetting" (or retention). It represents memory retention, as measured by the percentage of the original material retained after the passage of different time intervals. The time intervals between learning and relearning are laid off on the X-axis; the per cent retained, as shown by the relearning, on the Y-axis.

III. The Normal Probability Curve

In Diagram VIII are shown four graphs (two frequency polygons and two histograms) which represent frequency distributions of data drawn from anthropometry, psychology, and meteorology. It is at once apparent that all of these graphs have the same general form: the measures are concentrated closely around the center, and taper off from the central high point, or crest, equally to right and left. In general we find relatively few measures at the "low" score end of the scale, an increasing number up to a maximum at the middle position, and a progressive falling off as we go toward the "high" score end of the scale. If we divide the area under each curve (the area between the curve and the X-axis) by a line drawn perpendicularly through the central high point to the baseline, the two parts will be practically similar in form and equal in area. This results from the fact that each curve shows almost perfect bilateral symmetry.
The perfectly symmetrical curve, or frequency surface, to which all of the figures in Diagram VIII approximate, is shown in Diagram IX. This bell-shaped curve is called the Normal Probability Curve, or simply the Normal Curve, and is of the greatest value in psychological measurement. An understanding of its characteristics is essential to the student of experimental psychology and measurement; consequently the rest of this chapter will be concerned with the study of the properties and uses of the Normal Curve.

DIAGRAM VIII. Samples of Frequency Distributions Drawn from Different Fields: stature of adult males in Britain, in inches (after Yule); intelligence (IQ); memory span; temperature.

1. Elementary Principles of Probability. The Derivation and Construction of the Probability Curve

Perhaps the simplest approach to an understanding of the Normal Curve is through a consideration of the elementary facts of probability. As used in statistics, the "probability" of the occurrence of an event may be defined as the expected relative frequency of occurrence of the given event in a very large (infinite) number of observations. This expected relative frequency of occurrence may be based upon a knowledge of the conditions determining the probable occurrence, as in dice throwing or coin tossing, or upon empirical data, as in mental and social measurements.

DIAGRAM IX. Normal Probability Curve. (Baseline marked on a sigma scale from −3σ to +3σ about the mean, and in PE units from −4PE to +4PE; the middle 50% of the area lies between −1PE and +1PE, the middle 68.26% between −1σ and +1σ.)
The probability of an event may be stated most simply, perhaps, as a ratio, as when we say that the probability of a coin falling heads is 1/2, or that of a die showing a two spot is 1/6. This ratio, called the "probability ratio," may be defined as the fraction whose numerator equals the number of expected (favorable) outcomes and whose denominator equals the total number of possible outcomes. Such a ratio always falls between the limits 0 (impossibility of occurrence) and 1.00 (certainty of occurrence). Thus the probability that the sky will fall is 0; that an individual now living will some day die is 1.00. Between these limits lie all possible degrees of probability expressed by the probability ratio.

Let us now apply these simple principles of probability to the specific case of what happens when we toss coins (coin tossing and dice throwing furnish simple and often-used illustrations of the laws of chance). If we toss one coin, obviously it must fall either heads (H) or tails (T) 100% of the time, and a head or a tail is equally probable. Expressed as a ratio, the probability of an H is 1/2; of a T, 1/2; and of (H + T), 1/2 + 1/2 = 1.00. Again, if we toss two coins, (a) and (b), at the same time, there are 4 possible arrangements which the coins may take:

      (1)     (2)     (3)     (4)
      a b     a b     a b     a b
      H H     H T     T H     T T

That is, both coins (a) and (b) may fall H; (a) may fall H and (b) T; (b) may fall H and (a) T; or both coins may fall T. Expressed as a probability ratio, the chances of 2 heads are 1/4; of one head and one tail, 2 × 1/4 or 1/2; of 2 tails, 1/4. Let us go a step further and increase the number of coins to three.
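Listing the arrangements by hand quickly becomes tedious as coins are added. The minimal Python sketch below (the function name is hypothetical) enumerates every equally likely arrangement of n coins and counts heads; it reproduces the ratios given above for one and two coins, and those for three coins that follow.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_count_probabilities(n_coins):
    """List every equally likely arrangement of n coins and return the
    probability ratio for each possible number of heads (most heads first)."""
    arrangements = list(product("HT", repeat=n_coins))  # e.g. ('H', 'T') for two coins
    counts = Counter(arrangement.count("H") for arrangement in arrangements)
    total = len(arrangements)
    return {heads: Fraction(counts[heads], total) for heads in range(n_coins, -1, -1)}


for n in (1, 2, 3):
    print(n, {h: str(r) for h, r in head_count_probabilities(n).items()})
```

Exact fractions are used so that the ratios print as 1/4, 1/2, 3/8, etc., just as they are written in the text.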
If we toss three coins, (a), (b), and (c), simultaneously, there are 8 possible outcomes:

      (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
      abc   abc   abc   abc   abc   abc   abc   abc
      HHH   HHT   HTH   HTT   THH   THT   TTH   TTT

Expressed as a ratio, the chances of 3 heads are 1/8 (combination 1); of 2 heads and 1 tail, 3/8 (combinations 2, 3, and 5); of 1 head and 2 tails, 3/8 (combinations 4, 6, and 7); and of 3 tails, 1/8 (combination 8). In exactly the same way we can figure the probability of different combinations when we have 4, 5, or any number of coins.

These probable outcomes may be secured in a much simpler way than by listing all of the possible combinations as shown above. If there are two independent events, the probability of the occurrence or non-occurrence of each being the same (as in the probability of a coin falling heads or tails), the "compound" probabilities may be found by the expansion of the binomial $(p+q)^2$, in which $p$ equals the probability of the event's happening, $q$ the probability of its not happening, and the exponent 2 indicates the number of events. Now if we substitute H for $p$ and T for $q$ (tails = non-heads), we have for two coins $(H+T)^2$; and squaring the binomial,

$$(H+T)^2 = H^2 + 2HT + T^2.$$

This expansion may be written:

    1 H²    1 chance in 4 of 2 heads;              probability ratio = 1/4
    2 HT    2 chances in 4 of 1 head and 1 tail;   probability ratio = 1/2
    1 T²    1 chance in 4 of 2 tails;              probability ratio = 1/4
    Total = 4

Note that these results are identical with those obtained above by listing the possible outcomes when two coins are tossed. If we have three independent events, the expression $(p+q)^3$ becomes, for three coins, $(H+T)^3$.
Expanding this binomial, we get

$$(H+T)^3 = H^3 + 3H^2T + 3HT^2 + T^3,$$

which may be written:

    1 H³     1 chance in 8 of 3 heads;               probability ratio = 1/8
    3 H²T    3 chances in 8 of 2 heads and 1 tail;   probability ratio = 3/8
    3 HT²    3 chances in 8 of 1 head and 2 tails;   probability ratio = 3/8
    1 T³     1 chance in 8 of 3 tails;               probability ratio = 1/8
    Total = 8

Again these results are identical with those obtained by listing the possible outcomes of tossing three coins.

The binomial expansion may be applied more generally to the case in which there are any number of independent events, just so long as the probability of occurrence or non-occurrence is the same for each separate event. Thus if we toss 10 coins simultaneously, we have by analogy with the above $(p+q)^{10}$, which equals $(H+T)^{10}$, putting H for the probability of a head, T for the probability of a non-head (tail), and 10 for the number of coins tossed. When the expression $(H+T)^{10}$ is expanded, we have¹

$$H^{10} + 10H^9T + 45H^8T^2 + 120H^7T^3 + 210H^6T^4 + 252H^5T^5 + 210H^4T^6 + 120H^3T^7 + 45H^2T^8 + 10HT^9 + T^{10},$$

which may be summarized as follows:

                                                                Probability
                                                                   Ratio
      1 H¹⁰       1 chance in 1024 of all coins falling heads       1/1024
     10 H⁹T      10 chances in 1024 of 9 heads and 1 tail          10/1024
     45 H⁸T²     45 chances in 1024 of 8 heads and 2 tails         45/1024
    120 H⁷T³    120 chances in 1024 of 7 heads and 3 tails        120/1024
    210 H⁶T⁴    210 chances in 1024 of 6 heads and 4 tails        210/1024
    252 H⁵T⁵    252 chances in 1024 of 5 heads and 5 tails        252/1024
    210 H⁴T⁶    210 chances in 1024 of 4 heads and 6 tails        210/1024
    120 H³T⁷    120 chances in 1024 of 3 heads and 7 tails        120/1024
     45 H²T⁸     45 chances in 1024 of 2 heads and 8 tails         45/1024
     10 HT⁹      10 chances in 1024 of 1 head and 9 tails          10/1024
      1 T¹⁰       1 chance in 1024 of all coins falling tails       1/1024
    Total = 1024

These results are represented graphically in Diagram X by a histogram and a frequency polygon plotted on the same axes.
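The coefficients of the expansion need not be taken on faith: the coefficient of the term with k tails is the number of ways of choosing which k of the n coins fall tails, the binomial coefficient C(n, k). A short Python sketch using the standard-library function `math.comb` (Python 3.8 or later); the helper name `binomial_terms` is hypothetical.

```python
from math import comb


def binomial_terms(n):
    """Coefficients of (H + T)**n, from the H**n term down to the T**n term.

    The coefficient of H**(n-k) T**k is comb(n, k): the number of ways of
    choosing which k of the n coins fall tails. The coefficients total 2**n.
    """
    return [comb(n, k) for k in range(n + 1)]


terms = binomial_terms(10)
total = sum(terms)  # 1024 equally likely arrangements of 10 coins
for k, c in enumerate(terms):
    print(f"{c:>4}  {10 - k:>2} heads, {k:>2} tails   probability ratio {c}/{total}")
```

The same function gives the rows of the two- and three-coin tables above with `binomial_terms(2)` and `binomial_terms(3)`.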
The eleven terms of the expansion have been laid off at equal distances on the X-axis, and the chances of the occurrence of each combination of H's and T's plotted as scores on the Y-axis. The result is a symmetrical probability curve, with the greatest concentration in the center, and the "scores" (the chances) falling away by corresponding decrements above and below the central point. Diagram X represents the results which we should expect to get theoretically by tossing 10 coins 1024 times.

¹ The reader may take this expansion on faith; or he may refer to the chapter on Binomials in any elementary algebra.

DIAGRAM X. Probability Surface Obtained from the Expansion of $(H+T)^{10}$. (Histogram and frequency polygon; the eleven terms, $H^{10}$ through $T^{10}$, on the X-axis, the chances on the Y-axis.)

Many experiments have been made for the purpose of checking the theoretical against the actual results, by tossing coins or throwing dice a great many times. In one well-known experiment² 12 dice were thrown 4096 times, each 4, 5, and 6 spot being taken as a "success" and each 1, 2, and 3 spot as a "failure." For example, in a throw of 3, 1, 2, 6, 4, 6, 3, 4, 1, 5, 2, 3, there would be 5 successes. The observed frequencies of the different numbers of successes and the theoretical results secured from the binomial expansion have been plotted on the same axes in Diagram XI. The reader will note how closely the observed frequencies check the theoretical: the two polygons are very nearly identical. If the reader should care to verify the results of Diagram X by tossing 10 coins 1024 times, he will find his empirical results closely in accord with the theoretical expectations.

² Yule, G. Udny, An Introduction to the Theory of Statistics, 5th edition, 1919, p. 258.

2.
Why the Probability Curve is Employed in Psychological Measurement

The frequency curve plotted in Diagram X from the expansion of the expression $(H+T)^{10}$ is a symmetrical 10-sided polygon. If the number of factors (e.g., coins) is increased from 10 to 20, to 30, and then to 40 (the baseline extent remaining the same), the number of sides of the polygon will increase from 10 to 20, to 30, to 40. With each increase in the number of factors, the points on the curve will move more and more closely together, until finally, when the number of factors becomes very large (when $n$ in the expression $(p+q)^n$ becomes infinite), the polygon will become a perfectly smooth curve like the one in Diagram IX. The "ideal" polygon or normal curve, therefore, may be said to represent the relative frequency of occurrence of various combinations of a very large number of equal, similar, and independent factors, when the chances of the occurrence or non-occurrence of each factor are the same.

DIAGRAM XI. Comparison of Observed and Theoretical Results in Throwing 12 Dice 4096 Times. (Theoretical curve and actual curve. After Yule, page 258.)

If now we compare the frequency curve in Diagram IX with the four graphs plotted from actual data obtained in measurements of height, intelligence (IQ), memory span, and temperature (see Diagram VIII), the similarity of these graphs to the normal curve, as noted above, is clearly evident. In other words, these distributions of variable phenomena act as though they were determined by the operation of factors which are present or absent according to the same laws which govern the combinations of coins and dice. This is found to be true of many other distributions as well, so that the general tendency of quantitative data to follow the normal probability curve is often called the "law of normal frequency."
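Both the theoretical curve of Diagram XI and the smoothing of the polygon as the number of factors grows can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, the "observed" frequencies come from Python's pseudo-random generator rather than from real dice (so they will not match Yule's figures), and measuring closeness by the largest difference between each binomial chance and the normal-curve ordinate of the same mean (n/2) and SD (√n/2) is our own yardstick, not a method given in the text. With 12 dice thrown 4096 = 2¹² times and a chance of 1/2 of success on each die, the expected frequencies come out exactly equal to the binomial coefficients of $(p+q)^{12}$.

```python
import random
from math import comb, exp, pi, sqrt


def expected_frequencies(n, p, trials):
    """Expected number of trials giving k successes (k = 0..n) out of n
    independent factors, each succeeding with probability p."""
    q = 1 - p
    return [trials * comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]


def throw_dice(n_dice, throws, seed=0):
    """Simulate `throws` throws of n_dice dice, counting a 4, 5, or 6 as a
    success; return the observed frequency of 0..n_dice successes."""
    rng = random.Random(seed)
    counts = [0] * (n_dice + 1)
    for _ in range(throws):
        successes = sum(rng.randint(1, 6) >= 4 for _ in range(n_dice))
        counts[successes] += 1
    return counts


def normal_ordinate(x, mean, sd):
    """Height of the normal curve with the given mean and SD at x."""
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))


def largest_gap(n):
    """Largest difference between the chances of 0..n heads with n coins and
    the normal-curve ordinates of the same mean (n/2) and SD (sqrt(n)/2)."""
    chances = expected_frequencies(n, 0.5, 1)
    mean, sd = n / 2, sqrt(n) / 2
    return max(abs(c - normal_ordinate(k, mean, sd)) for k, c in enumerate(chances))


# Diagram XI: 12 dice thrown 4096 times (success probability 1/2 per die)
theory = expected_frequencies(12, 0.5, 4096)
observed = throw_dice(12, 4096)
for k, (t, o) in enumerate(zip(theory, observed)):
    print(f"{k:>2} successes: expected {t:6.1f}, simulated {o:4d}")

# The polygon hugs the normal curve more closely as the number of factors grows
for n in (10, 20, 40):
    print(n, round(largest_gap(n), 5))
```

Changing `seed` gives a fresh "experiment"; the simulated polygon wanders around the theoretical one from run to run, much as an actual series of throws would.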
Stated briefly, the law of normal frequency is as follows: measurements of natural phenomena, as well as measurements of mental and social traits, tend to be distributed symmetrically about their central tendency in proportions which are determined by the laws of chance. The reason why frequency distributions of variable phenomena are similar to chance distributions obtained from tossing coins or throwing dice is that the former, like the latter, are probably due very often to the operation of the laws of chance. "Chance" may be defined as the result obtained from the operation of a great many factors, none of which is dominant, or, put in another way, all of which are (relatively) similar, equal, and independent. A number of small factors, for example, determine whether a coin will fall heads or tails, or whether a die will show a 2, 3, or 6 spot: the twist of the wrist, the height from which the coin or die is thrown, the weight or size of the coin or die, the kind of floor on which the experiment is made, and many others.¹ In like manner a man's height, or his weight, or the shape of his head, or his intelligence, or his eye color is determined, very probably, by a large number of factors which have approximately the same influence on the final result. (Note: Should one or more of these factors have special weight, the distribution will no longer be of the probability type, but will be skewed or shifted over towards the upper or the lower end of the scale. The question of "skewness" will be considered on page 86.)

¹ See Jerome, Harry, Statistical Methods, 1924, pp. 169-170.

Experiments have shown that the normal probability curve serves to describe the frequency of occurrence of many variable facts with a relatively high degree of accuracy. Some of these distributions have already been shown in Diagram VIII. Important facts which give normal, or approximately normal, distributions may be classified as follows:²

1. Biological statistics: the proportions of male to female births for the same country or community over a period of years; the proportion of different types of plants and animals in cross-fertilization (the Mendelian ratios).
2. Anthropometrical statistics: height, weight, cephalic index, etc., for large groups of the same age and sex.
3. Social and economic statistics: rates of birth, marriage, or death, under uniform conditions; wages and output of large numbers of workers under like conditions and in the same occupation; labor costs, prices, etc.
4. Psychological measurements: intelligence as measured by standard tests; speed of association, perception, reaction time, etc.; educational test scores, e.g., in spelling, arithmetic, reading.
5. Errors of observation: measures of height, speed of movement, magnitudes, physical and mental traits, etc., contain errors which are as likely to cause them to lie above as below the true value. Such errors follow the normal probability curve. (This topic is treated in Chapter III.)

² Jones, D. Caradog, A First Course in Statistics, 1921, p. 233.

The normal curve is often called the normal probability curve because it gives the theoretical probabilities of the occurrence of chance phenomena. It is also called the normal frequency curve because frequency distributions of actual data obtained from the measurement of many variable facts are normal or nearly so. Finally, it is called the "curve of error" because when repeated measurements have been made of such variables as height, linear magnitudes, time and extent of movement, reaction time, etc., the separate measures tend to diverge from the "true" measure (or standard) by amounts which, when plotted, give the characteristic probability curve (see Chapter III). We may conclude this discussion of the normal curve with a word of caution.
Despite the similarity of actual and chance distributions, the student must be careful not to conclude from this analogy that mental and physical traits are always (or necessarily) due to the operation of equal, similar, and independent factors governed entirely by chance. The factors which determine, say, musical ability or intelligence are too little known to warrant the assumption, a priori, that they operate in the same manner, and in accordance with the same laws, as those factors which give chance distributions of coins or dice. The selection of the normal curve, rather than some other type of curve, is, after all, sufficiently justified by the fact that it does generally fit the data better. However, "the theoretical justification and the empirical use of the curve are two quite different matters."¹

¹ Jones, D. Caradog, ibid., p. 233.

3. Important Properties of the Normal Frequency Curve

In the normal frequency curve, the average, the median, and the mode all fall exactly at the midpoint of the distribution, and hence are numerically equal. This follows from the fact that the normal probability curve is perfectly symmetrical bilaterally, and in consequence all of the measures of central tendency must fall at the middle of the curve. Also, in the normal curve, the measures of variability include certain constant fractional amounts of the total area of the curve, as follows (see Diagram IX):

1. If the SD is laid off in the plus and minus directions from the mean (to right and left) along the baseline, and if perpendiculars are erected at these points, the area included by the perpendiculars, the baseline, and the curve itself contains the middle 68.26% of the total area under the curve. Stated briefly, between the mean and ±1σ are found the middle 2/3 (approximately) of the cases in the normal distribution.
2. If the AD is laid off in the plus and minus directions from the mean along the baseline, and if perpendiculars are erected at these points, the area included by the perpendiculars, the baseline, and the curve contains the middle 57.5% of the total area. Put briefly, between the mean and ±1AD will be found the middle 57.5% of the cases in the distribution.
3. If the PE is laid off in the plus and minus directions from the mean along the baseline, and if perpendiculars are erected at these points, the area included by the perpendiculars, the baseline, and the curve contains the middle 50% of the area. Since the PE (equivalent to the Q in a normal distribution) equals 1/2 the distance between the 75th and 25th percentiles, in a perfectly symmetrical distribution it marks off the 25% of the area directly above and the 25% directly below the mean, that is, the middle 50% of the measures.

Certain constant relations will be found to obtain among the measures of variability. These are easily derived from the per cents of area included by each:

1. $PE = .6745\,\sigma$
2. $PE = .8453\,AD$
3. $\sigma = 1.4826\,PE$
4. $\sigma = 1.2533\,AD$
5. $AD = .7979\,\sigma$
6. $AD = 1.1829\,PE$

The first of these relations is the only one used often enough to warrant its being memorized. From these equations it should now be evident why it was stated earlier (page 27) that the σ is always greater than the AD, which is, in turn, always greater than the Q (PE).

4. The Measurement of Skewness

In a frequency polygon or histogram, usually the first thing which strikes the eye is the symmetry or, what is more often the case, the lack of symmetry in the figure. In the normal curve the mean, the median, and the mode all coincide, and there is a perfect balance or symmetry between the right and left halves of the figure.
In a "skewed" distribution, on the other hand, the mean, the median, and the mode fall at different points in the distribution, and the balance (or center of gravity) is thrown to one side or the other, to right or left. The degree of displacement or skewness is measured by the formula

$$\text{Skewness} = \frac{3\,(\text{Mean} - \text{Median})}{\sigma} \qquad (11)$$

and in the normal distribution, since the mean equals the median, the skewness is 0. The more nearly the distribution approaches the normal type, the closer together are the mean and the median, and the less the skewness.

If we apply formula (11) to the distribution of 54 Army Alpha scores in Table I, we get −.66 as the measure of skewness. Distributions like this one are said to be skewed negatively, or to the left: the scores are massed at the high end of the scale (the right end), and spread out gradually at the low or left end, as shown in Diagram XII. Distributions are skewed positively, or to the right, when the scores are massed at the low (the left) end of the scale, and spread out gradually at the high or right end (see Diagram XIII). Formula (11) gives the measure of skewness of the distribution of 200 cancellation scores in Table II (2) as +.003. This indicates a very low degree of positive skewness, and shows how very closely this distribution approaches the probability type.

DIAGRAM XII. Negative Skewness: To the Left. (Median and average marked on the baseline.)

DIAGRAM XIII. Positive Skewness: To the Right. (Median marked on the baseline.)

There are several reasons why distributions are skewed. In the first place, we should hardly expect the distribution of IQ's obtained from a group of 25 eight-year-old boys to be normal, nor the distribution of IQ's obtained from a special class for the dull and feebleminded, even though the latter group were large. The small size of the group in the first case, and "special selection"¹ in the second, are sufficient causes of skewness.² Again, technical faults in the construction of the test, errors in scoring, and the like may often produce skewness in a distribution of test scores.

¹ A "selected" group is one which is not representative of the larger group from which it is drawn.
² For an illustration of skewness due to both of these causes, see the distribution of Table I.

In addition to these more obvious causes, skewness also results, oftentimes, from a real lack of "normality" in the data.¹ This condition arises when several of the factors determining a given result are dominant or prepotent and hence are present more often than chance would allow (see page 83). A simple illustration of this will be found in those distributions which result from the throwing of loaded dice. When dice of this sort are thrown, the resulting distributions will always be skewed, owing to the greater "potency" of the heavier faces. Again, to take an illustration from real data, the graph representing the chances of death is considerably skewed (being higher in infancy and old age than in youth or middle age) because of the difference in number and importance of the "causes of death" at certain ages. One other illustration may be taken, this time from the field of tests. If an arithmetic test which involves only the four fundamental operations is given to 1000 eighth-grade children, there will be a piling up of the scores towards the high score end of the distribution: a negative skewness. On the other hand, if the test contains only problems in fractions, square root, interest, etc., there will be a piling up of the scores (or at least a shift in the peak of the curve) towards the low score end of the scale: a positive skewness. These results may be explained in terms of the small positive and negative factors which produce the probability curve.
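Formula (11) is easily applied to a list of raw scores. A minimal Python sketch, assuming ungrouped scores and σ computed about the mean with divisor N (`statistics.pstdev`); the text works from grouped frequency distributions, so results on real data will differ slightly from grouped calculations. The function name and the score lists are hypothetical, not the data of Tables I or II.

```python
from statistics import mean, median, pstdev


def skewness(scores):
    """Formula (11): 3 (Mean - Median) / sigma, with sigma the SD of the
    scores taken about their mean and divided by N."""
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("all scores are equal; skewness is undefined")
    return 3 * (mean(scores) - median(scores)) / sigma


# Hypothetical ungrouped scores
print(round(skewness([1, 2, 3, 4, 5]), 3))    # symmetrical
print(round(skewness([1, 8, 9, 9, 10]), 3))   # massed at the high end
print(round(skewness([1, 1, 2, 2, 10]), 3))   # massed at the low end
```

A negative result corresponds to the negative skewness of Diagram XII (mean pulled below the median by the long low tail), a positive result to Diagram XIII.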
Too easy a test excludes from operation some of the factors which make for an extension of the curve at the upper end, such as a knowledge of more advanced arithmetical relations, which the brighter children would know. Too hard a test excludes from operation factors which make for the extension of the curve at the lower end, such as a knowledge of very simple facts which would permit the answering of a few, at least, of the questions had these been included. In the one case we have a number of perfect scores, and little discrimination; in the second case a number of zero scores, and equally poor discrimination.

¹ Theoretically, there is no real reason why distributions should always be normal. Thorndike has written: "There is nothing arbitrary or mysterious about variability which makes the so-called normal type of distribution a necessity, or any more rational than any other sort, or even more to be expected on a priori grounds. Nature does not abhor irregular distributions." (Mental and Social Measurements, pp. 88-89.)

IV. Some Practical Applications of the Normal Curve

The entire area under any frequency curve represents the total number of frequencies in the distribution (see page 62). If we know the total area of the curve, therefore, and in addition the proportion of the total area in a given segment, it is possible to compute very simply the frequency represented by the segment. This information in regard to the normal curve is given in Tables X and XI, from which the theoretical frequency of any fractional part of the probability curve may be easily obtained. Acquaintance with these tables is extremely valuable in the solution of a large number of varied problems. For this reason, before considering any problems which depend for their solution on the assumption of the normal distribution, it is very desirable that the construction and use of Tables X and XI be clearly understood.

1.
The Construction and Use of Tables X and XI

Table X gives the fractional parts of the total area under the normal curve found between the mean and ordinates erected at various distances from the mean, such distances measured in σ units.¹ The total area of the curve (the number of cases in the distribution) is taken arbitrarily as 10,000 because of the greater ease with which fractional parts of the area may then be calculated. The first column of the table, x/σ, gives the distances in tenths of σ measured off on the baseline from the mean as the point of origin; distances in hundredths of σ are given by the headings of the columns. To find the number of cases in a normal distribution between the mean and the ordinate erected at a distance of 1σ from the mean, we go down the x/σ column until 1.0 is reached, and in the next column, under .00, take the entry opposite 1.0, viz., 3413. This figure means that there are 3413 cases in 10,000, or 34.13% of the entire area of the curve, between the mean and 1σ; or, put more exactly, 34.13% of the cases in the normal distribution fall within the interval bounded by the baseline, the Y-ordinate erected at the mean, the Y-ordinate erected at a distance of 1σ from the mean, and the curve itself (see Diagram IX for illustration). To find the per cent of the distribution between the mean and 1.57σ, we go down the x/σ column to 1.5, then across horizontally to the column headed .07, and take the entry 4418. This means that in a normal distribution, 44.18% of the entire distribution falls between the mean and 1.57σ. Thus far we have considered only σ distances measured in the positive direction from the mean; that is, we have taken account only of the right half (the high score end) of the normal curve.

¹ Table X should be studied in conjunction with Diagram IX.
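The entries of Table X can be reproduced from the cumulative normal distribution: the entry for x/σ is 10,000 times the area from the mean out to that point. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8 or later); the helper name `table_x_entry` is hypothetical.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def table_x_entry(x_over_sigma, total=10_000):
    """Cases (out of `total`) between the mean and a point x/sigma sigma-units
    away. The curve is symmetrical, so a negative distance gives the same
    entry as the corresponding positive one."""
    return total * (UNIT_NORMAL.cdf(abs(x_over_sigma)) - 0.5)


for z in (1.0, 1.3, 1.57):
    print(f"x/sigma = {z:.2f}: {table_x_entry(z):.0f}")
```

Rounded to whole cases, these agree with the table entries quoted in the text (3413, 4032, 4418); computing the values directly also removes the need to interpolate between columns.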
Since the curve is bilaterally symmetrical, however, the entries in Table X may be used for distances measured in the negative direction (to the left of the mean) as well as in the positive direction. Accordingly, to find the per cent of the distribution between the mean and −1.26σ, we simply take the entry 3962 in the table: the entry in the column headed .06, opposite 1.2 in the x/σ column. This means that 39.62% of the cases in the distribution fall between the mean and −1.26σ. In the same way, the percentage of cases between the mean and −1.00σ is found to be 34.13; and the student will now be able to verify the statement made on page 85 that 68.26% of the cases in the normal distribution lie between the mean and ±1.00σ.

While theoretically the normal curve meets the baseline only at infinite distances to the right and left of the mean, for practical purposes the curve may be taken to end at points −3σ and +3σ from the mean. We find from Table X, for example, that 4986.5 cases in the total 10,000 fall between the mean and +3σ; and 4986.5 cases will, of course, fall between the mean and −3σ also. Therefore, since 9973 cases in 10,000, or 99.73% of the distribution, fall within the limits set by −3σ and +3σ, by cutting off the curve at these two points we disregard only .27 of 1% of the distribution, a negligible amount except in very large samples.

TABLE X

Fractional Parts of the Total Area (Taken as 10,000) under the Normal Probability Curve, Corresponding to Distances on the Baseline between the Mean and Successive Points Laid off from the Mean in Units of Standard Deviation. Example: between the mean and a point 1.3σ (x/σ = 1.3) is found 40.32% of the entire area under the curve.

x/σ   .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   0000   0040   0080   0120   0160   0199   0239   0279   0319   0359
0.1   0398   0438   0478   0517   0557   0596   0636   0675   0714   0753
0.2   0793   0832   0871   0910   0948   0987   1026   1064   1103   1141
0.3   1179   1217   1255   1293   1331   1368   1406   1443   1480   1517
0.4   1554   1591   1628   1664   1700   1736   1772   1808   1844   1879
0.5   1915   1950   1985   2019   2054   2088   2123   2157   2190   2224
0.6   2257   2291   2324   2357   2389   2422   2454   2486   2517   2549
0.7   2580   2611   2642   2673   2704   2734   2764   2794   2823   2852
0.8   2881   2910   2939   2967   2995   3023   3051   3078   3106   3133
0.9   3159   3186   3212   3238   3264   3290   3315   3340   3365   3389
1.0   3413   3438   3461   3485   3508   3531   3554   3577   3599   3621
1.1   3643   3665   3686   3708   3729   3749   3770   3790   3810   3830
1.2   3849   3869   3888   3907   3925   3944   3962   3980   3997   4015
1.3   4032   4049   4066   4082   4099   4115   4131   4147   4162   4177
1.4   4192   4207   4222   4236   4251   4265   4279   4292   4306   4319
1.5   4332   4345   4357   4370   4383   4394   4406   4418   4429   4441
1.6   4452   4463   4474   4484   4495   4505   4515   4525   4535   4545
1.7   4554   4564   4573   4582   4591   4599   4608   4616   4625   4633
1.8   4641   4649   4656   4664   4671   4678   4686   4693   4699   4706
1.9   4713   4719   4726   4732   4738   4744   4750   4756   4761   4767
2.0   4772   4778   4783   4788   4793   4798   4803   4808   4812   4817
2.1   4821   4826   4830   4834   4838   4842   4846   4850   4854   4857
2.2   4861   4864   4868   4871   4875   4878   4881   4884   4887   4890
2.3   4893   4896   4898   4901   4904   4906   4909   4911   4913   4916
2.4   4918   4920   4922   4925   4927   4929   4931   4932   4934   4936
2.5   4938   4940   4941   4943   4945   4946   4948   4949   4951   4952
2.6   4953   4955   4956   4957   4959   4960   4961   4962   4963   4964
2.7   4965   4966   4967   4968   4969   4970   4971   4972   4973   4974
2.8   4974   4975   4976   4977   4977   4978   4979   4979   4980   4981
2.9   4981   4982   4982   4983   4984   4984   4985   4985   4986   4986
3.0   4986.5 4986.9 4987.4 4987.8 4988.2 4988.6 4988.9 4989.3 4989.7 4990.0
3.1   4990.3 4990.6 4991.0 4991.3 4991.6 4991.8 4992.1 4992.4 4992.6 4992.9
3.2   4993.129
3.3   4995.166
3.4   4996.631
3.5   4997.674
3.6   4998.409
3.7   4998.922
3.8   4999.277
3.9   4999.519
4.0   4999.683
4.5   4999.966
5.0   4999.997133
(From 3.2 onward only the .00 entry is given.)

From: Tables for Statisticians and Biometricians, edited by Karl Pearson, Cambridge University Press.

Instead of σ, the PE may be used as the unit of measurement in determining the theoretical frequencies within given intervals of the normal curve. Table XI gives the fractional parts of the total area under the normal curve found between the mean and ordinates erected at various PE distances from the mean. The table is read in exactly the same way as Table X. To find, for instance, the number of cases between the mean and 1PE (or, more accurately, the ordinate erected at that point), we go down the x/PE column to 1.0 and in the next column, under .00, read 2500. Twenty-five per cent of the cases in the distribution, therefore, lie between the mean and 1PE. In like manner 25% of the cases lie between the mean and −1PE; hence it is clear that the middle 50% of the distribution is contained between −1PE and +1PE measured off from the mean. This table does not read in as fine units as Table X, only tenths and half-tenths (.05) of a PE being given. If smaller divisions are desired, however, interpolation can be made. Just as it is customary to disregard that part of the curve beyond the limits ±3σ, so we ordinarily disregard that part of the curve beyond the limits ±4PE. This is done because 9930 cases (4965 × 2) in the total 10,000 fall between the mean and ±4PE (see Table XI). Hence, in cutting off the curve at +4PE and −4PE, we disregard only .70 of 1% of the cases in the distribution.
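The entries of Tables X and XI can also be reproduced from the normal distribution function itself, which is a convenient check on a table reading. The sketch below is an illustration under stated assumptions, not part of the original method: it uses Python's standard-library `statistics.NormalDist` (Python 3.8 or later), takes one PE as the .75 point of the unit normal curve (about .6745σ), and reports areas per 10,000 as the tables do. The helper names `area_sigma` and `area_pe` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()                # unit normal curve: mean 0, sigma 1
PE_IN_SIGMA = STD_NORMAL.inv_cdf(0.75)   # one probable error, about .6745 sigma


def area_sigma(z):
    """Cases (per 10,000) between the mean and a point z sigma from the mean.

    The curve is symmetrical, so a negative z gives the same area as +z.
    """
    return (STD_NORMAL.cdf(abs(z)) - 0.5) * 10_000


def area_pe(x):
    """Cases (per 10,000) between the mean and a point x PE from the mean."""
    return area_sigma(x * PE_IN_SIGMA)


print(round(area_sigma(-1.26)))    # 3962, as read from Table X
print(round(2 * area_sigma(3.0)))  # 9973 cases within +/-3 sigma
print(round(area_pe(1.0)))         # 2500, as read from Table XI
print(round(2 * area_pe(4.0)))     # 9930 cases within +/-4 PE
```

Because the printed tables are rounded, a computed value may occasionally differ from a table entry in the last digit.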
There is little to choose between Tables X and XI. The former admits of slightly easier interpolation, but the latter is probably accurate enough, without interpolation, for most of the work done in psychological measurement.

TABLE XI

Fractional Parts of the Total Area (Taken as 10,000) under the Normal Probability Curve, Corresponding to Distances on the Baseline between the Mean and Successive Points Laid off from the Mean in Units of PE. Example: between the mean and a point 1.55PE (x/PE = 1.55) from the mean we find 35.21% of the entire area under the curve.

x/PE   .00       .05          x/PE   .00        .05
0.0    0000      0135         3.0    4785       4802
0.1    0269      0403         3.1    4817       4831
0.2    0536      0670         3.2    4845       4858
0.3    0802      0933         3.3    4870       4881
0.4    1063      1193         3.4    4891       4900
0.5    1321      1447         3.5    4909       4917
0.6    1571      1695         3.6    4924       4931
0.7    1816      1935         3.7    4937       4943
0.8    2053      2168         3.8    4948       4953
0.9    2281      2392         3.9    4957       4961
1.0    2500      2606         4.0    4965       4968
1.1    2709      2810         4.1    4971       4974
1.2    2908      3004         4.2    4977       4979
1.3    3097      3188         4.3    4981       4983
1.4    3275      3360         4.4    4985       4987
1.5    3441      3521         4.5    4988       4989
1.6    3597      3671         4.6    4990       4991
1.7    3742      3811         4.7    4992       4993
1.8    3876      3939         4.8    4994       4995
1.9    4000      4057         4.9    4995       4996
2.0    4113      4166         5.0    4996       4997
2.1    4217      4265         5.1    4997.1     4997.4
2.2    4311      4354         5.2    4997.7     4998
2.3    4396      4435         5.3    4998.2     4998.4
2.4    4472      4508         5.4    4998.6     4998.8
2.5    4541      4573         5.5    4999       4999.1
2.6    4602      4631         5.6    4999.2     4999.3
2.7    4657      4682         5.7    4999.4     4999.5
2.8    4705      4727         5.8    4999.55    4999.6
2.9    4748      4767         5.9    4999.65    4999.7

2. A Variety of Problems Solved by Means of Tables X and XI

Under this heading we shall consider a number of problems which may be solved by means of Tables X and XI, on the assumption that the distributions they involve are normal or approximately normal. For easy reference later, each group of examples is preceded by a general statement of the problem it is designed to illustrate.

A. To Determine the Per Cent of Cases in a Normal Distribution which Fall within Given Limits

Problem (1): Given a normal distribution with Average = 12 and σ = 4.00. (a) What per cent of the cases fall between 8 and 16? (b) What per cent of the cases lie above 18? (c) Below 6?

(a) A score of 16 is just 4 points above the mean, and a score of 8 is just 4 points below the mean. If we divide this difference of 4 points by the σ of the distribution (by 4), it is clear that 16 is 1σ above the mean and that 8 is 1σ below the mean (see Diagram XIV, Fig. 1). From Table X, 68.26% of the cases in a normal distribution fall between the mean and ±1σ. Hence, 68.26% of the scores in the given distribution, or approximately the middle 2/3, fall between 8 and 16. This result may also be stated in terms of "chances." Since 68.26% of the cases in the distribution fall between 8 and 16, the chances are 6826 in 10,000, or about 68 in 100, that any score in the distribution will be found between 8 and 16.

(b) A score of 18 is 6 points or 1.5σ above the mean. From Table X we find that 43.32% of the cases fall between the mean and 1.5σ. Accordingly, 6.68% of the cases (50% − 43.32%) must lie above 18, in order to fill out the 50% of cases in the right half of the curve (see Fig. 1). Stated as "chances," there are 668 chances in 10,000, or about 7 in 100, that any future score will lie above 18.

(c) A score of 6 is 1.5σ below the mean. Between the mean and −1.5σ (a score of 6) lie 43.32% of the cases in the entire distribution. Hence, 6.68% of the cases lie below 6, filling out the 50% below the mean, and the chances are about 7 in 100 that any future score will lie below 6.

DIAGRAM XIV. Illustrating a Variety of Problems Solved by Means of Tables X and XI (Figs. 1 to 8).
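The three parts of Problem (1) can be checked directly against the normal curve rather than read from Table X. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper names `percent_between`, `percent_above` and `percent_below` are hypothetical.

```python
from statistics import NormalDist


def percent_between(low, high, mean, sd):
    """Per cent of a normal distribution lying between two scores."""
    dist = NormalDist(mu=mean, sigma=sd)
    return 100 * (dist.cdf(high) - dist.cdf(low))


def percent_above(score, mean, sd):
    """Per cent of a normal distribution lying above a score."""
    return 100 * (1 - NormalDist(mu=mean, sigma=sd).cdf(score))


def percent_below(score, mean, sd):
    """Per cent of a normal distribution lying below a score."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)


# Problem (1): Average = 12, sigma = 4
print(round(percent_between(8, 16, 12, 4), 2))  # 68.27
print(round(percent_above(18, 12, 4), 2))       # 6.68
print(round(percent_below(6, 12, 4), 2))        # 6.68
```

The exact figure for part (a) is 68.27% rather than 68.26%; the latter is simply twice the rounded table entry 3413.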
Problem (2): Given a distribution with Average = 29.75 and Q = 4.56. What per cent of the distribution lies between 22 and 26? What are the chances that a score will fall between 22 and 26?

In a normal distribution Q is equal to the PE. Score 22 is 7.75 points, or −1.70PE, from the mean (since 7.75/4.56 = 1.70), and score 26 is 3.75 points, or −.822PE, from the mean (since 3.75/4.56 = .822) (see Diagram XIV, Fig. 2). From Table XI we find that 37.42% of the cases in a normal distribution lie between the mean and −1.70PE, and (interpolating between the entries for .80 and .85) that 21% of the cases lie between the mean and −.822PE. By simple subtraction, therefore, 16.42% of the cases fall between −1.70PE and −.822PE, or between 22 and 26. The chances are 1642 in 10,000, or about 16 in 100, that a score will fall between 22 and 26.

B. To Find the Limits in Any Normal Distribution which Will Include a Given Per Cent of the Cases

Problem (1): Given a distribution with Average = 16 and σ = 4. What limits will include the middle 75% of the cases?

The middle 75% of the cases in a normal distribution must include the 37.5% just above and the 37.5% just below the mean. From Table X we find that 3749 cases in 10,000, or 37.5% of the distribution, fall between the mean and 1.15σ; consequently, 37.5% of the distribution must fall between the mean and −1.15σ. The middle 75% of the cases, therefore, lie between −1.15σ and +1.15σ; or, since σ equals 4, between 4.60 points below and 4.60 points above the mean. Adding ±4.60 to the mean (16), we find that the middle 75% of the scores in the given distribution lie between 11.40 and 20.60 (see Diagram XIV, Fig. 3).

Problem (2): Given a distribution with Average = 150 and Q = 26. What limits will include the highest 20% of the group?

The highest 20% of a normally distributed group must have 30% of the cases between its lower limit and the mean in order to fill out the 50% of cases in the right half of the distribution (see Diagram XIV, Fig. 4).
From Table XI we find that 3004 cases in 10,000, or 30% of the distribution, fall between the mean and 1.25PE. Since the PE of the given distribution is 26 (Q being equal to the PE in a normal distribution), 1.25PE will be 1.25 × 26, or 32.5 points, above the mean, namely, at 182.50. The lower limit of the highest 20% of the given group, therefore, is 182.50; and the upper limit is the highest score in the distribution, whatever that may be.

C. To Determine the Relative Difficulty of Test Questions, Problems, or Other Test Items

Problem (1): Given a test question or problem solved by 10% of a large unselected group; a second problem solved by 20% of the group; and a third, solved by 30% of the group. Assuming that the capacity measured by the test problems is distributed "normally," what is the relative difficulty of questions 1, 2, and 3?

Our first task is to find for question 1 a position in the distribution above which lie 10% (the per cent passing) and below which lie 90% (the per cent failing) of the entire group. The highest 10% in a normally distributed group has 40% of the cases between its lower limit and the mean (50% − 10% = 40%; see Diagram XIV, Fig. 5), and from Table X we find that 39.97%, i.e., 40%, of a normal frequency distribution falls between the mean and 1.28σ. Hence, question 1 falls at a point on the baseline of the curve whose abscissa is 1.28σ from the mean; accordingly, 1.28σ may be taken as its difficulty value. In the same way, question 2, passed by 20% of the group, falls at a point in the distribution 30% above the mean (50% − 20% = 30%; see Fig. 5). From Table X we find that 29.95%, i.e., 30%, of the group falls between the mean and .84σ; hence question 2 has a difficulty value of .84σ.
In like manner question 3, which falls at a point in the distribution 20% above the mean, has a difficulty value of .53σ, since 20.19% of the distribution lies between the mean and .53σ. To summarize our results:

Question   Passed by   σ value   σ difference
1          10%         1.28
2          20%          .84        .44
3          30%          .53        .31

The σ difference in difficulty between questions 2 and 3 is .31, only about 7/10 of the σ difference in difficulty between questions 1 and 2 (.44), in spite of the fact that the per cent difference is the same (10%) in the two cases. On the assumption that ability follows the normal frequency distribution, therefore, it is evident that the σ difference, and not the per cent difference, gives the real index of differences in difficulty.

Problem (2): Given three test items, No. 1, No. 2, and No. 3, passed by 50%, 40%, and 30%, respectively, of a large group. What per cent of the same group must pass test item No. 4 in order for it to be as much more difficult than No. 3 as No. 2 is more difficult than No. 1?

A question or problem which is "passed" by 50% of a group is, of course, "failed" by 50% also; accordingly, such a problem falls exactly in the middle of the normal distribution of difficulty. Test item 1, therefore, has a σ value of 0; it falls just on the mean (see Diagram XIV, Fig. 6). Test item 2 lies at a point in the distribution 10% above the mean, as 40% of the group passed and 60% failed this problem. Accordingly, the σ value of this item is .25, since from Table X we find that 9.87% (roughly 10%) of the cases lie between the mean and .25σ. Test item 3, passed by 30% of the group, lies at a point 20% above the mean; this item, therefore, has a difficulty value of .52σ, as 19.85% (20%) of the normal distribution lies between the mean and .52σ.

Now since item 2 is .25σ further along on the difficulty scale (toward the high-score end of the curve) than item 1, it is clear that item 4 must be .25σ above item 3 if it is to be as much harder than 3 as 2 is harder than 1. Item 4, therefore, must have a value of .52σ + .25σ, or .77σ; and from Table X we find that 27.94% of the group fall between the mean and this point. This means that 50% − 28%, or 22%, of the group pass item 4. To summarize by a table:

Test item   Passed by   Difficulty value (σ)   σ difference
1           50%          .00
2           40%          .25                    .25
3           30%          .52
4           22%          .77                    .25

A problem or test item must be passed by 22% of the group, therefore, in order for it to be as much more difficult than an item passed by 30% as an item passed by 40% is more difficult than one passed by 50%. Note again that per cent differences are not reliable indices of differences in difficulty when the capacity measured is taken to be distributed normally.

D. To Separate a Given Group into Sub-Groups According to Capacity, When the Capacity is Normally Distributed

Problem (1): Suppose that we have measured 100 college men on a certain test. We wish to classify our group into 5 sub-groups, A, B, C, D, and E, according to ability, the range of ability to be equal in each sub-group. Assuming that the capacity measured by the test is distributed normally, or approximately so, and that the group is relatively unselected, how many men should be placed in groups A, B, C, D, and E, respectively?

Let us first represent the positions of the five sub-groups graphically on the normal curve, as shown in Diagram XIV, Fig. 7. If the baseline of the curve is taken to extend from −3σ to +3σ, that is, over a range of 6σ, dividing this range by 5 gives 1.2σ as the baseline extent to be allotted to each group. These five intervals may be laid off on the baseline as shown in the figure, and perpendiculars drawn to demarcate the various sub-groups.
It is clear that group A covers the upper 1.2σ; group B, the next 1.2σ; that group C lies .6σ to the right and .6σ to the left of the mean; and that groups D and E occupy the same relative positions on the left half of the curve as B and A occupy on the right half.

Now, to find what per cent of the whole group falls within group A, we must find what per cent of a normal distribution lies between 3σ (the upper limit of group A) and 1.8σ (the lower limit of group A) (see Fig. 7). From Table X we know that 49.86% of a normal distribution falls between the mean and 3σ, and that 46.41% falls between the mean and 1.8σ. Hence 3.45% of the total area under the normal curve (49.86% − 46.41%) falls between 3σ and 1.8σ; accordingly, group A comprises 3.45% of the whole group. The per cents in the other groups are found in exactly the same way. Thus, 46.41% of the normal curve falls between the mean and 1.8σ (upper limit of group B), and 22.57% falls between the mean and .6σ (lower limit of the same group). Subtracting, 46.41% − 22.57%, or 23.84%, of our whole group evidently belongs in sub-group B. Group C lies .6σ above and .6σ below the mean. Between the mean and .6σ is contained 22.57% of a normal distribution, and the same per cent is contained between the mean and −.6σ. Group C, then, includes about 45% (22.57% × 2 = 45.14%) of the whole group. Finally, group D, which falls between −.6σ and −1.8σ, contains exactly the same percentage of the total as group B; and group E, which falls between −1.8σ and −3σ, contains the same per cent as group A. The percentage (and number) of men in each group is given in the following summary:

Group                                   A        B      C      D      E
Per cent of total in each group         3.5      23.8   45     23.8   3.5
Number in each group (100 men in all)   4 or 3   24     45     24     4 or 3

On the assumption that the capacity measured follows the normal probability curve, therefore, only 4 (or 3) men in the group of 100 should be placed in group A, the marked ability group; 24 in group B, the high average ability group; 45 in group C, the average ability group; 24 in group D, the low average ability group; and 4 (or 3) in group E, the very low or stupid group.

The above procedure may be used in determining how many individuals in a large class should get grades of, say, A, B, C, D, and E, or it may be employed for any number of grade-groups. The assumption must be made, however, that ability in the subject in which the individuals are being graded follows the normal curve.

3. The Arrangement of Problems or Other Test Items into a Scale in which the Difficulty of Each Item is Known with Reference to Each Other Item as Well as to Some Selected Zero Point

One of the important tasks which confront the worker with tests is the construction of scales which contain problems or questions graded in difficulty from very easy to very hard by known steps or intervals. Given a set of problems or test items, if we know what per cent of a large group (selected from among those for whom the test is intended) pass or fail each problem, it is a comparatively easy matter to arrange the problems in a rough order of difficulty. Such an arrangement, however, constitutes a very crude scale, as we know very little about the relative difficulty of the separate problems (see page 98) and next to nothing about the range of ability tested. For this reason, in most scaled tests (if we can assume a normal or approximately normal distribution in the capacity tested) the unit of measurement is taken as the σ or the PE. By so doing we are able not only to arrange the test items in a simple order of difficulty, but also to "set" or space them at definite points along a scale of difficulty, that is, along the baseline of the normal curve.
On such a scale the distance from one item to another, or from any given item to the selected zero point, is known as definitely as the distance between two divisions on a foot rule. To illustrate concretely how a scale of this sort is made, let us suppose that we wish to construct a scaled test for measuring "reasoning ability" (e.g., by means of syllogisms) in 12 year olds; or an addition scale for Grade IV; or a scale for testing sentence memory in 8 year olds. The steps involved may be outlined as follows:

(1) First it is necessary to compile a large number of problems or other test items which vary in difficulty from very easy to very hard, and which are fairly representative of the field covered by the test.

(2) These problems are then given to as large a random sample as possible from among those for whom the scale is intended.

(3) The per cent of the group which solves each problem correctly is next computed. This allows duplicates, problems too easy or too hard, and those which for one reason or another are unsatisfactory to be discarded. It also permits the arrangement of the problems selected for the scale in an order of difficulty. A problem solved correctly by 90% of the group is obviously easier than one solved correctly by 75%, while the second problem is, in turn, clearly less difficult than one solved correctly by 50%. The larger the per cent passing, the lower the position of the problem on the difficulty scale.

(4) By means of Table XI each per cent correct found in (3) may now be converted into a PE (or σ)¹ distance above or below the mean. The procedure here is as follows. An item solved correctly by 40% of the group is 10% (50% − 40%), or .375PE, above the mean. In like manner, an item solved correctly by 78% of the group is 28% (78% − 50%), or 1.15PE, below the mean. We may tabulate the results for five items selected at random as follows (see Diagram XIV, Fig. 8):

¹ The procedure is identical when σ is employed instead of the PE.

Problem                                      A       B       C      D      E
Per cent solving                             93      78      55     40     14
Distance from the mean in percentage terms   −43     −28     −5     10     36
Distance from the mean in PE terms           −2.20   −1.15   −.20   .375   1.60

Note that Problem A is solved by 93% of the group, i.e., by the upper 50% (the right half of the curve) plus the 43% to the left of the mean. Hence it lies 2.20PE to the left of the mean (−2.20PE). In like manner, the percentage distance of each problem from the mean, measured to the right or left (plus or minus), may be found by simply subtracting the per cent passing from 50%. From these per cents, the PE distance of the problem from the mean can be read from Table XI, as shown above.

(5) With the PE distance of each problem above or below the mean established, the PE distance of each problem from the "zero point" of ability in the test may be calculated. This zero point is located in the following way. Suppose that 5% of the whole group failed to solve a single problem correctly. This puts the point of zero ability 45% of the distribution below the mean, or at a point −2.45PE from the mean.² The PE distance of each problem in the scale may now be found from this arbitrary zero point. To illustrate with the five problems above:

Problem                                    A       B       C      D      E
PE distance from mean                      −2.20   −1.15   −.20   .375   1.60
PE distance from assumed zero (−2.45PE)    .25     1.30    2.25   2.83   4.05

The simplest way to find the PE distances from the given zero point is to subtract, algebraically, the distance of the zero point below the mean from the PE distance of each problem from the mean. Problem A, for example, is −2.20 − (−2.45), or .25PE, from the zero point; while problem E is 1.60 − (−2.45), or 4.05PE, from the zero point. The PE value of each of the other problems as measured from the given zero point is found in the same way.

² Note that this point is not a true zero unless the problems range down to zero difficulty. It serves, however, as a convenient reference point for the group for whom the test is intended.

When the PE value from zero of each of the problems has been determined, the difficulty of each problem with respect to every other problem, as well as to zero, is known, and the scale is finished.

It is evident, of course, that a scale of this sort will not usually have equal difficulty intervals or "steps" from easy to hard. This fact, while inconvenient, does not necessarily invalidate the usefulness of the scale as a measuring instrument. In lieu of a rule, one might use with a fair degree of accuracy a stick on which marks had been set at 2, 3.7, 4.8, etc., inches. Nevertheless, linear measurements are certainly more easily obtained with a rule, and in like manner scores are more easily obtained when the scale has equal steps than when the steps are unequal. For this reason, among others, scale makers have tried as far as possible to have the steps on their scales approximately equal. One method of doing this is to eliminate from the scale as first constructed certain "odd" problems, and to retain only those which fall at points approximately the same distance apart. Another plan is to try out a new set of problems and from among these select problems which will fill in the gaps in the scale, or to change the wording or scoring of a problem in such a way as to shift it up or down on the scale of difficulty.

A good example of the first method of securing equal steps on the scale is given by the Woody Arithmetic Scales, Series B.³ These scales represent a selection of certain problems from the longer Series A (scales constructed by the method outlined above) and contain problems which are progressively more difficult by approximately equal steps. The problems in Series A are not spaced at equal points on a difficulty scale.
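Steps (4) and (5) of the scale construction amount to two conversions: per cent passing to a PE distance from the mean, and that distance to a PE distance from the arbitrary zero. The sketch below reproduces them for problems A to E under stated assumptions: it uses Python's `statistics.NormalDist` in place of Table XI, takes 1PE as about .6745σ, and the helper names `pe_from_mean`, `zero_point` and `pe_from_zero` are hypothetical.

```python
from statistics import NormalDist

UNIT = NormalDist()
PE_IN_SIGMA = UNIT.inv_cdf(0.75)  # about .6745


def pe_from_mean(percent_passing):
    """PE distance of an item from the mean, given the per cent who pass it.

    Items passed by fewer than 50% lie above the mean (positive values);
    items passed by more than 50% lie below it (negative values).
    """
    return UNIT.inv_cdf(1 - percent_passing / 100) / PE_IN_SIGMA


def zero_point(percent_failing_all):
    """PE position of the arbitrary zero: the point below which lie the
    pupils who solved no problem at all."""
    return UNIT.inv_cdf(percent_failing_all / 100) / PE_IN_SIGMA


def pe_from_zero(percent_passing, percent_failing_all):
    """Scale value of an item: its PE distance above the arbitrary zero."""
    return pe_from_mean(percent_passing) - zero_point(percent_failing_all)


problems = {"A": 93, "B": 78, "C": 55, "D": 40, "E": 14}  # per cent solving
for name, passing in problems.items():
    print(name,
          round(pe_from_mean(passing), 2),
          round(pe_from_zero(passing, 5), 2))
```

The computed values agree with the tabulated ones to within about .02PE (the computed zero, for instance, is about −2.44PE rather than −2.45PE); the small differences come from reading Table XI to the nearest tabled entry.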
In the Series A Addition Scale, for example, problem No. 1 has a difficulty value of 1.23PE as measured from the arbitrary zero at −2.425PE;⁴ problem No. 2 has a difficulty value of 1.40PE, and problem No. 3 a difficulty value of 2.50PE.

³ Woody, Clifford: Measurements of Some Achievements in Arithmetic. Teachers College, Columbia University, 1916.
⁴ The arbitrary zero point on the Woody addition scale lies 2.425PE below the median of Grade II.

TABLE XII

Difficulty Values (PE) of the Problems in the Woody Arithmetic Scale (Addition), Series A and B

Problem No.   Series A, PE value   Series B, PE value   PE differences (Series B)
1             1.23                 1.23
2             1.40                 1.40                 .17
3             2.50                 2.50                 1.10
4             2.61
5             2.83                 2.83                 .33
6             3.21
7             3.26                 3.26                 .43
8             3.35
9             3.63
10            3.78                 3.78                 .52
11            3.92
12            4.18
13            4.19                 4.19                 .41
14            4.85                 4.85                 .66
15            4.97
16            5.52                 5.52                 .67
17            5.59
18            5.73
19            5.75                 5.75                 .23
20            6.10                 6.10                 .35
21            6.44                 6.44                 .34
22            6.79                 6.79                 .35
23            7.11                 7.11                 .32
24            7.43                 7.43                 .32
25            7.47
26            7.61
27            7.62
28            7.67
29            7.71                 7.71                 .28
30            7.71
31            7.97
32            8.04
33            8.18                 8.18                 .47
34            8.22
35            8.58
36            8.67                 8.67                 .49
37            8.67
38            9.19                 9.19                 .52

The number and the PE value of the other problems in Series A (Addition), and the problems which have been selected from this series to make up Series B, are shown in Table XII. Each problem in Series A, as noted above, is expressed in terms of its PE distance from the arbitrary zero point 2.425PE below the second grade median. The extremely high PE values of the problems in the upper half of the scale result from the fact that the scale is intended for the elementary grades from II to VIII inclusive, and hence the more difficult problems fall entirely outside the range of second grade ability. Note that, except in a very few cases, the problems in Series B appear as a graded series from easy to hard in which the steps from problem to problem are fairly well equalized.
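The claim that the Series B steps are "fairly well equalized" can be checked by differencing the Series B column of Table XII. This is a small sketch assuming nothing beyond the table values themselves; the variable names are hypothetical, and the median is used here only as one convenient summary of a typical step, not as a measure the text prescribes.

```python
from statistics import median

# Series B scale values (PE above the arbitrary zero), copied from Table XII;
# keys are the Series A problem numbers that were retained in Series B.
series_b = {1: 1.23, 2: 1.40, 3: 2.50, 5: 2.83, 7: 3.26, 10: 3.78,
            13: 4.19, 14: 4.85, 16: 5.52, 19: 5.75, 20: 6.10, 21: 6.44,
            22: 6.79, 23: 7.11, 24: 7.43, 29: 7.71, 33: 8.18, 36: 8.67,
            38: 9.19}

values = list(series_b.values())  # dicts keep insertion order (easy to hard)
steps = [round(hi - lo, 2) for lo, hi in zip(values, values[1:])]

print(steps)           # the "PE differences" column of Table XII
print(median(steps))   # a typical step size
print(max(steps))      # the largest gap, between problems 2 and 3
```

Most steps fall between about .3PE and .5PE; the gap of 1.10PE between problems 2 and 3 is one of the "very few cases" in which the steps are not equalized.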
The score on this scale is simply the number of problems solved correctly (the distance which one progresses up the scale), just as a child's height is so many feet and inches on a scale of height. On a scale which has equal steps, we know that the increase from, say, point 10 to 12 is the same as the increase from 12 to 14, and twice the increase from 14 to 15. Moreover, we may say that the child who works 8 problems is as far ahead of the child who works 4 as the second child is ahead of one who cannot work a single problem. We must be extremely careful, however, not to interpret one measure of capacity on such a scale as "so many times" another measure. Unlike height or weight, which are measured from absolute zeros, the measures given by a scale of performance are taken from some arbitrary zero point selected by the experimenter. So while we may say that a man 72 inches in height is twice as tall as a child who is only 36 inches in height, we cannot, by analogy, say that a child who scores 5 on an addition test has doubled his ability when he is able to score 10, unless the measures in the test have been taken from the absolute zero point of "just no ability at all" in addition.

The method of constructing a scale outlined above may be used with any group, grade, or class. When the scale is designed for use with more than one group, e.g., for the whole elementary school, an extension of the method given is often used. In brief, this is as follows:
(1) The PE value of each problem is determined for each grade separately, as shown above, by computing the per cent who pass each problem.
(2) The PE distances between the different grade medians are then computed. This is done by finding the per cent of the pupils in each grade who have scores larger than the median score of the next grade. These per cents, when subtracted from 50% and turned into PE values by means of Table XI (just as with the per cents passing a test item), give the PE distances between adjoining grade medians.
(3) Knowing the PE distances between the grade medians, we may now convert the PE distance of each problem from a given grade median into a PE distance from some common zero point. The different PE values of each problem as determined for the various grades are averaged to give the final scale value¹ (the distance from the common zero point).

¹ A method of weighting the PE values of a problem in averaging the results from the different grades is described by Woody in his "Measurements of Some Achievements in Arithmetic."

A shorter method than the one described may also be used. This is to compute the PE value of a problem once for all from the per cent of a large sample, drawn from the entire group, who pass the problem. This plan is practically identical with that which we have already described on page 102. It assumes that the capacity which the scale is designed to measure is distributed normally throughout the entire group. While probably not as exact as the more elaborate method, it has the advantage of simplicity and straightforwardness.

4. The Conversion of Judgments by Relative Position (or Relative Merit) into σ or PE Positions on a Scale

The preceding paragraphs have dealt with the construction of performance scales built up on the principle that the per cent passing (or failing) a given problem is the best index of the difficulty of that problem. It sometimes happens, however, that the ability to be measured is of such a nature that performance in it cannot be scored simply as correct or incorrect, but must be determined by comparison with other performances of a like sort. This leads to the construction of product scales. Handwriting scales, composition scales, and drawing scales are examples of instruments in which the quality of the product is measured, and not its presence or absence in terms of a per cent or number correct.
For example, an individual's handwriting is rated for merit by comparing it with "standard" specimens of handwriting the quality of which is known. Quality scales are constructed on the assumption that equally often noticed differences in merit or excellence are equal. The first step is to secure a large number of samples of the thing to be measured, e.g., specimens of handwriting or composition, ranging from very poor to excellent. The next step is to have a large number of presumably able judges arrange these specimens in order of merit, in this way comparing each specimen with every other one. The number of times each specimen is ranked above each other one is then reduced to percentage terms, and this per cent is expressed as a PE difference between the two specimens. With the PE differences determined, the specimens selected for the scale may be expressed as so many PE above some arbitrary zero point.

We may take specimens 8 and 9 on the Hillegas Composition Scale¹ as an illustration of the method. Hillegas had each of 202 judges arrange a number of English compositions in order of merit. An artificial composition was selected as being of zero merit and given the value 0 on the scale. Of the 202 judges, 136, or 67.3%, ranked 9 as better than 8. This exceeds by 17.3% the 50% to be expected if the two specimens were of equal merit, and from Table XI we find that 17.3% of the distribution lies between the mean and about .66PE. This value, therefore, expresses the amount by which 9 is better than 8. The value of 8 had already been found to be 7.72PE above the 0 point on the scale. Hence 9 is 7.72 + .66, or 8.38PE, above the zero composition. The values of the other compositions on the Hillegas Scale, measured in PE from zero with the differences determined in terms of relative merit, are 0, 1.83, 2.60, 3.69, 4.74, 5.85, 6.75, 7.72, 8.38, 9.37.

¹ Hillegas, Milo B. A Scale for the Measurement of Quality in English Composition by Young People. Teachers College, Columbia University, 1912.
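The conversion from "per cent of judges preferring one specimen" to a PE difference can be sketched as a single inverse-normal step, assuming (as the method does) that the judgments are normally distributed. The sketch uses Python's `statistics.NormalDist` instead of Table XI and takes 1PE as about .6745σ; the helper name `pe_difference` is hypothetical. It also differences the published Hillegas values to show the size of each step.

```python
from statistics import NormalDist

UNIT = NormalDist()
PE_IN_SIGMA = UNIT.inv_cdf(0.75)  # about .6745


def pe_difference(judges_preferring, total_judges):
    """PE separation of two specimens, from the proportion of judges who
    rank the better one above the other (normal-curve assumption)."""
    proportion = judges_preferring / total_judges
    return UNIT.inv_cdf(proportion) / PE_IN_SIGMA


gap = pe_difference(136, 202)   # specimen 9 over specimen 8
value_9 = 7.72 + gap            # specimen 8 already stands at 7.72 PE
print(round(gap, 3), round(value_9, 2))

hillegas = [0, 1.83, 2.60, 3.69, 4.74, 5.85, 6.75, 7.72, 8.38, 9.37]
print([round(b - a, 2) for a, b in zip(hillegas, hillegas[1:])])
```

The computed gap is about .666PE, slightly above the .66PE read from Table XI, so the computed value for specimen 9 comes out near 8.39 rather than 8.38. An even split of the judges (101 of 202) gives a difference of exactly zero.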
Note that the steps on this scale are fairly regular, being approximately 1PE apart.

5. The Scaling of Total Scores on a Test

Before concluding this brief review of the methods of constructing scales, we should mention several methods used for scaling total scores on a test. The distinction between these methods and those we have outlined is that in these methods, instead of scaling each separate element of the test for difficulty (except possibly to secure an approximate order of difficulty), we simply determine the difficulty value attained as a result of doing correctly a certain number of test elements. In other words, the score depends on the total number of questions answered or problems worked, and the difficulty value of individual problems is not considered as it is in sections 3 and 4 above. The three methods¹ proposed for scaling total scores give, respectively, (a) a percentile scale, (b) an age scale, and (c) a T-scale.

(a) We have already learned how to locate the percentile values in a distribution of scores (pages 45-46). In a percentile scale a child making a certain score (total number correct) on a test is given a percentile rating of 20, 30, 70, etc., according to his position in the distribution. The percentile method assumes that the difference between percentiles of, say, 10 and 20 is the same as the difference between percentiles of 40 and 50; that is, that percentile differences are equal throughout the scale. There is considerable reason to doubt this assumption of equal units on the percentile scale, however. If the trait measured is normally distributed, for example, the ten percentile points between the 40th and 50th percentiles cover only about .25σ of the baseline, while the ten points between the 10th and 20th percentiles cover about .44σ. For this reason, while practically very useful, the percentile scale is not entirely sound theoretically.

(b) In the age scale, the mean number of points scored on the test by unselected 7 year olds is given the value 7, the mean number of points scored by unselected 9 year olds is given the value 9, and so on for other age groups.

¹ See McCall, W. M. How to Experiment in Education, 1923, p. 95ff.
Scores which fall between age groups are evaluated by interpolation. The age scale is widely used and is easily interpreted. Its chief drawback seems to be the difficulty of getting unselected samples for determining the norms of the low and high age groups: many very young children are not yet in school, while many of the older ones have, for one reason or another, been eliminated. As a result, age scales are strictly accurate only within a rather narrow range of ability.

(c) Recently McCall has suggested a method of scaling total scores, the T-scale, which eliminates many of the defects of both the percentile and the age scale methods. In this method, scores are based on the σ of the distribution of scores made by unselected 12 year olds. T-scores range from 0 to 100. The zero point on the scale is taken at 5σ below the mean and the 100 point at 5σ above the mean. The unit of measure, or one "T," is .1 of the σ of the distribution of unselected 12 year olds. The mean T-score, therefore, is 50, and each 10 points above or below this point represent 1σ of the 12 year old distribution. In actual practice T-scores will generally be found to range between 15 and 85. A person who stands at the mean of 12 year olds on a given test has a T-score of 50; one who stands 1σ above the mean, a T-score of 60; and one who stands 1σ below the mean of 12 year olds, a T-score of 40.¹

The construction of the T-scale has been described in great detail by McCall in Chapter X of his How to Measure in Education, and consequently only the most important advantages of the scale need be considered here.² In the first place, the scale covers a wide range of ability, which may be extended if necessary. Secondly, all T-scores are expressed in terms of the same unit and with respect to the same zero point, and the units are equal throughout the scale.
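As a rough sketch of two of these scalings, the Python below converts a raw total score into an age score by linear interpolation between age-group means, and into a T-score from the mean and σ of unselected 12 year olds. The age norms and the 12 year old mean and σ are invented for illustration and are not taken from any published test; the function names are hypothetical.

```python
from bisect import bisect_right


def t_score(raw: float, mean_12: float, sd_12: float) -> float:
    """McCall T-score: 50 at the mean of unselected 12 year olds,
    with one T equal to .1 of their standard deviation."""
    return 50 + 10 * (raw - mean_12) / sd_12


def age_score(raw: float, norms: dict[int, float]) -> float:
    """Age score found by linear interpolation between the mean raw scores
    of successive age groups (norms maps age -> mean raw score, assumed to
    rise steadily with age)."""
    ages = sorted(norms)
    means = [norms[age] for age in ages]
    if not means[0] <= raw <= means[-1]:
        raise ValueError("raw score lies outside the range of the age norms")
    i = bisect_right(means, raw) - 1
    if i == len(ages) - 1:  # raw equals the highest age-group mean
        return float(ages[-1])
    fraction = (raw - means[i]) / (means[i + 1] - means[i])
    return ages[i] + fraction * (ages[i + 1] - ages[i])


# Hypothetical norms: mean number of points scored by each age group.
NORMS = {7: 20.0, 8: 26.0, 9: 31.0, 10: 35.0}
print(age_score(28.0, NORMS))                   # 2/5 of the way from age 8 to age 9
print(t_score(48.0, mean_12=40.0, sd_12=8.0))   # one sigma above the 12 year old mean
```

The interpolation only works inside the range of ages actually normed, which mirrors the limitation of the age scale noted in the text.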
Because all T-scores share the same unit and zero point, scores from different tests are directly comparable and may be combined by simple addition. Finally, a score of a given size will always have the same meaning when referred to the mean of unselected 12 year olds, which remains at 50.

¹ For an example, see the Thorndike-McCall Reading Scales, published by Teachers College, Columbia University.
² For a complete discussion of the advantages of the T-scale over the age and percentile scales, see McCall, How to Experiment in Education, 1923, 94ff.

V. The Transmutation of Measures by Relative Position (in Order of Merit) into Measures in Units of Amount

It is often very desirable, especially in the calculation of coefficients of correlation, to be able to transmute measures arranged in order of merit into measures in units of amount, or "scores," on some linear scale. This can easily be accomplished by means of tables, provided we can assume "normality" in the trait for which the ranking has been made. To take an example, let us suppose that we have 15 salesmen ranked in order of merit for selling efficiency, the most efficient ranked No. 1, the least efficient ranked No. 15. Now if we are justified in assuming that selling efficiency follows the normal probability curve, we can, with the aid of Table XIII, assign to each man a "selling score" on a scale of 10 or 100 points which will very probably represent his capacity as a salesman much better than a rank of 2, 6, or 14. The problem may be stated as follows:

Problem (1): Given 15 salesmen ranked in order of merit by their sales manager, transmute these rankings into scores on a scale of 10 points.

The procedure is as follows. First, by means of the simple formula¹

$$\text{Per cent position} = \frac{100(R - .5)}{N} \qquad (12)$$

in which R is the rank of the individual in the series and N the number ranked, we determine the "per cent position" of each man.
Next, from Table XIII we read off the score on a scale of 10 points. Thus Salesman A, who ranks No. 1 (see the table below), has a per cent position of 100(1 − .5)/15, or 3.34, and his score from Table XIII is 8.5 (finer interpolation is unnecessary). In like manner, Salesman B, who ranks No. 2, has a per cent position of 100(2 − .5)/15, or 10, and his score, accordingly, is 7.5. The scores of the others, found in exactly the same way, are given in the following table:

Salesman   Rank   Per cent Position   Score (Scale 10)
   A         1          3.34                8.5
   B         2         10.00                7.5
   C         3         16.67                6.9
   D         4         23.34                6.4
   E         5         30.00                6.0
   F         6         36.67                5.7
   G         7         43.34                5.3
   H         8         50.00                5.0
   I         9         56.67                4.7
   J        10         63.34                4.3
   K        11         70.00                4.0
   L        12         76.67                3.6
   M        13         83.34                3.1
   N        14         90.00                2.5
   O        15         96.67                1.5

On several previous occasions it has been pointed out that the assumption of normality in a trait or capacity implies that differences at the extremes of capacity are relatively much greater than the same differences around the average or mean. This is clearly brought out in the table above: while all differences in the order of merit series equal 1, the differences between the transmuted scores vary considerably, being greatest at the ends of the series and smallest in the middle. The difference between A and B, for example, or between N and O, is about three times as great as the difference between G and H (1.0 as against .3). Stated differently, we may say that it is three times as easy to move from H to G (from 8th to 7th place) as from B to A (from 2nd to 1st place).

¹ This formula and the method built around it were devised by Professor Clark Hull. See Hull, The Computation of the Pearson r from Ranked Data, Journal of Applied Psychology, 1922, 6, 385.

TABLE XIII
[From Hull, Journal of Applied Psychology, 1922]
The Transmutation of an Order of Merit into Units of Amount or "Scores"

Let R represent the rank in the Order of Merit, and N the number ranked.
Then from the formula, Per cent position = 100(R − .5)/N, find the per cent position, and from it the score. Example: If N = 25 and R = 3, Per cent position = 100(3 − .5)/25, or 10.00, and from the table the score is 7.5.

Per cent   Score    Per cent   Score    Per cent   Score
    .09     9.9       22.32     6.5       83.31     3.1
    .20     9.8       23.88     6.4       84.56     3.0
    .32     9.7       25.48     6.3       85.75     2.9
    .45     9.6       27.15     6.2       86.89     2.8
    .61     9.5       28.86     6.1       87.96     2.7
    .78     9.4       30.61     6.0       88.97     2.6
    .97     9.3       32.42     5.9       89.94     2.5
   1.18     9.2       34.25     5.8       90.83     2.4
   1.42     9.1       36.15     5.7       91.67     2.3
   1.68     9.0       38.06     5.6       92.45     2.2
   1.96     8.9       40.01     5.5       93.19     2.1
   2.28     8.8       41.97     5.4       93.86     2.0
   2.63     8.7       43.97     5.3       94.49     1.9
   3.01     8.6       45.97     5.2       95.08     1.8
   3.43     8.5       47.98     5.1       95.62     1.7
   3.89     8.4       50.00     5.0       96.11     1.6
   4.38     8.3       52.02     4.9       96.57     1.5
   4.92     8.2       54.03     4.8       96.99     1.4
   5.51     8.1       56.03     4.7       97.37     1.3
   6.14     8.0       58.03     4.6       97.72     1.2
   6.81     7.9       59.99     4.5       98.04     1.1
   7.55     7.8       61.94     4.4       98.32     1.0
   8.33     7.7       63.85     4.3       98.58      .9
   9.17     7.6       65.75     4.2       98.82      .8
  10.06     7.5       67.58     4.1       99.03      .7
  11.03     7.4       69.39     4.0       99.22      .6
  12.04     7.3       71.14     3.9       99.39      .5
  13.11     7.2       72.85     3.8       99.55      .4
  14.25     7.1       74.52     3.7       99.68      .3
  15.44     7.0       76.12     3.6       99.80      .2
  16.69     6.9       77.68     3.5       99.91      .1
  18.01     6.8       79.17     3.4      100.00      0
  19.39     6.7       80.61     3.3
  20.83     6.6       81.99     3.2

Another use to which Table XIII may be put is in combining incomplete order of merit rankings. To illustrate with a problem:

Problem 2: Given six persons, A, B, C, D, E, and F, to be ranked for honesty by three judges. Judge 1 knows all six well enough to rank them; Judge 2 knows only three well enough to rank them; and Judge 3 knows four well enough to rank them. Can we obtain a fair order of merit for all six persons by combining these three sets of rankings, two of which are incomplete?
We may tabulate the data as follows:

Persons              A    B    C    D    E    F
Judge 1's ranking    1    2    3    4    5    6
Judge 2's ranking    ..   2    ..   1    ..   3
Judge 3's ranking    2    ..   1    ..   3    4

Now, assuming that honesty is "normally distributed," it seems fair that A should get more credit for ranking first in a list of six than D for ranking first in a list of three, or C for ranking first in a list of four. In the order of merit rankings, all three are given the same rank. But when we assign scores (on a scale of 100) to each person in accordance with his position in the list by means of formula (12) and Table XIII, A gets 77 for his first place, D gets 69 for his, and C gets 72 for his (see table below).¹

Persons              A     B     C     D     E     F
Judge 1's ranking    1     2     3     4     5     6
  Score             77    63    54    46    37    23
Judge 2's ranking    ..    2     ..    1     ..    3
  Score              ..   50     ..   69     ..   33
Judge 3's ranking    2     ..    1     ..    3     4
  Score             55     ..   72     ..   43    28
Sum of scores      132   113   126   115    80    84
Average score       66    57    63    58    40    28
Order of merit       1     4     2     3     5     6

The other ratings are transmuted in the manner shown above. All of the scores are then combined and averaged to give the final weighted order of merit, as shown in the table.

¹ It is somewhat doubtful whether it is usually worth the trouble to transmute orders of merit into scores as shown above and then combine them so as to get a weighted order (see Garrett, H. E., An Empirical Study of the Various Methods of Combining Incomplete Order of Merit Ratings. Journal of Educational Psychology, 1924, XV, pp. 157-171). If it is deemed desirable to weight ratings, however, the method given will prove useful.

With formula (12) and Table XIII it is possible to transmute any set of ranks into scores on the assumption of a normal distribution in the trait for which the ranking is made. This is very useful in the case of those traits which are not easily measured by ordinary methods, but for which individuals may be arranged in an order of merit, as for example athletic ability, personality, beauty, etc.
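The transmutation and the combining of incomplete rankings can be sketched without the printed table. This Python sketch assumes the construction the text gives for Table XIII in the next paragraph (a normal curve cut off at ±2.5σ, with the score laid off evenly along the 5σ baseline) and inverts it directly, so it reproduces tabled values only to within rounding; the function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()
CUTOFF = 2.5                              # curve taken to end at +/-2.5 sigma
UPPER = STD.cdf(CUTOFF)                   # cumulative proportion up to +2.5 sigma
KEPT = UPPER - STD.cdf(-CUTOFF)           # area of the truncated curve


def per_cent_position(rank: int, n: int) -> float:
    """Formula (12): 100(R - .5)/N."""
    return 100 * (rank - 0.5) / n


def score_from_position(position: float, scale: int = 10) -> float:
    """Score whose sigma value leaves `position` per cent of the truncated
    curve to its right; 0 at -2.5 sigma and `scale` at +2.5 sigma."""
    z = STD.inv_cdf(UPPER - position / 100 * KEPT)
    return scale * (z + CUTOFF) / (2 * CUTOFF)


def position_from_score(score: float, scale: int = 10) -> float:
    """Reverse direction: the 'Per cent' entry of Table XIII for a score."""
    z = score / scale * 2 * CUTOFF - CUTOFF
    return 100 * (UPPER - STD.cdf(z)) / KEPT


def combine_rankings(rankings: list[dict[str, int]], scale: int = 100) -> list[str]:
    """Average each person's transmuted scores over the judges who ranked
    him, then list the persons from highest to lowest average."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)                  # each judge's list length is his N
        for person, rank in ranking.items():
            s = score_from_position(per_cent_position(rank, n), scale)
            totals[person] = totals.get(person, 0.0) + s
            counts[person] = counts.get(person, 0) + 1
    averages = {p: totals[p] / counts[p] for p in totals}
    return sorted(averages, key=averages.get, reverse=True)


# Problem (1): 15 salesmen on a scale of 10 points.
salesmen = [round(score_from_position(per_cent_position(r, 15)), 1) for r in range(1, 16)]
print(salesmen)

# Problem 2: three judges, two of them with incomplete lists.
judges = [
    {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6},
    {"D": 1, "B": 2, "F": 3},
    {"C": 1, "A": 2, "E": 3, "F": 4},
]
print(combine_rankings(judges))
```

Under these assumptions the salesmen's scores match the salesmen table above, and the combined order for Problem 2 is A, C, D, B, E, F, the same order of merit as in the text, although individual scores on the 100-point scale can differ by a point or so from those read off the printed table.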
The method is also valuable in correlation when a set of ranks is the only available "criterion" for a given ability while the "independent" tests are scored in ordinary test units.¹ Transmuted scores may be combined, or averaged, like other test scores.

A word of explanation may be added regarding the construction of Table XIII. This table was derived from a table of the theoretical frequencies of the normal distribution in which the curve was taken to end at ±2.5σ. The baseline of the curve is therefore 5σ long, and may conveniently be subdivided into 100 parts, each .05σ. The first .05σ from the upper extreme limit of the curve takes in .09% of the distribution and is scored 9.9 (or 99 on a scale of 100). The first .10σ from the upper end of the curve takes in .20% of the entire distribution and is scored 9.8, or 98, and so on. In each case, the per cent position gives the fractional part of the normal distribution which lies to the right of the given σ value on the baseline. The σ values determine the score.

¹ The definition of a criterion and its value in determining the validity of one or more tests is discussed at length in Chapters V and VI.

PROBLEMS

1. (a) Plot both distributions given in example (2), page 56, as frequency polygons and histograms. For comparative purposes, plot the frequency polygon and the histogram for each distribution with respect to the same coordinate axes, on the same diagram. (b) Calculate a measure of skewness for both distributions.

2. Plot distribution A, example (2), page 56, as an ogive. Compare the percentiles obtained from the graph with the calculated values.

3. Assuming that trait X is completely determined by 6 factors, all equal in value, similar, and independent, and each as likely to be present as absent, plot the distribution which one would most probably get from the measurement of trait X in an unselected group of 1000 people.

4. In a random sample of 1000 cases, Average = 14.4 and σ = 2.5. (a) What per cent of the cases lie between 12 and 16? (b) What are the chances that any future case will be above 18? (c) What are the chances that any future case will be below 8?

5. In an approximately normal distribution of 100 cases, Average = 29.74 and Q (PE) = 3.18. (a) What per cent of the cases lie between 24 and 25? (b) What limits include the middle 60% of the cases? (c) What limits include the lowest 5% of the cases?

6. In a certain test the 7th grade median is 28, with a Q of 4.8, and the 8th grade median is 31.6, with a Q of 4.0. What per cent of the 7th grade is above the median of the 8th grade?

7. A group of 12 year olds, two years ago, had a reading ability expressed by an average of 40 and a σ of 3.6, and a composition ability expressed by an average of 62 and a σ of 9.6. Today the group has gained 12 in reading and 10.8 in composition. How many times greater is the former gain than the latter?

8. Four problems, 1, 2, 3, and 4, are solved by 50%, 60%, 70%, and 80%, respectively, of a large group. Compare the difference in difficulty between 1 and 2 with the difference in difficulty between 3 and 4.

9. In a college the 10 grades A+, A, A−; B+, B, B−; C+, C, C−; and D are given. On the assumption that ability in mathematics is distributed normally, how many men in a group of 500 Freshmen should receive each grade?

10. Five problems are passed by 15%, 34%, 50%, 62%, and 80% of a large unselected group. If the zero point of ability is taken at −3σ, what is the σ value of each problem as measured from this point?

11. In a large group of competent judges, 88% rank composition A as better than composition B; 65% rank B as better than C. If C is known to have a PE value of 3.5 as measured from the zero composition, i.e., the composition of zero merit, what are the PE values of B and A as measured from this "zero"?

12. Twenty-five men on a football squad are ranked in order of merit from 1 to 25 for general playing ability by the coach. Assuming "normality" in the trait "general playing ability," transmute these ranks into units of amount on a scale of 100 points.

Answers

4. (a) 57.04%. (b) 749 in 10,000. (c) 52 in 10,000.
5. (a) 4.8%. (b) 25.76 and 33.72. (c) 21.95 and the lower limit of the distribution.
6. 30.65%.
7. 2.96 (approximately 3) times as great.
8. Difference between 1 and 2, .25σ; between 3 and 4, .315σ.
9. Grade:                 A+   A    A−   B+   B    B−   C+   C    C−   D
   No. of men receiving:  3   14   40   80   113  113  80   40   14   3
10. In order: 4.04; 3.41; 3.00; 2.69; 2.16.
11. B, 4.07PE; A, 5.82PE.
12. Rank  Score    Rank  Score
      1    89        13    50
      2    80        14    48
      3    75        15    46
      4    71        16    44
      5    68        17    42
      6    65        18    39
      7    63        19    37
      8    61        20    35
      9    58        21    32
     10    56        22    29
     11    54        23    25
     12    52        24    20
                     25    11

CHAPTER III

THE RELIABILITY OF MEASURES

I. What is Meant by the Reliability of a Measure

By the "true" measure of an individual's capacity in any trait, as for example the true measure of his height, reaction time, or intelligence, we mean the average of an infinite number of measurements of the given capacity made under precisely the same conditions. Obviously, in actual practice, we can never deal with true measures as thus defined, for usually we must be satisfied with a single measure, or at best with comparatively few measures of the given trait. We can, however, measure the amount by which an obtained measure "most probably" varies from its corresponding true measure; and this measure of "probable divergence" serves as an index of the reliability of the obtained measure, that is, of how good an approximation it is to the true measure. In like manner, the reliability of an obtained measure of a group is determined by finding the probable divergence of the obtained measure from the true measure of the group.
The true measure of a group, as for example the true average or the true σ, is defined as the measure obtained by taking into account all of the members of the group, and the true measure of difference between two groups is the difference between their true means or medians. To show just what is meant by the "true measure" of a group, let us suppose that we could measure the height of every 12 year old boy in the United States. If from this frequency distribution of heights we should calculate a measure of central tendency and a measure of variability, the average and σ for example, this average would be the true average height of 12 year old boys in the United States, and the σ would be the true measure of scatter around this average. In the same way, if we could measure the height of every 12 year old girl in the United States, it would be possible to secure the true average height of 12 year old girls in this country, and the true variability around it. Moreover, knowing the true average height of 12 year old boys and the true average height of 12 year old girls, it would be a very simple matter to find the true difference between the average heights of 12 year old boys and 12 year old girls in the United States.

Unfortunately it is rarely, if ever, possible to measure all of the individuals in a group or "population," and it is, of course, impossible to take an infinite number of measures of a given individual. We must be content, therefore, to deal with "samples" selected from the total number of possible measures; and, as a result of slight differences in the samples chosen, measures of central tendency and variability are often larger or smaller than their corresponding true measures. Hence, whenever we have measured an individual or a group, we must ask ourselves this question: "How reliable a measure of capacity have I secured?
How well does it 'represent' the true measure which I should get from a very large (infinite) number of measures of this individual, or from measuring all of the individuals in the population from which my group is taken?" This question will often lead to a second: "How many measurements must I make in order to get a result which shall meet a certain standard of reliability, i.e., show a probable divergence from the true result which is less than some given amount?" The purpose of the following sections is to develop methods which will enable us to answer these questions. First, the reliability of the mean and median will be considered; then the reliability of the measures of variability; and finally the reliability of the difference between two measures.¹

¹ The method of finding the reliability of a coefficient of correlation is given later on page 170.

II. The Reliability of Measures of Central Tendency

1. The Reliability of the Average or Mean

A. The Reliability of the Mean in Terms of its Standard Error (σ_av)

Perhaps the simplest approach to the study of the reliability of the average is to examine the factors upon which the reliability of this measure must depend. Suppose that we wish to find the average score of college freshmen in the United States on Army Alpha. To measure the achievement of college freshmen in general would require, in strict logic, that we test all of the freshmen in the United States. However, this is a well-nigh impossible task, and hence we must be satisfied with taking the records of as large and random a sample of freshmen as we can secure. This means that we cannot use freshmen from only a single institution or from only one section of the country, and that we must guard against selecting only those with low or high scholastic records.
The more successful we are in getting an "unselected" group, the more nearly representative will this group be of all of the freshmen in the country. Evidently, therefore, the reliability (the "representativeness") of an average depends, for one thing, on how impartially we have selected our sample. Granted a fair sample, the reliability of an average can be shown to depend upon two characteristics of the distribution: (1) the number of cases, and (2) the variability or spread of the measures within the sample.

(1) It is clear that the number of cases must influence the stability of an average, since the addition of even one extra measure to a series will bring about a change in the average unless the additional case happens to coincide with it exactly. Moreover, the addition of one case to a set of 10 measures will cause a greater change in the obtained average, written "average (obt.)," than the addition of one extra case to a set of 1000 measures, as each case counts for less in the larger group. It has been shown empirically, as well as theoretically,¹ that the reliability of an average (obt.) increases, not in proportion to the number of measures upon which it is based, but in proportion to the square root of the number of measures. Thus the average (obt.) of 25 measures of a variable quantity is not 25 times, but √25 or 5 times, as reliable as a single measure of the quantity. In like manner, the average of 36 cases is not 4 times as reliable as the average of 9 cases, but only twice as reliable, since √36 divided by √9 equals 2.

(2) In addition to the size of the sample, the reliability of an average must depend also upon the variability of the separate measures around the obtained average.
If the σ of the distribution is large, the separate measures tend to scatter widely from the average, and we are unable to say where those cases in the population which we have not measured will most probably fall: whether they will be close to, or far from, the obtained average. On the other hand, if the σ is small, we may be fairly certain that unmeasured cases will fall fairly close around the average. For this reason, the reliability of an obtained average depends upon the size of its σ, and as σ increases, the reliability decreases.

¹ Yule: An Introduction to the Theory of Statistics, 1919, p. 257. For results of experiment, see Fullerton and Cattell: On the Perception of Small Differences, Publications of the University of Pennsylvania, Philosophical Series 2, 1892.

We find, then, that the reliability of an average depends first upon our having selected a fairly representative sample from the larger group, or population, which we are studying. When this condition has been met, and only then, the reliability of an average can be measured mathematically in terms of its standard error, that is, in terms of the number of cases and the σ of the distribution (written σ(dis.)). The formula for the standard error of an average or mean, written σ_av, is

$$\sigma_{av} = \frac{\sigma_{(dis.)}}{\sqrt{N}} \qquad (13)$$

This is one of the most important, and most often used, of the reliability formulas. Note that a decrease in σ(dis.), or an increase in the size of N, will cause the standard error to become smaller numerically. A decrease in σ_av means that the probable divergence of the obtained average from the true is just so much less; hence the reliability of an average (obt.) increases as σ_av decreases. A problem will illustrate the value and use of formula (13).

Problem (1): In 1883, the Anthropometric Committee of the British Association found the average height of 8585 adult males in the British Isles to be 67.46 inches, with a σ of 2.57 inches.² How reliable is this average? What is its probable divergence from the average which would have been secured had all adult males in the British Isles been measured?

² Yule, An Introduction to the Theory of Statistics, 1919, pp. 112 and 141.

Applying formula (13), the standard error of the mean, σ_av, is found to be .0277 inch. This result is interpreted in the following way. The chances are 6826 in 10,000, or 68 in 100, that the obtained average of 67.46 inches does not diverge from the true average by more than ±1σ_av, i.e., by more than ±.0277 inch. Stated in another way, the chances are 68 in 100 that the true average lies within the limits 67.46 + .0277 and 67.46 − .0277, or between 67.488 and 67.432 inches. We can be practically certain that the true mean lies within the limits 67.46 ± 3 × .0277 (±3σ_av), or between 67.543 and 67.377 inches (see Table X for σ values).

Just how the standard error measures the reliability of an average may be shown most clearly, perhaps, by an illustration. Suppose that we have measured the heights of 1000 groups of men, each group containing 8585, the groups or samples chosen at random from the general population. The 1000 averages obtained from these groups will tend to differ slightly from one another owing to so-called errors of sampling (see page 143), and hence not all samples will represent with equal accuracy the population from which they have been drawn. Now suppose, further, that it were possible to secure the average height of the entire male population of the British Isles. If we should subtract this true mean from each one of the 1000 obtained means, we would obviously get 1000 differences, and these 1000 "measures" (differences) would, according to the best assumption that we can make, follow the normal probability curve (see page 83).
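The thought experiment just described can be imitated on a small scale. The Python sketch below first applies formula (13) to the British height data, then draws repeated random samples from an assumed normal population (mean and σ borrowed from the height data, but with N = 100 rather than 8585 so that it runs quickly) and compares the spread of the obtained-minus-true differences with σ/√N. The population, sample size, and number of samples are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev


def standard_error_of_mean(sd: float, n: int) -> float:
    """Formula (13): sigma_av = sigma(dis.) / sqrt(N)."""
    return sd / math.sqrt(n)


def cases_needed(n: int, times_as_reliable: float) -> float:
    """Cases needed to make the mean k times as reliable, with sigma(dis.)
    unchanged: sigma_av falls as 1/sqrt(N), so N must grow by k squared."""
    return n * times_as_reliable ** 2


# British Association data: N = 8585, mean 67.46 in., sigma 2.57 in.
mean, sd, n = 67.46, 2.57, 8585
se = standard_error_of_mean(sd, n)
print(f"sigma_av = {se:.4f} inch")
print(f"68 in 100 limits: {mean - se:.3f} to {mean + se:.3f}")
print(f"+/-3 sigma_av limits: {mean - 3 * se:.3f} to {mean + 3 * se:.3f}")


def obtained_minus_true(true_mean: float, sd: float, n: int,
                        samples: int, seed: int = 1) -> list[float]:
    """Draw `samples` random samples of size n from a normal population and
    return each sample mean minus the true mean."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(true_mean, sd) for _ in range(n)) - true_mean
            for _ in range(samples)]


small_n = 100
diffs = obtained_minus_true(mean, sd, small_n, samples=2000)
small_se = standard_error_of_mean(sd, small_n)
within = sum(abs(d) <= small_se for d in diffs) / len(diffs)
print(f"spread of differences: {pstdev(diffs):.3f} (formula 13 gives {small_se:.3f})")
print(f"share within +/-1 sigma_av: {within:.2f}")
```

For 8585 cases the standard error is about .0277 inch, which gives the limits quoted in the text. In the simulation roughly 68% of the differences should fall within ±1σ_av, as the normal curve predicts, and `cases_needed(8585, 2)` returns 34,340, the fourfold increase the text arrives at a little later.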
In this hypothetical distribution of differences, we should have relatively few large plus or minus deviations, and a relatively large number of small plus, small minus, and zero deviations; in short, the obtained means would hit close to the true mean more often than they would miss it. The average of this distribution of differences would fall (most probably) at 0; for, other things being equal, this will be the difference most often obtained (the maximum frequency) in subtracting the true from the obtained means. The σ of this distribution is given by the formula σ(dis.)/√N. In other words, the standard error of the mean measures the spread of the differences (obtained − true) around 0 as a central tendency; and for this reason σ_av is a measure of the probable divergence of the obtained average from its corresponding true average.

These results are represented graphically in Diagram XV, Fig. 1. The 1000 differences between the 1000 obtained means and the true mean are shown arranged into a normal frequency distribution with mean at 0 and σ equal to .0277. The heights of the different ordinates represent the frequency of the various obtained − true differences: the maximum ordinate, at the mean, represents the zero difference. Now we know that ±1σ of a normal distribution includes the middle 68.26% of the cases, when measured off in the plus and minus directions from the mean. Hence we may say that the chances are 68 in 100 that the difference between the obtained mean of 67.46 inches and the true mean will not be greater than ±.0277 inch. Or, as stated above, there are 68 chances in 100 that the true average lies within the limits 67.46 + .0277 and 67.46 − .0277, or between 67.488 and 67.432 inches. Furthermore, we can be practically sure that the true average will fall within the limits ±3σ_av from the mean.
Three times ±.0277 is ±.0831; accordingly there are 9973 chances in 10,000 (see Table X) that the true average lies within the limits 67.46 ± .0831, or between 67.543 and 67.377 inches.

[DIAGRAM XV. Fig. 1: normal distribution of the differences between obtained and true means, marked at ±.0277 (±1σ_av) and ±.0831 (±3σ_av). The remaining figures of the diagram are not recoverable from this copy.]

The average height of our sample of 8585 British males has been found to be 67.46 inches, with a standard error of .0277 inch. Let us now proceed to the second question stated on page 119, viz., "How many measurements must I make in order to get a result whose probable divergence from the true result is less than some given amount?" Suppose, for example, that we wish to secure an average which is twice as reliable as the average we now have: how many cases will be required? Assuming that the spread in the increased group, i.e., σ(dis.), remains approximately the same, all that we need do in order to cut the standard error in two, and thus double the reliability, is to place a 2 in the denominator of the fraction σ(dis.)/√8585. But 2√8585 becomes √(4 × 8585) when the 2 is placed under the radical, and, accordingly, it is evident that 8585 must be multiplied by 4 in order to make σ_av just 1/2 its original size. By analogy, to double the reliability of any average we must multiply N by 4; to triple the reliability, by 9; and so on. Assuming substantially the same σ(dis.), the average obtained from 400 cases is twice as reliable as the average got from 100, and the average from 900 cases three times as reliable as that from 100 cases.

B. The Reliability of the Mean in Terms of the PE of the Average

In measuring the reliability of an average, the PE of the average, written PE_av, may be used instead of σ_av. PE_av is interpreted in exactly the same way as σ_av.
Its formula is derived simply by multiplying formula (13) by .6745 (see page 121):

$$PE_{av} = \frac{.6745\,\sigma_{(dis.)}}{\sqrt{N}} \qquad (14)$$

Applying this formula to our problem of heights, PE_av is found to be .0187 inch. The chances are even, therefore, that the obtained average of 67.46 inches does not differ from the true average by more than ±.0187 inch. Moreover, since ±4PE includes practically all of the cases in a normal distribution, we may be practically certain (the chances are 99 in 100) that the true average lies within the limits 67.46 ± 4 × .0187, or between 67.39 and 67.53 inches (see Table XI for PE values).

A comparison of the extreme limits within which we may be practically sure that the true average will lie shows that the values of these limits differ slightly when ±4PE instead of ±3σ are taken as limiting points [see Problem (1) above]. This discrepancy is due to the fact that ±3σ takes in 9973 of the 10,000 cases in the normal distribution, while ±4PE takes in but 9930 cases (see Tables X and XI). The σ limits, therefore, contain 43 more cases than the PE limits; and while 43 cases in 10,000 may seem an insignificant number, and is insignificant if taken from the middle of the distribution, even so few cases as this have considerable importance at the extremes of the distribution. This may be seen in the fact that we must take ±4.45PE in order to have our PE limits correspond exactly to ±3σ, since these limits include 9974 cases in 10,000. It is customary, however, in measuring reliability to use ±4PE instead of ±4.45PE as limits of practical certainty. In the first place, ±4PE mark off limits within which the chances are very great (9930 in 10,000) that the true average will fall. Furthermore, the slight increase in reliability got by using ±4.45PE instead of ±4PE is not usually sufficient to offset the greater convenience of the latter figure.

2. The Reliability of the Median

The formulas for measuring the reliability of an obtained median are easily derived from those for measuring the reliability of the mean. σ_mdn and PE_mdn are 1.25331, or roughly 5/4, times σ_av and PE_av respectively:

$$\sigma_{mdn} = \frac{1.2533\,\sigma_{(dis.)}}{\sqrt{N}} \approx \frac{5}{4}\cdot\frac{\sigma_{(dis.)}}{\sqrt{N}} \qquad (15)$$

$$PE_{mdn} = \frac{1.2533 \times .6745\,\sigma_{(dis.)}}{\sqrt{N}} = \frac{.8454\,\sigma_{(dis.)}}{\sqrt{N}} \qquad (16)$$

$$PE_{mdn} \approx \frac{5}{4}\cdot\frac{Q}{\sqrt{N}} \qquad (16a)^{1}$$

¹ This formula should be used when Q and not σ is given.

Formulas (15), (16), and (16a) are all used and interpreted in the same way as the reliability formulas for the average or mean. A problem will serve to show how the reliability of the median is found.

Problem (2): Measurement of 801 12 year old boys on the Trabue Language Scale A¹ gave the following results: Median = 21.4; Q = 4.9. What is the reliability of this median? How close is it to the true median score of 12 year old boys?

From formula (16a), PE_mdn is found to be .2164. The chances are 50 in 100, therefore, that the true median does not differ from 21.4 by more than ±.2164. We may be practically certain that the true median lies within the limits 21.4 ± 4 × .2164, or between 22.27 and 20.53. Since σ_mdn and PE_mdn are both larger (approximately 1.25 times) than the corresponding measures of reliability of the average (obt.), it is clear that the obtained average is always more reliable than the obtained median of the same group. For this reason the average is used whenever the highest reliability is sought (see page 50).

III. The Reliability of Measures of Variability

1. The Standard Deviation, or σ

We have seen that the reliability of an obtained average or obtained median is found by determining the probable divergence of the obtained from the true measure.
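As a compact check on the arithmetic of this chapter, the Python sketch below gathers the large-sample reliability formulas for the mean and median, together with those for σ and Q given in the following paragraphs. Rather than copying the printed constants, it derives them from the normal curve (PE = .6745σ; the median's standard error is √(π/2), about 1.2533, times the mean's; σ_Q is about .7867σ/√N), so it assumes a normal distribution and large N. Its last digits can differ slightly from the text, which rounds the median factor to 5/4. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()
PE_FACTOR = STD.inv_cdf(0.75)            # PE = .6745 sigma
MEDIAN_FACTOR = math.sqrt(math.pi / 2)   # sigma_mdn / sigma_av, about 1.2533 ("roughly 5/4")

# Large-sample standard error of Q = (Q3 - Q1) / 2 for a normal curve, in units
# of sigma / sqrt(N), built from the variances and covariance of the sample quartiles.
_DENSITY = STD.pdf(PE_FACTOR)            # height of the unit normal curve at the quartiles
_VAR_QUARTILE = 0.75 * 0.25 / _DENSITY ** 2
_COV_QUARTILES = 0.25 * 0.25 / _DENSITY ** 2
Q_FACTOR = math.sqrt((2 * _VAR_QUARTILE - 2 * _COV_QUARTILES) / 4)  # about .7867


def se_mean(sd: float, n: int) -> float:           # formula (13)
    return sd / math.sqrt(n)


def pe_mean(sd: float, n: int) -> float:           # formula (14)
    return PE_FACTOR * se_mean(sd, n)


def se_median(sd: float, n: int) -> float:         # formula (15)
    return MEDIAN_FACTOR * se_mean(sd, n)


def pe_median_from_q(q: float, n: int) -> float:   # formula (16a); Q is the PE of the distribution
    return MEDIAN_FACTOR * q / math.sqrt(n)


def se_sd(sd: float, n: int) -> float:             # formula (17)
    return sd / math.sqrt(2 * n)


def se_q_from_q(q: float, n: int) -> float:        # formula (18a): sigma replaced by Q / .6745
    return Q_FACTOR / PE_FACTOR * q / math.sqrt(n)


print(f"PE of mean height (N=8585):  {pe_mean(2.57, 8585):.4f}")
print(f"PE of Trabue median (N=801): {pe_median_from_q(4.9, 801):.4f}")
print(f"sigma of sigma, heights:     {se_sd(2.57, 8585):.4f}")
print(f"sigma of Q, Trabue:          {se_q_from_q(4.9, 801):.4f}")
```

The median line gives about .2170 rather than the text's .2164 because the sketch uses 1.2533 where the text rounds to 5/4. The other three values agree with the text to the figures it gives.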
In the same way, the reliability of an obtained σ or an obtained Q is measured by the probable divergence of this measure from the true σ or the true Q, viz., the σ or the Q which we should get from all possible measures of the trait in question. The formula for finding the reliability of an obtained σ is

$$\sigma_{\sigma} = \frac{\sigma_{(dis.)}}{\sqrt{2N}} \qquad (17)$$

In Problem (1), page 122, we found that for 8585 adult British males the obtained σ (the σ taken around the average (obt.) of 67.46 inches) was 2.57 inches. The question may well be asked: how reliable is this σ? How well does it represent the true σ which we should get if deviations could be taken from the true average? Substituting for σ(dis.) and N in formula (17), the value of σ_σ is found to be .0196 inch. This means that the chances are 68 in 100 that 2.57 inches does not differ from the true σ by more than ±.0196 inch, and that the chances are 997 in 1000 that the σ(dis.) does not differ from the true σ by more than 3 × .0196, or ±.0588 inch. We can be practically certain, then, that the true σ lies within the limits 2.57 ± .0588, or between 2.51 and 2.63 inches.

¹ Trabue, M. R., Completion Test Language Scales, 1916, p. 15.

2. The Quartile Deviation, or Q

The reliability of the Q of a distribution is found from the formula

$$\sigma_Q = \frac{1.11\,\sigma_{(dis.)}}{\sqrt{2N}} \qquad (18)$$

or, in terms of Q,

$$\sigma_Q = \frac{1.65\,Q}{\sqrt{2N}} \qquad (18a)$$

The 801 12 year old boys who took the Trabue Completion Test, Scale A (see page 127), had a median score of 21.4 points with a Q of 4.9 points. What is the reliability of this Q? From formula (18a), σ_Q is found to be .202. The chances are 68 in 100, therefore, that 4.9, the obtained Q, does not differ from the true Q by more than ±.202 point. And the chances are 9973 in 10,000 that the true Q lies within the limits 4.9 ± 3 × .202, or between 4.3 and 5.5 points.

IV. The Reliability of the Difference between Two Measures

1. The Reliability of the Difference between Two Averages

A.
The Reliability of the Difference in Terms of the σ(diff.)

Suppose that we wish to find whether there is any difference in the performance of 10 year old boys and 10 year old girls on a certain general intelligence test. The usual method of attacking this problem is to select as large and as random a sample of 10 year old boys and 10 year old girls as possible, give them our test, compute the average scores, and find the difference between the two averages. If this difference is, let us say, several points in favor of the girls, such a result would be evidence (on the face of it) for believing that the average girl is better than the average boy. Before drawing this conclusion definitely, however, we should know how reliable the obtained difference is: what its probable divergence is from the true difference which we should get if we could subtract the true average of the boys from the true average of the girls.¹ Otherwise, if we compared the averages of other groups of boys and girls selected in the same way as our groups, we might wipe out or even reverse the difference found.

¹ Simpler methods of studying the significance of the difference between two averages are given in Chapter I, p. 40.

One formula for calculating the reliability of an obtained difference is

$$\sigma_{(diff.)} = \sqrt{\sigma^2_{(av.1)} + \sigma^2_{(av.2)}} \qquad (19)$$

in which σ(av.1) is the standard error of the first obtained average, σ(av.2) is the standard error of the second obtained average, and σ(diff.) is the standard error of the difference between the two averages. Thus, to find the reliability of the difference between two averages, we must first know the reliability of the averages themselves. Let us illustrate the use and value of formula (19) by means of a problem.

Problem (3): In a study of the intelligence of the foreign born white draft during the Great War, a sample of 308 native born Germans and a sample of 325 native born Danes were found to test as follows on the "combined scale":²

Country of Birth    No. of Cases    Average Score    σ(dis.)
Germany                 308             13.88          2.43
Denmark                 325             13.69          2.23

² The combined scale was made up of the 8 Alpha tests, the Stanford-Binet, and tests 4, 5, 6, and 7 of Beta. The maximum score was 25.

The difference between the two obtained averages is seen to be .19 in favor of the Germans. Is this a reliable difference? Would further testing of other groups of Germans and Danes give approximately the same difference, or is it probable that the difference would be reduced to zero, or even reversed in favor of the Danes? Stated more exactly, what is the probable divergence of this difference from the true difference between Germans and Danes? To answer these questions, we must find the reliability of the averages of the Germans and the Danes, and from these the reliability of the difference between the averages. By formula (13) the standard errors of the two averages are

For Germans: σ(av.) = 2.43/√308 = .1385
For Danes: σ(av.) = 2.23/√325 = .1237

Substituting these values in formula (19), we have

σ(diff.) = √[(.1385)² + (.1237)²] = .1857

The actual difference between the two averages is .19, therefore, and the standard error of this difference, σ(diff.), is .1857. An obtained difference is interpreted in terms of its standard error in exactly the same way in which an obtained average is interpreted in terms of its standard error. Thus we may say that the chances are 68 in 100 that the obtained difference of .19 does not diverge from the true difference by more than ±.1857, and that the chances are 99 in 100 that .19 does not differ from the true difference by more than 3 × .1857, that is, by more than ±.56 (see Table X). To sum up our findings so far, we may be almost certain that the true difference between the averages of the Germans and Danes lies within the limits .19 ± .56, or between -.37 and +.75.
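Formulas (17), (18a), and (19) each reduce to one line. The sketch below reworks the three examples worked so far in Sections III and IV: the σ of heights, the Q of the Trabue scores, and the German and Danish averages. It is an illustrative Python sketch; the helper names are hypothetical, `math.hypot` is used only as a convenient square root of a sum of squares, and, like formula (19) itself, it treats the two averages as uncorrelated.

```python
import math


def se_mean(sigma_dis, n):
    """Formula (13): standard error of an obtained average."""
    return sigma_dis / math.sqrt(n)


def se_sigma(sigma_dis, n):
    """Formula (17): standard error of an obtained sigma."""
    return sigma_dis / math.sqrt(2 * n)


def se_q(q, n):
    """Formula (18a): standard error of an obtained Q."""
    return 1.65 * q / math.sqrt(2 * n)


def se_diff(se1, se2):
    """Formula (19): standard error of the difference between two
    uncorrelated obtained averages, sqrt(se1**2 + se2**2)."""
    return math.hypot(se1, se2)


# Formula (17): sigma of heights, 2.57 in., N = 8585
s_sigma = se_sigma(2.57, 8585)
print(f"sigma_sigma = {s_sigma:.4f} in.")

# Formula (18a): Q of Trabue scores, 4.9 points, N = 801
s_q = se_q(4.9, 801)
print(f"sigma_Q = {s_q:.3f} point")

# Problem (3): Germans (N = 308) and Danes (N = 325) on the combined scale
se_germans = se_mean(2.43, 308)
se_danes = se_mean(2.23, 325)
sd = se_diff(se_germans, se_danes)
d = 13.88 - 13.69  # obtained difference in favor of the Germans
print(f"sigma(av.): Germans {se_germans:.4f}, Danes {se_danes:.4f}; "
      f"sigma(diff.) = {sd:.4f}")
print(f"D +/- 3 sigma(diff.): {d - 3 * sd:.2f} to {d + 3 * sd:.2f}")
```

The printed values should match the text's .0196, .202, .1385, .1237, and .1857, and the limits -.37 to .75.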
Note that the lower limit of this range is negative, and in consequence there is at least some chance that the true difference is less than zero, that is, that the average of the Danes will sometimes actually be higher than that of the Germans. In spite of the obtained difference in favor of the Germans, we cannot be 100% sure that the true difference between the average German and the average Dane is greater than zero. Just what, then, it may be asked, are the chances of a true difference greater than zero between Germans and Danes?

Before answering this question, let us digress for the moment to consider the following hypothetical situation.¹ Suppose that we could secure the averages of 1000 groups of native born Germans and 1000 groups of native born Danes on the combined scale, the samples selected at random from the general population of native born Germans and Danes and roughly of the same size as the samples we have. Suppose further that these groups could be paired off so that we should have 1000 differences between the obtained averages of Germans and Danes, these hypothetical differences corresponding to the actually obtained difference of .19. Now, according to the best assumption that we can make, this distribution of differences would follow the normal probability curve; the lower limit of the distribution would be at -.37, the upper limit at .75, and the mean at .19, as shown in Diagram XV, Fig. 2. The mean is taken at .19 because this is the difference actually obtained, and hence may fairly be taken as the most probable. Again, the chances are even that any other obtained difference will be greater or less than .19; accordingly, the logical place for this difference would seem to be at the mean. The σ of this distribution of differences is .1857, the σ(diff.).

¹ The argument here, which differs somewhat from that on page 123, is believed to be better adapted to the present illustration than the other. The two are essentially the same, however.

Now, to determine the chances that the true difference between Germans and Danes is greater than zero, we divide .19, which is the distance of the mean difference from the zero difference, by .1857, the σ of the difference-distribution. This tells us how far the zero difference is below the mean in σ terms. The quotient .19/.1857 is 1.02σ, and from Table X we find that in the normal curve 3461 cases in 10,000 lie between the mean and 1.02σ. Adding in the 5000 cases above the mean (see Diagram XV, Fig. 2) and translating cases into "chances," it is clear that the chances are 8461 in 10,000 that the true difference between the averages of Germans and Danes is greater than zero. We may be practically certain, therefore, when we compare groups of Germans and Danes on the combined scale, that 84 times in 100, or about 4 times in 5, the difference between the average scores will be in favor of the Germans. This answers the question put on page 130: "What are the chances of a true difference greater than zero between the Germans and Danes?"

The obtained difference of .19 is sufficiently large to ensure considerably more than an even chance of a true difference between Germans and Danes. It is not large enough, however, to guarantee that the Germans will always score higher, on the average, than the Danes. The further question arises, therefore: how much difference would be required to ensure absolute reliability, that is, to guarantee that the Germans will always lead the Danes? This question is easily answered with the help of Fig. 2. If the point 3σ below the mean (the point taken at -.37) were the zero-difference point, we should then be practically certain, since the whole curve of differences would lie to the right of this point, of a true difference always greater than zero.
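The calculation just described, placing the obtained difference at the mean of a normal curve of differences whose σ is σ(diff.) and finding the area lying above zero, can be sketched as follows. This is an illustrative Python sketch with a hypothetical helper name; `math.erf` stands in for Table X, and the σ distance is rounded to two places as in the text.

```python
import math


def normal_cdf(z):
    """Proportion of a unit normal curve lying below z (stands in for Table X)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chances_true_diff_exceeds(threshold, d, sd):
    """Chances in 10,000 that the true difference exceeds `threshold`.

    The obtained difference d is taken as the mean of a normal distribution
    of differences whose sigma is sd; the distance from `threshold` to that
    mean is rounded to two places, as the text does before using Table X.
    """
    z = round((d - threshold) / sd, 2)
    return 10000 * normal_cdf(z)


d, sd = 0.19, 0.1857  # Germans minus Danes, and sigma(diff.)
print(f"zero difference lies {d / sd:.2f} sigma below the mean difference")
print(f"chances of a true difference > 0: "
      f"{chances_true_diff_exceeds(0, d, sd):.0f} in 10,000")
```

Called with d = 3 and sd = 1.5, the same hypothetical helper should also reproduce the 9772 and 9082 chances in 10,000 of Problem (4) in Section C, and a threshold equal to d gives the even (5000 in 10,000) chance of part (c) of that problem.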
To accomplish this, however, i.e., to shift the zero-difference point down to -.37, the mean difference would have to be .37 + .19, or .56. This new difference (D) divided by σ(diff.) would equal .56/.1857, or 3σ, and the chances would then be 9986.5 in 10,000 that the true difference between Germans and Danes on the combined scale is greater than zero.

We may summarize the preceding paragraphs as follows. The obtained difference between the averages of the Germans and Danes on the combined scale is found to be .19, or approximately 1/3 of what it should be (.56) to ensure a completely reliable difference. The obtained difference is large enough, however, to guarantee that 4 times in 5 the average score of the native born Germans will be higher than the average score of the native born Danes.¹

¹ Assuming that the samples used represent the population of native born Germans and Danes adequately (at least as adequately as the present samples).

Once we understand what the σ(diff.) formula means, the reliability of an obtained difference in terms of "chances that the obtained difference represents a true difference greater than zero" may be conveniently read from Table XIV. For example, when D = .19 and σ(diff.) = .1857, so that D/σ(diff.) = 1.02, we find at once from the table that the chances are 84 in 100 that the true difference is greater than zero. Moreover, since a D/σ(diff.) of 3 means practically complete reliability, we know that a D/σ(diff.) of 1.02 is 1.02/3, or about 34%, of what it should be in order to ensure a difference always greater than zero. It is customary to take a D/σ(diff.) of 3 as indicative of complete reliability, since -3σ includes practically all of the cases in the "distribution of differences" below the mean (see Diagram XV, Fig. 2). A D/σ(diff.) greater than 3 is to be taken as indicating just so much added reliability.

TABLE XIV
To Find the Chances of a True Difference Greater than Zero, Given the Actual Difference between the Two Obtained Measures and the σ(diff.)

For example: a D/σ(diff.) of 1.3 means that the chances are 90 in 100 that the true difference (the difference between the true measures) is greater than zero.

Note: The "chances in 100" increase so slowly after 1.50 that the D/σ(diff.) column increases thereafter by .10 instead of by .05.

D/σ(diff.)   Chances in 100     D/σ(diff.)   Chances in 100
   .00             50              1.15             87
   .05             52              1.20             88
   .10             54              1.25             89
   .15             56              1.30             90
   .20             58              1.35             91
   .25             60              1.40             92
   .30             62              1.45             93
   .35             64              1.50             93
   .40             65              1.60             94
   .45             67              1.70             96
   .50             69              1.80             96
   .55             71              1.90             97
   .60             73              2.00             98
   .65             74              2.10             98
   .70             76              2.20             99 (98.6)
   .75             77              2.30             99 (98.9)
   .80             79              2.40             99 (99.2)
   .85             80              2.50             99 (99.4)
   .90             82              2.60             99 (99.5)
   .95             83              2.70            100 (99.7)
  1.00             84              2.80            100 (99.74)
  1.05             85              2.90            100 (99.8)
  1.10             86              3.00            100 (99.9)

B. The Reliability of the Difference in Terms of the PE(diff.)

The reliability of the difference between two obtained means may be measured by the PE(diff.) as well as by the σ(diff.). The formula for PE(diff.) is

$$PE_{(diff.)} = \sqrt{PE^2_{(av.1)} + PE^2_{(av.2)}} \qquad (20)$$

in which PE(av.1) and PE(av.2) are the PE's of the two given obtained averages. Formula (20) is interpreted in exactly the same manner as formula (19); a problem will illustrate its use.

Problem (4): On the two halves of the Woodworth-Wells Substitution Test,² timed separately, 200 Barnard Freshmen made the following records:

               Average (secs.)    σ(dis.)
First half          65.51          11.13
Second half         60.32          12.04

² Carothers, F. E., Psychological Examination of College Students, Archives of Psychology, 46, 1921, p. 36.

Is this gain in time from the first to the second half of the test sufficiently large to indicate a true difference in the time required to learn the key after practice, or would further testing with other groups probably reduce, or even reverse, the gain?

TABLE XV
To Find the Chances of a True Difference Greater than Zero, Given the Actual Difference between the Two Measures and the PE(diff.)

For example: a D/PE(diff.) of 1.10 means that there are 77 chances in 100 that the true difference (the difference between the true measures) is greater than zero.

Note: The "chances in 100" increase so slowly after 2.0 that the D/PE(diff.) column increases thereafter by .10 instead of by .05.

D/PE(diff.)   Chances in 100     D/PE(diff.)   Chances in 100
   .00              50              1.55             85
   .05              51              1.60             86
   .10              53              1.65             87
   .15              54              1.70             87
   .20              55              1.75             88
   .25              57              1.80             89
   .30              58              1.85             89
   .35              59              1.90             90
   .40              61              1.95             91
   .45              62              2.00             91
   .50              63              2.10             92
   .55              64              2.20             93
   .60              66              2.30             94
   .65              67              2.40             95
   .70              68              2.50             95
   .75              69              2.60             96
   .80              71              2.70             97 (96.6)
   .85              72              2.80             97
   .90              73              2.90             97 (97.5)
   .95              74              3.00             98 (97.9)
  1.00              75              3.10             98
  1.05              76              3.20             98 (98.5)
  1.10              77              3.30             99 (98.7)
  1.15              78              3.40             99 (98.9)
  1.20              79              3.50             99
  1.25              80              3.60             99
  1.30              81              3.70             99
  1.35              82              3.80             99 (99.5)
  1.40              83              3.90            100 (99.6)
  1.45              84              4.00            100 (99.7)
  1.50              84

First, to find the probable errors of the two averages by formula (14):

First half: PE(av.1) = .6745 × 11.13/√200 = .5310
Second half: PE(av.2) = .6745 × 12.04/√200 = .5743

Substituting PE(av.1) and PE(av.2) in formula (20), we have

PE(diff.) = √[(.5310)² + (.5743)²] = .7822

The obtained difference, D, is 5.19 and the PE(diff.) is .7822. Therefore D/PE(diff.) is 6.64. Since we find from Table XV (to be read exactly like Table XIV) that a D/PE(diff.) of 4 indicates complete reliability, it follows that our obtained difference is not only completely reliable, but is 2.64PE (6.64 - 4.00), or about 66%, larger than it need be in order to ensure a true difference greater than zero. Just as it is customary to take a D/σ(diff.) of 3 as indicative of complete reliability, so a D/PE(diff.) must be at least 4 in order to ensure complete reliability.

2.
The Reliability of the Difference between Two Medians

The two formulas (19) and (20), used in finding the reliability of the difference between two means, may also be used for finding the reliability of the difference between two medians when written:

$$\sigma_{(diff.)} = \sqrt{\sigma^2_{(mdn.1)} + \sigma^2_{(mdn.2)}} \qquad (21)$$

and

$$PE_{(diff.)} = \sqrt{PE^2_{(mdn.1)} + PE^2_{(mdn.2)}} \qquad (22)$$

We may illustrate these formulas by a problem.

Problem (5): The following results were obtained from a group of 12 year old boys and a group of 12 year old girls (Grades III to VIII inclusive) on the Trabue Language Scale A:¹

          N      Median     Q
Boys     801     21.40     4.9
Girls    448     22.80     5.3

¹ Completion-Test Language Scales, 1916, p. 15.

The actual difference between the two medians is 1.4 points in favor of the girls. Assuming that the two groups are fairly unselected, is this difference sufficiently large to ensure a true difference greater than zero in favor of the girls?

Since the measure of variability given is the Q, we shall use the formula for PE(diff.). First, to find the reliability of the two medians by formula (16a):

For girls: PE(mdn.) = 5/4 × 5.3/√448 = .3130
For boys: PE(mdn.) = 5/4 × 4.9/√801 = .2164

Substituting in (22), we have

PE(diff.) = √[(.3130)² + (.2164)²] = .3805

The obtained difference is 1.4 and the PE(diff.) is .3805. Therefore D/PE(diff.) is 3.68, and from Table XV we find that the chances are 99.3 in 100 that there is a difference greater than zero between the true median scores of 12 year old boys and girls. This D/PE(diff.) of 3.68 is 92% (3.68/4) of what it should be conventionally in order to guarantee complete reliability. However, it is sufficiently high to be taken, for all practical purposes, as completely reliable.

V.
Some Problems Which Involve Measures of Reliability

This section is designed to illustrate a variety of problems which require in their solution the reliability formulas given in this chapter and the frequency tables. For quick reference later, each group of examples is preceded by a general statement of the essential problem involved.

A. To Find the Probability That the True Average Is Greater or Less than Some Designated Point on the Scale, or That It Falls within Given Limits

Problem (1): Given Average (obt.) = 30.2, σ(dis.) = 6.00, N = 100. On the assumption that this sample is fairly representative of the population from which it is drawn, (a) what is the reliability of the obtained average? (b) What are the chances that the true average is less than 29? (c) greater than 31.5? (d) that the true average lies between 28 and 31?

(a) From formula (13) we find that the σ(av.) is .6; hence the chances are 68 in 100 that the obtained average does not diverge from the true average by more than ±.6, i.e., that the true average falls between the limits 29.6 and 30.8. Moreover, the chances are 99.7 in 100 that 30.2 does not diverge from the true average by more than ±.6 × 3, or ±1.8; i.e., that the true average falls within the limits 28.4 and 32.0. These results are represented graphically in Diagram XV, Fig. 3. This normal probability distribution represents the distribution of means that we should expect to get from a large number of random samples, selected in the same way as the sample we have.¹ The central tendency of this hypothetical distribution of means is taken at 30.2, the actually obtained, and hence the most probable, mean. The standard deviation of the distribution is .6, the standard error of the given obtained mean.

¹ See the discussion on pages 122-123.

(b) What are the chances that the true mean is less than 29? 29 lies 1.2 points, or 2σ, below the obtained mean of 30.2 (see Fig. 3).
From Table X we find that 4772 cases in 10,000 fall between the mean and 2σ in a normal distribution; accordingly, 5000 - 4772, or 228 cases, must lie below -2σ. The chances are 228 in 10,000, therefore, that the true mean lies below (is less than) 29.

(c) What are the chances that the true mean is greater than 31.5? This score is 1.3 points, or 2.17σ, above the obtained mean. There are 4850 cases in 10,000 between the mean and 2.17σ in a normal distribution, and 5000 - 4850, or 150 cases, above this point. Hence the chances are 150 in 10,000, or about 2 in 100, that the true mean is greater than 31.5 (i.e., lies above 2.17σ).

(d) What are the chances that the true mean lies between 28 and 31? 28 is 2.2 points, or -3.67σ, from the mean; and 31 is .8 of a point, or 1.33σ, from the mean. Between the mean and -3.67σ in a normal distribution are 4999 cases in 10,000, and between the mean and 1.33σ are 4082 cases in 10,000. Within the interval from -3.67σ to 1.33σ, therefore, we find 4999 + 4082, or 9081 cases. Stated as chances, there are about 91 chances in 100 that the true average lies between 28 and 31.

Problem (2): Given Average (obt.) = 26.4, PE(av.) = 1.5. What are the chances that the true average of the group of which the given group is a random sample is (a) as large as 30? (b) as small as 24?

As in Problem (1), this situation may be represented by a normal probability curve, with the mean at 26.4 and PE equal to 1.5 (see Diagram XV, Fig. 4).

(a) What are the chances that the true average of the group is as large as 30? 30 is 3.6 points, or 2.4PE, above the obtained average of 26.4. There are 4472 cases in 10,000 between the mean and 2.4PE in a normal distribution (Table XI), and 5000 - 4472, or 528 cases, above 2.4PE, i.e., above 30. Hence the chances are 528 in 10,000, or about 5 in 100, that the true average is as large as (or larger than) 30.

(b) What are the chances that the true average is as small as 24?
24 lies 2.4 points, or -1.6PE, from the mean. There are 3597 cases in 10,000 between the mean and -1.6PE in a normal distribution, and 5000 - 3597, or 1403 cases, below -1.6PE. The chances are 1403 in 10,000, therefore, that the true average is as small as (or smaller than) 24.

B. To Find the Probability That the Divergence of an Obtained Measure from Its True Measure Will Be within Given Limits

Problem (3): Given Average (obt.) = 152.7 and σ(av.) = 4.5. Find the probability that the given obtained average will not diverge (or vary) from the true average by more than (a) 1 point, (b) 3 points, (c) 5 points, (d) 10 points.

(a) This is essentially the same problem, expressed in a slightly different way, as the problems under A. To find the probability that the obtained average differs from the true by no more than +1 or -1, we must find the chances that the true mean lies within the limits 152.7 ± 1, i.e., between 151.7 and 153.7 (shown in Diagram XV, Fig. 5). A deviation of ±1 point is a deviation of ±1/4.5, or ±.222σ, from the obtained mean. From Table X we find that 880 cases in 10,000 in a normal distribution fall between the mean and +.222σ, and as many between the mean and -.222σ. Accordingly, 880 × 2, or 1760 cases, fall within the interval -.222σ to +.222σ, and the chances are 1760 in 10,000 that the obtained mean will not diverge from the true mean by more than ±1 point.

(b) Three points are ±3/4.5, or ±.667σ, from the mean. There are 2475 × 2, or 4950 cases, within the interval .667σ measured off to the right and left of the mean. Hence there are 4950 chances in 10,000 that the obtained mean will not diverge from the true mean by more than ±3 points.

(c) Five points are ±5/4.5, or ±1.11σ, from the mean. Hence there are 3665 × 2, or 7330 chances in 10,000 that the obtained average will not differ from the true average by more than ±5 points.
(d) Ten points are ±10/4.5, or ±2.22σ, from the mean; accordingly, there are 4868 × 2, or 9736 chances in 10,000 that the obtained mean will not diverge from the true mean by more than ±10 points.

C. To Find the Probability That the True Difference between the Measures of Two Groups Is Greater or Less than a Given Amount

Problem (4): The difference between two obtained means is 3; σ(diff.) = 1.5. (a) What are the chances that the true difference between the means of the two groups is greater than 0? (b) greater than 1? (c) greater than 3?

(a) Zero difference is 3/1.5, or 2σ, below the mean of differences, viz., 3 (see Diagram XV, Fig. 6). There are 4772 cases in 10,000 between the mean of a normal distribution and 2σ. Accordingly, there are 5000 + 4772, or 9772 chances in 10,000 that the true difference is greater than zero. (Note that this result may be read off directly from Table XIV, since D/σ(diff.) = 2.)

(b) One is 2/1.5, or 1.33σ, below the mean. There are 4082 cases in 10,000 in a normal distribution between the mean and 1.33σ. The chances, therefore, are 5000 + 4082, or 9082 in 10,000, that the true difference is greater than 1.

(c) What are the chances that the true difference is greater than 3? The obtained difference of 3 has been placed at the mean of differences as the obtained, and hence the most probable, difference. The chances are even, therefore (50-50), that the true difference is greater (or less) than 3. Note that here D, the distance from 3 to the mean of differences, is 0, so that D/σ(diff.) is 0/1.5, or 0 (Table XIV).

VI. Limitations to Reliability Formulas, and Cautions to Be Observed in Interpreting Them

The formulas which have been given in this chapter for calculating the standard errors of obtained measures of central tendency and variability make use of only two characteristics of the distribution from which the measure has been obtained, viz., the σ(dis.), the spread of the measures, and N, the number of cases.
It is obvious that, so far as the formulas themselves are concerned, there is nothing which would prevent our finding a standard error for a measure obtained from any group. Such a general and uncritical application of reliability formulas, however, will almost surely lead to erroneous conclusions, and for this reason it is necessary to indicate briefly some of the limitations to reliability formulas, as well as some cautions to be observed in interpreting results secured from them.

(1) In the first place, in interpreting standard errors we always make the assumption that measures obtained from successive samples are distributed according to the normal probability curve. This assumption is only true, however, when the number of cases is large; it is not valid when the sample is small. Hence the significance of a measure of reliability is conditioned upon our having a sufficiently large number of cases. If N is less than 25, there is little sense or justification in using reliability measures. One simple and practical method of judging whether the sample is "sufficiently" large is to continue taking independent measures, or adding cases drawn at random, until the addition of extra cases fails to produce an appreciable fluctuation in the average or median. When this point is reached, the sample is probably large enough to be taken as fairly representative of the larger group from which it has been drawn. It must be recognized, however, that mere numbers are not in themselves a guarantee of a representative sample.

(2) A more serious limitation to the measures of reliability arises from the fact that standard and probable errors of obtained measures can be assumed to measure only those errors which result from fluctuations due to "random sampling." An illustration will make this term clear.
On page 122 we found that the obtained average height of 8585 adult British males was 67.46 inches, with a standard error of .0277 inch. This means that the chances are 997 in 1000 that the true average height of British males lies between 67.38 and 67.54 inches. Now by "true average height" we mean the average height of all British males, of whom our group of 8585 is an attempted random sampling. If our group were perfectly representative, its average would equal the true average exactly. Except by chance, however, neither this sample nor another similarly selected, and of approximately the same size, will represent the entire population perfectly; furthermore, it is extremely unlikely that the averages calculated from successive samples will equal each other. Nevertheless, if the samples are actually random and there are no large constant errors present, the calculated averages will tend to vary around the true average of the whole group within a comparatively small range.

Variations like these, which arise from the fact that we must generally work with samples instead of the whole population, are called "errors of sampling." The function of the standard and of the probable errors is to give a measure of this sampling error, i.e., of the probable amount of deviation to be expected in an obtained measure from the corresponding true measure, as a result of working with a single sample. In other words, the standard or probable error measures the error made in taking a sample as representative of the larger group or population. If the standard error of a given mean is small, it does not necessarily follow that the obtained mean is highly reliable; a small standard error indicates merely that the reliability is high in so far as fluctuations due to differences in sampling are concerned. Reliability formulas give no measure of the effects of errors due to causes other than those which arise from sampling.
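The statement that the standard error measures only the scatter of averages from one random sample to another can be illustrated by simulation. In this Python sketch the population is hypothetical: normally distributed heights with the average and σ of the British data (67.46 and 2.57 inches). It draws 2000 random samples of 100 and compares the σ of their averages with formula (13). The sample size, the number of samples, and the seed are arbitrary choices, not figures from the text.

```python
import math
import random
import statistics

rng = random.Random(8585)  # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SIGMA = 67.46, 2.57  # hypothetical normal population of heights
N, SAMPLES = 100, 2000               # size of each sample, number of samples

# Draw many random samples and keep the average of each.
averages = [
    statistics.fmean([rng.gauss(TRUE_MEAN, TRUE_SIGMA) for _ in range(N)])
    for _ in range(SAMPLES)
]

observed = statistics.stdev(averages)   # spread of the averages from sample to sample
predicted = TRUE_SIGMA / math.sqrt(N)   # formula (13)
share_within = sum(abs(a - TRUE_MEAN) <= 3 * predicted for a in averages) / SAMPLES

print(f"averages centre on {statistics.fmean(averages):.3f} (true mean {TRUE_MEAN})")
print(f"sigma of averages {observed:.3f}; formula (13) gives {predicted:.3f}")
print(f"share of averages within 3 standard errors: {share_within:.3f}")
```

Because every sample here is random and free of constant error, the σ of the averages should come out close to the .257 that formula (13) predicts, and nearly all averages should fall within three standard errors of the true mean. The simulation measures sampling error only; it says nothing about errors of other kinds.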
Errors which arise from the failure to get a random sample, for example, are neither detected nor measured by these formulas. To illustrate this point, the average Army Alpha score made by 500 college men between the ages of 18 and 25 will not be representative of the male population of this age range. College men form a highly selected group, and in consequence, other samples of 500 drawn at random from the male population between the ages of 18 and 25 will return very different results from that of the college group. These differences in average score cannot be attributed to errors of sampling; and to take this group as representative of the general male population between the ages of 18 and 25, and to calculate the standard error of its average, will lead to an entirely erroneous idea of the intelligence of the general population. (The given sample might, of course, serve very well as a group representative of the population of college men.)

Other variations not measured by the reliability formulas arise from errors due to practice, fatigue, coachability of tests, faulty technique in giving and scoring tests, and, in fact, errors due to a bias of any sort. Standard errors calculated for measures secured from samples which contain such errors will always be of doubtful value. The careful study of successive samples, retests when practicable, care in controlling conditions, and the use of objective checks whenever possible will eliminate many of these troublesome and prolific sources of error.

Assuming that constant errors are small or practically negligible, one of the simplest tests of the adequacy (the "representativeness") of a sample consists in taking several other groups of approximately the same size from the general population. If the measures calculated from these groups are very nearly the same, we may be reasonably assured that we have representative samples.
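The check just described, and the college-men example before it, can be contrasted in a small simulation. Everything here is hypothetical: a normal population of 100,000 scores (mean 100, σ 15) stands for the general population, five random groups of 500 are compared with one another, and a group of 500 drawn only from the top 30 per cent stands for a highly selected group. The random groups should agree closely; the selected group gets a small standard error from formula (13), yet its average lies many standard errors from the true one, an error of selection that the formula cannot detect.

```python
import math
import random
import statistics

rng = random.Random(500)  # fixed seed so the run is repeatable

# Hypothetical general population of test scores (mean 100, sigma 15).
population = [rng.gauss(100, 15) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Several random groups of 500 from the general population.
random_means = [statistics.fmean(rng.sample(population, 500)) for _ in range(5)]
print("averages of random groups:", [round(m, 1) for m in random_means])

# A "selected" group of 500 drawn only from the top 30 per cent.
cutoff = sorted(population)[70_000]
selected = rng.sample([x for x in population if x >= cutoff], 500)
sel_mean = statistics.fmean(selected)
sel_se = statistics.stdev(selected) / math.sqrt(len(selected))  # formula (13)
print(f"selected group: average {sel_mean:.1f}, standard error {sel_se:.2f}")
print(f"true population average {true_mean:.1f}, "
      f"{(sel_mean - true_mean) / sel_se:.0f} standard errors away")
```

The cutoff of 30 per cent and the group size are arbitrary illustrative choices; any strongly selected group would show the same pattern.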
If the similarity is not fairly close, we must continue adding cases until the successive samples are approximately similar. Oftentimes more information may be secured in regard to the reliability of our measures in this THE RELIABILITY OF MEASURES 145 way than could be obtained from a blanket use of reliability formulas. (3) In concluding this discussion, we should add one word in regard to the use of formulas which measure the reliability of the difference between two obtained measures, namely, oW.) and PE@w.)- These formulas make allowance only for variable errors in the original measures — for errors which arise in sampling. Constant errors in the original scores and errors of the sort mentioned above are not detected, nor their influence measured. Furthermore, these formulas always assume that the measures or scores in the two series which are compared are uncorrelated (see page 288). These limitations must be borne in mind when using or interpreting differences in terms of the " true " difference. . . . VII. — Summary of Reliability Formulas 1. The Reliability of Measures of Central Tendency (1) The Average or Mean i „ — q ' (dl3 -> (\<X\ -l. <T(aver.) — ,— - \lO) 9 PF - ■ 6745(7 (dls-) nA \ L. /'^(ave,..) — -== ^14; (2) The Median 1 ^ _ 5 g~(diS.) y--v 1- 0-(mdn.)-^7/^ UOj I. J PA (mdn . ) =- — -= — (16) 3. -P^Cmdn.) = T ,— (16a) 2. The Reliability of Measures of Variability (1) The Standard Deviation i. ff „=^ (17) 146 STATISTICS IN PSYCHOLOGY AND EDUCATION (2) The Quartile Deviation <e,_ V2N (I8) '""-vw (18o) 3. The Reliability of the Difference between Two Measures (1) The Average 1- 0"(dlff.) = VCT (aver. 1)4*0" ( ave r. 2) (19) 2. PE(am.) = vPE (aver.l)-\-PE (aver. 2). ■ ■ ■ (20) (2) The Median 1- 0"(dlff.) — ^C^Cmdn. l)~rf w (mdn. 2) (21) 2. PE {a ift.)=vPE 2 ( man . i)+P-E 2 ( mdn. 2). • . 
PROBLEMS

Note: For uniformity in figuring "chances" in the following problems, take all σ and PE distances to three decimals and correct back to the second place. Count all fractions over one half as wholes and drop all under one half. For example, write 1.876σ as 1.88σ; .023 PE as .02 PE, etc.

1. Given that the obtained average is 26.4; σ is 3.2; N is 100. (a) What are the chances that the true average for the 10,000 from which the 100 cases measured are a random sampling will be greater than 27? (b) That it will be between 26 and 27? (c) What are the chances that the true variability will be between 3.1 and 3.3? (d) That the true variability will be less than 3.5?

2. Given: Median = 72.40. Q = 12.84. N = 81. (a) What are the chances that the true median of the population from which this random sample is drawn is above 75? (b) That it lies between 70 and 74? (c) What are the chances that the true Q is not greater than 15? (d) That it lies between 10 and 14?

3. Given: Av. 1 = 29.6, σ(dis.) = 3.54, N = 100; Av. 2 = 28.4, σ(dis.) = 5.36, N = 225. (a) Find the σ(av.) for both distributions. (b) Find the reliability of the difference between the means. (c) What difference would be completely reliable, assuming that the variability remains practically unchanged?

4. In Example 2, page 56, find the reliability of the difference between the means of distributions A and B [use the σ(diff.)].

5. Average (obt.) = K. PE(av.) = 3.5. What are the chances that the true average will not diverge from the obtained by more than (a) 1, (b) 3, (c) 10?

6. Given that Mdn. 1 - Mdn. 2 = 3.6, PE(diff.) = 3.0. (a) What are the chances that the true difference is less than 0? (b) That it is 1 or more? (c) What per cent is the obtained difference of the difference necessary for complete reliability?

7. Find the reliability of the average in (a) Example 4, page 116; (b) Example 5, page 116.

8. In a random sample of 100 cases each from the four groups A, B, C, and D, the following are obtained:
A. Average = 101. σ(dis.) = 10.0.
B. Average = 104. σ(dis.) = 11.0.
C. Average = 93. σ(dis.) = 9.6.
D. Average = 86. σ(dis.) = 8.5.
What are the chances that, in general, the average of (a) the A's is better than the average of the B's; (b) the A's is 5 better than the average of the C's; (c) the A's is 10 better than the average of the D's? What are the chances that (a) a B will be better than the average A; (b) a B will be better than the average C; (c) a B will be better than the average D?

ANSWERS

1. (a) 3 in 100. (b) 86 in 100. (c) 34 in 100. (d) 91 in 100.
2. (a) 16 in 100. (b) 55 in 100. (c) 90 in 100. (d) 71 in 100.
3. (a) σ(av. 1) = .354; σ(av. 2) = .357. (b) 99 chances in 100 of a true difference. (c) 1.51.
4. 92 chances in 100 of a true difference.
5. (a) 15 in 100. (b) 44 in 100. (c) 95 in 100.
6. (a) 21 in 100. (b) 72 in 100. (c) 30%.
7. (a) σ(av.) = .0791. (b) PE(av.) = .318.
8. (Table XIV) (a) 222 in 10,000. (b) 9846 in 10,000, or 99 in 100. (c) 9999.277 in 10,000 (100%). (a) 61 in 100. (b) 84 in 100. (c) 95 in 100.

CHAPTER IV
CORRELATION

I. What is Meant by Correlation

Up to this point in our discussion we have concerned ourselves chiefly with methods of computing statistical measures which shall represent in a reliable way the performance of an individual or a group in some defined capacity or trait. Frequently, however, it is of greater importance to examine the relation of some capacity, such as general intelligence, to some other capacity, such as musical ability, than to measure performance in a single trait alone. For example, we may ask whether there is any relation between general intelligence as measured by a standard intelligence test and scholastic achievement as measured by "grades" or "marks."
Or, more specifically, we may inquire whether an individual who gives evidence of high general intelligence tends to outstrip the average individual in school work. Again, knowing the ability of an individual in one test, can we say anything about his ability in another and different test? Are certain abilities highly related, and others relatively independent? These questions, and others of the same general nature, are studied by the Method of Correlation. The statistical device whereby relationship is expressed on a quantitative scale is called the "coefficient of correlation," and is designated by the letter "r."

Let us consider first the situation where the correlation is fixed and unchanging. We know that the circumference of a circle is always 3.1416 times its diameter, no matter how large or how small the circle, or in what part of the world we find it. Each time that we increase or decrease the diameter of a circle, we increase or decrease the circumference by just 3.1416 times the same amount. In short, the relation is fixed and definite, and hence we say that the "correlation" between diameter and circumference is perfect, and that r is equal to 1.00. In like manner, if we find that 100 men take exactly the same arrangement in two tests, so that the man who ranks first (or highest) in the one ranks first in the other, the man who ranks second in the first test ranks second in the other, and this one-to-one correspondence holds throughout the entire list, the correlation here is perfect also, for the relative position of each man is exactly the same in one test as in the other. The coefficient of correlation, r, is equal to 1.00.

Now let us consider the case where there is no relation at all. Suppose that we have examined 100 college seniors on the Army Alpha test and on a tapping test. The average Alpha score for the whole group is 175, and the average tapping rate is 185 taps in 30 seconds.
Suppose further that when we divide our group into three equal parts, the average Alpha score of the upper one-third is 190, with an average tapping rate of 184; the average Alpha score of the middle third is 175, with an average tapping rate of 186; and the average Alpha score of the lowest one-third is 160, with an average tapping rate of 185. Now clearly, since the tapping rate is almost identical in all three groups, we should be unable to draw any conclusion from a man's tapping rate alone as regards his probable score on Alpha. An average tapping rate of, say, 185 to 190 is as likely to be found with an Alpha score of 150 as with one of 175 or even 200. We should be as well qualified, then, to estimate a man's Alpha score knowing only his tapping rate as we should be to estimate it if all we knew about the man in question was that he had blue eyes and light hair. In either case our estimate would be no better than a guess. There is, therefore, little or no correspondence in the degree or amount of capacity possessed by a given individual in the traits measured by the two tests, and the coefficient of correlation r will be zero, or very close to it, which means that there is no correlation present.

So far we have indicated that perfect relationship may be expressed by a coefficient of 1.00, and the absence of relation by a coefficient of 0. Between these two limits we may have relations of varying degree, indicated by such coefficients as .30, .60, .90. In every case a coefficient between 0 and 1.00 implies some degree of positive association, the degree of association depending on the size of the coefficient. Relation may be negative as well as positive, however. That is, a large degree of one ability may be associated with a small degree of another, or vice versa. When this inverse relation is perfect, r equals -1.00.
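The three limiting values of r described above can be checked numerically. The sketch below computes r from deviations about the actual means (the product-moment idea worked out in Section III) for three small cases: circles of several hypothetical diameters, the reversed rankings of Table XVI below, and four hypothetical persons ranked on two tests in an order chosen so that the products of deviations cancel. Everything except the Table XVI rankings is invented for illustration.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment r from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Perfect positive: circumference = 3.1416 x diameter (hypothetical diameters).
diameters = [1.0, 2.5, 4.0, 10.0, 17.3]
circumferences = [3.1416 * d for d in diameters]

# Perfect negative: Table XVI, standing in Latin vs. standing in Shop Work.
latin = list(range(1, 26))
shop_work = [26 - rank for rank in latin]

# Zero: hypothetical ranks of four persons on two tests.
test_a = [1, 2, 3, 4]
test_b = [2, 4, 1, 3]

print(round(pearson_r(diameters, circumferences), 2))  # 1.0
print(round(pearson_r(latin, shop_work), 2))           # -1.0
print(round(pearson_r(test_a, test_b), 2))             # 0.0
```

In the last case the products of deviations are +.75, -.75, -.75 and +.75, so they sum to exactly zero.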
To illustrate, suppose that in a certain group of 25 boys we find that the boy standing highest in Latin ranks lowest in Shop Work; that the boy who stands second in Latin stands next to the bottom in Shop Work; and that any given boy is found to stand exactly the same distance from the top of the group in Latin as he stands from the bottom of the group in Shop Work. Table XVI on p. 152 illustrates the situation. The correspondence here is fixed and definite enough, but the relation is inverse. Hence the correlation, while perfect, is negative, and the coefficient of correlation r equals -1.00. Negative coefficients may range all the way from -1.00 up to 0, just as positive coefficients range from 1.00 down to 0. Coefficients of correlation, then, may range up and down on a scale which extends from -1.00 through 0 to +1.00. A positive correlation indicates a positive relation or correspondence; a zero correlation, the absence of relation; and a negative correlation, an inverse relation. While for the sake of simplicity we have illustrated above only perfect positive, perfect negative, and zero correlation, we only rarely get coefficients at the extremes of the scale. In most cases calculated coefficients will be found at intermediate points, e.g., at .90, .20, -.30, etc. Such intermediate values are to be interpreted as "high" or "low" in a general way, depending upon how close they are to ±1.00 or to 0. A more complete discussion of the meaning of a correlation coefficient is given later on page 160.

TABLE XVI
To Illustrate a Correlation of -1.00

Boy   Standing in Latin   Standing in Shop Work
 1            1                    25
 2            2                    24
 3            3                    23
 4            4                    22
 5            5                    21
 6            6                    20
 7            7                    19
 8            8                    18
 9            9                    17
10           10                    16
11           11                    15
12           12                    14
13           13                    13
14           14                    12
15           15                    11
16           16                    10
17           17                     9
18           18                     8
19           19                     7
20           20                     6
21           21                     5
22           22                     4
23           23                     3
24           24                     2
25           25                     1

II. The Coefficient of Correlation: What It Is, and What It Does

1. The Coefficient of Correlation as a Ratio

Instead of taking up directly the method of computing an r, we shall first try in this section to give a clear notion of just what an r represents and how it measures relationship. The steps in the calculation of r by the "product-moment" method, the standard method, will then be given in detail in the next section.

Let us begin with Diagram XVI. This diagram, which is called a "scatter diagram," represents the paired heights and weights of 120 college men. The construction of such a scatter diagram is relatively simple.

DIAGRAM XVI
To Show How Correlation May be Expressed as a Ratio

Height in cms.     Weight in kgs. (X-variable)
(Y-variable)   45-49  50-54  55-59  60-64  65-69  70-74  75-79  80-84    Fy   Av. wt.
185-189          .      .      .      .      .      .      .      1       1    82.5
180-184          .      .      1      3      3      4      2      3      16    71.3
175-179          .      .      4     11      6      3      2      2      28    66.4
170-174          .      2      9     11      8      2      1      .      33    62.8
165-169          1      5      7     10      3      .      .      .      26    59.2
160-164          1      2      7      1      2      .      .      .      13    57.9
155-159          1      1      .      1      .      .      .      .       3    54.2
Fx               3     10     28     37     22      9      5      6     120
Av. ht.        162.5  166.5  169.8  172.8  173.6  178.6  178.5  181.7

(A) Average height for given weight
Weight:    80-84  75-79  70-74  65-69  60-64  55-59  50-54  45-49
Av. ht.:   181.7  178.5  178.6  173.6  172.8  169.8  166.5  162.5
Increase in average height: 19.2 ÷ 6.55 = 2.93
Corresponding increase in actual weight: 37.5 ÷ 7.75 = 4.84
Ratio: 2.93 ÷ 4.84 = .60

(B) Average weight for given height
Height:    185-189 and 180-184 (combined)  175-179  170-174  165-169  160-164  155-159
Av. wt.:   71.9 (82.5 and 71.3)            66.4     62.8     59.2     57.9     54.2
Increase in average weight: 17.7 ÷ 7.75 = 2.28
Corresponding increase in height: 25 ÷ 6.55 = 3.82
Ratio: 2.28 ÷ 3.82 = .60

Average height = 172.6 cms.     σ(ht.) = 6.55 cms.
Average weight = 63.4 kgs.      σ(wt.) = 7.75 kgs.
Ratio σ(wt.) ÷ σ(ht.) = 7.75 ÷ 6.55 = 1.18
Along the left-hand margin, from bottom to top, are laid off the steps of the height distribution, while along the top of the diagram, from left to right, are laid off the steps of the weight distribution. Each of the 120 men may now be located on the diagram with respect both to his height and to his weight. Suppose, for example, that a man weighs 68 kgs. and is 176 cms. tall. His height locates him in the 3rd row from the top, and his weight in the 5th column from the left. Accordingly, this man belongs in the third "cell" of the 5th column, and a tally is put in this cell. Note that in Diagram XVI there are 6 men and 6 tallies in this cell; that is, there are 6 men who weigh 65 to 69 kgs. and are 175 to 179 cms. tall. In the manner described, every one of the 120 men has been located in some cell or square according to the two attributes, height and weight. Along the bottom of the diagram, in the Fx row, will be found the number of men who fall within each weight column (weight is the X-variable, page 60), while along the right-hand margin, in the Fy column, are tabulated the number of men who fall within each height row (height is the Y-variable, page 60). Of course, both the Fy column and the Fx row total 120, the number of men in all. The tallies in each cell may be totaled and written in numerical form, as shown in the diagram. When only the total frequency in each cell is given, a scatter diagram becomes a correlation table (see Diagram XXI).

Several important facts may be gleaned from the scatter diagram as it stands. For example, we are able to classify all the men in a given weight-column with regard to height. In the 3rd column we find 28 men, all of whom weigh 55 to 59 kgs. One of these 28 is 180 to 184 cms. tall; 4 are 175 to 179 cms. tall; 9 are 170 to 174 cms. tall; 7 are 165 to 169 cms. tall; and 7 are 160 to 164 cms. tall. In the same way we may classify all the men within any height-row according to weight.
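Classifying the men array by array, and averaging each array, is easy to mechanize. The sketch below stores the cell frequencies of Diagram XVI (the same frequencies that appear in Diagram XXI), one list per height row, and computes for each weight column the number of men and their average height, and for each height row the number of men and their average weight. It assumes that every man in an interval scores at its midpoint (157.5, 162.5, ...), which is how the text treats interval 170-174 as 172.5; the names are illustrative.

```python
# Cell frequencies of Diagram XVI: one list per height interval, top row first;
# each list runs over the weight intervals 45-49, 50-54, ..., 80-84 kgs.
WEIGHT_MIDPOINTS = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]
TABLE = [
    (187.5, [0, 0, 0, 0, 0, 0, 0, 1]),   # 185-189 cms.
    (182.5, [0, 0, 1, 3, 3, 4, 2, 3]),   # 180-184
    (177.5, [0, 0, 4, 11, 6, 3, 2, 2]),  # 175-179
    (172.5, [0, 2, 9, 11, 8, 2, 1, 0]),  # 170-174
    (167.5, [1, 5, 7, 10, 3, 0, 0, 0]),  # 165-169
    (162.5, [1, 2, 7, 1, 2, 0, 0, 0]),   # 160-164
    (157.5, [1, 1, 0, 1, 0, 0, 0, 0]),   # 155-159
]

def column_arrays(table):
    """Return (Fx, average height) for each weight column."""
    results = []
    for j in range(len(WEIGHT_MIDPOINTS)):
        freq = sum(row[j] for _, row in table)
        total = sum(height * row[j] for height, row in table)
        results.append((freq, total / freq))
    return results

def row_arrays(table):
    """Return (height midpoint, Fy, average weight) for each height row."""
    results = []
    for height, row in table:
        freq = sum(row)
        total = sum(w * f for w, f in zip(WEIGHT_MIDPOINTS, row))
        results.append((height, freq, total / freq))
    return results

for weight, (freq, avg_height) in zip(WEIGHT_MIDPOINTS, column_arrays(TABLE)):
    print(f"weight {weight - 2.5:.0f}-{weight + 1.5:.0f} kgs.: Fx = {freq:2d}, av. ht. = {avg_height:.2f}")
for height, freq, avg_weight in row_arrays(TABLE):
    print(f"height {height - 2.5:.0f}-{height + 1.5:.0f} cms.: Fy = {freq:2d}, av. wt. = {avg_weight:.2f}")
```

The column averages agree with the Av. ht. row of Diagram XVI to one decimal. The row averages print the top two rows separately (82.50 and 71.25); the text combines them into a single figure of 71.9.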
In the row next to the bottom we find that of the 13 men who are 160 to 164 cms. tall, 1 weighs 45 to 49 kgs.; 2 weigh 50 to 54 kgs.; 7 weigh 55 to 59 kgs.; 1 weighs 60 to 64 kgs.; and 2 weigh 65 to 69 kgs. It is fairly clear, too, that the "drift" of paired heights and weights is from the upper right section of the diagram (the "high score" end) to the lower left section (the "low score" end). That is to say, even a superficial examination of the diagram indicates, in general, a fairly marked tendency for tall, medium, and short men to rank high, medium, and low, respectively, on the weight scale; and this observation holds in spite of the scatter of heights or weights within any given "array" (an array is the distribution of cases within a given column or row). Without any further evidence, therefore, we should probably be willing to hazard the guess that the correlation between height and weight is positive and fairly high.

Suppose that we go a step further and calculate the average height of the men who weigh 45 to 49 kgs., the men in column 1. The average height of these 3 men, found by the guessed average method of Chapter I, is 162.5 cms., and this figure is entered at the bottom of the diagram. In the same way, we can find the average height of the men who fall in each of the succeeding weight-columns. These averages are tabulated under (A), and from the summary it is evident that for an actual weight increase of approximately 37.5 kgs.¹ (from 47.5 to 85) we have a corresponding increase in average height of 19.2 cms. (from 162.5 to 181.7). Thus in our group of 120 college men, an increase of approximately 37.5 kgs. in weight is paralleled by an increase of 19.2 cms. in average height.

¹ The complete range is not taken into account because the data are scanty at the ends of the distribution.

Before going any further, let us shift from height to weight and, applying the same method as above, find the increase in average weight which corresponds to the actual increase in height. Taking the bottom row (the 3 men 155 to 159 cms. tall), we find that the average weight of this small group is 54.2 kgs. The average weight of the 13 men who are 160 to 164 cms. tall is 57.9 kgs., and in like manner the average weight of each height-row may be found and entered in the "Average Weight" column. Summarizing the results for the group in (B) as we did in (A) above, we find that along with an increase in height of 25 cms. (160 to 185) there goes a corresponding increase in average weight of 17.7 kgs.¹ (from 54.2 to 71.9).

¹ The single F in the top row has been combined with the F of the row just below to prevent overweighting.

Now if the coefficient of correlation measures the mutual dependence or the degree of correspondence between two sets of scores or measures, we should expect the ratio $\dfrac{\text{increase in average height}}{\text{corresponding increase in weight}}$, e.g., $\dfrac{19.2}{37.5}$, to measure the correlation of height and weight, that is, to give us r. And likewise, and for the same reasons, we should expect the ratio $\dfrac{\text{increase in average weight}}{\text{corresponding increase in height}}$, e.g., $\dfrac{17.7}{25}$, also to measure the correlation. The two ratios work out, however, to be .51 and .71 respectively, which means evidently that neither is suitable as a measure of correlation, since the relation of height to weight should certainly be the same as the relation of weight to height in the same group.

The difficulty here (while not an obvious one, it is easy to understand once it has been pointed out) is that we have failed to take account of the fact that the increases in height and weight, and naturally the ratios formed from them, depend for their numerical value upon the units which we have arbitrarily chosen for measuring height and weight. Thus while we have measured height in cms. and weight in kgs., it is clear that different units, say, of 1 mm. for height and 1 kg. for weight, or of 1 inch for height and 1 lb. for weight, would have given us very different ratios. In other words, the ratios which give the change in average height with corresponding change in weight, and the change in average weight with corresponding increase in height, will vary according to the units in which height and weight are measured, and we have no way of telling which ratio (or what unit) is the right one.

The best way out of this difficulty is to express the changes in height and weight in terms of the σ's of the height and weight distributions, respectively. It will make no difference then in what units our original measurements have been made, as changes in both height and weight will be recorded in terms of σ. The σ of the height distribution of our 120 men is 6.55 cms., and the σ of the weight distribution is 7.75 kgs. (see Diagram XVI). Accordingly, if we divide the increase in average height and the parallel increase in weight by 6.55 and 7.75, respectively, the ratio $\dfrac{\text{increase in average height}}{\text{corresponding increase in weight}}$ becomes $\dfrac{19.2 \div 6.55}{37.5 \div 7.75} = \dfrac{2.93}{4.84}$, or .605 (see Diagram XVI). And in like manner, if we divide the increase in average weight and the parallel increase in height by 7.75 and 6.55, respectively, the second ratio, $\dfrac{\text{increase in average weight}}{\text{corresponding increase in height}}$, becomes $\dfrac{17.7 \div 7.75}{25 \div 6.55} = \dfrac{2.28}{3.82}$, or .60. The two ratios are now equal, and either may be taken as representing the coefficient of correlation¹, that is, as giving the degree of association between height and weight in our group of 120 men.

This method of finding relationship is useful for demonstrating in a simple way what the ratio which we call the coefficient of correlation actually does. It is, however, neither a very practical nor a precise method of finding a coefficient of correlation, and it is never used in actual practice.
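The argument about units can be checked with the figures of Diagram XVI. The sketch below forms the two raw ratios (19.2/37.5 and 17.7/25) and the same ratios with each increase divided by its σ, then repeats the computation after re-expressing heights in inches and weights in pounds. The conversion factors (2.54 cms. per inch, about 2.2046 lbs. per kg.) and the function names are assumptions for illustration, not part of the text.

```python
CM_PER_INCH = 2.54
LB_PER_KG = 2.2046  # approximate conversion factor

def raw_ratio(rise_in_average: float, rise_in_other: float) -> float:
    """Increase in an array average divided by the corresponding increase in the other variable."""
    return rise_in_average / rise_in_other

def sigma_ratio(rise_in_average: float, sigma_of_average_var: float,
                rise_in_other: float, sigma_of_other: float) -> float:
    """The same ratio with each increase expressed in sigma units."""
    return (rise_in_average / sigma_of_average_var) / (rise_in_other / sigma_of_other)

# Figures from Diagram XVI, in cms. and kgs.
ht_rise, wt_span, sigma_ht = 19.2, 37.5, 6.55   # (A) average height for given weight
wt_rise, ht_span, sigma_wt = 17.7, 25.0, 7.75   # (B) average weight for given height

print(round(raw_ratio(ht_rise, wt_span), 2), round(raw_ratio(wt_rise, ht_span), 2))
print(round(sigma_ratio(ht_rise, sigma_ht, wt_span, sigma_wt), 3),
      round(sigma_ratio(wt_rise, sigma_wt, ht_span, sigma_ht), 3))

# Re-express the same data in inches and pounds: increases and sigmas all rescale.
h, w = 1 / CM_PER_INCH, LB_PER_KG
print(round(raw_ratio(ht_rise * h, wt_span * w), 3),
      round(sigma_ratio(ht_rise * h, sigma_ht * h, wt_span * w, sigma_wt * w), 3))
```

The first line gives .51 and .71, the second about .606 and .598 (both about .60). After the change of units the raw ratio falls to about .09, while the σ-unit ratio is unchanged, because the conversion factor cancels between each increase and its σ.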
¹ On a scale in which 1.00 denotes perfect relation.

Its chief lack of precision lies in the fact that in estimating the range of scores or measures in either or both distributions (see footnote, page 155) we are often uncertain where to begin or end the series, because the data are oftentimes scanty at the extremes of the distributions. As a matter of fact, the coefficient of correlation in the present problem was first found by the method given later on in Section III, and proper adjustment was then made in the ranges so as to give the correct r.

2. Graphical Representation of the Coefficient of Correlation

Not only can we represent the coefficient of correlation as a ratio, but we can also demonstrate graphically what a coefficient of correlation means. The correlation coefficient of .60 found in Diagram XVI between height and weight is shown graphically in Diagram XVII. In this diagram the distance taken to represent one unit (consider the step-interval as the unit) on the height scale and the distance taken to represent one unit on the weight scale have been selected with due regard for the difference in size of the two σ's, in order that changes in height and weight may be comparable. This adjustment is a very simple one. We know from Diagram XVI that σ(wt.), which equals 7.75 kgs., is 1.18 times σ(ht.), which equals 6.55 cms. (since 7.75 ÷ 6.55 = 1.18). Hence it is only necessary that we take each height-step 1.18 times the length arbitrarily taken to represent one weight-step, in order that the X and Y distances may be comparable. (Since the weight distribution is laid off from left to right, and the height distribution from bottom to top, the first may be referred to as the X variable, and the second as the Y variable; see page 60.)
To take a simpler case, if the σ for height were twice as large as the σ for weight, we should take each step on the height scale just ½ each step on the weight scale.

When the diagram has been laid out in the manner described above, represent by a cross the mean height of the men in each array, that is, in each weight column (these mean heights may be found from Diagram XVI). Next, draw a vertical line through the mean of the distribution of 120 weights, and a horizontal line through the mean of the distribution of 120 heights. [The average height of the 120 men is 172.6 cms., and their average weight is 63.4 kgs. (see Diagram XVI).] With these two lines as coordinate axes, draw through their intersection (the origin) a straight line which shall go through, or as close as possible to, each of the crosses which have been plotted. A rough, but fairly accurate, method of drawing such a line is to stretch a black thread through the origin and shift it back and forth until it touches as many crosses as possible. The crosses at the extremes need not concern us very much, since they are located from only a few cases.

[DIAGRAM XVII. Coefficient of Correlation Shown Graphically. Weight in kgs. (X-variable), intervals 45-49 to 80-84; height in cms. (Y-variable). Average weight line drawn through 63.4 kgs.; average height line drawn through 172.6 cms.]

This sloping line, which may be called the line of "best fit," describes better than any other straight line the "run" of the crosses, that is, the increase in average height which corresponds to the given increase in weight. Accordingly, to find the correlation, simply find the ratio of the distance of any point on this sloping line from the horizontal or X-axis to the distance of the same point from the vertical or Y-axis.
For example, if a convenient point P is taken with x = 5 cms., its y distance (measured by a millimeter ruler) will be found to be approximately 3 cms., and the ratio y/x is 3/5, or .60. In like manner, the x and y coordinates of any other point on this sloping line will be found to give the ratio y/x a value of .60.

Our sloping line pictures graphically the ratio 2.93/4.84, the correlation of .60, which we worked out in (1) above. This line, which will be known hereafter as the "regression line of height on weight," has important properties which will be considered later (page 173). Also, in the following sections we shall give the equation of this line, which will enable us to draw it in on the diagram much more accurately than can be done by the trial-and-error method described on page 159.

It is a comparatively easy, though not a necessary, task to verify the correlation coefficient of .60 found from the regression line of height on weight by drawing in the second "regression line," that of weight on height. This can be done by designating the means of the different height-rows by circles in exactly the same manner in which we marked the means of the weight-columns by crosses. (The means of the rows may be obtained from Diagram XVI.) The mean of the lowest row is 54.2, of the next above 57.9, etc. When all of the circles have been correctly placed, we draw a straight line which shall go through, or as close as possible to, each circle, just as we did with the crosses above. Now if a point P' is taken on this second line with y = 5 cms., its x distance will be found to be approximately 3 cms., and the ratio x/y is .60. This relation holds for any point on the line. Both regression lines, therefore, give us the same measure of the correlation between height and weight.

Diagram XVII is still further useful in showing just what a correlation of 1.00, 0, or -1.00 is graphically.
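The "line of best fit" through the crosses can also be found by arithmetic instead of a stretched thread. The sketch below is a least-squares fit, which is not the procedure described in the text (the equation of the regression line is given in a later section). It re-enters the frequencies of Diagram XVI, expresses each cross (weight-column midpoint, mean height of that column) as a deviation from the general means in σ units, and fits a straight line through the origin, weighting each cross by the number of men in its column. Interval midpoints as scores and the function name are assumptions.

```python
import math

WEIGHT_MIDPOINTS = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]
TABLE = [  # (height midpoint, frequencies across the weight columns 45-49 ... 80-84)
    (187.5, [0, 0, 0, 0, 0, 0, 0, 1]),
    (182.5, [0, 0, 1, 3, 3, 4, 2, 3]),
    (177.5, [0, 0, 4, 11, 6, 3, 2, 2]),
    (172.5, [0, 2, 9, 11, 8, 2, 1, 0]),
    (167.5, [1, 5, 7, 10, 3, 0, 0, 0]),
    (162.5, [1, 2, 7, 1, 2, 0, 0, 0]),
    (157.5, [1, 1, 0, 1, 0, 0, 0, 0]),
]

def slope_through_crosses(table) -> float:
    """Weighted least-squares slope, through the origin, of the column means in sigma units."""
    cells = [(w, h, f) for h, row in table for w, f in zip(WEIGHT_MIDPOINTS, row) if f]
    n = sum(f for _, _, f in cells)
    mean_w = sum(w * f for w, _, f in cells) / n
    mean_h = sum(h * f for _, h, f in cells) / n
    sigma_w = math.sqrt(sum(f * (w - mean_w) ** 2 for w, _, f in cells) / n)
    sigma_h = math.sqrt(sum(f * (h - mean_h) ** 2 for _, h, f in cells) / n)
    num = den = 0.0
    for j, w in enumerate(WEIGHT_MIDPOINTS):
        f_col = sum(row[j] for _, row in table)
        mean_col = sum(h * row[j] for h, row in table) / f_col
        x = (w - mean_w) / sigma_w          # the cross's x, in sigma units of weight
        y = (mean_col - mean_h) / sigma_h   # the cross's y, in sigma units of height
        num += f_col * x * y
        den += f_col * x * x
    return num / den

print(round(slope_through_crosses(TABLE), 2))
```

With this weighting the slope works out to the product-moment r of the table, about .60, which is the ratio y/x read from Diagram XVII.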
Suppose (1) that the two regression lines in the figure move together until they coincide in such a way as to make an angle of 45 degrees with the horizontal or X-axis. The x value of any point on this "compound" line will always equal its y value; hence the ratios y/x and x/y are always equal to each other¹, and r equals 1.00 (see Diagram XVIII). Accordingly, in perfect positive correlation, all the crosses and all the circles in a correlation diagram fall along a single straight line which runs from the upper right-hand section of the diagram (the 1st quadrant) to the lower left-hand section (the 3rd quadrant). The tallest man is the heaviest, the next tallest the next heaviest, and throughout the entire 120 the correspondence of height and weight is always 1 to 1.

¹ This is true also because the compound regression line becomes the diagonal of a square. Again, the tangent of an angle of 45° = 1.00.

Now suppose (2) that the first regression line, the line through the means of the height arrays in the columns (through the crosses), moves around until it coincides with the X-axis, the line through the average of all the heights in the table. And suppose again that the second regression line, the line through the means of the weight arrays in the rows (through the circles), moves around until it coincides with the Y-axis, the line through the average of all the weights in the table. The ratios y/x and x/y are now both equal to 0 (since y equals 0 at every point of the first line, and x equals 0 at every point of the second), and r, the coefficient of correlation, equals 0. The conclusion that r = 0 might also be drawn from the fact that under the conditions described the average height is the same for the whole range of weights, and the average weight the same for the whole range of heights.
Hence, a man of average height is equally likely to be heavy, medium, or light, and a man of average weight equally likely to be tall, medium, or short. (Compare with the case in which the average tapping rate was the same for very high, high, and medium high Alpha scores, page 150.) A picture of zero correlation is shown in Diagram XIX.

Lastly, suppose (3) that the two regression lines swing around until they run from the upper left-hand section (the 2nd quadrant) to the lower right-hand section (the 4th quadrant). Now if the two lines again coincide so as to make an angle of 45 degrees with the X-axis, as described in (1), the x of any point on this compound line will always equal the y of the same point in size, and the ratios y/x and x/y will again always equal 1.00 numerically. A glance at the figure will show, however, that either the x or the y of these ratios must always be negative, and for this reason the ratios will always be negative. The coefficient of correlation, therefore, equals -1.00, and the relation is perfect but inverse. In perfect negative correlation, then, all of the crosses and all of the circles fall along a single straight line which runs from the upper left to the lower right-hand corner of the diagram. The tallest man in the group is the lightest, the next tallest the next lightest, and as height decreases weight increases progressively (Diagram XX).

The regression lines coincide only when the correlation is perfect, positive or negative. For degrees of correlation between these limits, the two regression lines are separate, and take intermediate positions as shown in Diagram XVII for an r of .60.

III. The Calculation of the Coefficient of Correlation by the Product-Moment Method

1. The Product-Moment Formula When Deviations Are Taken from the Guessed Averages of the Two Distributions

With the meaning of a coefficient of correlation firmly in mind as a result of the discussion of the last section, we are now ready to consider the calculation of r by the product-moment method.¹ Diagram XXI will serve as an illustration of the computations involved. This correlation table gives the paired heights and weights of 120 college men and is derived from the scatter diagram for the same data shown in Diagram XVI. The complete process of calculating r is outlined in the following steps. (Diagram XXI should be constantly referred to in the discussion that follows.)

¹ The r found by this method is often called the "Pearson r" after Prof. Karl Pearson, who devised the product-moment formula, following Bravais's earlier work.

Step I
Construct a scatter diagram and from it a correlation table as described on page 154.

Step II
Guess an average for the height distribution (given in the Fy column), and draw double lines to mark off the row which contains the GA(ht.), as shown in Diagram XXI. Note that the average for the height distribution has been guessed at 172.5 (midpoint of interval 170-174) and that Dy's have been taken from this point. Now fill in the FDy and the FDy² columns. From the first column the correction Cy (cy in units of step) is obtained; and this correction, together with the sum of the FDy² column, will give the σ of the height distribution, σy. The value of σy is 6.55 cms. (1.31 × 5); see the calculations in the Diagram.

DIAGRAM XXI
Calculation of the Product-Moment Coefficient of Correlation between the heights and weights of 120 college men

Height      Weight in kgs. (X-variable)
in cms.    45-49 50-54 55-59 60-64 65-69 70-74 75-79 80-84    Fy   Dy   FDy  FDy²   Σx'y'(+)  Σx'y'(-)
185-189      .     .     .     .     .     .     .     1       1    3     3     9      12         0
180-184      .     .     1     3     3     4     2     3      16    2    32    64      58        -2
175-179      .     .     4    11     6     3     2     2      28    1    28    28      26        -4
170-174      .     2     9    11     8     2     1     .      33    0     0     0       0         0
165-169      1     5     7    10     3     .     .     .      26   -1   -26    26      20        -3
160-164      1     2     7     1     2     .     .     .      13   -2   -26    52      28        -4
155-159      1     1     .     1     .     .     .     .       3   -3    -9    27      15         0
Fx           3    10    28    37    22     9     5     6     120          2   206     159       -13
Dx          -3    -2    -1     0     1     2     3     4                                Σx'y' = 146
FDx         -9   -20   -28     0    22    18    15    24    ΣFDx = 22
FDx²        27    40    28     0    22    36    45    96    ΣFDx² = 294

Calculation of r (σ's in units of step; step-interval = 5):
cy = 2/120 = .017; cy² = .0003; Cy = .017 × 5 = .085 cms.
cx = 22/120 = .183; cx² = .0334; Cx = .183 × 5 = .915 kgs.
$\sigma_y = \sqrt{206/120 - .0003} = 1.31$; σy = 1.31 × 5 = 6.55 cms.
$\sigma_x = \sqrt{294/120 - .0334} = 1.55$; σx = 1.55 × 5 = 7.75 kgs.
$r = \dfrac{\Sigma x'y'/N - c_x c_y}{\sigma_x \sigma_y} = \dfrac{146/120 - (.017)(.183)}{1.31 \times 1.55} = .60$
$PE_r = \dfrac{.6745\,(1 - r^2)}{\sqrt{N}} = \dfrac{.6745\,[1 - (.60)^2]}{\sqrt{120}} = .04$ (Table XVIII)

Now guess an average for the weight distribution (given in the Fx row) and draw double lines to designate the column which contains the GA(wt.). The average of the weight distribution has been guessed at 62.5 (midpoint of interval 60-64) and Dx's have been taken from this point. Fill in the FDx and FDx² rows. From these rows the correction Cx (cx in units of step) and the σ of the weight distribution, σx, may be obtained. The value of σx is 7.75 kgs. (1.55 × 5); see the calculations on the Diagram.

Step III
The calculations in Step II simply repeat the familiar process of finding a σ by the Guessed Average Method (Chapter I, page 35). Our first new task is to fill in the Σx'y' column. The entries in this column may be either + or -, and hence two columns are provided under Σx'y', one for plus and one for minus entries. The procedure for determining the entries in the Σx'y' column may be illustrated by taking the single entry in the only occupied cell in the topmost row. The deviation of this cell from the GA of the weight distribution, that is, its Dx, is 4 steps, and its deviation from the GA of the height distribution, its Dy, is 3 steps.
Hence, the product of the deviations of this cell from the two guessed averages, its "product-moment," is 4 × 3 or 12, and a small figure 12 is placed in the upper right-hand corner of the cell.¹ Moreover, since the "product-moment" of the 1 frequency in this cell is 1(4 × 3) or 12 also, a figure 12 is placed in the lower left-hand corner of the cell to denote the product of the deviations (or the product-deviation) of this single frequency from the two GA's. There are no other frequencies in the cells of this row, and 12 is placed at once in the $\Sigma x'y'$ column² under the + sign.

¹ We may take the coordinates of this cell to be x = 4 and y = 3. The first is obtained by counting over 4 steps from the vertical column containing the GA for weight, and the second by counting up 3 steps from the horizontal row containing the GA for height. In each case the unit of measurement is the step-interval.

² The prime (') on the x and y deviations indicates that all deviations are taken from the two GA's.

Now let us consider the next row from the top, taking the cells in order from right to left. The cell below the one whose product-deviation we have just found also deviates 4 steps from the GA of the weight distribution (its $D_x = 4$), but its deviation from the GA of the height distribution is only 2 steps (its $D_y = 2$). Hence the product-deviation of this cell is 4 × 2 or 8 [note the small (8) in the upper right-hand corner of the cell], and since there are 3 frequencies in the cell, each with a product-deviation of 8, the final entry in the lower left-hand corner of this cell is 3(4 × 2) or 24. In like manner, the product-deviation of the 2nd cell in the row is 6 (its $D_x = 3$ and its $D_y = 2$), and since there are 2 frequencies in the cell, the final entry is 2(3 × 2) or 12.
Each of the 4 frequencies in the third cell has a product-deviation of 4 (the $D_x$ of the cell is 2, and the $D_y$ is 2 also), and the final cell entry is 4(2 × 2) or 16. In the 4th cell each of the 3 frequencies has a $D_x$ of 1 and a $D_y$ of 2, and the final entry is 3(1 × 2) or 6. The entry of the 5th cell, the cell in the $GA_{(wt.)}$ column, is 0, since $D_x = 0$, and of course 3(2 × 0) = 0. Notice particularly the entry in the last cell of this row, viz., -2. This negative entry results from the fact that the deviation of this cell from the $GA_{(wt.)}$, its $D_x$, is -1, and its $D_y$ is 2; the product-deviation of its single frequency, therefore, is 1(-1 × 2) or -2. Now total separately the plus and minus x'y' products in this row. The results, 58 and -2, are entered separately in the $\Sigma x'y'$ column under the appropriate signs.

The final entries of the cells in the other rows of the table and the sums of the product-deviations of each row are obtained in the manner described above. It must be borne in mind in calculating the x'y' products that the product-deviations of all frequencies in the first and third quadrants are positive, while the product-deviations of all the frequencies in the second and fourth quadrants are negative (see page 162). Also remember that all frequencies in either the column containing the $GA_{(wt.)}$ or the row containing the $GA_{(ht.)}$ have zero product-deviations, since in one case the $D_x$, and in the other the $D_y$, equals 0.

All frequencies in any given row have the same $D_y$, and for this reason the arithmetic of calculation may be considerably reduced if each frequency in the row is first multiplied by its $D_x$ and the sum of these products multiplied once for all by the common $D_y$. To illustrate, for the 2nd row from the bottom, taking the cells from right to left, when we multiply the frequency of each cell by its $D_x$ the result is (2 × 1) + (1 × 0) + (7 × -1) + (2 × -2) + (1 × -3), or -12.
Now multiplying this partial "deviation-sum" by the $D_y$ of the whole row, i.e., by -2, we get 24 as the final $\Sigma x'y'$ entry for the row. This result checks the 28 and -4 entered separately in the $\Sigma x'y'$ column. This shorter method is useful in getting the total $\Sigma x'y'$ entry of a given row quickly. It is less easy to check for errors, however, than the method of getting the entry for each cell separately, illustrated on page 166.¹

¹ Printed charts for facilitating the calculation of coefficients of correlation by the product-moment method are now available. Examples are the Ruch-Stoddard Correlation Charts, University Bookstore, Iowa City, Iowa, and the Thurstone Correlation Data Sheet, C. H. Stoelting & Co., Chicago. The first of these gives the product-deviation of each cell printed on the chart. Otis has also devised a correlation chart based on the product-moment method which does away with the necessity of finding the x'y' products. This chart is published with directions for its use by the World Book Co., Yonkers, N. Y.

Step IV

When the sums of the product-deviations of each row have been entered in the $\Sigma x'y'$ column, the algebraic sum of the $\Sigma x'y'$ column may be obtained (e.g., 159 - 13 = 146). The coefficient of correlation is then found by the formula:

$$ r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (23) $$

Substituting 146/120 for $\Sigma x'y'/N$; .183 for $c_x$; .017 for $c_y$; and 1.55 and 1.31 for $\sigma_x$ and $\sigma_y$, respectively (see Diagram XXI for figures), r is found to equal .60.

Notice that the terms $c_x$, $c_y$, $\sigma_x$ and $\sigma_y$ are all left in units of step-interval when substituted in formula (23). This is done simply because all product-deviations (x'y' products) are in step-units, and hence it is very much easier to keep all the other terms in the formula, and in consequence both numerator and denominator, in step-units. By this procedure the value of the fraction (the coefficient of correlation) is not changed and the arithmetic is considerably reduced.

2. The Product-Moment Formula When Deviations Are Taken from the Actual Averages of the Two Distributions

Since formula (23) assumes that all x and y deviations have been taken from the two guessed averages, it is necessary to correct $\Sigma x'y'/N$ by the product of the two corrections, $c_x$ and $c_y$. If deviations are taken from the actual averages of the two distributions instead of from the GA's, no correction is needed, as both $c_x$ and $c_y$ then equal 0. Thus when deviations are taken from the two averages, formula (23) becomes

$$ r = \frac{\Sigma xy}{N \sigma_x \sigma_y} \qquad (24) $$

and this is the form in which the product-moment formula is usually written.

The formula may be put in still another form. If we write $\sqrt{\Sigma x^2/N}$ for $\sigma_x$ and $\sqrt{\Sigma y^2/N}$ for $\sigma_y$, the formula becomes (the N's cancel)

$$ r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} \qquad (25) $$

in which the x and y deviations are from the averages as in (24), and $\Sigma x^2$ and $\Sigma y^2$ are the sums of the squared deviations from the two averages.

Formula (23) should always be used when there are more than, say, 30 or 40 cases. Formula (25) may be used to advantage, however, with short series when the purpose of the experimenter is to find whether there is any relation present rather than to discover the degree of relation very accurately. No correlation table is required with formula (25).
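Steps II to IV and formula (23) can be collected into one short routine. The Python sketch below is an illustration, not the author's procedure: it assumes the correlation table is held as a dictionary mapping each row's $D_y$ to the list of cell frequencies across the weight columns, with the column $D_x$'s listed separately. The frequencies are those of Diagram XXI as read from the diagram, and the names `DIAGRAM_XXI` and `r_from_guessed_averages` are hypothetical. The routine uses the row shortcut of Step III (multiply each frequency by its $D_x$, then the row total once by $D_y$) and keeps every term in step-units, as the text recommends.

```python
import math

# Column step deviations D_x from GA(wt.) = 62.5 kgs. (45-49 ... 80-84 kgs.)
DX = [-3, -2, -1, 0, 1, 2, 3, 4]

# Cell frequencies of Diagram XXI as read from the diagram.
# Key: D_y of the row from GA(ht.) = 172.5 cms. (+3 = 185-189 ... -3 = 155-159 cms.)
DIAGRAM_XXI = {
    3: [0, 0, 0, 0, 0, 0, 0, 1],
    2: [0, 0, 1, 3, 3, 4, 2, 3],
    1: [0, 0, 4, 11, 6, 3, 2, 2],
    0: [0, 2, 9, 11, 8, 2, 1, 0],
    -1: [1, 5, 7, 10, 3, 0, 0, 0],
    -2: [1, 2, 7, 1, 2, 0, 0, 0],
    -3: [1, 1, 0, 1, 0, 0, 0, 0],
}


def r_from_guessed_averages(table, dx):
    """Formula (23), with every term kept in step-units."""
    n = sum_fdy = sum_fdy2 = sum_xy = 0
    col_totals = [0] * len(dx)
    for dy, row in table.items():
        fy = sum(row)
        n += fy
        sum_fdy += fy * dy
        sum_fdy2 += fy * dy * dy
        # Step III shortcut: sum f * D_x across the row, then multiply once by D_y.
        sum_xy += dy * sum(f * d for f, d in zip(row, dx))
        for j, f in enumerate(row):
            col_totals[j] += f
    sum_fdx = sum(f * d for f, d in zip(col_totals, dx))
    sum_fdx2 = sum(f * d * d for f, d in zip(col_totals, dx))
    c_x, c_y = sum_fdx / n, sum_fdy / n            # corrections, in steps
    sigma_x = math.sqrt(sum_fdx2 / n - c_x ** 2)   # sigma of weights, in steps
    sigma_y = math.sqrt(sum_fdy2 / n - c_y ** 2)   # sigma of heights, in steps
    r = (sum_xy / n - c_x * c_y) / (sigma_x * sigma_y)
    return {"N": n, "sum_xy": sum_xy, "sum_fdx": sum_fdx, "sum_fdx2": sum_fdx2,
            "sum_fdy": sum_fdy, "sum_fdy2": sum_fdy2,
            "sigma_x": sigma_x, "sigma_y": sigma_y, "r": r}


result = r_from_guessed_averages(DIAGRAM_XXI, DX)
print(result["N"], result["sum_xy"], round(result["r"], 2))  # 120 146 0.6
```

Multiplying `sigma_x` and `sigma_y` by the step-interval of 5 turns them back into kilograms and centimetres. Because the sketch does not round the step-unit σ's to two places before multiplying, its σ for weight differs slightly from the 7.75 kgs. worked out in the text; r is unaffected to two places.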
An illustration of the use of formula (25) is given in Table XVII, in which the problem is to find the correlation between the scores made on two tests of association by 12 adults.

TABLE XVII
To Illustrate the Calculation of r when Deviations are Taken from the Averages of the Distributions

  Individual  Score in     Score in       x       y      x²       y²       xy
              Test 1 (X)   Test 2 (Y)
  A               50           22        -12    -8.4     144    70.56    100.8
  B               53           25         -9    -5.4      81    29.16     48.6
  C               56           34         -6     3.6      36    12.96    -21.6
  D               58           28         -4    -2.4      16     5.76      9.6
  E               60           26         -2    -4.4       4    19.36      8.8
  F               61           30         -1     -.4       1      .16       .4
  G               61           32         -1     1.6       1     2.56     -1.6
  H               64           30          2     -.4       4      .16      -.8
  I               67           28          5    -2.4      25     5.76    -12.0
  J               70           34          8     3.6      64    12.96     28.8
  K               71           36          9     5.6      81    31.36     50.4
  L               73           40         11     9.6     121    92.16    105.6
  Sums                                                   578   282.92    317.0
  Average       62.0         30.4

$$ r = \frac{317.0}{\sqrt{578 \times 282.92}} = .78 \qquad PE_r = \frac{.6745\,[1 - (.78)^2]}{\sqrt{12}} = .08 $$

The steps in finding r may be outlined as follows:

Step I

Find the average of Test 1 and the average of Test 2. In the table the first average is 62.0, and the second, 30.4.

Step II

Find the deviation of each score in Test 1 from its average, 62, and enter it in column x. (The deviations from the average of the first test may be called x-deviations, those from the average of the second test, y-deviations.) Find the deviation of each score in Test 2 from its average, 30.4, and enter it in column y.

Step III

Square all x-deviations and all y-deviations, and enter these squares in columns x² and y², respectively.

Step IV

Multiply the corresponding x and y deviations and enter these products in the xy column.

Step V

Substitute 317 for $\Sigma xy$, 578 for $\Sigma x^2$, and 282.92 for $\Sigma y^2$ in formula (25), as shown in Table XVII, and solve for r.

IV. The Probable Error of a Coefficient of Correlation

The PE of an r may be found from the formula

$$ PE_r = \frac{.6745\,(1 - r^2)}{\sqrt{N}} \qquad (26) $$

If we substitute in formula (26) the r = .60 and the N = 120 of the height-weight problem (see Diagram XXI), $PE_r$ will equal .04.¹
This means that the chances are even that the "true" r falls within the limits .60 ± .04, or between .56 and .64; and that the chances are 9930 in 10,000 (Table XI) that the true r falls within the limits .60 ± 4 × .04, or between .44 and .76. By the true r is meant (see page 118) that r which we should expect to get between height and weight in the population of which our group of 120 is, presumably, a random sample.

To be reasonably sure that there is some correlation present, an obtained r should be at least 4 times its PE. For example, given the situation in which r is exactly 4 times its PE, in which, say, r = .16 and $PE_r$ = .04, we can only be sure that the true r falls within the limits .16 ± 4 × .04, or between 0 and .32. It is customary, therefore, not to consider an r as reliable (as indicative of a correlation at least better than 0) unless it is at least 4 times its PE. To be certain of a low degree of correlation an r should be 5 or 6 times its PE.

We found in Chapter III that the reliability of the difference between two averages or two medians can be calculated by means of the formulas for $\sigma_{(diff.)}$ and $PE_{(diff.)}$ (see page 128). In the same way, the reliability of the difference between two obtained r's can be found from the size of the PE of their difference.

¹ If we know r and N, the $PE_r$ may be read directly or by interpolation from Table XVIII.
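Formulas (25) and (26) are easy to check by machine. The Python sketch below is an illustration under stated assumptions, not the author's procedure, and its function names are hypothetical. It recomputes r for the twelve pairs of scores of Table XVII from the exact averages (the text rounds the Test 2 average to 30.4, which changes $\Sigma y^2$ only in the third decimal), computes $PE_r$ from formula (26), and, for the difference between two r's just mentioned, assumes the two samples are independent and combines their PE's as the square root of the sum of their squares, the same rule used for the difference between two averages.

```python
import math

# Scores of the 12 adults on the two tests of association (Table XVII).
TEST_1 = [50, 53, 56, 58, 60, 61, 61, 64, 67, 70, 71, 73]
TEST_2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]


def r_from_averages(xs, ys):
    """Formula (25): deviations taken from the actual averages."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def pe_r(r, n):
    """Formula (26): probable error of a coefficient of correlation."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def pe_difference(pe_1, pe_2):
    """PE of the difference between two r's from independent samples."""
    return math.sqrt(pe_1 ** 2 + pe_2 ** 2)


r_tests = r_from_averages(TEST_1, TEST_2)
print(round(r_tests, 2))            # 0.78
print(round(pe_r(0.60, 120), 2))    # 0.04 (height-weight problem)
print(round(pe_r(0.5, 100), 4))     # 0.0506 (compare Table XVIII)
```

The last line shows how any entry of Table XVIII can be regenerated from formula (26) when N or r falls between the tabled values.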
TABLE XVIII
Probable Errors of the Coefficient of Correlation for Various Numbers of Measures (N) and for Various Values of r

  Number of                    Correlation Coefficient r
  Measures (N)    0.0     0.1     0.2     0.3     0.4     0.5     0.6
     20          .1508   .1493   .1448   .1373   .1267   .1131   .0965
     30          .1231   .1219   .1182   .1121   .1035   .0924   .0788
     40          .1067   .1056   .1024   .0971   .0896   .0800   .0683
     50          .0954   .0944   .0915   .0868   .0801   .0715   .0610
     70          .0806   .0798   .0774   .0734   .0677   .0605   .0516
    100          .0674   .0668   .0648   .0614   .0567   .0506   .0432
    150          .0551   .0546   .0529   .0501   .0463   .0413   .0352
    200          .0477   .0472   .0458   .0434   .0401   .0358   .0305
    250          .0426   .0421   .0409   .0387   .0358   .0319   .0272
    300          .0389   .0386   .0374   .0354   .0327   .0292   .0249
    400          .0337   .0334   .0324   .0307   .0283   .0253   .0216
    500          .0302   .0299   .0290   .0274   .0253   .0226   .0193
   1000          .0213   .0211   .0205   .0194   .0179   .0160   .0137

  Number of                    Correlation Coefficient r
  Measures (N)    0.65    0.7     0.75    0.8     0.85    0.9     0.95
     20          .0871   .0769   .0660   .0543   .0419   .0287   .0147
     30          .0711   .0628   .0539   .0444   .0342   .0234   .0120
     40          .0616   .0544   .0467   .0384   .0296   .0203   .0104
     50          .0551   .0486   .0417   .0343   .0265   .0181   .0093
     70          .0466   .0411   .0353   .0290   .0224   .0153   .0079
    100          .0391   .0345   .0294   .0242   .0187   .0128   .0066
    150          .0318   .0281   .0241   .0198   .0153   .0105   .0054
    200          .0275   .0243   .0209   .0172   .0133   .0091   .0047
    250          .0246   .0218   .0187   .0154   .0118   .0081   .0042
    300          .0225   .0199   .0170   .0140   .0108   .0074   .0038
    400          .0195   .0172   .0148   .0122   .0094   .0064   .0033
    500          .0174   .0154   .0132   .0109   .0084   .0057   .0029
   1000          .0123   .0109   .0093   .0077   .0059   .0041   .0021

The formula for the PE of the difference between two r's is

$$ PE_{(diff.\; r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2} \qquad (27) $$

in which $PE_{r_1}$ and $PE_{r_2}$ are the PE's of the two r's to be compared, and must first be obtained from formula (26).

The value of formula (27) may be illustrated by the following problem. Suppose that in a group of 100 eight-year-old boys the r between IQ and the A-cancellation test is .20 with a PE of .065, and that in a group of 110 eight-year-old girls the r between the same two tests is .25 with a PE of .06. The correlation is .05 higher for girls than for boys.
Is this difference sufficiently large to indicate that the true correlation between IQ and the A-test is higher for 8-year-old girls than for 8-year-old boys? To answer this question, we must determine the PE of the difference between the two r's. From formula (27),

$$ PE_{(diff.\; r_1 - r_2)} = \sqrt{(.065)^2 + (.06)^2} = .09 $$

and comparing the obtained difference of .05 with the $PE_{(diff.)}$, we find that $D/PE_{(diff.)} = .05/.09 = .556$. This means (see Table XV) that there are only 64 chances in 100 of a real difference, a difference greater than 0, between the true correlations of IQ and the A-test for 8-year-old boys and girls. The difference of .05 is, therefore, quite unreliable. To be completely reliable the obtained difference should be at least 4 × .09 or .36. (A difference is considered reliable when $D/PE_{(diff.)}$ is 4 or more; see page 133.) In the present case the obtained difference is only about 14 per cent of what it should be in order to guarantee a true difference between the r's of the boys and girls.

The formulas for $PE_r$ and $PE_{(diff.\; r_1 - r_2)}$ are subject to the same restrictions and must be interpreted with the same caution as the other standard and probable error formulas (see Chapter III, page 145). In order to be of any real value as measures of reliability, $PE_r$ and $PE_{(diff.)}$ should be calculated for r's obtained from random and reasonably large samples. PE's found for r's obtained from small and obviously selected groups may give an entirely false picture of the observed coefficient's reliability, especially when the coefficient is large. An r of .90 found from 20 cases, for instance, is unreliable despite the fact that $PE_r$ = .03 (see Table XVIII). Another sample of 20 drawn from the same population might give an r one half as large.

V. The Regression Equations

1. The Regression Equations in Deviation Form

We have already discovered (Diagram XVII) that there are two regression lines in a correlation table, and that the first "best fits" the means of the successive columns (the average heights, represented by crosses) while the second "best fits" the means of the rows (the average weights, represented by circles). These lines of "best fit" were seen to be of value in showing graphically the change in average height accompanying a given change in weight, and the change in average weight accompanying a given change in height. Moreover, we found that either line will measure the correlation directly when the x and y steps in the diagram have been laid out with due allowance for the difference in size of the σ's of the X and Y distributions. This last use of the regression line is of little practical value, however. It is very much easier to draw up a correlation table without bothering about the difference in the two σ's, and to find r by the product-moment formula as shown in Diagram XXI, than to try to estimate r from the regression lines. In fact, the real value of the regression lines is not to give r, but to enable us to "predict" an individual's "most probable" standing in a test or series of measures, given his standing in another test or series of measures.

We may describe briefly how this is done. Suppose that we wish to estimate a man's height from our correlation table, knowing his weight to be 68 kgs. Now the best possible "guess" that we can make of this man's height is to give the average height of all men who fall in the 65-69 weight interval. From Diagram XVI the mean height of the 22 men in this column is found to be 173.6 cms., and hence 173.6 cms. is the most likely height of a man who weighs 68 kgs. In like manner, the most probable height of a man who weighs 72 kgs. is 178.6 cms., the mean height of the 9 men who fall in the weight column 70-74 kgs.
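Prediction from the array means can be shown directly. In this Python sketch the height frequencies of the 65-69 and 70-74 kg. columns are those of Diagram XXI as read from the diagram, keyed by the midpoints of the 5-cm. height intervals; the names `COLUMN_65_69`, `COLUMN_70_74` and `column_mean` are hypothetical.

```python
# Height frequencies (keyed by interval midpoint, cms.) in two weight columns of Diagram XXI.
COLUMN_65_69 = {182.5: 3, 177.5: 6, 172.5: 8, 167.5: 3, 162.5: 2}
COLUMN_70_74 = {182.5: 4, 177.5: 3, 172.5: 2}


def column_mean(column):
    """Mean height of the men falling in one weight column (one 'array')."""
    n = sum(column.values())
    return sum(midpoint * f for midpoint, f in column.items()) / n


# The most probable height for a weight inside a column is that column's mean height.
for label, column in (("65-69 kgs.", COLUMN_65_69), ("70-74 kgs.", COLUMN_70_74)):
    print(label, sum(column.values()), "men,", round(column_mean(column), 1), "cms.")
```

Each printed mean is one of the crosses of Diagram XVII; the regression line of height on weight is the straight line that best fits these points.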
In general, then, the most probable height of any man is the mean of the heights of all the men in the group who weigh the same (approximately) as he, that is, who fall in the same weight column.¹ The line which best fits the mean heights of the successive weight columns is the line which gives the change in average height with the change in weight (the line through the crosses in Diagram XVII). Given a man's weight, therefore, we can best "predict" his height from the regression line of height on weight; and by analogy, given a man's height, we can best predict his weight from the regression line of weight on height (the line through the circles in Diagram XVII).

If we had the equations of the two regression lines, it would seem obvious that estimates could be made from these much more efficiently and quickly than from the plotted regression lines. For then, knowing a man's standing in the X-variable (his weight), we should be able, on substituting in the equation connecting X and Y, to find directly his most probable standing in the Y-variable (height). The equations of the two regression lines have been deduced by Prof. Karl Pearson, who took as his criterion the idea of the "best fitting" line. Pearson's method, briefly, was to find the equation of that line from which the sum of the squares of the deviations of the means in the different arrays (the rows or the columns) is the least possible.² There are, of course, two such lines. The one "best fits" the means of the rows, the other "best fits" the means of the columns. The equation of the line drawn through the means of the columns (the crosses in Diagram XVII) is written in its simplest form³ as

$$ y = r \frac{\sigma_y}{\sigma_x}\, x \qquad (28) $$

¹ There is a certain error of estimate made in taking a man's most probable height as being the average of his weight-group. The method of finding the size of this error will be considered later on page 183.
² For a mathematical treatment of the application of the Method of Least Squares to the problem of deducing the regression equations, see Jones, A First Course in Statistics, 1921, pp. 106 ff. and 271.

³ A brief review of the equation of a straight line and of the method of plotting a simple linear equation is given in order to simplify the discussion of the regression equations. Let X and Y be coordinate axes, or axes of reference. Now suppose that we are given the equation y = 2x and are required to represent the relation between x and y graphically. To do this we substitute values for x in the equation and compute the corresponding values of y. When x = 2, for example, y = 2 × 2 or 4; when x = 3, y = 2 × 3 or 6. In like manner, given any x value, we can compute the y which will "satisfy" the equation, that is, make the left side equal to the right. Now if the series of points determined from the pairs of x and y values given by the equation are plotted with respect to the X and Y axes (see Diagram XXII), they will be found to fall along a straight line, and this straight line will picture the relation of x and y, y = 2x. This line will pass through the origin, since when x = 0, y also equals 0. The equation y = 2x represents, then, a straight line which passes through the origin, and the relation of its points is such that y/x (called the slope of the line) always equals 2. The general equation of any straight line which passes through the origin may be written y = mx, where m is the slope of the line. If we replace the m of the general formula by the expression $r\,\sigma_y/\sigma_x$, we see at once that the regression equation in deviation form is simply the equation of a straight line which goes through the origin.

DIAGRAM XXII (the line y = 2x plotted on the X and Y axes through the origin)

The expression $r\,\sigma_y/\sigma_x$ is called the regression coefficient and is often replaced in the equation by the expression $b_{yx}$ or $b_{12}$, so that (28) is sometimes written $y = b_{yx} \cdot x$ or $y = b_{12} \cdot x$. If we substitute the values of r, $\sigma_y$, and $\sigma_x$ obtained from Diagram XXI in formula (28), we have

$$ y = .60 \times \frac{6.55}{7.75}\, x \quad \text{or} \quad y = .51x $$

as the equation which measures the regression of height on weight. This equation represents a straight line through the origin, and hence it is a simple matter to plot it, as shown in Diagram XXIII. First, however, we must draw a vertical line through the point 63.4 kgs., the mean of all the weights (the X's) in the table, and a horizontal line through 172.6 cms., the mean of all the heights (the Y's) in the table. These two lines are the coordinate axes. Now since our plotted line must go through the origin [see note (3), page 175], only one other point is needed to determine it. If x = 2 (any value will do just as well), y becomes .51 × 2 or 1.02. To plot this point, measure out 2 units from the origin along the horizontal axis and go up 1.02 units from the same line. This will locate the point x = 2, y = 1.02. (Any convenient scale may be used for measuring off x and y distances; a mm. rule is useful.) The line drawn through the point just located and the origin (0, 0) is the regression line of height on weight.

From the equation, it is clear that a point on this line with an x-value of 1.00 has a corresponding y-value of .51 (substitute x = 1 in the equation and y = .51). This means that a deviation of 1 unit from the mean of the X's (from the vertical line drawn through the mean weight of the group) is accompanied by just .51 times as much deviation from the mean of the Y's (from the horizontal line drawn through the mean height of the group) (see Diagram XXIII). Put concretely, a man who stands 1 kg. above the average weight of the group is most probably .51 cm. above the mean height of the group also: if his weight is 64.4 kgs. (63.4 + 1.00), his height is probably 173.11 cms. (172.6 + .51). To take another example, the man who weighs 60 kgs., and hence stands 3.4 kgs.
below the mean weight, is most probably 170.87 cms. tall, that is, 1.73 cms. below the mean height. In this example, we substitute x = -3.4 in the equation, and y = -1.73. In general, then, we know from the regression equation that the most probable deviation of any individual in our group¹ from the mean height is just .51 as great as his deviation from the mean weight. Hence, given a man's deviation from the mean weight, we are able to predict his most probable deviation from the mean height of the group.

¹ Or in the population from which our group of 120 is drawn, provided the group is a random sample.

DIAGRAM XXIII
Illustrating Position of the Regression Lines, and Calculation of the Regression Equations
(Calculation of r repeated from Diagram XXI: r = .60, $PE_r$ = .04; $\sigma_y$ = 6.55 cms., $\sigma_x$ = 7.75 kgs.; GA(Y) = 172.5, GA(X) = 62.5; Average (Y) = 172.6 cms., Average (X) = 63.4 kgs.)

Calculation of Regression Equations:
  I. Deviation Form:
    (1) $y = .60 \times \frac{6.55}{7.75}\, x = .51x$
    (2) $x = .60 \times \frac{7.75}{6.55}\, y = .71y$
  II. Score Form:
    (1) $Y - 172.6 = .51(X - 63.4)$, or $Y = .51X + 140.3$
    (2) $X - 63.4 = .71(Y - 172.6)$, or $X = .71Y - 59.1$
Calculation of Standard Errors of Estimate:
    $\sigma_{(est.\,Y)} = 6.55 \times .8 = 5.2$ cms.
    $\sigma_{(est.\,X)} = 7.75 \times .8 = 6.20$ kgs.
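Formula (28) and the two worked examples can be reproduced in a few lines. This Python sketch takes r, $\sigma_y$ and $\sigma_x$ from Diagram XXI and the two means (63.4 kgs. and 172.6 cms.) given in the text; following the text, it rounds the regression coefficient to two places before predicting. The names `B_YX` and `most_probable_height` are hypothetical.

```python
R, SIGMA_Y, SIGMA_X = 0.60, 6.55, 7.75     # r, sigma of heights (cms.), sigma of weights (kgs.)
MEAN_WEIGHT, MEAN_HEIGHT = 63.4, 172.6

# Regression coefficient b_yx = r * sigma_y / sigma_x, rounded to two places as in the text.
B_YX = round(R * SIGMA_Y / SIGMA_X, 2)


def most_probable_height(weight):
    """Formula (28): predicted deviation y = b_yx * x, converted back to cms."""
    x = weight - MEAN_WEIGHT          # deviation from the mean weight
    y = B_YX * x                      # most probable deviation from the mean height
    return MEAN_HEIGHT + y


print(B_YX)                                  # 0.51
print(round(most_probable_height(64.4), 2))  # 173.11
print(round(most_probable_height(60.0), 2))  # 170.87
```

The prediction itself is made entirely in deviation form; the mean height is added back only so that the result can be compared with the figures in the text.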
The regression equation $y = r\,\frac{\sigma_y}{\sigma_x}\,x$ is known as the regression equation of Y on X in deviation form. Stated generally, this equation gives the most probable deviation of any Y measure from the mean Y corresponding to a known deviation of the X measure from the mean X.

The equation of the second regression line, drawn through the means of the rows (the circles of Diagram XVII), is written

$$ x = r \frac{\sigma_x}{\sigma_y}\, y \qquad (29) $$

This equation measures the regression of X on Y and, in the present problem, of weight on height. The regression coefficient $r\,\sigma_x/\sigma_y$ is sometimes replaced by the expression $b_{xy}$ or $b_{21}$, so that (29) is often written $x = b_{xy} \cdot y$ or $x = b_{21} \cdot y$. If we substitute in (29) the values of r, $\sigma_x$, and $\sigma_y$ found from Diagram XXI, we have

$$ x = .60 \times \frac{7.75}{6.55}\, y \quad \text{or} \quad x = .71y $$

as the equation which measures the regression of weight on height. This equation, like the other, represents a straight line through the origin; consequently, one point on the line together with the origin (0, 0) is sufficient to plot the line. Put y = 1 in the equation, and x will equal .71. Now plot the point x = .71, y = 1.00 on the diagram, and draw the regression line through this point and the origin (see Diagram XXIII).

It is evident from the second regression equation that a deviation of 1 cm. from the mean of all the heights (Y's) is most probably accompanied by a deviation of .71 kg. from the mean of all the weights (X's); or, put in a different way, the most probable deviation of any man from the mean weight is just .71 as great as his deviation from the mean height. A man 180 cms. tall, for example (7.4 cms. above the mean height), most probably weighs 68.65 kgs., that is, 5.25 kgs. above the mean weight. (To get this result substitute 7.4 for y in the equation, and solve for x.) The equation $x = r\,\frac{\sigma_x}{\sigma_y}\,y$ is known as the regression equation of X on Y in deviation form.
To summarize briefly, it gives the most probable deviation of an X-measure from the average X corresponding to a known deviation of the Y-measure from the average Y.

Although there are two regression equations, both of which involve x and y, the student must bear in mind the important fact that the two equations cannot be used interchangeably, and that neither can be used to predict both x and y. The first regression equation, $y = r\,\frac{\sigma_y}{\sigma_x}\,x$, is to be used only when y is to be predicted from x (when y is the "dependent" variable), while the second regression equation, $x = r\,\frac{\sigma_x}{\sigma_y}\,y$, is to be used only when x is to be predicted from y (when x is the "dependent" variable).¹

There are always two regression equations unless the correlation is perfect. When r = 1.00, however, the equation $y = r\,\frac{\sigma_y}{\sigma_x}\,x$ becomes $y = \frac{\sigma_y}{\sigma_x}\,x$, or $\sigma_x y = \sigma_y x$, while the equation $x = r\,\frac{\sigma_x}{\sigma_y}\,y$ becomes $x = \frac{\sigma_x}{\sigma_y}\,y$, or $\sigma_y x = \sigma_x y$. The two equations are now identical, and the regression lines coincide. As an illustration of this last condition, suppose that the correlation between height and weight is perfect, $\sigma_x$ and $\sigma_y$ remaining the same. The first regression equation would now become $y = 1.00 \times \frac{6.55}{7.75}\,x$, or $y = .85x$, while the second regression equation would become $x = 1.00 \times \frac{7.75}{6.55}\,y$, or $x = 1.18y$. Algebraically, $x = 1.18y$ is equivalent to $y = .85x$ (since from the latter $x = y/.85$, or $x = 1.18y$). Under the prescribed conditions, therefore, we should have a single equation and a single line, which would represent equally well a change (deviation) in Y for a given change in X, or a change (deviation) in X for a given change in Y.

¹ A dependent variable depends for its value on the other variable in the equation. Thus in the equation $y = r\,\frac{\sigma_y}{\sigma_x}\,x$, y "depends" on the value given x.
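Why the two equations cannot be used interchangeably can be seen numerically. In this Python sketch (names hypothetical) both regression coefficients are computed from r and the two σ's of Diagram XXI. When the two lines are drawn on the same x, y axes, the line of X on Y, $x = b_{xy} y$, has slope $1/b_{xy}$ with respect to the x-axis, so it coincides with the line of Y on X only when r = 1.00. The last line contrasts formula (29) with the mistake of solving the height-on-weight equation for x.

```python
def regression_coefficients(r, sigma_x, sigma_y):
    """Return (b_yx, b_xy) for formulas (28) and (29)."""
    b_yx = r * sigma_y / sigma_x   # y on x
    b_xy = r * sigma_x / sigma_y   # x on y
    return b_yx, b_xy


SIGMA_X, SIGMA_Y = 7.75, 6.55      # weights (kgs.), heights (cms.)

for r in (0.60, 1.00):
    b_yx, b_xy = regression_coefficients(r, SIGMA_X, SIGMA_Y)
    # Slope (dy/dx) of each line when both are drawn on the same axes.
    print(r, round(b_yx, 2), round(1 / b_xy, 2))
# 0.6 0.51 1.41
# 1.0 0.85 0.85

# A man 7.4 cms. above the mean height: formula (29) versus wrongly inverting formula (28).
b_yx, b_xy = regression_coefficients(0.60, SIGMA_X, SIGMA_Y)
print(round(b_xy * 7.4, 2), round(7.4 / b_yx, 2))   # 5.25 14.59
```

The product of the two coefficients is always $r^2$, which is another way of seeing that the lines merge only when r = 1.00.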
It may be added that when r = 1.00, and in addition the two σ's are equal or are made equal by the arrangement of the diagram, the single regression line makes an angle of 45 degrees with the horizontal axis (see Diagram XVIII, and the discussion on pages 161-162).

2. The Regression Equations in Score Form

In the preceding section the point was stressed that formulas (28) and (29) are the equations of the regression lines in deviation form, that is, that the values of x and y substituted in these equations are deviations from the means of the X and Y distributions and not actual scores or measures.¹ While equations in deviation form are all that we actually need for purposes of prediction, it is often very convenient to be able to estimate an individual's actual score in Y, say, directly from his score in X without the trouble of first converting the X-score into a deviation from the mean X. This can be done very simply if we employ the score form rather than the deviation form of the regression equation.

The conversion from deviation to score form may be made as follows. Let the average of the Y's be denoted by $Y'$ and any Y-score by Y; then the y deviation of any individual from the mean will be $Y - Y'$ (the difference between the score and the mean), or, in general, $y = Y - Y'$. In the same way we can show that, in general, $x = X - X'$, when x is the deviation of any X score from the mean X, namely $X'$. Now substitute $Y - Y'$ for y and $X - X'$ for x in formulas (28) and (29), and the two regression equations become

$$ Y - Y' = r\frac{\sigma_y}{\sigma_x}(X - X') \quad \text{or} \quad Y = r\frac{\sigma_y}{\sigma_x}(X - X') + Y' \qquad (30) $$

and

$$ X - X' = r\frac{\sigma_x}{\sigma_y}(Y - Y') \quad \text{or} \quad X = r\frac{\sigma_x}{\sigma_y}(Y - Y') + X' \qquad (31) $$

These are the equations of the two regression lines in score form. In both equations, X and Y now represent actual scores and not deviations from the means of the two distributions.

¹ The small letters x and y are used to denote deviations from the means of the X and Y distributions. The large letters X and Y denote actual scores.
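Formulas (30) and (31) amount to a slope and a constant for each line. This Python sketch (the function name `score_form` is hypothetical) builds both score-form equations from r, the σ's and the averages of Diagram XXIII. Like the text, it rounds each regression coefficient to two places before computing the constant, so that its constants can be compared with those in the text.

```python
def score_form(r, sigma_dep, sigma_ind, mean_dep, mean_ind):
    """Return (slope, constant) so that Dep = slope * Ind + constant (formulas 30 and 31)."""
    slope = round(r * sigma_dep / sigma_ind, 2)   # regression coefficient, rounded as in the text
    constant = mean_dep - slope * mean_ind        # from Dep - mean_dep = slope * (Ind - mean_ind)
    return slope, constant


R = 0.60
SIGMA_Y, MEAN_Y = 6.55, 172.6    # heights, cms.
SIGMA_X, MEAN_X = 7.75, 63.4     # weights, kgs.

height_on_weight = score_form(R, SIGMA_Y, SIGMA_X, MEAN_Y, MEAN_X)   # formula (30)
weight_on_height = score_form(R, SIGMA_X, SIGMA_Y, MEAN_X, MEAN_Y)   # formula (31)

for name, (slope, constant), value in (
    ("height on weight", height_on_weight, 60),
    ("weight on height", weight_on_height, 180),
):
    print(name, slope, round(constant, 1), round(slope * value + constant, 1))
# height on weight 0.51 140.3 170.9
# weight on height 0.71 -59.1 68.7
```

Once the slope and constant are known, each prediction is a single multiplication and addition, which is the convenience the score form is meant to provide.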
If we substitute in (30) the values for $Y'$, r, $\sigma_y$, $\sigma_x$, and $X'$ obtained from Diagram XXIII, the equation becomes

$$ Y - 172.6 = .60 \times \frac{6.55}{7.75}(X - 63.4) $$

or, clearing of fractions, $Y = .51X + 140.3$. To illustrate the use of this equation, let us suppose that a man in our group weighs 60 kgs. (X) and that we wish to estimate his most probable height (Y). Substituting 60 for X in the equation, Y = 170.9; accordingly, the most probable height of a man who weighs 60 kgs. is 170.9 cms.

If the problem is to predict weight instead of height, we must use equation (31). Substituting the values for $X'$, r, $\sigma_x$, $\sigma_y$, and $Y'$ in the second equation, we have

$$ X - 63.4 = .60 \times \frac{7.75}{6.55}(Y - 172.6) $$

or $X = .71Y - 59.1$. Now given a man 180 cms. tall, we find, putting 180 for Y in the formula, that X = 68.7 kgs. Hence the most probable weight of a man 180 cms. tall is 68.7 kgs.

It may seem strange to the student to talk of "predicting" a man's height from his weight, when we already know the height and weight of all 120 men in our group. Of course when we have both height and weight it is unnecessary to convert one into the other. Suppose, however, that all we know about a certain man is his weight and the fact that he falls within the age-range of our group of 120 men. Now since we know the correlation between height and weight in this group, it is possible from the regression equation to predict the most probable height of our subject in lieu of actually measuring him. In the same way, the regression equation may be used to predict the height of any man in the population from which our group is taken, provided our group is a random sample of the larger group. The regression equations hold, of course, only for the population from which the sample group is drawn.
We could not, of course, estimate the probable heights of children or of women from a regression equation worked out for men between the ages of 18 and 25 (the age-range of the men in our group of 120). Conversely, we could not expect regression equations worked out for elementary-school children to hold for older groups.

Height and weight, since both are easily measured, probably do not show the value of the regression equations as well as other and more complex traits do. To take a problem of more direct interest, suppose that in a group of children of approximately the same age the r between IQ and average grades made in the first year of high school works out to be .70. If we know the IQ of a child entering school the next year, it is possible to estimate his probable scholastic performance from the regression equation worked out for the group of the previous year. This may be extremely valuable in educational guidance. The same is true of vocational guidance: on the basis of test scores we may be able to predict the probable success of an individual who contemplates entering a certain trade or profession, and thus advise him more intelligently.

3. The Reliability of the Predictions Made by the Regression Equations

A. The Standard Error of Estimate, σ(est.) or S

We have constantly referred to the values of X and Y "predicted" from the regression equations as being the "most probable" values of the one variable accompanying a given value of the other. The way to show just how reliable, i.e., how probable, our predicted values are is to calculate their standard error of estimate, written σ(est.). To find the accuracy with which we are able to estimate Y-values from equation (30), we employ the formula¹

$$\sigma_{(\mathrm{est.}\,Y)} = \sigma_y\sqrt{1 - r^2} \qquad (32)$$

in which σ_y is the σ of the Y-distribution, and the subscript "(est.)" distinguishes this σ from σ(dis.), σ(aver.), etc.; r is, of course, the coefficient of correlation between X and Y.

From equation (30) we have found that a man weighing 60 kgs. is most probably 170.9 cms. tall (see page 181). To find the reliability of this estimate, we substitute in formula (32):

σ(est. Y) = 6.55 × √(1 − .60²) = 5.2.

We may now say that the most probable height of a man weighing 60 kgs. is 170.9 cms. with a σ(est.) of 5.2 cms., and that the chances are 68 in 100 that the actual height of the given individual falls within the limits 170.9 ± 5.2, or between 165.7 cms. and 176.1 cms. We may be practically certain that the height of this man falls within the limits 170.9 ± 3 × 5.2, or between 155.3 cms. and 186.5 cms.

In order to find with what degree of accuracy we are able to predict X-values from equation (31), we use the formula²

$$\sigma_{(\mathrm{est.}\,X)} = \sigma_x\sqrt{1 - r^2} \qquad (33)$$

in which σ_x is the σ of the X-distribution.

¹ σ(est. Y) is sometimes written S_y.
² σ(est. X) is sometimes written S_x.

We have already found from equation (31) that the most probable weight (X) of a man 180 cms. tall is 68.7 kgs. (see page 181). To find the σ(est. X) of this prediction, we substitute for σ_x and r in formula (33):

σ(est. X) = 7.75 × √(1 − .60²) = 6.2.

Hence the most probable weight of a man in our group (or in the population from which it is drawn) who is 180 cms. tall is 68.7 kgs. with a σ(est.) of 6.2 kgs. The chances are 68 in 100 that the actual weight of this man falls within the limits 68.7 ± 6.2, or between 62.5 and 74.9 kgs. We may be practically certain that his weight falls within the limits 68.7 ± 3 × 6.2, or between 50.1 and 87.3 kgs.

B. The Probable Error of Estimate, PE(est.)

The PE(est.) may be used instead of σ(est.) for estimating the accuracy of a prediction.
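Formulas (32) and (33) and the ±1σ and ±3σ limits worked out above can be reproduced in a few lines. This is a minimal sketch assuming the text's values (σ_y = 6.55, σ_x = 7.75, r = .60, predictions 170.9 cms. and 68.7 kgs.); `standard_error_of_estimate` and `limits` are illustrative names, and the standard error is rounded to one decimal before forming limits, as the text does.

```python
import math

R = 0.60
SIGMA_Y, SIGMA_X = 6.55, 7.75   # height (cms.), weight (kgs.)


def standard_error_of_estimate(sd, r):
    """Formulas (32) and (33): sigma * sqrt(1 - r**2)."""
    return sd * math.sqrt(1 - r ** 2)


def limits(predicted, se, k=1):
    """Limits predicted +/- k standard errors, rounded to one decimal."""
    return round(predicted - k * se, 1), round(predicted + k * se, 1)


se_y = round(standard_error_of_estimate(SIGMA_Y, R), 1)
se_x = round(standard_error_of_estimate(SIGMA_X, R), 1)

# k = 1: about 68 chances in 100; k = 3: "practically certain"
print("height:", se_y, limits(170.9, se_y), limits(170.9, se_y, k=3))
print("weight:", se_x, limits(68.7, se_x), limits(68.7, se_x, k=3))
```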
PE(est.) is obtained by simply multiplying σ(est.) by the constant .6745. Thus

$$PE_{(\mathrm{est.}\,Y)} = .6745\,\sigma_y\sqrt{1 - r^2} \qquad (34)$$

and

$$PE_{(\mathrm{est.}\,X)} = .6745\,\sigma_x\sqrt{1 - r^2} \qquad (35)$$

The height of a man who weighs 60 kgs. has been estimated to be 170.9 cms. with a σ(est. Y) of 5.2 cms. The PE(est. Y) of this estimated height is .6745 × 5.2, or 3.5 cms. The chances are even, therefore, that the actual height of this man falls within the limits 170.9 ± 3.5, or between 167.4 and 174.4 cms. In like manner, since the estimated weight of a man 180 cms. tall is 68.7 kgs. with a σ(est. X) of 6.2, the PE(est. X) of this man's weight will be .6745 × 6.2, or 4.2 kgs. The chances are even that this man's actual weight lies within the limits 68.7 ± 4.2, or between 64.5 and 72.9 kgs.

The formulas for σ(est.) and PE(est.) measure the error made in taking predicted instead of actual X and Y scores. Note that when r = 1.00, √(1 − r²) is 0; consequently, since both σ(est.) and PE(est.) are then zero, there is no error of prediction. This result follows because all of the paired scores fall on the one double regression line when r = 1.00¹ (see page 161). An inspection of the formulas for σ(est.) and PE(est.) shows that the accuracy of the prediction from the regression equations depends upon the σ's of the two distributions (σ_y and σ_x) and upon the degree of correlation between the two traits. If the variability in Y, say, is small, and the correlation between Y and X high (e.g., .90 to 1.00), values of Y can be predicted from known values of X with a comparatively high degree of accuracy. When the variability is large or the correlation low, however, the prediction often becomes so unreliable as to be almost valueless; and even with a fairly high coefficient, predictions will often carry so large an error of estimate as to be of little value.
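The constant .6745 is the deviate of the normal curve that marks off the middle half of the cases, which is why PE limits give "even chances." The sketch below checks that with Python's standard `statistics.NormalDist` and then applies formulas (34) and (35) to the text's σ(est.) values of 5.2 cms. and 6.2 kgs.; the function name `probable_error_of_estimate` is illustrative.

```python
from statistics import NormalDist

# .6745 is (to four places) the normal deviate bounding the middle 50 per cent
z_half = NormalDist().inv_cdf(0.75)
middle_share = NormalDist().cdf(z_half) - NormalDist().cdf(-z_half)
print(f"z = {z_half:.4f}; share of cases within +/- z = {middle_share:.3f}")


def probable_error_of_estimate(se):
    """Formulas (34) and (35): .6745 times the standard error of estimate."""
    return 0.6745 * se


pe_height = round(probable_error_of_estimate(5.2), 1)   # cms.
pe_weight = round(probable_error_of_estimate(6.2), 1)   # kgs.
print("height limits:", round(170.9 - pe_height, 1), round(170.9 + pe_height, 1))
print("weight limits:", round(68.7 - pe_weight, 1), round(68.7 + pe_weight, 1))
```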
Thus, in spite of the fact that an r of .60 is usually considered fairly substantial,² we can predict a man's height (Y), knowing his weight (X), only within a PE(est.) of 3.5 cms. In other words, the chances are only 50 in 100 that the actual height does not differ from the predicted height by more than ±3.5 cms. When the regression equations are used for prediction, the σ(est.) or the PE(est.) should always be given. In general, the value of a prediction will depend, in addition to the size of the error of estimate, upon the fineness of the units of measurement and the purposes for which the prediction is made.

¹ See Monroe, An Introduction to the Theory of Educational Measurements, 1923, pp. 351-353, for a graphical demonstration of the meaning of σ(est.).
² See, however, the discussion of high and low correlation on page 288 ff.

VI. The Complete Solution of a Correlation Problem

In Diagram XXIV will be found the complete solution of a second correlation problem. The purpose of another "model" problem, in addition to the height-weight problem in Diagram XXIII, is to strengthen the student's grasp of correlation by having him work through the steps in finding r and the regression equations with a new set of data. Oftentimes, when only a single model problem is given, one fails to understand certain points in the solution which another, entirely different problem will succeed in clearing up. A brief discussion of the important points in the solution of this problem will be given in the following paragraphs, which the student should read with Diagram XXIV before him.

DIAGRAM XXIV
To Illustrate the Complete Solution of a Correlation Problem

[Correlation table: IQ on the first test (X-variable) in columns 90-94 to 150-154; IQ on the second test (Y-variable) in rows 85-89 to 155-159; N = 136; the two regression lines are plotted on the table. Cell frequencies omitted.]

Columns (X), D_x from −5 (90-94) to +7 (150-154):
FD_x:  −15  −12  −24  −28  −21  0  14  28  24  44  35  24  21    ΣFD_x = −100 + 190 = 90
FD_x²:  75   48   72   56   21  0  14  56  72 176 175 144 147    ΣFD_x² = 1056

Rows (Y), D_y from +8 (155-159) to −6 (85-89):
FD_y:   24  14  12  40  24  21  26  13  0  −19  −24  −45  −24  −15  −6    ΣFD_y = 174 − 133 = 41
FD_y²: 192  98  72 200  96  63  52  13  0   19   48  135   96   75  36    ΣFD_y² = 1195
Σx'y': 144  91  78 185  96  63  52  13  0    3   26   93   68   70  30    Σx'y' = 1012

Calculation of means and σ's (step interval = 5; GA = 117.5 for both tests):
c_y = 41/136 = .30;  c_y² = .09;  C_y = .30 × 5 = 1.5;  M_y = 117.5 + 1.5 = 119.0
c_x = 90/136 = .66;  c_x² = .44;  C_x = .66 × 5 = 3.30;  M_x = 117.5 + 3.30 = 120.8
σ_y = √(1195/136 − .09) = 2.95 × 5 = 14.75
σ_x = √(1056/136 − .44) = 2.71 × 5 = 13.55

Calculation of r:
r = (1012/136 − .30 × .66)/(2.95 × 2.71) = .91;  PE_r = .01 (Table XVIII)

Calculation of Regression Equations:
I. Deviation Form:  y = .91 × (14.75/13.55) x = .99x;  x = .91 × (13.55/14.75) y = .84y
II. Score Form:  Y − 119 = .99(X − 120.8), or Y = .99X − .59;  X − 120.8 = .84(Y − 119), or X = .84Y + 20.84

Calculation of PE(est.):
PE(est. Y) = .6745 × 14.75 × √(1 − .91²) = 4.12 (4)
PE(est. X) = .6745 × 13.55 × √(1 − .91²) = 3.79 (4)

Examples:
Let X = 100:  Y = 99 − .59, or 98 ± 4
Let X = 120:  Y = 118 ± 4
Let Y = 100:  X = 84 + 20.84, or 104 ± 4

The problem is to find the relation between the IQ's of 136 children (of the same chronological age) as determined from two individual intelligence tests. The correlation table has been constructed from a scatter diagram as explained on page 154. The first set of IQ's is the X-variable, and the second set of IQ's the Y-variable. Since the calculations of the two averages, c_x, c_y, σ_x, and σ_y cover familiar ground and have been given in detail on the diagram, they need not be repeated.

Note first, then, that the product-deviations in the Σx'y' column have been taken from column 115-119 (the column containing the GA of the X-distribution) and row 115-119 (the row containing the GA of the Y-distribution). The entries in the Σx'y' column have been obtained by the shorter method described on page 167: each cell frequency in a given row has been multiplied by its D_x, and the sum of these partial deviations entered in the column Σx'.
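The "Calculation of r" line in Diagram XXIV can be reproduced from the column sums alone. This sketch assumes the step-unit form that the diagram's arithmetic follows, r = (Σx'y'/N − c_x c_y)/(σ_x σ_y) with every term in step-interval units; formula (23) itself is not reproduced in this section, so treat the expression as matched to the diagram rather than quoted from it.

```python
import math

# Sums read from Diagram XXIV. Deviations are in step-interval units from the
# guessed mean 117.5 of both distributions; the step interval is 5 IQ points.
N = 136
SUM_FDX, SUM_FDX2 = 90, 1056
SUM_FDY, SUM_FDY2 = 41, 1195
SUM_XY = 1012
GUESSED_MEAN, STEP = 117.5, 5

c_x = SUM_FDX / N                              # correction for X (step units)
c_y = SUM_FDY / N                              # correction for Y (step units)
sd_x = math.sqrt(SUM_FDX2 / N - c_x ** 2)      # sigma_x in step units
sd_y = math.sqrt(SUM_FDY2 / N - c_y ** 2)      # sigma_y in step units
r = (SUM_XY / N - c_x * c_y) / (sd_x * sd_y)   # product-moment r

mean_x = GUESSED_MEAN + c_x * STEP
mean_y = GUESSED_MEAN + c_y * STEP
print(f"r = {r:.2f}; Mx = {mean_x:.1f}, My = {mean_y:.1f}; "
      f"sigma_x = {sd_x * STEP:.2f}, sigma_y = {sd_y * STEP:.2f}")
```

The unrounded σ's come out near 13.53 and 14.74; the diagram's 13.55 and 14.75 result from multiplying the rounded step-unit values 2.71 and 2.95 by 5.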
This entry has then been "weighted" (multiplied) once for all by the D_y of the whole row. To illustrate, in the first row (reading from left to right) we have (1 × 5) + (1 × 6) + (1 × 7), or 18, as the Σx' entry. (The D_x's are 5, 6, and 7, respectively, and may be found from the D_x row at the bottom of the diagram.) The common D_y is 8, hence the Σx'y' entry is 18 × 8, or 144. Again, in the eighth row we have (3 × −1) + (2 × 0) + (3 × 1) + (3 × 2) + (1 × 3) + (1 × 4), or 13, as the Σx' entry. The D_y of this row is 1, and hence the Σx'y' entry is 13. To take still another example, in the eleventh row we have (2 × −3) + (3 × −2) + (3 × −1) + (2 × 0) + (2 × 1), or −13, as the Σx'. Since the common D_y is −2, the Σx'y' entry here is +26.

After all of the Σx'y' entries have been made and the sum of the column found, the calculations of r from formula (23) and of PE_r from formula (26) are simply matters of substitution. Remember that c_x, c_y, σ_x, and σ_y are all left in units of step-interval in the r formula (see page 167).

The regression equations in Deviation Form under (I) have been found by substituting the values of r, σ_x, and σ_y in formulas (28) and (29), and the two straight lines which these equations represent have been plotted on the diagram. So far as the actual solution of the problem is concerned, it is unnecessary to plot these lines. They are of value, however, in indicating whether the means of the X and Y arrays may be fairly represented by straight lines, i.e., whether the regression is apparently "linear." If the relation is not "straight-line," other methods must be employed in calculating the correlation (see page 203). The regression equations in Score Form have been found, the one by substituting the two averages and the regression coefficient of Y on X (.99) in formula (30), and the other by substituting the two averages and the regression coefficient of X on Y (.84) in formula (31).
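The shorter method of forming the Σx'y' column, and the score-form equations that follow, can be sketched as below. The three rows are the ones worked in the text, written as hypothetical `{D_x: frequency}` dictionaries; the full Σx'y' column is as read from Diagram XXIV; r, the σ's, and the means are the diagram's rounded values.

```python
# Three rows of Diagram XXIV as (D_y, {D_x: frequency}), as worked in the text
example_rows = {
    "first row": (8, {5: 1, 6: 1, 7: 1}),
    "eighth row": (1, {-1: 3, 0: 2, 1: 3, 2: 3, 3: 1, 4: 1}),
    "eleventh row": (-2, {-3: 2, -2: 3, -1: 3, 0: 2, 1: 2}),
}


def row_entries(d_y, cells):
    """Return (sum of x', weighted entry x'y') for one row of a correlation table."""
    sum_x = sum(freq * d_x for d_x, freq in cells.items())
    return sum_x, sum_x * d_y


for name, (d_y, cells) in example_rows.items():
    print(name, row_entries(d_y, cells))

# The whole Sigma x'y' column of Diagram XXIV, rows D_y = +8 down to -6
XY_COLUMN = [144, 91, 78, 185, 96, 63, 52, 13, 0, 3, 26, 93, 68, 70, 30]
print("Sigma x'y' =", sum(XY_COLUMN))

# Regression coefficients and score-form intercepts from the rounded values
r, sigma_x, sigma_y, mean_x, mean_y = 0.91, 13.55, 14.75, 120.8, 119.0
b_yx = round(r * sigma_y / sigma_x, 2)   # Y on X
b_xy = round(r * sigma_x / sigma_y, 2)   # X on Y
print(f"Y = {b_yx}X {mean_y - b_yx * mean_x:+.2f}")
print(f"X = {b_xy}Y {mean_x - b_xy * mean_y:+.2f}")
```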
The calculation of the two PE's of estimate is shown on the diagram. PE(est. Y) is found from formula (34); PE(est. X) from formula (35).

Several examples have been given in the diagram to illustrate the use of the regression equations in "prediction." Note that an IQ of 100 on the first test (X) is most probably accompanied by an IQ of 98 on the second test (Y), with a PE(est. Y) of 4.12 (4) points. The chances are 50 in 100 that the actual IQ on the second test falls within the limits 98 ± 4, or between 94 and 102. An IQ of 120 on the first test (X) is most probably accompanied by an IQ of 118 on the second test (Y), and the PE(est. Y) is again 4 points. All predicted Y's have the same error of estimate, no matter where on the scale the Y may fall.

While the errors of estimate σ(est.) and PE(est.) have been used hitherto to give the reliability of specific predicted scores, they may also be interpreted in a more general fashion. A PE(est. Y) of 4 points, for instance, may be taken to mean that one half of the IQ's in test Y failed of perfect correlation with the IQ's in test X by ±4 points or more, while the other half failed of perfect correlation by less than ±4 points.

In most correlation problems we are interested in predicting the scores on only one test. (Y is usually taken as the dependent, and X as the independent variable.) For illustrative purposes, however, an example is given in Diagram XXIV of the prediction of an IQ in X from an IQ in Y. Thus for an IQ(Y) of 100 we find the most probable IQ(X) to be 104, with a PE(est. X) of 3.79 (4) points. The chances are 50 in 100 that the actual IQ(X) falls within the limits 104 ± 4 points, or between 100 and 108.

VII. Methods of Measuring Correlation Which Take Account Only of Relative Position or Rank

In many problems, especially in the fields of applied and vocational psychology, the investigator finds that he must work with data in which differences in capacity or merit are expressed in ranks rather than in graded scores or measures. To mention a few cases of this sort, we have individuals ranked in order of merit for honesty, athletic ability, salesmanship, or intelligence; and advertisements, colors, etc., ranked for esthetic qualities, beauty, or individual preference. In computing correlations from such material it is necessary to use methods which take account only of the relative positions or ranks. Also, when we have only a few scores (10 to 25, for example), it is often advisable to rank these in order of merit and compute the correlation by a rank method instead of by the longer and more laborious product-moment method. Coefficients of correlation calculated from a few cases are nearly always unreliable, and of little value except in suggesting the possible existence of relation, or as a preliminary survey. In such cases, therefore, simple methods are recommended, as they save much time and labor besides giving results which are as good as those secured by more elaborate methods.

In the present Section we shall consider two methods of finding the correlation when the data to be correlated have been arranged in orders of merit. These methods are known respectively as (1) the Method of Rank-Differences, and (2) the Method of Gains, or the Spearman "Footrule."

1. The Method of Rank-Differences

The method of rank-differences is illustrated in Table XIX. The problem is to find the relation between the length of service and the selling efficiency of 12 salesmen. The men are listed in column 1, and in column 2, opposite the name of each man, is given the number of years he has been in the service of the company.
In column 3 the men are ranked in order of merit according to the length of their service. For example, G, who has been longest with the company, is ranked 1; C, the next longest, is ranked 2; and so on down the list. Notice that A and J have the same period of service, and that each is ranked 7.5. Instead of ranking one 7 and the other 8, or both 7 or both 8, we compromise by ranking both 7.5, and F, who follows, 9.¹

In column 4 the men are ranked in order of merit for efficiency by the sales manager. The most efficient man (C) is ranked 1, the least efficient (B) is ranked 12. In column 5 the difference (the "D") between each man's efficiency rank and his years-of-service rank is entered, and in the next column (6) each of these D's is squared. The correlation between the two orders of merit may now be computed by substituting for ΣD² and N in the formula

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} \qquad (36)$$

in which D represents the difference in the rank of an individual in the two series, and ΣD² is the sum of the squares of all such differences.

¹ When three or more individuals (or specimens of any sort) are tied, that is, have the same score, the simplest plan is to give them all the median order-of-merit rating. Thus three individuals who are 5th, 6th, and 7th, respectively, are all ranked 6, and the next following 8; while four individuals who are 5th, 6th, 7th, and 8th are all ranked 6.5, and the next following 9.

TABLE XIX
To Illustrate the Rank-Difference Method of Finding Correlation

(1)        (2)        (3)              (4)              (5)           (6)
Salesmen   Years of   Order of Merit   Order of Merit   Difference    Difference
           Service    (Service)        (Efficiency)     between       Squared (D²)
                                                        Ranks (D)
A            5          7.5               6               1.5            2.25
B            2         11.5              12                .5             .25
C           10          2                 1               1.0            1.00
D            8          4                 9               5.0           25.00
E            6          6                 8               2.0            4.00
F            4          9                 5               4.0           16.00
G           12          1                 2               1.0            1.00
H            2         11.5              10               1.5            2.25
I            7          5                 3               2.0            4.00
J            5          7.5               7                .5             .25
K            9          3                 4               1.0            1.00
L            3         10                11               1.0            1.00
N = 12                                                           ΣD² = 58.00

ρ = 1 − 6ΣD²/[N(N² − 1)] = 1 − (6 × 58)/[12(143)] = .80
From Table XX, r = .81.  PE_r = .7063(1 − .81²)/√12 = .07 [See formula (37)]
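The rank-difference computation of Table XIX, including the rule for tied ranks, can be sketched as follows. The function name `order_of_merit` is illustrative; it gives tied scores the mean of the places they occupy, which for a run of consecutive places equals the median rating recommended in the footnote. The conversion of ρ to r uses the relation r = 2 sin(πρ/6); Table XX appears to tabulate exactly this relation (e.g., .50 gives .5176 and .80 gives .8135), but that is an inference from the tabled values, not a statement in the text.

```python
import math

# Table XIX: years of service and the sales manager's efficiency ranks
YEARS = {"A": 5, "B": 2, "C": 10, "D": 8, "E": 6, "F": 4,
         "G": 12, "H": 2, "I": 7, "J": 5, "K": 9, "L": 3}
EFFICIENCY_RANK = {"A": 6, "B": 12, "C": 1, "D": 9, "E": 8, "F": 5,
                   "G": 2, "H": 10, "I": 3, "J": 7, "K": 4, "L": 11}


def order_of_merit(scores):
    """Rank 1 = largest score; tied scores share the mean of the places they occupy."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    start = 0
    while start < len(ordered):
        end = start
        # extend the block while the next score ties with this one
        while end + 1 < len(ordered) and scores[ordered[end + 1]] == scores[ordered[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1   # mean of places start+1 .. end+1
        for name in ordered[start:end + 1]:
            ranks[name] = shared_rank
        start = end + 1
    return ranks


service_rank = order_of_merit(YEARS)
n = len(YEARS)
sum_d2 = sum((service_rank[k] - EFFICIENCY_RANK[k]) ** 2 for k in YEARS)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))              # formula (36)
r_from_rho = 2 * math.sin(math.pi * rho / 6)           # relation behind Table XX (assumed)
pe_r = 0.7063 * (1 - r_from_rho ** 2) / math.sqrt(n)   # formula (37)

print(f"sum D^2 = {sum_d2}, rho = {rho:.2f}, r = {r_from_rho:.2f}, PE_r = {pe_r:.2f}")
```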
In formula (36), N is the number of cases and ρ is the rank-order coefficient of correlation. ρ may be transmuted into a product-moment r by means of Table XX. Substituting 58 for ΣD² and 12 for N in formula (36), we obtain a ρ of .80, and from Table XX this is found to be equivalent to an r of .81.

The PE of an r found from a ρ is about 5% larger than the PE of the product-moment r.¹ The formula is

$$PE_r = \frac{.7063\,(1 - r^2)}{\sqrt{N}} \qquad (37)$$

and since, in the present example, r = .81, PE_r = .07. Accordingly, the coefficient of correlation, though based on only 12 cases, is conventionally reliable. Whenever N is less than 30, however, the PE_r is probably much larger than the value given by the formula. In any case, r's and PE_r's secured from fewer than 30 cases should be accepted as tentative and interpreted with caution. In the present example, all that we are justified in concluding is that in our particular group of 12 men there is evidence of a close correspondence between the rankings for efficiency and the number of years employed.

¹ See Brown & Thomson, Essentials of Mental Measurement, 1921, p. 103.

TABLE XX
A Table to Infer the Value of r from Any Given Value of ρ = 1 − 6ΣD²/[N(N² − 1)]

 ρ     r         ρ     r         ρ     r         ρ      r
.01  .0105     .26  .2714     .51  .5277     .76   .7750
.02  .0209     .27  .2818     .52  .5378     .77   .7847
.03  .0314     .28  .2922     .53  .5479     .78   .7943
.04  .0419     .29  .3025     .54  .5580     .79   .8039
.05  .0524     .30  .3129     .55  .5680     .80   .8135
.06  .0628     .31  .3232     .56  .5781     .81   .8230
.07  .0733     .32  .3335     .57  .5881     .82   .8325
.08  .0838     .33  .3439     .58  .5981     .83   .8421
.09  .0942     .34  .3542     .59  .6081     .84   .8516
.10  .1047     .35  .3645     .60  .6180     .85   .8610
.11  .1151     .36  .3748     .61  .6280     .86   .8705
.12  .1256     .37  .3850     .62  .6379     .87   .8799
.13  .1360     .38  .3953     .63  .6478     .88   .8893
.14  .1465     .39  .4056     .64  .6577     .89   .8986
.15  .1569     .40  .4158     .65  .6676     .90   .9080
.16  .1674     .41  .4261     .66  .6775     .91   .9173
.17  .1778     .42  .4363     .67  .6873     .92   .9266
.18  .1882     .43  .4465     .68  .6971     .93   .9359
.19  .1986     .44  .4567     .69  .7069     .94   .9451
.20  .2091     .45  .4669     .70  .7167     .95   .9543
.21  .2195     .46  .4771     .71  .7265     .96   .9635
.22  .2299     .47  .4872     .72  .7363     .97   .9727
.23  .2403     .48  .4973     .73  .7460     .98   .9818
.24  .2507     .49  .5075     .74  .7557     .99   .9909
.25  .2611     .50  .5176     .75  .7654    1.00  1.0000

2. The Method of Gains, or the Spearman Footrule

A second method of computing correlation when the data are ranked in orders of merit is the Method of Gains, or the Spearman "Footrule." Table XXI illustrates the use of the Footrule with the data taken from Table XIX. It will be noticed that the first four columns are the same in both methods, i.e., each series is first arranged in an order of merit. The methods differ from here on, however. The entries in column 5, which is headed G ("Gains"), are found by taking the plus differences, or gains in rank, of the 12 men in the efficiency rankings as compared with their service rankings. Thus A, who ranks 7.5 in "service" and 6 in "efficiency," has an increase in rank, or gain, of 1.5 in the second ranking over the first.¹ C, F, H, I, and J likewise register plus differences, or gains, in their efficiency rankings as compared with their service rankings. The total of the G column is 10.5. Note that if we compute the gains in rank of service over efficiency instead of efficiency over service, the same ΣG will be obtained. This is shown in column 6, marked G'. It makes no difference, therefore, whether we figure gains of the first series over the second, or the other way round, second over first.
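The gains of columns 5 and 6, the Footrule coefficient R of formula (38), and its conversion to r can be sketched as below. ΣG and ΣG' must agree because both series use the same twelve ranks (total 78), so the signed differences sum to zero and the plus and minus differences balance. The conversion r = 2 cos(π(1 − R)/3) − 1 is an assumption inferred from Table XXII, which it reproduces (e.g., R = .50 gives .732 and R = .56 gives .791); the text itself does not state it.

```python
import math

# Orders of merit from Table XXI (a smaller number is a higher rank)
SERVICE_RANK = {"A": 7.5, "B": 11.5, "C": 2, "D": 4, "E": 6, "F": 9,
                "G": 1, "H": 11.5, "I": 5, "J": 7.5, "K": 3, "L": 10}
EFFICIENCY_RANK = {"A": 6, "B": 12, "C": 1, "D": 9, "E": 8, "F": 5,
                   "G": 2, "H": 10, "I": 3, "J": 7, "K": 4, "L": 11}

# G: gains of the efficiency ranking over the service ranking (column 5)
gains = sum(max(SERVICE_RANK[k] - EFFICIENCY_RANK[k], 0) for k in SERVICE_RANK)
# G': gains the other way round, service over efficiency (column 6)
gains_reverse = sum(max(EFFICIENCY_RANK[k] - SERVICE_RANK[k], 0) for k in SERVICE_RANK)

n = len(SERVICE_RANK)
footrule_r = 1 - 6 * gains / (n ** 2 - 1)                             # formula (38)
r_from_footrule = 2 * math.cos(math.pi * (1 - footrule_r) / 3) - 1   # assumed Table XXII relation

print(gains, gains_reverse, round(footrule_r, 2), round(r_from_footrule, 2))
```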
TABLE XXI
To Illustrate the Footrule Method of Finding Correlation

(1)        (2)        (3)              (4)              (5)           (6)
Salesmen   Years of   Order of Merit   Order of Merit   G (Gains)     G' (Gains)
           Service    (Service)        (Efficiency)     (4 over 3)    (3 over 4)
A            5          7.5               6               1.5
B            2         11.5              12                               .5
C           10          2                 1               1.0
D            8          4                 9                              5.0
E            6          6                 8                              2.0
F            4          9                 5               4.0
G           12          1                 2                              1.0
H            2         11.5              10               1.5
I            7          5                 3               2.0
J            5          7.5               7                .5
K            9          3                 4                              1.0
L            3         10                11                              1.0
                                                          10.5           10.5

R = 1 − 6ΣG/(N² − 1) = 1 − (6 × 10.5)/143 = .56
r (Table XXII) = .79

¹ Since the rankings are from 1 to 12, a rank of 6 is to be taken as higher than a rank of 7.5.

When the sum of the G column has been obtained, the correlation may be found from the formula

$$R = 1 - \frac{6\,\Sigma G}{N^2 - 1} \qquad (38)$$

Substituting for ΣG its value 10.5, and for N its value 12, we get an R of .56. From Table XXII this R may be converted into an equivalent product-moment r of .79. Note that this value of r compares favorably with the r (found from ρ) of .81.

TABLE XXII
A Table to Infer the Value of r from Any Given Value of R

 R     r        R     r        R     r        R      r
.00  .000
.01  .018     .26  .429     .51  .742     .76   .937
.02  .036     .27  .444     .52  .753     .77   .942
.03  .054     .28  .458     .53  .763     .78   .947
.04  .071     .29  .472     .54  .772     .79   .952
.05  .089     .30  .486     .55  .782     .80   .956
.06  .107     .31  .500     .56  .791     .81   .961
.07  .124     .32  .514     .57  .801     .82   .965
.08  .141     .33  .528     .58  .810     .83   .968
.09  .158     .34  .541     .59  .818     .84   .972
.10  .176     .35  .554     .60  .827     .85   .975
.11  .192     .36  .567     .61  .836     .86   .979
.12  .209     .37  .580     .62  .844     .87   .981
.13  .226     .38  .593     .63  .852     .88   .984
.14  .242     .39  .606     .64  .860     .89   .987
.15  .259     .40  .618     .65  .867     .90   .989
.16  .275     .41  .630     .66  .875     .91   .991
.17  .291     .42  .642     .67  .882     .92   .993
.18  .307     .43  .654     .68  .889     .93   .995
.19  .323     .44  .666     .69  .896     .94   .996
.20  .338     .45  .677     .70  .902     .95   .997
.21  .354     .46  .689     .71  .908     .96   .998
.22  .369     .47  .700     .72  .915     .97   .999
.23  .384     .48  .711     .73  .921     .98   .9996
.24  .399     .49  .721     .74  .926     .99   .9999
.25  .414     .50  .732     .75  .932    1.00  1.0000

The Footrule formula gives a rough estimate of the correlation, and is generally less accurate than the rank-difference formula. The coefficient R "has a large, though, except in the case of zero correlation, not definitely known PE; does not vary between −1 and +1; is not comparable in meaning with the product-moment coefficient; and in general has none of the merits except brevity of the formula based on the squares of the differences in rank."¹ The Footrule can be employed to advantage, however, when the data are so meager or crude as to make a more refined method a waste of time; or it may be used in a preliminary survey to determine whether there is sufficient evidence of correlation to warrant the application of the product-moment method.

¹ See Kelley, T. L., Statistical Method, 1923, p. 193.

3. Summary of the Rank Methods

The product-moment method takes account of both the size of the score and its position in the series. The rank methods take account only of the position of the items in the series. For example, individuals who score 90, 86, and 70 on a given test must be ranked 1, 2, and 3 in order of merit, despite the fact that the difference between 90 and 86 is 4, and the difference between 86 and 70 is 16. The rank methods indicate the presence of relationship rather than the extent of relation. In general it may be set down as a convenient rule that rank methods should ordinarily be used only when N is small, say less than 30. Of the two rank methods, the method of rank-differences is to be preferred as the more accurate.

VIII. A Method of Measuring Relationship When the Data are Grouped into Classes or Categories. The Contingency Method

Sometimes the need arises of computing correlation when the facts in which we are interested cannot be conveniently measured, but can be grouped into classes or categories. To cite a few examples of such data, we can classify eye-color as blue, grey, or brown; temper as quick, even, or slow; athletic ability as good, average, or poor, when we are unable to measure such facts exactly. The methods of computing correlation given in the preceding sections are generally applied to facts which can be measured in terms of some common unit, or which, at least, can be ranked in order of merit; they do not ordinarily apply to data which can only be grouped into classes. Several methods are available for such material, however. One of the best of these is the Contingency Method developed by Prof. Karl Pearson.² In the contingency method relation is expressed by C, the Coefficient of Mean Square Contingency.

Table XXIII illustrates the method of drawing up a contingency table, and shows in detail the steps involved in finding C. The problem is to discover whether there is any "resemblance" (correlation) between the eye-color of father and son. There are 1000 cases. Tabulation of data is similar to the method used in constructing a correlation table. Reading down the first column, for example, we find that out of a total of 358 blue-eyed fathers, 194 have blue-eyed sons; 83, grey-eyed sons; 25, dark grey or hazel-eyed sons; and 56, brown-eyed sons. In the first row we find 335 blue-eyed sons, of whom 194 have blue-eyed fathers; 70, grey-eyed fathers; 41, dark grey or hazel-eyed fathers; and 30, brown-eyed fathers.

After the contingency table is completed, the first step in the calculation of C is to find an "independence value" for each cell. These values (the figures in parentheses in the cells) represent the number of fathers and sons, whose eye-color is given by the column and row, respectively, in which the cell lies, that we should expect to find in any given cell in the absence of any actual association between the eye-color of father and son. For example, the observed number of blue-eyed fathers who have blue-eyed sons in our sample of 1000 is 194.
If there were no correlation between the eye-color of father and son, we should still expect to find (335 × 358)/1000, or 120, blue-eyed fathers with blue-eyed sons by the operation of chance alone.¹ Again, the observed number of grey-eyed fathers who have blue-eyed sons is 70. In the absence of any real association, chance alone would account for (335 × 264)/1000, or 88, such cases in our sample of 1000. In like manner, "independence values" may be found for each cell by the simple process of multiplying together the totals of the row and column in which the cell lies and dividing this product by N, the number of cases. (See column 1, Table XXIII.)

² ... or Yule, G. U., An Introduction to the Theory of Statistics, 1919, p. 64 ff.

TABLE XXIII
To Illustrate the Calculation of C, the Coefficient of Mean Square Contingency [From Yule, p. 70]
(Observed frequencies, with independence values in parentheses)

Son's         Father's Eye Color
Eye Color     Blue         Grey         Hazel        Brown        Totals
Blue          194 (120)     70 (88)      41 (60)      30 (66)        335
Grey           83 (102)    124 (75)      41 (51)      36 (56)        284
Hazel          25 (49)      34 (36)      55 (25)      23 (27)        137
Brown          56 (87)      36 (64)      43 (44)     109 (48)        244
Totals        358          264          180          198            1000

Column 1. Independence values (row total × column total ÷ N):
335 × 358/1000 = 120    335 × 264/1000 = 88    335 × 180/1000 = 60    335 × 198/1000 = 66
284 × 358/1000 = 102    284 × 264/1000 = 75    284 × 180/1000 = 51    284 × 198/1000 = 56
137 × 358/1000 = 49     137 × 264/1000 = 36    137 × 180/1000 = 25    137 × 198/1000 = 27
244 × 358/1000 = 87     244 × 264/1000 = 64    244 × 180/1000 = 44    244 × 198/1000 = 48

Column 2. Square of each cell frequency ÷ its independence value:
(194)²/120, (70)²/88, (41)²/60, (30)²/66, (83)²/102, (124)²/75, (41)²/51, (36)²/56,
(25)²/49, (34)²/36, (55)²/25, (23)²/27, (56)²/87, (36)²/64, (43)²/44, (109)²/48

S = 1270.8    N = 1000    S − N = 270.8
C = √((S − N)/S) = √(270.8/1270.8) = .462

When the independence values have been calculated for each cell, the next step is to square each cell entry and divide this result by the independence value of that cell (see column 2). All quotients so found are totaled to give S (1270.8), and N (1000) is subtracted to give S − N.
The coefficient of mean square contingency, C, may then be found from the formula

$$C = \sqrt{\frac{S - N}{S}} \qquad (39)$$

In the present problem, C = .462.

The steps in the computation of C may be summarized as follows:

1. Construct a contingency table as shown in Table XXIII.
2. Determine the "independence value" for each cell by multiplying together the totals of the row and column in which the cell falls and dividing this product by N.
3. Square the number found in each cell, and divide this result by the independence value of that cell obtained in (2) above.
4. Sum the quotients obtained from (3). Call this total S.
5. Subtract N from S, giving S − N.
6. Divide S − N by S and extract the square root to get C, the coefficient of mean square contingency.

¹ We find that 335/1000 of all the sons are blue-eyed. This proportion should hold for the sons of all fathers if there is no dependence of son on father in respect to eye-color. Hence 335/1000 of the 358 blue-eyed fathers should have blue-eyed sons by the operation of chance alone. This argument applies to the other "independence values" also.

The fundamental principle underlying the Contingency Method is a comparison of the frequency of association (number of cases) actually found in each cell with the frequency of association which we should expect to find in the cells if the traits considered were completely unrelated (independent). If there is no correlation between the two variables in our contingency table, C = .00; if there is perfect correlation, C approaches 1.00 as a limit. While in general no sign is attached to C, as this coefficient simply indicates whether the two traits are associated or independent, for interpretative purposes a minus sign may be affixed to a C if an inspection of the contingency table shows that marked degrees of the one trait are found with slight degrees of the other.
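Steps 2 to 6 above translate directly into a short function. This sketch takes Table XXIII's observed frequencies as input; the function name is illustrative. Because it keeps the independence values unrounded, it gives an S slightly below the text's 1270.8 (which uses whole-number independence values) and a C of about .461 rather than .462.

```python
import math

# Table XXIII: rows = son's eye color, columns = father's eye color,
# both in the order blue, grey, hazel, brown
OBSERVED = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]


def mean_square_contingency(table):
    """Return (S, C) following steps 2-6 in the text."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    s = 0.0
    for i, row in enumerate(table):
        for j, freq in enumerate(row):
            independence = row_totals[i] * col_totals[j] / n   # step 2
            s += freq ** 2 / independence                        # steps 3 and 4
    c = math.sqrt((s - n) / s)                                   # steps 5 and 6, formula (39)
    return s, c


s, c = mean_square_contingency(OBSERVED)
print(f"S = {s:.1f}, C = {c:.3f}")
```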
In Table XXIII, for instance, inspection shows that slight pigmentation of the eyes in the father is associated with slight pigmentation of the eyes in the son, and hence in the present case C is clearly positive.¹ If marked pigmentation in the eyes of the father had been associated with slight pigmentation in the eyes of the son, C would have been negative. In other words, we must determine from the contingency table whether the correlation is positive or negative; C gives simply the degree of the relation.

¹ Note, for example, that 194 blue-eyed fathers have blue-eyed sons, while only 30 brown-eyed fathers have blue-eyed sons. Also, 109 brown-eyed fathers have brown-eyed sons, while only 56 blue-eyed fathers have brown-eyed sons. Other comparisons like these will show that the association between the degree of pigmentation in the eyes of father and son is positive.

One disadvantage of the contingency method lies in the fact that C does not remain constant, for the same data, when the number of classes in the table is increased. The C calculated from a 3 × 3-fold table will not ordinarily equal the C calculated from the same data arranged in, say, a 5 × 5-fold table. Moreover, the maximum value which C can take will depend on the fineness of the classification employed. Yule² has shown that

when the number of classes = 2, C cannot exceed .707
when the number of classes = 3, C cannot exceed .816
when the number of classes = 4, C cannot exceed .866
when the number of classes = 5, C cannot exceed .894
when the number of classes = 6, C cannot exceed .913
when the number of classes = 7, C cannot exceed .926
when the number of classes = 8, C cannot exceed .935
when the number of classes = 9, C cannot exceed .943
when the number of classes = 10, C cannot exceed .949

Yule has suggested, in the light of these facts, that we "restrict the use of the 'coefficient of contingency' to 5 × 5-fold or finer classifications" in order that the maximum value of C may be as near unity as possible. On the other hand, we must avoid too fine a classification, or C will be affected by slight or "casual irregularities of no physical significance"; in addition, the arithmetic will be needlessly increased.

Since the classification in Table XXIII is 4 × 4-fold, the value of C would very probably change somewhat if the number of classes were increased. The table will serve very well, however, as an illustration of the method and of the arithmetic involved in finding C. Moreover, as the maximum C from a 4 × 4-fold table is .866, and the C found from Table XXIII is .462, we are justified in concluding, in spite of the relative crudeness of our measures, that there is a medium positive correlation between pigmentation of eyes in father and son.

The relation of C to r, the product-moment coefficient of correlation, is of considerable importance. C may be taken as practically equivalent to r (1) when the grouping is relatively fine, 5 × 5-fold or finer; (2) when the sample is large; and (3) when we know, or are justified in assuming, that the traits which we are correlating are normally distributed. In case the first of these conditions is not fulfilled, Pearson³ has given a correction for "broad categories" which should be used with 4 × 4-fold and coarser classifications if C is to be compared with r. For 5 × 5-fold or finer classifications this correction is usually small, and unless a very accurate measure of correlation is desired it may be disregarded and C taken as roughly equal to r.

² An Introduction to the Theory of Statistics, 1919, p. 66.
³ Pearson, Karl, On the Measurement of the Influence of "Broad Categories" on Correlation. Biometrika, Vol. IX, 1913.
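Yule's list of maxima agrees, to three places, with the closed form √((k − 1)/k) for a table of k × k classes. The text does not state that expression, so this sketch uses it as an assumption and checks it against the list; it also expresses Table XXIII's C of .462 as a fraction of the 4 × 4 maximum.

```python
import math


def max_contingency(k):
    """Upper limit of C for a k x k table (assumed closed form): sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)


for k in range(2, 11):
    print(k, f"{max_contingency(k):.3f}")

# Table XXIII is 4 x 4-fold: its C of .462 as a fraction of the attainable maximum
print(round(0.462 / max_contingency(4), 2))
```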
TABLE XXIV
To Illustrate the Calculation of C by the Short Method
Boys: Ages 4½-5½ Years
(Rows: height in inches. Columns: weight in pounds.)

Height | 24-28 | 29-33 | 34-38 | 39-43 | 44-48 | 49-53 | Total
45-47  |       |       |   1   |       |   2   |       |    3
42-44  |       |       |   4   |  35   |  21   |   5   |   65
39-41  |       |   5   |  87   |  90   |   7   |   1   |  190
36-38  |   1   |  18   |  72   |   8   |       |       |   99
33-35  |   5   |  15   |   5   |       |       |       |   25
30-32  |   2   |       |       |       |       |       |    2
Total  |   8   |  38   | 169   | 133   |  30   |   6   |  384

$$
\begin{aligned}
\text{Column 1:}\quad & \tfrac{1}{8}\left(\tfrac{1^2}{99} + \tfrac{5^2}{25} + \tfrac{2^2}{2}\right) = .3762\\
\text{Column 2:}\quad & \tfrac{1}{38}\left(\tfrac{5^2}{190} + \tfrac{18^2}{99} + \tfrac{15^2}{25}\right) = .3264\\
\text{Column 3:}\quad & \tfrac{1}{169}\left(\tfrac{1^2}{3} + \tfrac{4^2}{65} + \tfrac{87^2}{190} + \tfrac{72^2}{99} + \tfrac{5^2}{25}\right) = .5549\\
\text{Column 4:}\quad & \tfrac{1}{133}\left(\tfrac{35^2}{65} + \tfrac{90^2}{190} + \tfrac{8^2}{99}\right) = .4671\\
\text{Column 5:}\quad & \tfrac{1}{30}\left(\tfrac{2^2}{3} + \tfrac{21^2}{65} + \tfrac{7^2}{190}\right) = .2792\\
\text{Column 6:}\quad & \tfrac{1}{6}\left(\tfrac{5^2}{65} + \tfrac{1^2}{190}\right) = .0650\\
\text{Totals:}\quad & P = 2.0688,\quad P - 1 = 1.0688,\quad C = \sqrt{\tfrac{1.0688}{2.0688}} = .719
\end{aligned}
$$

The arithmetic involved in computing C may be lessened somewhat by combining the twofold process of (1) calculating independence values and (2) dividing the square of each cell frequency by its independence value. This Short Method of finding C is illustrated in Table XXIV. Note that the first occupied cell in the first column of the table has a frequency of 1 and an independence value of (99 × 8)/384, and that the cell frequency squared and divided by the independence value is (1 × 384)/(99 × 8). This quotient is the contribution of this particular cell to the total S. In like manner the contribution to S of the next cell in this column is (5² × 384)/(25 × 8), and that of the third and last cell is (2² × 384)/(2 × 8). These contributions from column 1 may be combined as follows:

$$\frac{384}{8}\left(\frac{1}{99} + \frac{25}{25} + \frac{4}{2}\right)$$

and the contribution of each of the other five columns to S may be found in exactly the same way. One further simplification may be made. Since N (384) is a common factor in each column, it may be left out of the computations entirely in calculating the contribution of each cell, as shown in the table. Then, if the sum of all six columns is denoted by P, $C = \sqrt{(P - 1)/P}$ directly.¹
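The Short Method can be sketched in Python as below. The cell positions in `TABLE_XXIV` are reconstructed from the column sums in the text (for example, the Column 3 term 1²/3 places the single boy of the 45-47 inch row in the 34-38 pound column); the function names are illustrative. The sketch also computes C the long way, from S and the independence values, to show that the two routes agree.

```python
import math

# Table XXIV: rows are height classes 45-47 down to 30-32 (inches),
# columns are weight classes 24-28 ... 49-53 (pounds); empty cells are 0.
TABLE_XXIV = [
    [0, 0, 1, 0, 2, 0],     # 45-47
    [0, 0, 4, 35, 21, 5],   # 42-44
    [0, 5, 87, 90, 7, 1],   # 39-41
    [1, 18, 72, 8, 0, 0],   # 36-38
    [5, 15, 5, 0, 0, 0],    # 33-35
    [2, 0, 0, 0, 0, 0],     # 30-32
]

def contingency_short_method(table):
    """Return (P, C): for each column, sum f**2 / row total over its cells,
    divide by the column total, add the partial sums; C = sqrt((P - 1) / P)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p = 0.0
    for j, col_total in enumerate(col_totals):
        if col_total == 0:
            continue  # an empty column contributes nothing
        partial = sum(table[i][j] ** 2 / row_totals[i]
                      for i in range(len(table)) if table[i][j])
        p += partial / col_total
    return p, math.sqrt((p - 1) / p)

def contingency_long_method(table):
    """C = sqrt((S - N) / S), where S sums f**2 / independence value,
    and the independence value of a cell is row total * column total / N."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    s = sum(table[i][j] ** 2 / (row_totals[i] * col_totals[j] / n)
            for i in range(len(table)) for j in range(len(col_totals))
            if table[i][j])
    return math.sqrt((s - n) / s)

p, c = contingency_short_method(TABLE_XXIV)
print(f"Short Method: P = {p:.4f}, C = {c:.3f}")
print(f"Long method:  C = {contingency_long_method(TABLE_XXIV):.3f}")
```

The printed P of 2.0688 is the sum of the column values after each was rounded to four places; carried at full precision, P is about 2.0689, and C still rounds to .719.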
By the Short Method, C is found to equal .719, and the coefficient of correlation for the same table will be found to be .709 (see page 216). The correspondence of C and r is somewhat closer here than is generally obtained, although the difference between C and r is never very great when the conditions prescribed on page 200 have been met. In the present case, N is fairly large, the classification is 6 × 6-fold, and the distributions of both height and weight are fairly normal.

¹ Since P = S/N, S = PN. Substituting PN for S in the formula $C = \sqrt{(S - N)/S}$, we have $C = \sqrt{(PN - N)/PN}$, or, removing the common factor N, $C = \sqrt{(P - 1)/P}$.

The steps in the computation of C by the Short Method may be summarized as follows (see Table XXIV).
1. Square the frequency in each cell of column 1, and divide each square by the total of the row in which the cell falls.
2. Add all of the results for column 1, and divide by the column total, a common factor. Record this partial sum.
3. Repeat (1) and (2) for each of the other columns in the table.
4. Call the sum of all partial sums P.
5. Find C from the formula $C = \sqrt{(P - 1)/P}$.

In many problems in psychology in which the relation between various attributes, whether of individuals or things, is sought, C will prove of considerable value.

IX. Non-Linear Relationship

1. The Correlation Ratio

The relation which exists between the paired values of two sets of measures X and Y may be described in a general way as either "linear" or "non-linear." When the means of the arrays of successive columns or rows in a correlation table follow straight lines (exactly or approximately), the regression is called "linear," and the relation between the two sets of measures or scores is a "straight line relation."
On the other hand, when the drift or trend of the means in the successive arrays cannot be described by a straight line, but can be properly represented only by a curve of some kind, the regression is called curvilinear, or in general non-linear, and the relation between the two variables is a "curved line relation."

Our previous discussion has been concerned entirely with cases in which the relation between X and Y was known to be linear and in which r gave a fair measure of the degree of correlation. Cases sometimes arise in psychological measurement, however, in which the relation between X and Y is clearly non-linear, and in such cases the coefficient of correlation r cannot be used, since the product-moment method assumes linear relationship. The reason for this may be stated briefly as follows. When a definitely curvilinear relation is represented by a straight line instead of by a curve, the scatter of the paired values is considerably greater about the straight line than about the curve. This results from the fact that the scatter about a curve joining the means of the successive arrays is necessarily less than the scatter about a straight line which has been "fitted" to these mean points. The less the scatter about the regression line or curve, the greater the degree of correlation; hence a coefficient of correlation calculated from a correlation table in which the regression is truly curvilinear will be materially less than the true correlation between the variables X and Y.¹

In order to measure non-linear relation, therefore, we need a more general coefficient than the coefficient of correlation, r: that is, we need a coefficient which will measure the concentration of the paired X and Y values about a relation curve, just as r measures the concentration of the paired values about a relation line. One such coefficient is the Correlation Ratio, devised by Prof. Karl Pearson and designated by the symbol η (eta). Since eta is a general coefficient, it may be employed when the regression is linear as well as when it is non-linear. If the regression is linear (if the means of the arrays fall on straight lines), η will equal r; if the regression is non-linear (if the means do not fall on straight lines), η will be greater than r. In general, as long as the relation between Y and X is non-linear, η and r will differ, η always being greater than r. The coefficient of correlation, therefore, is seen to be simply a limiting value of the more general η, just as straight line relationship is simply a limiting case of curvilinear relation.

The correlation ratio η is always positive, and varies from zero to 1.00. Whether the relation given by η is positive, negative, or a varying one must be determined, however, from the direction taken by the curve of relation, i.e., by inspection of the correlation diagram.

¹ A simple illustration will make clear just why this is true. The correlation between the following two short series (Table XXV), by the product-moment formula (formula 25), is .93. The true correlation, however, is 1.00, i.e., perfect, since the Y values are absolutely dependent on the X values: as X increases in steps of 1 (in arithmetic progression), Y doubles (increases in geometric progression).

TABLE XXV
Variable X | Variable Y
    1      |    .25
    2      |    .50
    3      |   1.00
    4      |   2.00
    5      |   4.00

The reason why r is less than 1.00 is perfectly obvious as soon as we plot the paired X and Y values (see Diagram XXV). Since the relationship between X and Y is curvilinear, it cannot be described by a straight line. Consequently, when straight line relationship is assumed (as in the product-moment formula), the plotted points do not fall on the relation line, and r is less than 1.00, the true correlation between X and Y. In true curvilinear correlation, r is always less than η.

[Diagram XXV: plot of the paired X and Y values of Table XXV; horizontal axis, X-variable.]
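The footnote's point can be checked directly. The Python sketch below, with illustrative function names `pearson_r` and `eta_yx`, computes r from deviations about the actual means (assumed here to be the form of formula 25) and η as the σ of the array means divided by σ of Y. Because each X value in Table XXV has only one Y, every array mean is the Y value itself, so η comes out as exactly 1 while r stays below it.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Product-moment r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def eta_yx(xs, ys):
    """Correlation ratio of Y on X: sigma of the Y-array means over sigma of Y."""
    n = len(ys)
    my = sum(ys) / n
    arrays = defaultdict(list)          # one array of Y values per distinct X
    for x, y in zip(xs, ys):
        arrays[x].append(y)
    # Variance of the array means about the grand mean, weighted by array size.
    var_means = sum(len(a) * (sum(a) / len(a) - my) ** 2 for a in arrays.values()) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return math.sqrt(var_means / var_y)

X = [1, 2, 3, 4, 5]
Y = [.25, .50, 1.00, 2.00, 4.00]        # Table XXV: Y doubles as X rises by 1
print(f"r = {pearson_r(X, Y):.2f}, eta = {eta_yx(X, Y):.2f}")
```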
The process of calculating η from a correlation table in which the relation is definitely non-linear is shown in Diagram XXVI. The steps involved in finding the values to be substituted in the formula for η may be outlined as follows:

Step I. Construct a correlation table as shown in Diagrams XXIII and XXIV and described on page 154.
Step II. Find the average (Y′) and the σ of the Y-distribution, using the Guessed Average Method described in Chapter I.
Step III. Compute the averages ($Y'_x$) of the successive Y-arrays, i.e., the arrays of the columns. Enter these in the row marked $Y'_x$.
Step IV. Find the deviation of each $Y'_x$ from the average of the whole table, Y′; that is, find $(Y'_x - Y')$ for each column.
Step V. Square each deviation $(Y'_x - Y')$, and enter the results in the row marked $(Y'_x - Y')^2$.
Step VI. Multiply, or weight, each $(Y'_x - Y')^2$ by the $f_x$ of its column. In the first column, for example, multiply 15.52 [i.e., $(Y'_x - Y')^2$] by 20, its $f_x$.
Step VII. Find the sum of the $f_x(Y'_x - Y')^2$ row. Divide this sum by N, and extract the square root. The result is $\sigma_{my}$, the standard deviation of the means of the various columns about the arithmetic mean of all of the Y's.
Step VIII. Divide $\sigma_{my}$ by $\sigma_y$ to get the correlation ratio $\eta_{yx}$.

The formula for $\eta_{yx}$ may be written

$$\eta_{yx} = \frac{\sigma_{my}}{\sigma_y} \qquad (40)$$

If now we substitute in formula (40) the values of $\sigma_{my}$ and $\sigma_y$ found from Diagram XXVI, the correlation ratio $\eta_{yx}$ becomes .931.¹

[Diagram XXVI: correlation table of the number of problems worked (Y-variable) against grade position (X-variable) for 465 pupils, with the curve of relation drawn in; the cell entries are not recoverable from the scan.]

This coefficient shows how the number of problems worked (on the average) in a certain arithmetic test (Y) is related to the grade position (X) of 465 pupils. The curve which describes this relation, the curve which best marks the trend or "drift" of the means of the successive Y-arrays, has been drawn in on the figure. Note that it begins low and gradually rises, then suddenly bends upward in a concave fashion. From the diagram alone it would seem to be clear enough that the regression of Y on X is non-linear. Further evidence of this may be found in the fact that the coefficient of correlation, r, calculated from this table (on the assumption, of course, of linear relationship) is .80, about .13 less than $\eta_{yx}$. The method of determining definitely whether regression is linear or non-linear in any table will be given in (3) following.
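Steps II to VIII translate directly into code. Because the cell frequencies of Diagram XXVI cannot be recovered from the scan, the Python sketch below applies the same steps, purely as an illustration, to the height-weight data of Table XXIV, using class midpoints; it computes the mean directly rather than by the Guessed Average Method, which leaves the σ's unchanged. The function names are illustrative. The text reports r = .709 for this table but gives no η for it, so only r can be checked against the book.

```python
import math

# Table XXIV (rows: height 45-47 down to 30-32; columns: weight 24-28 ... 49-53).
TABLE_XXIV = [
    [0, 0, 1, 0, 2, 0],
    [0, 0, 4, 35, 21, 5],
    [0, 5, 87, 90, 7, 1],
    [1, 18, 72, 8, 0, 0],
    [5, 15, 5, 0, 0, 0],
    [2, 0, 0, 0, 0, 0],
]
HEIGHT_MIDS = [46, 43, 40, 37, 34, 31]   # row midpoints, inches (Y)
WEIGHT_MIDS = [26, 31, 36, 41, 46, 51]   # column midpoints, pounds (X)

def eta_from_table(table, row_mids):
    """Correlation ratio of the row variable on the column variable:
    sigma of the column (array) means divided by sigma of the row variable."""
    n = sum(map(sum, table))
    f_rows = [sum(row) for row in table]
    mean = sum(f * m for f, m in zip(f_rows, row_mids)) / n             # Step II
    sigma = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(f_rows, row_mids)) / n)
    weighted = 0.0
    for col in zip(*table):                                            # one array per column
        f_col = sum(col)
        if f_col == 0:
            continue
        col_mean = sum(f * m for f, m in zip(col, row_mids)) / f_col   # Step III
        weighted += f_col * (col_mean - mean) ** 2                     # Steps IV-VI
    sigma_m = math.sqrt(weighted / n)                                  # Step VII
    return sigma_m / sigma                                             # Step VIII, formula (40)

def r_from_table(table, row_mids, col_mids):
    """Product-moment r for the same grouped table, using class midpoints."""
    cells = [(table[i][j], col_mids[j], row_mids[i])
             for i in range(len(row_mids)) for j in range(len(col_mids))]
    n = sum(f for f, _, _ in cells)
    mx = sum(f * x for f, x, _ in cells) / n
    my = sum(f * y for f, _, y in cells) / n
    cov = sum(f * (x - mx) * (y - my) for f, x, y in cells) / n
    sx = math.sqrt(sum(f * (x - mx) ** 2 for f, x, _ in cells) / n)
    sy = math.sqrt(sum(f * (y - my) ** 2 for f, _, y in cells) / n)
    return cov / (sx * sy)

transposed = [list(col) for col in zip(*TABLE_XXIV)]   # rows now weight classes
eta_yx = eta_from_table(TABLE_XXIV, HEIGHT_MIDS)       # height on weight
eta_xy = eta_from_table(transposed, WEIGHT_MIDS)       # weight on height, formula (42)
r = r_from_table(TABLE_XXIV, HEIGHT_MIDS, WEIGHT_MIDS)
print(f"r = {r:.3f}, eta_yx = {eta_yx:.3f}, eta_xy = {eta_xy:.3f}")
```

Transposing the table and passing the weight midpoints gives $\eta_{xy}$ in the same way; whichever way the arrays are taken, η computed from a table can never fall below the absolute value of r computed from the same table.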
There are always two η's in every non-linear correlation table, just as there are always two regression coefficients, $r\frac{\sigma_y}{\sigma_x}$ and $r\frac{\sigma_x}{\sigma_y}$, in a table in which regression is linear. The one, written $\eta_{yx}$, refers to the regression of Y on X (Y is the dependent variable); the other, written $\eta_{xy}$, refers to the regression of X on Y (X is the dependent variable). The value of $\eta_{xy}$ may be computed in exactly the same way as $\eta_{yx}$ by substituting X for Y in the outline of "steps" given above. The formula is

$$\eta_{xy} = \frac{\sigma_{mx}}{\sigma_x} \qquad (42)$$

Unlike r, which has the same value in both regression equations [see formulas (28) and (29)], $\eta_{yx}$ and $\eta_{xy}$ will usually differ, their values depending on the degree of scatter about the curves joining the means of the Y and X arrays. In the present problem, for example, $\eta_{xy} = .818$, while $\eta_{yx} = .931$, as shown above. In the special case in which the regression is truly linear, $\eta_{yx}$ and $\eta_{xy}$ equal each other, and both equal r (see page 205).

¹ The PE of η may be found from the formula

$$PE_\eta = \frac{.6745\,(1 - \eta^2)}{\sqrt{N}} \qquad (41)$$

or from Table XVIII.

2. The Correction of "Raw" Eta

The value of η depends materially on the number of cases in the sample and on the fineness of the grouping. As a general rule, η should never be calculated unless N is fairly large. When N is comparatively small or the number of arrays is large, Pearson¹ has given a correction which should be applied to the "raw" (i.e., calculated) value of η. If we represent the number of arrays by k, the formula for "corrected eta" is

$$\text{corrected } \eta^2 = \frac{\eta^2 - \dfrac{k - 3}{N}}{1 - \dfrac{k - 3}{N}} \qquad (43)$$

(The η on the right-hand side of the equation is the "raw" eta.) If we apply this correction to the value of $\eta_{yx}$ obtained in the present problem, substituting .931 for $\eta_{yx}$, 8 (the number of Y-arrays) for k, and 465 for N, so that (k − 3)/N = 5/465 = .011, we have

$$\text{corrected } \eta_{yx}^2 = \frac{(.931)^2 - .011}{1 - .011} = .8653, \qquad \eta_{yx} = .930.$$

In the present case the correction is very small.
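Formula (43), together with the probable error of formula (41), can be written as two small Python functions. The names are illustrative, and the floor at zero, for a raw η² smaller than (k − 3)/N, is a guard added here rather than something the text prescribes.

```python
import math

def corrected_eta(eta: float, k: int, n: int) -> float:
    """Corrected eta by formula (43); k = number of arrays, n = number of cases."""
    b = (k - 3) / n
    eta2 = (eta ** 2 - b) / (1 - b)
    # Guard (not in the text): a very small raw eta can fall below the
    # correction term, which would make eta2 negative; report 0 instead.
    return math.sqrt(max(eta2, 0.0))

def pe_eta(eta: float, n: int) -> float:
    """Probable error of eta by formula (41)."""
    return .6745 * (1 - eta ** 2) / math.sqrt(n)

print(f"corrected eta_yx = {corrected_eta(.931, 8, 465):.3f}")
print(f"PE of raw eta_yx = {pe_eta(.931, 465):.4f}")
```

With the values of the present problem the corrected η differs from the raw .931 only in the third decimal place.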
If N is small, however, or k large, the raw eta may be considerably reduced.

¹ Biometrika, 1923, 14, 412-417.

3. Test for Linearity of Regression

It is oftentimes difficult to tell from the appearance of a correlation table whether the regression is linear or non-linear, and in such cases it is best to calculate both r and η. As stated above, if the regression is strictly linear, η equals r; and the greater the departure from linearity, the greater the difference between η and r. A simple test of linearity is that ζ (zeta), the difference η² − r², shall differ from zero by an amount no greater than might arise from fluctuations due to random sampling. To make this test, we must first find $PE_\zeta$, given by the formula¹

$$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}}\,\sqrt{(1 - \eta^2)^2 - (1 - r^2)^2 + 1} \qquad (44)$$

The second radical in formula (44) is approximately equal to 1, and hence, unless great accuracy is required, we may write the formula simply as

$$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}} \qquad (45)$$

In the problem which we have been considering, $\eta_{yx} = .930$ and r = .80. Accordingly, ζ = (.930)² − (.80)², or .2249, and from formula (45) $PE_\zeta = .030$.² Zeta, therefore, is 7.49 times its PE, since $\zeta / PE_\zeta = .2249/.030$, or 7.49, and there is no doubt as to the non-linearity of the regression. To determine whether $\zeta / PE_\zeta$ denotes a real or simply a chance difference between η² and r², Table XV may be used conveniently.

If zeta is very small, or if both η and r are small, a simple test for linearity (Blakeman's test³) which does not require finding $PE_\zeta$ may be used. According to this test, the regression is linear when

$$N(\eta^2 - r^2) < 11.37 \qquad (46)$$

In our problem, N(η² − r²) = 104.58, and the regression is clearly non-linear.

¹ This formula is due to Blakeman. See Yule, An Introduction to the Theory of Statistics, p. 352.
² Formula (44) gives $PE_\zeta$ as .028. The difference between the results given by formulas (44) and (45) is negligible here.
³ Blakeman, J., On Tests for Linearity of Regression, Biometrika, 4, 1906, pp. 332-350.

True non-linear relation is often met with in psychophysics, and in experiments dealing with fatigue, practice, forgetting, etc. Most mental and physical tests, however, have been found to exhibit linear relationship, and in consequence r has been employed in psychology and education to a much greater extent than η. If the regression is definitely non-linear, it makes considerable difference whether η or r is taken as the measure of relation. Unless the regression is clearly curvilinear, however, little error is introduced by taking r instead of η, and this is especially true if the correlation is low. The coefficient of correlation, r, is superior to η in that, knowing its value, we can easily write the equation from which the value of the dependent variable may be estimated from the independent. This is not possible with the correlation ratio. In order to estimate one variable from the other in non-linear relation, a curve must be fitted to the means of the arrays of the columns or rows.¹

¹ The subject of curve fitting is fully dealt with in more advanced books on statistics. See Jones, D. C., A First Course in Statistics, 1921, Chaps. XV, XVI, and XVII, for a fairly elementary discussion.

X. The Correction of a Coefficient of Correlation for "Attenuation"

The accuracy of any series of test scores or other measures of capacity is always conditioned by the number and size of the chance variations, or "errors of observation," present. The term "errors of observation" may be taken to include slight changes in technique and procedure on the part of the experimenter, as well as variations in the subjects due to fatigue, distraction, shifts in attention or attitude towards the test, and other minor fluctuations of different sorts.

If the number of observations is large, errors of observation, since their effect is as likely to be in the negative as in the positive direction, will tend in the long run to cancel each other out as far as the average is concerned. Such errors, however, always tend to increase the σ of the distribution, and to decrease or "attenuate" a coefficient of correlation calculated between series in which they are present. For this reason it is generally advisable to correct raw r's for observational errors, and special formulas have been devised to rule out their effect.¹ It is first necessary to make at least two independent measures of each capacity, and to find the self-correlation of each test.² This done, the r corrected for attenuation may be found from formula (47), given below. The complete procedure is as follows:

Let A and B represent the tests to be correlated.
Let A₁ represent the 1st series of scores obtained in A.
Let A₂ represent the 2nd series of scores obtained in A.
Let B₁ represent the 1st series of scores obtained in B.
Let B₂ represent the 2nd series of scores obtained in B.
Let $r_{AB}$ represent the "true" correlation between tests A and B.
Let $r_{A_1A_2}$ represent the self-correlation of test A.
Let $r_{B_1B_2}$ represent the self-correlation of test B.
Let $r_{A_1B_2}$ represent the obtained correlation between A₁ and B₂.
Let $r_{A_2B_1}$ represent the obtained correlation between A₂ and B₁.

Then³

$$r_{AB} = \frac{\sqrt{r_{A_1B_2}\,r_{A_2B_1}}}{\sqrt{r_{A_1A_2}\,r_{B_1B_2}}} \qquad (47)$$

¹ See the two articles by C. Spearman: (a) The Proof and Measurement of the Association between Two Things, American Journal of Psychology, 1904, Vol. XV, pp. 72-101; and (b) Demonstration of Formulae for True Measure of Correlation, American Journal of Psychology, 1907, Vol. XVIII, pp. 161-169.
² See page 288.
³ See Yule, An Introduction to the Theory of Statistics, pp. 213-214, for discussion of this formula.

To illustrate the formula, suppose that
A is a Following Directions Test and B a Mixed Relations Test, and that

$r_{A_1A_2} = .72$, $r_{B_1B_2} = .75$, $r_{A_1B_2} = .35$, $r_{A_2B_1} = .42$.

Substituting in formula (47), we have

$$r_{AB} = \frac{\sqrt{.35 \times .42}}{\sqrt{.72 \times .75}} = .52;$$

that is, by correcting for observational errors we raise the correlation from .35 and .42 (the obtained r's) to .52.

If we have only the one correlation between two given tests A and B, so that formula (47) is inapplicable, it is still possible to obtain an approximate correction for attenuation by dividing the "raw" coefficient by the geometric mean of the two "reliability coefficients."¹ Formula (47) then becomes

$$r_{AB} = \frac{r_{A_1B_1}}{\sqrt{r_{A_1A_2}\,r_{B_1B_2}}} \qquad (48)$$

Thus, if the obtained correlation between tests A and B above had been .50, and the reliability coefficients, as before, .72 and .75, we could correct (approximately) for attenuation as follows:

$$r_{AB} = \frac{.50}{\sqrt{.72 \times .75}} = .68$$

Corrected for attenuation, the obtained coefficient is increased from .50 to .68.

¹ See Spearman, C., American Journal of Psychology, 1904, Vol. XV, p. 271.

XI. Summary of Formulas Used in This Chapter

1. For product-moment r, deviations from guessed averages (GA's)
$$r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (23)$$

2. For product-moment r, deviations from actual averages
$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y} \qquad (24)$$
$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} \qquad (25)$$

3. $$PE_r = \frac{.6745\,(1 - r^2)}{\sqrt{N}} \qquad (26)$$

4. $$PE_{(r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2} \qquad (27)$$

5. Regression equations in deviation form
$$y = r\frac{\sigma_y}{\sigma_x}\,x \qquad (28)$$
$$x = r\frac{\sigma_x}{\sigma_y}\,y \qquad (29)$$

6. Regression equations in score form
$$Y = r\frac{\sigma_y}{\sigma_x}(X - X') + Y' \qquad (30)$$
$$X = r\frac{\sigma_x}{\sigma_y}(Y - Y') + X' \qquad (31)$$

7. Standard errors of estimate
$$\sigma_{(\text{est. }Y)} = \sigma_y\sqrt{1 - r^2} \qquad (32)$$
$$\sigma_{(\text{est. }X)} = \sigma_x\sqrt{1 - r^2} \qquad (33)$$
$$PE_{(\text{est. }Y)} = .6745\,\sigma_y\sqrt{1 - r^2} \qquad (34)$$
$$PE_{(\text{est. }X)} = .6745\,\sigma_x\sqrt{1 - r^2} \qquad (35)$$

8. Correlation measured from "ranks"
$$\rho = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)} \qquad (36)$$
$$PE_\rho = \frac{.7063\,(1 - \rho^2)}{\sqrt{N}} \qquad (37)$$
$$R = 1 - \frac{6\Sigma G}{N^2 - 1} \qquad (38)$$

9. Coefficient of mean square contingency, C
$$C = \sqrt{\frac{S - N}{S}} \qquad (39)$$

10. Non-linear regression
$$\eta_{yx} = \frac{\sigma_{my}}{\sigma_y} \qquad (40)$$
$$PE_\eta = \frac{.6745\,(1 - \eta^2)}{\sqrt{N}} \qquad (41)$$
$$\eta_{xy} = \frac{\sigma_{mx}}{\sigma_x} \qquad (42)$$
$$\text{corrected } \eta^2 = \frac{\eta^2 - \dfrac{k - 3}{N}}{1 - \dfrac{k - 3}{N}} \qquad (43)$$
$$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}}\,\sqrt{(1 - \eta^2)^2 - (1 - r^2)^2 + 1} \qquad (44)$$
$$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}} \quad \text{(approximately)} \qquad (45)$$
$$N(\eta^2 - r^2) < 11.37 \qquad (46)$$

11. Correction for attenuation
$$r_{AB} = \frac{\sqrt{r_{A_1B_2}\,r_{A_2B_1}}}{\sqrt{r_{A_1A_2}\,r_{B_1B_2}}} \qquad (47)$$
$$r_{AB} = \frac{r_{A_1B_1}}{\sqrt{r_{A_1A_2}\,r_{B_1B_2}}} \qquad (48)$$

PROBLEMS

1. Find the coefficient of correlation (product-moment) between the following sets of Army Alpha and typewriting scores made by 100 students in a typewriting class. The typewriting scores are in number of words written per minute (with certain penalties). In tabulating scores, let typing be the Y-variable and Alpha the X-variable. Take the Y-step as 5 and the X-step as 10 units.

Typing (Y) | Alpha (X) | Typing (Y) | Alpha (X) | Typing (Y) | Alpha (X)
46 | 152 | 26 | 164 | 40 | 120
31 |  96 | 33 | 127 | 36 | 140
46 | 171 | 44 | 144 | 43 | 141
40 | 172 | 35 | 160 | 48 | 143
42 | 138 | 49 | 106 | 45 | 138
41 | 154 | 40 |  95 | 58 | 149
39 | 127 | 57 | 146 | 23 | 142
46 | 156 | 23 | 175 | 45 | 166
34 | 156 | 51 | 126 | 44 | 138
48 | 133 | 35 | 120 | 47 | 150
48 | 173 | 41 | 154 | 29 | 148
38 | 134 | 28 | 146 | 46 | 166
26 | 179 | 32 | 154 | 46 | 146
37 | 159 | 50 | 159 | 39 | 167
34 | 167 | 29 | 175 | 49 | 139
51 | 136 | 41 | 164 | 34 | 183
47 | 153 | 32 | 111 | 41 | 150
39 | 145 | 49 | 164 | 49 | 179
32 | 134 | 58 | 119 | 31 | 138
37 | 184 | 35 | 160 | 47 | 136
26 | 154 | 48 | 149 | 40 | 172
40 |  90 | 40 | 149 | 30 | 145
53 | 143 | 43 | 143 | 40 | 109
46 | 173 | 38 | 159 | 38 | 158
39 | 168 | 37 | 157 | 29 | 115
52 | 187 | 41 | 153 | 43 |  93
47 | 166 | 51 | 149 | 55 | 163
31 | 172 | 40 | 163 | 37 | 147
33 | 189 | 35 | 175 | 52 | 169
22 | 147 | 31 | 133 | 38 |  75
46 | 150 | 23 | 178 | 39 | 152
44 | 150 | 37 | 168 | 32 | 159
37 | 143 | 46 | 156 | 42 | 150
31 | 133 |    |     |    |

2. In the correlation table¹ given below, find (a) the coefficient of correlation and $PE_r$; (b) the regression equations in score form, and the standard errors of estimate. (c) What is the most probable height of a boy who weighs 30 pounds? 45 pounds?

¹ See Table XXIV for the C worked out for these data.

Boys: Ages 4.5 to 5.5 Years
(Rows: height in inches (Y). Columns: weight in pounds (X).)

Height        | 24-28 | 29-33 | 34-38 | 39-43 | 44-48 | 49-53 | Totals (f_y)
45-47         |       |       |   1   |       |   2   |       |     3
42-44         |       |       |   4   |  35   |  21   |   5   |    65
39-41         |       |   5   |  87   |  90   |   7   |   1   |   190
36-38         |   1   |  18   |  72   |   8   |       |       |    99
33-35         |   5   |  15   |   5   |       |       |       |    25
30-32         |   2   |       |       |       |       |       |     2
Totals (f_x)  |   8   |  38   | 169   | 133   |  30   |   6   |   384

3. In the following correlation table,¹ find (a) the coefficient of correlation and $PE_r$. (b) What is the most probable grade of a pupil who makes 120 on Alpha?

(Rows: school marks. Columns: Army Alpha IQ's.)

School Marks | 84 and lower | 85-89 | 90-94 | 95-99 | 100-104 | 105-109 | 110-114 | 115-119 | 120-124 | 125 and over | Totals
90 and over  |   |   |    |  3 |  3 |  15 | 12 |  9 |  9 |  5 |  56
85-89        |   |   |    |  8 | 17 |  15 | 24 | 13 |  6 |  6 |  89
80-84        |   |   |  4 |  6 | 22 |  21 | 20 | 10 |  5 |  1 |  89
75-79        |   |   |  7 | 25 | 33 |  23 | 10 |  7 |  4 |    | 109
70-74        |   | 4 | 10 | 18 | 14 |  22 | 12 |  1 |  1 |    |  82
65-69        | 1 | 3 |  3 | 12 |  7 |   8 |  8 |  1 |    |    |  43
60-64        |   |   |  2 |  5 |  3 |   1 |  1 |    |    |    |  12
Totals       | 1 | 7 | 26 | 77 | 99 | 105 | 87 | 41 | 25 | 12 | 480

¹ From Otis, Statistical Methods in Educational Measurement, 1925, p. 315.

4. Find the correlation between the following test scores by (a) the Rank-Difference Method, and (b) the Method of Gains.

Individual | Intelligence Score (Alpha) | Cancellation Score (A Test + Number Group Checking Test)
Kp | 185 | 110
My | 203 |  98
Le | 188 | 118
Hy | 195 | 104
Sh | 176 | 112
Ld | 174 | 124
Sn | 158 | 119
St | 197 |  95
Wn | 176 |  94
Pe | 138 |  97
Gr | 126 | 110
Bn | 160 |  94
Gm | 151 | 126
Ly | 185 | 120
Ws | 185 | 118

(Note: Since the Cancellation scores are in seconds, the highest score (94) is numerically the lowest.)

5. Compute the coefficient of contingency, C, for the two tables given below, which show:
A. The resemblance between brothers in athletic capacity.¹
B. The resemblance between fathers and sons in temperament.²

A
Athletic capacity: first brother (columns), second brother (rows)

             | Athletic | Betwixt | Non-athletic | Totals
Athletic     |   906    |   20    |     140      |  1066
Betwixt      |    20    |   76    |       9      |   105
Non-athletic |   140    |    9    |     370      |   519
Totals       |  1066    |  105    |     519      |  1690

¹ From Yule, An Introduction to the Theory of Statistics, p. 74, after Pearson.
² From Brown and Thomson, Essentials of Mental Measurement, 1921, p. 125. The coefficient of contingency is not usually calculated for tables having less than a 5 × 5-fold classification.
These tables, however, will illustrate the method in a simple way.

B
Temperament: fathers (columns), sons (rows)

            | Merry | Melancholy | Alternating | Even | Totals
Merry       |  122  |      8     |      81     |  67  |   278
Melancholy  |   10  |      2     |       7     |  10  |    29
Alternating |   70  |      9     |     101     |  68  |   248
Even        |   58  |      6     |      66     |  45  |   175
Totals      |  260  |     25     |     255     | 190  |   730

6. The following correlation table gives the relation between the scores on the Thorndike College Entrance Intelligence Examination and the extra-curricular activities of 102 Columbia College students.¹ (a) Find $\eta_{yx}$ for this table. (b) Find r, and test the regression of Y on X for linearity.

[Correlation table: Thorndike scores (X) in ten classes, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90-94, 95-99, 100-104, against extra-curricular activities (Y) in seven classes, 18-20 down to 0-2. Column totals ($f_x$): 2, 2, 3, 16, 13, 20, 16, 15, 11, 4 (N = 102). Row totals ($f_y$), from 18-20 down to 0-2: 4, 6, 14, 27, 24, 20, 7. The nonzero frequencies in each row, from left to right, are: 18-20: 2, 2; 15-17: 2, 3, 1; 12-14: 4, 6, 2, 2; 9-11: 1, 2, 4, 4, 6, 7, 3; 6-8: 1, 6, 2, 2, 6, 2, 4, 1; 3-5: 1, 1, 3, 5, 3, 5, 1, 1; 0-2: 1, 1, 1, 1, 1, 2. Their column positions are not recoverable from the scan.]

¹ From Sommerville, R. C., Physical, Motor, and Sensory Traits, Archives of Psychology, 1924, 75, p. 101.

7. Verify the correlation ratio $\eta_{xy}$ of .82 given for Diagram XXVI (see page 209). (a) Test the regression of X on Y for linearity. (b) Plot the regression line (or curve) on the diagram.

8. Ma is the series of scores from one trial of a memory test; Mb is the series of scores from a second trial of the same test. Aa is the series of scores from one trial of an association test; Ab is the series of scores from a second trial of this test. The r's are as follows: between Ma and Mb, .60; between Mb and Aa, .50; between Ma and Ab, .55; between Aa and Ab, .72. Find the r between M and A corrected for attenuation.

Answers
1. r = -.05; $PE_r$ = .07.
2. (a) r = .709; $PE_r$ = .017. (b) Y = .4X + 24.42; X = 1.26Y - 11.66; $\sigma_{(\text{est. }Y)}$ = 1.79; $\sigma_{(\text{est. }X)}$ = 3.18. (c) 36.42 inches; 42.42 inches.
3. (a) r = .455; $PE_r$ = .024. (b) 85.4, with a $PE_{(\text{est. }Y)}$ of 4.75.
4. (a) ρ = .187; r = .19; $PE_r$ = .18. (b) R = .09; r = .16.
5. A. C = .68. B. C = .16.
6. (a) $\eta_{yx}$ = .43; $\eta_{yx}$ (corrected) = .36. (b) r = -.09.
The regression is almost certainly non-linear.
8. r = .80.

CHAPTER V

PARTIAL AND MULTIPLE CORRELATION¹

I. The Meaning of Partial and Multiple Correlation

The coefficient of correlation between sets of test scores (or other series of measures) often represents not simply the degree of relationship existing between these measures in themselves, but the degree of this relation plus the indirect effect of other factors to which they are both related. For this reason, in measuring the correlation between two sets of measures, it is necessary that we eliminate or rule out as far as possible those uncontrolled factors which, through their common relation to the measures to be correlated, tend to raise or lower the "net" correlation.

As an illustration of the effect of uncontrolled factors on correlation, suppose that the correlation between intelligence (i) and age (a) in a large group of children whose ages range from 7 to 14 years is $r_{ia}$; that the correlation between school achievement (s) and age (a) in the same group is $r_{sa}$; and that the correlation between intelligence (i) and school achievement (s) is $r_{is}$. Now this last coefficient, $r_{is}$, is not simply a measure of the influence of intelligence on school achievement, but a measure of the influence of intelligence, plus the indirect effect of differences in age, on school achievement. In order to determine the relation between intelligence and school achievement uninfluenced by the age factor, it is necessary to rule out the effect of age differences. This can be accomplished in two ways: (1) by selecting children all of whom are of the same age, or (2) by finding a "partial" coefficient of correlation between intelligence and school standing. Such a partial coefficient is written $r_{is.a}$, and may be thought of as giving the net correlation between intelligence and school achievement for children of the same age, or as the net correlation between intelligence and school achievement with age constant. In short, a coefficient of partial correlation may be said to represent the net relation between two variables when one or more other variables which might increase or decrease the true correlation have been ruled out or held constant.

¹ The discussion of partial and multiple correlation given in this chapter follows Yule in method and nomenclature.

In addition to its value as a device whereby we are able to control conditions by ruling out disturbing factors, partial correlation is highly important also in that it enables us to build up regression equations involving three or more variables, from which a test score (or other measure) may be predicted when we know the corresponding scores made on the other tests. The value of the regression equation in estimating scores, that is, its accuracy as a predicting instrument, may be determined from the "multiple" coefficient of correlation.¹ This coefficient gives the correlation between the scores actually obtained on a given test and the scores on the same test predicted by the regression equation from the scores made on two or more correlated tests. The multiple coefficient of correlation may be thought of also as giving the correlation between a trait (or traits) as measured by a single test, and the same trait (or traits) as measured by a number of tests taken together. (The multiple coefficient will be best understood by working through an actual problem.)

To summarize briefly, partial and multiple correlation may be considered as representing an important extension of the theory and technique of "simple" or two-variable correlation to include problems which involve three or more variables.

¹ σ(est.) also gives the accuracy of the regression equation in predicting single scores. (See page 183.)

II. A Correlation Problem Involving Three Variables

The simplest and most straightforward approach to an understanding of the value of the method of partial and multiple correlation, and of the technique involved, is by way of an illustration. In the present section, therefore, is shown the application of partial and multiple correlation to a three-variable problem; following this, the general formulas and some further applications of the method are considered.

The problem selected (Table XXVI) is taken from a study made by Professor Mark May¹ of the factors which influence "academic success." In that part of his study from which our example is taken, May wished to find how accurately he could "predict" the academic success or scholastic achievement of 450 Syracuse freshmen from a knowledge of their general intelligence and study habits. Academic success was defined specifically as the number of "credit" or "honor" points obtained by a student at the end of his first semester in college. The number of honor points secured depends on the number of A, B, and C grades made by the student in his courses. Thus a grade of A carries 3 honor points; a grade of B, 2 honor points; a grade of C, 1 honor point; and a grade of D, which is a passing mark, carries no honor point credit. The maximum number of points which a freshman taking the "regular" course can obtain in one semester is 48.

General intelligence was measured by a combination of the Miller Mental Ability Test and the Dartmouth Completion of Definitions Test. The Miller Test contains 120 items and the Dartmouth Test 40, so that the maximum "raw score" was 160. The scores of the 450 students ranged from 50 to 150, the distribution being fairly normal.

As a measure of industry and application, it was decided to take the number of hours per week spent, on the average, in study.
Information in regard to study habits was obtained 1 May, Mark A., Predicting Academic Success, Journal of Educational Psy- chology, 1923, Vol. XIV, 7, pp. 429-440. 224 STATISTICS IN PSYCHOLOGY AND EDUCATION by means of a questionnaire given at the beginning and at the middle of the first semester. Among other items of informa- tion asked for in the questionnaire were such things as the number of hours spent per week at meals, in sleeping, etc. In this way an attempt was made to have the student think that he was being checked up on the distribution of his total time, and not on his study habits alone. The self-correlation between the two statements— number of hours spent in study — on the first and second questionnaires was .86, which indicates a very satisfactory degree of reliability. As previously stated, the main object of this study was to find how accurately the number of honor points which a student receives can be predicted from a knowledge of his study habits and his general intelligence. 1 In solving this problem, however, it is necessary to find the partial coefficient which shows to what extent honor points are related to general intelligence when the variable factor of study-hours per week is held constant; and also the partial coefficient which shows to what extent honor points are related to study-hours when the variable factor of general intelligence is held constant. This information, in itself, will prove to be of considerable interest. The solution of the whole problem is given in the following series of steps — the necessary data and statistics will be found in Table XXVI Step I. Note that the mean and a of each series of measures, and the inter correlations are first calculated. These inter- correlations are the usual product-moment r's, computed as shown in Chapter IV. 
The r between (1) honor points and (2) general intelligence, written $r_{12}$, is .60; the r between (1) honor points and (3) number of study-hours, written $r_{13}$, is .32; and the r between (2) general intelligence and (3) number of study-hours, i.e., $r_{23}$, is $-.35$. The low correlation between honor points and study-hours is of considerable interest; but probably the most interesting r is the $-.35$ between study-hours and general intelligence. Evidently, the brighter the student, the less he studies!

¹ Other factors, of course, such as health, personality, previous preparation, etc., are of considerable importance in determining honor points, as May indicates in his article. The two factors selected were chosen simply because they are not only important, but also objective and measurable.

Step II. The next step is to calculate the "net" correlation between (1) honor points and (2) general intelligence with the influence of (3) study-hours "partialed" out or held constant. This net, or partial, coefficient of correlation is written $r_{12.3}$. The formula¹ for $r_{12.3}$ is

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} \qquad \text{[Formula (49), page 232]}$$

Substitution of the values of $r_{12}$, $r_{13}$, and $r_{23}$ in the formula gives $r_{12.3}$ a value of .802. This means that if all of our 450 students studied exactly the same number of hours per week (i.e., if the number of study-hours were constant), the coefficient of correlation between honor points earned and general intelligence scores would be .802 instead of .60, the obtained coefficient $r_{12}$. In other words, if each student spent the same number of hours in study, there would be a much closer correspondence between general intelligence and honor points than there is when the number of study-hours varies.

The partial coefficient of correlation between (1) honor points and (3) hours spent in study, for (2) general intelligence constant, is given by the formula

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23}^2}} \qquad \text{[Formula (49)]}$$

Substitution of the values of $r_{13}$, $r_{12}$, and $r_{23}$ gives a partial coefficient $r_{13.2} = .707$, as against a "raw" coefficient $r_{13}$ of .32. It is evident, therefore, that if our group were of the same degree of general intelligence² there would be a much closer correspondence between the number of honor points received and the number of hours spent in study than there is when the members of the group possess varying degrees of general intelligence; and this is certainly the result to be expected.

The last partial coefficient of correlation is $r_{23.1} = -.715$. This coefficient gives the net correlation between (2) general intelligence and (3) study-hours for (1) honor points held constant, and is found from the formula

$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13}^2}} \qquad \text{[Formula (49)]}$$

Like the two partial r's above, $r_{23.1}$ may be interpreted to mean that the correlation between general intelligence score and hours spent in study, in a group in which every student has earned the same number of honor points, would be much higher (negatively) than the raw correlation between these same two factors in a randomly selected group, that is, a group in which the number of honor points received by different students varies. Thus we discover that the brighter students not only study less than the average and the dull students (since $r_{23} = -.35$), but that the brighter the student, the less he needs to study in order to reach a given standard of academic success, i.e., to secure a given number of honor points (since $r_{23.1} = -.715$).

¹ The general formulas from which this and other formulas used in this section are derived will be found in Section III following.

² By "same degree of general intelligence" is meant the same score on the given general intelligence tests.

Step III.
The partial coefficients of correlation having been calculated, the next step is to write the regression equation from which the most probable number of honor points which a student will receive can be estimated, given his general intelligence score and the number of hours he spends in study per week. The regression equation for three variables is written, in Deviation Form, as follows:

$$x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3 \qquad \text{[Formula (51)]}$$

In this formula $x_1$ is the dependent variable and stands for honor points; $x_2$ and $x_3$ are the independent variables, and stand for general intelligence and study-hours respectively.¹ In Score Form the equation becomes

$$(X_1 - \text{Av.}X_1) = b_{12.3}(X_2 - \text{Av.}X_2) + b_{13.2}(X_3 - \text{Av.}X_3) \qquad \text{[Formula (52)]}$$

or, transposing and collecting terms,

$$X_1 = b_{12.3}\,X_2 + b_{13.2}\,X_3 + K \quad (K \text{ a constant}).$$

It is clear that before we can use this equation we must find the values of the regression coefficients $b_{12.3}$ and $b_{13.2}$. These are found from the formulas

$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}} \quad\text{and}\quad b_{13.2} = r_{13.2}\,\frac{\sigma_{1.23}}{\sigma_{3.12}} \qquad \text{[Formula (53)]}$$

and as we already have the values of $r_{12.3}$ and $r_{13.2}$, it is only necessary to find $\sigma_{1.23}$, $\sigma_{2.13}$, and $\sigma_{3.12}$ (the "partial" σ's) in order to replace the regression coefficients in the equation by numerical values.

Step IV. The values of the "partial" σ's are found from the formulas

$$\begin{aligned}
1.\quad \sigma_{1.23} &= \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2} \\
2.\quad \sigma_{2.13} &= \sigma_2\sqrt{1-r_{23}^2}\,\sqrt{1-r_{12.3}^2} \\
3.\quad \sigma_{3.12} &= \sigma_3\sqrt{1-r_{23}^2}\,\sqrt{1-r_{13.2}^2}
\end{aligned} \qquad \text{[Formula (50)]}$$

Substituting the known values of the raw and partial r's in these formulas, we get $\sigma_{1.23} = 6.34$, $\sigma_{2.13} = 8.84$, and $\sigma_{3.12} = 3.97$. (For calculations, see Table XXVI.)

Step V. From the partial σ's and the partial r's, the numerical values of the regression coefficients $b_{12.3}$ and $b_{13.2}$ are found to be .57 and 1.13, respectively.
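As an arithmetic check on Steps II through V, the computations can be reproduced in a few lines of Python. This is a minimal sketch only: it starts from the rounded summary statistics of Table XXVI (Step I) rather than from May's raw scores, and the helper names `r` and `partial_r` are ours, not the text's. Because it carries full precision instead of four-place values from Table XXVII, its results can differ from the text in the third decimal place.

```python
from math import sqrt

# Step I summary statistics of Table XXVI
# (1 = honor points, 2 = general intelligence, 3 = hours of study per week)
sigma = {1: 11.2, 2: 15.8, 3: 6.0}
r0 = {frozenset({1, 2}): 0.60, frozenset({1, 3}): 0.32, frozenset({2, 3}): -0.35}


def r(i, j):
    """Zero-order r; r(i, j) == r(j, i)."""
    return r0[frozenset({i, j})]


def partial_r(i, j, k):
    """First-order partial r_{ij.k}, formula (49)."""
    return (r(i, j) - r(i, k) * r(j, k)) / (
        sqrt(1 - r(i, k) ** 2) * sqrt(1 - r(j, k) ** 2))


# Step II: partial coefficients of correlation
r12_3 = partial_r(1, 2, 3)
r13_2 = partial_r(1, 3, 2)
r23_1 = partial_r(2, 3, 1)

# Step IV: partial sigmas, formula (50), in the forms used in Table XXVI
s1_23 = sigma[1] * sqrt(1 - r(1, 2) ** 2) * sqrt(1 - r13_2 ** 2)
s2_13 = sigma[2] * sqrt(1 - r(2, 3) ** 2) * sqrt(1 - r12_3 ** 2)
s3_12 = sigma[3] * sqrt(1 - r(2, 3) ** 2) * sqrt(1 - r13_2 ** 2)

# Step V: regression coefficients, formula (53)
b12_3 = r12_3 * s1_23 / s2_13
b13_2 = r13_2 * s1_23 / s3_12

for name, value in [("r12.3", r12_3), ("r13.2", r13_2), ("r23.1", r23_1),
                    ("sigma1.23", s1_23), ("sigma2.13", s2_13),
                    ("sigma3.12", s3_12), ("b12.3", b12_3), ("b13.2", b13_2)]:
    print(f"{name:<10} {value:8.3f}")
```

Run as is, the script reproduces .802, .707 and -.715 for the partial r's, partial σ's within about .01 of 6.34, 8.84 and 3.97, and regression coefficients of about .575 and 1.127, which the text rounds to .57 and 1.13.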
Hence we may now write the regression equation as $x_1 = .57x_2 + 1.13x_3$; or, multiplying through by a convenient constant (e.g., by 1.75) so that the two weights become approximately whole numbers, 1.75 (the number of honor points) = 1 (score on the intelligence tests) + 2 (number of hours spent in study per week). It is evident from this equation that, in so far as the general intelligence score and the number of study-hours per week determine the number of honor points received, their relative weight is as 1 : 2.

¹ Note the resemblance of this equation to the simple regression equation for two variables, $y = b_{yx}\,x$ (page 174). If $x_1$ is put for $y$ and $x_2$ for $x$ in this equation, we have $x_1 = b_{12}\,x_2$.

TABLE XXVI
A Correlation Problem Involving Three Variables

Step I

                (1) Honor Points    (2) General Intelligence    (3) Hours of Study per Week
Mean            M1 = 18.5           M2 = 100.6                  M3 = 24
σ               σ1 = 11.2           σ2 = 15.8                   σ3 = 6

$r_{12} = .60 \qquad r_{13} = .32 \qquad r_{23} = -.35$

Step II. Calculation of Partial Coefficients of Correlation (see Note)

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}} = \frac{.60 - .32(-.35)}{.9474 \times .9367} = .802$$

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23}^2}} = \frac{.32 - .60(-.35)}{.8 \times .9367} = .707$$

$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13}^2}} = \frac{-.35 - .32 \times .60}{.8 \times .9474} = -.715$$

For $\sqrt{1-r^2}$ values, use Table XXVII.

Step III. The Regression Equations

$$x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3 \quad \text{(Deviation Form)} \qquad (51)$$

$$X_1 = b_{12.3}\,X_2 + b_{13.2}\,X_3 + K \quad \text{(Score Form)} \qquad (52)$$

in which

$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}} \quad\text{and}\quad b_{13.2} = r_{13.2}\,\frac{\sigma_{1.23}}{\sigma_{3.12}} \qquad (53)$$

Step IV. Calculation of σ's

$$\begin{aligned}
(1)\quad \sigma_{1.23} &= \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2} = 11.2 \times .8 \times .7072 = 6.34 \qquad (50) \\
(2)\quad \sigma_{2.13} &= \sigma_2\sqrt{1-r_{23}^2}\,\sqrt{1-r_{12.3}^2} = 15.8 \times .9367 \times .5973 = 8.84 \\
(3)\quad \sigma_{3.12} &= \sigma_3\sqrt{1-r_{23}^2}\,\sqrt{1-r_{13.2}^2} = 6 \times .9367 \times .7072 = 3.97
\end{aligned}$$

Step V. The Regression Coefficients and Regression Equation

Substituting for $r_{12.3}$, $r_{13.2}$, $\sigma_{1.23}$, $\sigma_{2.13}$, $\sigma_{3.12}$, we have

$$b_{12.3} = .802 \times \frac{6.34}{8.84} = .57; \qquad b_{13.2} = .707 \times \frac{6.34}{3.97} = 1.13$$

Hence the regression equation becomes

$$x_1 = .57\,x_2 + 1.13\,x_3 \quad \text{(Deviation Form), or} \quad X_1 = .57\,X_2 + 1.13\,X_3 - 66 \quad \text{(Score Form)}.$$

Step VI. Calculation of the Standard Error of Estimate

$$\sigma_{(\text{est.}X_1)} = \sigma_{1.23} = 6.34 \qquad (54)$$

$$PE_{(\text{est.}X_1)} = .6745 \times 6.34 = 4.28 \qquad (55)$$

Step VII. The Coefficient of Multiple Correlation

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} = .824 \qquad (56)$$

Note. While the partial coefficient of correlation $r_{23.1}$ is of interest as giving us the relation between general intelligence and hours spent in study for a constant number of honor points, it is unnecessary in the regression equation $x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3$. In order to evaluate the constants $b_{12.3}$ and $b_{13.2}$ in this regression equation, we need only $r_{12.3}$ and $r_{13.2}$. In any problem involving three variables, only two partial coefficients of correlation need be computed, if we are interested only in the prediction of $X_1$ values from known values of $X_2$ and $X_3$.

TABLE XXVII
A Table to Infer the Value of √(1 − r²) from a Given Value of r

  r     √(1 − r²)      r     √(1 − r²)      r     √(1 − r²)
 .00    1.0000        .34    .9404         .68    .7332
 .01     .9999        .35    .9367         .69    .7238
 .02     .9998        .36    .9330         .70    .7141
 .03     .9995        .37    .9290         .71    .7042
 .04     .9992        .38    .9250         .72    .6940
 .05     .9987        .39    .9208         .73    .6834
 .06     .9982        .40    .9165         .74    .6726
 .07     .9975        .41    .9121         .75    .6614
 .08     .9968        .42    .9075         .76    .6499
 .09     .9959        .43    .9028         .77    .6380
 .10     .9950        .44    .8980         .78    .6258
 .11     .9939        .45    .8930         .79    .6131
 .12     .9928        .46    .8879         .80    .6000
 .13     .9915        .47    .8827         .81    .5864
 .14     .9902        .48    .8773         .82    .5724
 .15     .9887        .49    .8717         .83    .5578
 .16     .9871        .50    .8660         .84    .5426
 .17     .9854        .51    .8602         .85    .5268
 .18     .9837        .52    .8542         .86    .5103
 .19     .9818        .53    .8480         .87    .4931
 .20     .9798        .54    .8417         .88    .4750
 .21     .9777        .55    .8352         .89    .4560
 .22     .9755        .56    .8285         .90    .4359
 .23     .9732        .57    .8216         .91    .4146
 .24     .9708        .58    .8146         .92    .3919
 .25     .9682        .59    .8074         .93    .3676
 .26     .9656        .60    .8000         .94    .3412
 .27     .9629        .61    .7924         .95    .3122
 .28     .9600        .62    .7846         .96    .2800
 .29     .9570        .63    .7766         .97    .2431
 .30     .9539        .64    .7684         .98    .1990
 .31     .9507        .65    .7599         .99    .1411
 .32     .9474        .66    .7513        1.00    .0000
 .33     .9440        .67    .7424

To write the regression in Score Form, we simply
replace $x_1$ by $(X_1 - 18.5)$, $x_2$ by $(X_2 - 100.6)$, and $x_3$ by $(X_3 - 24)$. The equation then becomes $X_1 = .57X_2 + 1.13X_3 - 66$.

Given a student's general intelligence score ($X_2$) and the number of hours he spends in study per week ($X_3$), we can, from this equation, estimate the most probable number of honor points which he will receive in the first semester. By way of illustration, suppose that a student has a general intelligence score of 120 points and that he studies on the average 20 hours per week: how many honor points will he most probably receive during the first semester? Substituting $X_2 = 120$ and $X_3 = 20$ in the regression equation, we have $X_1 = .57 \times 120 + 1.13 \times 20 - 66$, or $X_1 = 25$. The most probable number of honor points which this student will receive, therefore, using the given criteria as the basis of our estimate, is 25.

Step VI. This estimate, like every other "most probable" number of honor points predicted from the regression equation, has a certain "error of estimate." The standard error of estimate of all honor points, i.e., $X_1$'s, predicted from the regression equation $X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K$ is designated $\sigma_{(\text{est.}X_1)}$ and equals $\sigma_{1.23}$ [see Formula (50)] directly. The $PE_{(\text{est.}X_1)}$ is $.6745 \times \sigma_{(\text{est.}X_1)}$. The standard error of estimate in the present problem is 6.34 points, and the $PE_{(\text{est.}X_1)}$ is 4.28 points. In the illustration above, therefore, the 25 estimated honor points have a $PE_{(\text{est.}X_1)}$ of 4.28 points, which means that the chances are even (50 in 100) that this student will receive (roughly) not less than 21 nor more than 29 honor points. The reliability of any other honor-points estimate made from the regression equation may be found in exactly the same way.

Step VII. The final step in the solution of our problem is to compute the coefficient of multiple correlation.
This "multiple r," which is generally written R,¹ has been defined (see page 222) as the coefficient of correlation between the scores actually made on a given test and the scores on the same test predicted from the regression equation. Expressed more mathematically, R gives the correlation between the dependent variable $X_1$ and the independent variables $X_2$, $X_3$, etc., taken together as a team. The formula for R when there are two independent variables is

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} \qquad \text{[Formula (56)]}$$

In the present problem, $R_{1(23)} = .824$. This means that if the most probable number of honor points which each student in our group of 450 will receive is predicted from the regression equation, the correlation between these 450 predicted scores and the 450 scores actually received will be .824. Multiple R, therefore, tells us how closely $X_1$ is related to the combined action of $X_2$ and $X_3$, or, in the present instance, how closely honor points are related to general intelligence and number of hours spent in study per week, taken together.

¹ Multiple R must not be confused with the R of the Spearman Footrule formula, page 104.

III. General Formulas for Use in Partial and Multiple Correlation

1. General Formulas for Partial r's

We have found (Table XXVI) that in a correlation problem involving three variables, the method of partial correlation enables us to find the net relation between two variables when a third is ruled out or held constant. In like manner, by an extension of the method of partial correlation, we can secure the net correlation between $X_1$ and $X_2$ when two or more variables have been ruled out or held constant. Thus the partial coefficient of correlation $r_{12.34}$ means, by analogy to $r_{12.3}$, that the correlation between $X_1$ and $X_2$ has been freed of the influence of both $X_3$ and $X_4$; and the partial coefficient of correlation $r_{12.34\ldots n}$ means that the correlation between $X_1$ and $X_2$ has been freed (theoretically) of the influence of all disturbing factors.

In every partial coefficient of correlation the subscripts to the left of the point are called primary subscripts and denote the two variables whose correlation we are seeking. The subscripts to the right of the point are called secondary subscripts, and denote those variables which are to be ruled out or held constant.¹ The order of a partial r is determined by the number of its secondary subscripts: $r_{12.3}$ or $r_{13.2}$ or $r_{23.1}$, for example, is a partial r of the first order, while "entire" or "total" r's, such as $r_{12}$ or $r_{13}$ or $r_{23}$, are coefficients of zero order. The general formula for a partial r of any order is written

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\;r_{2n.34\ldots(n-1)}}{\sqrt{1-r_{1n.34\ldots(n-1)}^2}\;\sqrt{1-r_{2n.34\ldots(n-1)}^2}} \qquad (49)$$

From formula (49) partial r's of any given order can be found. In a four-variable problem, for example, $r_{12.34}$ may be written, by reference to the formula, as

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{1-r_{14.3}^2}\,\sqrt{1-r_{24.3}^2}}$$

that is to say, in terms of the partial r's of the first order. These first-order partial r's must then be computed by (49) from r's of zero order before the second-order r's can be evaluated. To find partial r's of a higher order, we must first express them in terms of the partial r's of the next lower order; these r's, in turn, in terms of r's of the next lower order; and so on until r's of zero order have been reached.² In other words, it is necessary to "work up" from zero-order r's whenever r's of any higher order are to be computed. Hence it is apparent that with each additional variable the arithmetic of calculation is greatly increased. As a result, unless the work is carefully planned, the calculations soon become extremely laborious. The PE of a partial r of any order may be found, like the PE of an "entire" r, by substituting in formula (26).

¹ The order in which the secondary subscripts are written is entirely immaterial, e.g., $r_{12.34} = r_{12.43}$. The order of the primary subscripts is of importance, however, in telling us which variable is "dependent" and which "independent." Thus $r_{12}$ means that $X_1$ is dependent, i.e., is to be predicted from $X_2$; while $r_{21}$ means that $X_2$ is dependent, i.e., is to be predicted from $X_1$. The numerical value of $r_{12}$ and $r_{21}$ is, of course, the same.

² In calculating partial r's, use Table XXVII to get $\sqrt{1-r^2}$ values.

2. General Formulas for Partial σ's of Any Order

Just as the correlation between two sets of scores or other measures can be determined when the influence of 1, 2, 3, ... n other factors is held constant, so the variability (the σ) of any set of scores can be found when the influence of 1, 2, 3, ... n factors is held constant. As an illustration of this, take $\sigma_{1.23}$ of Table XXVI. This "partial σ" gives the variability of $X_1$ (honor points) freed of the influence exerted by the two factors $X_2$ (general intelligence) and $X_3$ (average study-hours per week). The general formula for σ's of any order is

$$\sigma_{1.234\ldots n} = \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2}\,\sqrt{1-r_{14.23}^2}\cdots\sqrt{1-r_{1n.23\ldots(n-1)}^2} \qquad (50)$$

This formula may be used to compute the net σ's in correlation problems which involve any number of variables.
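Formulas (49) and (50) are naturally recursive, which makes them easy to sketch in code. The following Python is a minimal illustration, assuming the zero-order r's are supplied as a full correlation matrix and that variables are numbered from 1 as in the text; the names `make_partials`, `partial_r`, and `partial_sigma` are hypothetical. It peels off one secondary subscript at a time, "working up" from zero-order r's exactly as described above, and caches intermediate partials so that none is computed twice.

```python
from functools import lru_cache
from math import sqrt


def make_partials(R, sd):
    """Return (partial_r, partial_sigma) for a zero-order correlation matrix R
    (list of lists) and a list of sigmas sd. Variables are numbered 1..n as in
    the text, so X1 corresponds to R[0] and sd[0]."""

    @lru_cache(maxsize=None)
    def _pr(i, j, given):
        if not given:                        # zero-order r
            return R[i - 1][j - 1]
        k, rest = given[-1], given[:-1]      # peel off one secondary subscript
        r_ij, r_ik, r_jk = _pr(i, j, rest), _pr(i, k, rest), _pr(j, k, rest)
        return (r_ij - r_ik * r_jk) / (sqrt(1 - r_ik ** 2) * sqrt(1 - r_jk ** 2))

    def partial_r(i, j, *given):
        """r_{ij.given} by formula (49); the order of `given` is immaterial."""
        return _pr(i, j, tuple(sorted(given)))

    def partial_sigma(i, *given):
        """sigma_{i.given} by formula (50), factors taken in the order written."""
        s = sd[i - 1]
        for n, k in enumerate(given):
            s *= sqrt(1 - partial_r(i, k, *given[:n]) ** 2)
        return s

    return partial_r, partial_sigma


# Usage with the zero-order r's and sigmas of Table XXVI
R = [[1.00, 0.60, 0.32],
     [0.60, 1.00, -0.35],
     [0.32, -0.35, 1.00]]
partial_r, partial_sigma = make_partials(R, [11.2, 15.8, 6.0])
print(round(partial_r(1, 2, 3), 3))                                      # 0.802
print(round(partial_sigma(1, 2, 3), 2), round(partial_sigma(1, 3, 2), 2))  # 6.33 6.33
```

The last line illustrates the alternate forms discussed next: $\sigma_{1.23}$ and $\sigma_{1.32}$ come out equal, as they must, which is what makes them useful as an arithmetic check.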
In a five-variable problem, for example, $\sigma_{1.2345}$ is written

$$(1)\quad \sigma_{1.2345} = \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2}\,\sqrt{1-r_{14.23}^2}\,\sqrt{1-r_{15.234}^2}$$

and by analogy to (1), or by reference to (50), the other σ's may be written:

$$\begin{aligned}
(2)\quad \sigma_{2.1345} &= \sigma_2\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23.1}^2}\,\sqrt{1-r_{24.13}^2}\,\sqrt{1-r_{25.134}^2} \\
(3)\quad \sigma_{3.1245} &= \sigma_3\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23.1}^2}\,\sqrt{1-r_{34.12}^2}\,\sqrt{1-r_{35.124}^2} \\
(4)\quad \sigma_{4.1235} &= \sigma_4\sqrt{1-r_{14}^2}\,\sqrt{1-r_{24.1}^2}\,\sqrt{1-r_{34.12}^2}\,\sqrt{1-r_{45.123}^2} \\
(5)\quad \sigma_{5.1234} &= \sigma_5\sqrt{1-r_{15}^2}\,\sqrt{1-r_{25.1}^2}\,\sqrt{1-r_{35.12}^2}\,\sqrt{1-r_{45.123}^2}
\end{aligned}$$

Each of these σ's measures the variability of a single factor when the effects of the other four are ruled out or held constant. All of them are σ's of the fourth order, since there are 4 secondary subscripts, and the order of a partial σ, like the order of a partial r, is determined by the number of its secondary subscripts.

By a simple rearrangement of the secondary subscripts any higher-order σ may be written in more than one way. A σ of the second order may be written in two ways: e.g., $\sigma_{1.23}$, which is given on page 227 as $\sigma_{1.23} = \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2}$, may also be written $\sigma_{1.32} = \sigma_1\sqrt{1-r_{13}^2}\,\sqrt{1-r_{12.3}^2}$. In like manner, $\sigma_{2.13}$ may be written

$$(1)\quad \sigma_{2.13} = \sigma_2\sqrt{1-r_{12}^2}\,\sqrt{1-r_{23.1}^2}, \quad\text{or}\quad (2)\quad \sigma_{2.31} = \sigma_2\sqrt{1-r_{23}^2}\,\sqrt{1-r_{12.3}^2};$$

and $\sigma_{3.12}$ may be written

$$(1)\quad \sigma_{3.12} = \sigma_3\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23.1}^2}, \quad\text{or}\quad (2)\quad \sigma_{3.21} = \sigma_3\sqrt{1-r_{23}^2}\,\sqrt{1-r_{13.2}^2}.$$

The alternate forms of a partial σ are useful as a check on the arithmetic calculations, and also because they make unnecessary the calculation of otherwise unused, and hence superfluous, partial r's. Thus by using the second forms of $\sigma_{2.13}$ and $\sigma_{3.12}$ instead of the first (see Table XXVI), we make unnecessary the calculation of $r_{23.1}$ so far as the computation of the σ's is concerned. Furthermore, if $r_{23.1}$ is not used elsewhere in the problem, it need not be calculated at all (see page 228).
Two partial r's are all that we need in order to write the regression equation in a three-variable problem.

The number of alternate forms in which any higher-order σ may be written depends on the number of permutations which its secondary subscripts can take. We have seen that a second-order σ may be written in two ways: $\sigma_{1.23}$ and $\sigma_{1.32}$. In the same way, any σ of the third order, e.g., $\sigma_{1.234}$, may be written in 6 ways: $\sigma_{1.234}$, $\sigma_{1.243}$, $\sigma_{1.324}$, $\sigma_{1.342}$, $\sigma_{1.423}$, $\sigma_{1.432}$. Any σ of the fourth order, e.g., $\sigma_{1.2345}$, may be written in 24 ways, and any σ of the fifth order, e.g., $\sigma_{1.23456}$, in 120 ways.¹

¹ This follows from the law of permutations. The permutations of 4 things taken 4 at a time are ${}_4P_4 = 4 \times 3 \times 2 \times 1 = 24$; and the permutations of 5 things taken 5 at a time are ${}_5P_5 = 5 \times 4 \times 3 \times 2 \times 1 = 120$. In general, the permutations of n things taken n at a time are ${}_nP_n = n(n-1)(n-2)\cdots$ to n factors. See the chapter on Permutations and Combinations in any Algebra.

Fortunately we need only a very few of all of these possible arrangements. Care, nevertheless, must be taken that the correct forms are chosen, for just as the number of partial r's which must be computed in a 3-variable problem can be reduced by a judicious choice of σ formulas, so also in problems which contain more than 3 variables the number of partial r's may be considerably reduced by proper selection. And it is in the longer problems that a reduction of the number of partial r's to be computed counts most, since it is here that the calculations become laborious. The partial σ's which require the calculation of the minimum number of partial r's are given, for 4- and 5-variable problems, in the outline solutions on pages 240-244. These will be found useful for quick reference. By analogy to these, the selection of the σ formulas in problems which involve more than five variables can be easily made.

3. General Formulas for the Regression Equation, and Coefficients of Regression

The general regression equation, which expresses the relation between a single dependent variable, $X_1$, and a number of independent variables, $X_2, X_3, X_4, \ldots X_n$, may be written in Deviation Form as follows:

$$x_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n \qquad (51)$$

and in Score Form as

$$X_1 = b_{12.34\ldots n}\,X_2 + b_{13.24\ldots n}\,X_3 + \cdots + b_{1n.23\ldots(n-1)}\,X_n + K \qquad (52)$$

The regression coefficients $b_{12.34\ldots n}$, $b_{13.24\ldots n}$, etc., give the weight or value to be attached to each independent variable when $X_1$ is to be estimated from all of these in combination. Moreover, the regression coefficients indicate the weight which each independent variable has in determining $X_1$ exclusive of the influence of the other variables; hence we can tell from the regression equation just what part the score on each of several tests plays in determining the score on the test taken as the dependent variable.

The regression coefficients in a regression equation may be computed from the formula

$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.234\ldots n}}{\sigma_{2.134\ldots n}} \qquad (53)$$

If the problem involves only three variables, the regression equation becomes $X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K$. In this equation, the regression coefficients $b_{12.3}$ and $b_{13.2}$ are, like the partial r's $r_{12.3}$ and $r_{13.2}$, of the first order. The first, $b_{12.3}$, equals $r_{12.3}\,\sigma_{1.23}/\sigma_{2.13}$; and the second, $b_{13.2}$, equals $r_{13.2}\,\sigma_{1.23}/\sigma_{3.12}$ (see page 227 and Table XXVI). Regression equations which involve more than three variables are easily written by reference to formula (52), and their regression coefficients may be found from formula (53).
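To see that formula (53) yields the same weights as a direct least-squares fit, the sketch below computes the b's two ways for the data of Table XXVI: once through partial r's and partial σ's, and once by solving the normal equations of the standardized variables with ordinary Gauss-Jordan elimination. That second route is a standard alternative technique not described in this chapter, swapped in here only as a check; all function names are hypothetical. The constant K of formula (52) then follows from the means, since the regression plane passes through the point of means.

```python
from math import sqrt


def partial_r(R, i, j, given):
    """Formula (49), applied recursively; variables numbered from 1."""
    if not given:
        return R[i - 1][j - 1]
    k, rest = given[-1], given[:-1]
    r_ij = partial_r(R, i, j, rest)
    r_ik = partial_r(R, i, k, rest)
    r_jk = partial_r(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / (sqrt(1 - r_ik ** 2) * sqrt(1 - r_jk ** 2))


def partial_sigma(R, sd, i, given):
    """Formula (50), factors taken in the order listed in `given`."""
    s = sd[i - 1]
    for n, k in enumerate(given):
        s *= sqrt(1 - partial_r(R, i, k, given[:n]) ** 2)
    return s


def b_by_formula_53(R, sd):
    """b_{1j.(all other independents)} for j = 2..n, via formula (53)."""
    n = len(R)
    b = {}
    for j in range(2, n + 1):
        others = tuple(k for k in range(2, n + 1) if k != j)
        b[j] = (partial_r(R, 1, j, others)
                * partial_sigma(R, sd, 1, (j,) + others)
                / partial_sigma(R, sd, j, (1,) + others))
    return b


def b_by_normal_equations(R, sd):
    """Cross-check: solve R_xx * beta = r_x1 by Gauss-Jordan elimination,
    then convert the standardized weights to raw-score b's."""
    n = len(R)
    m = n - 1
    M = [[R[a][c] for c in range(1, n)] + [R[a][0]] for a in range(1, n)]
    for col in range(m):
        piv = max(range(col, m), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        for row in range(m):
            if row != col:
                f = M[row][col] / M[col][col]
                M[row] = [x - f * y for x, y in zip(M[row], M[col])]
    beta = [M[a][m] / M[a][a] for a in range(m)]
    return {j: beta[j - 2] * sd[0] / sd[j - 1] for j in range(2, n + 1)}


R = [[1.00, 0.60, 0.32],
     [0.60, 1.00, -0.35],
     [0.32, -0.35, 1.00]]
sd = [11.2, 15.8, 6.0]
means = [18.5, 100.6, 24.0]

b = b_by_formula_53(R, sd)
b_ls = b_by_normal_equations(R, sd)
# K of formula (52): the regression plane passes through the point of means
K = means[0] - sum(b[j] * means[j - 1] for j in b)
predicted = b[2] * 120 + b[3] * 20 + K      # the illustrative student of page 230
print({j: round(v, 3) for j, v in b.items()})      # {2: 0.575, 3: 1.127}
print({j: round(v, 3) for j, v in b_ls.items()})   # {2: 0.575, 3: 1.127}
print(round(K, 1), round(predicted, 1))
```

With unrounded coefficients, K comes out near -66.4 rather than the -66 obtained from the rounded .57 and 1.13, and the predicted score for the illustrative student is about 25.1; the same functions accept a four- or five-variable correlation matrix without change.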
In a five-variable problem, for example, the regression equation becomes

$$X_1 = b_{12.345}\,X_2 + b_{13.245}\,X_3 + b_{14.235}\,X_4 + b_{15.234}\,X_5 + K$$

and the regression coefficients (b's of the third order) are

$$b_{12.345} = r_{12.345}\,\frac{\sigma_{1.2345}}{\sigma_{2.1345}}, \qquad b_{13.245} = r_{13.245}\,\frac{\sigma_{1.2345}}{\sigma_{3.1245}}, \qquad b_{14.235} = r_{14.235}\,\frac{\sigma_{1.2345}}{\sigma_{4.1235}}, \qquad b_{15.234} = r_{15.234}\,\frac{\sigma_{1.2345}}{\sigma_{5.1234}}$$

Obviously, to compute these regression coefficients we must first compute the third-order partial r's and the necessary partial σ's. The calculation of the b's is then a matter of substitution.

4. General Formulas for Standard and Probable Errors of Estimate

All $X_1$ scores estimated from a regression equation have a standard error of estimate, $\sigma_{(\text{est.}X_1)}$, which measures the error made in taking estimated instead of actual scores (see page 230). $\sigma_{(\text{est.}X_1)}$ is found from the formula for $\sigma_{1.234\ldots n}$, as follows:

$$\sigma_{(\text{est.}X_1)} = \sigma_{1.234\ldots n} \qquad (54)$$

$$PE_{(\text{est.}X_1)} = .6745\,\sigma_{(\text{est.}X_1)} \qquad (55)$$

As $\sigma_{1.234\ldots n}$ must always be computed in order to find the regression coefficients (see examples above), $\sigma_{(\text{est.}X_1)}$ is known at once without further calculation.

The value of a standard error of estimate has already been illustrated on page 230 from the data of Table XXVI. To repeat, we find in Table XXVI that the $\sigma_{(\text{est.}X_1)}$ of any estimated number of honor points is 6.34, and that the $PE_{(\text{est.}X_1)}$ is 4.28 points. Hence, the chances are even that the "most probable," i.e., estimated, number of honor points received by any student, as found from the regression equation, will be in error by 4 points or less (roughly). We may be practically certain that any estimated number of honor points is not in error by more than $4 \times 4$, or 16, honor points.

It may be shown by the method of least squares¹ that the standard error (or PE) of estimate is a minimum when the regression equation is used to estimate the $X_1$ scores.
For this reason, values of $X_1$ predicted from the regression equation are said to be the "best" estimates of the actual $X_1$ values which can be made from a linear equation which contains the given variables. The regression equation $X_1 = .57X_2 + 1.13X_3 - 66$ (see page 230) will serve as an illustration of what is meant. Assuming that the relation between $X_1$ and $X_2$, $X_1$ and $X_3$, and $X_2$ and $X_3$ is linear in every case, $X_1$ (honor points) can be estimated from this equation with a smaller error of estimate than from any other linear equation in $X_2$ and $X_3$.

¹ See Yule, An Introduction to the Theory of Statistics, p. 231.

5. General Formula for R, the Coefficient of Multiple Correlation

The correlation between a single dependent variable $X_1$ and $(n-1)$ independent variables, e.g., $X_2, X_3, X_4, \ldots X_n$, in combination is given by the formula

$$R_{1(23\ldots n)} = \sqrt{1 - \frac{\sigma_{1.23\ldots n}^2}{\sigma_1^2}} \qquad (56)$$

in which $R_{1(23\ldots n)}$ is the coefficient of multiple correlation, $\sigma_1$ is the σ of the dependent series of $X_1$ scores, and $\sigma_{1.23\ldots n}$ equals the standard error of estimate (see formula 54). When there are only three variables, the multiple coefficient of correlation is $R_{1(23)} = \sqrt{1 - \sigma_{1.23}^2/\sigma_1^2}$; when there are five variables, $R_{1(2345)} = \sqrt{1 - \sigma_{1.2345}^2/\sigma_1^2}$; and in like manner the R for six, seven, or any number of variables may be written by reference to (56).

Since the error of estimate is a minimum when the regression equation is used for estimating $X_1$ scores, it follows that the multiple coefficient of correlation R gives the maximum correlation obtainable between the actual $X_1$ scores and $X_1$ scores estimated from a knowledge of the independent variables $X_2, X_3, \ldots X_n$ in the regression equation. R is valuable, therefore, as indicating how effectively a given combination of measures (or "team of tests") represents the actual values of $X_1$ when these measures are combined in the best possible way.
R is always positive, no matter what the signs in the regression equation may be. Errors of sampling, therefore, do not neutralize each other but tend to become cumulative. As a result, the PE of R, which is found from the same formula as the PE of any product-moment r, is not a fair measure of the coefficient's validity. To test the validity of an obtained R, we must compare it with the value of that R which we should get from the same number of cases and the same number of variables when the variables are uncorrelated, i.e., with the R which would arise from fluctuations of sampling alone. The formula for this R is

$$R = \sqrt{\frac{n-1}{N-1}} \qquad (57)$$

in which n is the number of variables and N is the number of cases.¹ To illustrate this formula, let us apply it to the three-variable problem in Table XXVI, in which n = 3 and N = 450. Substituting for N and n in the formula, we get an R equal to .07, which indicates a highly satisfactory degree of validity for the obtained R of .824.

If we replace $\sigma_{1.23\ldots n}$ in formula (56) by its value in terms of the entire and partial r's [see formula (50)], we may write the general formula for $R_{1(234\ldots n)}$ as follows:

$$R_{1(234\ldots n)} = \sqrt{1 - \left[(1-r_{12}^2)(1-r_{13.2}^2)\cdots(1-r_{1n.23\ldots(n-1)}^2)\right]} \qquad (58)$$

Moreover, since a higher-order σ may be written in a variety of ways, the number depending upon its order (see page 234), we have in the alternate forms for R a valuable means of checking the accuracy of our arithmetical calculations. In a three-variable problem, for example, $R_{1(23)}$ may be written as

$$R_{1(23)} = \sqrt{1 - \left[(1-r_{12}^2)(1-r_{13.2}^2)\right]}, \quad\text{or}\quad R_{1(32)} = \sqrt{1 - \left[(1-r_{13}^2)(1-r_{12.3}^2)\right]}.$$

In like manner, in a 4-variable problem $R_{1(234)}$ may be found from

$$R_{1(234)} = \sqrt{1 - \left[(1-r_{12}^2)(1-r_{13.2}^2)(1-r_{14.23}^2)\right]}$$

and checked by

$$R_{1(342)} = \sqrt{1 - \left[(1-r_{13}^2)(1-r_{14.3}^2)(1-r_{12.34}^2)\right]}.$$

¹ Rosenow, Curt, The Analysis of Mental Functions, Psychological Monographs, 1917, Vol. XXIV, 5, p. 20.
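The three routes to R for the data of Table XXVI (formula (56) from $\sigma_{1.23}$, formula (58), and its alternate subscript order) can be compared in a short Python sketch, together with the chance value of formula (57). Two assumptions should be stated plainly: the sketch starts from the rounded r's and σ of Table XXVI, and formula (57) is badly garbled in the scan, so the form $\sqrt{(n-1)/(N-1)}$ used here is a reading chosen because it reproduces the .07 quoted in the text. The helper names are hypothetical.

```python
from math import sqrt

# Zero-order r's and sigma_1 from Table XXVI
r12, r13, r23 = 0.60, 0.32, -0.35
sigma1 = 11.2


def first_order(rij, rik, rjk):
    """Formula (49) for a first-order partial r_{ij.k}."""
    return (rij - rik * rjk) / (sqrt(1 - rik ** 2) * sqrt(1 - rjk ** 2))


r13_2 = first_order(r13, r12, r23)
r12_3 = first_order(r12, r13, r23)

# Formula (56): R from the standard error of estimate sigma_{1.23}
sigma1_23 = sigma1 * sqrt(1 - r12 ** 2) * sqrt(1 - r13_2 ** 2)
R_56 = sqrt(1 - sigma1_23 ** 2 / sigma1 ** 2)

# Formula (58) and its alternate form, used as an arithmetic check
R_23 = sqrt(1 - (1 - r12 ** 2) * (1 - r13_2 ** 2))
R_32 = sqrt(1 - (1 - r13 ** 2) * (1 - r12_3 ** 2))


def chance_R(n_variables, n_cases):
    """R expected from sampling alone; formula (57) as read here (assumed form)."""
    return sqrt((n_variables - 1) / (n_cases - 1))


print(round(R_56, 3), round(R_23, 3), round(R_32, 3))
print(round(chance_R(3, 450), 2))
```

All three forms agree to machine precision at about .825; the text's .824 comes from using the rounded $\sigma_{1.23} = 6.34$. The chance value comes out near .07, far below the obtained R.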
6. Outline of the Formulas Needed in Correlation Problems Which Involve (a) Four Variables and (b) Five Variables

In multiple correlation problems, generally the main task is to find, with a minimum of time and calculation, the regression equation which expresses the relation of the dependent variable to the independent variables. For this purpose, when working with more than three variables, the simplest plan is to write down the formula for the regression equation required first and then proceed deductively to find those partial r's and higher-order σ's which are necessary for computing the regression coefficients. The formulas for getting the regression equation with a minimum amount of calculation are given, for four and five variables, in the following outlines. It is necessary, of course, that all zero-order r's be first computed before the partial correlation technique can be applied.

(a) Formulas for Four-Variable Problems

(1) Regression Equation. The regression equation for four variables is written by reference to formula (52) as follows:

$$X_1 = b_{12.34}\,X_2 + b_{13.24}\,X_3 + b_{14.23}\,X_4 + K \qquad (52)$$

(2) Regression Coefficients. The three regression coefficients needed in (1) are found from formula (53):

$$b_{12.34} = r_{12.34}\,\frac{\sigma_{1.234}}{\sigma_{2.134}}, \qquad b_{13.24} = r_{13.24}\,\frac{\sigma_{1.234}}{\sigma_{3.124}}, \qquad b_{14.23} = r_{14.23}\,\frac{\sigma_{1.234}}{\sigma_{4.123}}$$

These regression coefficients evidently require the computation of 3 second-order partial r's and 4 third-order σ's.

(3) Partial r's.

(a) To find $r_{12.34}$:

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{1-r_{14.3}^2}\,\sqrt{1-r_{24.3}^2}}$$

We must find 3 first-order partial r's, as follows:

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1-r_{13}^2}\sqrt{1-r_{23}^2}}, \qquad r_{14.3} = \frac{r_{14} - r_{13}r_{34}}{\sqrt{1-r_{13}^2}\sqrt{1-r_{34}^2}}, \qquad r_{24.3} = \frac{r_{24} - r_{23}r_{34}}{\sqrt{1-r_{23}^2}\sqrt{1-r_{34}^2}}$$

(b) To find $r_{13.24}$:

$$r_{13.24} = \frac{r_{13.2} - r_{14.2}\,r_{34.2}}{\sqrt{1-r_{14.2}^2}\,\sqrt{1-r_{34.2}^2}}$$

We must find 3 first-order partial r's, as follows:

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1-r_{12}^2}\sqrt{1-r_{23}^2}}, \qquad r_{14.2} = \frac{r_{14} - r_{12}r_{24}}{\sqrt{1-r_{12}^2}\sqrt{1-r_{24}^2}}, \qquad r_{34.2} = \frac{r_{34} - r_{23}r_{24}}{\sqrt{1-r_{23}^2}\sqrt{1-r_{24}^2}}$$

(c) To find $r_{14.23}$:

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}\,r_{34.2}}{\sqrt{1-r_{13.2}^2}\,\sqrt{1-r_{34.2}^2}}$$

No partials of the first order are needed other than those already found.

[Note that a minimum of 9 partial r's must be computed, 3 of the second order and 6 of the first order. The 9 first- and second-order r's, together with the 6 zero-order r's, make 15 coefficients of correlation required in all.]

(4) Standard Deviations. The four third-order σ's required may be found from the following formulas, which make use of no partial r's other than those already computed in (3) above. From formula (50):

$$\begin{aligned}
\sigma_{1.234} &= \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2}\,\sqrt{1-r_{14.23}^2} \\
\sigma_{2.134}\ (\text{i.e., } \sigma_{2.341}) &= \sigma_2\sqrt{1-r_{23}^2}\,\sqrt{1-r_{24.3}^2}\,\sqrt{1-r_{12.34}^2} \\
\sigma_{3.124}\ (\text{i.e., } \sigma_{3.241}) &= \sigma_3\sqrt{1-r_{23}^2}\,\sqrt{1-r_{34.2}^2}\,\sqrt{1-r_{13.24}^2} \\
\sigma_{4.123}\ (\text{i.e., } \sigma_{4.321}) &= \sigma_4\sqrt{1-r_{34}^2}\,\sqrt{1-r_{24.3}^2}\,\sqrt{1-r_{14.23}^2}
\end{aligned}$$

The numerical values of the regression coefficients may now be computed and substituted in the regression equation.

(5) The Standard Error of Estimate, $\sigma_{(\text{est.}X_1)}$. From formulas (54) and (55) we find:

$$\sigma_{(\text{est.}X_1)} = \sigma_{1.234} \quad [\text{for the value of } \sigma_{1.234} \text{ see (4) above}]$$

$$PE_{(\text{est.}X_1)} = .6745\,\sigma_{(\text{est.}X_1)}$$

(6) Coefficient of Multiple Correlation, R. In a four-variable problem the multiple coefficient, R, is written $R_{1(234)}$ and may be found from formula (56):

$$R_{1(234)} = \sqrt{1 - \frac{\sigma_{1.234}^2}{\sigma_1^2}}$$

This formula may also be written as

$$R_{1(234)} = \sqrt{1 - \left[(1-r_{12}^2)(1-r_{13.2}^2)(1-r_{14.23}^2)\right]}$$

or as

$$R_{1(234)} = \sqrt{1 - \left[(1-r_{13}^2)(1-r_{14.3}^2)(1-r_{12.34}^2)\right]}$$

(b) Formulas for Five-Variable Problems

(1) Regression Equation:

$$X_1 = b_{12.345}\,X_2 + b_{13.245}\,X_3 + b_{14.235}\,X_4 + b_{15.234}\,X_5 + K \qquad (52)$$

(2) Regression Coefficients:

$$b_{12.345} = r_{12.345}\,\frac{\sigma_{1.2345}}{\sigma_{2.1345}}, \qquad b_{13.245} = r_{13.245}\,\frac{\sigma_{1.2345}}{\sigma_{3.1245}}, \qquad b_{14.235} = r_{14.235}\,\frac{\sigma_{1.2345}}{\sigma_{4.1235}}, \qquad b_{15.234} = r_{15.234}\,\frac{\sigma_{1.2345}}{\sigma_{5.1234}} \qquad (53)$$

(3) Partial r's.
We compute 22 partial r's as follows (formula 49):

(a) To find r12.345, write it as r12.453. Then

$$r_{12.453} = \frac{r_{12.45} - r_{13.45}\,r_{23.45}}{\sqrt{1-r_{13.45}^2}\sqrt{1-r_{23.45}^2}}$$

To compute this r we need 3 partial r's of the second order, viz.:

$$r_{12.45} = \frac{r_{12.4} - r_{15.4}r_{25.4}}{\sqrt{1-r_{15.4}^2}\sqrt{1-r_{25.4}^2}}, \qquad r_{13.45} = \frac{r_{13.4} - r_{15.4}r_{35.4}}{\sqrt{1-r_{15.4}^2}\sqrt{1-r_{35.4}^2}}, \qquad r_{23.45} = \frac{r_{23.4} - r_{25.4}r_{35.4}}{\sqrt{1-r_{25.4}^2}\sqrt{1-r_{35.4}^2}}$$

To compute these 3 r's we need 6 r's of the first order, viz., r12.4, r15.4, r13.4, r25.4, r23.4, and r35.4.

(b) To find r13.245, write it as r13.452. Then

$$r_{13.452} = \frac{r_{13.45} - r_{12.45}\,r_{23.45}}{\sqrt{1-r_{12.45}^2}\sqrt{1-r_{23.45}^2}}$$

To compute this r we need no partial r's other than those already found in (a).

(c) To find r14.235, write it without change:

$$r_{14.235} = \frac{r_{14.23} - r_{15.23}\,r_{45.23}}{\sqrt{1-r_{15.23}^2}\sqrt{1-r_{45.23}^2}}$$

To compute this r we need 3 partial r's of the second order, viz.:

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}r_{34.2}}{\sqrt{1-r_{13.2}^2}\sqrt{1-r_{34.2}^2}}, \qquad r_{15.23} = \frac{r_{15.2} - r_{13.2}r_{35.2}}{\sqrt{1-r_{13.2}^2}\sqrt{1-r_{35.2}^2}}, \qquad r_{45.23} = \frac{r_{45.2} - r_{34.2}r_{35.2}}{\sqrt{1-r_{34.2}^2}\sqrt{1-r_{35.2}^2}}$$

To compute these r's we need 6 r's of the first order, viz., r14.2, r13.2, r15.2, r34.2, r35.2, and r45.2.

(d) To find r15.234, write it without change:

$$r_{15.234} = \frac{r_{15.23} - r_{14.23}\,r_{45.23}}{\sqrt{1-r_{14.23}^2}\sqrt{1-r_{45.23}^2}}$$

To compute this r we need no partial r's other than those already found in (c).

[Note that we must compute a minimum of 4 third order r's, 6 second order r's, and 12 first order r's, 22 in all.]

(4) Standard Deviations. The 5 fourth order σ's required may be found from the following forms of formula (50), which make use of only those partial r's already computed in (3):

$$\sigma_{1.2345} = \sigma_1\sqrt{1-r_{12}^2}\sqrt{1-r_{13.2}^2}\sqrt{1-r_{14.23}^2}\sqrt{1-r_{15.234}^2} \qquad (50)$$
$$\sigma_{2.1345}\ (\text{i.e., }\sigma_{2.4531}) = \sigma_2\sqrt{1-r_{24}^2}\sqrt{1-r_{25.4}^2}\sqrt{1-r_{23.45}^2}\sqrt{1-r_{12.345}^2}$$
$$\sigma_{3.1245}\ (\text{i.e., }\sigma_{3.4521}) = \sigma_3\sqrt{1-r_{34}^2}\sqrt{1-r_{35.4}^2}\sqrt{1-r_{23.45}^2}\sqrt{1-r_{13.245}^2}$$
$$\sigma_{4.1235}\ (\text{i.e., }\sigma_{4.2351}) = \sigma_4\sqrt{1-r_{24}^2}\sqrt{1-r_{34.2}^2}\sqrt{1-r_{45.23}^2}\sqrt{1-r_{14.235}^2}$$
$$\sigma_{5.1234}\ (\text{i.e., }\sigma_{5.2341}) = \sigma_5\sqrt{1-r_{25}^2}\sqrt{1-r_{35.2}^2}\sqrt{1-r_{45.23}^2}\sqrt{1-r_{15.234}^2}$$

(5) Standard Error of Estimate, σ(est. X1).

$$\sigma_{(\text{est. }X_1)} = \sigma_{1.2345}\ \ [\text{see (4) above for value}] \qquad (54)$$
$$PE_{(\text{est. }X_1)} = .6745\,\sigma_{(\text{est. }X_1)} \qquad (55)$$

(6) Coefficient of Multiple Correlation, R.

$$R_{1(2345)} = \sqrt{1 - \frac{\sigma_{1.2345}^2}{\sigma_1^2}} \qquad (56)$$

which may be written also as

$$R_{1(2345)} = \sqrt{1 - (1-r_{12}^2)(1-r_{13.2}^2)(1-r_{14.23}^2)(1-r_{15.234}^2)}$$

and checked by

$$R_{1(2345)} = \sqrt{1 - (1-r_{14}^2)(1-r_{15.4}^2)(1-r_{13.45}^2)(1-r_{12.345}^2)}$$

IV. A Multiple Correlation Problem with Four Variables

In Section II we found that a student's honor points (X1) could be estimated with a considerable degree of accuracy from a knowledge of his general intelligence score (X2) and the number of hours he spends in study per week (X3). The PE(est. X1) made in estimating individual scores from this three-variable regression equation was found to be 4.28 points; and the coefficient of multiple correlation, R1(23), which indicates, in general, how well the estimated scores represent the actual scores, was .824.

Now suppose that we add to the two independent variables X2 and X3 a third factor X4, e.g., the quality of the preparatory work done by the student in High School.¹ This will give us three independent variables from which to estimate the dependent variable, honor points, and the question arises: with how much greater accuracy will this additional factor enable us to predict academic success? The answer to this question will be found in Table XXVIII, which gives a complete solution of this problem, following the scheme outlined for four-variable problems in Section III, 6 (a). Some additional discussion of procedure and methods, and several points to be especially noted, are given in the following paragraphs.
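The outline in Section III, 6 lends itself to a short program. The sketch below is an illustration under stated assumptions, not the author's procedure: it applies formula (49) recursively and formula (50) one factor at a time, so the same few functions cover the four- and five-variable outlines (and larger ones). The zero-order r12 = .60, r13 = .32, r23 = -.35, r14 = .40 and σ1 = 11.2 are figures quoted in this chapter; r24, r34, σ2, σ3, σ4 and all the means are hypothetical, so the four-variable output does not reproduce Table XXVIII. The function names `make_partial_r`, `partial_sigma`, and `regression` are made up for the example.

```python
from functools import lru_cache
from math import sqrt

def make_partial_r(r):
    """Return pr(i, j, controls), the partial r_{ij.controls} of formula (49).

    r is a square list of zero-order r's with variables numbered from 0
    (index 0 stands for X1). The last variable in `controls` is eliminated
    from coefficients one order lower, as in formula (49).
    """
    @lru_cache(maxsize=None)
    def pr(i, j, controls=()):
        if not controls:
            return r[i][j]
        *rest, n = controls
        rest = tuple(rest)
        r_ij, r_in, r_jn = pr(i, j, rest), pr(i, n, rest), pr(j, n, rest)
        return (r_ij - r_in * r_jn) / sqrt((1 - r_in ** 2) * (1 - r_jn ** 2))
    return pr

def partial_sigma(sigma, pr, i, others):
    """Formula (50): sigma of variable i with `others` eliminated in order."""
    s = sigma[i]
    for k, n in enumerate(others):
        s *= sqrt(1 - pr(i, n, tuple(others[:k])) ** 2)
    return s

def regression(r, sigma, means):
    """Regress X1 (index 0) on every other variable, following the outline.

    Returns the b's of (53), the constant K of the score form (52),
    R from (56), sigma(est. X1) from (54) and PE(est. X1) from (55).
    """
    pr = make_partial_r(r)
    preds = tuple(range(1, len(r)))
    s1 = partial_sigma(sigma, pr, 0, preds)              # sigma_{1.23...n}
    b = []
    for p in preds:
        others = tuple(q for q in preds if q != p)
        sp = partial_sigma(sigma, pr, p, (0,) + others)  # sigma_{p.1...n}
        b.append(pr(0, p, others) * s1 / sp)             # formula (53)
    K = means[0] - sum(bp * means[p] for bp, p in zip(b, preds))
    R = sqrt(1 - s1 ** 2 / sigma[0] ** 2)                # formula (56)
    return b, K, R, s1, 0.6745 * s1

# r12, r13, r23, r14 and sigma1 are quoted in this chapter; r24, r34,
# sigma2..sigma4 and all four means are hypothetical.
r = [[1.00, 0.60, 0.32, 0.40],
     [0.60, 1.00, -0.35, 0.30],
     [0.32, -0.35, 1.00, 0.10],
     [0.40, 0.30, 0.10, 1.00]]
sigma = [11.2, 16.0, 6.0, 8.0]
means = [30.0, 100.0, 20.0, 75.0]
b, K, R, s_est, pe = regression(r, sigma, means)
print("b's:", [round(x, 3) for x in b], "K:", round(K, 2))
print("R1(234):", round(R, 3), "sigma(est):", round(s_est, 2), "PE:", round(pe, 2))
```

Because formula (49) yields the same partial r whatever the order in which the secondary variables are eliminated, the sketch eliminates them in the order given and lets `lru_cache` avoid recomputing any coefficient it has already found. It does not attempt the hand-computation economy of the printed outline, which chooses elimination orders so that as few partial r's as possible must be worked out.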
Remember first of all that the mean and the σ of each set of measures must be known, as well as their 6 intercorrelations, the r's of the zero order. The calculation of these 6 intercorrelations is actually the most laborious part of the solution of a multiple correlation problem (in spite of the fact that we have passed it over with little comment heretofore), since a separate correlation table must be drawn up for each r.

¹ This was measured by the average grade obtained in the work offered for entrance to College. May, Predicting Academic Success, Journal of Educational Psychology, Vol. XIV, 434-436.

(1) The discussion from here on² follows the outline given in (a) on page 240. Thus, before calculating any partial r's, we write the regression equation, and from it deduce what partial r's and higher order σ's will be required.

(2) It is clear from the regression coefficients that we shall need three partial r's of the second order, viz., r12.34, r13.24, and r14.23; and four partial σ's of the third order, viz., σ1.234, σ2.134, σ3.124, and σ4.123, in order to evaluate the constants in the regression equation. Only the partial r's actually required in the regression equation need be calculated.

(3) In order to find r12.34 we shall need three first order partial r's, viz., r12.3, r14.3, and r24.3; and to find r13.24 we shall need, again, three first order partial r's, viz., r13.2, r14.2, and r34.2. To find the last second order partial, r14.23, no first order r's are required other than those already found. A minimum of 9 partial r's, therefore, is required in all. The partial r12.34 gives the net correlation between (1) honor points and (2) general intelligence when both (3) study hours and (4) average High School grades have been eliminated as variable factors, or held constant.
In like manner, r13.24 gives the net correlation between (1) honor points and (3) study hours when both (2) general intelligence and (4) average High School grades are held constant. The first second order partial r, i.e., r12.34, equals .764 and is but slightly reduced from r12.3, which equals .802; while the second partial, r13.24 = .676, is also but slightly less than r13.2, which equals .707. This comparison of partial r's shows the relatively small influence of High School grades on the net correlation between (1) honor points and (3) study hours with general intelligence constant, as well as the small influence of this factor on the net correlation between (1) honor points and (2) general intelligence with study constant. Notice, however, that while the zero order coefficient of correlation between (1) honor points and (4) average High School grades, i.e., r14, is .40, the partial coefficients are r14.2 = .246, r14.3 = .387, and r14.23 = .088. Evidently, nearly all of the correlation which appears between (1) honor points and (4) average High School grades may be attributed to the common dependence of these two factors on (2) general intelligence and, to a somewhat lesser degree, on (3) study hours.

² See Table XXVIII. The divisions in the text parallel those in the table.

(4) By using the forms given in (a) on page 240, we are enabled to calculate the four third order σ's required by the regression coefficients without the necessity of finding any additional partial r's (see page 234). These partial σ's, viz., σ1.234, σ2.134, etc., give the net variability of the distribution of measures denoted by the primary subscripts when the influence of all three of the other factors (secondary subscripts) has been excluded.
To take a single example, σ1.234 is 6.31 as against a σ1 of 11.2, which means, concretely, that if each of the 450 students in the group were exactly alike as regards (2) general intelligence, (3) study hours, and (4) average High School grades, the σ of their distribution of honor points would be only about half as large as the observed σ, the σ of the group in which these factors differ in weight or value.

The computation of the regression coefficients is simply a matter of combining the partial r's and σ's already found. When this has been done, we may substitute in the regression equation to find x1 = .55x2 + 1.07x3 + .083x4; or, multiplying by 12.5 (a convenient constant), 12.5 (the number of honor points) = 7 (score on general intelligence test) + 13 (the number of hours spent per week in study) + 1 (average High School grades). In Score Form the regression equation becomes X1 = .55X2 + 1.07X3 + .083X4 - 69. It is clear from the regression equations that the number of hours spent in study has twice the weight of the score on the general intelligence test, and thirteen times the weight of the average High School grades, in determining the number of honor points which a student will most probably receive at the end of the first semester. Apparently (as noted above), the average High School grades have relatively little influence on honor points as compared with the other factors in the equation.

(5) Still further evidence of the small importance of High School grades in improving the estimate of honor points is to be seen in the size of the PE(est. X1). The PE of estimate made in predicting honor points from the present equation is 4.26 points, as compared with a PE(est. X1) of 4.28 points made in using the regression equation which does not include High School grades (see page 230).
This means that we can estimate the number of honor points which a student will receive, knowing only his general intelligence score and the number of hours he spends in study per week, with but slightly greater error than when we know, in addition to these two, the average grade he has received in High School. It would seem, therefore, that the work required to build up a regression equation which includes the latter factor is hardly worth while.

(6) The multiple coefficient of correlation, R1(234), is .826, as compared with the R1(23) of .824. A comparison of these multiple coefficients further substantiates the conclusion that High School grades contribute practically nothing to the reliability of an honor point estimate.

It will be of considerable interest to compare the reliability of our estimate of honor points when the factors, singly and in combination, are taken into account. In this way the "prognostic" value of the multiple regression equation, as shown by the size of σ(est. X1), will be more readily appreciated. The standard errors of estimate and the coefficients of correlation for the different factors taken singly and in combination are given below:

Dependent variable: honor points (X1)

Regression equation                        σ(est. X1)    Coefficient of correlation
X1 = .43X2 - 24.76                           8.96        r12 = .60
X1 = .60X3 + 4.1                            10.61        r13 = .32
X1 = .57X2 + 1.13X3 - 66                     6.34        R1(23) = .824
X1 = .55X2 + 1.07X3 + .083X4 - 69            6.31        R1(234) = .826

[Table XXVIII: complete solution of the four-variable problem; the table is not legible in this copy.]

The important fact here is that σ(est. X1) is considerably less, and the correlation considerably greater, when X2 and X3 are taken together than when either is taken alone. The standard error of estimate and the R improve only very slightly when X4 is added to X2 and X3. It is very probable that, by an extension of the method of partial and multiple correlation to include other variables in addition to those we already have, the σ(est. X1) of our problem could be still further reduced and R increased. Before working out a regression equation containing an added variable or variables, the "predictive value" of the "new" equation should be found by computing σ(est. X1) or R. This will enable us to determine what the effect will be of adding another variable or variables, and whether σ(est. X1) is sufficiently reduced, or R sufficiently increased, to justify the additional calculation.
In the present problem, for instance, either σ(est. X1) or R1(234) would have told us that average High School grades add practically nothing to the predictive value of a regression equation which already contains the two variables general intelligence and number of hours spent on the average in study each week.

V. The Value and Use of Partial and Multiple Correlation

1. The Value and Use of Partial Correlation in Analysis and in Causal Investigations

Partial correlation is of considerable importance in the analysis of the part played by each of several factors in a total result, inasmuch as it enables us to find the net relationship between two sets of scores or measures when the influence of one or more other factors is excluded. A concrete illustration of this use of partial correlation may be cited from the work of Cyril Burt.¹ Burt wished to find how much a child's mental age, as given by the Binet tests, influenced his school attainment. His subjects were 300 children from 7 to 14 years old. Each child's (1) MA (Binet) was found; likewise his (2) scholastic achievement, as measured by educational examinations and checked by teachers; and (3) his chronological age. The "entire" coefficient of correlation between Binet MA and scholastic achievement (r12) was .91. When chronological age (3) was held constant, the partial r (r12.3) between Binet MA and scholastic achievement dropped to .68. This shows, in the first place, that age has a decided effect on the observed correlation between MA and school work: it tends to increase or "dilate" the obtained r. This dilation is due to the fact that both MA and school attainment tend to increase with chronological age, and hence this common dependence on chronological age is sufficient to bring about a considerable "boost" in the observed correlation.

¹ Burt, Cyril, Mental and Scholastic Tests, London, 1921, pp. 180-184.
In the second place, r12.3 = .68 indicates that a substantial relation remains between MA and school work when age conditions are uniform. In other words, Binet MA (intelligence) is a substantial factor in a pupil's school attainment irrespective of his chronological age. To take the analysis a step further, Burt found that the correlation between school work (2) and chronological age (3), r23, was .87; and that when the effect of Binet MA was held constant, the partial r between school work and chronological age, r23.1, was .49. The persistence of a fairly high relation between school work and chronological age when intelligence is eliminated offers confirmatory evidence, according to Burt, of the "undue influence of age upon school classification." In these illustrations it is clear that the calculation of the partial r's is the first step in an analysis of the factors which determine school attainment. By an extension of this same method the influence of other factors may be excluded and net relations secured.

From the analyses made through the elimination of factors by partial correlation, we are often enabled to determine existing "causal" relationships. Thus Phillips,¹ in a study of the causes contributing to absence on account of sickness among government employees over the period of a year, found that the observed correlation between absence (i.e., number of persons absent) and mean temperature on the day of absence (r_at) was -.37. When the four factors (1) relative humidity at 8 a.m. on the day of absence, (2) relative humidity at noon of the previous day, (3) inches of rainfall on the day of absence, and (4) per cent of possible sunshine on the day of absence were held constant, the net correlation (r_at.1234) remaining between absence and temperature was -.39, practically the same as the original correlation. Since this was the only r of any size (the other r's, both entire and partial, were negligible), the obvious conclusion seems to be that, of the factors studied, temperature on the day of absence is the most important secondary or contributing cause of absence. (The sickness must be taken, of course, as the primary cause of absence.) Here and elsewhere let it be understood that partial correlation has absolutely nothing to say about "causes" as such. The conclusion as to which of two factors is the cause and which the effect is a matter of common-sense analysis. In the illustration given, the distinction between cause and effect is obvious.

¹ Phillips, Frank M., Application of Partial Correlation to a Health Problem. Reprint No. 867 from Public Health Reports, Sept., 1923.

Another interesting example of the use of partial correlation in a causal investigation is found in the work of Reavis.¹ This investigator undertook to ferret out the causes of attendance and non-attendance in rural schools. Certain factors, (1) distance from school, (2) age-grade relation, (3) kind of work done by the pupils, (4) training, experience, etc., of teacher, (5) school equipment, and (6) kind of community, were taken as having more or less effect on school attendance. When partial correlation was applied to the problem, it was found that the entire coefficients of correlation between attendance and distance, and between attendance and kind of community, were the least reduced. The first was lowered from -.45 to -.43, and the second from .30 to .28. Of all the factors selected, therefore, these two seem to have the most direct or independent influence on school attendance.

¹ Reavis, George, Factors Controlling Attendance in Rural Schools. Teachers College, Columbia University, 1920.
As in the problem cited above, the distinction between cause and effect in this illustration is clear: it is evident that distance from school and kind of community are the causes, and not the effects, of attendance or non-attendance.

2. The Value of the Regression Equation in Prediction and Analysis

The value of the regression equation is twofold:¹ (1) In its usual form, it gives the weights to be assigned each of several independent variables in order that X1 (the dependent variable) may be predicted or forecast with minimum error (see page 237). (2) In its "special" form it may be used to analyze, within certain limits, a given capacity or ability. We shall consider these two uses of the regression equation in order.

(1) It has already been stated that the regression equation enables us to combine two or more tests or other measures (independent variables X2, X3, ... Xn) into a single value (X1) in such a way as to give the best possible estimate of X1. In the three-variable problem on page 228, for example, the regression equation gives us the best possible forecast of the number of honor points (X1) which a student will receive, when we know his general intelligence score (X2) and the average number of hours he spends per week in study (X3). Moreover, once calculated, the regression equation may be used subsequently to estimate other students' scores in X1 when only their scores in X2 and X3 are known. The value of the regression equation as a forecasting instrument is determined by the size of the standard error of estimate and by the multiple coefficient of correlation.

¹ Kelley, T. L., Tables to Facilitate the Calculation of Partial Coefficients of Correlation and Regression Equations, Bulletin of the University of Texas, 1916, 27, p. 7.

A good illustration of the value of the regression equation in forecasting, taken from another field than psychology, is to be found in the work of Moore in forecasting the cotton crop in
the Southern States.¹ Taking the cotton crop in Georgia as the dependent variable (to cite a single example) and the May rainfall, June temperature, and August temperature as independent variables, Moore built up a regression equation from which it was possible to get a better forecast of the crop at the end of August than the official method of the U. S. Department of Agriculture could obtain from the condition of the crop in September. (By a better forecast is meant a smaller error of prediction.)

¹ Moore, H. L., Forecasting the Yield and Price of Cotton, 1917, pp. 108-115.

In addition to its use as a forecasting instrument, the regression equation may be used also to determine the value or "weight" which each test in a battery should have in order that the composite scores obtained from the battery (group of tests) shall be the best possible estimates of that capacity which the whole battery of tests presumably measures. This is essentially the same problem as that of prediction or forecasting discussed in the last paragraph. Suppose, by way of illustration, that the problem is to devise a group test for measuring general intelligence, and that this battery is to consist of four tests. The first step is to secure some good "criterion"² of general intelligence. This may be (1) school grades, (2) teachers' estimates, (3) (1) and (2) combined, or (4) some standard intelligence examination, as, for example, Stanford-Binet or Army Alpha. The next step is to select four tests which will separately give (1) high correlations with the criterion and (2) low correlations with each other.³ These two conditions guarantee that each test will measure some aspect or phase of the criterion, and further that each test will probably measure a different, or slightly different, phase of the criterion, since the low intercorrelations will prevent much duplication. Let us call the criterion Xc and the four tests of the battery X1, X2, X3, and X4. The regression equation in Score Form is

$$X_c = AX_1 + BX_2 + CX_3 + DX_4 + K$$

in which A, B, C, D, the regression coefficients, are the "weights" to be given the scores made on the four tests, and K is a numerical constant. Now, to take a very simple case, suppose that A = 1, B = 2, C = 3, and D = 4. The regression equation then becomes

$$X_c = 1X_1 + 2X_2 + 3X_3 + 4X_4 + K$$

which means that a subject's score on test No. 1 must be multiplied by 1, his score on test No. 2 by 2, his score on test No. 3 by 3, and his score on test No. 4 by 4, in order that his composite score on the battery may give the "best" estimate of his score on Xc, the criterion.

² See page 266 for definition of "criterion."
³ The ideal battery of tests would consist of tests which correlate as high as possible with the criterion, and as low as possible with each other.

The regression equation may be said to furnish the ideal method of combining several tests into a team, since each test in a regression equation is weighted according to its correlation with the criterion, independently of the other tests in the team or battery. Under these conditions the standard error of estimate is a minimum, while the correlation of the predicted Xc values and the actual Xc values (multiple R) is the maximum obtainable with the given set of tests. R tells the extent to which our team represents the criterion.

(2) The only difference between the usual or "regular" form of the regression equation and the "special" form to be considered now is that in the special form, the σ's of all of the different tests (or other measures) are taken as equal.
This procedure eliminates differences in the size of the test units as well as differences in "spread" or variability, and enables us to determine (from the correlations alone) the relative weight with which each independent factor "enters into" or contributes to the dependent variable (the criterion) independently of the other factors. In this way, an analysis can be made of the importance of several different factors in some final result. It is very important to remember, however, that in its special form the regression equation cannot be used for forecasting.

We may illustrate the special use of the regression equation with data taken from the three-variable problem on page 228. If X1, honor points, be taken as the criterion, while X2, general intelligence, and X3, average number of hours spent in study per week, are, as before, the independent variables, the usual or "regular" regression equation is written:

$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K$$

Replacing the b's in this equation by means of formula (53),

$$X_1 = r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}}X_2 + r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}}X_3 + K$$

and replacing the partial σ's by formula (50), we have

$$X_1 = r_{12.3}\frac{\sigma_1\sqrt{1-r_{13}^2}\sqrt{1-r_{12.3}^2}}{\sigma_2\sqrt{1-r_{23}^2}\sqrt{1-r_{12.3}^2}}X_2 + r_{13.2}\frac{\sigma_1\sqrt{1-r_{12}^2}\sqrt{1-r_{13.2}^2}}{\sigma_3\sqrt{1-r_{23}^2}\sqrt{1-r_{13.2}^2}}X_3 + K$$

Substituting numerical values for the r's and putting σ1 = σ2 = σ3, we have, to one decimal,

$$X_1 = .8X_2 + .6X_3 + K$$

This result may be interpreted to mean that in so far as the two factors, general intelligence and number of hours spent on the average in study per week, "enter into" the ability to get honor points, they contribute with the relative weight of .8 : .6, or 4 : 3. It must be clearly understood that this ratio refers to the relative contribution of the two factors themselves to the final result, and not to the relative weights of their scores. The weight to be assigned each score is found from the regular regression equation given on page 229.
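The special-form weights can be checked numerically. This is a minimal sketch, assuming the three-variable values quoted in this chapter (r12 = .60, r13 = .32, r23 = -.35). With σ1 = σ2 = σ3 the sigma factors in the expanded equation cancel, and substituting formula (49) for r12.3 and r13.2 leaves β2 = (r12 - r13 r23)/(1 - r23²) and β3 = (r13 - r12 r23)/(1 - r23²). The last step uses the standard identity R² = β2 r12 + β3 r13, which is not stated in the text, to recover R1(23) from the same weights; the function name `special_weights` is made up for the example.

```python
from math import sqrt

def special_weights(r12, r13, r23):
    """Weights of X2 and X3 in the special form (sigma1 = sigma2 = sigma3).

    Equal sigmas cancel the sigma ratios and the sqrt(1 - r12.3**2),
    sqrt(1 - r13.2**2) factors; substituting formula (49) for r12.3 and
    r13.2 then leaves the two expressions returned below.
    """
    d = 1 - r23 ** 2
    beta2 = (r12 - r13 * r23) / d
    beta3 = (r13 - r12 * r23) / d
    return beta2, beta3

# r12 = .60, r13 = .32 and r23 = -.35 are the values quoted in this chapter.
beta2, beta3 = special_weights(0.60, 0.32, -0.35)
R = sqrt(beta2 * 0.60 + beta3 * 0.32)   # R1(23) from R^2 = sum of beta * r1p
print(round(beta2, 2), round(beta3, 2), round(beta2 / beta3, 2), round(R, 3))
```

The unrounded weights are about .81 and .60; rounded to one decimal they give the .8 : .6 of the text, and their ratio is close to 4 : 3.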
It is of considerable interest, however, to note that while the weights given the scores on the general intelligence test and number of study hours are as 1 : 2, the actual contribution of these two factors to honor points (allowing for differences in units, variability, etc.) is as 4 : 3. Intelligence, therefore, as we should expect, has more weight than hours spent in study in determining the hypothetical ability which we have called "academic success." Much of the weight which study hours has is due to its relatively high negative correlation (-.35) with intelligence.

In concluding this discussion of partial and multiple correlation, certain limitations to the use of the method should be pointed out. In the first place, in order that partial coefficients of correlation be valid, it is necessary that all of the zero order coefficients be computed from data in which the regression is linear. Before calculating any partial r's, we should make sure that all zero order r's have linear regression; if there is any doubt as to linearity, the tests given on page 209 should be employed. In the second place, the number of cases must be large, especially if there are a number of variables; otherwise partial and multiple coefficients will have little significance. Coefficients which are misleadingly high may be obtained when studies which involve many variables are based upon relatively few cases. When the limitations and conditions mentioned are fully recognized and met, however, partial and multiple correlation furnishes us with an exact and powerful instrument for the analysis of problems which arise in mental and social measurements.

VI. Spurious Correlation¹

The correlation between two sets of test scores is said to be "spurious" when it is due in whole or in part to factors other than those which determine performance in the tests themselves.
In general, the cause of spurious correlation may be said to lie in a failure to control conditions; and the most usual effect of this lack of control is a "boosting" or dilation of the coefficient. Some of the more general situations which may lead to spurious correlation are given under the following heads:

¹ See also Chap. IV, p. 211.

1. Spurious Correlation Due to the Heterogeneity of Material

We have already found occasion to show elsewhere (page 221) how a lack of uniformity in age conditions will lead to correlation which is too high, i.e., is spurious. Differences in age within the group will lead to a distinctly higher correlation between two tests (when the test scores increase with age) than the correlation which we should obtain in a single age (a homogeneous) group. To cite a simple case, in a group of boys from 10 to 18 years old, a substantial correlation will appear between strength of grip and length of forearm, quite apart from any real relation, due solely to the fact that both of these physical attributes increase with age. Failure to take account of the age factor is a prolific source of error in correlational work. In stating the correlation between two tests, or the reliability coefficient of a test, we should always be careful to specify the range of ages, grades, etc., in order to show the heterogeneity of the group. Without this information an r per se is practically valueless.

Many other factors besides age may lead to spurious correlation. To cite a familiar example:¹ if alcoholism, degeneracy, and bad heredity are all positively related, the r between alcoholism and degeneracy will be too high (due to the indirect effect of heredity on both factors) unless the heredity influences are kept constant.
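The age effect can be imitated with a toy data set. In this sketch the numbers are hypothetical (they are not grip or forearm measurements), age is reduced to a 0/1 code for a younger and an older group, and `pearson` is a helper written for the example. Within each group the two measures are uncorrelated, yet pooling the groups produces a high r; holding the group code constant with the first-order partial formula (49) removes it.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment r computed from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical (X, Y) scores; within each age group X and Y are uncorrelated,
# but both measures are higher in the older group.
younger = [(0, 0), (2, 0), (0, 2), (2, 2)]
older = [(10, 10), (12, 10), (10, 12), (12, 12)]
x = [p[0] for p in younger + older]
y = [p[1] for p in younger + older]
age = [0] * len(younger) + [1] * len(older)   # 0 = younger, 1 = older

r_within = pearson([p[0] for p in younger], [p[1] for p in younger])
r_pooled = pearson(x, y)                      # heterogeneous (combined) group
r_xa, r_ya = pearson(x, age), pearson(y, age)
# Formula (49) with age held constant
r_partial = (r_pooled - r_xa * r_ya) / sqrt((1 - r_xa ** 2) * (1 - r_ya ** 2))
print(round(r_within, 3), round(r_pooled, 3), round(r_partial, 3))
```

With these numbers the pooled r is 200/208, about .96, while the partial r is zero apart from rounding error.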
Again, to take another example, suppose that we have found the scores on a general intelligence examination and a cancellation test for two distinctly different groups, e.g., 500 college seniors and 500 day laborers, and that the average ability in both tests is definitely higher in the college group. Now if the correlation between these tests is zero in each group taken separately, a positive correlation will be obtained when the two groups are combined, due simply to the heterogeneity of the composite group.² Such a correlation is, of course, spurious. To be valid, it is clear that a correlation must be freed of extraneous influences which affect the homogeneity of the material. When such influences cannot be determined quantitatively, this is far from an easy task. Provided, however, that the factor or factors producing heterogeneity are measurable, their influence may usually be allowed for by the method of partial correlation.

¹ Kelley, T. L., Tables to Facilitate the Calculation of Partial Coefficients of Correlation and Regression Equations, Bull. Univ. Texas, 1916, No. 27.
² Otis, A. S., Statistical Method in Educational Measurement, 1925, pp. 334-336.

2. Spurious Index Correlation

It can be shown¹ that three variables X1, X2, and X3 may be totally uncorrelated, and still a correlation between Z1 = X1/X3 and Z2 = X2/X3 may be obtained which is as large as .50. To take a concrete case, if two individuals observe a series of magnitudes (e.g., Galton Bar settings) independently, the absolute errors of observation (X1 and X2) may be uncorrelated, and still a distinct correlation appear between the errors made by the two observers when these are expressed as per cents of the magnitude observed (X3). The spurious element here is, of course, the common factor, X3, in the denominator of the ratios.
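A quick simulation shows the index effect. This is a sketch under arbitrary assumptions (normal distributions with mean 10 and σ 1, 20,000 cases, a fixed seed); `pearson` is a helper written for the example. Because X1, X2, and X3 are drawn independently with equal coefficients of variation, their own intercorrelations come out near zero, while the correlation of Z1 = X1/X3 with Z2 = X2/X3 should come out close to .50.

```python
import random
from math import sqrt

def pearson(x, y):
    """Product-moment r computed from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

random.seed(1926)   # fixed seed so that repeated runs agree
n = 20000
# X1, X2, X3 are independent, each with mean 10 and sigma 1, so their
# coefficients of variation are equal and their true r's are zero.
x1 = [random.gauss(10, 1) for _ in range(n)]
x2 = [random.gauss(10, 1) for _ in range(n)]
x3 = [random.gauss(10, 1) for _ in range(n)]
z1 = [a / c for a, c in zip(x1, x3)]   # Z1 = X1 / X3
z2 = [b / c for b, c in zip(x2, x3)]   # Z2 = X2 / X3

r_raw = pearson(x1, x2)      # near zero
r_index = pearson(z1, z2)    # inflated by the common denominator X3
print("r(X1, X2) =", round(r_raw, 3))
print("r(Z1, Z2) =", round(r_index, 3))
```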
¹ Yule, G. U., An Introduction to the Theory of Statistics, pp. 215-216.

One of the commonest examples of spurious index correlation in psychology is found in the correlation of IQ's obtained from two different intelligence tests. If the IQ's of 500 children ranging in age from 3 to 14 years are calculated from two tests, $X_1$ and $X_2$, the correlation between $IQ_{X_1}$ and $IQ_{X_2}$ will be considerably increased because chronological age, $X_3$, is a common factor in both series (since $IQ = MA/CA$). The spurious element here may be eliminated by holding the common factor of age constant through partial correlation.

3. Spurious Correlation of a Single Test with a Composite of Which It Is a Member

If the scores on several tests, $X_1$, $X_2$, $X_3$, etc., are averaged or added, and the composite scores, $X_{\text{com.}}$, are correlated with the scores on any single test, say $X_1$, the resulting correlation will be too high (spurious) because $X_1$ is itself part of the composite.

The amount of the spurious element is measured by the ratio $t/s$, in which $t$ = the number of elements in the single test and $s$ = the number of elements in the composite¹ (see page 293). To illustrate: there are 20 items in the Number Series Completion Test of the Army Alpha, and 212 items in the whole test. Even if there were no correlation at all between the scores on Alpha and on Completion, there would still be a spurious correlation between the two tests equal to the ratio of the number of items in Completion to the total number of items in Alpha, i.e., 20/212, or .094. A correlation obtained between Completion and Alpha, therefore, will be too high, owing simply to the inclusion of the Completion items in both sets of data.
¹ Musselman, J. R., Spurious Correlation Applied to Urn Schemata, Journal of the American Statistical Association, Vol. XVIII, Sept., 1923.

It should be noted that when several tests are all of the same (or approximately the same) length, the amount of spurious correlation which results from correlating any single test with a composite of them all is approximately constant ($t/s$ is the same for each test). For this reason it is valid to compare the correlations of the separate tests with the composite in order to discover which tests are most representative of the capacity measured by them all (see page 267).

VII. Summary of Formulas in Chapter V

1. Partial r's:
$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{\sqrt{1-r^2_{1n.34\ldots(n-1)}}\,\sqrt{1-r^2_{2n.34\ldots(n-1)}}} \qquad (49)$$

2. Partial σ's:
$$\sigma_{1.234\ldots n} = \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2}\,\sqrt{1-r_{14.23}^2}\cdots\sqrt{1-r^2_{1n.23\ldots(n-1)}} \qquad (50)$$

3. Regression equation, deviation form:
$$x_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n \qquad (51)$$

4. Regression equation, score form:
$$X_1 = b_{12.34\ldots n}\,X_2 + b_{13.24\ldots n}\,X_3 + \cdots + b_{1n.23\ldots(n-1)}\,X_n + K \qquad (52)$$

5. Regression coefficients:
$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.234\ldots n}}{\sigma_{2.134\ldots n}} \qquad (53)$$

6. Standard error of estimate:
$$\sigma_{(\text{est. }X_1)} = \sigma_{1.234\ldots n} \qquad (54)$$

7. Probable error of estimate:
$$PE_{(\text{est. }X_1)} = .6745\,\sigma_{(\text{est. }X_1)} \qquad (55)$$

8. Multiple coefficient of correlation:
$$R_{1(23\ldots n)} = \sqrt{1 - \frac{\sigma^2_{1.23\ldots n}}{\sigma_1^2}} \qquad (56)$$

9. Formula for "chance" R: [formula illegible in source] (57)

10. Alternate formula for R:
$$R_{1(234\ldots n)} = \sqrt{1 - \left[(1-r_{12}^2)(1-r_{13.2}^2)\cdots(1-r^2_{1n.23\ldots(n-1)})\right]} \qquad (58)$$

PROBLEMS

1. The r for intelligence and school achievement in a group of children 8 to 14 years old is .80. The r for intelligence and age in the same group is .70. The r for school achievement and age is .60. What will be the correlation between intelligence and school achievement in children of the same age?

2. The correlation between (1) Army Alpha and (2) Cancellation in a group of 100 freshmen is .20. The correlation between (1) Army Alpha and (3) Controlled Association in the same group is .70. The correlation between (2) Cancellation and (3) Controlled Association is .45. What is the net correlation between Alpha and Cancellation in this group? Between Alpha and Controlled Association? How do you interpret your results?

3. Given the following data:¹
$X_1$ = high school grade in mathematics
$X_2$ = grade in an English interest test
$X_3$ = grade in a history interest test
$X_4$ = grade in a mathematics interest test

$\sigma_1 = 4.93$    $r_{12} = .20$    $r_{23} = .63$
$\sigma_2 = 3.13$    $r_{13} = .15$    $r_{24} = .21$
$\sigma_3 = 6.12$    $r_{14} = .24$    $r_{34} = .54$
$\sigma_4 = 4.64$

(a) Work out the regression equation of $X_1$ on $X_2$, $X_3$ and $X_4$.
(b) What are the relative weights of the three tests, $X_2$, $X_3$ and $X_4$, in determining the score on $X_1$?

4. The following records were secured from 450 Liberal Arts freshmen at Syracuse University:²
Variables: (1) honor points; (2) intelligence; (3) average high school grades; (4) units offered for entrance; (5) hours of study per week.

Means: $M_1 = 18.5$, $M_2 = 100.6$, $M_3 = 79$, $M_4 = 16.1$, $M_5 = 24$
σ's: $\sigma_1 = 11.2$, $\sigma_2 = 15.8$, $\sigma_3 = 7.5$, $\sigma_4 = 1.5$, $\sigma_5 = 6$
r's: $r_{12} = .60$, $r_{13} = .40$, $r_{14} = .22$, $r_{15} = .32$, $r_{23} = .36$, $r_{24} = .20$, $r_{25} = -.35$, $r_{34} = .40$, $r_{35} = .11$, $r_{45} = .25$

(a) Work out a regression equation with (1), honor points, as the dependent variable.
(b) If a student has an intelligence score of 110 and a high school average of 75, offers 15 units for entrance, and studies on the average 25 hours per week, what is his most probable number of honor points?

5. Using as much of the data in Example 4 as is necessary, find how many hours a student must study if he has an intelligence score of 120 and wants to make 20 honor points. (Hint: work out the regression equation of study hours on honor points and intelligence, and substitute the given values in the equation.)

6. Let $X_1$ be a criterion, and $X_2$ and $X_3$ two other tests. Correlations and σ's are as follows:
$r_{12} = .60$, $r_{13} = .50$, $r_{23} = .20$; $\sigma_1 = 5.00$, $\sigma_2 = 10.00$, $\sigma_3 = 8.00$.
How much more accurately can $X_1$ be predicted from $X_2$ and $X_3$ in combination than from either alone?

7. Given a team of two tests, each of which correlates .50 with a criterion. If the correlation of the two tests is .20, (a) how much would the addition of another test which correlates .50 with the criterion and .20 with each of the other tests improve the predictive value of the team? (b) How much would the addition of two such tests improve the predictive value of the team?

8. Two absolutely independent measures B and C completely determine a third measure A. If B correlates .50 with A, what is the correlation of C and A?

9. Using the data given in Example 1 above, analyze school achievement in terms of intelligence and age. What is the relative importance of the contribution made by these factors?

10. A group test contains 10 tests with a total of 200 items. One of the tests correlates .60 with the composite scores on the battery. If this test contains 15 items, how much of the given correlation is spurious?

¹ Kelley, T. L., Educational Guidance, Teachers College Contributions to Education, 1914, 71, p. 104.
² May, Mark A., Predicting Academic Success, Journal of Educational Psychology, 1923, Vol. XIV, 7, pp. 429-440.

Answers

1. r = .67.
2. The r between Alpha and Cancellation is −.18; between Alpha and Controlled Association, .70.
3. (a) $x_1 = .37x_2 - .11x_3 + .28x_4$. (b) Grade in mathematics = 6.5 (grade in English interest test) − 2 (grade in history interest test) + 5 (grade in mathematics interest test).
4. (a) $X_1 = .58X_2 + .14X_3 - 1.03X_4 + 1.10X_5 - 62$. (b) 24, with a $PE_{(\text{est. }X_1)}$ of 4 points.
5. 18 hours, with a $PE_{(\text{est. }X_5)}$ of 2.7 hours: 18 ± 2.7.
6. From $X_2$ alone, $\sigma_{(\text{est. }X_1)} = 4$. From $X_3$ alone, $\sigma_{(\text{est. }X_1)} = 4.3$. From $X_2$ and $X_3$, $\sigma_{(\text{est. }X_1)} = 3.5$.
7. (a) R increases from .64 to .73.
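The formulas summarized above can be checked against several of the answers with a short sketch. `partial_r` implements formula (49) for the first-order case only (one variable held constant); `multiple_R` is a hypothetical helper that finds R from a correlation matrix by solving the normal equations with Gauss-Jordan elimination, a standard technique that the chapter itself does not use. Both names are illustrative, not from the text.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """First-order partial r_{12.3}: formula (49) with one variable held constant."""
    return (r12 - r13 * r23) / (sqrt(1 - r13 ** 2) * sqrt(1 - r23 ** 2))


def multiple_R(r_cp, r_pp):
    """Multiple R of a criterion on k predictors.

    r_cp: correlations of the criterion with each predictor.
    r_pp: k x k intercorrelations of the predictors (1.0 on the diagonal).
    Solves r_pp * beta = r_cp by Gauss-Jordan elimination, then
    R^2 = sum(beta_i * r_ci).
    """
    k = len(r_cp)
    a = [list(row) + [r_cp[i]] for i, row in enumerate(r_pp)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(k):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [x - f * y for x, y in zip(a[i], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sqrt(sum(b * r for b, r in zip(beta, r_cp)))


def team_R(k, validity=0.50, intercorrelation=0.20):
    """R for k tests that each correlate `validity` with the criterion."""
    r_pp = [[1.0 if i == j else intercorrelation for j in range(k)] for i in range(k)]
    return multiple_R([validity] * k, r_pp)


# Problem 1: intelligence (1), achievement (2), age (3)
p1 = partial_r(0.80, 0.70, 0.60)
# Problem 2: Alpha (1), Cancellation (2), Controlled Association (3)
p2_alpha_canc = partial_r(0.20, 0.70, 0.45)
p2_alpha_assoc = partial_r(0.70, 0.20, 0.45)
# Problem 6: sigma of estimate of X1 from X2 and X3 together
R6 = multiple_R([0.60, 0.50], [[1.0, 0.20], [0.20, 1.0]])
sigma_est6 = 5.00 * sqrt(1 - R6 ** 2)
# Formula (58) should give the same R: R^2 = 1 - (1 - r12^2)(1 - r13.2^2)
R6_alt = sqrt(1 - (1 - 0.60 ** 2) * (1 - partial_r(0.50, 0.60, 0.20) ** 2))

print(round(p1, 2), round(p2_alpha_canc, 2), round(p2_alpha_assoc, 2))
print(round(sigma_est6, 2), round(R6, 4), round(R6_alt, 4))
print([round(team_R(k), 3) for k in (2, 3, 4)])
```

For Problem 7 the sketch gives R of about .645, .732 and .791 for teams of two, three and four tests; the answer key rounds the first of these to .64.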
(b) R increases from .64 to .79.
8. $r_{AC} = .866$.
9. Intelligence and age contribute in the ratio (approximately) of 10 : 1.
10. .075.

CHAPTER VI

SOME APPLICATIONS OF STATISTICAL METHOD AND TECHNIQUE TO TESTS AND TEST RESULTS

To treat properly all of the statistical methods which may be applied to tests would require not a single chapter but a volume in itself. The aim of the present chapter, therefore, is to consider only those methods (having to do largely with correlation and reliability) which are deemed essential (1) in the treatment of ordinary problems involving tests and (2) as a foundation for more advanced work in methods of treating test results.

I. The Validity of Test Scores

The validity of any measuring instrument depends on the fidelity with which it measures whatever it purports to measure. A yardstick is "valid" when measurements made by it can be checked by other measuring instruments. In like manner, a test is valid when the capacity which it measures corresponds to the same capacity as otherwise objectively measured and defined.

1. Validity Determined through Correlation with a Criterion

The validity of a test is usually determined by finding the correlation between the test and some independent criterion. A criterion is defined as that measure in terms of which the value of a test is estimated or judged. The criterion of a general intelligence test, for example, may be school marks, or ratings for intelligence, or some other test believed to be valid.¹ The criterion for a trade test is actual ability in the trade. A high correlation between a test and its criterion may be taken as evidence of validity, provided both the test and the criterion are reliable.

¹ Stanford-Binet is often taken as a reliable criterion of general intelligence. For example, see Herring Revision of Binet-Simon tests.
Before accepting criterion correlations as final, however, we must know the reliability of our test and, if possible, the reliability of our criterion as well.¹

¹ Kelley, T. L., The Reliability of Test Scores, Journal of Educational Research, 1921, Vol. III, 5, p. 370.

2. Indirect Measures of Validity

When a reliable criterion is not available, indirect methods must be employed to determine validity. One indirect method is to combine the scores on a number of tests of the same general function and to judge as best (most valid for the function) that test which correlates highest with the average of all. Thus Whitley² found the following correlations for three discrimination tests, Naming Colors, Naming Forms and Naming Objects:³

Average of all three tests with:
    Naming Colors     r = .67
    Naming Forms      r = .99
    Naming Objects    r = .96

She concludes that "Naming Forms seems more a typical test in so far as it measures an ability common to these three tests." In the absence of an independent measure of the function, the average of several tests of that function may be taken as one criterion.

² Tests for Individual Differences, Archives of Psychology, 1911, 19, p. 78.
³ The "spurious" element here is constant provided the tests are all of practically the same length (see page 261).

A second indirect method of measuring validity is to find the correlations between the given test and other tests, in this way discovering something of what the test does, and does not, measure. For example, tests of Controlled Association (e.g., Opposites, Logical Relations, etc.) correlate much more highly with tests of general intelligence and "reasoning" than with tests of Cancellation or Color-Naming. The first group of tests is, therefore, a better (more valid) measure of the capacity measured by the general intelligence and reasoning tests than the second group.
(Indirect measures of this sort are advisable only in the absence of more direct and valid criteria.)

The absence of valid criteria for many of his tests forces the careful psychologist to define tests strictly in terms of what they actually do. Hence the tendency of present-day testers is to call a test by some descriptive name rather than in terms of some more or less well-defined "mental function." Accordingly, we have Opposites Tests and Completion Tests rather than tests of Association or Reasoning.

II. The Reliability of Test Scores

1. The Reliability of a Test as Measured by Its Self-Correlation

A. The "Reliability Coefficient"

The reliability of a test (or of any measuring instrument) is determined by the consistency with which it measures the capacity of those taking it. If a group repeats a test and each individual in the group scores close to his first record, we regard the test as reliable. If, however, there are large positive and negative differences between the scores made by individuals on the first and second giving of the test over and above the practice effect,¹ and if such differences occur in a large number of cases, the test is obviously inconsistent and unreliable.

One method of measuring the reliability of a test is to correlate the scores made on the test by a given group with the scores made on the same or a duplicate test by the same group. This is the method of self-correlation, and the r so found is called the "reliability coefficient." When the reliability coefficient of a test is 1.00, the test is an absolutely accurate measure of whatever capacity it tests; when the reliability coefficient is .00, the test has no reliability at all. The lower the reliability coefficient, the less the reliability or consistency of the test as a measuring instrument.

¹ Practice, since it serves to increase all scores proportionally, does not affect self-correlation. It does, however, introduce a constant error.
How high should the self-correlation be in order to indicate satisfactory reliability? This is an important question, and its answer depends largely on the nature of the test and on the size and variability of the group for whom the test is intended. Most makers of general intelligence tests demand a reliability coefficient of at least .90 between duplicate forms of their tests for unselected groups of the same chronological age. To be a reliable measure of capacity, a mental or physical test should, generally speaking, have a reliability coefficient of at least .80. This minimum will vary with the group, however, as the reliability coefficient is considerably affected by the range of scores made on the test (see page 271). For this reason, in giving the reliability coefficient of a test, the size and variability of the group measured should always be stated.

B. Effect on Reliability of Lengthening or Repeating the Test

If the self-correlation of a test is unsatisfactory, two courses are open: (1) we can lengthen the test until the reliability is greater; or (2) we can give the test and its duplicate twice each, average the two series of scores, and correlate these averages. If after (2) the reliability coefficient is still too low, we can repeat the test and its duplicate three, four, or as many times as necessary to secure the desired reliability coefficient.

To do either (1) or (2) empirically would require a considerable amount of time and labor; fortunately, a good estimate of the effect of (1) or (2) may be secured quickly by applying Spearman's (sometimes called Brown's¹) "prophecy" formula:

$$r_x = \frac{Nr}{1 + (N-1)r} \qquad (59)$$

in which $r$ is the self-correlation of the original test, $N$ the number of times the test is lengthened (or repeated), and $r_x$ the reliability coefficient of the lengthened test.

¹ Brown, Wm., The Essentials of Mental Measurement, 1911, p. 102.

To illustrate the application of this formula, suppose (a) that the self-correlation of a test is .70 and that we wish to know what effect doubling the length of the test will have on its reliability. Substituting $r = .70$ and $N = 2$ in the formula and solving for $r_x$, we have

$$r_x = \frac{2 \times .70}{1 + .70} = \frac{1.40}{1.70} = .82$$

Doubling the test's length, therefore, increases the self-correlation from .70 to .82. Instead of doubling the length of the test, we may give it and its duplicate twice each, average the two scores made by each individual in the two series, and correlate these averages. The result will be the same (as far as purely statistical factors are concerned) as that obtained by doubling the length of the test.

The "prophecy" formula may be used in another way. Suppose (b) that the self-correlation of a test, or the correlation of the test with its duplicate, is .80. How much will the test have to be lengthened (or how many times repeated) in order to ensure a reliability coefficient ($r_x$) of .95? Substituting $r = .80$ and $r_x = .95$ in the formula and solving for $N$,

$$.95 = \frac{.80N}{1 + .80N - .80} = \frac{.80N}{.20 + .80N}$$
$$.19 + .76N = .80N, \qquad .04N = .19, \qquad N = 4.75, \text{ or } 5 \text{ in whole numbers.}$$

The test must be made 5 times its present length, or repeated (together with its duplicate) 5 times, in order to raise the self-correlation from .80 to .95.

When a test is increased in length, e.g., doubled or tripled, the items or questions added must always be equal in reliability to those of the original test if the results from the prophecy formula are to be valid. Provided this condition is satisfied, it follows that if we increased the length of a test indefinitely we could, theoretically, raise its self-correlation to any desired figure. This seems scarcely reasonable, however, and there is evidence to indicate that while the reliability coefficient increases according to the formula for the first four or five pooled tests, thereafter it increases "more slowly than the prediction formula would lead us to expect."¹

¹ Holzinger, Karl J., Note on the Use of Spearman's Prophecy Formula for Reliability, Journal of Educational Psychology, 1923, Vol. XIV, 5, pp. 301-305.

C. Coefficient of Reliability from One Application of a Test

If a test has no duplicate and cannot well be repeated, we may measure the reliability of half of the test and then find the reliability of the whole test by Spearman's formula. The procedure is as follows. First, we make up two independent sets of scores by combining, say, alternate exercises in the test. For example, one set of scores may be the performance on the odd exercises (1, 3, 5, etc.) and the other set the performance on the even exercises (2, 4, 6, etc.); or some other plan may be used.² These two sets of scores are then correlated to find the reliability coefficient of the half test. If the self-correlation of the half test so found is called $r_h$, substituting $N = 2$ in Spearman's formula, we can calculate the reliability of the whole test by the formula

$$r_x = \frac{2r_h}{1 + r_h} \qquad (60)$$

In using this formula we assume that the halves of the test as we have made them up are approximately equivalent in difficulty and content.

² Ruch, G. M., and Del Manzo, M. C., The Downey Will-Temperament Individual Test: Analysis of Its Reliability and Validity, Journal of Applied Psychology, 1923, Vol. VII, 1, p. 65.

D. Dependence of the Reliability Coefficient on the Size and Variability of the Group

The coefficient of reliability obtained from a test and its duplicate given to the pupils of a single grade cannot be taken as indicating the same degree of reliability as the identical coefficient obtained from a group composed of pupils spread over several grades. This is because the heterogeneity (the size and spread) of the two groups is different.
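Formulas (59) and (60) translate directly into code. The sketch below reproduces illustrations (a) and (b) and applies the odd-even split to a small set of hypothetical right/wrong item records, invented only to show the mechanics; the function names are likewise illustrative. `times_lengthened` is formula (59) solved algebraically for $N$, which gives $N = r_x(1 - r) / [r(1 - r_x)]$.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def prophecy(r, n):
    """Spearman-Brown 'prophecy' formula (59): reliability of a test n times as long."""
    return n * r / (1 + (n - 1) * r)


def times_lengthened(r, r_target):
    """Formula (59) solved for N: how many times the test must be lengthened."""
    return r_target * (1 - r) / (r * (1 - r_target))


def split_half(odd_scores, even_scores):
    """Formula (60): correlate the two half-test scores, then step up with N = 2."""
    r_h = pearson_r(odd_scores, even_scores)
    return r_h, 2 * r_h / (1 + r_h)


# Illustration (a): doubling a test whose self-correlation is .70
doubled = prophecy(0.70, 2)
# Illustration (b): raising a self-correlation of .80 to .95
n_needed = times_lengthened(0.80, 0.95)
n_whole = math.ceil(n_needed)

# Hypothetical 1/0 item records (rows = pupils, columns = exercises 1 to 8),
# made up only to show the odd-even split.
items = [
    [1, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 0, 1],
]
odd = [sum(row[0::2]) for row in items]   # exercises 1, 3, 5, 7
even = [sum(row[1::2]) for row in items]  # exercises 2, 4, 6, 8
r_half, r_whole = split_half(odd, even)

print(f"doubled: {doubled:.2f}; N = {n_needed:.2f} -> {n_whole}")
print(f"half-test r = {r_half:.2f}; whole-test r = {r_whole:.2f}")
```

Rounding $N$ up rather than to the nearest whole number is a design choice: it guarantees that the lengthened test reaches at least the target reliability.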
Recently Kelley¹ has devised a formula from which, knowing the reliability coefficient of a test in, say, a group composed of pupils from a single grade, we can determine what the reliability coefficient of the same test must be in a group composed of pupils from several grades in order that the test be equally effective in both ranges. The formula is

$$\frac{\sigma}{\Sigma} = \frac{\sqrt{1-R}}{\sqrt{1-r}} \qquad (61)$$

in which $\sigma$ and $\Sigma$ are the σ's of the scores in the small and large groups, respectively, and $r$ and $R$ are the reliability coefficients of the test in the small and large groups.

To illustrate, suppose that in a single grade $r = .50$ and $\sigma = 5.00$, and that in a large group made up of children from grades 3 to 8 inclusive, $\Sigma = 15$. What $R$ (i.e., reliability coefficient) must the test yield in the large group in order to be as effective there as in the small group? Substituting for $\sigma$, $\Sigma$ and $r$ in the formula, $\sqrt{1-R} = \frac{5}{15}\sqrt{1-.50} = .236$, so that $1 - R = .056$ and $R = .94$. This means that a reliability coefficient of .50 in the small group indicates the same degree of reliability as a coefficient of .94 in a group in which the range of "talent" is three times as great.

This formula may be used to determine whether a test is equally effective in part of the range ($\sigma$) as in the whole range ($\Sigma$), or in one range as in another. It also makes clear the necessity of always giving the size and spread of the group in stating and interpreting reliability coefficients.²

¹ Kelley, T. L., The Reliability of Test Scores, Journal of Educational Research, 1921, Vol. III, 5, pp. 370-379.
² Otis, A. S., Statistical Method in Educational Measurement, 1925, pp. 253-254.

2. The Index of Reliability

By an individual's "true" score in a test is meant the average of a very large number of measurements made of that individual on the same or duplicate tests under precisely the same conditions.
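Solving formula (61) for $R$ gives $R = 1 - (\sigma/\Sigma)^2(1 - r)$. The hypothetical helper below applies it to the example in the text and then runs it in the opposite direction (from the wide range back to a single grade), which shows that the relation is symmetrical.

```python
def equivalent_reliability(r, sigma_from, sigma_to):
    """Formula (61), sigma * sqrt(1 - r) = Sigma * sqrt(1 - R), solved for the
    reliability the test must show in a group whose sigma is sigma_to."""
    return 1 - (sigma_from / sigma_to) ** 2 * (1 - r)


R = equivalent_reliability(0.50, 5.0, 15.0)    # single grade -> grades 3 to 8
back = equivalent_reliability(R, 15.0, 5.0)    # wide range -> single grade
print(f"R = {R:.2f}; back to the single grade: r = {back:.2f}")
```

Because the σ ratio enters squared, tripling the spread of the group multiplies $1 - r$ by one ninth, which is why a modest coefficient in one grade corresponds to a high one over several grades.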
It has been shown¹ that the correlation between a series of obtained scores and their corresponding "true" scores may be found from the formula

$$r_{\text{obt. true}} = \sqrt{r_{12}} \qquad (62)$$

in which $r_{12}$ is the self-correlation, or reliability coefficient, obtained from duplicate forms of the test. Given the reliability coefficient, therefore, it is possible to find the coefficient of correlation between a set of obtained scores and their corresponding true scores. This coefficient, $r_{\text{obt. true}}$, is called the "index of reliability," and is the maximum value which the reliability coefficient, $r_{12}$, can take. This will be seen to follow from the fact that "the highest possible correlation which can be obtained (except as chance might occasionally lead to higher spurious correlation) between a test and a second measure is with that which truly represents what the test actually measures, that is, the correlation between the test and the true scores of individuals in just such tests."² Since $r_{12}$ is usually less than 1.00, $r_{\text{obt. true}}$ is nearly always greater than $r_{12}$.

To illustrate the index of reliability, suppose that for a given group $r_{12} = .64$. Then $r_{\text{obt. true}} = \sqrt{.64} = .80$, and .80 is the highest self-correlation which can be obtained (except by chance) with this test in its present form. The index of reliability is a useful and easily interpreted measure of a test's reliability, since by simply extracting the square root of an obtained reliability coefficient we can find the maximum reliability which the test is capable of yielding. Thus, if $r_{12} = .25$, so that $r_{\text{obt. true}} = \sqrt{.25} = .50$, it is obviously a waste of time to continue using the test without lengthening or otherwise improving it.

¹ Kelley, T. L., A Simplified Method of Using Scaled Data for Purposes of Testing, School and Society, 1916, Vol. IV, 34, 71.
² Kelley, T. L., The Reliability of Test Scores, Journal of Educational Research, 1921, Vol. III, 5, 327.

3. The Standard Error and Probable Error of Measurement, $\sigma_{(M)}$ and $PE_{(M)}$

We have seen that the reliability of a test may be measured in terms of (1) its reliability coefficient and (2) its index of reliability. Still another way of measuring the reliability of a test is to determine how closely a score obtained on the test approximates its corresponding true score. (True scores have been defined on page 272.) An obtained score will usually differ in some degree from its corresponding true score because of two sorts of errors: constant errors and variable errors. Constant errors, since their weight is all in one direction, do not affect self-correlation, and can usually be ruled out or their influence measured. Variable errors, however, since they may be either positive or negative, are less easily eliminated than constant errors, and hence are more effective in producing departures of obtained scores from the corresponding true scores.

The measurement of the influence of variable errors, therefore, becomes a matter of considerable importance. It may be done by calculating the standard error of measurement, written $\sigma_{(M)}$, which may be interpreted as a measure of the amount of variable error, or as a measure of the probable divergence of obtained scores from true scores after the elimination of constant errors. $\sigma_{(M)}$ is derived directly from $\sigma_{(\text{est.})}$ as follows. In the equation

$$\sigma_{(\text{est. }1)} = \sigma_1\sqrt{1 - r_{12}^2}$$

(see formula 32), if $\sigma_1$ is the σ of the scores in test 1 and $r_{12}$ is the correlation between tests 1 and 2, then $\sigma_{(\text{est. }1)}$ measures the accuracy with which individual scores in test 1 may be estimated from a knowledge of the corresponding scores in test 2. Now if the scores on test 2 are taken to represent true scores, and the scores on test 1 obtained scores on the same test, the equation may be written

$$\sigma_{(\text{est. obt.})} = \sigma_{\text{obt.}}\sqrt{1 - r^2_{\text{obt. true}}}$$

But $r_{\text{obt. true}} = \sqrt{r_{12}}$, so that $r^2_{\text{obt. true}} = r_{12}$, the reliability coefficient.
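The relation $r_{\text{obt. true}} = \sqrt{r_{12}}$ can be checked by simulation if one adopts a model the text does not state explicitly: each obtained score is assumed to be a true score plus an independent, normally distributed variable error, with the same error spread on both duplicate forms and constant errors ignored. Under that assumption, and with the hypothetical spreads below, the two forms should correlate about .64 and each form should correlate about .80 with the true scores.

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(273)
n = 20_000
error_sd = 0.75  # hypothetical; gives r12 = 1 / (1 + 0.75**2) = .64 in theory

# Assumed model: obtained score = true score + independent variable error.
true_scores = [random.gauss(0, 1) for _ in range(n)]
form_1 = [t + random.gauss(0, error_sd) for t in true_scores]
form_2 = [t + random.gauss(0, error_sd) for t in true_scores]

r12 = pearson_r(form_1, form_2)               # reliability coefficient
r_obt_true = pearson_r(form_1, true_scores)   # index of reliability
print(f"r12 = {r12:.3f}, sqrt(r12) = {r12 ** 0.5:.3f}, "
      f"r(obt, true) = {r_obt_true:.3f}")
```

In practice true scores are never observed, which is why formula (62) matters: it recovers the obtained-true correlation from the observable correlation of duplicate forms.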
Hence, substituting these values in the above equation, we have

$$\sigma_{(\text{est. obt.})} = \sigma_1\sqrt{1 - r_{12}}$$

or, writing $\sigma_{(M)}$ for $\sigma_{(\text{est. obt.})}$,

$$\sigma_{(M)} = \sigma_1\sqrt{1 - r_{12}} \qquad (63)$$

Formula (63) gives the standard error of measurement for a set of obtained scores. Given $r_{12}$, the reliability coefficient of the test, and $\sigma_1$, the σ of the test scores, we can measure from formula (63) the probable divergence of an obtained score from its corresponding true score. Instead of $\sigma_{(M)}$ we may find $PE_{(M)}$, which is probably more often used, by the formula

$$PE_{(M)} = .6745\,\sigma_1\sqrt{1 - r_{12}} \qquad (64)$$

To illustrate the use of these formulas, suppose that in a group of 100 college men we obtain an average Army Alpha score of 150 with a σ of 15.00 points, and that the self-correlation of Alpha (found by correlating two forms) is .90. What are $\sigma_{(M)}$ and $PE_{(M)}$? Applying formula (63), we have

$$\sigma_{(M)} = 15\sqrt{1 - .90} = 4.74$$

and from (64),

$$PE_{(M)} = .6745 \times 15\sqrt{1 - .90} = 3.20$$

From the $PE_{(M)}$ we may interpret this result to mean that the chances are even that the true score of any individual in the group of 100 falls within the range obtained score ± 3.20. For an obtained score of 175, the chances are even that the true score of this particular man lies between the limits 171.80 and 178.20. Expressed in another way, we may say that 50% of the obtained scores are in error (as compared with their true scores) by not more than ±3.20 points.

In the formulas for $\sigma_{(M)}$ and $PE_{(M)}$, the σ's of the test and its duplicate are assumed to be equal. If this is not at least approximately true, we must write these formulas as follows:

$$\sigma_{(M)} = \frac{\sigma_1 + \sigma_2}{2}\sqrt{1 - r_{12}} \qquad (65)$$

$$PE_{(M)} = .6745 \times \frac{\sigma_1 + \sigma_2}{2}\sqrt{1 - r_{12}} \qquad (66)$$

In the illustration above, if the σ's obtained from the first and second forms of Alpha had been 15 and 20, respectively, $\sigma_{(M)}$ and $PE_{(M)}$ would be

$$\sigma_{(M)} = \frac{15 + 20}{2}\sqrt{1 - .90} = 5.53 \quad\text{and}\quad PE_{(M)} = .6745 \times 5.53 = 3.73.$$

The student must be careful not to confuse the formulas for $\sigma_{(\text{est.})}$ and $PE_{(\text{est.})}$ with those for $\sigma_{(M)}$ and $PE_{(M)}$. The "estimate" formulas enable us to say with what degree of accuracy we can predict an individual's score on one test, knowing his score on a second (and usually a different) test. The actual prediction of the "most probable score" is made, of course, by means of the regression equation connecting the two tests. The $\sigma_{(M)}$ and $PE_{(M)}$ formulas, on the other hand, enable us to determine the probable divergence of an individual's obtained score from his corresponding true score when we know (1) the σ and (2) the reliability coefficient of the test.

When tests are scored in different units, the $\sigma_{(M)}$ of one cannot be directly compared with the $\sigma_{(M)}$ of the other. We cannot compare directly, for example, the reliability of a score made on a tapping test (scored in number of taps made in 30 sec.) with the reliability of a score on a logical memory test (scored in number of items remembered). A simple method of overcoming this difficulty is to use a ratio similar to the coefficient of variation, V, described in Chapter I. Thus the ratio $\sigma_{(M)}/M$ or $PE_{(M)}/M$ of one test, M being the mean of the test scores, may be compared directly with the corresponding ratio of the other. In this way, the reliability of obtained scores on one test may be compared with the reliability of obtained scores on another.

III. Combining the Scores from Different Tests

When a number of different tests have been given to the same individual, it is often desirable to combine the separate test scores into a composite score in order to express the individual's standing in the tests as a whole.
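Formulas (63) to (66) can be wrapped in two small functions. The sketch below (function names are hypothetical) reproduces the Army Alpha illustration, including the limits around an obtained score of 175 and the case of unequal σ's.

```python
from math import sqrt


def sigma_m(r12, sigma1, sigma2=None):
    """Standard error of measurement: formula (63), or (65) when the two
    forms have different sigmas (their average is then used)."""
    s = sigma1 if sigma2 is None else (sigma1 + sigma2) / 2
    return s * sqrt(1 - r12)


def pe_m(r12, sigma1, sigma2=None):
    """Probable error of measurement: formulas (64) and (66)."""
    return 0.6745 * sigma_m(r12, sigma1, sigma2)


s = sigma_m(0.90, 15)
pe = pe_m(0.90, 15)
obtained = 175
low, high = obtained - pe, obtained + pe  # even chances the true score lies here
s2 = sigma_m(0.90, 15, 20)
pe2 = pe_m(0.90, 15, 20)
print(f"sigma(M) = {s:.2f}, PE(M) = {pe:.2f}, limits {low:.2f} to {high:.2f}")
print(f"unequal sigmas: sigma(M) = {s2:.2f}, PE(M) = {pe2:.2f}")
```

It should print σ(M) = 4.74, PE(M) = 3.20, limits 171.80 to 178.20, and 5.53 and 3.73 for the unequal σ's, matching the worked figures.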
The simplest procedure is, of course, to average the scores as they stand. In merely averaging results, however, two difficulties arise. The first is the difference in the size and kind of units employed in the tests. Many tests are given by the Amount- Limit Method — the work is completed (or as much as possible done) and the individual's performance is scored in terms of the time required. Many other tests are given by the Time- Limit Method — the time is fixed, and the subject's score is the number of items completed or the number of questions answered in the time allowed. It is obvious that scores ob- tained from tests given by these two methods cannot be com- bined directly. A second difficulty is the question of the relative influence or "weight" to be given the different tests in the composite score. Simply to average the "raw" (obtained) scores gives us no control over the relative importance of the various tests in the final total score. For although it is often assumed that by simply averaging results we avoid the troublesome question of weighting, what we actually do in such cases is to weight quite drastically without knowing what the weights are. With these two difficulties in mind, let us examine several methods which have been proposed for combining separate test scores into a composite score. 278 STATISTICS IN PSYCHOLOGY AND EDUCATION 1. Combining Test Scores by Percentiles If the distribution of each of the separate tests which we have given is broken up into percentiles, it becomes an easy matter to combine the separate percentile rankings in the vari- ous tests, and thus secure a final percentile ranking for each individual. The method of calculating percentiles has already been considered (page 45). It is only necessary, then, to show how percentile rankings may be combined. TABLE XXIX Percentile Distributions for 9- Year Olds on Three Tests. Method of Combining the Percentile Ratings of a Single Individual Percentiles S's 5's Perc. 
Tests 10 20 30 40 50 60 70 80 90 100 Score Rank Picture Completion 62 240 297 325 372 407 440 450 499 577 646 445 65 Substitution 219 190 173 158 152 141 133 126 121 109 80 126 70 Sequin Form-Board.... 34 24 21 20 18 18 17 16 15 15 13 17 60 Median percentile •. . . . 65 Table XXIX gives the percentile tables for 9 year-olds on three tests of the Pintner-Patterson series of performance tests. The subject, a 9 year-old boy, made a score of 445 on Picture Completion which gave him a percentile ranking of 65 (midway between 60 and 70) on this test. On Substitution, a score of 126 gave him a percentile ranking of 70; and on the Sequin Form Board a score of 17 gave him a percentile ranking of 60. The median of these three percentile rankings is 65, which indicates that the subject is somewhat above the average for Ins age. If the subject had been, say, 10 or 11 years old, percentile tables for these age distributions would have been used. As is evident from Table XXIX the method of combining percentile rankings is simple and straightforward; it rules out the question of different units in the tests combined, and gives each test equal weight in the final score. STATISTICAL METHOD AND TEST RESULTS 279 2. Combining Test Scores by the Method of Median Mental Age When the subjects are children, and age-norms exist for the tests administered, it is a relatively easy matter to determine the MA of the subject in each test, and then find the median of these Mi's. The median MA is the " composite score." Tables giving the MA equivalents in scores for various tests have been published by many authors J and need not be reproduced here. The method of finding a median mental age for several tests is often very useful and its results are easily interpreted. The method does not, however, apply to normal adults. 3. 
3. Combining Tests Which Have Been Weighted According to the Variability of the Test Scores

When several tests have been given, all by the Time-Limit or all by the Amount-Limit Method, scores may be combined directly, the weight which each test score shall have in the composite score being determined in accordance with the variability of the test scores. An illustration will make the method clear. Suppose that in a given test in which the Average = 25 and σ = 5, subject A scores 20; and in another test in which the Average = 150 and σ = 15, A scores 160. Now if we simply add A's two scores, i.e., 20 + 160 to get 180, the score in the second test is given three times as much importance in this composite as the score in the first, since the spread, i.e., the σ, is three times greater in the second test. In order to give the two tests equal weight, we must equalize their spread or variability, and this can be done by multiplying the σ of the first test by 3 or dividing the σ of the second by 3. This same procedure must then be applied to the scores. By the first operation, our composite score becomes 20 × 3 + 160, or 220; by the second operation, the composite score becomes 20 + 160/3, or 73.33. In either composite both tests will now have equal weight.

¹ For example, see Whipple, Manual of Mental and Physical Tests, Vols. I and II, 1914; Pintner and Paterson, A Scale of Performance Tests, 1921; Pyle, W. H., The Examination of School Children, 1913.

TABLE XXX
How to Combine Scores Weighted According to Variability. Data from 200 College Women. (From Carothers, F. E., Psychological Examination of College Students, Archives of Psychology, 1921, pp. 30-34.)
Tests: 1 = Logical Memory (recall); 2 = Logical Memory (recognition); 3 = Completion; 4 = Information; 5 = Vocabulary.

Test                                    1       2       3       4        5
Average                                 6.50    37.47   35.78   104.71   73.90
σ                                       1.76    7.69    4.36    26.79    7.60
Multiplier to give all tests equal wt.  5       1       2       1/3      1
New σ                                   8.80    7.69    8.72    8.93     7.60
A's score                               5       35      30      100      75
A's weighted score, all tests equal     25      35      60      33.3     75      Total 228.3
A's weighted score, tests 1 and 3
  weighted 2, others 1                  50      35      120     33.3     75      Total 313.3

In order to illustrate this method of combining scores in more detail, the average and the σ for each of five tests are given in Table XXX, together with the scores of subject A on each test. If A's scores are added as they stand, test 4 (Information) will be given about 15 times the weight of test 1 (Logical Memory, recall) in the composite, since the σ for Information is about 15 times the σ for Logical Memory, recall (26.79/1.76 = 15.2). Likewise, Information will have approximately 6 times the weight of Completion and approximately 3½ times the weight of Logical Memory, recognition, and of Vocabulary. It seems hardly probable that Information is as much superior in value as this to the other tests (in fact, it is possibly one of the least important), and hence a new weighting is clearly necessary. The simplest plan at the start will be to weight all of the tests equally, as shown in the table. If we multiply the σ of test 1 by 5, the σ of test 2 by 1, the σ of test 3 by 2, the σ of test 4 by 1/3, and the σ of test 5 by 1, we make all of the σ's approximately equal. Now if we multiply A's scores by these same "multipliers," the new test scores will all have the same weight in the final composite. In determining multipliers, the best plan is to keep them whole numbers, if practicable, and as small as possible. In Table XXX, for example, the σ's of tests 2 and 5 have been taken as standards because this gives the simplest multipliers for the other tests. Suppose now that we had wished to give Logical Memory, recall, and Completion twice as much weight as the other tests in the composite.
To accomplish this we should simply have multiplied the σ's of tests 1 and 3 by 10 and 4 instead of 5 and 2, i.e., we should have multiplied by enough to make their new σ's twice as large as the σ's of the other tests. Of course, when all of the tests have already been weighted 1, we need only double the scores on tests 1 and 3.

To summarize the steps in the method:
(a) Find the average and the σ or Q of each test.
(b) If the tests are to have equal weight, multiply the σ or Q of each test by factors selected so as to make all of the new σ's or Q's equal. If some tests are to count more heavily than others, make their σ's or Q's proportionally larger.
(c) Multiply each S's score by the "multiplier" decided upon in (b), and add these new scores. Leave the result as a composite total, or average the new scores if there is some reason for working with smaller numbers.

4. Combining Test Scores by Converting the Scores of Different Tests into Comparable Series

As mentioned above, the chief difficulties in combining the scores of different tests arise from differences in the units in which the tests are scored, as well as from differences in variability among the tests themselves. We have already considered three ways of avoiding these difficulties. Still another method is to convert the scores of the different tests into comparable distributions, after which the test scores may be combined directly. Two methods of combining tests in this way have been proposed, both of which assume that the distributions of test scores are normal or approximately normal. The more recent, suggested by Professor Clark Hull,¹ is to convert the scores from each test into a "standard" normal distribution in which the scores shall range from 0 to 100, with a mean at 50 and a σ of 14. [Individual scores rarely spread more than ±3.5σ above or below the average; hence, since 50/3.5 ≈ 14, the σ of this distribution may be taken as 14.00.]
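As a check on the variability-weighting method of Section 3, steps (a) to (c) can be run on the figures of Table XXX. The sketch below is illustrative only: the multipliers are the hand-chosen ones of the text (5, 1, 2, 1/3, 1), and the helper name `composite` is ours. With exact arithmetic (100 × 1/3 = 33.3), the equal-weight composite comes to about 228.3, and doubling tests 1 and 3 gives about 313.3.

```python
# Sigmas and subject A's scores for the five tests of Table XXX.
sigmas = [1.76, 7.69, 4.36, 26.79, 7.60]
a_scores = [5, 35, 30, 100, 75]

# Step (b): multipliers chosen, as in the text, to bring every sigma near 8.
multipliers = [5, 1, 2, 1 / 3, 1]
new_sigmas = [round(s * m, 2) for s, m in zip(sigmas, multipliers)]
print(new_sigmas)  # [8.8, 7.69, 8.72, 8.93, 7.6]


def composite(scores, multipliers, extra_weights=None):
    """Step (c): multiply each score by its multiplier (and any extra weight), then sum."""
    extra_weights = extra_weights or [1] * len(scores)
    return sum(x * m * w for x, m, w in zip(scores, multipliers, extra_weights))


print(round(composite(a_scores, multipliers), 1))                   # 228.3
print(round(composite(a_scores, multipliers, [2, 1, 2, 1, 1]), 1))  # 313.3
```

Passing `extra_weights` is equivalent to the text's alternative of multiplying the σ's of tests 1 and 3 by 10 and 4 instead of 5 and 2.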
Conversion of the scores of a given test is readily made by the following scheme:

Let M = average of the given test.
Let σ = σ of the given test.
Let X₁ = individual's score on the given test.
Let 50 = average of the converted series.
Let 14 = σ of the converted series.
Let X = individual's score in the converted series.

Now if $S = \frac{14}{\sigma}$ and $K = 50 - MS$, then $X = K + SX_1$.

To illustrate, suppose that in a given test the average is 16.00, the σ is 3.5, and that subject A scores 18 on the test. What is A's converted score? $S = 14/3.5 = 4.00$, and $K = 50 - 16 \times 4 = -14.00$. Substituting in $X = K + SX_1$, $X = -14 + 4 \times 18 = 58$. A's score, therefore, in a distribution with Average = 50 and σ = 14 is 58. In other words (assuming a normal distribution), 58 lies as far above 50, measured in a σ of 14, as 18 lies above 16.00, measured in a σ of 3.5: both are 4/7 σ above their averages.

An illustration will serve to demonstrate how scores may be combined by this method (Table XXXI).

TABLE XXXI
                        Test 1          Test 2
                        Word Building   Digit Span   Total (Average)
Average                 16.30           7.4
σ                       4.90            1.3
A's score               18.00           8.0
A's converted score     54.86           56.48        55.67

Taking test 1, Word Building, first, from the formula above, $S = 14/4.9 = 2.86$ and $K = 50 - 16.30 \times 2.86 = 3.38$. Hence $X = 3.38 + 2.86X_1$, and substituting A's score of 18 for $X_1$ we have $X = 54.86$. In like manner, in test 2, Digit Span, $S = 14/1.3 = 10.8$ and $K = 50 - 7.4 \times 10.8 = -29.92$. Accordingly, $X = -29.92 + 10.8 \times 8$ (substituting A's score in Digit Span), or 56.48. Averaging A's scores in Word Building and Digit Span, we have 55.67 as the composite score, which means that A is slightly above the average (50) in the two tests.

¹ The Conversion of Test Scores into Series which shall have any Assigned Mean and Degree of Dispersion, Journal of Applied Psychology, 1922, 6, p. 299.
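Hull's conversion is a linear rescaling, so it is easy to sketch. The function below (its name and default arguments are ours) implements $S = 14/\sigma$, $K = 50 - MS$, $X = K + SX_1$ exactly as defined above. Carrying full precision instead of rounding S to 2.86 and 10.8 changes the second decimal for Digit Span.

```python
def hull_convert(score, mean, sigma, new_mean=50.0, new_sigma=14.0):
    """Convert a raw score to Hull's scale (mean 50, sigma 14): X = K + S * X1."""
    s = new_sigma / sigma      # S = 14 / sigma
    k = new_mean - mean * s    # K = 50 - M * S
    return k + s * score


# Worked example from the text: average 16.00, sigma 3.5, A scores 18.
print(hull_convert(18, 16.00, 3.5))  # 58.0

# Table XXXI at full precision.
word = hull_convert(18.00, 16.30, 4.90)   # Word Building
digit = hull_convert(8.0, 7.4, 1.3)       # Digit Span
print(round(word, 2), round(digit, 2), round((word + digit) / 2, 2))  # 54.86 56.46 55.66
```

The text's 56.48 and 55.67 come from rounding S to 10.8 before converting; the difference is of no practical consequence.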
Since we have computed both K and S for each of the tests, all of the scores on Word Building may be quickly converted into "new" scores by means of the formula $X = 3.38 + 2.86X_1$, and all of the scores on Digit Span converted into "new" scores by means of the formula $X = -29.92 + 10.8X_1$. In each case $X_1$ represents the actual score on the test.

An earlier method of combining test scores, based on the same principles as the above plan, was outlined in 1912 by Professor Woodworth.¹ Woodworth's plan was to find the difference between a given individual's score on a test and the average score, i.e., $x = X - \text{Av.}$; divide this plus or minus difference (±x) by the σ of the test; and call the result, $x/\sigma$, the "reduced score."² Reduced scores found in this way for the same individual on several tests may be combined by simply averaging them; the weight of each test in the composite will be 1.00.

To illustrate the method using the data of Table XXXI: A's score of 18 on the Word Building test is 1.70 above the average, i.e., above 16.30; and dividing this deviation by the σ of the series (4.90) gives A a "reduced score" (a score expressed in σ units) of .347. On the Digit Span test, A's score of 8.00 is .6 above the average of the distribution, i.e., above 7.4; and dividing .6 by 1.3 we get a reduced score on Digit Span of .462. If we average these two reduced scores, A is found to stand .405 (in σ units) above the average of the group in the two tests. (Remember that this method, like the preceding one, assumes that the distributions of test scores are approximately normal.)

¹ Combining the Results of Several Tests: A Study in Statistical Method, Psychological Review, 1912, Vol. XIX, pp. 97-123.
² Note that in Woodworth's method the average is taken at 0 and the σ as 1.00.
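Woodworth's reduced score is simply the deviation divided by σ. The sketch below (names ours) repeats the computation for subject A. It also shows that a Hull score is 50 + 14 times the reduced score, since $X = K + SX_1 = 50 + 14(X_1 - M)/\sigma$. Unrounded, the average reduced score is .404 rather than .405, which corresponds to a Hull composite of 55.66.

```python
def reduced_score(score, mean, sigma):
    """Woodworth's reduced score: the deviation from the average in sigma units."""
    return (score - mean) / sigma


# Subject A, Table XXXI.
r_word = reduced_score(18.00, 16.30, 4.90)  # Word Building
r_digit = reduced_score(8.0, 7.4, 1.3)      # Digit Span
avg_reduced = (r_word + r_digit) / 2

print(round(r_word, 3), round(r_digit, 3), round(avg_reduced, 3))  # 0.347 0.462 0.404

# Hull's scale re-expresses the same information: X = 50 + 14 * reduced score.
print(round(50 + 14 * avg_reduced, 2))  # 55.66
```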
Of these two methods, the first is somewhat the simpler, inasmuch as it involves only plus values (all transmuted scores lie between 0 and 100), while the second method introduces plus and minus values which are nearly always fractions, often small in size and inconvenient to handle. Again, a composite score of 55.67 by Hull's method is probably more intelligible to the average student, accustomed to think in per cents, than an average score of .405 found by Woodworth's plan. The latter result is meaningful only to those who have had considerable statistical training.

Woodworth's method has one particular advantage, however, which should be mentioned, viz., that when reduced scores have once been calculated for two or more tests, correlations between the tests may easily be found. The method of obtaining such correlations is illustrated in Table XXXII, which gives the reduced scores made by 10 adults on a Memory Span test and an Information test, and the correlation between the two series. As shown in the table, the calculations are relatively simple. Since each individual's reduced score on Memory Span (X) is simply his x (i.e., his deviation from the average) divided by $\sigma_x$, and his reduced score on Information (Y) is, again, his y (i.e., his deviation from the average) divided by $\sigma_y$, the sum of the products $\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}$ of the reduced scores of all ten individuals will give $\frac{\Sigma xy}{\sigma_x\sigma_y}$. We know from formula (24) that $r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$ (page 168). Hence, the correlation between the two tests is obtained simply by dividing $\frac{\Sigma xy}{\sigma_x\sigma_y}$ (7.31) by N (10); that is, r equals .731.
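The procedure just described can be sketched in Python using the ten pairs of Memory Span and Information scores listed in Table XXXII. This sketch assumes σ is computed with divisor N, as in formula (24), and the helper name `reduced` is ours. Because it does not round the reduced scores to two places, it gives r ≈ .729 rather than .731.

```python
from math import sqrt

# Memory Span (X) and Information (Y) for individuals A to J, Table XXXII.
x = [5, 9, 8, 7, 6, 10, 12, 6, 5, 12]
y = [90, 60, 90, 85, 70, 100, 130, 80, 75, 120]


def reduced(scores):
    """Convert a series to reduced scores, (score - average) / sigma, sigma with divisor N."""
    n = len(scores)
    mean = sum(scores) / n
    sigma = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sigma for s in scores]


zx, zy = reduced(x), reduced(y)
# Sum of products of reduced scores equals sum(xy) / (sigma_x * sigma_y); divide by N.
r = sum(a * b for a, b in zip(zx, zy)) / len(x)
print(round(r, 3))  # 0.729
```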
TABLE XXXII
To Illustrate the Method of Finding Correlation from "Reduced Scores"

Individual  Memory Span (X)  Information (Y)  Reduced X (x/σx)  Reduced Y (y/σy)  Product (xy/σxσy)
A           5                90               -1.19             .00               .000
B           9                60               .39               -1.45             -.566
C           8                90               .00               .00               .000
D           7                85               -.39              -.24              .094
E           6                70               -.79              -.97              .766
F           10               100              .79               .49               .387
G           12               130              1.58              1.94              3.065
H           6                80               -.79              -.49              .387
I           5                75               -1.19             -.73              .869
J           12               120              1.58              1.46              2.307
                                                                          Σxy/σxσy = 7.309

Average of X = 8.0, σx = 2.53; Average of Y = 90.00, σy = 20.62
r = Σxy/(Nσxσy) = 7.31/10 = .731

Note. This table is intended simply to illustrate the method. A product-moment r would not ordinarily be found for 10 cases.

The student should bear in mind when using either of these methods that neither is strictly applicable when the distributions are considerably skewed. As stated above, both assume that the distributions to which they are applied are normal or approximately normal.

IV. The σ of the Sum or Difference of Corresponding Values of Two Series of Test Scores

If we know the correlation between two series of test scores, X₁ and X₂, and the σ's of the two series, it is possible to compute, in a simple way, the σ of the new composite series obtained by adding or subtracting the corresponding scores in the two original series. When the scores of the "new" distribution have been found by adding corresponding scores, the formula for $\sigma_s$ is¹

$$\sigma_s = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 + 2r\sigma_{x_1}\sigma_{x_2}} \qquad (67)$$

in which $\sigma_s$ denotes the σ of the "new" summed series, $\sigma_{x_1}$ is the σ of the X₁ scores, $\sigma_{x_2}$ is the σ of the X₂ scores, and r is the coefficient of correlation between X₁ and X₂. When the scores in the new distribution have been obtained by subtracting corresponding scores in the two tests, formula (67) becomes

$$\sigma_d = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 - 2r\sigma_{x_1}\sigma_{x_2}} \qquad (68)$$

in which $\sigma_d$ is the σ of the new difference series. A problem will illustrate the use of these formulas.
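Formulas (67) and (68) translate directly into code. The sketch below uses illustrative function names and evaluates both formulas for the Verb-Object and Opposites figures (σ = 11.18 and 9.00, r = .60) used in the problem that follows. As a check on formula (67), it also computes the σ of the summed Table XXXII scores directly and compares it with the formula, with σ taken with divisor N (`statistics.pstdev`).

```python
from math import sqrt
from statistics import pstdev


def sigma_sum(s1, s2, r):
    """Formula (67): sigma of the series X1 + X2, given sigma_x1, sigma_x2 and r."""
    return sqrt(s1 ** 2 + s2 ** 2 + 2 * r * s1 * s2)


def sigma_diff(s1, s2, r):
    """Formula (68): sigma of the series X1 - X2, given sigma_x1, sigma_x2 and r."""
    return sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)


print(round(sigma_sum(11.18, 9.00, 0.60), 2))   # 18.08
print(round(sigma_diff(11.18, 9.00, 0.60), 2))  # 9.23


def pearson_r(xs, ys):
    """Product-moment r with divisor N, matching pstdev."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


x = [5, 9, 8, 7, 6, 10, 12, 6, 5, 12]            # Memory Span, Table XXXII
y = [90, 60, 90, 85, 70, 100, 130, 80, 75, 120]  # Information, Table XXXII
direct = pstdev([a + b for a, b in zip(x, y)])
print(abs(direct - sigma_sum(pstdev(x), pstdev(y), pearson_r(x, y))) < 1e-9)  # True
```

Setting r = 0 reduces either function to $\sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}$.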
Let X₁ denote a Verb-Object Test and X₂ an Opposites Test. Then, given $\sigma_{x_1} = 11.18$, $\sigma_{x_2} = 9.00$, and $r_{x_1x_2} = .60$, what is the σ of the new series obtained (1) by adding the corresponding X₁ and X₂ scores, and (2) by subtracting the corresponding X₁ and X₂ scores? Substituting in formula (67), we have

$$\sigma_s = \sqrt{(11.18)^2 + (9.00)^2 + 2 \times .60 \times 11.18 \times 9.00} = 18.08$$

Thus, 18.08 is the σ of the (X₁ + X₂) series. To find the σ of the (X₁ − X₂) series, $\sigma_d$, we substitute in formula (68):

$$\sigma_d = \sqrt{(11.18)^2 + (9.00)^2 - 2 \times .60 \times 11.18 \times 9.00} = 9.23$$

¹ For a simple mathematical proof of this formula, see Yule, An Introduction to the Theory of Statistics, pp. 210-211.

Formula (68) is often useful when a test has been repeated in a group under changed conditions and the variability of these changes, i.e., the σ of the differences between scores made on the second and the first giving of the test, is sought. Except that there is only the one test concerned, the method is identical with that of the problem above. The chief objection to the formula is that the r between the scores on the first and second giving of the test must be known. For this reason, unless the r is wanted for other purposes, it is usually easier to subtract the corresponding scores and derive the σ of their differences directly.

From the formula for the reliability of the average, $\sigma_{av} = \sigma/\sqrt{N}$ (formula 13), we know that the σ of the distribution is $\sigma_{(dis.)} = \sqrt{N}\,\sigma_{av}$. We may, therefore, write $\sqrt{N}\,\sigma_{av.x_1}$ instead of $\sigma_{x_1}$; $\sqrt{N}\,\sigma_{av.x_2}$ instead of $\sigma_{x_2}$; $\sqrt{N}\,\sigma_{av.s}$ instead of $\sigma_s$; and $\sqrt{N}\,\sigma_{av.d}$ instead of $\sigma_d$. Making these substitutions in formulas (67) and (68), we have (the N's cancel)

$$\sigma_{av.s} = \sqrt{\sigma_{av.x_1}^2 + \sigma_{av.x_2}^2 + 2r\sigma_{av.x_1}\sigma_{av.x_2}} \qquad (69a)$$

and

$$\sigma_{av.d} = \sqrt{\sigma_{av.x_1}^2 + \sigma_{av.x_2}^2 - 2r\sigma_{av.x_1}\sigma_{av.x_2}} \qquad (69b)$$

in which $\sigma_{av.s}$ is the σ of the average of the (X₁ + X₂) series of scores, and $\sigma_{av.d}$ is the σ of the average of the (X₁ − X₂) series of scores. Formulas (69a) and (69b) must always be used whenever there is any correlation between the X₁ and X₂ scores. If X₁ and X₂ are uncorrelated, that is, if r = .00, the third term under the radical disappears, and (69a) and (69b) become

$$\sigma_{av.s} = \sqrt{\sigma_{av.x_1}^2 + \sigma_{av.x_2}^2} \qquad (70a)$$

and

$$\sigma_{av.d} = \sqrt{\sigma_{av.x_1}^2 + \sigma_{av.x_2}^2} \qquad (70b)$$

Now if we write $\sigma_{(diff.)}$ instead of $\sigma_{av.d}$ in formula (70b), we at once recognize the familiar formula

$$\sigma_{(diff.)} = \sqrt{\sigma_{av.1}^2 + \sigma_{av.2}^2}$$

which we have used heretofore for measuring the reliability of the difference between two averages or, with appropriate changes, between two σ's or two r's. It should always be remembered that $\sigma_{(diff.)}$ is simply a special form of the more general formula (69b), and that it always assumes a zero correlation between X₁ and X₂. The PE may be written for σ in any of the formulas given in this Section by making the substitution PE = .6745 × σ.

V. How to Interpret the Coefficient of Correlation between Two Tests

When can a coefficient of correlation be considered "high"? Is an r of .40 between two tests evidence of "low" or "marked" relationship? Questions like these, and many others which relate to the interpretation of a coefficient of correlation, frequently arise in test work and must be answered if we would understand the significance of an obtained r. The effectiveness of an r as a measure of relation may be evaluated in several ways: (1) in terms of the standard error of estimate; (2) in terms of the standard error of measurement; and (3) in terms of the percentage of factors common to the two capacities correlated. Let us consider these three approaches to an interpretation of r before attempting to lay down any general rule for classifying r's as "high," "medium," or "low."

1. The Interpretation of a Coefficient of Correlation in Terms of σ(est.)
The standard error of estimate, σ(est.), is probably the most practicable means of evaluating the effectiveness of a coefficient of correlation. This follows from the fact that $\sigma_{(est.x_1)}$, which enables us to tell how accurately we can estimate an individual's score on test X₁ knowing his score on test X₂, depends on the r between the two tests, since $\sigma_{(est.x_1)} = \sigma_{x_1}\sqrt{1 - r^2}$. When r = 1.00, $\sigma_{(est.x_1)} = .00$, which means that we can predict a score in X₁ from a knowledge of X₂ with perfect accuracy, i.e., with no error. To take the opposite extreme, when r = .00, $\sigma_{(est.x_1)} = \sigma_{x_1}$ directly, which means that we can only be certain that the predicted score lies somewhere within the limits of the X₁ distribution, i.e., within the limits of the average ±3σ. In other words, the estimate from the distribution of X₁ alone is as good as the estimate made with the addition of X₂. As r decreases from 1.00 to 0, the standard error of estimate rapidly increases, so that predictions from the regression equation range all of the way from certainty to practically guesswork. The closeness of the correspondence denoted by an r, therefore, may be gauged by the size of σ(est.).

We may illustrate with the following problem. Suppose that the correlation between two tests X₁ and X₂ is .60, and that $\sigma_{x_1} = 5.00$. Then $\sigma_{(est.x_1)} = 5 \times \sqrt{1 - .60^2} = 4.00$, which is only 20% less than 5.00, the σ(est.x₁) for r = .00, i.e., for a minimum predictive value. The proportionate amount of reduction in σ(est.x₁) as r varies from .00 to 1.00 is given by the expression $\sqrt{1 - r^2}$, and hence it is possible to estimate the "predictive" value of an r from $\sqrt{1 - r^2}$ alone. This radical, $\sqrt{1 - r^2}$, has been designated by Kelley¹ the "coefficient of alienation," and is usually denoted by the letter k. k may be thought of as measuring the absence of relation between two variables X₁ and X₂, in the same way that r measures the presence of relation. Thus when k = 1.00, r = .00, and when k = .00, r = 1.00: the larger the coefficient of alienation, the greater the lack of relation, and the less the value of the prediction.

In order to show how the estimate improves as r increases, the k's for the values of r from .00 to 1.00 are given in Table XXXIII. It will be noted that r must be .866 before k is half way between perfect correlation and a guess, i.e., before the standard error of estimate is reduced one-half. For r's of .30 and less, the coefficients of alienation are so large that the predictions based on them are but little better than a guess. Even with an r of .99, it will be noticed that the standard error of estimate is still about 1/7 as large as when k = 1.00. It is obvious, then, that in order to estimate individual scores with accuracy, the correlation should be at least .90.

¹ Kelley, T. L., Principles Underlying the Classification of Men, Journal of Applied Psychology, 1919, Vol. III, 1, p. 50.

TABLE XXXIII
Giving Coefficients of Alienation, k = √(1 − r²), for Values of r from .00 to 1.00

r        k
.00      1.0000
.10      .9950
.20      .9798
.30      .9539
.40      .9165
.50      .8660
.60      .8000
.70      .7141
.7071    .7071
.80      .6000
.8660    .5000
.90      .4359
.95      .3122
.98      .1990
.99      .1411
1.00     .0000

2. The Interpretation of a Coefficient of Correlation in Terms of the Standard Error of Measurement, σ(M)

We have found (page 183) that the standard error of measurement enables us to estimate the probable divergence of an obtained score on a test from its corresponding true score. Moreover, since $\sigma_{(M)} = \sigma_1\sqrt{1 - r_{12}}$, the amount of this probable divergence will depend to a large degree upon the size of the self-correlation, r₁₂, and accordingly it follows that the value of r₁₂ as a measure of relation may be determined from the size of σ(M). When r = 1.00, for example, σ(M) = .00, and every obtained score equals its true score exactly. When r = .00, on the other hand, σ(M) = σ₁ (the σ of the distribution), and we can only be sure that the true score (corresponding to a given obtained score) lies somewhere within the limits of the distribution, i.e., within the limits ±3σ. In other words, when r = .00, the probable divergence of an obtained score from its true score is as great as it would be had we simply guessed that the true score lay somewhere in the distribution.

To illustrate, suppose that the reliability coefficient of a given test is r₁₂ = .80, and that σ₁ = 10.00. Then $\sigma_{(M)} = 10\sqrt{1 - .80} = 4.472$, and since σ(M) is 10.00 when r = .00, evidently a reliability coefficient of .80 serves to reduce σ(M) to about 45% of what it would be in the event of a guess. The reduction in σ(M) as r varies from 0 to 1.00 is given by the expression $\sqrt{1 - r_{12}}$. Hence this factor may be used to test the effectiveness of an obtained reliability coefficient, just as k tests the value of the r between two tests. In Table XXXIV the values of $\sqrt{1 - r_{12}}$ have been calculated for r's from .00 to 1.00.

TABLE XXXIV
Giving Values of √(1 − r₁₂) for Values of r from .00 to 1.00

r       √(1 − r₁₂)
.00     1.0000
.10     .9487
.20     .8944
.30     .8367
.40     .7746
.50     .7071
.60     .6325
.70     .5477
.75     .5000
.80     .4472
.90     .3162
.95     .2236
.98     .1414
.99     .1000
1.00    .0000

From Table XXXIV it is evident that the self-correlation of a test must be at least .75 before $\sqrt{1 - r_{12}}$ is half way between complete reliability and a guess. Even for an r₁₂ of .98, the chances are still about 32 in 100 that a given obtained score will diverge from its true score by more than .1414 of the σ of the test (and only 68 in 100 that it will fall within ±.1414σ). Since even high reliability coefficients (e.g., .90 or above) indicate relatively large departures from perfect reliability, it is clear that a self-correlation of, say, .30 or .40 is almost valueless.
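Both tables can be regenerated from their defining radicals. The sketch below (helper names are ours) computes k = √(1 − r²) and √(1 − r₁₂) for a few values of r and repeats the two worked examples of this section; for r = .90 it gives k = √.19 ≈ .4359.

```python
from math import sqrt


def alienation(r):
    """Kelley's coefficient of alienation, k = sqrt(1 - r**2)."""
    return sqrt(1 - r ** 2)


def measurement_factor(r12):
    """Fraction of sigma left as the standard error of measurement, sqrt(1 - r12)."""
    return sqrt(1 - r12)


for r in (0.30, 0.60, 0.866, 0.90, 0.99):
    print(f"r = {r:.3f}   k = {alienation(r):.4f}   sqrt(1 - r12) = {measurement_factor(r):.4f}")

# Worked examples: sigma(est.) for r = .60, sigma = 5.00; sigma(M) for r12 = .80, sigma = 10.00.
print(round(5.00 * alienation(0.60), 2))          # 4.0
print(round(10.00 * measurement_factor(0.80), 3))  # 4.472
```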
3. Interpretation of a Coefficient of Correlation in Terms of the Percentage of Common (Overlapping) Elements or Factors

It is sometimes helpful to regard a coefficient of correlation as a ratio which expresses, directly or indirectly, the percentage of elements or factors common to the tests which are correlated. Or again, r may be thought of as a device for indicating the extent to which the factors which determine capacity in the one test "overlap" those of another test.¹ Let us suppose that capacity in test X depends upon the presence or absence of a + c independent, elemental factors; and that capacity in test Y depends upon the presence or absence of b + c independent, elemental factors. The a factors determine X scores alone, the b factors Y scores alone, and the c factors are common to both X and Y. Moreover, let us suppose further that all factors, a, b, and c, are governed solely by the laws of chance, so that each factor is as likely to be present as absent, in the same way that a coin when tossed is as likely to fall heads as tails. Now if we let $n_a$ = total number of a factors, $n_b$ = total number of b factors, and $n_c$ = total number of c factors, it can be shown² that the correlation between X and Y is given by the formula

$$r = \frac{n_c}{\sqrt{(n_a + n_c)(n_b + n_c)}} \qquad (71)$$

That is, the coefficient of correlation equals the number of common factors in X and Y divided by the geometric mean of the total numbers of factors in X and Y. This situation is shown graphically in Diagram XXVII, in which X is determined by 8 factors, 4 a's and 4 c's, and Y by 11 factors, 7 b's and 4 c's. The correlation by formula (71) is

$$r = \frac{4}{\sqrt{(4 + 4)(7 + 4)}} = \frac{4}{\sqrt{8 \times 11}} = .426$$

DIAGRAM XXVII. X: a a a a c c c c; Y: c c c c b b b b b b b; r = 4/√(8 × 11) = .426.

¹ The following is adapted from the discussion by Kelley, Statistical Method, pp. 189-190.
² See Kelley, Statistical Method, 1923, p. 190; or Brown, Wm., Essentials of Mental Measurement, 1911, pp. 79-80.
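Formula (71) can be checked with a one-line function. The sketch assumes, as the text does, independent, equally weighted, chance-governed factors; the function name is ours. The same function also covers the special cases that follow: $n_a = n_b$ gives formula (72), and $n_b = 0$ gives formula (73).

```python
from math import sqrt


def common_factor_r(n_a, n_b, n_c):
    """Formula (71): r = n_c / sqrt((n_a + n_c) * (n_b + n_c)).

    n_a factors affect X only, n_b affect Y only, n_c affect both.
    """
    return n_c / sqrt((n_a + n_c) * (n_b + n_c))


print(round(common_factor_r(4, 7, 4), 3))  # Diagram XXVII: 0.426
print(common_factor_r(4, 4, 4))            # n_a = n_b: 4/8 = 0.5
print(round(common_factor_r(4, 0, 4), 3))  # n_b = 0: 4/sqrt(32), 0.707
```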
If the number of elementary factors determining the score in X equals exactly the number determining the score in Y, so that $n_a = n_b$, formula (71) becomes

$$r = \frac{n_c}{n_a + n_c} \qquad (72)$$

and the coefficient of correlation is now simply the decimal fraction which indicates what proportion of the causes influencing performance in X and Y are common to both. If t = the number of common factors ($n_c$) and s = the total number of factors present in X (or in Y), i.e., $n_a + n_c$, r is simply t/s. (Remember that the factors in X and Y are assumed to be equal in number and influence.) This condition is illustrated in Diagram XXVIII. Since X is determined by 8 factors, 4 a's and 4 c's, and Y by 8 factors, 4 b's and 4 c's, the correlation by formula (72) is 4/8 or .50.

DIAGRAM XXVIII. X: a a a a c c c c; Y: c c c c b b b b; r = 4/8 = .50.

Now let us assume, lastly, that Y is completely determined by $n_c$ elements, and that X is determined by these same elements plus $n_a$ elements in addition ($n_b = 0$). Formula (71) then becomes

$$r = \frac{n_c}{\sqrt{n_c(n_a + n_c)}} \qquad (73)$$

and the coefficient of correlation equals the number of common elements in X and Y divided by the geometric mean of the total numbers of factors in X and in Y. Diagram XXIX shows this graphically. Y is determined by 4 c's, and X by these factors plus 4 a's in addition; the correlation, therefore, is 4/√(4 × 8) or .707.

DIAGRAM XXIX. X: a a a a c c c c; Y: c c c c; r = 4/√(4 × 8) = .707.

If we square the r obtained from formula (73), we have

$$r^2 = \frac{n_c}{n_a + n_c} \qquad (74)$$

that is, the square of the coefficient gives the extent to which the elements in Y overlap those of X, or the proportion of elements in X which are also involved in Y. In Diagram XXIX note that Y overlaps X 50%, and that r², i.e., (.707)², is .50, as it should be.¹ Moreover, since the coefficient of alienation will equal .707 when r = .707 (see Table XXXIII), it follows that an r of .707 (and not .50) should be taken as half of a perfect correlation.² On the same assumptions, an overlapping of 33⅓% common elements, i.e., r² = .3333, will give a correlation of .577, which is 1/3 of a perfect correlation; and an overlapping of 25% common elements, r² = .25, gives an r of .50, which is 1/4 of a perfect correlation. By analogy, an r of .30 or less (r² = .09 or less) implies so slight a degree of overlapping that there can be only a very small percentage of common elements.

The coefficient of correlation as a measure of the percentage of common factors may be seen to best advantage in series formed by tossing coins or throwing dice, in which the "overlapping" is arbitrarily determined and controlled at will. As an illustration, consider the correlation table in Diagram XXX, in which is shown the relation between two series of 500 successive throws of 12 pennies made in the following way: first, all 12 pennies were tossed, and the number of heads recorded and noted in the X column; then 5 coins were left lying and the remaining 7 were tossed again, and the number of heads in all 12 recorded and noted in column Y, opposite the X entry. By this scheme 5 coins (factors) contribute to each pair of tosses; and hence, according to formula (72), the correlation should be 5/12 or .417. By the product-moment formula the actual correlation between the two series is .424, which indicates a very close correspondence between actual and theoretical results.

¹ This result has interesting implications. Thus if all of the elements in test X₂ are common to X₁ (e.g., a criterion), the extent to which X₂ overlaps X₁ is given by simply squaring the coefficient, $r_{x_1x_2}$. The assumption must be made, of course, that the scores in both tests are summations of independent and similar elements whose presence or absence is governed by chance alone.
² Woodworth, R. S., Combining the Results of Several Tests: A Study in Statistical Method, Psychological Review, 1912, XIX, p. 113. Hull, Clark, The Joint Yield from Teams of Tests, Journal of Educational Psychology, 1923, 14, pp. 396-406.

DIAGRAM XXX
Showing the number of heads in 500 successive throws of 12 pennies, in which 7 pennies were tossed in the second throw and 5 remained as they fell in the first throw of all 12 together.³
[Correlation table: heads in first toss (X, 0 to 12) against heads in second toss (Y, 0 to 12), N = 500; the cell frequencies are not recoverable from the scan.]
X: a a a a a a a c c c c c; Y: c c c c c b b b b b b b. $n_a = n_b = 7$, $n_c = 5$; by formula (72), r = 5/12 = .417. By calculation (product-moment), r = .424.

³ From Pearl, R., Medical Biometry and Statistics, p. 297 (after Darbishire).

DIAGRAM XXXI
Showing the results of 100 successive throws of dice, in the first throw of which (X) 5 dice were thrown, counted, and left down; and in each second throw of which (Y) 5 additional dice were thrown and counted together with the 5 left down (10 in all).
[Correlation table: total of first throw of 5 dice (X) against total of the 10 dice (Y), N = 100; the cell frequencies are not recoverable from the scan.]
X: c c c c c; Y: a a a a a c c c c c. $n_c = 5$, $n_a = 5$; by formula (73), r = 5/√(5 × 10) = .707. By calculation (product-moment), r = .694.
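The penny experiment is easy to imitate by simulation. The sketch below uses Python's `random` module in place of real tosses (the seed and number of trials are arbitrary, and the function names are ours). With many trials, the product-moment r should settle near $n_c/(n_a + n_c) = 5/12$, though any single run of 500 will wander by a few hundredths, as the .424 of Diagram XXX did.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def penny_experiment(n_trials=500, n_coins=12, n_kept=5, seed=None):
    """Toss n_coins and count heads (X); leave n_kept lying, retoss the rest, count all heads (Y)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_trials):
        coins = [rng.random() < 0.5 for _ in range(n_coins)]  # True = heads
        xs.append(sum(coins))
        retossed = [rng.random() < 0.5 for _ in range(n_coins - n_kept)]
        ys.append(sum(coins[:n_kept]) + sum(retossed))
    return xs, ys


xs, ys = penny_experiment(seed=1)
print(round(pearson_r(xs, ys), 3), "expected about", round(5 / 12, 3))
```

Changing `n_kept` to 4 or 6 should give r near 4/12 or 6/12, in line with the figures quoted for Pearl's diagrams.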
The situation existing in each pair of X and Y tosses is shown in the figure in Diagram XXX. If 4 coins had been left lying, r would have been 4/12 or .333; if 6 had been left lying, r would have been 6/12 or .50; and so on. A number of diagrams of the sort shown, in which the number of common factors (i.e., coins left lying) varies from 0 to 12, and r from 0 to 1.00, may be found in Pearl's Medical Biometry and Statistics, pages 294-300.

Now suppose that we calculate the correlation between two series of dice throws made according to the following scheme:¹ 5 dice are thrown, and the total is read and recorded in the X column; then 5 additional dice are thrown, and the total of all 10 (the 5 left and the 5 just thrown) is read and recorded in the Y column. If this is continued until 100 throws have been made, we shall have 100 X and 100 Y entries, each Y throw (of 10 dice) "overlapped" to the extent of 50% by its corresponding X throw (of 5 dice). And since all of the elements in X are completely contained in Y, the correlation between X and Y should, by formula (73), be 5/√(5 × 10), or .707. (See Diagram XXXI and accompanying figure.) Actually, the correlation by the product-moment formula is .694, which indicates, again, a very close correspondence between actual and theoretical results. The square of this r gives us approximately .50 as the proportion of common elements in X and Y; that is, we have one half of a perfect correlation. (See page 294.)

¹ These throws were made by the writer.

While formulas (71) to (74) are interesting and suggestive as giving us the means of interpreting a coefficient of correlation under certain special or restricted conditions, it would be a mistake to apply them generally, i.e., to assume that by simply squaring the coefficient of correlation we can always determine the percentage of common factors or the amount of overlapping.
It seems likely that the scores on most psychological tests, as well as many social and educational measurements, are the result of the combined action of many factors which are often dependent on each other and probably interwoven in a relatively complex manner. At any rate, we do not know that a test score is simply the sum of a certain number of similar and independent elements.

Summary

From the discussion in the preceding paragraphs, it is evident that even with correlation coefficients which we have been accustomed to think of as high, the departure from perfect correlation is considerable. Strictly speaking, the term "high correlation" should be applied only to coefficients which are .95 or above. However, in mental, social, and educational measurements there are so many actual and potential sources of error, due to the variability of the material dealt with and the relative crudity of the measurements made, that very few tests indeed could meet this requirement. Very seldom do correlations between tests run above .70 or .75; hence it is probably justifiable, in view of the limitations mentioned, to regard such coefficients as high. There seems to be fairly general agreement among workers with tests that:

r from .00 to ±.20 denotes indifferent or negligible relation.
r from ±.20 to ±.40 denotes low correlation: present but slight.
r from ±.40 to ±.70 denotes substantial or marked relationship.
r from ±.70 to ±1.00 denotes high relation.

This is a tentative classification which is to be taken as only generally true. The size of a correlation coefficient should always be evaluated with due regard for the material dealt with, the size of the sample, and PE_r, no matter what its absolute value.

PROBLEMS

1. The self-correlation of a certain test is .60.
(a) How much must the test be lengthened to raise the self-correlation to .90?
(b) What effect will doubling the test have on its reliability?

2. Two equivalent half-scales are made up from the Downey Will-Temperament Test¹ in the following way: (1) by grouping all odd-numbered tests in one half-scale, and all even-numbered tests in the other; (2) by grouping the first two tests of every pattern into one half-scale, and the last two tests into another half-scale; (3) by grouping the first and last tests of each pattern into one half-scale, and the second and third tests of each pattern into a second half-scale. Reliability coefficients for the half-scales were found as follows by the three methods (N = 146):

Method     Reliability Coefficient
1          .17
2          .31
3          .24
Average    .24

What is the reliability of the whole Downey test?

¹ Ruch, G. M., and Del Manzo, M. C., The Downey Will-Temperament Group Test: A Further Analysis of Its Reliability and Validity. Journal of Applied Psychology, Vol. VII, 1923, p. 65.

3. In a small group the reliability coefficient of a test is .55 and the σ of the test scores is 3.00. What must the self-correlation of this test be in a larger group whose σ is 5.00, in order to have the same degree of reliability?

4. The reliability coefficient of a test, as found in a large unselected group, is .92; the average is 142 and σ is 16.00. If an individual makes 150 on the test,
(a) What is the PE of this score, i.e., the PE(M)?
(b) Within what range does the true score lie?
(c) In a second test of a different function, the reliability coefficient is .86; the average is 54 and σ is 10.00. In which test are the obtained scores the more reliable, i.e., closer to the true scores?

5. The reliability coefficient of a test is .80. What is the maximum self-correlation obtainable with this test as it stands?

6. Given the following records (all in seconds) for 100 Barnard Freshmen,¹ and the scores made by individual A:

Tests        Coordination   Tapping   Color Naming   Opposites
Average      82.7           376.3     57.0           51.1
SD           10.8           51.7      8.8            10.3
A's scores   85             350       62             40

(a) Combine A's scores by the method of variability, weighting all tests 1.
(b) Combine A's scores weighting Coordination and Tapping 1 each, Color Naming 3, and Opposites 4.

¹ Carothers, F. E., The Psychological Examination of College Students. Archives of Psychology, 1921, No. 46, pp. 21ff.

7. Using the data in Problem 6 above, combine A's scores by the two methods given on pages 282 and 283. Since all scores are in seconds, the higher the score numerically, the lower it actually is.

8. One hundred and fifty high school seniors make an average score of 120 on Army Alpha with a σ of 21.6. Two weeks later the group is praised for its performance (without, however, being told what the scores were) and given a second form of Alpha, on which the average score is 126 and the σ is 24.2. The r between the tests is .86.
(a) Is the effect of the incentive (praise) plus the practice effect sufficient to bring about a real increase in average score? How would you rule out the practice effect?
(b) Why is it necessary to have the correlation between the tests?

9. A battery of tests correlates .85 with a criterion. Assuming that performance on the battery is completely determined by X elements, and performance on the criterion by X + Y elements, to what extent may we say that the battery probably "overlaps" the criterion?

10. Interpret a coefficient of correlation r = .50 in three ways; an r = .65.

Answers

1. (a) 6 times. (b) r = .75.
2. Method 1: r = .29. Method 2: r = .47. Method 3: r = .39. Average of all three methods: r = .38.
3. r = .84.
4. (a) PE(M) = 3.05. (b) Between 137.8 and 162.2. (c) In the first test: PE(M)/Av. = .021 (first test); PE(M)/Av. = .047 (second test).
5. r = .89.
6. (a) Taking as multipliers for the four tests 1, 1/5, 1, and 1, respectively, we have 257 as A's composite score. (b) A's score is 501. (Since the measures of performance are in time units, the higher the numerical score, the lower the actual performance.)
7. A's scores are 47, 57, 42, and 65; her average is 52.75 (Hull's method). A's scores are −.213, +.509, −.568, +1.078; her average is .202. (This means that A stands .202σ above the average of the group on the four tests.)
8. (a) Yes. D/σ_diff is 5+.
9. About 72% common elements.

REFERENCES

The following books will be found helpful as general references:

1. Primer of Statistics, by W. P. and E. M. Elderton. A. & C. Black, Ltd., London. 1910.
2. Mental and Social Measurements, by Edward L. Thorndike. Teachers College, Columbia University. 1912 (revised edition).
3. Statistical Methods Applied to Education, by Harold O. Rugg. Houghton Mifflin Company. 1917.
4. An Introduction to Statistical Methods, by Horace Secrist. The Macmillan Company. 1917.
5. How to Measure in Education, by Wm. A. McCall. The Macmillan Company. 1922.
6. The Theory of Educational Measurements, by Walter Scott Monroe. Houghton Mifflin Company. 1923.
7. The Fundamentals of Statistics, by L. L. Thurstone. The Macmillan Company. 1925.
8. Statistical Method in Educational Measurement, by Arthur S. Otis. World Book Company. 1925.

More advanced books are:

1. Elements of Statistics, by A. L. Bowley. P. S. King and Son, London. 1920 (fourth edition).
2. An Introduction to the Theory of Statistics, by G. Udny Yule. Chas. Griffin and Company, London. 1919 (fifth edition).¹
3. Essentials of Mental Measurement, by Wm. Brown and G. H. Thomson. Cambridge University Press. 1920.
4. A First Course in Statistics, by D. Caradog Jones. G. Bell & Sons, London. 1921.
5. Statistical Method, by Truman L. Kelley. The Macmillan Company. 1923.
6. Handbook of Mathematical Statistics, by H. L. Rietz et al. Houghton Mifflin Company. 1924.

¹ The book by Yule is a classic which should be known to every serious student of mental and social measurements.

Aids to Computation:

1. Barlow's Tables of Squares, Cubes, Square Roots, Cube Roots, Reciprocals of numbers from 1 to 10,000. E. and F. N. Spon, Ltd., London. 1921.
2. Tables of √(1 − r²) and 1 − r² for use in Partial Correlation and Trigonometry, by John Rice Miner, Sc.D. Johns Hopkins Press. 1922.

Table of Squares and Square Roots of the Numbers from 1 to 1000

Number  Square  Square Root
1  1  1.000
2  4  1.414
3  9  1.732
4  16  2.000
5  25  2.236
6  36  2.449
7  49  2.646
8  64  2.828
9  81  3.000
10  1 00  3.162
11  1 21  3.317
12  1 44  3.464
13  1 69  3.606
14  1 96  3.742
15  2 25  3.873
16  2 56  4.000
17  2 89  4.123
18  3 24  4.243
19  3 61  4.359
20  4 00  4.472
21  4 41  4.583
22  4 84  4.690
23  5 29  4.796
24  5 76  4.899
25  6 25  5.000
26  6 76  5.099
27  7 29  5.196
28  7 84  5.292
29  8 41  5.385
30  9 00  5.477
31  9 61  5.568
32  10 24  5.657
33  10 89  5.745
34  11 56  5.831
35  12 25  5.916
36  12 96  6.000
37  13 69  6.083
38  14 44  6.164
39  15 21  6.245
40  16 00  6.325
41  16 81  6.403
42  17 64  6.481
43  18 49  6.557
44  19 36  6.633
45  20 25  6.708
46  21 16  6.782
47  22 09  6.856
48  23 04  6.928
49  24 01  7.000
50  25 00  7.071
51  26 01  7.141
52  27 04  7.211
53  28 09  7.280
54  29 16  7.348
55  30 25  7.416
56  31 36  7.483
57  32 49  7.550
58  33 64  7.616
59  34 81  7.681
60  36 00  7.746
61  37 21  7.810
62  38 44  7.874
63  39 69  7.937
64  40 96  8.000
65  42 25  8.062
66  43 56  8.124
67  44 89  8.185
68  46 24  8.246
69  47 61  8.307
70  49 00  8.367
71  50 41  8.426
72  51 84  8.485
73  53 29  8.544
74  54 76  8.602
75  56 25  8.660
76  57 76  8.718
77  59 29  8.775
78  60 84  8.832
79  62 41  8.888
80  64 00  8.944
81  65 61  9.000
82  67 24  9.055
83  68 89  9.110
84  70 56  9.165
85  72 25  9.220
86  73 96  9.274
87  75 69  9.327
88  77 44  9.381
89  79 21  9.434
90  81 00  9.487
91  82 81  9.539
92  84 64  9.592
93  86 49  9.644
94  88 36  9.695
95  90 25  9.747
96  92 16  9.798
97  94 09  9.849
98  96 04  9.899
99  98 01  9.950
100  1 00 00  10.000
101  1 02 01  10.050
102  1 04 04  10.100
103  1 06 09  10.149
104  1 08 16  10.198
105  1 10 25  10.247
106  1 12 36  10.296
107  1 14 49  10.344
108  1 16 64  10.392
109  1 18 81  10.440
110  1 21 00  10.488
111  1 23 21  10.536
112  1 25 44  10.583
113  1 27 69  10.630
114  1 29 96  10.677
115  1 32 25  10.724
116  1 34 56  10.770
117  1 36 89  10.817
118  1 39 24  10.863
119  1 41 61  10.909
120  1 44 00  10.954
121  1 46 41  11.000
122  1 48 84  11.045
123  1 51 29  11.091
124  1 53 76  11.136
125  1 56 25  11.180
126  1 58 76  11.225
127  1 61 29  11.269
128  1 63 84  11.314
129  1 66 41  11.358
130  1 69 00  11.402
131  1 71 61  11.446
132  1 74 24  11.489
133  1 76 89  11.533
134  1 79 56  11.576
135  1 82 25  11.619
136  1 84 96  11.662
137  1 87 69  11.705
138  1 90 44  11.747
139  1 93 21  11.790
140  1 96 00  11.832
141  1 98 81  11.874
142  2 01 64  11.916
143  2 04 49  11.958
144  2 07 36  12.000
145  2 10 25  12.042
146  2 13 16  12.083
147  2 16 09  12.124
148  2 19 04  12.166
149  2 22 01  12.207
150  2 25 00  12.247
151  2 28 01  12.288
152  2 31 04  12.329
153  2 34 09  12.369
154  2 37 16  12.410
155  2 40 25  12.450
156  2 43 36  12.490
157  2 46 49  12.530
158  2 49 64  12.570
159  2 52 81  12.610
160  2 56 00  12.649
161  2 59 21  12.689
162  2 62 44  12.728
163  2 65 69  12.767
164  2 68 96  12.806
165  2 72 25  12.845
166  2 75 56  12.884
167  2 78 89  12.923
168  2 82 24  12.961
169  2 85 61  13.000
170  2 89 00  13.038
171  2 92 41  13.077
172  2 95 84  13.115
173  2 99 29  13.153
174  3 02 76  13.191
175  3 06 25  13.229
176  3 09 76  13.266
177  3 13 29  13.304
178  3 16 84  13.342
179  3 20 41  13.379
180  3 24 00  13.416
181  3 27 61  13.454
182  3 31 24  13.491
183  3 34 89  13.528
184  3 38 56  13.565
185  3 42 25  13.601
186  3 45 96  13.638
187  3 49 69  13.675
188  3 53 44  13.711
189  3 57 21  13.748
190  3 61 00  13.784
191  3 64 81  13.820
192  3 68 64  13.856
193  3 72 49  13.892
194  3 76 36  13.928
195  3 80 25  13.964
196  3 84 16  14.000
197  3 88 09  14.036
198  3 92 04  14.071
199  3 96 01  14.107
200  4 00 00  14.142
201  4 04 01  14.177
202  4 08 04  14.213
203  4 12 09  14.248
204  4 16 16  14.283
205  4 20 25  14.318
206  4 24 36  14.353
207  4 28 49  14.387
208  4 32 64  14.422
209  4 36 81  14.457
210  4 41 00  14.491
211  4 45 21  14.526
212  4 49 44  14.560
213  4 53 69  14.595
214  4 57 96  14.629
215  4 62 25  14.663
216  4 66 56  14.697
217  4 70 89  14.731
218  4 75 24  14.765
219  4 79 61  14.799
220  4 84 00  14.832
221  4 88 41  14.866
222  4 92 84  14.900
223  4 97 29  14.933
224  5 01 76  14.967
225  5 06 25  15.000
226  5 10 76  15.033
227  5 15 29  15.067
228  5 19 84  15.100
229  5 24 41  15.133
230  5 29 00  15.166
231  5 33 61  15.199
232  5 38 24  15.232
233  5 42 89  15.264
234  5 47 56  15.297
235  5 52 25  15.330
236  5 56 96  15.362
237  5 61 69  15.395
238  5 66 44  15.427
239  5 71 21  15.460
240  5 76 00  15.492
241  5 80 81  15.524
242  5 85 64  15.556
243  5 90 49  15.588
244  5 95 36  15.620
245  6 00 25  15.652
246  6 05 16  15.684
247  6 10 09  15.716
248  6 15 04  15.748
249  6 20 01  15.780
250  6 25 00  15.811
251  6 30 01  15.843
252  6 35 04  15.875
253  6 40 09  15.906
254  6 45 16  15.937
255  6 50 25  15.969
256  6 55 36  16.000
257  6 60 49  16.031
258  6 65 64  16.062
259  6 70 81  16.093
260  6 76 00  16.125
261  6 81 21  16.155
262  6 86 44  16.186
263  6 91 69  16.217
264  6 96 96  16.248
265  7 02 25  16.279
266  7 07 56  16.310
267  7 12 89  16.340
268  7 18 24  16.371
269  7 23 61  16.401
270  7 29 00  16.432
271  7 34 41  16.462
272  7 39 84  16.492
273  7 45 29  16.523
274  7 50 76  16.553
275  7 56 25  16.583
276  7 61 76  16.613
277  7 67 29  16.643
278  7 72 84  16.673
279  7 78 41  16.703
280  7 84 00  16.733
281  7 89 61  16.763
282  7 95 24  16.793
283  8 00 89  16.823
284  8 06 56  16.852
285  8 12 25  16.882
286  8 17 96  16.912
287  8 23 69  16.941
288  8 29 44  16.971
289  8 35 21  17.000
290  8 41 00  17.029
291  8 46 81  17.059
292  8 52 64  17.088
293  8 58 49  17.117
294  8 64 36  17.146
295  8 70 25  17.176
296  8 76 16  17.205
297  8 82 09  17.234
298  8 88 04  17.263
299  8 94 01  17.292
300  9 00 00  17.321
301  9 06 01  17.349
302  9 12 04  17.378
303  9 18 09  17.407
304  9 24 16  17.436
305  9 30 25  17.464
306  9 36 36  17.493
307  9 42 49  17.521
308  9 48 64  17.550
309  9 54 81  17.578
310  9 61 00  17.607
311  9 67 21  17.635
312  9 73 44  17.664
313  9 79 69  17.692
314  9 85 96  17.720
315  9 92 25  17.748
316  9 98 56  17.776
317  10 04 89  17.804
318  10 11 24  17.833
319  10 17 61  17.861
320  10 24 00  17.889
321  10 30 41  17.916
322  10 36 84  17.944
323  10 43 29  17.972
324  10 49 76  18.000
325  10 56 25  18.028
326  10 62 76  18.055
327  10 69 29  18.083
328  10 75 84  18.111
329  10 82 41  18.138
330  10 89 00  18.166
331  10 95 61  18.193
332  11 02 24  18.221
333  11 08 89  18.248
334  11 15 56  18.276
335  11 22 25  18.303
336  11 28 96  18.330
337  11 35 69  18.358
338  11 42 44  18.385
339  11 49 21  18.412
340  11 56 00  18.439
341  11 62 81  18.466
342  11 69 64  18.493
343  11 76 49  18.520
344  11 83 36  18.547
345  11 90 25  18.574
346  11 97 16  18.601
347  12 04 09  18.628
348  12 11 04  18.655
349  12 18 01  18.682
350  12 25 00  18.708
351  12 32 01  18.735
352  12 39 04  18.762
353  12 46 09  18.788
354  12 53 16  18.815
355  12 60 25  18.841
356  12 67 36  18.868
357  12 74 49  18.894
358  12 81 64  18.921
359  12 88 81  18.947
360  12 96 00  18.974
361  13 03 21  19.000
362  13 10 44  19.026
363  13 17 69  19.053
364  13 24 96  19.079
365  13 32 25  19.105
366  13 39 56  19.131
367  13 46 89  19.157
368  13 54 24  19.183
369  13 61 61  19.209
370  13 69 00  19.235
371  13 76 41  19.261
372  13 83 84  19.287
373  13 91 29  19.313
374  13 98 76  19.339
375  14 06 25  19.365
376  14 13 76  19.391
377  14 21 29  19.416
378  14 28 84  19.442
379  14 36 41  19.468
380  14 44 00  19.494
381  14 51 61  19.519
382  14 59 24  19.545
383  14 66 89  19.570
384  14 74 56  19.596
385  14 82 25  19.621
386  14 89 96  19.647
387  14 97 69  19.672
388  15 05 44  19.698
389  15 13 21  19.723
390  15 21 00  19.748
391  15 28 81  19.774
392  15 36 64  19.799
393  15 44 49  19.824
394  15 52 36  19.849
395  15 60 25  19.875
396  15 68 16  19.900
397  15 76 09  19.925
398  15 84 04  19.950
399  15 92 01  19.975
400  16 00 00  20.000
401  16 08 01  20.025
402  16 16 04  20.050
403  16 24 09  20.075
404  16 32 16  20.100
405  16 40 25  20.125
406  16 48 36  20.149
407  16 56 49  20.174
408  16 64 64  20.199
409  16 72 81  20.224
410  16 81 00  20.248
411  16 89 21  20.273
412  16 97 44  20.298
413  17 05 69  20.322
414  17 13 96  20.347
415  17 22 25  20.372
416  17 30 56  20.396
417  17 38 89  20.421
418  17 47 24  20.445
419  17 55 61  20.469
420  17 64 00  20.494
421  17 72 41  20.518
422  17 80 84  20.543
423  17 89 29  20.567
424  17 97 76  20.591
425  18 06 25  20.616
426  18 14 76  20.640
427  18 23 29  20.664
428  18 31 84  20.688
429  18 40 41  20.712
430  18 49 00  20.736
431  18 57 61  20.761
432  18 66 24  20.785
433  18 74 89  20.809
434  18 83 56  20.833
435  18 92 25  20.857
436  19 00 96  20.881
437  19 09 69  20.905
438  19 18 44  20.928
439  19 27 21  20.952
440  19 36 00  20.976
441  19 44 81  21.000
442  19 53 64  21.024
443  19 62 49  21.048
444  19 71 36  21.071
445  19 80 25  21.095
446  19 89 16  21.119
447  19 98 09  21.142
448  20 07 04  21.166
449  20 16 01  21.190
450  20 25 00  21.213
451  20 34 01  21.237
452  20 43 04  21.260
453  20 52 09  21.284
454  20 61 16  21.307
455  20 70 25  21.331
456  20 79 36  21.354
457  20 88 49  21.378
458  20 97 64  21.401
459  21 06 81  21.424
460  21 16 00  21.448
461  21 25 21  21.471
462  21 34 44  21.494
463  21 43 69  21.517
464  21 52 96  21.541
465  21 62 25  21.564
466  21 71 56  21.587
467  21 80 89  21.610
468  21 90 24  21.633
469  21 99 61  21.656
470  22 09 00  21.679
471  22 18 41  21.703
472  22 27 84  21.726
473  22 37 29  21.749
474  22 46 76  21.772
475  22 56 25  21.794
476  22 65 76  21.817
477  22 75 29  21.840
478  22 84 84  21.863
479  22 94 41  21.886
480  23 04 00  21.909
481  23 13 61  21.932
482  23 23 24  21.954
483  23 32 89  21.977
484  23 42 56  22.000
485  23 52 25  22.023
486  23 61 96  22.045
487  23 71 69  22.068
488  23 81 44  22.091
489  23 91 21  22.113
490  24 01 00  22.136
491  24 10 81  22.159
492  24 20 64  22.181
493  24 30 49  22.204
494  24 40 36  22.226
495  24 50 25  22.249
496  24 60 16  22.271
497  24 70 09  22.293
498  24 80 04  22.316
499  24 90 01  22.338
500  25 00 00  22.361
501  25 10 01  22.383
502  25 20 04  22.405
503  25 30 09  22.428
504  25 40 16  22.450
505  25 50 25  22.472
506  25 60 36  22.494
507  25 70 49  22.517
508  25 80 64  22.539
509  25 90 81  22.561
510  26 01 00  22.583
511  26 11 21  22.605
512  26 21 44  22.627
513  26 31 69  22.650
514  26 41 96  22.672
515  26 52 25  22.694
516  26 62 56  22.716
517  26 72 89  22.738
518  26 83 24  22.760
519  26 93 61  22.782
520  27 04 00  22.804
521  27 14 41  22.825
522  27 24 84  22.847
523  27 35 29  22.869
524  27 45 76  22.891
525  27 56 25  22.913
526  27 66 76  22.935
527  27 77 29  22.956
528  27 87 84  22.978
529  27 98 41  23.000
530  28 09 00  23.022
531  28 19 61  23.043
532  28 30 24  23.065
533  28 40 89  23.087
534  28 51 56  23.108
535  28 62 25  23.130
536  28 72 96  23.152
537  28 83 69  23.173
538  28 94 44  23.195
539  29 05 21  23.216
540  29 16 00  23.238
541  29 26 81  23.259
542  29 37 64  23.281
543  29 48 49  23.302
544  29 59 36  23.324
545  29 70 25  23.345
546  29 81 16  23.367
547  29 92 09  23.388
548  30 03 04  23.409
549  30 14 01  23.431
550  30 25 00  23.452
551  30 36 01  23.473
552  30 47 04  23.495
553  30 58 09  23.516
554  30 69 16  23.537
555  30 80 25  23.558
556  30 91 36  23.580
557  31 02 49  23.601
558  31 13 64  23.622
559  31 24 81  23.643
560  31 36 00  23.664
561  31 47 21  23.685
562  31 58 44  23.707
563  31 69 69  23.728
564  31 80 96  23.749
565  31 92 25  23.770
566  32 03 56  23.791
567  32 14 89  23.812
568  32 26 24  23.833
569  32 37 61  23.854
570  32 49 00  23.875
571  32 60 41  23.896
572  32 71 84  23.917
573  32 83 29  23.937
574  32 94 76  23.958
575  33 06 25  23.979
576  33 17 76  24.000
577  33 29 29  24.021
578  33 40 84  24.042
579  33 52 41  24.062
580  33 64 00  24.083
581  33 75 61  24.104
582  33 87 24  24.125
583  33 98 89  24.145
584  34 10 56  24.166
585  34 22 25  24.187
586  34 33 96  24.207
587  34 45 69  24.228
588  34 57 44  24.249
589  34 69 21  24.269
590  34 81 00  24.290
591  34 92 81  24.310
592  35 04 64  24.331
593  35 16 49  24.352
594  35 28 36  24.372
595  35 40 25  24.393
596  35 52 16  24.413
597  35 64 09  24.434
598  35 76 04  24.454
599  35 88 01  24.474
600  36 00 00  24.495
601  36 12 01  24.515
602  36 24 04  24.536
603  36 36 09  24.556
604  36 48 16  24.576
605  36 60 25  24.597
606  36 72 36  24.617
607  36 84 49  24.637
608  36 96 64  24.658
609  37 08 81  24.678
610  37 21 00  24.698
611  37 33 21  24.718
612  37 45 44  24.739
613  37 57 69  24.759
614  37 69 96  24.779
615  37 82 25  24.799
616  37 94 56  24.819
617  38 06 89  24.839
618  38 19 24  24.860
619  38 31 61  24.880
620  38 44 00  24.900
621  38 56 41  24.920
622  38 68 84  24.940
623  38 81 29  24.960
624  38 93 76  24.980
625  39 06 25  25.000
626  39 18 76  25.020
627  39 31 29  25.040
628  39 43 84  25.060
629  39 56 41  25.080
630  39 69 00  25.100
631  39 81 61  25.120
632  39 94 24  25.140
633  40 06 89  25.159
634  40 19 56  25.179
635  40 32 25  25.199
636  40 44 96  25.219
637  40 57 69  25.239
638  40 70 44  25.259
639  40 83 21  25.278
640  40 96 00  25.298
641  41 08 81  25.318
642  41 21 64  25.338
643  41 34 49  25.357
644  41 47 36  25.377
645  41 60 25  25.397
646  41 73 16  25.417
647  41 86 09  25.436
648  41 99 04  25.456
649  42 12 01  25.475
650  42 25 00  25.495
651  42 38 01  25.515
652  42 51 04  25.534
653  42 64 09  25.554
654  42 77 16  25.573
655  42 90 25  25.593
656  43 03 36  25.612
657  43 16 49  25.632
658  43 29 64  25.652
659  43 42 81  25.671
660  43 56 00  25.690
661  43 69 21  25.710
662  43 82 44  25.729
663  43 95 69  25.749
664  44 08 96  25.768
665  44 22 25  25.788
666  44 35 56  25.807
667  44 48 89  25.826
668  44 62 24  25.846
669  44 75 61  25.865
670  44 89 00  25.884
671  45 02 41  25.904
672  45 15 84  25.923
673  45 29 29  25.942
674  45 42 76  25.962
675  45 56 25  25.981
676  45 69 76  26.000
677  45 83 29  26.019
678  45 96 84  26.038
679  46 10 41  26.058
680  46 24 00  26.077
681  46 37 61  26.096
682  46 51 24  26.115
683  46 64 89  26.134
684  46 78 56  26.153
685  46 92 25  26.173
686  47 05 96  26.192
687  47 19 69  26.211
688  47 33 44  26.230
689  47 47 21  26.249
690  47 61 00  26.268
691  47 74 81  26.287
692  47 88 64  26.306
693  48 02 49  26.325
694  48 16 36  26.344
695  48 30 25  26.363
696  48 44 16  26.382
697  48 58 09  26.401
698  48 72 04  26.420
699  48 86 01  26.439
700  49 00 00  26.458
701  49 14 01  26.476
702  49 28 04  26.495
703  49 42 09  26.514
704  49 56 16  26.533
705  49 70 25  26.552
706  49 84 36  26.571
707  49 98 49  26.589
708  50 12 64  26.608
709  50 26 81  26.627
710  50 41 00  26.646
711  50 55 21  26.665
712  50 69 44  26.683
713  50 83 69  26.702
714  50 97 96  26.721
715  51 12 25  26.739
716  51 26 56  26.758
717  51 40 89  26.777
718  51 55 24  26.796
719  51 69 61  26.814
720  51 84 00  26.833
721  51 98 41  26.851
722  52 12 84  26.870
723  52 27 29  26.889
724  52 41 76  26.907
725  52 56 25  26.926
726  52 70 76  26.944
727  52 85 29  26.963
728  52 99 84  26.981
729  53 14 41  27.000
730  53 29 00  27.019
731  53 43 61  27.037
732  53 58 24  27.055
733  53 72 89  27.074
734  53 87 56  27.092
735  54 02 25  27.111
736  54 16 96  27.129
737  54 31 69  27.148
738  54 46 44  27.166
739  54 61 21  27.185
740  54 76 00  27.203
741  54 90 81  27.221
742  55 05 64  27.240
743  55 20 49  27.258
744  55 35 36  27.276
745  55 50 25  27.295
746  55 65 16  27.313
747  55 80 09  27.331
748  55 95 04  27.350
749  56 10 01  27.368
750  56 25 00  27.386
751  56 40 01  27.404
752  56 55 04  27.423
753  56 70 09  27.441
754  56 85 16  27.459
755  57 00 25  27.477
756  57 15 36  27.495
757  57 30 49  27.514
758  57 45 64  27.532
759  57 60 81  27.550
760  57 76 00  27.568
761  57 91 21  27.586
762  58 06 44  27.604
763  58 21 69  27.622
764  58 36 96  27.641
765  58 52 25  27.659
766  58 67 56  27.677
767  58 82 89  27.695
768  58 98 24  27.713
769  59 13 61  27.731
770  59 29 00  27.749
771  59 44 41  27.767
772  59 59 84  27.785
773  59 75 29  27.803
774  59 90 76  27.821
775  60 06 25  27.839
776  60 21 76  27.857
777  60 37 29  27.875
778  60 52 84  27.893
779  60 68 41  27.911
780  60 84 00  27.928
781  60 99 61  27.946
782  61 15 24  27.964
783  61 30 89  27.982
784  61 46 56  28.000
785  61 62 25  28.018
786  61 77 96  28.036
787  61 93 69  28.054
788  62 09 44  28.071
789  62 25 21  28.089
790  62 41 00  28.107
791  62 56 81  28.125
792  62 72 64  28.142
793  62 88 49  28.160
794  63 04 36  28.178
795  63 20 25  28.196
796  63 36 16  28.213
797  63 52 09  28.231
798  63 68 04  28.249
799  63 84 01  28.267
800  64 00 00  28.284
801  64 16 01  28.302
802  64 32 04  28.320
803  64 48 09  28.337
804  64 64 16  28.355
805  64 80 25  28.373
806  64 96 36  28.390
807  65 12 49  28.408
808  65 28 64  28.425
809  65 44 81  28.443
810  65 61 00  28.460
811  65 77 21  28.478
812  65 93 44  28.496
813  66 09 69  28.513
814  66 25 96  28.531
815  66 42 25  28.548
816  66 58 56  28.566
817  66 74 89  28.583
818  66 91 24  28.601
819  67 07 61  28.618
820  67 24 00  28.636
821  67 40 41  28.653
822  67 56 84  28.671
823  67 73 29  28.688
824  67 89 76  28.705
825  68 06 25  28.723
826  68 22 76  28.740
827  68 39 29  28.758
828  68 55 84  28.775
829  68 72 41  28.792
830  68 89 00  28.810
831  69 05 61  28.827
832  69 22 24  28.844
833  69 38 89  28.862
834  69 55 56  28.879
835  69 72 25  28.896
836  69 88 96  28.914
837  70 05 69  28.931
838  70 22 44  28.948
839  70 39 21  28.965
840  70 56 00  28.983
841  70 72 81  29.000
842  70 89 64  29.017
843  71 06 49  29.034
844  71 23 36  29.052
845  71 40 25  29.069
846  71 57 16  29.086
847  71 74 09  29.103
848  71 91 04  29.120
849  72 08 01  29.138
850  72 25 00  29.155
851  72 42 01  29.172
852  72 59 04  29.189
853  72 76 09  29.206
854  72 93 16  29.223
855  73 10 25  29.240
856  73 27 36  29.257
857  73 44 49  29.275
858  73 61 64  29.292
859  73 78 81  29.309
860  73 96 00  29.326
861  74 13 21  29.343
862  74 30 44  29.360
863  74 47 69  29.377
864  74 64 96  29.394
865  74 82 25  29.411
866  74 99 56  29.428
867  75 16 89  29.445
868  75 34 24  29.462
869  75 51 61  29.479
870  75 69 00  29.496
871  75 86 41  29.513
872  76 03 84  29.530
873  76 21 29  29.547
874  76 38 76  29.563
875  76 56 25  29.580
876  76 73 76  29.597
877  76 91 29  29.614
878  77 08 84  29.631
879  77 26 41  29.648
880  77 44 00  29.665
881  77 61 61  29.682
882  77 79 24  29.698
883  77 96 89  29.715
884  78 14 56  29.732
885  78 32 25  29.749
886  78 49 96  29.766
887  78 67 69  29.783
888  78 85 44  29.799
889  79 03 21  29.816
890  79 21 00  29.833
891  79 38 81  29.850
892  79 56 64  29.866
893  79 74 49  29.883
894  79 92 36  29.900
895  80 10 25  29.916
896  80 28 16  29.933
897  80 46 09  29.950
898  80 64 04  29.967
899  80 82 01  29.983
900  81 00 00  30.000
901  81 18 01  30.017
902  81 36 04  30.033
903  81 54 09  30.050
904  81 72 16  30.067
905  81 90 25  30.083
906  82 08 36  30.100
907  82 26 49  30.116
908  82 44 64  30.133
909  82 62 81  30.150
910  82 81 00  30.166
911  82 99 21  30.183
912  83 17 44  30.199
913  83 35 69  30.216
914  83 53 96  30.232
915  83 72 25  30.249
916  83 90 56  30.265
917  84 08 89  30.282
918  84 27 24  30.299
919  84 45 61  30.315
920  84 64 00  30.332
921  84 82 41  30.348
922  85 00 84  30.364
923  85 19 29  30.381
924  85 37 76  30.397
925  85 56 25  30.414
926  85 74 76  30.430
927  85 93 29  30.447
928  86 11 84  30.463
929  86 30 41  30.480
930  86 49 00  30.496
931  86 67 61  30.512
932  86 86 24  30.529
933  87 04 89  30.545
934  87 23 56  30.561
935  87 42 25  30.578
936  87 60 96  30.594
937  87 79 69  30.610
938  87 98 44  30.627
939  88 17 21  30.643
940  88 36 00  30.659
941  88 54 81  30.676
942  88 73 64  30.692
943  88 92 49  30.708
944  89 11 36  30.725
945  89 30 25  30.741
946  89 49 16  30.757
947  89 68 09  30.773
948  89 87 04  30.790
949  90 06 01  30.806
950  90 25 00  30.822
951  90 44 01  30.838
952  90 63 04  30.854
953  90 82 09  30.871
954  91 01 16  30.887
955  91 20 25  30.903
956  91 39 36  30.919
957  91 58 49  30.935
958  91 77 64  30.952
959  91 96 81  30.968
960  92 16 00  30.984
961  92 35 21  31.000
962  92 54 44  31.016
963  92 73 69  31.032
964  92 92 96  31.048
965  93 12 25  31.064
966  93 31 56  31.081
967  93 50 89  31.097
968  93 70 24  31.113
969  93 89 61  31.129
970  94 09 00  31.145
971  94 28 41  31.161
972  94 47 84  31.177
973  94 67 29  31.193
974  94 86 76  31.209
975  95 06 25  31.225
976  95 25 76  31.241
977  95 45 29  31.257
978  95 64 84  31.273
979  95 84 41  31.289
980  96 04 00  31.305
981  96 23 61  31.321
982  96 43 24  31.337
983  96 62 89  31.353
984  96 82 56  31.369
985  97 02 25  31.385
986  97 21 96  31.401
987  97 41 69  31.417
988  97 61 44  31.432
989  97 81 21  31.448
990  98 01 00  31.464
991  98 20 81  31.480
992  98 40 64  31.496
993  98 60 49  31.512
994  98 80 36  31.528
995  99 00 25  31.544
996  99 20 16  31.559
997  99 40 09  31.575
998  99 60 04  31.591
999  99 80 01  31.607
1000  1 00 00 00  31.623

INDEX

Italics are used for references to definitions.

Age-scale, 109, 110
Array, 155
Attenuation, 211; correction for, 212
Average, 8, 9, 28, 31, 50, 51; reliability of an, 121
Average deviation or AD, 22, 23, 32, 34, 35, 51, 52
Axes, coordinate, 60; use in correlation, 159, 175
Barlow's Tables, 302
Bias in sampling, 144. See Sampling.
Binomial expansion, 79; in probability, 77-80; graphic representation of, 80
Blakeman, J., test for linearity, 210
Bowley, A. L., 302
Bravais, 163
Brown, Wm., 269, 292
Brown and Thomson, 191, 218, 302
Burt, Cyril, 251
Carothers, F. E., 134, 280, 300
Central tendencies, 8-16; reliability of measures of, 120-127
Classification of measures into frequency distributions, 2-4
Class-interval. See Step-interval.
Coefficient of alienation, 289
Coefficient of contingency, 198; computation of, 198-199; comparison with correlation coefficient, 200; short method of computing, 201
Coefficient of correlation, 149; as a ratio, 152-153; represented graphically, 158-159; steps in computation of, from guessed average, 163-168; steps in computation of, from average, 169-170; reliability of, 170; interpretation of, 288-299. See also Correlation.
Coefficient of regression, 175, 178
Coefficient of variation, calculation of, 41-42
Coin tossing, in experiments on laws of chance, 79-81
Column diagram. See Histogram.
Comparison of groups in terms of central tendencies and variabilities, 42; in terms of overlapping, 45
Comparison of obtained distributions with normal probability curve, 81
Contingency method, 195-203. See also Coefficient of contingency.
Continuous series, 1; tabulation of measures in, 2-7
Correction, computation of correction, C, in Short Method, 31; for attenuation, 211
Correlation, 149-152; positive, negative, and zero, 150-151; graphic representation of, 161-162; construction of correlation table, 154; product-moment method of computing, 163-170; rank methods of computing, 189-195; spurious, 258; effect of errors of observation on, 211. See also Partial correlation and Multiple correlation.
Correlation-ratio; in non-linear relation, 204-205; steps in computing, 206; comparison with r to determine linearity of regression, 209-210; correction of "raw" eta, 209; reliability of, 208
Criterion, 266; value of, in determining validity of tests, 266-267
Cumulative errors, effect on multiple R, 238-239
Deciles, 45. See Percentiles.
Deviation. See Quartile deviation, Average deviation, and Standard deviation.
Dice throwing, in experiments on laws of chance, 80-81
Difference, reliability of, between measures of central tendency, 128-137; reliability of, between two r's, 171. See Standard and Probable error.
Discrete series, 2; median in, 12; short method applied to, 36
Elderton, W. P. and E. M., 301
Equation, of straight line, 175; plotting of linear, 176-178; of regression lines, in Deviation Form, 178-179; in Score Form, 180-182
Error, curve of, 83. See also Normal curve.
Errors, of sampling, 143; of observation, 211; constant, 274; variable, 274. See also Probable and Standard errors.
Footrule (Spearman's) in correlation, 192-195
Frequency distribution, three methods of constructing, 3-4
Frequency Polygon, 59-63; comparison with histogram, 65
Garrett, H. E., 114
Grades, method of, in correlation, 192. See also Footrule.
Graphic methods, of representing data, in a frequency distribution, 59-71; of representing correlation coefficient, 158-162
Grouping, in tabulation, 3; assumptions in, 5
Heterogeneity, effect of, on correlation, 259; on reliability, 271
Hillegas, Milo B., 108
Histogram, 63-66; comparison with frequency polygon, 65
Holzinger, Karl J., 271
Homogeneity of a group, 17
Hull, Clark, method of transmuting ranks, 111-115; method of combining tests, 282, 300
Index of reliability, 273
Jerome, Harry, 82
Jones, D. Caradog, 83, 84, 174, 211, 302
Kelley, T. L., 33, 195, 254, 259, 263, 267, 272, 273, 289, 292, 302
Law of normal frequency, 82
Line graphs, 72-73
Line of means, best fitting line, 160, 173; plotting of, 175; equation of, 175-182
Linearity of relation, 203; tests for, 209-210
May, Mark A., 223, 224, 244, 263
McCall, W. A., 109, 110, 302
Mean, Arithmetic. See Average.
Mean deviation. See Average deviation.
Median, 11, 12, 13, 38, 50; reliability of, 126
Methods of combining test scores, 277; by percentiles, 278; by median mental age, 279; by variability of test scores, 279-281; by conversion into comparable distributions, 281-284
Middle 50%, 21, 85
Midpoint of step, how to find, 6; as representative of all the scores on the step, 6
Midscore, in ungrouped discrete series when N is even, 12; when N is odd, 12
Miner, John Rice, 302
Monroe, W. S., 185, 302
Mode, 15, 16, 50
Moore, H. L., 255
Multiple coefficient of correlation, R, 222; computation of, 230-231; general formula for, 238; "chance" R, 239; alternate forms for, 239
Musselman, J. R., 261
Non-linear relation, 203-205
Normal curve, 74; deduction from binomial expansion, 80; why employed in psychological measurement, 81-84; properties of, 84-85; use in the solution of a variety of problems, 94ff; in test making, 101-109; in transmutation of ranks, 111-115; in measuring reliability, 123, 131
Normal probability curve. See Normal curve.
Normal frequency distribution, 83; illustrations of, 75
Ogive, 66; construction of, 67, 71; smoothing of, 68; in calculating percentiles, 69-70
Otis Correlation Chart, 167
Otis, A. S., 217, 259, 272, 302
Overlapping, in the measurement of groups, 44-45; of elements or factors in correlation, 291-299
Partial correlation, 221; illustration of, in three-variable problem, 223-231; notation in, 232; general formulas for use in, 231-240; models of four- and five-variable problems, 240-244; illustration of, in four-variable problem, 244-251; value of, in analysis and causal investigations, 251ff; limitations to use of, 258
Pearl, Raymond, 295, 297
Pearson, Karl, 163, 200, 205, 209
Percentile scale, 109; evaluation of, 209
Percentiles, calculation of, 45ff; percentile scores, 46; graphic method of finding, 69; method of combining scores from different tests, 278
Phillips, Frank M., 252
Pintner and Patterson, 49, 279
Probable error, relation to Q, 21; relation to other measures of variability in a normal distribution, 85; use in solution of problems, 94-109
Probable error, of an average, 125ff; of a median, 126; of σ, 127; of a difference, 129; table for finding reliability of a difference in terms of, 135; of a coefficient of correlation, 170-171
Probable error of estimate, in prediction, 184-185; in partial and multiple correlation, 237
Probable error of measurement, 274-276
Product-moment method of finding r, deviations from GA, 163-168; deviations from average, 168-170
Pyle, W. H., 279
Quartile deviation (Q), 17, 18-22; in discrete series, 40; when to use, 50
Quartiles, Q₁ and Q₃, computation of, 18-19
r, Product-moment coefficient of correlation, formulas for, 167, 168. See Coefficient of correlation, and Correlation.
Random sample, 142-145. See also Sampling.
Range, 2, 17, 50
Rank difference method of computing correlation, 189ff; when to use, 195
Ranks, transmutation of, into units of amount, 111ff
Reavis, George, 253
Reduced scores, in combining test scores, 283-284; in computation of r, 285
Regression equations, deviation form, 174f; in score form, 180f; partial equations of, 235; non-linear, 203ff
Regression coefficients, 174, 178
Relative variability, measures of, 40. See also Coefficient of variation.
Reliability, measures of, 118-137; limitations to measures of, 142-145; coefficient of, 268-271; dependence of coefficient of, on size and variability of group, 271-272; index of, 273. See also Probable error and Standard error.
Rietz, H. L., et al., 302
Rosenow, Curt, 239
Ruch-Stoddard Correlation Sheet, 167
Ruch, G. M. and Del Manzo, M. C., 271, 299
Rugg, H. O., 301
Sampling, random, 120; errors of, 142-143; unreliability due to, 144; criteria of, 144
Scaling total scores, 109. See also Percentile scale, Age-scale, T-scale.
Scatter diagram, 154
Score, meaning of, 7
Secrist, Horace, 302
Semi-interquartile range, 21. See Quartile deviation.
Skewness, 86-89
Sommerville, R. C., 56, 219
Spearman, C., 212, 213
Spearman's Footrule, 192; prophecy formula, 269
Spurious Correlation, 258-261
Standard deviation (σ), 26, 27, 35; relation to other measures of variability, 85; reliability of, 127; general formulas for partial σ's, 233-235; of the sum or difference of corresponding values of two series of test scores, 286-288
Standard error, of an average, 121-125; of a median, 126; of a σ, 127; of a Q, 128; of a difference, 128-133; table for finding the reliability of a difference in terms of, 134; of a sum or difference, measures correlated, and uncorrelated, 187
Standard error of estimate, in prediction, 183; in partial and multiple correlation, 237; in interpreting r, 288-290
Standard error of measurement, 274-276; in interpreting r, 290-291
Step-interval, 2, 3, 4, 5; midpoint of, 5-6; assumptions with regard to data on, 5-6
Tables of frequencies of normal probability curve, in terms of σ, 91; in terms of PE, 93
Tabulation, of measures into frequency distribution, 3f; of correlation table, 154
Thorndike, E. L., 88, 301
Thurstone Correlation Sheet, 167
Thurstone, L. L., 302
Trabue, M. R., 127, 137
Transmutation of ranks into units of amount, 111
T-scale, 110
True scores, 118, 272-273
Validity, measurement of, in a test, 266-268
Variable errors, effect on r, 211; measurement of, 274-276
Variability, 16; causes of, 82, 88; comparison of groups with respect to, 42-44; coefficient of relative, 41; reliability of measures of, 127-128. See also Average deviation, Quartile deviation, and Standard deviation.
Weighting of tests, by variability of test scores, 279
Whitley, M. T., 267
Whipple, G. M., 279
Woody, Clifford, 104, 105, 107
Woodworth, R. S., method of combining tests, 283; use of "reduced scores" in computing r, 285
Yule, G. Udny, 80, 121, 122, 196, 200, 210, 212, 218, 221, 237, 286, 302