American Foundation
For the Blind, Inc.
Digitized by the Internet Archive
in 2012 with funding from -
Lyrasis Members and Sloan Foundation
http://www.archive.org/details/statisticsinpsycOOhenr
STATISTICS IN PSYCHOLOGY
AND EDUCATION
BY
HENRY E. GARRETT
ASSISTANT PROFESSOR OF PSYCHOLOGY, COLUMBIA UNIVERSITY
WITH AN INTRODUCTION BY
R. S. WOODWORTH
PROFESSOR OF PSYCHOLOGY, COLUMBIA UNIVERSITY
LONGMANS, GREEN AND CO.
55 FIFTH AVENUE, NEW YORK
CHICAGO, TORONTO, LONDON
1926
Copyright, 1926, by
LONGMANS, GREEN AND CO.
First Edition, January, 1926
Reprinted, November, 1926
MADE IN THE UNITED STATES
INTRODUCTION
Modern problems and needs are forcing statistical methods
and statistical ideas more and more to the fore. There are so
many things we wish to know which cannot be discovered by a
single observation, or by a single measurement. We wish to
envisage the behavior of a man who, like all men, is rather a
variable quantity, and must be observed repeatedly and not
once for all. We wish to study the social group, composed of
individuals differing one from another. We should like to be
able to compare one group with another, one race with another,
as well as one individual with another individual, or the indi-
vidual with the norm for his age, race or class. We wish to
trace the curve which pictures the growth of a child, or of a
population. We wish to disentangle the interwoven factors of
heredity and environment which influence the development of
the individual, and to measure the similarly interwoven effects
of laws, social customs and economic conditions upon public
health, safety and welfare generally. Even if our statistical
appetite is far from keen, we all of us should like to know enough
to understand, or to withstand, the statistics that are constantly
being thrown at us in print or conversation, much of it pretty
bad statistics. The only cure for bad statistics is apparently
more and better statistics. All in all, it certainly appears that
the rudiments of sound statistical sense are coming to be an
essential of a liberal education.
Now there are different orders of statisticians. There is,
first in order, the mathematician who invents the method for
performing a certain type of statistical job. His interest, as a
mathematician, is not in the educational, social or psychological
problems just alluded to, but in the problem of devising instruments
for handling such matters. He is the tool-maker of the
statistical industry, and one good tool-maker can supply many
skilled workers. The latter are quite another order of statisti-
cians. Supply them with the mathematician's formulas, map
out the procedure for them to follow, provide working charts,
tables and calculating machines, and they will compute from
your data the necessary averages, probable errors and correla-
tion coefficients. Their interest, as computers, lies in the quick
and accurate handling of the tools of the trade. But there is
a statistician of yet another order, in between the other two.
His primary interest is psychological, perhaps, or it may be
educational. It is he who has selected the scientific or practical
problem, who has organized his attack upon the problem in
such fashion that the data obtained can be handled in some
sound statistical way. He selects the statistical tools to be
employed, and, when the computers have done their work, he
scrutinizes the results for their bearing upon the scientific or
practical problem with which he started. Such an one, in
short, must have a discriminating knowledge of the kit of
tools which the mathematician has handed him, as well as some
skill in their actual use.
The reader of the present book will quickly discern that it
is intended primarily for statisticians of the last-mentioned
type. It lays out before him the tools of the trade; it explains
very fully and carefully the manner of handling each tool; it
affords practice in the use of each. While it has little to say of
the tool-maker's art, it takes great pains to make clear the use
and limitations of each tool. As any one can readily see who
has tried to teach statistics to the class of students who most
need to know the subject, this book is the product of a genuine
teacher's experience, and is exceptionally well adapted to the
student's use. To an unusual degree, it succeeds in meeting
the student upon his own ground.
R. S. Woodworth
Columbia University
PREFACE
The present day emphasis on measurement and the quanti-
tative treatment of results has made a knowledge of statistical
method not only extremely useful but almost necessary to the
student of psychology, education, and the social sciences. To
those who have been well trained in mathematics, the acquisi-
tion of statistical technique offers no particular difficulty. To
many otherwise capable students, however, either because of
inadequate preparation in mathematics, or because their prep-
aration is not very recent, the application of statistical method
to data obtained from test and experiment is more than
ordinarily difficult.
It is for this last group of students, especially, that this
book has been written. Its primary purpose is to present the
subject in a simple and concise form understandable to those
who have no previous knowledge of statistical method. With
this end in view, theory has everywhere been subordinated to
practical application, and numerous illustrations of the various
statistical devices have been provided. References have been
given, however, for the benefit of those interested in the mathe-
matical theory underlying the methods introduced.
The reader will note that in nearly all cases formulas have
simply been stated without proof. This has been done, because
the writer believes that most students of mental and social
measurement are — and probably should be — more concerned
with what a formula means and does than in how it is derived.
There is considerable justification for such an attitude. In
every science certain facts obtained from other fields must be
taken on faith. We do not, to take a simple example, restrict
the use of the radio or the microscope to those who understand
the physical principles involved, and there seems to be no real
reason why a student of psychology should not make good use
of a correlation formula when he cannot derive it mathe-
matically.
A chapter has been given to the subject of reliability — a
topic too often passed over lightly — and considerable space has
been devoted to correlation. An entire chapter, also, has been
given to partial and multiple correlation. This method, while
comparatively recent, is being widely used in educational
research, and is probably destined in the near future to be more
often used in the psychological laboratory. In the last chapter,
the application of correlation and other statistical methods is
shown to tests and testing.
Many have contributed to the making of this book of whom
only a few can be mentioned. To Professors R. S. Woodworth
and Mark A. May who read the manuscript, the writer is
indebted for many useful and constructive criticisms. He is
also grateful to Dr. M. R. Neifeld, to Mr. V. W. Lemmon,
and to Miss Elizabeth Farber for computations and helpful
suggestions.
Henry E. Garrett
Columbia University
CONTENTS
CHAPTER I
THE FREQUENCY DISTRIBUTION
SECTION PAGE
I. The Tabulation of Measures into a Frequency Distribu-
tion 1
1. Measures in General: Continuous and Discrete ... 1
2. Classification of Measures in Continuous Series ... 2
3. Three Ways of Expressing the Limits of a Step-interval . 5
4. The Meaning of a Single Score in a Continuous Series . 7
II. Measures of Central Tendency 8
1. The Average, or Arithmetic Mean . ... . . . 8
2. The Median 11
3. The Mode . . 15
III. Measures of Variability 16
1. The Range 17
2. The Quartile Deviation, or Q 17
3. The Average Deviation, or AD 22
4. The Standard Deviation, or SD 26
IV. The Short Method of Finding the Average, AD, and
SD (σ) 28
1. The Calculation of the Average by the Short Method . 28
2. The Calculation of the AD by the Short Method ... 32
A. The Calculation of the AD from the Average ... 32
B. The Calculation of the AD from the Median ... 35
3. The Calculation of the Standard Deviation by the Short
Method 35
4. The Short Method Applied to Discrete Series .... 36
V. The Comparison of Groups 40
1. The Measurement of Relative Variability 40
2. The Comparison of Two Groups in Terms of Central
Tendency and Variability 42
3. The Comparison of Two Groups in Terms of Overlapping 44
VI. The Calculation of the Percentiles in a Frequency Dis-
tribution 45
VII. When to Use the Different Measures of Central Ten-
dency and Variability 50
VIII. Summary of Formulas for Finding the Measures of Cen-
tral Tendency and Variability 51
IX. Illustrative Problems 53
CHAPTER II
GRAPHIC METHODS AND THE NORMAL CURVE
I. The Graphic Representation of the Frequency Distribu-
tion 59
1. The Frequency Polygon 59
2. The Histogram or Column Diagram 63
3. The Ogive, or Cumulative Frequency Graph . . . 66
II. Other Uses of Graphical Methods: the Comparative Line
Graph 71
III. The Normal Probability Curve 74
1. Elementary Principles of Probability 76
2. Why the Probability Curve is Employed in Psychological
Measurement 81
3. Important Properties of the Normal Curve 84
4. The Measurement of Skewness 86
IV. Some Practical Applications of the Normal Curve . . 89
1. The Construction and Use of Tables X and XI .... 89
2. A Variety of Problems Solved by Means of Tables X and XI 94
3. The Arrangement of Problems or other Test Items into a
Scale in Which the Difficulty of Each Item is Known with
Reference to Each Other Item as Well as Some Selected
Zero Point 101
4. The Conversion of Judgments by Relative Position (or
Relative Merit) into σ or PE Positions on the Scale . . 107
5. The Scaling of Total Scores on a Test 109
V. The Transmutation of Measures by Relative Position
(in Order of Merit) into Units of Amount on the
Assumption of Normality in the Trait Measured . 111
CHAPTER III
THE RELIABILITY OF MEASURES
I. What is Meant by the Reliability of a Measure . . 118
II. The Reliability of Measures of Central Tendency . . 120
1. The Reliability of the Average or Mean 120
A. In Terms of the Standard Error, σav. 120
B. In Terms of the Probable Error, PEav. . . . 125
2. The Reliability of the Median 126
III. The Reliability of Measures of Variability .... 127
1. The Standard Deviation, or σ 127
2. The Quartile Deviation, or Q 128
IV. The Reliability of the Difference between Two Measures 128
1. The Reliability of the Difference between Two Averages . 128
A. In Terms of the σ(diff.) 128
B. In Terms of the PE(diff.) 133
2. The Reliability of the Difference between Two Medians . 136
V. Some Problems which Involve Measures of Reliability . 138
VI. Limitations to the Reliability Formulas, and Cautions to
be Observed in Interpreting Them 142
VII. Summary of Reliability Formulas 145
CHAPTER IV
CORRELATION
I. What is Meant by Correlation 149
II. The Coefficient of Correlation: What it is, and what it
Does 152
1. The Coefficient of Correlation as a Ratio 152
2. Graphical Representation of the Coefficient of Correlation 158
III. The Calculation of the Coefficient of Correlation by
the Product-moment Method 163
1. The Product-moment Formula when Deviations are
Taken from the Guessed Averages of the Two Distri-
butions 163
2. The Product-moment Formula when Deviations are
Taken from the Actual Averages of the Two Distribu-
tions 168
IV. The Probable Error of a Coefficient of Correlation . 170
1. The PEr . 170
2. The PE of the Difference between Two r's 171
V. The Regression Equations 173
1. In Deviation Form 173
2. The Regression Equations in Score Form 180
3. The Reliability of the "Predictions" made from the
Regression Equations 183
VI. The Complete Solution of a Correlation Problem . . 185
VII. Methods of Measuring Correlation which Take Account
only of the Relative Position or Rank . . . 189
1. The Method of Rank-differences 190
2. The Method of Gains, or the Spearman Footrule . . . 192
3. Summary of the Rank Methods 195
VIII. A Method of Measuring Relationship when the Data are
Grouped into Classes or Categories. The Contin-
gency Method 195
IX. Non-linear Relationship 203
1. The Correlation Ratio 203
2. The Correction of "raw" eta . . . 209
3. Test of Linearity of Regression 209
X. The Correction of a Coefficient of Correlation for
"Attenuation." 211
XI. Summary of Formulas in Chapter IV 213
CHAPTER V
PARTIAL AND MULTIPLE CORRELATION
I. The Meaning of Partial and Multiple Correlation . . 221
II. A Correlation Problem Involving 3 Variables . . . 223
III. General Formulas for Use in Partial and Multiple Corre-
lation 231
1. General Formulas for Partial r's 231
2. General Formulas for Partial σ's of any Order .... 233
3. General Formulas for the Regression Equation, and Co-
efficients of Regression 235
4. General Formulas for Standard and Probable Errors of
Estimate 237
5. General Formula for R, the Coefficient of Multiple
Correlation 238
6. Outline of the Formulas Needed in Correlation Problems
which Involve (a) Four Variables, (b) Five Variables . 240
IV. A Multiple Correlation Problem Involving 4 Variables . 244
V. The Value and Use of Partial and Multiple Correlation 251
VI. Spurious Correlation 258
1. Spurious Correlation Due to Heterogeneity of Material . 258
2. Spurious Index Correlation 260
3. Spurious Correlation of a Single Test with a Composite of
which it is a Member 260
VII. Summary of Formulas in Chapter V 261
CHAPTER VI
SOME APPLICATIONS OF STATISTICAL METHOD AND
TECHNIQUE TO TESTS AND TEST RESULTS
I. The Validity of Test Scores 266
1. Validity Determined through Correlation with a Criterion . 266
2. Indirect Measures of Validity 267
II. The Reliability of Test Scores 268
1. The Reliability of a Test as Measured by its Self-Correla-
tion 268
(A) The "Reliability Coefficient" 268
(B) Effect on Reliability of Lengthening or Repeating the
Test 269
(C) Coefficient of Reliability from One Application of a
Test 271
(D) Dependence of the Reliability Coefficient on the Size
and Variability of the Group 271
2. The Index of Reliability 272
3. The Standard Error and Probable Error of Measurement:
σ(M) and PE(M) 274
III. Combining the Scores from Different Tests .... 277
1. Combining Test Scores by Percentiles 278
2. Combining Test Scores by the Method of Median Mental
Age 279
3. Combining Tests which have been Weighted According to
the Variability of the Test Scores 279
4. Combining Test Scores by Converting the Scores of Dif-
ferent Tests into Comparable Series 281
IV. The σ of the Sum or Difference of Corresponding Values
of Two Series of Test Scores . 286
V. How to Interpret the Coefficient of Correlation between
Two Tests or Other Measures 288
1. The Interpretation of a Coefficient of Correlation in Terms
of σ(est.) 288
2. The Interpretation of a Coefficient of Correlation in Terms
of the Standard Error of Measurement, σ(M) . . . . 290
3. Interpretation of a Coefficient of Correlation in Terms of the
Percentage of Common (Overlapping) Elements or Fac-
tors 291
STATISTICS IN PSYCHOLOGY
AND EDUCATION
CHAPTER I
THE FREQUENCY DISTRIBUTION
I. The Tabulation of Measures into a Frequency
Distribution
1. Measures in General: Continuous and Discrete Series
In the measurement of mental and social traits or capacities
most of the facts with which we deal fall into what are known
as continuous series. A continuous series may be defined
simply as a series which is theoretically capable of any degree
of subdivision. IQ's, for example, are generally thought of as
increasing by increments of 1 on a scale which extends from the
idiot to the genius; however, there is actually no real reason —
at least theoretically — why with more refined methods of
measurement we should not be able to get IQ's of 100.8 or even
100.83. Nearly all capacities measured by mental and educa-
tional tests and scales, as well as such attributes as height,
weight, cephalic index, etc., have been found to be continuous,
so that within the range of the scale used, any measure —
integral or fractional — may exist and have meaning. When-
ever gaps occur in a truly continuous series, therefore, these are
usually to be attributed to our failure to measure enough cases,
or to the relative crudity of our measuring instruments, or
to some other fact of the same sort, rather than to the fact that
no measures exist within the gaps.
There are, however, measures which do not fall into continu-
ous series. Thus a salary scale in a department store may run
from $10 per week to $20 per week in units of 50 cents or $1.00;
no one receives, let us say, $17.53 per week. Or, to take
another example, the average family in a certain locality may
work out mathematically to be 4.57 children, although there is
obviously a real gap between four and five children. Series
like these, which contain real gaps, are called discrete or dis-
continuous.
It is probably fortunate, at least from the standpoint of the
beginner in statistics, that nearly all of the measures which we
make in psychology are continuous or can be treated as con-
tinuous. This considerably simplifies the problem, inasmuch as
we may concern ourselves (for the present at least) almost
entirely with methods of handling continuous data, postponing
the discussion of discrete series to a later page.
2. The Classification of Measures in Continuous Series
Data collected from test or experiment are often merely a
series of numbers or mass of figures without meaning or signifi-
cance until they have been rearranged or classified in some
systematic way. The first task that confronts us, then, is the
organization of our material, and this leads naturally to a
grouping of the measures into classes or categories. The pro-
cedure in grouping falls under three main heads, which are
given in order below:
(1) The determination of the range: the interval between
the largest and the smallest measures. The range is easily
found by subtracting the smallest from the largest measure.
(2) Deciding upon the number and size of the groups to be
used in classification. The number and the size of these steps
or class-intervals depend largely upon the range and the kind of
measures with which we are dealing.
(3) The tabulation of the separate measures within their
proper step- or class-intervals.
TABLE I
Army Alpha Scores Made by 54 Columbia College Men

1. THE ORIGINAL SCORES (UNGROUPED)

 185  *201   188   195   176  *126   177   154   157   189   172
 174   158   197   176   138   155   137   177   164   198   176
 127   160   151   185   185   169   195   165   185   188   164
 183   179   188   185   155   146   182   153   158   160   191
 168   184   188   179   178   151   144   191   170   157

* Maximum score = 201.
* Minimum score = 126.

2. THE SAME SCORES GROUPED INTO A FREQUENCY DISTRIBUTION BY THREE METHODS

 (A)                                   (B)                 (C)
 (1)              (2)           (3)
 Scores           Tabulation     F     Scores        F     Scores     F
 200 up to 205    /              1     200-204.99    1     200-204    1
 195 up to 200    ////           4     195-199.99    4     195-199    4
 190 up to 195    //             2     190-194.99    2     190-194    2
 185 up to 190    ///// /////   10     185-189.99   10     185-189   10
 180 up to 185    ///            3     180-184.99    3     180-184    3
 175 up to 180    ///// ///      8     175-179.99    8     175-179    8
 170 up to 175    ///            3     170-174.99    3     170-174    3
 165 up to 170    ///            3     165-169.99    3     165-169    3
 160 up to 165    ////           4     160-164.99    4     160-164    4
 155 up to 160    ///// /        6     155-159.99    6     155-159    6
 150 up to 155    ////           4     150-154.99    4     150-154    4
 145 up to 150    /              1     145-149.99    1     145-149    1
 140 up to 145    /              1     140-144.99    1     140-144    1
 135 up to 140    //             2     135-139.99    2     135-139    2
 130 up to 135                   0     130-134.99    0     130-134    0
 125 up to 130    //             2     125-129.99    2     125-129    2
                            N = 54              N = 54           N = 54

These three principles of classification are illustrated in
Table I. The figures in this table represent the Army Alpha
scores received by 54 college men. Since the highest score is
201, and the lowest 126, the range is found at once to be exactly
75 points. In deciding upon the number of "steps" or class-
intervals to be used in grouping, the best general rule is to select
by trial a step-interval which will yield not more than 20 nor
less than 10 steps. The number of steps which a given interval
will yield can be determined approximately (within one step)
by dividing the range by the step tentatively chosen. In the
present problem, for example, 75 (the range) divided by 5 (the
step-interval) gives 15, which is one less than the actual number
of steps, namely 16. A step-interval of 3 points will yield
approximately 25 steps, while a step-interval of 10 points will
yield approximately 7.5 steps. (Actually, for the given data, a
step-interval of 3 points yields 26 steps, and one of 10 points
8 steps.)
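
The rule above can be checked with a short calculation. The following Python sketch is not part of the original text: the function names are illustrative, and it assumes that the first interval begins at the lowest score (126), an assumption which happens to reproduce the actual counts quoted above for all three step-intervals.

```python
# Extremes of the 54 Army Alpha scores in Table I.
HIGHEST, LOWEST = 201, 126


def approximate_steps(score_range, step):
    """Rule of thumb from the text: the range divided by the step-interval."""
    return score_range / step


def actual_steps(lowest, highest, step):
    """Number of intervals needed, assuming the first one begins at `lowest`."""
    return (highest - lowest) // step + 1


score_range = HIGHEST - LOWEST  # 75 points

for step in (3, 5, 10):
    print(step, approximate_steps(score_range, step),
          actual_steps(LOWEST, HIGHEST, step))
```

For a step-interval of 5 the approximation gives 15.0 and the count gives 16, in agreement with the text; only step-intervals yielding between 10 and 20 steps satisfy the general rule.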
The tabulation of the separate scores within their appropriate
step- or class-intervals is shown in Table I(2A). In the
first column of this table (the column marked "Scores"),
the step-intervals have been listed serially, with the smallest
measures at the bottom of the column. The first interval,
"125 up to 130," begins at 125 and ends at 130; the second
interval "130 up to 135," begins at 130 and ends at 135 and
so on. The last interval, "200 up to 205," begins at 200 and
ends at 205. In column 2, marked "Tabulation," the separate
scores have been listed opposite their proper intervals. The
first score, 185 [see Table I(1)], is represented by a tally placed
opposite step-interval "185 up to 190"; the second score, 201,
by a tally placed opposite step-interval "200 up to 205"; the
third score, 188, by a tally placed opposite "185 up to 190"
and so on for the other scores. When all 54 scores have been
listed, the total number of tallies on each step-interval (i.e.,
the frequency) is written in column 3, headed F (frequencies).
The sum of the F column is called N. In the present case, of
course, N = 54. When the total frequency of each step-interval
has been tabulated opposite its proper step-interval, as shown
in column 3, our 54 Alpha scores are arranged into what is
known as a Frequency Distribution.
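
The tallying just described can be sketched in Python. This is an illustration only, not the author's procedure: the names `SCORES`, `STEP` and `lower_limit` are assumptions, the 54 scores are those of Table I, and each interval is taken to begin at a multiple of 5 and to exclude its upper limit.

```python
from collections import Counter

# The 54 Army Alpha scores of Table I.
SCORES = [
    185, 201, 188, 195, 176, 126, 177, 154, 157, 189, 172,
    174, 158, 197, 176, 138, 155, 137, 177, 164, 198, 176,
    127, 160, 151, 185, 185, 169, 195, 165, 185, 188, 164,
    183, 179, 188, 185, 155, 146, 182, 153, 158, 160, 191,
    168, 184, 188, 179, 178, 151, 144, 191, 170, 157,
]
STEP = 5  # length of the step-interval


def lower_limit(score, step=STEP):
    """Lower limit of the step-interval on which a score falls."""
    return (score // step) * step


# One tally per score, filed under the lower limit of its interval.
frequencies = Counter(lower_limit(s) for s in SCORES)

top = lower_limit(max(SCORES))     # 200
bottom = lower_limit(min(SCORES))  # 125
for lower in range(top, bottom - STEP, -STEP):
    f = frequencies[lower]         # a Counter gives 0 for an empty interval
    print(f"{lower}-{lower + STEP - 1}  {'/' * f:<10}  {f}")

print("N =", sum(frequencies.values()))
```

Because a score such as 160 is filed by integer division, it cannot slip into the interval below it; the empty interval 130-134 is printed with a frequency of 0.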
The reader will note that the lower limit of the first step in
the distribution (i.e., 125 up to 130) has been taken at 125
although the lowest actual score in the series is 126. This is
due to the fact that when the step-interval equals 5 units, it
facilitates tabulation as well as computations which come later
on if the lower limit of the first step-interval (and accordingly
of each succeeding step-interval) is a multiple of 5. A step-
interval of 126 up to 131 is just as good as a step-interval of
125 up to 130, theoretically; the second, however, is much
easier to handle from the standpoint of the arithmetic involved.
3. Three Ways of Expressing the Limits of a Step-interval
Table I (2 A,B,C) illustrates three ways of writing the limits
of a step-interval. In (A) the interval "125 up to 130" means
that all scores from 125 up to but not including 130 fall on this
step. In (B) the step-interval 125-129.99 means exactly the
same thing. The upper limit is written 129.99 simply to
emphasize the fact that this step-interval includes score 129
plus fractional parts up to 130, but does not include score 130.
(C) expresses the same facts more clearly than (A) and not so
exactly as (B). Thus 125-129 means that this step-interval
begins with score 125 and ends with score 129. A diagram will
indicate how (A), (B), and (C) are simply three ways of express-
ing the same facts.
Step                         Step
Begins                       Ends
  |--1--|--2--|--3--|--4--|--5--|
 125   126   127   128   129   130
Either method (B) or method (C) is advised as preferable
to (A). It is fairly easy — even when one is on guard — to let
a score of say 160 slip into the step-interval 155 up to 160 due
simply to the presence of the 160 at the upper limit of the step.
The accurate tabulation of a frequency distribution depends
on getting each score into its proper step-interval, and for this
reason one cannot be too careful in defining the limits of the
steps.
In any frequency distribution we always assume that the
scores within a given interval (i.e., the frequency) are spread
evenly over the entire interval; and this assumption holds
whether the length of the step is 3, 5 or 10 units. If we wish to
represent all of the scores within a given interval by some
single value, however, the midpoint of the interval is taken as
the most logical choice. To illustrate, in the step-interval
155-159 [see Table I (2 C)] the six scores on this step are all
represented by the same value, 157.50, the midpoint of the
interval, although the scores are 155, 155, 157, 157, 158, 158.
The reason why 157.50 is the midpoint of the step-interval can
be shown graphically as follows:
Step                         Step
Begins                       Ends
  |--1--|--2--|--3--|--4--|--5--|
 155   156   157   158   159   160
                 ^
              157.50
A simple rule for finding the midpoint of a step is
    Midpoint = lower limit of step + (upper limit - lower limit)/2.
For example, in the present case, 155 + (160 - 155)/2 = 157.50.
Again, since the length of the step is 5, it follows that the mid-
point must be 2.5 points from the lower limit of the step, i.e.,
at 155+2.5 or 157.50.
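
The midpoint rule can be written as a one-line function. The sketch below is illustrative; the function name and argument names are assumptions, and the upper limit is taken as the point at which the next step begins (160, not 159).

```python
def midpoint(lower_limit, upper_limit):
    """Midpoint of a step: lower limit plus half the length of the step."""
    return lower_limit + (upper_limit - lower_limit) / 2


print(midpoint(155, 160))  # 157.5
```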
It is often a question whether the midpoint is a fair repre-
sentative of all of the scores on a given step-interval. If we
examine the six scores on step 155-159, two scores, the two
155's, are below the midpoint; two scores, the two 157's, are
practically on the midpoint; and two scores, the two 158's, are
above the midpoint. Also an examination of the step preced-
ing and the step following 155-159 shows that on both of these
steps there are 2 measures above and 2 below the midpoint.
There seems good evidence, therefore, for assuming that the
midpoint represents fairly the scores on these intervals, though
it is true that the balancing of scores above and below the
midpoint is not always as clear cut as in the examples cited. In
certain cases, in fact (e.g., when the distribution is considerably
"skewed"¹), there are often many more scores on one side of
the midpoint than the other, and the midpoint assumption is
then clearly untenable. The fact remains, however, that in
most frequency distributions of mental and educational measurements,
especially when the number of measures is large, the
assumption that the midpoint represents all of the scores on the
interval is a valid one, since in the long run about as many
scores will fall above as below the midpoint value.
¹ When the scores are "piled" up at either the lower or the upper end of
the scale, the distribution is said to be "skewed." See page 86.
4. The Meaning of a Single Score in a Continuous Series
So far we have discussed the classification of scores into step-
intervals (the frequency distribution) and the necessity of defin-
ing carefully the upper and lower limits of our step-intervals.
We shall now try to give a more precise notion of what is meant
by a single score, for example, a score of 165 points on Army
Alpha. If we think of the score 165 as occupying a certain
interval or distance on a linear scale, then any fractional value
from 165 up to (but not including) 166, e.g., 165.3, 165.8, etc.,
will fall within this interval and be scored simply as 165. See
illustration:
   Step 165
|-----------|
165        166
A score of 165 may mean, therefore, that the person who made
it was just barely through 165 items, or that he had nearly
completed 166 — in either case his score will be 165.
In performance scales a score equal to or greater than 8,
say, but less than 9 is placed on step 8-9 or 8-8.99 and scored 8.
In most product scales, however, — the Thorndike Handwriting
Scale is an example — a score of 8 represents any value from 7.5
to 8.5: i.e., any value from a point one half step below 8 to
a point one half step above. Thus scores 7.7, 8.0, 8.4, etc.,
would all be scored 8. If as before we think of a score on such
a scale as a linear magnitude, 8 represents the midpoint of that
interval which extends from 7.5 to 8.5. See illustration:
    Step 8
|-----+-----|
7.5   8    8.5
This method of scoring is employed in scales which measure
handwriting, drawing, composition, etc.
It is evident from the foregoing that the meaning of a single
score in a continuous series will depend upon how the test
is scored. If the score is not defined by the test, it is probably
safer to assume that a score of 22, say, means 22-23, rather
than 21.5-22.5.
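
The two scoring conventions can be set side by side in a small Python sketch. It is an illustration under stated assumptions: the function name and the labels "performance" and "product" are borrowed from the wording above for convenience and are not a notation used by the author.

```python
def score_interval(score, convention="performance"):
    """Interval of the continuous scale covered by an integral score.

    "performance": the score covers `score` up to (not including) score + 1.
    "product":     the score is the midpoint of score - 0.5 to score + 0.5.
    """
    if convention == "performance":
        return (score, score + 1)
    if convention == "product":
        return (score - 0.5, score + 0.5)
    raise ValueError("convention must be 'performance' or 'product'")


print(score_interval(165))           # (165, 166)
print(score_interval(8, "product"))  # (7.5, 8.5)
```

The default follows the advice of the text: when the test does not define the score, 22 is read as 22-23 rather than 21.5-22.5.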
II. Measures of Central Tendency
When scores or other measures have been tabulated into a
frequency distribution, generally our next task is to find a
measure of central tendency. The value of a measure of central
tendency is twofold: in the first place, it is a single measure
which represents all of the scores made by the group, and
as such gives a concise description of the performance of the
group as a whole; secondly, it enables us to compare two or
more groups in terms of typical performance. There are three
measures of central tendency in common use, (1) the average
or arithmetic mean, (2) the median, and (3) the mode. We
shall consider these three measures in order.
1. The Average, or Arithmetic Mean ¹
The average is the best known of the measures of central
tendency. It may be defined simply as the sum of the sepa-
rate scores or measures in a series divided by their number.
To illustrate, if a man makes $3.00, $4.00, $3.50, $5.00 and
$4.50 on five successive days, his average daily wage ($4.00) is
obtained by dividing the sum of his daily earnings by the number
of days he has worked. The formula for the average of a
series of ungrouped measures is simply
    Average = Σ(Measures) / N,     (1)
in which N is the number of measures in the series.²
¹ The term "average" is often used as a general expression to cover any
measure of central tendency. It is here used in a more restricted sense.
² The symbol Σ means "sum of."
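
Formula (1) applied to the wage example can be sketched in Python as follows; the function and variable names are illustrative only.

```python
def average(measures):
    """Formula (1): the sum of the measures divided by their number N."""
    return sum(measures) / len(measures)


daily_wages = [3.00, 4.00, 3.50, 5.00, 4.50]  # dollars, five successive days
print(average(daily_wages))  # 4.0
```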
When measures have been grouped into a frequency dis-
tribution, it is necessary to calculate the average by a slightly
different method from the one given above. The two illustra-
tions in Table II will make this method clear. The first of
these shows the calculation of the average for the 54 Army
Alpha scores which we have already tabulated into a frequency
distribution in Table I. Note that we first calculate the F×M
column by multiplying the midpoint (M) of each step-interval
by the number of scores (F) on it; and that the average (171.57)
is then simply the sum of the F×M (9265) divided by N (54).
The use of the midpoint for all of the scores on the interval is
made necessary by the fact that when scores have been grouped
into step-intervals they lose their identity and are thereafter
represented by the midpoint of the particular interval on which
they happen to fall. Hence, we must multiply or "weight"
the midpoint of each step (M) by the frequency (F) on that
step; add the F×M, and divide by N to get the average. The
formula may be written
    Average = Σ(F×M) / N     (2)
Example (2), Table II, is a second illustration of the calcula-
tion of an average from grouped data. This frequency dis-
tribution represents 200 scores made by a group of adults on a
cancellation test. These scores are classified into 9 steps;
and since the step-interval is 4 points, the midpoint of each
step is found by adding ½ of 4 to the beginning of each step (for
example, 104 + 2 = 106). The F×M column (found as shown
above) totals 23988, and N equals 200. Hence, applying
formula (2), the average is found to be 119.94.
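
Both results can be verified with a short Python sketch of formula (2). The names below are illustrative; each distribution is entered as pairs of (lower limit of step, frequency) taken from Table II, and the midpoint is taken as the lower limit plus half the step-interval.

```python
def grouped_average(steps, step_size):
    """Formula (2): sum of F x M divided by N, with M the midpoint of each step."""
    n = sum(f for _, f in steps)
    total = sum(f * (lower + step_size / 2) for lower, f in steps)
    return total / n


# (lower limit, frequency) for the 54 Army Alpha scores, step-interval 5.
ALPHA = [(200, 1), (195, 4), (190, 2), (185, 10), (180, 3), (175, 8),
         (170, 3), (165, 3), (160, 4), (155, 6), (150, 4), (145, 1),
         (140, 1), (135, 2), (130, 0), (125, 2)]

# (lower limit, frequency) for the 200 cancellation scores, step-interval 4.
CANCELLATION = [(136, 3), (132, 5), (128, 16), (124, 23), (120, 52),
                (116, 49), (112, 27), (108, 18), (104, 7)]

print(round(grouped_average(ALPHA, 5), 2))         # 171.57
print(round(grouped_average(CANCELLATION, 4), 2))  # 119.94
```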
In both illustrations in Table II we have found the average
of the scores made by a given group. There is no reason,
however, why we cannot use either formula (1) or (2) to find
the average of a number of measurements made on the same
individual, as well. Thus an individual's reaction time to light
may be measured 100 times, the measures tabulated into a
frequency distribution, and the average found in exactly the
same way in which we find the average reaction time to light
of 100 different observers.
TABLE II
To Illustrate the Calculation of the Average, Median, and Mode,
from Data Grouped into a Frequency Distribution
1. DATA FROM TABLE I (2), 54 ARMY ALPHA SCORES
STEP-INTERVAL = 5 POINTS
Scores         Midpoint      F        F×M
200-204.99      202.5        1      202.50
195-199.99      197.5        4      790.00
190-194.99      192.5        2      385.00
185-189.99      187.5       10     1875.00
180-184.99      182.5        3      547.50
175-179.99      177.5        8     1420.00
170-174.99      172.5        3      517.50
165-169.99      167.5        3      502.50
160-164.99      162.5        4      650.00
155-159.99      157.5        6      945.00
150-154.99      152.5        4      610.00
145-149.99      147.5        1      147.50
140-144.99      142.5        1      142.50
135-139.99      137.5        2      275.00
130-134.99      132.5        0        0.00
125-129.99      127.5        2      255.00
                         N = 54    9265.00
(1) Average = Σ(F×M)/N = 9265/54 = 171.57.
(2) (N/2 = 27) Median = 175 + 1/8 × 5 = 175.625.
(3) Crude mode falls on class-interval 185-189.99, or at 187.5.
2. SCORES MADE BY 200 ADULTS ON A CANCELLATION TEST
STEP-INTERVAL = 4 POINTS
Scores         Midpoint      F        F×M
136-139         138          3         414
132-135         134          5         670
128-131         130         16        2080
124-127         126         23        2898
120-123         122         52        6344
116-119         118         49        5782
112-115         114         27        3078
108-111         110         18        1980
104-107         106          7         742
                         N = 200     23988
(1) Average = Σ(F×M)/N = 23988/200 = 119.94.
(2) (N/2 = 100) Median = 116 + 48/49 × 4 = 119.92.
(3) Crude mode falls on class-interval 120-123, or at 122.
2. The Median
When scores or other measures are arranged in order of
size, the median is defined as the midpoint of the series, that is,
as the point above which and below which are 50% of the
measures. By definition, therefore, the median may be found
by counting off one half of the measures, i.e., N/2, from either end
of the series.
Let us first consider the calculation of the median for scores
or measures in a simple ungrouped series. Two cases arise:
Case I when N is odd, and Case II when N is even. As an illus-
tration of the first case, take the following eleven consecutive
scores: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24. Now since N
equals 11, N/2 = 5.5; and counting off the first five scores, namely,
14, 15, 16, 17, 18, we reach 19, since score 18 means "18 up
to 19." (See page 7.) The .5 left of our 5.5 then locates the
median midway between 19 and 20, viz., at 19.5. To verify
this result we may count off 5.5 scores beginning at the other
end of the series. The five scores, 24, 23, 22, 21, and 20,
take us to 20 (the upper limit of score 19) and the .5 left puts
the median at a point midway on the scale between 20 and 19,
viz., at 19.5 again. (See diagram below.)
Case I (N is odd)
Begin |<---- 5.5 Scores ---->| Median |<---- 5.5 Scores ---->| End
      |   |   |   |   |   |  19.5  |   |   |   |   |   |
      14  15  16  17  18  19       20  21  22  23  24  25
To illustrate the procedure when N is even, let us drop off the
first score (14) from the series of eleven scores in Case I. N is
now 10, and N/2 is 5.0. Counting off the first five scores, therefore,
from the small end of the series, i.e., 15, 16, 17, 18, 19, we reach
20 (the upper limit of "19 up to 20") as the median. Likewise,
if we count down five scores from 24, i.e., 24, 23, 22, 21, 20, we
again reach 20, the lower limit of the step "20 up to 21." See
diagram below:
Case II (N is even)
Begin |<---- 5 Scores ---->| Median (20) |<---- 5 Scores ---->| End
      |   |   |   |   |   |   |   |   |   |   |
      15  16  17  18  19  20  21  22  23  24  25
It will be noted that in the two cases just cited, the measures
were taken to be in continuous series. If, instead of continuous,
the eleven scores under Case I are taken as discrete or discontin-
uous there is now no value which fulfills the definition of the
median as the midpoint in the series. When N is odd, however,
the midscore or the middle measure may be obtained by counting
off (N+1)/2 scores from either end of the series, after the scores
have been arranged in order of size from least to greatest.
Thus, (Case I) (11+1)/2 or 6 scores counted off from either end of
our series puts the midscore at 19, since there are 5 scores
above and 5 scores below this score. A slightly different pro-
cedure is necessary when N is even. If the ten scores under
Case II, for example, are taken as discrete, there is in this
series, clearly no median value, and no midscore. However,
in such cases as this it is customary to take the midscore arbi-
trarily at a point midway between the two middlemost scores.
Thus, in our illustration, (N+1)/2 = 5.5, which puts the midscore
at 19.5, midway between 19 and 20, the two middlemost scores.
(For a discussion of the median for discrete measures grouped
into a frequency distribution, see page 36.)
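The two ways of counting described above, the continuous median and the discrete midscore, can be sketched in Python as follows. The function names are hypothetical, and the continuous version assumes, as the text does, that each score stands for a unit interval beginning at the score value ("18 up to 19").

```python
from collections import Counter


def continuous_median(scores, unit=1):
    """Median when each score s stands for the interval 's up to s + unit'."""
    counts = Counter(scores)
    target = len(scores) / 2          # N/2 measures to count off
    counted = 0
    for value in sorted(counts):
        f = counts[value]
        if counted + f >= target:
            # Go the needed fraction of the way into this score's interval.
            return value + (target - counted) / f * unit
        counted += f
    raise ValueError("scores must not be empty")


def discrete_midscore(scores):
    """Middle score for discrete measures; midway between the two
    middlemost scores when N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


case_1 = list(range(14, 25))   # 14, 15, ..., 24 (N = 11)
case_2 = list(range(15, 25))   # 15, 16, ..., 24 (N = 10)

print(continuous_median(case_1), discrete_midscore(case_1))
print(continuous_median(case_2), discrete_midscore(case_2))
```

For Case I the two definitions give 19.5 and 19; for Case II they give 20 and 19.5, in agreement with the text.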
The method of calculating the median for continuous data
grouped into a frequency distribution is shown in the two
examples in Table II. Since there are 54 scores in the first
distribution, N/2 is 27. The median, therefore, is that point on
the scale which has 27 scores on each side of it. If we begin at
the small end of the distribution¹ and add up the scores in
order, the step-intervals 125-129.99 to 170-174.99, inclusive,
are found to contain just 26 scores. The next step, 175-179.99,
contains 8 scores (assumed to be evenly spread over the
entire step; see page 5). To get the 1 extra score needed to
make 27, therefore, we must take 1/8 × 5 (the length of the step)
and add this amount (.625) to 175, the beginning of the step-
interval 175-179.99. This puts the midpoint at 175+.625 or
175.625, which is, accordingly, the median of the distribution.
(See Diagram I.)
A second illustration of how the median is found when the
data are grouped into a frequency distribution is given in
Table II (2). This second example should aid in clearing up
any doubtful points in the first problem. Since there are 200
scores in this distribution, one half of the scores is 100, and the
median must lie at a point 100 scores distant from either end of
the distribution. If we begin at the small end of the distribu-
tion, i.e., at 104-107, and add the scores in order, 52 scores will
take us through step 112-115. The 49 scores on the next step-
interval (116-119) bring the total to 101 scores, one too many to give us
the median. To get the 48 scores needed to make exactly 100,
therefore, we must take 48/49 × 4 (the length of the step) and
add this amount, 3.92, to 116, the beginning of the step-interval.
This takes us exactly 100 scores into the distribution, and locates
the median at 119.92. Diagram I (2) shows graphically how
this median is obtained.
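The interpolation used in both examples may be sketched in Python. The function name grouped_median is an assumption made for illustration; it expects the lower limits and frequencies of the steps listed from the small end of the distribution upward, and it assumes, as the text does, that the scores on a step are spread evenly over it.

```python
def grouped_median(lower_limits, freqs, step):
    """Median of a frequency distribution by counting off N/2 scores
    from the small end and interpolating within the step reached."""
    target = sum(freqs) / 2
    counted = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and counted + f >= target:
            return lower + (target - counted) / f * step
        counted += f
    raise ValueError("the distribution contains no scores")


# Table II (1): 54 Army Alpha scores, steps 125-129.99 up to 200-204.99.
alpha_lower = list(range(125, 205, 5))
alpha_freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

# Table II (2): 200 cancellation scores, steps 104-107 up to 136-139.
cancel_lower = list(range(104, 140, 4))
cancel_freqs = [7, 18, 27, 49, 52, 23, 16, 5, 3]

alpha_median = grouped_median(alpha_lower, alpha_freqs, 5)
cancel_median = grouped_median(cancel_lower, cancel_freqs, 4)
print(alpha_median, round(cancel_median, 2))
```

In the first problem 26 scores are counted before the step beginning at 175 is reached, so the function adds 1/8 of 5 to 175; in the second it adds 48/49 of 4 to 116.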
Summary of the steps in computing the median from data
tabulated in a frequency distribution:
(1) Find N/2 measures.
(2) Begin at the smaller end of the distribution and count
the measures serially up to the interval which contains the
median.
(3) Divide the number of measures necessary to fill out N/2
by the frequency on the interval containing the median [reached
in (2) above] and multiply the result by the length of the
step-interval.
(4) Add the amount obtained in (3) to the lower limit of
the step which contains the median. This will give the median
point on the scale.
¹ While the median may be found equally well by counting in N/2 scores from
the large end of the distribution, it is simpler to begin at the small end, and the
student is advised to follow this plan first.
DIAGRAM I (1)
The Calculation of the Median.
[Diagram: vertical scale from 170 to 180, showing step 170-174 (3 F's) and step 175-179 (8 F's). 26 scores reach 175; 27 scores reach 175.625, the median; 34 scores reach 180.]
Median = 175 + 1/8 × 5 = 175.625
Explanation: 26 scores go up to 175 on the scale; 34 scores to 180. To find how
far 27 scores will go, we must take 1/8 of 5 (the step length) and add this to 175. This
puts the median at 175.625.
DIAGRAM I (2)
The Calculation of the Median.
[Diagram: vertical scale from 112 to 120, showing step 112-115 and step 116-119. 52 scores reach 116; 100 scores reach 119.92, the median; 101 scores reach 120.]
Median = 116 + 48/49 × 4 = 119.92
Explanation: 52 scores counted off take us to 116 on the scale; 101 scores take us
to 120. To find how far 100 scores go, we must take 48/49 of 4 (the step length) and
add this amount (3.92) to 116. This locates the median at 119.92.
3. The Mode
The mode is most simply defined as that measure which
occurs most often in a series. In the series, 10, 11, 11, 12, 12,
13, 13, 13, 14, 14, and 15, for example, since the most often
recurring measure is 13 this measure may be taken as the mode.
In Table I (1) we find from the ungrouped scores that 185 occurs
5 times, more often than any other single score, and hence 185
may be taken as the mode of this series.
When the scores or measures are continuous and have been
grouped into a frequency distribution, the "crude mode" is
often taken as the midpoint of the step-interval which contains
the greatest frequency. In Table I, for example, if we did not
know from the ungrouped scores that 185 is the modal score,
the crude mode of the distribution given in (2) would be taken
at 187.50, the midpoint of step 185-189.99, the step-interval
containing the greatest frequency. Likewise, in Table II (2), the
crude mode would be 122, the midpoint of step 120-123, which
contains the greatest frequency.
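Finding the crude mode amounts to picking the midpoint opposite the largest F. A minimal Python sketch follows; the function name crude_mode is hypothetical, and when two steps tie for the greatest frequency this sketch simply returns the first one met, a choice the text does not prescribe.

```python
def crude_mode(midpoints, freqs):
    """Midpoint of the step-interval holding the greatest frequency."""
    best = max(range(len(freqs)), key=lambda i: freqs[i])
    return midpoints[best]


# Table II (1): Army Alpha scores, listed from the top step downward.
alpha_midpoints = [202.5 - 5 * i for i in range(16)]
alpha_freqs = [1, 4, 2, 10, 3, 8, 3, 3, 4, 6, 4, 1, 1, 2, 0, 2]

# Table II (2): cancellation scores, listed from the top step downward.
cancel_midpoints = [138, 134, 130, 126, 122, 118, 114, 110, 106]
cancel_freqs = [3, 5, 16, 23, 52, 49, 27, 18, 7]

alpha_mode = crude_mode(alpha_midpoints, alpha_freqs)
cancel_mode = crude_mode(cancel_midpoints, cancel_freqs)
print(alpha_mode, cancel_mode)
```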
It is clear that the crude mode will be dependent to a large
extent upon the size of the step-interval selected (i.e., on whether
the grouping is by large or small steps) and for this reason it is
often an unstable measure of central tendency. This is not
necessarily a serious drawback, however, as the mode is usually
employed simply to indicate in a rough way the center of con-
centration in the distribution. For this purpose it is not
necessary to define it so carefully as we do the median or the
arithmetic mean.
III. Measures of Variability
In Section II we discussed the calculation of the so-called
"measures of central tendency," that is, measures typical or
representative of the set of scores as a whole. Our next step is the
calculation of the variability of the scores, i.e., of the "scatter"
or "spread" of the separate scores or measures around their
measure of central tendency. This will be the task of the pres-
ent section.
The usefulness of some measure of variability can be shown
by a simple example. Suppose that we have given a test of
controlled association to a group of 50 boys and the same test
to a group of 50 girls. The average scores are: Boys, 34.6 secs.,
and Girls, 34.5 secs. So far as the averages go, there is
apparently no difference in the performance of the two groups.
Suppose, however, that on examining the original scores, we
find the boys' scores ranging from 15 to 51 secs. and the girls'
scores ranging from 19 to 45 secs. This discovery would make
it evident at once that, in a general way, the boys "cover more
territory" (are more variable) than the girls, and this greater
variability may be of considerably more interest than the lack
of difference in the average scores. If a group is homogeneous,
i.e., made up of individuals of nearly the same ability, most of
the scores will fall near the same point on the scale, the range
will be relatively short, and the variability will be small. If,
however, the group contains individuals of widely differing
capacity, the scores will be strung out from high to low, the range
will be relatively wide, and the variability will be large. Four
measures have been devised to take account of this factor of
variability within a set of measures. These are (1) the range,
(2) the quartile deviation, or Q, (3) the average deviation, or
AD, and (4) the standard deviation, or SD.
1. The Range
In grouping the scores in Table I into a frequency distribu-
tion (page 3) we have already had occasion to use the range.
It may be re-defined simply as the interval between the largest
and the smallest measures. In the illustration given above,
the range of the boys' scores is 51-15 or 36, and the range of the
girls' scores 45-19 or 26. The range is the most general measure
of "spread" or "scatter." It includes 100% of the distribution,
and is employed when we wish to make a rough comparison of
two or more groups for variability; or when the number of
measures is too small to justify the calculation of some more
refined measure of variability. Since the range only takes ac-
count of the extremes of the series, it is obviously unreliable
when frequent or large gaps occur in the distribution of scores.
2. The Quartile Deviation, or Q
The quartile deviation, or Q, may be defined as one half
of the distance between the 75th and the 25th percentile points
in the given distribution. The 25th percentile, or Q1, is the
first quarter or quartile point on the scale; the point below
which lie 25% of the measures. In like manner, the 75th
percentile, or Q3, is the third quarter or quartile point on the
scale, the point below which lie 75% of the measures. (By
analogy, the median is Q2, the second quartile point.)
In order to find Q, it is obvious that we must first calculate
the 75th and 25th percentile points. These points are found in
exactly the same way as the median: viz., to find Q1 we count
off 25% of the scores from the beginning of the distribution;
and to find Q3, we count off 75% of the scores from the beginning
of the distribution.
Table III illustrates the calculation of Q for the distribution
of 54 Alpha scores tabulated in Table I. First, to find Q1, we
must count off 1/4 of the total number of scores, i.e., 13.5, from
the small end of the distribution. When the scores (the F's) are
added in order the first six step-intervals (the steps 125-129.99
to 150-154.99 inclusive) are found to contain 10 scores. The
next step, 155-159.99, contains 6 scores.¹ We need only 3.5
additional scores, however, to make up the necessary 13.5;
hence we take 3.5/6 × 5 (the step length) and add this amount
(2.92) to 155, the beginning of the step. This locates Q1 at
155+2.92 or 157.92.
In like manner, we find Q3 by counting off 3/4 of the scores
from the small end of the distribution. 3/4 of N = 40.5; and the
F's on steps 125-129.99 to 180-184.99, inclusive, added in order,
total 37. The next step, 185-189.99, contains 10 scores. To
round out the necessary 40.5, therefore, we take 3.5/10 × 5 (the
step length) and add this amount (1.75) to 185, the beginning
of the step. This puts Q3 at 186.75 since 40.5 scores reach this
point.
¹ Assumed to be spread evenly over the entire step. See page 5.
TABLE III
To Illustrate the Calculation of Q, AD, and SD from
Data Grouped into a Frequency Distribution
1. DATA FROM TABLE I, 54 ARMY ALPHA SCORES
(1)            (2)         (3)     (4)        (5)         (6)
Scores         Midpoint     F       D          FD          FD²
200-204.99     202.50       1      30.93      30.93      956.66
195-199.99     197.50       4      25.93     103.72     2689.46
190-194.99     192.50       2      20.93      41.86      876.13
185-189.99     187.50      10      15.93     159.30     2537.65
180-184.99     182.50       3      10.93      32.79      358.39
175-179.99     177.50       8       5.93      47.44      281.32
170-174.99     172.50       3        .93       2.79        2.79
165-169.99     167.50       3      -4.07     -12.21       49.69
160-164.99     162.50       4      -9.07     -36.28      329.06
155-159.99     157.50       6     -14.07     -84.42     1187.79
150-154.99     152.50       4     -19.07     -76.28     1454.66
145-149.99     147.50       1     -24.07     -24.07      579.36
140-144.99     142.50       1     -29.07     -29.07      845.06
135-139.99     137.50       2     -34.07     -68.14     2321.53
130-134.99     132.50       0     -39.07       0.00        0.00
125-129.99     127.50       2     -44.07     -88.14     3884.33
                        N = 54               837.44    18353.88
Average = 171.57 (Table II)
N/4 = 13.5, therefore, Q1 = 155 + 3.5/6 × 5 = 157.92
3N/4 = 40.5, therefore, Q3 = 185 + 3.5/10 × 5 = 186.75
Q = (Q3 - Q1)/2 = (186.75 - 157.92)/2 = 14.42
AD = ΣFD/N = 837.44/54 = 15.51
SD = √(ΣFD²/N) = √(18353.88/54) = √339.887 = 18.44
TABLE III (Continued)
2. DATA FROM TABLE II (2), 200 CANCELLATION SCORES
(1)         (2)         (3)     (4)        (5)         (6)
Scores      Midpoint     F       D          FD          FD²
136-139     138          3      18.06      54.18      978.49
132-135     134          5      14.06      70.30      988.42
128-131     130         16      10.06     160.96     1619.26
124-127     126         23       6.06     139.38      844.64
120-123     122         52       2.06     107.12      220.67
116-119     118         49      -1.94     -95.06      184.42
112-115     114         27      -5.94    -160.38      952.66
108-111     110         18      -9.94    -178.92     1778.47
104-107     106          7     -13.94     -97.58     1360.27
                     N = 200             1063.88     8927.30
Average = 119.94 (Table II)
N/4 = 50, therefore, Q1 = 112 + 25/27 × 4 = 115.70
3N/4 = 150, therefore, Q3 = 120 + 49/52 × 4 = 123.77
Q = (Q3 - Q1)/2 = (123.77 - 115.70)/2 = 4.04
AD = ΣFD/N = 1063.88/200 = 5.32
SD = √(ΣFD²/N) = √(8927.30/200) = 6.68
With Q1 and Q3 known, the quartile deviation, Q, is easily
calculated from the formula
$$Q = \frac{Q_3 - Q_1}{2} \qquad (3)$$
In the present problem, Q = (186.75-157.92)/2 or 14.42.
A second illustration of the calculation of Q from a frequency
distribution is given in Table III (2). Since the N of this
distribution is 200, 1/4 of the measures equals 50. The steps 104-
107 and 108-111 contain 25 scores; and the next step contains 27
scores. To find the point reached by 50 scores, therefore, we
must take 25/27 × 4 (the step length) and add this amount
(3.70) to 112, the lower limit of step 112-115. This locates
Q1 at 115.70.
To find Q3, we must count off 3/4 of N or 150 scores from
the small end of the distribution. The first four steps include
101 scores, and the next step, 120-123, contains 52. To fill
out 150, therefore, we take 49/52 × 4 (the length of step) and
add this increment (3.77) to 120 to locate Q3 at 123.77. Substituting
115.70 for Q1 and 123.77 for Q3 in formula (3) we
get a Q of 4.04 points.
The quartile points, Q1 and Q3, are of considerable importance
in that they mark off the limits within which fall the
middle 50% of the measures in the distribution. The distance
between these two points is often called the interquartile range;
hence Q is sometimes called the Semi-interquartile Range.
Q actually measures the average distance of the two quartile
points from the median, and because of the ease with which
it can be found is a valuable measure of the closeness with
which the scores are grouped directly around the median point.
If the scores of a distribution are closely packed together, the
quartiles will be close together and Q will be small; if the scores
are scattered, the quartiles will be relatively far apart, and Q
will be large.
When the distribution is symmetrical or "normal" (see
page 85) Q marks off exactly the limits of the 25% of the cases
just above, and the 25% of the cases just below the median;
and accordingly, the median lies just halfway between the two
quartile points Q1 and Q3. Q is then commonly known as the
PE (probable error). The terms Q and PE are often used interchangeably,
although it is probably best to restrict the use of
the latter term to normal distributions, and to the measurement
of reliability. The value of the PE as a measure of
reliability will be discussed at length in Chapter III.
Summary of Steps in Calculation of Q (Data Grouped)
To find Q1:
1. Divide N by 4.
2. Begin at the small end of the distribution, and count
the scores up to the interval which contains Q1.
3. Divide the number of measures necessary to locate
Q1 (i.e., to complete N/4) by the frequency in the
interval reached in (2) above, and multiply the
result by the step-interval.
4. Add the amount obtained in (3) to the lower limit of
the step-interval on which Q1 lies. The result
is Q1.
To find Q3:
1. Find 3/4 of N.
2. Begin as before at the small end of the distribution,
and count up the scores until the interval which
contains Q3 is reached.
3. Divide the number of scores required to locate Q3 by
the frequency in the interval reached in (2) and
multiply the result by the step-interval.
4. Add the amount obtained in (3) to the lower limit of
the step-interval on which Q3 lies. This locates
Q3.
To find Q:
Substitute Q3 and Q1 in formula (3),
$$Q = \frac{Q_3 - Q_1}{2}.$$
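The steps summarized above can be sketched in Python. The names point_below and quartile_deviation are assumptions introduced for illustration; the steps are listed from the small end of the distribution upward, and the scores on a step are taken to be spread evenly over it, as the text assumes.

```python
def point_below(fraction, lower_limits, freqs, step):
    """Scale point below which the given fraction of the scores lie."""
    target = fraction * sum(freqs)
    counted = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and counted + f >= target:
            return lower + (target - counted) / f * step
        counted += f
    raise ValueError("fraction must lie between 0 and 1")


def quartile_deviation(lower_limits, freqs, step):
    """Return Q1, Q3, and Q = (Q3 - Q1) / 2, as in formula (3)."""
    q1 = point_below(0.25, lower_limits, freqs, step)
    q3 = point_below(0.75, lower_limits, freqs, step)
    return q1, q3, (q3 - q1) / 2


# Table III (1): 54 Army Alpha scores, steps 125-129.99 up to 200-204.99.
alpha = quartile_deviation(
    list(range(125, 205, 5)),
    [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1],
    5,
)
# Table III (2): 200 cancellation scores, steps 104-107 up to 136-139.
cancel = quartile_deviation(
    list(range(104, 140, 4)),
    [7, 18, 27, 49, 52, 23, 16, 5, 3],
    4,
)
print([round(v, 2) for v in alpha])
print([round(v, 2) for v in cancel])
```

Because the sketch carries the unrounded quartile points through to the end, its Q for the cancellation scores comes out near 4.03, whereas the text, subtracting the rounded values 123.77 and 115.70, obtains 4.04.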
3. The Average Deviation, or AD
The average deviation or AD (also written mean deviation
or MD) may be defined as the average of the deviations of all
the separate measures in a series taken from their central
tendency (usually the average, less frequently the median
or mode). In averaging deviations to find the AD, no account
is taken of signs, and all deviations, whether positive or negative,
are treated as positive.
An example will make the definition clearer. If we have
five scores, 6, 8, 10, 12, and 14, the average is easily found to
be 10. It is then a simple process also to find the deviation of
each measure from the average by subtracting the average from
each measure. Thus 6, the first score, minus 10 equals -4
(calculation algebraic); 8-10 = -2; 10-10 = 0; 12-10 = 2;
and 14-10 = 4. The five deviations measured from the average
are -4, -2, 0, 2, and 4. Now adding these deviations
without regard to sign, the sum is 12; and dividing 12 by 5,
we get 2.4, as the average of the 5 deviations from the average,
or the AD. The formula for the AD with simple ungrouped
numbers like these may be written,
$$AD = \frac{\sum D}{N} \text{ (arithmetical)}, \qquad (4)$$
in which ΣD = sum of deviations, and N is, as before, the number
of cases or items in the series.
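Formula (4) may be checked on the five scores of the example with a short Python sketch. The function name average_deviation is hypothetical; the deviations are taken from the average, which is the usual though not the only choice mentioned in the text.

```python
def average_deviation(scores):
    """AD of ungrouped scores: mean of the deviations from the average,
    all deviations being treated as positive (formula 4)."""
    n = len(scores)
    average = sum(scores) / n
    return sum(abs(x - average) for x in scores) / n


five_scores = [6, 8, 10, 12, 14]
ad_of_five = average_deviation(five_scores)
print(ad_of_five)
```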
In Table III, the calculation of the AD for scores grouped
into a frequency distribution is illustrated by two problems.
The average of problem (1) has already been found in Table
II to be 171.57. Hence, to find the average deviation of the
scores in this distribution from the average, we must take our
deviations (D's) around this point. Note, however, that, since
the scores have been grouped into step-intervals, we are no
longer able to get the D of each score from the average; and
hence we simply find the deviation (D) of the midpoint of each
step from the average. The substitution of the midpoint value
for all of the scores within the step is the only difference
between the computation of D's with grouped and ungrouped
measures. For example, the D of step 200-204.99 is 30.93,
found by subtracting 171.57 (the average) from 202.50 (the
midpoint of the step). Likewise, the D of the next step is
25.93, found by subtracting 171.57 from 197.50. All of the D's
are positive as far down the scale as 170-174.99, as in each
case the midpoint is larger numerically than the average.
From the step-interval 165-169.99 on down to the beginning
of the series, however, the D's are negative, as the midpoints
of these steps are all smaller than 171.57. Thus the D of step
165-169.99 is -4.07, i.e., 167.50-171.57 = -4.07; and the D
of the lowest step in the distribution, 125-129.99, is -44.07.
It will be helpful in finding deviations to remember that
the average is always subtracted from the individual score or
midpoint value. That is,
Deviation = Score or Midpoint - Average (calculation algebraic).
Hence it is clear that when the score or midpoint is
numerically larger than the average, the deviation must be
positive; when the score or midpoint is numerically smaller
than the average, the deviation must be negative.
It is obviously unnecessary to subtract the average from
each midpoint separately in order to obtain the different D's.
The reason, of course, is that each step-interval is 5 points;
hence, after finding the D of step 200-204.99 to be 30.93,
we need only subtract 5 points from this D in order to obtain
25.93, the D of the next step; then 5 again to obtain 20.93,
the D of the next step, and so on.¹ The negative D's are
obtained in exactly the same way as the positive D's. Thus
.93-5 = -4.07; -4.07-5 = -9.07 and so on to -44.07.
¹ Checking the D's occasionally to avoid carrying an error throughout our
calculations.
Column 4 gives the deviation of each step-interval (as
represented by its midpoint) from the average of the distribution.
There are, however, more scores on some steps
than on others; and for this reason each midpoint deviation
(D) in column 4 must be "weighted" (multiplied) by
the number of scores (F) which it represents. This gives
the FD column, column 5. The first FD is 30.93; for since
there is only 1 score on step 200-204.99, we need simply
multiply the first D by 1. The next FD is 103.72; since each
of the 4 scores on step 195-199.99 has a D of 25.93. In like
manner, we obtain the other FD's, by multiplying each D in
column 4 by its corresponding frequency (F) in column 3.
When all of the FD's have been calculated, we sum the
column without regard to sign and divide by N to obtain the
AD. In the present problem, the AD equals 837.44/54 or
15.51.
The formula for the AD for measures grouped into a fre-
quency distribution may now be written as follows:
$$AD = \frac{\sum FD}{N} \text{ (arithmetical)} \qquad (5)$$
This formula applies equally well to the AD found from the
average, median, or mode.
The second problem in Table III shows the calculation of
the AD for the 200 cancellation scores, grouped into a fre-
quency distribution with a step of 4. The average for this
distribution has been found to be 119.94 (see Table II, 2).
Hence, the D of the first step 136-139 (midpoint 138), from the
average is 18.06. The next D may be found by subtracting
4 (the step-interval) from 18.06, and each succeeding D in
turn by subtracting 4 from the D just preceding it.
The FD's in column 5 are found [as previously shown in (1)]
by "weighting" each D by the F which it represents, that is, by
the F opposite it. The sum of the FD column is 1063.88;
and since N is 200, from formula (5) we obtain 5.32, as the
AD of the scores in this distribution from their average
119.94.
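The grouped calculation of formula (5) may be sketched in Python as below. The function name grouped_ad is an assumption; the sketch takes the deviations from the unrounded average, so its results agree with the tabled values only to about two decimal places.

```python
def grouped_ad(midpoints, freqs):
    """AD of grouped data: sum of F x |D| divided by N (formula 5),
    with D measured from the average of the distribution."""
    n = sum(freqs)
    average = sum(f * m for f, m in zip(freqs, midpoints)) / n
    return sum(f * abs(m - average) for f, m in zip(freqs, midpoints)) / n


# Table III (1): Army Alpha scores, top step first.
alpha_midpoints = [202.5 - 5 * i for i in range(16)]
alpha_freqs = [1, 4, 2, 10, 3, 8, 3, 3, 4, 6, 4, 1, 1, 2, 0, 2]

# Table III (2): cancellation scores, top step first.
cancel_midpoints = [138, 134, 130, 126, 122, 118, 114, 110, 106]
cancel_freqs = [3, 5, 16, 23, 52, 49, 27, 18, 7]

alpha_ad = grouped_ad(alpha_midpoints, alpha_freqs)
cancel_ad = grouped_ad(cancel_midpoints, cancel_freqs)
print(round(alpha_ad, 2), round(cancel_ad, 2))
```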
In a perfectly symmetrical or normal distribution (page
85) the AD, when measured off above and below the average,
marks the limits of the middle 57.5% of the measures.
Thus the AD is seen to be slightly larger than the Q. In general,
a large AD means that the scores in the distribution are scattered
widely around the central tendency; a small AD means that
they are concentrated within a relatively narrow range.
4. The Standard Deviation, or SD
The standard deviation or SD is the most reliable of the
measures of variability, and for this reason is customarily used
in research which requires great accuracy. The SD differs
from the AD in several respects. In the first place, in cal-
culating the AD we disregard signs and treat all deviations
as positive; in finding the SD, on the other hand, we avoid
this difficulty of signs by squaring the separate deviations.
Again, the deviations used in computing the SD are always
taken from the average, and never from the median or mode
as is sometimes done in finding an AD. The conventional
symbol used to denote the SD is the Greek letter sigma, σ.
We may define the SD or σ as the square root of the mean
(or average) of the squared deviations taken from the average
of the distribution. To illustrate the calculation of the SD
in a simple case, let us consider the example used to illustrate
the calculation of the AD (see page 23) in which the deviations
of the five measures, 6, 8, 10, 12, and 14, from their
average 10 were found to be -4, -2, 0, 2, and 4, respectively.
If we square each of these deviations we get 16, 4, 0, 4, and 16
(the minus signs become plus in squaring). Next, summing up
these five squares and dividing by 5, the mean of the squares
(8) is obtained; extracting the square root of this result gives
2.828, the SD or σ of the series. The formula for the σ of a
series of numbers, ungrouped, is
$$\sigma = \sqrt{\frac{\sum D^2}{N}} \qquad (6)$$
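Formula (6) applied to the same five scores can be written out in Python. The function name standard_deviation is hypothetical; note that the divisor is N, as in the text, and not N - 1.

```python
import math


def standard_deviation(scores):
    """SD of ungrouped scores: square root of the mean of the squared
    deviations from the average (formula 6, divisor N)."""
    n = len(scores)
    average = sum(scores) / n
    mean_square = sum((x - average) ** 2 for x in scores) / n
    return math.sqrt(mean_square)


five_scores = [6, 8, 10, 12, 14]
sd_of_five = standard_deviation(five_scores)
print(round(sd_of_five, 3))
```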
Table III illustrates the calculation of σ for scores grouped
into a frequency distribution. The process is identical with
that used for simple numbers except that in addition to squaring
the D of each midpoint from the average, we "weight"
each of these squared deviations by the frequency which it
represents, the frequency opposite it. This gives the FD²
column. By simple algebra, D × FD = FD²; and accordingly
the easiest way to obtain the entries in this column is by
multiplying the corresponding D's and FD's in columns 4 and 5.
The first FD² entry, for example, is 956.66, the product of
30.93 × 30.93; the second is 2689.46, the product of 103.72 ×
25.93, and so on to the end of the column. All of the FD²'s
are necessarily positive, since each negative D is matched by
a negative FD and consequently the product is positive. The
sum of the FD² column (18,353.88) divided by N (54) gives
the mean of the squared deviations as 339.887; and the
square root of this result is 18.44, the standard deviation.
The formula for the SD when the data are grouped into a
frequency distribution is
$$\sigma = \sqrt{\frac{\sum FD^2}{N}} \qquad (7)$$
Problem (2) of Table III furnishes another illustration of
the calculation of σ from grouped data. Column 6, the FD²
column, has been obtained, as in the previous problem, by
multiplying each D by its corresponding FD. The sum of the
FD² column is 8927.30; and N is 200. Hence, applying
formula (7) we get 6.68 as the standard deviation [see Table
III, (2) for calculations].
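Formula (7) may be sketched in Python for both problems of Table III. The function name grouped_sd is an assumption; the deviations are computed from the unrounded average, so the sums of FD² differ slightly from the tabled totals while the SD's agree to two decimal places.

```python
import math


def grouped_sd(midpoints, freqs):
    """SD of grouped data: square root of the sum of F x D squared
    divided by N (formula 7), D being measured from the average."""
    n = sum(freqs)
    average = sum(f * m for f, m in zip(freqs, midpoints)) / n
    sum_fd2 = sum(f * (m - average) ** 2 for f, m in zip(freqs, midpoints))
    return math.sqrt(sum_fd2 / n)


# Table III (1): Army Alpha scores, top step first.
alpha_midpoints = [202.5 - 5 * i for i in range(16)]
alpha_freqs = [1, 4, 2, 10, 3, 8, 3, 3, 4, 6, 4, 1, 1, 2, 0, 2]

# Table III (2): cancellation scores, top step first.
cancel_midpoints = [138, 134, 130, 126, 122, 118, 114, 110, 106]
cancel_freqs = [3, 5, 16, 23, 52, 49, 27, 18, 7]

alpha_sd = grouped_sd(alpha_midpoints, alpha_freqs)
cancel_sd = grouped_sd(cancel_midpoints, cancel_freqs)
print(round(alpha_sd, 2), round(cancel_sd, 2))
```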
The standard deviation is, in general, less affected by
chance fluctuations than the AD, and is, therefore, a more
stable measure of dispersion. In a "normal" distribution
(page 85) the SD when measured off above and below the
average marks the limits of the middle 68.26% (roughly the
middle 2/3) of the distribution. This is approximately true,
also, for less symmetrical distributions. For example, in the
first problem in Table III, the middle two thirds of the
scores will fall roughly between score 190 (171.57+18.44) and
score 153 (171.57-18.44). The standard deviation is never
smaller than the AD which, in turn, is ordinarily larger than Q.
This relation supplies a rough but simple check on the accuracy
of calculated measures of variability.
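The percentages quoted for the normal distribution, and the rough check on the order of Q, AD, and SD, can be verified with Python's standard library. The sketch relies on two facts not stated in the text: the proportion of a normal distribution within k standard deviations of the average is erf(k/√2), and the AD of a normal distribution equals √(2/π), about .7979, of its SD. The function name middle_fraction is hypothetical.

```python
import math


def middle_fraction(k):
    """Proportion of a normal distribution lying within k SD's of the average."""
    return math.erf(k / math.sqrt(2))


ad_in_sd_units = math.sqrt(2 / math.pi)     # AD of a normal curve, in SD units
within_sd = middle_fraction(1.0)            # limits marked off by the SD
within_ad = middle_fraction(ad_in_sd_units)  # limits marked off by the AD
print(round(100 * within_sd, 1), round(100 * within_ad, 1))

# Rough check on the two problems of Table III: (Q, AD, SD) as found in the text.
table_iii_results = [(14.42, 15.51, 18.44), (4.04, 5.32, 6.68)]
order_holds = all(q < ad < sd for q, ad, sd in table_iii_results)
print(order_holds)
```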
IV. The Short Method of Finding the Average, the AD, and the SD (σ)
In Tables II and III, the average, the AD, and the SD
have been calculated by what is oftentimes known as the
Long Method. The reader will recall that the average in these
tables was found by multiplying the midpoint of each step-
interval by the number of scores on the step, summing up
this column (the FXM) and dividing by N, the number of
cases (page 9). Besides, in finding the AD and the SD all
midpoint deviations were figured from the actual averages of
the distributions.
It is, no doubt, already apparent that the Long Method
(LM) requires the handling of large numbers and decimals
and that the calculations are often tedious. To save time
and labor, therefore, the Guessed Average Method, or more
simply the Short Method (SM), has been devised for the
express purpose of cutting down the calculations involved
in finding the average, the AD, and the SD. (The Short
Method does not apply to the computation of the Median and
the Q, which are always found by the methods with which
we are already familiar.) The student of statistics should
make a special effort to learn the Short Method to the point
where he can use it with facility. Not only is it a great time
and labor saver, but in the calculation of coefficients of
correlation it is well-nigh indispensable.
Table IV (2) illustrates the calculation of the average,
AD, and SD by the Short Method. In order to make a com-
parison of the computations involved in the two methods
easier, the calculations by the Long Method of the average,
AD, and SD for the same data are also given in the Table.
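The Short Method calculations set out in Table IV can be sketched in Python as follows. The function name short_method and the labels f_less and f_greater are assumptions introduced here; the formulas are read off the calculations printed with Table IV, and the AD correction as written holds only when the guessed average lies within one step of the true average.

```python
import math


def short_method(midpoints, freqs, guessed_average, step):
    """Average, AD, and SD from deviations about a guessed average (GA),
    the deviations being expressed in units of the step-interval."""
    n = sum(freqs)
    d = [round((m - guessed_average) / step) for m in midpoints]
    sum_fd = sum(f * x for f, x in zip(freqs, d))            # algebraic sum
    sum_abs_fd = sum(f * abs(x) for f, x in zip(freqs, d))   # signs disregarded
    sum_fd2 = sum(f * x * x for f, x in zip(freqs, d))
    c = sum_fd / n                                           # correction, in steps
    average = guessed_average + c * step
    # Frequencies on steps below and above the true average (assumes |c| < 1).
    f_less = sum(f for f, x in zip(freqs, d) if x < c)
    f_greater = sum(f for f, x in zip(freqs, d) if x > c)
    ad = (sum_abs_fd + c * (f_less - f_greater)) / n * step
    sd = math.sqrt(sum_fd2 / n - c * c) * step
    return average, ad, sd


# Table IV: Army Alpha scores, top step first, GA taken at 167.5.
alpha_midpoints = [202.5 - 5 * i for i in range(16)]
alpha_freqs = [1, 4, 2, 10, 3, 8, 3, 3, 4, 6, 4, 1, 1, 2, 0, 2]

sm_average, sm_ad, sm_sd = short_method(alpha_midpoints, alpha_freqs, 167.5, 5)
print(round(sm_average, 2), round(sm_ad, 2), round(sm_sd, 2))
```

The three results agree with those of the Long Method to two decimal places, which is the point of the comparison in Table IV.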
TABLE IV
To Illustrate the Calculation of the Average, AD, and SD by the
Short Method. Data from Table II (1). Calculations for Long
Method Given for Comparison.
1. LONG METHOD
(1)        (2)         (3)     (4)        (5)        (6)         (7)
Scores     Midpoint     F      F×M         D          FD          FD²
200-204    202.5        1      202.5      30.93      30.93      956.66
195-199    197.5        4      790.0      25.93     103.72     2689.46
190-194    192.5        2      385.0      20.93      41.86      876.13
185-189    187.5       10     1875.0      15.93     159.30     2537.65
180-184    182.5        3      547.5      10.93      32.79      358.39
175-179    177.5        8     1420.0       5.93      47.44      281.32
170-174    172.5        3      517.5        .93       2.79        2.59
165-169    167.5        3      502.5      -4.07     -12.21       49.69
160-164    162.5        4      650.0      -9.07     -36.28      329.06
155-159    157.5        6      945.0     -14.07     -84.42     1187.79
150-154    152.5        4      610.0     -19.07     -76.28     1454.66
145-149    147.5        1      147.5     -24.07     -24.07      579.36
140-144    142.5        1      142.5     -29.07     -29.07      845.06
135-139    137.5        2      275.0     -34.07     -68.14     2321.53
130-134    132.5        0        0.0     -39.07       0.00        0.00
125-129    127.5        2      255.0     -44.07     -88.14     3884.33
                    N = 54    9265.0                837.44    18353.88
1. Average = Σ(F×M)/N = 9265/54 = 171.57
2. AD = ΣFD/N = 837.44/54 = 15.51
3. SD = √(ΣFD²/N) = √(18353.88/54) = 18.44
2. SHORT METHOD
(1)        (2)            (3)     (4)     (5)     (6)
Scores     Midpoint        F       D       FD      FD²
200-204    202.5           1       7        7      49
195-199    197.5           4       6       24     144
190-194    192.5           2       5       10      50
185-189    187.5          10       4       40     160
180-184    182.5           3       3        9      27
175-179    177.5           8       2       16      32
170-174    172.5           3       1        3       3
165-169    167.5 (GA)      3       0        0       0
160-164    162.5           4      -1       -4       4
155-159    157.5           6      -2      -12      24
150-154    152.5           4      -3      -12      36
145-149    147.5           1      -4       -4      16
140-144    142.5           1      -5       -5      25
135-139    137.5           2      -6      -12      72
130-134    132.5           0      -7        0       0
125-129    127.5           2      -8      -16     128
                       N = 54             174     770
Sum of the positive FD's = +109; sum of the negative FD's = -65. The actual average, 171.57, falls on step 170-174.
Fg = 31 (the scores on steps 170-174 and above); Fi = 23 (the scores on steps 165-169 and below).
GA = 167.50
c = 44/54 = .8148; c² = .6639
C = .8148 × 5 = 4.07
1. Average = 167.5 + 4.07 = 171.57
2. AD = [ΣFD + c(Fi - Fg)]/N × step = [174 + .8148(23 - 31)]/54 × 5 = 15.51
3. SD = √(ΣFD²/N - c²) × step = √(770/54 - .6639) × 5 = 3.687 × 5 = 18.44
1. The Calculation of the Average by the Short Method
The first important fact to grasp in beginning a study of
the calculation of the average by the Short Method is that we
"guess" or assume an average at the outset, and later apply
a correction to this guessed average (GA) in order to obtain
the actual average. There is no set rule for guessing an average.
The best plan is to take the midpoint of a step somewhere
near the center of the distribution, and if possible the mid-
point of that step-interval which contains the greatest
frequency. In our problem the greatest F is on step 185-189.
However, the GA is taken at 167.5 instead of 187.5 since the
former is closer to the center of the distribution. With the
question of the GA settled, the correction which must be
applied to it to get the average is determined as outlined in the
following steps:
(1) First, we fill in the D column, column 4. Here are
entered the deviations of the midpoints of the steps measured
from the GA in units of step-interval. Thus 172.5, the
midpoint of step 170-174, deviates from 167.5, the GA, by 1
step-interval; and hence, a figure 1 is placed in the D column
opposite 172.5. In like manner, 177.5 deviates 2 steps from
167.5; and accordingly, a 2 goes in the D column opposite
177.5. Reading on up the column from 177.5, the succeeding
D entries are found in the same way to be 3, 4, 5, 6, and 7.
The last entry, 7, is the step deviation of 202.5 from 167.5
(the actual point deviation is, of course, 35).
Returning to 167.5, we find that the D of this point,
measured from the GA (from itself) is 0; and hence a 0 is
placed in the D column opposite step 165-169. Below 167.5,
all of the D entries are negative, as all of the midpoints are less
than 167.5, the GA. So the D of 162.5 from 167.5 is -1
step-interval; and the D of 157.5 from 167.5 is -2
step-intervals. The other D's are -3, -4, -5, -6, -7, -8.
(2) The D column completed, we next compute the FD
column (column 5). The FD entries are found in exactly the
same way as in the Long Method [compare Table IV (1)]; namely,
each D in column 4 is multiplied, or "weighted," by the
appropriate F in column 3. Note that in the Short Method
we multiply each F by its deviation from the GA in units
of step-interval instead of by its actual deviation from the
average of the distribution, and that for this reason the
computation of the FD's is much simpler here than in the Long
Method. All of the FD's above (greater than) the GA will
be positive, and all below (smaller than) the GA negative,
since the signs of the FD's depend on the signs of the D's.
(3) From the FD column the correction is obtained as
follows: The sum of the plus FD's is 109; of the negative
FD's, -65. This makes 44 more plus FD's than minus
(the algebraic sum is +44) and 44 divided by 54 (N) equals
.8148, which is the correction, "c," in units of step-interval.
If we multiply c (.8148) by 5, the length of the step, the result
is C (4.07), the score correction, or the correction in score units.
When +4.07 is added to 167.5, the GA, the result is 171.57,
the average. (Compare this result with the average found by
the Long Method.)
A summary of the steps in the calculation of the average by
the Short Method may be outlined as follows (see Table IV, 2):
(1) Organize the scores or measures into a frequency
distribution.
(2) Guess an average somewhere near the center of the
distribution, and preferably on the step containing the
greatest frequency.
(3) Find the deviation of the midpoint of each step-interval
from the GA in units of step-interval.
(4) Multiply or weight each step-deviation (D) by its
appropriate F, i.e., by the F opposite it.
(5) Find the algebraic sum of the plus and minus FD's, and
divide this sum by N, the number of cases. This gives c,
the correction in units of step-interval.
(6) Multiply c by the length of the step-interval to get C,
the score correction.
(7) Add C algebraically to the guessed average to get
the actual average. Sometimes C will be positive and some-
times negative, depending upon where the average has been
guessed. The method applies equally well in either case.
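The seven steps just summarized may be sketched in a few lines of Python. This is offered only as an illustration, not as part of the original method: the function name `short_method_mean` and its arguments are assumptions made for the sketch, and the data are the midpoints and frequencies of Table IV, listed from the lowest step upward.

```python
def short_method_mean(midpoints, freqs, guessed_average, step):
    """Average of a frequency distribution by the Short Method (illustrative)."""
    n = sum(freqs)
    # D: deviation of each midpoint from the GA, in units of step-interval
    devs = [round((m - guessed_average) / step) for m in midpoints]
    # c: algebraic sum of the FD's divided by N
    c = sum(f * d for f, d in zip(freqs, devs)) / n
    # C = c * step is the score correction, added algebraically to the GA
    return guessed_average + c * step


# Table IV, from step 125-129 up to step 200-204
midpoints = [127.5 + 5 * i for i in range(16)]
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

mean = short_method_mean(midpoints, freqs, guessed_average=167.5, step=5)
print(round(mean, 2))  # 171.57
```

Guessing the average elsewhere, say at 187.5, changes c but not the final average, which is the sense in which the method "applies equally well in either case."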
If it seems to the reader that the Short Method belies its
name, let him compare the calculations in columns 4 and 5
(SM) with the calculation of column 4 (LM). In spite of the
extra column, the SM has a decided advantage over the LM,
for as all deviations from the GA are in units of step-interval
(whole numbers) the arithmetic is considerably easier in the
former method. In distributions containing large numbers,
the calculation of the average by the LM becomes very
laborious; and it is with such distributions that the SM
justifies itself as a time and labor saver, rather than with
distributions containing small numbers.
2. The Calculation of the AD by the Short Method
(A) The Calculation of the AD from the Average
The chief advantage in finding the AD by the Short Method
instead of the Long Method lies in the fact (already noted in
calculating the average) that in the Short Method deviations
are taken from a GA in units of step-interval. This procedure
eliminates fractions and cuts down multiplication; but at the
same time it necessitates the application of a correction to
the ΣFD and as a result complicates the AD formula. The
formula for the AD by the Short Method is:¹

$$AD = \frac{\Sigma FD + c\,(F_l - F_g)}{N} \times \text{length of step-interval} \qquad (8)$$
The term Fl in the formula refers to the sum of the F's
on those steps whose midpoints are less (the subscript "l"
means less) than the average of the distribution. The term
Fg refers to the sum of the F's on those steps whose midpoints
are greater (the subscript "g" means greater) than the average.
In Table IV, for example, all of the midpoints from 167.5
down to 127.5, inclusive, are less than 171.57, the average,
and hence the Fl is 23. All of the midpoints from 172.5 up to
202.5, inclusive, are greater than 171.57; and hence the Fg
is 31. It is important to remember that the Fl and the Fg
are always calculated from the actual average of the distribution
(never from the guessed average) as the reference point. In
consequence the 3 scores on step 165-169, whose midpoint, 167.5,
is less than 171.57, are included in the Fl. A simple check
on the size of the Fl and Fg is to make sure that Fl + Fg = N.
(Note that in the present problem 23 + 31 = 54.)
¹ This formula applies equally well to the AD calculated from average,
median, or mode.
The other terms in the formula require little explanation.
The c is the correction in units of step-interval. It has already
been found in calculating the average (page 31) and equals
.8148. The ΣFD is the arithmetic sum of the FD column,
and equals 174.
If now we substitute for ΣFD, c, Fl, and Fg in formula
(8), the numerator is 174 + .8148(23 - 31) or 167.482. Dividing
this result by 54 (N) we obtain 3.102, the AD expressed in
units of step-interval; and this value multiplied by 5 (the
step) gives 15.51, the AD of the distribution. (Compare with
the AD found by the Long Method.) Notice that it is always
necessary to multiply the result given in the formula by the
step-interval, since ΣFD and c are both in units of step.
Formula (8) is a relatively quick way of finding the AD
of a frequency distribution. The value of the formula is
somewhat limited, however, since it gives correct AD's only
when c, the step-correction, is numerically less than 1.00. In
Table IV, c = .8148, which is less than 1.00, and in consequence
the formula holds, as we find on comparing the AD's given by the
Long and Short Methods. One method of circumventing this
limitation in the AD formula is to make use of the fact that no
matter where the GA is taken, a correction can always be calculated
by means of which we can obtain the actual average. If the
c so found is less than 1.00, formula (8) may be applied
directly; if, however, c is larger than 1.00, we must guess
another average on the same step as the actual average
(which is now known) and take deviations from this "new"
GA. The formula will then hold. (There is another formula
for the AD which avoids the difficulty mentioned: see Kelley,
T. L., Statistical Method, p. 72ff.)
A summary of the steps in the calculation of the AD from the
average by the Short Method may be given as follows:
(1) Find c, the correction in step-units, as shown on page
31. If c is less than 1.00:
(2) Find the arithmetic sum of the FD's.
(3) Calculate the Fl: the total number of scores on steps
with midpoints less than the average. Next calculate the Fg :
the total number of scores on steps with midpoints greater than
the average.
(4) Substitute for ΣFD, c, Fl, Fg, N, and the step length in
formula (8) to find the AD.
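As a rough check on formula (8), the steps above may be written out in Python. The sketch below is an illustration under stated assumptions: the name `short_method_ad` is hypothetical, the data are those of Table IV listed from the lowest step upward, and the refusal to proceed when c is 1.00 or more (numerically) simply mirrors the limitation described in the text.

```python
def short_method_ad(midpoints, freqs, guessed_average, step):
    """AD from the average by formula (8); an illustrative sketch."""
    n = sum(freqs)
    devs = [round((m - guessed_average) / step) for m in midpoints]
    c = sum(f * d for f, d in zip(freqs, devs)) / n
    if abs(c) >= 1:
        # Formula (8) holds only when c is numerically less than 1.00
        raise ValueError("guess a new average on the step of the actual average")
    average = guessed_average + c * step
    # Arithmetic sum of the FD column (signs disregarded)
    sum_fd = sum(f * abs(d) for f, d in zip(freqs, devs))
    # Fl and Fg are counted from the actual average, never from the GA
    f_less = sum(f for m, f in zip(midpoints, freqs) if m < average)
    f_greater = sum(f for m, f in zip(midpoints, freqs) if m > average)
    return (sum_fd + c * (f_less - f_greater)) / n * step


midpoints = [127.5 + 5 * i for i in range(16)]
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

ad = short_method_ad(midpoints, freqs, guessed_average=167.5, step=5)
print(round(ad, 2))  # 15.51
```

With the GA at 187.5 the correction c exceeds 1.00 numerically, and the sketch raises an error rather than return an incorrect AD.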
TABLE V
To Illustrate the Calculation of the AD from the Median
by the Short Method. Data from Table II (2)

  (1)        (2)        (3)     (4)     (5)
 Scores    Midpoint      F       D      FD
136-139      138          3      5      15
132-135      134          5      4      20
128-131      130         16      3      48
124-127      126         23      2      46
120-123      122         52      1      52
116-119      118 (GM)    49      0
112-115      114         27     -1     -27
108-111      110         18     -2     -36
104-107      106          7     -3     -21
                      N = 200          265

Fg = 99 (steps 120-123 to 136-139); Fl = 101 (steps 104-107 to 116-119).
N/2 = 100
Median = 116 + (48/49) × 4 = 119.92
Guessed median = 118 (midpoint of step 116-119)
Correction, C = 119.92 - 118.00 = 1.92
c = 1.92 / 4 = .48
Applying formula: AD = [ΣFD + c(Fl - Fg)] / N × step length
AD = [265 + .48(101 - 99)] / 200 × 4
AD = 1.33 × 4 = 5.32
(B) The Calculation of the AD from the Median
It is sometimes desirable to calculate the AD from the
median instead of the average. The formula for the AD
from the median is exactly the same as the formula for the AD
from the average (see page 32). However, the scheme of the work
differs in some respects from the calculation of the AD from
the average, and hence it is illustrated in Table V for the 200
cancellation scores taken from Table II (2).
First we find the true median, 119.92, by the method
outlined on pages 13-14. Next, we assume or guess a median
at the midpoint of the step-interval which contains the true
median, viz., at 118. Since the true median is known, the
score correction, C, is found directly to be 1.92 by subtracting
118 from 119.92 (true median minus assumed median). Then
dividing 1.92 by 4, the step-interval, we obtain .48, the
correction in step-units (c).
The D's are taken from 118, the guessed median, and the
FD's are obtained (as shown in Table IV) by "weighting"
each D by its corresponding F. The arithmetic sum of column
5, i.e., the ΣFD, is 265. Fl, the total number of scores on
midpoints 118 to 106 inclusive (those less than 119.92) equals
101. And Fg, the total number of scores on midpoints 122 to
138 inclusive (those greater than 119.92) equals 99.
With ΣFD, c, Fl, and Fg known, the AD is now easily
found by substituting these values in formula (8). The
numerator becomes 265 + .48(101 - 99) or 265.96; and dividing
by 200 and multiplying by 4, the step-interval, we get 5.32
as the AD from 119.92, the median.
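The same procedure can be traced in Python for the data of Table V. The sketch is illustrative only; `grouped_median` and `ad_from_median` are hypothetical names, the steps are listed from lowest to highest, and the median is found by the usual interpolation within the step, which is assumed here to agree with the method the text cites from pages 13-14.

```python
def grouped_median(lower_limits, freqs, step):
    """Median by interpolation within the step that contains it."""
    half = sum(freqs) / 2
    below = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= half:
            return lower + (half - below) / f * step
        below += f
    raise ValueError("empty distribution")


def ad_from_median(lower_limits, midpoints, freqs, step):
    """AD from the median by formula (8); returns (median, AD)."""
    n = sum(freqs)
    median = grouped_median(lower_limits, freqs, step)
    # Guessed median: midpoint of the step containing the true median
    guessed = next(m for lo, m in zip(lower_limits, midpoints)
                   if lo <= median < lo + step)
    c = (median - guessed) / step
    sum_fd = sum(f * abs(round((m - guessed) / step))
                 for m, f in zip(midpoints, freqs))
    f_less = sum(f for m, f in zip(midpoints, freqs) if m < median)
    f_greater = sum(f for m, f in zip(midpoints, freqs) if m > median)
    return median, (sum_fd + c * (f_less - f_greater)) / n * step


# Table V, from step 104-107 up to step 136-139
lower_limits = [104, 108, 112, 116, 120, 124, 128, 132, 136]
midpoints = [106, 110, 114, 118, 122, 126, 130, 134, 138]
freqs = [7, 18, 27, 49, 52, 23, 16, 5, 3]

median, ad = ad_from_median(lower_limits, midpoints, freqs, step=4)
print(round(median, 2), round(ad, 2))  # 119.92 5.32
```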
3. The Calculation of the Standard Deviation (σ) by the Short
Method
The calculation of the standard deviation by the Short
Method is considerably less complex than the calculation of
the AD. The formula is:

$$\sigma = \sqrt{\frac{\Sigma FD^2}{N} - c^2} \times \text{the step-interval} \qquad (9)$$

in which the ΣFD² is the sum of the squared deviations in
units of step-intervals, taken from the guessed average, and c
is the correction in units of step-interval.
An illustration of the calculation of σ by the Short Method
is given in Table IV. The first step is to fill in the FD² column
(column 6) by multiplying each D in column 4 by its
corresponding FD in column 5. The process is identical with that
used in the Long Method, except that the D's are all expressed
in units of step-interval. This, of course, considerably
simplifies the multiplication. The calculation of c has already been
described on page 31. The sum of the FD² column (ΣFD²)
is 770, and c² is .6639. Applying formula (9) therefore, we
get 3.687 × 5 or 18.44 as the σ of the distribution.
The formula for σ by the Short Method, unlike the AD
formula, holds good no matter what the size of the correction,
c. This general applicability of formula (9) serves to increase
its value.
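Formula (9) is short enough to state directly in Python. The sketch below is an illustration, with `short_method_sd` a hypothetical name and the Table IV data listed from the lowest step upward.

```python
import math


def short_method_sd(midpoints, freqs, guessed_average, step):
    """Standard deviation by formula (9); an illustrative sketch."""
    n = sum(freqs)
    devs = [round((m - guessed_average) / step) for m in midpoints]
    # c: correction in units of step-interval
    c = sum(f * d for f, d in zip(freqs, devs)) / n
    # Sum of the FD^2 column: each D multiplied by its FD
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, devs))
    return math.sqrt(sum_fd2 / n - c * c) * step


midpoints = [127.5 + 5 * i for i in range(16)]
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

sd = short_method_sd(midpoints, freqs, guessed_average=167.5, step=5)
print(round(sd, 2))  # 18.44
```

Moving the GA to any other midpoint leaves the result unchanged, which illustrates the remark that formula (9) holds whatever the size of c.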
4. The Short Method Applied to Discrete Series
We have defined a discrete series on page 2 as one in
which there are real gaps. This means that in a truly dis-
crete series each measure, instead of representing an interval
on a scale as in a continuous series, is a separate and distinct
value. There is, for example, a real gap between one man
and two men; or between one dollar and two dollars —
provided the unit of measurement in the latter case is one
dollar.
Table VI illustrates the method of finding the measures of
central tendency and variability for discrete measures
tabulated into a frequency distribution. The data consist of the
records of the number of children in 44 families of a rural
community. In the first column of the table is given the
number of children in the family; in the second column,
under the F, the number of families of a given size. We find,
for example, one family of 10 children; three of 9; four of
8, etc. Since the measures (here the children) are discrete,
each measure must be taken at face value, and there are, in
consequence, no midpoint values for the different steps. As
a result, the average being guessed at 5, D's are taken directly
from this point. The FD and the FD² columns are calculated
exactly as shown in Table IV for continuous series: the
first column is obtained by multiplying corresponding F and
D values, and the second by multiplying corresponding D
and FD values. Note that since the step-interval is 1, the
correction c equals C directly.

TABLE VI
To Illustrate the Calculation of the Average, Median, Q, AD,
and SD When Measures are Discrete
The "F" column gives the number of families containing the children listed in first
column.

 Measures,        F
No. Children   Families      D      FD     FD²
     10            1         5       5      25
      9            3         4      12      48
      8            4         3      12      36
      7            3         2       6      12
      6            5         1       5       5
      5            8         0
      4            7        -1      -7       7
      3            4        -2      -8      16
      2            4        -3     -12      36
      1            2        -4      -8      32
      0            3        -5     -15      75
                N = 44              90     292

Sum of plus FD's = +40; sum of minus FD's = -50.
N/2 = 22        Fg = 24        Fl = 20
GA = 5
c = -10 / 44 = -.23        c² = .054
Average = 4.77
Median = 5.0
Mode = 5.0
Q = (Q3 - Q1) / 2 = (6.5 - 3) / 2 = 1.75
AD = [ΣFD + c(Fl - Fg)] / N = [90 - .23(20 - 24)] / 44 = 2.07
SD = √(ΣFD² / N - c²) = √(292 / 44 - .054) = 2.57

N/2 = 22; since 22nd measure falls on 5, Median = 5.
N/4 = 11; since 11th measure falls on 3, Q1 = 3.
3N/4 = 33; since 33rd measure falls between 6 and 7, Q3 = 6.5.
If we apply the correction -.23 to 5, the guessed average,
the average of the distribution, 4.77, is obtained. This result,
while mathematically correct, is obviously a rather difficult
one to interpret in a practical way, as it is impossible
for a family to have four and a fraction children. Possibly
the median is a more meaningful measure. One half of the
measures is 22, and counting in from the small end of the
series we find that the twenty-second score falls on the
frequency opposite step 5. Fractional values are, of course, really
meaningless in a discrete series; and hence we must simply take
5 as being roughly the median of the distribution without any
interpolation. The median family, accordingly (and the
modal family as well), may be said to contain 5 children, and
on the face of it, this result seems to be of more practical value
than the statement that the average number of children to a
family is 4.77.
It is worth while examining further, however, exactly
what is meant by the statement that the average number of
children per family is 4.77. In the first place it means, of
course, that the number of children in the N families examined,
divided by N, gives us 4.77. But furthermore, if the families
examined are actually a fair sample of all of the families in the
" population " from which they are taken (see page 120),
it means that if we had taken all of these families — or
another fair sample of them — the average size of the family
would have been (approximately) the same. The average,
then, is a constant factor for the given population, such that,
knowing the number of families in any fair sample of the
population, we can multiply this number by the constant factor
and obtain (approximately) the number of children in all of
these families. Good use may thus be made of the average,
therefore, even when the measures are necessarily discrete:
exactly the same kind of use that can be made of the average
in the case of continuous measures.
The median, on the other hand, together with the quartiles,
really breaks down in the case of discrete measures. In the
example above of the families, there is actually no value which
fulfills the definition of the median as such a point or value
that one half of the measures exceed it, and one half fall below
it. There are just 44 families in all; the median, then, would
be such a point that 22 families exceeded it and 22 fell below it.
Now there are 20 families falling below 5; 8 families at 5; and
16 families above 5. If we place the median exactly at 5,
only 20 families instead of the required 22 fall below. And
if we place the median even the least fraction above 5, the
number falling below is increased by all of the families having
5 children, so that there are then 20+8, or 28, families falling
below the median, or more than half. There is, in short, no median
value for this series under the definition of the median which
we have been using.
Sometimes, however, another definition of the median is
given, namely, that it is the score or measure made by the
middle individual when the individuals have been arranged in
order, for scores, from least to greatest.¹ Strictly speaking,
this definition also breaks down in the case of discrete measures,
since there is really no sense in speaking of two or more individ-
uals who have the same score as being arranged in order of
magnitude, when measures are discrete. Thus the 8 families,
of 5 children each, are all exactly equal as regards number of
children. Of course, we might admit that in a sense, some
one (any one) of these 8 families is the middle of the whole
series, and since it is a family of 5 children, the median — so
defined — is just 5, no more nor less. This is the median as we
have used it. At best, however, it is a rough and unreliable
measure.
¹ See discussion of midscore, page 12.
In computing the measures of variability in a discrete
series, the Q is the only one which offers difficulties. In the
present illustration, one fourth of the measures (N/4) is 11,
and counting in from the small end of the series 11 scores,
we put Q1 on step 3 (as in the case of the median, no
interpolation is made). If we check this value of Q1 by counting in 33
scores from the large end of the distribution, we again obtain
3 as the value of Q1. Three fourths of the measures (3N/4)
is 33; and counting in 33 scores from the small end of the
series, we find that we complete, or count through, the
frequency on step 6. If 11 scores are counted off from the
other direction, we complete, or count through, the frequency
on step 7. This puts Q3 at either 6 or 7, and the best
way out of the difficulty is to take Q3 as roughly equal to
6.5, i.e., midway between 6 and 7. This is of course a
makeshift, though even at that probably as accurate as the
median or quartiles ever are in discrete series. Taking Q1
equal to 3, and Q3 equal to 6.5, Q is (6.5 - 3)/2 or 1.75.
The AD and σ in a discrete series are found from formulas
(8) and (9) in exactly the same way as in a continuous series.
For example, Fl, the number of families less than 4.77,
is 20; and Fg, the number of families greater than 4.77,
is 24. The AD is, therefore, [90 + (-.23)(20 - 24)]/44 × 1 (the
step-interval) or 2.07. The σ is √(292/44 - .054) × 1 (the
step-interval) or 2.57.
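The counting procedure used for the median and quartiles of the families can be mimicked in Python. What follows is only one possible reading of the text's makeshift, not a standard definition: the helper names `count_in` and `discrete_point` are hypothetical, and the rule of averaging the values reached from the two ends is an assumption that happens to reproduce the figures of Table VI.

```python
def count_in(values, freqs, k):
    """Value on which the k-th measure falls, counting from the list start."""
    cum = 0
    for v, f in zip(values, freqs):
        cum += f
        if cum >= k:
            return v
    raise ValueError("k exceeds the number of measures")


def discrete_point(values, freqs, fraction):
    """Count in from both ends, without interpolation, and average the two."""
    n = sum(freqs)
    from_small = count_in(values, freqs, fraction * n)
    from_large = count_in(values[::-1], freqs[::-1], (1 - fraction) * n)
    return (from_small + from_large) / 2


# Table VI: number of children (0 to 10) and number of families
children = list(range(11))
families = [3, 2, 4, 4, 7, 8, 5, 3, 4, 3, 1]

n = sum(families)
average = sum(c * f for c, f in zip(children, families)) / n
q1 = discrete_point(children, families, 0.25)
median = discrete_point(children, families, 0.50)
q3 = discrete_point(children, families, 0.75)

print(round(average, 2), median, q1, q3, (q3 - q1) / 2)
# 4.77 5.0 3.0 6.5 1.75
```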
V. The Comparison of Groups
1. The Measurement of Relative Variability. The Coefficient
of Variation
Thus far we have been dealing entirely with measures of
absolute variability within the distribution, the Q, the AD,
and the SD. It is sometimes desirable, however, to measure
relative variability as for instance to compare the variability
of one group on two different tests, or of two or more groups
on the same test. The measures of absolute variability are
not sufficient in such cases as these unless the averages of the
two distributions are equal or approximately equal. A problem
will serve to make this clear.
A group of 50 boys works for 6 minutes on an arithmetic
test and makes an average score of 20.5 with a σ of 5.24. The
same group works for 10 minutes on the same test and makes
an average score of 34.8 with a σ of 9.62. If we compare the σ's
of these two distributions we should probably be inclined to say
that the group was considerably more variable in the 10 minute
period than in the 6 minute period. Although the σ in the
second period is nearly twice as large as the σ in the
first period, this does not mean necessarily that the
variability of the group has doubled with the increased time
allowance (or even increased at all), for the average score has
also increased from 20.5 to 34.8. In other words, the two
σ's are not directly comparable as they have been measured
around different central tendencies. In order to compare
the relative variability of this group in the two periods it is
evident, therefore, that we must have a measure which takes
account both of the central tendency and the variability. Such
a measure is Pearson's Coefficient of Variation, given by the
formula,

$$V = \frac{100\,\sigma}{\text{Average}} \qquad (10)$$

Applying this formula to the present problem we find that

$$\text{For the 6 minute period: } V = \frac{5.24 \times 100}{20.5} = 25.56$$

$$\text{For the 10 minute period: } V = \frac{9.62 \times 100}{34.8} = 27.64$$
Instead of being 50% as variable in the 6 minute period as
in the 10, therefore, the group is seen to be actually 25.56/27.64
or about 92% as variable.
The coefficient of variation is especially useful in those
problems in which the variability of the group under different
conditions is the factor studied. As stated above, when the
averages are equal the absolute variability may be compared
directly.
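Formula (10) amounts to a single line of arithmetic, shown here in Python for the arithmetic-test example. The function name `coefficient_of_variation` is an assumption made for this sketch.

```python
def coefficient_of_variation(sd, average):
    """Pearson's coefficient of variation: 100 times the SD over the average."""
    return 100 * sd / average


v_six = coefficient_of_variation(5.24, 20.5)   # 6 minute period
v_ten = coefficient_of_variation(9.62, 34.8)   # 10 minute period
print(round(v_six, 2), round(v_ten, 2))  # 25.56 27.64
```

Because V is a ratio of two quantities in the same units, it is a pure number, which is what allows groups with different averages to be compared.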
2. The Comparison of Two Groups in Terms of Their Measures
of Central Tendency and Variability
The existence of a difference between the averages or the
medians of two groups does not indicate, necessarily, that
there are any very marked differences in the performance of the
various individuals within the two groups. An obtained differ-
ence in central tendency may mean that the person ranking
lowest in the one group is better than the person ranking high-
est in the other; on the other hand, it may mean also that
only a very small per cent of the better group is actually
ahead of the poorer. For this reason in comparing groups it
is not sufficient to state simply the difference between their
averages or medians, for any such difference will depend for its
significance largely upon the variability, or spread, within the
groups compared.
Table VII will illustrate what is meant. A group of 300
boys and a group of 250 girls have been measured on the
same test, and the average, median, Q and σ of each group
computed. Now if we compare the central tendencies, it is
clear that the average girl is 2.19 points ahead of the average
boy, and that the median girl is 2.25 points ahead of the
median boy. If taken alone this result might suggest a fairly
definite sex difference in the given test; but before drawing this
conclusion, we should compare the variability of the two groups.
A comparison of the Q's and σ's shows that the girls tend to
scatter somewhat more around their central tendency than
the boys. The range of scores is, however, practically the same
in both groups: 100% of the boys and 92% of the girls score
between 12 and 32 on the scale. Also from the quartiles
it is evident that the middle 50% of the boys scored between
19 and 24 (approximately) while the middle 50% of the girls
scored between 20 and 27 (approximately).

TABLE VII
Comparison of Two Groups in Terms of Central Tendency,
Variability, and Overlapping

BOYS
Scores      F       D      FD     FD²
28-32       15      2      30      60
24-28       68      1      68      68
20-24      128      0
16-20       79     -1     -79      79
12-16       10     -2     -20      40
         N = 300                  247

Sum of plus FD's = +98; sum of minus FD's = -99.
N/2 = 150
GA = 22.0
c = -1 / 300 = -.003
C = -.003 × 4 = -.01
Average = 21.99
Median = 20 + (61/128) × 4 = 21.91
[N/4 = 75]     Q1 = 16 + (65/79) × 4 = 19.29
[3N/4 = 225]   Q3 = 24 + (8/68) × 4 = 24.47
Q = 2.59
σ = √(247/300) × 4 = .907 × 4 = 3.63

GIRLS
Scores      F       D      FD     FD²
32-36       20      2      40      80
28-32       35      1      35      35
24-28       73      0
20-24       68     -1     -68      68
16-20       41     -2     -82     164
12-16       13     -3     -39     117
         N = 250                  464

Sum of plus FD's = +75; sum of minus FD's = -189.
N/2 = 125
GA = 26
c = -114 / 250 = -.456        c² = .208
C = -.456 × 4 = -1.82
Average = 24.18
Median = 24 + (3/73) × 4 = 24.16
[N/4 = 62.5]     Q1 = 20 + (8.5/68) × 4 = 20.50
[3N/4 = 187.5]   Q3 = 24 + (65.5/73) × 4 = 27.59
Q = 3.55
σ = √(464/250 - .208) × 4 = 1.28 × 4 = 5.12

What per cent of the boys reach or exceed 24.16, the median of the
girls? 217 boys score below 24. Step 24-28 contains 68 scores; hence
there are 68/4 or 17 scores per scale unit on this step. 17 × .16 = 2.72.
217 + 2.72 or 219.72 of the boys' scores fall below 24.16, the girls' median.
300 - 219.72 = 80.28. Accordingly, 80.28/300 or 26.76% (approximately 27%)
of the boys reach or exceed the median score of the girls.
Again, we find from comparing the σ's that the middle 2/3
of the boys scored between 21.99 ± 3.63, i.e., between 18 and
25 (approximately) and that the middle 2/3 of the girls scored
between 24.18 ± 5.12, i.e., between 19 and 29 (approximately)
on the scale. In spite of the difference in averages and
medians, therefore, it is evident from the measures of
variability that the boys and girls scored over almost exactly the
same part of the scale.
To compare the variability of the boys as a group with that
of the girls, we must compute the coefficients of variation.
These are

$$\text{For Boys: } V = \frac{3.63 \times 100}{21.99} = 16.5$$

$$\text{For Girls: } V = \frac{5.12 \times 100}{24.18} = 21.2$$

Expressed as a per cent, the boys are 16.5/21.2 or 78% as variable
as the girls.
3. The Comparison of Two Groups in Terms of Overlapping
A second way of showing how alike, or unlike, two groups
are in their performance on a given test is to state the amount
of overlapping in the distributions of scores made by the two
groups. This information serves as a valuable supplement
to that secured from a comparison of central tendencies and
variabilities. Overlapping is usually measured by the per cent
of the one group which reaches or exceeds the median of the
other. In the present problem we may compute the per cent
of boys who reach or exceed the median score of the girls.
The calculation of this measure of overlapping is as follows.
First, we add up the boys' scores from the small end of the
distribution to find how many fall below 24.16, the girls'
median. Two hundred and seventeen boys, 10 + 79 + 128,
score below 24, the lower limit of the step 24-28. To find
how many score below 24.16, we divide the 68 scores on this
step-interval by 4 (the length of step) and multiply the result
(17) by .16 in order to find how far beyond 24 we must go to
reach the point 24.16. The result of this last calculation is
2.72, and accordingly a total of 217 + 2.72 or 219.72 of the
boys' scores out of the total 300 fall below 24.16, the girls'
median score. If we subtract 219.72 from 300, it follows that
80.28 of the boys' scores lie above 24.16. It is clear, then, that
80.28/300 or 27% of the boys score at or beyond the girls' median.
(See Table VII.)
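The overlapping calculation may be sketched in Python as follows. The function name `count_below` is hypothetical, and the sketch makes the same assumption as the text, namely that the scores on a step are spread evenly over the step-interval.

```python
def count_below(lower_limits, freqs, step, point):
    """Number of scores below `point`, spreading each step's scores evenly."""
    total = 0.0
    for lower, f in zip(lower_limits, freqs):
        if point >= lower + step:
            total += f                           # whole step lies below
        elif point > lower:
            total += f * (point - lower) / step  # part of the step lies below
    return total


# Boys' distribution from Table VII, from step 12-16 up to step 28-32
boys_limits = [12, 16, 20, 24, 28]
boys_freqs = [10, 79, 128, 68, 15]
girls_median = 24.16

n_boys = sum(boys_freqs)
below = count_below(boys_limits, boys_freqs, 4, girls_median)
percent_reaching = 100 * (n_boys - below) / n_boys
print(round(below, 2), round(percent_reaching, 2))  # 219.72 26.76
```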
Summarizing the results from Table VII and the discus-
sion of the preceding paragraphs, we find that the difference
between the average boy and average girl is 2.19 points in favor
of the girls, and that the difference between the median boy
and median girl is 2.25 points in favor of the girls. Twenty-
seven per cent of the boys reach or exceed the median score of
the girls; 100% of the boys and 92% of the girls score within
the same limits on the scale; the middle 2/3 of the boys score
between 18 and 25, and the middle 2/3 of the girls score between
19 and 29. The obvious conclusion from these data seems to
be that individual differences within either group — between
boy and boy or between girl and girl — are probably of more
importance (because greater) than the differences between
boy and girl indicated by the averages or medians taken alone.
VI. The Calculation of the Percentiles in a Frequency
Distribution
We have already found it necessary in finding the quartile
deviation, Q (see page 18) to calculate Q1, the first quartile
or 25th percentile, and Q3, the third quartile, or 75th percentile.
It is often very useful to know, in addition to these points,
the ten decile points in the distribution as well, viz., the 10th,
the 20th, the 30th, the 40th, etc., percentile points. These
values are calculated in exactly the same manner as the median
and the quartiles. As the 25th percentile, for example, was
found by counting off 1/4 of the scores from the small end of
the distribution, and the 50th percentile (the median) by
counting off 1/2 of the scores, in exactly the same way the 10th
percentile is found by counting off 1/10, and the 20th percentile
by counting off 2/10 of the scores from the small end of the
distribution. Percentiles are of considerable value in enabling
us to compare the standing of different individuals in a number
of tests, or to combine the standing of the same individual in
different tests (see page 278 for a fuller discussion of this).
Table VIII gives the method of calculating the percentiles
in the distribution of 54 Army Alpha scores taken from Table I.
The 10th percentile, 147, is located by finding 10% of 54,
and counting off 5.4 scores from the small end of the distribu-
tion. In like manner, the 20th percentile, which is 2/10 or
10.8 scores from the small end of the distribution is located
at 155.67. The 20th percentile score is taken as 155. This
is due to the fact that a score of 155 in a continuous series
means "155 up to 156" and consequently 155.67 falls on score
155, just as 160.25, the 30th percentile point, falls on score
160.¹ The other percentile points, and their scores, are
tabulated in Table VIII.
A word should be said with regard to the calculation of the
0 and 100th percentiles. These values are the lowest and the
highest scores, respectively, in the distribution. For example,
we find from the original scores in Table I that the lowest
score is 126 and the highest 201. Therefore, the 0 percentile
falls at 126 and the 100th at 201.
Note the column in the table marked Cum. F (cumulative
frequency). The entries in this column were obtained by adding
the frequencies (the F's) serially, beginning with those on step 125-129:
e.g., 2+0 = 2; 2+2 = 4; 4+1 = 5, etc. From this column
we can quickly tell how far we must count into the distribution
in order to reach any percentile point. For example, the 70th
percentile is 37.8 scores from the beginning of the distribution;
1 This applies also to the median and the quartiles in a distribution of scores
in continuous series.
TABLE VIII
To Illustrate the Calculation of the Percentiles in a
Frequency Distribution

1. DATA FROM TABLE I

Scores      F    Cum. F      Percentiles   Scores
200-204     1      54            100         201
195-199     4      53             90         194
190-194     2      49             80         188
185-189    10      47             70         185
180-184     3      37             60         179
175-179     8      34             50         175
170-174     3      26             40         167
165-169     3      23             30         160
160-164     4      20             20         155
155-159     6      16             10         147
150-154     4      10              0         126
145-149     1       6
140-144     1       5
135-139     2       4
130-134     0       2
125-129     2       2
         N = 54

CALCULATIONS:
10% of 54 =  5.4     145 + (0.4/1) × 5  = 147
20% of 54 = 10.8     155 + (0.8/6) × 5  = 155.67 (155)
30% of 54 = 16.2     160 + (0.2/4) × 5  = 160.25 (160)
40% of 54 = 21.6     165 + (1.6/3) × 5  = 167.67 (167)
50% of 54 = 27       175 + (1/8) × 5    = 175.625 (175)
60% of 54 = 32.4     175 + (6.4/8) × 5  = 179
70% of 54 = 37.8     185 + (0.8/10) × 5 = 185.40 (185)
80% of 54 = 43.2     185 + (6.2/10) × 5 = 188.1 (188)
90% of 54 = 48.6     190 + (1.6/2) × 5  = 194
TABLE VIII (Continued)

2. DATA FROM "A SCALE OF PERFORMANCE TESTS," BY PINTNER AND
PATTERSON, PAGE 133. SCORES MADE BY 72 NINE-YEAR OLDS ON THE
SUBSTITUTION TEST (IN SECONDS).

Scores (sec.)    F    Cum. F      Percentiles   Scores
 80-89           1       1            100          80
 90-99           2       3             90         108
100-109          5       8             80         121
110-119          5      13             70         126
120-129         13      26             60         133
130-139          9      35             50         141
140-149          6      41             40         152
150-159         11      52             30         158
160-169          5      57             20         172
170-179          3      60             10         192
180-189          4      64              0         219
190-199          3      67
200-209          2      69
210-219          3      72
              N = 72

CALCULATIONS:
10% of 72 (90th percentile) =  7.2     100 + (4.2/5) × 10  = 108.4 (108)
20% of 72 (80th percentile) = 14.4     120 + (1.4/13) × 10 = 121
30% of 72 (70th percentile) = 21.6     120 + (8.6/13) × 10 = 126.6 (126)
40% of 72 (60th percentile) = 28.8     130 + (2.8/9) × 10  = 133
50% of 72 (50th percentile) = 36       140 + (1/6) × 10    = 141.67 (141)
60% of 72 (40th percentile) = 43.2     150 + (2.2/11) × 10 = 152
70% of 72 (30th percentile) = 50.4     150 + (9.4/11) × 10 = 158.5 (158)
80% of 72 (20th percentile) = 57.6     170 + (0.6/3) × 10  = 172
90% of 72 (10th percentile) = 64.8     190 + (0.8/3) × 10  = 192.67 (192)
hence it is clear from the Cum. F's that 37 scores will take us
to 185 — upper limit of step 180-184 — and that the 70th
percentile lies on step 185-189.
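
As a small illustration of how the Cum. F column is built and used, the following Python sketch accumulates the frequencies of Table VIII (1) and finds the step on which the 70th percentile lies. The variable names are illustrative only; `itertools.accumulate` is from the standard library.

```python
from itertools import accumulate

# Frequencies of Table VIII (1), from step 125-129 up to step 200-204.
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

# Serial addition of the F column: 2, 2+0, 2+0+2, ...
cum_f = list(accumulate(freqs))

n = cum_f[-1]            # the last cumulative frequency equals N
target = 0.70 * n        # 37.8 scores must be counted off for the 70th percentile

# The percentile lies on the first step whose Cum. F reaches the target.
step_index = next(i for i, c in enumerate(cum_f) if c >= target)
lower_limit = 125 + 5 * step_index   # lower limit of that step
```

Here `lower_limit` comes out as 185, agreeing with the statement that the 70th percentile lies on step 185-189.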
When once the percentile table has been drawn up, it is a
relatively simple matter to find the percentile corresponding
to any given score. In our problem, for instance, the man
who makes a score of 177 falls on the 55th percentile — midway
between the 50th (175) and the 60th (179) percentiles; while
the man who scores 158 has a percentile score of 26, six tenths
of the interval between the 20th percentile (155) and the
30th percentile (160). Other interpolations may be easily
made in like manner.
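
The interpolation between tabled percentiles may be sketched as follows. The function name `percentile_rank` is hypothetical, and a straight-line relation between neighboring tabled points is assumed; the table is that of Table VIII (1).

```python
def percentile_rank(score, table):
    """Interpolate a percentile for `score` from (percentile, score) pairs.

    `table` must be ordered from the lowest score to the highest.
    """
    for (p_lo, s_lo), (p_hi, s_hi) in zip(table, table[1:]):
        if s_lo <= score <= s_hi:
            return p_lo + (score - s_lo) / (s_hi - s_lo) * (p_hi - p_lo)
    raise ValueError("score lies outside the table")


# Percentile scores from Table VIII (1).
alpha_table = [(0, 126), (10, 147), (20, 155), (30, 160), (40, 167),
               (50, 175), (60, 179), (70, 185), (80, 188), (90, 194),
               (100, 201)]

rank_177 = percentile_rank(177, alpha_table)   # midway between the 50th and 60th
rank_158 = percentile_rank(158, alpha_table)   # six tenths of the way from 20 to 30
```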
In Table VIII (2) the percentiles have been calculated for
the distribution of scores (in seconds) made by seventy-two
9-year olds on the Woodworth-Wells Substitution test.¹ As the
scores are in time-units, the lowest score is the best (the
quickest) performance, while the highest score is the worst (the
slowest) performance. Consequently, the percentile scale is
reversed: we count from the 100th percentile down instead
of from the 0 percentile up. To find the 90th percentile for
example, we count in 7.2 (10% of N) from 80-89 until we
reach 108.4 (score 108). Counting in two tenths of N from
80-89, we reach 121, the 80th percentile. The 100th per-
centile is taken at 80, theoretically the fastest record; the 0
percentile at 219, the poorest record.
From the percentile table we may say that a 9-year old who
completes the Substitution Test in 141 secs. has a percentile
score of 50, that is, stands at the median of the group; while a child
of 9 who takes 181 secs. to complete the test stands at about the 15th
percentile, midway between the 10th percentile (192) and the
20th percentile (172).
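
For time scores the only change in the computation is the direction of counting: to reach the p-th percentile we count off (100 - p) per cent of N from the quick end. A minimal sketch, with the hypothetical name `timed_percentile` and the data of Table VIII (2), follows.

```python
def timed_percentile(lower_limits, freqs, step, p):
    """p-th percentile for time scores, where a low score is a good one.

    The scale is reversed: (100 - p) per cent of the scores are counted
    off from the small (quick) end of the distribution.
    """
    n = sum(freqs)
    target = (100 - p) / 100 * n
    below = 0                         # cumulative frequency below this step
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * step
        below += f
    raise ValueError("p must lie between 0 and 100")


# Table VIII (2): steps 80-89 up to 210-219, step length 10 seconds.
lowers = list(range(80, 220, 10))
freqs = [1, 2, 5, 5, 13, 9, 6, 11, 5, 3, 4, 3, 2, 3]

p90 = timed_percentile(lowers, freqs, 10, 90)   # about 108.4 seconds
p50 = timed_percentile(lowers, freqs, 10, 50)   # about 141.67 seconds
```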
1 Pintner and Patterson: A Scale of Performance Tests, 1921, p. 133.
VII. When to Use the Various Measures of Central
Tendency and Variability
The beginner in statistics is often at a loss to know which
measure of central tendency or variability to use. The following
summary will serve as a guide for most of the problems which
the student will ordinarily meet :
1. When to Use the Average, Median, and Mode
1. Use the Average:
(1) When each score or measure should have equal
weight in determining the central tendency.
(2) When the highest reliability is sought.
(3) When product-moment coefficients of correlation,
or measures of reliability are to be subse-
quently computed.
2. Use the Median:
(1) When a quick and easily computed measure of
central tendency is necessary.
(2) When there are extreme measures which would
affect the average disproportionately.
(3) When certain scores or measures should influence
the central tendency, but all that is known about
them is that they are above or below the central
tendency.
3. Use the Mode:
(1) When a quick approximate measure of concentration
is desired.
(2) When only the most often recurring score is sought.
2. When to Use the Range, Q, AD, and σ
1. Use the Range:
(1) When the data are too scant or scrappy to justify
the calculation of another measure of variability.
(2) When a knowledge of the total spread is all that is
necessary.
2. Use the Q:
(1) For a quick, inspectional measure of variability.
(2) When there are scattered or extreme measures.
(3) When only the concentration around the central
tendency is sought.
3. Use the AD:
(1) When it is desired to weight all deviations accord-
ing to their size.
(2) When extreme deviations should not influence the
measure of variability.
4. Use σ:
(1) When the highest reliability is desired.
(2) When it is desired that extreme deviations influence
the measure of variability.
(3) When coefficients of correlation or measures of
reliability are later to be computed.
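
The effect of an extreme measure on the average, as against the median, can be seen in a small illustration. The six measures below are hypothetical and are not taken from any table in the text; `mean` and `median` are from Python's standard `statistics` module.

```python
from statistics import mean, median

scores = [10, 11, 12, 13, 14]      # hypothetical measures
with_extreme = scores + [60]       # the same group plus one extreme measure

avg_before, mdn_before = mean(scores), median(scores)
avg_after, mdn_after = mean(with_extreme), median(with_extreme)

# The average moves from 12 to 20; the median only from 12 to 12.5.
```

A single extreme measure thus shifts the average by 8 points and the median by half a point, which is the reason for rule 2 (2) under the median.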
VIII. Summary of Formulas for Finding the Measures of
Central Tendency and Variability
1. Measures of Central Tendency
1. Average:
A. Long Method:
(a) data ungrouped:
$\text{Average} = \dfrac{\Sigma(\text{Measures})}{N}$   (1)
(b) data grouped:
$\text{Average} = \dfrac{\Sigma(F \times \text{Midpoint})}{N}$   (2)
B. Short Method:
(a) data grouped:
$\text{Average} = GA + C$ (algebraic),
where $C = \dfrac{\Sigma FD}{N}$ (algebraic) $\times$ length of step
2. Median:
Arrange the measures in order of size, and count off
1/2 of the measures beginning at the small end of
the series.
3. Mode:
For Crude Mode take most frequent score, or midpoint
of step with largest frequency.
2. Measures of Variability
1. Range = (largest measure) - (smallest measure).
2. Quartile Deviation:
$Q = \dfrac{Q_3 - Q_1}{2}$   (3)
3. Average Deviation:
A. Long Method:
(a) data ungrouped:
$AD = \dfrac{\Sigma D \text{ (arithmetical)}}{N}$   (4)
(b) data grouped:
$AD = \dfrac{\Sigma FD \text{ (arithmetical)}}{N}$   (5)
B. Short Method:
(a) data grouped:
$AD = \dfrac{\Sigma FD + c\,(F_l - F_g)}{N} \times$ length of step   (8)
4. Standard Deviation:
A. Long Method:
(a) data ungrouped:
$\sigma = \sqrt{\dfrac{\Sigma D^2}{N}}$   (6)
(b) data grouped:
$\sigma = \sqrt{\dfrac{\Sigma FD^2}{N}}$   (7)
B. Short Method:
(a) data grouped:
$\sigma = \sqrt{\dfrac{\Sigma FD^2}{N} - c^2} \times$ length of step   (9)
5. Coefficient of Variation:
$V = \dfrac{100\,\sigma}{\text{Average}}$   (10)
IX. Illustrative Problems
The following problems illustrate the calculation of the
average, median, mode, Q, AD, and σ for continuous and
discrete series. They are given as examples of the Short
Method, and should be carefully reviewed by the student.
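
Before turning to the worked examples, the Short Method for the average and the standard deviation may be sketched in Python. The function name `short_method` is hypothetical; the data are the midpoints and frequencies of Example I below, with the guessed average (GA) taken at 106.5 and a step of 7.

```python
from math import sqrt


def short_method(midpoints, freqs, guessed_avg, step):
    """Average and standard deviation by the Short Method.

    Deviations D are expressed in step units from the guessed average.
    """
    n = sum(freqs)
    devs = [round((m - guessed_avg) / step) for m in midpoints]
    c = sum(f * d for f, d in zip(freqs, devs)) / n       # correction, in steps
    average = guessed_avg + c * step                      # GA + C
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, devs))
    sigma = sqrt(sum_fd2 / n - c * c) * step
    return average, sigma


# Example I, read from the lowest step (75-81.99) to the highest (145-151.99).
midpoints = [78.5, 85.5, 92.5, 99.5, 106.5, 113.5, 120.5, 127.5, 134.5, 141.5, 148.5]
freqs = [2, 3, 6, 14, 15, 10, 3, 2, 2, 1, 1]

average, sigma = short_method(midpoints, freqs, guessed_avg=106.5, step=7)
```

Carried to full precision the sketch gives an average of about 106.26 and a standard deviation of about 13.82; the 13.79 of Example I arises from rounding the square root to 1.97 before multiplying by the length of step.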
Example I
Calculation of the Average, Median, Mode, Q, AD, and SD. Step = 7

Measures       Midpoint     F      D      FD     FD²
145-151.99      148.5       1      6       6      36
138-144.99      141.5       1      5       5      25
131-137.99      134.5       2      4       8      32
124-130.99      127.5       2      3       6      18
117-123.99      120.5       3      2       6      12
110-116.99      113.5      10      1      10      10
103-109.99      106.5      15      0       0       0
 96-102.99       99.5      14     -1     -14      14
 89- 95.99       92.5       6     -2     -12      24
 82- 88.99       85.5       3     -3      -9      27
 75- 81.99       78.5       2     -4      -8      32
                        N = 59            84     230

Sum of the positive FD's = +41; sum of the negative FD's = -43.
Fg = 34 (the seven steps from 103-109.99 up); Fl = 25 (the four steps below 103).

N/2 = 29.5
GA = 106.5
c = -2/59 = -.034;  c² = .001
C = -.034 × 7 = -.238
Average = 106.5 + (-.238) = 106.26
Median = 103 + (4.5/15) × 7 = 105.10
Mode = 106.50
AD = [84 + (-.034)(25 - 34)]/59 × 7 = 10.00
σ = √(230/59 - .001) × 7 = 1.97 × 7 = 13.79
N/4 = 14.75;  3N/4 = 44.25
Q1 = 96 + (3.75/14) × 7 = 97.875
Q3 = 110 + (4.25/10) × 7 = 112.975
Q = 7.55
Example II
Calculation of Average, Median, Q and SD. Step = 1

Scores         F        D        FD         FD²
22-22.9         1       12         12         144
21-21.9         7       11         77         847
20-20.9        16       10        160       1,600
19-19.9        35        9        305       2,745
18-18.9        81        8        648       5,184
17-17.9       172        7      1,204       8,428
16-16.9       330        6      1,980      11,880
15-15.9       600        5      3,000      15,000
14-14.9     1,031        4      4,124      16,496
13-13.9     1,793        3      5,379      16,137
12-12.9     2,572        2      5,144      10,288
11-11.9     2,951        1      2,951       2,951
10-10.9     3,187        0
 9- 9.9     3,319       -1     -3,319       3,319
 8- 8.9     2,891       -2     -5,782      11,564
 7- 7.9     2,149       -3     -6,447      19,341
 6- 6.9     1,315       -4     -5,260      21,040
 5- 5.9       684       -5     -3,420      17,100
 4- 4.9       302       -6     -1,812      10,872
 3- 3.9       112       -7       -784       5,488
 2- 2.9        38       -8       -304       2,432
 1- 1.9        10       -9        -90         810
        N = 23,596                        180,715

Sum of the positive FD's = +24,984; sum of the negative FD's = -27,218; ΣFD = -2,234.

N/2 = 11,798
GA = 10.5
c = -2,234/23,596 = -.09;  c² = .008
C = -.09
Average = 10.41
Median = 10 + (978/3,187) × 1 = 10.31
σ = √(180,715/23,596 - .008) × 1 = 2.77
N/4 = 5,899;  3N/4 = 17,697
Q1 = 8 + (1,289/2,891) × 1 = 8.45
Q3 = 12 + (739/2,572) × 1 = 12.29
Q = 1.92
Example III
Calculation of Average, Median, Mode, Q, AD, SD, for Discrete Series
Step = 1

Measures     F      D      FD     FD²
   21        2     -4      -8      32
   22        1     -3      -3       9
   23        4     -2      -8      16
   24        9     -1      -9       9
   25       21      0       0       0
   26       11      1      11      11
   27        6      2      12      24
   28        1      3       3       9
   29        1      4       4      16
         N = 56            58     126

Sum of the negative FD's = -28; sum of the positive FD's = +30.
Fl = 37 (measures 21 to 25); Fg = 19 (measures 26 to 29).

N/2 = 28
GA = 25
c = 2/56 = .036;  c² = .001
Average = 25.04 (25.036)
Median = 25
Mode = 25
N/4 = 14;  Q1 = 24
3N/4 = 42;  Q3 = 26
Q = 1.0
AD = [58 + .036(37 - 19)]/56 × 1 = 1.05
σ = 1.50
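
The Short Method formula for the AD, formula (8), can be checked against the Long Method on the data of Example III. The following Python sketch is an illustration only, with hypothetical variable names; since the step is 1, no multiplication by the length of step is needed.

```python
measures = [21, 22, 23, 24, 25, 26, 27, 28, 29]
freqs = [2, 1, 4, 9, 21, 11, 6, 1, 1]

n = sum(freqs)
ga = 25                                                  # guessed average
c = sum(f * (x - ga) for x, f in zip(measures, freqs)) / n
average = ga + c

# Short Method: deviations are taken from GA, then corrected.
sum_fd = sum(f * abs(x - ga) for x, f in zip(measures, freqs))    # arithmetical
f_less = sum(f for x, f in zip(measures, freqs) if x < average)   # Fl
f_greater = n - f_less                                            # Fg
ad_short = (sum_fd + c * (f_less - f_greater)) / n

# Long Method: deviations are taken directly from the true average.
ad_long = sum(f * abs(x - average) for x, f in zip(measures, freqs)) / n
```

Both routes give an AD of about 1.05, the value found in Example III.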
PROBLEMS
1. Tabulate the following scores into three frequency distributions,
using class-intervals of 3, 5, and 10 units respectively.
Scores made on the Thorndike Entrance Examination by 100
applicants for admission to Columbia College. (From Sommerville,
R. C.: Physical, Motor and Sensory Traits, Archives of Psychology,
75, 1924.) Note: Fractions have been dropped.

 63   80   75   90   81   83   78   81   83   83
 89   98   46   90  103   81   71   93   82   78
 86   85   73   83   74   86   84   72   63   76
103   78   85   81  105   94   78  101   76   98
 74   75   88   65   80   81   98   56  103   90
 92   85   78   73   87   75  102   58   78   95
 73   73   73   96   83  110   95   90   87   86
 96   98   82   86   70   70   95   71   89   86
 85   72   94   92   73   84   79   74   88   72
 92   86   93   84   50   85   76   82   99   91

2. The following distributions represent the scores made on a logical
memory test by two racial groups, A and B.
(1) Find the average, median, Q and SD of each distribution.
(2) What per cent of group A reaches or exceeds the median of
group B?
(3) Compare the relative variability of the two groups by means
of their coefficients of variation.

Scores     Group A    Group B
79-83          6          8
74-78          7          8
69-73          8          9
64-68         10         16
59-63         12         20
54-58         15         18
49-53         23         19
44-48         16         11
39-43         10         13
34-38         12          8
29-33          6          7
24-28          3          2
          N = 128    N = 139
3. Compare the 30th, 60th, and 90th percentile scores in Group A
[problem (2)] with the corresponding percentile scores in
Group B.
4. The following problems are given for the purpose of affording
practice in finding measures of central tendency and measures of
variability. In every case where the Average, AD, or SD is to
be found, use the Short Method.
(1) Find the Average and SD.

Scores     F
70-71      2
68-69      2
66-67      3
64-65      4
62-63      6
60-61      7
58-59      5
56-57      4
54-55      2
52-53      3
50-51      1
       N = 39

(2) Find the Median and AD (from the Median).

Scores     F
90-94      2
85-89      2
80-84      4
75-79      8
70-74      6
65-69     11
60-64      9
55-59      7
50-54      5
45-49      0
40-44      2
       N = 56

(3) Find the Average, AD, and SD.

Scores       F
120-122      2
117-119      2
114-116      2
111-113      4
108-110      5
105-107      9
102-104      6
 99-101      3
 96-98       4
 93-95       2
 90-92       1
         N = 40

(4) Find the Average and SD. (Discrete Series.)

Scores     F
  80       1
  79       3
  78       3
  77       6
  76       8
  75       7
  74       3
  73       4
  72       2
  71       1
       N = 38
(5) Find the Median and Q.

Scores       F
100-109      5
 90-99       9
 80-89      14
 70-79      19
 60-69      21
 50-59      30
 40-49      25
 30-39      15
 20-29      10
 10-19       8
  0-9        6
        N = 162

(6) Find the Average, Median and SD.

Measures     F
80-84        8
75-79       14
70-74       19
65-69       24
60-64       29
55-59       27
50-54       26
45-49       28
40-44       20
35-39       15
30-34       10
        N = 220

Answers

2. (1)
             Group A    Group B
Average       53.88      56.21
Median        52.70      56.64
Q              9.64       9.90
SD            13.82      13.73
(2) 39% of Group A reaches or exceeds the median of Group B.
(3) Coefficient of Variation, Group A = 25.64; Group B = 24.43;
Group B is 95.3% as variable as Group A.

3.
                         Group A    Group B
30th percentile score       46         49
60th percentile score       56         60
90th percentile score       74         75

4.
(1) Average = 61.26;  SD = 4.99
(2) Median = 67.27;  AD = 8.97
(3) Average = 106.5;  AD = 5.55;  SD = 7.28
(4) Average = 75.66;  SD = 2.11
(5) Median = 55.67;  Q = 16.41
(6) Average = 57.0;  Median = 57.04;  SD = 13.17
CHAPTER II
GRAPHIC METHODS AND THE NORMAL CURVE
I. The Graphic Representation of the Frequency
Distribution
We learned in the last chapter how scores or other measures
of capacity may be organized and condensed into the tabular
arrangement called a frequency distribution. In addition
we found how such arrangement aids us in calculating measures
of central tendency and variability, and, in general, gives us a
better idea of the facts as a whole. Still further aid in analyzing
numerical data may be secured by a graphic or pictorial treat-
ment of our material. The advertiser has long recognized
the power of the illustration to catch the eye and hold the
attention where the most careful array of statistics fails. And
in like manner, the statistician, through the medium of diagrams
and graphs, attempts to utilize the attention-getting
power of visual presentation and at the same time to translate
numerical facts — often abstract and difficult of interpretation —
into a more concrete and understandable form.
There are three methods of representing graphically — i.e.,
of " plotting " — measures which have been grouped into a
frequency distribution. The first method gives the Frequency
Polygon; the second the Histogram or Column Diagram;
and the third, the Ogive, or cumulative frequency graph.
These will be considered in order.
1. The Frequency Polygon
Before outlining the method of constructing a frequency
polygon, it might be well to review briefly the simple algebraic
principles which apply to all graphical representation of
numerical data. Graphing or plotting is done with reference
to two lines or "coordinate axes," the one the vertical or
Y-axis, the other the horizontal or X-axis. These basic lines
are perpendicular to each other, the point where they intersect
being called O, or the "origin" (see Diagram II). To
locate or "plot" a point "P" whose coordinates are x = 4
and y = 3, we go out from the origin 4 units on the X-axis, and
up from the origin 3 units on the Y-axis, and, where
the perpendiculars to these points intersect, locate the
point P (see Diagram II). In like manner, any point
whose x and y values are known can be located
with reference to OY and OX, the coordinate axes.
Distances measured along the X-axis are commonly
called abscissas, and distances along the Y-axis ordinates.
We may now show how these principles of graphing apply
to the construction of the frequency polygon shown in Diagram
III (1). This graph pictures the frequency distribution of
Table I. The limits of the step-intervals (the abscissas)
are laid off at regular intervals along the base line (the X-axis)
from the origin; and the frequencies within each interval
(the ordinates) are measured off on a scale along the Y-axis.
There are 2 scores on the first step, 125-129 (see Table I).
To represent these on our diagram, we go out on the X-axis
to 127.5 (midway between 125 and 130) and up 2 Y-units.
Here we locate the first point. The frequency on the next
step-interval, 130-134 is 0; hence the second point falls mid-
way between 130 and 135 directly on the X-axis. The 2
scores on step 135-139, the 1 score on step 140-144, and the
frequency on each succeeding step is, in every case, represented
by a point the specified number of scores (Y-units) above the
X-axis, and midway between the upper and lower limits of
the step on which it lies. It is important to remember in plotting
a frequency polygon that the midpoint of the step is always
taken to represent all of the scores within that interval. The
heights of the ordinates at the different midpoints represent
the frequencies within the intervals.

[DIAGRAM II. The Use of Coordinate Axes X and Y.]

[DIAGRAM III (1). Frequency Polygon Plotted from Distribution of 54 Scores in Table I. X-axis: Scores, 120 to 210.]

[DIAGRAM III (2). Histogram Plotted from Data in Table I. X-axis: Scores, 120 to 210.]

When all of the points have been located they are joined
in regular order to give the outline of the frequency polygon
shown in Diagram III (1). In order to complete the figure,
note that the step next below the lowest (125-129) and the
step next above the highest (200-204) are included on the
X-scale. The frequency of each of these steps is taken as 0;
and in consequence the frequency polygon begins and ends on
the X-axis.
The distance taken to represent a step-interval on the
X-axis will usually depend on the width of the cross section
paper used and on the number of steps in the distribution.
No general rule can be given for the choice of an X-unit: nor
for the choice of the unit taken to represent 1 score on the
Y-axis. The length of the diagram, and the maximum frequency
on any given step (as, for example, the 10 scores on
step 185-189) will generally serve to indicate within what
practical limits the Y-unit must be selected. After plotting
several polygons, the student will soon discover that a too-long
Y-unit exaggerates the changes in the distribution from
step to step, while a too-short Y-unit makes the graph too
flat. In like manner, a too-long X-unit tends to stretch
out the polygon, while a too-short X-unit crowds the separate
points on the frequency surface and makes comparisons
difficult.
The total frequency (N) of the distribution is represented
by the area of the polygon: that is, by the area between the
boundary or frequency surface and the base line. The area
of any given interval cannot be taken as proportional to the
number of cases within the interval, however, because of the
numerous irregularities in the distribution, and consequently
of the frequency surface.
To show the position of the average, median, and mode
on the graph, we must first locate these values on the X-axis,
and then erect perpendiculars as shown in the diagram. Note
that the mode is easily located as the highest point on the
frequency surface.
The steps involved in constructing a frequency polygon may be
summarized as follows:
1. Draw two straight lines perpendicular to each other,
the vertical line near the left side of the paper, the horizontal
line near the bottom. Call the vertical line (the Y-axis)
OY, and the horizontal line (the X-axis) OX. Put the O
where the two lines intersect. This point is called the origin.
2. Lay off the step-intervals of the frequency distribution
at regular intervals along the X-axis. Begin with the lower
limit of the step next below the lowest as the origin, and end
with the upper limit of the step next above the highest. Label
the successive X-points with the step limits. Select as the
X unit a distance which will permit all of the steps to be
represented on the one graph.
3. Mark off on the Y-axis successive unit distances to
represent the scores on the different steps. Choose a scale
which will permit the maximum frequency to be represented
on the graph.
4. From the midpoint of each step-interval on the X-axis,
go up in the Y direction a distance equal to the number of
scores on the step. Place a point here.
5. Join the points plotted in (4) with straight lines to give
the frequency polygon.
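
The five steps above may be sketched as a computation of the points to be plotted. The function name `polygon_points` is hypothetical, and the midpoint of a step is taken, as in the text, at its lower limit plus half the length of step (127.5 for step 125-129). The sketch stops short of drawing; the points may be handed to any plotting device.

```python
def polygon_points(lower_limits, freqs, step):
    """Return the (X, Y) points of a frequency polygon.

    An empty step (F = 0) is added below the lowest step and above the
    highest, so that the polygon begins and ends on the X-axis.
    """
    lowers = [lower_limits[0] - step] + list(lower_limits) + [lower_limits[-1] + step]
    heights = [0] + list(freqs) + [0]
    return [(lower + step / 2, f) for lower, f in zip(lowers, heights)]


# Distribution of Table I: steps 125-129 up to 200-204.
lowers = list(range(125, 205, 5))
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

points = polygon_points(lowers, freqs, 5)
# points[0] is (122.5, 0), points[1] is (127.5, 2), and points[-1] is (207.5, 0).
```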
2. The Histogram or Column Diagram
A second method of representing a frequency distribution
graphically is to construct a histogram or column diagram.
This type of graph is illustrated in Diagram III (2), with the
same distribution of scores represented by the frequency
polygon in Diagram III (1). The two graphs are constructed
in much the same way with this important difference: that
whereas, in a frequency polygon, all of the scores within a
given interval are represented by the midpoint of that interval,
in the histogram the assumption is made that all of the scores
within an interval are spread uniformly over the entire interval.
For this reason, the measures within any given interval in a
histogram are represented by a rectangle constructed with
base equal to the length of the step-interval, and altitude
equal to the number of measures within the interval. Thus [see
Diagram III (2)] the 2 scores on step 125-129 are represented
by a rectangle with base equal to the length of step-interval
on the X-axis, and altitude equal to 2 units measured off on
the Y-axis. As there are no scores within the next interval
130-134, no rectangle is drawn here. The altitudes of the
other rectangles vary with the number of scores on the intervals.
When the same number of scores occur on two (or more)
adjacent steps, as in the intervals from 140 up to 145 and from
145 up to 150, the base of the rectangle covers two (or more)
intervals on the X-axis. The highest rectangle is, of course,
that which has the step 185 up to 190 as its base and 10, the
maximum frequency, as its altitude. In selecting scales for
the X- and Y-axes, the same considerations as to numbers of
intervals, size of paper, maximum frequency, etc., noted under
the frequency polygon, must be observed.
Although in a histogram each step-interval is represented
by a separate rectangle, it is not necessary to project the sides
of these different rectangles to the base line, as shown in
Diagram III (2), as the rise and fall of the boundary line showing
the increase or decrease in the number of scores from step to
step is usually the important fact to be brought out. As
in the frequency polygon, the total frequency (N) is represented
by the area of the histogram. In contrast to the frequency
polygon, however, the area of each rectangle in a histogram is
directly proportional to the number of measures in the interval,
so that we have in the column diagram an accurate picture
of the number of scores falling on each step.
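
The statements about area can be verified numerically for the distribution of Table I. The sketch below is illustrative only: areas are expressed in units of (score × frequency), and the polygon is assumed to be closed on the X-axis by an empty step at each end, as described for Diagram III (1).

```python
step = 5
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]   # Table I, low to high

# Histogram: one rectangle per step, base = step length, altitude = F.
rect_areas = [step * f for f in freqs]
histogram_area = sum(rect_areas)            # step × N

# Frequency polygon: trapezoids between successive midpoints, with an
# empty step (F = 0) added at each end so the figure closes on the X-axis.
heights = [0] + freqs + [0]
polygon_area = sum(step * (a + b) / 2 for a, b in zip(heights, heights[1:]))
```

Both totals come to 270, i.e., 5 × 54; but only in the histogram is the area standing over each single step (`rect_areas`) proportional to the frequency on that step.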
In order to make easier a comparison of the two types of
frequency graph, the distribution of Table III is plotted in
Diagram IV, on the same coordinate axes, both as a frequency
polygon and a histogram. The increased number of cases
and the more symmetrical distribution of scores make both
of these graphs more regular in appearance than the graphs
of Diagram III.¹

[DIAGRAM IV. Plotting of Frequency Polygon and Histogram. Data from Table III (2). X-axis: Scores, 100 to 144.]

The question of when to use the frequency polygon and
when to use the histogram cannot be answered, unfortunately,
by giving a general rule which will cover all cases. The
frequency polygon is less exact than the histogram in that
it does not represent accurately— i.e., in terms of area— the
1 Other examples of frequency polygons and histograms may be found on
page 75.
number of measures on the successive step-intervals. For
comparing two or more distributions plotted on the same
diagram, however, the frequency polygon is probably the more
useful, since the many vertical lines in the histogram often
coincide. Both the histogram and the frequency polygon
tell the same story, and both are useful in enabling us to show
in a graphic fashion whether the scores of a group distribute
uniformly over the scale, or whether they pile up at the low
or the high end. Not only information with regard to the
group but information with regard to the test may be thus
secured. If a test is too easy, the scores will fall dispropor-
tionately at the high end of the scale; if too hard at the low
end. If the test is neither too hard nor too easy, the scores
will tend to be symmetrically distributed, a few individuals
scoring high, a few low, and the majority scoring somewhere
near the middle of the scale. In this last case, the frequency
polygon or histogram approximates the " ideal " or normal
frequency distribution (see page 76).
3. The Ogive
The ogive, or cumulative frequency graph, is a third
way of representing a frequency distribution by means of a
diagram. Before we can plot an ogive, the scores of the distri-
bution must first be added serially or cumulated, as shown in
Table IX for the two distributions taken from Table II (1
and 2). (These two distributions have already been used to
illustrate the frequency polygon and histogram in Diagrams
III and IV.) Note, that the first two columns in Table IX
are exactly the same as in any frequency distribution, but
that in the third column the scores have been " accumulated "
successively from the low end of the distribution as described
on page 46. The last cumulative score is, of course, equal
to N.¹
1 Cumulative distributions are useful also in telling quickly how many in a
group scored above or below a certain point on the scale. In Table IX, for
example, we read that 10 men in the group made Alpha scores below 155, 47
below 190, etc.
[DIAGRAM V (1). Ogive Curve. Data from Table II (1). X-axis: Step-Intervals, 125 to 205.]

[DIAGRAM V (2). Ogive Curve. Data from Table II (2). X-axis: Step-Intervals, 104 to 140. Y-axis: Frequencies, with a percentile scale from 10 to 100.]

The two ogives which represent the distributions of Table
IX are shown in Diagram V (1 and 2). Consider first the
ogive of the 54 Alpha scores shown in (1). The step-intervals
of the distribution have been laid off along the X-axis, and
successive distances equal to the total number of scores in the
distribution (here 54) have been laid off on the Y-axis. It will
be remembered in plotting the frequency polygon that the
frequency of each step was taken at the midpoint of the step-interval;
in constructing an ogive, however, each cumulative
frequency must be plotted at the upper limit of the step on which
it falls. The first point on the curve, for example, is 2 Y-units
(the cumulative frequency on step 125-129) above 130;
the second point is 2 Y-units above 135, the third, 4 Y-units
above 140, and so on to the last point which is 54 Y-units above
205. The plotted points are joined in order to give the ogive.
Note that the curve begins at 125 on the X-axis, and ends at
205 just 54 Y-units above the X-axis.

TABLE IX
Cumulative Frequencies of the Two Distributions in Table II
(For Plotting the Ogives of Diagram V)

            (1)                              (2)
Measures    F    Cum. F          Measures    F    Cum. F
200-204     1      54            136-139     3     200
195-199     4      53            132-135     5     197
190-194     2      49            128-131    16     192
185-189    10      47            124-127    23     176
180-184     3      37            120-123    52     153
175-179     8      34            116-119    49     101
170-174     3      26            112-115    27      52
165-169     3      23            108-111    18      25
160-164     4      20            104-107     7       7
155-159     6      16                    N = 200
150-154     4      10
145-149     1       6
140-144     1       5
135-139     2       4
130-134     0       2
125-129     2       2
         N = 54
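
The rule for plotting an ogive, with each cumulative frequency placed at the upper limit of its step, may be sketched as follows for distribution (1) of Table IX. The variable names are illustrative; the upper limit of a step is taken as its lower limit plus the length of step (130 for step 125-129).

```python
from itertools import accumulate

step = 5
lowers = list(range(125, 205, 5))                            # 125, 130, ..., 200
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]    # Table IX (1), low to high

# The curve starts on the X-axis at the lower limit of the first step;
# each cumulative frequency is then plotted above the upper limit of its step.
ogive = [(lowers[0], 0)] + [
    (lower + step, cum) for lower, cum in zip(lowers, accumulate(freqs))
]
# The first points are (125, 0), (130, 2), (135, 2), (140, 4);
# the last point is (205, 54).
```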
Because the sample is small and the distribution of scores
unsymmetrical, the ogive in (1) is somewhat jagged in outline.
To eliminate such irregularities as these and to facilitate later
computations, we often " smooth " an ogive by sketching in a
smooth curve through as many of its points as possible. The
dotted line in Diagram V (1) shows the result of this smooth-
ing process. If the sample is large, and the measures well
distributed, smoothing is often unnecessary [see Diagram
V (2)].
The ogive in Diagram V (2) has been plotted from the
distribution in Table IX (2), as described above. It offers
no new difficulties and need not be considered in any detail.
Note that the curve begins at 104, the lower limit of the first
step, and ends at 140, the upper limit of the last step on the
scale; also that the cumulative F's, 7, 25, 52, etc., have all
been plotted at the upper limits of their respective step-intervals.
This ogive does not require any smoothing as the distribution
which it represents is very symmetrical.
The ogive has been less frequently used by workers in exper-
imental psychology and education than either the frequency
polygon or the histogram, and is probably somewhat more
difficult for the general reader to interpret. It has, however,
several distinct advantages. In the first place, unlike the
other frequency graphs, the shape of the ogive remains prac-
tically the same when the size of the step-interval varies.
Furthermore, while the frequency polygon and histogram can-
not be compared unless the step-intervals are the same, this
restriction does not apply to the ogive.
Probably the chief value of the ogive to the student of
mental measurement lies in the relative ease with which
percentile values may be calculated from the curve. The
method of getting these values is illustrated in Diagram V (1
and 2). First, a perpendicular is erected on the X-axis at
the upper limit of the last step-interval, and continued until
it reaches the curve. (In the first ogive this perpendicular will
be erected at 205.) Next, this line between the curve and the
X-axis is divided into 10 equal parts (by means of a compass
or mm. rule) and the points of division labeled 10, 20, 30, 40,
50, 60, 70, 80, 90, and 100 (the 100 point lies on the curve,
the 0 point on the X-axis). These points are used to locate the
10 decile points in the distribution. To find the second
decile, or 20th percentile, for example, we draw a line from the
second point, i.e., from 20, parallel to the X-axis, and where
this line cuts the curve, drop a perpendicular to the X-axis.
This perpendicular locates the 20th percentile on the X-scale.
The other percentiles and quartiles may be found in the same
way. Notice in ogive (1) that the 0 percentile is 125, theoretically
the lowest score in the distribution, and that the
100th percentile is 205, theoretically the highest score in the
distribution.

[DIAGRAM VI. Another Way of Constructing an Ogive. The Individuals are Arranged in Order Along the Baseline, Each Man's Score Being Marked Off on the Ordinate Above Him. X-axis: Individuals in Order.]
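
Reading a percentile from the ogive amounts to finding where a horizontal line at the required height cuts the curve. For an unsmoothed ogive made of straight segments this can be sketched as below; the function name `read_from_ogive` is hypothetical, and the points are those of distribution (2) in Table IX. A smoothed curve, such as the dotted line of Diagram V (1), would give slightly different readings.

```python
from itertools import accumulate


def read_from_ogive(points, p):
    """Score at which an ogive of straight segments reaches p per cent of N."""
    target = p / 100 * points[-1][1]          # required height on the Y-axis
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c1 >= target:
            # Drop a perpendicular from the point where the line cuts this segment.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p must lie between 0 and 100")


# Ogive of Table IX (2): steps 104-107 up to 136-139, step length 4.
lowers = list(range(104, 140, 4))
freqs = [7, 18, 27, 49, 52, 23, 16, 5, 3]
points = [(104, 0)] + [(lo + 4, c) for lo, c in zip(lowers, accumulate(freqs))]

median = read_from_ogive(points, 50)          # about 119.92
p0 = read_from_ogive(points, 0)               # 104, where the curve begins
p100 = read_from_ogive(points, 100)           # 140, where the curve ends
```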
The student should compare the percentile values obtained
from the ogive with the same values as calculated in Table
VIII (1). Due to the greater smoothness of the curve, the
percentiles obtained from ogive (2) will be more accurate than
those got from the ogive (1).
The accuracy with which we are able to obtain the
percentiles graphically will depend, in general, on the accuracy
with which the points of the curve have been plotted, the fine-
ness of the scale, the number of cases, and the symmetry of
the distribution.
Another way of constructing an ogive is shown in Diagram
VI, with the data of Table IX (1). Imagine the 54 individuals
in the distribution arranged along the baseline according to
the size of their scores, the score of each man being marked
off on the ordinate above him. When these points are joined
by straight lines, we have a series of rectangles of the histogram
type, the base of each rectangle representing the number of
men making the given score, the height of each rectangle
representing the size of the score. A smooth curve may be
sketched through (or as near as possible to) the midpoint
of the upper base of each rectangle — as shown in the diagram —
to give an ogive curve. From this ogive, percentiles may be
easily found. To get the median, for example, we erect a perpendicular
at 27 (N/2) on the X-axis, and draw a line through
the point where this perpendicular cuts the curve parallel to
the X-axis to locate the median approximately at 175 on the
Y-scale. The quartiles and the percentile points may be found
in exactly the same manner.
II. Other Uses of Graphical Methods — the Com-
parative Line Graph
Many problems in mental measurement, especially those
which involve the measurement of changes attributable to
growth, learning, practice, etc., readily lend themselves to
graphical treatment. Diagram VII illustrates several such
problems, in which the data are represented by " line graphs."
As in all graphs hitherto considered, the measures are plotted
with reference to the coordinate axes, OY and OX, the coordinates
of a plotted point being its abscissa, or X-distance,
and its ordinate, or Y-distance.
Figure 1 illustrates the "age" or "growth" curve. It
represents the growth in logical memory (for a connected
passage) in boys and girls from 8 to 18 years old.
Fig. 1. Logical memory. Age is represented on the X-line (horizontal); score, e.g.,
number of ideas remembered, on the Y-line (vertical). (After Pyle.)
Fig. 2. Improvement in telegraphy. Weeks of practice on the X-line; number of
letters per minute on the Y-line. (After Bryan and Harter.)
DIAGRAM VII
Comparative Line Graphs.
Figure 2 illustrates the " learning " or " practice " curve.
It shows the improvement in sending and receiving telegraphic
messages, resulting from successive trials at the same task
over a period of weeks. Improvement is measured in terms
of the number of letters sent or received per minute.
Figure 3 is a "performance" or "practice" curve. It
represents 25 successive trials with the hand dynamometer
by one man and one woman. Note that the successive trials
are laid off on the X-axis, and the strength of grip (in kgs.)
on the Y-axis. Graphs like these are useful in enabling us to
compare individuals or groups at various stages in the test or
performance. They also enable us to study the effect of
fatigue with successive trials.
Fig. 3. Hand dynamometer readings in kilograms for 25 successive grips at intervals
of 10 seconds. Two subjects, a man and a woman.
Fig. 4. Curve of forgetting. The numbers on the base line give hours elapsed from
time of learning; numbers along the Y-axis give per cent retained. (After Ebbinghaus.)
DIAGRAM VII (continued)
Comparative Line Graphs.
Figure 4 shows the well-known " curve of forgetting " (or
retention). It represents memory retention, as measured by
the percentage of the original material retained after the
passage of different time intervals. The time intervals between
learning and relearning are laid off on the X-axis; the per cent
retained, as shown by the relearning, on the Y-axis.
III. The Normal Probability Curve
In Diagram VIII are shown four graphs — two frequency
polygons and two histograms — which represent frequency
distributions of data drawn from anthropometry, psychology,
and meteorology. It is at once apparent that all of these
graphs have the same general form: the measures are concentrated
closely around the center, and taper off from the
central high point, or crest, equally to right and left. In
general we find relatively few measures at the " low " score
end of the scale; an increasing number up to a maximum
at the midposition, and a progressive falling off as we go
toward the " high " score end of the scale. If we divide
the area under each curve (the area between the curve and
the X-axis) by a line drawn perpendicularly through the
central high point to the base line, the two parts will be
practically similar in form and equal in area. This results
from the fact that each curve shows almost perfect bilateral
symmetry. The perfectly symmetrical curve, or frequency sur-
face, to which all of the figures in Diagram VIII approximate,
is shown in Diagram IX. This bell-shaped curve is called
the Normal Probability Curve, or simply the Normal Curve,
and is of the greatest value in psychological measurement.
An understanding of its characteristics is essential to the
student of experimental psychology and measurement; and
consequently the rest of this chapter will be concerned with the
study of the properties and uses of the Normal Curve.
DIAGRAM VIII
Samples of Frequency Distributions Drawn from Different Fields.
(Fig. 1. Stature in inches, adult males. After Yule.)
1. Elementary Principles of Probability. The Derivation and
Construction of the Probability Curve
Perhaps the simplest approach to an understanding of the
Normal Curve is through a consideration of the elementary
facts of probability. As used in statistics, the "probability"
of the occurrence of an event may be defined as the expected
relative frequency of occurrence of the given event in a very
large (infinite) number of observations. This expected relative
frequency of occurrence may be based upon a knowledge of the
conditions determining the probable occurrence, as in dice
throwing or coin tossing, or upon empirical data, as in mental
and social measurements.
DIAGRAM IX
Normal Probability Curve.
(The baseline is marked on a sigma scale from −3σ through the mean, 0, to +3σ,
and on a PE scale from −4PE to +4PE. The area between −1σ and +1σ is
labeled 68.26%; that between −1PE and +1PE, 50%.)
The probability of an event may be stated most simply,
perhaps, as a ratio; as, for example, when we say that the
probability of a coin falling heads or tails is 1/2, or that of a die
showing a two spot is 1/6. This ratio, called the " probability
ratio," may be defined as that fraction the numerator of which
equals the expected outcome or outcomes and the denominator
of which equals the total possible outcomes. Such a ratio always
falls between the limits 0 (impossibility of occurrence) and
1.00 (certainty of occurrence). Thus the probability that the
sky will fall is 0; that an individual now living will some day
die is 1.00. Between these limits there are all possible degrees
of probability expressed by the probability ratio.
Let us now apply these simple principles of probability
to the specific case of what happens when we toss coins (coin
tossing and dice throwing furnish simple and often-used illus-
trations of the laws of chance). If we toss one coin, obviously
it must fall either heads (H) or tails (T) 100% of the time
and a head or tail is equally probable. Expressed as a ratio,
the probability of an H is 1/2; of a T, 1/2; and
(H+T), i.e., 1/2 + 1/2 = 1.00.
Again, if we toss two coins, (a) and (b), at the same time
there are 4 possible arrangements which the coins may take:
   (1)     (2)     (3)     (4)
   a b     a b     a b     a b
   H H     H T     T H     T T
That is, both coins (a) and (b) may fall H; (a) may fall H
and (b) T; (b) may fall H and (a) T; or both coins may fall T.
Expressed as a probability ratio, the chances of 2 heads are
1/4; of one head and one tail, 2 × 1/4 or 1/2; of 2 tails 1/4.
Let us go a step further and increase the number of coins
to three. If we toss three coins, (a), (b), and (c) simultaneously
there are 8 possible outcomes:
   (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
  a b c   a b c   a b c   a b c   a b c   a b c   a b c   a b c
  H H H   H H T   H T H   H T T   T H H   T H T   T T H   T T T
Expressed as a ratio, the chances of 3 heads are 1/8 (combination
1); of 2 heads and 1 tail 3/8 (combinations 2, 3, and 5);
of 1 head and 2 tails 3/8 (combinations 4, 6, and 7); and of
3 tails 1/8 (combination 8). In exactly this same way we can
figure the probability of different combinations when we have
4, 5, or any number of coins.
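The listing procedure just described can be carried out mechanically for any number of coins. The following is a minimal sketch in Python, using only the standard library; the function name is hypothetical and chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_count_probabilities(n_coins):
    """List every arrangement of n_coins coins and return the probability
    ratio for each possible number of heads."""
    # Each arrangement is a tuple such as ('H', 'T', 'H').
    arrangements = list(product("HT", repeat=n_coins))
    counts = Counter(a.count("H") for a in arrangements)
    total = len(arrangements)  # 2 ** n_coins equally likely arrangements
    return {heads: Fraction(counts[heads], total)
            for heads in range(n_coins, -1, -1)}


for heads, ratio in head_count_probabilities(3).items():
    print(heads, "heads:", ratio)
```

For three coins this reproduces the ratios 1/8, 3/8, 3/8, and 1/8 obtained above by listing the eight combinations.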
These probable outcomes may be secured in a very much
simpler way than by listing all of the various possible combinations
as shown above. If there are two independent events,
the probability of the occurrence or non-occurrence of each
being the same (as in the probability of a coin falling heads or
tails), the "compound" probabilities may be found by the
expansion of the binomial (p+q)^2, in which p equals the probability
of an event's happening, q the probability of its not happening,
and the exponent 2 indicates the number of events. Now if we
substitute H for p, and T for q (tails = non-heads), we have for
two coins (H+T)^2; and squaring the binomial, (H+T)^2 =
H^2 + 2HT + T^2. This expansion may be written,
1 H^2   1 chance in 4 of 2 heads;              probability ratio = 1/4
2 HT    2 chances in 4 of 1 head and 1 tail;   probability ratio = 1/2
1 T^2   1 chance in 4 of 2 tails;              probability ratio = 1/4
Total = 4
Note that these results are identical with those obtained above
by listing the various possible outcomes when two coins are
tossed.
If we have three independent events, the expression
(p+q)^3 becomes, for three coins, (H+T)^3. Expanding this
binomial, we get H^3 + 3H^2T + 3HT^2 + T^3, which may be written,
1 H^3     1 chance in 8 of 3 heads;               probability ratio = 1/8
3 H^2T    3 chances in 8 of 2 heads and 1 tail;   probability ratio = 3/8
3 HT^2    3 chances in 8 of 1 head and 2 tails;   probability ratio = 3/8
1 T^3     1 chance in 8 of 3 tails;               probability ratio = 1/8
Total = 8
Again these results are identical with those got by listing the
various possible outcomes obtained by tossing three coins.
The binomial expansion may be applied more generally to the
case in which there are any number of independent events,
just so long as the probability of occurrence or non-occurrence
is the same for each separate event. Thus if we toss 10 coins
simultaneously, we have by analogy with the above (p+q)^10,
which equals (H+T)^10, putting H for probability of a head,
T for probability of a non-head (tail) and 10 for the number
of coins tossed. When the expression (H+T)^10 is expanded,
we have,1
H^10 + 10H^9T + 45H^8T^2 + 120H^7T^3 + 210H^6T^4 + 252H^5T^5 + 210H^4T^6
+ 120H^3T^7 + 45H^2T^8 + 10HT^9 + T^10
which may be summarized as follows:
                                                            Probability
                                                               Ratio
  1 H^10        1 chance in 1024 of all coins falling heads     1/1024
 10 H^9T       10 chances in 1024 of 9 heads and 1 tail        10/1024
 45 H^8T^2     45 chances in 1024 of 8 heads and 2 tails       45/1024
120 H^7T^3    120 chances in 1024 of 7 heads and 3 tails      120/1024
210 H^6T^4    210 chances in 1024 of 6 heads and 4 tails      210/1024
252 H^5T^5    252 chances in 1024 of 5 heads and 5 tails      252/1024
210 H^4T^6    210 chances in 1024 of 4 heads and 6 tails      210/1024
120 H^3T^7    120 chances in 1024 of 3 heads and 7 tails      120/1024
 45 H^2T^8     45 chances in 1024 of 2 heads and 8 tails       45/1024
 10 HT^9       10 chances in 1024 of 1 head and 9 tails        10/1024
  1 T^10        1 chance in 1024 of all coins falling tails     1/1024
Total = 1024
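The coefficients of the expansion need not be taken on faith; they are the binomial coefficients, which Python's standard library computes directly with math.comb. The sketch below is illustrative only, and the function name is an assumption.

```python
from math import comb


def binomial_chances(n_coins):
    """Coefficients of (H+T)^n, ordered from all heads to all tails.

    The term with k tails has coefficient C(n, k), the number of ways
    of choosing which k of the n coins fall tails.
    """
    return [comb(n_coins, tails) for tails in range(n_coins + 1)]


chances = binomial_chances(10)
total = sum(chances)  # equals 2 ** 10
for tails, c in enumerate(chances):
    print(f"{10 - tails:2d} heads, {tails:2d} tails: {c:3d} chances in {total}")
```

The eleven values printed agree with the eleven terms of the summary above, and their sum is 1024.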
These results are represented graphically in Diagram X,
by a histogram and frequency polygon plotted on the same
axes. The eleven terms of the expansion have been laid off at
equal distances on the X-axis, and the chances of the occurrence
of each combination of H's and T's plotted as scores on the
Y-axis. The result is a symmetrical probability curve, with the
greatest concentration in the center, and the "scores" (the
chances) falling away by corresponding decrements above and
below the central point. Diagram X represents the results
which we should expect to get theoretically by tossing 10 coins
1024 times.
1 The reader may take this expansion on faith; or he may refer to the chapter
on Binomials in any elementary Algebra.
DIAGRAM X
Probability Surface Obtained from the Expansion of (H+T)^10.
(X-axis: the eleven terms of the expansion, from H^10 to T^10; Y-axis: chances in 1024.)
Many experiments have been made for the purpose of
checking the theoretical against the actual results, by tossing
coins or throwing dice a great many times. In one well-known
experiment1 12 dice were thrown 4096 times, each
4, 5, and 6 spot being taken as a "success" and each 1, 2,
and 3 spot as a "failure." For example, in a throw of 3, 1,
2, 6, 4, 6, 3, 4, 1, 5, 2, 3, there would be 5 successes. The
observed frequencies of the different number of successes
and the theoretical results secured from the binomial expansion
have been plotted on the same axes in Diagram XI. The
reader will note how closely the observed frequencies check
the theoretical: how close the two polygons are to being
identical. If the reader should care to verify the results of
Diagram X by tossing 10 coins 1024 times, he will find his
empirical results closely in accord with the theoretical
expectations.
1 Yule, G. Udny, An Introduction to the Theory of Statistics, 5th edition,
1919, p. 258.
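A reader without the patience for 4096 throws may let a machine do the throwing. The sketch below simulates the dice experiment with Python's pseudo-random generator and sets the observed frequencies beside the theoretical ones. It is only an illustration under stated assumptions: the seed and the function names are arbitrary choices, and the simulated counts are not the observed figures reported by Yule.

```python
import random
from math import comb


def count_successes(throw):
    """Number of dice in one throw showing a 4, 5, or 6 spot."""
    return sum(1 for spots in throw if spots >= 4)


def simulate(n_dice=12, n_throws=4096, seed=0):
    """Observed frequency of 0, 1, ..., n_dice successes in simulated throws."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    observed = [0] * (n_dice + 1)
    for _ in range(n_throws):
        throw = [rng.randint(1, 6) for _ in range(n_dice)]
        observed[count_successes(throw)] += 1
    return observed


def theoretical(n_dice=12, n_throws=4096):
    """Expected frequencies from the expansion of (p+q)^n with p = q = 1/2."""
    return [n_throws * comb(n_dice, k) / 2 ** n_dice for k in range(n_dice + 1)]


observed = simulate()
expected = theoretical()
for k in range(13):
    print(f"{k:2d} successes: observed {observed[k]:4d}, expected {expected[k]:6.1f}")
```

The example throw given in the text, passed to count_successes, yields 5 successes. Different seeds give somewhat different observed columns, which is itself a small lesson in sampling.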
DIAGRAM XI
Comparison of Observed and Theoretical Results in Throwing
12 Dice 4096 Times. (After Yule, page 258.)
(Two curves are plotted on the same axes: the theoretical curve and the actual curve.)
2. Why the Probability Curve is Employed in Psychological
Measurement
The frequency curve plotted in Diagram X from the
expansion of the expression (H+T)^10 is a symmetrical 10-sided
polygon. If the number of factors (e.g., coins) is increased
from 10 to 20, to 30, and then to 40 (the baseline extent remaining
the same) the number of sides of the polygon will increase
from 10 to 20, to 30, to 40. With each increase in the number
of factors, the points on the curve will move more and more
closely together, until finally when the number of factors
becomes very large [when n in the expression (p+q)^n becomes
infinite] the polygon will become a perfectly smooth curve
like the one in Diagram IX. The "ideal" polygon or normal
curve, therefore, may be said to represent the relative frequency
of occurrence of various combinations of a very large number
of equal, similar, and independent factors, when the chances of
the occurrence or non-occurrence of each factor are the same.
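The approach of the polygon to the smooth curve can be checked numerically. The sketch below compares each term of (1/2 + 1/2)^n with the ordinate of a normal curve having the same mean, n/2, and the same standard deviation, one half the square root of n. These two constants are the standard results for coin tossing and are supplied here as an assumption, since the text has not derived them; the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt


def binomial_ordinate(n, k):
    """Relative frequency of k heads among n coins."""
    return comb(n, k) / 2 ** n


def normal_ordinate(n, k):
    """Height at k of the normal curve with mean n/2 and SD sqrt(n)/2."""
    mean = n / 2
    sd = sqrt(n) / 2
    return exp(-((k - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))


def largest_gap(n):
    """Greatest difference between polygon and smooth curve for n coins."""
    return max(abs(binomial_ordinate(n, k) - normal_ordinate(n, k))
               for k in range(n + 1))


for n in (10, 20, 30, 40):
    print(n, "coins: largest gap", round(largest_gap(n), 5))
```

The gap printed for 40 coins is smaller than that for 10, in keeping with the statement that the polygon draws toward the normal curve as the number of factors increases.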
If now we compare the frequency curve in Diagram IX
with the four graphs plotted from actual data obtained in
measurements of height, intelligence (IQ), memory span,
and temperature (see Diagram VIII) the similarity — as noted
above — of these graphs to the normal curve is clearly evident.
In other words, these distributions of variable phenomena act
as though they were determined by the operation of factors
which are present or absent according to the same laws which
govern the combinations of coins and dice. This is found
to be true of many other distributions as well; so that the
general tendency of quantitative data to follow the normal
probability curve is often called the " law of normal frequency."
Stated briefly, this law is as follows: measurements of natural
phenomena as well as measurements of mental and social
traits tend to be distributed symmetrically about their central
tendency in proportions which are determined by the laws
of chance.
The reason why frequency distributions of variable
phenomena are similar to chance distributions obtained from
tossing coins or throwing dice is that the former, like the latter,
are probably due very often to the operation of the laws of
chance. " Chance " may be defined as the result obtained
from the operation of a great many factors, none of which is
dominant, or, put in another way, all of which are (relatively)
similar, equal, and independent. A number of small factors,
for example, determine whether a coin will fall heads or tails,
or whether a die will show a 2, 3, or 6 spot: the twist of the
wrist, height from which coin or die is thrown, weight or size
of coin or die, kind of floor on which experiment is made, and
many others.1 In like manner a man's height, or his weight,
or the shape of his head, or his intelligence, or his eye color
is determined, very probably, by a large number of factors
which have approximately the same influence on the final
result. (Note: Should one or more of these factors have
special weight the distribution will no longer be of the probability
type, but will be skewed or shifted over towards the
upper or the lower end of the scale. The question of "skewness"
will be considered on page 86.)
1 See Jerome, Harry, Statistical Methods, 1924, pp. 169-170.
Experiments have shown that the normal probability
curve serves to describe the frequency of occurrence of many
variable facts with a relatively high degree of accuracy. Some
of these distributions have already been shown in Diagram VIII.
Important facts which give normal, or approximately normal,
distributions may be classified as follows:1
1. Biological statistics: the proportions of male to female
births for the same country or community over a period
of years; the proportion of different types of plants and
animals in cross-fertilization (the Mendelian ratios).
2. Anthropometrical statistics: height, weight, cephalic
index, etc., for large groups of same age and sex.
3. Social and economic statistics: rates of birth, marriage,
or death, under uniform conditions; wages and output of
large numbers of workers under like conditions and in same
occupation; labor costs, prices, etc.
4. Psychological measurements: intelligence as measured
by standard tests; speed of association, perception, reaction
time, etc.; educational test scores, e.g., in spelling, arithmetic,
reading.
5. Errors of observation: measures of height, speed of
movement, magnitudes, physical and mental traits, etc.,
contain errors which are as likely to cause them to lie above
as below the true value. Such errors follow the normal
probability curve. (This topic is treated in Chapter III.)
The normal curve is often called the normal probability curve
because it gives the theoretical probabilities of the occurrence
of chance phenomena. It is also called the normal frequency
curve because frequency distributions of actual data obtained
from the measurement of many variable facts are normal.
Finally, it is called the "curve of error" because when repeated
measurements have been made of such variables as height,
linear magnitudes, time and extent of movement, reaction
time, etc., the separate measures tend to diverge from the
"true" measure (or standard) by amounts which when
plotted give the characteristic probability curve (see Chapter III).
1 Jones, D. Caradog, A First Course in Statistics, 1921, p. 233.
We may conclude this discussion of the normal curve
with a word of caution. Despite the similarity of actual and
chance distributions, the student must be careful not to draw
the conclusion that because of this analogy, we can assume
forthwith that mental and physical traits are always (or neces-
sarily) due to the operation of equal, similar, and independ-
ent factors governed entirely by chance. The factors which
determine, say, musical ability or intelligence are too little
known to warrant the assumption, a priori, that they operate
in the same manner, and in accordance with the same laws,
as those factors which give chance distributions of coins or
dice. The selection of the normal curve, rather than some
other type of curve, is, after all, sufficiently justified by the
fact that it does generally fit the data better. However,
"the theoretical justification and the empirical use of the
curve are two quite different matters."1
3. Important Properties of the Normal Frequency Curve
In the normal frequency curve, the average, the median,
and the mode all fall exactly at the midpoint of the distribution,
and hence are numerically equal. This follows from the fact
that the normal probability curve is perfectly symmetrical
bilaterally, and in consequence all of the measures of central
tendency must fall at the middle of the curve. Also in the
normal curve, the measures of variability include certain con-
stant fractional amounts of the total area of the curve as
follows (see Diagram IX) :
1. If the SD is laid off in the plus and minus directions
from the mean (to right and left) along the baseline, and if
perpendiculars are erected at these points, the area included
by the perpendiculars, the baseline, and the curve itself contains
the middle 68.26% of the total area under the curve.
Stated briefly, between the mean and ±1σ are found the
middle 2/3 (approximately) of the cases in the normal distribution.
1 Jones, D. Caradog, ibid., p. 233.
2. If the AD is laid off in the plus and minus directions
from the mean along the baseline, and if perpendiculars are
erected at these points, the area included by the perpendiculars,
the baseline, and the curve, contains the middle 57.5%
of the total area. Put briefly, between the mean and ±1AD
will be found the middle 57.5% of the cases in the dis-
tribution.
3. If the PE is laid off in the plus and minus directions
from the mean along the baseline, and if perpendiculars are
erected at these points, the area included by the perpendicu-
lars, the baseline and the curve contains the middle 50% of
the area. Since the PE (equivalent to the Q in a normal dis-
tribution) equals 1/2 the distance between the 75th and 25th
percentiles, in a perfectly symmetrical distribution it marks
off the 25% of the area directly above and the 25% directly
below the mean — the middle 50% of the measures.
Certain constant relations will be found to obtain among
the measures of variability. These are easily derived from the
per cents of area included by each.
1. PE = .6745σ
2. PE = .8453AD
3. σ = 1.4826PE
4. σ = 1.2533AD
5. AD = .7979σ
6. AD = 1.1829PE
The first of these relations is the only one used often enough to
warrant its being memorized. From these equations it should
now be evident why it was stated earlier (page 27) that the σ
is always greater than the AD which is, in turn, always greater
than the Q(PE).
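The constants relating PE, AD, and σ can be recovered from the normal curve itself. The sketch below uses the NormalDist class of Python's standard statistics module. It rests on two facts not derived in the text and therefore stated here as assumptions: in a normal distribution the PE is the distance from the mean to the 75th percentile, and the AD equals σ multiplied by the square root of 2/π.

```python
from math import pi, sqrt
from statistics import NormalDist

UNIT = NormalDist()  # normal curve with mean 0 and sigma 1


def middle_area(z):
    """Fraction of the total area lying between -z and +z sigma."""
    return UNIT.cdf(z) - UNIT.cdf(-z)


PE = UNIT.inv_cdf(0.75)  # distance to the 75th percentile, in sigma units
AD = sqrt(2 / pi)        # mean absolute deviation, in sigma units

print("PE in sigma units:", round(PE, 4))
print("AD in sigma units:", round(AD, 4))
print("PE in AD units:   ", round(PE / AD, 4))
print("sigma in PE units:", round(1 / PE, 4))
print("sigma in AD units:", round(1 / AD, 4))
print("AD in PE units:   ", round(AD / PE, 4))
print("area within 1 sigma:", round(middle_area(1.0), 4))
print("area within 1 AD:   ", round(middle_area(AD), 4))
print("area within 1 PE:   ", round(middle_area(PE), 4))
```

Since every ratio is formed from the same two numbers, any one of the six relations can be checked against the others by simple division.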
4. The Measurement of Skewness
In a frequency polygon or histogram, usually the first
thing which strikes the eye is the symmetry, or — what is more
often the case — the lack of symmetry in the figure. In the
normal curve the mean, the median, and the mode all coincide,
and there is a perfect balance or symmetry between the right
and left halves of the figure. In a " skewed " distribution,
on the other hand, the mean, the median, and the mode fall
at different points in the distribution, and the balance (or
center of gravity) is thrown to one side or the other — to right
or left. The degree of displacement or skewness is measured
by the formula,
    Skewness = 3(mean − median) / σ          (11)
and in the normal distribution, since the mean = the median,
the skewness is 0. The more nearly the distribution approaches
the normal type, the closer together the mean and the median,
and the less the skewness.
If we apply formula (11) to the distribution of 54 Army
Alpha scores in Table I, we get −.66 as the measure of skewness.
Distributions like this one are said to be skewed negatively,
or to the left: the scores are massed at the high end of the scale
(the right end), and spread out gradually at the low or left end, as
shown in Diagram XII. Distributions are skewed positively or
to the right when the scores are massed at the low (the left) end
of the scale, and spread out gradually at the high or right end
(see Diagram XIII).
Formula (11) gives the measure of skewness of the distribution
of 200 cancellation scores in Table II (2) as +.003.
This indicates a very low degree of positive skewness, and shows
how very closely this distribution approaches the probability
type.
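Formula (11) is readily put into code. The sketch below is a minimal illustration in Python. The three short lists of scores are invented for the purpose and are not the data of Table I or Table II; the SD is computed with N in the denominator (pstdev), which is an assumption about the convention intended.

```python
from statistics import mean, median, pstdev


def skewness(scores):
    """Formula (11): three times (mean minus median), divided by the SD."""
    return 3 * (mean(scores) - median(scores)) / pstdev(scores)


# Invented illustrations, not data from the text.
symmetrical = [1, 2, 3, 4, 5]
massed_low = [1, 2, 3, 4, 10]   # long tail toward the high end
massed_high = [1, 7, 8, 9, 10]  # long tail toward the low end

for name, scores in [("symmetrical", symmetrical),
                     ("massed low", massed_low),
                     ("massed high", massed_high)]:
    print(name, round(skewness(scores), 2))
```

The sign of the result follows the rule given above: scores massed at the low end give a positive value, scores massed at the high end a negative one, and a symmetrical set gives zero.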
There are several reasons why distributions are skewed. In
the first place we should hardly expect the distribution of IQ's
obtained from a group of 25 eight-year old boys to be normal,
nor the distribution of IQ's obtained from a special class for
the dull and feebleminded, even though the latter group
were large. The small size of the group in the first case,
and "special selection"1 in the second are sufficient causes
of skewness.2 Again, technical faults in the construction
of the test, errors in scoring and the like may often produce
skewness in a distribution of test scores.
DIAGRAM XII
Negative Skewness: To the Left. (Labeled points: Median, Average.)
DIAGRAM XIII
Positive Skewness: To the Right. (Labeled point: Median.)
1 A "selected" group is one which is not representative of the larger group
from which it is drawn.
2 For an illustration of skewness due to both of these causes, see the distribution
of Table I.
In addition to these more obvious causes, skewness also
results, oftentimes, from a real lack of " normality " in the
data.1 This condition arises when several of the factors
determining a given result are dominant or prepotent and
hence are present more often than chance would allow (see
page 83). A simple illustration of this will be found in those
distributions which result from the throwing of loaded dice.
When dice of this sort are thrown, the resulting distributions
will always be skewed, due to the greater " potency " of the
heavier faces. Again, to take an illustration from real data,
the graph representing the chances of death is considerably
skewed, being higher in infancy and old age than in youth
or middle age, because of the difference in number and importance
of the "causes of death" at certain ages.
One other illustration may be taken, this time from the field
of tests. If an arithmetic test which involves only the four
fundamental operations is given to 1000 eighth grade children,
there will be a piling up of the scores towards the high score
end of the distribution: a negative skewness. On the other
hand, if the test contains only problems in fractions, square
root, interest, etc., there will be a piling up of the scores (or at
least a shift in the peak of the curve) towards the low score end
of the scale: a positive skewness. These results may be ex-
plained in terms of the small positive and negative factors which
produce the probability curve. Too easy a test excludes from
operation some of the factors which make for an extension of
the curve at the upper end, such as a knowledge of more ad-
vanced arithmetical relations, which the brighter children would
know. Too hard a test excludes from operation factors which
make for the extension of the curve at the lower end, such as a
knowledge of very simple facts which would permit the answer-
ing of a few, at least, of the questions had these been included.
1 Theoretically, there is no real reason why distributions should always be
normal. Thorndike has written: " There is nothing arbitrary or mysterious
about variability which makes the so-called normal type of distribution a neces-
sity, or any more rational than any other sort, or even more to be expected on a
priori grounds. Nature does not abhor irregular distributions." — Mental and
Social Measurements, pp. 88-89.
In the one case we have a number of perfect scores, and little
discrimination; in the second case a number of zero scores,
and equally poor discrimination.
IV. Some Practical Applications of the Normal Curve
The entire area under any frequency curve represents the
total number of frequencies in the distribution (see page 62).
If we know the total area of the curve, therefore, and in addition
the proportion of the total area in a given segment, it is pos-
sible to compute very simply the frequency represented by the
segment. This information in regard to the normal curve is
given in Tables X and XI from which the theoretical frequency
of any fractional part of the probability curve may be easily
obtained. Acquaintance with these tables is extremely valuable
in the solution of a large number of varied problems. For
this reason before considering any problems which depend for
their solution on the assumption of the normal distribution,
it is very desirable that the construction and use of Tables
X and XI be clearly understood.
1. The Construction and Use of Tables X and XI
Table X gives the fractional parts of the total area under
the normal curve found between the mean and ordinates
erected at various distances from the mean, such distances
measured in σ units.1 The total area of the curve (the number
of cases in the distribution) is taken arbitrarily as 10,000
because of the greater ease with which fractional parts of area
may then be calculated. The first column of the table, x/σ,
gives the distances in tenths of σ measured off on the baseline
from the mean as the 0 point or origin; distances in hundredths
of σ are given by the headings of the columns. To
find the number of cases in a normal distribution between
the mean and the ordinate erected at a distance of 1σ from
the mean, we go down the x/σ column until 1.0 is reached,
and in the next column under .00 take the entry opposite 1.0,
viz., 3413. This figure means that there are 3413 cases in
10,000, or 34.13% of the entire area of the curve between the
mean and 1σ; or put more exactly, 34.13% of the cases in the
normal distribution fall within the interval bounded by the
baseline, the Y-ordinate erected at the mean, the Y-ordinate
erected at a distance of 1σ from the mean, and the curve itself
(see Diagram IX for illustration). To find the per cent of the
distribution between the mean and 1.57σ we go down the x/σ
column to 1.5, then across horizontally to the column headed
.07 and take the entry 4418. This means that in a normal
distribution, 44.18% of the entire distribution falls between
the mean and 1.57σ.
1 Table X should be studied in conjunction with Diagram IX.
Thus far we have considered only σ distances measured in
the positive direction from the mean; that is, we have taken
account only of the right half (the high score end) of the
normal curve. Since the curve is bilaterally symmetrical,
however, the entries in Table X may be used for σ distances
measured in the negative (to the left) as well as the positive
direction. Accordingly, to find the per cent of the distribution
between the mean and −1.26σ, we simply take the entry 3962
in the table: the entry in the column headed .06 opposite 1.2
in the x/σ column. This means that 39.62% of the cases in
the distribution fall between the mean and −1.26σ. In the
same way, the percentage of cases between the mean and
−1.00σ is found to be 34.13; and the student will now be
able to verify the statement made on page 85 that between
the mean and ±1.00σ are 68.26% of the cases in the normal
distribution.
While theoretically the normal curve meets the baseline
at infinite distances to the right and left of the mean, for
practical purposes the curve may be taken to end at points
TABLE X
Fractional Parts of the Total Area (Taken as 10,000) under the
Normal Probability Curve, Corresponding to Distances on
the Baseline between the Mean and Successive Points Laid
off from the Mean in Units of Standard Deviation.
Example: between the mean and a point 1.3σ (x/σ = 1.3) is found
40.32% of the entire area under the curve.

x/σ     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0    0000    0040    0080    0120    0160    0199    0239    0279    0319    0359
0.1    0398    0438    0478    0517    0557    0596    0636    0675    0714    0753
0.2    0793    0832    0871    0910    0948    0987    1026    1064    1103    1141
0.3    1179    1217    1255    1293    1331    1368    1406    1443    1480    1517
0.4    1554    1591    1628    1664    1700    1736    1772    1808    1844    1879
0.5    1915    1950    1985    2019    2054    2088    2123    2157    2190    2224
0.6    2257    2291    2324    2357    2389    2422    2454    2486    2517    2549
0.7    2580    2611    2642    2673    2704    2734    2764    2794    2823    2852
0.8    2881    2910    2939    2967    2995    3023    3051    3078    3106    3133
0.9    3159    3186    3212    3238    3264    3290    3315    3340    3365    3389
1.0    3413    3438    3461    3485    3508    3531    3554    3577    3599    3621
1.1    3643    3665    3686    3708    3729    3749    3770    3790    3810    3830
1.2    3849    3869    3888    3907    3925    3944    3962    3980    3997    4015
1.3    4032    4049    4066    4082    4099    4115    4131    4147    4162    4177
1.4    4192    4207    4222    4236    4251    4265    4279    4292    4306    4319
1.5    4332    4345    4357    4370    4383    4394    4406    4418    4429    4441
1.6    4452    4463    4474    4484    4495    4505    4515    4525    4535    4545
1.7    4554    4564    4573    4582    4591    4599    4608    4616    4625    4633
1.8    4641    4649    4656    4664    4671    4678    4686    4693    4699    4706
1.9    4713    4719    4726    4732    4738    4744    4750    4756    4761    4767
2.0    4772    4778    4783    4788    4793    4798    4803    4808    4812    4817
2.1    4821    4826    4830    4834    4838    4842    4846    4850    4854    4857
2.2    4861    4864    4868    4871    4875    4878    4881    4884    4887    4890
2.3    4893    4896    4898    4901    4904    4906    4909    4911    4913    4916
2.4    4918    4920    4922    4925    4927    4929    4931    4932    4934    4936
2.5    4938    4940    4941    4943    4945    4946    4948    4949    4951    4952
2.6    4953    4955    4956    4957    4959    4960    4961    4962    4963    4964
2.7    4965    4966    4967    4968    4969    4970    4971    4972    4973    4974
2.8    4974    4975    4976    4977    4977    4978    4979    4979    4980    4981
2.9    4981    4982    4982    4983    4984    4984    4985    4985    4986    4986
3.0  4986.5  4986.9  4987.4  4987.8  4988.2  4988.6  4988.9  4989.3  4989.7  4990.0
3.1  4990.3  4990.6  4991.0  4991.3  4991.6  4991.8  4992.1  4992.4  4992.6  4992.9

x/σ   Area
3.2   4993.129
3.3   4995.166
3.4   4996.631
3.5   4997.674
3.6   4998.409
3.7   4998.922
3.8   4999.277
3.9   4999.519
4.0   4999.683
4.5   4999.966
5.0   4999.997133
From: Tables for Statisticians and Biometricians, edited by Karl Pearson,
Cambridge University Press,
-3σ and +3σ from the mean. We find from Table X, for
example, that 4986.5 cases in the total 10,000 fall between the
mean and 3σ; and 4986.5 cases will, of course, fall between
the mean and -3σ also. Therefore, since 9973 cases in
10,000, or 99.73% of the distribution, fall within the limits
set by -3σ and +3σ, by cutting off the curve at these two points
we disregard only .27 of 1% of the distribution, a negligible
amount, except in very large samples.
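The share of the distribution retained between symmetrical limits can be checked with a short computation. This is a sketch only, assuming Python's standard `statistics.NormalDist`; the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # mean 0, sigma 1


def fraction_within(k: float) -> float:
    """Fraction of cases lying between -k sigma and +k sigma."""
    return STANDARD.cdf(k) - STANDARD.cdf(-k)


for k in (1.0, 3.0):
    inside = fraction_within(k)
    print(f"within ±{k:g} sigma: {100 * inside:.2f}%, "
          f"disregarded: {100 * (1 - inside):.2f}%")
```

The last figure may differ by one from a value obtained by doubling a table entry, since the table entries are themselves rounded.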
Instead of σ, the PE may be used as the unit of measurement
in determining the theoretical frequencies within given intervals
of the normal curve. Table XI gives the fractional parts of
the total area under the normal curve found between the mean
and ordinates erected at various PE distances from the mean.
The table is read in exactly the same way as Table X. To
find, for instance, the number of cases between the mean and
1PE (or more accurately the ordinate erected at that point)
we go down the x/PE column to 1.0 and in the next column
under .00 read 2500. Twenty-five per cent of the cases in the
distribution, therefore, lie between the mean and 1PE. In like
manner 25% of the cases lie between the mean and -1PE;
hence, it is clear that the middle 50% of the distribution is
contained between the limits -1PE and +1PE measured
off from the mean. This table does not read in as fine units
as Table X, only tenths and twentieths (.05) of a PE being given.
If smaller divisions are desired, however, interpolation can
be made.
Just as it is customary to disregard that part of the curve
beyond the limits ±3σ, so we ordinarily disregard that part
of the curve beyond the limits ±4PE. This is done because
9930 cases (4965 × 2) in the total 10,000 fall between the
limits ±4PE (see Table XI). Hence, in cutting off the curve
at +4PE and -4PE, we disregard only .70 of 1% of the cases
in the distribution.
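Where interpolation in Table XI is inconvenient, an entry may be computed instead. The sketch below assumes only that one PE is the distance marking off 25% of the cases on either side of the mean, as stated above; the names `PE_IN_SIGMA` and `area_mean_to_pe` are illustrative.

```python
from statistics import NormalDist

STANDARD = NormalDist()
# One PE marks off 25% of the cases on each side of the mean, so in
# sigma units it is the 75th percentile point (about 0.6745 sigma).
PE_IN_SIGMA = STANDARD.inv_cdf(0.75)


def area_mean_to_pe(x_pe: float) -> float:
    """Fraction of cases between the mean and a point x_pe probable errors away."""
    return STANDARD.cdf(abs(x_pe) * PE_IN_SIGMA) - 0.5


print(round(PE_IN_SIGMA, 4))
for x_pe in (1.0, 1.55, 4.0):
    print(x_pe, round(10000 * area_mean_to_pe(x_pe)))
```

Any fractional PE distance may be supplied, so no interpolation between tabled values is needed.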
There is little to choose as between Tables X and XI. The
former admits of slightly easier interpolation, but the latter
is probably accurate enough, without interpolation, for most of
the work done in psychological measurement.
TABLE XI
Fractional Parts of the Total Area (Taken as 10,000) under the
Normal Probability Curve, Corresponding to Distances on the
Baseline between the Mean and Successive Points Laid off
from the Mean in Units of PE.
Example: we find between the mean and a point 1.55 PE (x/PE = 1.55)
from the mean 35.21% of the entire area under the curve.

x/PE     .00     .05      x/PE       .00       .05
 0      0000    0135      3.0       4785      4802
 .1     0269    0403      3.1       4817      4831
 .2     0536    0670      3.2       4845      4858
 .3     0802    0933      3.3       4870      4881
 .4     1063    1193      3.4       4891      4900
 .5     1321    1447      3.5       4909      4917
 .6     1571    1695      3.6       4924      4931
 .7     1816    1935      3.7       4937      4943
 .8     2053    2168      3.8       4948      4953
 .9     2291    2392      3.9       4957      4961
1.0     2500    2606      4.0       4965      4968
1.1     2709    2810      4.1       4971      4974
1.2     2908    3004      4.2       4977      4979
1.3     3097    3188      4.3       4981      4983
1.4     3275    3360      4.4       4985      4987
1.5     3441    3521      4.5       4988      4989
1.6     3597    3671      4.6       4990      4991
1.7     3742    3811      4.7       4992      4993
1.8     3896    3939      4.8       4994      4995
1.9     4000    4057      4.9       4995      4996
2.0     4113    4166      5.0       4996      4997
2.1     4217    4265      5.1       4997.1    4997.4
2.2     4311    4354      5.2       4997.7    4998
2.3     4396    4435      5.3       4998.2    4998.4
2.4     4472    4508      5.4       4998.6    4998.8
2.5     4541    4573      5.5       4999      4999.1
2.6     4602    4631      5.6       4999.2    4999.3
2.7     4657    4682      5.7       4999.4    4999.5
2.8     4705    4727      5.8       4999.55   4999.6
2.9     4748    4767      5.9       4999.65   4999.7
2. A Variety of Problems Solved by Means of Tables X
and XI
Under this heading we shall consider a number of problems
which may be solved by means of Tables X and XI, on the
assumption that the distributions which they involve are normal
or approximately normal. For easy reference later, each
group of examples is preceded by a general statement of the
problem which they are designed to illustrate.
A. To Determine the Per Cent of Cases in a Normal Distribution
which Fall within Given Limits.
Problem (1): Given a normal distribution with Average
= 12, and σ = 4.00. (a) What per cent of the cases fall
between 8 and 16? (b) What per cent of the cases lie above
18? (c) below 6?
(a) A score of 16 is just 4 points above the mean, and a score
of 8 is just 4 points below the mean. If we divide this difference
of 4 points by the σ of the distribution (by 4) it is clear
that 16 is 1σ above the mean and that 8 is 1σ below the mean
(see Diagram XIV, Fig. 1). 68.26% of the cases in a normal
distribution fall between the mean and ±1σ (Table X). Hence,
68.26% of the scores in the given distribution, or approximately
the middle 2/3, fall between 8 and 16. This result may also
be stated in terms of "chances." Since 68.26% of the cases
in the distribution fall between 8 and 16, the chances are
6826 in 10,000 or 68 in 100 that any score in the distribution
will be found between 8 and 16.
(b) A score of 18 is 6 points or 1.5σ above the mean.
From Table X we find that 43.32% of the cases fall between
the mean and 1.5σ. Accordingly, 6.68% of the cases
(50% - 43.32%) must lie above 18, in order to fill out the
50% of cases in the right half of the curve (see Fig. 1). Stated
as "chances," there are 668 chances in 10,000 or about 7 in
100 that any future score will lie above 18.
DIAGRAM XIV
Illustrating a Variety of Problems Solved by Means of Tables
X and XI (Figs. 1 to 8).
(c) A score of 6 is -1.5σ from the mean. Between the
mean and -1.5σ (a score of 6) are 43.32% of the cases in the
entire distribution. Hence, 6.68% of the cases lie below 6,
filling out the 50% below the mean, and the chances are 7 in
100 that any future score will lie below 6.
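The three parts of Problem (1) can be verified without the table. What follows is a minimal sketch using Python's standard `statistics.NormalDist`, with the Average of 12 and σ of 4 taken from the problem; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=12, sigma=4)  # the distribution of Problem (1)

# cdf(x) is the fraction of cases falling below the score x.
between_8_and_16 = 100 * (scores.cdf(16) - scores.cdf(8))
above_18 = 100 * (1 - scores.cdf(18))
below_6 = 100 * scores.cdf(6)

print(f"(a) between 8 and 16: {between_8_and_16:.2f}%")
print(f"(b) above 18:         {above_18:.2f}%")
print(f"(c) below 6:          {below_6:.2f}%")
```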
Problem (2): Given a distribution with Average = 29.75,
and Q = 4.56. What per cent of the distribution lies between
22 and 26? What are the chances that a score will fall between
22 and 26?
In a normal distribution Q is equal to the PE. Score 22 is
7.75 points or -1.70PE (since 7.75/4.56 = 1.70) from the mean,
and score 26 is 3.75 points or -.822PE from the mean (see
Diagram XIV, Fig. 2). From Table XI, we find that 37.42%
of the cases in a normal distribution lie between the mean and
-1.70PE; and that 21% of the cases lie between the mean
and -.822PE. By simple subtraction, therefore, 16.42% of
the cases fall between -1.70PE and -.822PE or between
22 and 26. The chances are 1642 in 10,000 or 16 in 100 that a
score will fall between 22 and 26.
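When the spread is given as Q (taken equal to the PE), the same check requires one extra step, the conversion of PE into σ. The sketch below assumes the normal-curve relation that one PE cuts off 25% of the cases on either side of the mean; `percent_between_pe` is an illustrative name.

```python
from statistics import NormalDist

# One PE expressed in sigma units (about 0.6745).
PE_IN_SIGMA = NormalDist().inv_cdf(0.75)


def percent_between_pe(low: float, high: float, mean: float, pe: float) -> float:
    """Per cent of a normal distribution between two scores, the spread given as PE."""
    sigma = pe / PE_IN_SIGMA  # convert the PE of the distribution into its sigma
    scores = NormalDist(mu=mean, sigma=sigma)
    return 100 * (scores.cdf(high) - scores.cdf(low))


result = percent_between_pe(22, 26, mean=29.75, pe=4.56)
print(f"{result:.2f}% of the cases fall between 22 and 26")
```

The computed figure agrees with the tabled 16.42% only to within a few hundredths of one per cent, since Table XI is read to the nearest tabled division.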
B. To Find the Limits in Any Normal Distribution which Will
Include a Given Per Cent of the Cases
Problem (1): Given a distribution with Average = 16, and
σ = 4. What limits will include the middle 75% of the cases?
The middle 75% of the cases in a normal distribution must
include the 37.5% just above and the 37.5% just below the
mean. From Table X, we find that 3749 cases in 10,000, or
37.5% of the distribution, fall between the mean and 1.15σ;
and consequently, 37.5% of the distribution must fall between
the mean and -1.15σ. The middle 75% of the cases, therefore,
lie between the mean and ±1.15σ; or since σ equals
4, between the mean and ±4.60 points. Adding ±4.60
to the mean (to 16), we find that the middle 75% of the
scores in the given distribution lie between 20.60 and 11.40
(see Diagram XIV, Fig. 3).
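Reading Table X backwards, from an area to a σ distance, corresponds to the inverse of the cumulative normal curve. The following sketch assumes Python's `NormalDist.inv_cdf` for that inverse; the function name `central_limits` is illustrative.

```python
from statistics import NormalDist


def central_limits(mean: float, sigma: float, middle_fraction: float):
    """Lower and upper score limits enclosing the middle fraction of a normal distribution."""
    # Half of the excluded cases lie in each tail, so the upper limit is the
    # point below which 0.5 + middle_fraction / 2 of the cases fall.
    z = NormalDist().inv_cdf(0.5 + middle_fraction / 2)
    return mean - z * sigma, mean + z * sigma


low, high = central_limits(mean=16, sigma=4, middle_fraction=0.75)
print(f"middle 75% lies between {low:.2f} and {high:.2f}")
```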
Problem (2): Given a distribution with Average = 150,
and Q = 26. What limits will include the highest 20% of the
group?
The highest 20% of a normally distributed group must
have 30% of the cases between its lower limit and the mean
in order to fill out the 50% of cases in the right half of the
distribution (see Diagram XIV, Fig. 4). From Table XI, we find
that 3004 cases in 10,000, or 30% of the distribution, fall between
the mean and 1.25PE. Since the PE of the given distribution
is 26, 1.25PE will be 1.25 × 26 or 32.5 points above the mean,
namely, at 182.50. The lower limit of the highest 20% of the
given group, therefore, is 182.50; and the upper limit is the
highest score in the distribution, whatever that may be.
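The lower limit of an upper group may be found in the same inverse fashion when the spread is stated as Q or PE. This is a sketch under the assumption that Q equals the PE, as the text takes it to do in a normal distribution; `lower_limit_of_top` is an illustrative name.

```python
from statistics import NormalDist

STANDARD = NormalDist()
PE_IN_SIGMA = STANDARD.inv_cdf(0.75)  # one PE in sigma units, about 0.6745


def lower_limit_of_top(mean: float, pe: float, top_fraction: float) -> float:
    """Score above which the given top fraction of a normal distribution lies."""
    z_sigma = STANDARD.inv_cdf(1 - top_fraction)  # distance above the mean in sigma
    z_pe = z_sigma / PE_IN_SIGMA                  # the same distance in PE units
    return mean + z_pe * pe


limit = lower_limit_of_top(mean=150, pe=26, top_fraction=0.20)
print(f"lower limit of the highest 20%: {limit:.2f}")
```

The result falls slightly below 182.50 because the text takes the nearest tabled division, 1.25PE, whereas the computation uses the exact distance.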
C. To Determine the Relative Difficulty of Test Questions,
Problems, or Other Test Items
Problem (1): Given a test question or problem solved by
10% of a large unselected group; a second problem solved
by 20% of the group; and a third, solved by 30% of the
group. Assuming that the capacity measured by the test
problems is distributed "normally," what is the relative
difficulty of questions 1, 2, and 3?
Our first task is to find for question 1 a position in the
distribution, above which are 10% (the per cent passed) and
below which are 90% (the per cent failed) of the entire group.
The highest 10% in a normally distributed group has 40% of
the cases between its lower limit and the mean (50% - 10% =
40%, see Diagram XIV, Fig. 5), and from Table X we find
that 39.97%, i.e., 40%, of a normal frequency distribution falls
between 1.28σ and the mean. Hence, question 1 falls at a
point on the baseline of the curve whose abscissa is 1.28σ
from the mean; and accordingly 1.28σ may be taken as its
difficulty value.
In the same way, question 2, passed by 20% of the group,
falls at a point in the distribution 30% above the mean
(50% - 20% = 30%, see Fig. 5). From Table X we find that
29.95%, i.e., 30%, of the group falls between the mean and
.84σ; hence question 2 has a difficulty value of .84σ. In like
manner question 3, which falls at a point in the distribution
20% above the mean, has a difficulty value of .53σ, since
20.19% of the distribution lies between the mean and .53σ.
To summarize our results:

Question    Passed by    σ value    σ difference
   1           10%         1.28
   2           20%          .84         .44
   3           30%          .53         .31

The σ difference in difficulty between 2 and 3 is .31, roughly only
3/4 of the σ difference in difficulty between 1 and 2 (.44),
in spite of the fact that the per cent difference is the same in
the two cases. On the assumption that ability follows the
normal frequency distribution, therefore, it is evident that the
σ and not the per cent difference gives the real index of
differences in difficulty.
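The σ difficulty value of an item is simply the point on the baseline above which the per cent passing falls. A minimal sketch, assuming Python's `NormalDist.inv_cdf`; the function name `sigma_difficulty` is illustrative.

```python
from statistics import NormalDist


def sigma_difficulty(fraction_passing: float) -> float:
    """Sigma distance from the mean of the point above which the passing fraction lies."""
    return NormalDist().inv_cdf(1 - fraction_passing)


values = [sigma_difficulty(p) for p in (0.10, 0.20, 0.30)]
for question, value in enumerate(values, start=1):
    print(f"question {question}: {value:.2f} sigma")

# Sigma differences between successive questions.
print([round(a - b, 2) for a, b in zip(values, values[1:])])
```

The exact value for question 3 comes out nearer .52 than .53; the text takes .53 as a close tabled entry, so the second difference may differ from .31 by a hundredth.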
Problem (2): Given three test items, No. 1, No. 2, and
No. 3, passed by 50%, 40%, and 30%, respectively, of a large
group. What per cent of the same group must pass test item
No. 4, in order for it to be as much more difficult than No. 3,
as No. 2 is more difficult than No. 1?
A question or problem which is "passed" by 50% of a
group is, of course, "failed" by 50% also, and accordingly,
such a problem falls exactly in the middle of the normal
distribution of difficulty. Test item 1, therefore, has a σ value of 0;
it falls just on the mean (see Diagram XIV, Fig. 6). Test
item 2 lies at a point in the distribution 10% above the mean,
as 40% of the group passed, and 60% failed this problem.
Accordingly, the σ value of this item is .25, since from Table
X, we find that 9.87% (roughly 10%) of the cases lie between
the mean and .25σ. Test item 3, passed by 30% of the group,
lies at a point 20% above the mean, and this item, therefore,
has a difficulty value of .52σ as 19.85% (20%) of the normal
distribution lies between the mean and .52σ.
Now since item 2 is .25σ further along on the difficulty
scale (towards the high score end of the curve) than item 1,
it is clear that item 4 must be .25σ above item 3, if it is to be
as much harder than 3 as 2 is harder than 1. Item 4, therefore,
must have a value of .52σ + .25σ or .77σ; and from Table X,
we find that 27.94% of the group fall between the mean and
this point. This means that 50% - 28% or 22% of the group
pass item 4. To summarize by a table:

Test Item    Passed by    Difficulty Value (σ)    σ difference
    1           50%              .00                   -
    2           40%              .25                  .25
    3           30%              .52                   -
    4           22%              .77                  .25
A problem or test item must be passed by 22% of the group,
therefore, in order for it to be as much more difficult than an
item passed by 30%, as an item passed by 40% is more difficult
than one passed by 50%. Note again that per cent differences
are not reliable indices of differences in difficulty when the
capacity measured is taken to be distributed normally.
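The reasoning of Problem (2) may be retraced numerically: per cents are converted to σ values, the step is added on the σ scale, and the result is converted back to a per cent. This is a sketch only; the helper names are illustrative and the normal distribution of the capacity is assumed throughout.

```python
from statistics import NormalDist

STANDARD = NormalDist()


def sigma_difficulty(fraction_passing: float) -> float:
    """Sigma position of an item passed by the given fraction of the group."""
    return STANDARD.inv_cdf(1 - fraction_passing)


def fraction_passing(sigma_value: float) -> float:
    """Fraction of the group lying above a given sigma position."""
    return 1 - STANDARD.cdf(sigma_value)


z1, z2, z3 = (sigma_difficulty(p) for p in (0.50, 0.40, 0.30))
z4 = z3 + (z2 - z1)  # item 4 is as far above item 3 as item 2 is above item 1
percent_passing_item_4 = 100 * fraction_passing(z4)

print(f"item 4: {z4:.2f} sigma, passed by {percent_passing_item_4:.0f}%")
```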
D. To Separate a Given Group into Sub-Groups According to
Capacity, When the Capacity is Normally Distributed
Problem (1): Suppose that we have measured 100 college
men on a certain test. We wish to classify our group into 5
sub-groups A, B, C, D, and E, according to ability, the range
of ability to be equal in each sub-group. Assuming that
the capacity measured by the test is distributed normally, or
approximately so, and that the group is relatively unselected,
how many men should be placed in groups A, B, C, D, and
E, respectively?
Let us first represent the positions of the five sub-groups
graphically on the normal curve as shown in Diagram XIV,
Fig. 7. If the baseline of the curve is taken to extend from
-3σ to +3σ, that is, over a range of 6σ, dividing this range by 5,
we get 1.2σ as the baseline extent to be allotted to each group.
These five intervals may be laid off on the baseline as shown
in the figure, and perpendiculars drawn to demarcate the
various sub-groups. It is clear that group A covers the upper
1.2σ; group B, the next 1.2σ; that group C lies .6σ to the
right and .6σ to the left of the mean; and that groups D and
E occupy the same relative positions on the left half of the
curve, as B and A occupy on the right half.
Now to find what per cent of the whole group falls within
the A group, we must find what per cent of a normal distribution
lies between 3σ (the upper limit of the A group) and 1.8σ
(the lower limit of the A group) (see Fig. 7). From Table X
we know that 49.86% of a normal distribution falls between
the mean and 3σ; and that 46.41% falls between the mean
and 1.8σ. Hence 3.45% of the total area under the normal
curve (49.86% - 46.41%) falls between 3σ and 1.8σ, and,
accordingly, group A comprises 3.45% of the whole group.
The per cents in the other groups are found in exactly the
same way. Thus, 46.41% of the normal curve falls between
the mean and 1.8σ (upper limit of group B) and 22.57% falls
between the mean and .6σ (lower limit of the same group).
Subtracting, 46.41% - 22.57% or 23.84% of our whole group
evidently belongs in sub-group B. Group C lies .6σ above
and .6σ below the mean. Between the mean and .6σ is
contained 22.57% of a normal distribution, and the same per
cent is contained between the mean and -.6σ. Group C,
then, includes 45% (22.57% × 2) of the whole group. Finally,
group D, which falls between -.6σ and -1.8σ, contains exactly
the same percentage of the total as group B; and group E,
which falls between -1.8σ and -3σ, contains the same per
cent as group A. The percentage (and number) of men in
each group is given in the following summary:

Group                         A        B       C       D        E
Per cent of total in
  each group                 3.5     23.8     45     23.8      3.5
Number in each group
  (100 men in all)          4 or 3    24      45      24      4 or 3
On the assumption that the capacity measured follows the
normal probability curve, therefore, only 4 men in the group
of 100 should be placed in group A (call it the marked ability
group); 24 in group B, the high average ability group; 45 in
group C, the average ability group; 24 in group D, the low
average ability group; and 4 in group E, the very low or stupid
group.
The above procedure may be used in determining how many
individuals in a large class should get grades of say, A, B, C,
D, E, or it may be employed for any number of grade-groups.
The assumption must be made, however, that the subject in
which the individuals are being graded follows the normal curve.
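The division into sub-groups can be carried out for any number of grade-groups by the same subtraction of areas. The sketch below assumes, as the text does, a baseline cut off at ±3σ and divided into equal stretches; `group_percents` is an illustrative name.

```python
from statistics import NormalDist

STANDARD = NormalDist()


def group_percents(n_groups: int, half_range: float = 3.0) -> list:
    """Per cent of cases in each of n equal baseline stretches between -half_range and +half_range sigma.

    The list runs from the lowest group to the highest.
    """
    width = 2 * half_range / n_groups
    cuts = [-half_range + i * width for i in range(n_groups + 1)]
    return [100 * (STANDARD.cdf(b) - STANDARD.cdf(a)) for a, b in zip(cuts, cuts[1:])]


percents = group_percents(5)  # groups E, D, C, B, A in that order
for name, pct in zip("EDCBA", percents):
    print(f"group {name}: {pct:.1f}%")
```

The per cents do not total quite 100, because the small fraction of cases beyond ±3σ is disregarded.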
3. The Arrangement of Problems or Other Test Items into a
Scale in which the Difficulty of Each Item is Known
with Reference to Each Other Item as Well as Some
Selected Zero Point
One of the important tasks which confronts the worker
with tests is the construction of scales which shall contain
problems or questions graded in difficulty from very easy
to very hard by known steps or intervals. Given a set of
problems or test items, if we know what per cent of a large
group (selected from among those for whom the test is intended)
pass or fail each problem, it is a comparatively easy matter
to arrange the problems in a rough order of difficulty. Such
an arrangement, however, constitutes a very crude scale, as
we know very little about the relative difficulty of the separate
problems (see page 98) and next to nothing about the range
of ability tested.
For this reason in most scaled tests (if we can assume a
normal or approximately normal distribution in the capacity
tested) the unit of measurement is taken as the σ or the PE. By
so doing we are able not only to arrange the test items in a simple
order of difficulty, but to "set" or space them at definite points
along a scale of difficulty, that is, along the baseline of the normal
curve. On such a scale the distance from one item to another,
or from any given item to the selected zero point is known as
definitely as the distance between two divisions on a foot rule.
To illustrate concretely how a scale of this sort is made, let us
suppose that we wish to construct a scaled test for measuring
"reasoning ability" (e.g., by means of syllogisms) in 12 year
olds; or an addition scale for Grade IV; or a scale for testing
sentence memory in 8 year olds. The steps involved may be
outlined as follows:
(1) First it is necessary to compile a large number of
problems or other test items which vary in difficulty from very
easy to very hard, and which are fairly representative of the
field covered by the test.
(2) These problems are then given to as large a random
sample as possible from among those for whom the scale is
intended.
(3) The per cent of the group which solves each problem
correctly is next computed. This allows duplicates and prob-
lems too easy or too hard or those which for one reason or
another are unsatisfactory to be discarded. It also permits
the arrangement of the problems selected for the scale into an
order of difficulty. A problem solved correctly by 90% of the
group is obviously easier than one solved correctly by 75%;
while the second problem is, in turn, clearly less difficult than
one solved correctly by 50%. The larger the per cent passing
the lower the position of the problem on the difficulty
scale.
(4) By means of Table XI each per cent correct found in (3)
may now be converted into a PE (or σ)¹ distance above or below
the mean. The procedure here is as follows. An item solved
correctly by 40% of the group is 10% or .375PE above the
mean. In like manner, an item solved correctly by 78% of the
group is 28% (78% - 50%) or 1.15PE below the mean. We
may tabulate the results for five items selected at random as
follows (see Diagram XIV, Fig. 8):

Problem                          A        B        C       D       E
Per cent solving                 93       78       55      40      14
Distance from mean in
  percentage terms              -43      -28       -5      10      36
Distance from the mean in
  PE terms                     -2.20    -1.15     -.20    .375    1.60

Note that Problem A is solved by 93% of the group, i.e., by
the upper 50% (the right half of the curve) plus the 43% to the
left of the mean. Hence it lies 2.20PE to the left of the mean,
that is, at -2.20PE. In like manner, the percentage distance
from the mean, measured to the right or left (plus or minus),
for each problem may be found by simply subtracting the per
cent passing from 50%. From these per cents, the PE distance
of the problem from the mean can be read from Table XI, as
shown above.
¹ The procedure is identical when σ is employed instead of the PE.
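The conversion in step (4) may be sketched as follows, with harder items given positive PE distances (above the mean) and easier items negative ones. The names `PE_IN_SIGMA` and `pe_distance` are illustrative, and the computed values agree with the tabled readings only to about the nearest tabled division.

```python
from statistics import NormalDist

STANDARD = NormalDist()
PE_IN_SIGMA = STANDARD.inv_cdf(0.75)  # one PE in sigma units, about 0.6745


def pe_distance(percent_solving: float) -> float:
    """PE distance from the mean of an item solved by the given per cent of the group."""
    # The item sits at the point above which the per cent solving falls.
    z_sigma = STANDARD.inv_cdf(1 - percent_solving / 100)
    return z_sigma / PE_IN_SIGMA


per_cent_solving = {"A": 93, "B": 78, "C": 55, "D": 40, "E": 14}
for problem, pct in per_cent_solving.items():
    print(f"problem {problem}: {50 - pct:+d}% from mean, {pe_distance(pct):+.2f} PE")
```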
(5) With the PE distance of each problem above or below
the mean established, the PE distance of each problem from
the "zero point" of ability in the test may be calculated.
This zero point is located in the following way. Suppose that
5% of the whole group failed to solve a single problem correctly.
This puts the point of zero ability 45% of the distribution below
the mean or at a point -2.45PE from the mean.¹ The PE
distance of each problem in the scale may now be found from
this arbitrary zero point. To illustrate with the five problems
above:

Problem                          A        B        C       D       E
PE distance from mean          -2.20    -1.15     -.20    .375    1.60
PE distance from assumed
  zero, i.e., -2.45PE            .25     1.30     2.25    2.83    4.05

The simplest way to find the PE distances from the given zero
point is to subtract, algebraically, the distance of the zero point
below the mean, from the PE distance of each problem from the
mean. Problem A, for example, is -2.20 - (-2.45) or .25PE
from the zero point; while problem E is 1.60 - (-2.45) or
4.05PE from the zero point. The PE value of each of the other
problems as measured from the given zero point is found in the
same way.
¹ Note that this point is not a true zero unless the problems range down to
zero difficulty. It serves, however, as a convenient reference point for the
group for whom the test is intended.
When the PE value from zero of each of the problems has
been determined, the difficulty of each problem with respect
to every other problem as well as to zero is known and the
scale is finished.
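The shift of origin in step (5) is a single algebraic subtraction, sketched below with the PE distances and the zero point as read from Table XI in the text. The function names are illustrative; `zero_point_pe` shows how the zero point itself might be computed from the per cent failing every problem.

```python
from statistics import NormalDist

STANDARD = NormalDist()
PE_IN_SIGMA = STANDARD.inv_cdf(0.75)  # one PE in sigma units, about 0.6745


def zero_point_pe(percent_failing_everything: float) -> float:
    """PE position, relative to the mean, of the point below which the given per cent fall."""
    return STANDARD.inv_cdf(percent_failing_everything / 100) / PE_IN_SIGMA


def distance_from_zero(pe_from_mean: float, zero_pe: float) -> float:
    """Algebraic subtraction of the zero point from a problem's PE distance from the mean."""
    return pe_from_mean - zero_pe


# PE distances from the mean, and the zero point, as read from Table XI in the text.
pe_from_mean = {"A": -2.20, "B": -1.15, "C": -0.20, "D": 0.375, "E": 1.60}
TABLE_ZERO = -2.45

from_zero = {name: round(distance_from_zero(d, TABLE_ZERO), 3)
             for name, d in pe_from_mean.items()}
print(from_zero)
print(round(zero_point_pe(5), 2))
```

The computed zero point differs from -2.45 by about a hundredth of a PE, the text having taken the nearest tabled division.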
It is evident, of course, that a scale of this sort will not
usually have equal difficulty intervals or " steps " from easy
to hard. However, this fact, while inconvenient, does not
necessarily invalidate the usefulness of the scale as a measuring
instrument. In lieu of a rule, one might use a stick on which
marks had been set at 2, 3.7, 4.8, etc., inches with a fair degree
of accuracy. Nevertheless linear measurements are certainly
more easily obtained with a rule, and in like manner scores are
more easily obtained when the scale has equal steps than when
the steps are unequal. For this reason among others, scale
makers have tried as far as possible to have the steps on their
scales approximately equal. One method of doing this is to
eliminate from the scale as first constructed, certain "odd"
problems, and retain only those which fall at points approximately
the same distance apart. Another plan is to try out
a new set of problems, and from among these select problems
which will fill in the gaps in the scale ; or to change the wording
or scoring of a problem in such a way as to shift it up or down
on the scale of difficulty.
A good example of the first method of securing equal steps
on the scale is given by the Woody Arithmetic Scales, Series B.¹
These scales represent a selection of certain problems from the
longer Series A (scales constructed by the method outlined
above) and contain problems which are progressively more
difficult by approximately equal steps. The problems in Series
A are not spaced at equal points on a difficulty scale. In
the Addition Scale, for example, problem No. 1 has a difficulty
value of 1.23PE as measured from the arbitrary zero,
-2.425PE;² problem No. 2 has a difficulty value of 1.40PE,
and problem No. 3 a difficulty value of 2.50PE.

TABLE XII
Difficulty Values (PE) of the Problems in the Woody
Arithmetic Scale (Addition), Series A and B

Problem No.    Series A,    Series B,    PE Differences
               PE Value     PE Value     (Series B)
     1           1.23         1.23
     2           1.40         1.40           .17
     3           2.50         2.50          1.10
     4           2.61
     5           2.83         2.83           .33
     6           3.21
     7           3.26         3.26           .43
     8           3.35
     9           3.63
    10           3.78         3.78           .52
    11           3.92
    12           4.18
    13           4.19         4.19           .41
    14           4.85         4.85           .66
    15           4.97
    16           5.52         5.52           .67
    17           5.59
    18           5.73
    19           5.75         5.75           .23
    20           6.10         6.10           .35
    21           6.44         6.44           .34
    22           6.79         6.79           .35
    23           7.11         7.11           .32
    24           7.43         7.43           .32
    25           7.47
    26           7.61
    27           7.62
    28           7.67
    29           7.71         7.71           .28
    30           7.71
    31           7.97
    32           8.04
    33           8.18         8.18           .47
    34           8.22
    35           8.58
    36           8.67         8.67           .49
    37           8.67
    38           9.19         9.19           .52

¹ Woody, Clifford: Measurements of Some Achievements in Arithmetic.
Teachers College, Columbia University, 1916.
² The arbitrary zero point on the Woody addition scale is 2.425PE below
the median of Grade II.
The number and the PE value of the other problems in Series
A (Addition) and the problems which have been selected from
this series to make up Series B are shown in Table XII. Each
problem in Series A, as noted above, is expressed in terms of its
PE distance from the arbitrary zero point 2.425PE below
the second grade median. The extremely high PE values of the
problems in the upper half of the scale result from the fact
that the scale is intended for the elementary grades from II
to VIII inclusive, and hence the more difficult problems fall
entirely out of the range of second grade ability. Note that
except in a very few cases, the problems in Series B appear as
a graded series from easy to hard in which the steps from
problem to problem are fairly well equalized. The score on this
scale is simply the number of problems solved correctly — the
distance which one progresses up the scale — just as a child's
height is so many feet and inches on a scale of height.
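The evenness of the steps in Series B may be judged by differencing the successive PE values. The sketch below uses the Series B values of Table XII; the variable names are illustrative.

```python
# Series B difficulty values (PE from the arbitrary zero), from Table XII.
series_b = [1.23, 1.40, 2.50, 2.83, 3.26, 3.78, 4.19, 4.85, 5.52,
            5.75, 6.10, 6.44, 6.79, 7.11, 7.43, 7.71, 8.18, 8.67, 9.19]

# Step from each problem to the next, rounded to the two places of the table.
steps = [round(harder - easier, 2) for easier, harder in zip(series_b, series_b[1:])]

print(steps)
print("smallest step:", min(steps), " largest step:", max(steps))
```

The single large step, between problems 2 and 3, is one of the few cases in which the steps are not well equalized.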
On a scale which has equal steps, we know that the increase
from say point 10 to 12 is the same as the increase from 12 to
14, and 1/2 the increase from 14 to 18. Moreover, we may
say that the child who works 8 problems is as far ahead of the
child who works 4, as the second child is ahead of one who cannot
work a single problem. We must be extremely careful not to
interpret one measure of capacity on such a scale as "so many
times" another measure, however. Unlike measures of height or
weight which are measured from absolute zeros, the measures
given by a scale of performance are taken from some arbitrary
zero point selected by the experimenter. So while we may say
that a man 72 inches in height is twice as tall as a child who is
only 36 inches in height, we cannot, by analogy, say that a child
who scores 5 on an addition test has doubled his ability when
he is able to score 10, unless the measures in the test have been
taken from the absolute zero point of "just no ability at all"
in addition.
The method of constructing a scale outlined above may be
used with any group, grade, or class. When the scale is
designed for use with more than one group, e.g., for the whole
elementary school, an extension of the method given is often
used. In brief, this is as follows:
(1) The PE value of each problem is determined for each
grade separately, as shown above, by computing the per cent
who pass each problem.
(2) The PE distances between the different grade medians
are then computed. This is done by finding the per cent of
the pupils in each grade who have scores larger than the median
score of the next grade. These per cents, when turned into
PE values by means of Table XI, give the PE distances
between adjoining grade medians.
(3) Knowing the PE distances between the grade medians,
we may now convert the PE distance of each problem from
a given grade median into a PE distance from some common
zero point. The different PE values of each problem as
determined for the various grades are averaged to give the
final scale value,¹ the distance from the common zero point.
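The three steps above can be sketched as follows. All figures in the example are hypothetical and serve only to show the arithmetic; the sketch further assumes that every grade is normally distributed with the same PE, takes the lowest grade's median as the common zero point, and uses a plain unweighted average.

```python
from statistics import NormalDist, mean

STANDARD = NormalDist()
PE_IN_SIGMA = STANDARD.inv_cdf(0.75)  # one PE in sigma units, about 0.6745


def median_gap_pe(percent_above_next_median: float) -> float:
    """PE distance from one grade median up to the median of the next grade.

    If p% of the lower grade exceed the next grade's median, that median
    cuts off the top p% of the lower grade's distribution.
    """
    return STANDARD.inv_cdf(1 - percent_above_next_median / 100) / PE_IN_SIGMA


# Hypothetical per cents of three successive grades exceeding the next grade's median.
percent_exceeding = [20.0, 25.0, 30.0]

# Position of each grade median, the lowest grade's median being taken as 0.
median_positions = [0.0]
for p in percent_exceeding:
    median_positions.append(median_positions[-1] + median_gap_pe(p))

# Hypothetical PE distances of one problem from each of the four grade medians.
problem_from_median = [2.9, 1.7, 0.6, -0.1]

# Refer each value to the common zero point, then average over the grades.
scale_value = mean([m + d for m, d in zip(median_positions, problem_from_median)])
print([round(m, 2) for m in median_positions], round(scale_value, 2))
```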
A shorter method than the one described may also be used.
This is to compute the PE value of a problem once for all from
the per cent of a large sample — drawn from the entire group —
who pass the problem. This plan is practically identical with
that which we have already described on page 102. It assumes
that the capacity which the scale is designed to measure is dis-
tributed normally throughout the entire group. While probably
not as exact as the more elaborate method, it has the advantage
of simplicity and straightforwardness.
¹ A method of weighting the PE values of a problem in averaging the results
from the different grades is described by Woody in his "Measurements of Some
Achievements in Arithmetic."
4. The Conversion of Judgments by Relative Position (or
Relative Merit) into σ or PE Positions on a Scale
The preceding paragraphs have dealt with the construction
of performance scales built up on the principle that the per cent
passing (or failing) a given problem is the best index of the
difficulty of that problem. It sometimes happens, however,
that the ability to be measured is of such a nature that
performance in it cannot be scored simply as correct or incorrect,
but must be determined by a comparison with other
performances of a like sort. This leads to the construction of product
scales. Handwriting scales, composition scales, drawing scales
are examples of instruments in which the quality of the product
is measured, and not its presence or absence in terms of a
per cent or number correct. For example an individual's
handwriting is rated for merit by comparing it with "standard"
specimens of handwriting the quality of which is known.
Quality scales are constructed on the assumption that
equally often noticed differences — in merit or excellence — are
equal. The first step is to secure a large number of samples
of the thing to be measured, e.g., specimens of handwriting or
composition, ranging from very poor to excellent. The next
step is to have a large number of presumably able judges
arrange these specimens in order of merit, in this way comparing
each specimen with each other one. The number of times
each specimen is ranked above each other one is now reduced
to percentage terms, and this per cent is expressed as a PE
difference between the two specimens. The PE difference
determined, specimens selected for the scale may be expressed
as so many PE above some arbitrary zero point. We may take
specimens 8 and 9 on the Hillegas Composition Scale 1 as an
illustration of the method. Hillegas had each of 202 judges
arrange a number of English compositions in order of merit.
An artificial composition was selected as being of zero merit,
and given the value 0 on the scale. Of the 202 judges, 136
or 67.5% ranked 9 as better than 8. From Table XI, we
know that a percentage difference of 67.5% indicates a PE
difference of .66PE, and this value, therefore, expresses the
amount by which 9 is better than 8. The value of 8 had
already been found to be 7.72PE above the 0 point on the
scale. Hence 9 is 7.72 + .66 or 8.38PE above the zero compo-
1 Hillegas, Milo B. A Scale for the Measurement of Quality in English
Composition by Young People. Teachers College, Columbia University, 1912.
sition. The values of the other compositions on the Hillegas
Scale as measured in PE values from zero, the differences determined
in terms of relative merit, are 0, 1.83, 2.60, 3.69, 4.74,
5.85, 6.75, 7.72, 8.38, 9.37. Note that the steps on this
scale are fairly regular, being approximately 1PE apart.
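The conversion of a percentage of judges into a PE difference can be sketched in a few lines. This is only an illustration, not Hillegas's own procedure: it assumes that Table XI is equivalent to the inverse of the normal curve, with 1 PE taken as .6745σ, and the function name is invented for the example. Because the exact fraction 136/202 is used in place of the rounded 67.5%, the result may differ from the tabled .66 in the last digit.

```python
from statistics import NormalDist

PE_PER_SIGMA = 0.6745  # one PE equals .6745 sigma in a normal distribution


def pe_difference(proportion_preferring: float) -> float:
    """PE distance between two specimens, given the share of judges
    who rank the first above the second (hypothetical helper)."""
    z = NormalDist().inv_cdf(proportion_preferring)  # distance in sigma units
    return z / PE_PER_SIGMA


step = pe_difference(136 / 202)   # judges ranking specimen 9 above specimen 8
value_of_9 = 7.72 + step          # 7.72 PE is the text's value for specimen 8
print(round(step, 2), round(value_of_9, 2))
```

The same function applied to each adjacent pair of specimens would build up the whole scale from the zero composition.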
5. The Scaling of Total Scores on a Test
Before concluding this brief review of the methods of con-
structing scales, we should mention several methods used for
scaling total scores on a test. The distinction between these
methods and those we have outlined is that in the latter, instead
of scaling each separate element on the test for difficulty —
except possibly to secure an approximate order of difficulty —
we simply determine the difficulty value attained as a result of
doing correctly a certain number of test elements. In other
words the score depends on total number of questions answered
or problems worked, and the difficulty value of individual
problems is not considered as in (3) and (4) above. The three
methods 1 proposed for scaling total scores give, respectively,
(a) a percentile scale, (b) an age scale, and (c) a T-scale.
(a) We have already learned how to locate the percentile
values in a distribution of scores (pages 45-46). In a per-
centile scale a child making a certain score (total number correct)
on a test is given a percentile rating of 20, 30, 70, etc., according
to his position in the distribution. The percentile method
assumes that the difference between a percentile of say 10 and
20 is the same as the difference between a percentile of 40 and
50: that percentile differences are equal throughout the scale.
There is considerable reason to doubt this assumption of equal
units on the percentile scale, however; and for this reason while
practically very useful, the percentile scale is not entirely sound
theoretically.
(b) In the age scale, the mean number of points scored
on the test by unselected 7 year olds is scored 7, the mean number
of points scored by unselected 9 year olds is scored 9, and
1 See McCall, W. M. How to Experiment in Education, 1923, p. 95ff.
so on for other age groups. Scores which fall between age
groups are evaluated by interpolation. The age scale is widely
used, and is easily interpreted. The chief drawback to its use
seems to be the difficulty of getting unselected samples for
determining the norms of the low and high age groups. Many
very young children are not in the schools, while many of the
older ones for one reason or another have been eliminated.
As a result, age scales are strictly accurate only within very
narrow ranges of ability.
(c) Recently McCall has suggested a method of scaling total
scores, the T-scale, which eliminates many of the defects of both
the percentile and the age scale methods. In this method, scores
are based on the σ of the distribution of scores made by unselected
12 year olds. T-scores range from 0 to 100. The
zero point on the scale is taken at 5σ below the mean and the
100 point at 5σ above the mean. The unit of measure, or one
"T," is .1 of the σ of the distribution of unselected 12 year
olds. The mean T-score, therefore, is 50 and each 10 points
above or below this point represents 1σ of the 12 year old
distribution. In actual practice T-scores will be found to range
generally between 15 and 85. A person who stands at the mean
of 12 year olds on a given test has a T-score of 50; one who
stands 1σ above the mean, a T-score of 60; and one who stands
1σ below the mean of 12 year olds, a T-score of 40.¹
The construction of the T-scale has been described in great
detail by McCall in Chapter X of his How to Measure in
Education, and in consequence only the most important
advantages of the scale need be considered here.2 In the
first place, the scale covers a wide range of ability which may
be extended if necessary. Secondly, all T-scores are expressed
in terms of the same unit and with respect to the same zero
point and are equal throughout the scale. Accordingly,
scores from different tests are directly comparable and may
1 For an example, see the Thorndike-McCall Reading Scales, published by
Teachers College, Columbia University.
2 For a complete discussion of the advantages of the T-Scale over the age
and percentile scales, see McCall, How to Experiment in Education, 1923, 94ff.
be combined by simple addition. Finally, a score of a given
size will always have the same meaning when referred to the
mean of unselected 12 year olds which remains at 50.
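The relation between a raw score and its T-score, as stated in paragraph (c), may be sketched as follows. This is a simplified illustration and not McCall's full construction, which is described in his own chapter. The norms (mean 40, σ 8) are hypothetical figures chosen for the example, and the second function assumes a normal distribution in order to show why equal percentile steps, as discussed under (a), are not equal steps on the T-scale.

```python
from statistics import NormalDist


def t_score(raw: float, mean_12: float, sigma_12: float) -> float:
    """T-score: 50 at the mean of unselected 12 year olds, 10 points per sigma."""
    return 50 + 10 * (raw - mean_12) / sigma_12


def t_from_percentile(percentile: float) -> float:
    """T-score matching a percentile rank, assuming a normal distribution."""
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + 10 * z


MEAN_12, SIGMA_12 = 40.0, 8.0   # hypothetical norms, for illustration only

print(t_score(48, MEAN_12, SIGMA_12))   # one sigma above the mean
gap_low = t_from_percentile(20) - t_from_percentile(10)
gap_mid = t_from_percentile(50) - t_from_percentile(40)
print(round(gap_low, 1), round(gap_mid, 1))
```

Under the assumption of normality, the step from the 10th to the 20th percentile covers a noticeably longer stretch of the T-scale than the step from the 40th to the 50th.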
V. The Transmutation of Measures by Relative Position
(in Order of Merit) into Measures in Units of
Amount
It is often very desirable, especially in the calculation of
coefficients of correlation, to be able to transmute measures
arranged in order of merit into measures in units of amount
or " scores " on some linear scale. This can easily be accom-
plished by means of tables, provided we can assume " nor-
mality " in the trait for which the ranking has been made.
To take an example, let us suppose that we have 15 salesmen
ranked in order of merit for selling efficiency, the most effi-
cient ranked No. 1, the least efficient ranked No. 15. Now
if we are justified in assuming that selling efficiency follows
the normal probability curve, we can — with the aid of Table
XIII — assign to each man a " selling score " on a scale of 10
or 100 points which will very probably represent his capacity as
a salesman much better than a rank of 2, 6, or 14. The problem
may be stated as follows:
Problem (1) — Given 15 salesmen ranked in order of merit
by their sales-manager, transmute these rankings into scores
on a scale of 10 points.
The procedure is as follows: First, by means of a simple
formula,¹
$$\text{Per cent position} = \frac{100(R - .5)}{N} \qquad (12)$$
in which R is the rank of the individual in the series, and N
the number ranked, we determine the " per cent position " of
each man. Next, from Table XIII we read off the score on a
scale of 10 points. Thus Salesman A who ranks No. 1 (see the
1 This formula and the method built around it were devised by Professor Clark
Hull. See Hull, The Computation of the Pearson r from Ranked Data, Journal
of Applied Psychology, 1922, 6, 385.
table below) has a per cent position of $\frac{100(1 - .5)}{15}$ or 3.34,
and his score from Table XIII is 8.5 (finer interpolation unnecessary).
In like manner, Salesman B who ranks No. 2 has
a per cent position of $\frac{100(2 - .5)}{15}$ or 10, and his score,
accordingly, is 7.5. The scores of the others, found in exactly the
same way, are given in the following table:
Salesmen   Rank   Per cent Position   Score (Scale 10)
   A         1           3.34               8.5
   B         2          10.00               7.5
   C         3          16.67               6.9
   D         4          23.34               6.4
   E         5          30.00               6.0
   F         6          36.67               5.7
   G         7          43.34               5.3
   H         8          50.00               5.0
   I         9          56.67               4.7
   J        10          63.34               4.3
   K        11          70.00               4.0
   L        12          76.67               3.6
   M        13          83.34               3.1
   N        14          90.00               2.5
   O        15          96.67               1.5
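The two steps of the procedure, formula (12) followed by the reading of a score, can be sketched as below. Formula (12) is taken directly from the text. The scoring function is an assumption standing in for Table XIII: it treats the trait as a normal curve cut off at ±2.5σ and spread over the 10-point scale, which is how the construction of the table is explained later in this section. The function names are invented for the example. Exact arithmetic gives 3.33 for the first per cent position where the table prints 3.34; the scores are unaffected.

```python
from statistics import NormalDist

ND = NormalDist()
TAIL = 1 - ND.cdf(2.5)   # area lying beyond +2.5 sigma, left out of the scale


def per_cent_position(rank: int, n: int) -> float:
    """Formula (12): 100(R - .5)/N."""
    return 100 * (rank - 0.5) / n


def score_10(per_cent: float) -> float:
    """Score on the 10-point scale for a per cent position (0 < per_cent < 100).
    Assumption: normal curve ending at +/-2.5 sigma, so 5 sigma maps onto 0..10."""
    upper_area = TAIL + (per_cent / 100) * (1 - 2 * TAIL)  # area to the right
    z = ND.inv_cdf(1 - upper_area)
    return round(5 + 2 * z, 1)


salesmen = "ABCDEFGHIJKLMNO"
scores = {name: score_10(per_cent_position(rank, len(salesmen)))
          for rank, name in enumerate(salesmen, start=1)}
for name, score in scores.items():
    print(name, score)
```

For a scale of 100 points the returned score need only be multiplied by 10.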
On several previous occasions, it has been pointed out that
the assumption of normality in a trait or capacity implies that
differences at the extremes of capacity are relatively much
greater than the same differences around the average or mean.
This is clearly brought out in the table above; for while all
differences in the order of merit series equal 1, the differences
between the transmuted scores vary considerably, being
greatest at the ends of the series, and smallest in the middle.
The difference between A and B, for example, or between
N and O, is three times as great as the difference between G
and H. Stated differently, we may say that it is three times as
easy to move from H to G (from 8th to 7th place) as from B
to A (from 2nd to 1st place).
TABLE XIII
[From Hull, Journal of Applied Psychology, 1922]
The Transmutation of an Order of Merit into Units of Amount or
"Scores."
Let R represent the rank in the Order of Merit, and N the number
ranked. Then from the formula, Per cent position = $\frac{100(R - .5)}{N}$, find the
per cent position, and from it the score.
Example: If N = 25, and R = 3, Per cent position = $\frac{100(3 - .5)}{25}$ or 10.00,
and from the table the score is 7.5.

Per cent   Score     Per cent   Score     Per cent   Score
    .09     9.9        22.32     6.5        83.31     3.1
    .20     9.8        23.88     6.4        84.56     3.0
    .32     9.7        25.48     6.3        85.75     2.9
    .45     9.6        27.15     6.2        86.89     2.8
    .61     9.5        28.86     6.1        87.96     2.7
    .78     9.4        30.61     6.0        88.97     2.6
    .97     9.3        32.42     5.9        89.94     2.5
   1.18     9.2        34.25     5.8        90.83     2.4
   1.42     9.1        36.15     5.7        91.67     2.3
   1.68     9.0        38.06     5.6        92.45     2.2
   1.96     8.9        40.01     5.5        93.19     2.1
   2.28     8.8        41.97     5.4        93.86     2.0
   2.63     8.7        43.97     5.3        94.49     1.9
   3.01     8.6        45.97     5.2        95.08     1.8
   3.43     8.5        47.98     5.1        95.62     1.7
   3.89     8.4        50.00     5.0        96.11     1.6
   4.38     8.3        52.02     4.9        96.57     1.5
   4.92     8.2        54.03     4.8        96.99     1.4
   5.51     8.1        56.03     4.7        97.37     1.3
   6.14     8.0        58.03     4.6        97.72     1.2
   6.81     7.9        59.99     4.5        98.04     1.1
   7.55     7.8        61.94     4.4        98.32     1.0
   8.33     7.7        63.85     4.3        98.58      .9
   9.17     7.6        65.75     4.2        98.82      .8
  10.06     7.5        67.48     4.1        99.03      .7
  11.03     7.4        69.39     4.0        99.22      .6
  12.04     7.3        71.14     3.9        99.39      .5
  13.11     7.2        72.85     3.8        99.55      .4
  14.25     7.1        74.52     3.7        99.68      .3
  15.44     7.0        76.12     3.6        99.80      .2
  16.69     6.9        77.68     3.5        99.91      .1
  18.01     6.8        79.17     3.4       100.00     0
  19.39     6.7        80.61     3.3
  20.93     6.6        81.99     3.2
Another use to which Table XIII may be put is in the
combining of incomplete order of merit rankings. To illus-
trate with a problem:
Problem 2 — Given six persons, A, B, C, D, E, and F, to
be ranked for honesty by three judges. Judge 1 knows all six
well enough to rank them; Judge 2 knows only three well
enough to rank them; and Judge 3 knows four well enough
to rank them. Can we obtain a fair order of merit for all
six persons by combining these three sets of rankings, two of
which are incomplete?
We may tabulate the data as follows:
Persons               A     B     C     D     E     F
Judge 1's ranking     1     2     3     4     5     6
Judge 2's ranking    ..     2    ..     1    ..     3
Judge 3's ranking     2    ..     1    ..     3     4
Now assuming that honesty is "normally distributed,"
it seems fair that A should get more credit for ranking first in
a list of six than D for ranking first in a list of three, or C for
ranking first in a list of four. In the order of merit rankings,
all three are given the same rank. But when we assign scores
to each person in accordance with his position in the list by
means of formula (12) and Table XIII, A gets 77 for his first
place, D gets 69 for his, and C gets 72 for his (see table below).¹
Persons               A     B     C     D     E     F
Judge 1's ranking     1     2     3     4     5     6
  Score              77    63    54    46    37    23
Judge 2's ranking    ..     2    ..     1    ..     3
  Score              ..    50    ..    69    ..    33
Judge 3's ranking     2    ..     1    ..     3     4
  Score              55    ..    72    ..    43    28
Sum of scores       132   113   126   115    80    84
Average score        66    57    63    58    40    28
Order of Merit        1     4     2     3     5     6
1 It is somewhat doubtful whether it is usually worth the trouble to trans-
mute orders of merit into scores as shown above and then combine them so as
to get a weighted order (see Garrett, H. E., An Empirical Study of the Various
Methods of Combining Incomplete Order of Merit Ratings. Journal of Educational
Psychology, 1924, XV, pp. 157-171). If it is deemed desirable to weight ratings,
however, the method given will prove useful.
The other ratings are transmuted in the manner shown above.
All of the scores are then combined and averaged to give the
final weighted order of merit as shown in the table.
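The combining step may be sketched as follows. The transmuted scores are copied from the table in the text rather than recomputed; the dictionary layout and variable names are merely one convenient arrangement chosen for the example. Each person's scores are summed and averaged over the judges who actually rated him, and the averages are then ranked.

```python
# Transmuted scores as given in the text; a judge's entry is simply
# absent for any person he did not know well enough to rank.
scores = {
    "Judge 1": {"A": 77, "B": 63, "C": 54, "D": 46, "E": 37, "F": 23},
    "Judge 2": {"B": 50, "D": 69, "F": 33},
    "Judge 3": {"A": 55, "C": 72, "E": 43, "F": 28},
}
persons = "ABCDEF"

sums = {p: sum(j[p] for j in scores.values() if p in j) for p in persons}
counts = {p: sum(1 for j in scores.values() if p in j) for p in persons}
averages = {p: sums[p] / counts[p] for p in persons}

# Highest average score receives rank 1 in the final order of merit.
ordered = sorted(persons, key=lambda p: averages[p], reverse=True)
order_of_merit = {p: place for place, p in enumerate(ordered, start=1)}

for p in persons:
    print(p, sums[p], averages[p], order_of_merit[p])
```

The averages are left unrounded here; the text rounds 56.5 and 57.5 upward to 57 and 58.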
With formula (12) and Table XIII it is possible to
transmute any set of ranks into scores on the assumption of a
normal distribution in the trait for which the ranking is made.
This is very useful in the case of those traits which are not
easily measured by ordinary methods, but for which individ-
uals may be arranged in an order of merit, as for example
athletic ability, personality, beauty, etc. It is also valuable
in correlation when a set of ranks is the only available " crite-
rion " for a given ability while the " independent " tests are
scored in ordinary test units.1 Transmuted scores may be
combined, or averaged, like other test scores.
A word of explanation may be said in regard to the construction
of Table XIII. This table was derived from a table
of the theoretical frequencies of the normal frequency distribution
in which the curve was taken to end at ±2.5σ. The
baseline of the curve is 5σ, therefore, and may conveniently be
subdivided into 100 parts, each .05σ. The first .05σ from the
upper extreme limit of the curve takes in .09% of the distribution
and is scored 9.9 (or 99 on a scale of 100). Carrying the
division .05σ further (.10σ from the upper end of the curve) takes in
.20% of the entire distribution in all, and this point is scored 9.8,
or 98, and so on. In
each case, the per cent position gives the fractional part of the
normal distribution which lies to the right of the given σ value
on the baseline. The σ values determine the score.
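The construction just described can be reproduced approximately in a few lines. This is a sketch under the stated assumption of a normal curve ending at ±2.5σ, with the area inside those limits taken as the whole distribution; it is not Hull's original computation, and the function name is invented for the example. Individual entries may differ from the printed table in the last digit.

```python
from statistics import NormalDist

ND = NormalDist()
LIMIT = 2.5                                   # curve taken to end at +/-2.5 sigma
INSIDE = ND.cdf(LIMIT) - ND.cdf(-LIMIT)       # area between the two limits


def per_cent_for_score(score: float) -> float:
    """Per cent of the (cut-off) distribution lying to the right of the
    sigma value that corresponds to a score on the 10-point scale."""
    z = (score - 5) / 2                       # each .1 of score is .05 sigma
    return 100 * (ND.cdf(LIMIT) - ND.cdf(z)) / INSIDE


# Scores 9.9, 9.8, ... 0.0, each paired with its per cent position.
table = {s / 10: round(per_cent_for_score(s / 10), 2) for s in range(99, -1, -1)}

for score in (9.9, 9.8, 7.5, 5.0, 2.5, 0.0):
    print(score, table[score])
```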
PROBLEMS
1. (a) Plot both distributions given in example (2), page 56 as
frequency polygons and histograms. For comparative
purposes plot the frequency polygon and the histogram for
each distribution with respect to the same coordinate axes:
on the same diagram.
(b) Calculate a measure of skewness for both distributions.
1 The definition of a criterion and its value in determining the validity of
one or more tests is discussed at length in Chapters V and VI.
2. Plot distribution A, example (2), page 56, as an ogive. Compare
the percentiles obtained from the graph with the calculated
values.
3. Assuming that trait X is completely determined by 6 factors — all
equal in value, similar, and independent, and each as likely to
be present as absent — plot the distribution which one would
most probably get from the measurement of trait X in an
unselected group of 1000 people.
4. In a random sample of 1000 cases, Average = 14.4, and σ = 2.5.
(a) What per cent of the cases lie between 12 and 16?
(b) What are the chances that any future case will be above 18?
(c) What are the chances that any future case will be below 8?
5. In an approximately normal distribution of 100 cases, Average =
29.74, Q (PE) = 3.18.
(a) What per cent of the cases lie between 24 and 25?
(b) What limits include the middle 60% of the cases?
(c) What limits include the lowest 5% of the cases?
6. In a certain test the 7th grade median is 28, with a Q of 4.8; and
the 8th grade median is 31.6, with a Q of 4.0. What per cent
of the 7th grade is above the median of the 8th grade?
7. A group of 12 year olds, two years ago, had a reading ability
expressed by an average of 40, and a σ of 3.6; and a composition
ability expressed by an average of 62, and a σ of 9.6. Today
the group has gained 12 in reading and 10.8 in composition.
How many times greater is the former than the latter gain?
8. Four problems, 1, 2, 3, and 4, are solved by 50%, 60%, 70%,
and 80%, respectively, of a large group. Compare the dif-
ference in difficulty between 1 and 2 with the difference in
difficulty between 3 and 4.
9. In a college the 10 grades A+, A, A-; B+, B, B-; C+, C, C-;
and D are given. On the assumption that ability in mathe-
matics is distributed normally, how many men in a group of
500 Freshmen should receive each grade?
10. Five problems are passed by 15%, 34%, 50%, 62%, and 80%
of a large unselected group. If the zero point of ability is
taken at −3σ, what is the σ value of each problem as measured
from this point?
11. In a large group of competent judges, 88% rank composition A
as better than composition B; 65% rank B as better than C.
If C is known to have the PE value of 3.5 as measured from
the zero composition, i.e., the composition of zero merit, what
are the PE values of B and A as measured from this " zero "?
12. Twenty-five men on a football squad are ranked in order of merit
from 1 to 25 for general playing ability by the coach. Assuming
" normality " in the trait " general playing ability " transmute
these ranks into units of amount on a scale of 100 points.
Answers
4. (a) 57.04%. (b) 749 in 10,000. (c) 52 in 10,000.
5. (a) 4.8%. (b) 25.76 and 33.72. (c) 21.95 and the lower limit
of the distribution.
6. 30.65%.
7. 2.96 (approximately 3) times as great.
8. Difference between 1 and 2, .25σ; between 3 and 4, .315σ.
9. Grades:              A+    A   A-   B+    B   B-   C+    C   C-    D
   No. men receiving:    3   14   40   80  113  113   80   40   14    3
10. In order: 4.04; 3.41; 3.00; 2.69; 2.16.
11. B, 4.07PE; A, 5.82PE.
12.
Rank   Score     Rank   Score
  1      89       13      50
  2      80       14      48
  3      75       15      46
  4      71       16      44
  5      68       17      42
  6      65       18      39
  7      63       19      37
  8      61       20      35
  9      58       21      32
 10      56       22      29
 11      54       23      25
 12      52       24      20
                  25      11
CHAPTER III
THE RELIABILITY OF MEASURES
I. What is Meant by the Reliability of a Measure
By the "true" measure of an individual's capacity in any
trait, as for example, the true measure of his height, reaction
time, or intelligence, we mean the average of an infinite number
of measurements of the given capacity made under precisely
the same conditions. Obviously, in actual practice, we can never
deal with true measures as thus defined, for usually we must
be satisfied with a single measure, or at best with a comparatively
few measures of the given trait. We can, however,
measure the amount by which an obtained measure "most
probably" varies from its corresponding true measure; and this
measure of "probable divergence" serves as an index of the
reliability of the obtained measure — of how good an approxi-
mation it is of the true measure.
In like manner, the reliability of an obtained measure of a
group is determined by finding the probable divergence of the
obtained measure from the true measure of the group. The
true measure of a group (as for example the true average
or the true σ) is defined as that measure obtained by taking
into account all of the members of the group, and the true
measure of difference between two groups is the difference
between their true means or medians. To show just what
is meant by the " true measure " of a group, let us suppose
that we could measure the height of every 12 year old boy
in the United States. If from this frequency distribution of
heights, we should calculate a measure of central tendency
and a measure of variability (the average and σ, for example),
this average would be the true average height of 12 year old
boys in the United States, and the σ would be the true measure
of scatter around this average. In the same way, if we could
measure the height of every 12 year old girl in the United
States, it would be possible to secure the true average height,
and the true variability around it, of 12 year old girls in this
country. Moreover, knowing the true average height of 12
year old boys and the true average height of 12 year old girls,
it would be a very simple matter to find the true difference
between the average height of 12 year old boys and 12 year
old girls in the United States.
Unfortunately it is rarely, if ever, possible to measure all
of the individuals in a group or " population," and it is, of
course, impossible to take an infinite number of measures of
a given individual. We must be content, therefore, to deal with
" samples " selected from the total number of possible meas-
ures; and, as a result, due to slight differences in the samples
chosen, measures of central tendency and variability are often
larger or smaller than their corresponding true measures.
Hence, whenever we have measured an individual or a group,
we must ask ourselves this question: " How reliable a measure
of capacity have I secured? How well does it ' represent '
the true measure which I should get from a very large (infinite)
number of measures of this individual — or from measuring
all of the individuals in the population from which my group
is taken?" This question will often lead to a second: " How
many measurements must I make in order to get a result
which shall meet a certain standard of reliability, i.e., show a
probable divergence from the true result which is less than
some given amount?"
The purpose of the following sections is to develop methods
which will enable us to answer these questions. First, the
reliability of the mean and median will be considered; then
the reliability of the measures of variability; and finally the
reliability of the difference between two measures.1
1 The method of finding the reliability of a coefficient of correlation is given
later on page 170.
II. The Reliability of Measures of Central Tendency
1. The Reliability of the Average or Mean
A. The Reliability of the Mean in Terms of its Standard
Error, σ(av.)
Perhaps the simplest approach to the study of the reliabil-
ity of the average is to examine the factors upon which the
reliability of this measure must depend. Suppose that we wish
to find the average score of college freshmen in the United
States on Army Alpha. To measure the achievement of
college freshmen in general, would require in strict logic that
we test all of the freshmen in the United States. However,
this is a well-nigh impossible task, and hence we must be
satisfied with taking the records of as large and random a sample
of freshmen as we can secure. This means that we cannot use
freshmen from only a single institution or from only one sec-
tion of the country, and that we must guard against selecting
only those with low or high scholastic records. The more
successful we are in getting an " unselected " group the more
nearly representative will this group be of all of the freshmen in
the country. Evidently, therefore, the reliability (the " repre-
sentativeness ") of an average depends, for one thing, on how
impartially we have selected our sample.
Granted a fair sample, the reliability of an average can be
shown to depend upon two characteristics of the distribution,
(1) the number of cases, and (2) the variability or spread of
the measures within the sample.
(1) It is clear that the number of cases must influence the
stability of an average, since the addition of even one extra
measure to a series will bring about a change in the average
unless the additional case happens to coincide with it exactly.
Moreover, the addition of one case to a set of 10 measures will
cause a greater change in the obtained average, written
"average(obt.)," than the addition of one extra case to a
set of 1000 measures, as each case counts for less in the larger
group. It has been shown empirically, as well as theoretically,1
that the reliability of an average(obt.) will increase, not in proportion
to the number of measures upon which it is based,
but rather in proportion to the square root of the number of
measures. Thus the average(obt.) of 25 measures of a variable
quantity is not 25 times, but √25 or 5 times as reliable
as a single measure of the quantity. And in like manner, the
average of 36 cases is not 4 times as reliable as the average
of 9 cases, but only twice as reliable, since √36 divided by
√9 equals 2.
(2) In addition to the size of the sample, the reliability
of an average must depend also upon the variability of the
separate measures around the obtained average. If the σ of
the distribution is large, the separate measures tend to scatter
widely from the average, and we are unable to say where those
cases in the population which we have not measured will most
probably fall: whether they will be close to, or far from the
obtained average. On the other hand, if the σ is small we may
be fairly certain that unmeasured cases will fall fairly close
around the average. For this reason, the reliability of an
obtained average depends upon the size of its σ, and as σ
increases, the reliability decreases.
We find, then, that the reliability of an average depends
first upon our having selected a fairly representative sample
from the larger group, or population, which we are studying.
When this condition has been met, and only then, the reliability
of an average can be measured mathematically in terms
of its standard error, that is, in terms of the number of cases and
the σ of the distribution (written σ(dis.)). The formula for the
standard error of an average or mean, written σ(av.), is
$$\sigma_{(av.)} = \frac{\sigma_{(dis.)}}{\sqrt{N}} \qquad (13)$$
1 Yule: An Introduction to the Theory of Statistics, 1919, p. 257. For results
of experiment, see Fullerton and Cattell: On the Perception of Small Differences,
Publications of the University of Pennsylvania, Philosophical Series 2, 1892.
This is one of the most important, and most often used, of
the reliability formulas. Note that a decrease in σ(dis.), or an
increase in the size of N, will cause the standard error to become
smaller numerically. A decrease in σ(av.) means that the
probable divergence of the obtained average from the true is
just so much less; hence the reliability of an average(obt.) increases
as σ(av.) decreases.
A problem will illustrate the value and use of formula (13).
Problem (1) — In 1883, the Anthropometric Committee of
the British Association found the average height of 8585 adult
males in the British Isles to be 67.46 inches with a σ of 2.57
inches.1 How reliable is this average? What is its probable
divergence from the average which would have been secured
had all adult males in the British Isles been measured?
Applying formula (13) the standard error of the mean,
σ(av.), is found to be .0277 inch. This result is interpreted
in the following way. The chances are 6826 in 10,000 or 68
in 100 that the obtained average of 67.46 inches does not
diverge from the true average by more than ±1σ(av.), i.e., by more
than ±.0277 inch. Stated in another way, the chances are
68 in 100 that the true average lies within the limits 67.46 +
.0277 and 67.46 − .0277, or between 67.488 and 67.432
inches. We can be practically certain that the true mean
lies within the limits 67.46 ± 3 × .0277 (±3σ(av.)), or between
67.543 and 67.377 inches (see Table X for σ values).
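The arithmetic of Problem (1) may be checked with a short sketch. The function name is chosen for the example only; the figures are those given in the text.

```python
import math


def standard_error_of_mean(sigma_dis: float, n: int) -> float:
    """Formula (13): sigma of the distribution divided by the square root of N."""
    return sigma_dis / math.sqrt(n)


mean, sigma, n = 67.46, 2.57, 8585          # heights of adult males, in inches
se = standard_error_of_mean(sigma, n)

limits_1 = (mean - se, mean + se)           # chances about 68 in 100
limits_3 = (mean - 3 * se, mean + 3 * se)   # practical certainty

print(round(se, 4))
print([round(x, 3) for x in limits_1])
print([round(x, 3) for x in limits_3])
```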
Just how the standard error measures the reliability of an
average may be shown most clearly, perhaps, by an illustra-
tion. Suppose that we have measured the heights of 1000
groups of men, each group containing 8585, the groups or
samples chosen at random from the general population. The
1000 averages obtained from these groups will tend to differ
slightly from one another due to so-called errors of sampling
(see page 143) and hence not all samples will represent with
equal accuracy the population from which they have been
1 Yule, An Introduction to the Theory of Statistics, 1919, pp. 112 and 141.
drawn. Now suppose, further, that it were possible to secure
the average height of the entire male population of the British
Isles. If we should subtract this true mean from each one of
the 1000 obtained means, obviously we would get 1000 differ-
ences, and these 1000 " measures " (differences) would —
according to the best assumption that we can make — follow
the normal probability curve (see page 83). In this hypo-
thetical distribution of differences, we should have relatively
few large plus or minus deviations, and a relatively large num-
ber of small plus, small minus, and zero deviations — in short,
the obtained means would hit close to the true mean more often
than they would miss it.
The average of this distribution of differences would fall
(most probably) at 0; for other things being equal, this will
be the difference most often obtained (the maximum frequency)
in subtracting the true from the obtained means. The σ of
this distribution is given by the formula $\sigma_{(dis.)}/\sqrt{N}$. In other
words, the standard error of the mean measures the spread
of the differences (obtained-true) around 0 as a central tendency;
and for this reason σ(av.) is a measure of the probable divergence
of the obtained average from its corresponding true
average.
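The hypothetical experiment of drawing 1000 samples can be imitated by simulation. The sketch below rests on several assumptions that are not in the text: the population is taken to be exactly normal, the obtained average of 67.46 inches is treated as if it were the true mean, and each sample holds 400 cases rather than 8585 so that the run is short. It shows only that the spread of the obtained-true differences comes out close to σ(dis.)/√N.

```python
import random
import statistics

random.seed(0)                    # fixed seed so the run can be repeated

TRUE_MEAN, SIGMA = 67.46, 2.57    # assumed population values
N, SAMPLES = 400, 1000            # smaller N than the text's 8585, for speed

differences = []
for _ in range(SAMPLES):
    sample_mean = statistics.fmean(
        random.gauss(TRUE_MEAN, SIGMA) for _ in range(N))
    differences.append(sample_mean - TRUE_MEAN)   # obtained minus true

sd_of_differences = statistics.pstdev(differences)
predicted = SIGMA / N ** 0.5                      # formula (13)
share_within_1se = sum(abs(d) <= predicted for d in differences) / SAMPLES

print(round(sd_of_differences, 4), round(predicted, 4))
print(share_within_1se)   # expected to be near .68
```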
These results are represented graphically in Diagram XV,
Fig. 1. The 1000 differences between the 1000 obtained means
and the true mean are shown arranged into a normal frequency
distribution with mean at 0, and σ equal to .0277. The heights
of the different ordinates represent the frequency of the various
obtained-true differences: the maximum ordinate, at the mean,
corresponds to the zero difference. Now we know that the σ of a
normal distribution includes the middle 68.26% of the cases,
when measured off in the plus and minus directions from the
mean. Hence we may say that the chances are 68 in 100 that
the difference between the obtained mean of 67.46 inches and
the true mean will not be greater than ±.0277 inch. Or, as
stated above, there are 68 chances in 100 that the true average
lies within the limits 67.46 + .0277 and 67.46 − .0277, or
between 67.488 and 67.432 inches. Furthermore, we can be
practically sure that the true average will fall within the limits
±3σ(av.) from the mean. Three times ±.0277 is ±.0831; and
accordingly there are 9973 chances in 10,000 (see Table X) that
the true average lies within the limits 67.46 ± .0831, or between
67.543 and 67.377 inches.
[DIAGRAM XV, Figs. 1 to 6. Fig. 1 shows the normal distribution of the
obtained-true differences, with the baseline marked at −.0831, −.0277, 0,
+.0277, and +.0831 (±3σ).]
The average height of our sample of 8585 British males has
been found to be 67.46 inches with a standard error of .0277
inch. Let us now proceed to the second question stated
on page 119, viz., "How many measurements must I make
in order to get a result whose probable divergence from the
true result is less than some given amount?" Suppose, for
example, that we wish to secure an average which is twice as
reliable as the average we now have — how many cases will be
required? Assuming that the spread in the increased group,
i.e., σ(dis.), remains approximately the same, all that we need
do in order to cut the standard error in two and thus double
the reliability, is to place a 2 in the denominator of the fraction
$\sigma_{(dis.)}/\sqrt{8585}$. But $2\sqrt{8585}$ becomes $\sqrt{4 \times 8585}$ when the 2 is placed
under the radical, and, accordingly, it is evident that 8585 must
be multiplied by 4 in order to make σ(av.) just 1/2 its original
size. By analogy, to double the reliability of any average
we must multiply N by 4; to triple the reliability, by 9, etc.
Assuming substantially the same σ(dis.), the average obtained
from 400 cases is twice as reliable as the average got from
100, and the average from 900 cases three times as reliable as
that from 100 cases.
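The rule that N must be multiplied by the square of the desired gain in reliability can be put as a small sketch. The function name is invented for the example, and the spread σ(dis.) is assumed to stay the same in the enlarged group, as the text itself assumes.

```python
import math


def cases_needed(n_now: int, reliability_factor: float) -> int:
    """Number of cases needed to make the average `reliability_factor`
    times as reliable, sigma(dis.) being assumed unchanged."""
    return math.ceil(n_now * reliability_factor ** 2)


sigma = 2.57
se_now = sigma / math.sqrt(8585)

n_double = cases_needed(8585, 2)          # four times the original number
se_double = sigma / math.sqrt(n_double)

print(n_double, round(se_now, 4), round(se_double, 4))
print(cases_needed(100, 2), cases_needed(100, 3))
```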
B. The Reliability of the Mean in Terms of the PE of the Average
In measuring the reliability of an average the PE of the
average, written PE(av.), may be used instead of the σ(av.).
The PE(av.) is interpreted in exactly the same way as the σ(av.).
Its formula is derived simply by multiplying formula (13) by
.6745 (see page 121):
$$PE_{(av.)} = \frac{.6745\,\sigma_{(dis.)}}{\sqrt{N}} \qquad (14)$$
Applying this formula to our problem of heights, PE(av.)
is found to be .0187 inch. The chances are even, therefore,
that the obtained average of 67.46 inches does not differ from
the true average by more than ±.0187 inch. Moreover,
since ±4PE includes practically all of the cases in a normal
distribution, we may be certain (the chances are 99 in 100)
that the true average lies within the limits 67.46 ± 4 × .0187,
or between 67.39 and 67.53 inches (see Table XI for PE
values).
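As a check on the arithmetic, formulas (13) and (14) may be worked for the height data in a short Python sketch. The figures 2.57 inches and N = 8585 are those quoted in the text for Problem (1); the function names are illustrative only.

```python
import math

PE_FACTOR = 0.6745  # PE = .6745 sigma in a normal distribution

def standard_error_of_mean(sigma_dis, n):
    """Formula (13): sigma(av.) = sigma(dis.) / sqrt(N)."""
    return sigma_dis / math.sqrt(n)

def probable_error_of_mean(sigma_dis, n):
    """Formula (14): PE(av.) = .6745 * sigma(dis.) / sqrt(N)."""
    return PE_FACTOR * standard_error_of_mean(sigma_dis, n)

se = standard_error_of_mean(2.57, 8585)
pe = probable_error_of_mean(2.57, 8585)
print(round(se, 4), round(pe, 4))                            # 0.0277 0.0187
print(round(67.46 - 4 * pe, 2), round(67.46 + 4 * pe, 2))    # 67.39 67.53
```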
A comparison of the extreme limits within which we may
be practically sure that the true average will lie shows that the
values of these limits differ slightly when ±4PE instead of
±3σ are taken as limiting points [see Problem (1) above].
This discrepancy is due to the fact that ±3σ takes in 9973
of the 10,000 cases in the normal distribution, while ±4PE
takes in but 9930 cases (see Tables X and XI). The σ limits,
therefore, contain 43 more cases than the PE limits, and while
43 cases in 10,000 may seem to be an insignificant number
(and is insignificant if taken from the middle of the distribution),
even so few cases as this have considerable importance at the
extremes of the distribution. This may be seen in the fact
that we must take ±4.45PE in order to have our PE limits
correspond exactly to ±3σ, since these limits include 9974
cases in 10,000.
It is customary, however, in measuring reliability to use
±4PE instead of ±4.45PE as limits of practical certainty.
In the first place, ±4PE mark off limits within which the
chances are very great (9930 in 10,000) that the true average
will fall. And furthermore, the slight increase in reliability got
by using ±4.45PE instead of ±4PE is not usually sufficient
to offset the greater convenience of the latter figure.
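The proportions quoted from Tables X and XI can be recomputed from the normal curve itself. The sketch below uses the error function from Python's standard library and takes 1 PE = .6745σ; the helper name `coverage_within` is an assumption of ours. Values computed this way may differ from the printed tables by a case or so in 10,000, owing to rounding in the tables.

```python
import math

PE_IN_SIGMA = 0.6745  # one PE expressed in sigma units

def coverage_within(k_sigma):
    """Proportion of a normal distribution lying within
    +/- k_sigma standard deviations of the mean."""
    return math.erf(k_sigma / math.sqrt(2))

for label, k in [("3 sigma", 3.0),
                 ("4 PE", 4 * PE_IN_SIGMA),
                 ("4.45 PE", 4.45 * PE_IN_SIGMA)]:
    print(label, round(10000 * coverage_within(k)), "cases in 10,000")
```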
2. The Reliability of the Median
The formulas for measuring the reliability of an obtained
median are easily derived from those for measuring the
reliability of the mean. The σ(mdn.) and PE(mdn.) are 1.25331, or
roughly 5/4, times the σ(av.) and PE(av.) respectively.
$$\sigma_{(mdn.)} = \frac{5}{4}\cdot\frac{\sigma_{(dis.)}}{\sqrt{N}} \qquad (15)$$
$$PE_{(mdn.)} = \frac{5}{4}\cdot\frac{.6745\,\sigma_{(dis.)}}{\sqrt{N}} = \frac{.8454\,\sigma_{(dis.)}}{\sqrt{N}} \qquad (16)$$
or
$$PE_{(mdn.)} = \frac{5}{4}\cdot\frac{Q}{\sqrt{N}} \qquad (16a)$$
Formulas (15), (16), and (16a) are all used and interpreted
in the same way as the reliability formulas for the average or
mean. A problem will serve to show how the reliability of the
median is found.
1 Formula (16a) should be used when Q and not σ is given.
Problem (2): Measurement of 801 12-year-old boys on
the Trabue Language Scale A¹ gave the following results:
Median = 21.4; Q = 4.9. What is the reliability of this
median? How close is it to the true median score of 12-year-old
boys?
From formula (16a) the PE(mdn.) is found to be .2164. The
chances are 50 in 100, therefore, that the true median does not
differ from 21.4 by more than ±.2164. We may be practically
certain that the true median lies within the limits 21.4 ± 4 ×
.2164, or between 22.27 and 20.53.
Since σ(mdn.) and PE(mdn.) are both larger (approximately
1.25 times) than the corresponding measures of reliability of the
average(obt.), it is clear that the obtained average is always more
reliable than the obtained median of the same group. For
this reason the average is used whenever the highest reliability
is sought (see page 50).
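The computation in Problem (2) may be sketched as follows. The sketch assumes formula (16a) in the form PE(mdn.) = (5/4) Q/√N, which reproduces the .2164 given in the text; the function name is illustrative.

```python
import math

def pe_of_median_from_q(q, n):
    """Formula (16a): PE(mdn.) = (5/4) * Q / sqrt(N),
    for use when Q rather than sigma is given."""
    return 1.25 * q / math.sqrt(n)

pe_mdn = pe_of_median_from_q(4.9, 801)
print(round(pe_mdn, 4))                   # 0.2164
print(round(21.4 - 4 * pe_mdn, 2),        # 20.53
      round(21.4 + 4 * pe_mdn, 2))        # 22.27
```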
1 Trabue, M. R., Completion Test Language Scales, 1916, p. 15.
III. The Reliability of Measures of Variability
1. The Standard Deviation, or σ
We have seen that the reliability of an obtained average
or obtained median is found by determining the probable
divergence of the obtained from the true measure. In the
same way, the reliability of an obtained σ or an obtained Q
is measured by the probable divergence of this measure from
the true σ or the true Q, viz., the σ or the Q which we should
get from all possible measures of the trait in question. The
formula for finding the reliability of an obtained σ is
$$\sigma_{\sigma} = \frac{\sigma_{(dis.)}}{\sqrt{2N}} \qquad (17)$$
In Problem (1), page 122, we found that for 8585 adult
British males, the obtained σ (the σ taken around the
average(obt.) of 67.46 inches) was 2.57 inches. The question
may well be asked: how reliable is this σ? How well does
it represent the true σ which we should get if deviations could
be taken from the true average? Substituting for σ(dis.) and
N in formula (17), the value of σ_σ is found to be .0196 inch.
This means that the chances are 68 in 100 that 2.57 inches
does not differ from the true σ by more than ±.0196 inch;
and that the chances are 997 in 1000 that the σ(dis.) does not
differ from the true σ by more than 3 × ±.0196 or ±.0588
inch. We can be practically certain, then, that the true σ
lies within the limits 2.57 ± .0588, or between 2.63 and 2.51
inches.
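Formula (17) applied to the height data can be checked with a brief Python sketch; the function name is ours, and the inputs are the σ(dis.) of 2.57 inches and N of 8585 quoted in the text.

```python
import math

def standard_error_of_sigma(sigma_dis, n):
    """Formula (17): sigma_sigma = sigma(dis.) / sqrt(2N)."""
    return sigma_dis / math.sqrt(2 * n)

se_sigma = standard_error_of_sigma(2.57, 8585)
print(round(se_sigma, 4))                    # 0.0196
print(round(2.57 - 3 * se_sigma, 2),         # 2.51
      round(2.57 + 3 * se_sigma, 2))         # 2.63
```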
2. The Quartile Deviation, or Q
The reliability of the Q of a distribution is found from the
formula,
$$\sigma_{Q} = \frac{1.11\,\sigma_{(dis.)}}{\sqrt{2N}} \qquad (18)$$
or in terms of Q,
$$\sigma_{Q} = \frac{1.65\,Q}{\sqrt{2N}} \qquad (18a)$$
The 801 12-year-old boys who took the Trabue Completion
Test, Scale A (see page 127), had a median score of 21.4 points
with a Q of 4.9 points. What is the reliability of this Q?
From formula (18a) σ_Q is found to be .202. The chances are
68 in 100, therefore, that 4.9, the obtained Q, does not differ
from the true Q by more than ±.202 point. And the chances
are 9973 in 10,000 that the true Q lies within the limits 4.9 ±
3 × .202, or between 5.5 and 4.3 points.
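A short Python sketch of formula (18a) follows. It assumes the formula reads σ_Q = 1.65 Q/√(2N), the form that reproduces the .202 reported for the Trabue data; the function name is illustrative.

```python
import math

def standard_error_of_q(q, n):
    """Formula (18a), as assumed here: sigma_Q = 1.65 * Q / sqrt(2N)."""
    return 1.65 * q / math.sqrt(2 * n)

se_q = standard_error_of_q(4.9, 801)
print(round(se_q, 3))                   # 0.202
print(round(4.9 - 3 * se_q, 1),         # 4.3
      round(4.9 + 3 * se_q, 1))         # 5.5
```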
IV. The Reliability of the Difference between Two
Measures
1. The Reliability of the Difference between Two Averages
A. The Reliability of the Difference in Terms of the σ(diff.)
Suppose that we wish to find whether there is any difference
in the performance of 10-year-old boys and 10-year-old girls
on a certain general intelligence test. The usual method of
attacking this problem is to select as large and as random a
sample of 10-year-old boys and 10-year-old girls as possible; give
them our test, compute the average scores, and find the
difference between the two averages. If this difference is, let us say,
several points in favor of the girls, such a result would be
evidence (on the face of it) for believing that the average girl
does better on this test than the average boy. Before drawing this
conclusion definitely, however, we should know how reliable the
obtained difference is: what its probable divergence is from the
true difference which we should get if we could subtract the true
average of the boys from the true average of the girls.¹ Otherwise, if
we compared the averages of other groups of boys and girls
selected in the same way as our groups, we might wipe out or even
reverse the difference found. One formula for calculating the
reliability of an obtained difference is
$$\sigma_{(diff.)} = \sqrt{\sigma^{2}_{(av.\,1)} + \sigma^{2}_{(av.\,2)}} \qquad (19)$$
in which σ(av. 1) is the standard error of the first obtained average,
σ(av. 2) is the standard error of the second obtained average, and
σ(diff.) is the standard error of the difference between the two
averages. Thus to find the reliability of the difference between
two averages, we must first know the reliability of the averages
themselves.
Let us illustrate the use and value of formula (19) by means
of a problem.
Problem (3): In a study of the intelligence of the foreign born
white draft during the Great War, a sample of 308 native
born Germans and a sample of 325 native born Danes were
found to test as follows on the "combined scale":²

Country of Birth    No. of Cases    Average Score    σ(dis.)
Germany                  308            13.88          2.43
Denmark                  325            13.69          2.23

1 Simpler methods of studying the significance of the difference between two
averages are given in Chapter I, p. 40.
2 The combined scale was made up of the 8 Alpha tests, the Stanford-Binet,
and tests 4, 5, 6, and 7 of Beta. The maximum score was 25.
The difference between the two obtained averages is seen
to be .19 in favor of the Germans. Is this a reliable difference?
Would further testing of other groups of Germans and Danes
give approximately the same difference; or is it probable that
the difference would be reduced to zero, or even reversed in favor
of the Danes? Stated more exactly, what is the probable
divergence of this difference from the true difference between
Germans and Danes? To answer these questions, we must find
the reliability of the averages of the Germans and the Danes,
and from these the reliability of the difference between the
averages.
By formula (13) the standard errors of the two averages are,
For Germans:
$$\sigma_{(av.)} = \frac{2.43}{\sqrt{308}} = .1385$$
For Danes:
$$\sigma_{(av.)} = \frac{2.23}{\sqrt{325}} = .1237$$
Substituting these values in formula (19) we have that
$$\sigma_{(diff.)} = \sqrt{(.1385)^{2} + (.1237)^{2}} = .1857$$
The actual difference between the two averages, therefore, is .19,
and the standard error of this difference, σ(diff.), is .1857.
An obtained difference is interpreted in terms of its standard
error in exactly the same way in which an obtained average
is interpreted in terms of its standard error. Thus we may
say that the chances are 68 in 100 that the obtained difference
of .19 does not diverge from the true difference by more than
±.1857; and that the chances are 99 in 100 that .19 does not
differ from the true difference by more than 3 × ±.1857, that is,
by more than ±.56 (see Table X).
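The steps of Problem (3) can be retraced in Python: formula (13) for each average, then formula (19) for the difference. The function names are illustrative; `math.hypot` simply returns the square root of the sum of two squares.

```python
import math

def standard_error_of_mean(sigma_dis, n):
    """Formula (13): sigma(av.) = sigma(dis.) / sqrt(N)."""
    return sigma_dis / math.sqrt(n)

def standard_error_of_difference(se_1, se_2):
    """Formula (19): sigma(diff.) = sqrt(se_1^2 + se_2^2)."""
    return math.hypot(se_1, se_2)

se_germans = standard_error_of_mean(2.43, 308)
se_danes = standard_error_of_mean(2.23, 325)
se_diff = standard_error_of_difference(se_germans, se_danes)

print(round(se_germans, 4), round(se_danes, 4), round(se_diff, 4))
# 0.1385 0.1237 0.1857
print(round(3 * se_diff, 2))  # 0.56, the "practically certain" margin
```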
To sum up our findings so far, we may be almost certain that
the true difference between the averages of the Germans and
Danes lies within the limits .19 ± .56, or between −.37 and
+.75. Note that the lower limit of this range is negative,
and in consequence there is at least some chance that the true
difference is less than zero, i.e., that the average of the Danes
will sometimes actually be higher than that of the Germans.
In spite of the obtained difference in favor of the Germans,
we cannot be 100% sure that the true difference between the
average German and the average Dane is greater than zero.
Just what then, it may be asked, are the chances of a true
difference greater than zero between Germans and Danes?
Before answering this question, let us digress for the moment to
consider the following hypothetical situation.¹ Suppose that we
could secure the averages of 1000 groups of native born
Germans and 1000 groups of native born Danes on the combined
scale, the samples selected at random from the general
population of native born Germans and Danes and roughly of the
same size as the samples we have. Suppose further, that these
groups could be paired off so that we should have 1000
differences between the obtained averages of Germans and Danes,
these hypothetical differences corresponding to the actually
obtained difference of .19. Now according to the best
assumption that we can make, this distribution of differences would
follow the normal probability curve; the lower limit of the
distribution would be at −.37, the upper limit at .75, and the mean
at .19, as shown in Diagram XV, Fig. 2. The mean is taken at
.19 because this is the difference actually obtained, and hence
may be fairly taken as the most probable. Again, the chances
are even that any other obtained difference will be greater or
less than .19; and accordingly, the logical place for this
difference would seem to be at the mean. The σ of this distribution
of differences is .1857, the σ(diff.).
Now to determine the chances that the true difference
between Germans and Danes is greater than zero, we divide .19,
which is the distance of the mean difference from the zero
difference, by .1857, the σ of the difference-distribution. This tells
us how far the zero difference is below the mean in σ terms.
.19/.1857 is 1.02σ, and from Table X we find that in the normal
curve 3461 cases in 10,000 lie between the mean and 1.02σ.
Adding in the 5000 cases above the mean (see Diagram XV,
Fig. 2) and translating cases over into "chances," it is clear that
the chances are 8461 in 10,000 that the true difference between
the averages of Germans and Danes is greater than zero. We
may conclude, therefore, when we compare groups
of Germans and Danes on the combined scale, that 84 times
in 100, or about 4 times in 5, the difference between the average
scores will be in favor of the Germans. This answers the question
put on page 130: "What are the chances of a true difference
greater than zero between the Germans and Danes?"
1 The argument here, which differs somewhat from that on page 123, is
believed to be better adapted to the present illustration than the other. The
two are essentially the same, however.
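The table lookup just described amounts to evaluating the normal curve at D/σ(diff.). A minimal Python sketch follows; it rounds the ratio to two places, as a reader of Table X would, so that its answer can be compared with the 8461 in 10,000 of the text. The function names and the `ratio_places` parameter are our own.

```python
import math

def normal_cdf(z):
    """Proportion of a normal distribution lying below z (in sigma units)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def chances_true_difference_positive(d, se_diff, ratio_places=2):
    """Chance that the true difference exceeds zero, given the obtained
    difference d and its standard error. The ratio is rounded to
    ratio_places decimals to imitate reading a printed table."""
    ratio = round(d / se_diff, ratio_places)
    return normal_cdf(ratio)

p = chances_true_difference_positive(0.19, 0.1857)
print(round(10000 * p))  # 8461 chances in 10,000
```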
The obtained difference of .19 is sufficiently large to insure
considerably more than an even chance of a true difference
between Germans and Danes. It is not large enough,
however, to guarantee that the Germans will always score higher,
on the average, than the Danes. The further question arises,
therefore: how much difference would be required to insure
absolute reliability, that is, to guarantee that the Germans will
always lead the Danes? This question is easily answered
with the help of Fig. 2. If the point −3σ below the mean
(the point taken at −.37) were the zero-difference point, we
should then be practically certain, since the whole curve of
differences would lie to the right of this point, of a true difference
always greater than zero. To accomplish this, however, i.e., to
shift the zero-difference point down to −.37, the mean difference
would have to be .37 + .19 or .56. This new difference (D)
divided by σ(diff.) would equal .56/.1857 or 3σ, and the chances
would then be 9986.5 in 10,000 that the true difference between
Germans and Danes on the combined scale will always be
greater than zero.
We may summarize the preceding paragraphs as follows.
The obtained difference between the averages of the Germans
and Danes on the combined scale is found to be .19, or 1/3
(approximately) of what it should be (.56) to insure a
completely reliable difference. The obtained difference is large
enough, however, to guarantee that 4 times in 5 the average
score of the native born Germans will be higher than the
average score of the native born Danes.¹
Once we understand what the σ(diff.) formula means, the
reliability of an obtained difference in terms of "chances that
the obtained difference represents a true difference greater
than zero" may be conveniently read from Table XIV. For
example, when D = .19 and σ(diff.) = .1857, so that D/σ(diff.) = 1.02,
we find at once from the table that the chances are 84 in 100
that the true difference is greater than zero. Moreover, since a
D/σ(diff.) of 3 means practically complete reliability, we know that a
D/σ(diff.) of 1.02 is 1.02/3, or about 34%, of what it should be in
order to insure a difference always greater than zero.
It is usually customary to take a D/σ(diff.) of 3 as indicative of
complete reliability, since −3σ includes practically all of the
cases in the "distribution of differences" below the mean (see
Diagram XV, Fig. 2). A D/σ(diff.) greater than 3 is to be taken as
indicating just so much added reliability.
1 Assuming that the samples used represent adequately (at least as
adequately as the present samples) the population of native born Germans
and Danes.

TABLE XIV
To Find the Chances of a True Difference Greater than Zero,
Given the Actual Difference between the Two Obtained
Measures, and the σ(diff.)
For example: a D/σ(diff.) of 1.3 means that the chances are 90 in 100 that the
true difference (the difference between the true measures) is greater than zero.
Note. The "chances in 100" increase so slowly after 1.50 that the D/σ(diff.)
column increases thereafter by .10 instead of by .05.

D/σ(diff.)   Chances in 100      D/σ(diff.)   Chances in 100
   .00            50                1.15           87
   .05            52                1.20           88
   .10            54                1.25           89
   .15            56                1.30           90
   .20            58                1.35           91
   .25            60                1.40           92
   .30            62                1.45           93
   .35            64                1.50           93
   .40            65                1.60           94
   .45            67                1.70           96
   .50            69                1.80           96
   .55            71                1.90           97
   .60            73                2.00           98
   .65            74                2.10           98
   .70            76                2.20           99 (98.6)
   .75            77                2.30           99 (98.9)
   .80            79                2.40           99 (99.2)
   .85            80                2.50           99 (99.4)
   .90            82                2.60           99 (99.5)
   .95            83                2.70          100 (99.7)
  1.00            84                2.80          100 (99.74)
  1.05            85                2.90          100 (99.8)
  1.10            86                3.00          100 (99.9)

B. The Reliability of the Difference in Terms of the PE(diff.)
The reliability of the difference between two obtained means
may be measured by the PE(diff.) as well as by the σ(diff.). The
formula for PE(diff.) is
$$PE_{(diff.)} = \sqrt{PE^{2}_{(av.\,1)} + PE^{2}_{(av.\,2)}} \qquad (20)$$
in which PE(av. 1) and PE(av. 2) are the PE's of the two given
obtained averages. Formula (20) is interpreted in exactly the
same manner as formula (19); a problem will illustrate its use.
Problem (4): On the two halves of the Woodworth-Wells
Substitution Test,¹ timed separately, 200 Barnard Freshmen
made the following records:

               Average (Secs.)    σ(dis.)
First half          65.51          11.13
Second half         60.32          12.04

Is this gain in time from the first to the second half of the test
sufficiently large to indicate a true difference in the time
required to learn the key after practice, or would further testing
with other groups probably reduce, or even reverse, the gain?
1 Carothers, F. E., Psychological Examination of College Students, Archives of
Psychology, 46, 1921, p. 36.

TABLE XV
To Find the Chances of a True Difference Greater than Zero,
Given the Actual Difference between the Two Measures
and the PE(diff.)
For example: a D/PE(diff.) of 1.10 means that there are 77 chances in 100 that
the true difference (the difference between the true measures) is greater than zero.
Note. The "chances in 100" increase so slowly after 2.0 that the D/PE(diff.)
column increases thereafter by .10 instead of .05.

D/PE(diff.)  Chances in 100      D/PE(diff.)  Chances in 100
   .00            50                1.55           85
   .05            51                1.60           86
   .10            53                1.65           87
   .15            54                1.70           87
   .20            55                1.75           88
   .25            57                1.80           89
   .30            58                1.85           89
   .35            59                1.90           90
   .40            61                1.95           91
   .45            62                2.00           91
   .50            63                2.10           92
   .55            64                2.20           93
   .60            66                2.30           94
   .65            67                2.40           95
   .70            68                2.50           95
   .75            69                2.60           96
   .80            71                2.70           97 (96.6)
   .85            72                2.80           97
   .90            73                2.90           97 (97.5)
   .95            74                3.00           98 (97.9)
  1.00            75                3.10           98
  1.05            76                3.20           98 (98.5)
  1.10            77                3.30           99 (98.7)
  1.15            78                3.40           99 (98.9)
  1.20            79                3.50           99
  1.25            80                3.60           99
  1.30            81                3.70           99
  1.35            82                3.80           99 (99.5)
  1.40            83                3.90          100 (99.6)
  1.45            84                4.00          100 (99.7)
  1.50            84
First, to find the probable errors of the two averages:
First half:
$$PE_{(av.\,1)} = \frac{.6745 \times 11.13}{\sqrt{200}} = .5310 \qquad \text{By formula (14)}$$
Second half:
$$PE_{(av.\,2)} = \frac{.6745 \times 12.04}{\sqrt{200}} = .5743 \qquad \text{By formula (14)}$$
Substituting PE(av. 1) and PE(av. 2) in formula (20) we have
$$PE_{(diff.)} = \sqrt{(.5310)^{2} + (.5743)^{2}} = .7822$$
The obtained difference, D, is 5.19 and the PE(diff.) is .7822.
Therefore, D/PE(diff.) is 6.64, and since we find from Table XV
(to be read exactly like Table XIV) that a D/PE(diff.) of 4 indicates
complete reliability, it follows that our obtained difference is not
only completely reliable, but is 2.64PE (6.64 − 4.00) or about
66% larger than it need be in order to insure a true difference
greater than zero.
Just as it is customary to take a D/σ(diff.) of 3 as indicative of
complete reliability, so a D/PE(diff.) must be at least 4 in order
to insure complete reliability.
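Problem (4) may be retraced in Python as a check. The sketch applies formula (14) to each half and formula (20) to the difference, carrying full precision throughout, so its intermediate figures may differ from the printed ones in the fourth decimal place. The function names and the constant `COMPLETE_RELIABILITY_PE` (the conventional ratio of 4) are labels of our own.

```python
import math

PE_FACTOR = 0.6745
COMPLETE_RELIABILITY_PE = 4.0  # conventional D/PE(diff.) for "complete reliability"

def probable_error_of_mean(sigma_dis, n):
    """Formula (14): PE(av.) = .6745 * sigma(dis.) / sqrt(N)."""
    return PE_FACTOR * sigma_dis / math.sqrt(n)

def probable_error_of_difference(pe_1, pe_2):
    """Formula (20): PE(diff.) = sqrt(pe_1^2 + pe_2^2)."""
    return math.hypot(pe_1, pe_2)

pe_first = probable_error_of_mean(11.13, 200)
pe_second = probable_error_of_mean(12.04, 200)
pe_diff = probable_error_of_difference(pe_first, pe_second)
ratio = (65.51 - 60.32) / pe_diff

print(round(pe_first, 2), round(pe_second, 2), round(pe_diff, 2))  # 0.53 0.57 0.78
print(round(ratio, 1), ratio >= COMPLETE_RELIABILITY_PE)           # 6.6 True
```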
2. The Reliability of the Difference between Two Medians
The two formulas (19) and (20), used in finding the
reliability of the difference between two means, may be used also
for finding the reliability of the difference between two medians
when written:
$$\sigma_{(diff.)} = \sqrt{\sigma^{2}_{(mdn.\,1)} + \sigma^{2}_{(mdn.\,2)}} \qquad (21)$$
and
$$PE_{(diff.)} = \sqrt{PE^{2}_{(mdn.\,1)} + PE^{2}_{(mdn.\,2)}} \qquad (22)$$
We may illustrate these formulas by a problem:
Problem (5): The following results were obtained from a
group of 12-year-old boys and a group of 12-year-old girls,
Grades III to VIII inclusive, on the Trabue Language
Scale A.¹

            N      Median     Q
Boys       801     21.40     4.9
Girls      448     22.80     5.3

The actual difference between the two medians is 1.4
points in favor of the girls. Assuming that the two groups
are fairly unselected, is this difference sufficiently large to
insure a true difference greater than zero in favor of the girls?
Since the measure of variability given is the Q, we shall use
the formula for PE(diff.). First, to find the reliability of the
two medians:
For girls:
$$PE_{(mdn.)} = \frac{5}{4}\cdot\frac{5.3}{\sqrt{448}} = .3130 \qquad \text{By formula (16a)}$$
For boys:
$$PE_{(mdn.)} = \frac{5}{4}\cdot\frac{4.9}{\sqrt{801}} = .2164 \qquad \text{By formula (16a)}$$
Substituting in (22) we have,
$$PE_{(diff.)} = \sqrt{(.3130)^{2} + (.2164)^{2}} = .3805$$
The obtained difference is 1.4 and the PE(diff.) is .3805.
Therefore, D/PE(diff.) is 3.68, and from Table XV we find that
the chances are 99.3 in 100 that there is a difference greater
than zero between the true median scores of 12-year-old boys
and girls. This D/PE(diff.) is 92% (3.68/4.00) of what
it should be conventionally in order to guarantee complete
reliability. However, it is sufficiently high to be taken,
for all practical purposes, as completely reliable.
1 Completion-Test Language Scales, 1916, p. 15.
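The figures of Problem (5) can be verified with the sketch below, which assumes formula (16a) in the form PE(mdn.) = (5/4) Q/√N and converts the final ratio to chances through the normal curve (1 PE = .6745σ) rather than through Table XV. All function names are illustrative.

```python
import math

PE_IN_SIGMA = 0.6745

def pe_of_median_from_q(q, n):
    """Formula (16a), as assumed here: PE(mdn.) = (5/4) * Q / sqrt(N)."""
    return 1.25 * q / math.sqrt(n)

def normal_cdf(z):
    """Proportion of a normal distribution lying below z (in sigma units)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

pe_girls = pe_of_median_from_q(5.3, 448)
pe_boys = pe_of_median_from_q(4.9, 801)
pe_diff = math.hypot(pe_girls, pe_boys)          # formula (22)
ratio = (22.80 - 21.40) / pe_diff                # D / PE(diff.)
chances = 100 * normal_cdf(ratio * PE_IN_SIGMA)  # chances in 100

print(round(pe_girls, 4), round(pe_boys, 4), round(pe_diff, 4))  # 0.313 0.2164 0.3805
print(round(ratio, 2), round(chances, 1))                        # 3.68 99.3
```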
V. Some Problems Which Involve Measures of
Reliability
This Section is designed to illustrate a variety of problems
which require in their solution the reliability formulas given
in this Chapter and the frequency tables. For quick reference
later, each group of examples is preceded by a general state-
ment of the essential problem involved.
A. To Find the Probability That the True Average is Greater or
Less than Some Designated Point on the Scale, or That
it Falls within Given Limits
Problem (1): Given Average(obt.) = 30.2, σ(dis.) = 6.00,
N = 100. On the assumption that this sample is fairly
representative of the population from which it is drawn, (a) what
is the reliability of the obtained average? (b) What are the
chances that the true average is less than 29? (c) greater
than 31.5? (d) that the true average lies between 28 and 31?
(a) From formula (13) we find that the σ(av.) is .6; hence
the chances are 68 in 100 that the obtained average does not
diverge from the true average by more than ±.6, and that
the true average falls between the limits 30.8 and 29.6.
Moreover, the chances are 99.7 in 100 that 30.2 does not
diverge from the true average by more than ±.6 × 3 or ±1.8;
i.e., that the true average falls within the limits 28.4 and 32.
These results are represented graphically in Diagram XV,
Fig. 3. This normal probability distribution represents the
distribution of means that we should expect to get from a
large number of random samples, selected in the same way as
the sample we have.¹ The central tendency of this
hypothetical distribution of means is taken at 30.2, the actually
obtained, and hence the most probable, mean. The standard
deviation of the distribution is .6, the standard error of the
given obtained mean.
1 See the discussion on pages 122-123.
(b) What are the chances that the true mean is less than 29?
29 lies 1.2 points or 2σ below the obtained mean of 30.2
(see Fig. 3). From Table X, we find that 4772 cases in 10,000
fall between the mean and 2σ in a normal distribution; and,
accordingly, 5000 − 4772 or 228 cases must lie below −2σ. The
chances are 228 in 10,000, therefore, that the true mean lies
below (is less than) 29.
(c) What are the chances that the true mean is greater
than 31.5? This score is 1.3 points or 2.17σ above the
obtained mean. There are 4850 cases in 10,000 between the
mean and 2.17σ in a normal distribution, and 5000 − 4850 or
150 cases above this point. Hence the chances are 150 in 10,000
or about 2 in 100 that the true mean is greater than 31.5 (i.e.,
lies above 2.17σ).
(d) What are the chances that the true mean lies between
28 and 31? 28 is 2.2 points or −3.67σ from the mean; and
31 is .8 of a point or 1.34σ from the mean. Between the mean
and −3.67σ in a normal distribution are 4999 cases in 10,000,
and between the mean and 1.34σ are 4099 cases in 10,000.
Within the interval from −3.67σ to 1.34σ, therefore, we find
4999 + 4099 or 9098 cases. Stated as chances, there are about
91 chances in 100 that the true average lies between 28 and 31.
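The four parts of Problem (1) reduce to areas under a normal curve centered at the obtained mean with σ(av.) as its standard deviation. The Python sketch below computes those areas directly instead of reading Table X, so its results may differ from the text's in the last figure. The function names and the `low`/`high` parameters are our own.

```python
import math

def normal_cdf(z):
    """Proportion of a normal distribution lying below z (in sigma units)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def chance_true_mean_between(obtained_mean, se_mean,
                             low=-math.inf, high=math.inf):
    """Chance that the true mean lies between low and high, taking the
    distribution of means as normal about the obtained mean."""
    z_low = (low - obtained_mean) / se_mean
    z_high = (high - obtained_mean) / se_mean
    return normal_cdf(z_high) - normal_cdf(z_low)

se = 6.00 / math.sqrt(100)  # formula (13): sigma(av.) = .6
print(round(chance_true_mean_between(30.2, se, high=29), 4))         # (b)
print(round(chance_true_mean_between(30.2, se, low=31.5), 4))        # (c)
print(round(chance_true_mean_between(30.2, se, low=28, high=31), 4)) # (d)
```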
Problem (2): Given Average(obt.) = 26.4, PE(av.) = 1.5.
What are the chances that the true average of the group of
which the given group is a random sample is (a) as large as 30?
(b) as small as 24?
As in Problem (1), this situation may be represented by a
normal probability curve, with the mean at 26.4 and PE equal
to 1.5 (see Diagram XV, Fig. 4).
(a) What are the chances that the true average of the group
is as large as 30? 30 is 3.6 points or 2.4 PE above the obtained
average of 26.4. There are 4472 cases in 10,000 between the
mean and 2.4 PE in a normal distribution (Table XI); and
5000 − 4472 or 528 cases above 2.4 PE, i.e., above 30. Hence
the chances are 528 in 10,000 or about 5 in 100 that the true
average is as large as (or larger than) 30.
(b) What are the chances that the true average is as small as
24? 24 lies 2.4 points or −1.6 PE from the mean. There are
3597 cases in 10,000 between the mean and −1.6 PE in a normal
distribution, and 5000 − 3597 or 1403 cases below −1.6 PE.
The chances are 1403 in 10,000, therefore, that the true average
is as small as (or smaller than) 24.
B. To Find the Probability That the Divergence of an Obtained
Measure from its True Measure Will be within Given
Limits
Problem (3): Given Average(obt.) = 152.7 and σ(av.) = 4.5.
Find the probability that the given obtained average will not
diverge (or vary) from the true by more than (a) 1 point,
(b) 3 points, (c) 5 points, (d) 10 points.
(a) This is essentially the same problem, expressed in a slightly
different way, as the problems under A. To find the probability
that the obtained average does not differ from the true by more
than +1 or −1, we must find the chances that the true mean lies
within the limits 152.7 ± 1, i.e., between 151.7 and 153.7. (This is
shown in Diagram XV, Fig. 5.) A deviation of ±1 point is a
deviation of ±1/4.5 or ±.222σ from the obtained mean. From
Table X we find that 880 cases in 10,000 in a normal distribution
fall between the mean and +.222σ or −.222σ. Accordingly, 880 × 2
or 1760 cases fall within the interval +.222σ to −.222σ, and the
chances are 1760 in 10,000 that the obtained mean will not
diverge from the true mean by more than ±1 point.
(b) Three points are ±3/4.5 or ±.667σ from the mean. There
are 2475 × 2 or 4950 cases within the interval .667σ measured
off to the right and left of the mean. Hence there are 4950
chances in 10,000 that the obtained mean will not diverge from
the true mean by more than ±3 points.
(c) Five points are ±5/4.5 or ±1.11σ from the mean. Hence
there are 3665 × 2 or 7330 chances in 10,000 that the obtained
average will not differ from the true average by more than ±5
points.
(d) Ten points are ±10/4.5 or ±2.22σ from the mean; and
accordingly there are 4868 × 2 or 9736 chances in 10,000 that
the obtained mean will not diverge from the true mean by more
than ±10 points.
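Each part of Problem (3) asks for the area of the normal curve within a symmetric interval about the mean, which is given directly by the error function. The following Python sketch is a check on the table readings; its results agree with them to within a few cases in 10,000. The function name `chance_within` is our own.

```python
import math

def chance_within(points, se):
    """Chance that an obtained measure diverges from the true measure
    by no more than +/- points, given its standard error."""
    return math.erf(points / (se * math.sqrt(2)))

for points in (1, 3, 5, 10):
    chances = round(10000 * chance_within(points, 4.5))
    print(points, "points:", chances, "chances in 10,000")
```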
C. To Find the Probability That the True Difference between the
Measures of Two Groups is Greater or Less than a Given
Amount
Problem (4): The difference between two obtained means
is 3. σ(diff.) = 1.5. (a) What are the chances that the
true difference between the means of the two groups is greater
than 0? (b) greater than 1? (c) greater than 3?
(a) Zero difference is 3/1.5 or 2σ below the mean of differences,
viz., 3 (see Diagram XV, Fig. 6). There are 4772 cases in 10,000
between the mean of a normal distribution and 2σ. Accordingly,
there are 5000 + 4772 or 9772 chances in 10,000 that the true
difference is greater than zero. (Note that this result may be
read off directly from Table XIV, since D/σ(diff.) = 2.)
(b) One is 2/1.5 or 1.33σ below the mean. There are 4082
cases in 10,000 in a normal distribution between the mean and
1.33σ. The chances, therefore, are 5000 + 4082 or 9082 in 10,000
that the true difference is greater than 1.
(c) What are the chances that the true difference is greater
than 3? The obtained difference of 3 has been placed at the
mean of differences as the obtained, and hence the most
probable, difference. The chances are even, therefore, or 50-50, that
the true difference is greater (or less) than 3. Note that here the
distance from the mean in σ terms is 0/1.5 or 0. (Table XIV.)
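The three parts of Problem (4) follow one pattern: measure the distance from the given amount up to the obtained difference in σ(diff.) units, and take the area of the normal curve above that point. A minimal Python sketch, with function names of our own choosing:

```python
import math

def normal_cdf(z):
    """Proportion of a normal distribution lying below z (in sigma units)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def chance_true_difference_exceeds(amount, obtained_diff, se_diff):
    """Chance that the true difference is greater than `amount`, taking the
    distribution of differences as normal about the obtained difference."""
    return normal_cdf((obtained_diff - amount) / se_diff)

for amount in (0, 1, 3):
    p = chance_true_difference_exceeds(amount, 3, 1.5)
    print("greater than", amount, ":", round(p, 4))
```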
VI. Limitations to Reliability Formulas, and Cautions
to be Observed in Interpreting Them
The formulas which have been given in this chapter for
calculating the standard errors of obtained measures of central
tendency and variability make use of only two characteristics
of the distribution from which the measure has been obtained,
viz., the a (distribution) — the spread of the measures — and
N, the number of cases. It is obvious that so far as the
formulas themselves are concerned there is nothing which
would prevent our finding a standard error for a measure
obtained from any group. Such a general and uncritical appli-
cation of reliability formulas, however, will almost surely lead
to erroneous conclusions, and for this reason it is necessary to
indicate briefly some of the limitations to reliability formulas
as well as some cautions to be observed in interpreting results
secured from them.
(1) In the first place, in interpreting standard errors we
always make the assumption that measures obtained from
successive samples are distributed according to the normal
probability curve. This assumption is only true, however,
when the number of cases is large; it is not valid when the
sample is small. Hence the significance of a measure of relia-
bility is conditioned upon our having a sufficiently large number
of cases. If N is less than 25, there is little sense or justifica-
tion in using reliability measures. One simple and practical
method of judging whether the sample is "sufficiently" large
is to continue taking independent measures or adding cases
drawn at random, until the addition of extra cases fails to
produce an appreciable fluctuation in the average or median.
When this point is reached the sample is probably large enough
to be taken as fairly representative of the larger group from
which it has been drawn. As a corollary it must be recognized,
however, that mere numbers are not in themselves a guarantee
of a representative sample.
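The practical rule in (1), to go on adding cases until the average ceases to fluctuate appreciably, may be sketched in Python. The text gives no numerical criterion, so the `tolerance`, the `window` of recent cases examined, and the `min_cases` floor of 25 (echoing the caution about N below 25) are all assumptions made for illustration.

```python
def cases_until_stable(measures, tolerance, window=10, min_cases=25):
    """Return the number of cases at which the running average has moved
    by less than `tolerance` over the last `window` additions, or None
    if it never settles. All three thresholds are illustrative."""
    total = 0.0
    running_averages = []
    for count, value in enumerate(measures, start=1):
        total += value
        running_averages.append(total / count)
        if count >= max(min_cases, window + 1):
            recent = running_averages[-(window + 1):]
            if max(recent) - min(recent) < tolerance:
                return count
    return None

# A sample whose average never changes is judged stable at the floor of 25.
print(cases_until_stable([5.0] * 40, tolerance=0.01))  # 25
```

As the text cautions, a stable average shows only that more cases of the same kind would change little; it does not show that the sample is representative.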
(2) A more serious limitation to the measures of reliability
arises from the fact that standard and probable errors of
obtained measures can be assumed to measure only those errors
which result from fluctuations due to "random sampling."
An illustration will make this term clear. On page 122 we
found that the obtained average height of 8585 adult British
males was 67.46 inches with a standard error of .0277 inch.
This means that the chances are 997 in 1000 that the true
average height of British males lies between 67.54 and 67.38
inches. Now by "true average height" we mean the average
height of all British males, from whom our group of 8585
is an attempted random sampling. If our group were per-
fectly representative, its average would equal the true aver-
age exactly. Except by chance, however, neither this sample
nor another similarly selected, and approximately of the same
size, will represent the entire population perfectly; and further-
more, it is extremely unlikely that the averages calculated
from successive samples will equal each other. Nevertheless,
if the samples are actually random, and there are no large con-
stant errors present, the calculated averages will tend to vary
around the true average of the whole group within a
comparatively small range. Variations like these, which arise from the
fact that we must generally work with samples instead of the
whole population, are called "errors of sampling."
The function of the standard and of the probable errors is to
give a measure of this sampling error, i.e., of the probable amount
of deviation to be expected in an obtained measure from the
corresponding true measure, as a result of working with a single
sample. In other words, the standard or probable error meas-
ures the error made in taking a sample as representative of the
larger group or population. If the standard error of a given
mean is small, it does not follow that the obtained mean is
highly reliable, necessarily; a small standard error indicates
merely that the reliability is high, in so far as fluctuations due
to differences in sampling are concerned.
Reliability formulas give no measure of the effects of errors
due to other causes than those which arise from sampling.
Errors which arise from the failure to get a random sample, for
example, are neither detected nor measured by these formulas.
To illustrate this point, the average Army Alpha score made
by 500 college men between the ages of 18 and 25 will not be
representative of the male population of this age-range. Col-
lege men form a highly selected group, and in consequence,
other samples of 500 drawn at random from the male population
between the ages of 18 and 25 will return very different results
from that of the college group. These differences in average
score cannot be attributed to errors of sampling; and to take
this group as representative of the general male population
between the ages of 18 and 25, and to calculate the standard
error of its average will lead to an entirely erroneous idea of the
intelligence of the general population. (The given sample
might, of course, serve very well as a group representative of
the population of college men.)
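The point may be made concrete with made-up figures. The sketch below is an illustration only: the population is simulated (it is not Army Alpha data), the "selected" group is simply taken as the top tenth of that population, and the seed and all names are our own assumptions. It shows that the selected sample has a small standard error and yet lies far from the true average.

```python
import random
import statistics

rng = random.Random(1926)  # fixed seed so that the sketch is repeatable

# Hypothetical population of 20,000 scores (simulated, not real test data).
population = [rng.gauss(100, 20) for _ in range(20000)]
true_mean = statistics.mean(population)

# A highly selected subgroup: the top tenth of the population by score.
selected_group = sorted(population)[-2000:]

def mean_and_se(sample):
    """Return the average of a sample and its standard error, sigma / sqrt(N)."""
    sd = statistics.pstdev(sample)
    return statistics.mean(sample), sd / len(sample) ** 0.5

random_mean, random_se = mean_and_se(rng.sample(population, 500))
selected_mean, selected_se = mean_and_se(rng.sample(selected_group, 500))

print(f"true average            {true_mean:7.2f}")
print(f"random sample of 500    {random_mean:7.2f}  (standard error {random_se:.2f})")
print(f"selected sample of 500  {selected_mean:7.2f}  (standard error {selected_se:.2f})")
```

The standard error of the selected sample is the smaller of the two, but it measures only the fluctuation among samples drawn from the selected group; it says nothing of the distance between that group and the general population.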
Other variations not measured by the reliability formulas
arise from errors due to practice, fatigue, coachability of tests,
faulty technique in giving and scoring tests, and, in fact, errors
due to a bias of any sort. Standard errors calculated for measures
secured from samples which contain such errors will always be
of doubtful value.
The careful study of successive samples, retests when
practicable, care in controlling conditions, and the use of
objective checks whenever possible, will eliminate many of
these troublesome and prolific sources of error. Assuming
that constant errors are small or practically negligible, one
of the simplest tests of the adequacy (the "representativeness")
of a sample consists in taking several other groups
of approximately the same size from the general population.
If the measures calculated from these groups are of very nearly
the same size, we may be reasonably assured that we have
representative samples. If the similarity is not fairly close,
we must continue adding cases until the successive samples
are approximately similar. Oftentimes more information may
be secured in regard to the reliability of our measures in this
way than could be obtained from a blanket use of reliability
formulas.
(3) In concluding this discussion, we should add one word
in regard to the use of formulas which measure the reliability
of the difference between two obtained measures, namely,
σ(diff.) and PE(diff.). These formulas make allowance only for
variable errors in the original measures — for errors which
arise in sampling. Constant errors in the original scores and
errors of the sort mentioned above are not detected, nor their
influence measured. Furthermore, these formulas always
assume that the measures or scores in the two series which are
compared are uncorrelated (see page 288). These limitations
must be borne in mind when using or interpreting differences
in terms of the "true" difference.
VII. Summary of Reliability Formulas
1. The Reliability of Measures of Central Tendency
(1) The Average or Mean
1. $\sigma_{(aver.)} = \frac{\sigma_{(dis.)}}{\sqrt{N}}$ (13)
2. $PE_{(aver.)} = \frac{.6745\,\sigma_{(dis.)}}{\sqrt{N}}$ (14)
(2) The Median
1. $\sigma_{(mdn.)} = \frac{5}{4} \cdot \frac{\sigma_{(dis.)}}{\sqrt{N}}$ (15)
2. $PE_{(mdn.)} = \frac{5}{4} \cdot \frac{.6745\,\sigma_{(dis.)}}{\sqrt{N}}$ (16)
3. $PE_{(mdn.)} = \frac{5}{4} \cdot \frac{Q}{\sqrt{N}}$ (16a)
2. The Reliability of Measures of Variability
(1) The Standard Deviation
1. $\sigma_{\sigma} = \frac{\sigma_{(dis.)}}{\sqrt{2N}}$ (17)
(2) The Quartile Deviation
1. $\sigma_{Q} = \frac{1.11\,\sigma_{(dis.)}}{\sqrt{2N}}$ (18)
2. $PE_{Q} = \frac{.786\,Q}{\sqrt{N}}$ (18a)
3. The Reliability of the Difference between Two Measures
(1) The Average
1. $\sigma_{(diff.)} = \sqrt{\sigma^2_{(aver.\,1)} + \sigma^2_{(aver.\,2)}}$ (19)
2. $PE_{(diff.)} = \sqrt{PE^2_{(aver.\,1)} + PE^2_{(aver.\,2)}}$ (20)
(2) The Median
1. $\sigma_{(diff.)} = \sqrt{\sigma^2_{(mdn.\,1)} + \sigma^2_{(mdn.\,2)}}$ (21)
2. $PE_{(diff.)} = \sqrt{PE^2_{(mdn.\,1)} + PE^2_{(mdn.\,2)}}$ (22)
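Several of these formulas are easily put into a few small functions. The following is a minimal sketch, assuming formulas (13), (17) and (19) in their usual form; the function names are our own, and the "chances" are taken from the exact normal curve rather than from a printed table, so a result may now and then differ by a unit in the last place from one worked by table. The figures are those of Problems 1 and 3 below.

```python
import math

PE_FACTOR = 0.6745  # PE = .6745 sigma, the factor appearing in formula (14)

def se_mean(sd, n):
    """Formula (13): standard error of an average."""
    return sd / math.sqrt(n)

def se_sd(sd, n):
    """Formula (17): standard error of a standard deviation."""
    return sd / math.sqrt(2 * n)

def se_diff(se_1, se_2):
    """Formula (19): standard error of the difference between two averages."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

def normal_cdf(z):
    """Area of the unit normal curve lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Problem 1: obtained average 26.4, sigma 3.2, N 100.
se_av = se_mean(3.2, 100)
chance_above_27 = 1.0 - normal_cdf((27 - 26.4) / se_av)
chance_26_to_27 = normal_cdf((27 - 26.4) / se_av) - normal_cdf((26 - 26.4) / se_av)

# Problem 3: two averages, each with its own sigma and N.
se_1 = se_mean(3.54, 100)
se_2 = se_mean(5.36, 225)
se_d = se_diff(se_1, se_2)
chance_true_difference = normal_cdf((29.6 - 28.4) / se_d)

print(f"sigma(aver.) = {se_av:.3f}, PE(aver.) = {PE_FACTOR * se_av:.3f}")
print(f"chances in 100 that the true average exceeds 27: {100 * chance_above_27:.0f}")
print(f"chances in 100 that it lies between 26 and 27: {100 * chance_26_to_27:.0f}")
print(f"sigma(diff.) = {se_d:.3f}")
print(f"chances in 100 of a true difference: {100 * chance_true_difference:.0f}")
```

A difference of three times σ(diff.) is treated here as "completely reliable," which is the sense in which the term is used in the problems.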
PROBLEMS
Note: For uniformity in figuring "chances" in the following problems,
take all σ and PE distances to three decimals and correct back to the second
place. Count all fractions over one half as wholes and drop all under one
half. For example, write 1.876σ as 1.88σ; .023 PE as .02 PE, etc.
1. Given that the obtained average is 26.4; σ is 3.2; N is 100.
(a) What are the chances that the true average for the 10,000 from
which the 100 cases measured are a random sampling will
be greater than 27?
(b) That it will be between 26 and 27?
(c) What are the chances that the true variability will be between
3.1 and 3.3?
(d) That the true variability will be less than 3.5?
2. Given: Median = 72.40. Q = 12.84. N = 81.
(a) What are the chances that the true median of the population
from which this random sample is drawn is above 75?
(b) That it lies between 70 and 74?
(c) What are the chances that the true Q is not greater than 15?
(d) That it lies between 10 and 14?
3. Given: Av. 1 = 29.6. σ(dis.) = 3.54. N = 100.
Av. 2 = 28.4. σ(dis.) = 5.36. N = 225.
(a) Find the σav. for both distributions.
(b) Find the reliability of the difference between the means.
(c) What difference would be completely reliable, assuming that
the variability remains practically unchanged?
4. In Example 2, page 56, find the reliability of the difference between
the means of distributions A and B [use the σ(diff.)].
5. Average (obt.) = K. PE(av.) = 3.5. What are the chances that the
true average will not diverge from the obtained by more than
(a) 1, (b) 3, (c) 10?
6. Given that Mdn. 1 - Mdn. 2 = 3.6. PE(diff.) = 3.0.
(a) What are the chances that the true difference is less than 0?
(b) That it is 1 or more?
(c) What per cent is the obtained difference of the difference
necessary for complete reliability?
7. Find the reliability of the average in
(a) Example 4, page 116.
(b) Example 5, page 116.
8. In a random sample of 100 cases each from the four groups A, B, C,
and D, the following are obtained:
A. Average = 101. σ(dis.) = 10.0.
B. Average = 104. σ(dis.) = 11.0.
C. Average = 93. σ(dis.) = 9.6.
D. Average = 86. σ(dis.) = 8.5.
What are the chances that, in general, the average of
(a) the A's is better than the average of the B's?
(b) the A's is 5 better than the average of the C's?
(c) the A's is 10 better than the average of the D's?
What are the chances that
(a) a B will be better than the average A?
(b) a B will be better than the average C?
(c) a B will be better than the average D?
ANSWERS
1.
(a) 3 in 100.
(b) 86 in 100.
(c) 34 in 100.
(d) 91 in 100.
2.
(a) 16 in 100.
(b) 55 in 100.
(c) 90 in 100.
(d) 71 in 100.
3.
(a) σav. 1 = .354. σav. 2 = .357.
(b) 99 chances in 100 of a true difference.
(c) 1.51.
4.
92 chances in 100 of a true difference.
5.
(a) 15 in 100.
(b) 44 in 100.
(c) 95 in 100.
6.
(a) 21 in 100.
(b) 72 in 100.
(c) 30%.
7.
(a) σav. = .0791.
(b) PEav. = .318.
8. (Table XIV)
(a) 222 in 10,000.
(b) 9846 in 10,000 or 99 in 100.
(c) 9999.277 in 10,000 (100%).
(a) 61 in 100.
(b) 84 in 100.
(c) 95 in 100.
CHAPTER IV
CORRELATION
I. What is Meant by Correlation
Up to this point in our discussion we have concerned our-
selves chiefly with methods of computing statistical measures
which shall represent in a reliable way the performance of an
individual or a group in some defined capacity or trait. Frequently,
however, it is of greater importance to examine the
relation of some capacity, such as general intelligence, to
some other capacity, such as musical ability, than to measure
performance in a single trait alone. For example, we may
ask whether there is any relation between general intelligence
as measured by a standard intelligence test and scholastic
achievement as measured by " grades " or " marks." Or,
more specifically, we may inquire whether an individual who
gives evidence of high general intelligence tends to outstrip the
average individual in school work. Again, knowing the ability
of an individual in one test, can we say anything about his
ability in another and different test? Are certain abilities
highly related, and others relatively independent? These
questions, and others of the same general nature, are studied
by the Method of Correlation.
The statistical device whereby relationship is expressed
on a quantitative scale is called the " coefficient of correlation,"
and is designated by the letter " r."
Let us consider first the situation where the correlation is
fixed and unchanging. We know that the circumference of
a circle is always 3.1416 times its diameter, no matter how
large or how small the circle, or in what part of the world we
find it. Each time that we increase or decrease the diameter
of a circle, we increase or decrease the circumference by just
3.1416 times the same amount. In short, the relation is fixed
and definite, and hence we say that the " correlation" between
diameter and circumference is perfect, and that r is equal to
1.00. In like manner, if we find that 100 men take exactly the
same arrangement in two tests, so that the man who ranks first
(or highest) in the one ranks first in the other, the man who
ranks second in the first test ranks second in the other, and
that this one-to-one correspondence holds throughout the
entire list, the correlation here is perfect also, for the relative
position of each man is exactly the same in one test as in the
other. The coefficient of correlation, r, is equal to 1.00.
Now let us consider the case where there is just no relation
at all. Suppose that we have examined 100 college seniors
on the Army Alpha test and on a tapping test. The average
Alpha score for the whole group is 175, and the average tap-
ping rate is 185 taps in 30 seconds. Suppose further, that
when we divide our group into three equal parts, the average
Alpha score of the upper one-third is 190, and the average
tapping rate 184; the average Alpha score of the middle third
is 175 with an average tapping rate of 186; and the average
Alpha score of the lowest one-third is 160 with an average
tapping rate of 185. Now clearly since the tapping rate is
almost identical in all three groups, we should be unable to
draw any conclusion from a man's tapping rate alone as re-
gards his probable score on Alpha. An average tapping rate
of, say, 185 to 190, is as liable to be found with an Alpha score of
150 as with one of 175 or even 200. We should be as well
qualified, then, to estimate a man's Alpha score knowing only
his tapping rate as we should be able to estimate it if all we
knew about the man in question was that he had blue eyes
and light hair. In either case our estimate would be no better
than a guess. There is, therefore, little or no correspond-
ence in the degree or amount of capacity possessed by a given
individual in the traits measured by the two tests, and the
coefficient of correlation r will equal zero, which means that
there is just no correlation present.
So far we have indicated that perfect relationship may
be expressed by a coefficient of 1.00, and the absence of relation
by a coefficient of 0. Between these two limits we may
have relations of varying degree, indicated by such coeffi-
cients as .30, .60, .90. In every case a coefficient between
0 and 1.00 implies some degree of positive association, the
degree of association depending on size of the coefficient.
Relation may be negative as well as positive, however.
That is, a large degree of one ability may be associated with a
small degree of another, or vice versa. When this inverse
relation is perfect, r equals -1.00. To illustrate, suppose that
in a certain group of 25 boys, we find that the boy standing
highest in Latin ranks lowest in Shop Work; that the boy who
stands second in Latin stands next to the bottom in Shop Work;
and that any given boy is found to stand exactly the same
distance from the top of the group in Latin as he stands from the
bottom of the group in Shop Work. Table XVI on p. 152 will
illustrate the situation.
The correspondence here is fixed and definite enough, but
the relation is inverse. Hence the correlation, while perfect,
is negative, and the coefficient of correlation r equals -1.00.
Negative coefficients may range all the way from -1.00
up to 0, just as positive coefficients range from 1.00 down to 0.
Coefficients of correlation, then, may range up and down
on a scale which extends from -1.00 through 0 to +1.00. A
positive correlation indicates a positive relation or correspond-
ence; a zero correlation the absence of relation; and a negative
correlation indicates an inverse relation. While for the sake of
simplicity, we have illustrated above only perfect positive,
perfect negative, and zero correlation, only rarely do we get
coefficients at the extremes of the scale. In most cases calculated
coefficients will be found at intermediate points, e.g.,
at .90, .20, -.30, etc. Such intermediate values as these
are to be interpreted as "high" or "low" in a general way
depending upon how close they are to ±1.00 or 0. A more
complete discussion of the meaning of a correlation coefficient
is given later on page 160.
TABLE XVI
To Illustrate a Correlation of -1.00

Boy    Standing in Latin    Standing in Shop Work
  1            1                     25
  2            2                     24
  3            3                     23
  4            4                     22
  5            5                     21
  6            6                     20
  7            7                     19
  8            8                     18
  9            9                     17
 10           10                     16
 11           11                     15
 12           12                     14
 13           13                     13
 14           14                     12
 15           15                     11
 16           16                     10
 17           17                      9
 18           18                      8
 19           19                      7
 20           20                      6
 21           21                      5
 22           22                      4
 23           23                      3
 24           24                      2
 25           25                      1
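That the standings of Table XVI give a coefficient of exactly -1.00 may be verified directly. The sketch below runs ahead of the text: it assumes the usual product-moment form of r (the sum of the products of paired deviations, divided by the square root of the product of the two sums of squared deviations), which is our assumption about the formula taken up in Section III. The function name is our own.

```python
import math

latin = list(range(1, 26))          # standings 1, 2, ..., 25 in Latin
shop_work = list(range(25, 0, -1))  # standings 25, 24, ..., 1 in Shop Work

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for paired measures."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

r_inverse = pearson_r(latin, shop_work)  # Latin against Shop Work
r_same_order = pearson_r(latin, latin)   # the same order in both tests

print(r_inverse, r_same_order)
```

The second call illustrates the earlier case of the 100 men who keep exactly the same order in two tests, for which r is 1.00.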
II. The Coefficient of Correlation: What It Is, and
What It Does
1. The Coefficient of Correlation as a Ratio
Instead of taking up directly the method of computing
an r, we shall first try in this section to give a clear notion
of just what an r represents and how it measures relationship.
The steps in the calculation of r by the "product-moment"
method, the standard method, will then be given in detail in
the next section.
DIAGRAM XVI
To Show How Correlation May be Expressed as a Ratio
(Each cell entry is the number of tallies in that cell; a dash marks an empty cell.)

                       Weight in Kgs. (X-variable)
Height in Cms.
(Y-variable)    45-49  50-54  55-59  60-64  65-69  70-74  75-79  80-84     Fy  Av. wt.
185-189             -      -      -      -      -      -      -      1      1     82.5
180-184             -      -      1      3      3      4      2      3     16     71.3
175-179             -      -      4     11      6      3      2      2     28     66.4
170-174             -      2      9     11      8      2      1      -     33     62.8
165-169             1      5      7     10      3      -      -      -     26     59.2
160-164             1      2      7      1      2      -      -      -     13     57.9
155-159             1      1      -      1      -      -      -      -      3     54.2
Fx                  3     10     28     37     22      9      5      6    120
Av. ht.         162.5  166.5  169.8  172.8  173.6  178.6  178.5  181.7

(A)                                 (B)
Weight   Av. ht. for given wt.      Height    Av. wt. for given ht.
80-84         181.7                 185-189       82.5 }
75-79         178.5                 180-184       71.3 } 71.9
70-74         178.6                 175-179       66.4
65-69         173.6                 170-174       62.8
60-64         172.8                 165-169       59.2
55-59         169.8                 160-164       57.9
50-54         166.5                 155-159       54.2
45-49         162.5

(A) Increase in average height 19.2 ÷ 6.55 = 2.93
Corresponding increase in actual weight 37.5 ÷ 7.75 = 4.84
Ratio, 2.93/4.84 = .60
(B) Increase in average weight 17.7 ÷ 7.75 = 2.28
Corresponding increase in height 25 ÷ 6.55 = 3.82
Ratio, 2.28/3.82 = .60

Average height = 172.6 cms. σht. = 6.55 cms.
Average weight = 63.4 kgs. σwt. = 7.75 kgs.
Ratio, σwt./σht. = 7.75/6.55 = 1.18

Let us begin with Diagram XVI. This diagram, which is
called a "scatter diagram," represents the paired heights and
weights of 120 college men. The construction of such a scat-
ter diagram is relatively simple. Along the left hand margin
from bottom to top are laid off the steps of the height distribu-
tion; while along the top of the diagram from left to right are
laid off the steps of the weight distribution. Each of the 120
men may now be located on the diagram with respect both to
his height and his weight. Suppose, for example, that a man
weighs 68 kgs. and is 176 cms. tall. His height locates him in
the 3rd row from the top, and his weight in the 5th column
from the left. Accordingly, this man belongs in the third
" cell " of the 5th column and a tally is put in this cell. Note
that in Diagram XVI there are 6 men and 6 tallies in this
cell — that is, there are 6 men who weigh 65 to 69 kgs. and
are 175 to 179 cms. tall. In the manner described every one
of the 120 men has been located in some cell or square
according to the two attributes, height and weight. Along
the bottom of the diagram in the Fx row will be found the
number of men who fall within each weight column (weight
is the X-variable, page 60); while along the right hand margin
in the Fy column are tabulated the number of men who fall
within each height row (height is the Y-variable, page 60).
Of course, both the Fy column and the Fx row total 120, the
number of men in all. All of the frequencies in each cell may
be totaled and written in numerical form as shown in the
diagram. When only the total frequency in each cell is given,
a scatter diagram becomes a correlation table (see Diagram
XXI).
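The rule for placing a man in his cell may be sketched as follows. The sketch assumes that each step-interval runs from its lower limit up to, but not including, the lower limit of the next step (so that 175 to 179 takes in every height from 175 up to 180); the constant and function names are our own.

```python
import math

LOWEST_WEIGHT = 45    # lower limit of the first weight column, kgs.
HIGHEST_HEIGHT = 189  # upper limit of the top height row, cms.
STEP = 5              # width of every step-interval on both scales

def locate(height_cm, weight_kg):
    """Return (row counted from the top, column counted from the left)."""
    row_from_top = (HIGHEST_HEIGHT - math.floor(height_cm)) // STEP + 1
    column_from_left = (math.floor(weight_kg) - LOWEST_WEIGHT) // STEP + 1
    return row_from_top, column_from_left

# The man of the text: 176 cms. tall and weighing 68 kgs.
print(locate(176, 68))
```

For the man of the text the function returns the 3rd row from the top and the 5th column from the left, as stated above.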
Several important facts may be gleaned from the scatter
diagram as it stands. For example, we are able to classify
all the men in a given weight-column with regard to height.
In the 3rd column we find 28 men all of whom weigh 55 to
59 kgs. One of these 28 is 180 to 184 cms. tall; 4 are 175
to 179 cms. tall; 9 are 170 to 174 cms. tall; 7 are 165 to 169
cms. tall; and 7 are 160 to 164 cms. tall. In the same way
we may classify all the men within any height-row according
to weight. In the row next to the bottom we find that
of the 13 men who are 160 to 164 cms. tall, 1 weighs 45 to
49 kgs.; 2 weigh 50 to 54 kgs.; 7 weigh 55 to 59 kgs.; 1
weighs 60 to 64 kgs.; and 2 weigh 65 to 69 kgs. It is fairly
clear, too, that the " drift" of paired heights and weights is
from the upper right section of the diagram (the "high score"
end) to the lower left hand section (the "low score" end).
That is to say, even a superficial examination of the diagram
indicates, in general, a fairly marked tendency for tall, medium,
and short men to rank high, medium, and low, respectively,
on the weight scale; and this observation holds, in spite of the
scatter of heights or weights within any given "array" (an
array is the distribution of cases within a given column or row) .
Without any further evidence, therefore, we should probably
be willing to hazard the guess that the correlation between
height and weight is positive and fairly high.
Suppose that we go a step further and calculate the
average height of the men who weigh 45 to 49 kgs. — the men
in column 1. The average height of these 3 men — using the
guessed average method of Chapter I — is 162.5 cms., and this
figure is entered at the bottom of the diagram. In the same
way, we can find the average height of the men who fall in each
of the succeeding weight-columns. These averages are tabu-
lated under (A) and from the summary it is evident that for an
actual weight increase of approximately 37.5 kgs.1 (from 47.5
to 85) we have a corresponding increase in average height of 19 . 2
cms. (from 162.5 to 181.7). Thus it is clear that in our group
of 120 college men, an increase of approximately 37.5 kgs. in
weight is paralleled by increase of 19.2 cms. in average height.
Before going any further let us shift from height to weight,
and applying the same method as above find the increase in
average weight which corresponds to the actual increase in
height. Taking the bottom row — the 3 men 155 to 159 cms.
tall — we find that the average weight of this small group is
1 The complete range is not taken into account because the data are scanty
at the ends of the distribution.
54.2 kgs. The average weight of the 13 men who are 160 to
164 cms. tall is 57.9 kgs., and in like manner the average
weight of each height-row may be found and entered in the
" Average Weight" column. Summarizing the results for the
group in (B) as we did in (A) above, we find that along with an
increase in height of 25 cms. (160 to 185) there goes a corresponding
increase in average weight of 17.7 kgs.¹ (54.2 to
71.9).
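The averages tabulated under (A) and (B) can be recomputed from the cell frequencies. The sketch below assumes that every man in a cell is counted at the midpoint of his step (47.5 kgs. for the 45 to 49 column, 157.5 cms. for the 155 to 159 row, and so on); the frequencies are those read from Diagram XVI, and the names are our own.

```python
WEIGHT_MIDPOINTS = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]  # kgs.
HEIGHT_MIDPOINTS = [187.5, 182.5, 177.5, 172.5, 167.5, 162.5, 157.5]  # cms., top row first

# Cell frequencies of Diagram XVI, one list per height row, top row first.
FREQ = [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 3, 3, 4, 2, 3],
    [0, 0, 4, 11, 6, 3, 2, 2],
    [0, 2, 9, 11, 8, 2, 1, 0],
    [1, 5, 7, 10, 3, 0, 0, 0],
    [1, 2, 7, 1, 2, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
]

def weighted_mean(values, weights):
    """Average of the values, each counted as many times as its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# (A) average height of the men in each weight column, left to right
columns = list(zip(*FREQ))
av_height_by_weight = [weighted_mean(HEIGHT_MIDPOINTS, col) for col in columns]

# (B) average weight of the men in each height row, top to bottom
av_weight_by_height = [weighted_mean(WEIGHT_MIDPOINTS, row) for row in FREQ]

# The two top rows combined, as in the footnote on overweighting
top_two = [a + b for a, b in zip(FREQ[0], FREQ[1])]
av_weight_top_two = weighted_mean(WEIGHT_MIDPOINTS, top_two)

for mid, av in zip(WEIGHT_MIDPOINTS, av_height_by_weight):
    print(f"weight {mid:5.1f} kgs.: average height {av:6.2f} cms.")
for mid, av in zip(HEIGHT_MIDPOINTS, av_weight_by_height):
    print(f"height {mid:5.1f} cms.: average weight {av:6.2f} kgs.")
```

The results agree with the printed averages to within rounding in the first decimal place.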
Now if the coefficient of correlation measures the mutual
dependence or the degree of correspondence between two sets
of scores or measures, we should expect the ratio
$\frac{\text{increase in average height}}{\text{corresponding increase in weight}}$, e.g., $\frac{19.2}{37.5}$, to measure the
correlation of height and weight, that is, to give us r. And likewise,
and for the same reasons, we should expect the ratio
$\frac{\text{increase in average weight}}{\text{corresponding increase in height}}$, e.g., $\frac{17.7}{25}$, also to measure the
correlation. The two ratios work out, however, to be .51 and
.71 respectively, which means evidently that neither is suitable
as a measure of correlation, since the relation of height to
weight should certainly be the same as the relation of weight
to height in the same group.
The difficulty here — and while not an obvious one, it is easy
to understand once it has been pointed out — is that we have
failed to take account of the fact that the increases in height
and weight, and naturally the ratios formed from them,
depend for their numerical value upon the units which we have
arbitrarily chosen for measuring height and weight. Thus
while we have measured height in cms. and weight in kgs., it is
clear that different units, say, of 1 mm. for height and 1 kg.
for weight, or of 1 inch for height and 1 lb. for weight, would
have given us very different ratios. In other words, the ratios
which give the change in average height with corresponding
change in weight, and the change in average weight with corresponding
increase in height will vary according to the units
¹ The single F in the top row has been combined with the F of the row just
below to prevent overweighting.
in which height and weight are measured, and we have no way
of telling which ratio (or what unit) is the right one. The
best way out of this difficulty is to express the changes in
height and weight in terms of the σ's of the height and weight
distributions, respectively. It will make no difference then
in what units our original measurements have been made, as
changes in both height and weight will be recorded in terms
of σ. The σ of the height distribution of our 120 men is 6.55
cms., and the σ of the weight distribution is 7.75 kgs. (see
Diagram XVI). Accordingly, if we divide the increase in average
height and the parallel increase in weight by 6.55 and 7.75
respectively, the ratio
$\frac{\text{increase in average height}}{\text{corresponding increase in weight}}$ becomes
$\frac{2.93}{4.84}$, or .605 (see Diagram XVI). And in like manner, if we
divide the increase in average weight and the parallel increase in
height by 7.75 and 6.55, respectively, the second ratio,
$\frac{\text{increase in average weight}}{\text{corresponding increase in height}}$, becomes $\frac{2.28}{3.82}$ or .60. The two
ratios are now equal (to two decimals), and either may be taken as representing the
coefficient of correlation¹, as giving the degree of association
between height and weight in our group of 120 men.
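The arithmetic of the last two paragraphs, and the effect of a change of unit, may be followed in a short sketch. Only the figures quoted in the text are used; the change from centimeters to millimeters is a hypothetical one introduced for illustration, and the names are our own.

```python
# Figures quoted in the text for the 120 college men.
increase_av_height_cm = 19.2  # average height, 162.5 to 181.7
increase_weight_kg = 37.5     # actual weight, 47.5 to 85
increase_av_weight_kg = 17.7  # average weight, 54.2 to 71.9
increase_height_cm = 25.0     # actual height, 160 to 185
sigma_height_cm = 6.55
sigma_weight_kg = 7.75

def sigma_ratio(change_in_average, sigma_of_average, change_in_other, sigma_of_other):
    """Ratio of two changes, each first expressed in units of its own sigma."""
    return (change_in_average / sigma_of_average) / (change_in_other / sigma_of_other)

# Ratios in the original units: these disagree with one another.
raw_height_on_weight = increase_av_height_cm / increase_weight_kg
raw_weight_on_height = increase_av_weight_kg / increase_height_cm

# Ratios in sigma units: these agree closely.
r_from_heights = sigma_ratio(increase_av_height_cm, sigma_height_cm,
                             increase_weight_kg, sigma_weight_kg)
r_from_weights = sigma_ratio(increase_av_weight_kg, sigma_weight_kg,
                             increase_height_cm, sigma_height_cm)

# The same heights expressed in millimeters (a hypothetical change of unit).
MM_PER_CM = 10
raw_in_mm = (increase_av_height_cm * MM_PER_CM) / increase_weight_kg
r_in_mm = sigma_ratio(increase_av_height_cm * MM_PER_CM, sigma_height_cm * MM_PER_CM,
                      increase_weight_kg, sigma_weight_kg)

print(f"raw ratios:   {raw_height_on_weight:.2f} and {raw_weight_on_height:.2f}")
print(f"sigma ratios: {r_from_heights:.3f} and {r_from_weights:.3f}")
print(f"in mm.: raw ratio {raw_in_mm:.2f}, sigma ratio {r_in_mm:.3f}")
```

The raw ratio is multiplied by ten when height is recorded in millimeters, while the ratio in σ units is unchanged.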
This method of finding relationship is useful for demon-
strating in a simple way what the ratio which we call the coeffi-
cient of correlation actually does. It is, however, neither a
very practical nor precise method of finding a coefficient of
correlation and is never used in actual practice. Its chief lack
of precision lies in the fact that in estimating the range of
scores or measures in either or both distributions (see footnote,
page 155) we are often uncertain where to begin or end the
series, due to the fact that the data are oftentimes scanty at
the extremes of the distributions. As a matter of fact, the coeffi-
cient of correlation in the present problem was first found
1 On a scale in which 1.00 denotes perfect relation.
by the method given later on in Section III, and proper adjust-
ment was then made in the ranges so as to give the correct r.
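As a check on the value of .60, the coefficient may be computed directly from the cell frequencies of Diagram XVI. The sketch below is not the guessed-average routine of Section III; it assumes the usual product-moment definition (the average product of paired deviations divided by the product of the two σ's) and counts every man at the midpoint of his step. The names are our own.

```python
import math

WEIGHT_MIDPOINTS = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]  # kgs.
HEIGHT_MIDPOINTS = [187.5, 182.5, 177.5, 172.5, 167.5, 162.5, 157.5]  # cms., top row first

# Cell frequencies of Diagram XVI, one list per height row, top row first.
FREQ = [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 3, 3, 4, 2, 3],
    [0, 0, 4, 11, 6, 3, 2, 2],
    [0, 2, 9, 11, 8, 2, 1, 0],
    [1, 5, 7, 10, 3, 0, 0, 0],
    [1, 2, 7, 1, 2, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
]

n = sum(sum(row) for row in FREQ)

# Accumulate sums, sums of squares, and the sum of products over all cells.
sum_w = sum_h = sum_ww = sum_hh = sum_wh = 0.0
for height, row in zip(HEIGHT_MIDPOINTS, FREQ):
    for weight, f in zip(WEIGHT_MIDPOINTS, row):
        sum_w += f * weight
        sum_h += f * height
        sum_ww += f * weight * weight
        sum_hh += f * height * height
        sum_wh += f * weight * height

mean_w = sum_w / n
mean_h = sum_h / n
sigma_w = math.sqrt(sum_ww / n - mean_w ** 2)
sigma_h = math.sqrt(sum_hh / n - mean_h ** 2)
r = (sum_wh / n - mean_w * mean_h) / (sigma_w * sigma_h)

print(f"N = {n}")
print(f"average height {mean_h:.1f} cms., sigma {sigma_h:.2f}")
print(f"average weight {mean_w:.1f} kgs., sigma {sigma_w:.2f}")
print(f"r = {r:.2f}")
```

The σ of the weights comes out a trifle above the printed 7.75, presumably a matter of rounding in step units; the coefficient itself rounds to .60.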
2. Graphical Representation of the Coefficient of Correlation
Not only can we represent the coefficient of correlation as
a ratio, but we can also demonstrate graphically what a coeffi-
cient of correlation means. The correlation coefficient of
.60 found in Diagram XVI between height and weight is shown
graphically in Diagram XVII. In this diagram the distance
taken to represent one unit (consider the step-interval as the
unit) on the height scale and the distance taken to represent
one unit on the weight scale have been selected with due regard
for the difference in size of the two σ's in order that changes
in height and weight may be comparable. This adjustment
is a very simple one. We know from Diagram XVI that
the σ(wt.), which equals 7.75 kgs., is 1.18 times the σ(ht.), which
equals 6.55 cms. (since 7.75/6.55 = 1.18). Hence it is only necessary
that we take each height-step 1.18 times the length arbitrarily
taken to represent one weight-step, in order that the
X and Y distances may be comparable. (Since the weight
distribution is laid off from left to right, and the height distribution
from bottom to top, the first may be referred to as
the X variable, and the second as the Y variable, see page 60.)
To take a simpler case, if the σ for height were twice as large
as the σ for weight, we should take each step on the height
scale just 1/2 each step on the weight scale.
DIAGRAM XVII
Coefficient of Correlation Shown Graphically
[Weight in Kgs. (X-variable) is laid off horizontally in steps from 45-49 to 80-84, and height vertically. Crosses and circles mark the means of the arrays. The average weight line is drawn through 63.4 kgs., and the average height line through 172.6 cms.]
When the diagram has been laid out in the manner described
above, represent by a cross the mean height of the men in
each array, that is, each weight column (these mean heights may
be found from Diagram XVI). Next, draw a vertical line
through the mean of the distribution of 120 weights, and a
horizontal line through the mean of the distribution of 120
heights. [The average height of the 120 men is 172.6 cms.,
and their average weight is 63.4 kgs. (see Diagram XVI).]
With these two lines as coordinate axes, draw through their
intersection (the origin) a straight line which shall go through,
or as close as possible to, each of the crosses which have been
plotted. A rough but fairly accurate method of drawing
such a line is to stretch a black thread through the origin and
shift it back and forth until it touches as many crosses as
possible. The crosses at the extremes need not concern us
very much, since they are located from only a few cases. This
sloping line, which may be called the line of " best fit," describes
better than any other straight line the " run " of the crosses —
the increase in average height which corresponds to the given
increase in weight. Accordingly, to find the correlation simply
find the ratio of the distance of any point on this sloping
line from the horizontal or X-axis to the distance of the
same point from the vertical or Y-axis. For example, if a
convenient point P is taken with x = 5 cms., its y distance
(measured by mm. ruler) will be found to be approximately
3 cms., and the ratio $\frac{y}{x}$ is $\frac{3}{5}$ or .60. In like manner, the x and
y coordinates of any other point on this sloping line will be
found to give the ratio $\frac{y}{x}$ a value of .60.
Our sloping line pictures graphically the ratio $\frac{2.93}{4.84}$, the
correlation of .60, which we worked out in (1) above. This
line, which will be known hereafter as the " regression line
of height on weight," has important properties which will be
considered later (page 173). Also in the following sections we
shall give the equation of this line, which will enable us to draw
it in on the diagram very much more accurately than can be
done by the trial-and-error method described on page 159.
It is a comparatively easy though not a necessary task
to verify the correlation coefficient of .60 found from the
regression line of height on weight by drawing in the second
" regression line," that of weight on height. This can be done
by designating the means of the different height-rows by circles
in exactly the same manner in which we marked the means of
the weight-columns by crosses. (The means of the rows may
be obtained from Diagram XVI.) The mean of the lowest row
is 54.2, of next above 57.9, etc. When all of the circles have
been correctly placed, we draw a straight line which shall go
through, or as close as possible to, each circle, just as we did
with the crosses above. Now if a point P' is taken on this
second line with a y = 5 cms., its x distance will be found to be
approximately 3 cms., and the ratio $\frac{x}{y}$ is .60. This relation
holds for any point on the line. Both regression lines, therefore,
give us the same measure of the correlation between height
and weight.
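The two sloping lines may also be located by arithmetic instead of by thread. The sketch below substitutes a least-squares fit for the thread method (our substitution, not the procedure of the text): each cross or circle is weighted by the number of cases behind it, both scales are put in σ units, and the line is made to pass through the origin at the two means. The array means, frequencies, averages, and σ's are those printed in Diagram XVI; the names are our own.

```python
MEAN_HEIGHT, MEAN_WEIGHT = 172.6, 63.4   # cms., kgs.
SIGMA_HEIGHT, SIGMA_WEIGHT = 6.55, 7.75  # cms., kgs.

# Crosses: mean height of each weight column, with the column frequencies.
weight_mid = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]
fx = [3, 10, 28, 37, 22, 9, 5, 6]
av_height = [162.5, 166.5, 169.8, 172.8, 173.6, 178.6, 178.5, 181.7]

# Circles: mean weight of each height row (bottom row first), with row frequencies.
height_mid = [157.5, 162.5, 167.5, 172.5, 177.5, 182.5, 187.5]
fy = [3, 13, 26, 33, 28, 16, 1]
av_weight = [54.2, 57.9, 59.2, 62.8, 66.4, 71.3, 82.5]

def slope_through_origin(base, rise, weights):
    """Weighted least-squares slope of a line forced through the origin."""
    numerator = sum(w * b * r for b, r, w in zip(base, rise, weights))
    denominator = sum(w * b * b for b, w in zip(base, weights))
    return numerator / denominator

def in_sigma_units(values, mean, sigma):
    """Deviations from the mean, expressed in units of sigma."""
    return [(v - mean) / sigma for v in values]

# Line through the crosses: the ratio y/x (height on weight).
x_cross = in_sigma_units(weight_mid, MEAN_WEIGHT, SIGMA_WEIGHT)
y_cross = in_sigma_units(av_height, MEAN_HEIGHT, SIGMA_HEIGHT)
ratio_y_over_x = slope_through_origin(x_cross, y_cross, fx)

# Line through the circles: the ratio x/y (weight on height).
y_circle = in_sigma_units(height_mid, MEAN_HEIGHT, SIGMA_HEIGHT)
x_circle = in_sigma_units(av_weight, MEAN_WEIGHT, SIGMA_WEIGHT)
ratio_x_over_y = slope_through_origin(y_circle, x_circle, fy)

print(f"y/x for the line through the crosses: {ratio_y_over_x:.2f}")
print(f"x/y for the line through the circles: {ratio_x_over_y:.2f}")
```

Both ratios fall within a hundredth or so of .60, in agreement with the measurements taken from the diagram.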
Diagram XVII is still further useful in showing just what a
correlation of 1.00, 0, or —1.00 is graphically. Suppose (1)
that the two regression lines in the figure move together until
they coincide in such a way as to make an angle of 45 degrees
with the horizontal or X-axis. The x value of any point on
this "compound" line will always equal its y value; hence
the ratios $\frac{y}{x}$ and $\frac{x}{y}$ are always equal to each other¹ and r equals
1.00 (see Diagram XVIII). Accordingly, in perfect positive correlation,
all the crosses and all the circles in a
correlation diagram fall along a single straight
line which runs from the upper right hand
section of the diagram (the 1st quadrant) to
the lower left hand section (the 3rd quadrant).
The tallest man is the heaviest, the next
tallest, the next heaviest, and throughout
the entire 120 the correspondence of height
and weight is always 1 to 1.
DIAGRAM XVIII
Now suppose (2) that the first regression line, the line
through the means of the height arrays in the columns —
through the crosses — moves around until it coincides with the
X-axis, the line through the average of all the heights in the
table. And suppose again that the second regression line, the
line through the means of the weight arrays in the rows —
through the circles — moves around until it coincides with the
Y-axis, the line through the average of all the weights in the
table. The ratios $\frac{y}{x}$ and $\frac{x}{y}$ are now both equal to 0 (since in
the first case y, and in the second case x, equals 0) and r, the
coefficient of correlation, equals 0. The conclusion that r = 0
might also be drawn from the fact that under the conditions
described the average height is the same for the whole range of
weights and the average weight the same for the whole range
of heights. Hence, a man of average height is equally liable
to be heavy, medium, or light, and a man
of average weight equally liable to be tall,
medium, or short. (Compare with the case
in which the average tapping rate was the
same for very high, high, and medium
high Alpha scores, page 150.) A picture
of zero correlation is shown in Diagram
XIX.
DIAGRAM XIX
¹ This is true also because the compound regression line becomes the diagonal
of a square. Again, the tangent of an angle of 45° = 1.00.
Lastly, suppose (3) that the two regression lines swing
around until they run from the upper left hand section (the
2nd quadrant) to the lower right hand section (the fourth
quadrant). Now if the two lines again coincide so as to make
an angle of 45 degrees with the X-axis — as described in (1) —
the x of any point on this compound line will always equal the
y of the same point in size, and the ratios y/x and x/y will again
always equal 1.00 in size. A glance at the figure will show,
however, that either the x or the y of these ratios must always
be negative, and for this reason the ratios will always be
negative. The coefficient of correlation, therefore, equals
-1.00, and the relation is perfect but inverse. In perfect
negative correlation, it
is clear then that all of the crosses and all
of the circles fall along a single straight
line which runs from the upper left to the lower right hand
corner of the diagram. The tallest man in the group is the
lightest, the next tallest the next lightest, and as height de-
creases weight increases progressively. (Diagram XX.)
The regression lines coincide only when the correlation is
perfect — positive or negative. For degrees of correlation
between these limits, the two regression lines are separate,
and take intermediate positions as shown in Diagram XVII
for an r = .60.
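The three limiting cases described above can be checked numerically. The sketch below is only an illustration: the five height and weight values are hypothetical, and the helper name `pearson_r` is chosen here for convenience. It applies the product-moment formula with deviations taken from the actual averages, which is developed in Section III.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r from deviations about the actual averages."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical measures, not taken from the 120 college men.
heights = [160, 165, 170, 175, 180]

# Case (1): the tallest man is the heaviest, and so on down the list.
r_positive = pearson_r(heights, [55, 60, 65, 70, 75])

# Case (2): weight has no linear relation to height.
r_zero = pearson_r(heights, [60, 70, 55, 70, 60])

# Case (3): the tallest man is the lightest, and so on.
r_negative = pearson_r(heights, [75, 70, 65, 60, 55])

print(r_positive, r_zero, r_negative)
```

Any other paired lists of equal length may be passed to `pearson_r`; the three calls simply reproduce the perfect positive, zero, and perfect negative pictures of Diagrams XVIII, XIX and XX.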
III. The Calculation of the Coefficient of Correlation
by the Product-Moment Method
1. The Product-Moment Formula When Deviations Are Taken
from the Guessed Averages of the Two Distributions
With the meaning of a coefficient of correlation firmly in
mind as a result of the discussion of the last section, we are
now ready to consider the calculation of r by the product-
moment method.1 Diagram XXI will serve as an illustration
of the computations involved. This correlation table gives
the paired heights and weights of 120 college men and is
derived from the scatter diagram for the same data shown in
Diagram XVI. The complete process of calculating r is out-
lined in the following steps. (Diagram XXI should be con-
stantly referred to in the discussion that follows.)
Step I
Construct a scatter diagram and from it a correlation table
as described on page 154.
Step II
Guess an average for the height distribution (given in the
Fy column), and draw double lines to mark off the row which
contains the GA(ht.), as shown in Diagram XXI. Note that
the average for the height distribution has been guessed at
172.5 (midpoint of interval 170-174) and that Dy's have
been taken from this point. Now fill in the FDy and the
FDy² columns. From the first column the correction Cy (cy in
units of step) is obtained; and this correction together with the
sum of the FDy² column will give the σ of the height
distribution, σy. The value of σy is 6.55 cms. (1.31×5); see
calculations in the Diagram.
1 The r found by this method is often called the " Pearson r " after Prof.
Karl Pearson, who devised the product-moment formula, following Bravais's
earlier work.
DIAGRAM XXI
Calculation of the Product-Moment Coefficient of Correlation
between the heights and weights of 120 college men

Columns: weight in kgs. (X-variable). Rows: height in cms. (Y-variable).
Cell entries are frequencies; a point marks an empty cell.

Height    45-49 50-54 55-59 60-64 65-69 70-74 75-79 80-84    Fy   Dy   FDy  FDy²  Σx'y'(+)  Σx'y'(-)
185-189     .     .     .     .     .     .     .     1       1    3     3     9      12        .
180-184     .     .     1     3     3     4     2     3      16    2    32    64      58       -2
175-179     .     .     4    11     6     3     2     2      28    1    28    28      26       -4
170-174     .     2     9    11     8     2     1     .      33    0     0     0       .        .
165-169     1     5     7    10     3     .     .     .      26   -1   -26    26      20       -3
160-164     1     2     7     1     2     .     .     .      13   -2   -26    52      28       -4
155-159     1     1     .     1     .     .     .     .       3   -3    -9    27      15        .

Fx          3    10    28    37    22     9     5     6     120          2   206     159      -13
Dx         -3    -2    -1     0     1     2     3     4
FDx        -9   -20   -28     0    22    18    15    24     (-57) + (79) = 22
FDx²       27    40    28     0    22    36    45    96     = 294

Sum of the FDy column: (63) + (-61) = 2. Algebraic sum of the Σx'y' column: 159 - 13 = 146.
In the original diagram each occupied cell also carries, in its upper right hand
corner, the product Dx×Dy, and in its lower left hand corner that product
multiplied by the frequency of the cell.

Calculation of r:
cy = 2/120 = .017            cx = 22/120 = .183
cy² = .0003                  cx² = .0334
Cy = .017×5 = .085           Cx = .183×5 = .915
σy = √(206/120 - .0003) × 5 = 1.31×5 = 6.55
σx = √(294/120 - .0334) × 5 = 1.55×5 = 7.75
r = (146/120 - .017×.183) / (1.31×1.55) = .60
PEr = .6745[1 - (.60)²] / √120 = .04 (Table XVIII)
Now guess an average for the weight distribution (given
in the Fx row) and draw double lines to designate the column
which contains the GA(wt.). The average of the weight
distribution has been guessed at 62.5 (midpoint of interval
60-64) and Dx's have been taken from this point. Fill in the
FDx and FDx² rows. From these rows the correction Cx
(cx in units of step) and the σ of the weight distribution, σx,
may be obtained. The value of σx is 7.75 kgs. (1.55×5); see
calculations on the Diagram.
Step III
The calculations in Step II simply repeat the familiar process
of finding a σ by the Guessed Average Method (Chapter
I, page 35). Our first new task is to fill in the Σx'y' column.
The entries in this column may be either + or -, and hence
two columns are provided under Σx'y', one for plus and one
for minus entries.
The procedure for determining the entries in the Σx'y'
column may be illustrated by taking the single entry in the
only occupied cell in the topmost row. The deviation of this
cell from the GA of the weight distribution, that is, its Dx, is 4
steps, and its deviation from the GA of the height distribution,
its Dy, is 3 steps. Hence, the product of the deviations of this
cell (its "product-moment") from the two guessed averages
is 4×3 or 12, and a small figure 12 is placed in the upper
right hand corner of the cell.¹ Moreover, since the
"product-moment" of the 1 frequency in this cell is 1(4×3) or 12 also,
a figure 12 is placed in the lower left hand corner of the cell to
denote the product of the deviations (or the product-deviation)
of this single frequency from the two GA's. There are no
other frequencies in the cells of this row, and 12 is placed at
once in the Σx'y' column² under the + sign.
Now let us consider the next row from the top, taking the
cells in order from right to left. The cell below the one whose
product-deviation we have just found, also deviates 4 steps
from the GA of the weight distribution (its Dx = 4) but its devia-
tion from the GA of the height distribution is only 2 steps
1 We may take the coordinates of this cell to be x = 4, and y =3. The first
is obtained by counting over 4 steps from the vertical column containing the
GA for weight, and the second by counting up 3 steps from the horizontal row
containing the GA for height. In each case the unit of measurement is the step-
interval.
2 The prime (') of x and y deviations is to indicate that all deviations are
taken from the two GA's.
(its Dy = 2). Hence the product-deviation of this cell is 4×2
or 8 [note the small (8) in the upper right hand corner of the
cell], and since there are 3 frequencies in the cell, each with a
product-deviation of 8, the final entry in the lower left hand
corner of this cell is 3(4×2) or 24. In like manner, the
product-deviation of the 2nd cell in the row is 6 (its Dx = 3, and its
Dy = 2), and since there are 2 frequencies in the cell, the final
entry is 2(3×2) or 12. Each of the 4 frequencies in the third
cell has a product-deviation of 4 (the Dx of the cell is 2, and the
Dy is 2 also) and the final cell entry is 4(2×2) or 16. In the
4th cell each of the 3 frequencies has a Dx of 1 and a Dy of 2,
and the final entry is 3(1×2) or 6. The entry of the
5th cell, the cell in the GA(wt.) column, is 0, since Dx = 0, and
of course 3(0×2) = 0. Notice particularly the entry in the
last cell of this row, viz., -2. This negative entry results
from the fact that the deviation of this cell from the GA(wt.),
its Dx, is -1, and its Dy is 2; the product-deviation of its
single frequency, therefore, is 1(-1×2) or -2. Now total
separately the plus and minus x'y''s in this row. The results,
58 and -2, are entered separately in the Σx'y' column under
the appropriate signs.
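The cell-by-cell arithmetic for this row can be sketched as follows. The frequencies and Dx values are those quoted in the paragraph above; the variable names are illustrative only.

```python
# Second row from the top of Diagram XXI: every cell has Dy = 2.
row_dy = 2

# (frequency, Dx) for each occupied cell, taken from right to left.
cells = [(3, 4), (2, 3), (4, 2), (3, 1), (3, 0), (1, -1)]

# Final entry of each cell: frequency times the product Dx * Dy.
entries = [f * dx * row_dy for f, dx in cells]

# Plus and minus entries are totalled separately, as in the two
# sub-columns of the sum-of-x'y' column.
plus_total = sum(e for e in entries if e > 0)
minus_total = sum(e for e in entries if e < 0)

print(entries, plus_total, minus_total)
```

Keeping the plus and minus totals apart mirrors the layout of the diagram and makes a slip in sign easier to spot.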
The final entries of the cells in the other rows in the table
and the sums of the product-deviations of each row are obtained
in the manner described above. It must be borne in mind
in calculating x'y''s that the product-deviations of all frequencies
in the first and third quadrants are positive, while the
product-deviations of all the frequencies in the second and fourth
quadrants are negative (see page 162). Also remember that all
frequencies in either the column containing the GA(wt.) or
in the row containing the GA(ht.) have 0 product-deviations,
since in one case the Dx, and in the other the Dy, equals 0.
All frequencies in any given row have the same Dy, and for
this reason the arithmetic of calculation may be considerably
reduced if each frequency in the row is first multiplied by its
Dx, and the sum of these products multiplied once for all
by the common Dy. To illustrate, for the 2nd row from the
bottom, taking the cells from right to left, when we multiply
the frequency of each cell by its Dx, the result is
(2×1)+(1×0)+(7×-1)+(2×-2)+(1×-3) or -12. Now multiplying this
partial "deviation-sum" by the Dy of the whole row, i.e., by
-2, we get 24 as the final Σx'y' entry for the row. This result
checks the 28 and -4 entered separately in the Σx'y' column.
This shorter method is useful in getting the total Σx'y' entry of a
given row quickly. It is less easy to check for errors, however,
than the method of getting the entry for each cell separately,
illustrated on page 166.¹
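A minimal sketch of the two routes for the 2nd row from the bottom is given here, using the frequencies and Dx values quoted above. The function names are hypothetical; the point is only that both routes give the same row total.

```python
def row_total_cellwise(cells, dy):
    """Sum of f * Dx * Dy, worked cell by cell."""
    return sum(f * dx * dy for f, dx in cells)

def row_total_shortcut(cells, dy):
    """Sum f * Dx first, then multiply once by the common Dy."""
    partial_deviation_sum = sum(f * dx for f, dx in cells)
    return partial_deviation_sum * dy

# 2nd row from the bottom of Diagram XXI (Dy = -2), right to left:
# (frequency, Dx) for each occupied cell.
cells = [(2, 1), (1, 0), (7, -1), (2, -2), (1, -3)]
dy = -2

partial = sum(f * dx for f, dx in cells)
print(partial, row_total_shortcut(cells, dy), row_total_cellwise(cells, dy))
```

The shortcut returns only the net entry for the row, so it cannot by itself reproduce the separate plus and minus figures (28 and -4) that the cell-by-cell method supplies.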
Step IV
When the sums of the product-deviations of each row have
been entered in the Σx'y' column, the algebraic sum of the
Σx'y' column may be obtained (e.g., 159 - 13 = 146). The
coefficient of correlation is then found by the formula:

$$ r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (23) $$

Substituting for Σx'y'/N, 146/120; for cx, .183; for cy, .017; and
for σx and σy, 1.55 and 1.31, respectively (see Diagram XXI
for figures), r is found to equal .60.
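The substitution in formula (23) can be retraced with a short script. This is a sketch that starts from the column and row sums given in Diagram XXI (N = 120, ΣFDx = 22, ΣFDx² = 294, ΣFDy = 2, ΣFDy² = 206, Σx'y' = 146); the variable names are illustrative, and everything is kept in units of step-interval.

```python
import math

N = 120
sum_fdx, sum_fdx2 = 22, 294      # weight (X) distribution
sum_fdy, sum_fdy2 = 2, 206       # height (Y) distribution
sum_xy = 159 - 13                # algebraic sum of the x'y' column

# Corrections in units of step.
cx = sum_fdx / N
cy = sum_fdy / N

# Sigmas by the Guessed Average Method, also in units of step.
sigma_x = math.sqrt(sum_fdx2 / N - cx ** 2)
sigma_y = math.sqrt(sum_fdy2 / N - cy ** 2)

# Formula (23).
r = (sum_xy / N - cx * cy) / (sigma_x * sigma_y)

print(round(cx, 3), round(cy, 3), round(sigma_x, 2), round(sigma_y, 2), round(r, 2))
```

Because the script carries unrounded intermediate values, its later decimals may differ slightly from a hand calculation that rounds cx, cy and the sigmas first; to two places the result agrees with the text.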
Notice that the terms cx, cy, σx and σy are all left in units of
step-interval when substituted in formula (23). This is done
simply because all product-deviations (x'y''s) are in step-units
and hence it is very much easier to keep all the other terms
in the formula, and in consequence both numerator and
denominator, in step-units. By this procedure the value of the
1 Printed charts for facilitating the calculation of coefficients of correlation
by the product-moment method are now available. Examples are the Ruch-
Stoddard Correlation Charts, University Bookstore, Iowa City, Iowa, and
Thurstone Correlation Data Sheet, C. H. Stoelting & Co., Chicago. The first
of these gives the product-deviation of each cell printed on the chart. Otis
has also devised a correlation chart based on the product-moment method which
does away with the necessity of finding the x'y''s. This chart is published with
directions for its use by the World Book Co., Yonkers, N. Y.
fraction — the coefficient of correlation — is not changed and the
arithmetic is considerably reduced.
2. The Product-Moment Formula When Deviations Are Taken
from the Actual Averages of the Two Distributions
Since formula (23) assumes that all x and y deviations have
been taken from the two guessed averages, it is necessary to
correct Σx'y'/N by the amount of the two corrections,
cx and cy. If deviations are taken from the actual averages of
the two distributions instead of from the GA's, no correction
is needed, as both cx and cy then equal 0. Thus when deviations
are taken from the two averages, formula (23) becomes

$$ r = \frac{\Sigma xy}{N \sigma_x \sigma_y} \qquad (24) $$

and this is the form in which the product-moment formula is
usually written. The formula may be put in still another form.
If we write √(Σx²/N) for σx and √(Σy²/N) for σy, the formula then
becomes (the N's cancel)

$$ r = \frac{\Sigma xy}{\sqrt{\Sigma x^2} \cdot \sqrt{\Sigma y^2}} \qquad (25) $$

in which the x and y deviations are from the averages as in
(24) and Σx² and Σy² are the sums of the squared deviations
from the two averages.
Formula (23) should always be used when there are more
than, say, 30 or 40 cases. Formula (25) may be used, to
advantage, however, with short series when the purpose of the
experimenter is to find whether there is any relation present
rather than to discover the degree of relation very accurately.
No correlation table is required with formula (25). An
illustration of the use of this formula is given in Table XVII, in
which the problem is to find the correlation between the scores
made on two tests of association by 12 adults.

TABLE XVII
To Illustrate the Calculation of r when Deviations are Taken
from the Averages of the Distributions

Individual   Score in     Score in       x       y      x²      y²       xy
             Test 1 (X)   Test 2 (Y)
A               50           22        -12    -8.4     144    70.56    100.8
B               53           25         -9    -5.4      81    29.16     48.6
C               56           34         -6     3.6      36    12.96    -21.6
D               58           28         -4    -2.4      16     5.76      9.6
E               60           26         -2    -4.4       4    19.36      8.8
F               61           30         -1     -.4       1      .16       .4
G               61           32         -1     1.6       1     2.56     -1.6
H               64           30          2     -.4       4      .16      -.8
I               67           28          5    -2.4      25     5.76    -12.0
J               70           34          8     3.6      64    12.96     28.8
K               71           36          9     5.6      81    31.36     50.4
L               73           40         11     9.6     121    92.16    105.6
Average         62           30.4             Sums     578   282.92    317.0

Average (Test 1) = 62.0
Average (Test 2) = 30.4
r = 317.0 / (√578 · √282.92) = .78
PEr = .6745[1 - (.78)²] / √12 = .08

The steps in finding r may be outlined as follows:
Step I
Find the average of Test 1 and the average of Test 2. In the
table the first average is 62.0, and the second, 30.4.
Step II
Find the deviation of each score in Test 1 from its average, 62,
and enter in column x. (The deviations from the average of the first
test may be called x-deviations, those from the average of the second
test, y-deviations.) Find the deviation of each score in Test 2 from
its average, 30.4, and enter in column y.
Step III
Square all x-deviations, and all y-deviations, and enter these squares
in columns x² and y², respectively.
Step IV
Multiply the corresponding x and y deviations and enter these
products in the xy column.
Step V
Substitute for Σxy (317), for Σx² (578), for Σy² (282.92) in formula
(25) as shown in Table XVII, and solve for r.
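Steps I to V can be followed in a few lines of code. The sketch below uses the twelve pairs of scores from Table XVII; the variable names are illustrative. One assumption should be noted: the script uses the unrounded average of Test 2 (about 30.42) rather than the 30.4 of the table, so its Σy² differs from 282.92 in the third decimal place, which does not affect r to two places.

```python
import math

test1 = [50, 53, 56, 58, 60, 61, 61, 64, 67, 70, 71, 73]   # X scores
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]   # Y scores
n = len(test1)

# Step I: the two averages.
mean_x = sum(test1) / n
mean_y = sum(test2) / n

# Step II: deviations from the averages.
x = [score - mean_x for score in test1]
y = [score - mean_y for score in test2]

# Steps III and IV: squares and products of the deviations.
sum_x2 = sum(d * d for d in x)
sum_y2 = sum(d * d for d in y)
sum_xy = sum(dx * dy for dx, dy in zip(x, y))

# Step V: formula (25).
r = sum_xy / (math.sqrt(sum_x2) * math.sqrt(sum_y2))

# Probable error of r for these 12 cases.
pe_r = 0.6745 * (1 - r ** 2) / math.sqrt(n)

print(mean_x, round(mean_y, 1), round(r, 2), round(pe_r, 2))
```

No correlation table is needed for this route, which is why it suits short series such as this one.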
IV. The Probable Error of a Coefficient of Correlation
The PE of an r may be found from the formula,

$$ PE_r = \frac{.6745\,(1 - r^2)}{\sqrt{N}} \qquad (26) $$

If we substitute in formula (26) the r = .60 and the N = 120
of the height-weight problem (see Diagram XXI), PEr will
equal .04.¹ This means that the chances are even that the
"true" r falls within the limits .60 ± .04, or between .56 and
.64; and that the chances are 9930 in 10,000 (Table XI) that
the true r falls within the limits .60 ± 4×.04, or between .44
and .76. By the true r is meant (see page 118) that r which
we should expect to get between height and weight in the
population from which our group of 120 is, presumably, a
random sampling.
To be reasonably sure that there is some correlation present
an obtained r should be at least 4 times its PE. For example,
given the situation in which r is exactly 4 times its PE, in which,
say, r = .16 and PEr = .04, we can only be sure that the true r
falls within the limits .16 ± 4×.04, or between 0 and .32. It
is customary, therefore, not to consider an r as reliable (as
indicative of a correlation at least better than 0) unless it is at
least 4 times its PE. To be certain of a low degree of
correlation an r should be 5 or 6 times its PE.
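Formula (26) and the rule of 4 times the PE lend themselves to a small sketch. The function names below are hypothetical, and the default factor of 4 is simply the convention stated in the text.

```python
import math

def probable_error_r(r, n):
    """Formula (26): PE of a coefficient of correlation."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def limits(r, pe, k=1):
    """Limits r - k*PE and r + k*PE."""
    return (r - k * pe, r + k * pe)

def is_reliable(r, pe, factor=4):
    """True when r is at least `factor` times its PE."""
    return abs(r) >= factor * pe

# Height-weight problem: r = .60, N = 120.
pe = round(probable_error_r(0.60, 120), 2)
even_chance_limits = limits(0.60, pe)        # chances are even
wide_limits = limits(0.60, pe, k=4)          # 9930 chances in 10,000

print(pe, even_chance_limits, wide_limits, is_reliable(0.60, pe))
```

Passing `factor=5` or `factor=6` to `is_reliable` gives the stricter test suggested for low degrees of correlation.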
We found in Chapter III that the reliability of the difference
between two averages or two medians can be calculated by
means of the formulas for σ(diff.) and PE(diff.) (see page 128). In
the same way, the reliability of the difference between two
obtained r's can be found from the size of the PE of their
difference.
1 If we know r and N, the PEr may be read directly or by interpolation from
Table XVIII.
TABLE XVIII
Probable Errors of the Coefficient of Correlation for Various
Numbers of Measures (N) and for Various Values of r

Number of                    Correlation Coefficient r
Measures      0.0     0.1     0.2     0.3     0.4     0.5     0.6
   20       .1508   .1493   .1448   .1373   .1267   .1131   .0965
   30       .1231   .1219   .1182   .1121   .1035   .0924   .0788
   40       .1067   .1056   .1024   .0971   .0896   .0800   .0683
   50       .0954   .0944   .0915   .0868   .0801   .0715   .0610
   70       .0806   .0798   .0774   .0734   .0677   .0605   .0516
  100       .0674   .0668   .0648   .0614   .0567   .0506   .0432
  150       .0551   .0546   .0529   .0501   .0463   .0413   .0352
  200       .0477   .0472   .0458   .0434   .0401   .0358   .0305
  250       .0426   .0421   .0409   .0387   .0358   .0319   .0272
  300       .0389   .0386   .0374   .0354   .0327   .0292   .0249
  400       .0337   .0334   .0324   .0307   .0283   .0253   .0216
  500       .0302   .0299   .0290   .0274   .0253   .0226   .0193
 1000       .0213   .0211   .0205   .0194   .0179   .0160   .0137

Number of                    Correlation Coefficient r
Measures      0.65    0.7     0.75    0.8     0.85    0.9     0.95
   20       .0871   .0769   .0660   .0543   .0419   .0287   .0147
   30       .0711   .0628   .0539   .0444   .0342   .0234   .0120
   40       .0616   .0544   .0467   .0384   .0296   .0203   .0104
   50       .0551   .0486   .0417   .0343   .0265   .0181   .0093
   70       .0466   .0411   .0353   .0290   .0224   .0153   .0079
  100       .0391   .0345   .0294   .0242   .0187   .0128   .0066
  150       .0318   .0281   .0241   .0198   .0153   .0105   .0054
  200       .0275   .0243   .0209   .0172   .0133   .0091   .0047
  250       .0246   .0218   .0187   .0154   .0118   .0081   .0042
  300       .0225   .0199   .0170   .0140   .0108   .0074   .0038
  400       .0195   .0172   .0148   .0122   .0094   .0064   .0033
  500       .0174   .0154   .0132   .0109   .0084   .0057   .0029
 1000       .0123   .0109   .0093   .0077   .0059   .0041   .0021
The formula for PE(diff.) between two r's is

$$ PE_{(\text{diff. } r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2} \qquad (27) $$

in which PEr1 and PEr2 are the PE's of the two r's to be
compared, and must first be obtained from formula (26).
The value of formula (27) may be illustrated by the following
problem. Suppose that in a group of 100 eight year old boys the
r between IQ and the A-cancellation test is .20 with a PE of
.065; and that in a group of 110 eight year old girls the r
between the same two tests is .25 with a PE of .06. The
correlation is .05 higher for girls than for boys. Is this difference
sufficiently large to indicate that the true correlation between IQ
and the A-test is higher for 8 year old girls than for 8 year old
boys? To answer this question, we must determine the PE
of the difference between the two r's. From formula (27),
PE(diff. r1-r2) = √((.065)² + (.06)²) = .09, and comparing the
obtained difference of .05 with the PE(diff.), we find that
.05/.09 = .556. This means (see Table XV) that there are only
64 chances in 100 of a real difference, a difference greater
than 0, between the true correlations of IQ and the A-test for
8 year old boys and girls. The difference of .05 is, therefore,
quite unreliable. To be completely reliable the obtained
difference should be at least 4×.09 or .36. (A difference is
considered reliable when the ratio of the difference to its
PE(diff.) is 4 or more, see page 133.) In
the present case the obtained difference is only about 14 per
cent of what it should be in order to guarantee a true difference
between the r's of the boys and girls.
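The arithmetic of this problem may be sketched as below. The figures are those of the text; the function name is hypothetical, and the PE of the difference is rounded to two places before the ratio is taken, as the text does.

```python
import math

def pe_difference(pe_1, pe_2):
    """Formula (27): PE of the difference between two r's."""
    return math.sqrt(pe_1 ** 2 + pe_2 ** 2)

r_boys, pe_boys = 0.20, 0.065
r_girls, pe_girls = 0.25, 0.06

difference = round(r_girls - r_boys, 2)
pe_diff = round(pe_difference(pe_boys, pe_girls), 2)

# Ratio of the obtained difference to its PE; 4 or more is
# taken as reliable.
ratio = difference / pe_diff

# Smallest difference that would meet the 4 x PE standard, and the
# share of it actually obtained.
required = 4 * pe_diff
share_obtained = difference / required

print(pe_diff, round(ratio, 3), round(required, 2), round(share_obtained, 2))
```

The same function serves for any two r's whose PE's have first been found from formula (26).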
The formulas for PEr and PE(diff. r1-r2) are subject to the
same restrictions and must be interpreted with the same caution
as the other standard and probable error formulas (see Chapter
III, page 145). In order to be of any real value as measures
of reliability, PEr and PE(diff.) should be calculated for
r's obtained from random and reasonably large samples. PE's
found for r's obtained from small and obviously selected
groups may give an entirely false picture of the observed
coefficient's reliability — especially when the coefficient is large.
An r of .90 found from 20 cases, for instance, is unreliable
despite the fact that PEr= .03 (see Table XVIII). Another
sample of 20 drawn from the same population might give an
r one half as large.
V. The Regression Equations
1. The Regression Equations in Deviation Form
We have already discovered (Diagram XVII) that there are
two regression lines in a correlation table, and that the first
" best fits " the means of the successive columns (the average
heights, represented by crosses) while the second " best fits "
the means of the rows (the average weights, represented by
circles). These lines of " best fit " were seen to be of value in
showing graphically the change in average height accompanying
a given change in weight, and the change in average weight
accompanying a given change in height. Moreover, we found
that either line will measure the correlation directly when the
x and y steps in the diagram have been laid out with due
allowance for the difference in size of the σ's of the X and Y
distributions.
This last use of the regression line is of little practical value,
however. It is very much easier to draw up a correlation
table without bothering about the difference in the two σ's,
and find r by the product-moment formula as shown in
Diagram XXI, than to try to estimate r from the regression
lines. In fact, the real value of the regression lines is not to
give r, but to enable us to " predict" an individual's "most
probable" standing in a test or series of measures, given his
standing in another test or series of measures.
We may describe briefly how this is done. Suppose that
we wish to estimate a man's height from our correlation table,
knowing his weight to be 68 kgs. Now the best possible
" guess " that we can make of this man's height is to give the
average height of all men who fall in the 65-69 weight interval.
From Diagram XVI the "mean height" of the 22 men in this
column is found to be 173.6 cms., and hence 173.6 cms. is the
most likely height of a man who weighs 68 kgs. In like manner,
the most probable height of a man who weighs 72 kgs. is 178.6
cms., the mean height of the 9 men who fall in the weight
column 70-74 kgs. In general, then, the most probable height
of any man is the mean of the heights of all the men in the group
who weigh the same (approximately) as he, that is, who fall in
the same weight column.¹ The line which best fits the mean
heights of the successive weight-columns is the line which gives
the change in average height with the change in weight (the
line through the crosses in Diagram XVII). Given a man's
weight, therefore, we can best " predict " his height from the
regression line of height on weight; and by analogy, given a
man's height, we can best predict his weight from the regres-
sion line of weight on height (the line through the circles in
Diagram XVII).
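The mean height of a weight column can be worked out directly from the cell frequencies. The sketch below assumes the frequencies as read from Diagram XXI, with each height taken at the midpoint of its 5 cm. interval through the guessed average of 172.5 cms.; the names are illustrative only.

```python
GA_HEIGHT = 172.5      # guessed average of heights, cms.
STEP = 5               # size of the step-interval, cms.
DY = [3, 2, 1, 0, -1, -2, -3]   # row deviations, top row first

WEIGHT_COLUMNS = ["45-49", "50-54", "55-59", "60-64",
                  "65-69", "70-74", "75-79", "80-84"]

# Cell frequencies as read from Diagram XXI (an assumption of this
# sketch); rows run from the tallest interval to the shortest.
FREQ = [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 3, 3, 4, 2, 3],
    [0, 0, 4, 11, 6, 3, 2, 2],
    [0, 2, 9, 11, 8, 2, 1, 0],
    [1, 5, 7, 10, 3, 0, 0, 0],
    [1, 2, 7, 1, 2, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
]

def mean_height_of_column(label):
    """Return (mean height in cms., number of men) for a weight column."""
    j = WEIGHT_COLUMNS.index(label)
    column = [row[j] for row in FREQ]
    n = sum(column)
    mean_in_steps = sum(f * dy for f, dy in zip(column, DY)) / n
    return GA_HEIGHT + STEP * mean_in_steps, n

for label in ("65-69", "70-74"):
    mean_height, n = mean_height_of_column(label)
    print(label, n, round(mean_height, 1))
```

Running the same function over every column gives the series of column means (the crosses) to which the regression line of height on weight is fitted.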
If we had the equations of the two regression lines, it
would seem obvious that estimates could be made from these
much more efficiently and quickly than from the plotted
regression lines. For then knowing a man's standing in the
X-variable (his weight) we should be able on substituting in
the equation connecting X and Y to find directly his most
probable standing in the Y-variable (height). The equations
of the two regression lines have been deduced by Prof. Karl
Pearson, who took as his criterion the idea of the "best
fitting" line. Pearson's method, briefly, was to find the
equation of that line from which the sum of the squares of the
deviations of the means in the different arrays (the rows or the
columns) is the least possible.² There are, of course, two such
lines. The one "best fits" the means of the rows, the other
"best fits" the means of the columns.
The equation of the line drawn through the means of the
columns (the crosses in Diagram XVII) is written in its
simplest form³ as

$$ y = r \frac{\sigma_y}{\sigma_x}\, x \qquad (28) $$
1 There is a certain error of estimate made in taking a man's most probable
height as being the average of his weight-group. The method of finding the
size of this error will be considered later on page 183.
2 For a mathematical treatment of the application of the Method of Least
Squares to the problem of deducing the regression equations, see Jones, A First
Course in Statistics, 1921, pp. 106ff and 271.
3 A brief review of the equation of a straight line and of the method of
plotting a simple linear equation is given in order to simplify the discussion
of the regression equations.
Let X and Y be coordinate axes, or axes of reference. Now suppose that we
are given the equation y = 2x and are required to represent the relation between
x and y graphically. To do this we substitute values for x in the equation and
compute the corresponding values of y. When x = 2, for example, y = 2×2 or
4; when x = 3, y = 2×3 or 6. In like manner, given any x value, we can
compute the y which will "satisfy" the equation, that is, make the left side equal
to the right. Now if the series of points determined from the pairs of x and y
values as given by the equation are plotted with respect to the X and Y axes (see
Diagram XXII) they will be found to fall along a straight line, and this straight
line will picture the relation of x and y, y = 2x. This line will pass through the
origin, since when x = 0, y also equals 0. The equation y = 2x represents, then,
a straight line which passes through the origin and the relation of its points is
such that y/x (called the slope of the line) always equals 2.
[Diagram XXII: the line y = 2x plotted with respect to the X and Y axes]
The general equation of any straight line which passes through the origin
may be written y = mx, where m is the slope of the line. If we replace the m
of the general formula by the expression r·σy/σx we see at once that the regression
equation in deviation form is simply the equation of a straight line which goes
through the origin.

The expression r·σy/σx is called the regression coefficient and is
often replaced in the equation by the expression byx or b12,
so that (28) is sometimes written y = byx·x and y = b12·x.
If we substitute the values of r, σy, and σx (obtained from
Diagram XXI) in formula (28) we have

$$ y = .60 \times \frac{6.55}{7.75}\, x \quad \text{or} \quad y = .51x, $$

as the equation which measures the regression of height on
weight. This equation represents a straight line through the
origin, and hence it is a simple matter to plot it, as shown
in Diagram XXIII. First, however, we must draw a vertical
line through the point 63.4 kgs., the mean of all the weights
(the X's) in the table, and a horizontal line through 172.6 cms.,
the mean of all the heights (the Y's) in the table. These two lines
are the coordinate axes. Now since our plotted line must go
through the origin [see note (3), page 175], only one other point is
needed to determine it. If x = 2 (any value will do just as well),
y becomes .51×2 or 1.02. To plot this point, measure out 2
units from the origin along the horizontal axis and go up 1.02
units from the same line. This will locate the point, x = 2,
y = 1.02. (Any convenient scale may be used for measuring
off x and y distances; a mm. rule is useful.)
The line drawn through the point just located and the
origin (0, 0) is the regression line of height on weight. From
the equation, it is clear that a point on this line with an x-value
of 1.00 has a corresponding y-value of .51 (substitute x = 1
in the equation and y = .51). This means that a deviation
of 1 unit from the mean of the X's (from the vertical line
drawn through the mean weight of the group) is accompanied
by just .51 times as much deviation from the mean of the Y's
(from the horizontal line drawn through the mean height of
the group) (see Diagram XXIII). Put concretely, a man
who stands 1 kg. above the average weight of the group is
most probably .51 cm. above the mean height of the group
also; if his weight is 64.4 kgs. (63.4+1.00) his height is
probably 173.11 cms. (172.6+.51). To take another example,
the man who weighs 60 kgs. (3.4 kgs. below the mean weight)
is most probably 170.87 cms. tall (1.73 cms. below the mean
height). In this example, we substitute x = -3.4 in the
equation, and y = -1.73. In general then
we know from the regression equation that the most
probable deviation of any individual in our group¹ from the mean
1 Or in the population from which our group of 120 is drawn, provided the
group is a random sample.
DIAGRAM XXIII
Illustrating Position of the Regression Lines, and Calculation
of the Regression Equations
(Calculation of r repeated from Diagram XXI)

[The correlation table of heights and weights is repeated from Diagram XXI,
with the two regression lines drawn across it.]

GA(Y) = 172.5        GA(X) = 62.5
Aver.(Y) = 172.6     Aver.(X) = 63.4
σy = 6.55            σx = 7.75
r = .60              PEr = .04

Calculation of Regression Equations:
I. Deviation Form:
(1) y = .60 × (6.55/7.75) x = .51x
(2) x = .60 × (7.75/6.55) y = .71y
II. Score Form:
(1) Y - 172.6 = .51(X - 63.4)
    Y = .51X + 140.3
(2) X - 63.4 = .71(Y - 172.6)
    X = .71Y - 59.1
Calculation of Standard Errors of Estimate:
σ(est. Y) = 6.55 × .8 = 5.2 cms.
σ(est. X) = 7.75 × .8 = 6.20 kgs.
height is just .51 as great as his deviation from the mean
weight. Hence, given a man's deviation from the mean weight,
we are able to predict his most probable deviation from the mean
height of the group.
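The two worked predictions above can be reproduced with a short sketch of the regression equation of height on weight in deviation form. The constants come from Diagram XXI and the text; the function name is hypothetical, and the regression coefficient is rounded to .51 as in the text.

```python
R = 0.60
SIGMA_Y = 6.55    # sigma of heights, cms.
SIGMA_X = 7.75    # sigma of weights, kgs.
MEAN_Y = 172.6    # mean height, cms.
MEAN_X = 63.4     # mean weight, kgs.

# Regression coefficient of Y on X, rounded to two places.
b_yx = round(R * SIGMA_Y / SIGMA_X, 2)

def most_probable_height(weight_kg):
    """Predict height from weight by way of deviations from the means."""
    x = weight_kg - MEAN_X        # deviation from the mean weight
    y = b_yx * x                  # most probable deviation in height
    return MEAN_Y + y

print(b_yx,
      round(most_probable_height(64.4), 2),
      round(most_probable_height(60), 2))
```

The function converts the weight to a deviation, applies the equation, and converts the result back to a height, which is exactly the round trip that the score form of the equation is meant to save.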
The regression equation, y = r(σy/σx)x, is known as the
regression equation of Y on X in Deviation Form. Stated generally,
this equation measures the most probable deviation of any Y
measure from the mean Y corresponding to a known deviation
in the X measure from the mean X.
The equation of the second regression line drawn through
the means of the rows (the circles of Diagram XVII) is written

$$ x = r \frac{\sigma_x}{\sigma_y}\, y \qquad (29) $$

This equation measures the regression of X on Y and in the
present problem, of weight on height. The regression coefficient
r·σx/σy is sometimes replaced by the expression bxy or b21, so that
(29) is often written x = bxy·y or x = b21·y.
If we substitute in (29) the values of r, σx, and σy found
from Diagram XXI, we have

$$ x = .60 \times \frac{7.75}{6.55}\, y \quad \text{or} \quad x = .71y, $$

as the equation which measures the regression of weight on
height. This equation, like the other, represents a straight line
through the origin; and consequently, one point on the line
together with the origin (0, 0) are sufficient to plot the line.
Put y = 1 in the equation, and x will equal .71. Now plot
the point x = .71, y = 1.00 on the diagram, and draw the
regression line through this point and the origin (see Diagram
XXIII).
It is evident from the second regression equation that a
deviation of 1 cm. from the mean of all the heights (Y's) is
most probably accompanied by a deviation of .71 kg. from the
mean of all the weights (X's); or put in a different way, the most
probable deviation of any man from the mean weight is just
.71 as great as his deviation from the mean height. A man
180 cms. tall, for example (7.4 cms. above the mean height),
most probably weighs 68.65 kgs. (is 5.25 kgs. above the
mean weight). (To get this result substitute 7.4 for y in the
equation, and solve for x.)
The equation x = r(σx/σy)y is known as the regression equation
of X on Y in deviation form. To summarize briefly it measures
the probable deviation of an X-measure from the average X,
corresponding to a known deviation in the Y-measure from the
average Y.
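A companion sketch for the regression of weight on height follows. The constants again come from Diagram XXI and the text, the function name is hypothetical, and the coefficient is rounded to .71 as in the text. The last lines show what would happen if the first equation, y = .51x, were simply turned around instead.

```python
R = 0.60
SIGMA_Y = 6.55    # sigma of heights, cms.
SIGMA_X = 7.75    # sigma of weights, kgs.
MEAN_Y = 172.6    # mean height, cms.
MEAN_X = 63.4     # mean weight, kgs.

# Regression coefficient of X on Y, rounded to two places.
b_xy = round(R * SIGMA_X / SIGMA_Y, 2)

def most_probable_weight(height_cm):
    """Predict weight from height by way of deviations from the means."""
    y = height_cm - MEAN_Y        # deviation from the mean height
    x = b_xy * y                  # most probable deviation in weight
    return MEAN_X + x

predicted = most_probable_weight(180)

# Turning y = .51x around would give x = y / .51, a different and
# steeper line than the regression of X on Y.
inverted_slope = 1 / 0.51

print(b_xy, round(predicted, 2), round(inverted_slope, 2))
```

The gap between .71 and the inverted slope of about 1.96 illustrates why each equation serves only the prediction for which it was built.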
Although there are two regression equations, both of
which involve x and y, the student must bear in mind the
important fact that the two equations cannot be used inter-
changeably and that neither can be used to predict both x
and y. The first regression equation, y — r- — -x, is to be
<J*
used only when y is to be predicted from x (when y is
the " dependent " variable), while the second regression equa-
tion, x — r-— -y, is to be used only when x is to be predicted
(Jy
from y (when x is the " dependent " variable).1 There
are always two regression equations unless the correlation is
perfect. When r=1.00, however, the equation y = v— -x
becomes y = ~.x, or ax-y = cry-x) while the equation x = r-— -y
<JX (Jy
becomes x = — • y, or o-x-y = ay-x. The two equations are now
(Jy
identical, and the regression lines coincide.
As an illustration of this last condition suppose that the
correlation between height and weight is perfect, $\sigma_x$ and $\sigma_y$
remaining the same. The first regression equation would now
become $y = 1.00 \times \frac{6.55}{7.75}x$, or $y = .85x$, while the second regres-
sion equation would become $x = 1.00 \times \frac{7.75}{6.55}y$, or $x = 1.18y$.
Algebraically, $x = 1.18y$ is equivalent to $y = .85x$ (since in the
second equation $x = \frac{y}{.85}$, or $x = 1.18y$). Under the prescribed
conditions, therefore, we should have a single equation and a
single line, which would represent equally well a change (devi-
ation) in Y for a given change in X, or a change (deviation)
in X for a given change in Y. It may be added that when
r = 1.00, and in addition the two $\sigma$'s are equal or are made
equal by the arrangement of the diagram, the single regression
line makes an angle of 45 degrees with the horizontal axis (see
Diagram XVIII, and the discussion on pages 161-162).
1 A dependent variable depends for its value on the other variable in the
equation. Thus in the equation $y = r\frac{\sigma_y}{\sigma_x}x$, y "depends" on the value given x.
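The coincidence of the two lines under perfect correlation may be checked by a short computation. The following is a minimal sketch in Python; the function name `regression_coefficients` is our own and not part of the text, and the $\sigma$'s are those of the height-weight problem. It compares the slope of the first line with the slope of the second line when the latter is rewritten as y in terms of x.

```python
def regression_coefficients(r, sigma_x, sigma_y):
    """Return the two deviation-form slopes (hypothetical helper).

    b_yx is used in y = b_yx * x (predicting y from x);
    b_xy is used in x = b_xy * y (predicting x from y).
    """
    b_yx = r * sigma_y / sigma_x
    b_xy = r * sigma_x / sigma_y
    return b_yx, b_xy


SIGMA_X, SIGMA_Y = 7.75, 6.55  # kgs. and cms., from the height-weight problem

b_yx_observed, b_xy_observed = regression_coefficients(0.60, SIGMA_X, SIGMA_Y)
b_yx_perfect, b_xy_perfect = regression_coefficients(1.00, SIGMA_X, SIGMA_Y)

# Rewritten as y = (1 / b_xy) * x, the second line has slope 1 / b_xy.
# The two lines coincide only if that slope equals b_yx.
print("r = .60:", round(b_yx_observed, 2), "versus", round(1 / b_xy_observed, 2))
print("r = 1.00:", round(b_yx_perfect, 2), "versus", round(1 / b_xy_perfect, 2))
```

With r = .60 the two slopes differ, so there are two distinct lines; with r = 1.00 they agree. The product of the two coefficients is always $r^2$, which is why the lines can coincide only when r is 1.00 (or -1.00).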
2. The Regression Equations in Score Form
In the last paragraph the point was stressed that formulas
(28) and (29) are the equations of the regression lines in devi-
ation form; that is, values of x and y substituted in these equa-
tions are deviations from the means of the X and Y distribu-
tions and not actual scores or measures.1 While equations in
deviation form are all that we actually need for purposes of predic-
tion, it is often very convenient to be able to estimate an indi-
vidual's actual score in Y, say, directly from his score in X with-
out the trouble of first converting the X-score into a deviation
from the mean X. This can be done very simply if we employ
the score form rather than the deviation form of the regression
equation. The conversion of deviation to score form may be
made as follows. Let the average of the Y's be denoted
by Y' and any Y-score by Y; then the y deviation of any
individual from the mean will be $Y - Y'$ (the difference between
the score and the mean) or, in general, $y = Y - Y'$. In the
same way, we can show that, in general, $x = X - X'$, where x
is the deviation of any X-score from the mean X'.
Now substitute $Y - Y'$ for y and $X - X'$ for x in formulas
(28) and (29) and the two regression equations become,
$Y - Y' = r\frac{\sigma_y}{\sigma_x}(X - X')$ or $Y = r\frac{\sigma_y}{\sigma_x}(X - X') + Y'$, (30)
and
$X - X' = r\frac{\sigma_x}{\sigma_y}(Y - Y')$ or $X = r\frac{\sigma_x}{\sigma_y}(Y - Y') + X'$. (31)
1 The small letters x and y are used to denote deviations from the means of
the X and Y distributions. The large letters X and Y denote actual scores.
These are the equations of the two regression lines in score
form. In both equations, X and Y now represent actual scores
and not deviations from the means of the two distributions.
If we substitute in (30) the values for Y', r, $\sigma_y$, $\sigma_x$, and X'
obtained from Diagram XXIII, the equation becomes
$Y - 172.6 = .60 \times \frac{6.55}{7.75}(X - 63.4)$,
or, clearing of fractions,
$Y = .51X + 140.3$.
To illustrate the use of this equation, let us suppose that a man
in our group weighs 60 kgs. (X) and that we wish to estimate
his most probable height (Y). Substituting 60 for X in the
equation, Y = 170.9; and accordingly the most probable
height of a man who weighs 60 kgs. is 170.9 cms.
If the problem is to predict weight instead of height, we
must use equation (31). Substituting the values for X', r,
$\sigma_y$, $\sigma_x$, and Y' in the second equation we have
$X - 63.4 = .60 \times \frac{7.75}{6.55}(Y - 172.6)$
or
$X = .71Y - 59.1$.
Now given a man 180 cms. tall, we find, putting 180 for Y in
the formula, that X = 68.7 kgs. Hence the most probable
weight of a man 180 cms. tall is 68.7 kgs.
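The two score-form equations may be sketched in code as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the constants are the means, $\sigma$'s, and r quoted in the text for the height-weight problem, and the regression coefficients are carried unrounded, so that results may differ from the text in the last figure before rounding.

```python
MEAN_X, MEAN_Y = 63.4, 172.6    # mean weight (kgs.) and mean height (cms.)
SIGMA_X, SIGMA_Y = 7.75, 6.55   # sigma of weight and of height
R = 0.60                        # correlation between height and weight


def predict_y_from_x(x_score):
    """Formula (30): most probable Y (height) for a given X (weight)."""
    return R * (SIGMA_Y / SIGMA_X) * (x_score - MEAN_X) + MEAN_Y


def predict_x_from_y(y_score):
    """Formula (31): most probable X (weight) for a given Y (height)."""
    return R * (SIGMA_X / SIGMA_Y) * (y_score - MEAN_Y) + MEAN_X


height_for_60_kgs = predict_y_from_x(60)
weight_for_180_cms = predict_x_from_y(180)

print(round(height_for_60_kgs, 1), "cms.")
print(round(weight_for_180_cms, 1), "kgs.")
```

Note that each function is used in one direction only: solving formula (30) backwards for X would not reproduce formula (31) unless r were 1.00.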
It may seem strange to the student to talk of "pre-
dicting" a man's height from his weight, when we already
know the height and weight of all 120 men in our group. Of
course when we have both height and weight it is unneces-
sary to convert one into the other. Suppose, however, that
all we know about a certain man is his weight and the fact
that he falls within the age-range of our group of 120 men.
Now since we know the correlation between height and weight
in this group it is possible from the regression equation to
predict the most probable height of our subject in lieu of
actually measuring him. In the same way, the regression
equation may be used to predict the height of any man in the
population from which our group is taken, provided our group
is a random sample of the larger group. The regression equa-
tions hold, of course, only for the population from which the
sample group is drawn. We could not, of course, estimate the
probable heights of children or of women from a regression
equation which had been worked out for men between the ages
of 18 and 25 (the age-range of the men in our group of 120).
And conversely, we could not expect regression equations
worked out for elementary children to hold for older groups.
Probably height and weight — since they are both easily
measured — do not show the value of the regression equations
as well as other and more complex traits. To take a problem
of more direct interest, suppose that in a group of children
of approximately the same age the r between IQ and average
grades made in the first year of high school works out to
be .70. Now if we know the IQ of a child entering school
the next year, it is possible to estimate what his probable
scholastic performance will be from the regression equation
worked out from the group of the previous year. This may
be extremely valuable in educational guidance. The same
thing is true of vocational guidance — we may be able on the
basis of test scores to predict the probable success of an individ-
ual who contemplates entering a certain trade or profession,
and thus advise him more intelligently.
3. The Reliability of the Predictions Made by the Regression
Equations
A. The Standard Error of Estimate, $\sigma_{(est.)}$ or S
We have constantly referred to the values of X and Y
"predicted" from the regression equations as being the "most
probable" values of the one variable accompanying the given
value of the other. The method of showing just how reliable,
i.e., how probable, our predicted values are, is to calculate
their standard error of estimate, written $\sigma_{(est.)}$. To find the
accuracy with which we are able to estimate Y-values from
equation (30), we employ the formula,1
$\sigma_{(est.\ Y)} = \sigma_y\sqrt{1 - r^2}$, (32)
in which $\sigma_y$ is the $\sigma$ of the Y-distribution, and the "(est.)" is
to distinguish this $\sigma$ from the expressions $\sigma_{(dis.)}$, $\sigma_{(aver.)}$, etc.; r is,
of course, the coefficient of correlation between X and Y.
Now from equation (30) we have found that a man weigh-
ing 60 kgs. is most probably 170.9 cms. tall (see page 181).
To find the reliability of this estimate substitute in formula
(32), to find,
$\sigma_{(est.\ Y)} = 6.55 \times \sqrt{1 - .6^2} = 5.2$.
We may now say that the most probable height of a man weigh-
ing 60 kgs. is 170.9 cms. with a $\sigma_{(est.)}$ of 5.2 cms., and that
the chances are 68 in 100 that the actual height of the given
individual falls within the limits 170.9 ± 5.2, or between 165.7
cms. and 176.1 cms. We may be practically certain that the
height of this man falls within the limits 170.9 ± 3 × 5.2, or
between 155.3 cms. and 186.5 cms.
In order to find with what degree of accuracy we are able
to predict X values from equation (31) we use the formula,2
$\sigma_{(est.\ X)} = \sigma_x\sqrt{1 - r^2}$, (33)
in which $\sigma_x$ is the $\sigma$ of the X-distribution.
1 $\sigma_{(est.\ Y)}$ is sometimes written $S_y$.
2 $\sigma_{(est.\ X)}$ is sometimes written $S_x$.
We have already found from formula (31) that the most
probable weight (X) of a man 180 cms. tall is 68.7 kgs. (see
page 181). To find the $\sigma_{(est.\ X)}$ of this prediction we substitute
for $\sigma_x$ and r in formula (33):
$\sigma_{(est.\ X)} = 7.75 \times \sqrt{1 - .6^2} = 6.2$.
Hence the most probable weight of a man in our group (or in
the population from which it is drawn) who is 180 cms. tall is
68.7 kgs. with a $\sigma_{(est.)}$ of 6.2 kgs. The chances are 68 in 100
that the actual weight of this man falls within the limits
68.7 ± 6.2, or between 62.5 and 74.9 kgs. We may be prac-
tically certain that his weight falls within the limits 68.7 ± 3 ×
6.2, or between 50.1 and 87.3 kgs.
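Formulas (32) and (33) may be put into a few lines of code. What follows is a sketch only; the function names are our own, and the figures are those of the height-weight problem. The limits are computed from the unrounded standard error, so they may differ from the text by a unit in the first decimal place.

```python
import math


def standard_error_of_estimate(sigma, r):
    """sigma_(est.) = sigma * sqrt(1 - r^2), as in formulas (32) and (33)."""
    return sigma * math.sqrt(1.0 - r ** 2)


def limits(predicted, standard_error, multiple=1):
    """Lower and upper limits: predicted score plus or minus a multiple
    of the standard error of estimate (hypothetical helper)."""
    spread = multiple * standard_error
    return predicted - spread, predicted + spread


se_height = standard_error_of_estimate(6.55, 0.60)  # error in predicting Y
se_weight = standard_error_of_estimate(7.75, 0.60)  # error in predicting X

# Chances 68 in 100 (one sigma) and practical certainty (three sigma).
print(round(se_height, 1), limits(170.9, se_height), limits(170.9, se_height, 3))
print(round(se_weight, 1), limits(68.7, se_weight), limits(68.7, se_weight, 3))
```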
B. The Probable Error of Estimate, $PE_{(est.)}$
The $PE_{(est.)}$ may be used for estimating the accuracy of a
prediction instead of $\sigma_{(est.)}$. $PE_{(est.)}$ is obtained by simply
multiplying $\sigma_{(est.)}$ by the constant .6745. Thus
$PE_{(est.\ Y)} = .6745 \times \sigma_y\sqrt{1 - r^2}$ (34)
and
$PE_{(est.\ X)} = .6745 \times \sigma_x\sqrt{1 - r^2}$. (35)
The height of a man who weighs 60 kgs. has been estimated
to be 170.9 cms. with a $\sigma_{(est.\ Y)}$ of 5.2 cms. The $PE_{(est.\ Y)}$ of
this estimated height is .6745 × 5.2 or 3.5 cms. The chances
are even, therefore, that the actual height of this man falls
within the limits 170.9 ± 3.5 or between 167.4 and 174.4 cms.
In like manner, since the estimated weight of a man 180
cms. tall is 68.7 kgs. with a $\sigma_{(est.\ X)}$ of 6.2, the $PE_{(est.\ X)}$ of this
man's weight will be .6745 × 6.2 or 4.2 kgs. The chances are
even that this man's actual weight lies within the limits
68.7 ± 4.2 or between 64.5 and 72.9 kgs.
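The probable error of estimate follows from the standard error by a single multiplication, as the sketch below shows. The function name and the constant name are assumptions of ours; the numerical values are those of the height-weight problem.

```python
import math

PE_CONSTANT = 0.6745  # the constant used in formulas (34) and (35)


def probable_error_of_estimate(sigma, r):
    """PE_(est.) = .6745 * sigma * sqrt(1 - r^2)."""
    return PE_CONSTANT * sigma * math.sqrt(1.0 - r ** 2)


pe_height = probable_error_of_estimate(6.55, 0.60)  # PE in predicting height
pe_weight = probable_error_of_estimate(7.75, 0.60)  # PE in predicting weight

# The chances are even that the actual score lies within these limits.
print(round(pe_height, 1), (170.9 - pe_height, 170.9 + pe_height))
print(round(pe_weight, 1), (68.7 - pe_weight, 68.7 + pe_weight))
```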
The formulas for $\sigma_{(est.)}$ and $PE_{(est.)}$ measure the error made
in taking predicted instead of actual X and Y scores. Note
that when r = 1.00, $\sqrt{1 - r^2}$ is 0; and consequently, since both
$\sigma_{(est.)}$ and $PE_{(est.)}$ are then zero, there is no error of prediction.
This result follows because all of the paired scores fall on the
one double regression line when r = 1.00 (see page 161).1
An inspection of the formulas for $\sigma_{(est.)}$ and $PE_{(est.)}$ shows
that the accuracy of the prediction from the regression equa-
tions depends upon the $\sigma$'s of the two distributions (the $\sigma_y$
and $\sigma_x$) and upon the degree of correlation between the two
traits. If the variability in Y, say, is small, and the correlation
between Y and X high (e.g., .90 to 1.00) values of Y can be
predicted from known values of X with a comparatively high
degree of accuracy. When the variability is large or the correla-
tion low, however, the prediction often becomes so unreliable
as to be almost valueless; and even with a fairly high coeffi-
cient, predictions will often have such a large error of estimate
as to be almost valueless. Thus, in spite of the fact that an
r = .60 is usually considered fairly substantial,2 we can only
predict a man's height (Y), knowing his weight (X), within a
$PE_{(est.)}$ of 3.5 cms. In other words, the chances are only 50
in 100 that the actual height does not differ from the predicted
height by more than ±3.5 cms.
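How slowly the error of estimate shrinks as r rises may be seen by tabulating the factor $\sqrt{1 - r^2}$, which is the ratio of $\sigma_{(est.)}$ to the $\sigma$ of the distribution being predicted. The sketch below is illustrative only; the function name and the particular values of r chosen are ours.

```python
import math


def error_ratio(r):
    """Ratio of sigma_(est.) to the sigma of the predicted variable."""
    return math.sqrt(1.0 - r * r)


# A ratio of 1.00 means the prediction is no better than quoting the mean;
# a ratio of 0 means there is no error of prediction.
for r in (0.00, 0.30, 0.60, 0.90, 0.99, 1.00):
    print(f"r = {r:.2f}   sigma_(est.) / sigma = {error_ratio(r):.2f}")
```

An r of .60 thus reduces the error of estimate by only one fifth of what it would be with no correlation at all.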
When using the regression equations for prediction, the
$\sigma_{(est.)}$ or the $PE_{(est.)}$ should always be given. In general, the
value of a prediction will depend, in addition to the size of
the error of estimate, upon the fineness of the units of measure-
ment and the purposes for which the prediction is made.
VI. The Complete Solution of a Correlation Problem
In Diagram XXIV will be found the complete solution of
a second correlation problem. The purpose of another
"model" problem, in addition to the height-weight problem
in Diagram XXIII, is to strengthen the student's grasp on cor-
relation by having him work through the steps in finding r
and the regression equations with a new set of data. Often-
times when only a single model problem is given, one fails
to understand certain points in the solution which another
entirely different problem will succeed in clearing up. A brief
discussion of the important points in the solution of this prob-
lem will be given in the following paragraphs, which the student
should read with Diagram XXIV before him.
1 See Monroe, An Introduction to the Theory of Educational Measurements,
1923, pp. 351-353, for a graphical demonstration of the meaning of $\sigma_{(est.)}$.
2 See, however, the discussion of high and low correlation on page 288ff.
DIAGRAM XXIV
To Illustrate the Complete Solution of a Correlation Problem
[Correlation table not reproduced. X-variable: IQ, First Test, in
class intervals of 5 from 90-94 to 150-154. Y-variable: IQ, Second
Test, in class intervals of 5, the highest being 155-159. N = 136.
The deviations $D_x$ and $D_y$ are taken in units of step-interval
from the interval 115-119.]
Sums taken from the table:
$\Sigma FD_y = 174 - 133 = 41$; $\Sigma FD_y^2 = 1195$
$\Sigma FD_x = 190 - 100 = 90$; $\Sigma FD_x^2 = 1056$
$\Sigma x'y' = 1012$
Calculation of the averages and $\sigma$'s:
$c_y = \frac{41}{136} = .30$; $c_y^2 = .09$; in score units $c_y = .30 \times 5 = 1.5$
$M_y = 117.5 + 1.5 = 119$
$c_x = \frac{90}{136} = .66$; $c_x^2 = .44$; in score units $c_x = .66 \times 5 = 3.30$
$M_x = 117.5 + 3.30 = 120.8$
$\sigma_y = \sqrt{\frac{1195}{136} - .09} \times 5 = 2.95 \times 5 = 14.75$
$\sigma_x = \sqrt{\frac{1056}{136} - .44} \times 5 = 2.71 \times 5 = 13.55$
Calculation of r:
$r = \frac{\frac{1012}{136} - .30 \times .66}{2.95 \times 2.71} = .91$
$PE_r = .01$ (Table XVIII)
Calculation of Regression Equations:
I. Deviation Form:
$y = .91 \times \frac{14.75}{13.55}x = .99x$
$x = .91 \times \frac{13.55}{14.75}y = .84y$
II. Score Form:
$Y - 119 = .99(X - 120.8)$, or $Y = .99X - .59$
$X - 120.8 = .84(Y - 119)$, or $X = .84Y + 20.8$
Calculation of $PE_{(est.)}$:
$PE_{(est.\ Y)} = .6745 \times 14.75 \times \sqrt{1 - (.91)^2} = 4.12$ (4)
$PE_{(est.\ X)} = .6745 \times 13.55 \times \sqrt{1 - (.91)^2} = 3.79$ (4)
Examples:
Let X = 100: Y = 99 - .59, or 98 ± 4
Let X = 120: Y = 118 ± 4
Let Y = 100: X = 84 + 20.84 = 104 ± 4
The problem is to find the relation between the IQ's of 136
children (of the same chronological age) as determined from two
individual intelligence tests. The correlation table has been
constructed from a scatter diagram as explained on page 154.
The first set of IQ's is the X-variable, and the second set of IQ's
the Y-variable. Since the calculations of the two averages,
$c_x$, $c_y$, $\sigma_x$, and $\sigma_y$, cover familiar ground and have been given
in detail on the diagram, they need not be repeated.
Note first, then, that the product-deviations in the $\Sigma x'y'$
column have been taken from column 115-119 (the column
containing the GA of the X-distribution) and row 115-119
(the row containing the GA of the Y-distribution). The
entries in the $\Sigma x'y'$ column have been obtained by the shorter
method described on page 167: each cell frequency in a given
row has been multiplied by its $D_x$, and the sum of these partial
deviations entered in the column $\Sigma x'$. This entry has then been
"weighted" (multiplied) once for all by the $D_y$ of the whole
row. To illustrate, in the first row (reading from left to right)
we have (1 × 5) + (1 × 6) + (1 × 7), or 18, as the $\Sigma x'$ entry. (The
$D_x$'s are 5, 6, and 7, respectively, and may be found from the
$D_x$ row at the bottom of the diagram.) The common $D_y$ is 8,
hence the $\Sigma x'y'$ entry is 18 × 8 or 144. Again in the eighth row,
we have (3 × -1) + (2 × 0) + (3 × 1) + (3 × 2) + (1 × 3) + (1 × 4) or
13 as the $\Sigma x'$ entry. The $D_y$ of this row is 1, and hence the
$\Sigma x'y'$ entry is 13. To take still another example, in the eleventh
row we have (2 × -3) + (3 × -2) + (3 × -1) + (2 × 0) + (2 × 1) or
-13 as the $\Sigma x'$. Since the common $D_y$ is (-2), the $\Sigma x'y'$ entry
here is +26.
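The shorter method just described may be sketched as follows for the three rows used as illustrations. The representation of a row as a mapping from $D_x$ to cell frequency, and the function name, are assumptions made for the example; only the cell frequencies quoted in the text are used.

```python
def row_entries(cells, d_y):
    """Return (sum of x', sum of x'y') for one row of the correlation table.

    cells maps each D_x (deviation of a column, in step-intervals) to the
    frequency found in that cell of the row; d_y is the D_y of the row.
    """
    sum_x = sum(frequency * d_x for d_x, frequency in cells.items())
    return sum_x, sum_x * d_y  # weight the row sum once by its D_y


first_row = row_entries({5: 1, 6: 1, 7: 1}, d_y=8)
eighth_row = row_entries({-1: 3, 0: 2, 1: 3, 2: 3, 3: 1, 4: 1}, d_y=1)
eleventh_row = row_entries({-3: 2, -2: 3, -1: 3, 0: 2, 1: 2}, d_y=-2)

print(first_row, eighth_row, eleventh_row)
```

Summing the second member of each pair over all rows of the table gives the total of the $\Sigma x'y'$ column.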
After all of the $\Sigma x'y'$ entries have been made and the sum of
the column found, the calculation of r from formula (23) and of
$PE_r$ from formula (26) are simply matters of substitution.
Remember that $c_x$, $c_y$, $\sigma_y$, $\sigma_x$ are all left in units of step-interval
in the r formula (see page 167).
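The substitution may be followed in code. The sketch below reproduces the arithmetic shown on Diagram XXIV from the sums of the table; the variable names are our own, and the corrections and $\sigma$'s are carried unrounded in step-interval units, so intermediate figures may differ slightly from the rounded values printed on the diagram.

```python
import math

N = 136                 # number of children
STEP = 5                # size of the class interval, in IQ points
GUESSED_AVERAGE = 117.5 # midpoint of the interval 115-119

SUM_FD_X, SUM_FD2_X = 90, 1056   # sums for the X-distribution
SUM_FD_Y, SUM_FD2_Y = 41, 1195   # sums for the Y-distribution
SUM_XY = 1012                    # total of the product-deviation column

# Corrections and sigmas, all in units of step-interval.
c_x = SUM_FD_X / N
c_y = SUM_FD_Y / N
sigma_x_steps = math.sqrt(SUM_FD2_X / N - c_x ** 2)
sigma_y_steps = math.sqrt(SUM_FD2_Y / N - c_y ** 2)

# Product-moment r, computed entirely in step-interval units.
r = (SUM_XY / N - c_x * c_y) / (sigma_x_steps * sigma_y_steps)

# Means converted back to score units.
mean_x = GUESSED_AVERAGE + c_x * STEP
mean_y = GUESSED_AVERAGE + c_y * STEP

print(round(r, 2), round(mean_x, 1), round(mean_y, 1))
```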
The regression equations in Deviation Form under (I)
have been found by substituting the values of r, $\sigma_x$, and $\sigma_y$ in
formulas (28) and (29), and the two straight lines which these
two equations represent have been plotted on the diagram.
So far as the actual solution of the problem is concerned, it is
unnecessary to plot these lines. They are of value, however,
in indicating whether the means of the X and Y arrays may be
fairly represented by straight lines; i.e., whether the regression
is apparently "linear." If the relation is not "straight-line,"
other methods must be employed in calculating the correlation
(see page 203).
The regression equations in Score Form have been found,
the one by substituting the two averages and the regression
coefficient of Y on X (.99) in formula (30), and the other by
substituting the two averages and the regression coefficient of
X on Y (.84) in formula (31). The calculation of the two
PE's of estimate is shown on the Diagram. $PE_{(est.\ Y)}$ is found
from formula (34); $PE_{(est.\ X)}$ from formula (35).
Several examples have been given in the diagram to illus-
trate the use of the regression equations in "prediction."
Note that an IQ of 100 on the first test (X) is most probably
accompanied by an IQ of 98 on the second test (Y) with a
$PE_{(est.\ Y)}$ of 4.12 (4) points. The chances are 50 in 100 that
the actual IQ on the second test falls within the limits 98 ± 4,
or between 94 and 102. An IQ of 120 on the first test (X) is
most probably accompanied by an IQ of 118 points in the second
test (Y), and the $PE_{(est.\ Y)}$ is again 4 points. All predicted Y's
have the same error of estimate, no matter where on the scale
the Y may fall.
While the errors of estimate $\sigma_{(est.)}$ and $PE_{(est.)}$ have been
used hitherto for the purpose of giving the reliability of specific
predicted scores, they may also be interpreted in a more
general fashion. A $PE_{(est.\ Y)}$, for instance, of 4 points may be
taken to mean that one half of the IQ's in test Y failed of per-
fect correlation with the IQ's in test X by ±4 points or more,
while the other one half failed of perfect correlation by less
than ±4 points.
In most correlation problems we are interested in pre-
dicting the scores on only one test. (Y is usually taken as the
dependent, and X the independent variable.) For illustrative
purposes, however, an example is given in Diagram XXIV of
the prediction of an IQ in X from an IQ in Y. Thus for an
IQ(Y) of 100 we find the most probable IQ(X) to be 104 with
a $PE_{(est.\ X)}$ of 3.79 (4) points. The chances are 50 in 100
that the actual IQ(X) falls within the limits 104 ± 4 points or
between 100 and 108.
VII. Methods of Measuring Correlation Which Take
Account Only of Relative Position or Rank
In many problems, especially in the fields of applied and
vocational psychology, the investigator finds that he must
work with data in which differences in capacity or merit are
expressed in ranks rather than in graded scores or measures.
To mention a few cases of this sort, we have individuals ranked
in order of merit for honesty, athletic ability, salesmanship,
or intelligence; and advertisements, colors, etc., ranked for
esthetic qualities, beauty, or individual preference. In com-
puting correlations from such material as this it is necessary
to use methods which take account only of the relative posi-
tions or ranks. Also, when we have only a few scores (10 to
25 for example), it is often advisable to rank these in orders
of merit and compute the correlation by a rank method instead
of by the longer and more laborious product-moment method.
Coefficients of correlation calculated from a few cases are
nearly always unreliable, and of little value except in sug-
gesting the possible existence of relation, or as a preliminary
survey. In such cases, therefore, simple methods are recom-
mended, as they save much time and labor besides giving
results which are as good as those secured by more elaborate
methods.
In the present Section we shall consider two methods of
finding the correlation when the data to be correlated have
been arranged in orders of merit. These methods are known
respectively as (1) the Method of Rank-Differences, and (2)
the Method of Gains or the Spearman " Footrule."
1. The Method of Rank-Differences
The method of rank-differences is illustrated in Table XIX.
The problem is to find the relation between the length of
service and the selling efficiency of 12 salesmen. The men are
listed in column 1, and in column 2, opposite the name of each
man, is given the number of years he has been in the service of
the company. In column 3, the men are ranked in order of
merit in accordance with the length of their service. For
example, G who has been longest with the company is ranked
1; C, the next longest, is ranked 2; and so on down the list.
Notice that both A and J have the same period of service, and
that each is ranked 7.5. Instead of ranking one 7, and the
other 8, or both 7 or 8, we compromise by ranking both 7.5,
and F who follows 9.1
1 When three or more individuals (or specimens of any sort) are tied,
i.e., have the same score, the simplest plan is to give them all the median order of
merit rating. Thus three individuals who are 5, 6, and 7, respectively, are all
ranked 6, and the next following 8; while four individuals who are 5, 6, 7, and
8, are all ranked 6.5, and the next following 9.
In column 4 the men are ranked in order of merit for effi-
ciency by the sales manager. The most efficient man (C) is
ranked 1, the least efficient (B) is ranked 12. In column 5,
the difference (the "D") between each man's efficiency rank
and his years of service rank is entered, and in the next column
(6) each of these D's is squared. The correlation between the
two orders of merit may now be computed by substituting for
$\Sigma D^2$ and N in the formula,
$\rho = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}$ (36)
in which D represents the difference in the rank of an individual
in the two series, and $\Sigma D^2$ is the sum of the squares of all such
differences. N is, of course, the number of cases, and $\rho$ is
the rank order coefficient of correlation. $\rho$ may be transmuted
into a product-moment r by means of Table XX.
TABLE XIX
To Illustrate the Rank-Difference Method of Finding Correlation
(1)        (2)        (3)          (4)            (5)          (6)
Salesmen   Years of   Order of     Order of       Difference   Difference
           Service    Merit        Merit          between      Squared
                      (Service)    (Efficiency)   Ranks (D)    (D^2)
A           5          7.5           6            1.5           2.25
B           2         11.5          12             .5            .25
C          10          2             1            1.0           1.00
D           8          4             9            5.0          25.00
E           6          6             8            2.0           4.00
F           4          9             5            4.0          16.00
G          12          1             2            1.0           1.00
H           2         11.5          10            1.5           2.25
I           7          5             3            2.0           4.00
J           5          7.5           7             .5            .25
K           9          3             4            1.0           1.00
L           3         10            11            1.0           1.00
N = 12                                                         58.00
$\rho = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 58}{12(143)} = .80$
From Table XX, r = .81.
$PE_r = .07$ [See formula (37)]
Substituting 58 for $\Sigma D^2$ and 12 for N in formula (36), we
obtain a $\rho$ of .80, and from Table XX this is found to be
equivalent to an r of .81. The PE of an r found from a $\rho$
is about 5% larger than the PE of the product-moment r.1
The formula is
$PE_r = \frac{.7063(1 - r^2)}{\sqrt{N}}$ (37)
and since, in the present example, r = .81, $PE_r$ = .07. Accord-
ingly, the coefficient of correlation though based on only 12
cases is conventionally reliable. Whenever N is less than
30, however, the $PE_r$ is probably much larger than the value
given by the formula. In any case r's and $PE_r$'s secured from
less than 30 cases should be accepted as tentative, and inter-
preted with caution. In the present example, all that we are
justified in concluding is that in our particular group of 12
men there is evidence of a close correspondence between rank-
ings for efficiency and number of years employed.
1 See Brown & Thomson, Essentials of Mental Measurement, 1921, p. 103.
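The rank-difference computation of Table XIX may be sketched in code as follows. The helper `average_ranks` is a hypothetical name of ours; it gives tied scores the mean of the positions they occupy, which agrees with the treatment of A and J (7.5) and of B and H (11.5) in the table. The last two lines use the value r = .81 read from Table XX, and formula (37) for the PE.

```python
import math

YEARS = [5, 2, 10, 8, 6, 4, 12, 2, 7, 5, 9, 3]             # salesmen A to L
EFFICIENCY_RANK = [6, 12, 1, 9, 8, 5, 2, 10, 3, 7, 4, 11]  # column 4


def average_ranks(scores):
    """Rank the highest score 1; tied scores share the mean of their positions."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for score in scores:
        first_position = ordered.index(score) + 1
        tied = ordered.count(score)
        ranks.append(first_position + (tied - 1) / 2)
    return ranks


service_rank = average_ranks(YEARS)
n = len(YEARS)

# Formula (36): rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)).
sum_d_squared = sum((s - e) ** 2 for s, e in zip(service_rank, EFFICIENCY_RANK))
rho = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Formula (37), with r = .81 taken from Table XX.
r_from_table = 0.81
pe_r = 0.7063 * (1 - r_from_table ** 2) / math.sqrt(n)

print(sum_d_squared, round(rho, 2), round(pe_r, 2))
```

The sketch reads r from the table rather than computing it, since the text gives no formula for the conversion.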
TABLE XX
A Table to Infer the Value of r from Any Given Value of $\rho$
$\rho = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}$
 rho     r       rho     r       rho     r       rho     r
 .01   .0105     .26   .2714     .51   .5277     .76   .7750
 .02   .0209     .27   .2818     .52   .5378     .77   .7847
 .03   .0314     .28   .2922     .53   .5479     .78   .7943
 .04   .0419     .29   .3025     .54   .5580     .79   .8039
 .05   .0524     .30   .3129     .55   .5680     .80   .8135
 .06   .0628     .31   .3232     .56   .5781     .81   .8230
 .07   .0733     .32   .3335     .57   .5881     .82   .8325
 .08   .0838     .33   .3439     .58   .5981     .83   .8421
 .09   .0942     .34   .3542     .59   .6081     .84   .8516
 .10   .1047     .35   .3645     .60   .6180     .85   .8610
 .11   .1151     .36   .3748     .61   .6280     .86   .8705
 .12   .1256     .37   .3850     .62   .6379     .87   .8799
 .13   .1360     .38   .3935     .63   .6478     .88   .8893
 .14   .1465     .39   .4056     .64   .6577     .89   .8986
 .15   .1569     .40   .4158     .65   .6676     .90   .9080
 .16   .1674     .41   .4261     .66   .6775     .91   .9173
 .17   .1778     .42   .4363     .67   .6873     .92   .9269
 .18   .1882     .43   .4465     .68   .6971     .93   .9359
 .19   .1986     .44   .4567     .69   .7069     .94   .9451
 .20   .2091     .45   .4669     .70   .7167     .95   .9543
 .21   .2195     .46   .4771     .71   .7265     .96   .9635
 .22   .2299     .47   .4872     .72   .7363     .97   .9727
 .23   .2403     .48   .4973     .73   .7460     .98   .9818
 .24   .2507     .49   .5075     .74   .7557     .99   .9909
 .25   .2611     .50   .5176     .75   .7654    1.00  1.0000
2. The Method of Gains, or the Spearman Footrule
A second method of computing correlation when the data are
ranked in orders of merit is the Method of Gains, or the Spear-
man "Footrule." Table XXI illustrates the use of the Foot-
rule with the data taken from Table XIX. It will be noticed
that the first four columns are the same in both methods, i.e.,
each series is arranged first in an order of merit. The methods
differ from here on, however. The entries in column 5, which is
headed G ("Gains"), are found by taking the plus differences
or the gains in rank of the 12 men in the efficiency-rankings
as compared with their service-rankings. Thus A who ranks
7.5 in "service" and 6 in "efficiency" has an increase in rank
or gain of 1.5 in the second ranking over the first.1 C, F, H, I,
and J, likewise register plus differences or gains in their effi-
ciency rankings as compared with their service rankings. The
total of the G column is 10.5. Note that if we compute the
gains in rank of service over efficiency instead of efficiency
over service, the same G will be obtained. This is shown in
column 6, marked G'. It makes no difference, therefore,
whether we figure gains of the first series over the second, or
the other way round, second over first.
TABLE XXI
To Illustrate the Footrule Method of Finding Correlation
(1)        (2)        (3)          (4)            (5)          (6)
Salesmen   Years of   Order of     Order of       G (Gains)    G' (Gains)
           Service    Merit        Merit          (4 over 3)   (3 over 4)
                      (Service)    (Efficiency)
A           5          7.5           6            1.5
B           2         11.5          12                          .5
C          10          2             1            1.0
D           8          4             9                         5.0
E           6          6             8                         2.0
F           4          9             5            4.0
G          12          1             2                         1.0
H           2         11.5          10            1.5
I           7          5             3            2.0
J           5          7.5           7             .5
K           9          3             4                         1.0
L           3         10            11                         1.0
                                                 10.5         10.5
$R = 1 - \frac{6\Sigma G}{N^2 - 1} = 1 - \frac{6 \times 10.5}{143} = .56$
r (Table XXII) = .79
1 Since the rankings are from 1 to 12, a rank of 6 is to be taken as higher
than a rank of 7.5.
When the sum of the G column has been obtained, the cor-
relation may be found from the formula,
$R = 1 - \frac{6\Sigma G}{N^2 - 1}$ (38)
Substituting for $\Sigma G$ its value 10.5, and for N its value 12, we
get an R of .56. From Table XXII this R may be converted
into an equivalent product-moment r of .79. Note that this
value of r compares favorably with the r (found from $\rho$) of
.81.
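The Footrule computation may be sketched in the same way. The function name `total_gains` is our own; the two rankings are columns 3 and 4 of Table XXI. Since a numerically smaller rank is the higher one, a man gains in the second ranking when his rank number there is smaller than in the first.

```python
SERVICE_RANK = [7.5, 11.5, 2, 4, 6, 9, 1, 11.5, 5, 7.5, 3, 10]  # column 3
EFFICIENCY_RANK = [6, 12, 1, 9, 8, 5, 2, 10, 3, 7, 4, 11]       # column 4


def total_gains(first_ranking, second_ranking):
    """Sum of the plus differences (gains) of the second ranking over the first."""
    return sum(max(0, first - second)
               for first, second in zip(first_ranking, second_ranking))


gains = total_gains(SERVICE_RANK, EFFICIENCY_RANK)          # G, 4 over 3
gains_reversed = total_gains(EFFICIENCY_RANK, SERVICE_RANK)  # G', 3 over 4

# Formula (38): R = 1 - 6 * sum(G) / (N^2 - 1).
n = len(SERVICE_RANK)
footrule_r = 1 - 6 * gains / (n ** 2 - 1)

print(gains, gains_reversed, round(footrule_r, 2))
```

The two totals agree, as the text states, because both sets of ranks sum to the same figure, so that the gains and the losses must balance.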
TABLE XXII
A Table to Infer the Value of r from Any Given Value of R
  R      r        R      r        R      r        R      r
 .00   .000
 .01   .018      .26   .429      .51   .742      .76   .937
 .02   .036      .27   .444      .52   .753      .77   .942
 .03   .054      .28   .458      .53   .763      .78   .947
 .04   .071      .29   .472      .54   .772      .79   .952
 .05   .089      .30   .486      .55   .782      .80   .956
 .06   .107      .31   .500      .56   .791      .81   .961
 .07   .124      .32   .514      .57   .801      .82   .965
 .08   .141      .33   .528      .58   .810      .83   .968
 .09   .158      .34   .541      .59   .818      .84   .972
 .10   .176      .35   .554      .60   .827      .85   .975
 .11   .192      .36   .567      .61   .836      .86   .979
 .12   .209      .37   .580      .62   .844      .87   .981
 .13   .226      .38   .593      .63   .852      .88   .984
 .14   .242      .39   .608      .64   .860      .89   .987
 .15   .259      .40   .618      .65   .867      .90   .989
 .16   .275      .41   .630      .66   .875      .91   .991
 .17   .291      .42   .642      .67   .882      .92   .993
 .18   .307      .43   .654      .68   .889      .93   .995
 .19   .323      .44   .666      .69   .896      .94   .996
 .20   .338      .45   .677      .70   .902      .95   .997
 .21   .354      .46   .689      .71   .908      .96   .998
 .22   .369      .47   .700      .72   .915      .97   .999
 .23   .384      .48   .711      .73   .921      .98   .9996
 .24   .399      .49   .721      .74   .926      .99   .9999
 .25   .414      .50   .732      .75   .932     1.00  1.0000
The Footrule formula gives a rough estimate of the cor-
relation, and is generally less accurate than the rank-
difference formula. The coefficient R "has a large, though
except in the case of zero correlation, not definitely known
PE; does not vary between -1 and +1; is not comparable
in meaning with the product-moment coefficient; and in general
has none of the merits except brevity of the formula based on
the squares of the differences in rank." 1 The Footrule can be
employed to advantage, however, when the data are so meager
or crude as to make a more refined method a waste of time;
or it may be used in a preliminary survey to determine whether
there is sufficient evidence of correlation to warrant the applica-
tion of the product-moment method.
3. Summary of the Rank Methods
The product-moment method takes account of both the
size of the score and its position in the series. The rank
methods take account only of the position of the items in
the series. For example, individuals who score 90, 86, and
70, on a given test must be ranked 1, 2, and 3 in order of merit
despite the fact that the difference between 90 and 86 is 4, and
the difference between 86 and 70 is 16. The rank methods
indicate the presence of relationship rather than the extent
of relation. In general it may be set down as a convenient
rule that rank methods should ordinarily be used only
when N is small, say less than 30. Of the two rank methods,
the method of rank-differences is to be preferred as the more
accurate.
VIII. A Method of Measuring Relationship When the
Data are Grouped into Classes or Categories.
The Contingency Method
Sometimes the need arises of computing correlation when
the facts in which we are interested cannot be conveniently
measured, but can be grouped into classes or categories. To
cite a few examples of such data, we can classify eye-color as
blue, grey, or brown; temper as quick, even, or slow; athletic
1 See Kelley, T. L., Statistical Method, 1923, p. 193.
ability as good, average or poor, when we are unable to measure
such facts exactly. The methods of computing correlation
which have been given in the preceding sections are generally
applied to facts which can be measured absolutely in terms
of some common unit, or which, at least, can be ranked in
order of merit — they do not ordinarily apply to data which
can only be grouped into classes. Several methods are avail-
able for such material, however. One of the best of these is the
Contingency Method developed by Prof. Karl Pearson.1
In the contingency method relation is expressed by C, the
Coefficient of Mean Square Contingency.
Table XXIII illustrates the method of drawing up a con-
tingency table, and shows in detail the steps involved in finding
C. The problem is to discover whether there is any "resem-
blance" (correlation) between the eye-color of father and son.
There are 1000 cases. Tabulation of data is similar to the
method used in constructing a correlation table. Reading
down the first column, for example, we find that out of a total
of 358 blue-eyed fathers, 194 have blue-eyed sons; 83 grey-
eyed sons; 25 dark grey or hazel-eyed sons; and 56 brown-
eyed sons. In the first row, we find 335 blue-eyed sons of
whom 194 have blue-eyed fathers; 70 grey-eyed fathers; 41
dark grey or hazel-eyed fathers; 30 brown-eyed fathers.
After the contingency table is completed, the first step in
the calculation of C is to find an " independence value " for
each cell. These values — the figures in the parentheses in the
cells — represent the number of fathers and sons (whose eye-
color is given by the column and row, respectively, in which
the cell lies) whom we should expect to find in any given cell
in the absence of any actual association in the eye-color of
father and son. For example, the observed number of blue-
eyed fathers who have blue-eyed sons in our sample of 1000
is 194. If there were no correlation between the eye-color
of father and son, we should still expect to find — TTwT-" or
Yule, G. U., An Introduction to the Theory of Statistics, 1919, p. 6-iff.
CORRELATION
197
TABLE XXIII
To Illustrate the Calculation of C, the Coefficient of
Mean Square Contingency. [From Yule, p. 70]

                            Father's Eye Color
Son's Eye Color   Blue        Grey        Hazel       Brown       Totals
Blue              194 (120)    70 (88)     41 (60)     30 (66)     335
Grey               83 (102)   124 (75)     41 (51)     36 (56)     284
Hazel              25 (49)     34 (36)     55 (25)     23 (27)     137
Brown              56 (87)     36 (64)     43 (44)    109 (48)     244
Totals            358         264         180         198         1000

Column 1: Independence Values
(335 X 358)/1000 = 120   (335 X 264)/1000 = 88   (335 X 180)/1000 = 60   (335 X 198)/1000 = 66
(284 X 358)/1000 = 102   (284 X 264)/1000 = 75   (284 X 180)/1000 = 51   (284 X 198)/1000 = 56
(137 X 358)/1000 = 49    (137 X 264)/1000 = 36   (137 X 180)/1000 = 25   (137 X 198)/1000 = 27
(244 X 358)/1000 = 87    (244 X 264)/1000 = 64   (244 X 180)/1000 = 44   (244 X 198)/1000 = 48

Column 2: Square of Each Cell Entry Divided by Its Independence Value
(194)²/120   (83)²/102   (25)²/49   (56)²/87
(70)²/88     (124)²/75   (34)²/36   (36)²/64
(41)²/60     (41)²/51    (55)²/25   (43)²/44
(30)²/66     (36)²/56    (23)²/27   (109)²/48

S = 1270.8
N = 1000
S - N = 270.8

$C = \sqrt{\frac{S - N}{S}} = \sqrt{\frac{270.8}{1270.8}} = .462$
120 blue-eyed fathers with blue-eyed sons by the operation
of chance alone.1 Again, the observed number of grey-eyed
fathers who have blue-eyed sons is 70. In the absence of any
real association, chance alone would account for $\frac{335 \times 264}{1000}$, or
88 such cases in our sample of 1000. In like manner " independ-
ence values " may be found for each cell by the simple process
of multiplying together the totals of the row and column in
which the cell lies and dividing this product by N, the number
of cases. (See column 1, Table XXIII.)
When the independence values have been calculated for
each cell, the next step is to square each cell entry and divide
this result by the independence value of that cell (see column
2). All quotients so found are totaled to give S (1270.8), and
N (1000) is subtracted to give S - N. The coefficient of mean
square contingency, C, may then be found from the formula,
$C = \sqrt{\frac{S - N}{S}}$ (39)
In the present problem, C = .462.
The steps in the computation of C may be summarized as
follows :
1. Construct a contingency table as shown in Table
XXIII.
2. Determine the " independence value " for each cell by
multiplying together the totals of the row and column in which
the cell falls and dividing this product by N.
3. Square the number found in each cell, and divide this
result by the independence value of that cell obtained in (2)
above.
4. Sum the quotients obtained from (3). Call this total S.
1 We find that $\frac{335}{1000}$ of all the sons are blue-eyed. This proportion should
hold for sons of all fathers, if there is no dependence of son on father in respect
to eye-color. Hence $\frac{335}{1000}$ of the 358 blue-eyed fathers should have blue-eyed
sons by the operation of chance alone. This argument applies to the other
" independence values " also.
5. Subtract N from S, giving S—N.
6. Divide S—N by S and extract the square root to get C,
the coefficient of mean square contingency.
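The six steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `contingency_coefficient` is hypothetical, and the independence values are carried unrounded, so S comes out slightly different from the 1270.8 obtained in Table XXIII with whole-number independence values.

```python
from math import sqrt

# Observed frequencies from Table XXIII: rows = son's eye color,
# columns = father's eye color (Blue, Grey, Hazel, Brown).
observed = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]


def contingency_coefficient(table):
    """Return (S, N, C) for a contingency table, following steps 1-6."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = 0.0
    for i, row in enumerate(table):
        for j, freq in enumerate(row):
            # Step 2: independence value = row total x column total / N.
            independence = row_totals[i] * col_totals[j] / n
            # Steps 3-4: square the cell entry, divide, and accumulate S.
            s += freq ** 2 / independence
    # Steps 5-6: C is the square root of (S - N) / S.
    return s, n, sqrt((s - n) / s)


S, N, C = contingency_coefficient(observed)
print(f"S = {S:.1f}, N = {N}, C = {C:.3f}")
```

Carrying the independence values unrounded changes C only in the third decimal place, which suggests that the rounding used in the table is harmless for this example.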
The fundamental principle underlying the Contingency
Method is a comparison of the frequency of association (num-
ber of cases) actually found in each cell with the frequency
of association which we should expect to find in the cells if the
traits considered were completely unrelated (independent).
If there is no correlation whatever between the two variables in our
contingency table, C = .00; if there is perfect correlation, C
approaches 1.00 as a limit.
While in general no sign is attached to C, as this coefficient
simply indicates whether the two traits are associated or
independent, for interpretative purposes a minus sign may be
affixed to a C if an inspection of the contingency table shows
that marked degrees of the one trait are found with slight
degrees of the other. Thus from an inspection of Table XXIII,
it is evident that slight pigmentation of eyes in the father is
associated with slight pigmentation of eyes in the son, and hence
in the present case, C is clearly positive.1 If marked pigmenta-
tion in the eyes of the father had been associated with slight
pigmentation in the eyes of the son, C would have been negative.
In other words, we must determine whether the correlation is
positive or negative from the contingency table, — C gives simply
the degree of the relation.
One disadvantage of the contingency method lies in the
fact that C does not remain constant — for the same data — when
the number of classes in the table is increased. The C cal-
culated from a 3X3 fold table will not ordinarily equal the C
calculated from the same data arranged in, say, a 5X5 fold table.
Moreover, the maximum value which a C can take will depend
1 Note, for example, that 194 blue-eyed fathers have blue-eyed sons, while
only 30 brown-eyed fathers have blue-eyed sons. Also, 109 brown-eyed fathers
have brown-eyed sons while only 56 blue-eyed fathers have brown-eyed sons.
Other comparisons like these will show that association between the degree of
pigmentation in the eyes of father and son is positive.
on the fineness of the classification employed. Yule 1 has shown
that
Number of classes     C cannot exceed
        2                  .707
        3                  .816
        4                  .866
        5                  .894
        6                  .913
        7                  .926
        8                  .935
        9                  .943
       10                  .949
Yule has suggested, in the light of these facts, that we "restrict
the use of the ' coefficient of contingency ' to 5 X 5-fold or finer
classifications " in order that the maximum value of C may
be as near unity as possible. On the other hand, we must
avoid a too-fine classification or C will be affected by slight
or " casual irregularities of no physical significance "; and in
addition the arithmetic will be needlessly increased.
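The limits quoted from Yule agree with the closed form $\sqrt{(k-1)/k}$ for a table of k X k classes. That expression is not stated in the text above and is used here as an assumption; the short Python sketch below (with the hypothetical helper `max_contingency`) simply checks that it reproduces the nine tabled values.

```python
from math import sqrt


def max_contingency(k):
    """Assumed upper limit of C for a k x k table: sqrt((k - 1) / k)."""
    return sqrt((k - 1) / k)


# Reproduce the tabled limits for 2 to 10 classes, to three decimals.
limits = {k: round(max_contingency(k), 3) for k in range(2, 11)}
for k, limit in limits.items():
    print(f"{k:2d} classes: C cannot exceed {limit:.3f}")
```

Seen this way, the .462 found from the 4 X 4-fold Table XXIII stands against a ceiling of .866 rather than 1.00.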
Since the classification in Table XXIII is 4 X 4-fold, the
value of C would very probably change somewhat if the num-
ber of classes were increased. The table will serve very well,
however, as an illustration of the method, and of the arithmetic
involved in finding C. Moreover, as the maximum C from a
4X4-fold table is .866, and the C found from Table XXIII
is .462, we are justified in concluding — in spite of the relative
crudeness of our measures — that there is a medium positive
correlation between pigmentation of eyes in father and son.
The relation of C to r, the Product-Moment coefficient of
correlation, is of considerable importance. C may be taken as
practically equivalent to r, (1) when the grouping is relatively
fine, — 5 X 5-fold or finer; (2) when the sample is large; (3)
when we know, or are justified in assuming, that the traits
which we are correlating are normally distributed. In case the
first of these conditions is not fulfilled, Pearson 2 has given a
correction for " broad categories " which should be used with
4 X 4-fold and less fine classifications, if C is to be compared with
1 An Introduction to the Theory of Statistics, 1919, p. 66.
2 Pearson, Karl, On the Measurement of the Influence of " Broad Categories "
on Correlation. Biometrika, Vol. IX, 1913.
r. For 5X5 fold or finer classifications this correction is
usually small, and unless a very accurate measure of correlation
is desired it may be disregarded and C taken as roughly equal
to r.
TABLE XXIV
To Illustrate the Calculation of C by Short Method
Boys: Ages 4½-5½ Years

                             Weight in Pounds
Height in Inches   24-28   29-33   34-38   39-43   44-48   49-53   Total
45-47                                 1               2               3
42-44                                 4      35      21       5      65
39-41                        5       87      90       7       1     190
36-38                1      18       72       8                      99
33-35                5      15        5                              25
30-32                2                                                2
Total                8      38      169     133      30       6     384

Column 1: $\frac{1}{8}\left[\frac{1}{99} + \frac{25}{25} + \frac{4}{2}\right] = .3762$
Column 2: $\frac{1}{38}\left[\frac{25}{190} + \frac{324}{99} + \frac{225}{25}\right] = .3264$
Column 3: $\frac{1}{169}\left[\frac{1}{3} + \frac{16}{65} + \frac{7569}{190} + \frac{5184}{99} + \frac{25}{25}\right] = .5549$
Column 4: $\frac{1}{133}\left[\frac{1225}{65} + \frac{8100}{190} + \frac{64}{99}\right] = .4671$
Column 5: $\frac{1}{30}\left[\frac{4}{3} + \frac{441}{65} + \frac{49}{190}\right] = .2792$
Column 6: $\frac{1}{6}\left[\frac{25}{65} + \frac{1}{190}\right] = .0650$

P = 2.0688

$C = \sqrt{\frac{P - 1}{P}} = \sqrt{\frac{1.0688}{2.0688}} = .719$
The arithmetic involved in computing C may be lessened
somewhat by combining the twofold process of (1) calculating
independence values and (2) dividing the square of each cell
frequency by its independence value. This Short Method
of finding C is illustrated in Table XXIV. Note that the
first occupied cell in the first column of the table has a
frequency of 1 and an independence value of $\frac{99 \times 8}{384}$, and that
the cell frequency squared and divided by the independence
value is $\frac{1 \times 384}{99 \times 8}$. This quotient, viz., $\frac{1 \times 384}{99 \times 8}$, is the
contribution of this particular cell to the total S. In like manner the
contribution to S of the next cell in this column is $\frac{5^2 \times 384}{25 \times 8}$;
and of the third and last cell, $\frac{2^2 \times 384}{2 \times 8}$. These contributions
from column 1 may be combined as follows, $\frac{384}{8}\left[\frac{1}{99} + \frac{25}{25} + \frac{4}{2}\right]$,
and the contribution of each of the other five columns to S may
be found in exactly the same way. One further simplification
may be made. Since N (384) is a common factor in each column,
it may be left out of the computations entirely in calculating
the contribution of each cell, as shown in the table. Then if
the sum of all six columns is denoted by P, $C = \sqrt{\frac{P - 1}{P}}$
directly.1
By the Short Method, C is found to equal .719, and the
coefficient of correlation for the same table will be found to be
.709 (see page 216). The correspondence of C and r is some-
what closer here than is generally obtained, although the
difference between C and r is never very great when the con-
ditions prescribed on page 200 have been met. In the present
1 Since $P = \frac{S}{N}$, $S = PN$. Substituting PN for S in the formula $C = \sqrt{\frac{S - N}{S}}$,
$C = \sqrt{\frac{PN - N}{PN}}$, or removing the common factor, $C = \sqrt{\frac{P - 1}{P}}$.
case, N is fairly large, the classification is 6 X 6-fold, and the
distributions of both height and weight fairly normal.
The steps in the computation of C by the Short Method may
be summarized as follows (see Table XXIV).
1. Square the frequency in each cell of column 1, and
divide each square by the row total in which the cell falls.
2. Add all of the results for column 1, and divide by the
column total, a common factor. Record this partial sum.
3. Repeat (1) and (2) for each of the other columns in
the table.
4. Call the sum of all partial sums P.
5. Find C from the formula $C = \sqrt{\frac{P - 1}{P}}$.
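The Short Method can likewise be sketched in Python, using the cell frequencies of Table XXIV. The function name `short_method_c` is hypothetical, and the code is offered only as an illustration of steps 1 to 5, not as a definitive routine.

```python
from math import sqrt

# Table XXIV: rows = height classes (45-47 down to 30-32 inches),
# columns = weight classes (24-28 up to 49-53 pounds).
table = [
    [0, 0, 1, 0, 2, 0],
    [0, 0, 4, 35, 21, 5],
    [0, 5, 87, 90, 7, 1],
    [1, 18, 72, 8, 0, 0],
    [5, 15, 5, 0, 0, 0],
    [2, 0, 0, 0, 0, 0],
]


def short_method_c(table):
    """Return (partial sums per column, P, C) by the Short Method."""
    row_totals = [sum(row) for row in table]
    partial_sums = []
    for column in zip(*table):
        col_total = sum(column)
        # Step 1: square each cell frequency and divide by its row total.
        inner = sum(f ** 2 / rt for f, rt in zip(column, row_totals) if f)
        # Step 2: divide the column's sum by the column total.
        partial_sums.append(inner / col_total)
    p = sum(partial_sums)                          # steps 3-4
    return partial_sums, p, sqrt((p - 1) / p)      # step 5


partial_sums, P, C = short_method_c(table)
print([round(v, 4) for v in partial_sums])
print(f"P = {P:.4f}, C = {C:.3f}")
```

Because N cancels out of every term, the function never needs the grand total; only the row and column totals enter the computation.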
In many problems in psychology in which the relation
between various attributes, whether of individuals or things,
is sought, C will prove of considerable value.
IX. Non-Linear Relationship
1. The Correlation Ratio
The relation which exists between the paired values of two
sets of measures X and Y may be described in a general way
as either " linear " or " non-linear." When the means of the
arrays of successive columns or rows in a correlation table fol-
low straight lines (exactly or approximately) the regression is
called " linear," and the relation between the two sets of
measures or scores is a " straight line relation." On the other
hand, when the drift or the trend of the means in the successive
arrays cannot be described by a straight line, but can be prop-
erly represented only by a curve of some kind, the regression
is called curvilinear, or in general non-linear, and the relation
between the two variables is a " curved line relation."
Our previous discussion has been concerned entirely with
cases in which the relation between X and Y was known to be
linear and in which r gave a fair measure of the degree of correla-
tion. Cases sometimes arise in psychological measurement,
however, in which the relation between X and Y is clearly
non-linear, and in such cases the coefficient of correlation r —
since the product-moment method assumes linear relationship
— cannot be used. The reason for this may be stated in brief
as follows. When a definitely curvilinear relation — instead of
being described by a curve — is represented by a straight line,
the scatter of the paired values is considerably greater about
the straight line than about the curve. This results from the
fact that the scatter about a curve joining the means of the
successive arrays is necessarily less than the scatter about a
straight line which has been " fitted " to these mean points.
The less the scatter about the regression line or curve, the
greater the degree of correlation; hence a coefficient of cor-
relation calculated from a correlation table in which the
regression is truly curvilinear will be materially less than the
true correlation between the variables X and Y. (See Foot-
note 1.)
In order to measure non-linear relation, therefore, we need
a more generalized coefficient than the coefficient of correlation,
r: — that is, we need a coefficient which will measure the con-
1 A simple illustration will make clear just why this is true. The correlation
between the following two short series (Table XXV) by the product-moment
formula (formula 25) is .93. The true correlation, however, is 1.00, i.e., perfect,
since the Y values are absolutely dependent on the X values: as X increases
in steps of 1 (in arithmetic progression) Y doubles (increases in geometric
progression). The reason why r is less than 1.00 is perfectly obvious as soon as
we plot the paired X and Y values (see Diagram XXV). Since the relationship
between X and Y is curvilinear, it cannot be described by a straight line.
Consequently when straight line relationship is assumed (as in the product-moment
formula) the plotted points do not fall on the relation line, and r is less than
1.00, the true correlation between X and Y. In true curvilinear correlation, r
is always less than $\eta$.

TABLE XXV
Variable X    Variable Y
    1            .25
    2            .50
    3           1.00
    4           2.00
    5           4.00
centration of the paired X and Y values about a relation curve,
just as r measures the concentration of the paired values about
a relation line. One such coefficient is the Correlation Ratio,
devised by Prof. Karl Pearson, and designated by the symbol $\eta$
(eta). Since eta is a general coefficient it may be employed
when the regression is linear as well as non-linear. If the
regression is linear (if the means of the arrays fall on straight lines)
$\eta$ will equal r; if the regression is non-linear (if the means
do not fall on straight lines) $\eta$ will be greater than r. In
general, as long as the relation between Y and X is non-linear
$\eta$ and r will differ, $\eta$ always being greater than r. The
coefficient of correlation, therefore, is seen to be simply a
limiting value of the more general $\eta$, just as straight line
relationship is simply a limiting case of curvilinear relation.

[DIAGRAM XXV. Plot of the paired X and Y values of Table XXV; horizontal axis: X-variable.]

$\eta$ is always positive, and varies from zero to 1.00. Whether
or not the relation given by $\eta$ is positive, negative or a varying
one must be determined, however, from the direction taken
by the curve of relation; i.e., by inspection of the correlation
diagram.
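The contrast between r and the correlation ratio in the five pairs of Table XXV may be checked with a short Python sketch. The function names are hypothetical; the correlation ratio is computed here directly from its definition as the standard deviation of the array means divided by the standard deviation of all the Y's, each X value forming its own array.

```python
from math import sqrt

# The paired values of Table XXV.
x = [1, 2, 3, 4, 5]
y = [0.25, 0.50, 1.00, 2.00, 4.00]


def product_moment_r(x, y):
    """Product-moment r from deviations about the actual averages."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_ratio(x, y):
    """Eta of Y on X: SD of the array means over the SD of all Y's."""
    n = len(y)
    my = sum(y) / n
    arrays = {}
    for a, b in zip(x, y):
        arrays.setdefault(a, []).append(b)   # one Y-array per X value
    between = sum(len(v) * (sum(v) / len(v) - my) ** 2
                  for v in arrays.values())
    total = sum((b - my) ** 2 for b in y)
    return sqrt(between / total)


r = product_moment_r(x, y)
eta = correlation_ratio(x, y)
print(f"r = {r:.2f}, eta = {eta:.2f}")
```

Since every array here holds a single value, the array means coincide with the Y values themselves and the ratio reaches its upper limit, while r falls short of it.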
The process of calculating $\eta$ from a correlation table in
which the relation is definitely non-linear is shown in Diagram
XXVI. The steps involved in finding the values to be
substituted in the formula for $\eta$ may be outlined as follows:
Step I
Construct a correlation table as shown in Diagrams XXIII and
XXIV and described on page 154.
Step II
Find the average ($Y'$) and the $\sigma$ of the Y-distribution, using the
Guessed Average Method described in Chapter I.
Step III
Compute the averages ($Y'_x$) of the successive Y-arrays, i.e., the
arrays of the columns. Enter these in the row marked $Y'_x$.
Step IV
Find the deviation of each $Y'_x$ from the average of the whole table,
$Y'$; that is, find $(Y'_x - Y')$ for each column.
Step V
Square each deviation, i.e., each $(Y'_x - Y')$, and enter the results in
the row marked $(Y'_x - Y')^2$.
Step VI
Multiply or weight each $(Y'_x - Y')^2$ by the $F_x$ of its column. In
the first column, for example, multiply 15.52 [i.e., $(Y'_x - Y')^2$] by 20,
its $F_x$.
Step VII
Find the sum of the $F_x(Y'_x - Y')^2$ column. Divide this sum by N,
and extract the square root. The result is $\sigma_{my}$, the standard deviation
of the means of the various columns about the arithmetic mean of all
of the Y's.
Step VIII
Divide $\sigma_{my}$ by $\sigma_y$ to get the correlation ratio $\eta_{yx}$. The formula
for $\eta_{yx}$ may be written,
$\eta_{yx} = \frac{\sigma_{my}}{\sigma_y}$ (40)
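Steps II to VIII may be sketched in Python as follows. Since the cell frequencies of Diagram XXVI are not reproduced here, the sketch borrows the height and weight data of Table XXIV as stand-in data, with class midpoints taken as scores; that substitution, the midpoints, and the function names are assumptions made purely for illustration.

```python
from math import sqrt

# Stand-in data (an assumption of this sketch): Table XXIV.
# Rows = Y (height, midpoints in inches), columns = X (weight, pounds).
y_mid = [46, 43, 40, 37, 34, 31]
x_mid = [26, 31, 36, 41, 46, 51]
table = [
    [0, 0, 1, 0, 2, 0],
    [0, 0, 4, 35, 21, 5],
    [0, 5, 87, 90, 7, 1],
    [1, 18, 72, 8, 0, 0],
    [5, 15, 5, 0, 0, 0],
    [2, 0, 0, 0, 0, 0],
]


def eta_yx(table, y_mid):
    """Correlation ratio of Y on X from a grouped correlation table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    # Step II: mean and sigma of the whole Y-distribution.
    mean_y = sum(f * y for f, y in zip(row_totals, y_mid)) / n
    sigma_y = sqrt(sum(f * (y - mean_y) ** 2
                       for f, y in zip(row_totals, y_mid)) / n)
    weighted = 0.0
    for column in zip(*table):
        f_x = sum(column)
        if f_x == 0:
            continue  # an empty array contributes nothing
        # Step III: mean of this Y-array.
        mean_col = sum(f * y for f, y in zip(column, y_mid)) / f_x
        # Steps IV-VI: squared deviation from the grand mean, weighted by F_x.
        weighted += f_x * (mean_col - mean_y) ** 2
    sigma_my = sqrt(weighted / n)      # step VII
    return sigma_my / sigma_y          # step VIII, formula (40)


def grouped_r(table, x_mid, y_mid):
    """Product-moment r from the same grouped table, for comparison."""
    n = sum(sum(row) for row in table)
    mean_x = sum(f * x for row in table for f, x in zip(row, x_mid)) / n
    mean_y = sum(f * y for row, y in zip(table, y_mid) for f in row) / n
    sxy = sxx = syy = 0.0
    for row, y in zip(table, y_mid):
        for f, x in zip(row, x_mid):
            sxy += f * (x - mean_x) * (y - mean_y)
            sxx += f * (x - mean_x) ** 2
            syy += f * (y - mean_y) ** 2
    return sxy / sqrt(sxx * syy)


eta = eta_yx(table, y_mid)
r = grouped_r(table, x_mid, y_mid)
print(f"eta_yx = {eta:.3f}, r = {r:.3f}")
```

For these stand-in data the two coefficients lie close together, as would be expected of a table in which the regression is nearly linear.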
If now we substitute in formula (40) the values of $\sigma_{my}$
and $\sigma_y$ found from Diagram XXVI, the correlation-ratio $\eta_{yx}$
becomes .931.1 This coefficient shows how the number of
problems worked (on the average) in a certain arithmetic test
(Y) is related to the grade position (X) of 465 pupils. The
curve which describes this relation (the curve which best
marks the trend or " drift " of the means of the successive Y
arrays) has been drawn in on the figure. Note that it begins
low and gradually rises, suddenly bending up in a concave
fashion.

[DIAGRAM XXVI. Correlation table illustrating the calculation of the correlation ratio; Y-variable: number of problems worked.]
From the diagram alone it would seem to be clear enough
that the regression of Y on X is non-linear. Further evidence
of this may be found in the fact that the coefficient of
correlation, r, calculated from this table (on the assumption, of
course, of linear relationship) is .80, about .13 less than
$\eta_{yx}$. The method of determining definitely whether
regression is linear or non-linear in any table will be given in (3)
following.
There are always two $\eta$'s in every non-linear correlation
table, just as there are always two regression coefficients,
$r\frac{\sigma_y}{\sigma_x}$ and $r\frac{\sigma_x}{\sigma_y}$, in a table in which regression is linear. The one,
written $\eta_{yx}$, refers to the regression of Y on X (Y is the dependent
variable); the other, written $\eta_{xy}$, refers to the regression of X
on Y (X is the dependent variable). The value of $\eta_{xy}$ may be
computed in exactly the same way as $\eta_{yx}$ by substituting X
for Y in the outline of " steps " given above. The formula is
$\eta_{xy} = \frac{\sigma_{mx}}{\sigma_x}$ (42)
Unlike r which has the same value in both regression
equations [see formulas (28) and (29)] $\eta_{yx}$ and $\eta_{xy}$ will usually differ,
their values depending on the degree of scatter about the
curves joining the means of the Y and X arrays. In the present
1 The PE of $\eta$ may be found from the formula
$PE_\eta = \frac{.6745(1 - \eta^2)}{\sqrt{N}}$ (41)
or from Table XVIII.
problem, for example, $\eta_{xy}$ = .818, while $\eta_{yx}$ = .931 as shown above.
In the special case in which the regression is truly linear, $\eta_{yx}$
and $\eta_{xy}$ equal each other, and both equal r (see page 205).
2. The Correction of " Raw " Eta
The value of $\eta$ depends materially on the number of cases
in the sample, and on the fineness of the grouping. As a general
rule, $\eta$ should never be calculated unless N is fairly large.
When N is comparatively small or the number of arrays is
large, Pearson 1 has given a correction which should be applied
to the " raw " (i.e., calculated) value of $\eta$.
If we represent the number of arrays by k the formula for
" corrected eta " is
corrected $\eta^2 = \frac{\eta^2 - \frac{k - 3}{N}}{1 - \frac{k - 3}{N}}$ (43)
(The $\eta$ on the right hand side of the equation is the " raw " eta.)
If we apply this correction to the value of $\eta_{yx}$ obtained
in the present problem, we have, substituting .931 for $\eta_{yx}$,
8 (the number of Y-arrays) for k, and 465 for N,
corrected $\eta^2_{yx} = \frac{(.931)^2 - .011}{1 - .011} = \frac{.8558}{.989} = .8653$,
and
$\eta_{yx} = .930$.
In the present case the correction is very small. If N is
small, however, or k large, the raw eta may be considerably
reduced.
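Formula (43) is easily put into a small Python function. The name `corrected_eta` is hypothetical, and the guard that keeps the corrected square from falling below zero is an added assumption of this sketch, not part of the formula as given.

```python
from math import sqrt


def corrected_eta(raw_eta, k, n):
    """Apply formula (43): k = number of arrays, n = number of cases."""
    adjustment = (k - 3) / n
    corrected_sq = (raw_eta ** 2 - adjustment) / (1 - adjustment)
    # Assumption of this sketch: clamp at zero so the root is defined.
    return sqrt(max(corrected_sq, 0.0))


# The worked example above: raw eta .931, 8 arrays, 465 cases.
eta = corrected_eta(0.931, k=8, n=465)
print(f"corrected eta = {eta:.3f}")

# With few cases and many arrays the reduction is much larger.
print(f"corrected eta = {corrected_eta(0.931, k=8, n=40):.3f}")
```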
3. Test for Linearity of Regression
It is oftentimes difficult to tell from the appearance of a
correlation table whether the regression is linear or non-linear;
1 Biometrika, 1923, 14, 412-417.
and in such cases it is best to calculate both r and $\eta$. As stated
above, if the regression is strictly linear $\eta$ equals r; and the
greater the departure from linearity the greater the difference
between $\eta$ and r. A simple test of linearity is that $\zeta$ (zeta),
the difference $\eta^2 - r^2$, shall differ from zero by an
amount which is not greater than that which might arise
from fluctuations due to random sampling. To make this test,
we must first find $PE_\zeta$ given by the formula 1
$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}}\,\sqrt{(1 - \eta^2)^2 - (1 - r^2)^2 + 1}$ (44)
The second radical in formula (44) is approximately equal
to 1, and hence unless great accuracy is required we may
write the formula simply as
$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}}$ (45)
In the problem which we have been considering $\eta_{yx}$ =
.930 and r = .80. Accordingly, $\zeta = (.930)^2 - (.80)^2$ or .2249,
and from formula (45) $PE_\zeta$ = .030.2 Zeta, therefore, is
7.49 times its PE (since $\frac{\zeta}{PE_\zeta} = \frac{.2249}{.030}$ or 7.49) and there is no
doubt as to the non-linearity of the regression. To determine
whether $\frac{\zeta}{PE_\zeta}$ denotes a real or simply a chance difference
between $\eta^2$ and $r^2$, Table XV may be used
conveniently.
If zeta is very small, or if both $\eta$ and r are small, a simple
test for linearity (Blakeman's test 3) which does not require
finding $PE_\zeta$ may be used. According to this test, when
$N(\eta^2 - r^2) < 11.37$ (46)
1 This formula is due to Blakeman. See Yule, An Introduction to the Theory
of Statistics, p. 352.
2 Formula (44) gives PE (zeta) as .028. The difference between the results
given by formulas (44) and (45) is negligible here.
3 Blakeman, J., On Tests for Linearity of Regression, Biometrika, 4, 1906,
pp. 332-350.
the regression is linear. In our problem, $N(\eta^2 - r^2)$ = 104.58,
and the regression is clearly non-linear.
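The whole test for linearity, formulas (44) to (46), may be gathered into one short Python sketch. The function name `linearity_test` is hypothetical, and the sketch assumes that $\eta$ is not less than r, so that zeta is not negative.

```python
from math import sqrt


def linearity_test(eta, r, n):
    """Return zeta, its PE by formulas (45) and (44), and N x zeta."""
    zeta = eta ** 2 - r ** 2
    pe_short = 0.6745 * 2 * sqrt(zeta / n)                             # (45)
    pe_full = pe_short * sqrt((1 - eta ** 2) ** 2
                              - (1 - r ** 2) ** 2 + 1)                 # (44)
    blakeman = n * zeta                                                # (46)
    return zeta, pe_short, pe_full, blakeman


# The problem considered above: corrected eta .930, r .80, N 465.
zeta, pe_short, pe_full, blakeman = linearity_test(0.930, 0.80, 465)
print(f"zeta = {zeta:.4f}")
print(f"PE by (45) = {pe_short:.3f}, PE by (44) = {pe_full:.3f}")
print(f"N x zeta = {blakeman:.2f}; linear by Blakeman's test: {blakeman < 11.37}")
```

The two values of the PE differ only in the third decimal, which bears out the remark that the second radical may ordinarily be disregarded.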
True non-linear relation is often met with in psycho-
physics, and in experiments dealing with fatigue, practise,
forgetting, etc. Most mental and physical tests, however,
have been found to exhibit linear relationship, and in con-
sequence r has been employed in psychology and education
to a much greater extent than $\eta$. If the regression is definitely
non-linear, it makes considerable difference whether $\eta$ or r
is taken as the measure of relation. Unless the regression is
clearly curvilinear, however, little error is introduced by
taking r instead of $\eta$; and this is especially true if the
correlation is low.
The coefficient of correlation, r, is superior to $\eta$ in that
knowing its value we can easily write the equation from which
the value of the dependent variable may be estimated from the
independent. This is not possible with the correlation-ratio.
In order to estimate one variable from the other in non-linear
relation, a curve must be fitted to the means of the arrays of
the columns or rows.1
X. The Correction of a Coefficient of Correlation
for " Attenuation "
The accuracy of any series of test scores or other meas-
ures of capacity is always conditioned by the number and
size of the chance variations — " errors of observation " — pres-
ent. The term " errors of observation " may be taken to in-
clude slight changes in technique and procedure on the part
of the experimenter, as well as variations in the subjects
due to fatigue, distraction, shifts in attention or attitude
towards the test, and other minor fluctuations of different
sorts. If the number of observations is large, errors of observa-
tion— since their effect is as liable to be in the negative as the
1 The subject of curve fitting is fully dealt with in more advanced books on
statistics. See Jones, D. C., A First Course in Statistics, 1921, Chaps. XV,
XVI, and XVII, for a fairly elementary discussion.
positive direction — will tend in the long run to cancel each other
off as far as the average is concerned. Such errors, however,
always tend to increase the $\sigma$ of the distribution, and to
decrease or " attenuate " a coefficient of correlation calculated
between series in which they are present. For this reason,
it is generally advisable to correct raw r's for observational
errors, and special formulas have been devised to rule out their
effect.1
It is first necessary to make at least two independent
measures of each capacity, and to find the self-correlation of
each test.2 This done, the r corrected for attenuation may be
found from formula (47) given below. The complete procedure
is as follows:
Let A and B represent the tests to be correlated.
Let $A_1$ represent the 1st series of scores obtained in A.
Let $A_2$ represent the 2nd series of scores obtained in A.
Let $B_1$ represent the 1st series of scores obtained in B.
Let $B_2$ represent the 2nd series of scores obtained in B.
Let $r_{AB}$ represent the " true " correlation between tests
A and B.
Let $r_{A_1A_2}$ represent the self-correlation of test A.
Let $r_{B_1B_2}$ represent the self-correlation of test B.
Let $r_{A_1B_2}$ represent the obtained correlation between $A_1$
and $B_2$.
Let $r_{A_2B_1}$ represent the obtained correlation between $A_2$
and $B_1$.
Then 3
$r_{AB} = \frac{\sqrt{(r_{A_1B_2})(r_{A_2B_1})}}{\sqrt{(r_{A_1A_2})(r_{B_1B_2})}}$ (47)
1 See the two articles by C. Spearman:
(a) The Proof and Measurement of the Association between Two Things,
American Journal of Psychology, 1904, Vol. XV, p. 72-101.
and (b) Demonstration of Formulae for True Measure of Correlation, American
Journal of Psychology, 1907, Vol. XVIII, p. 161-169.
2 See page 288.
3 See Yule, An Introduction to the Theory of Statistics, pp. 213-214 for
discussion of this formula.
To illustrate the formula, suppose that A is a Following
Directions Test, and B a Mixed Relations Test, and that
$r_{A_1A_2} = .72$     $r_{B_1B_2} = .75$
$r_{A_1B_2} = .35$     $r_{A_2B_1} = .42$
Substituting in formula (47) we have
$r_{AB} = \frac{\sqrt{.35 \times .42}}{\sqrt{.72 \times .75}} = .52$
or correcting for observational errors, we raise the correlation
from .35 and .42 (the obtained r's) to .52.
If we have only the one correlation between two given tests
A and B, so that formula (47) is inapplicable, it is still possible
to obtain an approximate correction for attenuation by
dividing the " raw " coefficient by the geometrical mean of the
two " reliability coefficients." 1 Formula (47) then becomes
$r_{AB} = \frac{r_{A_1B_1}}{\sqrt{r_{A_1A_2}\, r_{B_1B_2}}}$ (48)
Thus if the obtained correlation between tests A and B above
had been .50, and the reliability coefficients, as before, .72 and
.75, we could correct (approximately) for attenuation as follows:
$r_{AB} = \frac{.50}{\sqrt{.72 \times .75}} = .68$.
Corrected for attenuation, the obtained coefficient is increased
from .50 to .68.
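Both corrections for attenuation can be sketched in Python with the figures of the two examples just given. The function names are hypothetical and the sketch is meant only to make the arithmetic of formulas (47) and (48) explicit.

```python
from math import sqrt


def corrected_r(r_a1b2, r_a2b1, r_a1a2, r_b1b2):
    """Formula (47): two obtained r's and the two self-correlations."""
    return sqrt(r_a1b2 * r_a2b1) / sqrt(r_a1a2 * r_b1b2)


def corrected_r_single(r_ab, r_a1a2, r_b1b2):
    """Formula (48): one obtained r divided by the geometrical mean
    of the two reliability coefficients."""
    return r_ab / sqrt(r_a1a2 * r_b1b2)


full = corrected_r(0.35, 0.42, 0.72, 0.75)
approximate = corrected_r_single(0.50, 0.72, 0.75)
print(f"formula (47): {full:.2f}")
print(f"formula (48): {approximate:.2f}")
```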
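XI. Summary of Formulas Used in This Chapter
1. For Product-Moment r, deviations from GA's
$r = \frac{\frac{\Sigma xy}{N} - c_x c_y}{\sigma_x \sigma_y}$ (23)
2. For Product-Moment r, deviations from actual averages
$r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$ (24)
$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}$ (25)
3. $PE_r = \frac{.6745(1 - r^2)}{\sqrt{N}}$ (26)
4. $PE_{(\text{diff. } r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2}$ (27)
5. Regression Equations in Deviation Form
$y = r\frac{\sigma_y}{\sigma_x}x$ (28)
$x = r\frac{\sigma_x}{\sigma_y}y$ (29)
6. Regression Equations in Score Form
$Y = r\frac{\sigma_y}{\sigma_x}(X - X') + Y'$ (30)
$X = r\frac{\sigma_x}{\sigma_y}(Y - Y') + X'$ (31)
7. Standard Errors of Estimate
$\sigma_{(\text{est. } Y)} = \sigma_y\sqrt{1 - r^2}$ (32)
$\sigma_{(\text{est. } X)} = \sigma_x\sqrt{1 - r^2}$ (33)
$PE_{(\text{est. } Y)} = .6745\,\sigma_y\sqrt{1 - r^2}$ (34)
$PE_{(\text{est. } X)} = .6745\,\sigma_x\sqrt{1 - r^2}$ (35)
8. Correlation Measured from " Ranks "
$\rho = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}$ (36)
$PE_\rho = \frac{.7063(1 - \rho^2)}{\sqrt{N}}$ (37)
$R = 1 - \frac{6\Sigma G}{N^2 - 1}$ (38)
9. Coefficient of Mean Square Contingency, C
$C = \sqrt{\frac{S - N}{S}}$ (39)
10. Non-linear Regression
$\eta_{yx} = \frac{\sigma_{my}}{\sigma_y}$ (40)
$PE_\eta = \frac{.6745(1 - \eta^2)}{\sqrt{N}}$ (41)
$\eta_{xy} = \frac{\sigma_{mx}}{\sigma_x}$ (42)
Corrected $\eta^2 = \frac{\eta^2 - \frac{k - 3}{N}}{1 - \frac{k - 3}{N}}$ (43)
$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}}\,\sqrt{(1 - \eta^2)^2 - (1 - r^2)^2 + 1}$ (44)
$PE_\zeta = .6745 \times 2\sqrt{\frac{\zeta}{N}}$ (approximately) (45)
$N(\eta^2 - r^2) < 11.37$ (46)
11. Correction for Attenuation
$r_{AB} = \frac{\sqrt{(r_{A_1B_2})(r_{A_2B_1})}}{\sqrt{(r_{A_1A_2})(r_{B_1B_2})}}$ (47)
$r_{AB} = \frac{r_{A_1B_1}}{\sqrt{r_{A_1A_2}\, r_{B_1B_2}}}$ (48)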
PROBLEMS
1. Find the coefficient of correlation (product-moment) between the
following sets of Army Alpha and typewriting scores made by
100 students in a typewriting class. The typewriting scores are
in number of words written per minute (with certain penalties).
In tabulating scores, let typing be the Y-variable and Alpha the
X-variable. Take the Y-step as 5 and the X-step as 10 units.

Typing (Y)  Alpha (X)     Typing (Y)  Alpha (X)     Typing (Y)  Alpha (X)
    46         152            26         164            40         120
    31          96            33         127            36         140
    46         171            44         144            43         141
    40         172            35         160            48         143
    42         138            49         106            45         138
    41         154            40          95            58         149
    39         127            57         146            23         142
    46         156            23         175            45         166
    34         156            51         126            44         138
    48         133            35         120            47         150
    48         173            41         154            29         148
    38         134            28         146            46         166
    26         179            32         154            46         146
    37         159            50         159            39         167
    34         167            29         175            49         139
    51         136            41         164            34         183
    47         153            32         111            41         150
    39         145            49         164            49         179
    32         134            58         119            31         138
    37         184            35         160            47         136
    26         154            48         149            40         172
    40          90            40         149            30         145
    53         143            43         143            40         109
    46         173            38         159            38         158
    39         168            37         157            29         115
    52         187            41         153            43          93
    47         166            51         149            55         163
    31         172            40         163            37         147
    33         189            35         175            52         169
    22         147            31         133            38          75
    46         150            23         178            39         152
    44         150            37         168            32         159
    37         143            46         156            42         150
    31         133
2. In the Correlation Table 1 given below, find
(a) the coefficient of correlation, and PEr;
(b) the regression equations in Score Form, and the standard errors
of estimate.
(c) What is the most probable height of a boy who weighs 30
pounds? 45 pounds?
1 See Table XXIV for the C worked out for these data.
Boys: Ages 4.5 to 5.5 Years

                               Weight in Pounds (X)
Height in Inches (Y)   24-28   29-33   34-38   39-43   44-48   49-53   Totals (Fy)
45-47                                     1               2                3
42-44                                     4      35      21       5       65
39-41                            5       87      90       7       1      190
36-38                    1      18       72       8                       99
33-35                    5      15        5                               25
30-32                    2                                                 2
Totals (Fx)              8      38      169     133      30       6      384
3. In the following correlation table,1 find
(a) the coefficient of correlation, and the PEr.
(b) What is the most probable grade of a pupil who makes 120 on
Alpha?
                                        Army Alpha IQ's
School        84 and   85-   90-   95-   100-   105-   110-   115-   120-   125 and
Marks         lower    89    94    99    104    109    114    119    124    over     Totals
90 and over                          3      3     15     12      9      9      5        56
85-89                                8     17     15     24     13      6      6        89
80-84                          4     6     22     21     20     10      5      1        89
75-79                          7    25     33     23     10      7      4              109
70-74                    4    10    18     14     22     12      1      1               82
65-69            1       3     3    12      7      8      8      1                      43
60-64                          2     5      3      1      1                             12
Totals           1       7    26    77     99    105     87     41     25     12       480

1 From Otis, Statistical Methods in Educational Measurement, 1925, p. 315.
4. Find the correlation between the following test scores by
(a) the Rank-Difference Method, and
(b) the Method of Gains.
               Intelligence Score    Cancellation Score
Individual          (Alpha)          (A test + Number Group
                                      Checking Test)
    Kp                185                   110
    My                203                    98
    Le                188                   118
    Hy                195                   104
    Sh                176                   112
    Ld                174                   124
    Sn                158                   119
    St                197                    95
    Wn                176                    94
    Pe                138                    97
    Gr                126                   110
    Bn                160                    94
    Gm                151                   126
    Ly                185                   120
    Ws                185                   118

(Note. Since the Cancellation scores are in seconds, the highest
score (94) is numerically the lowest.)
5. Compute the coefficient of contingency, C, for the two tables given
below, which show:
A. The resemblance between brothers in athletic capacity.1
B. The resemblance between fathers and sons in temperament.2
A
                      Athletic Capacity: First Brother
Second Brother     Athletic    Betwixt    Non-athletic    Totals
Athletic              906         20          140          1066
Betwixt                20         76            9           105
Non-athletic          140          9          370           519
Totals               1066        105          519          1690
1 From Yule, An Introduction to the Theory of Statistics, p. 74, after Pearson.
2 From Brown and Thompson, Essentials of Mental Measurement, 1921,
p. 125. The coefficient of contingency is not usually calculated for tables having
less than a 5X5 fold classification. These tables, however, will illustrate the
method in a simple way.
B
                               Fathers
Sons           Merry    Melancholy    Alternating    Even    Totals
Merry           122          8             81          67      278
Melancholy       10          2              7          10       29
Alternating      70          9            101          68      248
Even             58          6             66          45      175
Totals          260         25            255         190      730
6. The following correlation table gives the relation between the
scores on the Thorndike College Entrance Intelligence Examina-
tion and the extra-curricular activities of 102 Columbia College
students.1
(a) Find $\eta_{yx}$ for this table.
(b) Find r, and test the regression of Y on X for linearity.
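[Correlation table. The column positions of the cell frequencies are not
recoverable from this copy; the frequencies of each row are listed in
left-to-right order, followed by the row total.]
Thorndike Scores (X), class intervals: 55-59, 60-64, 65-69, 70-74, 75-79,
80-84, 85-89, 90-94, 95-99, 100-104
Extra-curricular
Activities (Y)     Cell frequencies           Fy
18-20              2, 2                        4
15-17              2, 3, 1                     6
12-14              4, 6, 2, 2                 14
9-11               1, 2, 4, 4, 6, 7, 3        27
6-8                1, 6, 2, 2, 6, 2, 4, 1     24
3-5                1, 1, 3, 5, 3, 5, 1, 1     20
0-2                1, 1, 1, 1, 1, 2            7
Totals Fx (by X interval, in order): 2, 2, 3, 16, 13, 20, 16, 15, 11, 4; N = 102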
1 From Sommerville, R. C., Physical, Motor, and Sensory Traits. Archives
of Psychology, 1924, 75, p. 101.
7. Verify the correlation-ratio $\eta_{xy}$ of .82 given for Diagram XXVI (see
page 209).
(a) Test the regression of X on Y for linearity.
(b) Plot the regression line (or curve) on the diagram.
8. Ma is the series of scores from one trial of a memory test.
Mb is the series of scores from a second trial of the same test.
Aa is the series of scores from one trial of an association test.
Ab is the series of scores from a second trial of this test.
The r's are as follows:
between Ma and Mb, .60.
between Mb and Aa, .50.
between Ma and Ab, .55.
between Aa and Ab, .72.
Find the r between M and A corrected for attenuation.
Answers
1. r = -.05; $PE_r$ = .07.
2. (a) r = .709; $PE_r$ = .017.
(b) Y = .4X + 24.42; X = 1.26Y - 11.66
$\sigma_{(est.\,Y)}$ = 1.79; $\sigma_{(est.\,X)}$ = 3.18.
(c) 36.42 inches; 42.42 inches.
3. (a) r = .455; $PE_r$ = .024.
(b) 85.4 with a $PE_{(est.\,Y)}$ of 4.75.
4. (a) $\rho$ = .187; r = .19; $PE_r$ = .18.
(b) R = .09; r = .16.
5. A. C = .68. B. C = .16.
6. (a) $\eta_{yx}$ = .43; $\eta_{yx}$ (corrected) = .36.
(b) r = -.09. The regression is almost certainly non-linear.
8. r = .80.
CHAPTER V
PARTIAL AND MULTIPLE CORRELATION1
I. The Meaning of Partial and Multiple Correlation
The coefficient of correlation between sets of test scores
(or other series of measures) often represents not simply the
degree of relationship existing between these measures in
themselves, but the degree of this relation plus the indirect
effect of other factors to which they are both related. For
this reason in measuring the correlation between two sets of
measures, it is necessary that we eliminate or rule out as far as
possible those uncontrolled factors which through their common
relation to the measures to be correlated tend to raise or lower
the " net " correlation. As an illustration of the effect on
correlation of uncontrolled factors, suppose that the correlation
between intelligence (i) and age (a) in a large group of children
whose ages range from 7 to 14 years is $r_{ia}$; that the correlation
between school achievement (s) and age (a) in the same group
is $r_{sa}$; and that the correlation between intelligence (i) and
school achievement (s) is $r_{is}$. Now this last coefficient, $r_{is}$,
is not simply a measure of the influence of intelligence on school
achievement, but is a measure of the influence of intelligence,
plus the indirect effect of differences in age, on school achieve-
ment. In order to determine the relation between intelli-
gence and school achievement uninfluenced by the age factor,
it is necessary to rule out the effect of age-differences. This
can be accomplished in two ways: (1) by selecting children all
of whom are of the same age, or (2) by finding a "partial"
coefficient of correlation between intelligence and school
1 The discussion of partial and multiple correlation given in this chapter follows
Yule in method and nomenclature.
standing. Such a partial coefficient is written $r_{is.a}$, and may
be thought of as giving the net correlation between intelligence
and school achievement for children of the same age, or as the
net correlation between intelligence and school achievement
with age constant. In short, a coefficient of partial correlation
may be said to represent the net relation between two variables
when one or more other variables which might increase or
decrease the true correlation have been ruled out or held con-
stant.
In addition to its value as a device whereby we are able
to control conditions by ruling out disturbing factors, partial
correlation is highly important also in that it enables us to build
up regression equations involving three or more variables from
which a test score (or other measure) may be predicted when
we know the corresponding scores made on the other tests.
The value of the regression equation in estimating scores — its
accuracy as a predicting instrument — may be determined from
the " multiple " coefficient of correlation.1 This coefficient
gives the correlation between the scores actually obtained on a
given test, and the scores on the same test predicted by the re-
gression equation from the scores made on two or more correlated
tests. The multiple coefficient of correlation may be thought
of also as giving the correlation between a trait (or traits) as
measured by a single test, and the same trait (or traits) as
measured by a number of tests taken together. (The multiple
coefficient will be best understood by working through an actual
problem.)
To summarize briefly, partial and multiple correlation
may be considered as representing an important extension
of the theory and technique of " simple " or two- variable cor-
relation to include problems which involve three or more
variables.
1 $\sigma_{(est.)}$ also gives the accuracy of the regression equation in predicting single
scores. (See page 183.)
II. A Correlation Problem Involving Three Variables
The simplest and most straightforward approach to an
understanding of the value of the method of partial and mul-
tiple correlation and of the technique involved is by way of an
illustration. In the present section, therefore, is shown the
application of partial and multiple correlation to a three-vari-
able problem; and following this, the general formulas and
some further applications of the method are considered.
The problem selected (Table XXVI) is taken from a study
made by Professor Mark May 1 of the factors which influence
" academic success." In that part of his study from which our
example is taken, May wished to find how accurately he could
" predict " the academic success or scholastic achievement of 450
Syracuse freshmen from a knowledge of their general intelligence
and study habits. Academic success was defined specifically as
the number of " credit " or "honor" points obtained by a student
at the end of his first semester in college. The number of honor
points secured depends on the number of A, B, and C grades
made by the student in his courses. Thus a grade of A carries 3
honor points; a grade of B, 2 honor points; a grade of C, 1
honor point; and a grade of D, which is a passing mark, carries
no honor point credit. The maximum number of points which
a freshman taking the " regular " course can obtain in one
semester is 48.
General intelligence was measured by a combination of the
Miller Mental Ability Test, and the Dartmouth Completion
of Definitions Test. The Miller Test contains 120 items and
the Dartmouth Test 40, so that the maximum " raw score "
was 160. The scores of the 450 students ranged from 50 to
150, the distribution being fairly normal.
As a measure of industry and application, it was decided to
take the number of hours per week spent, on the average, in
study. Information in regard to study habits was obtained
1 May, Mark A., Predicting Academic Success, Journal of Educational Psy-
chology, 1923, Vol. XIV, 7, pp. 429-440.
by means of a questionnaire given at the beginning and at the
middle of the first semester. Among other items of informa-
tion asked for in the questionnaire were such things as the
number of hours spent per week at meals, in sleeping, etc. In
this way an attempt was made to have the student think that
he was being checked up on the distribution of his total time,
and not on his study habits alone. The self-correlation between
the two statements— number of hours spent in study — on the
first and second questionnaires was .86, which indicates a very
satisfactory degree of reliability.
As previously stated, the main object of this study was to
find how accurately the number of honor points which a student
receives can be predicted from a knowledge of his study
habits and his general intelligence.1 In solving this problem,
however, it is necessary to find the partial coefficient which
shows to what extent honor points are related to general
intelligence when the variable factor of study-hours per week
is held constant; and also the partial coefficient which shows
to what extent honor points are related to study-hours when
the variable factor of general intelligence is held constant.
This information, in itself, will prove to be of considerable
interest. The solution of the whole problem is given in the
following series of steps — the necessary data and statistics
will be found in Table XXVI.
Step I. Note that the mean and $\sigma$ of each series of measures,
and the intercorrelations are first calculated. These intercorrelations
are the usual product-moment r's, computed as
shown in Chapter IV. The r between (1) honor points, and
(2) general intelligence, written $r_{12}$, is .60; the r between (1)
honor points and (3) number of study hours, written $r_{13}$, is .32;
and the r between (2) general intelligence and (3) number of
study hours, i.e., $r_{23}$, is -.35. The low correlation between
honor points and study-hours is of considerable interest;
1 Other factors, of course, such as health, personality, previous preparation,
etc., are of considerable importance in determining honor points as May indicates
in his article. The two factors selected were chosen simply because they are
not only important, but also objective and measurable.
but probably the most interesting r is the -.35 between study-hours
and general intelligence. Evidently, the brighter the
student, the less he studies!
Step II. The next step is to calculate the " net " correlation
between (1) honor points and (2) general intelligence with the
influence of (3) study-hours "partialed" out or held constant.
This net, or partial coefficient of correlation, is written $r_{12.3}$.
The formula 1 for $r_{12.3}$ is
$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$ [Formula (49), page 232].
Substitution of the values of $r_{12}$, $r_{13}$, and $r_{23}$ in the formula
gives $r_{12.3}$ a value of .802. This means that if all of our 450
students studied exactly the same number of hours per week
(i.e., if the number of study hours were constant), the coefficient
of correlation between honor points earned and general intelligence
scores would be .802 instead of .60, the obtained coefficient,
$r_{12}$. In other words, if each student spent the same
number of hours in study, there would be a much closer corre-
spondence between general intelligence and honor points than
there is when the number of study hours varies.
The partial coefficient of correlation between (1) honor
points and (3) hours spent in study for (2) general intelligence
constant is given by the formula
$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}}$$ [Formula (49)]
Substitution of the values of $r_{13}$, $r_{12}$ and $r_{23}$ gives a partial
coefficient $r_{13.2}$ = .707 as against a "raw" coefficient, $r_{13}$, of .32.
It is evident, therefore, that if our group were of the same degree
of general intelligence 2 there would be a much closer correspond-
1 The general formulas from which this and other formulas used in this
section are derived will be found in Section III following.
2 By " same degree of general intelligence " is meant the same score on the
given general intelligence tests.
ence between the number of honor points received and the
number of hours spent in study than there is when the members
of the group possess varying degrees of general intelligence — and
this is certainly the result to be expected.
The last partial coefficient of correlation is $r_{23.1}$ = -.715.
This coefficient gives the net correlation between (2) general
intelligence and (3) study-hours, for (1) honor points held
constant, and is found from the formula
$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}}$$ [Formula (49)]
Like the two partial r's above, we may interpret $r_{23.1}$ to mean
that the correlation between general intelligence score and
hours spent in study in a group in which every student has
earned the same number of honor points would be much higher,
negatively, than the raw correlation between these same two
factors in a randomly selected group, that is, a group in which the
number of honor points received by different students varies.
Thus we discover that the brighter students not only study
less than the average and dull (since $r_{23}$ = -.35) but that the
brighter the student the less he needs to study in order to reach
a given standard of academic success, that is, to secure a given number
of honor points (since $r_{23.1}$ = -.715).
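The three first-order partial coefficients of Step II can be checked with a few lines of code. The following is a minimal sketch in Python (the language and the function name `partial_r` are our own choices for illustration, not part of the original study); it applies Formula (49) to the zero-order r's of Table XXVI.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial r between x and y with z held constant (Formula 49)."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))

# Zero-order r's from Table XXVI:
# 1 = honor points, 2 = general intelligence, 3 = study hours per week.
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = partial_r(r12, r13, r23)   # intelligence vs. honor points, study hours constant
r13_2 = partial_r(r13, r12, r23)   # study hours vs. honor points, intelligence constant
r23_1 = partial_r(r23, r12, r13)   # intelligence vs. study hours, honor points constant

print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
```

The three values agree with the .802, .707 and -.715 given in the text.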
Step III. The partial coefficients of correlation calculated,
the next step is to write the regression equation from which the
most probable number of honor points which a student will
receive can be estimated, given his general intelligence score and
the number of hours he spends in study per week. The regression
equation for three variables is written, in Deviation Form,
as follows: [Formula (51)]
$$x_1 = b_{12.3}x_2 + b_{13.2}x_3$$
In this formula $x_1$ is the dependent variable and stands for
honor points; $x_2$ and $x_3$ are the independent variables, and
stand for general intelligence and study-hours respectively.1 In
Score Form the equation becomes: [Formula (52)]
$$(X_1 - \text{Av.}X_1) = b_{12.3}(X_2 - \text{Av.}X_2) + b_{13.2}(X_3 - \text{Av.}X_3),$$
or transposing and collecting terms,
$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K \text{ (a constant)}.$$
It is clear that before we can use this equation we must
find the values of the regression coefficients $b_{12.3}$ and $b_{13.2}$.
These are found from the formulas,
$$b_{12.3} = r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}}; \quad b_{13.2} = r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}}$$ [Formula (53)]
and as we already have the values of $r_{12.3}$ and $r_{13.2}$ it is only
necessary to find $\sigma_{1.23}$, $\sigma_{2.13}$, and $\sigma_{3.12}$ (the "partial" $\sigma$'s) in
order to replace the regression coefficients in the equation by
numerical values.
Step IV. The values of the "partial" $\sigma$'s are found from
the formulas,
1. $\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}$
2. $\sigma_{2.13} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2}$ [Formula (50)]
3. $\sigma_{3.12} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2}$
Substituting the known values of the raw and partial r's in these
formulas we get $\sigma_{1.23}$ = 6.34; $\sigma_{2.13}$ = 8.84; $\sigma_{3.12}$ = 3.97. (For
calculations, see Table XXVI.)
Step V. From the partial $\sigma$'s and the partial r's, the numerical
values of the regression coefficients $b_{12.3}$ and $b_{13.2}$ are found to
be .57 and 1.13, respectively. Hence we may now write the
regression equation as
$$x_1 = .57x_2 + 1.13x_3;$$
or multiplying by a convenient constant (e.g., by 1.75), (the number
of honor points) = 1 (score on the intelligence tests) + 2 (number
of hours spent in study per week). It is evident from this
equation that in so far as the general intelligence score and
1 Note the resemblance of this equation to the simple regression equation
for two variables $y = b_{12} \cdot x$ (page 174). If $x_1$ is put for y and $x_2$ for x in this
equation, we have, $x_1 = b_{12} \cdot x_2$.
number of study hours per week determine the number of honor
points received, their relative weight is as 1 : 2.
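The arithmetic of Steps IV and V may be sketched as follows. This is an illustrative Python fragment only: the helper `k` and the variable names are our own, and the partial r's are entered as the rounded values .802 and .707 reported in Step II, so the results match the text to about two decimal places.

```python
import math

def k(r):
    """Return sqrt(1 - r^2), the quantity tabled in Table XXVII."""
    return math.sqrt(1 - r * r)

# Means, sigmas and r's from Table XXVI
# (1 = honor points, 2 = general intelligence, 3 = study hours per week).
m1, m2, m3 = 18.5, 100.6, 24.0
s1, s2, s3 = 11.2, 15.8, 6.0
r12, r23 = 0.60, -0.35
r12_3, r13_2 = 0.802, 0.707          # rounded partial r's from Step II

# Step IV: partial sigmas, Formula (50)
s1_23 = s1 * k(r12) * k(r13_2)
s2_13 = s2 * k(r23) * k(r12_3)
s3_12 = s3 * k(r23) * k(r13_2)

# Step V: regression coefficients, Formula (53)
b12_3 = r12_3 * s1_23 / s2_13
b13_2 = r13_2 * s1_23 / s3_12

# Constant K of the Score Form, from transposing the means
K = m1 - b12_3 * m2 - b13_2 * m3

print(round(s1_23, 2), round(s2_13, 2), round(s3_12, 2))
print(round(b12_3, 2), round(b13_2, 2), round(K))
```

Because the coefficients are carried here to more decimals than in the text, K comes out near -66.4 rather than exactly -66; the difference is one of rounding only.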
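TABLE XXVI
A Correlation Problem Involving Three Variables
Step I
(1) Honor Points     (2) General Intelligence     (3) Hours of Study per Week
$M_1$ = 18.5         $M_2$ = 100.6                $M_3$ = 24
$\sigma_1$ = 11.2    $\sigma_2$ = 15.8            $\sigma_3$ = 6
$r_{12}$ = .60       $r_{13}$ = .32               $r_{23}$ = -.35
Step II. Calculation of Partial Coefficients of Correlation* (see Note)
$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.60 - .32(-.35)}{.9474 \times .9367} = .802 \tag{49}$$
$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.32 - .60(-.35)}{.8 \times .9367} = .707$$
$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{-.35 - .32 \times .60}{.8 \times .9474} = -.715$$
* For $\sqrt{1 - r^2}$ values, use Table XXVII.
Step III. The Regression Equations
$$x_1 = b_{12.3}x_2 + b_{13.2}x_3 \text{ (Deviation Form)} \tag{51}$$
or
$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K \text{ (Score Form)} \tag{52}$$
in which
$$b_{12.3} = r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}} \text{ and } b_{13.2} = r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}} \tag{53}$$
Step IV. Calculation of $\sigma$'s
(1) $\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} = 11.2 \times .8 \times .7072 = 6.34$ (50)
(2) $\sigma_{2.13} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} = 15.8 \times .9367 \times .5973 = 8.84$
(3) $\sigma_{3.12} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} = 6 \times .9367 \times .7072 = 3.97$
Step V. The Regression Coefficients and Regression Equation
Substituting for $r_{12.3}$, $r_{13.2}$, $\sigma_{1.23}$, $\sigma_{2.13}$, $\sigma_{3.12}$, we have
$$b_{12.3} = .802 \times \frac{6.34}{8.84} = .57; \quad b_{13.2} = .707 \times \frac{6.34}{3.97} = 1.13.$$
Hence the regression equation becomes:
$x_1 = .57x_2 + 1.13x_3$ (Deviation Form),
or $X_1 = .57X_2 + 1.13X_3 - 66$ (Score Form).
Step VI. Calculation of the Standard Error of Estimate
$$\sigma_{(est.\,X_1)} = \sigma_{1.23} = 6.34 \tag{54}$$
$$PE_{(est.\,X_1)} = .6745 \times 6.34 = 4.28 \tag{55}$$
Step VII. The Coefficient of Multiple Correlation
$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} = .824 \tag{56}$$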
Note. It should be noted that while the partial coefficient of correlation
$r_{23.1}$ is of interest as giving us the relation between general intelligence and hours
spent in study for a constant number of honor points, it is unnecessary in the
regression equation, $x_1 = b_{12.3}x_2 + b_{13.2}x_3$. In order to evaluate the constants
$b_{12.3}$ and $b_{13.2}$ in this regression equation, we need only $r_{12.3}$ and $r_{13.2}$. In any
problem involving three variables, only two partial coefficients of correlation
need be computed, if we are interested only in the prediction of $X_1$ values from
known values of $X_2$ and $X_3$.
TABLE XXVII
A Table to Infer the Value of $\sqrt{1 - r^2}$ from a Given Value of r
  r    $\sqrt{1-r^2}$       r    $\sqrt{1-r^2}$       r    $\sqrt{1-r^2}$
 .00     1.0000            .34     .9404             .68     .7332
 .01      .9999            .35     .9367             .69     .7238
 .02      .9998            .36     .9330             .70     .7141
 .03      .9995            .37     .9290             .71     .7042
 .04      .9992            .38     .9250             .72     .6940
 .05      .9987            .39     .9208             .73     .6834
 .06      .9982            .40     .9165             .74     .6726
 .07      .9975            .41     .9121             .75     .6614
 .08      .9968            .42     .9075             .76     .6499
 .09      .9959            .43     .9028             .77     .6380
 .10      .9950            .44     .8980             .78     .6258
 .11      .9939            .45     .8930             .79     .6131
 .12      .9928            .46     .8879             .80     .6000
 .13      .9915            .47     .8827             .81     .5864
 .14      .9902            .48     .8773             .82     .5724
 .15      .9887            .49     .8717             .83     .5578
 .16      .9871            .50     .8660             .84     .5426
 .17      .9854            .51     .8602             .85     .5268
 .18      .9837            .52     .8542             .86     .5103
 .19      .9818            .53     .8480             .87     .4931
 .20      .9798            .54     .8417             .88     .4750
 .21      .9777            .55     .8352             .89     .4560
 .22      .9755            .56     .8285             .90     .4359
 .23      .9732            .57     .8216             .91     .4146
 .24      .9708            .58     .8146             .92     .3919
 .25      .9682            .59     .8074             .93     .3676
 .26      .9656            .60     .8000             .94     .3412
 .27      .9629            .61     .7924             .95     .3122
 .28      .9600            .62     .7846             .96     .2800
 .29      .9570            .63     .7766             .97     .2431
 .30      .9539            .64     .7684             .98     .1990
 .31      .9507            .65     .7599             .99     .1411
 .32      .9474            .66     .7513            1.00     .0000
 .33      .9440            .67     .7424
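Where the printed table is not at hand, its entries can be regenerated directly. The sketch below (Python; the function and variable names are our own) computes $\sqrt{1 - r^2}$ to four places for r = .00 to 1.00. A recomputed value may occasionally differ from a printed one by a unit in the last place, owing to rounding.

```python
import math

def root_one_minus_r2(r):
    """Return sqrt(1 - r^2), rounded to four decimal places."""
    return round(math.sqrt(1.0 - r * r), 4)

# Keys are r expressed in hundredths (0 to 100), which avoids float keys.
table = {h: root_one_minus_r2(h / 100) for h in range(101)}

# A few entries used in the three-variable problem of Table XXVI.
for h in (32, 35, 60, 80):
    print(f"r = {h / 100:.2f}   sqrt(1 - r^2) = {table[h]:.4f}")
```

The sign of r is immaterial, since r enters only as a square; for $r_{23}$ = -.35 one looks up .35.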
To write the regression in Score Form, we simply replace
$x_1$ by ($X_1$ - 18.5); $x_2$ by ($X_2$ - 100.6); and $x_3$ by ($X_3$ - 24).
The equation then becomes
$$X_1 = .57X_2 + 1.13X_3 - 66.$$
Given a student's general intelligence score ($X_2$) and the
number of hours he spends in study per week ($X_3$) we can, from
this equation, estimate the most probable number of honor points
which he will receive in the first semester. By way of illustration,
suppose that a student has a general intelligence score of
120 points and that he studies on the average 20 hours per
week: how many honor points will he most probably receive
during the first semester? Substituting $X_2$ = 120 and $X_3$ = 20
in the regression equation, we have that
$$X_1 = .57 \times 120 + 1.13 \times 20 - 66, \text{ or } X_1 = 25.$$
The most probable number of honor points which this student
will receive, therefore, using the given criteria as the basis of our
estimate, is 25.
Step VI. This estimate, like every other "most probable"
number of honor points predicted from the regression equation,
has a certain "error of estimate." The standard error of
estimate of all honor points, i.e., $X_1$'s, predicted from the
regression equation $X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K$ is designated
$\sigma_{(est.\,X_1)}$ and equals $\sigma_{1.23}$ [see Formula (50)] directly. The
$PE_{(est.\,X_1)}$ is .6745 X $\sigma_{(est.\,X_1)}$.
The standard error of estimate in the present problem is
6.34 points, and the $PE_{(est.\,X_1)}$ is 4.28 points. In the
illustration above, therefore, the 25 estimated honor points
have a $PE_{(est.\,X_1)}$ of 4.28 points, which means that the chances
are even (50 in 100) that this student will receive (roughly)
not less than 21 nor more than 29 honor points. The reliability
of any other honor points estimate made from the regression
equation may be found in exactly the same way.
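The estimate and its probable error may be obtained for any student in the same manner. The following Python sketch is offered only as an illustration; the function name `predict_honor_points` and its default arguments are our own, the coefficients being those of the Score Form equation above.

```python
def predict_honor_points(intelligence, study_hours,
                         b12_3=0.57, b13_2=1.13, k=-66.0):
    """Estimate honor points from the Score Form regression equation."""
    return b12_3 * intelligence + b13_2 * study_hours + k

sigma_est = 6.34                 # standard error of estimate (sigma 1.23)
pe_est = 0.6745 * sigma_est      # probable error of estimate

estimate = predict_honor_points(120, 20)
low, high = estimate - pe_est, estimate + pe_est

print(round(estimate, 1))        # about 25 honor points
print(round(pe_est, 2))          # about 4.28
print(round(low), round(high))   # the even-chance range, roughly 21 to 29
```

Any other pair of scores may be substituted for 120 and 20; the probable error remains the same for every estimate.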
Step VII. The final step in the solution of our problem is to
compute the coefficient of multiple correlation. This " mul-
tiple r," which is generally written R1, has been defined (see
page 222) as the coefficient of correlation between the scores
1 Multiple R must not be confused with the R of the Spearman Footrule
formula, page 104.
actually made on a given test and the scores on the same test
predicted from the regression equation. Expressed more
mathematically, R gives the correlation between the dependent
variable $X_1$, and the independent variables, $X_2$, $X_3$, etc., taken
together as a team. The formula for R when there are two
independent variables is
$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}}$$ [Formula (56)]
In the present problem, $R_{1(23)}$ = .824. This means that
if the most probable number of honor points which each
student in our group of 450 will receive is predicted from the
regression equation, the correlation between these 450 predicted
scores and the 450 scores actually received will be .824.
Multiple R, therefore, tells us how closely $X_1$ is related to the
combined action of $X_2$ and $X_3$, or, in the present instance, how
closely honor points are related to general intelligence and number
of hours spent in study per week, taken together.
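Formula (56) is readily checked by machine. The Python sketch below (variable names our own) computes R from $\sigma_{1.23}$ and $\sigma_1$ as in the text. As a cross-check it also computes R directly from the three zero-order r's by a standard identity which is not given in this chapter and is introduced here only for comparison.

```python
import math

# From Table XXVI
sigma_1 = 11.2
sigma_1_23 = 6.34
r12, r13, r23 = 0.60, 0.32, -0.35

# Formula (56): R from the standard error of estimate
R_from_sigma = math.sqrt(1 - (sigma_1_23 / sigma_1) ** 2)

# Cross-check (standard identity, not from the text):
# R^2 = (r12^2 + r13^2 - 2 r12 r13 r23) / (1 - r23^2)
R_from_r = math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

print(round(R_from_sigma, 3), round(R_from_r, 3))
```

The two results differ only in the third decimal place, the difference arising from the rounding of $\sigma_{1.23}$ to 6.34.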
III. General Formulas for Use in Partial and Multiple
Correlation
1. General Formulas for Partial r's
We have found (Table XXVI) that in a correlation problem
involving three variables, we are enabled by the method of
partial correlation to find the net relation between two variables
when a third is ruled out or held constant. In like manner, by
an extension of the method of partial correlation, we can secure
the net correlation between $X_1$ and $X_2$ when two or more
variables have been ruled out or held constant. Thus the
partial coefficient of correlation $r_{12.34}$ means by analogy to
$r_{12.3}$ that the correlation between $X_1$ and $X_2$ has been freed
of the influence of both $X_3$ and $X_4$; and the partial coefficient
of correlation $r_{12.34 \ldots n}$ means that the correlation
between $X_1$ and $X_2$ has been freed (theoretically) of the
influence of all disturbing factors.
In every partial coefficient of correlation the subscripts
to the left of the point are called primary subscripts and denote
the two variables whose correlation we are seeking. The
subscripts to the right of the point are called secondary sub-
scripts, and denote those variables which are to be ruled out
or held constant.1 The order of a partial r is determined by
the number of its secondary subscripts: $r_{12.3}$ or $r_{13.2}$ or
$r_{23.1}$, for example, is a partial r of the first order, while "entire"
or "total" r's, such as $r_{12}$ or $r_{13}$ or $r_{23}$ are coefficients of zero
order.
The general formula for partial r's of any order is written
$$r_{12.34 \ldots n} = \frac{r_{12.34 \ldots (n-1)} - r_{1n.34 \ldots (n-1)}\,r_{2n.34 \ldots (n-1)}}{\sqrt{1 - r_{1n.34 \ldots (n-1)}^2}\,\sqrt{1 - r_{2n.34 \ldots (n-1)}^2}} \tag{49}$$
From formula (49) partial r's of any given order can be found.
In a four-variable problem, for example, $r_{12.34}$ may be written
by reference to the formula as
$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}},$$
that is to say, in terms of the partial r's of the first order. These
first order partial r's must then be computed by (49) from r's
of zero order before the second order r's can be evaluated. To
find partial r's of a higher order, we must first express
them in terms of the partial r's of the next lower order; and
these r's, in turn, in terms of r's of the next lower order, and so
on until r's of zero order have been reached.2 In other words,
it is necessary to "work up" from zero order r's, whenever r's
of any higher order are to be computed. Hence it is apparent
that with each additional variable the arithmetic of calculation
1 The order in which the secondary subscripts are written is entirely immaterial,
e.g., $r_{12.34} = r_{12.43}$. The order of the primary subscripts is of importance,
however, in telling us which variable is "dependent" and which "independent."
Thus $r_{12}$ means that $X_1$ is dependent, i.e., is to be predicted from $X_2$; while
$r_{21}$ means that $X_2$ is dependent, i.e., is to be predicted from $X_1$. The numerical
value of $r_{12}$ and $r_{21}$ is, of course, the same.
2 In calculating partial r's, use Table XXVII to get $\sqrt{1 - r^2}$ values.
is greatly increased. As a result, unless the work is carefully
planned, the calculations soon become extremely laborious.
The PE of a partial r of any order may be found, like the
PE of an " entire" r, by substituting in formula (26).
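The process of "working up" from zero-order r's by repeated use of formula (49) lends itself to a recursive routine. What follows is a minimal sketch in Python, not a method given in the text: the function `partial_r`, the dictionary layout, and above all the three r's involving a fourth variable are hypothetical and serve only to illustrate a second-order coefficient.

```python
import math

def partial_r(r, i, j, held=()):
    """Partial r between variables i and j with the variables in `held`
    ruled out, built up from zero-order r's by formula (49)."""
    if not held:
        return r[frozenset((i, j))]          # zero-order r
    *lower, k = held                          # eliminate k last
    lower = tuple(lower)
    r_ij = partial_r(r, i, j, lower)
    r_ik = partial_r(r, i, k, lower)
    r_jk = partial_r(r, j, k, lower)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

zero_order = {
    frozenset((1, 2)): 0.60,   # Table XXVI
    frozenset((1, 3)): 0.32,   # Table XXVI
    frozenset((2, 3)): -0.35,  # Table XXVI
    # Hypothetical fourth variable, for illustration only:
    frozenset((1, 4)): 0.30,
    frozenset((2, 4)): 0.25,
    frozenset((3, 4)): 0.05,
}

first_order = partial_r(zero_order, 1, 2, (3,))      # r12.3
second_order = partial_r(zero_order, 1, 2, (3, 4))   # r12.34
swapped = partial_r(zero_order, 1, 2, (4, 3))        # r12.43

print(round(first_order, 3))
print(abs(second_order - swapped) < 1e-9)
```

The first value reproduces the .802 of Table XXVI, and the comparison shows that the order of the secondary subscripts does not affect the result. Each added variable roughly triples the number of lower-order r's to be found, which is why the hand calculation grows laborious so quickly.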
2. General Formulas for Partial $\sigma$'s of Any Order
Just as the correlation between two sets of scores or other
measures can be determined when the influence of 1, 2, 3, ... n
other factors is held constant, so the variability (the $\sigma$) of
any set of scores can be found when the influence of 1, 2, 3, ... n
factors is held constant. As an illustration of this, take
$\sigma_{1.23}$ of Table XXVI. This "partial $\sigma$" gives the variability
of $X_1$ (honor points) freed of the influence exerted by the two
factors $X_2$ (general intelligence) and $X_3$ (average study-hours
per week). The general formula for $\sigma$'s of any order is
$$\sigma_{1.234 \ldots n} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2} \cdots \sqrt{1 - r_{1n.23 \ldots (n-1)}^2} \tag{50}$$
This formula may be used to compute the net $\sigma$'s in correlation
problems which involve any number of variables. In a five-variable
problem, for example, $\sigma_{1.2345}$ is written
(1) $\sigma_{1.2345} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2}\,\sqrt{1 - r_{15.234}^2}$
and by analogy to (1) or by reference to (50) the other $\sigma$'s may
be written:
(2) $\sigma_{2.1345} = \sigma_2\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23.1}^2}\,\sqrt{1 - r_{24.13}^2}\,\sqrt{1 - r_{25.134}^2}$
(3) $\sigma_{3.1245} = \sigma_3\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23.1}^2}\,\sqrt{1 - r_{34.12}^2}\,\sqrt{1 - r_{35.124}^2}$
(4) $\sigma_{4.1235} = \sigma_4\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{24.1}^2}\,\sqrt{1 - r_{34.12}^2}\,\sqrt{1 - r_{45.123}^2}$
(5) $\sigma_{5.1234} = \sigma_5\sqrt{1 - r_{15}^2}\,\sqrt{1 - r_{25.1}^2}\,\sqrt{1 - r_{35.12}^2}\,\sqrt{1 - r_{45.123}^2}$
Each of these $\sigma$'s measures the variability of a single factor
when the effects of the other four are ruled out or held constant.
All of them are $\sigma$'s of the fourth order, since there are 4
secondary subscripts, and the order of a partial $\sigma$, like the order
of a partial r, is determined by the number of its secondary
subscripts.
By a simple rearrangement of the secondary subscripts any
higher order $\sigma$ may be written in more than one way. A $\sigma$ of
the second order may be written in two ways: e.g., $\sigma_{1.23}$ which is
given on page 227 as $\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}$ may also be
written $\sigma_{1.32} = \sigma_1\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{12.3}^2}$.
In like manner, $\sigma_{2.13}$ may be written
(1) $\sigma_{2.13} = \sigma_2\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23.1}^2}$,
or
(2) $\sigma_{2.31} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2}$;
and $\sigma_{3.12}$ may be written
(1) $\sigma_{3.12} = \sigma_3\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23.1}^2}$,
or
(2) $\sigma_{3.21} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2}$.
The alternate forms of a partial $\sigma$ are useful as a check on the
arithmetic calculations, and also because they make unnecessary
the calculation of otherwise unused and hence superfluous
partial r's. Thus by using the second forms of $\sigma_{2.13}$ and $\sigma_{3.12}$
instead of the first (see Table XXVI) we make unnecessary
the calculation of $r_{23.1}$ so far as the computation of the $\sigma$'s is
concerned. Furthermore, if $r_{23.1}$ is not used elsewhere in the
problem, it need not be calculated at all (see page 228). Two
partial r's are all that we need in order to write the regression
equation in a three-variable problem.
The number of alternate forms in which any higher order $\sigma$
may be written depends on the number of permutations which
its secondary subscripts can take. We have seen that a second
order $\sigma$ may be written in two ways: $\sigma_{1.23}$ and $\sigma_{1.32}$. In the
same way, any $\sigma$ of the third order, e.g., $\sigma_{1.234}$ may be written
in 6 ways: $\sigma_{1.234}$, $\sigma_{1.243}$, $\sigma_{1.324}$, $\sigma_{1.342}$, $\sigma_{1.423}$, $\sigma_{1.432}$. Any $\sigma$ of
the fourth order, e.g., $\sigma_{1.2345}$ may be written in 24 ways, and
any $\sigma$ of the fifth order, e.g., $\sigma_{1.23456}$, in 120 ways.1
1 This follows from the law of permutations. The permutations of 4 things
taken 4 at a time are $_4P_4 = 4 \times 3 \times 2 \times 1 = 24$; and the permutations of 5 things
Fortunately we need only a very few of all of these possible
arrangements. Care, nevertheless, must be taken that the
correct forms are chosen, for just as the number of partial r's
which must be computed in a 3-variable problem can be reduced
by a judicious choice of $\sigma$ formulas, so also in problems which
contain more than 3 variables the number of partial r's may be
considerably reduced by proper selection. And it is in the
longer problems that a reduction of the number of partial r's to
be computed counts most, since it is here that the calculations
become laborious. The partial $\sigma$'s which require the calculation
of the minimum number of partial r's are given, for 4- and
5-variable problems, in the outline solutions on pages 240-244.
These will be found useful for quick reference. By analogy
to these, the selection of the $\sigma$ formulas in problems which
involve more than five variables can be easily made.
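That the alternate forms of a partial $\sigma$ give the same value, and that their number follows the law of permutations, may be verified as below. The sketch is in Python with names of our own choosing; it uses the data of Table XXVI and carries the partial r's unrounded, so that $\sigma_{1.23}$ comes out as 6.33 rather than the 6.34 obtained in the text from rounded values.

```python
import math
from itertools import permutations

def k(r):
    """Return sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)

def first_order(r_xy, r_xz, r_yz):
    """First-order partial r, formula (49)."""
    return (r_xy - r_xz * r_yz) / (k(r_xz) * k(r_yz))

sigma_1 = 11.2
r12, r13, r23 = 0.60, 0.32, -0.35
r12_3 = first_order(r12, r13, r23)
r13_2 = first_order(r13, r12, r23)

# The two ways of writing the second-order sigma of variable 1
sigma_1_23 = sigma_1 * k(r12) * k(r13_2)
sigma_1_32 = sigma_1 * k(r13) * k(r12_3)
print(round(sigma_1_23, 2), round(sigma_1_32, 2))

# The alternate forms of a third-order sigma: permutations of 2, 3, 4
forms_third_order = ["1." + "".join(p) for p in permutations("234")]
print(forms_third_order)
```

The six strings printed are the six forms $\sigma_{1.234}$ to $\sigma_{1.432}$ listed above; for a fourth-order $\sigma$ the same enumeration yields 24 forms.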
3. General Formulas for the Regression Equation, and Coefficients
of Regression
The general regression equation, which expresses the relation
between a single dependent variable, $X_1$, and a number of
independent variables, $X_2$, $X_3$, $X_4$ . . . $X_n$, may be written in
Deviation Form as follows:
$$x_1 = b_{12.34 \ldots n}x_2 + b_{13.24 \ldots n}x_3 + \cdots + b_{1n.23 \ldots (n-1)}x_n \tag{51}$$
and in Score Form as
$$X_1 = b_{12.34 \ldots n}X_2 + b_{13.24 \ldots n}X_3 + \cdots + b_{1n.23 \ldots (n-1)}X_n + K \tag{52}$$
The regression coefficients $b_{12.34 \ldots n}$, $b_{13.24 \ldots n}$, etc., give the
weight or value to be attached to each independent variable
when $X_1$ is to be estimated from all of these in combination.
Moreover, the regression coefficients indicate the weight which
each independent variable has in determining $X_1$ exclusive of the
influence of the other variables, and hence we can tell from the
regression equation just what part the score on each of several
taken 5 at a time are $_5P_5 = 5 \times 4 \times 3 \times 2 \times 1 = 120$. In general, the permutations
of n things taken n at a time are $_nP_n = n(n-1)(n-2) \ldots$ to n factors. See
the Chapter on Permutations and Combinations in any Algebra.
tests plays in determining the score on the test taken as the
dependent variable.
The regression coefficients in a regression equation may be
computed from the formula
$$b_{12.34 \ldots n} = r_{12.34 \ldots n}\frac{\sigma_{1.234 \ldots n}}{\sigma_{2.134 \ldots n}} \tag{53}$$
If the problem involves only three variables, the regression
equation becomes $X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K$. In this equation,
the regression coefficients $b_{12.3}$ and $b_{13.2}$ are, like the
partial r's, $r_{12.3}$ and $r_{13.2}$, of the first order. The first, $b_{12.3}$,
equals $r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}}$; and the second, $b_{13.2}$, equals $r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}}$ (see
page 227 and Table XXVI). Regression equations which
involve more than three variables are easily written by reference
to formula (52) and their regression coefficients may be
found from formula (53). In a five-variable problem, for
example, the regression equation becomes
$$X_1 = b_{12.345}X_2 + b_{13.245}X_3 + b_{14.235}X_4 + b_{15.234}X_5 + K,$$
and the regression coefficients (b's of the third order) are
$$b_{12.345} = r_{12.345}\frac{\sigma_{1.2345}}{\sigma_{2.1345}}$$
$$b_{13.245} = r_{13.245}\frac{\sigma_{1.2345}}{\sigma_{3.1245}}$$
$$b_{14.235} = r_{14.235}\frac{\sigma_{1.2345}}{\sigma_{4.1235}}$$
$$b_{15.234} = r_{15.234}\frac{\sigma_{1.2345}}{\sigma_{5.1234}}$$
Obviously, to compute these regression coefficients we must
first compute the third order partial r's, and the necessary
partial $\sigma$'s. The calculation of the b's is then a matter of substitution.
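For the three-variable case, the regression coefficients given by formula (53) may be compared with the coefficients obtained from the usual least-squares solution. The Python sketch below uses names of our own; the closed form marked "least squares" is a standard result that is not stated in this chapter and is brought in here only as a check, and the partial r's are carried unrounded.

```python
import math

def k(r):
    """Return sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)

def first_order(r_xy, r_xz, r_yz):
    """First-order partial r, formula (49)."""
    return (r_xy - r_xz * r_yz) / (k(r_xz) * k(r_yz))

sigma = {1: 11.2, 2: 15.8, 3: 6.0}          # Table XXVI
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = first_order(r12, r13, r23)
r13_2 = first_order(r13, r12, r23)

# Partial sigmas, formula (50)
sigma_1_23 = sigma[1] * k(r12) * k(r13_2)
sigma_2_13 = sigma[2] * k(r23) * k(r12_3)
sigma_3_12 = sigma[3] * k(r23) * k(r13_2)

# Regression coefficients, formula (53)
b12_3 = r12_3 * sigma_1_23 / sigma_2_13
b13_2 = r13_2 * sigma_1_23 / sigma_3_12

# Check: least-squares closed form for two independent variables
b12_3_ls = (sigma[1] / sigma[2]) * (r12 - r13 * r23) / (1 - r23 ** 2)
b13_2_ls = (sigma[1] / sigma[3]) * (r13 - r12 * r23) / (1 - r23 ** 2)

print(round(b12_3, 3), round(b12_3_ls, 3))
print(round(b13_2, 3), round(b13_2_ls, 3))
```

The two routes give identical coefficients, which round to the .57 and 1.13 of Table XXVI to within the last figure.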
4. General Formulas for Standard and Probable Errors of
Estimate
All Xi scores estimated from a regression equation have a
standard error of estimate, a^st-xo, which measures the error
made in taking estimated instead of actual scores (see page 230) .
cr {eat. xo is found from the formula for 0-1.234 ... n, as follows:
C(est. Xi) = 0"1.234 ... n, (54)
and
P#(est.X1)=.6745X<X(est.X1) (55)
As σ1.234 . . . n must always be computed in order to find
the regression coefficients (see examples above), σ(est. X1) is
known at once without further calculation. The value of a
standard error of estimate has already been illustrated on page
230 from the data of Table XXVI. To repeat, we find in
Table XXVI that the σ(est. X1) of any estimated number of
honor points is 6.34, and that the PE(est. X1) is 4.28 points.
Hence, the chances are even that the "most probable," i.e.,
estimated, number of honor points received by any student, as
found from the regression equation, will be in error by 4 points
or less (roughly). We may be practically certain that any
estimated number of honor points is not in error by more than
4 × 4 or 16 honor points.
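The arithmetic of formulas (54) and (55) may be checked with a short Python sketch. The function name `probable_error` is only illustrative; the standard errors 6.34 and 6.31 are the values quoted in the text for the three-variable and four-variable equations.

```python
def probable_error(sigma_est):
    """Formula (55): PE(est. X1) = .6745 * sigma(est. X1)."""
    return 0.6745 * sigma_est

sigma_est = 6.34                 # standard error of estimate quoted from Table XXVI
pe = probable_error(sigma_est)
print(round(pe, 2))              # probable error, in honor points
print(round(4 * pe, 1))          # four PE's, the "practically certain" limit
```

Four times the unrounded PE comes to about 17 points; the figure of 16 in the text results from rounding the PE to 4 before multiplying.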
It may be shown by the method of least squares¹ that the
standard error (or PE) of estimate is a minimum when the
regression equation is used to estimate the X1 scores. For this
reason, values of X1 predicted from the regression equation are
said to be the "best" estimates of the actual X1 values which
can be made from a linear equation which contains the given
variables. The regression equation X1 = .57X2 + 1.13X3 - 66
(see page 230) will serve as an illustration of what is meant.
Assuming that the relation between X1 and X2, X1 and X3,
and X2 and X3 is linear in every case, X1 (honor points) can be
estimated from this equation with a smaller error of estimate
than from any other linear equation in X2 and X3.
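The minimum property can be seen numerically in a small Python sketch. The six sets of scores below are invented for illustration and are not May's data; the coefficients are obtained from the ordinary normal equations for two predictors, and the sum of squared errors is then compared with that of a few arbitrarily altered equations.

```python
# Hypothetical scores for six students (illustrative only, not from the text).
x2 = [100, 110, 120, 130, 140, 150]   # general intelligence score
x3 = [20, 12, 25, 15, 30, 18]         # study hours per week
x1 = [10, 8, 30, 25, 50, 38]          # honor points

def mean(v):
    return sum(v) / len(v)

def sum_products(u, v):
    """Sum of products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Normal equations for two predictors, solved by Cramer's rule.
s22, s33, s23 = sum_products(x2, x2), sum_products(x3, x3), sum_products(x2, x3)
s12, s13 = sum_products(x1, x2), sum_products(x1, x3)
det = s22 * s33 - s23 ** 2
b2 = (s12 * s33 - s13 * s23) / det
b3 = (s13 * s22 - s12 * s23) / det
k = mean(x1) - b2 * mean(x2) - b3 * mean(x3)

def sum_sq_error(w2, w3, const):
    """Sum of squared errors of estimate for X1 = w2*X2 + w3*X3 + const."""
    return sum((y - (w2 * a + w3 * b + const)) ** 2
               for y, a, b in zip(x1, x2, x3))

best = sum_sq_error(b2, b3, k)
# Arbitrary departures from the least-squares weights and constant.
shifts = [(0.05, 0, 0), (0, -0.2, 0), (0, 0, 3), (-0.1, 0.3, -2)]
others = [sum_sq_error(b2 + d2, b3 + d3, k + dk) for d2, d3, dk in shifts]
print(best, min(others))
```

Every altered equation yields a larger sum of squared errors, and hence a larger standard error of estimate, than the regression equation itself.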
1 See Yule, An Introduction to the Theory of Statistics, p. 231.
5. General Formula for R, the Coefficient of Multiple Correlation
The correlation between a single dependent variable X1 and
(n - 1) independent variables (e.g., X2, X3, X4 . . . Xn) in
combination is given by the formula
$$R_{1(23 \ldots n)} = \sqrt{1 - \frac{\sigma^2_{1.23 \ldots n}}{\sigma^2_1}} \qquad (56)$$
in which R1(23 . . . n) is the coefficient of multiple correlation,
σ1 is the σ of the dependent series of X1 scores, and σ1.23 . . . n
equals the standard error of estimate (see formula 54). When
there are only three variables, the multiple coefficient of
correlation becomes
$$R_{1(23)} = \sqrt{1 - \frac{\sigma^2_{1.23}}{\sigma^2_1}};$$
when there are five variables,
$$R_{1(2345)} = \sqrt{1 - \frac{\sigma^2_{1.2345}}{\sigma^2_1}};$$
and in like manner the R for six, seven, or any number of
variables may be written by reference to (56).
Since the error of estimate is a minimum when the regression
equation is used for estimating X1 scores, it follows that
the multiple coefficient of correlation R gives the maximum
correlation obtainable between the actual X1 scores and X1
scores estimated from a knowledge of the independent variables
X2, X3 . . . Xn, in the regression equation. R is valuable,
therefore, as indicating how effectively a given combination
of measures (or "team of tests") represents the actual
values of X1 when these measures are combined in the best
possible way. R is always positive no matter what the
signs in the regression equation may be. Errors of sampling,
therefore, do not neutralize each other but tend to become
cumulative. As a result, the PE of R, which is found from the
same formula as the PE of any product-moment r, is not a
fair measure of the coefficient's validity. To test the validity
of an obtained R, we must compare it with the value of that R
which we should get from the same number of cases and the
same number of variables, when the variables are uncorrelated,
i.e., with the R which would arise from fluctuations of sampling
alone. The formula for this R is
$$R = \sqrt{\frac{n - 1}{N - 1}} \qquad (57)$$
in which n is the number of variables, and N is the number of
cases.1 To illustrate this formula, let us apply it to the three-
variable problem in Table XXIV, in which n = 3, and N = 450.
Substituting for N and n in the formula, we get an R equal
to .07, which indicates a highly satisfactory degree of validity for
the obtained R of .824.
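The comparison value can be recomputed in a line or two of Python. The sketch assumes that formula (57) has the form R = sqrt((n - 1)/(N - 1)); this form reproduces the .07 quoted above, but it is an assumption that should be checked against Rosenow's monograph. The function name `chance_R` is hypothetical.

```python
import math

def chance_R(n_variables, n_cases):
    """R arising from sampling fluctuation alone (ASSUMED form of formula 57)."""
    return math.sqrt((n_variables - 1) / (n_cases - 1))

# Three-variable problem of the text: n = 3 variables, N = 450 cases.
print(round(chance_R(3, 450), 2))
```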
If we replace σ1.23 . . . n in formula (56) by its value in
terms of the entire and partial r's [see formula 50] we may
write the general formula for R1(234 . . . n) as follows:
$$R_{1(234 \ldots n)} = \sqrt{1 - \left[(1 - r^2_{12})(1 - r^2_{13.2}) \cdots (1 - r^2_{1n.23 \ldots (n-1)})\right]} \qquad (58)$$
Moreover, since a higher order σ may be written in a variety of
ways, the number depending upon its order (see page 234), we
have in the alternate forms for R a valuable means of checking
the accuracy of our arithmetical calculations. In a three-variable
problem, for example, R1(23) may be written as
$$R_{1(23)} = \sqrt{1 - \left[(1 - r^2_{12})(1 - r^2_{13.2})\right]},$$
or
$$R_{1(32)} = \sqrt{1 - \left[(1 - r^2_{13})(1 - r^2_{12.3})\right]}.$$
In like manner, in a 4-variable problem R1(234) may be found
from
$$R_{1(234)} = \sqrt{1 - \left[(1 - r^2_{12})(1 - r^2_{13.2})(1 - r^2_{14.23})\right]},$$
and checked by
$$R_{1(342)} = \sqrt{1 - \left[(1 - r^2_{13})(1 - r^2_{14.3})(1 - r^2_{12.34})\right]}.$$
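The checking value of the alternate forms may be illustrated with a brief Python sketch of formula (58). The coefficients entered are those reported elsewhere in this chapter for the honor-points problem (r12 = .60, r13 = .32, r12.3 = .802, r13.2 = .707, r14.3 = .387, r14.23 = .088, r12.34 = .764); the function name `multiple_R` is merely illustrative.

```python
import math

def multiple_R(partials):
    """Formula (58): R = sqrt(1 - product of (1 - r^2)) over the chain of r's."""
    product = 1.0
    for r in partials:
        product *= 1.0 - r ** 2
    return math.sqrt(1.0 - product)

# Coefficients reported in the text for the honor-points problem.
R_23 = multiple_R([0.60, 0.707])            # r12, r13.2
R_32 = multiple_R([0.32, 0.802])            # r13, r12.3
R_234 = multiple_R([0.60, 0.707, 0.088])    # r12, r13.2, r14.23
R_342 = multiple_R([0.32, 0.387, 0.764])    # r13, r14.3, r12.34
print(round(R_23, 3), round(R_32, 3))
print(round(R_234, 3), round(R_342, 3))
```

The two three-variable forms agree with each other and with the reported .824 to within the rounding of the tabled r's, and the two four-variable forms both come to about .826.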
1 Rosenow, Curt, The Analysis of Mental Functions, Psychological Mono-
graphs, 1917, Vol. XXIV, 5, p. 20.
6. Outline of the Formulas Needed in Correlation Problems
Which Involve (a) Four Variables and (b) Five Variables
In multiple correlation problems, generally the main task is
to find, with a minimum of time and calculation, the regression
equation which expresses the relation of the dependent
variable to the independent variables. For this purpose, when
working with more than three variables, the simplest plan is to
write down the formula for the regression equation required
first and then proceed deductively to find those partial r's and
higher order σ's which are necessary for computing the regression
coefficients. The formulas for getting the regression
equation with a minimum amount of calculation are given, for
four and five variables, in the following outlines. It is necessary,
of course, that all zero order r's be computed before
the partial correlation technique can be applied.
(a) Formulas for Four-Variable Problems
(1) Regression Equation. The regression equation for four variables
is written by reference to formula (52) as follows:
$$X_1 = b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4 + K$$
(2) Regression Coefficients. The three regression coefficients
needed in (1) are found from formula (53):
$$b_{12.34} = r_{12.34}\,\frac{\sigma_{1.234}}{\sigma_{2.134}}$$
$$b_{13.24} = r_{13.24}\,\frac{\sigma_{1.234}}{\sigma_{3.124}}$$
$$b_{14.23} = r_{14.23}\,\frac{\sigma_{1.234}}{\sigma_{4.123}}$$
These regression coefficients evidently require the computation of
3 second order partial r's, and 4 third order σ's.
(3) Partial r's.
(a) To find r12.34:
$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{1 - r^2_{14.3}}\,\sqrt{1 - r^2_{24.3}}}$$
We must find 3 first order partial r's as follows:
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r^2_{13}}\,\sqrt{1 - r^2_{23}}}$$
$$r_{14.3} = \frac{r_{14} - r_{13}\,r_{34}}{\sqrt{1 - r^2_{13}}\,\sqrt{1 - r^2_{34}}}$$
$$r_{24.3} = \frac{r_{24} - r_{23}\,r_{34}}{\sqrt{1 - r^2_{23}}\,\sqrt{1 - r^2_{34}}}$$
(b) To find r13.24:
$$r_{13.24} = \frac{r_{13.2} - r_{14.2}\,r_{34.2}}{\sqrt{1 - r^2_{14.2}}\,\sqrt{1 - r^2_{34.2}}}$$
We must find 3 first order partial r's as follows:
$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1 - r^2_{12}}\,\sqrt{1 - r^2_{23}}}$$
$$r_{14.2} = \frac{r_{14} - r_{12}\,r_{24}}{\sqrt{1 - r^2_{12}}\,\sqrt{1 - r^2_{24}}}$$
$$r_{34.2} = \frac{r_{34} - r_{23}\,r_{24}}{\sqrt{1 - r^2_{23}}\,\sqrt{1 - r^2_{24}}}$$
(c) To find r14.23:
$$r_{14.23} = \frac{r_{14.2} - r_{13.2}\,r_{34.2}}{\sqrt{1 - r^2_{13.2}}\,\sqrt{1 - r^2_{34.2}}}$$
No partials of first order are needed other than those already found.
[Note that a minimum of 9 partial r's must be computed, 3 of the
second order and 6 of the first order. The 9 first and second order r's
together with the 6 zero order r's make 15 coefficients of correlation
required in all.]
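The step-by-step elimination in (3) lends itself to a short recursive routine. The following Python sketch is offered only as an illustration: the six zero order r's are hypothetical values, not those of Table XXVIII, and the name `partial_r` is an invention of the sketch. Each call removes the last variable listed as held constant, which reproduces the forms (a), (b), and (c) above.

```python
import math

# Hypothetical zero order r's for four variables (illustrative values only).
zero_order = {
    (1, 2): 0.60, (1, 3): 0.30, (1, 4): 0.40,
    (2, 3): 0.20, (2, 4): 0.30, (3, 4): 0.10,
}

def partial_r(i, j, held=()):
    """Correlation of variables i and j with the variables in `held` held constant."""
    if not held:
        return zero_order[(min(i, j), max(i, j))]
    k, rest = held[-1], held[:-1]          # eliminate the last listed variable
    r_ij = partial_r(i, j, rest)
    r_ik = partial_r(i, k, rest)
    r_jk = partial_r(j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

r12_34 = partial_r(1, 2, (3, 4))   # built from r12.3, r14.3, r24.3
r13_24 = partial_r(1, 3, (2, 4))   # built from r13.2, r14.2, r34.2
r14_23 = partial_r(1, 4, (2, 3))   # built from r14.2, r13.2, r34.2
print(round(r12_34, 3), round(r13_24, 3), round(r14_23, 3))
```

Since a partial r does not depend on the order in which the other variables are eliminated, computing `partial_r(1, 4, (3, 2))` and comparing it with `r14_23` provides a check on the arithmetic.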
(4) Standard Deviations. The four third order σ's required may
be found from the following formulas which make use of no partial r's
other than those already computed in (3) above. From formula (50):
$$\sigma_{1.234} = \sigma_1\sqrt{1 - r^2_{12}}\,\sqrt{1 - r^2_{13.2}}\,\sqrt{1 - r^2_{14.23}}$$
$$\sigma_{2.134}\ (\text{i.e., } \sigma_{2.341}) = \sigma_2\sqrt{1 - r^2_{23}}\,\sqrt{1 - r^2_{24.3}}\,\sqrt{1 - r^2_{12.34}}$$
$$\sigma_{3.124}\ (\text{i.e., } \sigma_{3.241}) = \sigma_3\sqrt{1 - r^2_{23}}\,\sqrt{1 - r^2_{34.2}}\,\sqrt{1 - r^2_{13.24}}$$
$$\sigma_{4.123}\ (\text{i.e., } \sigma_{4.321}) = \sigma_4\sqrt{1 - r^2_{34}}\,\sqrt{1 - r^2_{24.3}}\,\sqrt{1 - r^2_{14.23}}$$
The numerical values of the regression coefficients may now be
computed and substituted in the regression equation.
(5) The Standard Error of Estimate, σ(est. X1). From formulas
(54) and (55) we find:
$$\sigma_{(est.\,X_1)} = \sigma_{1.234} \quad [\text{for value of } \sigma_{1.234} \text{ see (4) above}]$$
$$PE_{(est.\,X_1)} = .6745\,\sigma_{(est.\,X_1)}$$
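(6) Coefficient of Multiple Correlation, R. In a four-variable
problem the multiple coefficient, R, is written R1(234) and may be
found from formula (56):
$$R_{1(234)} = \sqrt{1 - \frac{\sigma^2_{1.234}}{\sigma^2_1}}$$
This formula may also be written as:
$$R_{1(234)} = \sqrt{1 - \left[(1 - r^2_{12})(1 - r^2_{13.2})(1 - r^2_{14.23})\right]}$$
or as
$$R_{1(234)} = \sqrt{1 - \left[(1 - r^2_{13})(1 - r^2_{14.3})(1 - r^2_{12.34})\right]}$$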
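The whole four-variable outline, steps (2) through (6), can be traced in one Python sketch. Everything numerical in it is hypothetical: the zero order r's and the σ's are invented for illustration, and the helper names `partial_r` and `partial_sigma` are not from the text. The sketch follows formulas (50), (53), (55), and (56) in the order of the outline.

```python
import math

# Hypothetical zero order r's and sigmas for four variables (not from the text).
r0 = {(1, 2): 0.60, (1, 3): 0.30, (1, 4): 0.40,
      (2, 3): 0.20, (2, 4): 0.30, (3, 4): 0.10}
sigma = {1: 11.2, 2: 14.0, 3: 6.0, 4: 8.0}

def partial_r(i, j, held=()):
    """Partial r between i and j, eliminating the variables in `held` one at a time."""
    if not held:
        return r0[(min(i, j), max(i, j))]
    k, rest = held[-1], held[:-1]
    a, b, c = partial_r(i, j, rest), partial_r(i, k, rest), partial_r(j, k, rest)
    return (a - b * c) / math.sqrt((1 - b * b) * (1 - c * c))

def partial_sigma(i, others):
    """Formula (50): sigma of variable i with `others` eliminated in the order given."""
    s = sigma[i]
    for pos, k in enumerate(others):
        s *= math.sqrt(1 - partial_r(i, k, tuple(others[:pos])) ** 2)
    return s

s1 = partial_sigma(1, (2, 3, 4))                                   # sigma 1.234
b12 = partial_r(1, 2, (3, 4)) * s1 / partial_sigma(2, (3, 4, 1))   # formula (53)
b13 = partial_r(1, 3, (2, 4)) * s1 / partial_sigma(3, (2, 4, 1))
b14 = partial_r(1, 4, (2, 3)) * s1 / partial_sigma(4, (3, 2, 1))

pe_est = 0.6745 * s1                                               # formula (55)
R = math.sqrt(1 - s1 ** 2 / sigma[1] ** 2)                         # formula (56)
R_check = math.sqrt(1 - partial_sigma(1, (3, 4, 2)) ** 2 / sigma[1] ** 2)
print(round(b12, 3), round(b13, 3), round(b14, 3))
print(round(pe_est, 2), round(R, 3), round(R_check, 3))
```

The last line compares R found from σ1.234 with R found from the alternate elimination order σ1.342; agreement of the two is the arithmetical check recommended in the text.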
(b) Formulas for Five-Variable Problems
(1) Regression Equation:
$$X_1 = b_{12.345}X_2 + b_{13.245}X_3 + b_{14.235}X_4 + b_{15.234}X_5 + K \qquad (52)$$
(2) Regression Coefficients:
$$b_{12.345} = r_{12.345}\,\frac{\sigma_{1.2345}}{\sigma_{2.1345}}, \qquad b_{14.235} = r_{14.235}\,\frac{\sigma_{1.2345}}{\sigma_{4.1235}}, \qquad (53)$$
$$b_{13.245} = r_{13.245}\,\frac{\sigma_{1.2345}}{\sigma_{3.1245}}, \qquad b_{15.234} = r_{15.234}\,\frac{\sigma_{1.2345}}{\sigma_{5.1234}}.$$
(3) Partial r's. We compute 22 partial r's as follows (formula 49):
(a) To find r12.345, write it as r12.453. Then
$$r_{12.453} = \frac{r_{12.45} - r_{13.45}\,r_{23.45}}{\sqrt{1 - r^2_{13.45}}\,\sqrt{1 - r^2_{23.45}}}$$
To compute this r we need 3 partial r's of the second order, viz.,
$$r_{12.45} = \frac{r_{12.4} - r_{15.4}\,r_{25.4}}{\sqrt{1 - r^2_{15.4}}\,\sqrt{1 - r^2_{25.4}}}$$
$$r_{13.45} = \frac{r_{13.4} - r_{15.4}\,r_{35.4}}{\sqrt{1 - r^2_{15.4}}\,\sqrt{1 - r^2_{35.4}}}$$
$$r_{23.45} = \frac{r_{23.4} - r_{25.4}\,r_{35.4}}{\sqrt{1 - r^2_{25.4}}\,\sqrt{1 - r^2_{35.4}}}$$
To compute these 3 r's we need 6 r's of the first order, viz.,
r12.4, r15.4, r13.4, r25.4, r23.4, r35.4.
(b) To find r13.245, write it as r13.452. Then
$$r_{13.452} = \frac{r_{13.45} - r_{12.45}\,r_{23.45}}{\sqrt{1 - r^2_{12.45}}\,\sqrt{1 - r^2_{23.45}}}$$
To compute this r we need no partial r's other than those already
found in (a).
(c) To find r14.235, write without change:
$$r_{14.235} = \frac{r_{14.23} - r_{15.23}\,r_{45.23}}{\sqrt{1 - r^2_{15.23}}\,\sqrt{1 - r^2_{45.23}}}$$
To compute this r we need 3 partial r's of the second order, viz.,
$$r_{14.23} = \frac{r_{14.2} - r_{13.2}\,r_{34.2}}{\sqrt{1 - r^2_{13.2}}\,\sqrt{1 - r^2_{34.2}}}$$
$$r_{15.23} = \frac{r_{15.2} - r_{13.2}\,r_{35.2}}{\sqrt{1 - r^2_{13.2}}\,\sqrt{1 - r^2_{35.2}}}$$
$$r_{45.23} = \frac{r_{45.2} - r_{34.2}\,r_{35.2}}{\sqrt{1 - r^2_{34.2}}\,\sqrt{1 - r^2_{35.2}}}$$
To compute these r's we need 6 r's of the first order, viz.,
r14.2, r13.2, r15.2, r34.2, r35.2, r45.2.
(d) To find r15.234, write without change:
$$r_{15.234} = \frac{r_{15.23} - r_{14.23}\,r_{45.23}}{\sqrt{1 - r^2_{14.23}}\,\sqrt{1 - r^2_{45.23}}}$$
To compute this r we need no partials other than those already
found in (c).
[Note that we must compute a minimum of 4 third order r's, 6
second order r's, and 12 first order r's, 22 in all.]
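(4) Standard Deviations. The 5 fourth order σ's required may
be found from the following forms which make use of only those
partial r's already computed in (3):
$$\sigma_{1.2345} = \sigma_1\sqrt{1 - r^2_{12}}\,\sqrt{1 - r^2_{13.2}}\,\sqrt{1 - r^2_{14.23}}\,\sqrt{1 - r^2_{15.234}} \qquad (50)$$
$$\sigma_{2.1345}\ (\text{i.e., } \sigma_{2.4531}) = \sigma_2\sqrt{1 - r^2_{24}}\,\sqrt{1 - r^2_{25.4}}\,\sqrt{1 - r^2_{23.45}}\,\sqrt{1 - r^2_{12.345}}$$
$$\sigma_{3.1245}\ (\text{i.e., } \sigma_{3.4521}) = \sigma_3\sqrt{1 - r^2_{34}}\,\sqrt{1 - r^2_{35.4}}\,\sqrt{1 - r^2_{23.45}}\,\sqrt{1 - r^2_{13.245}}$$
$$\sigma_{4.1235}\ (\text{i.e., } \sigma_{4.2351}) = \sigma_4\sqrt{1 - r^2_{24}}\,\sqrt{1 - r^2_{34.2}}\,\sqrt{1 - r^2_{45.23}}\,\sqrt{1 - r^2_{14.235}}$$
$$\sigma_{5.1234}\ (\text{i.e., } \sigma_{5.2341}) = \sigma_5\sqrt{1 - r^2_{25}}\,\sqrt{1 - r^2_{35.2}}\,\sqrt{1 - r^2_{45.23}}\,\sqrt{1 - r^2_{15.234}}$$
(5) Standard Error of Estimate, σ(est. X1):
$$\sigma_{(est.\,X_1)} = \sigma_{1.2345} \quad [\text{see (4) above for value}] \qquad (54)$$
$$PE_{(est.\,X_1)} = .6745\,\sigma_{(est.\,X_1)} \qquad (55)$$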
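(6) Coefficient of Multiple Correlation, R.
$$R_{1(2345)} = \sqrt{1 - \frac{\sigma^2_{1.2345}}{\sigma^2_1}} \qquad (56)$$
which may be written also as
$$R_{1(2345)} = \sqrt{1 - \left[(1 - r^2_{12})(1 - r^2_{13.2})(1 - r^2_{14.23})(1 - r^2_{15.234})\right]},$$
and checked by
$$R_{1(2345)} = \sqrt{1 - \left[(1 - r^2_{14})(1 - r^2_{15.4})(1 - r^2_{13.45})(1 - r^2_{12.345})\right]}.$$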
IV. A Multiple Correlation Problem with Four
Variables
In Section II we found that a student's honor points (X1)
could be estimated with a considerable degree of accuracy from
a knowledge of his general intelligence score (X2) and the number
of hours he spends in study per week (X3). The PE(est. X1)
made in estimating individual scores from this three-variable
regression equation was found to be 4.28 points; and the coefficient
of multiple correlation, R1(23), which indicates, in general,
how well the estimated scores represent the actual scores, was
.824. Now suppose that we add to the two independent
variables X2 and X3 a third factor X4, e.g., the quality of the
preparatory work done by the student in High School.¹ This
will give us three independent variables from which to estimate
the dependent variable honor points, and the question arises:
with how much greater accuracy will this additional factor
enable us to predict academic success?
The answer to this question will be found in Table XXVIII,
which gives a complete solution of this problem, following the
scheme outlined for four-variable problems in Section III (6).
Some additional discussion of procedure and methods and
several points to be especially noted are given in the following
paragraphs.
Remember first of all that the mean and the σ of each set of
measures must be known as well as their 6 intercorrelations,
1 This was measured by the average grade obtained in the work offered for
entrance to College. May, Predicting Academic Success, Journal of Educa-
tional Psychology, Vol. XIV, 434-436.
r's of the zero order. The calculation of these 6 intercorrela-
tions is actually the most laborious part of the solution of a
multiple correlation problem — in spite of the fact that we have
passed it over with little comment heretofore — since a separate
correlation table must be drawn up for each r.
(1) The discussion from here on¹ follows the outline given
in (6) on page 240. Thus, before calculating any partial r's, we
write the regression equation, and from it deduce what partial
r's and higher order σ's will be required.
(2) It is clear from the regression coefficients that we shall
need three partial r's of the second order, viz., r12.34, r13.24,
and r14.23; and four partial σ's of the third order, viz., σ1.234,
σ2.134, σ3.124, and σ4.123, in order to evaluate the constants in
the regression equation. Only the partial r's actually required
in the regression equation need be calculated.
(3) In order to find r12.34 we shall need three first order
partial r's, viz., r12.3, r14.3, and r24.3; and to find r13.24 we shall
need, again, three first order partial r's, viz., r13.2, r14.2, and r34.2.
To find the last second order partial, r14.23, no additional first
order r's are required other than those already found. A minimum
of 9 partial r's, therefore, is required in all.
The partial r12.34 gives the net correlation between (1) honor
points and (2) general intelligence when both (3) study hours
and (4) average High School grades have been eliminated as
variable factors or held constant. In like manner, r13.24 gives
the net correlation between (1) honor points and (3) study
hours when both (2) general intelligence and (4) average High
School grades are held constant. The first second order partial
r, i.e., r12.34, equals .764 and is but slightly reduced from r12.3
which equals .802; while the second partial r13.24 = .676, and
is also but slightly less than r13.2 which equals .707. This
comparison of partial r's shows the relatively small influence
of High School grades on the net correlation between (1) honor
points and (3) study hours with general intelligence constant,
as well as the small influence of this factor on the net correlation
1 See Table XXVIII. The divisions in the text parallel those in the table.
between (1) honor points and (2) general intelligence with study
hours constant. Notice, however, that while the zero order coefficient
of correlation between (1) honor points and (4) average High
School grades, i.e., r14, is .40, r14.2 = .246, r14.3 = .387, and
r14.23 = .088. Evidently, nearly all of the correlation which
appears between (1) honor points and (4) average High School
grades may be attributed to the common dependence of these
two factors on (2) general intelligence and to a somewhat lesser
degree on (3) study hours.
(4) By using the forms given in (6) page 240, we are enabled
to calculate the four third order σ's required by the regression
coefficients without the necessity of finding any additional
partial r's (see page 234). These partial σ's, viz., σ1.234, σ2.134,
etc., give the net variability of the distribution of measures
denoted by the primary subscripts when the influence of all
three of the other factors (secondary subscripts) has been
excluded. To take a single example, σ1.234 is 6.31 as against
a σ1 of 11.2, which means, concretely, that if each of the 450
students in the group were exactly alike as regards (2) general
intelligence, (3) study hours, and (4) average High School grades,
the σ of their distribution of honor points would be only about
half as large as the observed σ, that is, the σ of the group in which
these factors differ in weight or value.
The computation of the regression coefficients is simply a
matter of combining the partial r's and σ's already found.
When this has been done, we may substitute in the regression
equation to find x1 = .55x2 + 1.07x3 + .083x4, or multiplying
by 12.5 (a convenient constant), (the number of honor
points) = 7 (score on general intelligence test) + 13 (the number
of hours spent per week in study) + 1 (average High School
grades). In Score Form the regression equation becomes
X1 = .55X2 + 1.07X3 + .083X4 - 69.
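The use of the Score Form equation X1 = .55X2 + 1.07X3 + .083X4 - 69 may be shown in a few lines of Python. The weights and the constant are those of the text; the function name `estimate_honor_points` and the scores of the sample student are hypothetical and serve only to show the substitution.

```python
# Score Form regression equation from the text:
# X1 = .55*X2 + 1.07*X3 + .083*X4 - 69
WEIGHTS = {"intelligence": 0.55, "study_hours": 1.07, "hs_grade": 0.083}
CONSTANT = -69.0

def estimate_honor_points(intelligence, study_hours, hs_grade):
    """Most probable number of honor points for one student."""
    return (WEIGHTS["intelligence"] * intelligence
            + WEIGHTS["study_hours"] * study_hours
            + WEIGHTS["hs_grade"] * hs_grade
            + CONSTANT)

# Multiplying by the convenient constant 12.5 gives whole-number weights.
whole_weights = {name: round(w * 12.5) for name, w in WEIGHTS.items()}
print(whole_weights)

# Hypothetical student: these three scores are invented for illustration.
print(round(estimate_honor_points(100, 20, 80), 1))
```

The whole-number weights printed first are the 7, 13, and 1 of the verbal form of the equation.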
It is clear from the regression equations that the number
of hours spent in study has twice the weight of the score on
general intelligence test and thirteen times the weight of the
average High School grades, in determining the number of
honor points which a student will most probably receive at the
end of the first semester. Apparently (as noted above), the
average High School grades have relatively little influence on
honor points as compared with the other factors in the equation.
(5) Still further evidence of the small importance of High
School grades in improving the estimate of honor points is
to be seen in the size of the PE(est. X1). The PE of estimate
made in predicting honor points from the present equation is
4.26 points as compared with a PE(est. X1) of 4.28 points made
in using the regression equation which does not include High
School grades (see page 230). This means that we can estimate
the number of honor points which a student will receive, knowing
his general intelligence score and the number of hours he
spends in study per week, with but slightly greater error than
when we know in addition to these two the average grade he
has received in High School also. It would seem apparent,
therefore, that the work required to build up a regression equation
which will include the latter factor is hardly worth while.
(6) The multiple coefficient of correlation, R1(234), is .826
as compared with the R1(23) of .824. A comparison of these
multiple coefficients further substantiates the conclusion
that High School grades contribute practically nothing to the
reliability of an honor point estimate.
It will be of considerable interest to compare the reliability
of our estimate of honor points when the factors, singly and
in combination, are taken into account. In this way the
"prognostic" value of the multiple regression equation, as
shown by the size of σ(est. X1), will be more readily appreciated.
The standard errors of estimate and the coefficients
of correlation for the different factors taken singly and in
combination are given below:

Dependent Variable (Honor Points, X1)    σ(est. X1)    Coefficient of Correlation
X1 = .43X2 - 24.76                          8.96        r12 = .60
X1 = .60X3 + 4.1                           10.61        r13 = .32
X1 = .57X2 + 1.13X3 - 66                    6.34        R1(23) = .824
X1 = .55X2 + 1.07X3 + .083X4 - 69           6.31        R1(234) = .826
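The standard errors in this comparison follow from the correlations by way of formula (56), since σ(est. X1) = σ1 multiplied by the square root of (1 - R²). A minimal Python sketch, taking σ1 = 11.2 as stated in the discussion of Table XXVIII and using the illustrative name `sigma_est`, recomputes each entry:

```python
import math

SIGMA_1 = 11.2   # sigma of the honor-point distribution, as given in the text

def sigma_est(r):
    """Standard error of estimate from a correlation r (entire or multiple)."""
    return SIGMA_1 * math.sqrt(1 - r ** 2)

# (label, correlation, standard error of estimate as tabled in the text)
rows = [("X2 alone", 0.60, 8.96),
        ("X3 alone", 0.32, 10.61),
        ("X2 and X3", 0.824, 6.34),
        ("X2, X3 and X4", 0.826, 6.31)]
for label, r, tabled in rows:
    print(label, round(sigma_est(r), 2), tabled)
```

The recomputed values agree with the tabled ones to within about one hundredth of a point, the small differences arising from the rounding of R to three places.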
[Table XXVIII. Complete solution of the multiple correlation problem with four variables.]
The important fact here is that σ(est. X1) is considerably
less, and the correlation considerably greater, when X2 and X3
are taken together than when either is taken alone. The standard
error of estimate and the R improve very slightly when X4
is added to X2 and X3. It is very probable that by an extension
of the method of partial and multiple correlation to include
other variables in addition to those we already have,
the σ(est. X1) of our problem could be still further reduced and
R increased.
Before working out a regression equation containing an added
variable or variables, the "predictive value" of the "new"
equation should be found by computing σ(est. X1) or R. This
will enable us to determine what the effect will be of adding
another variable or variables, and whether σ(est. X1) is sufficiently
reduced or R sufficiently increased to justify the additional
calculation. In the present problem, for instance, either
σ(est. X1) or R1(234) would have told us that average High
School grades add practically nothing to the predictive value
of a regression equation which already contains the two
variables general intelligence and number of hours spent on
the average in study each week.
V. The Value and Use of Partial and Multiple
Correlation
1. The Value and Use of Partial Correlation in Analysis and in
Causal Investigations
Partial correlation is of considerable importance in the
analysis of the part played by each of several factors in a total
result, inasmuch as it enables us to find the net relationship
between two sets of scores or measures when the influence of
one or more other factors is excluded. A concrete illustration
of this use of partial correlation may be cited from the work of
Cyril Burt.1 Burt wished to find how much a child's mental
age — as given by the Binet tests — influenced his school attain-
ment. His subjects were 300 children from 7 to 14 years old.
1 Burt, Cyril, Mental and Scholastic Tests, London, 1921, pp. 180-184.
Each child's (1) MA (Binet) was found; likewise his (2)
scholastic achievement as measured by educational examina-
tions and checked by teachers; and (3) his chronological age.
The "entire" coefficient of correlation between Binet MA and
scholastic achievement (r12) was .91. When chronological
age (3) was held constant, the partial r (r12.3) between
Binet MA and scholastic achievement dropped to .68. This
shows, in the first place, that age has a decided effect on the
observed correlation between MA and school work, in that it
tends to increase or "dilate" the obtained r. This dilation is
due to the fact that both MA and school attainment tend to
increase with chronological age, and hence this common dependence
on chronological age is sufficient to bring about a considerable
"boost" in the observed correlation. In the second place,
the r12.3 = .68 indicates that a substantial relation remains
between MA and school work when age conditions are uniform.
In other words, Binet MA (intelligence) is a substantial factor
in a pupil's school attainment irrespective of his chronological
age. To take the analysis a step further, Burt found that the
correlation between school work (2) and chronological age
(3) (r23) was .87; and that when the effect of Binet MA was
held constant, the partial r between school work and chronological
age (r23.1) was .49. The persistence of a fairly high
relation between school work and chronological age when
intelligence is eliminated offers confirmatory evidence, according
to Burt, of the "undue influence of age upon school classification."
In these illustrations it is clear that the calculation of
the partial r's is the first step in an analysis of the factors which
determine school attainment. By an extension of this same
method the influence of other factors may be excluded and net
relations secured.
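The "dilation" described here may be reproduced with the first order partial formula. In the Python sketch below, r12 = .91 and r23 = .87 are the values quoted from Burt; the correlation between MA and chronological age, r13, is not given in this passage, and the value .832 is an assumption chosen only because it reproduces both reported partials. The function name `first_order_partial` is likewise illustrative.

```python
import math

def first_order_partial(r_xy, r_xz, r_yz):
    """Partial r between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r12, r23 = 0.91, 0.87   # reported by Burt, as quoted in the text
r13 = 0.832             # ASSUMED: not given in the text; chosen for illustration

r12_3 = first_order_partial(r12, r13, r23)   # MA and school work, age constant
r23_1 = first_order_partial(r23, r12, r13)   # school work and age, MA constant
print(round(r12_3, 2), round(r23_1, 2))
```

With a correlation of this size between MA and age, the entire r of .91 falls to about .68 when age is held constant, and the r of .87 falls to about .49 when MA is held constant.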
From the analyses made through the elimination of factors
by partial correlation, we are often enabled to determine exist-
ing "causal" relationships. Thus Phillips1 in a study of the
1 Phillips, Frank M., Application of Partial Correlation to a Health Problem.
Reprint No. 867 from Public Health Reports, Sept., 1923.
causes contributing to absence on account of sickness among
government employees over the period of a year found that the
observed correlation between absence (i.e., number of persons
absent) and mean temperature on the day of absence (rat.) was
— .37. When the four factors (1) relative humidity at 8 a.m.
on the day of absence; (2) relative humidity at noon of the
previous day; (3) inches of rainfall on the day of absence; and
(4) per cent of possible sunshine on the day of absence were held
constant, the net correlation (rat. 1234) remaining between
absence and temperature was —.39, practically the same as
the original correlation. Since this was the only r of any size
(the other r's both entire and partial were negligible) the
obvious conclusion seems to be that of the factors studied,
temperature on the day of absence is the most important sec-
ondary or contributing cause of absence. (The sickness must
be taken, of course, as the primary cause of absence.) Here
and elsewhere let it be understood that partial correlation has
absolutely nothing to say about "causes," as such. The conclusion
as to which of two factors is the cause and which the
effect is a matter of common sense analysis. In the illustration
given, the distinction between cause and effect is obvious.
Another interesting example of the use of partial correlation
in a causal investigation is found in the work of Reavis.1
This investigator undertook to ferret out the causes of attend-
ance and non-attendance in rural schools. Certain factors,
(1) distance from school, (2) age-grade relation, (3) kind of
work done by the pupils, (4) training, experience, etc., of teacher,
(5) school equipment, and (6) kind of community were taken as
having more or less effect on school attendance. When partial
correlation was applied to the problem, it was found that the
entire coefficient of correlation between attendance and distance,
and attendance and kind of community, were the least reduced.
The first was lowered from — .45 to — .43 ; and the second from
.30 to .28. Of all the factors selected, therefore, these two seem
1 Reavis, George, Factors Controlling Attendance in Rural Schools. Teachers
College, Columbia University, 1920.
to have the most direct or independent influence on school
attendance. As in the problem cited above, the distinction
between cause and effect in this illustration is clear: — it is
evident that distance from school and kind of community are
the causes and not the effects of attendance or non-attendance.
2. The Value of the Regression Equation in Prediction and
Analysis
The value of the regression equation is twofold:1 (1) In its
usual form, it gives the weights to be assigned each of several
independent variables, in order that X1 (the dependent variable)
may be predicted or forecasted with minimum error (see page
237). (2) In its "special" form it may be used to analyze,
within certain limits, a given capacity or ability. We shall
consider these two uses of the regression equation in order.
(1) It has already been stated that the regression equation
enables us to combine two or more tests or other measures
(independent variables, X2, X3, . . . Xn) into a single value
(X1) in such a way as to give the best possible estimate of X1.
In the three-variable problem on page 228, for example, the
regression equation gives us the best possible forecast of the
number of honor points (X1) which a student will receive, when
we know his general intelligence score (X2) and the average
number of hours he spends per week in study (X3). Moreover,
once calculated, the regression equation may be used subsequently
to estimate other students' scores in X1 when only their
scores in X2 and X3 are known. The value of the regression
equation as a forecasting instrument is determined by the size
of the standard error of estimate, and by the multiple coefficient
of correlation.
A good illustration of the value of the regression equation in
forecasting — taken from another field than psychology — is to be
found in the work of Moore in forecasting the cotton crop in
1 Kelley, T. L., Tables to Facilitate the Calculation of Partial Coefficients of
Correlation and Regression Equations, Bulletin of the University of Texas,
1916, 27, p. 7.
the Southern States.1 Taking the cotton crop in Georgia as
the dependent variable (to cite a single example) and the May
rainfall, June temperature, and August temperature as inde-
pendent variables, Moore built up a regression equation from
which it was possible to get a better forecast of the crop at the
end of August than the official method of the U. S. Department
of Agriculture could obtain from the condition of the crop in
September. (By better forecast is meant a smaller error of
prediction.)
In addition to its use as a forecasting instrument, the regression
equation may be used also to determine the value or
"weight" which each test in a battery should have in order
that the composite scores obtained from the battery (group
of tests) shall be the best possible estimates of that capacity
which the whole battery of tests presumably measures. This
is essentially the same problem as that of prediction or fore-
casting discussed in the last paragraph. Suppose, by way of
illustration, that the problem is to devise a group test for measur-
ing general intelligence; and that this battery is to consist of
four tests. The first step is to secure some good "criterion"²
of general intelligence. This may be (1) school grades, (2)
teachers' estimates, (3) (1) and (2) combined, or (4) some
standard intelligence examination, as for example, Stanford-
Binet or Army Alpha. The next step is to select four tests
which will separately give (1) high correlations with the criterion,
and (2) low correlations with each other.3 These two condi-
tions guarantee that each test will measure some aspect or phase
of the criterion ; and further that each test will probably measure
a different, or slightly different, phase of the criterion, since
the low intercorrelations will prevent much duplication. Let
us call the criterion Xc and the four tests of the battery X1,
X2, X3, and X4. The regression equation in Score Form is
1 Moore, H. L., Forecasting the Yield and Price of Cotton, 1917, pp. 108-115.
2 See page 266 for definition of "criterion."
3 The ideal battery of tests would consist of tests which correlate as high as
possible with the criterion, and as low as possible with each other.
$X_c = AX_1 + BX_2 + CX_3 + DX_4 + K$, in which A, B, C, D, the
regression coefficients, are the "weights" to be given the
scores made on the four tests, and K is a numerical constant.
Now to take a very simple case, suppose that A = 1; B = 2;
C = 3; and D = 4. The regression equation then becomes
$X_c = 1X_1 + 2X_2 + 3X_3 + 4X_4 + K$, which means that a subject's
score on test No. 1 must be multiplied by 1, his score on test
No. 2 by 2, his score on test No. 3 by 3, and his score on test
No. 4 by 4 in order that his composite score on the battery may
give the "best" estimate of his score on Xc, the criterion.
The regression equation may be said to furnish the ideal
method of combining several tests into a team, since each test
in a regression equation is weighted according to its correlation
with the criterion, independently of the other tests in the team
or battery. Under these conditions the standard error of
estimate is a minimum while the correlation of the predicted Xc
values and the actual Xc values (multiple R) is the maximum
obtainable with the given set of tests. R tells the extent to
which our team represents the criterion.
(2) The only difference between the usual or "regular"
form of the regression equation and the "special" form to be
considered now is that in the special form, the σ's of all of the
different tests (or other measures) are taken as equal. This
procedure eliminates differences in the size of the test units as
well as differences in "spread" or variability, and enables us to
determine (from the correlation alone) the relative weight with
which each independent factor "enters into" or contributes to
the dependent variable (the criterion) independently of the other
factors. In this way, an analysis can be made of the impor-
tance of several different factors in some final result. It is very
important to remember, however, that in its special form, the re-
gression equation cannot be used for forecasting.
We may illustrate the special use of the regression equation
with data taken from the three-variable problem on page 228.
If X1, honor points, be taken as the criterion, while X2, general
intelligence, and X3, average number of hours spent in study
per week are, as before, the independent variables, the usual
or "regular" regression equation is written:
$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K.$$
Replacing the b's in this equation by means of formula (53),
$$X_1 = r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}}X_2 + r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}}X_3 + K;$$
and replacing the partial σ's [by formula (50)], we have
$$X_1 = r_{12.3}\frac{\sigma_1\sqrt{1-r^2_{13}}\,\sqrt{1-r^2_{12.3}}}{\sigma_2\sqrt{1-r^2_{23}}\,\sqrt{1-r^2_{12.3}}}X_2 + r_{13.2}\frac{\sigma_1\sqrt{1-r^2_{12}}\,\sqrt{1-r^2_{13.2}}}{\sigma_3\sqrt{1-r^2_{23}}\,\sqrt{1-r^2_{13.2}}}X_3 + K.$$
Substituting numerical values for the r's and putting σ1 = σ2 = σ3,
we have
$$X_1 = .8X_2 + .6X_3 + K.$$
This result may be interpreted to mean that in so far as the
two factors, general intelligence and number of hours spent on
the average in study per week, "enter into" the ability to get
honor points, they contribute with the relative weight of
.8 : .6 or 4 : 3. It must be clearly understood that this ratio
refers to the relative contribution of the two factors themselves
to the final result and not to the relative weights of their scores.
The weight to be assigned each score is found from the regular
regression equation given on page 229. It is of considerable
interest, however, to note that while the scores on the general
intelligence test and number of study hours are as 1:2, the
actual contribution of these two factors to honor points (allow-
ing for differences in units, variability, etc.) is as 4 : 3. Intel-
ligence, therefore, as we should expect, has more weight than
hours spent in study in determining the hypothetical ability
which we have called "academic success." Much of the
weight which study-hours has is due to its relatively high
negative correlation ( — .35) with intelligence.
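The weights .8 and .6 of the special form can be checked with a short computation. The sketch below assumes that the three correlations are those listed for the same variables in Problem 4 of this chapter (r12 = .60, r13 = .32, r23 = -.35); the function and variable names are illustrative only.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z, i.e. formula (49) for three variables."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz**2) * math.sqrt(1 - r_yz**2))

# Assumed values (taken from Problem 4, not stated in this passage):
# 1 = honor points, 2 = general intelligence, 3 = hours of study per week.
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = partial_r(r12, r13, r23)   # honor points and intelligence, study hours constant
r13_2 = partial_r(r13, r12, r23)   # honor points and study hours, intelligence constant

# Special form: with all sigma's equal, the common radical in each ratio of
# partial sigma's cancels, leaving the weights below.
w2 = r12_3 * math.sqrt(1 - r13**2) / math.sqrt(1 - r23**2)
w3 = r13_2 * math.sqrt(1 - r12**2) / math.sqrt(1 - r23**2)

print(round(w2, 2), round(w3, 2))
```

Under these assumed correlations the two weights stand roughly in the ratio 4 : 3, as the text states.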
In concluding this discussion of partial and multiple correla-
tion, certain limitations to the use of the method should be
pointed out. In the first place, in order that partial coefficients
of correlation be valid, it is necessary that all of the zero order
coefficients be computed from data in which the regression is
linear. Before calculating any partial r's, we should make
sure that all zero order r's have linear regression: if there is
any doubt as to linearity, the tests given on page 209 should
be employed. In the second place, the number of cases must
be large, especially if there are a number of variables, otherwise
partial and multiple coefficients will have little significance.
Coefficients which are misleadingly high may be obtained
when studies which involve many variables are based upon
relatively few cases. When the limitations and conditions
mentioned are fully recognized and met, however, partial and
multiple correlation furnishes us with an exact and powerful
instrument for the analysis of problems which arise in mental
and social measurements.
VI. Spurious Correlation1
The correlation between two sets of test scores is said to be
"spurious" when it is due in whole or part to factors other than
those which determine performance in the tests themselves.
In general, the cause of spurious correlation may be said to lie
in a failure to control conditions; and the most usual effect of
this lack of control is a "boosting" or dilation of the coefficient.
Some of the more general situations which may lead to spurious
correlation are given under the following heads:
1. Spurious Correlation Due to the Heterogeneity of Material
We have already found occasion to show elsewhere (page
221) how a lack of uniformity in age conditions will lead to
1 See also Chap. IV, p. 211.
correlation which is too high, i.e., is spurious. Differences in
age within the group will lead to a distinctly higher correlation
between two tests — when the test scores increase with age —
than the correlation which we should obtain in a single age
(a homogeneous) group. To cite a simple case, in a group
of boys from 10 to 18 years old, a substantial correlation will
appear between strength of grip and length of forearm, quite
apart from any real relation, due solely to the fact that both of
these physical attributes increase with age.
Failure to take account of the age factor is a prolific source
of error in correlational work. In stating the correlation
between two tests, or the reliability coefficient of a test, we
should always be careful to specify the range of ages, grades,
etc., in order to show the heterogeneity of the group. With-
out this information an r per se is practically valueless.
Many other factors besides age may lead to spurious cor-
relation. To cite a familiar example : 1 if alcoholism, degeneracy
and bad heredity are all positively related, the r between alcohol-
ism and degeneracy will be too high (due to the indirect effect
of heredity on both factors) unless the heredity influences are
kept constant. Again, to take another example, suppose that
we have found the scores on a general intelligence examination
and a cancellation test for two distinctly different groups,
e.g., 500 college seniors and 500 day laborers; and that the
average ability in both tests is definitely higher in the college
group. Now if the correlation between these tests is zero in
each group taken separately, when the two groups are combined
a positive correlation will be obtained due simply to the hete-
rogeneity of the composite group.2 Such a correlation is, of
course, spurious.
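The two-group case can be illustrated with a small made-up data set. In the sketch below the scores are hypothetical and chosen so that the correlation is exactly zero within each group; only the difference in group averages produces the correlation in the combined group.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_of(pairs):
    """Correlation for a list of (test 1, test 2) score pairs."""
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)

# Hypothetical scores: the low group has zero correlation between the tests.
low_group = [(0, 0), (0, 2), (2, 0), (2, 2)]
# The high group is the same pattern shifted upward on both tests.
high_group = [(x + 10, y + 10) for x, y in low_group]

r_low = r_of(low_group)
r_high = r_of(high_group)
r_combined = r_of(low_group + high_group)

print(r_low, r_high, round(r_combined, 2))
```

The combined coefficient is high although neither group shows any relation between the tests, which is the sense in which the correlation is spurious.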
To be valid, it is clear that a correlation must be freed of
extraneous influences which affect the homogeneity of the
material. When such influences cannot be determined quan-
1 Kelley, T. L., Tables to Facilitate the Calculation of Partial Coefficients of
Correlation and Regression Equations, Bull. Univ. Texas, 1916, No. 27.
2 Otis, A. S., Statistical Method in Educational Measurement, 1925, pp. 334-
336.
titatively, this is far from an easy task. Provided, however,
the factor or factors producing heterogeneity are measurable,
their influence may usually be allowed for by the method of
partial correlation.
2. Spurious Index Correlation
It can be shown 1 that three variables X1, X2, and X3 may
be totally uncorrelated, and still a correlation between Z1 = X1/X3
and Z2 = X2/X3 may be obtained which is as large as .50. To take a
concrete case, if two individuals observe a series of magnitudes
(e.g., Galton Bar settings) independently, the absolute errors
of observation (Xi and X2) may be uncorrelated, and still a
distinct correlation appear between the errors made by the two
observers when these are expressed as per cents of the magnitude
observed (X3). The spurious element here is, of course, the
common factor, X3, in the denominator of the ratios.
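A rough simulation makes the point concrete. The sketch below is only an illustration under assumed conditions: three independent normal variables with the same mean and spread (so that their coefficients of variation are equal), generated with a fixed seed. None of these values comes from the text.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000                # assumed number of cases

# Three mutually independent variables (assumed mean 100, sigma 10).
x1 = [rng.gauss(100, 10) for _ in range(n)]
x2 = [rng.gauss(100, 10) for _ in range(n)]
x3 = [rng.gauss(100, 10) for _ in range(n)]

# Index variables sharing the common denominator X3.
z1 = [a / c for a, c in zip(x1, x3)]
z2 = [b / c for b, c in zip(x2, x3)]

r_raw = pearson_r(x1, x2)     # should be close to zero
r_index = pearson_r(z1, z2)   # should be close to .50

print(round(r_raw, 2), round(r_index, 2))
```

With unequal coefficients of variation the spurious correlation would differ from .50; the equal-spread case is chosen here only because it matches the figure quoted in the text.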
One of the commonest examples of spurious index correla-
tion in psychology is found in the correlation of IQ's obtained
from two different intelligence tests. If the IQ's of 500 children
ranging in age from 3 to 14 years are calculated from two tests
X1 and X2, the correlation between IQ(X1) and IQ(X2) will be con-
siderably increased because of the presence of the common factor
of chronological age X3 (since IQ = MA/CA) in the two series.
The spurious element here may be eliminated by holding con-
stant the common factor of age through partial correlation.
3. Spurious Correlation of a Single Test With a Composite of
Which it is a Member
If the scores of several tests, X1, X2, X3, etc., are averaged
or added, and the composite scores, Xcom., correlated with the
scores of any single test X1, the correlation resulting will be too
high (spurious) because of the presence of X1 in the composite.
1 Yule, G. U., An Introduction to the Theory of Statistics, pp. 215-216.
The amount or degree of the spurious element is measured by
the ratio t/s, in which t = the number of elements in the single
test, and s = the number of elements in the composite 1 (see page
293). To illustrate: there are 20 items in the Number Series
Completion Test of the Army Alpha, and 212 items in the whole
test. Now if there were no correlation at all between the scores
on Alpha and Completion there would still be a spurious cor-
relation between the two tests equal to the ratio of the number
of items in Completion to the total number of items in Alpha,
i.e., 20/212 or .094. A correlation obtained between Completion
and Alpha, therefore, will be too high, due simply to the inclu-
sion of the Completion items in both sets of data.
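The ratio t/s is simple enough to compute by hand, but a small helper shows the arithmetic. The function name is illustrative; the second call uses the figures of Problem 10 at the end of this chapter.

```python
def spurious_share(items_in_test, items_in_composite):
    """Ratio t/s: items in the single test over items in the whole composite."""
    return items_in_test / items_in_composite

# Number Series Completion (20 items) within Army Alpha (212 items).
alpha_completion = spurious_share(20, 212)

# Problem 10: a 15-item test within a 200-item battery.
problem_10 = spurious_share(15, 200)

print(round(alpha_completion, 3), round(problem_10, 3))
```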
It should be noted that when several tests are all of the
same — or approximately the same — length, the amount of
spurious correlation which will result from correlating any
single test with a composite of them all is approximately con-
stant (t/s is the same). For this reason it is valid to compare the
correlations of the separate tests with the composite in order
to discover which tests are most representative of the capacity
measured by them all (see page 267).
VII. Summary of Formulas in Chapter V
1. Partial r's,
$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{\sqrt{1-r^2_{1n.34\ldots(n-1)}}\,\sqrt{1-r^2_{2n.34\ldots(n-1)}}} \tag{49}$$
2. Partial σ's,
$$\sigma_{1.234\ldots n} = \sigma_1\sqrt{1-r^2_{12}}\,\sqrt{1-r^2_{13.2}}\,\sqrt{1-r^2_{14.23}}\cdots\sqrt{1-r^2_{1n.23\ldots(n-1)}} \tag{50}$$
3. Regression Equation, Deviation Form,
$$x_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n \tag{51}$$
1 Musselman, J. R., Spurious Correlation Applied to Urn Schemata, Journal
of American Statistical Association, Vol. XVIII, Sept., 1923.
4. Regression Equation, Score Form,
$$X_1 = b_{12.34\ldots n}\,X_2 + b_{13.24\ldots n}\,X_3 + \cdots + b_{1n.23\ldots(n-1)}\,X_n + K \tag{52}$$
5. Regression Coefficients,
$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.234\ldots n}}{\sigma_{2.134\ldots n}} \tag{53}$$
6. Standard Error of Estimate,
$$\sigma_{(\text{est. }X_1)} = \sigma_{1.234\ldots n} \tag{54}$$
7. Probable Error of Estimate,
$$PE_{(\text{est. }X_1)} = .6745\,\sigma_{(\text{est. }X_1)} \tag{55}$$
8. Multiple Coefficient of Correlation,
$$R_{1(23\ldots n)} = \sqrt{1 - \frac{\sigma^2_{1.23\ldots n}}{\sigma^2_1}} \tag{56}$$
9. Formula for "Chance" R,
# = ^p. (57)
10. Alternate formula for R,
$$R_{1(234\ldots n)} = \sqrt{1 - \left[(1-r^2_{12})(1-r^2_{13.2})\cdots(1-r^2_{1n.23\ldots(n-1)})\right]} \tag{58}$$
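For the three-variable case, formulas (49), (50), (54), and (58) can be sketched in a few lines. The function names below are illustrative, and the numerical checks use the data of Problems 1, 2, and 6, which follow.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Formula (49) for three variables: r_xy.z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_sigma(sigma1, r12, r13, r23):
    """Formula (50) for three variables: sigma_1.23.
    By formula (54) this is also the standard error of estimate of X1."""
    r13_2 = partial_r(r13, r12, r23)
    return sigma1 * math.sqrt(1 - r12**2) * math.sqrt(1 - r13_2**2)

def multiple_r(r12, r13, r23):
    """Formula (58) for three variables: R_1(23)."""
    r13_2 = partial_r(r13, r12, r23)
    return math.sqrt(1 - (1 - r12**2) * (1 - r13_2**2))

# Problem 1: intelligence and school achievement with age held constant.
p1 = partial_r(0.80, 0.70, 0.60)

# Problem 2: Alpha and Cancellation with Controlled Association held constant.
p2 = partial_r(0.20, 0.70, 0.45)

# Problem 6: standard error of estimate of X1 from X2 and X3 together.
p6 = partial_sigma(5.00, 0.60, 0.50, 0.20)

print(round(p1, 2), round(p2, 2), round(p6, 1))
```

Formula (56) gives the same R as formula (58), since the squared partial σ divided by the squared σ1 is the bracketed product in (58).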
PROBLEMS
1. The r for intelligence and school achievement in a group of children
8 to 14 years old is .80. The r for intelligence and age in the same
group is .70. The r for school achievement and age is .60.
What will be the correlation between intelligence and school
achievement in children of the same age?
2. The correlation between (1) Army Alpha and (2) Cancellation in a
group of 100 freshmen is .20. The correlation between (1) Army
Alpha and (3) Controlled Association in the same group is .70.
The correlation between (2) Cancellation and (3) Controlled
Association is .45. What is the net correlation between Alpha
and Cancellation in this group? Between Alpha and Controlled
Association? How do you interpret your results?
3. Given the following data: 1
X1 = high school grade in mathematics.
X2 = grade in an English interest test.
X3 = grade in a history interest test.
X4 = grade in a mathematics interest test.
σ1 = 4.93    r12 = .20    r23 = .63
σ2 = 3.13    r13 = .15    r24 = .21
σ3 = 6.12    r14 = .24    r34 = .54
σ4 = 4.64
(a) Work out the regression equation of X1 on X2, X3, X4.
(b) What are the relative weights of the three tests, X2, X3, and
X4, in determining the score on X1?
4. The following records were secured from 450 Liberal Arts freshmen
at Syracuse University: 2
1. Honor points   2. Intell.   3. Aver. H. S. Grades   4. Units    5. Hours per week of study
M1 = 18.5         M2 = 100.6   M3 = 79                 M4 = 16.1   M5 = 24
σ1 = 11.2         σ2 = 15.8    σ3 = 7.5                σ4 = 1.5    σ5 = 6
r12 = .60         r23 = .36    r34 = .40               r45 = .25
r13 = .40         r24 = .20    r35 = .11
r14 = .22         r25 = -.35
r15 = .32
(a) Work out a regression equation with (1) honor points as the
dependent variable.
(b) If a student has an intelligence score of 110, a High School
average of 75, offers 15 units for entrance, and studies on the
average 25 hours per week, what is his most probable
number of honor points?
5. Using as much of the data in Example (4) as is necessary, find
how many hours a student must study if he has an intelligence
score of 120, and wants to make 20 honor points? (Hint : work
1 Kelley, T. L., Educational Guidance, Teachers College, Contributions to
Education, 1914, 71, p. 104.
2 May, Mark A., Predicting Academic Success, Journal of Educational
Psychology, 1923, Vol. XIV, 7, 429-440.
out the regression equation of study hours on honor points and
intelligence and substitute the given values in the equation.)
6. Let X1 be a criterion, and X2 and X3 two other tests. Correlations
and σ's are as follows:
r12 = .60    r23 = .20    σ1 = 5.00
r13 = .50                 σ2 = 10.00
                          σ3 = 8.00
How much more accurately can X1 be predicted from X2 and X3
in combination than from either alone?
7. Given a team of two tests, each of which correlates .50 with a
criterion. If the correlation of the two tests is .20,
(a) How much would the addition of another test which correlates
.50 with the criterion and .20 with each of the other tests improve
the predictive value of the team?
(b) How much would the addition of two such tests improve the
predictive value of the team?
8. Two absolutely independent measures B and C completely deter-
mine a third measure A. If B correlates .50 with A, what is
the correlation of C and A?
9. Using the data given in Example (1) above, analyze school achieve-
ment in terms of intelligence and age. What is the relative
importance of the contribution made by these factors?
10. A group test contains 10 tests with a total of 200 items. One of
the tests correlates .60 with the composite scores on the battery.
If this test contains 15 items, how much of the given correlation
is spurious?
Answers
1. r = .67.
2. The r between Alpha and Cancellation is -.18; between Alpha
and Controlled Association, .70.
3. (a) x1 = .37x2 - .11x3 + .28x4.
(b) Grade in mathematics = 6.5 (grade in English interest test)
- 2 (grade in history interest test) + 5 (grade in mathematics
interest test).
4. (a) X1 = .58X2 + .14X3 - 1.03X4 + 1.10X5 - 62.
(b) 24 with a PE(est. X1) of 4 points.
5. 18 hours with a PE(est. X5) of 2.7 hours: 18 ± 2.7.
6. From X2 alone σ(est. X1) = 4.0
From X3 alone σ(est. X1) = 4.3
From X2 and X3 σ(est. X1) = 3.5
7. (a) R increases from .64 to .73.
(b) R increases from .64 to .79.
8. rAC = .866.
9. Intelligence and age contribute in the ratio (approximately) of
10 : 1.
10. .075.
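As a check on Answer 4, the score-form equation of 4(a), read here as X1 = .58X2 + .14X3 - 1.03X4 + 1.10X5 - 62, can be applied to the student described in Problem 4(b). The function and argument names in this sketch are illustrative only.

```python
def predict_honor_points(intelligence, hs_average, units, study_hours):
    """Score-form regression equation of Answer 4(a), coefficients as read there."""
    return (0.58 * intelligence
            + 0.14 * hs_average
            - 1.03 * units
            + 1.10 * study_hours
            - 62)

# Student of Problem 4(b): intelligence 110, H. S. average 75,
# 15 entrance units, 25 hours of study per week.
estimate = predict_honor_points(110, 75, 15, 25)

print(round(estimate))
```

The result agrees with the 24 honor points given in Answer 4(b).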
CHAPTER VI
SOME APPLICATIONS OF STATISTICAL METHOD AND
TECHNIQUE TO TESTS AND TEST RESULTS
To treat properly all of the statistical methods which may
be applied to tests would require not a single chapter but a
volume in itself. The aim of the present chapter, therefore, is
to consider simply those methods — having to do largely with
correlation and reliability — which are deemed essential (1) in
the treatment of ordinary problems involving tests and (2) as a
foundation for more advanced work in methods of treating test
results.
I. The Validity of Test Scores
The validity of any measuring instrument depends on the
fidelity with which it measures whatever it purports to measure.
A yardstick is "valid" when measurements made by it can be
checked by other measuring instruments. And in like manner
a test is valid when the capacity which it measures corresponds
to the same capacity as otherwise objectively measured and
defined.
1. Validity Determined through Correlation with a Criterion
The validity of a test is usually determined by finding the
correlation between the test and some independent criterion.
A criterion is defined as that measure in terms of which the
value of a test is estimated or judged. The criterion of a
general intelligence test, for example, may be school marks, or
ratings for intelligence, or some other test believed to be valid.1
1 Stanford-Binet is often taken as a reliable criterion of general intelligence.
For example, see Herring Revision of Binet-Simon tests.
The criterion for a trade test is actual ability in the trade. A
high correlation between a test and its criterion may be taken
as evidence of validity, provided both the test and the criterion
are reliable. Before accepting criterion-correlations as final,
however, we must know the reliability of our test, and if possi-
ble, we should know also the reliability of our criterion.1
2. Indirect Measures of Validity
When a reliable criterion is not available, indirect methods
must be employed to determine validity. One indirect method
is to combine the scores on a number of tests of the same
general function and to judge as best (most valid for the func-
tion) that test which correlates highest with the average of all.
Thus Whitley 2 found for three discrimination tests, Naming
Colors, Naming Forms, and Naming Objects, the following
correlations : 3
Average of all three tests with:
    Naming Colors    r = .67
    Naming Forms     r = .99
    Naming Objects   r = .96
She concludes that " Naming Forms seems more a typical test
in so far as it measures an ability common to these three tests."
In the absence of an independent measure of the function the
average of several tests of that function may be taken as one
criterion.
A second indirect method of measuring validity is to find
correlations between the given test and other tests, in this way
discovering some of the facts which the test does, and does not,
measure. For example, tests of Controlled Association, e.g.,
Opposites, Logical Relations, etc., correlate much more highly
with tests of general intelligence and "reasoning" than with
tests of Cancellation or Color-Naming. The first group of
tests is, therefore, a better (more valid) measure of the capacity
1 Kelley, T. L., The Reliability of Test Scores, Journal Educational Research,
1921, Vol. 3, 5, p. 370.
2 Tests for Individual Differences, Archives of Psychology, 1911, 19, p. 78.
3 The "spurious" element here is constant provided the tests are all of
practically the same length (see page 261).
measured by the general intelligence and reasoning tests than
the second group. (Indirect measures of this sort are advisable
only in the absence of more direct and valid criteria.)
The absence of valid criteria for many of his tests forces the
careful psychologist to define tests strictly in terms of what
they actually do. Hence the tendency of present-day testers is
to call a test by some descriptive name rather than in terms of
some more or less well-defined "mental function." Accord-
ingly, we have Opposites Tests, and Completion Tests rather
than tests of Association or Reasoning.
II. The Reliability of Test Scores
1. The Reliability of a Test as Measured by Its Self-Correlation
A. The "Reliability Coefficient"
The reliability of a test (or of any measuring instrument)
is determined by the consistency with which it measures the
capacity of those taking it. If a group repeats a test and each
individual in the group scores close to his first record, we regard
the test as reliable. If, however, there are large positive and
negative differences between the scores made by individuals on
the first and second giving of the test over and above the
practice effect 1, and if such differences occur in a large num-
ber of cases, obviously the test is inconsistent and unreliable.
One method of measuring the reliability of a test is to correlate
the scores made on the test by a given group with the scores
made on the same or a duplicate test by the same group. This
is the method of self-correlation; and the r so found is called
the "reliability coefficient."
When the reliability coefficient of a test is 1.00, the test is an
absolutely accurate measure of whatever capacity it tests, and
when the reliability coefficient is .00 the test has just no relia-
bility. The lower the reliability coefficient the less the reliability
or consistency of the test as a measuring instrument.
1 Practice, since it serves to increase all scores proportionally, does not
affect self-correlation. It does, however, introduce a constant error.
How high should self-correlation be in order to indicate a
satisfactory reliability? This is an important question and its
answer depends largely on the nature of the test and the size and
variability of the group for whom the test is intended. Most
makers of general intelligence tests demand a reliability coeffi-
cient of at least .90 between duplicate forms of their tests for
unselected groups of the same chronological age. To be a reli-
able measure of capacity, a mental or physical test should —
generally speaking — have a minimum reliability coefficient of
at least .80. This minimum will vary with the group, however,
as the reliability coefficient is considerably affected by the range
of scores made on the test (see page 271). For this reason, in
giving the reliability coefficient of a test the size and variability
of the group measured should always be stated.
B. Effect on Reliability of Lengthening or Repeating the Test
If the self-correlation of a test is unsatisfactory two courses
are open: (1) we can lengthen the test until the reliability is
greater; or (2) we can repeat the test and its duplicate twice
each, average the two series of scores, and correlate these
averages. If after (2) the reliability coefficient is still too low,
we can repeat the test and its duplicate, three, four, or as many
times as is necessary to secure the desired reliability coefficient.
To do either (1) or (2) empirically would require a consider-
able amount of time and labor; hence it is fortunate that a
good measure of the effect of (1) or (2) may be expeditiously
secured by applying Spearman's (sometimes called Brown's 1)
"prophecy" formula:
$$r_x = \frac{Nr}{1 + (N-1)r} \tag{59}$$
To illustrate the application of this formula, suppose
(a) that the self-correlation of a test is .70 and that we wish to
know what will be the effect of doubling the length of the test
1 Brown, Wm., The Essentials of Mental Measurement, 1911, p. 102.
on its reliability. Substituting r = .70 and N = 2 in the formula,
and solving for rx we have
$$r_x = \frac{2 \times .70}{1 + .70} = .82.$$
Doubling the test's length, therefore, increases the self-correla-
tion from .70 to .82. Instead of doubling the length of the test,
we may give it and its duplicate twice each, average the two
scores made by each individual in the two series, and correlate
these averages. The result will be the same (as far as purely
statistical factors are concerned) as that obtained by doubling
the length of the test.
The "prophecy" formula may be used in another way.
Suppose (b) that the self-correlation of a test or the correlation
of the test and its duplicate is .80. How much will the test
have to be lengthened (or how many times repeated) in order
to insure a reliability coefficient (rx) of .95? Substituting r = .80
and rx = .95 in the formula, and solving for N,
$$.95 = \frac{.8N}{1 + .8N - .8} = \frac{.8N}{.2 + .8N},$$
$$.04N = .19,$$
$$N = 4.75 \text{ or } 5.00 \text{ (in whole numbers)}.$$
The test must be 5 times its present length or repeated (together
with its duplicate) 5 times in order to raise the self-correlation
from .80 to .95.
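Both uses of the prophecy formula can be sketched briefly. The function names are illustrative; the second function is simply formula (59) solved algebraically for N.

```python
def prophecy(r, n):
    """Formula (59): reliability of a test made n times as long."""
    return n * r / (1 + (n - 1) * r)

def required_length(r, target):
    """Formula (59) solved for N: the factor by which the test must be
    lengthened (or the number of repetitions) to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

# Illustration (a): self-correlation .70, test doubled in length.
doubled = prophecy(0.70, 2)

# Illustration (b): self-correlation .80, desired reliability .95.
factor = required_length(0.80, 0.95)

print(round(doubled, 2), round(factor, 2))
```

Since a test cannot be lengthened by a fraction of itself in practice, the 4.75 of illustration (b) is taken as 5.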
When a test is increased in length, e.g., doubled or tripled,
the items or questions added must always be equal in reliability
to the reliability of the original test, if the results from the
prophecy formula are to be valid. Provided this condition is
satisfied, it is evident that if we increased the length of a test
indefinitely we could — theoretically — raise its self-correlation to
any desired figure. This seems scarcely reasonable, however;
and there is evidence to indicate that while the reliability
coefficient increases according to the formula for the first four
or five pooled tests, thereafter it increases "more slowly than
the prediction formula would lead us to expect." 1
C. Coefficient of Reliability from One Application of a Test
If a test has no duplicate and cannot well be repeated, we
may measure the reliability of half of the test and then by
Spearman's formula find the reliability of the whole test. The
procedure is as follows: First, we make up two independent
sets of scores by combining, say, alternate exercises in the test.
For example, one set of scores may be the performance on the
odd exercises, e.g., 1, 3, 5, etc.; the other set the performance
on the even exercises, e.g., 2, 4, 6, etc.; or some other plan may
be used.2 These two sets of scores are now correlated to find
the reliability coefficient of the half test. If the self-correlation
of the half test so found is called rh, substituting N = 2 in
Spearman's formula, we can calculate the reliability of the
whole test by the formula,
$$r_x = \frac{2r_h}{1 + r_h} \tag{60}$$
In using this formula we make the assumption that the halves
of the test as we have made them up are approximately equiva-
lent in difficulty and content.
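The odd-even procedure may be sketched as follows. The item scores are hypothetical (six persons, six exercises scored 1 or 0) and serve only to show the steps; the layout of one list of item scores per person is likewise an assumption.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """Correlate odd-exercise and even-exercise totals, then apply formula (60)."""
    odd = [sum(row[0::2]) for row in item_scores]    # exercises 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]   # exercises 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    r_whole = 2 * r_half / (1 + r_half)
    return r_half, r_whole

# Hypothetical data: one row per person, one column per exercise.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_whole = split_half_reliability(scores)
print(round(r_half, 2), round(r_whole, 2))
```

With so few cases the coefficients mean little in themselves; the example is meant only to show that the whole-test figure is always the stepped-up value of the half-test correlation.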
D. Dependence of the Reliability Coefficient on the Size and
Variability of the Group
The coefficient of reliability obtained from a test and its
duplicate given to the pupils of a single grade cannot be taken
as indicative of the same degree of reliability as the identical
coefficient obtained from a group composed of pupils spread over
several grades. This is due to the fact that the heterogeneity —
1 Holzinger, Karl J., Note on the Use of Spearman's Prophecy Formula for
Reliability, Journal Educational Psychology, 1923, Vol. XIV, 5, pp. 301-305.
2 Ruch, G. M., and Del Manzo, M. C., The Downey Will Temperament
Test; Analysis of its Reliability and Validity, Journal Applied Psychology,
Vol. VII, 1, 1923, p. 65.
the size, and spread — of the two groups is different. Recently
Kelley 1 has devised a formula from which, knowing the relia-
bility coefficient of a test, say, in a group composed of pupils
from a single grade, we can determine what the reliability coeffi-
cient of the same test must be in a group composed of pupils
from several grades in order that the test be equally effective
in both ranges. The formula is
$$\frac{\sigma}{\Sigma} = \frac{\sqrt{1-R}}{\sqrt{1-r}} \tag{61}$$
in which σ and Σ are the σ's of the scores in the small and large
groups, respectively, and r and R are the reliability coefficients
of the test in the small and large groups. To illustrate, suppose
that in a single grade r = .50 and σ = 5.00; and that in a large
group made up of children from grades 3 to 8, inclusive, Σ = 15.
What R (i.e., reliability coefficient) must the test yield in the
large group in order to be as effective here as in the small group?
Substituting for σ, Σ, and r in the formula, R = .94, which
means that a reliability coefficient of .50 in the small group
indicates the same degree of reliability as a reliability coefficient
of .94 in the group in which the range of "talent" is three times
as great.
This formula may be used to determine whether a test is
equally effective in parts of the range (σ) as in the whole range
(Σ); or in one range as in another. It also serves to make clear
the necessity of always giving the size and spread of the group
in stating and interpreting reliability coefficients.2
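Solving formula (61) for R gives R = 1 - (σ/Σ)²(1 - r), which is easily computed. The sketch below uses the figures of the illustration above; the function and argument names are illustrative.

```python
def equivalent_reliability(r_small, sigma_small, sigma_large):
    """Formula (61) solved for R: the reliability coefficient in the wider
    group that indicates the same effectiveness as r in the narrower group."""
    return 1 - (sigma_small / sigma_large) ** 2 * (1 - r_small)

# Single grade: r = .50, sigma = 5.00; grades 3 to 8 together: Sigma = 15.
big_r = equivalent_reliability(0.50, 5.00, 15.0)

print(round(big_r, 2))
```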
2. The Index of Reliability
By an individual's "true" score in a test is meant the
average of a very large number of measurements made of the
given individual on the same or duplicate tests under precisely
1 The Reliability of Test Scores, Journal Educational Research, 1921, Vol.
III, 5, pp. 370-379.
2 Otis, A. S., Statistical Method in Educational Measurement, 1925, pp.
253-254.
the same conditions. It has been shown 1 that the correlation
between a series of obtained scores and their corresponding
"true" scores may be found from the formula
$$r_{\text{obt. true}} = \sqrt{r_{12}} \tag{62}$$
in which r12 is the self-correlation or the reliability coefficient
obtained from duplicate forms of the test. Given the reliability
coefficient, therefore, it is possible to secure the coefficient of
correlation between a set of obtained scores and their correspond-
ing true scores. This coefficient, r(obt. true), is called the "index of
reliability," and is the maximum value which the reliability
coefficient, r12, can take. This will be seen to follow from
the fact that "the highest possible correlation which can be
obtained (except as chance might occasionally lead to higher
spurious correlation) between a test and a second measure is
with that which truly represents what the test actually measures,
— that is, the correlation between the test and the true scores of
individuals in just such tests." 2 Since r12 is usually less than
1.00, r(obt. true) is nearly always greater than r12.
To illustrate the index of reliability, suppose that for a given
group, r12 = .64. Then r(obt. true) = √.64 or .80, and .80 is the
highest self-correlation which can be obtained (except by
chance) with this test in its present form. The index of
reliability is a useful and easily interpreted measure of a test's
reliability, since by simply extracting the square root of an
obtained reliability coefficient we can find the maximum reliability
which the test is capable of yielding. Thus, if $r_{12} = .25$,
so that $r_{\text{obt. true}} = \sqrt{.25}$ or .50, it is obviously a waste of
time to continue using the test without lengthening or otherwise
improving it.
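The two illustrations above can be checked with a few lines of Python. This is only a minimal sketch of formula (62); the function name `index_of_reliability` is chosen here for illustration and does not come from the text.

```python
import math


def index_of_reliability(r12: float) -> float:
    """Formula (62): correlation between obtained and true scores."""
    if not 0.0 <= r12 <= 1.0:
        raise ValueError("the reliability coefficient must lie between 0 and 1")
    return math.sqrt(r12)


# Reliability coefficients used in the illustrations, plus .90 for comparison.
for r12 in (0.25, 0.64, 0.90):
    print(f"r12 = {r12:.2f}  index of reliability = {index_of_reliability(r12):.2f}")
```

Because the square root of a fraction is larger than the fraction itself, the index always exceeds the reliability coefficient unless the latter is 0 or 1.00.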
1 Kelley, T. L., A Simplified Method of Using Scaled Data for Purposes of
Testing. School and Society, 1916, Vol. IV; 34, 71.
2 Kelley, T. L., The Reliability of Test Scores, Journal of Educational
Research, 1921, Vol. III, 5, 327.
3. The Standard Error and Probable Error of Measurement,
$\sigma_{(M)}$ and $PE_{(M)}$
We have seen that the reliability of a test may be measured
in terms of (1) its reliability coefficient, and (2) its index of
reliability. Still another way of measuring the reliability of a
test is to determine how closely a score obtained on the given
test approximates its corresponding true score. (True scores
have been defined on page 272.) An obtained score will usually
differ in some degree from its corresponding true score due
to the presence of two sorts of errors, — constant errors and
variable errors. Constant errors, since their weight is all in
one direction, do not affect self-correlation, and can usually be
ruled out or their influence measured. Variable errors, how-
ever, since they may be either positive or negative, are less
easily eliminated than constant errors, and hence are more
effective in producing departures of obtained scores from cor-
responding true scores.
The measurement of the influence of variable errors, there-
fore, becomes a matter of considerable importance. It may be
done by calculating the standard error of measurement,
written $\sigma_{(M)}$, which may be interpreted as a measure of the
amount of variable error, or as a measure of the probable
divergence of obtained scores from true scores after the elimination
of constant errors. The $\sigma_{(M)}$ is derived directly from
the $\sigma_{(\text{est.})}$ as follows. In the equation
$$\sigma_{(\text{est. }1)} = \sigma_1\sqrt{1 - r_{12}^2}$$
(see formula 32), if $\sigma_1$ is the $\sigma$ of the scores in test 1, and $r_{12}$ is the
correlation between tests 1 and 2, then $\sigma_{(\text{est. }1)}$ measures the
accuracy with which individual scores in test 1 may be estimated
from a knowledge of the corresponding scores in test 2.
Now if the scores on test 2 are taken to represent true scores,
and the scores on test 1, obtained scores on the same test, the
equation may be written
$$\sigma_{(\text{est. obt.})} = \sigma_{\text{obt.}}\sqrt{1 - r_{\text{obt. true}}^2}.$$
But $r_{\text{obt. true}} = \sqrt{r_{12}}$, and $r_{\text{obt. true}}^2 = r_{12}$, the reliability coefficient.
Hence, substituting these values in the above equation, we
have
$$\sigma_{(\text{est. obt.})} = \sigma_1\sqrt{1 - r_{12}},$$
or, writing $\sigma_{(M)}$ for $\sigma_{(\text{est. obt.})}$, finally,
$$\sigma_{(M)} = \sigma_1\sqrt{1 - r_{12}}. \qquad (63)$$
Formula (63) gives the standard error of measurement for
a set of obtained scores. Given $r_{12}$, the reliability coefficient
of the test, and $\sigma_1$ (the $\sigma$ of the test scores) we can, from formula
(63), measure the probable divergence of an obtained score
from its corresponding true score.
Instead of $\sigma_{(M)}$ we may find $PE_{(M)}$, which is probably
more often used, by the formula
$$PE_{(M)} = .6745\,\sigma_1\sqrt{1 - r_{12}}. \qquad (64)$$
To illustrate the use of these formulas, suppose that in a
group of 100 college men, we obtain an average Army Alpha
score of 150 with a $\sigma$ of 15.00 points; and that the self-correlation
of Alpha (found by correlating two forms) is .90. What
are the $\sigma_{(M)}$ and $PE_{(M)}$? Applying formula (63), we have
$$\sigma_{(M)} = 15\sqrt{1 - .90} = 4.74$$
and from (64),
$$PE_{(M)} = .6745 \times 15\sqrt{1 - .90} = 3.20.$$
From the $PE_{(M)}$, we may interpret this result to mean that the
chances are even that the true score of any individual in the
group of 100 falls within the range, obtained score±3.20.
For a given obtained score of 175, the chances are even that
the true score of this particular man lies within the limits
178.20 and 171.80. Expressed in another way, we may say
that 50% of the obtained scores are in error (as compared
with their true scores) by not more than ±3.20 points.
In the formulas for $\sigma_{(M)}$ and $PE_{(M)}$, the $\sigma$'s of the test
and its duplicate are assumed to be equal. If this is not at
least approximately true we must write these formulas as
follows:
$$\sigma_{(M)} = \frac{\sigma_1 + \sigma_2}{2}\sqrt{1 - r_{12}}, \qquad (65)$$
and
$$PE_{(M)} = .6745 \times \frac{\sigma_1 + \sigma_2}{2}\sqrt{1 - r_{12}}. \qquad (66)$$
In the illustration above, if the $\sigma$ obtained from the first
form of Alpha and the $\sigma$ obtained from the second form of
Alpha had been 15 and 20, respectively, $\sigma_{(M)}$ and $PE_{(M)}$
would be written
$$\sigma_{(M)} = \frac{15 + 20}{2}\sqrt{1 - .90} = 5.53$$
and
$$PE_{(M)} = .6745 \times 5.53 = 3.73.$$
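The Army Alpha illustrations can be reproduced with a short Python sketch of formulas (63) to (66). The function names `sem` and `pe_m` and the optional `sigma_2` argument are conveniences assumed here, not notation from the text.

```python
import math

PE_FACTOR = 0.6745  # a probable error is .6745 times the standard error


def sem(r12, sigma_1, sigma_2=None):
    """Standard error of measurement.

    With one sigma this is formula (63); when the sigma of the duplicate
    form is also given, the two are averaged as in formula (65).
    """
    sigma = sigma_1 if sigma_2 is None else (sigma_1 + sigma_2) / 2
    return sigma * math.sqrt(1 - r12)


def pe_m(r12, sigma_1, sigma_2=None):
    """Probable error of measurement, formulas (64) and (66)."""
    return PE_FACTOR * sem(r12, sigma_1, sigma_2)


# Army Alpha illustration: self-correlation .90, sigma 15 (and 20 for form 2).
print(round(sem(0.90, 15), 2), round(pe_m(0.90, 15), 2))
print(round(sem(0.90, 15, 20), 2), round(pe_m(0.90, 15, 20), 2))
```

Under these assumptions an obtained score of 175 has even chances of lying within about 3.20 points of its true score when the two forms have equal spread.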
The student must be careful not to confuse the formulas for
$\sigma_{(\text{est.})}$ and $PE_{(\text{est.})}$ with those for $\sigma_{(M)}$ and $PE_{(M)}$. The
"estimate" formulas enable us to say with what degree of
accuracy we can predict an individual's score on one test,
knowing his score on a second (and usually a different) test.
The actual prediction of the "most probable score" is made,
of course, by means of the regression equation connecting the
two tests. The $\sigma_{(M)}$ and $PE_{(M)}$ formulas, on the other hand,
enable us to determine the probable divergence of an individual's
obtained score from his corresponding true score, when we
know (1) the $\sigma$ and (2) the reliability coefficient of the test.
When tests are scored in different units, the $\sigma_{(M)}$ of the
one cannot be directly compared with the $\sigma_{(M)}$ of the other.
We cannot compare directly, for example, the reliability of a
score made on a tapping test (score in number of taps made in
30 sec.) with the reliability of a score on a logical memory test
(scored in number of items remembered). A simple method of
overcoming this difficulty is to use a ratio similar to the coefficient
of variation, V, described in Chapter I. Thus the ratio
$\sigma_{(M)}/\text{Av.}$ or $PE_{(M)}/\text{Av.}$ of the one test may be compared directly with
the $\sigma_{(M)}/\text{Av.}$ or $PE_{(M)}/\text{Av.}$ of the other. In this way, the reliability of
obtained scores on one test may be compared with the reliability
of the obtained scores on another.
III. Combining the Scores from Different Tests
When a number of different tests have been given to
the same individual, it is often desirable to be able to combine
the separate test scores into a composite score in order to
express the individual's standing in the tests as a whole. The
simplest procedure is, of course, to average the scores as they
stand. In merely averaging results, however, two difficulties
arise. The first is the difference in the size and kind of units
employed in the tests. Many tests are given by the Amount-
Limit Method — the work is completed (or as much as possible
done) and the individual's performance is scored in terms of
the time required. Many other tests are given by the Time-
Limit Method — the time is fixed, and the subject's score is
the number of items completed or the number of questions
answered in the time allowed. It is obvious that scores ob-
tained from tests given by these two methods cannot be com-
bined directly.
A second difficulty is the question of the relative influence
or "weight" to be given the different tests in the composite
score. Simply to average the "raw" (obtained) scores gives
us no control over the relative importance of the various tests
in the final total score. For although it is often assumed that
by simply averaging results we avoid the troublesome question
of weighting, what we actually do in such cases is to weight quite
drastically without knowing what the weights are. With these
two difficulties in mind, let us examine several methods which
have been proposed for combining separate test scores into a
composite score.
1. Combining Test Scores by Percentiles
If the distribution of each of the separate tests which we
have given is broken up into percentiles, it becomes an easy
matter to combine the separate percentile rankings in the vari-
ous tests, and thus secure a final percentile ranking for each
individual. The method of calculating percentiles has already
been considered (page 45). It is only necessary, then, to show
how percentile rankings may be combined.
TABLE XXIX
Percentile Distributions for 9-Year-Olds on Three Tests. Method
of Combining the Percentile Ratings of a Single Individual

                                        Percentiles                             S's     S's Perc.
Tests                  0   10   20   30   40   50   60   70   80   90  100     Score   Rank
Picture Completion    62  240  297  325  372  407  440  450  499  577  646      445     65
Substitution         219  190  173  158  152  141  133  126  121  109   80      126     70
Seguin Form Board     34   24   21   20   18   18   17   16   15   15   13       17     60
Median percentile                                                                       65
Table XXIX gives the percentile tables for 9-year-olds on
three tests of the Pintner-Patterson series of performance tests.
The subject, a 9-year-old boy, made a score of 445 on Picture
Completion which gave him a percentile ranking of 65 (midway
between 60 and 70) on this test. On Substitution, a score of 126
gave him a percentile ranking of 70; and on the Seguin Form
Board a score of 17 gave him a percentile ranking of 60. The
median of these three percentile rankings is 65, which indicates
that the subject is somewhat above the average for his age. If
the subject had been, say, 10 or 11 years old, percentile tables
for these age distributions would have been used. As is evident
from Table XXIX the method of combining percentile rankings
is simple and straightforward; it rules out the question of
different units in the tests combined, and gives each test equal
weight in the final score.
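As a rough sketch of the percentile method, the following Python reads a score off each row of Table XXIX by straight-line interpolation between adjacent percentile points and then takes the median rank. The interpolation rule and the helper name `percentile_rank` are assumptions made for illustration; the text itself only states that 445 falls midway between the 60th and 70th percentile scores.

```python
from statistics import median

PERCENTILES = list(range(0, 101, 10))  # 0, 10, ..., 100

# Score at each percentile point, copied from Table XXIX.
# Substitution and the Form Board are time scores, so they run downward.
NORMS = {
    "Picture Completion": [62, 240, 297, 325, 372, 407, 440, 450, 499, 577, 646],
    "Substitution": [219, 190, 173, 158, 152, 141, 133, 126, 121, 109, 80],
    "Seguin Form Board": [34, 24, 21, 20, 18, 18, 17, 16, 15, 15, 13],
}


def percentile_rank(score, cutoffs):
    """Interpolate a percentile rank from a row of percentile scores."""
    for i in range(len(cutoffs) - 1):
        a, b = cutoffs[i], cutoffs[i + 1]
        if a == b:  # flat stretch of the table: no interpolation possible
            if score == a:
                return float(PERCENTILES[i])
            continue
        if min(a, b) <= score <= max(a, b):
            fraction = (score - a) / (b - a)
            return PERCENTILES[i] + fraction * (PERCENTILES[i + 1] - PERCENTILES[i])
    raise ValueError("score lies outside the percentile table")


subject_scores = {"Picture Completion": 445, "Substitution": 126, "Seguin Form Board": 17}
ranks = {test: percentile_rank(s, NORMS[test]) for test, s in subject_scores.items()}
composite = median(ranks.values())
print(ranks, composite)
```

Working in percentile ranks removes the units of the separate tests, which is why a score counted in items and a score counted in seconds can be combined in this way.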
2. Combining Test Scores by the Method of Median Mental
Age
When the subjects are children, and age-norms exist for the
tests administered, it is a relatively easy matter to determine
the MA of the subject in each test, and then find the median
of these MA's. The median MA is the "composite score."
Tables giving the MA equivalents in scores for various
tests have been published by many authors 1 and need not be
reproduced here. The method of finding a median mental age
for several tests is often very useful and its results are easily
interpreted. The method does not, however, apply to normal
adults.
3. Combining Tests Which Have Been Weighted According to
the Variability of the Test Scores
When several tests have been given, all by the Time-Limit
or all by the Amount-Limit Method, scores may be combined
directly, the weight which each test score shall have in the
composite score being determined in accordance with the varia-
bility of the test scores. An illustration will make the method
clear. Suppose that in a given test in which the Average = 25
and $\sigma = 5$, subject A scores 20; and in another test in which the
Average = 150 and $\sigma = 15$, A scores 160. Now if we simply add
A's two scores, i.e., 20 + 160 to get 180, the score in the second
test is given three times as much importance in this composite
as the score in the first, since the spread, i.e., the $\sigma$, is three times
greater in the second test. In order to give the two tests equal
weight, we must equalize their spread or variability, and this
can be done by multiplying the $\sigma$ of the first test by 3 or dividing
the $\sigma$ of the second by 3. This same procedure must then be
applied to the scores. By the first operation, our composite
score becomes 20 × 3 + 160 or 220; by the second operation, the
1 For example, see Whipple, Manual of Mental and Physical Tests, Vols.
I and II, 1914; Pintner and Patterson, A Scale of Performance Tests, 1921;
Pyle, W. H., The Examination of School Children, 1913.
composite score becomes $20 + \frac{160}{3}$ or 73.33. In either composite
both tests will now have equal weight.
TABLE XXX
How to Combine Scores Weighted According to Variability
Data from 200 College Women. (From Carothers, F. E., Psychological Examination
of College Students, Archives of Psychology, 1921, pp. 30-34.)

                            Log. Memory   Log. Memory     Completion   Information   Vocabulary
Tests                       (recall)      (recognition)
                            1             2               3            4             5
Average                     6.50          37.47           35.78        104.71        73.90
$\sigma$                    1.76          7.69            4.36         26.79         7.60
Multiplier to give all
  tests equal weight        5             1               2            1/3           1
New $\sigma$                8.80          7.69            8.72         8.93          7.60
A's score                   5             35              30           100           75           Total
A's weighted score
  (all tests equal)         25            35              60           34            75           = 229
A's weighted score:
  Tests 1 and 3
  weighted 2, others 1      50            35              120          34            75           = 314
In order to illustrate this method of combining scores in more
detail, the average and the $\sigma$ for each of five tests are given in
Table XXX together with the scores of subject A on each test.
If A's scores are added as they stand, test 4 (Information) will
be given 15 times the weight of test 1 (Logical Memory, recall)
in the composite, since the $\sigma$ for Information is 15 times the $\sigma$
for Logical Memory, recall. Likewise, Information will have
approximately 6 times the weight of Completion and approximately
3 times the weight of Logical Memory, recognition, and
Vocabulary. It seems hardly probable that Information is as
much superior in value as this to the other tests (in fact, it is
possibly one of the least important) and hence a new weighting
is clearly necessary. The simplest plan at the start will be to
weight all of the tests equally as shown in the table. If we
multiply the $\sigma$ of test 1 by 5, the $\sigma$ of test 2 by 1, the $\sigma$ of test 3
by 2, the $\sigma$ of test 4 by 1/3, and the $\sigma$ of test 5 by 1, we make all of
the $\sigma$'s approximately equal. Now if we multiply A's scores by
these same "multipliers," the new test scores will all have the
same weight in the final composite. In determining multipliers,
the best plan is to keep them whole numbers, if practicable, and
as small as possible. In Table XXX, for example, the $\sigma$'s of
tests 2 and 5 have been taken as standards because this gives
the simplest multipliers for the other tests.
Suppose now that we had wished to give Logical Memory,
recall, and Completion twice as much weight as the other tests
in the composite. To accomplish this we should simply have
multiplied the $\sigma$'s of tests 1 and 3 by 10 and 4 instead of 5 and
2, i.e., we should have multiplied by enough to make their new
$\sigma$'s twice as large as the $\sigma$'s of the other tests. Of course, when
all of the tests have already been weighted 1, we need only
double the scores on tests 1 and 3.
To summarize the steps in the method:
(a) Find the average and the $\sigma$ or Q of each test.
(b) If the tests are to have equal weight, multiply the
$\sigma$ or Q of each test by factors selected so as to make all of the
new $\sigma$'s or Q's equal. If some tests are to count more heavily
than others, make their $\sigma$'s or Q's proportionally larger.
(c) Multiply each S's score by the "multiplier" decided
upon in (b), and add these new scores. Leave the result as
a composite total, or average the new scores if there is some
reason for working with smaller numbers.
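Steps (a) to (c) can be sketched in Python with the figures of Table XXX. The helper `weighted_composite` and its `extra_weights` argument are names assumed here for illustration. The sketch keeps the multiplier 1/3 exact, so the Information entry comes out as 33.33 rather than the rounded 34 of the table, and the totals are correspondingly about 228.3 and 313.3 instead of 229 and 314.

```python
# Sigmas, multipliers and A's raw scores for the five tests of Table XXX.
SIGMAS = [1.76, 7.69, 4.36, 26.79, 7.60]
MULTIPLIERS = [5, 1, 2, 1 / 3, 1]
A_SCORES = [5, 35, 30, 100, 75]


def weighted_composite(scores, multipliers, extra_weights=None):
    """Step (c): multiply each score by its multiplier and add.

    extra_weights lets some tests count more heavily than others,
    e.g. 2 for a test that should have double weight.
    """
    if extra_weights is None:
        extra_weights = [1] * len(scores)
    return sum(s * m * w for s, m, w in zip(scores, multipliers, extra_weights))


# Step (b): the multipliers bring all five sigmas to roughly the same size.
new_sigmas = [round(sigma * m, 2) for sigma, m in zip(SIGMAS, MULTIPLIERS)]

equal_weight_total = weighted_composite(A_SCORES, MULTIPLIERS)
double_1_and_3 = weighted_composite(A_SCORES, MULTIPLIERS, [2, 1, 2, 1, 1])
print(new_sigmas, round(equal_weight_total, 1), round(double_1_and_3, 1))
```

Choosing whole-number multipliers, as the text recommends, keeps the arithmetic simple at the cost of leaving the new sigmas only approximately equal.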
4. Combining Test Scores by Converting the Scores of Different
Tests into Comparable Series
As mentioned above, the chief difficulties in combining the
scores of different tests arise from differences in the units in
which the tests are scored as well as differences in variability
among the tests themselves. We have already considered three
ways of avoiding these difficulties. Still another method is to
convert the scores of the different tests into comparable
distributions, after which the test scores may be combined
directly.
Two methods of combining tests in this way have been
proposed, both of which assume that the distributions of test
scores are normal or approximately normal. The more recent,
suggested by Professor Clark Hull,1 is to convert the scores
from each test into a "standard" normal distribution in which
the scores shall range from 0 to 100 with a mean at 50 and a $\sigma$
of 14. [Individual scores rarely spread more than $\pm 3.5\sigma$
above or below the average; hence, since $\frac{50}{3.5}$ is approximately 14, the $\sigma$ of
this distribution may be taken as 14.00.] Conversion of the
scores of a given test is readily made by the following scheme:
Let $M$ = average of the given test.
Let $\sigma$ = $\sigma$ of the given test.
Let $X_1$ = individual's score on the given test.
Let 50 = average of the converted series.
Let 14 = $\sigma$ of the converted series.
Let $X$ = individual's score in the converted series.
Now if $S = \frac{14}{\sigma}$ and $K = 50 - MS$; then $X = K + SX_1$.
To illustrate, suppose that in a given test the average is
16.00, the $\sigma$ is 3.5, and that subject A scores 18 on the test.
What is A's converted score?
$S = \frac{14}{3.5}$ or 4.00, and $K = 50 - 16 \times 4$ or $-14.00$.
Substituting in $X = K + SX_1$, $X = -14 + 4 \times 18 = 58$.
A's score, therefore, in a distribution of Average = 50 and $\sigma = 14$
is 58. In other words (assuming a normal distribution), 58 is
as far above the average of the distribution whose average is
50, as 18 is above the average of the distribution whose average
is 16.00.
An illustration will serve to demonstrate how scores may
be combined by this method (Table XXXI).
1 The Conversion of Test Scores into Series which shall have any Assigned
Mean and Degree of Dispersion, Journal of Applied Psychology, 1922, 6, p. 299.
TABLE XXXI

                         Test 1           Test 2
                         Word Building    Digit Span    Composite (average)
Average                  16.30            7.4
$\sigma$                 4.90             1.3
A's score                18.00            8.0
A's converted score      54.86            56.48         55.67
Taking test 1, Word-Building, first, from the formula above,
$S = \frac{14}{4.9}$ or 2.86; and $K = 50 - 16.30 \times 2.86$ or 3.38. Hence,
$X = 3.38 + 2.86X_1$, and substituting A's score of 18 for $X_1$ we
have $X = 54.86$. In like manner, in test 2, Digit Span, $S = \frac{14}{1.3}$
or 10.8; and $K = 50 - 7.4 \times 10.8$ or $-29.92$. Accordingly,
$X = -29.92 + 10.8 \times 8$ (substituting A's score in Digit Span)
or 56.48. Averaging A's scores in Word-Building and Digit
Span, we have 55.67 as the composite score, which means that
A is slightly above average (50) in the two tests.
Since we have computed both K and S for each of the tests,
all of the scores on Word-Building may be quickly converted
into "new" scores by means of the formula $X = 3.38 + 2.86X_1$;
and all of the scores on Digit Span converted into "new"
scores by means of the formula $X = -29.92 + 10.8X_1$. In
each case the $X_1$ represents the actual score on the test.
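The conversion can be sketched in Python as follows. The function names `hull_constants` and `convert` are assumed for illustration only. Because the sketch carries S to full precision instead of rounding it to 2.86 or 10.8, the Digit Span result is about 56.46 rather than the 56.48 obtained by hand, and the composite about 55.66.

```python
def hull_constants(mean, sigma, new_mean=50.0, new_sigma=14.0):
    """Return (K, S) such that X = K + S * X1 in the converted series."""
    s = new_sigma / sigma
    k = new_mean - mean * s
    return k, s


def convert(score, mean, sigma):
    """Convert a raw score into the series with average 50 and sigma 14."""
    k, s = hull_constants(mean, sigma)
    return k + s * score


# First illustration: average 16.00, sigma 3.5, A's score 18.
print(convert(18, 16.00, 3.5))

# Table XXXI: Word Building and Digit Span, then the composite.
word_building = convert(18.0, 16.30, 4.90)
digit_span = convert(8.0, 7.4, 1.3)
composite = (word_building + digit_span) / 2
print(round(word_building, 2), round(digit_span, 2), round(composite, 2))
```

Computing K and S once per test, as the text suggests, means every remaining score on that test needs only one multiplication and one addition.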
An earlier method of combining test scores, based on the
same principles as the above plan, was outlined in 1912 by
Professor Woodworth.1 Woodworth's plan was to find the
difference between a given individual's score on a test and the
average score, i.e., $X - \text{Av}_X$; divide this plus or minus difference
($\pm x$) by the $\sigma$ of the test and call the result, $\frac{x}{\sigma}$, the
"reduced score." 2 Reduced scores found in this way for the
1 Combining the Results of Several Tests, A Study in Statistical Method,
Psychological Review, 1912, Vol. XIX, pp. 97-123.
2 Note that in Woodworth's method the average is taken at 0 and $\sigma$ as 1.00.
same individual on several tests may be combined by simply
averaging them — the weight of each test in the composite will
be 1.00. To illustrate the method using the data of Table
XXXI, A's score of 18 on the Word-Building test is 1.70 above
the average, i.e., above 16.30; and dividing this deviation by
the $\sigma$ of the series gives A a "reduced score" (a score expressed
in $\sigma$ units) of .347. On the Digit Span test, A's score
of 8.00 is .6 above the average of the distribution, i.e., above
7.4; and dividing .6 by 1.3 we get a reduced score on Digit
Span of .462. If we average these two reduced scores, A is
found to stand .405 (in $\sigma$ units) above the average of the group in
the two tests. (Remember that this method, like the preceding
one, assumes that the distributions of test scores are approxi-
mately normal.)
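A brief Python sketch of Woodworth's reduced scores, using the data of Table XXXI, is given below. The function name `reduced_score` is assumed for illustration. Carried to full precision the average of the two reduced scores is about .404; averaging the rounded values .347 and .462 gives the .405 of the text.

```python
def reduced_score(score, mean, sigma):
    """Deviation from the average expressed in sigma units."""
    return (score - mean) / sigma


word_building = reduced_score(18.0, 16.30, 4.90)
digit_span = reduced_score(8.0, 7.4, 1.3)
composite = (word_building + digit_span) / 2

print(round(word_building, 3), round(digit_span, 3), round(composite, 3))
```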
Of these two methods, the first is somewhat the simpler
inasmuch as it involves only plus values (all transmuted scores
lie between 0 and 100), while the second method introduces
plus and minus values which are nearly always fractions, often
small in size and inconvenient to handle. Again, a composite
score of 55.67 by Hull's method is probably more intelligible to
the average student accustomed to think in per cents, than an
average score of .405 found by Woodworth's plan. The latter
result is meaningful only to those who have had considerable
statistical training.
Woodworth's method has one particular advantage, how-
ever, which should be mentioned, viz., that when reduced scores
have once been calculated for two or more tests, correlations
between the tests may easily be found. The method of obtain-
ing such correlations is illustrated in Table XXXII which gives
the reduced scores made by 10 adults on a Memory Span and
Information test, and the correlation between the two series.
As shown in the table the calculations are relatively simple.
Since each individual's reduced score on Memory Span (X) is
simply his x (i.e., his deviation from the average) divided by
$\sigma_x$, and his reduced score on Information (Y) is, again, his y
(i.e., deviation from the average) divided by $\sigma_y$, the sum of the
products (i.e., $\frac{x}{\sigma_x} \cdot \frac{y}{\sigma_y}$) of the reduced scores of all of the ten
individuals will give $\frac{\Sigma xy}{\sigma_x\sigma_y}$. We know from formula (24) that
$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$ (page 168). Hence, the correlation between the
two tests is obtained simply by dividing $\frac{\Sigma xy}{\sigma_x\sigma_y}$ (7.31) by N (10):
that is, r equals .731.
TABLE XXXII
To Illustrate the Method of Finding Correlation from
"Reduced Scores"

               Memory      Information   Reduced               Reduced               Product of
Individuals    Span (X)    (Y)           Score in X            Score in Y            Reduced Scores
               Score       Score         ($x/\sigma_x$)        ($y/\sigma_y$)        ($xy/\sigma_x\sigma_y$)
A               5           90           -1.19                   .00                   .000
B               9           60             .39                 -1.45                  -.566
C               8           90             .00                   .00                   .000
D               7           85            -.39                  -.24                   .094
E               6           70            -.79                  -.97                   .766
F              10          100             .79                   .49                   .387
G              12          130            1.58                  1.94                  3.065
H               6           80            -.79                  -.49                   .387
I               5           75           -1.19                  -.73                   .869
J              12          120            1.58                  1.46                  2.307
                                                     $\Sigma xy/\sigma_x\sigma_y$  =  7.309

$\text{Av}_x$ = 8.0        $\text{Av}_y$ = 90.00
$\sigma_x$ = 2.53          $\sigma_y$ = 20.62

$r = \frac{\Sigma xy}{N\sigma_x\sigma_y} = \frac{7.31}{10} = .731$

Note: This table is intended simply to illustrate the method. A product-moment
r would not ordinarily be found for 10 cases.
The student should bear in mind when using either of these
methods that neither is strictly applicable when the distributions
are considerably skewed. As stated above, both assume that
the distributions to which they are applied are normal or
approximately normal.
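The correlation of Table XXXII can be recomputed from the raw scores with the Python sketch below, which forms the reduced scores and averages their products. The helper names are assumed for illustration, and the $\sigma$'s are found by dividing by N, as in the text. Without intermediate rounding the result is about .729; the .731 of the table arises from rounding each reduced score to two decimals before multiplying.

```python
import math

memory_span = [5, 9, 8, 7, 6, 10, 12, 6, 5, 12]             # X scores, A to J
information = [90, 60, 90, 85, 70, 100, 130, 80, 75, 120]   # Y scores, A to J


def average(values):
    return sum(values) / len(values)


def sigma(values):
    """Standard deviation with N (not N - 1) in the denominator."""
    m = average(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def reduced_scores(values):
    m, s = average(values), sigma(values)
    return [(v - m) / s for v in values]


def r_from_reduced(xs, ys):
    """Average product of paired reduced scores, which equals r."""
    products = [zx * zy for zx, zy in zip(reduced_scores(xs), reduced_scores(ys))]
    return sum(products) / len(products)


r = r_from_reduced(memory_span, information)
print(round(sigma(memory_span), 2), round(sigma(information), 2), round(r, 3))
```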
IV. The $\sigma$ of the Sum or Difference of Corresponding
Values of Two Series of Test Scores
If we know the correlation between two series of test scores
$X_1$ and $X_2$ and the $\sigma$'s of the two series, it is possible to compute,
in a simple way, the $\sigma$ of the new composite series obtained by
adding or subtracting the corresponding scores in the two original
series. When the scores of the "new" distribution have been
found by adding corresponding scores, the formula for $\sigma_s$ 1 is
$$\sigma_s = \sqrt{\sigma_{X_1}^2 + \sigma_{X_2}^2 + 2r\sigma_{X_1}\sigma_{X_2}}, \qquad (67)$$
in which $\sigma_s$ denotes the $\sigma$ of the "new" summed-series, $\sigma_{X_1}$ is the
$\sigma$ of the $X_1$ scores, $\sigma_{X_2}$ is the $\sigma$ of the $X_2$ scores, and r is the
coefficient of correlation between $X_1$ and $X_2$. When the scores
in the new distribution have been obtained by subtracting corresponding
scores in the two tests, formula (67) becomes
$$\sigma_d = \sqrt{\sigma_{X_1}^2 + \sigma_{X_2}^2 - 2r\sigma_{X_1}\sigma_{X_2}}, \qquad (68)$$
in which $\sigma_d$ is the $\sigma$ of the new difference-series.
A problem will illustrate the use of these formulas. Let
$X_1$ denote a Verb-Object Test and $X_2$ an Opposites Test. Then
given $\sigma_{X_1} = 11.18$, $\sigma_{X_2} = 9.00$, and $r_{X_1X_2} = .60$, what is the $\sigma$ of
the new series obtained (1) by adding the corresponding $X_1$ and
$X_2$ scores, and (2) by subtracting the corresponding $X_1$ and $X_2$
scores? Substituting in formula (67), we have
$$\sigma_s = \sqrt{(11.18)^2 + (9.00)^2 + 2 \times .60 \times 11.18 \times 9.00},$$
or
$$\sigma_s = 18.08.$$
Thus, 18.08 is the $\sigma$ of the $(X_1 + X_2)$ series. To find the $\sigma$ of
the $(X_1 - X_2)$ series, $\sigma_d$, we substitute in formula (68),
$$\sigma_d = \sqrt{(11.18)^2 + (9.00)^2 - 2 \times .60 \times 11.18 \times 9.00},$$
or
$$\sigma_d = 9.23.$$
1 For a simple mathematical proof of this formula, see Yule, An Introduction
to the Theory of Statistics, pp. 210-211.
Formula (68) is often useful when a test has been repeated
in a group under changed conditions and the variability of these
changes, i.e., the $\sigma$ of the differences between scores made on the
second and the first giving of the test, is sought. Except that
there is only the one test concerned, the method is identical
with that of the problem above. The chief objection to the
formula is that the r between the scores on the first and second
giving of the test must be known. For this reason, unless the
r is wanted for other purposes, it is usually easier to subtract
the corresponding scores and derive the $\sigma$ of their differences
directly.
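Formulas (67) and (68) can be sketched in Python as below, first with the Verb-Object and Opposites figures and then with a small set of paired scores to show that the formula for the difference series agrees with the $\sigma$ found by subtracting the scores directly. The paired scores and all helper names are hypothetical, invented only for this check.

```python
import math


def sigma_sum(s1, s2, r):
    """Formula (67): sigma of the series of summed scores."""
    return math.sqrt(s1 ** 2 + s2 ** 2 + 2 * r * s1 * s2)


def sigma_diff(s1, s2, r):
    """Formula (68): sigma of the series of differences."""
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)


def pop_sd(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pop_sd(xs) * pop_sd(ys))


# Worked problem: sigmas 11.18 and 9.00, r = .60.
print(round(sigma_sum(11.18, 9.00, 0.60), 2), round(sigma_diff(11.18, 9.00, 0.60), 2))

# Hypothetical first and second givings of one test to five subjects.
first = [10, 12, 9, 15, 14]
second = [12, 15, 9, 16, 18]
by_formula = sigma_diff(pop_sd(first), pop_sd(second), pearson_r(first, second))
directly = pop_sd([b - a for a, b in zip(first, second)])
print(round(by_formula, 4), round(directly, 4))
```

The agreement of the last two figures illustrates why subtracting the scores directly is the easier route when r is not wanted for any other purpose.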
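From the formula for the reliability of the average, $\sigma_{\text{av.}} = \frac{\sigma_{(\text{dis.})}}{\sqrt{N}}$
(formula 13), we know that $\sigma_{(\text{dis.})} = \sqrt{N}\,\sigma_{\text{av.}}$. We may, therefore,
write $\sqrt{N}\,\sigma_{\text{av.}x_1}$ instead of $\sigma_{X_1}$; $\sqrt{N}\,\sigma_{\text{av.}x_2}$ instead of $\sigma_{X_2}$; $\sqrt{N}\,\sigma_{\text{av.}s}$
instead of $\sigma_s$; and $\sqrt{N}\,\sigma_{\text{av.}d}$ instead of $\sigma_d$. Making these substitutions
in formulas (67) and (68) we have (the N's cancel),
that
$$\sigma_{\text{av.}s} = \sqrt{\sigma_{\text{av.}x_1}^2 + \sigma_{\text{av.}x_2}^2 + 2r\sigma_{\text{av.}x_1}\sigma_{\text{av.}x_2}}, \qquad (69a)$$
and
$$\sigma_{\text{av.}d} = \sqrt{\sigma_{\text{av.}x_1}^2 + \sigma_{\text{av.}x_2}^2 - 2r\sigma_{\text{av.}x_1}\sigma_{\text{av.}x_2}}, \qquad (69b)$$
in which $\sigma_{\text{av.}s}$ is the $\sigma$ of the average of the $(X_1 + X_2)$ series of
scores, and $\sigma_{\text{av.}d}$ is the $\sigma$ of the average of the $(X_1 - X_2)$ series of
scores.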
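Formulas (69a) and (69b) must always be used whenever
there is any correlation between the $X_1$ and $X_2$ scores. If $X_1$
and $X_2$ are uncorrelated, that is, if r = .00, the third term under
the radical disappears and (69a) and (69b) become
$$\sigma_{\text{av.}s} = \sqrt{\sigma_{\text{av.}x_1}^2 + \sigma_{\text{av.}x_2}^2}, \qquad (70a)$$
and
$$\sigma_{\text{av.}d} = \sqrt{\sigma_{\text{av.}x_1}^2 + \sigma_{\text{av.}x_2}^2}. \qquad (70b)$$
Now if we write $\sigma_{(\text{diff.})}$ instead of $\sigma_{\text{av.}d}$ in formula (70b), we at
once recognize the familiar formula, $\sigma_{(\text{diff.})} = \sqrt{\sigma_{\text{av.}1}^2 + \sigma_{\text{av.}2}^2}$,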
which we have used heretofore for measuring the reliability of
the difference between two averages, or with appropriate
changes, two $\sigma$'s or two r's. It should always be remembered
that $\sigma_{(\text{diff.})}$ is simply a special form of the more general formula
(69b) and that it always assumes a zero correlation between
$X_1$ and $X_2$.
The PE may be written for $\sigma$ in any of the formulas given
in this Section by making the substitution $PE = .6745 \times \sigma$.
V. How to Interpret the Coefficient of Correlation
Between Two Tests
When can a coefficient of correlation be considered "high"?
Is an r of .40 between two tests evidence of "low" or "marked"
relationship? Questions like these, and many others which
relate to the interpretation of a coefficient of correlation fre-
quently arise in test work and must be answered if we would
understand the significance of an obtained r.
The effectiveness of an r as a measure of relation may be
evaluated in several ways: (1) in terms of the standard error
of estimate ; (2) in terms of the standard error of measurement ;
and (3) in terms of the percentage of factors common to the two
capacities correlated. Let us consider these three approaches
to an interpretation of r before attempting to lay down any
general rule for classifying r's as "high," "medium," or "low."
1. The Interpretation of a Coefficient of Correlation in Terms
of $\sigma_{(\text{est.})}$
The standard error of estimate, $\sigma_{(\text{est.})}$, is probably the
most practicable way of evaluating the effectiveness of a coefficient
of correlation. This follows from the fact that $\sigma_{(\text{est. }X_1)}$,
which enables us to tell how accurately we can estimate an
individual's score on test $X_1$ knowing his score on test $X_2$,
depends on the r between the two tests. When r = 1.00,
$\sigma_{(\text{est. }X_1)} = .00$, which means that we can predict a score in
$X_1$ from a knowledge of $X_2$ with perfect accuracy, with no error.
To take the opposite extreme, when r = .00, $\sigma_{(\text{est. }X_1)} = \sigma_1$
directly, which means that we can only be certain that the
predicted score lies somewhere within the limits of the $X_1$ distribution,
i.e., within the limits, Obtained Score $\pm 3\sigma$. In
other words, the estimate from the distribution of $X_1$ alone is as
good as the estimate made with the addition of $X_2$. As r
decreases from 1.00 to 0, the standard error of estimate rapidly
increases, so that predictions from the regression equation
range all of the way from certainty to practically guesswork.
The closeness of the correspondence denoted by an r, therefore,
may be gauged by the size of $\sigma_{(\text{est.})}$.
We may illustrate with the following problem. Suppose
that the correlation between two tests $X_1$ and $X_2$ is .60, and
that $\sigma_{X_1} = 5.00$. Then $\sigma_{(\text{est. }X_1)}$ is $5 \times \sqrt{1 - .6^2}$ or 4.00, which is
only 20% less than 5.00, the $\sigma_{(\text{est. }X_1)}$ for r = .00, i.e., for a minimum
predictive value. The proportionate amount of reduction
in $\sigma_{(\text{est. }X_1)}$ as r varies from .00 to 1.00 is given by the
expression $\sqrt{1 - r^2}$, and hence it is possible to estimate the
"predictive" value of an r from $\sqrt{1 - r^2}$ alone. This radical
($\sqrt{1 - r^2}$) has been designated by Kelley 1 the "coefficient of
alienation," and is usually denoted by the letter "k." k may
be thought of as measuring the absence of relation between
two variables $X_1$ and $X_2$, in the same way that r measures the
presence of relation. Thus when k = 1.00, r = .00, and when
k = .00, r = 1.00; the larger the coefficient of alienation the
greater the lack of relation, and the less the value of the
prediction. In order to show how the estimate improves as r
increases, the k's for the values of r from .00 to 1.00 are given
in Table XXXIII.
It will be noted that r must be .866 before k is half way
between perfect correlation and a guess, that is, before the standard
error of estimate is reduced one-half. For r's of .30 and
less, the coefficients of alienation are so large that the predictions
based on them are but little better than a guess. Even
with an r = .99, it will be noticed that the standard error of
estimate is still 1/7 as large as when k = 1.00. It is obvious,
then, that in order to estimate individual scores with accuracy,
the correlation should be at least .90.
1 Kelley, T. L., Principles Underlying the Classification of Men. Journal
of Applied Psychology, 1919, Vol. III, 1, p. 50.
TABLE XXXIII
Giving Coefficients of Alienation k for Values of r from .00 to 1.00

r           $k = \sqrt{1 - r^2}$        r           $k = \sqrt{1 - r^2}$
.00         1.0000                      .80         .6000
.10          .9950                      .8660       .5000
.20          .9798                      .90         .4359
.30          .9539                      .95         .3122
.40          .9165                      .98         .1990
.50          .8660                      .99         .1411
.60          .8000                     1.00         .0000
.70          .7141
(.7071)      .7071
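The entries of Table XXXIII follow directly from $k = \sqrt{1 - r^2}$, as the short Python sketch below shows; for r = .90, for instance, the formula gives .4359. The function name `alienation` is assumed for illustration.

```python
import math


def alienation(r):
    """Coefficient of alienation, k = sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)


# Values of r tabulated in Table XXXIII.
for r in (0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.7071,
          0.80, 0.8660, 0.90, 0.95, 0.98, 0.99, 1.00):
    print(f"r = {r:.4f}   k = {alienation(r):.4f}")
```

Since k multiplies the $\sigma$ of the predicted test, it states what fraction of the original uncertainty remains after the second test is taken into account.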
2. The Interpretation of a Coefficient of Correlation in Terms
of the Standard Error of Measurement, $\sigma_{(M)}$
We have found (page 183) that the standard error of
measurement enables us to estimate the probable divergence of
an obtained score on a test from its corresponding true score.
Moreover, since $\sigma_{(M)} = \sigma_1\sqrt{1 - r_{12}}$, the amount of this probable
divergence will depend to a large degree upon the size of the
self-correlation, $r_{12}$, and accordingly it follows that the value of
$r_{12}$ as a measure of relation may be determined from the size
of $\sigma_{(M)}$. When r = 1.00, for example, $\sigma_{(M)} = .00$, and every
obtained score equals its true score exactly. When r = .00, on
the other hand, $\sigma_{(M)} = \sigma_1$ (the $\sigma$ of the distribution) and we
can only be sure that the true score (corresponding to a given
obtained score) lies somewhere within the limits of the distribution,
that is, within the limits $\pm 3\sigma$. In other words, when
r = .00, the probable divergence of an obtained score from its
true score is as great as it would be had we simply guessed that
the true score lay somewhere in the distribution.
To illustrate, suppose that the reliability coefficient of a given
test, $r_{12}$ = .80, and that $\sigma_1$ = 10.00. Then $\sigma_{(M)} = 10\sqrt{1 - .80}$
or 4.472, and since $\sigma_{(M)}$ is 10.00 when r = .00, evidently a
reliability coefficient of .80 serves to reduce $\sigma_{(M)}$ to about
45% of what it would be in the event of a guess. The reduction
in $\sigma_{(M)}$ as r varies from 0 to 1.00 is given by the
expression $\sqrt{1 - r_{12}}$. Hence this factor may be used to test
the effectiveness of an obtained reliability coefficient, just as
k tests the value of the r between two tests. In Table XXXIV
the values of $\sqrt{1 - r_{12}}$ have been calculated for r's from .00 to
1.00.
TABLE XXXIV

Giving Values of $\sqrt{1 - r_{12}}$ for Values of r from .00 to 1.00

    r     $\sqrt{1 - r_{12}}$        r     $\sqrt{1 - r_{12}}$
   .00        1.0000             .80         .4472
   .10         .9487             .90         .3162
   .20         .8944             .95         .2236
   .30         .8367             .98         .1414
   .40         .7746             .99         .1000
   .50         .7071            1.00         .0000
   .60         .6325
   .70         .5477
   .75         .5000
From Table XXXIV it is evident that the self-correlation
of a test must be at least .75 before $\sqrt{1 - r_{12}}$ is half way between
complete reliability and a guess. For an $r_{12}$ = .98, the chances
are still 68 in 100 that a given score will diverge from its true
score by as much as $\pm .1414$ of the $\sigma$ of the test. Since even high
reliability coefficients (e.g., .90 or above) therefore indicate
relatively large departures from perfect reliability, it is clear
that a self-correlation of, say, .30 or .40 is almost valueless.
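
The entries of Tables XXXIII and XXXIV may be recomputed directly. The following Python sketch is offered only as an illustration; the function names are hypothetical and form no part of the original text. It evaluates $k = \sqrt{1 - r^2}$, the factor $\sqrt{1 - r_{12}}$, and the standard error of measurement for the example given above ($\sigma_1$ = 10.00, $r_{12}$ = .80).

```python
import math


def alienation(r):
    """Coefficient of alienation, k = sqrt(1 - r^2) (Table XXXIII)."""
    return math.sqrt(1.0 - r * r)


def reliability_factor(r12):
    """Factor sqrt(1 - r12) by which sigma(M) falls below sigma (Table XXXIV)."""
    return math.sqrt(1.0 - r12)


def sigma_m(sigma, r12):
    """Standard error of measurement, sigma(M) = sigma * sqrt(1 - r12)."""
    return sigma * reliability_factor(r12)


# Print a few rows of both tables side by side.
for r in (0.00, 0.50, 0.75, 0.80, 0.90, 0.98):
    print(f"r = {r:.2f}   k = {alienation(r):.4f}   "
          f"sqrt(1 - r) = {reliability_factor(r):.4f}")

# The worked example: sigma = 10.00, reliability coefficient = .80.
print(round(sigma_m(10.0, 0.80), 3))  # 4.472
```

Note that the two factors fall at different rates: a self-correlation of .75 halves $\sigma_{(M)}$, whereas a correlation between two tests must reach .866 before k falls to .50.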
3. Interpretation of a Coefficient of Correlation in Terms of the
Percentage of Common (Overlapping) Elements or
Factors
It is sometimes helpful to regard a coefficient of correlation
as a ratio which expresses, directly or indirectly, the
percentage of elements or factors common to the tests which are
correlated. Or again, r may be thought of as a device for
indicating the extent to which the factors which determine
capacity in the one test "overlap" those of another test.1 Let
us suppose that capacity in test X depends upon the presence
or absence of a + c independent, elemental factors; and that
capacity in test Y depends upon the presence or absence of
b + c independent, elemental factors. The a factors determine
X scores alone, the b factors Y scores alone, and the c factors
are common to both X and Y. Moreover, let us suppose
further that all factors, a, b, and c, are governed solely by the
laws of chance, so that each factor is as likely to be present as
absent in the same way that a coin when tossed is as likely to
fall heads as tails.
Now if we let $n_a$ = total number of a factors, $n_b$ = total number
of b factors, and $n_c$ = the total number of c factors, it can be
shown 2 that the correlation between X and Y is given by the
formula:

$$r = \frac{n_c}{\sqrt{(n_a + n_c)(n_b + n_c)}} \qquad (71)$$
That is, the coefficient of correlation equals the number of
common factors in X and Y, divided by the geometrical
mean of the total number of factors in X and Y.
This situation is shown graphically in Diagram XXVII, in which
X is determined by 8 factors, 4 a's and 4 c's, and Y by 11
factors, 7 b's and 4 c's. The correlation by formula (71) is

$$r = \frac{4}{\sqrt{(4 + 4)(7 + 4)}} = \frac{4}{\sqrt{8 \times 11}} = .426$$

DIAGRAM XXVII
[X is made up of a a a a and c c c c; Y of c c c c and b b b b b b b; $r = 4/\sqrt{8 \times 11} = .426$]
1 The following is adapted from the discussion by Kelley, Statistical Method,
pp. 189-190.
2 See Kelley, Statistical Method, 1923, p. 190; or Brown, Wm., Essentials
of Mental Measurement, 1911, pp. 79-80.
If the number of elementary factors determining the score
in X equals exactly the number determining the score in Y, so
that $n_a = n_b$, formula (71) becomes

$$r = \frac{n_c}{n_a + n_c} \qquad (72)$$

and the coefficient of correlation is now simply the decimal
fraction which indicates what proportion of the causes
influencing performance in X and Y are common to both. If t = the
number of common factors ($n_c$) and if s = the total number of
factors present in each of X and Y ($n_a + n_c$), r is simply t/s.
(Remember that the factors in X and Y are assumed to be equal
in number and influence.) This condition is illustrated in
Diagram XXVIII. Since X is determined by 8 factors, 4 a's and
4 c's, and Y by 8 factors, 4 b's and 4 c's, the correlation by
formula (72) is 4/8 or .50.
DIAGRAM XXVIII
[X is made up of a a a a and c c c c; Y of c c c c and b b b b; r = 4/8 = .50]

Now let us assume, lastly, that Y is completely determined
by $n_c$ elements, and that X is determined by these same elements
plus $n_a$ elements in addition ($n_b = 0$). Formula (71) then
becomes

$$r = \frac{n_c}{\sqrt{n_c(n_a + n_c)}} \qquad (73)$$

and the coefficient of correlation equals the number of common
elements in X and Y divided by the geometrical mean of the total
number of factors in X and in Y. Diagram XXIX shows this
graphically. Y is determined by 4 c's and X by these factors plus
4 a's in addition: the correlation, therefore, is $4/\sqrt{4 \times 8}$ or .707.

DIAGRAM XXIX
[X is made up of a a a a and c c c c; Y of c c c c alone; $r = 4/\sqrt{4 \times 8} = .707$]

If we square the r obtained from formula (73), we have that

$$r^2 = \frac{n_c}{n_a + n_c} \qquad (74)$$

that is, the square of the coefficient gives the extent to which
the elements in Y overlap those of X, or the proportion of
elements in X which are also involved in Y. In Diagram XXIX
note that Y overlaps X 50% and that $r^2$, i.e., $(.707)^2$, is .50 as
it should be.1 Moreover, since the coefficient of alienation will
equal .707 when r = .707 (see Table XXXIII), it follows that an
r of .707 (and not .50) should be taken as half of a perfect
correlation.2 On the same assumptions, an overlapping of 33 1/3%
common elements, i.e., $r^2$ = .3333, will give a correlation of .577,
which is 1/3 of a perfect correlation; and an overlapping of 25%
common elements, $r^2$ = .25, gives an r = .50, which is 1/4 of a
perfect correlation. By analogy, an r of .30 or less implies
so slight a degree of overlapping that there can be only a very
small percentage of common elements.
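
The arithmetic of formulas (71) to (74) is easily checked. The short Python sketch below is illustrative only; the function name `r_common` is hypothetical. It computes r from the numbers of specific and common factors and reproduces the values attached to Diagrams XXVII, XXVIII, and XXIX.

```python
import math


def r_common(n_a, n_b, n_c):
    """Formula (71): r = n_c / sqrt((n_a + n_c) * (n_b + n_c)).

    n_a, n_b are the factors peculiar to X and to Y; n_c the common factors.
    Formulas (72) and (73) are the special cases n_a == n_b and n_b == 0.
    """
    return n_c / math.sqrt((n_a + n_c) * (n_b + n_c))


print(round(r_common(4, 7, 4), 3))        # Diagram XXVII,  formula (71): 0.426
print(round(r_common(4, 4, 4), 3))        # Diagram XXVIII, formula (72): 0.5
print(round(r_common(4, 0, 4), 3))        # Diagram XXIX,   formula (73): 0.707
print(round(r_common(4, 0, 4) ** 2, 3))   # formula (74), overlap of X by Y: 0.5
```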
1 This result has interesting implications. Thus if all of the elements in
test $X_2$ are common to $X_1$ (e.g., a criterion) the extent to which $X_2$ overlaps
$X_1$ is given by simply squaring the coefficient, $r_{x_1 x_2}$. The assumption must
be made, of course, that the scores in both tests are summations of independent
and similar elements whose presence or absence is governed by chance alone.
2 Woodworth, R. S., Combining the Results of Several Tests: A Study in
Statistical Method, Psychological Review, 1912, XIX, p. 113. Hull, Clark L.,
The Joint Yield from Teams of Tests, Journal of Educational Psychology,
1923, 14, pp. 396-406.

The coefficient of correlation as a measure of the
percentage of common factors may be seen to best advantage in
series formed by tossing coins or throwing dice, in which
the "overlapping" is arbitrarily determined and controlled at
will. As an illustration, consider the correlation table in
Diagram XXX, in which is shown the relation between two
series of 500 successive throws of 12 pennies made in the
following way: first, all 12 pennies were tossed, and the number
of heads recorded and noted in the X column; then 5 coins
were left lying and the remaining 7 were tossed again and the
number of heads in all 12 recorded and noted in column Y,
opposite the X entry. By this scheme 5 coins (factors) are
common to each pair of tosses; and hence, according to formula (72),
the correlation should be 5/12 or .417. By the product-moment
formula the actual correlation between the two series is .424,
which indicates a very close correspondence between actual
and theoretical results. The situation existing in each pair of
X and Y tosses is shown in the figure in Diagram XXX. If 4
coins had been left lying, the r would have been 4/12 or .333;
if 6 had been left lying, r would have been 6/12 or .50, etc. A
number of diagrams of the sort shown, in which the number of
common factors (i.e., coins left lying) varies from 0 to 12, and r
from 0 to 1.00, may be found in Pearl's Medical Biometry and
Statistics, pages 294-300.

DIAGRAM XXX 1
Showing the number of heads in 500 successive throws of 12 pennies
in which 7 pennies were tossed in the second throw and 5 remained as they
fell in the first throw of all 12 together.
[Correlation table: Heads in First Toss (X), 0 to 12, against Heads in Second Toss (Y), 0 to 12; N = 500.]
[Figure: X is made up of a a a a a a a and c c c c c; Y of c c c c c and b b b b b b b; $n_a = n_b = 7$, $n_c = 5$.]
By formula (72), $r = n_c/(n_a + n_c) = 5/12 = .417$. By calculation (product-moment), r = .424.
1 From Pearl, R., Medical Biometry and Statistics, p. 297 (after Darbishire).

Now suppose that we calculate the correlation between two
series of dice throws made according to the following scheme: 1
5 dice are thrown, and the total read and recorded in the X
column; then 5 additional dice are thrown and the total of
all 10 (the 5 left and 5 just thrown) is read and recorded
in the Y column. If this is continued until 100 throws have
been made, we shall have 100 X and 100 Y entries, each Y
throw (of 10 dice) "overlapped" to the extent of 50% by its
corresponding X throw (of 5 dice). And since all of the
elements in X are completely contained in Y, the correlation
between X and Y should, by formula (73), be $5/\sqrt{5 \times 10}$ or .707.
(See Diagram XXXI and accompanying figure.) Actually, the
correlation by the product-moment formula is .694, which
indicates, again, a very close correspondence between actual
and theoretical results. The square of this r gives us
approximately .50 as the percentage of common elements in X and Y:
that is, we have one half of a perfect correlation. (See page
294.)
1 These throws were made by the writer.

DIAGRAM XXXI
Showing the results of 100 successive throws of dice, in each first throw of
which (X) 5 dice were thrown, counted, and left down; and in each second
throw of which (Y) 5 additional dice were thrown and counted together
with the 5 left down (10 in all).
[Correlation table: First Throw of 5 Dice (X) against Second Throw of 10 Dice (Y); N = 100.]
[Figure: Y is made up of a a a a a and c c c c c; X of c c c c c alone; $n_a = 5$, $n_c = 5$.]
By formula (73), $r = n_c/\sqrt{n_c(n_a + n_c)} = 5/\sqrt{5 \times 10} = .707$. By calculation (product-moment), r = .694.
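
The penny and dice experiments lend themselves to repetition by machine. The following Python sketch is an illustration under stated assumptions, not a record of the original throws: the function names are hypothetical, the seed is arbitrary, and the pseudo-random generator stands in for the physical coins and dice. Each run will give a somewhat different r, scattered about the theoretical values 5/12 and $5/\sqrt{5 \times 10}$.

```python
import math
import random


def pearson(xs, ys):
    """Product-moment coefficient of correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def penny_series(n_throws, n_coins=12, n_left=5, rng=random):
    """X = heads in a toss of all coins; Y = heads after re-tossing all but n_left."""
    xs, ys = [], []
    for _ in range(n_throws):
        coins = [rng.randint(0, 1) for _ in range(n_coins)]
        kept = coins[:n_left]                       # coins left lying
        retossed = [rng.randint(0, 1) for _ in range(n_coins - n_left)]
        xs.append(sum(coins))
        ys.append(sum(kept) + sum(retossed))
    return xs, ys


def dice_series(n_throws, n_first=5, n_added=5, rng=random):
    """X = total of the first dice; Y = that total plus the total of the added dice."""
    xs, ys = [], []
    for _ in range(n_throws):
        first = sum(rng.randint(1, 6) for _ in range(n_first))
        added = sum(rng.randint(1, 6) for _ in range(n_added))
        xs.append(first)
        ys.append(first + added)
    return xs, ys


rng = random.Random(1926)  # arbitrary seed, for repeatable output only
x, y = penny_series(500, rng=rng)
print("pennies: observed r =", round(pearson(x, y), 3), " theory =", round(5 / 12, 3))
x, y = dice_series(100, rng=rng)
print("dice:    observed r =", round(pearson(x, y), 3), " theory =", round(5 / math.sqrt(50), 3))
```

With only 500 or 100 throws the observed r may differ from theory by several hundredths; lengthening the series brings the two into closer agreement.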
While formulas (71-74) are interesting and suggestive as
giving us the means of interpreting a coefficient of correlation
under certain special or restricted conditions, it would be
a mistake to apply them generally, that is, to assume that by simply
squaring the coefficient of correlation we can always determine
the percentage of common factors or the amount of overlapping.
It seems likely that the scores on most psychological tests as
well as many social and educational measurements are the
result of the combined action of many factors which are often
dependent on each other, and probably interwoven in a rela-
tively complex manner. At any rate, we do not know that a
test score is simply the sum of a certain number of similar and
independent elements.
Summary
From the discussion in the preceding paragraphs, it is
evident that even with correlation coefficients which we have
been accustomed to think of as high, the departure from perfect
correlation is considerable. Strictly speaking, the term "high
correlation" should be applied only to coefficients which are
.95 or above. However, in mental, social, and educational
measurements there are so many actual and potential sources
of error due to the variability of the material dealt with, and
the relative crudity of the measurements made, that very few
tests indeed could meet this requirement. Very seldom do
correlations between tests run above .70 or .75; and hence it
is probably justifiable, in view of the limitations mentioned, to
regard such coefficients as high. There seems to be fairly
general agreement among workers with tests that an

r from .00 to ±.20 denotes indifferent or negligible relation.
r from ±.20 to ±.40 denotes low correlation: present but slight.
r from ±.40 to ±.70 denotes substantial or marked relationship.
r from ±.70 to ±1.00 denotes high relation.

This is a tentative classification which is to be taken as only
generally true. The size of a correlation coefficient should
always be evaluated with due regard for the material dealt with,
the size of the sample, and $PE_r$, no matter what its absolute
value.
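
The tentative classification may be put in the form of a small lookup. The Python sketch below is illustrative only; the function name is hypothetical, and since the classes as stated share their end points, the assignment of a boundary value (e.g., .40) to the higher class is an assumption made here, not a rule given in the text.

```python
def describe_r(r):
    """Return the verbal class of a coefficient of correlation r.

    Assumption: a value falling exactly on a boundary (.20, .40, .70)
    is placed in the higher class; the text leaves this open.
    """
    size = abs(r)  # the classification disregards the sign of r
    if size > 1.0:
        raise ValueError("a coefficient of correlation cannot exceed 1.00")
    if size < 0.20:
        return "indifferent or negligible relation"
    if size < 0.40:
        return "low correlation: present but slight"
    if size < 0.70:
        return "substantial or marked relationship"
    return "high relation"


for r in (0.12, -0.35, 0.424, 0.694, 0.95):
    print(f"r = {r:+.3f}: {describe_r(r)}")
```

Such a label is only a first description; it takes no account of the size of the sample or of $PE_r$.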
PROBLEMS
1. The self-correlation of a certain test is .60.
(a) How much must the test be lengthened to raise the
self-correlation to .90?
(b) What effect will doubling the test have on its reliability?
2. Two equivalent half-scales are made up from the Downey
Will-Temperament 1 Test in the following way: (1) by grouping all
odd-numbered tests in one half-scale, and all even-numbered
tests in the other; (2) by grouping the first two tests of every
pattern into one half-scale, and the last two tests into another
half-scale; (3) by grouping the first and last tests of each pattern
into one half-scale, and the second and third tests of each pattern
into a second half-scale.
Reliability coefficients for the half-scales were found as follows by
the three methods (N = 146):

Method     Reliability Coefficient
1          .17
2          .31
3          .24
Average    .24

What is the reliability of the whole Downey test?
3. In a small group the reliability coefficient of a test is .55 and the
$\sigma$ of the test scores is 3.00. What must the self-correlation of
this test be in a larger group whose $\sigma$ is 5.00, in order to have
the same degree of reliability?
4. The reliability coefficient of a test, as found in a large unselected
group, is .92; the Average is 142 and $\sigma$ is 16.00. If an individual
makes 150 on the test,
(a) What is the PE of this score, i.e., the $PE_{(M)}$?
(b) Within what range does the true score lie?
(c) In a second test of a different function, the reliability
coefficient is .86; the average is 54 and $\sigma$ is 10.00. In which test
are the obtained scores the more reliable, i.e., closer to the
true scores?
5. The reliability coefficient of a test is .80. What is the maximum
self-correlation obtainable with this test as it stands?
6. Given the following records (all in seconds) for 100 Barnard
Freshmen, 2 and the scores made by individual A:

Tests        Coordination   Tapping   Color Naming   Opposites
Average          82.7        376.3        57.0          51.1
SD               10.8         51.7         8.8          10.3
A's scores       85          350          62            40

(a) Combine A's scores by the method of variability, weighting
all tests 1.
(b) Combine A's scores weighting Coord. and Tapping 1 each,
Color Naming 3, and Opposites 4.
7. Using the data in Example 6 above, combine A's scores by the two
methods given on pages 282 and 283. Since all scores are in
seconds, the higher the score numerically the lower it actually is.
8. One hundred and fifty high school seniors make an average score
of 120 on Army Alpha with a $\sigma$ of 21.6. Two weeks later the
group is praised for its performance (without, however, being
told what the scores were) and given a second form of Alpha on
which the average score is 126 and the $\sigma$ is 24.2. The r between
the tests is .86.
(a) Is the effect of the incentive (praise) plus the practice effect
sufficient to bring about a real increase in average score? How
would you rule out the practice effect?
(b) Why is it necessary to have the correlation between the tests?
9. A battery of tests correlates .85 with a criterion. Assuming that
performance on the battery is completely determined by X
elements, and performance on the criterion by X + Y elements,
to what extent may we say that the battery probably "overlaps"
the criterion?
10. Interpret a coefficient of correlation r = .50 in three ways; an
r = .65?

1 Ruch, G. M., and Del Manzo, M. C., The Downey Will-Temperament
Group Test: A Further Analysis of Its Reliability and Validity. Journal
of Applied Psychology, Vol. VII, 1923, p. 65.
2 Carothers, F. E., The Psychological Examination of College Students,
Archives of Psychology, 1921, No. 46, pp. 21ff.
Answers
1. (a) 6 times.
(b) r = .75
2. Method 1: r = .29. Method 2: r = .47. Method 3: r = .39.
Average of all three methods: r = .38.
3. r = .84.
4. (a) $PE_{(M)}$ = 3.05.
(b) Between 162.2 and 137.8.
(c) In the first test. The ratio $PE_{(M)}$/Av. = .021 (first test);
$PE_{(M)}$/Av. = .047 (second test).
5. r = .89.
6. (a) Taking as multipliers for the four tests, 1, 1/5, 1, and 1,
respectively, we have 257 as A's composite score.
(b) A's score is 501. (Since the measures of performance are in
time units, the higher the numerical score the lower the actual
performance.)
7. A's scores are 47, 57, 42, and 65. Her average is 52.75. (Hull's
method.)
A's scores are -.213, +.509, -.568, +1.078; her average is
.202. (This means that A stands .202$\sigma$ above the average of
the group on the four tests.)
8. (a) Yes. $D/\sigma_{diff.}$ is 5+.
9. About 72% common elements.
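
Several of these answers may be verified by machine. The Python sketch below is illustrative only. The function names are hypothetical, and the formulas used (the Spearman-Brown prophecy formula, the relation $\sigma_s\sqrt{1 - r_s} = \sigma_l\sqrt{1 - r_l}$ between groups of different range, $PE_{(M)} = .6745\,\sigma\sqrt{1 - r_{12}}$, and the index of reliability $\sqrt{r_{12}}$) are assumed to be the ones the problems intend.

```python
import math


def spearman_brown(r, n):
    """Reliability of a test lengthened n times, given reliability r."""
    return n * r / (1 + (n - 1) * r)


def lengthening_needed(r, target):
    """Number of times a test must be lengthened to raise r to target."""
    return target * (1 - r) / (r * (1 - target))


def reliability_in_wider_group(r_small, sd_small, sd_large):
    """Self-correlation in a group of larger SD giving the same sigma(M)."""
    return 1 - (sd_small ** 2 / sd_large ** 2) * (1 - r_small)


def pe_measurement(sigma, r):
    """Probable error of measurement, .6745 * sigma * sqrt(1 - r)."""
    return 0.6745 * sigma * math.sqrt(1 - r)


print(round(lengthening_needed(0.60, 0.90), 2))               # Problem 1(a): 6.0
print(round(spearman_brown(0.60, 2), 2))                      # Problem 1(b): 0.75
print([round(spearman_brown(r, 2), 2) for r in (0.17, 0.31, 0.24)])  # Problem 2
print(round(reliability_in_wider_group(0.55, 3.0, 5.0), 2))   # Problem 3: 0.84
print(round(pe_measurement(16.0, 0.92), 2))                   # Problem 4(a): 3.05
print(round(math.sqrt(0.80), 2))                              # Problem 5: 0.89
print(round(0.85 ** 2, 2))                                    # Problem 9: 0.72
```

The composite scores of Problem 6 follow by simple arithmetic: 85 + 350/5 + 62 + 40 = 257, and 85 + 350/5 + 3(62) + 4(40) = 501.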
REFERENCES
The following books will be found to be helpful as general
references :
1. Primer of Statistics, by W. P. and E. M. Elderton. A. & C.
Black, Ltd., London. 1910.
2. Mental and Social Measurements, by Edward L. Thorndike.
Published by Teachers College, Columbia University. 1912
(revised edition).
3. Statistical Methods Applied to Education, by Harold O. Rugg.
Houghton Mifflin Company. 1917.
4. An Introduction to Statistical Methods, by Horace Secrist.
Macmillan Company. 1917.
5. How to Measure in Education, by Wm. M. McCall. The Mac-
millan Company. 1922.
6. The Theory of Educational Measurements, by Walter Scott
Monroe. Houghton Mifflin Company. 1923.
7. The Fundamentals of Statistics, by L. L. Thurstone. The Mac-
millan Company. 1925.
8. Statistical Method in Educational Measurement, by Arthur S.
Otis. World Book Company. 1925.
More advanced books are:
1. Elements of Statistics, by A. L. Bowley. P. S. King and Son,
London. 1920 (fourth edition).
2. An Introduction to the Theory of Statistics, by G. Udny Yule.
Chas. Griffin and Company, London. 1919 (5th edition).1
3. Essentials of Mental Measurement, by W. M. Brown and G. H.
Thomson. Cambridge University Press. 1920.
4. A First Course in Statistics, by D. Caradog Jones. G. Bell
& Sons, London. 1921.
5. Statistical Method, by Truman L. Kelley. The Macmillan Com-
pany. 1923.
6. Handbook of Mathematical Statistics, by H. L. Rietz et al.
Houghton Mifflin Company. 1924.
Aids to Computation:
1. Barlow's Tables of Squares, Cubes, Square Roots, Cube Roots,
Reciprocals of numbers from 1 to 10,000. E. and F. N. Spon,
Ltd., London. 1921.
2. Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for use in Partial Correlation and
Trigonometry, by John Rice Miner, Sc.D. Johns Hopkins
Press. 1922.
1 The book by Yule is a classic which should be known to every serious
student of mental and social measurements.
Table of Squares and Square Roots of the Numbers from 1 to 1000

Number  Square  Square Root    Number  Square  Square Root
     1       1        1.000        51    2601        7.141
     2       4        1.414        52    2704        7.211
     3       9        1.732        53    2809        7.280
     4      16        2.000        54    2916        7.348
     5      25        2.236        55    3025        7.416
     6      36        2.449        56    3136        7.483
     7      49        2.646        57    3249        7.550
     8      64        2.828        58    3364        7.616
     9      81        3.000        59    3481        7.681
    10     100        3.162        60    3600        7.746
    11     121        3.317        61    3721        7.810
    12     144        3.464        62    3844        7.874
    13     169        3.606        63    3969        7.937
    14     196        3.742        64    4096        8.000
    15     225        3.873        65    4225        8.062
    16     256        4.000        66    4356        8.124
    17     289        4.123        67    4489        8.185
    18     324        4.243        68    4624        8.246
    19     361        4.359        69    4761        8.307
    20     400        4.472        70    4900        8.367
    21     441        4.583        71    5041        8.426
    22     484        4.690        72    5184        8.485
    23     529        4.796        73    5329        8.544
    24     576        4.899        74    5476        8.602
    25     625        5.000        75    5625        8.660
    26     676        5.099        76    5776        8.718
    27     729        5.196        77    5929        8.775
    28     784        5.292        78    6084        8.832
    29     841        5.385        79    6241        8.888
    30     900        5.477        80    6400        8.944
    31     961        5.568        81    6561        9.000
    32    1024        5.657        82    6724        9.055
    33    1089        5.745        83    6889        9.110
    34    1156        5.831        84    7056        9.165
    35    1225        5.916        85    7225        9.220
    36    1296        6.000        86    7396        9.274
    37    1369        6.083        87    7569        9.327
    38    1444        6.164        88    7744        9.381
    39    1521        6.245        89    7921        9.434
    40    1600        6.325        90    8100        9.487
    41    1681        6.403        91    8281        9.539
    42    1764        6.481        92    8464        9.592
    43    1849        6.557        93    8649        9.644
    44    1936        6.633        94    8836        9.695
    45    2025        6.708        95    9025        9.747
    46    2116        6.782        96    9216        9.798
    47    2209        6.856        97    9409        9.849
    48    2304        6.928        98    9604        9.899
    49    2401        7.000        99    9801        9.950
    50    2500        7.071       100   10000       10.000
Table of Squares and Square Roots, Continued

Number  Square  Square Root    Number  Square  Square Root
   101   10201       10.050       151   22801       12.288
   102   10404       10.100       152   23104       12.329
   103   10609       10.149       153   23409       12.369
   104   10816       10.198       154   23716       12.410
   105   11025       10.247       155   24025       12.450
   106   11236       10.296       156   24336       12.490
   107   11449       10.344       157   24649       12.530
   108   11664       10.392       158   24964       12.570
   109   11881       10.440       159   25281       12.610
   110   12100       10.488       160   25600       12.649
   111   12321       10.536       161   25921       12.689
   112   12544       10.583       162   26244       12.728
   113   12769       10.630       163   26569       12.767
   114   12996       10.677       164   26896       12.806
   115   13225       10.724       165   27225       12.845
   116   13456       10.770       166   27556       12.884
   117   13689       10.817       167   27889       12.923
   118   13924       10.863       168   28224       12.961
   119   14161       10.909       169   28561       13.000
   120   14400       10.954       170   28900       13.038
   121   14641       11.000       171   29241       13.077
   122   14884       11.045       172   29584       13.115
   123   15129       11.091       173   29929       13.153
   124   15376       11.136       174   30276       13.191
   125   15625       11.180       175   30625       13.229
   126   15876       11.225       176   30976       13.266
   127   16129       11.269       177   31329       13.304
   128   16384       11.314       178   31684       13.342
   129   16641       11.358       179   32041       13.379
   130   16900       11.402       180   32400       13.416
   131   17161       11.446       181   32761       13.454
   132   17424       11.489       182   33124       13.491
   133   17689       11.533       183   33489       13.528
   134   17956       11.576       184   33856       13.565
   135   18225       11.619       185   34225       13.601
   136   18496       11.662       186   34596       13.638
   137   18769       11.705       187   34969       13.675
   138   19044       11.747       188   35344       13.711
   139   19321       11.790       189   35721       13.748
   140   19600       11.832       190   36100       13.784
   141   19881       11.874       191   36481       13.820
   142   20164       11.916       192   36864       13.856
   143   20449       11.958       193   37249       13.892
   144   20736       12.000       194   37636       13.928
   145   21025       12.042       195   38025       13.964
   146   21316       12.083       196   38416       14.000
   147   21609       12.124       197   38809       14.036
   148   21904       12.166       198   39204       14.071
   149   22201       12.207       199   39601       14.107
   150   22500       12.247       200   40000       14.142
Table of Squares and Square Roots, Continued

Number  Square  Square Root    Number  Square  Square Root
   201   40401       14.177       251   63001       15.843
   202   40804       14.213       252   63504       15.875
   203   41209       14.248       253   64009       15.906
   204   41616       14.283       254   64516       15.937
   205   42025       14.318       255   65025       15.969
   206   42436       14.353       256   65536       16.000
   207   42849       14.387       257   66049       16.031
   208   43264       14.422       258   66564       16.062
   209   43681       14.457       259   67081       16.093
   210   44100       14.491       260   67600       16.125
   211   44521       14.526       261   68121       16.155
   212   44944       14.560       262   68644       16.186
   213   45369       14.595       263   69169       16.217
   214   45796       14.629       264   69696       16.248
   215   46225       14.663       265   70225       16.279
   216   46656       14.697       266   70756       16.310
   217   47089       14.731       267   71289       16.340
   218   47524       14.765       268   71824       16.371
   219   47961       14.799       269   72361       16.401
   220   48400       14.832       270   72900       16.432
   221   48841       14.866       271   73441       16.462
   222   49284       14.900       272   73984       16.492
   223   49729       14.933       273   74529       16.523
   224   50176       14.967       274   75076       16.553
   225   50625       15.000       275   75625       16.583
   226   51076       15.033       276   76176       16.613
   227   51529       15.067       277   76729       16.643
   228   51984       15.100       278   77284       16.673
   229   52441       15.133       279   77841       16.703
   230   52900       15.166       280   78400       16.733
   231   53361       15.199       281   78961       16.763
   232   53824       15.232       282   79524       16.793
   233   54289       15.264       283   80089       16.823
   234   54756       15.297       284   80656       16.852
   235   55225       15.330       285   81225       16.882
   236   55696       15.362       286   81796       16.912
   237   56169       15.395       287   82369       16.941
   238   56644       15.427       288   82944       16.971
   239   57121       15.460       289   83521       17.000
   240   57600       15.492       290   84100       17.029
   241   58081       15.524       291   84681       17.059
   242   58564       15.556       292   85264       17.088
   243   59049       15.588       293   85849       17.117
   244   59536       15.620       294   86436       17.146
   245   60025       15.652       295   87025       17.176
   246   60516       15.684       296   87616       17.205
   247   61009       15.716       297   88209       17.234
   248   61504       15.748       298   88804       17.263
   249   62001       15.780       299   89401       17.292
   250   62500       15.811       300   90000       17.321
Table of Squares and Square Roots, Continued

Number  Square  Square Root    Number  Square  Square Root
   301   90601       17.349       351  123201       18.735
   302   91204       17.378       352  123904       18.762
   303   91809       17.407       353  124609       18.788
   304   92416       17.436       354  125316       18.815
   305   93025       17.464       355  126025       18.841
   306   93636       17.493       356  126736       18.868
   307   94249       17.521       357  127449       18.894
   308   94864       17.550       358  128164       18.921
   309   95481       17.578       359  128881       18.947
   310   96100       17.607       360  129600       18.974
   311   96721       17.635       361  130321       19.000
   312   97344       17.664       362  131044       19.026
   313   97969       17.692       363  131769       19.053
   314   98596       17.720       364  132496       19.079
   315   99225       17.748       365  133225       19.105
   316   99856       17.776       366  133956       19.131
   317  100489       17.804       367  134689       19.157
   318  101124       17.833       368  135424       19.183
   319  101761       17.861       369  136161       19.209
   320  102400       17.889       370  136900       19.235
   321  103041       17.916       371  137641       19.261
   322  103684       17.944       372  138384       19.287
   323  104329       17.972       373  139129       19.313
   324  104976       18.000       374  139876       19.339
   325  105625       18.028       375  140625       19.365
   326  106276       18.055       376  141376       19.391
   327  106929       18.083       377  142129       19.416
   328  107584       18.111       378  142884       19.442
   329  108241       18.138       379  143641       19.468
   330  108900       18.166       380  144400       19.494
   331  109561       18.193       381  145161       19.519
   332  110224       18.221       382  145924       19.545
   333  110889       18.248       383  146689       19.570
   334  111556       18.276       384  147456       19.596
   335  112225       18.303       385  148225       19.621
   336  112896       18.330       386  148996       19.647
   337  113569       18.358       387  149769       19.672
   338  114244       18.385       388  150544       19.698
   339  114921       18.412       389  151321       19.723
   340  115600       18.439       390  152100       19.748
   341  116281       18.466       391  152881       19.774
   342  116964       18.493       392  153664       19.799
   343  117649       18.520       393  154449       19.824
   344  118336       18.547       394  155236       19.849
   345  119025       18.574       395  156025       19.875
   346  119716       18.601       396  156816       19.900
   347  120409       18.628       397  157609       19.925
   348  121104       18.655       398  158404       19.950
   349  121801       18.682       399  159201       19.975
   350  122500       18.708       400  160000       20.000
Table of Squares and Square Roots, Continued

Number  Square  Square Root    Number  Square  Square Root
   401  160801       20.025       451  203401       21.237
   402  161604       20.050       452  204304       21.260
   403  162409       20.075       453  205209       21.284
   404  163216       20.100       454  206116       21.307
   405  164025       20.125       455  207025       21.331
   406  164836       20.149       456  207936       21.354
   407  165649       20.174       457  208849       21.378
   408  166464       20.199       458  209764       21.401
   409  167281       20.224       459  210681       21.424
   410  168100       20.248       460  211600       21.448
   411  168921       20.273       461  212521       21.471
   412  169744       20.298       462  213444       21.494
   413  170569       20.322       463  214369       21.517
   414  171396       20.347       464  215296       21.541
   415  172225       20.372       465  216225       21.564
   416  173056       20.396       466  217156       21.587
   417  173889       20.421       467  218089       21.610
   418  174724       20.445       468  219024       21.633
   419  175561       20.469       469  219961       21.656
   420  176400       20.494       470  220900       21.679
   421  177241       20.518       471  221841       21.703
   422  178084       20.543       472  222784       21.726
   423  178929       20.567       473  223729       21.749
   424  179776       20.591       474  224676       21.772
   425  180625       20.616       475  225625       21.794
   426  181476       20.640       476  226576       21.817
   427  182329       20.664       477  227529       21.840
   428  183184       20.688       478  228484       21.863
   429  184041       20.712       479  229441       21.886
   430  184900       20.736       480  230400       21.909
   431  185761       20.761       481  231361       21.932
   432  186624       20.785       482  232324       21.954
   433  187489       20.809       483  233289       21.977
   434  188356       20.833       484  234256       22.000
   435  189225       20.857       485  235225       22.023
   436  190096       20.881       486  236196       22.045
   437  190969       20.905       487  237169       22.068
   438  191844       20.928       488  238144       22.091
   439  192721       20.952       489  239121       22.113
   440  193600       20.976       490  240100       22.136
   441  194481       21.000       491  241081       22.159
   442  195364       21.024       492  242064       22.181
   443  196249       21.048       493  243049       22.204
   444  197136       21.071       494  244036       22.226
   445  198025       21.095       495  245025       22.249
   446  198916       21.119       496  246016       22.271
   447  199809       21.142       497  247009       22.293
   448  200704       21.166       498  248004       22.316
   449  201601       21.190       499  249001       22.338
   450  202500       21.213       500  250000       22.361
Table of Squares and Square Roots, Continued

Number  Square  Square Root    Number  Square  Square Root
   501  251001       22.383       551  303601       23.473
   502  252004       22.405       552  304704       23.495
   503  253009       22.428       553  305809       23.516
   504  254016       22.450       554  306916       23.537
   505  255025       22.472       555  308025       23.558
   506  256036       22.494       556  309136       23.580
   507  257049       22.517       557  310249       23.601
   508  258064       22.539       558  311364       23.622
   509  259081       22.561       559  312481       23.643
   510  260100       22.583       560  313600       23.664
   511  261121       22.605       561  314721       23.685
   512  262144       22.627       562  315844       23.707
   513  263169       22.650       563  316969       23.728
   514  264196       22.672       564  318096       23.749
   515  265225       22.694       565  319225       23.770
   516  266256       22.716       566  320356       23.791
   517  267289       22.738       567  321489       23.812
   518  268324       22.760       568  322624       23.833
   519  269361       22.782       569  323761       23.854
   520  270400       22.804       570  324900       23.875
   521  271441       22.825       571  326041       23.896
   522  272484       22.847       572  327184       23.917
   523  273529       22.869       573  328329       23.937
   524  274576       22.891       574  329476       23.958
   525  275625       22.913       575  330625       23.979
   526  276676       22.935       576  331776       24.000
   527  277729       22.956       577  332929       24.021
   528  278784       22.978       578  334084       24.042
   529  279841       23.000       579  335241       24.062
   530  280900       23.022       580  336400       24.083
   531  281961       23.043       581  337561       24.104
   532  283024       23.065       582  338724       24.125
   533  284089       23.087       583  339889       24.145
   534  285156       23.108       584  341056       24.166
   535  286225       23.130       585  342225       24.187
   536  287296       23.152       586  343396       24.207
   537  288369       23.173       587  344569       24.228
   538  289444       23.195       588  345744       24.249
   539  290521       23.216       589  346921       24.269
   540  291600       23.238       590  348100       24.290
   541  292681       23.259       591  349281       24.310
   542  293764       23.281       592  350464       24.331
   543  294849       23.302       593  351649       24.352
   544  295936       23.324       594  352836       24.372
   545  297025       23.345       595  354025       24.393
   546  298116       23.367       596  355216       24.413
   547  299209       23.388       597  356409       24.434
   548  300304       23.409       598  357604       24.454
   549  301401       23.431       599  358801       24.474
   550  302500       23.452       600  360000       24.495
Table of Squares and Square Roots, Continued

Number  Square  Square Root    Number  Square  Square Root
   601  361201       24.515       651  423801       25.515
   602  362404       24.536       652  425104       25.534
   603  363609       24.556       653  426409       25.554
   604  364816       24.576       654  427716       25.573
   605  366025       24.597       655  429025       25.593
   606  367236       24.617       656  430336       25.612
   607  368449       24.637       657  431649       25.632
   608  369664       24.658       658  432964       25.652
   609  370881       24.678       659  434281       25.671
   610  372100       24.698       660  435600       25.690
   611  373321       24.718       661  436921       25.710
   612  374544       24.739       662  438244       25.729
   613  375769       24.759       663  439569       25.749
   614  376996       24.779       664  440896       25.768
   615  378225       24.799       665  442225       25.788
   616  379456       24.819       666  443556       25.807
   617  380689       24.839       667  444889       25.826
   618  381924       24.860       668  446224       25.846
   619  383161       24.880       669  447561       25.865
   620  384400       24.900       670  448900       25.884
   621  385641       24.920       671  450241       25.904
   622  386884       24.940       672  451584       25.923
   623  388129       24.960       673  452929       25.942
   624  389376       24.980       674  454276       25.962
   625  390625       25.000       675  455625       25.981
   626  391876       25.020       676  456976       26.000
   627  393129       25.040       677  458329       26.019
   628  394384       25.060       678  459684       26.038
   629  395641       25.080       679  461041       26.058
   630  396900       25.100       680  462400       26.077
   631  398161       25.120       681  463761       26.096
   632  399424       25.140       682  465124       26.115
   633  400689       25.159       683  466489       26.134
   634  401956       25.179       684  467856       26.153
   635  403225       25.199       685  469225       26.173
   636  404496       25.219       686  470596       26.192
   637  405769       25.239       687  471969       26.211
   638  407044       25.259       688  473344       26.230
   639  408321       25.278       689  474721       26.249
   640  409600       25.298       690  476100       26.268
   641  410881       25.318       691  477481       26.287
   642  412164       25.338       692  478864       26.306
   643  413449       25.357       693  480249       26.325
   644  414736       25.377       694  481636       26.344
   645  416025       25.397       695  483025       26.363
   646  417316       25.417       696  484416       26.382
   647  418609       25.436       697  485809       26.401
   648  419904       25.456       698  487204       26.420
   649  421201       25.475       699  488601       26.439
   650  422500       25.495       700  490000       26.458
Table of Squares and Square Roots — Continued
Number  Square     Square Root  Number  Square     Square Root
701     49 14 01   26.476       751     56 40 01   27.404
702     49 28 04   26.495       752     56 55 04   27.423
703     49 42 09   26.514       753     56 70 09   27.441
704     49 56 16   26.533       754     56 85 16   27.459
705     49 70 25   26.552       755     57 00 25   27.477
706     49 84 36   26.571       756     57 15 36   27.495
707     49 98 49   26.589       757     57 30 49   27.514
708     50 12 64   26.608       758     57 45 64   27.532
709     50 26 81   26.627       759     57 60 81   27.550
710     50 41 00   26.646       760     57 76 00   27.568
711     50 55 21   26.665       761     57 91 21   27.586
712     50 69 44   26.683       762     58 06 44   27.604
713     50 83 69   26.702       763     58 21 69   27.622
714     50 97 96   26.721       764     58 36 96   27.641
715     51 12 25   26.739       765     58 52 25   27.659
716     51 26 56   26.758       766     58 67 56   27.677
717     51 40 89   26.777       767     58 82 89   27.695
718     51 55 24   26.796       768     58 98 24   27.713
719     51 69 61   26.814       769     59 13 61   27.731
720     51 84 00   26.833       770     59 29 00   27.749
721     51 98 41   26.851       771     59 44 41   27.767
722     52 12 84   26.870       772     59 59 84   27.785
723     52 27 29   26.889       773     59 75 29   27.803
724     52 41 76   26.907       774     59 90 76   27.821
725     52 56 25   26.926       775     60 06 25   27.839
726     52 70 76   26.944       776     60 21 76   27.857
727     52 85 29   26.963       777     60 37 29   27.875
728     52 99 84   26.981       778     60 52 84   27.893
729     53 14 41   27.000       779     60 68 41   27.911
730     53 29 00   27.019       780     60 84 00   27.928
731     53 43 61   27.037       781     60 99 61   27.946
732     53 58 24   27.055       782     61 15 24   27.964
733     53 72 89   27.074       783     61 30 89   27.982
734     53 87 56   27.092       784     61 46 56   28.000
735     54 02 25   27.111       785     61 62 25   28.018
736     54 16 96   27.129       786     61 77 96   28.036
737     54 31 69   27.148       787     61 93 69   28.054
738     54 46 44   27.166       788     62 09 44   28.071
739     54 61 21   27.185       789     62 25 21   28.089
740     54 76 00   27.203       790     62 41 00   28.107
741     54 90 81   27.221       791     62 56 81   28.125
742     55 05 64   27.240       792     62 72 64   28.142
743     55 20 49   27.258       793     62 88 49   28.160
744     55 35 36   27.276       794     63 04 36   28.178
745     55 50 25   27.295       795     63 20 25   28.196
746     55 65 16   27.313       796     63 36 16   28.213
747     55 80 09   27.331       797     63 52 09   28.231
748     55 95 04   27.350       798     63 68 04   28.249
749     56 10 01   27.368       799     63 84 01   28.267
750     56 25 00   27.386       800     64 00 00   28.284
Table of Squares and Square Roots — Continued
Number  Square     Square Root  Number  Square     Square Root
801     64 16 01   28.302       851     72 42 01   29.172
802     64 32 04   28.320       852     72 59 04   29.189
803     64 48 09   28.337       853     72 76 09   29.206
804     64 64 16   28.355       854     72 93 16   29.223
805     64 80 25   28.373       855     73 10 25   29.240
806     64 96 36   28.390       856     73 27 36   29.257
807     65 12 49   28.408       857     73 44 49   29.275
808     65 28 64   28.425       858     73 61 64   29.292
809     65 44 81   28.443       859     73 78 81   29.309
810     65 61 00   28.460       860     73 96 00   29.326
811     65 77 21   28.478       861     74 13 21   29.343
812     65 93 44   28.496       862     74 30 44   29.360
813     66 09 69   28.513       863     74 47 69   29.377
814     66 25 96   28.531       864     74 64 96   29.394
815     66 42 25   28.548       865     74 82 25   29.411
816     66 58 56   28.566       866     74 99 56   29.428
817     66 74 89   28.583       867     75 16 89   29.445
818     66 91 24   28.601       868     75 34 24   29.462
819     67 07 61   28.618       869     75 51 61   29.479
820     67 24 00   28.636       870     75 69 00   29.496
821     67 40 41   28.653       871     75 86 41   29.513
822     67 56 84   28.671       872     76 03 84   29.530
823     67 73 29   28.688       873     76 21 29   29.547
824     67 89 76   28.705       874     76 38 76   29.563
825     68 06 25   28.723       875     76 56 25   29.580
826     68 22 76   28.740       876     76 73 76   29.597
827     68 39 29   28.758       877     76 91 29   29.614
828     68 55 84   28.775       878     77 08 84   29.631
829     68 72 41   28.792       879     77 26 41   29.648
830     68 89 00   28.810       880     77 44 00   29.665
831     69 05 61   28.827       881     77 61 61   29.682
832     69 22 24   28.844       882     77 79 24   29.698
833     69 38 89   28.862       883     77 96 89   29.715
834     69 55 56   28.879       884     78 14 56   29.732
835     69 72 25   28.896       885     78 32 25   29.749
836     69 88 96   28.914       886     78 49 96   29.766
837     70 05 69   28.931       887     78 67 69   29.783
838     70 22 44   28.948       888     78 85 44   29.799
839     70 39 21   28.965       889     79 03 21   29.816
840     70 56 00   28.983       890     79 21 00   29.833
841     70 72 81   29.000       891     79 38 81   29.850
842     70 89 64   29.017       892     79 56 64   29.866
843     71 06 49   29.034       893     79 74 49   29.883
844     71 23 36   29.052       894     79 92 36   29.900
845     71 40 25   29.069       895     80 10 25   29.916
846     71 57 16   29.086       896     80 28 16   29.933
847     71 74 09   29.103       897     80 46 09   29.950
848     71 91 04   29.120       898     80 64 04   29.967
849     72 08 01   29.138       899     80 82 01   29.983
850     72 25 00   29.155       900     81 00 00   30.000
Table of Squares and Square Roots — Continued
Number  Square     Square Root  Number  Square     Square Root
901     81 18 01   30.017       951     90 44 01   30.838
902     81 36 04   30.033       952     90 63 04   30.854
903     81 54 09   30.050       953     90 82 09   30.871
904     81 72 16   30.067       954     91 01 16   30.887
905     81 90 25   30.083       955     91 20 25   30.903
906     82 08 36   30.100       956     91 39 36   30.919
907     82 26 49   30.116       957     91 58 49   30.935
908     82 44 64   30.133       958     91 77 64   30.952
909     82 62 81   30.150       959     91 96 81   30.968
910     82 81 00   30.166       960     92 16 00   30.984
911     82 99 21   30.183       961     92 35 21   31.000
912     83 17 44   30.199       962     92 54 44   31.016
913     83 35 69   30.216       963     92 73 69   31.032
914     83 53 96   30.232       964     92 92 96   31.048
915     83 72 25   30.249       965     93 12 25   31.064
916     83 90 56   30.265       966     93 31 56   31.081
917     84 08 89   30.282       967     93 50 89   31.097
918     84 27 24   30.299       968     93 70 24   31.113
919     84 45 61   30.315       969     93 89 61   31.129
920     84 64 00   30.332       970     94 09 00   31.145
921     84 82 41   30.348       971     94 28 41   31.161
922     85 00 84   30.364       972     94 47 84   31.177
923     85 19 29   30.381       973     94 67 29   31.193
924     85 37 76   30.397       974     94 86 76   31.209
925     85 56 25   30.414       975     95 06 25   31.225
926     85 74 76   30.430       976     95 25 76   31.241
927     85 93 29   30.447       977     95 45 29   31.257
928     86 11 84   30.463       978     95 64 84   31.273
929     86 30 41   30.480       979     95 84 41   31.289
930     86 49 00   30.496       980     96 04 00   31.305
931     86 67 61   30.512       981     96 23 61   31.321
932     86 86 24   30.529       982     96 43 24   31.337
933     87 04 89   30.545       983     96 62 89   31.353
934     87 23 56   30.561       984     96 82 56   31.369
935     87 42 25   30.578       985     97 02 25   31.385
936     87 60 96   30.594       986     97 21 96   31.401
937     87 79 69   30.610       987     97 41 69   31.417
938     87 98 44   30.627       988     97 61 44   31.432
939     88 17 21   30.643       989     97 81 21   31.448
940     88 36 00   30.659       990     98 01 00   31.464
941     88 54 81   30.676       991     98 20 81   31.480
942     88 73 64   30.692       992     98 40 64   31.496
943     88 92 49   30.708       993     98 60 49   31.512
944     89 11 36   30.725       994     98 80 36   31.528
945     89 30 25   30.741       995     99 00 25   31.544
946     89 49 16   30.757       996     99 20 16   31.559
947     89 68 09   30.773       997     99 40 09   31.575
948     89 87 04   30.790       998     99 60 04   31.591
949     90 06 01   30.806       999     99 80 01   31.607
950     90 25 00   30.822       1000    100 00 00  31.623
INDEX
Italics are used for Reference to Definitions.
Age-scale, 109, 110
Array, 155
Attenuation, 211; correction for,
212
Average, 8, 9, 28, 31, 50, 51; relia-
bility of an, 121
Average deviation or AD, 22, 23,
32, 34, 35, 51, 52
Axes, coordinate, 60; use in cor-
relation, 159, 175
Barlow's Tables, 302
Bias in sampling, 144. See Sam-
pling.
Binomial expansion, 79; in prob-
ability, 77-80; graphic repre-
sentation of, 80
Blakeman, J., test for linearity,
210
Bowley, A. L., 302
Bravais, 163
Brown, Wm., 269, 292
Brown and Thomson, 191, 218,
302
Burt, Cyril, 251
Carothers, F. E., 134, 280, 300
Central tendencies, 8-16; reliabil-
ity of measures of, 120-127
Classification of measures into fre-
quency distributions, 2-4
Class-interval. See Step-interval.
Coefficient of alienation, 289
Coefficient of contingency, 198;
computation of, 198-199; com-
parison with correlation coeffi-
cient, 200; short method of
computing, 201
Coefficient of correlation, 149;
as a ratio, 152-153; repre-
sented graphically, 158-159;
steps in computation of, from
guessed average, 163-168; steps
in computation of, from aver-
age, 169-170; reliability of,
170; interpretation of, 288-
299. See also Correlation.
Coefficient of regression, 175, 178
Coefficient of variation, calcula-
tion of, 41-42
Coin tossing, in experiments on
laws of chance, 79-81
Column diagram. See Histogram.
Comparison of groups in terms of
central tendencies and variabil-
ities, 42; in terms of overlap-
ping, 45
Comparison of obtained distribu-
tions with normal probability
curve, 81
Contingency method, 195-203.
See also Coefficient of contin-
gency.
Continuous series, 1; tabulation
of measures in, 2-7
Correction, computation of correction, C, in Short Method,
31; for attenuation, 211
Correlation, 149-152; positive,
negative, and zero, 150-151;
graphic representation of, 161-162;
construction of correlation table, 154;
product-moment method of computing,
163-170; rank methods of
computing, 189-195; spurious,
258; effect of errors of observa-
tion on, 211. See also Par-
tial correlation and Multiple
correlation.
Correlation-ratio; in non-linear
relation, 204-205; steps in
computing, 206; comparison
with r to determine linearity of
regression, 209-210; correction
of "raw" eta, 209; reliability
of, 208
Criterion, 266; value of, in deter-
mining validity of tests, 266-
267
Cumulative errors, effect on mul-
tiple R, 238-239
Deciles, 45. See Percentiles.
Deviation. See Quartile devia-
tion, Average deviation, and
Standard deviation.
Dice throwing, in experiments on
laws of chance, 80-81
Difference, reliability of, between
measures of central tendency,
128-137; reliability of, be-
tween two r's, 171. See Stand-
ard and Probable error.
Discrete series, 2; median in, 12;
short method applied to, 36
Elderton, W. P. and E. M., 301
Equation, of straight line, 175;
plotting of linear, 176-178; of
regression lines, in Deviation
Form, 178-179; in Score Form,
180-182
Error, curve of, 83. See also Nor-
mal curve.
Errors, of sampling, 143; of observation,
211; constant, 274; variable, 274.
See also Probable and Standard errors.
Footrule (Spearman's) in corre-
lation, 192-195
Frequency distribution, three
methods of constructing, 3-4
Frequency Polygon, 59-63; com-
parison with histogram, 65
Garrett, H. E., 114
Grades, method of, in correlation,
192. See also Footrule.
Graphic methods, of representing
data, in a frequency distribu-
tion, 59-71; of representing
correlation coefficient, 158-162
Grouping, in tabulation, 3; as-
sumptions in, 5
Heterogeneity, effect of, on cor-
relation, 259; on reliability, 271
Hillegas, Milo B., 108
Histogram, 63-66; comparison
with frequency polygon, 65
Holzinger, Karl J., 271
Homogeneity of a group, 17
Hull, Clark, method of transmut-
ing ranks, 111-115; method of
combining tests, 282, 300
Index of reliability, 273
Jerome, Harry, 82
Jones, D. Caradog, 83, 84, 174,
211, 302
Kelley, T. L., 33, 195, 254, 259,
263, 267, 272, 273, 289, 292, 302
Law of normal frequency, 82
Line graphs, 72-73
Line of means, best fitting line,
160, 173; plotting of, 175;
equation of, 175-182
Linearity of relation, 203; tests
for, 209-210
May, Mark A., 223, 224, 244, 263
McCall, W. A., 109, 110, 302
Mean, Arithmetic. See Average.
Mean deviation. See Average
deviation.
Median, 11, 12, 13, 38, 50;
reliability of, 126
Methods of combining test scores,
277; by percentiles, 278; by
median mental age, 279; by
variability of test scores, 279-
281; by conversion into com-
parable distributions, 281-284
Middle 50%, 21, 85
Midpoint of step, how to find, 6;
as representative of all the
scores on the step, 6
Midscore, in ungrouped discrete
series when N is even, 12; when
N is odd, 12
Miner, John Rice, 302
Monroe, W. S., 185, 302
Mode, 15, 16, 50
Moore, H. L., 255
Multiple coefficient of correlation,
R, 222; computation of, 230-
231; general formula for, 238;
"chance" R, 239; alternate
forms for, 239
Musselman, J. R., 261
Non-linear relation, 203-205
Normal curve, 74; deduction from
binomial expansion, 80; why
employed in psychological meas-
urement, 81-84; properties of,
84-85; use in the solution of a
variety of problems, 94ff; in
test making, 101-109; in trans-
mutation of ranks, 111-115; in
measuring reliability, 123, 131
Normal probability curve. See
Normal curve.
Normal frequency distribution,
83; illustrations of, 75
Ogive, 66; construction of, 67,
71; smoothing of, 68; in calcu-
lating percentiles, 69-70
Otis Correlation Chart, 167
Otis, A. S., 217, 259, 272, 302
Overlapping, in the measurement
of groups, 44-45; of elements
or factors in correlation, 291-
299
Partial correlation, 221; illus-
tration of, in three-variable
problem, 223-231; notation in,
232; general formulas for use
in, 231-240; models of four- and
five-variable problems, 240-244;
illustration of, in four-variable problem,
244-251; value of, in analysis and causal
investigations, 251ff; limitations
to use of, 258
Pearl, Raymond, 295, 297
Pearson, Karl, 163, 200, 205, 209
Percentile scale, 109; evaluation
of, 209
Percentiles, calculation of, 45ff;
percentile scores, 46; graphic
method of finding, 69; method
of combining scores from dif-
ferent tests, 278
Phillips, Frank M., 252
Pintner and Patterson, 49, 279
Probable error, relation to Q, 21;
relation to other measures of
variability in a normal distri-
bution, 85; use in solution of
problems, 94-109
Probable error, of an average,
125ff; of a median, 126; of a σ,
127; of a difference, 129; table
for finding reliability of a dif-
ference in terms of, 135; of a
coefficient of correlation, 170-
171
Probable error of estimate, in pre-
diction, 184-185; in partial and
multiple correlation, 237
Probable error of measurement,
274-276
Product-moment method of find-
ing r, deviations from GA, 163-
168; deviations from average,
168-170
Pyle, W. H., 279
Quartile deviation (Q), 17, 18-
22; in discrete series, 40; when
to use, 50
Quartiles, Q1 and Q3, computation
of, 18-19
r, Product-moment coefficient of
correlation, formulas for, 167,
168. See Coefficient of correla-
tion, and Correlation.
Random sample, 142-145. See
also Sampling.
Range, 2, 17, 50
Rank difference method of com-
puting correlation, 189ff; when
to use, 195
Ranks, transmutation of, into
units of amount, 111ff
Reavis, George, 253
Reduced scores, in combining
test scores, 283-284; in com-
putation of r, 285
Regression equations, deviation
form, 174f; in score form, 180f;
partial equations of, 235; non-
linear, 203ff
Regression coefficients, 174, 178
Relative variability, measures of,
40. See also Coefficient of
variation.
Reliability, measures of, 118-137;
limitations to measures of, 142-
145; coefficient of, 268-271;
dependence of coefficient of, on
size and variability of group,
271-272; index of, 273. See
also Probable error and Stand-
ard error.
Rietz, H. L., et al., 302
Rosenow, Curt, 239
Ruch-Stoddard Correlation Sheet,
167
Ruch, G. M. and Del Manzo,
M. C., 271, 299
Rugg, H. O., 301
Sampling, random, 120; errors
of, 142-143; unreliability due
to, 144; criteria of, 144
Scaling total scores, 109. See also
Percentile scale, Age-scale, T-
scale.
Scatter diagram, 154
Score, meaning of, 7
Secrist, Horace, 302
Semi-interquartile range, 21. See
Quartile deviation.
Skewness, 86-89
Sommerville, R. C., 56, 219
Spearman, C., 212, 213
Spearman's Footrule, 192; prophecy
formula, 269
Spurious Correlation, 258-261
Standard deviation (σ), 26, 27, 35;
relation to other measures of
variability, 85; reliability of,
127; general formulas for partial
σ's, 233-235; of the sum or
difference of corresponding val-
ues of two series of test scores,
286-288
Standard error, of an average,
121-125; of a median, 126;
of a σ, 127; of a Q, 128; of a
difference, 128-133; table for
finding the reliability of a dif-
ference in terms of, 134; of a
sum or difference, measures
correlated, and uncorrelated,
187
Standard error of estimate, in
prediction, 183; in partial and
multiple correlation, 237; in
interpreting r, 288-290
Standard error of measurement,
274-276; in interpreting r, 290-291
Step-interval, 2, 3, 4, 5; midpoint
of, 5-6; assumptions with re-
gard to data on, 5-6
Tables of frequencies of normal
probability curve, in terms of σ,
91; in terms of PE, 93
Tabulation, of measures into fre-
quency distribution, 3f; of
correlation table, 154
Thorndike, E. L., 88, 301
Thurstone Correlation Sheet, 167
Thurstone, L. L., 302
Trabue, M. R., 127, 137
Transmutation of ranks into units
of amount, 111
T-scale, 110
True scores, 118, 272-273
Validity, measurement of, in a
test, 266-268
Variable errors, effect on r, 211;
measurement of, 274-276
Variability, 16; causes of, 82, 88;
comparison of groups with re-
spect to, 42-44; coefficient of
relative, 41; reliability of measures
of, 127-128. See also Average deviation,
Quartile deviation, and Standard deviation.
Weighting of tests, by variability
of test scores, 279
Whitley, M. T., 267
Whipple, G. M., 279
Woody, Clifford, 104, 105, 107
Woodworth, R. S., method of
combining tests, 283; use of
"reduced scores" in computing
r, 285
Yule, G. Udny, 80, 121, 122, 196,
200, 210, 212, 218, 221, 237, 286,
302