

American Foundation
for the Blind, Inc.


Digitized  by  the  Internet  Archive 
in 2012 with funding from
Lyrasis  Members  and  Sloan  Foundation 


http://www.archive.org/details/statisticsinpsycOOhenr 


STATISTICS  IN  PSYCHOLOGY 
AND   EDUCATION 




BY 

HENRY  E.   GARRETT 

ASSISTANT   PROFESSOR    OF    PSYCHOLOGY,    COLUMBIA    UNIVERSITY 


WITH   AN   INTRODUCTION   BY 

R.  S.  WOODWORTH 

PROFESSOR   OF   PSYCHOLOGY,    COLUMBIA    UNIVERSITY 


LONGMANS,   GREEN  AND  CO. 

55    FIFTH     AVENUE,     NEW    YORK 

CHICAGO,  TORONTO,  LONDON 

1926 


Copyright,  1926,  by 
LONGMANS,  GREEN  AND  CO. 


First Edition, January, 1926
Reprinted,  November,  1926 


MADE IN THE UNITED STATES


INTRODUCTION 

Modern  problems  and  needs  are  forcing  statistical  methods 
and  statistical  ideas  more  and  more  to  the  fore.  There  are  so 
many  things  we  wish  to  know  which  cannot  be  discovered  by  a 
single  observation,  or  by  a  single  measurement.  We  wish  to 
envisage  the  behavior  of  a  man  who,  like  all  men,  is  rather  a 
variable  quantity,  and  must  be  observed  repeatedly  and  not 
once  for  all.  We  wish  to  study  the  social  group,  composed  of 
individuals  differing  one  from  another.  We  should  like  to  be 
able  to  compare  one  group  with  another,  one  race  with  another, 
as well as one individual with another individual, or the
individual with the norm for his age, race or class. We wish to
trace  the  curve  which  pictures  the  growth  of  a  child,  or  of  a 
population.  We  wish  to  disentangle  the  interwoven  factors  of 
heredity  and  environment  which  influence  the  development  of 
the  individual,  and  to  measure  the  similarly  interwoven  effects 
of  laws,  social  customs  and  economic  conditions  upon  public 
health,  safety  and  welfare  generally.  Even  if  our  statistical 
appetite  is  far  from  keen,  we  all  of  us  should  like  to  know  enough 
to  understand,  or  to  withstand,  the  statistics  that  are  constantly 
being  thrown  at  us  in  print  or  conversation— much  of  it  pretty 
bad  statistics.  The  only  cure  for  bad  statistics  is  apparently 
more  and  better  statistics.  All  in  all,  it  certainly  appears  that 
the  rudiments  of  sound  statistical  sense  are  coming  to  be  an 
essential  of  a  liberal  education. 

Now  there  are  different  orders  of  statisticians.  There  is, 
first  in  order,  the  mathematician  who  invents  the  method  for 
performing  a  certain  type  of  statistical  job.  His  interest,  as  a 
mathematician,  is  not  in  the  educational,  social  or  psychological 
problems just alluded to, but in the problem of devising instruments
for handling such matters. He is the tool-maker of the
statistical  industry,  and  one  good  tool-maker  can  supply  many 
skilled workers. The latter are quite another order of
statisticians. Supply them with the mathematician's formulas, map
out  the  procedure  for  them  to  follow,  provide  working  charts, 
tables  and  calculating  machines,  and  they  will  compute  from 
your data the necessary averages, probable errors and
correlation coefficients. Their interest, as computers, lies in the quick
and  accurate  handling  of  the  tools  of  the  trade.  But  there  is 
a  statistician  of  yet  another  order,  in  between  the  other  two. 
His  primary  interest  is  psychological,  perhaps,  or  it  may  be 
educational.  It  is  he  who  has  selected  the  scientific  or  practical 
problem,  who  has  organized  his  attack  upon  the  problem  in 
such  fashion  that  the  data  obtained  can  be  handled  in  some 
sound  statistical  way.  He  selects  the  statistical  tools  to  be 
employed,  and,  when  the  computers  have  done  their  work,  he 
scrutinizes  the  results  for  their  bearing  upon  the  scientific  or 
practical  problem  with  which  he  started.  Such  an  one,  in 
short,  must  have  a  discriminating  knowledge  of  the  kit  of 
tools  which  the  mathematician  has  handed  him,  as  well  as  some 
skill  in  their  actual  use. 

The  reader  of  the  present  book  will  quickly  discern  that  it 
is  intended  primarily  for  statisticians  of  the  last-mentioned 
type.  It  lays  out  before  him  the  tools  of  the  trade;  it  explains 
very  fully  and  carefully  the  manner  of  handling  each  tool;  it 
affords  practice  in  the  use  of  each.  While  it  has  little  to  say  of 
the  tool-maker's  art,  it  takes  great  pains  to  make  clear  the  use 
and  limitations  of  each  tool.  As  any  one  can  readily  see  who 
has  tried  to  teach  statistics  to  the  class  of  students  who  most 
need  to  know  the  subject,  this  book  is  the  product  of  a  genuine 
teacher's  experience,  and  is  exceptionally  well  adapted  to  the 
student's  use.  To  an  unusual  degree,  it  succeeds  in  meeting 
the  student  upon  his  own  ground. 

R.  S.  Woodworth 
Columbia  University 


PREFACE 

The present-day emphasis on measurement and the
quantitative treatment of results has made a knowledge of statistical
method  not  only  extremely  useful  but  almost  necessary  to  the 
student  of  psychology,  education,  and  the  social  sciences.  To 
those who have been well trained in mathematics, the
acquisition of statistical technique offers no particular difficulty. To
many  otherwise  capable  students,  however,  either  because  of 
inadequate preparation in mathematics, or because their
preparation is not very recent, the application of statistical method
to  data  obtained  from  test  and  experiment  is  more  than 
ordinarily  difficult. 

It  is  for  this  last  group  of  students,  especially,  that  this 
book  has  been  written.  Its  primary  purpose  is  to  present  the 
subject  in  a  simple  and  concise  form  understandable  to  those 
who  have  no  previous  knowledge  of  statistical  method.  With 
this  end  in  view,  theory  has  everywhere  been  subordinated  to 
practical  application,  and  numerous  illustrations  of  the  various 
statistical  devices  have  been  provided.  References  have  been 
given, however, for the benefit of those interested in the
mathematical theory underlying the methods introduced.

The  reader  will  note  that  in  nearly  all  cases  formulas  have 
simply  been  stated  without  proof.  This  has  been  done,  because 
the  writer  believes  that  most  students  of  mental  and  social 
measurement  are — and  probably  should  be — more  concerned 
with what a formula means and does than with how it is derived.
There  is  considerable  justification  for  such  an  attitude.  In 
every  science  certain  facts  obtained  from  other  fields  must  be 
taken  on  faith.  We  do  not,  to  take  a  simple  example,  restrict 
the  use  of  the  radio  or  the  microscope  to  those  who  understand 
the physical principles involved, and there seems to be no real
reason why a student of psychology should not make good use
of a correlation formula when he cannot derive it
mathematically.

A  chapter  has  been  given  to  the  subject  of  reliability — a 
topic  too  often  passed  over  lightly — and  considerable  space  has 
been  devoted  to  correlation.  An  entire  chapter,  also,  has  been 
given  to  partial  and  multiple  correlation.  This  method,  while 
comparatively  recent,  is  being  widely  used  in  educational 
research,  and  is  probably  destined  in  the  near  future  to  be  more 
often  used  in  the  psychological  laboratory.  In  the  last  chapter, 
the application of correlation and other statistical methods to
tests and testing is shown.

Many have contributed to the making of this book, of whom
only a few can be mentioned. To Professors R. S. Woodworth
and Mark A. May, who read the manuscript, the writer is
indebted for many useful and constructive criticisms. He is
also grateful to Dr. M. R. Neifeld, to Mr. V. W. Lemmon,
and to Miss Elizabeth Farber for computations and helpful
suggestions.

Henry  E.  Garrett 
Columbia  University 


CONTENTS


CHAPTER I
THE FREQUENCY DISTRIBUTION

SECTION ... PAGE

I. The Tabulation of Measures into a Frequency Distribution ... 1
   1. Measures in General: Continuous and Discrete ... 1
   2. Classification of Measures in Continuous Series ... 2
   3. Three Ways of Expressing the Limits of a Step-interval ... 5
   4. The Meaning of a Single Score in a Continuous Series ... 7
II. Measures of Central Tendency ... 8
   1. The Average, or Arithmetic Mean ... 8
   2. The Median ... 11
   3. The Mode ... 15
III. Measures of Variability ... 16
   1. The Range ... 17
   2. The Quartile Deviation, or Q ... 17
   3. The Average Deviation, or AD ... 22
   4. The Standard Deviation, or SD ... 26
IV. The Short Method of Finding the Average, AD, and SD (σ) ... 28
   1. The Calculation of the Average by the Short Method ... 28
   2. The Calculation of the AD by the Short Method ... 32
      A. The Calculation of the AD from the Average ... 32
      B. The Calculation of the AD from the Median ... 35
   3. The Calculation of the Standard Deviation by the Short Method ... 35
   4. The Short Method Applied to Discrete Series ... 36
V. The Comparison of Groups ... 40
   1. The Measurement of Relative Variability ... 40
   2. The Comparison of Two Groups in Terms of Central Tendency and Variability ... 42
   3. The Comparison of Two Groups in Terms of Overlapping ... 44
VI. The Calculation of the Percentiles in a Frequency Distribution ... 45
VII. When to Use the Different Measures of Central Tendency and Variability ... 50
VIII. Summary of Formulas for Finding the Measures of Central Tendency and Variability ... 51
IX. Illustrative Problems ... 53


CHAPTER II
GRAPHIC METHODS AND THE NORMAL CURVE

I. The Graphic Representation of the Frequency Distribution ... 59
   1. The Frequency Polygon ... 59
   2. The Histogram or Column Diagram ... 63
   3. The Ogive, or Cumulative Frequency Graph ... 66
II. Other Uses of Graphical Methods: the Comparative Line Graph ... 71
III. The Normal Probability Curve ... 74
   1. Elementary Principles of Probability ... 76
   2. Why the Probability Curve is Employed in Psychological Measurement ... 81
   3. Important Properties of the Normal Curve ... 84
   4. The Measurement of Skewness ... 86
IV. Some Practical Applications of the Normal Curve ... 89
   1. The Construction and Use of Tables X and XI ... 89
   2. A Variety of Problems Solved by Means of Tables X and XI ... 94
   3. The Arrangement of Problems or other Test Items into a Scale in Which the Difficulty of Each Item is Known with Reference to Each Other Item as Well as Some Selected Zero Point ... 101
   4. The Conversion of Judgments by Relative Position (or Relative Merit) into σ or PE Positions on the Scale ... 107
   5. The Scaling of Total Scores on a Test ... 109
V. The Transmutation of Measures by Relative Position (in Order of Merit) into Units of Amount on the Assumption of Normality in the Trait Measured ... 111

CHAPTER III
THE RELIABILITY OF MEASURES

I. What is Meant by the Reliability of a Measure ... 118
II. The Reliability of Measures of Central Tendency ... 120
   1. The Reliability of the Average or Mean ... 120
      A. In Terms of the Standard Error, σav ... 120
      B. In Terms of the Probable Error, PEav ... 125
   2. The Reliability of the Median ... 126
III. The Reliability of Measures of Variability ... 127
   1. The Standard Deviation, or σ ... 127
   2. The Quartile Deviation, or Q ... 128
IV. The Reliability of the Difference between Two Measures ... 128
   1. The Reliability of the Difference between Two Averages ... 128
      A. In Terms of the σ(diff.) ... 128
      B. In Terms of the PE(diff.) ... 133
   2. The Reliability of the Difference between Two Medians ... 136
V. Some Problems which Involve Measures of Reliability ... 138
VI. Limitations to the Reliability Formulas, and Cautions to be Observed in Interpreting Them ... 142
VII. Summary of Reliability Formulas ... 145

CHAPTER IV
CORRELATION

I. What is Meant by Correlation ... 149
II. The Coefficient of Correlation: What it is, and what it Does ... 152
   1. The Coefficient of Correlation as a Ratio ... 152
   2. Graphical Representation of the Coefficient of Correlation ... 158
III. The Calculation of the Coefficient of Correlation by the Product-moment Method ... 163
   1. The Product-moment Formula when Deviations are Taken from the Guessed Averages of the Two Distributions ... 163
   2. The Product-moment Formula when Deviations are Taken from the Actual Averages of the Two Distributions ... 168
IV. The Probable Error of a Coefficient of Correlation ... 170
   1. The PEr ... 170
   2. The PE of the Difference between Two r's ... 171
V. The Regression Equations ... 173
   1. In Deviation Form ... 173
   2. The Regression Equations in Score Form ... 180
   3. The Reliability of the "Predictions" made from the Regression Equations ... 183
VI. The Complete Solution of a Correlation Problem ... 185
VII. Methods of Measuring Correlation which Take Account only of the Relative Position or Rank ... 189
   1. The Method of Rank-differences ... 190
   2. The Method of Gains, or the Spearman Footrule ... 192
   3. Summary of the Rank Methods ... 195
VIII. A Method of Measuring Relationship when the Data are Grouped into Classes or Categories. The Contingency Method ... 195
IX. Non-linear Relationship ... 203
   1. The Correlation Ratio ... 203
   2. The Correction of "raw" eta ... 209
   3. Test of Linearity of Regression ... 209
X. The Correction of a Coefficient of Correlation for "Attenuation" ... 211
XI. Summary of Formulas in Chapter IV ... 213

CHAPTER V
PARTIAL AND MULTIPLE CORRELATION

I. The Meaning of Partial and Multiple Correlation ... 221
II. A Correlation Problem Involving 3 Variables ... 223
III. General Formulas for Use in Partial and Multiple Correlation ... 231
   1. General Formulas for Partial r's ... 231
   2. General Formulas for Partial σ's of any Order ... 233
   3. General Formulas for the Regression Equation, and Coefficients of Regression ... 235
   4. General Formulas for Standard and Probable Errors of Estimate ... 237
   5. General Formula for R, the Coefficient of Multiple Correlation ... 238
   6. Outline of the Formulas Needed in Correlation Problems which Involve (a) Four Variables, (b) Five Variables ... 240
IV. A Multiple Correlation Problem Involving 4 Variables ... 244
V. The Value and Use of Partial and Multiple Correlation ... 251
VI. Spurious Correlation ... 258
   1. Spurious Correlation Due to Heterogeneity of Material ... 258
   2. Spurious Index Correlation ... 260
   3. Spurious Correlation of a Single Test with a Composite of which it is a Member ... 260
VII. Summary of Formulas in Chapter V ... 261

CHAPTER VI
SOME APPLICATIONS OF STATISTICAL METHOD AND TECHNIQUE TO TESTS AND TEST RESULTS

I. The Validity of Test Scores ... 266
   1. Validity Determined through Correlation with a Criterion ... 266
   2. Indirect Measures of Validity ... 267
II. The Reliability of Test Scores ... 268
   1. The Reliability of a Test as Measured by its Self-Correlation ... 268
      (A) The "Reliability Coefficient" ... 268
      (B) Effect on Reliability of Lengthening or Repeating the Test ... 269
      (C) Coefficient of Reliability from One Application of a Test ... 271
      (D) Dependence of the Reliability Coefficient on the Size and Variability of the Group ... 271
   2. The Index of Reliability ... 272
   3. The Standard Error and Probable Error of Measurement: σ(M) and PE(M) ... 274
III. Combining the Scores from Different Tests ... 277
   1. Combining Test Scores by Percentiles ... 278
   2. Combining Test Scores by the Method of Median Mental Age ... 279
   3. Combining Tests which have been Weighted According to the Variability of the Test Scores ... 279
   4. Combining Test Scores by Converting the Scores of Different Tests into Comparable Series ... 281
IV. The σ of the Sum or Difference of Corresponding Values of Two Series of Test Scores ... 286
V. How to Interpret the Coefficient of Correlation between Two Tests or Other Measures ... 288
   1. The Interpretation of a Coefficient of Correlation in Terms of σ(est.) ... 288
   2. The Interpretation of a Coefficient of Correlation in Terms of the Standard Error of Measurement, σ(M) ... 290
   3. Interpretation of a Coefficient of Correlation in Terms of the Percentage of Common (Overlapping) Elements or Factors ... 291




CHAPTER  I 

THE    FREQUENCY   DISTRIBUTION 

I. The Tabulation of Measures into a Frequency Distribution

1. Measures in General: Continuous and Discrete Series

In  the  measurement  of  mental  and  social  traits  or  capacities 
most  of  the  facts  with  which  we  deal  fall  into  what  are  known 
as  continuous  series.  A  continuous  series  may  be  defined 
simply  as  a  series  which  is  theoretically  capable  of  any  degree 
of subdivision. IQ's, for example, are generally thought of as
increasing  by  increments  of  1  on  a  scale  which  extends  from  the 
idiot  to  the  genius;  however,  there  is  actually  no  real  reason — 
at  least  theoretically — why  with  more  refined  methods  of 
measurement  we  should  not  be  able  to  get  IQ's  of  100.8  or  even 
100.83. Nearly all capacities measured by mental and
educational tests and scales, as well as such attributes as height,
weight,  cephalic  index,  etc.,  have  been  found  to  be  continuous, 
so that within the range of the scale used, any measure,
integral or fractional, may exist and have meaning. Whenever
gaps occur in a truly continuous series, therefore, these are
usually  to  be  attributed  to  our  failure  to  measure  enough  cases, 
or to the relative crudity of our measuring instruments, or
to some other fact of the same sort, rather than to the fact that
no  measures  exist  within  the  gaps. 

There are, however, measures which do not fall into continuous
series. Thus a salary scale in a department store may run
from  $10  per  week  to  $20  per  week  in  units  of  50  cents  or  $1.00; 
no  one  receives,  let  us  say,  $17.53  per  week.  Or,  to  take 
another  example,  the  average  family  in  a  certain  locality  may 
work  out  mathematically  to  be  4.57  children,  although  there  is 
obviously  a  real  gap  between  four  and  five  children.  Series 
like these, which contain real gaps, are called discrete or
discontinuous.

It is probably fortunate, at least from the standpoint of the
beginner in statistics, that nearly all of the measures which we
make in psychology are continuous or can be treated as
continuous. This considerably simplifies the problem, inasmuch as
we  may  concern  ourselves  (for  the  present  at  least)  almost 
entirely  with  methods  of  handling  continuous  data,  postponing 
the  discussion  of  discrete  series  to  a  later  page. 

2.  The  Classification  of  Measures  in  Continuous  Series 

Data  collected  from  test  or  experiment  are  often  merely  a 
series of numbers or mass of figures without meaning or
significance until they have been rearranged or classified in some
systematic way. The first task that confronts us, then, is the
organization of our material, and this leads naturally to a
grouping of the measures into classes or categories. The
procedure in grouping falls under three main heads, which are
given  in  order  below: 

(1)  The  determination  of  the  range:  the  interval  between 
the  largest  and  the  smallest  measures.  The  range  is  easily 
found  by  subtracting  the  smallest  from  the  largest  measure. 

(2)  Deciding  upon  the  number  and  size  of  the  groups  to  be 
used  in  classification.  The  number  and  the  size  of  these  steps 
or  class-intervals  depend  largely  upon  the  range  and  the  kind  of 
measures  with  which  we  are  dealing. 




(3)  The  tabulation  of  the  separate  measures  within  their 
proper  step-  or  class-intervals. 


TABLE I

Army Alpha Scores Made by 54 Columbia College Men

1. THE ORIGINAL SCORES (UNGROUPED)

185   201*  188   195   176
174   158   197   176   138
127   160   151   185   185
183   179   188   185   155
168   184   188   179   178
126*  155   169   146   151
177   137   195   182   144
154   177   165   153   191
157   164   185   158   170
189   198   188   160   157
172   176   164   191

* Maximum score = 201; minimum score = 126.

2. THE SAME SCORES GROUPED INTO A FREQUENCY DISTRIBUTION BY THREE METHODS

(A)                                     (B)                (C)
(1)               (2)           (3)
Scores            Tabulation      F     Scores        F    Scores     F
200 up to 205     /               1     200-204.99    1    200-204    1
195 up to 200     ////            4     195-199.99    4    195-199    4
190 up to 195     //              2     190-194.99    2    190-194    2
185 up to 190     ///// /////    10     185-189.99   10    185-189   10
180 up to 185     ///             3     180-184.99    3    180-184    3
175 up to 180     ///// ///       8     175-179.99    8    175-179    8
170 up to 175     ///             3     170-174.99    3    170-174    3
165 up to 170     ///             3     165-169.99    3    165-169    3
160 up to 165     ////            4     160-164.99    4    160-164    4
155 up to 160     ///// /         6     155-159.99    6    155-159    6
150 up to 155     ////            4     150-154.99    4    150-154    4
145 up to 150     /               1     145-149.99    1    145-149    1
140 up to 145     /               1     140-144.99    1    140-144    1
135 up to 140     //              2     135-139.99    2    135-139    2
130 up to 135                     0     130-134.99    0    130-134    0
125 up to 130     //              2     125-129.99    2    125-129    2
                             N = 54              N = 54          N = 54

These  three  principles  of  classification  are  illustrated  in 
Table  I.  The  figures  in  this  table  represent  the  Army  Alpha 
scores  received  by  54  college  men.  Since  the  highest  score  is 
201,  and  the  lowest  126,  the  range  is  found  at  once  to  be  exactly 
75 points. In deciding upon the number of "steps" or
class-intervals to be used in grouping, the best general rule is to select
by trial a step-interval which will yield not more than 20 nor
fewer than 10 steps. The number of steps which a given interval
will yield can be determined approximately (within one step)
by dividing the range by the step tentatively chosen. In the
present  problem,  for  example,  75  (the  range)  divided  by  5  (the 
step-interval)  gives  15,  which  is  one  less  than  the  actual  number 
of  steps,  namely  16.  A  step-interval  of  3  points  will  yield 
approximately  25  steps,  while  a  step-interval  of  10  points  will 
yield  approximately  7.5  steps.  (Actually,  for  the  given  data,  a 
step-interval  of  3  points  yields  26  steps,  and  one  of  10  points 
8  steps.) 
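The arithmetic of choosing a step-interval can be checked with a short sketch in Python (the book itself gives no code). The function name `number_of_steps` is ours, and the sketch assumes, as in the parenthetical count above, that the first step begins at the lowest score unless another starting point is given, with each step including its lower limit but not its upper limit. Under those assumptions it reproduces the figures quoted for these 54 scores.

```python
def number_of_steps(lowest, highest, interval, start=None):
    """Count the step-intervals of width `interval` needed to cover
    every score from `lowest` to `highest`.

    `start` is the lower limit of the first step (default: the lowest
    score). Step k runs from start + k*interval up to, but not including,
    start + (k + 1)*interval.
    """
    if start is None:
        start = lowest
    if start > lowest:
        raise ValueError("the first step must begin at or below the lowest score")
    # Whole steps between start and the highest score, plus the step holding it.
    return (highest - start) // interval + 1


lowest, highest = 126, 201        # extreme Army Alpha scores in Table I
score_range = highest - lowest    # 75 points

for interval in (3, 5, 10):
    estimate = score_range / interval          # quick rule: range / step
    actual = number_of_steps(lowest, highest, interval)
    print(f"step {interval:2d}: range/step = {estimate:4.1f}, actual steps = {actual}")

# Beginning the 5-point steps at 125, a multiple of 5 as in Table I, also gives 16.
print(number_of_steps(lowest, highest, 5, start=125))
```

Because the quick rule omits the step that holds the highest score, range ÷ step falls short of the true count by at most one step, which is why the estimate is said to hold only "within one step."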

The tabulation of the separate scores within their appropriate
step- or class-intervals is shown in Table I(2A). In the
first column of this table (the column marked "Scores"),
the  step-intervals  have  been  listed  serially,  with  the  smallest 
measures  at  the  bottom  of  the  column.  The  first  interval, 
"125  up  to  130,"  begins  at  125  and  ends  at  130;  the  second 
interval  "130  up  to  135,"  begins  at  130  and  ends  at  135  and 
so  on.  The  last  interval,  "200  up  to  205,"  begins  at  200  and 
ends  at  205.  In  column  2,  marked  "Tabulation,"  the  separate 
scores  have  been  listed  opposite  their  proper  intervals.  The 
first score, 185 [see Table I(1)], is represented by a tally placed
opposite  step-interval  "185  up  to  190";  the  second  score,  201, 
by  a  tally  placed  opposite  step-interval  "200  up  to  205";  the 
third  score,  188,  by  a  tally  placed  opposite  "185  up  to  190" 
and  so  on  for  the  other  scores.  When  all  54  scores  have  been 
listed,  the  total  number  of  tallies  on  each  step-interval  (i.e., 
the  frequency)  is  written  in  column  3,  headed  F  (frequencies). 
The  sum  of  the  F  column  is  called  N.  In  the  present  case,  of 
course,  N  =  54.  When  the  total  frequency  of  each  step-interval 
has  been  tabulated  opposite  its  proper  step-interval,  as  shown 
in  column  3,  our  54  Alpha  scores  are  arranged  into  what  is 
known  as  a  Frequency  Distribution. 
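The tallying just described is easy to mechanize. The sketch below is an illustrative Python version, not a procedure from the book: `SCORES` holds the 54 scores of Table I(1) read row by row, and `frequency_distribution` (a hypothetical helper) places each score on the step whose lower limit is a multiple of 5 at or below it, so that a step includes its lower limit but not its upper one. The printed tallies and frequencies should agree with Table I(2A).

```python
from collections import Counter

# The 54 Army Alpha scores of Table I(1), read row by row.
SCORES = [
    185, 201, 188, 195, 176,
    174, 158, 197, 176, 138,
    127, 160, 151, 185, 185,
    183, 179, 188, 185, 155,
    168, 184, 188, 179, 178,
    126, 155, 169, 146, 151,
    177, 137, 195, 182, 144,
    154, 177, 165, 153, 191,
    157, 164, 185, 158, 170,
    189, 198, 188, 160, 157,
    172, 176, 164, 191,
]


def frequency_distribution(scores, interval=5, start=None):
    """Return (lower, upper, f) for each step, highest step first.

    Each step includes its lower limit and excludes its upper limit.
    By default the first step begins at the multiple of `interval`
    at or below the lowest score (125 for these data).
    """
    if start is None:
        start = min(scores) // interval * interval
    counts = Counter((x - start) // interval for x in scores)
    top = (max(scores) - start) // interval
    rows = []
    for k in range(top, -1, -1):  # empty steps such as 130-134 keep f = 0
        lower = start + k * interval
        rows.append((lower, lower + interval, counts[k]))
    return rows


table = frequency_distribution(SCORES)
for lower, upper, f in table:
    print(f"{lower} up to {upper}   {'/' * f:<10} {f:2d}")
print("N =", sum(f for _, _, f in table))
```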

The  reader  will  note  that  the  lower  limit  of  the  first  step  in 
the  distribution  (i.e.,  125  up  to  130)  has  been  taken  at  125 
although  the  lowest  actual  score  in  the  series  is  126.  This  is 
due  to  the  fact  that  when  the  step-interval  equals  5  units,  it 
facilitates  tabulation  as  well  as  computations  which  come  later 
on, if the lower limit of the first step-interval (and accordingly
of each succeeding step-interval) is a multiple of 5. A
step-interval of 126 up to 131 is just as good as a step-interval of
125  up  to  130,  theoretically;  the  second,  however,  is  much 
easier  to  handle  from  the  standpoint  of  the  arithmetic  involved. 

3.  Three  Ways  of  Expressing  the  Limits  of  a  Step-interval 

Table  I  (2  A,B,C)  illustrates  three  ways  of  writing  the  limits 
of  a  step-interval.  In  (A)  the  interval  "125  up  to  130"  means 
that  all  scores  from  125  up  to  but  not  including  130  fall  on  this 
step.  In  (B)  the  step-interval  125-129.99  means  exactly  the 
same  thing.  The  upper  limit  is  written  129.99  simply  to 
emphasize  the  fact  that  this  step-interval  includes  score  129 
plus  fractional  parts  up  to  130,  but  does  not  include  score  130. 
(C)  expresses  the  same  facts  more  clearly  than  (A)  and  not  so 
exactly  as  (B).  Thus  125-129  means  that  this  step-interval 
begins  with  score  125  and  ends  with  score  129.  A  diagram  will 
indicate how (A), (B), and (C) are simply three ways of
expressing the same facts.

Step begins                                         Step ends
|     1     |     2     |     3     |     4     |     5     |
125         126         127         128         129         130

Either  method  (B)  or  method  (C)  is  advised  as  preferable 
to  (A).  It  is  fairly  easy — even  when  one  is  on  guard — to  let 
a  score  of  say  160  slip  into  the  step-interval  155  up  to  160  due 
simply  to  the  presence  of  the  160  at  the  upper  limit  of  the  step. 
The  accurate  tabulation  of  a  frequency  distribution  depends 
on  getting  each  score  into  its  proper  step-interval,  and  for  this 
reason  one  cannot  be  too  careful  in  defining  the  limits  of  the 
steps. 
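One way to avoid the slip just described is to assign scores by arithmetic rather than by eye. In this hedged Python sketch, `limits` writes a step in the three styles of Table I(2), and `step_for` uses integer division so that a score equal to an upper limit, such as 160, always lands on the next step. Both function names and the default start of 125 are assumptions chosen to match Table I.

```python
def limits(lower, interval=5):
    """Write one step's limits in the three styles (A), (B), (C) of Table I(2)."""
    upper = lower + interval
    return {
        "A": f"{lower} up to {upper}",        # lower limit included, upper excluded
        "B": f"{lower}-{upper - 0.01:.2f}",   # e.g. 125-129.99
        "C": f"{lower}-{upper - 1}",          # first and last whole scores on the step
    }


def step_for(score, start=125, interval=5):
    """Lower limit of the step holding `score` (steps begin at `start`).

    Integer division sends a score equal to an upper limit, such as 160,
    to the next step (160-164), never to 155-159.
    """
    return start + (score - start) // interval * interval


print(limits(125))
for score in (155, 159, 159.99, 160):
    print(score, "->", step_for(score))
```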

In  any  frequency  distribution  we  always  assume  that  the 
scores  within  a  given  interval  (i.e.,  the  frequency)  are  spread 
evenly  over  the  entire  interval;  and  this  assumption  holds 
whether  the  length  of  the  step  is  3,  5  or  10  units.  If  we  wish  to 
represent  all  of  the  scores  within  a  given  interval  by  some 
single value, however, the midpoint of the interval is taken as
the most logical choice. To illustrate, in the step-interval
155-159  [see  Table  I  (2  C)]  the  six  scores  on  this  step  are  all 
represented  by  the  same  value,  157.50,  the  midpoint  of  the 
interval,  although  the  scores  are  155,  155,  157,  157,  158,  158. 
The  reason  why  157.50  is  the  midpoint  of  the  step-interval  can 
be  shown  graphically  as  follows: 

Step begins                                         Step ends
|     1     |     2     |     3     |     4     |     5     |
155         156         157         158         159         160
                              ^
                           157.50

A simple rule for finding the midpoint of a step is

$$\text{Midpoint} = \text{lower limit of step} + \frac{\text{upper limit} - \text{lower limit}}{2}.$$

For example, in the present case, $155 + \frac{160 - 155}{2} = 157.50$.

Again, since the length of the step is 5, it follows that the
midpoint must be 2.5 points from the lower limit of the step, i.e.,
at 155 + 2.5 or 157.50.
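The midpoint rule translates directly into code. This small Python sketch (the function names are illustrative) uses the exact limits of a step, so step 155-159 in the notation of Table I(2C) is treated as running from 155 up to 160. Plugging in the written upper limit 159 instead gives 157.0, which is the error the exact-limit convention guards against.

```python
def midpoint(lower, upper):
    """Midpoint = lower limit + (upper limit - lower limit) / 2, using exact limits."""
    return lower + (upper - lower) / 2


def midpoint_of_written_step(first, last):
    """Midpoint of a step written as in Table I(2C), e.g. 155-159.

    The step really runs from `first` up to `last + 1` (155 up to 160).
    """
    return midpoint(first, last + 1)


print(midpoint(155, 160))                  # exact limits: 157.5
print(midpoint_of_written_step(155, 159))  # same step, written limits: 157.5
print(midpoint(155, 159))                  # wrong: treats 159 as the upper limit

# Midpoints of all sixteen steps of Table I, from 125-129 up to 200-204.
print([midpoint_of_written_step(lo, lo + 4) for lo in range(125, 205, 5)])
```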

It is often a question whether the midpoint is a fair
representative of all of the scores on a given step-interval. If we
examine the six scores on step 155-159, two scores, the two
155's, are below the midpoint; two scores, the two 157's, are
practically on the midpoint; and two scores, the two 158's, are
above the midpoint. Also an examination of the step
preceding and the step following 155-159 shows that on both of these
steps there are 2 measures above and 2 below the midpoint.
There seems good evidence, therefore, for assuming that the
midpoint represents fairly the scores on these intervals, though
it is true that the balancing of scores above and below the
midpoint is not always as clear cut as in the examples cited. In
certain cases, in fact (e.g., when the distribution is considerably
"skewed"¹), there are often many more scores on one side of
the midpoint than the other, and the midpoint assumption is
then clearly untenable. The fact remains, however, that in
most frequency distributions of mental and educational
measurements, especially when the number of measures is large, the
assumption that the midpoint represents all of the scores on the
interval is a valid one, since in the long run about as many
scores will fall above as below the midpoint value.

¹ When the scores are "piled" up at either the lower or the upper end of
the scale, the distribution is said to be "skewed." See page 86.

4.  The  Meaning  of  a  Single  Score  in  a  Continuous  Series 

So far we have discussed the classification of scores into step-intervals (the frequency distribution) and the necessity of defining carefully the upper and lower limits of our step-intervals. We shall now try to give a more precise notion of what is meant by a single score, for example, a score of 165 points on Army Alpha. If we think of the score 165 as occupying a certain interval or distance on a linear scale, then any fractional value from 165 up to (but not including) 166, e.g., 165.3, 165.8, etc., will fall within this interval and be scored simply as 165. See illustration:

[Diagram: step 165, extending on the scale from 165 up to 166.]

A score of 165 may mean, therefore, that the person who made it had just barely completed 165 items, or that he had nearly completed 166; in either case his score will be 165.

In performance scales a score equal to or greater than 8, say, but less than 9 is placed on step 8-9 or 8-8.99 and scored 8. In most product scales, however (the Thorndike Handwriting Scale is an example), a score of 8 represents any value from 7.5 to 8.5: i.e., any value from a point one half step below 8 to a point one half step above. Thus scores 7.7, 8.0, 8.4, etc., would all be scored 8. If as before we think of a score on such a scale as a linear magnitude, 8 represents the midpoint of that interval which extends from 7.5 to 8.5. See illustration:

[Diagram: step 8, extending on the scale from 7.5 through 8 to 8.5.]

This  method  of  scoring  is  employed  in  scales  which  measure 
handwriting,  drawing,  composition,  etc. 

It is evident from the foregoing that the meaning of a single score in a continuous series will depend upon how the test is scored. If the score is not defined by the test, it is probably safer to assume that a score of 22, say, means 22 up to 23 (22-22.99) rather than 21.5 up to 22.5.
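The two scoring conventions can be made concrete with a small helper that returns the stretch of the scale a reported score covers. This is a hypothetical sketch: score_interval and the labels "performance" and "product" are our own names for the two cases described above, and integer scores are assumed.

```python
def score_interval(score: int, convention: str = "performance") -> tuple[float, float]:
    """Return (start, end) of the stretch of scale that a reported score stands for.

    "performance": a score s means any value from s up to (but not including) s + 1.
    "product":     a score s means any value from s - 0.5 up to s + 0.5.
    """
    if convention == "performance":
        return (score, score + 1)
    if convention == "product":
        return (score - 0.5, score + 0.5)
    raise ValueError(f"unknown convention: {convention!r}")


print(score_interval(165))           # (165, 166)
print(score_interval(8, "product"))  # (7.5, 8.5)
```

Defaulting to the "performance" reading mirrors the advice above for tests that do not define their scores.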

II.  Measures  of  Central  Tendency 

When  scores  or  other  measures  have  been  tabulated  into  a 
frequency  distribution,  generally  our  next  task  is  to  find  a 
measure  of  central  tendency.  The  value  of  a  measure  of  central 
tendency  is  twofold:  in  the  first  place,  it  is  a  single  measure 
which  represents  all  of  the  scores  made  by  the  group,  and 
as  such  gives  a  concise  description  of  the  performance  of  the 
group  as  a  whole;  secondly,  it  enables  us  to  compare  two  or 
more  groups  in  terms  of  typical  performance.  There  are  three 
measures  of  central  tendency  in  common  use,  (1)  the  average 
or  arithmetic  mean,  (2)  the  median,  and  (3)  the  mode.  We 
shall  consider  these  three  measures  in  order. 

1. The Average, or Arithmetic Mean¹

The average is the best known of the measures of central tendency. It may be defined simply as the sum of the separate scores or measures in a series divided by their number. To illustrate, if a man makes $3.00, $4.00, $3.50, $5.00 and $4.50 on five successive days, his average daily wage ($4.00) is obtained by dividing the sum of his daily earnings ($20.00) by the number of days he has worked (5). The formula for the average of a series of ungrouped measures is simply

$$\text{Average} = \frac{\Sigma(\text{Measures})}{N} \qquad (1)$$

in which N is the number of measures in the series.²

¹ The term "average" is often used as a general expression to cover any measure of central tendency. It is here used in a more restricted sense.

² The symbol Σ means "sum of."
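Formula (1) translates directly into code. A minimal sketch, assuming the measures are held in a Python list; average is an illustrative name, not the author's.

```python
def average(measures: list[float]) -> float:
    """Formula (1): the sum of the measures divided by their number N."""
    return sum(measures) / len(measures)


daily_wages = [3.00, 4.00, 3.50, 5.00, 4.50]
print(average(daily_wages))  # 4.0
```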


When measures have been grouped into a frequency distribution, it is necessary to calculate the average by a slightly different method from the one given above. The two illustrations in Table II will make this method clear. The first of these shows the calculation of the average for the 54 Army Alpha scores which we have already tabulated into a frequency distribution in Table I. Note that we first calculate the F×M column by multiplying the midpoint (M) of each step-interval by the number of scores (F) on it; and that the average (171.57) is then simply the sum of the F×M (9265) divided by N (54). The use of the midpoint for all of the scores on the interval is made necessary by the fact that when scores have been grouped into step-intervals they lose their identity and are thereafter represented by the midpoint of the particular interval on which they happen to fall. Hence, we must multiply or "weight" the midpoint of each step (M) by the frequency (F) on that step, add the F×M, and divide by N to get the average. The formula may be written

$$\text{Average} = \frac{\Sigma(F \times M)}{N} \qquad (2)$$

Example (2), Table II, is a second illustration of the calculation of an average from grouped data. This frequency distribution represents 200 scores made by a group of adults on a cancellation test. These scores are classified into 9 steps; and since the step-interval is 4 points, the midpoint of each step is found by adding ½ of 4 to the beginning of each step (for example, 104 + 2 = 106). The F×M column (found as shown above) totals 23988, and N equals 200. Hence, applying formula (2), the average is found to be 23988/200 = 119.94.
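Formula (2) can be checked against the cancellation-test data of Table II (2). In this sketch (the function name and the list layout are our assumptions), the midpoints and frequencies are given as parallel lists, one entry per step.

```python
def grouped_average(midpoints: list[float], frequencies: list[int]) -> float:
    """Formula (2): weight each step midpoint M by its frequency F, sum F*M, divide by N."""
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    return sum_fm / n


# Table II (2): cancellation scores, step-interval of 4, midpoints from 138 down to 106.
midpoints = [138, 134, 130, 126, 122, 118, 114, 110, 106]
frequencies = [3, 5, 16, 23, 52, 49, 27, 18, 7]
print(round(grouped_average(midpoints, frequencies), 2))  # 119.94
```

The same function applied to the sixteen Army Alpha midpoints and frequencies of Table II (1) returns 171.57 after rounding.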

In both illustrations in Table II we have found the average of the scores made by a given group. There is no reason, however, why we cannot use either formula (1) or (2) to find the average of a number of measurements made on the same individual, as well. Thus an individual's reaction time to light may be measured 100 times, the measures tabulated into a frequency distribution, and the average found in exactly the same way in which we find the average reaction time to light of 100 different observers.

TABLE II

To Illustrate the Calculation of the Average, Median, and Mode
from Data Grouped into a Frequency Distribution

1. DATA FROM TABLE I (2), 54 ARMY ALPHA SCORES
THE STEP-INTERVAL = 5 POINTS

Scores          Midpoint (M)     F        F×M
200-204.99         202.5         1      202.50
195-199.99         197.5         4      790.00
190-194.99         192.5         2      385.00
185-189.99         187.5        10     1875.00
180-184.99         182.5         3      547.50
175-179.99         177.5         8     1420.00
170-174.99         172.5         3      517.50
165-169.99         167.5         3      502.50
160-164.99         162.5         4      650.00
155-159.99         157.5         6      945.00
150-154.99         152.5         4      610.00
145-149.99         147.5         1      147.50
140-144.99         142.5         1      142.50
135-139.99         137.5         2      275.00
130-134.99         132.5         0        0.00
125-129.99         127.5         2      255.00
                             N = 54    9265.00

(1) Average = Σ(F×M)/N = 9265/54 = 171.57.
(2) (N/2 = 27) Median = 175 + 1/8 × 5 = 175.625.
(3) Crude mode falls on class-interval 185-189.99, or at 187.5.

2. SCORES MADE BY 200 ADULTS ON A CANCELLATION TEST
STEP-INTERVAL = 4 POINTS

Scores      Midpoint (M)     F       F×M
136-139         138           3       414
132-135         134           5       670
128-131         130          16      2080
124-127         126          23      2898
120-123         122          52      6344
116-119         118          49      5782
112-115         114          27      3078
108-111         110          18      1980
104-107         106           7       742
                         N = 200    23988

(1) Average = Σ(F×M)/N = 23988/200 = 119.94.
(2) (N/2 = 100) Median = 116 + 48/49 × 4 = 119.92.
(3) Crude mode falls on class-interval 120-123, or at 122.

2.  The  Median 

When scores or other measures are arranged in order of size, the median is defined as the midpoint of the series, that is, as the point above which and below which are 50% of the measures. By definition, therefore, the median may be found by counting off one half of the measures, i.e., N/2, from either end of the series.

Let us first consider the calculation of the median for scores or measures in a simple ungrouped series. Two cases arise: Case I when N is odd, and Case II when N is even. As an illustration of the first case, take the following eleven consecutive scores: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24. Now since N equals 11, N/2 = 5.5; and counting off the first five scores, namely, 14, 15, 16, 17, 18, we reach 19, since score 18 means "18 up to 19." (See page 7.) The .5 left of our 5.5 then locates the median midway between 19 and 20, viz., at 19.5. To verify this result we may count off 5.5 scores beginning at the other end of the series. The five scores 24, 23, 22, 21, and 20 take us to 20 (the upper limit of score 19), and the .5 left puts the median at a point midway on the scale between 20 and 19, viz., at 19.5 again. (See diagram below.)

Case I (N is odd)

[Diagram: a scale from 14 to 25; 5.5 scores counted in from the beginning (14) and 5.5 scores counted in from the end (25) both reach the median at 19.5.]

To illustrate the procedure when N is even, let us drop off the first score (14) from the series of eleven scores in Case I. N is now 10, and N/2 is 5.0. Counting off the first five scores from the small end of the series, i.e., 15, 16, 17, 18, 19, we reach 20 (the upper limit of "19 up to 20") as the median. Likewise, if we count down five scores from 24, i.e., 24, 23, 22, 21, 20, we again reach 20, the lower limit of the step "20 up to 21." See diagram below:

Case II (N is even)

[Diagram: a scale from 15 to 25; 5 scores counted in from the beginning (15) and 5 scores counted in from the end (25) both reach the median at 20.]

It will be noted that in the two cases just cited, the measures were taken to be in continuous series. If, instead of continuous, the eleven scores under Case I are taken as discrete or discontinuous, there is now no value which fulfills the definition of the median as the midpoint in the series. When N is odd, however, the midscore or the middle measure may be obtained by counting off (N + 1)/2 scores from either end of the series, after the scores have been arranged in order of size from least to greatest. Thus, in Case I, (11 + 1)/2 or 6 scores counted off from either end of our series puts the midscore at 19, since there are 5 scores above and 5 scores below this score. A slightly different procedure is necessary when N is even. If the ten scores under Case II, for example, are taken as discrete, there is clearly no median value in this series, and no midscore. However, in such cases as this it is customary to take the midscore arbitrarily at a point midway between the two middlemost scores. Thus, in our illustration, (N + 1)/2 = 5.5, which puts the midscore at 19.5, midway between 19 and 20, the two middlemost scores (the 5th and 6th in order of size). (For a discussion of the median for discrete measures grouped into a frequency distribution, see page 36.)
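The two counting rules for ungrouped scores, continuous and discrete, can be sketched as follows. The function names are hypothetical, and the continuous version assumes integer scores read by the convention of page 7, where a score s stands for the stretch of scale from s up to s + 1.

```python
from collections import Counter


def median_continuous(scores: list[int]) -> float:
    """Median of integer scores read as a continuous series.

    Each score s occupies the unit of scale from s up to s + 1, so we count
    off N/2 measures from the small end and interpolate within the unit on
    which the count runs out.
    """
    half = len(scores) / 2
    below = 0  # measures lying below the start of the current score's unit
    for score, freq in sorted(Counter(scores).items()):
        if below + freq >= half:
            return score + (half - below) / freq
        below += freq
    raise ValueError("empty series")


def midscore_discrete(scores: list[int]) -> float:
    """Middle measure of a discrete series: the (N+1)/2-th score in order of size,
    or the point midway between the two middlemost scores when N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("empty series")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


case_1 = list(range(14, 25))  # 14, 15, ..., 24 (N = 11)
case_2 = list(range(15, 25))  # 15, 16, ..., 24 (N = 10)
print(median_continuous(case_1), median_continuous(case_2))  # 19.5 20.0
print(midscore_discrete(case_1), midscore_discrete(case_2))  # 19 19.5
```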

The method of calculating the median for continuous data grouped into a frequency distribution is shown in the two examples in Table II. Since there are 54 scores in the first distribution, N/2 is 27. The median, therefore, is that point on the scale which has 27 scores on each side of it. If we begin at the small end of the distribution¹ and add up the scores in order, the step-intervals 125-129.99 to 170-174.99, inclusive, are found to contain just 26 scores. The next step, 175-179.99, contains 8 scores (assumed to be evenly spread over the entire step; see page 5). To get the 1 extra score needed to make 27, therefore, we must take 1/8 × 5 (the length of the step) and add this amount (.625) to 175, the beginning of the step-interval 175-179.99. This puts the midpoint of the series at 175 + .625 or 175.625, which is, accordingly, the median of the distribution. (See Diagram I.)

A second illustration of how the median is found when the data are grouped into a frequency distribution is given in Table II (2). This second example should aid in clearing up any doubtful points in the first problem. Since there are 200 scores in this distribution, one half of the scores is 100, and the median must lie at a point 100 scores distant from either end of the distribution. If we begin at the small end of the distribution, i.e., at 104-107, and add the scores in order, 52 scores will take us through step 112-115. The 49 scores on the next step-interval (116-119) bring the total to 101 scores, one too many to give us the median. To get the 48 scores needed to make exactly 100, therefore, we must take 48/49 × 4 (the length of the step) and add this amount, 3.92, to 116, the beginning of the step-interval. This takes us exactly 100 scores into the distribution, and locates the median at 119.92. Diagram I (2) shows graphically how this median is obtained.
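The counting-and-interpolating procedure used in both examples can be sketched in Python. This is a minimal illustration under stated assumptions: grouped_median is our own name, steps are listed from the small end of the scale upward, and, as on page 5, the measures on each step are taken to be spread evenly over it.

```python
def grouped_median(lower_limits, frequencies, step):
    """Median of continuous data grouped into a frequency distribution.

    lower_limits and frequencies are parallel sequences running from the
    small end of the scale upward; step is the length of the step-interval.
    """
    half = sum(frequencies) / 2  # one half of the measures, N/2
    below = 0  # measures counted so far, all lying below the current step
    for lower, freq in zip(lower_limits, frequencies):
        if freq and below + freq >= half:
            # Interpolate within the step holding the median: take the fraction
            # of its frequency still needed and multiply by the step length.
            return lower + (half - below) / freq * step
        below += freq
    raise ValueError("the distribution contains no measures")


# Table II (1): Army Alpha scores, steps 125-129.99 up to 200-204.99.
alpha = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]
print(grouped_median(range(125, 205, 5), alpha, 5))  # 175.625

# Table II (2): cancellation scores, steps 104-107 up to 136-139.
cancellation = [7, 18, 27, 49, 52, 23, 16, 5, 3]
print(round(grouped_median(range(104, 140, 4), cancellation, 4), 2))  # 119.92
```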

Summary of the steps in computing the median from data tabulated in a frequency distribution:

(1) Find N/2, i.e., one half of the measures.

(2) Begin at the smaller end of the distribution and count the measures serially up to the interval which contains the median.

(3) Divide the number of measures necessary to fill out N/2 by the frequency on the interval containing the median [reached in (2) above] and multiply the result by the length of the step-interval.

(4) Add the amount obtained in (3) to the lower limit of the step which contains the median. This will give the median point on the scale.

¹ While the median may be found equally well by counting in N/2 scores from the large end of the distribution, it is simpler to begin at the small end, and the student is advised to follow this plan at first.

DIAGRAM I (1)
The Calculation of the Median.

[Diagram: a vertical scale from 170 to 180 divided into step 170-174 (3 F's) and step 175-179 (8 F's); 26 scores reach 175, 27 scores reach 175.625 (the median), and 34 scores reach 180. Median = 175 + 1/8 × 5 = 175.625.]

Explanation: 26 scores go up to 175 on the scale; 34 scores to 180. To find how far 27 scores will go, we must take 1/8 of 5 (the step length) and add this to 175. This puts the median at 175.625.

DIAGRAM I (2)
The Calculation of the Median.

[Diagram: a vertical scale from 112 to 120 divided into step 112-115 and step 116-119 (49 F's); 52 scores reach 116, 100 scores reach 119.92 (the median), and 101 scores reach 120. Median = 116 + 48/49 × 4 = 119.92.]

Explanation: 52 scores counted off take us to 116 on the scale; 101 scores take us to 120. To find how far 100 scores go, we must take 48/49 of 4 (the step length) and add this amount (3.92) to 116. This locates the median at 119.92.

3. The Mode

The mode is most simply defined as that measure which occurs most often in a series. In the series 10, 11, 11, 12, 12, 13, 13, 13, 14, 14, and 15, for example, since the most often recurring measure is 13, this measure may be taken as the mode. In Table I (1) we find from the ungrouped scores that 185 occurs 5 times, more often than any other single score, and hence 185 may be taken as the mode of this series.


When the scores or measures are continuous and have been grouped into a frequency distribution, the "crude mode" is often taken as the midpoint of the step-interval which contains the greatest frequency. In Table I, for example, if we did not know from the ungrouped scores that 185 is the modal score, the crude mode of the distribution given in (2) would be taken at 187.50, the midpoint of step 185-189.99, the step-interval containing the greatest frequency. Likewise, in Table II (2), the crude mode would be 122, the midpoint of step 120-123, which contains the greatest frequency.

It is clear that the crude mode will depend to a large extent upon the size of the step-interval selected (i.e., on whether the grouping is by large or small steps), and for this reason it is often an unstable measure of central tendency. This is not necessarily a serious drawback, however, as the mode is usually employed simply to indicate in a rough way the center of concentration in the distribution. For this purpose it is not necessary to define it so carefully as we do the median or the arithmetic mean.
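Both forms of the mode are easy to compute. In this sketch the function names are ours; ties are not treated specially, so when two values (or two steps) share the greatest frequency, the one encountered first is returned.

```python
from collections import Counter


def mode(measures):
    """The measure that occurs most often in an ungrouped series."""
    value, _count = Counter(measures).most_common(1)[0]
    return value


def crude_mode(midpoints, frequencies):
    """Crude mode of grouped data: midpoint of the step with the greatest frequency."""
    best = max(range(len(frequencies)), key=lambda i: frequencies[i])
    return midpoints[best]


print(mode([10, 11, 11, 12, 12, 13, 13, 13, 14, 14, 15]))  # 13
print(crude_mode([138, 134, 130, 126, 122, 118, 114, 110, 106],
                 [3, 5, 16, 23, 52, 49, 27, 18, 7]))  # 122
```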


III.  Measures  of  Variability 

In Section II we discussed the calculation of the so-called "measures of central tendency," measures typical or representative of the set of scores as a whole. Our next step is the calculation of the variability of the scores, i.e., of the "scatter" or "spread" of the separate scores or measures around their measure of central tendency. This will be the task of the present section.

The usefulness of some measure of variability can be shown by a simple example. Suppose that we have given a test of controlled association to a group of 50 boys and the same test to a group of 50 girls. The average scores are: Boys, 34.6 secs., and Girls, 34.5 secs. So far as the averages go, there is apparently no difference in the performance of the two groups. Suppose, however, that on examining the original scores, we find the boys' scores ranging from 15 to 51 secs. and the girls' scores ranging from 19 to 45 secs. This discovery would make it evident at once that, in a general way, the boys "cover more territory" (are more variable) than the girls, and this greater variability may be of considerably more interest than the lack of difference in the average scores. If a group is homogeneous, i.e., made up of individuals of nearly the same ability, most of the scores will fall near the same point on the scale, the range will be relatively short, and the variability will be small. If, however, the group contains individuals of widely differing capacity, the scores will be strung out from high to low, the range will be relatively wide, and the variability will be large. Four measures have been devised to take account of this factor of variability within a set of measures. These are (1) the range, (2) the quartile deviation, or Q, (3) the average deviation, or AD, and (4) the standard deviation, or SD.

1.  The  Range 

In grouping the scores in Table I into a frequency distribution (page 3) we have already had occasion to use the range. It may be re-defined simply as the interval between the largest and the smallest measures. In the illustration given above, the range of the boys' scores is 51 - 15 or 36, and the range of the girls' scores 45 - 19 or 26. The range is the most general measure of "spread" or "scatter." It includes 100% of the distribution, and is employed when we wish to make a rough comparison of two or more groups for variability, or when the number of measures is too small to justify the calculation of some more refined measure of variability. Since the range takes account only of the extremes of the series, it is obviously unreliable when frequent or large gaps occur in the distribution of scores.

2.  The  Quartile  Deviation,  or  Q 

The quartile deviation, or Q, may be defined as one half of the distance between the 75th and the 25th percentile points in the given distribution. The 25th percentile, or Q1, is the first quarter or quartile point on the scale, the point below which lie 25% of the measures. In like manner, the 75th percentile, or Q3, is the third quarter or quartile point on the scale, the point below which lie 75% of the measures. (By analogy, the median is Q2, the second quartile point.)

In order to find Q, it is obvious that we must first calculate the 75th and 25th percentile points. These points are found in exactly the same way as the median: viz., to find Q1 we count off 25% of the scores from the beginning of the distribution; and to find Q3, we count off 75% of the scores from the beginning of the distribution.

Table III illustrates the calculation of Q for the distribution of 54 Alpha scores tabulated in Table I. First, to find Q1, we must count off 1/4 of the total number of scores, i.e., 13.5, from the small end of the distribution. When the scores (the F's) are added in order, the first six step-intervals (the steps 125-129.99 to 150-154.99 inclusive) are found to contain 10 scores. The next step, 155-159.99, contains 6 scores.¹ We need only 3.5 additional scores, however, to make up the necessary 13.5; hence we take 3.5/6 × 5 (the step length) and add this amount (2.92) to 155, the beginning of the step. This locates Q1 at 155 + 2.92 or 157.92.

In like manner, we find Q3 by counting off 3/4 of the scores from the small end of the distribution. 3/4 of N = 40.5; and the F's on steps 125-129.99 to 180-184.99, inclusive, added in order, total 37. The next step, 185-189.99, contains 10 scores. To round out the necessary 40.5, therefore, we take 3.5/10 × 5 (the step length) and add this amount (1.75) to 185, the beginning of the step. This puts Q3 at 186.75, since 40.5 scores reach this point.

¹ Assumed to be spread evenly over the entire step. See page 5.


TABLE III

To Illustrate the Calculation of Q, AD, and SD from
Data Grouped into a Frequency Distribution

1. DATA FROM TABLE I, 54 ARMY ALPHA SCORES

(1)            (2)        (3)     (4)        (5)         (6)
Scores         Midpoint    F       D          FD          FD²
200-204.99     202.50      1     30.93      30.93       956.66
195-199.99     197.50      4     25.93     103.72      2689.46
190-194.99     192.50      2     20.93      41.86       876.13
185-189.99     187.50     10     15.93     159.30      2537.65
180-184.99     182.50      3     10.93      32.79       358.39
175-179.99     177.50      8      5.93      47.44       281.32
170-174.99     172.50      3       .93       2.79         2.59
165-169.99     167.50      3     -4.07     -12.21        49.69
160-164.99     162.50      4     -9.07     -36.28       329.06
155-159.99     157.50      6    -14.07     -84.42      1187.79
150-154.99     152.50      4    -19.07     -76.28      1454.66
145-149.99     147.50      1    -24.07     -24.07       579.36
140-144.99     142.50      1    -29.07     -29.07       845.06
135-139.99     137.50      2    -34.07     -68.14      2321.53
130-134.99     132.50      0    -39.07       0.00         0.00
125-129.99     127.50      2    -44.07     -88.14      3884.33
                        N = 54              837.44     18353.68

(The FD total, 837.44, is taken without regard to sign.)

Average = 171.57 (Table II)

N/4 = 13.5; therefore, Q1 = 155 + 3.5/6 × 5 = 157.92
3N/4 = 40.5; therefore, Q3 = 185 + 3.5/10 × 5 = 186.75

Q = (Q3 - Q1)/2 = (186.75 - 157.92)/2 = 14.42

AD = ΣFD/N = 837.44/54 = 15.51

SD = √(ΣFD²/N) = √(18353.68/54) = √339.883 = 18.44

TABLE III (Continued)

2. DATA FROM TABLE II (2), 200 CANCELLATION SCORES

(1)         (2)        (3)     (4)        (5)         (6)
Scores      Midpoint    F       D          FD          FD²
136-139     138          3     18.06      54.18       978.49
132-135     134          5     14.06      70.30       988.42
128-131     130         16     10.06     160.96      1619.26
124-127     126         23      6.06     139.38       844.64
120-123     122         52      2.06     107.12       220.67
116-119     118         49     -1.94     -95.06       184.42
112-115     114         27     -5.94    -160.38       952.66
108-111     110         18     -9.94    -178.92      1778.47
104-107     106          7    -13.94     -97.58      1360.27
                    N = 200             1063.88      8927.30

(The FD total, 1063.88, is taken without regard to sign.)

Average = 119.94 (Table II)

N/4 = 50; therefore, Q1 = 112 + 25/27 × 4 = 115.70
3N/4 = 150; therefore, Q3 = 120 + 49/52 × 4 = 123.77

Q = (Q3 - Q1)/2 = (123.77 - 115.70)/2 = 4.04

AD = ΣFD/N = 1063.88/200 = 5.32

SD = √(ΣFD²/N) = √(8927.30/200) = 6.68
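The D, FD and FD² columns of Table III can be reproduced mechanically. The sketch below assumes the midpoints, frequencies and previously computed average are supplied by the caller; grouped_ad_sd is a hypothetical name. AD is the sum of the FD column taken without regard to sign, divided by N, and SD follows the square-root formula printed beneath each part of the table.

```python
import math


def grouped_ad_sd(midpoints, frequencies, average):
    """Average deviation and standard deviation of grouped data.

    D is the deviation of each step's midpoint from the average;
    AD = (sum of F*|D|) / N and SD = sqrt(sum of F*D**2 / N).
    """
    n = sum(frequencies)
    deviations = [m - average for m in midpoints]
    sum_abs_fd = sum(f * abs(d) for f, d in zip(frequencies, deviations))
    sum_fd2 = sum(f * d * d for f, d in zip(frequencies, deviations))
    return sum_abs_fd / n, math.sqrt(sum_fd2 / n)


# Table III (2): cancellation scores, average 119.94 from Table II.
ad, sd = grouped_ad_sd([138, 134, 130, 126, 122, 118, 114, 110, 106],
                       [3, 5, 16, 23, 52, 49, 27, 18, 7], 119.94)
print(round(ad, 2), round(sd, 2))  # 5.32 6.68
```

Applied to the Army Alpha data of Table III (1) with an average of 171.57, the same function gives 15.51 and 18.44.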


With Q1 and Q3 known, the quartile deviation, Q, is easily calculated from the formula

$$Q = \frac{Q_3 - Q_1}{2} \qquad (3)$$

In the present problem, Q = (186.75 - 157.92)/2, or 14.42.

A second illustration of the calculation of Q from a frequency distribution is given in Table III (2). Since the N of this distribution is 200, 1/4 of the measures equals 50. The steps 104-107 and 108-111 contain 25 scores, and the next step contains 27 scores. To find the point reached by 50 scores, therefore, we must take 25/27 × 4 (the step length) and add this amount (3.70) to 112, the lower limit of step 112-115. This locates Q1 at 115.70.

To find Q3, we must count off 3/4 of N, or 150 scores, from the small end of the distribution. The first four steps include 101 scores, and the next step, 120-123, contains 52. To fill out 150, therefore, we take 49/52 × 4 (the length of step) and add this increment (3.77) to 120 to locate Q3 at 123.77. Substituting 115.70 for Q1 and 123.77 for Q3 in formula (3), we get a Q of 4.04 points.

The quartile points, Q1 and Q3, are of considerable importance in that they mark off the limits within which fall the middle 50% of the measures in the distribution. The distance between these two points is often called the interquartile range; hence Q is sometimes called the semi-interquartile range. Q actually measures the average distance of the two quartile points from the median [since ((Q3 - median) + (median - Q1))/2 = (Q3 - Q1)/2], and because of the ease with which it can be found it is a valuable measure of the closeness with which the scores are grouped directly around the median point. If the scores of a distribution are closely packed together, the quartiles will be close together and Q will be small; if the scores are scattered, the quartiles will be relatively far apart, and Q will be large.

When the distribution is symmetrical or "normal" (see page 85), the points one Q above and one Q below the median mark off exactly the 25% of the cases just above and the 25% of the cases just below the median, and accordingly the median lies just halfway between the two quartile points Q1 and Q3. Q is then commonly known as the PE (probable error). The terms Q and PE are often used interchangeably, although it is probably best to restrict the use of the latter term to normal distributions and to the measurement of reliability. The value of the PE as a measure of reliability will be discussed at length in Chapter III.

Summary of Steps in Calculation of Q (Data Grouped)

To find Q1:

1. Divide N by 4.

2. Begin at the small end of the distribution, and count the scores up to the interval which contains Q1.

3. Divide the number of measures necessary to locate Q1 (i.e., to complete N/4) by the frequency in the interval reached in (2) above, and multiply the result by the step-interval.

4. Add the amount obtained in (3) to the lower limit of the step-interval on which Q1 lies. The result is Q1.

To find Q3:

1. Find 3/4 of N.

2. Begin as before at the small end of the distribution, and count up the scores until the interval which contains Q3 is reached.

3. Divide the number of scores required to locate Q3 (i.e., to complete 3N/4) by the frequency in the interval reached in (2), and multiply the result by the step-interval.

4. Add the amount obtained in (3) to the lower limit of the step-interval on which Q3 lies. This locates Q3.

To find Q:

Substitute Q3 and Q1 in formula (3),

$$Q = \frac{Q_3 - Q_1}{2}.$$
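The four steps for Q1 and Q3 are the median procedure with a different fraction of N, so one helper can serve for all three points. This is a sketch under our own naming (point_below, quartile_deviation), with steps listed from the small end upward and the measures assumed to be spread evenly within each step.

```python
def point_below(fraction, lower_limits, frequencies, step):
    """Point on the scale below which the given fraction of the N measures lie.

    Counts up from the small end of the distribution, then interpolates within
    the step on which the count runs out.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must lie between 0 and 1")
    target = fraction * sum(frequencies)
    below = 0
    for lower, freq in zip(lower_limits, frequencies):
        if freq and below + freq >= target:
            return lower + (target - below) / freq * step
        below += freq
    raise ValueError("the distribution contains no measures")


def quartile_deviation(lower_limits, frequencies, step):
    """Formula (3): Q = (Q3 - Q1) / 2."""
    q1 = point_below(0.25, lower_limits, frequencies, step)
    q3 = point_below(0.75, lower_limits, frequencies, step)
    return (q3 - q1) / 2


# Table III (2): cancellation scores, steps 104-107 up to 136-139.
steps = range(104, 140, 4)
cancellation = [7, 18, 27, 49, 52, 23, 16, 5, 3]
q1 = point_below(0.25, steps, cancellation, 4)
q3 = point_below(0.75, steps, cancellation, 4)
print(f"{q1:.2f} {q3:.2f} {quartile_deviation(steps, cancellation, 4):.2f}")
# 115.70 123.77 4.03
```

Because the function keeps full precision, its Q for Table III (2) comes out as 4.03; the text rounds Q1 and Q3 to two decimals before subtracting and so reports 4.04. For Table III (1) both routes give 14.42, and point_below(0.5, ...) reproduces the median.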

3.  The  Average  Deviation,  or  AD 

The average deviation or AD (also written mean deviation or MD) may be defined as the average of the deviations of all the separate measures in a series taken from their central tendency (usually the average, less frequently the median or mode). In averaging deviations to find the AD, no account is taken of signs, and all deviations, whether positive or negative, are treated as positive.

An example will make the definition clearer. If we have five scores, 6, 8, 10, 12, and 14, the average is easily found to be 10. It is then a simple matter to find the deviation of each measure from the average by subtracting the average from each measure. Thus 6, the first score, minus 10 equals -4 (calculation algebraic); 8 - 10 = -2; 10 - 10 = 0; 12 - 10 = 2; and 14 - 10 = 4. The five deviations measured from the average are -4, -2, 0, 2, and 4. Adding these deviations without regard to sign, the sum is 12; and dividing 12 by 5, we get 2.4 as the average of the five deviations from the average, or the AD. The formula for the AD with simple ungrouped numbers like these may be written

$$AD = \frac{\sum D}{N} \quad \text{(arithmetical)} \tag{4}$$

in which ΣD = the sum of the deviations taken without regard to sign, and N is, as before, the number of cases or items in the series.
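A short Python sketch of formula (4), using the five scores of the example above (the variable names are illustrative only):

```python
scores = [6, 8, 10, 12, 14]
n = len(scores)
average = sum(scores) / n                      # 10.0
deviations = [x - average for x in scores]     # score minus average, signs kept
ad = sum(abs(d) for d in deviations) / n       # formula (4): signs disregarded
print(deviations)   # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(ad)           # 2.4
```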

In  Table  III,  the  calculation  of  the  AD  for  scores  grouped 
into  a  frequency  distribution  is  illustrated  by  two  problems. 
The  average  of  problem  (1)  has  already  been  found  in  Table 
II  to  be  171.57.  Hence,  to  find  the  average  deviation  of  the 
scores  in  this  distribution  from  the  average,  we  must  take  our 
deviations  (D's)  around  this  point.  Note,  however,  that,  since 
the  scores  have  been  grouped  into  step-intervals,  we  are  no 
longer  able  to  get  the  D  of  each  score  from  the  average;  and 
hence  we  simply  find  the  deviation  (D)  of  the  midpoint  of  each 
step  from  the  average.  The  substitution  of  the  midpoint  value 
for  all  of  the  scores  within  the  step  is  the  only  difference 
between  the  computation  of  D's  with  grouped  and  ungrouped 
measures. For example, the D of step 200-204.99 is 30.93, found by subtracting 171.57 (the average) from 202.50 (the midpoint of the step). Likewise, the D of the next step is 25.93, found by subtracting 171.57 from 197.50. All of the D's



are positive as far down the scale as 170-174.99, since in each case the midpoint is numerically larger than the average. From the step-interval 165-169.99 down to the beginning of the series, however, the D's are negative, as the midpoints of these steps are all smaller than 171.57. Thus the D of step 165-169.99 is -4.07, i.e., 167.50 - 171.57 = -4.07; and the D of the lowest step in the distribution, 125-129.99, is -44.07.

It  will  be  helpful  in  finding  deviations  to  remember  that 
the  average  is  always  subtracted  from  the  individual  score  or 
midpoint  value.     That  is, 

Deviation = Score or Midpoint - Average (calculation algebraic).

Hence  it  is  clear  that  when  the  score  or  midpoint  is 
numerically  larger  than  the  average,  the  deviation  must  be 
positive;  when  the  score  or  midpoint  is  numerically  smaller 
than  the  average,  the  deviation  must  be  negative. 

It is unnecessary to subtract the average from each midpoint separately in order to obtain the different D's. Since each step-interval is 5 points, after finding the D of step 200-204.99 to be 30.93, we need only subtract 5 points from this D to obtain 25.93, the D of the next step; then 5 again to obtain 20.93, the D of the next step, and so on.¹ The negative D's are obtained in exactly the same way as the positive D's. Thus .93 - 5 = -4.07; -4.07 - 5 = -9.07; and so on to -44.07.

Column 4 gives the deviation of each step-interval (as represented by its midpoint) from the average of the distribution. There are, however, more scores on some steps than on others; for this reason each midpoint-deviation (D) in column 4 must be "weighted" (multiplied) by the number of scores (F) which it represents. This gives the FD column, column 5. The first FD is 30.93: since there is only 1 score on step 200-204.99, we need simply multiply the first D by 1. The next FD is 103.72, since each

¹ The D's should be checked occasionally to avoid carrying an error throughout the calculations.



of the 4 scores on step 195-199.99 has a D of 25.93. In like manner, we obtain the other FD's by multiplying each D in column 4 by its corresponding frequency (F) in column 3.

When all of the FD's have been calculated, we sum the column without regard to sign and divide by N to obtain the AD. In the present problem, the AD equals 837.44/54, or 15.51.

The formula for the AD for measures grouped into a frequency distribution may now be written as follows:

$$AD = \frac{\sum FD}{N} \quad \text{(arithmetical)} \tag{5}$$

This  formula  applies  equally  well  to  the  AD  found  from  the 
average,  median,  or  mode. 
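Formula (5) applied to problem (1) can be sketched as follows. The frequencies are those listed in Table IV, and the names are illustrative. Because the sketch carries the unrounded average (9265/54) rather than 171.57, its sum of the FD's comes to 837.41 instead of the table's 837.44; the AD still rounds to 15.51.

```python
# Table IV, problem (1): (midpoint of step-interval, frequency)
TABLE_IV = [(202.5, 1), (197.5, 4), (192.5, 2), (187.5, 10), (182.5, 3),
            (177.5, 8), (172.5, 3), (167.5, 3), (162.5, 4), (157.5, 6),
            (152.5, 4), (147.5, 1), (142.5, 1), (137.5, 2), (132.5, 0),
            (127.5, 2)]

n = sum(f for _, f in TABLE_IV)                         # 54
average = sum(f * m for m, f in TABLE_IV) / n           # 9265 / 54
# D = midpoint - average; each D is weighted by its F, and the FD's
# are summed without regard to sign
sum_fd = sum(f * abs(m - average) for m, f in TABLE_IV)
ad = sum_fd / n                                         # formula (5)
print(round(average, 2), round(sum_fd, 2), round(ad, 2))   # 171.57 837.41 15.51
```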

The second problem in Table III shows the calculation of the AD for the 200 cancellation scores, grouped into a frequency distribution with a step of 4. The average for this distribution has been found to be 119.94 (see Table II, 2). Hence the D of the first step, 136-139 (midpoint 138), from the average is 18.06. The next D may be found by subtracting 4 (the step-interval) from 18.06, and each succeeding D in turn by subtracting 4 from the D just preceding it.

The FD's in column 5 are found [as previously shown in (1)] by "weighting" each D by the F which it represents, that is, by the F opposite it. The sum of the FD column is 1063.88; and since N is 200, from formula (5) we obtain 5.32 as the AD of the scores in this distribution from their average, 119.94.
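The same computation for problem (2), as a sketch. It assumes the step frequencies of the 200 cancellation scores as they appear in Table V (which uses the same data, Table II (2)); the names are illustrative.

```python
# Problem (2): 200 cancellation scores, step of 4, as (midpoint, frequency)
CANCELLATION = [(138, 3), (134, 5), (130, 16), (126, 23), (122, 52),
                (118, 49), (114, 27), (110, 18), (106, 7)]

n = sum(f for _, f in CANCELLATION)                          # 200
average = sum(f * m for m, f in CANCELLATION) / n            # 23988 / 200
sum_fd = sum(f * abs(m - average) for m, f in CANCELLATION)  # arithmetic sum of FD
ad = sum_fd / n                                              # formula (5)
print(round(average, 2), round(sum_fd, 2), round(ad, 2))     # 119.94 1063.88 5.32
```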

In a normal distribution (page 85) the AD, when measured off above and below the average, marks the limits of the middle 57.5% of the measures. Since Q, measured off in the same way, marks the limits of the middle 50%, the AD is seen to be slightly larger than Q. In general, a large AD means that the scores in the distribution are widely scattered about the central tendency; a small AD means that they are concentrated within a relatively narrow range.
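The 57.5% figure can be checked numerically. The sketch rests on two standard facts about the normal curve that the text does not state: the AD of a normal distribution equals √(2/π), about .798, of its SD, and the fraction of cases lying within k SDs of the average is erf(k/√2). Only Python's standard `math` module is used; `fraction_within` is an illustrative name.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of its average."""
    return math.erf(k / math.sqrt(2))


ad_in_sd_units = math.sqrt(2 / math.pi)    # AD of a normal curve, in SD units
print(round(ad_in_sd_units, 3))                          # 0.798
print(round(100 * fraction_within(ad_in_sd_units), 1))   # 57.5
# One SD on either side: prints 68.27; the 68.26% used in the text is the
# same value truncated rather than rounded
print(round(100 * fraction_within(1.0), 2))
```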



4.  The  Standard  Deviation,  or  SD 

The standard deviation, or SD, is the most reliable of the measures of variability, and for this reason is customarily used in research which requires great accuracy. The SD differs from the AD in several respects. In the first place, in calculating the AD we disregard signs and treat all deviations as positive; in finding the SD, on the other hand, we avoid this difficulty of signs by squaring the separate deviations. Again, the deviations used in computing the SD are always taken from the average, and never from the median or mode as is sometimes done in finding an AD. The conventional symbol used to denote the SD is the Greek letter sigma, σ.

We may define the SD, or σ, as the square root of the mean (or average) of the squared deviations taken from the average of the distribution. To illustrate the calculation of the SD in a simple case, let us consider the example used to illustrate the calculation of the AD (see page 23), in which the deviations of the five measures, 6, 8, 10, 12, and 14, from their average 10 were found to be -4, -2, 0, 2, and 4, respectively. If we square each of these deviations we get 16, 4, 0, 4, and 16 (the minus signs become plus in squaring). Next, summing these five squares (40) and dividing by 5, we obtain the mean of the squares (8); extracting the square root of this result gives 2.828, the SD or σ of the series. The formula for the σ of a series of numbers, ungrouped, is


$$\sigma = \sqrt{\frac{\sum D^2}{N}} \tag{6}$$
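Formula (6) for the five scores above, as a minimal Python sketch. Python's standard `statistics.pstdev` (the population SD) also divides by N rather than N - 1, so it agrees with formula (6); the variable names are illustrative.

```python
import math
import statistics

scores = [6, 8, 10, 12, 14]
n = len(scores)
average = sum(scores) / n
squared = [(x - average) ** 2 for x in scores]   # 16, 4, 0, 4, 16
sd = math.sqrt(sum(squared) / n)                 # formula (6): square root of 40/5
print(round(sd, 3))                              # 2.828

# pstdev divides by N, as formula (6) does
print(round(statistics.pstdev(scores), 3))       # 2.828
```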

Table III illustrates the calculation of σ for scores grouped into a frequency distribution. The process is identical with that used for simple numbers except that, in addition to squaring the D of each midpoint from the average, we "weight" each of these squared deviations by the frequency which it represents, the frequency opposite it. This gives the FD² column. Since D × FD = D × (F × D) = FD², the easiest way to obtain the entries in this column is by



multiplying the corresponding D's and FD's in columns 4 and 5. The first FD² entry, for example, is 956.66, the product of 30.93 × 30.93; the second is 2689.46, the product of 103.72 × 25.93; and so on to the end of the column. All of the FD²'s are necessarily positive, since each negative D is matched by a negative FD and consequently the product is positive. The sum of the FD² column (18,353.88) divided by N (54) gives the mean of the squared deviations as 339.887; and the square root of this result is 18.44, the standard deviation.
The  formula  for  the  SD  when  the  data  are  grouped  into  a 
frequency  distribution  is 

$$\sigma = \sqrt{\frac{\sum FD^2}{N}} \tag{7}$$

Problem (2) of Table III furnishes another illustration of the calculation of σ from grouped data. Column 6, the FD² column, has been obtained, as in the previous problem, by multiplying each D by its corresponding FD. The sum of the FD² column is 8927.30, and N is 200. Hence, applying formula (7), we get 6.68 as the standard deviation [see Table III, (2) for calculations].
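A sketch of formula (7) applied to both problems. It assumes the step frequencies as listed in Tables IV and V, computes the average exactly instead of rounding it, and uses an illustrative function name, `grouped_sd`.

```python
import math

# (midpoint, frequency): problem (1) from Table IV, problem (2) from Table V
PROBLEM_1 = [(202.5, 1), (197.5, 4), (192.5, 2), (187.5, 10), (182.5, 3),
             (177.5, 8), (172.5, 3), (167.5, 3), (162.5, 4), (157.5, 6),
             (152.5, 4), (147.5, 1), (142.5, 1), (137.5, 2), (132.5, 0),
             (127.5, 2)]
PROBLEM_2 = [(138, 3), (134, 5), (130, 16), (126, 23), (122, 52),
             (118, 49), (114, 27), (110, 18), (106, 7)]


def grouped_sd(dist):
    """Formula (7): D = midpoint - actual average; sigma = sqrt(sum(F * D**2) / N)."""
    n = sum(f for _, f in dist)
    average = sum(f * m for m, f in dist) / n
    sum_fd2 = sum(f * (m - average) ** 2 for m, f in dist)   # total of the FD² column
    return math.sqrt(sum_fd2 / n)


print(round(grouped_sd(PROBLEM_1), 2))   # 18.44
print(round(grouped_sd(PROBLEM_2), 2))   # 6.68
```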

The standard deviation is, in general, less affected by chance fluctuations than the AD, and is, therefore, a more stable measure of dispersion. In a "normal" distribution (page 85) the SD, when measured off above and below the average, marks the limits of the middle 68.26% (roughly the middle two thirds) of the distribution. This is approximately true, also, for less symmetrical distributions. For example, in the first problem in Table III, the middle two thirds of the scores will fall roughly between score 190 (171.57 + 18.44) and score 153 (171.57 - 18.44). The standard deviation is never smaller than the AD taken from the average, and in most distributions the AD is in turn larger than Q. This relation supplies a rough but simple check on the accuracy of calculated measures of variability.
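The rough check can be automated. This sketch (the helper name `variability` is hypothetical) recomputes SD, AD, and Q by the Long Method for the Table IV scores and prints a warning if they do not fall in the expected order; a failed check is only a signal to recheck the arithmetic. The value of Q is not given in the text and comes from the sketch.

```python
import math

STEP = 5
# Table IV scores as (lower limit of step-interval, frequency), small end first
DIST = [(125, 2), (130, 0), (135, 2), (140, 1), (145, 1), (150, 4),
        (155, 6), (160, 4), (165, 3), (170, 3), (175, 8), (180, 3),
        (185, 10), (190, 2), (195, 4), (200, 1)]


def variability(dist, step):
    """Return (SD, AD, Q) of a grouped distribution by the Long Method."""
    n = sum(f for _, f in dist)
    mids = [(lower + step / 2, f) for lower, f in dist]
    average = sum(f * m for m, f in mids) / n
    ad = sum(f * abs(m - average) for m, f in mids) / n
    sd = math.sqrt(sum(f * (m - average) ** 2 for m, f in mids) / n)

    def point(fraction):            # Q1 or Q3 by counting and interpolating
        target, below = fraction * n, 0
        for lower, f in dist:
            if f > 0 and below + f >= target:
                return lower + (target - below) / f * step
            below += f

    return sd, ad, (point(0.75) - point(0.25)) / 2


sd, ad, q = variability(DIST, STEP)
print(round(sd, 2), round(ad, 2), round(q, 2))   # 18.44 15.51 14.42
if not sd > ad > q:
    print("unusual ordering of SD, AD and Q: recheck the arithmetic")
```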



IV. The Short Method of Finding the Average, the AD, and the SD (σ)

In Tables II and III, the average, the AD, and the SD have been calculated by what is often known as the Long Method. The reader will recall that the average in these tables was found by multiplying the midpoint of each step-interval by the number of scores on the step, summing this column (the FXM), and dividing by N, the number of cases (page 9). Moreover, in finding the AD and the SD, all midpoint deviations were figured from the actual averages of the distributions.

It  is,  no  doubt,  already  apparent  that  the  Long  Method 
(LM)  requires  the  handling  of  large  numbers  and  decimals 
and  that  the  calculations  are  often  tedious.  To  save  time 
and  labor,  therefore,  the  Guessed  Average  Method,  or  more 
simply  the  Short  Method  (SM),  has  been  devised  for  the 
express  purpose  of  cutting  down  the  calculations  involved 
in  finding  the  average,  the  AD,  and  the  SD.  (The  Short 
Method  does  not  apply  to  the  computation  of  the  Median  and 
the  Q,  which  are  always  found  by  the  methods  with  which 
we  are  already  familiar.)  The  student  of  statistics  should 
make  a  special  effort  to  learn  the  Short  Method  to  the  point 
where  he  can  use  it  with  facility.  Not  only  is  it  a  great  time 
and  labor  saver,  but  in  the  calculation  of  coefficients  of 
correlation  it  is  well-nigh  indispensable. 

Table IV (2) illustrates the calculation of the average, AD, and SD by the Short Method. To make a comparison of the computations involved in the two methods easier, the calculations by the Long Method of the average, AD, and SD for the same data are also given in the Table.

1.  The  Calculation  of  the  Average  by  the  Short  Method 

The  first  important  fact  to  grasp  in  beginning  a  study  of 
the  calculation  of  the  average  by  the  Short  Method  is  that  we 
"  guess  "  or  assume  an  average  at  the  outset,  and  later  apply 




TABLE IV

To Illustrate the Calculation of the Average, AD, and SD by the Short Method. Data from Table II (1). Calculations for the Long Method Given for Comparison.

1. LONG METHOD

  (1)        (2)      (3)     (4)        (5)        (6)         (7)
 Scores   Midpoint     F      FXM         D          FD          FD²
200-204     202.5      1     202.5      30.93      30.93      956.66
195-199     197.5      4     790.0      25.93     103.72     2689.46
190-194     192.5      2     385.0      20.93      41.86      876.13
185-189     187.5     10    1875.0      15.93     159.30     2537.65
180-184     182.5      3     547.5      10.93      32.79      358.39
175-179     177.5      8    1420.0       5.93      47.44      281.32
170-174     172.5      3     517.5        .93       2.79        2.59
165-169     167.5      3     502.5      -4.07     -12.21       49.69
160-164     162.5      4     650.0      -9.07     -36.28      329.06
155-159     157.5      6     945.0     -14.07     -84.42     1187.79
150-154     152.5      4     610.0     -19.07     -76.28     1454.66
145-149     147.5      1     147.5     -24.07     -24.07      579.36
140-144     142.5      1     142.5     -29.07     -29.07      845.06
135-139     137.5      2     275.0     -34.07     -68.14     2321.53
130-134     132.5      0       0.0     -39.07       0.00        0.00
125-129     127.5      2     255.0     -44.07     -88.14     3884.33
                  N = 54    9265.0                837.44    18353.88

1. Average = ΣFM/N = 9265/54 = 171.57
2. AD = ΣFD/N = 837.44/54 = 15.51   (ΣFD taken without regard to sign)
3. SD = √(ΣFD²/N) = √(18353.88/54) = 18.44

2. SHORT METHOD

  (1)        (2)          (3)    (4)    (5)    (6)
 Scores   Midpoint         F      D     FD     FD²
200-204     202.5          1      7      7      49
195-199     197.5          4      6     24     144
190-194     192.5          2      5     10      50
185-189     187.5         10      4     40     160
180-184     182.5          3      3      9      27
175-179     177.5          8      2     16      32
170-174     172.5          3      1      3       3    [Average = 171.57 lies on this step]
165-169     167.5 (GA)     3      0      0       0
160-164     162.5          4     -1     -4       4
155-159     157.5          6     -2    -12      24
150-154     152.5          4     -3    -12      36
145-149     147.5          1     -4     -4      16
140-144     142.5          1     -5     -5      25
135-139     137.5          2     -6    -12      72
130-134     132.5          0     -7      0       0
125-129     127.5          2     -8    -16     128
                 N = 54                174     770

Fg = 31 (steps 170-174 through 200-204); Fl = 23 (steps 125-129 through 165-169)
Sum of plus FD's = +109; sum of minus FD's = -65

1. Average
   GA = 167.50
   c = (109 - 65)/54 = 44/54 = .8148        c² = .6639
   C = .8148 × 5 = 4.07
   Average = 167.5 + 4.07 = 171.57

2. AD
   AD = [ΣFD + c(Fl - Fg)]/N × step = [174 + .8148(23 - 31)]/54 × 5 = 15.51

3. SD
   σ = √(ΣFD²/N - c²) × step = √(770/54 - .6639) × 5 = 3.687 × 5 = 18.44



a correction to this guessed average (GA) in order to obtain the actual average. There is no set rule for guessing an average. The best plan is to take the midpoint of a step somewhere near the center of the distribution, and if possible the midpoint of the step-interval which contains the greatest frequency. In our problem the greatest F is on step 185-189. However, the GA is taken at 167.5 instead of 187.5, since the former is closer to the center of the distribution. With the question of the GA settled, the correction which must be applied to it to get the average is determined as outlined in the following steps:

(1) First, we fill in the D column, column 4. Here are entered the deviations of the midpoints of the steps measured from the GA in units of step-interval. Thus 172.5, the midpoint of step 170-174, deviates from 167.5, the GA, by 1 step-interval; hence a 1 is placed in the D column opposite 172.5. In like manner, 177.5 deviates 2 steps from 167.5, and accordingly a 2 goes in the D column opposite 177.5. Reading on up the column from 177.5, the succeeding D entries are found in the same way to be 3, 4, 5, 6, and 7. The last entry, 7, is the step deviation of 202.5 from 167.5 (the actual point deviation is, of course, 35).

Returning to 167.5, we find that the D of this point measured from the GA (that is, from itself) is 0; hence a 0 is placed in the D column opposite step 165-169. Below 167.5, all of the D entries are negative, as all of the midpoints are less than 167.5, the GA. So the D of 162.5 from 167.5 is -1 step-interval, and the D of 157.5 from 167.5 is -2 step-intervals. The other D's are -3, -4, -5, -6, -7, and -8.

(2) The D column completed, we next compute the FD column, column 5. The FD entries are found in exactly the same way as in the Long Method [compare (1)]; namely, each D in column 4 is multiplied, or "weighted," by the appropriate F in column 3. Note that in the Short Method we multiply each F by its deviation from the GA in units of step-interval instead of by its actual deviation from the



average of the distribution, and that for this reason the computation of the FD's is much simpler here than in the Long Method. All of the FD's above (greater than) the GA will be positive, and all below (smaller than) the GA negative, since the signs of the FD's depend on the signs of the D's.

(3) From the FD column the correction is obtained as follows: The sum of the plus FD's is 109; of the minus FD's, -65. The plus FD's therefore exceed the minus by 44 (the algebraic sum is +44), and 44 divided by 54 (N) equals .8148, which is the correction, "c," in units of step-interval. If we multiply c (.8148) by 5, the length of the step, the result is C (4.07), the score correction, or the correction in score units. When +4.07 is added to 167.5, the GA, the result is 171.57, the average. (Compare this result with the average found by the Long Method.)

A summary of the steps in the calculation of the average by the Short Method may be outlined as follows (see Table IV, 2):

(1)  Organize  the  scores  or  measures  into  a  frequency 
distribution. 

(2)  Guess  an  average  somewhere  near  the  center  of  the 
distribution,  and  preferably  on  the  step  containing  the 
greatest  frequency. 

(3)  Find  the  deviation  of  the  midpoint  of  each  step-interval 
from  the  GA  in  units  of  step-interval. 

(4)  Multiply  or  weight  each  step-deviation  (D)  by  its 
appropriate  F,  i.e.,  by  the  F  opposite  it. 

(5)  Find  the  algebraic  sum  of  the  plus  and  minus  FD's,  and 
divide  this  sum  by  N,  the  number  of  cases.  This  gives  c, 
the  correction  in  units  of  step-interval. 

(6)  Multiply  c  by  the  length  of  the  step-interval  to  get  C, 
the  score  correction. 

(7) Add C algebraically to the guessed average to get the actual average. Sometimes C will be positive and sometimes negative, depending upon where the average has been guessed. The method applies equally well in either case.
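The seven steps can be sketched in Python. The frequencies are those of Table IV, and `short_method_average` is an illustrative name, not a standard routine. Running it with the book's GA of 167.5 and, for comparison, with 187.5 (the midpoint of the step with the greatest frequency) shows C changing sign and size while the resulting average stays the same.

```python
STEP = 5
# Table IV, part 2: (midpoint of step-interval, frequency)
DIST = [(202.5, 1), (197.5, 4), (192.5, 2), (187.5, 10), (182.5, 3),
        (177.5, 8), (172.5, 3), (167.5, 3), (162.5, 4), (157.5, 6),
        (152.5, 4), (147.5, 1), (142.5, 1), (137.5, 2), (132.5, 0),
        (127.5, 2)]


def short_method_average(dist, ga, step):
    """Average by the Short Method from a guessed average `ga` (a midpoint)."""
    n = sum(f for _, f in dist)
    # D: deviation of each midpoint from the GA, in units of step-interval
    fd = [f * round((m - ga) / step) for m, f in dist]
    c = sum(fd) / n          # algebraic sum of the FD's divided by N
    big_c = c * step         # C, the correction in score units
    return c, big_c, ga + big_c


for ga in (167.5, 187.5):
    c, big_c, average = short_method_average(DIST, ga, STEP)
    print(ga, round(c, 4), round(big_c, 2), round(average, 2))
# 167.5 0.8148 4.07 171.57
# 187.5 -3.1852 -15.93 171.57
```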



If it seems to the reader that the Short Method belies its name, let him compare the calculations in columns 4 and 5 (SM) with the calculation of column 4 (LM). In spite of the extra column, the SM has a decided advantage over the LM, for as all deviations from the GA are in units of step-interval (whole numbers), the arithmetic is considerably easier in the Short Method. In distributions containing large numbers, the calculation of the average by the LM becomes very laborious; and it is with such distributions, rather than with those containing small numbers, that the SM justifies itself as a time and labor saver.

2.  The  Calculation  of  the  AD  by  the  Short  Method 

(A)  The  Calculation  of  the  AD  from  the  Average 
The chief advantage in finding the AD by the Short Method instead of the Long Method lies in the fact (already noted in calculating the average) that in the Short Method deviations are taken from a GA in units of step-interval. This procedure eliminates fractions and cuts down multiplication; but at the same time it necessitates the application of a correction to the ΣFD, and as a result complicates the AD formula. The formula for the AD by the Short Method is:¹

$$AD = \frac{\sum FD + c(F_l - F_g)}{N} \times \text{length of step-interval} \tag{8}$$

The term Fl in the formula refers to the sum of the F's on those steps whose midpoints are less (the subscript "l" means less) than the average of the distribution. The term Fg refers to the sum of the F's on those steps whose midpoints are greater (the subscript "g" means greater) than the average. In Table IV, for example, all of the midpoints from 167.5 down to 127.5, inclusive, are less than 171.57, the average; hence the Fl is 23. All of the midpoints from 172.5 up to 202.5, inclusive, are greater than 171.57; hence the Fg is 31. It is important to remember that the Fl and the Fg

¹ This formula applies equally well to the AD calculated from the average, median, or mode.



are always calculated from the actual average of the distribution (never from the guessed average) as the reference point. In consequence, the 3 scores on step 165-169, whose midpoint, 167.5, is less than 171.57, are included in the Fl. A simple check on the size of the Fl and Fg is to make sure that Fl + Fg = N. (Note that in the present problem 23 + 31 = 54.)

The other terms in the formula require little explanation. The c is the correction in units of step-interval. It has already been found in calculating the average (page 31) and equals .8148. The ΣFD is the arithmetic sum of the FD column (the sum taken without regard to sign), and equals 174.

If now we substitute for ΣFD, c, Fl, and Fg in formula (8), the numerator is 174 + .8148(23 - 31), or 167.482. Dividing this result by 54 (N), we obtain 3.102, the AD expressed in units of step-interval; and this value multiplied by 5 (the step) gives 15.51, the AD of the distribution. (Compare with the AD found by the Long Method.) Notice that it is always necessary to multiply the result given by the formula by the step-interval, since ΣFD and c are both in units of step.

Formula (8) is a relatively quick way of finding the AD of a frequency distribution. The value of the formula is somewhat limited, however, since it gives correct AD's only when c, the step-correction, is numerically less than 1.00, so that no midpoint lies between the GA and the actual average. In Table IV, c = .8148, which is less than 1.00, and in consequence the formula holds, as we find on comparing the AD's given by the Long and Short Methods. One way of circumventing this limitation in the AD formula is to make use of the fact that, no matter where the GA is taken, a correction can always be calculated by means of which we can obtain the actual average. If the c so found is numerically less than 1.00, formula (8) may be applied directly; if, however, c is numerically larger than 1.00, we must guess another average on the same step as the actual average (which is now known) and take deviations from this "new" GA. The formula will then hold. (There is another formula for the AD which avoids the difficulty mentioned: see Kelley, T. L., Statistical Method, p. 72ff.)
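The limitation can be seen in a sketch of formula (8). It assumes the Table IV frequencies and uses a hypothetical function name. Guessing at 167.5 reproduces the text; a GA at 157.5 gives c greater than 1.00 and a wrong AD; moving the GA to 172.5, the midpoint of step 170-174, which contains the actual average 171.57, restores the correct value.

```python
STEP = 5
# Table IV: (midpoint of step-interval, frequency)
DIST = [(202.5, 1), (197.5, 4), (192.5, 2), (187.5, 10), (182.5, 3),
        (177.5, 8), (172.5, 3), (167.5, 3), (162.5, 4), (157.5, 6),
        (152.5, 4), (147.5, 1), (142.5, 1), (137.5, 2), (132.5, 0),
        (127.5, 2)]


def ad_short_method(dist, ga, step):
    """Formula (8), with step deviations D taken from a guessed average `ga`."""
    n = sum(f for _, f in dist)
    devs = [(round((m - ga) / step), m, f) for m, f in dist]
    c = sum(f * d for d, _, f in devs) / n                 # correction, step units
    average = ga + c * step                                # the actual average
    sum_fd = sum(f * abs(d) for d, _, f in devs)           # arithmetic sum of FD
    f_l = sum(f for _, m, f in devs if m < average)        # midpoints below average
    f_g = sum(f for _, m, f in devs if m > average)        # midpoints above average
    return c, (sum_fd + c * (f_l - f_g)) / n * step


for ga in (167.5, 157.5, 172.5):
    c, ad = ad_short_method(DIST, ga, STEP)
    print(ga, round(c, 4), round(ad, 2))
# 167.5 0.8148 15.51    c below 1.00: the formula holds
# 157.5 2.8148 17.36    c above 1.00: wrong (the Long Method gives 15.51)
# 172.5 -0.1852 15.51   new GA on the step containing the average
```

The restriction follows from the derivation of formula (8): it assumes that shifting every step deviation by c changes each absolute deviation by exactly c, which is true only when no midpoint lies between the GA and the actual average.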



A  summary  of  the  steps  in  the  calculation  of  the  AD  from  the 
average  by  the  Short  Method  may  be  given  as  follows: 

(1) Find c, the correction in step-units, as shown on page 31. If c is numerically less than 1.00 (otherwise, first take a new GA on the step containing the actual average, as explained above):

(2) Find the arithmetic sum of the FD's.

(3) Calculate the Fl: the total number of scores on steps with midpoints less than the average. Next calculate the Fg: the total number of scores on steps with midpoints greater than the average.

(4) Substitute for ΣFD, c, Fl, Fg, N, and the step length in formula (8) to find the AD.

TABLE V

To Illustrate the Calculation of the AD from the Median by the Short Method. Data from Table II (2)

  (1)        (2)          (3)     (4)     (5)
 Scores   Midpoint         F       D      FD
136-139     138            3       5      15
132-135     134            5       4      20
128-131     130           16       3      48
124-127     126           23       2      46
120-123     122           52       1      52
116-119     118 (GM)      49       0       0
112-115     114           27      -1     -27
108-111     110           18      -2     -36
104-107     106            7      -3     -21
          N = 200                        265

Fg = 99 (steps 120-123 through 136-139); Fl = 101 (steps 104-107 through 116-119)

N/2 = 100
Median = 116 + (48/49) × 4 = 119.92
Guessed median = 118 (midpoint of step 116-119)
Correction, C = 119.92 - 118.00 = 1.92
c = 1.92/4 = .48

Applying formula (8): AD = [ΣFD + c(Fl - Fg)]/N × step length
AD = [265 + .48(101 - 99)]/200 × 4
AD = 1.33 × 4 = 5.32



(B)  The  Calculation  of  the  AD  from  the  Median 

It is sometimes desirable to calculate the AD from the median instead of the average. The formula for the AD from the median is exactly the same as formula (8) for the AD from the average (see page 32), with Fl and Fg counted from the median. However, the scheme of the work differs in some respects from the calculation of the AD from the average, and hence it is illustrated in Table V for the 200 cancellation scores taken from Table II (2).

First we find the true median, 119.92, by the method outlined on pages 13-14. Next, we assume or guess a median at the midpoint of the step-interval which contains the true median, viz., at 118. Since the true median is known, the score correction, C, is found directly to be 1.92 by subtracting 118 from 119.92 (true median minus assumed median). Then, dividing 1.92 by 4, the step-interval, we obtain .48, the correction in step-units (c).

The D's are taken from 118, the guessed median, and the FD's are obtained (as shown in Table IV) by "weighting" each D by its corresponding F. The arithmetic sum of column 5, i.e., the ΣFD, is 265. Fl, the total number of scores on midpoints 118 to 106 inclusive (those less than 119.92), equals 101. And Fg, the total number of scores on midpoints 122 to 138 inclusive (those greater than 119.92), equals 99.

With ΣFD, c, Fl, and Fg known, the AD is now easily found by substituting these values in formula (8). The numerator becomes 265 + .48(101 - 99), or 265.96; and dividing by 200 and multiplying by 4, the step-interval, we get 5.32 as the AD from 119.92, the median.
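The procedure of Table V can be sketched in Python as follows. The rows restate Table V, the names are illustrative, and, as in the text, the lower limit of step 116-119 is taken as 116 when interpolating for the median.

```python
STEP = 4
# Table V: (lower limit, midpoint, frequency), top of the table first
ROWS = [(136, 138, 3), (132, 134, 5), (128, 130, 16), (124, 126, 23),
        (120, 122, 52), (116, 118, 49), (112, 114, 27), (108, 110, 18),
        (104, 106, 7)]
n = sum(f for _, _, f in ROWS)                            # 200

# True median: count up from the small end to N/2, then interpolate
below = 0
for lower, mid, f in reversed(ROWS):
    if below + f >= n / 2:
        median = lower + (n / 2 - below) / f * STEP       # 116 + 48/49 x 4
        guessed = mid                                     # GM: midpoint of that step
        break
    below += f
else:
    raise ValueError("empty distribution")

c = (median - guessed) / STEP                             # correction in step units
sum_fd = sum(f * abs(round((mid - guessed) / STEP)) for _, mid, f in ROWS)
f_l = sum(f for _, mid, f in ROWS if mid < median)
f_g = sum(f for _, mid, f in ROWS if mid > median)
ad = (sum_fd + c * (f_l - f_g)) / n * STEP                # formula (8)
print(round(median, 2), round(c, 2), sum_fd, f_l, f_g, round(ad, 2))
# 119.92 0.48 265 101 99 5.32
```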

3. The Calculation of the Standard Deviation (σ) by the Short Method

The calculation of the standard deviation by the Short Method is considerably less complex than the calculation of the AD. The formula is:

$$\sigma = \sqrt{\frac{\sum FD^2}{N} - c^2} \times \text{the step-interval} \tag{9}$$



in which ΣFD² is the sum of the squared deviations in units of step-interval, taken from the guessed average (each weighted by its F), and c is the correction in units of step-interval.

An illustration of the calculation of σ by the Short Method is given in Table IV. The first step is to fill in the FD² column (column 6) by multiplying each D in column 4 by its corresponding FD in column 5. The process is identical with that used in the Long Method, except that the D's are all expressed in units of step-interval. This, of course, considerably simplifies the multiplication. The calculation of c has already been described on page 31. The sum of the FD² column (ΣFD²) is 770, and c² is .6639. Applying formula (9), therefore, we get 3.687 × 5, or 18.44, as the σ of the distribution.

The formula for σ by the Short Method, unlike the AD formula, holds good no matter what the size of the correction, c. This general applicability of formula (9) serves to increase its value.
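A sketch of formula (9), again assuming the Table IV frequencies and an illustrative function name. It repeats the calculation from the book's GA (c = .8148) and from a GA of 157.5, where c is greater than 1.00; both give the same σ, which illustrates the point just made.

```python
import math

STEP = 5
# Table IV: (midpoint of step-interval, frequency)
DIST = [(202.5, 1), (197.5, 4), (192.5, 2), (187.5, 10), (182.5, 3),
        (177.5, 8), (172.5, 3), (167.5, 3), (162.5, 4), (157.5, 6),
        (152.5, 4), (147.5, 1), (142.5, 1), (137.5, 2), (132.5, 0),
        (127.5, 2)]


def sd_short_method(dist, ga, step):
    """Formula (9): sigma = sqrt(sum(FD^2)/N - c^2) x step."""
    n = sum(f for _, f in dist)
    devs = [(round((m - ga) / step), f) for m, f in dist]   # D in step units
    c = sum(f * d for d, f in devs) / n                      # correction, step units
    sum_fd2 = sum(f * d * d for d, f in devs)                # FD^2 = D x FD
    return sum_fd2, c, math.sqrt(sum_fd2 / n - c * c) * step


for ga in (167.5, 157.5):
    sum_fd2, c, sd = sd_short_method(DIST, ga, STEP)
    print(ga, sum_fd2, round(c, 4), round(sd, 2))
# 167.5 770 0.8148 18.44
# 157.5 1162 2.8148 18.44
```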

4.  The  Short  Method  Applied  to  Discrete  Series 

We have defined a discrete series on page 2 as one in which there are real gaps. This means that in a truly discrete series each measure, instead of representing an interval on a scale as in a continuous series, is a separate and distinct value. There is, for example, a real gap between one man and two men, or between one dollar and two dollars, provided the unit of measurement in the latter case is one dollar.

Table VI illustrates the method of finding the measures of central tendency and variability for discrete measures tabulated into a frequency distribution. The data consist of the records of the number of children in 44 families of a rural community. The first column of the table gives the number of children in the family; the second column, under the F, gives the number of families of a given size. We find, for example, one family of 10 children, three of 9, four of 8, etc. Since the measures (here the children) are discrete,




TABLE VI

To Illustrate the Calculation of the Average, Median, Q, AD, and SD When Measures Are Discrete

The "F" column gives the number of families containing the number of children listed in the first column.

 Measures,
No. Children     F (Families)      D      FD      FD²
    10                1            5       5       25
     9                3            4      12       48
     8                4            3      12       36
     7                3            2       6       12
     6                5            1       5        5
     5                8            0       0        0
     4                7           -1      -7        7
     3                4           -2      -8       16
     2                4           -3     -12       36
     1                2           -4      -8       32
     0                3           -5     -15       75
                  N = 44                  90      292

Fg = 24 (5 to 10 children); Fl = 20 (0 to 4 children)
Sum of plus FD's = +40; sum of minus FD's = -50

GA = 5
c = (40 - 50)/44 = -.23        c² = .052
Average = 5 - .23 = 4.77
Median = 5.0
Mode = 5.0

N/2 = 22; since the 22nd measure falls on 5, Median = 5.
N/4 = 11; since the 11th measure falls on 3, Q1 = 3.
3N/4 = 33; since the 33rd measure falls between 6 and 7, Q3 = 6.5.
Q = (Q3 - Q1)/2 = (6.5 - 3)/2 = 1.75

AD = [ΣFD + c(Fl - Fg)]/N = [90 - .23(20 - 24)]/44 = 2.07
SD = √(ΣFD²/N - c²) = √(292/44 - .052) = 2.57


each  measure  must  be  taken  at  face  value,  and  there  are,  in 
consequence,  no  midpoint  values  for  the  different  steps.  As 
a  result,  the  average  being  guessed  at  5,  D's  are  taken  directly 
from  this  point.  The  FD  and  the  FD2  columns  are  calculated 
exactly   as    shown    in   Table   IV   for    continuous   series — the 


38         STATISTICS  IN  PSYCHOLOGY  AND  EDUCATION 

first  column  is  obtained  by  multiplying  corresponding  F  and 
D  values,  and  the  second  by  multiplying  corresponding  D 
and  FD  values.  Note  that  since  the  step-interval  is  1,  the 
correction  c  equals  C  directly. 
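For readers who wish to check the column arithmetic of Table VI mechanically, the following Python sketch reproduces it. It is an illustration only: the dictionary `freq` is a transcription of the table, and the variable names are hypothetical rather than the author's notation.

```python
# Table VI transcribed: number of children -> number of families (F)
freq = {10: 1, 9: 3, 8: 4, 7: 3, 6: 5, 5: 8, 4: 7, 3: 4, 2: 4, 1: 2, 0: 3}
guessed_average = 5  # GA; the step-interval is 1, so D = measure - GA

rows = []
for measure in sorted(freq, reverse=True):
    f = freq[measure]
    d = measure - guessed_average   # D column (deviation from GA in steps)
    fd = f * d                      # FD column: F times D
    fd2 = d * fd                    # FD^2 column: D times FD
    rows.append((measure, f, d, fd, fd2))

n = sum(freq.values())
sum_fd = sum(row[3] for row in rows)    # algebraic sum of the FD column
sum_fd2 = sum(row[4] for row in rows)
c = sum_fd / n                          # correction in step units; C = c here
average = guessed_average + c

for row in rows:
    print("%3d %3d %3d %4d %4d" % row)
print("N = %d, sum FD = %d, sum FD^2 = %d" % (n, sum_fd, sum_fd2))
print("c = %.2f, Average = %.2f" % (c, average))
```

Run as written, it reports N = 44, ΣFD = -10 and ΣFD² = 292, giving c = -.23 and an average of 4.77, as in the table.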

If we apply the correction, -.23, to 5, the guessed average, we obtain 4.77 as the average of the distribution. This result, while mathematically correct, is difficult to interpret in a practical way, since no family can have four and a fraction children. Possibly the median is a more meaningful measure. One half of the measures is 22, and counting in from the small end of the series we find that the twenty-second score falls on the frequency opposite step 5. Fractional values are, of course, meaningless in a discrete series; hence we must simply take 5 as being roughly the median of the distribution, without any interpolation. The median family, accordingly (and the modal family as well), may be said to contain 5 children, and on the face of it this result seems to be of more practical value than the statement that the average number of children to a family is 4.77.

It is worth while examining further, however, exactly what is meant by the statement that the average number of children per family is 4.77. In the first place it means, of course, that the number of children in the N families examined, divided by N, gives us 4.77. But furthermore, if the families examined are actually a fair sample of all of the families in the "population" from which they are taken (see page 120), it means that if we had taken all of these families, or another fair sample of them, the average size of the family would have been (approximately) the same. The average, then, is a constant factor for the given population, such that, knowing the number of families in any fair sample of the population, we can multiply this number by the constant factor and obtain (approximately) the number of children in all of these families. Good use may thus be made of the average even when the measures are necessarily discrete: exactly the same kind of use that can be made of the average in the case of continuous measures.

The median, on the other hand, together with the quartiles, really breaks down in the case of discrete measures. In the example above of the families, there is actually no value which fulfills the definition of the median as such a point or value that one half of the measures exceed it and one half fall below it. There are just 44 families in all; the median, then, would be such a point that 22 families exceeded it and 22 fell below it. Now there are 20 families falling below 5, 8 families at 5, and 16 families above 5. If we place the median exactly at 5, only 20 families instead of the required 22 fall below. And if we place the median even the least fraction above 5, the number falling below is increased by all of the families having 5 children, so that there are then 20 + 8, or 28, families falling below the median, which is more than half. There is, in short, no median value for this series under the definition of the median which we have been using.

Sometimes, however, another definition of the median is given, namely, that it is the score or measure made by the middle individual when the individuals have been arranged in order (for scores, from least to greatest; see the discussion of the midscore, page 12). Strictly speaking, this definition also breaks down in the case of discrete measures, since there is really no sense in speaking of two or more individuals who have the same score as being arranged in order of magnitude when measures are discrete. Thus the 8 families of 5 children each are all exactly equal as regards number of children. Of course, we might admit that in a sense some one (any one) of these 8 families is the middle of the whole series, and since it is a family of 5 children, the median so defined is just 5, no more nor less. This is the median as we have used it. At best, however, it is a rough and unreliable measure.

In computing the measures of variability in a discrete series, the Q is the only one which offers difficulties. In the present illustration, one fourth of the measures (N/4) is 11, and counting in 11 scores from the small end of the series, we put Q1 on step 3 (as in the case of the median, no interpolation is made). If we check this value of Q1 by counting in 33 scores from the large end of the distribution, we again obtain 3 as the value of Q1. Three fourths of the measures (3N/4) is 33; and counting in 33 scores from the small end of the series, we find that we complete (count through) the frequency on step 6. If 11 scores are counted off from the other direction, we complete (count through) the frequency on step 7. This puts Q3 at either 6 or 7, and the best way out of the difficulty is to take Q3 as roughly equal to 6.5, i.e., midway between 6 and 7. This is, of course, a makeshift, though even so probably as accurate as the median or quartiles ever are in discrete series. Taking Q1 equal to 3 and Q3 equal to 6.5, Q is (6.5 - 3)/2, or 1.75.
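The counting procedure for the median and quartiles of a discrete series can be sketched as follows. The sketch rests on assumptions of our own: the helper `discrete_point` generalizes the check from both ends that the text uses for the quartiles (count k measures from the small end and N - k from the large end, and take the midway value if the two disagree), and it assumes k is a whole number, as it is here.

```python
# Table VI transcribed: number of children -> number of families
freq = {10: 1, 9: 3, 8: 4, 7: 3, 6: 5, 5: 8, 4: 7, 3: 4, 2: 4, 1: 2, 0: 3}


def value_at(freq, k):
    """Measure on which the k-th measure falls, counting up from the small
    end of a discrete series (no interpolation)."""
    running = 0
    for measure in sorted(freq):
        running += freq[measure]
        if running >= k:
            return measure
    raise ValueError("k is larger than N")


def discrete_point(freq, k):
    """Count k measures from the small end and N - k from the large end.
    Counting N - k from the large end lands on the (k + 1)-th measure from
    the small end; if the two counts disagree, take the midway value."""
    return (value_at(freq, k) + value_at(freq, k + 1)) / 2


n = sum(freq.values())                    # 44
median = discrete_point(freq, n // 2)     # 22nd and 23rd both fall on 5
q1 = discrete_point(freq, n // 4)         # 11th and 12th both fall on 3
q3 = discrete_point(freq, 3 * n // 4)     # 33rd falls on 6, 34th on 7
q = (q3 - q1) / 2
print(median, q1, q3, q)
```

With the Table VI frequencies it prints 5.0 3.0 6.5 1.75: a median of 5, Q1 of 3, Q3 of 6.5, and Q of 1.75.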

The AD and σ in a discrete series are found from formulas (8) and (9) in exactly the same way as in a continuous series. For example, Fl (the number of families less than 4.77) is 20, and Fg (the number of families greater than 4.77) is 24. The AD is, therefore,

$AD = \frac{90 + (-.23)(20 - 24)}{44} \times 1$ (the step-interval), or 2.07.

The σ is

$\sigma = \sqrt{\frac{292}{44} - .054} \times 1$ (the step-interval), or 2.57.
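Formulas (8) and (9) can be checked in the same way. The sketch below is a hedged illustration, not the author's computation: it carries the unrounded correction (c = -10/44) instead of -.23 and .054, and it adds a direct computation of the average deviation as a cross-check. The two AD values agree exactly here because no measure lies strictly between the guessed average, 5, and the true average, 4.77. The names `f_l`, `f_g` and `ad_direct` are our own.

```python
import math

# Table VI transcribed: number of children -> number of families
freq = {10: 1, 9: 3, 8: 4, 7: 3, 6: 5, 5: 8, 4: 7, 3: 4, 2: 4, 1: 2, 0: 3}
ga, step = 5, 1                      # guessed average and step-interval
n = sum(freq.values())

d = {m: (m - ga) / step for m in freq}                   # D in step units
sum_fd = sum(freq[m] * d[m] for m in freq)               # algebraic sum
sum_fd_abs = sum(freq[m] * abs(d[m]) for m in freq)      # arithmetical sum
sum_fd2 = sum(freq[m] * d[m] ** 2 for m in freq)
c = sum_fd / n
average = ga + c * step

# F_l and F_g: families whose measure lies below / above the true average
f_l = sum(f for m, f in freq.items() if m < average)
f_g = sum(f for m, f in freq.items() if m > average)

ad = (sum_fd_abs + c * (f_l - f_g)) / n * step           # formula (8)
sd = math.sqrt(sum_fd2 / n - c ** 2) * step              # formula (9)

# Cross-check: mean absolute deviation computed directly from the average
ad_direct = sum(f * abs(m - average) for m, f in freq.items()) / n
print("F_l = %d, F_g = %d" % (f_l, f_g))
print("AD = %.2f (direct %.2f), SD = %.2f" % (ad, ad_direct, sd))
```

It reports F_l = 20, F_g = 24, an AD of 2.07 (the direct computation gives the same value) and an SD of 2.57.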

V.  The  Comparison  of  Groups 

1.  The  Measurement  of  Relative  Variability.    The  Coefficient 
of  Variation 

Thus far we have been dealing entirely with measures of absolute variability within the distribution: the Q, the AD, and the SD. It is sometimes desirable, however, to measure relative variability; for instance, to compare the variability of one group on two different tests, or of two or more groups on the same test. The measures of absolute variability are not sufficient in such cases unless the averages of the distributions compared are equal or approximately equal. A problem will serve to make this clear.

A group of 50 boys works for 6 minutes on an arithmetic test and makes an average score of 20.5 with a σ of 5.24. The same group works for 10 minutes on the same test and makes an average score of 34.8 with a σ of 9.62. If we compare the σ's of these two distributions, we should probably be inclined to say that the group was considerably more variable in the 10-minute period than in the 6-minute period. Although the σ in the second period is nearly twice as large as the σ in the first period, this does not necessarily mean that the variability of the group has doubled with the increased time allowance (or even increased at all), for the average score has also increased, from 20.5 to 34.8. In other words, the two σ's are not directly comparable, as they have been measured around different central tendencies. In order to compare the relative variability of this group in the two periods, therefore, we must have a measure which takes account both of the central tendency and of the variability. Such a measure is Pearson's Coefficient of Variation, given by the formula,

$$V = \frac{100\,\sigma}{\text{Average}} \qquad (10)$$

Applying this formula to the present problem, we find that

For the 6-minute period: $V = \frac{5.24 \times 100}{20.5} = 25.56$

For the 10-minute period: $V = \frac{9.62 \times 100}{34.8} = 27.64$

Instead of being only about half as variable in the 6-minute period as in the 10, therefore, the group is seen to be actually 25.56/27.64, or about 92%, as variable.
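Formula (10) is simple enough that a one-line function suffices. The sketch below (the function name is our own) applies it to the two periods of the arithmetic test.

```python
def coefficient_of_variation(sd, average):
    """Pearson's coefficient of variation, formula (10): V = 100 * SD / Average."""
    return 100 * sd / average


v6 = coefficient_of_variation(5.24, 20.5)     # 6-minute period
v10 = coefficient_of_variation(9.62, 34.8)    # 10-minute period
ratio = 100 * v6 / v10                        # relative variability, in per cent
print("V (6 min) = %.2f, V (10 min) = %.2f, ratio = %.0f%%" % (v6, v10, ratio))
```

It gives 25.56 and 27.64, and a ratio of about 92%.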



The  coefficient  of  variation  is  especially  useful  in  those 
problems  in  which  the  variability  of  the  group  under  different 
conditions  is  the  factor  studied.  As  stated  above,  when  the 
averages  are  equal  the  absolute  variability  may  be  compared 
directly. 

2.  The  Comparison  of  Two  Groups  in  Terms  of  Their  Measures 
of  Central  Tendency  and  Variability 

The existence of a difference between the averages or the medians of two groups does not necessarily indicate that there are any very marked differences in the performance of the various individuals within the two groups. An obtained difference in central tendency may mean that the person ranking lowest in the one group is better than the person ranking highest in the other; on the other hand, it may also mean that only a very small per cent of the better group is actually ahead of the poorer. For this reason, in comparing groups it is not sufficient to state simply the difference between their averages or medians, for any such difference will depend for its significance largely upon the variability, or spread, within the groups compared.

Table VII will illustrate what is meant. A group of 300 boys and a group of 250 girls have been measured on the same test, and the average, median, Q and σ of each group computed. Now if we compare the central tendencies, it is clear that the average girl is 2.19 points ahead of the average boy, and that the median girl is 2.25 points ahead of the median boy. Taken alone, this result might suggest a fairly definite sex difference in the given test; but before drawing this conclusion, we should compare the variability of the two groups.

A comparison of the Q's and σ's shows that the girls tend to scatter somewhat more around their central tendency than the boys. The range of scores is, however, practically the same in both groups: 100% of the boys and 92% of the girls score between 12 and 32 on the scale. Also, from the quartiles it is evident that the middle 50% of the boys scored between 19 and 24 (approximately), while the middle 50% of the girls scored between 20 and 27 (approximately).


TABLE VII

Comparison of Two Groups in Terms of Central Tendency, Variability, and Overlapping

Boys

Scores      F      D      FD     FD²
28-32      15      2      30      60
24-28      68      1      68      68    (+98)
20-24     128      0       0       0
16-20      79     -1     -79      79
12-16      10     -2     -20      40    (-99)
N = 300                   -1     247

GA = 22.0    N/2 = 150    N/4 = 75    3N/4 = 225
$c = \frac{-1}{300} = -.003 \qquad C = -.003 \times 4 = -.01$
Average = 22.0 + (-.01) = 21.99
$\text{Median} = 20 + \frac{61}{128} \times 4 = 21.91$
$Q_1 = 16 + \frac{65}{79} \times 4 = 19.29$
$Q_3 = 24 + \frac{8}{68} \times 4 = 24.47$
Q = 2.59
$\sigma = \sqrt{\frac{247}{300}} \times 4 = .907 \times 4 = 3.63$

Girls

Scores      F      D      FD     FD²
32-36      20      2      40      80
28-32      35      1      35      35    (+75)
24-28      73      0       0       0
20-24      68     -1     -68      68
16-20      41     -2     -82     164
12-16      13     -3     -39     117    (-189)
N = 250                 -114     464

GA = 26.0    N/2 = 125    N/4 = 62.5    3N/4 = 187.5
$c = \frac{-114}{250} = -.456 \qquad c^2 = .208 \qquad C = -.456 \times 4 = -1.82$
Average = 26.0 + (-1.82) = 24.18
$\text{Median} = 24 + \frac{3}{73} \times 4 = 24.16$
$Q_1 = 20 + \frac{8.5}{68} \times 4 = 20.50$
$Q_3 = 24 + \frac{65.5}{73} \times 4 = 27.59$
Q = 3.55
$\sigma = \sqrt{\frac{464}{250} - .208} \times 4 = 1.28 \times 4 = 5.12$

What per cent of the boys reach or exceed 24.16, the median of the girls? 217 boys score below 24. Step 24-28 contains 68 scores; hence there are 68/4, or 17, scores per scale unit on this step, and 17 × .16 = 2.72. Thus 217 + 2.72, or 219.72, of the boys' scores fall below 24.16, the girls' median, and 300 - 219.72 = 80.28. Accordingly, 80.28/300, or 26.76% (approximately 27%), of the boys reach or exceed the median score of the girls.
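The interpolation used for the medians and quartiles of Table VII can be written as a single function. This is a minimal sketch assuming, as the table does, that the measures on each step are spread evenly over it; the list layout (lower limit, step length, F) and the name `grouped_point` are illustrative choices, not the author's.

```python
def grouped_point(steps, fraction):
    """Score below which `fraction` of N lies. `steps` lists
    (lower limit, step length, F) from the small end upward; the measures
    on a step are assumed to be spread evenly over it."""
    n = sum(f for _, _, f in steps)
    target = fraction * n
    below = 0
    for lower, length, f in steps:
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * length
        below += f
    raise ValueError("fraction must lie between 0 and 1")


# Table VII transcribed, smallest step first
boys = [(12, 4, 10), (16, 4, 79), (20, 4, 128), (24, 4, 68), (28, 4, 15)]
girls = [(12, 4, 13), (16, 4, 41), (20, 4, 68), (24, 4, 73), (28, 4, 35),
         (32, 4, 20)]

results = {}
for name, steps in (("Boys", boys), ("Girls", girls)):
    q1, median, q3 = (grouped_point(steps, p) for p in (0.25, 0.5, 0.75))
    results[name] = {"Q1": q1, "Median": median, "Q3": q3, "Q": (q3 - q1) / 2}
    print("%-5s Q1 = %.2f  Median = %.2f  Q3 = %.2f  Q = %.2f"
          % (name, q1, median, q3, (q3 - q1) / 2))
```

For the boys it yields a median of 21.91 and quartiles of 19.29 and 24.47; for the girls, 24.16, 20.50 and 27.59. The girls' Q prints as 3.54 rather than 3.55 because the sketch does not round the quartiles before subtracting.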



Again, we find from comparing the σ's that the middle 2/3 of the boys scored within 21.99 ± 3.63, i.e., between 18 and 25 (approximately), and that the middle 2/3 of the girls scored within 24.18 ± 5.12, i.e., between 19 and 29 (approximately) on the scale. In spite of the difference in averages and medians, therefore, it is evident from the measures of variability that the boys and girls scored over almost exactly the same part of the scale.

To compare the variability of the boys as a group with that of the girls, we must compute the coefficients of variation. These are

For Boys: $V = \frac{3.63 \times 100}{21.99} = 16.5$

For Girls: $V = \frac{5.12 \times 100}{24.18} = 21.2$

Expressed as a per cent, the boys are 16.5/21.2, or 78%, as variable as the girls.

3.  The  Comparison  of  Two  Groups  in  Terms  of  Overlapping 

A  second  way  of  showing  how  alike,  or  unlike,  two  groups 
are  in  their  performance  on  a  given  test  is  to  state  the  amount 
of  overlapping  in  the  distributions  of  scores  made  by  the  two 
groups.  This  information  serves  as  a  valuable  supplement 
to  that  secured  from  a  comparison  of  central  tendencies  and 
variabilities.  Overlapping  is  usually  measured  by  the  per  cent 
of  the  one  group  which  reaches  or  exceeds  the  median  of  the 
other.  In  the  present  problem  we  may  compute  the  per  cent 
of  boys  who  reach  or  exceed  the  median  score  of  the  girls. 

The calculation of this measure of overlapping is as follows. First, we add up the boys' scores from the small end of the distribution to find how many fall below 24.16, the girls' median. Two hundred and seventeen boys (10 + 79 + 128) score below 24, the lower limit of the step 24-28. To find how many score below 24.16, we divide the 68 scores on this step-interval by 4 (the length of step) and multiply the result (17) by .16, in order to find how many of the scores on this step lie between 24 and 24.16. The result of this last calculation is 2.72, and accordingly a total of 217 + 2.72, or 219.72, of the boys' 300 scores fall below 24.16, the girls' median score. Subtracting 219.72 from 300, we find that 80.28 of the boys' scores lie above 24.16. It is clear, then, that 80.28/300, or 27%, of the boys score at or beyond the girls' median. (See Table VII.)
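The overlapping calculation rests on the same assumption of evenly spread scores within a step. The function below is a hypothetical helper written for this illustration; it accumulates the frequency lying below a given point and reports the per cent at or above it.

```python
def percent_reaching(steps, point):
    """Per cent of a grouped distribution reaching or exceeding `point`,
    assuming the measures on each step are spread evenly over it.
    `steps` lists (lower limit, step length, F)."""
    n = sum(f for _, _, f in steps)
    below = 0.0
    for lower, length, f in steps:
        if point >= lower + length:
            below += f                                  # whole step lies below
        elif point > lower:
            below += f * (point - lower) / length       # part of the step
    return 100 * (n - below) / n


boys = [(12, 4, 10), (16, 4, 79), (20, 4, 128), (24, 4, 68), (28, 4, 15)]
girls_median = 24.16
pct = percent_reaching(boys, girls_median)
print("%.2f%% of the boys reach or exceed the girls' median" % pct)
```

Applied to the boys' distribution and the girls' median, 24.16, it gives 26.76%, the figure worked out under Table VII.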

Summarizing the results from Table VII and the discussion of the preceding paragraphs, we find that the difference between the average boy and the average girl is 2.19 points in favor of the girls, and that the difference between the median boy and the median girl is 2.25 points in favor of the girls. Twenty-seven per cent of the boys reach or exceed the median score of the girls; 100% of the boys and 92% of the girls score within the same limits on the scale; the middle 2/3 of the boys score between 18 and 25, and the middle 2/3 of the girls score between 19 and 29. The obvious conclusion from these data seems to be that individual differences within either group (between boy and boy or between girl and girl) are probably of more importance, because greater, than the differences between boy and girl indicated by the averages or medians taken alone.

VI. The Calculation of the Percentiles in a Frequency Distribution

We have already found it necessary, in finding the quartile deviation, Q (see page 18), to calculate Q1, the first quartile or 25th percentile, and Q3, the third quartile or 75th percentile. It is often very useful to know, in addition to these points, the decile points in the distribution as well, viz., the 10th, the 20th, the 30th, the 40th, etc., percentile points. These values are calculated in exactly the same manner as the median and the quartiles. As the 25th percentile, for example, was found by counting off 1/4 of the scores from the small end of the distribution, and the 50th percentile (the median) by counting off 1/2 of the scores, in exactly the same way the 10th percentile is found by counting off 1/10, and the 20th percentile by counting off 2/10, of the scores from the small end of the distribution. Percentiles are of considerable value in enabling us to compare the standing of different individuals in a number of tests, or to combine the standing of the same individual in different tests (see page 278 for a fuller discussion of this).

Table VIII gives the method of calculating the percentiles in the distribution of 54 Army Alpha scores taken from Table I. The 10th percentile, 147, is located by finding 10% of 54 and counting off 5.4 scores from the small end of the distribution. In like manner, the 20th percentile, which lies 2/10 of 54, or 10.8 scores, from the small end of the distribution, is located at 155.67. The 20th percentile score is taken as 155. This is because a score of 155 in a continuous series means "155 up to 156," and consequently 155.67 falls on score 155, just as 160.25, the 30th percentile point, falls on score 160.¹ The other percentile points, and their scores, are tabulated in Table VIII.

A word should be said with regard to the calculation of the 0 and 100th percentiles. These values are the lowest and the highest scores, respectively, in the distribution. For example, we find from the original scores in Table I that the lowest score is 126 and the highest 201. Therefore, the 0 percentile falls at 126 and the 100th at 201.

Note the column in the table marked Cum. F (cumulative frequency). The entries in this column were obtained by adding the frequencies (the F's) serially, beginning with those on step 125-129: e.g., 2 + 0 = 2; 2 + 2 = 4; 4 + 1 = 5, etc. From this column we can quickly tell how far we must count into the distribution in order to reach any percentile point. For example, the 70th percentile is 37.8 scores from the beginning of the distribution; hence it is clear from the Cum. F's that 37 scores will take us to 185, the upper limit of step 180-184, and that the 70th percentile lies on step 185-189.

¹ This applies also to the median and the quartiles in a distribution of scores in a continuous series.

TABLE VIII

To Illustrate the Calculation of the Percentiles in a Frequency Distribution

1. Data from Table I

Scores      F    Cum. F      Percentiles    Scores
200-204     1      54            100          201
195-199     4      53             90          194
190-194     2      49             80          188
185-189    10      47             70          185
180-184     3      37             60          179
175-179     8      34             50          175
170-174     3      26             40          167
165-169     3      23             30          160
160-164     4      20             20          155
155-159     6      16             10          147
150-154     4      10              0          126
145-149     1       6
140-144     1       5
135-139     2       4
130-134     0       2
125-129     2       2
N = 54

Calculations:
10% of 54 = 5.4:   $145 + \frac{.4}{1} \times 5 = 147$
20% of 54 = 10.8:  $155 + \frac{.8}{6} \times 5 = 155.67$ (155)
30% of 54 = 16.2:  $160 + \frac{.2}{4} \times 5 = 160.25$ (160)
40% of 54 = 21.6:  $165 + \frac{1.6}{3} \times 5 = 167.67$ (167)
50% of 54 = 27:    $175 + \frac{1}{8} \times 5 = 175.625$ (175)
60% of 54 = 32.4:  $175 + \frac{6.4}{8} \times 5 = 179$
70% of 54 = 37.8:  $185 + \frac{.8}{10} \times 5 = 185.40$ (185)
80% of 54 = 43.2:  $185 + \frac{6.2}{10} \times 5 = 188.1$ (188)
90% of 54 = 48.6:  $190 + \frac{1.6}{2} \times 5 = 194$

TABLE VIII (Continued)

2. Data from "A Scale of Performance Tests," by Pintner and Patterson, page 133. Scores made by 72 nine-year-olds on the Substitution Test (in seconds).

Scores (sec.)    F    Cum. F      Percentiles    Scores
80-89            1       1            100           80
90-99            2       3             90          108
100-109          5       8             80          121
110-119          5      13             70          126
120-129         13      26             60          133
130-139          9      35             50          141
140-149          6      41             40          152
150-159         11      52             30          158
160-169          5      57             20          172
170-179          3      60             10          192
180-189          4      64              0          219
190-199          3      67
200-209          2      69
210-219          3      72
N = 72

Calculations:
10% of 72 (90th percentile) = 7.2:   $100 + \frac{4.2}{5} \times 10 = 108.4$ (108)
20% of 72 (80th percentile) = 14.4:  $120 + \frac{1.4}{13} \times 10 = 121$
30% of 72 (70th percentile) = 21.6:  $120 + \frac{8.6}{13} \times 10 = 126.6$ (126)
40% of 72 (60th percentile) = 28.8:  $130 + \frac{2.8}{9} \times 10 = 133$
50% of 72 (50th percentile) = 36:    $140 + \frac{1}{6} \times 10 = 141.67$ (141)
60% of 72 (40th percentile) = 43.2:  $150 + \frac{2.2}{11} \times 10 = 152$
70% of 72 (30th percentile) = 50.4:  $150 + \frac{9.4}{11} \times 10 = 158.5$ (158)
80% of 72 (20th percentile) = 57.6:  $170 + \frac{.6}{3} \times 10 = 172$
90% of 72 (10th percentile) = 64.8:  $190 + \frac{.8}{3} \times 10 = 192.67$ (192)
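The counting procedure for percentile points can be sketched in Python using the data of Table VIII (1). As an assumption of the sketch, only the 10th to 90th percentiles are computed; the 0 and 100th percentiles are taken from the lowest and highest raw scores, as the text explains, and cannot be recovered from the grouped table. The function name `percentile_point` is our own.

```python
import math

# Table VIII (1) transcribed: (lower limit, step length, F), smallest step first
steps = [(125, 5, 2), (130, 5, 0), (135, 5, 2), (140, 5, 1), (145, 5, 1),
         (150, 5, 4), (155, 5, 6), (160, 5, 4), (165, 5, 3), (170, 5, 3),
         (175, 5, 8), (180, 5, 3), (185, 5, 10), (190, 5, 2), (195, 5, 4),
         (200, 5, 1)]


def percentile_point(steps, p):
    """Point below which p per cent of N lies, found by counting p/100 of N
    in from the small end and interpolating within the step reached."""
    n = sum(f for _, _, f in steps)
    target = p * n / 100
    cum = 0                               # running Cum. F
    for lower, length, f in steps:
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * length
        cum += f
    raise ValueError("p must lie between 0 and 100")


points = {p: percentile_point(steps, p) for p in range(10, 100, 10)}
# A point such as 155.67 falls on score 155 ("155 up to 156"); the small
# tolerance guards against floating-point error just below a whole number.
scores = {p: math.floor(x + 1e-9) for p, x in points.items()}
for p in points:
    print("%2dth percentile: %.2f (score %d)" % (p, points[p], scores[p]))
```

The scores it reports, 147, 155, 160, 167, 175, 179, 185, 188 and 194, agree with the Scores column of the table.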

Once the percentile table has been drawn up, it is a relatively simple matter to find the percentile corresponding to any given score. In our problem, for instance, the man who makes a score of 177 falls on the 55th percentile, midway between the 50th (175) and the 60th (179) percentiles; while the man who scores 158 has a percentile score of 26, six tenths of the way from the 20th percentile (155) to the 30th percentile (160). Other interpolations may easily be made in like manner.
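Finding the percentile that corresponds to a given score is an interpolation between neighbouring percentile points. A minimal sketch, using the percentile points of Table VIII (1) as given in the table (the function name `percentile_rank` is our own):

```python
def percentile_rank(score, table):
    """Percentile for `score`, interpolating linearly between the two
    neighbouring percentile points. `table` maps percentile -> score."""
    points = sorted(table.items())
    for (p_lo, s_lo), (p_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= score <= s_hi and s_hi > s_lo:
            return p_lo + (score - s_lo) / (s_hi - s_lo) * (p_hi - p_lo)
    raise ValueError("score lies outside the table")


# Percentile points of Table VIII (1), as given in the table
table = {0: 126, 10: 147, 20: 155, 30: 160, 40: 167, 50: 175,
         60: 179, 70: 185, 80: 188, 90: 194, 100: 201}
rank_177 = percentile_rank(177, table)
rank_158 = percentile_rank(158, table)
print(rank_177, rank_158)
```

It returns 55.0 for a score of 177 and 26.0 for a score of 158.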

In Table VIII (2) the percentiles have been calculated for the distribution of scores (in seconds) made by seventy-two 9-year-olds on the Woodworth-Wells Substitution test.¹ As the scores are in time units, the lowest score is the best (the quickest) performance, while the highest score is the worst (the slowest) performance. Consequently, the percentile scale is reversed: we count from the 100th percentile down instead of from the 0 percentile up. To find the 90th percentile, for example, we count in 7.2 (10% of N) from step 80-89 until we reach 108.4 (score 108). Counting in two tenths of N from 80-89, we reach 121, the 80th percentile. The 100th percentile is taken at 80, theoretically the fastest record; the 0 percentile at 219, the poorest record.

From the percentile table we may say that a 9-year-old who completes the Substitution Test in 141 secs. has a percentile score of 50, i.e., stands at the median of the group; while a child of 9 who takes 181 secs. to complete the test stands at about the 15th percentile, midway between the 10th percentile (192) and the 20th percentile (172).

¹ Pintner and Patterson: A Scale of Performance Tests, 1921, p. 133.
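For time scores the only change is the direction of counting. In the sketch below, which is an illustrative assumption rather than the author's procedure, the p-th percentile is found by counting (100 - p)/100 of N in from the fast end of Table VIII (2); the function name `time_percentile_point` is hypothetical.

```python
# Table VIII (2) transcribed: (lower limit, step length, F), fastest step first
steps = [(80, 10, 1), (90, 10, 2), (100, 10, 5), (110, 10, 5), (120, 10, 13),
         (130, 10, 9), (140, 10, 6), (150, 10, 11), (160, 10, 5), (170, 10, 3),
         (180, 10, 4), (190, 10, 3), (200, 10, 2), (210, 10, 3)]


def time_percentile_point(steps, p):
    """For time scores, where fewer seconds is better, the p-th percentile is
    found by counting (100 - p)/100 of N in from the fast end."""
    n = sum(f for _, _, f in steps)
    target = (100 - p) * n / 100
    cum = 0
    for lower, length, f in steps:
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * length
        cum += f
    raise ValueError("p must lie between 0 and 100")


points = {p: time_percentile_point(steps, p) for p in range(90, 0, -10)}
for p, x in points.items():
    print("%2dth percentile: %.2f sec." % (p, x))
```

It places the 90th percentile at 108.4 seconds, the 50th at 141.67 and the 10th at 192.67, matching the calculations in the table.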



VII.  When  to  Use  the  Various  Measures  of  Central 
Tendency  and  Variability 

The  beginner  in  statistics  is  often  at  a  loss  to  know  which 
measure  of  central  tendency  or  variability  to  use.  The  following 
summary  will  serve  as  a  guide  for  most  of  the  problems  which 
the  student  will  ordinarily  meet : 

1. When to Use the Average, Median, and Mode

1. Use the Average:
(1) When each score or measure should have equal weight in determining the central tendency.
(2) When the highest reliability is sought.
(3) When product-moment coefficients of correlation, or measures of reliability, are to be subsequently computed.

2. Use the Median:
(1) When a quick and easily computed measure of central tendency is necessary.
(2) When there are extreme measures which would affect the average disproportionately.
(3) When certain scores or measures should influence the central tendency, but all that is known about them is that they are above or below the central tendency.

3. Use the Mode:
(1) When a quick, approximate measure of concentration is desired.
(2) When only the most often recurring score is sought.

2. When to Use the Range, Q, AD, and σ

1. Use the Range:
(1) When the data are too scant or scrappy to justify the calculation of another measure of variability.
(2) When a knowledge of the total spread is all that is necessary.

2. Use the Q:
(1) For a quick, inspectional measure of variability.
(2) When there are scattered or extreme measures.
(3) When only the concentration around the central tendency is sought.

3. Use the AD:
(1) When it is desired to weight all deviations according to their size.
(2) When extreme deviations should not influence the measure of variability unduly.

4. Use the σ:
(1) When the highest reliability is desired.
(2) When it is desired that extreme deviations influence the measure of variability.
(3) When coefficients of correlation or measures of reliability are later to be computed.

VIII. Summary of Formulas for Finding the Measures of Central Tendency and Variability

1. Measures of Central Tendency

1. Average:

A. Long Method:

(a) data ungrouped:

$$\text{Average} = \frac{\Sigma(\text{Measures})}{N} \qquad (1)$$

(b) data grouped:

$$\text{Average} = \frac{\Sigma(F \times \text{Midpoint})}{N} \qquad (2)$$

B. Short Method:

(a) data grouped:

$$\text{Average} = GA + C\ (\text{algebraic}), \qquad C = \frac{\Sigma FD\ (\text{algebraic})}{N} \times \text{length of step}$$

2. Median:

Arrange the measures in order of size, and count off 1/2 of the measures beginning at the small end of the series.

3. Mode:

For the crude mode, take the most frequent score, or the midpoint of the step with the largest frequency.

2. Measures of Variability

1. Range = (largest measure) - (smallest measure).

2. Quartile Deviation:

$$Q = \frac{Q_3 - Q_1}{2} \qquad (3)$$

3. Average Deviation:

A. Long Method:

(a) data ungrouped:

$$AD = \frac{\Sigma D\ (\text{arithmetical})}{N} \qquad (4)$$

(b) data grouped:

$$AD = \frac{\Sigma FD\ (\text{arithmetical})}{N} \qquad (5)$$

B. Short Method:

(a) data grouped:

$$AD = \frac{\Sigma FD\ (\text{arithmetical}) + c(F_l - F_g)}{N} \times \text{length of step} \qquad (8)$$

4. Standard Deviation:

A. Long Method:

(a) data ungrouped:

$$\sigma = \sqrt{\frac{\Sigma D^2}{N}} \qquad (6)$$

(b) data grouped:

$$\sigma = \sqrt{\frac{\Sigma FD^2}{N}} \qquad (7)$$

B. Short Method:

(a) data grouped:

$$\sigma = \sqrt{\frac{\Sigma FD^2}{N} - c^2} \times \text{length of step} \qquad (9)$$

5. Coefficient of Variation:

$$V = \frac{100\,\sigma}{\text{Average}} \qquad (10)$$

IX. Illustrative Problems


The following problems illustrate the calculation of the average, median, mode, Q, AD, and σ for continuous and discrete series. They are given as examples of the Short Method, and should be carefully reviewed by the student.


Example I

Calculation of the Average, Median, Mode, Q, AD, and SD. Step = 7

Measures       Midpoint     F      D      FD     FD²
145-151.99      148.5       1      6       6     36
138-144.99      141.5       1      5       5     25
131-137.99      134.5       2      4       8     32
124-130.99      127.5       2      3       6     18
117-123.99      120.5       3      2       6     12
110-116.99      113.5      10      1      10     10    (+41)
103-109.99      106.5      15      0       0      0
96-102.99        99.5      14     -1     -14     14
89-95.99         92.5       6     -2     -12     24
82-88.99         85.5       3     -3      -9     27
75-81.99         78.5       2     -4      -8     32    (-43)
N = 59                                    84    230

Fg = 34 (steps 103-109.99 and above)    Fl = 25 (steps 96-102.99 and below)
N/2 = 29.5    N/4 = 14.75    3N/4 = 44.25    GA = 106.5

$c = \frac{-2}{59} = -.034 \qquad c^2 = .001 \qquad C = -.034 \times 7 = -.238$
Average = 106.5 + (-.238) = 106.26
$\text{Median} = 103 + \frac{4.5}{15} \times 7 = 105.10$
Mode = 106.50
$Q_1 = 96 + \frac{3.75}{14} \times 7 = 97.875$
$Q_3 = 110 + \frac{4.25}{10} \times 7 = 112.975$
Q = 7.55
$AD = \frac{84 + (-.034)(25 - 34)}{59} \times 7 = 10.00$
$\sigma = \sqrt{\frac{230}{59} - .001} \times 7 = 1.97 \times 7 = 13.79$
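As a final check on the Short Method, the sketch below recomputes Example I from its frequency table. It is a hedged illustration: the transcription `steps` and all names are ours, and the sketch carries full precision throughout. For that reason it gives σ as about 13.82 rather than 13.79; the worked example rounds the square root to 1.97 before multiplying by 7.

```python
import math

# Example I transcribed: (lower limit, F) for steps of length 7, smallest first
length = 7
steps = [(75, 2), (82, 3), (89, 6), (96, 14), (103, 15), (110, 10),
         (117, 3), (124, 2), (131, 2), (138, 1), (145, 1)]
n = sum(f for _, f in steps)
mids = [lower + length / 2 for lower, _ in steps]     # 78.5, 85.5, ...
freqs = [f for _, f in steps]

ga = 106.5                                            # guessed average
d = [(m - ga) / length for m in mids]                 # D in step units
c = sum(f * x for f, x in zip(freqs, d)) / n
average = ga + c * length

# F_l and F_g: frequencies on steps whose midpoint lies below / above the average
f_l = sum(f for m, f in zip(mids, freqs) if m < average)
f_g = sum(f for m, f in zip(mids, freqs) if m > average)
sum_abs_fd = sum(f * abs(x) for f, x in zip(freqs, d))
ad = (sum_abs_fd + c * (f_l - f_g)) / n * length                          # (8)
sd = math.sqrt(sum(f * x * x for f, x in zip(freqs, d)) / n - c * c) * length  # (9)


def point(fraction):
    """Score below which `fraction` of N lies, interpolating within a step."""
    target, below = fraction * n, 0
    for lower, f in steps:
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * length
        below += f
    raise ValueError("fraction must lie between 0 and 1")


median, q1, q3 = point(0.5), point(0.25), point(0.75)
print("Average %.2f  Median %.2f  Q %.2f  AD %.2f  SD %.2f"
      % (average, median, (q3 - q1) / 2, ad, sd))
```

The remaining values (average 106.26, median 105.10, Q 7.55, AD 10.00) agree with the worked example.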




Example II

Calculation of Average, Median, Q and SD. Step = 1

Scores        F        D        FD        FD²
22-22.9         1      12         12        144
21-21.9         7      11         77        847
20-20.9        16      10        160      1,600
19-19.9        35       9        315      2,835
18-18.9        81       8        648      5,184
17-17.9       172       7      1,204      8,428
16-16.9       330       6      1,980     11,880
15-15.9       600       5      3,000     15,000
14-14.9     1,031       4      4,124     16,496
13-13.9     1,793       3      5,379     16,137
12-12.9     2,572       2      5,144     10,288
11-11.9     2,951       1      2,951      2,951    (+24,994)
10-10.9     3,187       0          0          0
9-9.9       3,319      -1     -3,319      3,319
8-8.9       2,891      -2     -5,782     11,564
7-7.9       2,149      -3     -6,447     19,341
6-6.9       1,315      -4     -5,260     21,040
5-5.9         684      -5     -3,420     17,100
4-4.9         302      -6     -1,812     10,872
3-3.9         112      -7       -784      5,488
2-2.9          38      -8       -304      2,432
1-1.9          10      -9        -90        810    (-27,218)
N = 23,596                    -2,224    183,756

N/2 = 11,798    N/4 = 5,899    3N/4 = 17,697    GA = 10.5

$c = \frac{-2{,}224}{23{,}596} = -.09 \qquad c^2 = .008 \qquad C = -.09 \times 1 = -.09$
Average = 10.5 + (-.09) = 10.41
$\text{Median} = 10 + \frac{978}{3{,}187} \times 1 = 10.31$
$Q_1 = 8 + \frac{1{,}289}{2{,}891} \times 1 = 8.45$
$Q_3 = 12 + \frac{739}{2{,}572} \times 1 = 12.29$
Q = 1.92
$\sigma = \sqrt{\frac{183{,}756}{23{,}596} - .008} \times 1 = 2.79$


THE  FREQUENCY  DISTRIBUTION 


55 
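Because Example II involves long columns of products, slips in the arithmetic are easy to make. The following Python sketch is an illustrative check (not part of the original method) that recomputes N, ΣFD, ΣFD² and σ directly from the frequencies, with D measured from the 10-10.9 step and a step of 1; it yields N = 23,596, ΣFD = −2,224, ΣFD² = 183,756 and σ ≈ 2.79.

```python
from math import sqrt

# Example II frequencies, from the 22-22.9 step down to the 1-1.9 step
freqs = [1, 7, 16, 35, 81, 172, 330, 600, 1031, 1793, 2572, 2951,
         3187, 3319, 2891, 2149, 1315, 684, 302, 112, 38, 10]
d = [12 - k for k in range(len(freqs))]   # D = 12 on the top step, 0 on 10-10.9

n = sum(freqs)
sum_fd = sum(f * dk for f, dk in zip(freqs, d))
sum_fd2 = sum(f * dk * dk for f, dk in zip(freqs, d))
c = sum_fd / n                            # step = 1, so C = c
average = 10.5 + c                        # GA = 10.5, midpoint of 10-10.9
sd = sqrt(sum_fd2 / n - c * c)
```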


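Example III

Calculation of Average, Median, Mode, Q, AD, SD, for Discrete Series.   Step = 1

Measures     F      D      FD     FD²
21           2     -4      -8      32
22           1     -3      -3       9
23           4     -2      -8      16
24           9     -1      -9       9    (-28)
25          21      0
26          11      1      11      11
27           6      2      12      24
28           1      3       3       9
29           1      4       4      16    (+30)
      N = 56           Σ|FD| = 58  ΣFD² = 126

F₁ = 37 (cases below the Average: measures 21 through 25)
F₂ = 19 (cases above the Average: measures 26 through 29)

$$GA = 25, \qquad c = \frac{30 - 28}{56} = .036, \qquad c^2 = .001$$

$$\text{Average} = 25 + .036 = 25.04$$

$$\text{Median} = 25 \quad \left(\tfrac{N}{2} = 28\right), \qquad \text{Mode} = 25$$

$$Q_1 = 24 \quad \left(\tfrac{N}{4} = 14\right), \qquad Q_3 = 26 \quad \left(\tfrac{3N}{4} = 42\right), \qquad Q = \frac{26 - 24}{2} = 1.0$$

$$AD = \frac{58 + .036(37 - 19)}{56} \times 1 = 1.05$$

$$\sigma = \sqrt{\frac{126}{56} - .001} \times 1 = 1.50$$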
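The correction term in the Short Method AD formula can be checked against the direct definition of the AD. Measuring deviations from the GA instead of the true Average understates the deviation of each of the F₁ cases below the Average by c and overstates that of each of the F₂ cases above it by c (the signs reverse when c is negative), so Σ|FD| must be adjusted by c(F₁ − F₂). This Python sketch, with illustrative variable names and the Example III data, computes the AD both ways; the two agree, at about 1.05.

```python
# Example III, a discrete series (step = 1)
measures = [21, 22, 23, 24, 25, 26, 27, 28, 29]
freqs = [2, 1, 4, 9, 21, 11, 6, 1, 1]
ga = 25

n = sum(freqs)
d = [m - ga for m in measures]               # deviations from the GA
c = sum(f * dk for f, dk in zip(freqs, d)) / n
average = ga + c

# Direct definition: mean of the absolute deviations from the Average itself
ad_direct = sum(f * abs(m - average) for f, m in zip(freqs, measures)) / n

# Short Method: [sum |FD| + c(F1 - F2)] / N, F1 below the Average, F2 above it
f1 = sum(f for f, m in zip(freqs, measures) if m < average)
f2 = sum(f for f, m in zip(freqs, measures) if m > average)
sum_abs_fd = sum(f * abs(dk) for f, dk in zip(freqs, d))
ad_short = (sum_abs_fd + c * (f1 - f2)) / n
```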


PROBLEMS 

1. Tabulate the following scores into three frequency distributions, using class-intervals of 3, 5, and 10 units respectively.

Scores made on the Thorndike Entrance Examination by 100 applicants for admission to Columbia College. (From Sommerville, R. C.: Physical, Motor and Sensory Traits, Archives of Psychology, 75, 1924.) Note: Fractions have been dropped.

 63   80   75   90   81   83   78   81   83   83
 89   98   46   90  103   81   71   93   82   78
 86   85   73   83   74   86   84   72   63   76
103   78   85   81  105   94   78  101   76   98
 74   75   88   65   80   81   98   56  103   90
 92   85   78   73   87   75  102   58   78   95
 73   73   73   96   83  110   95   90   87   86
 96   98   82   86   70   70   95   71   89   86
 85   72   94   92   73   84   79   74   88   72
 92   86   93   84   50   85   76   82   99   91

2. The following distributions represent the scores made on a logical memory test by two racial groups, A and B.

(1) Find the average, median, Q and SD of each distribution.

(2) What per cent of group A reaches or exceeds the median of group B?

(3) Compare the relative variability of the two groups by means of their coefficients of variation.


Scores     Group A    Group B
79-83         6          8
74-78         7          8
69-73         8          9
64-68        10         16
59-63        12         20
54-58        15         18
49-53        23         19
44-48        16         11
39-43        10         13
34-38        12          8
29-33         6          7
24-28         3          2
          N = 128    N = 139


3. Compare the 30th, 60th, and 90th percentile scores in Group A [problem (2)] with the corresponding percentile scores in Group B.

4. The following problems are given for the purpose of affording practice in finding measures of central tendency and measures of variability. In every case where the Average, AD, or SD is to be found, use the Short Method.


(1) Find the Average and SD.

Scores    F
70-71     2
68-69     2
66-67     3
64-65     4
62-63     6
60-61     7
58-59     5
56-57     4
54-55     2
52-53     3
50-51     1
      N = 39

(2) Find the Median and AD (from the Median).

Scores    F
90-94     2
85-89     2
80-84     4
75-79     8
70-74     6
65-69    11
60-64     9
55-59     7
50-54     5
45-49     0
40-44     2
      N = 56


(3) Find the Average, AD, and SD.

Scores     F
120-122    2
117-119    2
114-116    2
111-113    4
108-110    5
105-107    9
102-104    6
99-101     3
96-98      4
93-95      2
90-92      1
       N = 40

(4) Find the Average and SD. (Discrete Series.)

Scores    F
80        1
79        3
78        3
77        6
76        8
75        7
74        3
73        4
72        2
71        1
      N = 38


(5) Find the Median and Q.

Scores     F
100-109    5
90-99      9
80-89     14
70-79     19
60-69     21
50-59     30
40-49     25
30-39     15
20-29     10
10-19      8
0-9        6
       N = 162

(6) Find the Average, Median and SD.

Measures   F
80-84      8
75-79     14
70-74     19
65-69     24
60-64     29
55-59     27
50-54     26
45-49     28
40-44     20
35-39     15
30-34     10
       N = 220


Answers

2. (1)
             Group A    Group B
Average       53.88      56.21
Median        52.70      56.64
Q              9.64       9.90
SD            13.82      13.73

(2) 39% of Group A reaches or exceeds the median of Group B.

(3) Coefficient of Variation: Group A = 25.64; Group B = 24.43. Group B is 95.3% as variable as Group A.
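Formula (10) is easy to apply by hand, but a short Python sketch makes the comparison in problem 2 (3) explicit. The function name is illustrative, and the inputs are the rounded SDs and Averages listed in the answers above.

```python
def coefficient_of_variation(sd, average):
    """V = 100 * sigma / Average, formula (10): the SD as a per cent of the Average."""
    return 100 * sd / average


# Answers to problem 2 (1)
v_a = coefficient_of_variation(13.82, 53.88)
v_b = coefficient_of_variation(13.73, 56.21)
relative = v_b / v_a     # variability of Group B relative to that of Group A
```

With these inputs the coefficients come out at about 25.6 and 24.4 and the ratio at about .95; whether the last decimal place reads 25.64 or 25.65 depends only on how the intermediate values are rounded.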


3.
                          Group A    Group B
30th percentile score        46         49
60th percentile score        56         60
90th percentile score        74         75

4.
(1) Average = 61.26    SD = 4.99
(2) Median = 67.27     AD = 8.97
(3) Average = 106.5    AD = 5.55    SD = 7.22
(4) Average = 75.66    SD = 2.11
(5) Median = 55.67     Q = 16.41
(6) Average = 57.0     Median = 57.04    SD = 13.17

CHAPTER II

GRAPHIC METHODS AND THE NORMAL CURVE

I. The Graphic Representation of the Frequency Distribution

We learned in the last chapter how scores or other measures of capacity may be organized and condensed into the tabular arrangement called a frequency distribution. We found, in addition, how such an arrangement aids us in calculating measures of central tendency and variability and, in general, gives us a better idea of the facts as a whole. Still further aid in analyzing numerical data may be secured by a graphic or pictorial treatment of our material. The advertiser has long recognized the power of the illustration to catch the eye and hold the attention where the most careful array of statistics fails. In like manner the statistician, through the medium of diagrams and graphs, attempts to use the attention-getting power of visual presentation and at the same time to translate numerical facts (often abstract and difficult to interpret) into a more concrete and understandable form.

There are three methods of representing graphically (i.e., of "plotting") measures which have been grouped into a frequency distribution. The first method gives the Frequency Polygon; the second, the Histogram or Column Diagram; and the third, the Ogive, or cumulative frequency graph. These will be considered in order.

1.  The  Frequency  Polygon 

Before outlining the method of constructing a frequency polygon, it may be well to review briefly the simple algebraic principles which apply to all graphical representation of numerical data. Graphing or plotting is done with reference to two lines or "coordinate axes," the one the vertical or Y-axis, the other the horizontal or X-axis. These basic lines are perpendicular to each other, the point where they intersect being called 0, or the "origin" (see Diagram II). To locate or "plot" a point P whose coordinates are x = 4 and y = 3, we go out from the origin 4 units on the X-axis and up from the origin 3 units on the Y-axis, and, where the perpendiculars to these points intersect, locate the point P (see Diagram II). In like manner, any point whose x and y values are known can be located with reference to OY and OX, the coordinate axes. Distances measured along the X-axis are commonly called abscissas, and distances along the Y-axis ordinates.

[DIAGRAM II. The Use of Coordinate Axes X and Y.]

We may now show how these principles of graphing apply to the construction of the frequency polygon shown in Diagram III (1). This graph pictures the frequency distribution of Table I. The limits of the step-intervals (the abscissas) are laid off at regular intervals along the base line (the X-axis) from the origin, and the frequencies within each interval (the ordinates) are measured off on a scale along the Y-axis. There are 2 scores on the first step, 125-129 (see Table I). To represent these on our diagram, we go out on the X-axis to 127.5, midway between 125 and 130, and up 2 Y-units. Here we locate the first point. The frequency on the next step-interval, 130-134, is 0; hence the second point falls midway between 130 and 135, directly on the X-axis. The 2 scores on step 135-139, the 1 score on step 140-144, and the frequency on each succeeding step are, in every case, represented by a point the specified number of scores (Y-units) above the X-axis, and midway between the upper and lower limits of the step on which it lies. It is important to remember in plotting a frequency polygon that the midpoint of the step is always taken to represent all of the scores within that interval. The heights of the ordinates at the different midpoints represent the frequencies within the intervals.

[DIAGRAM III (1). Frequency Polygon Plotted from Distribution of 54 Scores in Table I. X-axis: Scores, 120 to 210.]

[DIAGRAM III (2). Histogram Plotted from Data in Table I. X-axis: Scores, 120 to 210.]

When all of the points have been located, they are joined in regular order to give the outline of the frequency polygon shown in Diagram III (1). To complete the figure, note that the step next below the lowest (120-124, just below 125-129) and the step next above the highest (205-209, just above 200-204) are included on the X-scale. The frequency of each of these steps is taken as 0, and in consequence the frequency polygon begins and ends on the X-axis.

The distance taken to represent a step-interval on the X-axis will usually depend on the width of the cross-section paper used and on the number of steps in the distribution. No general rule can be given for the choice of an X-unit, nor for the choice of the unit taken to represent 1 score on the Y-axis. The length of the diagram and the maximum frequency on any given step (as, for example, the 10 scores on step 185-189) will generally serve to indicate within what practical limits the Y-unit must be selected. After plotting several polygons, the student will soon discover that a too-long Y-unit exaggerates the changes in the distribution from step to step, while a too-short Y-unit makes the graph too flat. In like manner, a too-long X-unit tends to stretch out the polygon, while a too-short X-unit crowds the separate points on the frequency surface and makes comparisons difficult.

The total frequency (N) of the distribution is represented by the area of the polygon: that is, by the area between the boundary or frequency surface and the base line. The area over any given interval cannot be taken as proportional to the number of cases within the interval, however, because of the numerous irregularities in the distribution, and consequently in the frequency surface.

To show the position of the average, median, and mode on the graph, we must first locate these values on the X-axis and then erect perpendiculars as shown in the diagram. Note that the mode is easily located as the highest point on the frequency surface.

The  steps  involved  in  constructing  a  frequency  polygon  may  be 
summarized  as  follows: 

1. Draw two straight lines perpendicular to each other, the vertical line near the left side of the paper, the horizontal line near the bottom. Call the vertical line (the Y-axis) OY, and the horizontal line (the X-axis) OX. Put the 0 where the two lines intersect. This point is called the origin.

2. Lay off the step-intervals of the frequency distribution at regular intervals along the X-axis. Begin with the lower limit of the step next below the lowest as the origin, and end with the upper limit of the step next above the highest. Label the successive X-points with the step limits. Select as the X-unit a distance which will permit all of the steps to be represented on the one graph.

3.  Mark  off  on  the  Y-axis  successive  unit  distances  to 
represent  the  scores  on  the  different  steps.  Choose  a  scale 
which  will  permit  the  maximum  frequency  to  be  represented 
on  the  graph. 

4.  From  the  midpoint  of  each  step-interval  on  the  X-axis, 
go  up  in  the  Y  direction  a  distance  equal  to  the  number  of 
scores  on  the  step.     Place  a  point  here. 

5.  Join  the  points  plotted  in  (4)  with  straight  lines  to  give 
the  frequency  polygon. 
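The five steps above reduce to a list of points to be joined. The following Python sketch computes those points only (drawing them is left to any plotting tool); the function name `polygon_points` is hypothetical, and the data are the 54 scores of Table IX (1), steps 125-129 through 200-204 with a width of 5. As in step 2, an empty step is added at each end so that the polygon begins and ends on the X-axis.

```python
def polygon_points(lowers, freqs, step):
    """(midpoint, frequency) pairs for a frequency polygon.

    lowers -- lower limit of each step, lowest step first
    freqs  -- number of scores on each step, same order
    step   -- width of the step-interval
    A step of frequency 0 is added below the lowest and above the highest
    step so that the polygon begins and ends on the X-axis.
    """
    lows = [lowers[0] - step] + list(lowers) + [lowers[-1] + step]
    fs = [0] + list(freqs) + [0]
    # the midpoint of each step stands for all of the scores within it
    return [(low + step / 2, f) for low, f in zip(lows, fs)]


# Table IX (1): steps 125-129 up to 200-204, width 5
lowers = list(range(125, 205, 5))
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]
points = polygon_points(lowers, freqs, 5)
```

The first plotted point is (127.5, 2), the second lies on the X-axis at 132.5, and the highest point, (187.5, 10), marks the mode.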

2.  The  Histogram  or  Column  Diagram 

A second method of representing a frequency distribution graphically is to construct a histogram or column diagram. This type of graph is illustrated in Diagram III (2), with the same distribution of scores represented by the frequency polygon in Diagram III (1). The two graphs are constructed in much the same way, with this important difference: whereas in a frequency polygon all of the scores within a given interval are represented by the midpoint of that interval, in the histogram the assumption is made that all of the scores within an interval are spread uniformly over the entire interval. For this reason, the measures within any given interval in a histogram are represented by a rectangle constructed with base equal to the length of the step-interval and altitude equal to the number of measures within the interval. Thus [see Diagram III (2)] the 2 scores on step 125-129 are represented by a rectangle with base equal to the length of the step-interval on the X-axis and altitude equal to 2 units measured off on the Y-axis. As there are no scores within the next interval, 130-134, no rectangle is drawn here. The altitudes of the other rectangles vary with the number of scores on the intervals. When the same number of scores occurs on two (or more) adjacent steps, as in the intervals from 140 up to 145 and from 145 up to 150, the base of the rectangle covers two (or more) intervals on the X-axis. The highest rectangle is, of course, that which has the step 185 up to 190 as its base and 10, the maximum frequency, as its altitude. In selecting scales for the X- and Y-axes, the same considerations as to number of intervals, size of paper, maximum frequency, etc., noted under the frequency polygon, must be observed.

Although in a histogram each step-interval is represented by a separate rectangle, it is not necessary to project the sides of these different rectangles to the base line, as is done in Diagram III (2), since the rise and fall of the boundary line showing the increase or decrease in the number of scores from step to step is usually the important fact to be brought out. As in the frequency polygon, the total frequency (N) is represented by the area of the histogram. In contrast to the frequency polygon, however, the area of each rectangle in a histogram is directly proportional to the number of measures in the interval, so that the column diagram gives an accurate picture of the number of scores falling on each step.
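The contrast drawn above (both graphs show N correctly as a whole, but only the histogram shows each step correctly) can be checked numerically. In this Python sketch the function names are hypothetical, and areas are measured in score units times cases, so the totals come out as N × step rather than N. Over its own interval the polygon is two trapezoids whose outer edges sit at the boundary heights (left + mid)/2 and (mid + right)/2, which gives an area of step × (left + 6·mid + right)/8; this equals the histogram's step × mid only when the neighbouring frequencies average out to mid.

```python
def histogram_areas(freqs, step):
    """Area of each histogram rectangle: base = step, altitude = frequency."""
    return [f * step for f in freqs]


def polygon_areas(freqs, step):
    """Area under the frequency polygon over each step-interval.

    Between the step's limits the polygon runs from the boundary height
    (left + mid) / 2 to mid at the midpoint, then to (mid + right) / 2.
    """
    padded = [0] + list(freqs) + [0]   # empty steps at each end
    return [step * (padded[k - 1] + 6 * padded[k] + padded[k + 1]) / 8
            for k in range(1, len(padded) - 1)]


def polygon_total_area(freqs, step):
    """Whole area under the polygon, midpoint to midpoint (trapezoid rule)."""
    padded = [0] + list(freqs) + [0]
    return sum(step * (a + b) / 2 for a, b in zip(padded, padded[1:]))


# Table IX (1), lowest step (125-129) first, width 5
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]
hist = histogram_areas(freqs, 5)
poly = polygon_areas(freqs, 5)
```

Both total areas come to 270 (54 cases × 5), but on step 185-189 the polygon encloses only 40.625 against the histogram's 50, because the neighbouring steps hold just 3 and 2 cases.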


GRAPHIC  METHODS  AND  THE  NORMAL  CURVE   65 


To make a comparison of the two types of frequency graph easier, the distribution of Table III is plotted in Diagram IV, on the same coordinate axes, both as a frequency polygon and as a histogram. The increased number of cases and the more symmetrical distribution of scores make both of these graphs more regular in appearance than the graphs of Diagram III.¹

[DIAGRAM IV. Plotting of Frequency Polygon and Histogram. (Data from Table III (2).) X-axis: Scores, 100 to 144.]

¹ Other examples of frequency polygons and histograms may be found on page 75.

The question of when to use the frequency polygon and when to use the histogram cannot, unfortunately, be answered by a general rule which will cover all cases. The frequency polygon is less exact than the histogram in that it does not represent accurately (i.e., in terms of area) the number of measures on the successive step-intervals. For comparing two or more distributions plotted on the same diagram, however, the frequency polygon is probably the more useful, since the many vertical lines in the histograms often coincide. Both the histogram and the frequency polygon tell the same story, and both are useful in showing graphically whether the scores of a group distribute uniformly over the scale or pile up at the low or the high end. Information may thus be secured not only about the group but also about the test. If a test is too easy, the scores will fall disproportionately at the high end of the scale; if too hard, at the low end. If the test is neither too hard nor too easy, the scores will tend to be symmetrically distributed, a few individuals scoring high, a few low, and the majority scoring somewhere near the middle of the scale. In this last case, the frequency polygon or histogram approximates the "ideal" or normal frequency distribution (see page 76).

3.  The  Ogive 

The ogive, or cumulative frequency graph, is a third way of representing a frequency distribution by means of a diagram. Before we can plot an ogive, the frequencies of the distribution must first be added serially, or cumulated, as shown in Table IX for the two distributions taken from Table II (1 and 2). (These two distributions have already been used to illustrate the frequency polygon and histogram in Diagrams III and IV.) Note that the first two columns in Table IX are exactly the same as in any frequency distribution, but that in the third column the frequencies have been "accumulated" successively from the low end of the distribution as described on page 46. The last cumulative frequency is, of course, equal to N.¹

¹ Cumulative distributions are also useful in telling quickly how many in a group scored above or below a certain point on the scale. In Table IX, for example, we read that 10 men in the group made Alpha scores below 155, 47 below 190, etc.
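Cumulating a distribution is a running sum from the low end, and the footnote's use of it (counting how many fall below a step limit) is then a simple lookup. A minimal Python sketch with the Table IX (1) frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# Table IX (1): Alpha scores, steps 125-129 up to 200-204 (width 5), lowest first
lowers = list(range(125, 205, 5))
freqs = [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1]

# Cum. F: add the frequencies serially from the low end of the distribution
cum_f = list(accumulate(freqs))
n = cum_f[-1]                       # the last cumulative frequency equals N

# number of cases below each step's upper limit (e.g. below 155, below 190)
below = {low + 5: c for low, c in zip(lowers, cum_f)}
above_190 = n - below[190]          # and, by subtraction, the number above
```

Here `below[155]` is 10 and `below[190]` is 47, as the footnote reads them from the table, and 7 men scored 190 or above.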


[DIAGRAM V (1). Ogive Curve. Data from Table II (1). X-axis: Step-Intervals, 125 to 205; Y-axis: Frequencies.]

[DIAGRAM V (2). Ogive Curve. Data from Table II (2). X-axis: Step-Intervals, 104 to 140.]

The two ogives which represent the distributions of Table IX are shown in Diagram V (1 and 2). Consider first the ogive of the 54 Alpha scores shown in (1). The step-intervals of the distribution have been laid off along the X-axis, and a scale running from 0 up to the total number of scores in the distribution (here 54) has been laid off on the Y-axis. It will be remembered that in plotting the frequency polygon the frequency of each step was taken at the midpoint of the step-interval; in constructing an ogive, however, each cumulative frequency must be plotted at the upper limit of the step on which it falls. The first point on the curve, for example, is 2 Y-units (the cumulative frequency on step 125-129) above 130; the second point is 2 Y-units above 135, the third 4 Y-units above 140, and so on to the last point, which is 54 Y-units above 205. The plotted points are joined in order to give the ogive. Note that the curve begins at 125 on the X-axis and ends at 205, just 54 Y-units above the X-axis.

TABLE IX
Cumulative Frequencies of the Two Distributions in Table II
(For Plotting the Ogives of Diagram V)

              (1)                               (2)
Measures     F    Cum. F          Measures     F    Cum. F
200-204      1      54            136-139      3     200
195-199      4      53            132-135      5     197
190-194      2      49            128-131     16     192
185-189     10      47            124-127     23     176
180-184      3      37            120-123     52     153
175-179      8      34            116-119     49     101
170-174      3      26            112-115     27      52
165-169      3      23            108-111     18      25
160-164      4      20            104-107      7       7
155-159      6      16                     N = 200
150-154      4      10
145-149      1       6
140-144      1       5
135-139      2       4
130-134      0       2
125-129      2       2
         N = 54


Because the sample is small and the distribution of scores unsymmetrical, the ogive in (1) is somewhat jagged in outline. To eliminate such irregularities and to facilitate later computations, we often "smooth" an ogive by sketching in a smooth curve through as many of its points as possible. The dotted line in Diagram V (1) shows the result of this smoothing process. If the sample is large and the measures well distributed, smoothing is often unnecessary [see Diagram V (2)].

The ogive in Diagram V (2) has been plotted from the distribution in Table IX (2), as described above. It offers no new difficulties and need not be considered in any detail. Note that the curve begins at 104, the lower limit of the first step, and ends at 140, the upper limit of the last step on the scale; also that the cumulative F's 7, 25, 52, etc., have all been plotted at the upper limits of their respective step-intervals. This ogive does not require any smoothing, as the distribution which it represents is very symmetrical.

The ogive has been less frequently used by workers in experimental psychology and education than either the frequency polygon or the histogram, and is probably somewhat more difficult for the general reader to interpret. It has, however, several distinct advantages. In the first place, unlike the other frequency graphs, the shape of the ogive remains practically the same when the size of the step-interval varies. Furthermore, while frequency polygons and histograms cannot be compared unless their step-intervals are the same, this restriction does not apply to the ogive.

Probably the chief value of the ogive to the student of mental measurement lies in the relative ease with which percentile values may be calculated from the curve. The method of getting these values is illustrated in Diagram V (1 and 2). First, a perpendicular is erected on the X-axis at the upper limit of the last step-interval and continued until it reaches the curve. (In the first ogive this perpendicular will be erected at 205.) Next, this line between the curve and the X-axis is divided into 10 equal parts (by means of a compass or mm. rule) and the points of division labeled 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100 (the 100 point lies on the curve, the 0 point on the X-axis). These points are used to locate the 10 decile points in the distribution. To find the second decile, or 20th percentile, for example, we draw a line from the second point, i.e., from 20, parallel to the X-axis, and where this line cuts the curve, drop a perpendicular to the X-axis. This perpendicular locates the 20th percentile on the X-scale. The other percentiles and quartiles may be found in the same way. Notice in ogive (1) that the 0 percentile is 125 (theoretically the lowest score in the distribution) and that the 100th percentile is 205 (theoretically the highest score in the distribution).

The student should compare the percentile values obtained from the ogive with the same values as calculated in Table VIII (1). Because its curve is smoother, the percentiles obtained from ogive (2) will be more accurate than those obtained from ogive (1).

[DIAGRAM VI. Another Way of Constructing an Ogive. The Individuals are Arranged in Order Along the Baseline, Each Man's Score Being Marked Off on the Ordinate Above Him. X-axis: Individuals in Order.]

The accuracy with which we are able to obtain the percentiles graphically will depend, in general, on the accuracy with which the points of the curve have been plotted, the fineness of the scale, the number of cases, and the symmetry of the distribution.
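Reading a percentile off the ogive by eye can be imitated numerically by joining the plotted points with straight lines and interpolating along them. This Python sketch assumes an unsmoothed, straight-line ogive (a hand-smoothed curve will give slightly different readings), and the function names are hypothetical. Each cumulative frequency is placed at the upper limit of its step, the curve starts at the lower limit of the lowest step with 0, and the p-th percentile is the score at which the curve reaches p per cent of N.

```python
from itertools import accumulate


def ogive_points(lowers, freqs, step):
    """(score, cumulative frequency) pairs, starting at the lowest limit with 0."""
    uppers = [low + step for low in lowers]
    return [(lowers[0], 0)] + list(zip(uppers, accumulate(freqs)))


def percentile_from_ogive(points, p):
    """Score at the p-th percentile, read off a straight-line ogive."""
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    n = points[-1][1]
    target = p / 100 * n              # height on the curve, in cases
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        # skip flat segments (steps with no cases); stop at the first
        # segment that reaches the target height and interpolate along it
        if c1 > c0 and c1 >= target:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("no segment reaches the target")


# Table IX (1): steps 125-129 up to 200-204, width 5
points = ogive_points(list(range(125, 205, 5)),
                      [2, 0, 2, 1, 1, 4, 6, 4, 3, 3, 8, 3, 10, 2, 4, 1], 5)
median = percentile_from_ogive(points, 50)
```

For the 54 scores of Table IX (1) the median reads at about 175.6, the 0 percentile at 125 and the 100th percentile at 205, matching the end points of the curve described in the text. For a straight-line ogive this reading is the same as the usual interpolation within a step.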

Another way of constructing an ogive is shown in Diagram VI, with the data of Table IX (1). Imagine the 54 individuals in the distribution arranged along the baseline according to the size of their scores, the score of each man being marked off on the ordinate above him. When these points are joined by straight lines, we have a series of rectangles of the histogram type, the base of each rectangle representing the number of men making the given score, and the height of each rectangle representing the size of the score. A smooth curve may be sketched through (or as near as possible to) the midpoint of the upper base of each rectangle, as shown in the diagram, to give an ogive curve. From this ogive, percentiles may be easily found. To get the median, for example, we erect a perpendicular at 27 (N/2) on the X-axis; a line drawn parallel to the X-axis through the point where this perpendicular cuts the curve locates the median at approximately 175 on the Y-scale. The quartiles and the other percentile points may be found in exactly the same manner.


II. Other Uses of Graphical Methods: the Comparative Line Graph

Many problems in mental measurement, especially those which involve the measurement of changes attributable to growth, learning, practice, etc., readily lend themselves to graphical treatment. Diagram VII illustrates several such problems, in which the data are represented by "line graphs." As in all graphs hitherto considered, the measures are plotted with reference to the coordinate axes, OY and OX, the coordinates of a plotted point being its abscissa, or X-distance, and its ordinate, or Y-distance.

Figure 1 illustrates the "age" or "growth" curve. It represents the growth in logical memory (for a connected passage) in boys and girls from 8 to 18 years old.

[DIAGRAM VII, Fig. 1. Logical memory. Age is represented on the X-line (horizontal); score, e.g., number of ideas remembered, on the Y-line (vertical). (After Pyle.)]

[DIAGRAM VII, Fig. 2. Improvement in telegraphy. Weeks of practice on the X-line; number of letters per minute on the Y-line. (After Bryan and Harter.)]

Figure 2 illustrates the "learning" or "practice" curve. It shows the improvement in sending and receiving telegraphic messages resulting from successive trials at the same task over a period of weeks. Improvement is measured in terms of the number of letters sent or received per minute.

Figure 3 is a "performance" or "practice" curve. It represents 25 successive trials with the hand dynamometer by one man and one woman. Note that the successive trials are laid off on the X-axis, and the strength of grip (in kgs.) on the Y-axis. Graphs like these are useful in enabling us to compare individuals or groups at various stages in the test or performance. They also enable us to study the effect of fatigue with successive trials.

[DIAGRAM VII, Fig. 3. Hand dynamometer readings in kilograms for 25 successive grips at intervals of 10 seconds. Two subjects, a man and a woman. X-axis: Trials.]

[DIAGRAM VII, Fig. 4. Curve of forgetting. The numbers on the base line give hours elapsed from the time of learning (1 to 144 hours); the numbers along the Y-axis give per cent retained. (After Ebbinghaus.)]

Figure 4 shows the well-known "curve of forgetting" (or retention). It represents memory retention, as measured by the percentage of the original material retained after the passage of different time intervals. The time intervals between learning and relearning are laid off on the X-axis; the per cent retained, as shown by the relearning, on the Y-axis.


III.  The  Normal  Probability  Curve 

In Diagram VIII are shown four graphs (two frequency polygons and two histograms) which represent frequency distributions of data drawn from anthropometry, psychology, and meteorology. It is at once apparent that all of these graphs have the same general form: the measures are concentrated closely around the center, and taper off from the central high point, or crest, equally to right and left. In general we find relatively few measures at the "low" score end of the scale, an increasing number up to a maximum at the midposition, and a progressive falling off as we go toward the "high" score end of the scale. If we divide the area under each curve (the area between the curve and the X-axis) by a line drawn perpendicularly through the central high point to the base line, the two parts will be practically similar in form and equal in area. This results from the fact that each curve shows almost perfect bilateral symmetry. The perfectly symmetrical curve, or frequency surface, to which all of the figures in Diagram VIII approximate, is shown in Diagram IX. This bell-shaped curve is called the Normal Probability Curve, or simply the Normal Curve, and is of the greatest value in psychological measurement. An understanding of its characteristics is essential to the student of experimental psychology and measurement; consequently the rest of this chapter will be concerned with the study of the properties and uses of the Normal Curve.


[Diagram VIII. Samples of Frequency Distributions Drawn from Different Fields. Four graphs, two frequency polygons and two histograms, of measurements of height, intelligence (IQ), memory span, and temperature. Legible labels survive only for the stature panel: stature in inches of adult males, with frequency per inch interval of stature on the Y-axis (after Yule).]


1.  Elementary  Principles  of  Probability.     The  Derivation  and 
Construction  of  the  Probability  Curve 

Perhaps the simplest approach to an understanding of the Normal Curve is through a consideration of the elementary facts of probability. As used in statistics, the "probability" of the occurrence of an event may be defined as the expected relative frequency of occurrence of the given event in a very large (infinite) number of observations. This expected relative frequency of occurrence may be based upon a knowledge of the conditions determining the probable occurrence, as in dice throwing or coin tossing, or upon empirical data, as in mental and social measurements.

[Diagram IX. Normal Probability Curve. The baseline is marked in a sigma scale from −3σ to +3σ about the mean (0) and in a PE scale from −4PE to +4PE; the area between −1σ and +1σ is marked 68.26%, and the area between −1PE and +1PE, 50%.]
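A minimal Python sketch of this relative-frequency definition, assuming a simulated fair coin: Python's standard `random` module stands in for real tosses, and the function name `relative_frequency_of_heads` is illustrative only. As the number of tosses grows, the proportion of heads should settle ever closer to the probability 1/2.

```python
import random


def relative_frequency_of_heads(n_tosses: int, seed: int = 0) -> float:
    """Toss a simulated fair coin n_tosses times; return the proportion of heads."""
    rng = random.Random(seed)  # seeded so repeated runs give the same tosses
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


# The relative frequency drifts toward the probability ratio 1/2 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} tosses: proportion of heads = {relative_frequency_of_heads(n):.4f}")
```

With only 10 tosses the proportion may be far from .50; with a million it is typically within a few thousandths of it.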

The probability of an event may be stated most simply, perhaps, as a ratio; as, for example, when we say that the probability of a coin falling heads is 1/2, or that of a die showing a two spot is 1/6. This ratio, called the "probability ratio," may be defined as that fraction the numerator of which equals the number of favorable outcomes (the outcome or outcomes we are interested in) and the denominator of which equals the total number of possible, equally likely outcomes. Such a ratio always falls between the limits 0 (impossibility of occurrence) and 1.00 (certainty of occurrence). Thus the probability that the sky will fall is 0; that an individual now living will some day die is 1.00. Between these limits there are all possible degrees of probability expressed by the probability ratio.

Let us now apply these simple principles of probability to the specific case of what happens when we toss coins (coin tossing and dice throwing furnish simple and often-used illustrations of the laws of chance). If we toss one coin, obviously it must fall either heads (H) or tails (T) 100% of the time, and a head and a tail are equally probable. Expressed as a ratio, the probability of an H is 1/2; of a T, 1/2; and of (H+T), i.e., of either a head or a tail, 1/2 + 1/2 = 1.00.

Again, if we toss two coins, (a) and (b), at the same time there are 4 possible arrangements which the coins may take:

         a  b
   (1)   H  H
   (2)   H  T
   (3)   T  H
   (4)   T  T

That is, both coins (a) and (b) may fall H; (a) may fall H and (b) T; (b) may fall H and (a) T; or both coins may fall T. Expressed as a probability ratio, the chances of 2 heads are 1/4; of one head and one tail, 2 × 1/4 or 1/2; of 2 tails, 1/4.

Let us go a step further and increase the number of coins to three. If we toss three coins, (a), (b), and (c), simultaneously there are 8 possible outcomes:

         a  b  c
   (1)   H  H  H
   (2)   H  H  T
   (3)   H  T  H
   (4)   H  T  T
   (5)   T  H  H
   (6)   T  H  T
   (7)   T  T  H
   (8)   T  T  T

Expressed as a ratio, the chances of 3 heads are 1/8 (combination 1); of 2 heads and 1 tail, 3/8 (combinations 2, 3, and 5); of 1 head and 2 tails, 3/8 (combinations 4, 6, and 7); and of 3 tails, 1/8 (combination 8). In exactly this same way we can figure the probability of different combinations when we have 4, 5, or any number of coins.
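The listing procedure above can be sketched in Python for any number of coins. This is an illustration only; `head_count_probabilities` is a hypothetical helper name, and `itertools.product` happens to generate the arrangements in the same order as the numbered list above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_count_probabilities(n_coins: int) -> dict[int, Fraction]:
    """List every equally likely arrangement of n_coins coins and return
    the probability ratio of k heads, for k = n_coins down to 0."""
    arrangements = list(product("HT", repeat=n_coins))  # ('H', 'H', 'T'), ...
    heads_per_arrangement = Counter(a.count("H") for a in arrangements)
    total = len(arrangements)  # 2 ** n_coins
    return {k: Fraction(heads_per_arrangement[k], total) for k in range(n_coins, -1, -1)}


for arrangement in product("HT", repeat=3):
    print("".join(arrangement))  # HHH, HHT, HTH, HTT, THH, THT, TTH, TTT, one per line
print(head_count_probabilities(3))
```

For three coins the ratios come out as 1/8, 3/8, 3/8, and 1/8, as in the text. Listing every arrangement doubles in size with each added coin, which is why the shortcut described next is preferred.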

These probable outcomes may be secured in a very much simpler way than by listing all of the various possible combinations as shown above. If there are two independent events, the probability of the occurrence or non-occurrence of each being the same (as in the probability of a coin falling heads or tails), the "compound" probabilities may be found by the expansion of the binomial $(p+q)^2$, in which p equals the probability of the event's happening, q the probability of its not happening, and the exponent 2 indicates the number of events. Now if we substitute H for p, and T for q (tails = non-heads), we have for two coins $(H+T)^2$; and squaring the binomial, $(H+T)^2 = H^2 + 2HT + T^2$. This expansion may be written,

   1 $H^2$    1 chance in 4 of 2 heads;              probability ratio = 1/4
   2 $HT$     2 chances in 4 of 1 head and 1 tail;   probability ratio = 1/2
   1 $T^2$    1 chance in 4 of 2 tails;              probability ratio = 1/4
   Total = 4

Note  that  these  results  are  identical  with  those  obtained  above 
by  listing  the  various  possible  outcomes  when  two  coins  are 
tossed. 

If we have three independent events, the expression $(p+q)^3$ becomes, for three coins, $(H+T)^3$. Expanding this binomial, we get $H^3 + 3H^2T + 3HT^2 + T^3$, which may be written,

   1 $H^3$     1 chance in 8 of 3 heads;                probability ratio = 1/8
   3 $H^2T$    3 chances in 8 of 2 heads and 1 tail;    probability ratio = 3/8
   3 $HT^2$    3 chances in 8 of 1 head and 2 tails;    probability ratio = 3/8
   1 $T^3$     1 chance in 8 of 3 tails;                probability ratio = 1/8
   Total = 8

Again these results are identical with those got by listing the various possible outcomes obtained by tossing three coins.



The binomial expansion may be applied more generally to the case in which there are any number of independent events, just so long as the probability of occurrence or non-occurrence is the same for each separate event. Thus if we toss 10 coins simultaneously, we have by analogy with the above $(p+q)^{10}$, which equals $(H+T)^{10}$, putting H for probability of a head, T for probability of a non-head (tail), and 10 for the number of coins tossed. When the expression $(H+T)^{10}$ is expanded, we have,1

$$H^{10} + 10H^9T + 45H^8T^2 + 120H^7T^3 + 210H^6T^4 + 252H^5T^5 + 210H^4T^6 + 120H^3T^7 + 45H^2T^8 + 10HT^9 + T^{10}$$

which may be summarized as follows:

     1 $H^{10}$     1 chance in 1024 of all coins falling heads;    probability ratio = 1/1024
    10 $H^9T$      10 chances in 1024 of 9 heads and 1 tail;        probability ratio = 10/1024
    45 $H^8T^2$    45 chances in 1024 of 8 heads and 2 tails;       probability ratio = 45/1024
   120 $H^7T^3$   120 chances in 1024 of 7 heads and 3 tails;       probability ratio = 120/1024
   210 $H^6T^4$   210 chances in 1024 of 6 heads and 4 tails;       probability ratio = 210/1024
   252 $H^5T^5$   252 chances in 1024 of 5 heads and 5 tails;       probability ratio = 252/1024
   210 $H^4T^6$   210 chances in 1024 of 4 heads and 6 tails;       probability ratio = 210/1024
   120 $H^3T^7$   120 chances in 1024 of 3 heads and 7 tails;       probability ratio = 120/1024
    45 $H^2T^8$    45 chances in 1024 of 2 heads and 8 tails;       probability ratio = 45/1024
    10 $HT^9$      10 chances in 1024 of 1 head and 9 tails;        probability ratio = 10/1024
     1 $T^{10}$     1 chance in 1024 of all coins falling tails;    probability ratio = 1/1024
   Total = 1024
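The coefficients in such a table need not be multiplied out by hand: the coefficient of the term with t tails in $(H+T)^n$ is the number of combinations of n things taken t at a time. A minimal sketch, assuming Python 3.9 or later (for `math.comb` and the `list[...]` annotation); `power` and `binomial_terms` are illustrative names, not standard functions.

```python
from math import comb


def power(symbol: str, exponent: int) -> str:
    """Render a factor such as H^9, H, or nothing (exponent 0)."""
    if exponent == 0:
        return ""
    return symbol if exponent == 1 else f"{symbol}^{exponent}"


def binomial_terms(n: int) -> list[tuple[int, str]]:
    """Terms of the expansion of (H+T)^n, from H^n down to T^n.
    The coefficient of the term with t tails is comb(n, t)."""
    return [(comb(n, t), power("H", n - t) + power("T", t)) for t in range(n + 1)]


n = 10
total = 2 ** n  # 1024 equally likely arrangements of 10 coins
for coefficient, term in binomial_terms(n):
    print(f"{coefficient:>4} {term:<8} {coefficient} chances in {total}; "
          f"probability ratio = {coefficient}/{total}")
```

Changing `n` to 2 or 3 reproduces the two smaller tables given earlier.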

These results are represented graphically in Diagram X by a histogram and frequency polygon plotted on the same axes. The eleven terms of the expansion have been laid off at equal distances on the X-axis, and the chances of the occurrence of each combination of H's and T's plotted as scores on the Y-axis. The result is a symmetrical probability curve, with the greatest concentration in the center, and the "scores" (the chances) falling away by corresponding decrements above and below the central point. Diagram X represents the results which we should expect to get theoretically by tossing 10 coins 1024 times.

1 The reader may take this expansion on faith; or he may refer to the chapter on Binomials in any elementary Algebra.

[Diagram X. Probability Surface Obtained from the Expansion of $(H+T)^{10}$. Histogram and frequency polygon on the same axes: the eleven terms $H^{10}$, $10H^9T$, $45H^8T^2$, ..., $10HT^9$, $T^{10}$ laid off on the X-axis; chances (out of 1024) on the Y-axis, scale marked at 100 and 200.]

Many experiments have been made for the purpose of checking the theoretical against the actual results, by tossing coins or throwing dice a great many times. In one well-known experiment1 12 dice were thrown 4096 times, each 4, 5, and 6 spot being taken as a "success" and each 1, 2, and 3 spot as a "failure." For example, in a throw of 3, 1, 2, 6, 4, 6, 3, 4, 1, 5, 2, 3, there would be 5 successes. The observed frequencies of the different numbers of successes and the theoretical results secured from the binomial expansion (here $(p+q)^{12}$ with p = q = 1/2, since three of the six faces count as a success) have been plotted on the same axes in Diagram XI. The reader will note how closely the observed frequencies check the theoretical: how close the two polygons are to being identical. If the reader should care to verify the results of Diagram X by tossing 10 coins 1024 times, he will find his empirical results closely in accord with the theoretical expectations.

1 Yule, G. Udny, An Introduction to the Theory of Statistics, 5th edition, 1919, p. 258.
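The dice experiment can be imitated with simulated throws. This is a sketch, not a reconstruction of Yule's recorded data: it assumes fair simulated dice from Python's `random` module, and the names `count_successes` and `simulated_frequencies` are illustrative. Because 4096 = 2^12, the theoretical frequency of k successes, 4096 × C(12, k)/2^12, is simply C(12, k).

```python
import random
from math import comb

N_DICE, N_THROWS = 12, 4096


def count_successes(throw: list[int]) -> int:
    """A 4, 5, or 6 spot counts as a success; a 1, 2, or 3 spot as a failure."""
    return sum(1 for spot in throw if spot >= 4)


def simulated_frequencies(seed: int = 0) -> list[int]:
    """Throw N_DICE simulated dice N_THROWS times; tally throws by number of successes."""
    rng = random.Random(seed)
    freq = [0] * (N_DICE + 1)
    for _ in range(N_THROWS):
        throw = [rng.randint(1, 6) for _ in range(N_DICE)]
        freq[count_successes(throw)] += 1
    return freq


# Expected frequency of k successes: N_THROWS * comb(12, k) / 2**12,
# which is exactly comb(12, k) because N_THROWS == 2**12.
theoretical = [N_THROWS * comb(N_DICE, k) // 2 ** N_DICE for k in range(N_DICE + 1)]
observed = simulated_frequencies()

print(count_successes([3, 1, 2, 6, 4, 6, 3, 4, 1, 5, 2, 3]))  # 5, as in the text
for k in range(N_DICE + 1):
    print(f"{k:>2} successes: theoretical {theoretical[k]:>4}, simulated {observed[k]:>4}")
```

The simulated column will differ a little from run to run (with a different seed) but should track the theoretical column closely, as the actual curve in Diagram XI tracks the theoretical one.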

[Diagram XI. Comparison of Observed and Theoretical Results in Throwing 12 Dice 4096 Times (after Yule, page 258). Number of successes on the X-axis; frequency on the Y-axis, scale up to 1000; the theoretical curve and the actual curve are plotted together.]

2. Why the Probability Curve is Employed in Psychological Measurement

The frequency curve plotted in Diagram X from the expansion of the expression $(H+T)^{10}$ is a symmetrical 10-sided polygon. If the number of factors (e.g., coins) is increased from 10 to 20, to 30, and then to 40 (the baseline extent remaining the same), the number of sides of the polygon will increase from 10 to 20, to 30, to 40. With each increase in the number of factors, the points on the curve will move more and more closely together, until finally, when the number of factors becomes very large [when n in the expression $(p+q)^n$ becomes infinite], the polygon will become a perfectly smooth curve like the one in Diagram IX. The "ideal" polygon or normal curve, therefore, may be said to represent the relative frequency of occurrence of various combinations of a very large number of equal, similar, and independent factors, when the chances of the occurrence or non-occurrence of each factor are the same.
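The approach of the polygon to the smooth curve can be checked numerically. The sketch below assumes p = q = 1/2, so the polygon for n factors has mean n/2 and σ = √n/2. It puts both the polygon and the normal curve on a baseline measured in σ units, keeping the baseline extent the same as n grows, and reports the largest vertical gap between them. The function names are illustrative, and the normal-curve ordinate used is the standard formula for a curve with mean 0 and σ = 1.

```python
from math import comb, exp, pi, sqrt


def normal_ordinate(z: float) -> float:
    """Height of the unit normal curve at z (mean 0, sigma 1)."""
    return exp(-z * z / 2) / sqrt(2 * pi)


def largest_gap(n: int) -> float:
    """Largest vertical gap between the (H+T)^n polygon and the normal curve,
    once both are put on a baseline measured in sigma units."""
    mean, sigma = n / 2, sqrt(n) / 2  # for p = q = 1/2
    gaps = []
    for k in range(n + 1):
        prob = comb(n, k) / 2 ** n  # chance of k heads
        z = (k - mean) / sigma      # position of this point in sigma units
        # sigma * prob rescales the chance to a height per unit of z
        gaps.append(abs(sigma * prob - normal_ordinate(z)))
    return max(gaps)


for n in (10, 20, 30, 40):
    print(f"n = {n:>2}: largest gap = {largest_gap(n):.4f}")
```

The gap shrinks steadily as n goes from 10 to 40, which is the numerical counterpart of the points "moving more and more closely" onto the smooth curve.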



If  now  we  compare  the  frequency  curve  in  Diagram  IX 
with  the  four  graphs  plotted  from  actual  data  obtained  in 
measurements  of  height,  intelligence  (IQ),  memory  span, 
and  temperature  (see  Diagram  VIII)  the  similarity — as  noted 
above — of  these  graphs  to  the  normal  curve  is  clearly  evident. 
In  other  words,  these  distributions  of  variable  phenomena  act 
as  though  they  were  determined  by  the  operation  of  factors 
which  are  present  or  absent  according  to  the  same  laws  which 
govern  the  combinations  of  coins  and  dice.  This  is  found 
to  be  true  of  many  other  distributions  as  well;  so  that  the 
general  tendency  of  quantitative  data  to  follow  the  normal 
probability  curve  is  often  called  the  "  law  of  normal  frequency." 
Stated  briefly,  this  law  is  as  follows:  measurements  of  natural 
phenomena  as  well  as  measurements  of  mental  and  social 
traits  tend  to  be  distributed  symmetrically  about  their  central 
tendency  in  proportions  which  are  determined  by  the  laws 
of  chance. 

The reason why frequency distributions of variable phenomena are similar to chance distributions obtained from tossing coins or throwing dice is that the former, like the latter, are probably due very often to the operation of the laws of chance. "Chance" may be defined as the result obtained from the operation of a great many factors, none of which is dominant, or, put in another way, all of which are (relatively) similar, equal, and independent. A number of small factors, for example, determine whether a coin will fall heads or tails, or whether a die will show a 2, 3, or 6 spot: the twist of the wrist, height from which coin or die is thrown, weight or size of coin or die, kind of floor on which experiment is made, and many others.1 In like manner a man's height, or his weight, or the shape of his head, or his intelligence, or his eye color is determined, very probably, by a large number of factors which have approximately the same influence on the final result. (Note: Should one or more of these factors have special weight, the distribution will no longer be of the probability type, but will be skewed or shifted over towards the upper or the lower end of the scale. The question of "skewness" will be considered on page 86.)

1 See Jerome, Harry, Statistical Methods, 1924, pp. 169-170.

Experiments  have  shown  that  the  normal  probability 
curve  serves  to  describe  the  frequency  of  occurrence  of  many 
variable  facts  with  a  relatively  high  degree  of  accuracy.  Some 
of  these  distributions  have  already  been  shown  in  Diagram  VIII. 
Important  facts  which  give  normal,  or  approximately  normal, 
distributions  may  be  classified  as  follows:1 

1.  Biological  statistics:  the  proportions  of  male  to  female 
births  for  the  same  country  or  community  over  a  period 
of  years;  the  proportion  of  different  types  of  plants  and 
animals  in  cross-fertilization  (the  Mendelian  ratios). 

2.  Anthropometrical  statistics:  height,  weight,  cephalic 
index,  etc.,  for  large  groups  of  same  age  and  sex. 

3.  Social  and  economic  statistics:  rates  of  birth,  marriage, 
or  death,  under  uniform  conditions;  wages  and  output  of 
large  numbers  of  workers  under  like  conditions  and  in  same 
occupation;  labor  costs,  prices,  etc. 

4.  Psychological  measurements:  intelligence  as  measured 
by  standard  tests;  speed  of  association,  perception,  reaction 
time,  etc.;  educational  test  scores,  e.g.,  in  spelling,  arithmetic, 
reading. 

5. Errors of observation: measures of height, speed of movement, magnitudes, physical and mental traits, etc., contain errors which are as likely to cause them to lie above as below the true value. Such errors follow the normal probability curve. (This topic is treated in Chapter III.)

The normal curve is often called the normal probability curve because it gives the theoretical probabilities of the occurrence of chance phenomena. It is also called the normal frequency curve because frequency distributions of actual data obtained from the measurement of many variable facts are normal, or very nearly so. Finally, it is called the "curve of error" because when repeated measurements have been made of such variables as height, linear magnitudes, time and extent of movement, reaction time, etc., the separate measures tend to diverge from the "true" measure (or standard) by amounts which when plotted give the characteristic probability curve (see Chapter III).

1 Jones, D. Caradog, A First Course in Statistics, 1921, p. 233.

We  may  conclude  this  discussion  of  the  normal  curve 
with  a  word  of  caution.  Despite  the  similarity  of  actual  and 
chance  distributions,  the  student  must  be  careful  not  to  draw 
the  conclusion  that  because  of  this  analogy,  we  can  assume 
forthwith  that  mental  and  physical  traits  are  always  (or  neces- 
sarily) due  to  the  operation  of  equal,  similar,  and  independ- 
ent factors  governed  entirely  by  chance.  The  factors  which 
determine,  say,  musical  ability  or  intelligence  are  too  little 
known  to  warrant  the  assumption,  a  priori,  that  they  operate 
in  the  same  manner,  and  in  accordance  with  the  same  laws, 
as  those  factors  which  give  chance  distributions  of  coins  or 
dice. The selection of the normal curve, rather than some other type of curve, is, after all, sufficiently justified by the fact that it generally fits the data better. However, "the theoretical justification and the empirical use of the curve are two quite different matters."1

3.  Important  Properties  of  the  Normal  Frequency  Curve 

In  the  normal  frequency  curve,  the  average,  the  median, 
and  the  mode  all  fall  exactly  at  the  midpoint  of  the  distribution, 
and  hence  are  numerically  equal.  This  follows  from  the  fact 
that  the  normal  probability  curve  is  perfectly  symmetrical 
bilaterally,  and  in  consequence  all  of  the  measures  of  central 
tendency  must  fall  at  the  middle  of  the  curve.  Also  in  the 
normal  curve,  the  measures  of  variability  include  certain  con- 
stant fractional  amounts  of  the  total  area  of  the  curve  as 
follows  (see  Diagram  IX) : 

1. If the SD is laid off in the plus and minus directions from the mean (to right and left) along the baseline, and if perpendiculars are erected at these points, the area included by the perpendiculars, the baseline, and the curve itself contains the middle 68.26% of the total area under the curve. Stated briefly, within ±1σ of the mean are found the middle 2/3 (approximately) of the cases in the normal distribution.

2. If the AD is laid off in the plus and minus directions from the mean along the baseline, and if perpendiculars are erected at these points, the area included by the perpendiculars, the baseline, and the curve contains the middle 57.5% of the total area. Put briefly, within ±1AD of the mean will be found the middle 57.5% of the cases in the distribution.

3. If the PE is laid off in the plus and minus directions from the mean along the baseline, and if perpendiculars are erected at these points, the area included by the perpendiculars, the baseline, and the curve contains the middle 50% of the area. Since the PE (equivalent to the Q in a normal distribution) equals 1/2 the distance between the 75th and 25th percentiles, in a perfectly symmetrical distribution it marks off the 25% of the area directly above and the 25% directly below the mean: the middle 50% of the measures.

1 Jones, D. Caradog, ibid., p. 233.
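These three per cents can be recomputed from the normal curve itself. A minimal sketch using Python's standard `statistics.NormalDist` (Python 3.8 or later). It assumes the standard results that the AD of a normal curve is σ√(2/π) and that the PE is the distance from the mean to the 75th percentile; `middle_area` is an illustrative name. The exact area within ±1σ is about .6827; the text's 68.26% comes from doubling the four-place table entry .3413.

```python
from math import pi, sqrt
from statistics import NormalDist

UNIT = NormalDist()       # normal curve with mean 0 and sigma 1
AD = sqrt(2 / pi)         # average deviation of a normal curve, in sigma units
PE = UNIT.inv_cdf(0.75)   # probable error (= Q): 75th percentile distance from the mean


def middle_area(half_width: float) -> float:
    """Fraction of the total area between mean - half_width and mean + half_width."""
    return UNIT.cdf(half_width) - UNIT.cdf(-half_width)


for name, width in (("SD", 1.0), ("AD", AD), ("PE", PE)):
    print(f"mean +/- 1 {name}: {100 * middle_area(width):.2f}% of the area")
```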

Certain  constant  relations  will  be  found  to  obtain  among 
the  measures  of  variability.  These  are  easily  derived  from  the 
per  cents  of  area  included  by  each. 

1. $PE = .6745\,\sigma$

2. $PE = .8453\,AD$

3. $\sigma = 1.4826\,PE$

4. $\sigma = 1.2533\,AD$

5. $AD = .7979\,\sigma$

6. $AD = 1.1829\,PE$

The first of these relations is the only one used often enough to warrant its being memorized. From these equations it should now be evident why it was stated earlier (page 27) that the σ is always greater than the AD, which is, in turn, always greater than the Q (PE).
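All six relations follow from two facts about the normal curve: PE = .6745σ (the 75th percentile of the unit normal curve) and AD = σ√(2/π). A minimal sketch that recomputes the six ratios from σ alone, using Python's standard `statistics.NormalDist`; the dictionary keys are illustrative labels. To four places the values come out as .6745, .8453, 1.4826, 1.2533, .7979, and 1.1829.

```python
from math import pi, sqrt
from statistics import NormalDist

SIGMA = 1.0
PE = NormalDist().inv_cdf(0.75) * SIGMA  # about .6745 sigma
AD = sqrt(2 / pi) * SIGMA                # about .7979 sigma

ratios = {
    "PE/sigma": PE / SIGMA,
    "PE/AD": PE / AD,
    "sigma/PE": SIGMA / PE,
    "sigma/AD": SIGMA / AD,
    "AD/sigma": AD / SIGMA,
    "AD/PE": AD / PE,
}
for name, value in ratios.items():
    print(f"{name:>9} = {value:.4f}")
```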



4.  The  Measurement  of  Skewness 

In  a  frequency  polygon  or  histogram,  usually  the  first 
thing  which  strikes  the  eye  is  the  symmetry,  or — what  is  more 
often  the  case — the  lack  of  symmetry  in  the  figure.  In  the 
normal  curve  the  mean,  the  median,  and  the  mode  all  coincide, 
and  there  is  a  perfect  balance  or  symmetry  between  the  right 
and  left  halves  of  the  figure.  In  a  "  skewed  "  distribution, 
on  the  other  hand,  the  mean,  the  median,  and  the  mode  fall 
at  different  points  in  the  distribution,  and  the  balance  (or 
center  of  gravity)  is  thrown  to  one  side  or  the  other — to  right 
or left. The degree of displacement or skewness is measured by the formula,

$$\text{Skewness} = \frac{3(\text{mean} - \text{median})}{\sigma} \qquad (11)$$

and  in  the  normal  distribution,  since  the  mean  =  the  median, 
the  skewness  is  0.  The  more  nearly  the  distribution  approaches 
the  normal  type,  the  closer  together  the  mean  and  the  median, 
and  the  less  the  skewness. 
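A minimal sketch of formula (11) in Python. It assumes, as in Pearson's second coefficient of skewness, that the divisor is σ, computed here as the population standard deviation of the scores with `statistics.pstdev`. The score lists are hypothetical toy data, not the Table I or Table II distributions.

```python
from statistics import mean, median, pstdev


def skewness(scores: list[float]) -> float:
    """Formula (11): 3 (mean - median) / sigma, taking sigma as the
    population standard deviation of the scores (an assumption)."""
    return 3 * (mean(scores) - median(scores)) / pstdev(scores)


# Hypothetical toy scores, not the Table I or Table II data.
symmetric = [2, 3, 4, 4, 5, 6]
massed_high = [1, 5, 8, 9, 9, 10]  # long tail toward the low end
massed_low = [1, 1, 2, 2, 6, 10]   # long tail toward the high end

for label, data in (("symmetric", symmetric), ("massed high", massed_high), ("massed low", massed_low)):
    print(f"{label:>12}: skewness = {skewness(data):+.2f}")
```

The symmetric list gives 0; scores massed at the high end give a negative value and scores massed at the low end a positive one, matching the sign convention described next.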

If  we  apply  formula  (11)  to  the  distribution  of  54  Army 
Alpha  scores  in  Table  I,  we  get  —  .66  as  the  measure  of  skew- 
ness. Distributions  like  this  one  are  said  to  be  skewed  negatively, 
or  to  the  left:  the  scores  are  massed  at  the  high  end  of  the  scale 
(the  right  end),  and  spread  out  gradually  at  the  low  or  left  end,  as 
shown  in  Diagram  XII.  Distributions  are  skewed  positively  or 
to  the  right  when  the  scores  are  massed  at  the  low  (the  left)  end 
of  the  scale,  and  spread  out  gradually  at  the  high  or  right  end 
(see  Diagram  XIII). 

Formula (11) gives the measure of skewness of the distribution of 200 cancellation scores in Table II (2) as +.003.
This  indicates  a  very  low  degree  of  positive  skewness,  and  shows 
how  very  closely  this  distribution  approaches  the  probability 
type. 

There are several reasons why distributions are skewed. In the first place we should hardly expect the distribution of IQ's obtained from a group of 25 eight-year-old boys to be normal, nor the distribution of IQ's obtained from a special class for the dull and feebleminded, even though the latter group were large. The small size of the group in the first case, and "special selection"1 in the second, are sufficient causes of skewness.2 Again, technical faults in the construction of the test, errors in scoring and the like may often produce skewness in a distribution of test scores.

[Diagram XII. Negative Skewness: To the Left. The median and the average are marked on the baseline.]

[Diagram XIII. Positive Skewness: To the Right. The median is marked on the baseline.]

1 A "selected" group is one which is not representative of the larger group from which it is drawn.

2 For an illustration of skewness due to both of these causes, see the distribution of Table I.

In addition to these more obvious causes, skewness also results, oftentimes, from a real lack of "normality" in the data.1 This condition arises when several of the factors determining a given result are dominant or prepotent and hence are present more often than chance would allow (see page 83). A simple illustration of this will be found in those distributions which result from the throwing of loaded dice. When dice of this sort are thrown, the resulting distributions will always be skewed, due to the greater "potency" of the heavier faces. Again, to take an illustration from real data, the graph representing the chances of death is considerably skewed (being higher in infancy and old age than in youth or middle age) because of the difference in number and importance of the "causes of death" at certain ages.

One other illustration may be taken, this time from the field of tests. If an arithmetic test which involves only the four fundamental operations is given to 1000 eighth-grade children, there will be a piling up of the scores towards the high score end of the distribution: a negative skewness. On the other hand, if the test contains only problems in fractions, square root, interest, etc., there will be a piling up of the scores (or at least a shift in the peak of the curve) towards the low score end of the scale: a positive skewness. These results may be explained in terms of the small positive and negative factors which produce the probability curve. Too easy a test excludes from operation some of the factors which make for an extension of the curve at the upper end, such as a knowledge of more advanced arithmetical relations, which the brighter children would know. Too hard a test excludes from operation factors which make for the extension of the curve at the lower end, such as a knowledge of very simple facts which would permit the answering of a few, at least, of the questions had these been included. In the one case we have a number of perfect scores, and little discrimination; in the second case a number of zero scores, and equally poor discrimination.

1 Theoretically, there is no real reason why distributions should always be normal. Thorndike has written: "There is nothing arbitrary or mysterious about variability which makes the so-called normal type of distribution a necessity, or any more rational than any other sort, or even more to be expected on a priori grounds. Nature does not abhor irregular distributions." (Mental and Social Measurements, pp. 88-89.)

IV.  Some   Practical  Applications  of  the   Normal  Curve 

The  entire  area  under  any  frequency  curve  represents  the 
total  number  of  frequencies  in  the  distribution  (see  page  62). 
If  we  know  the  total  area  of  the  curve,  therefore,  and  in  addition 
the  proportion  of  the  total  area  in  a  given  segment,  it  is  pos- 
sible to  compute  very  simply  the  frequency  represented  by  the 
segment.  This  information  in  regard  to  the  normal  curve  is 
given  in  Tables  X  and  XI  from  which  the  theoretical  frequency 
of  any  fractional  part  of  the  probability  curve  may  be  easily 
obtained.  Acquaintance  with  these  tables  is  extremely  valuable 
in  the  solution  of  a  large  number  of  varied  problems.  For 
this  reason  before  considering  any  problems  which  depend  for 
their  solution  on  the  assumption  of  the  normal  distribution, 
it  is  very  desirable  that  the  construction  and  use  of  Tables 
X  and  XI  be  clearly  understood. 

1.  The  Construction  and  Use  of  Tables  X  and  XI 

Table X gives the fractional parts of the total area under
the normal curve found between the mean and ordinates
erected at various distances from the mean, such distances
measured in σ units.¹ The total area of the curve (the number
of cases in the distribution) is taken arbitrarily as 10,000
because of the greater ease with which fractional parts of area
may then be calculated. The first column of the table, x/σ,
gives the distances in tenths of σ measured off on the baseline
from the mean as the 0 point or origin; distances in hundredths
of σ are given by the headings of the columns. To
find the number of cases in a normal distribution between
the mean and the ordinate erected at a distance of 1σ from

1  Table  X  should  be  studied  in  conjunction  with  Diagram  IX. 


90        STATISTICS  IN  PSYCHOLOGY  AND  EDUCATION 

the mean, we go down the x/σ column until 1.0 is reached,
and in the next column, under .00, take the entry opposite 1.0,
viz., 3413. This figure means that there are 3413 cases in
10,000, or 34.13% of the entire area of the curve, between the
mean and 1σ. Put more exactly, 34.13% of the cases in the
normal distribution fall within the area bounded by the
baseline, the ordinate erected at the mean, the ordinate
erected at a distance of 1σ from the mean, and the curve itself
(see Diagram IX for illustration). To find the per cent of the
distribution between the mean and 1.57σ, we go down the x/σ
column to 1.5, then across horizontally to the column headed
.07, and take the entry 4418. This means that in a normal
distribution 44.18% of the entire distribution falls between
the mean and 1.57σ.

Thus far we have considered only σ distances measured in
the positive direction from the mean; that is, we have taken
account only of the right half (the high-score end) of the
normal curve. Since the curve is bilaterally symmetrical,
however, the entries in Table X may be used for σ distances
measured in the negative direction (to the left) as well as in the
positive direction. Accordingly, to find the per cent of the
distribution between the mean and -1.26σ, we simply take the
entry 3962 in the table: the entry in the column headed .06
opposite 1.2 in the x/σ column. This means that 39.62% of the
cases in the distribution fall between the mean and -1.26σ. In
the same way, the percentage of cases between the mean and
-1.00σ is found to be 34.13. The student will now be able
to verify the statement made on page 85 that 68.26% of the cases
in the normal distribution lie between -1.00σ and +1.00σ.
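The tabled entries can be reproduced from the cumulative normal distribution. The sketch below is an illustrative assumption and not part of the original method. It uses Python's standard-library `statistics.NormalDist` (Python 3.8+) and a hypothetical helper, `table_x_entry`, which returns the number of cases (out of 10,000) between the mean and an ordinate z σ away. The helper ignores the sign of z because the curve is symmetrical.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, sigma 1


def table_x_entry(z: float) -> float:
    """Cases, out of 10,000, between the mean and an ordinate z sigma away.

    The normal curve is symmetrical, so -z gives the same area as +z.
    """
    return (UNIT_NORMAL.cdf(abs(z)) - 0.5) * 10_000


# The three look-ups worked through in the text.
for z in (1.00, 1.57, -1.26):
    print(f"mean to {z:+.2f} sigma: {round(table_x_entry(z))} cases in 10,000")
```

Rounded, the three results are 3413, 4418 and 3962, the same entries read from Table X.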

While theoretically the normal curve meets the baseline only
at infinite distances to the right and left of the mean, for
practical purposes the curve may be taken to end at points


TABLE X

Fractional Parts of the Total Area (Taken as 10,000) under the
Normal Probability Curve, Corresponding to Distances on
the Baseline between the Mean and Successive Points Laid
off from the Mean in Units of Standard Deviation.

Example: between the mean and a point 1.3σ (x/σ = 1.3) is found
40.32% of the entire area under the curve.

x/σ      .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0     0000    0040    0080    0120    0160    0199    0239    0279    0319    0359
0.1     0398    0438    0478    0517    0557    0596    0636    0675    0714    0753
0.2     0793    0832    0871    0910    0948    0987    1026    1064    1103    1141
0.3     1179    1217    1255    1293    1331    1368    1406    1443    1480    1517
0.4     1554    1591    1628    1664    1700    1736    1772    1808    1844    1879
0.5     1915    1950    1985    2019    2054    2088    2123    2157    2190    2224
0.6     2257    2291    2324    2357    2389    2422    2454    2486    2517    2549
0.7     2580    2611    2642    2673    2704    2734    2764    2794    2823    2852
0.8     2881    2910    2939    2967    2995    3023    3051    3078    3106    3133
0.9     3159    3186    3212    3238    3264    3290    3315    3340    3365    3389
1.0     3413    3438    3461    3485    3508    3531    3554    3577    3599    3621
1.1     3643    3665    3686    3708    3729    3749    3770    3790    3810    3830
1.2     3849    3869    3888    3907    3925    3944    3962    3980    3997    4015
1.3     4032    4049    4066    4082    4099    4115    4131    4147    4162    4177
1.4     4192    4207    4222    4236    4251    4265    4279    4292    4306    4319
1.5     4332    4345    4357    4370    4383    4394    4406    4418    4429    4441
1.6     4452    4463    4474    4484    4495    4505    4515    4525    4535    4545
1.7     4554    4564    4573    4582    4591    4599    4608    4616    4625    4633
1.8     4641    4649    4656    4664    4671    4678    4686    4693    4699    4706
1.9     4713    4719    4726    4732    4738    4744    4750    4756    4761    4767
2.0     4772    4778    4783    4788    4793    4798    4803    4808    4812    4817
2.1     4821    4826    4830    4834    4838    4842    4846    4850    4854    4857
2.2     4861    4864    4868    4871    4875    4878    4881    4884    4887    4890
2.3     4893    4896    4898    4901    4904    4906    4909    4911    4913    4916
2.4     4918    4920    4922    4925    4927    4929    4931    4932    4934    4936
2.5     4938    4940    4941    4943    4945    4946    4948    4949    4951    4952
2.6     4953    4955    4956    4957    4959    4960    4961    4962    4963    4964
2.7     4965    4966    4967    4968    4969    4970    4971    4972    4973    4974
2.8     4974    4975    4976    4977    4977    4978    4979    4979    4980    4981
2.9     4981    4982    4982    4983    4984    4984    4985    4985    4986    4986
3.0   4986.5  4986.9  4987.4  4987.8  4988.2  4988.6  4988.9  4989.3  4989.7  4990.0
3.1   4990.3  4990.6  4991.0  4991.3  4991.6  4991.8  4992.1  4992.4  4992.6  4992.9
3.2   4993.129
3.3   4995.166
3.4   4996.631
3.5   4997.674
3.6   4998.409
3.7   4998.922
3.8   4999.277
3.9   4999.519
4.0   4999.683
4.5   4999.966
5.0   4999.997133

From: Tables for Statisticians and Biometricians, edited by Karl Pearson,
Cambridge University Press,


-3σ and +3σ from the mean. We find from Table X, for
example, that 4986.5 cases in the total 10,000 fall between the
mean and 3σ; and 4986.5 cases will, of course, fall between
the mean and -3σ also. Therefore, since 9973 cases in
10,000, or 99.73% of the distribution, fall within the limits
set by -3σ and +3σ, by cutting off the curve at these two points
we disregard only .27 of 1% of the distribution, a negligible
amount except in very large samples.

Instead of σ, the PE may be used as the unit of measurement
in determining the theoretical frequencies within given intervals
of the normal curve. Table XI gives the fractional parts of
the total area under the normal curve found between the mean
and ordinates erected at various PE distances from the mean.
The table is read in exactly the same way as Table X. To
find, for instance, the number of cases between the mean and
1PE (or, more accurately, the ordinate erected at that point),
we go down the x/PE column to 1.0 and in the next column,
under .00, read 2500. Twenty-five per cent of the cases in the
distribution, therefore, lie between the mean and 1PE. In like
manner 25% of the cases lie between the mean and -1PE;
hence the middle 50% of the distribution is contained
between -1PE and +1PE measured off from the mean. This
table does not read in as fine units as Table X, entries being
given only at intervals of .05PE. If smaller divisions are
desired, however, they can be obtained by interpolation.
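Table XI can be sketched the same way if PE units are converted into σ units first. This sketch assumes the standard relation PE = .6745σ. The PE is the deviation that cuts off the middle 50% of a normal distribution, so it equals the 75th percentile of the unit normal. The helper name `table_xi_entry` is hypothetical, and `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()
# The PE bounds the middle 50% of the curve, so in sigma units it is the
# 75th percentile of the unit normal: about 0.6745.
PE_IN_SIGMA = UNIT_NORMAL.inv_cdf(0.75)


def table_xi_entry(x_over_pe: float) -> float:
    """Cases, out of 10,000, between the mean and an ordinate x/PE away."""
    z = abs(x_over_pe) * PE_IN_SIGMA
    return (UNIT_NORMAL.cdf(z) - 0.5) * 10_000


print(f"PE = {PE_IN_SIGMA:.4f} sigma")
for x in (1.0, 1.55):
    print(f"mean to {x} PE: {round(table_xi_entry(x))} cases in 10,000")
```

The script reports PE = 0.6745σ and the entries 2500 and 3521, which match Table XI.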

Just as it is customary to disregard that part of the curve
beyond the limits ±3σ, so we ordinarily disregard that part
of the curve beyond the limits ±4PE. This is done because
9930 cases (4965 × 2) in the total 10,000 fall between -4PE
and +4PE (see Table XI). Hence, in cutting off the curve
at +4PE and -4PE, we disregard only .70 of 1% of the cases
in the distribution.
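The two cut-offs can be checked directly. This is a minimal sketch. It assumes Python's `statistics.NormalDist` and PE = .6745σ, and `per_cent_beyond` is a hypothetical helper. It computes the per cent of cases lying beyond ±3σ and beyond ±4PE.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()
PE_IN_SIGMA = UNIT_NORMAL.inv_cdf(0.75)  # about 0.6745


def per_cent_beyond(z: float) -> float:
    """Per cent of a normal distribution lying beyond -z and +z (z in sigma)."""
    return 2 * (1 - UNIT_NORMAL.cdf(z)) * 100


print(f"beyond +/-3 sigma: {per_cent_beyond(3.0):.2f}%")
print(f"beyond +/-4 PE:    {per_cent_beyond(4 * PE_IN_SIGMA):.2f}%")
```

It prints 0.27% and 0.70%, the amounts disregarded in the text.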

There is little to choose between Tables X and XI. The
former admits of slightly easier interpolation, but the latter
is probably accurate enough, without interpolation, for most of
the work done in psychological measurement.


TABLE XI

Fractional Parts of the Total Area (Taken as 10,000) under the
Normal Probability Curve, Corresponding to Distances on the
Baseline between the Mean and Successive Points Laid off
from the Mean in Units of PE.

Example: between the mean and a point 1.55PE (x/PE = 1.55)
from the mean we find 35.21% of the entire area under the curve.

x/PE     .00     .05     x/PE       .00       .05
 0.0    0000    0135      3.0      4785      4802
  .1    0269    0403      3.1      4817      4831
  .2    0536    0670      3.2      4845      4858
  .3    0802    0933      3.3      4870      4881
  .4    1063    1193      3.4      4891      4900
  .5    1321    1447      3.5      4909      4917
  .6    1571    1695      3.6      4924      4931
  .7    1816    1935      3.7      4937      4943
  .8    2053    2168      3.8      4948      4953
  .9    2281    2392      3.9      4957      4961
 1.0    2500    2606      4.0      4965      4968
 1.1    2709    2810      4.1      4971      4974
 1.2    2908    3004      4.2      4977      4979
 1.3    3097    3188      4.3      4981      4983
 1.4    3275    3360      4.4      4985      4987
 1.5    3441    3521      4.5      4988      4989
 1.6    3597    3671      4.6      4990      4991
 1.7    3742    3811      4.7      4992      4993
 1.8    3876    3939      4.8      4994      4995
 1.9    4000    4057      4.9      4995      4996
 2.0    4113    4166      5.0      4996      4997
 2.1    4217    4265      5.1    4997.1    4997.4
 2.2    4311    4354      5.2    4997.7      4998
 2.3    4396    4435      5.3    4998.2    4998.4
 2.4    4472    4508      5.4    4998.6    4998.8
 2.5    4541    4573      5.5      4999    4999.1
 2.6    4602    4631      5.6    4999.2    4999.3
 2.7    4657    4682      5.7    4999.4    4999.5
 2.8    4705    4727      5.8   4999.55    4999.6
 2.9    4748    4767      5.9   4999.65    4999.7

2.  A  Variety  of  Problems  Solved  by  Means  of  Tables   X 
and  XI 

Under  this  heading  we  shall  consider  a  number  of  problems 
which  may  be  solved  by  means  of  Tables  X  and  XI,  on  the 
assumption  that  the  distributions  which  they  involve  are  normal 
or  approximately  normal.  For  easy  reference  later,  each 
group  of  examples  is  preceded  by  a  general  statement  of  the 
problem  which  they  are  designed  to  illustrate. 

A.  To  Determine  the  Per  Cent  of  Cases  in  a  Normal  Distribution 
which  Fall  within  Given  Limits. 

Problem (1): Given a normal distribution with Average
= 12 and σ = 4.00. (a) What per cent of the cases fall
between 8 and 16? (b) What per cent of the cases lie above
18? (c) What per cent lie below 6?

(a) A score of 16 is just 4 points above the mean, and a score
of 8 is just 4 points below the mean. Dividing this difference
of 4 points by the σ of the distribution (by 4), we see that 16
is 1σ above the mean and that 8 is 1σ below the mean (see
Diagram XIV, Fig. 1). Since 68.26% of the cases in a normal
distribution fall between -1σ and +1σ (Table X), 68.26% of the
scores in the given distribution, or approximately the middle
two-thirds, fall between 8 and 16. This result may also be
stated in terms of "chances." Since 68.26% of the cases in the
distribution fall between 8 and 16, the chances are 6826 in
10,000, or about 68 in 100, that any score in the distribution
will be found between 8 and 16.

(b) A score of 18 is 6 points, or 1.5σ, above the mean.
From Table X we find that 43.32% of the cases fall between
the mean and 1.5σ. Accordingly, 6.68% of the cases
(50% - 43.32%) must lie above 18 in order to fill out the
50% of cases in the right half of the curve (see Fig. 1). Stated
as "chances," there are 668 chances in 10,000, or about 7 in
100, that any future score will lie above 18.

(c) A score of 6 is -1.5σ from the mean. Between the
mean and -1.5σ (score 6) are 43.32% of the cases in the entire
distribution. Hence, 6.68% of the cases must lie below 6 to fill
out the 50% below the mean, and the chances are about 7 in 100
that any future score will lie below 6.

[Diagram XIV. Illustrating a Variety of Problems Solved by Means of Tables X and XI (Figs. 1 to 8).]
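Problem (1) can be checked without the table. This sketch assumes Python's `statistics.NormalDist`, built with the given Average and σ, in place of Table X. It computes the three proportions asked for in (a), (b) and (c).

```python
from statistics import NormalDist

scores = NormalDist(mu=12, sigma=4.00)

between_8_and_16 = scores.cdf(16) - scores.cdf(8)
above_18 = 1 - scores.cdf(18)
below_6 = scores.cdf(6)

print(f"(a) between 8 and 16: {between_8_and_16:.2%}")
print(f"(b) above 18:         {above_18:.2%}")
print(f"(c) below 6:          {below_6:.2%}")
```

Part (a) prints 68.27% rather than 68.26%, because the tabled 3413 is itself rounded. Parts (b) and (c) both give 6.68%.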

Problem (2): Given a distribution with Average = 29.75
and Q = 4.56. What per cent of the distribution lies between
22 and 26? What are the chances that a score will fall
between 22 and 26?

In a normal distribution Q is equal to the PE. Score 22 is
7.75 points, or -1.70PE, from the mean (since 7.75/4.56 = 1.70),
and score 26 is 3.75 points, or -.822PE, from the mean (see
Diagram XIV, Fig. 2). From Table XI, we find that 37.42%
of the cases in a normal distribution lie between the mean and
-1.70PE, and that 21% of the cases lie between the mean
and -.822PE. By simple subtraction, therefore, 16.42% of
the cases fall between -1.70PE and -.822PE, that is, between
22 and 26. The chances are 1642 in 10,000, or about 16 in 100,
that a score will fall between 22 and 26.
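The same answer can be reached by converting Q to σ. This minimal sketch assumes Q = PE (as the text states for a normal distribution) and PE = .6745σ, and it uses `statistics.NormalDist` in place of Table XI.

```python
from statistics import NormalDist

PE_IN_SIGMA = NormalDist().inv_cdf(0.75)  # PE is about 0.6745 sigma

mean, q = 29.75, 4.56
# In a normal distribution Q equals the PE, so sigma = Q / 0.6745.
scores = NormalDist(mu=mean, sigma=q / PE_IN_SIGMA)

share = scores.cdf(26) - scores.cdf(22)
print(f"between 22 and 26: {share:.2%}")
```

The result is about 16.4%. It agrees closely with the 16.42% obtained above; the small gap comes from rounding the Table XI look-ups, such as the "21%" read for -.822PE.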

B.  To  Find   the   Limits  in  Any  Normal  Distribution  which  Will 
Include  a  Given  Per  Cent  of  the  Cases 

Problem (1): Given a distribution with Average = 16 and
σ = 4. What limits will include the middle 75% of the cases?

The middle 75% of the cases in a normal distribution must
include the 37.5% just above and the 37.5% just below the
mean. From Table X, we find that 3749 cases in 10,000, or
37.5% of the distribution, fall between the mean and 1.15σ;
consequently, 37.5% of the distribution must also fall between
the mean and -1.15σ. The middle 75% of the cases, therefore,
lie between -1.15σ and +1.15σ; or, since σ equals 4, within
±4.60 points of the mean. Adding ±4.60 to the mean (to 16),
we find that the middle 75% of the scores in the given
distribution lie between 11.40 and 20.60 (see Diagram XIV, Fig. 3).
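Finding limits is the inverse look-up: given a proportion, find the score. This sketch assumes Python's `statistics.NormalDist.inv_cdf`, which returns the score below which a given proportion of the distribution falls.

```python
from statistics import NormalDist

scores = NormalDist(mu=16, sigma=4)

# The middle 75% leaves 12.5% in each tail.
lower = scores.inv_cdf(0.125)
upper = scores.inv_cdf(0.875)
print(f"middle 75%: {lower:.2f} to {upper:.2f}")
```

It prints 11.40 to 20.60, the limits found from Table X.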



Problem (2): Given a distribution with Average = 150
and Q = 26. What limits will include the highest 20% of the
group?

The highest 20% of a normally distributed group must
have 30% of the cases between its lower limit and the mean
in order to fill out the 50% of cases in the right half of the
distribution (see Diagram XIV, Fig. 4). From Table XI, we find
that 3004 cases in 10,000, or 30% of the distribution, fall between
the mean and 1.25PE. Since the PE of the given distribution
is 26 (in a normal distribution Q equals the PE), 1.25PE will be
1.25 × 26, or 32.5 points, above the mean, namely, at 182.50.
The lower limit of the highest 20% of the given group, therefore,
is 182.50; and the upper limit is the highest score in the
distribution, whatever that may be.
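A sketch of the same computation, assuming Q = PE and PE = .6745σ, with `statistics.NormalDist` standing in for Table XI. It locates the score above which the highest 20% of the group falls.

```python
from statistics import NormalDist

PE_IN_SIGMA = NormalDist().inv_cdf(0.75)  # PE is about 0.6745 sigma
mean, q = 150, 26
scores = NormalDist(mu=mean, sigma=q / PE_IN_SIGMA)  # Q = PE when normal

# 80% of the group lies below the lower limit of the highest 20%.
lower_limit = scores.inv_cdf(0.80)
print(f"highest 20% begins at about {lower_limit:.1f}")
```

The exact point is about 1.248PE above the mean, or roughly 182.4. Reading Table XI to the nearest .05PE gives 1.25PE and hence the 182.50 found above.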

C.  To    Determine    the    Relative    Difficulty    of    Test    Questions, 
Problems,  or  Other  Test  Items 

Problem (1): Given a test question or problem solved by
10% of a large unselected group; a second problem solved
by 20% of the group; and a third, solved by 30% of the
group. Assuming that the capacity measured by the test
problems is distributed "normally," what is the relative
difficulty of questions 1, 2, and 3?

Our first task is to find for question 1 a position in the
distribution above which are 10% (the per cent passing) and
below which are 90% (the per cent failing) of the entire group.
The highest 10% in a normally distributed group has 40% of
the cases between its lower limit and the mean (50% - 10% =
40%; see Diagram XIV, Fig. 5), and from Table X we find
that 39.97%, i.e., 40%, of a normal frequency distribution falls
between the mean and 1.28σ. Hence, question 1 falls at a
point on the baseline of the curve whose abscissa is 1.28σ
from the mean; accordingly, 1.28σ may be taken as its
difficulty value.

In the same way, question 2, passed by 20% of the group,
falls at a point in the distribution 30% above the mean
(50% - 20% = 30%; see Fig. 5). From Table X we find that
29.95%, i.e., 30%, of the group falls between the mean and
.84σ; hence question 2 has a difficulty value of .84σ. In like
manner question 3, which falls at a point in the distribution
20% above the mean, has a difficulty value of .53σ, since
20.19% of the distribution lies between the mean and .53σ.
To summarize our results:


Question    Passed by    σ value    σ difference
    1          10%         1.28
    2          20%          .84           .44
    3          30%          .53           .31

The σ difference in difficulty between questions 2 and 3 is .31,
only about seven-tenths of the σ difference in difficulty between
1 and 2 (.44), in spite of the fact that the per cent difference is
the same (10%) in the two cases. On the assumption that ability
follows the normal frequency distribution, therefore, it is evident
that the σ difference, and not the per cent difference, gives the
real index of differences in difficulty.
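The difficulty values can be computed with the inverse normal function. This sketch assumes `statistics.NormalDist.inv_cdf`, and `difficulty_sigma` is a hypothetical helper. The helper returns the σ distance above the mean of the point that has the given per cent passing above it.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def difficulty_sigma(per_cent_passing: float) -> float:
    """Sigma distance above the mean of the point passed by this per cent.

    The point has per_cent_passing above it and the rest below it.
    """
    return UNIT_NORMAL.inv_cdf(1 - per_cent_passing / 100)


passing = (10, 20, 30)
values = [difficulty_sigma(p) for p in passing]
for question, (p, v) in enumerate(zip(passing, values), start=1):
    print(f"question {question}: passed by {p}%, sigma value {v:.2f}")
print(f"differences: {values[0] - values[1]:.2f}, {values[1] - values[2]:.2f}")
```

Question 3 comes out at .52σ rather than the .53σ adopted above. Exactly 20% of the area lies between the mean and .524σ, which falls between the table entries for .52σ and .53σ, so the second difference is .32 instead of .31. The conclusion drawn in the text is unaffected.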

Problem (2): Given three test items, No. 1, No. 2, and
No. 3, passed by 50%, 40%, and 30%, respectively, of a large
group. What per cent of the same group must pass test item
No. 4 in order for it to be as much more difficult than No. 3
as No. 2 is more difficult than No. 1?

A question or problem which is "passed" by 50% of a
group is, of course, "failed" by 50% also, and accordingly,
such a problem falls exactly in the middle of the normal
distribution of difficulty. Test item 1, therefore, has a σ value
of 0; it falls just on the mean (see Diagram XIV, Fig. 6). Test
item 2 lies at a point in the distribution 10% above the mean,
as 40% of the group passed and 60% failed this problem.
Accordingly, the σ value of this item is .25, since from Table
X we find that 9.87% (roughly 10%) of the cases lie between
the mean and .25σ. Test item 3, passed by 30% of the group,
lies at a point 20% above the mean, and this item, therefore,
has a difficulty value of .52σ, as 19.85% (20%) of the normal
distribution lies between the mean and .52σ.


Now since item 2 is .25σ further along on the difficulty
scale (towards the high-score end of the curve) than item 1,
it is clear that item 4 must be .25σ above item 3 if it is to be
as much harder than 3 as 2 is harder than 1. Item 4, therefore,
must have a value of .52σ + .25σ, or .77σ; and from Table X
we find that 27.94% of the group falls between the mean and
this point. This means that 50% - 28%, or 22%, of the group
pass item 4. To summarize by a table:


Test Item    Passed by    Difficulty Value (σ)    σ difference
    1           50%               .00
    2           40%               .25                  .25
    3           30%               .52
    4           22%               .77                  .25

A  problem  or  test  item  must  be  passed  by  22%  of  the  group, 
therefore,  in  order  for  it  to  be  as  much  more  difficult  than  an 
item  passed  by  30%,  as  an  item  passed  by  40%  is  more  difficult 
than  one  passed  by  50%.  Note  again  that  per  cent  differences 
are  not  reliable  indices  of  differences  in  difficulty  when  the 
capacity  measured  is  taken  to  be  distributed  normally. 
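The same reasoning can be followed step by step. This minimal sketch assumes `statistics.NormalDist`, and `difficulty_sigma` is a hypothetical helper. It places item 4 one "item 1 to item 2" step beyond item 3 and converts that point back into a per cent passing.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def difficulty_sigma(per_cent_passing: float) -> float:
    # Sigma value of an item: the point with per_cent_passing above it.
    return UNIT_NORMAL.inv_cdf(1 - per_cent_passing / 100)


item1, item2, item3 = (difficulty_sigma(p) for p in (50, 40, 30))
item4 = item3 + (item2 - item1)  # same step as from item 1 to item 2

per_cent_passing_item4 = (1 - UNIT_NORMAL.cdf(item4)) * 100
print(f"item 4 at {item4:.2f} sigma, passed by {per_cent_passing_item4:.1f}%")
```

Working without table rounding, it places item 4 at about .78σ, passed by about 21.8% of the group, which rounds to the 22% found above.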

D.  To  Separate  a  Given  Group  into  Sub-Groups  According  to 
Capacity,  When  the  Capacity  is  Normally  Distributed 

Problem (1): Suppose that we have measured 100 college
men on a certain test. We wish to classify our group into 5
sub-groups, A, B, C, D, and E, according to ability, the range
of ability to be equal in each sub-group. Assuming that
the capacity measured by the test is distributed normally, or
approximately so, and that the group is relatively unselected,
how many men should be placed in groups A, B, C, D, and
E, respectively?

Let us first represent the positions of the five sub-groups
graphically on the normal curve as shown in Diagram XIV,
Fig. 7. If the baseline of the curve is taken to extend from
-3σ to +3σ, that is, over a range of 6σ, dividing this range by 5
gives 1.2σ as the baseline extent to be allotted to each group.


These five intervals may be laid off on the baseline as shown
in the figure, and perpendiculars drawn to demarcate the
various sub-groups. It is clear that group A covers the upper
1.2σ (from 1.8σ to 3σ); group B, the next 1.2σ (from .6σ to
1.8σ); group C lies .6σ to the right and .6σ to the left of the
mean; and groups D and E occupy the same relative positions
on the left half of the curve as B and A occupy on the right half.

Now to find what per cent of the whole group falls within
the A group, we must find what per cent of a normal distribution
lies between 3σ (the upper limit of the A group) and 1.8σ
(the lower limit of the A group) (see Fig. 7). From Table X
we know that 49.86% of a normal distribution falls between
the mean and 3σ, and that 46.41% falls between the mean
and 1.8σ. Hence 3.45% of the total area under the normal
curve (49.86% - 46.41%) falls between 1.8σ and 3σ, and,
accordingly, group A comprises 3.45% of the whole group.

The per cents in the other groups are found in exactly the
same way. Thus, 46.41% of the normal curve falls between
the mean and 1.8σ (upper limit of group B), and 22.57% falls
between the mean and .6σ (lower limit of the same group).
Subtracting, 46.41% - 22.57%, or 23.84%, of our whole group
evidently belongs in sub-group B. Group C lies .6σ above
and .6σ below the mean. Between the mean and .6σ is
contained 22.57% of a normal distribution, and the same per
cent is contained between the mean and -.6σ. Group C,
then, includes 45% (22.57% × 2 = 45.14%) of the whole group.
Finally, group D, which falls between -.6σ and -1.8σ, contains
exactly the same percentage of the total as group B; and group E,
which falls between -1.8σ and -3σ, contains the same per
cent as group A. The percentage (and number) of men in
each group is given in the following summary:

Group                                 A        B      C      D        E
Per cent of total in each group     3.5     23.8     45   23.8      3.5
Number in each group (100 men)   4 or 3     24     45     24   4 or 3


On the assumption that the capacity measured follows the
normal probability curve, therefore, only 4 men in the group
of 100 should be placed in group A, which we may call the
marked-ability group; 24 in group B, the high average ability
group; 45 in group C, the average ability group; 24 in group D,
the low average ability group; and 4 in group E, the very low
or stupid group.
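The sub-group percentages follow from differences of the cumulative normal curve at the five boundaries. This sketch assumes `statistics.NormalDist` in place of Table X and uses the ±3σ range adopted in the text. The names `boundaries` and `shares` are illustrative.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()

# Five equal 1.2-sigma steps across the range -3 sigma to +3 sigma.
boundaries = [3.0, 1.8, 0.6, -0.6, -1.8, -3.0]
groups = "ABCDE"
n_men = 100

shares = {
    name: UNIT_NORMAL.cdf(upper) - UNIT_NORMAL.cdf(lower)
    for name, upper, lower in zip(groups, boundaries, boundaries[1:])
}
for name, share in shares.items():
    print(f"group {name}: {share:.2%} of the curve, about {share * n_men:.1f} men")
```

Groups A and E come out at about 3.46%, B and D at about 23.83%, and C at about 45.15%. Together these make only 99.73%, because the curve beyond ±3σ is disregarded.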

The above procedure may be used in determining how many
individuals in a large class should get grades of, say, A, B, C,
D, and E, or it may be employed for any number of grade-groups.
The assumption must be made, however, that ability in the subject
in which the individuals are being graded follows the normal curve.

3.  The  Arrangement  of  Problems  or  Other  Test  Items  into  a 
Scale  in  which  the  Difficulty  of  Each  Item  is  Known 
with  Reference  to  Each  Other  Item  as  Well  as  Some 
Selected  Zero  Point 

One  of  the  important  tasks  which  confronts  the  worker 
with  tests  is  the  construction  of  scales  which  shall  contain 
problems  or  questions  graded  in  difficulty  from  very  easy 
to  very  hard  by  known  steps  or  intervals.  Given  a  set  of 
problems  or  test  items,  if  we  know  what  per  cent  of  a  large 
group  (selected  from  among  those  for  whom  the  test  is  intended) 
pass  or  fail  each  problem,  it  is  a  comparatively  easy  matter 
to  arrange  the  problems  in  a  rough  order  of  difficulty.  Such 
an  arrangement,  however,  constitutes  a  very  crude  scale,  as 
we  know  very  little  about  the  relative  difficulty  of  the  separate 
problems  (see  page  98)  and  next  to  nothing  about  the  range 
of  ability  tested. 

For this reason, in most scaled tests (if we can assume a
normal or approximately normal distribution in the capacity
tested) the unit of measurement is taken as the σ or the PE. By
so doing we are able not only to arrange the test items in a simple
order of difficulty, but also to "set" or space them at definite points
along a scale of difficulty, i.e., along the baseline of the normal
curve. On such a scale the distance from one item to another,
or from any given item to the selected zero point, is known as
definitely as the distance between two divisions on a foot rule.
To  illustrate  concretely  how  a  scale  of  this  sort  is  made,  let  us 
suppose  that  we  wish  to  construct  a  scaled  test  for  measuring 
"  reasoning  ability  "  (e.g.,  by  means  of  syllogisms)  in  12  year 
olds;  or  an  addition  scale  for  Grade  IV;  or  a  scale  for  testing 
sentence  memory  in  8  year  olds.  The  steps  involved  may  be 
outlined  as  follows: 

(1)  First  it  is  necessary  to  compile  a  large  number  of 
problems  or  other  test  items  which  vary  in  difficulty  from  very 
easy  to  very  hard,  and  which  are  fairly  representative  of  the 
field  covered  by  the  test. 

(2)  These  problems  are  then  given  to  as  large  a  random 
sample  as  possible  from  among  those  for  whom  the  scale  is 
intended. 

(3)  The  per  cent  of  the  group  which  solves  each  problem 
correctly  is  next  computed.  This  allows  duplicates  and  prob- 
lems too  easy  or  too  hard  or  those  which  for  one  reason  or 
another  are  unsatisfactory  to  be  discarded.  It  also  permits 
the  arrangement  of  the  problems  selected  for  the  scale  into  an 
order  of  difficulty.  A  problem  solved  correctly  by  90%  of  the 
group  is  obviously  easier  than  one  solved  correctly  by  75%; 
while  the  second  problem  is,  in  turn,  clearly  less  difficult  than 
one  solved  correctly  by  50%.  The  larger  the  per  cent  passing 
the  lower  the  position  of  the  problem  on  the  difficulty 
scale. 

(4) By means of Table XI each per cent correct found in (3)
may now be converted into a PE (or σ)¹ distance above or below
the mean. The procedure here is as follows. An item solved
correctly by 40% of the group is 10%, or .375PE, above the
mean. In like manner, an item solved correctly by 78% of the
group is 28% (78% - 50%), or 1.15PE, below the mean. We
may tabulate the results for five items selected at random as
follows (see Diagram XIV, Fig. 8):

1 The procedure is identical when σ is employed instead of the PE.

Problem                              A        B       C       D       E
Per cent solving                    93       78      55      40      14
Distance from mean in
  percentage terms                 -43      -28      -5      10      36
Distance from mean in
  PE terms                       -2.20    -1.15    -.20    .375    1.60

Note that Problem A is solved by 93% of the group, i.e., by
the upper 50% (the right half of the curve) plus the 43% to the
left of the mean. Hence it lies 2.20PE to the left of the mean,
at -2.20PE. In like manner, the percentage distance from the
mean, measured to the right or left (plus or minus), for each
problem may be found by simply subtracting the per cent passing
from 50%. From these per cents, the PE distance of the problem
from the mean can be read from Table XI, as shown above.
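Step (4) can be sketched in code, converting each per cent solving directly into a PE distance from the mean. The sketch assumes `statistics.NormalDist` and PE = .6745σ. `pe_from_mean` is a hypothetical helper, and the per cents are the five problems of the illustration. Negative values lie to the left of the mean (easier problems).

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()
PE_IN_SIGMA = UNIT_NORMAL.inv_cdf(0.75)  # PE is about 0.6745 sigma


def pe_from_mean(per_cent_solving: float) -> float:
    """PE distance of a problem from the mean; negative means easier (left)."""
    return UNIT_NORMAL.inv_cdf(1 - per_cent_solving / 100) / PE_IN_SIGMA


solving = {"A": 93, "B": 78, "C": 55, "D": 40, "E": 14}
distances = {name: pe_from_mean(p) for name, p in solving.items()}
for name, d in distances.items():
    print(f"problem {name}: {50 - solving[name]:+d}% -> {d:+.2f} PE")
```

The computed distances (-2.19, -1.14, -0.19, +0.38 and +1.60 PE) agree with the tabled values to within about .02PE. The differences come from reading Table XI only to the nearest .05PE.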

(5) With the PE distance of each problem above or below
the mean established, the PE distance of each problem from
the "zero point" of ability in the test may be calculated.
This zero point is located in the following way. Suppose that
5% of the whole group failed to solve a single problem correctly.
This puts the point of zero ability 45% of the distribution below
the mean, or at a point -2.45PE from the mean.¹ The PE
distance of each problem in the scale may now be found from
this arbitrary zero point. To illustrate with the five problems
above:


Problem                                   A       B       C       D       E
PE distance from mean                 -2.20   -1.15    -.20    .375    1.60
PE distance from assumed zero
  (i.e., from -2.45PE)                  .25    1.30    2.25    2.83    4.05

The simplest way to find the PE distances from the given zero
point is to subtract, algebraically, the distance of the zero point
below the mean from the PE distance of each problem from the
mean. Problem A, for example, is -2.20 - (-2.45), or .25PE,
from the zero point; while problem E is 1.60 - (-2.45), or
4.05PE, from the zero point. The PE value of each of the other

1  Note  that  this  point  is  not  a  true  zero  unless  the  problems  range  down  to 
zero  difficulty.  It  serves,  however,  as  a  convenient  reference  point  for  the 
group  for  whom  the  test  is  intended. 



problems  as  measured  from  the  given  zero  point  is  found  in  the 
same  way. 
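The subtraction in step (5) can be sketched the same way. This minimal example takes the PE distances from the mean and the −2.45PE zero point exactly as given in the text; `pe_from_zero` is a hypothetical helper name.

```python
import math

# PE distances of problems A-E from the group mean, as given in the text.
pe_from_mean = {"A": -2.20, "B": -1.15, "C": -0.20, "D": 0.375, "E": 1.60}

# Assumed zero point of ability: 2.45PE below the mean (5% of the group
# solved no problem, so 45% of the distribution lies between it and the mean).
ZERO_POINT = -2.45


def pe_from_zero(distance_from_mean: float, zero: float = ZERO_POINT) -> float:
    """Subtract, algebraically, the zero point's distance from the mean."""
    return distance_from_mean - zero


scale = {name: pe_from_zero(d) for name, d in pe_from_mean.items()}
for name, value in scale.items():
    print(f"{name}: {value:.3f}PE above the zero point")
```

Problem D comes out at 2.825PE, which the text rounds to 2.83.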

When  the  PE  value  from  zero  of  each  of  the  problems  has 
been  determined,  the  difficulty  of  each  problem  with  respect 
to  every  other  problem  as  well  as  to  zero  is  known  and  the 
scale  is  finished. 

It  is  evident,  of  course,  that  a  scale  of  this  sort  will  not 
usually  have  equal  difficulty  intervals  or  "  steps  "  from  easy 
to  hard.  However,  this  fact,  while  inconvenient,  does  not 
necessarily  invalidate  the  usefulness  of  the  scale  as  a  measuring 
instrument. In lieu of a rule, one might measure with a fair degree
of accuracy using a stick on which marks had been set at 2, 3.7,
4.8, etc., inches. Nevertheless, linear measurements are certainly
more easily obtained with a rule, and in like manner scores are
more easily obtained when the scale has equal steps than when
the steps are unequal. For this reason, among others, scale
makers have tried as far as possible to make the steps on their
scales approximately equal. One method of doing this is to
eliminate from the scale, as first constructed, certain "odd"
problems and retain only those which fall at points approximately
the same distance apart. Another plan is to try out
a new set of problems and from among these select problems
which will fill in the gaps in the scale, or to change the wording
or scoring of a problem in such a way as to shift it up or down
on the scale of difficulty.

A good example of the first method of securing equal steps
on the scale is given by the Woody Arithmetic Scales, Series B.¹
These scales represent a selection of certain problems from the
longer Series A (scales constructed by the method outlined
above) and contain problems which are progressively more
difficult by approximately equal steps. The problems in Series
A are not spaced at equal points on a difficulty scale. In
the Addition Scale, for example, problem No. 1 has a difficulty
value of 1.23PE as measured from the arbitrary zero

1  Woody,    Clifford:    Measurements    of   Some  Achievements    in   Arithmetic. 
Teachers  College,  Columbia  University,  1916. 



(at −2.425PE);¹ problem No. 2 has a difficulty value of 1.40PE,
and problem No. 3 a difficulty value of 2.50PE.

TABLE XII

Difficulty Values (PE) of the Problems in the Woody
Arithmetic Scale (Addition), Series A and B

Problem    Series A,    Series B,    PE Differences
  No.      PE Value     PE Value       (Series B)

   1         1.23         1.23
   2         1.40         1.40            .17
   3         2.50         2.50           1.10
   4         2.61
   5         2.83         2.83            .33
   6         3.21
   7         3.26         3.26            .43
   8         3.35
   9         3.63
  10         3.78         3.78            .52
  11         3.92
  12         4.18
  13         4.19         4.19            .41
  14         4.85         4.85            .66
  15         4.97
  16         5.52         5.52            .67
  17         5.59
  18         5.73
  19         5.75         5.75            .23
  20         6.10         6.10            .35
  21         6.44         6.44            .34
  22         6.79         6.79            .35
  23         7.11         7.11            .32
  24         7.43         7.43            .32
  25         7.47
  26         7.61
  27         7.62
  28         7.67
  29         7.71         7.71            .28
  30         7.71
  31         7.97
  32         8.04
  33         8.18         8.18            .47
  34         8.22
  35         8.58
  36         8.67         8.67            .49
  37         8.67
  38         9.19         9.19            .52

1 The arbitrary zero point on the Woody addition scale is 2.425PE below
the median of Grade II.



The  number  and  the  PE  value  of  the  other  problems  in  Series 
A  (Addition)  and  the  problems  which  have  been  selected  from 
this  series  to  make  up  Series  B  are  shown  in  Table  XII.  Each 
problem in Series A, as noted above, is expressed in terms of its
PE distance from the arbitrary zero point, 2.425PE below
the second grade median. The extremely high PE values of the
problems  in  the  upper  half  of  the  scale  result  from  the  fact 
that  the  scale  is  intended  for  the  elementary  grades  from  II 
to  VIII  inclusive,  and  hence  the  more  difficult  problems  fall 
entirely  out  of  the  range  of  second  grade  ability.  Note  that 
except  in  a  very  few  cases,  the  problems  in  Series  B  appear  as 
a  graded  series  from  easy  to  hard  in  which  the  steps  from 
problem  to  problem  are  fairly  well  equalized.  The  score  on  this 
scale  is  simply  the  number  of  problems  solved  correctly — the 
distance  which  one  progresses  up  the  scale — just  as  a  child's 
height  is  so  many  feet  and  inches  on  a  scale  of  height. 
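The claim that the Series B steps are fairly well equalized is easy to check from Table XII. The sketch below copies the Series B column and computes each step; it shows one conspicuous exception (the 1.10PE jump from problem 2 to problem 3) among steps that otherwise run from about .2 to .7PE.

```python
# Series B difficulty values (PE from the arbitrary zero), copied from Table XII.
series_b = [1.23, 1.40, 2.50, 2.83, 3.26, 3.78, 4.19, 4.85, 5.52, 5.75,
            6.10, 6.44, 6.79, 7.11, 7.43, 7.71, 8.18, 8.67, 9.19]

# Step from each problem to the next; rounding to 2 places removes float noise.
steps = [round(b - a, 2) for a, b in zip(series_b, series_b[1:])]

mean_step = sum(steps) / len(steps)
print("steps:", steps)
print(f"mean step {mean_step:.2f}PE, smallest {min(steps)}, largest {max(steps)}")
```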

On a scale which has equal steps, we know that the increase
from, say, point 10 to 12 is the same as the increase from 12 to
14, and twice the increase from 14 to 15. Moreover, we may
say that the child who works 8 problems is as far ahead of the
child who works 4 as the second child is ahead of one who cannot
work a single problem. We must be extremely careful, however, not to
interpret one measure of capacity on such a scale as "so many
times" another measure. Unlike measures of height or
weight  which  are  measured  from  absolute  zeros,  the  measures 
given  by  a  scale  of  performance  are  taken  from  some  arbitrary 
zero  point  selected  by  the  experimenter.  So  while  we  may  say 
that  a  man  72  inches  in  height  is  twice  as  tall  as  a  child  who  is 
only  36  inches  in  height,  we  cannot,  by  analogy,  say  that  a  child 
who  scores  5  on  an  addition  test  has  doubled  his  ability  when 
he  is  able  to  score  10,  unless  the  measures  in  the  test  have  been 
taken  from  the  absolute  zero  point  of  "  just  no  ability  at  all  " 
in  addition. 
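The warning about ratios can be made concrete. In this sketch the 3-point offset between the arbitrary zero and the absolute zero is purely hypothetical (the true offset is exactly what a performance scale does not tell us); the difference between two scores survives the shift, but their ratio does not.

```python
# Two scores measured from the scale's arbitrary zero.
low, high = 5, 10

# Hypothetical: suppose "just no ability at all" lay 3 points below the
# arbitrary zero. The value 3 is invented purely for illustration.
OFFSET = 3
low_abs, high_abs = low + OFFSET, high + OFFSET

print("difference, arbitrary zero:", high - low)          # 5
print("difference, absolute zero: ", high_abs - low_abs)  # 5
print("ratio, arbitrary zero:", high / low)               # 2.0
print("ratio, absolute zero: ", high_abs / low_abs)       # 1.625
```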

The  method  of  constructing  a  scale  outlined  above  may  be 
used  with  any  group,  grade,  or  class.  When  the  scale  is 
designed  for  use  with  more  than  one  group,  e.g.,  for  the  whole 



elementary  school,  an  extension  of  the  method  given  is  often 
used.     In  brief,  this  is  as  follows: 

(1)  The  PE  value  of  each  problem  is  determined  for  each 
grade  separately,  as  shown  above,  by  computing  the  per  cent 
who  pass  each  problem. 

(2)  The  PE  distances  between  the  different  grade  medians 
are  then  computed.  This  is  done  by  finding  the  per  cent  of 
the  pupils  in  each  grade  who  have  scores  larger  than  the  median 
score  of  the  next  grade.  These  per  cents,  when  turned  into 
PE  values  by  means  of  Table  XI,  give  the  PE  distances 
between  adjoining  grade  medians. 

(3) Knowing the PE distances between the grade medians,
we may now convert the PE distance of each problem from
a given grade median into a PE distance from some common
zero point. The different PE values of each problem as
determined for the various grades are averaged to give the
final scale value,¹ that is, the distance from the common zero point.
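The three steps of the multi-grade method can be sketched as follows. This is a simplified illustration under stated assumptions: each grade's scores are taken as normal, the PE unit is treated as the same in every grade (the text does not say how differing grade variabilities are handled), the averaging is unweighted (Woody's weighting, mentioned in the footnote, is not reproduced), and the position of the lowest grade median above the common zero is supplied by hand. All function names and numbers are hypothetical.

```python
from statistics import NormalDist, fmean
from typing import List, Sequence

PE_IN_SIGMA = NormalDist().inv_cdf(0.75)   # 1PE is about .6745 sigma


def pe_above(per_cent_above: float) -> float:
    """PE distance above a grade median of the point exceeded by p% of that grade.

    Assumes the grade's scores are normally distributed.
    """
    return NormalDist().inv_cdf(1 - per_cent_above / 100) / PE_IN_SIGMA


def grade_medians(first_median: float, per_cent_above_next: Sequence[float]) -> List[float]:
    """Step (2): chain the PE gaps between adjoining grade medians.

    per_cent_above_next[i] is the per cent of grade i scoring above the
    median of grade i + 1; first_median places the lowest grade's median
    relative to the common zero (supplied by hand here).
    """
    medians = [first_median]
    for p in per_cent_above_next:
        medians.append(medians[-1] + pe_above(p))
    return medians


def scale_value(per_cent_passing: Sequence[float], medians: Sequence[float]) -> float:
    """Steps (1) and (3): locate the problem in each grade, refer it to the
    common zero, and take the unweighted average over grades."""
    return fmean(m + pe_above(p) for p, m in zip(per_cent_passing, medians))


# Hypothetical figures: lowest grade median 2.0PE above the common zero;
# 25% of that grade exceed the next grade's median.
medians = grade_medians(2.0, [25])
print(medians)
print(scale_value([25, 50], medians))
```

Here 25% of the lower grade exceeding the next median puts that median exactly 1PE higher, and a problem passed by 25% of the lower grade and 50% of the upper grade receives the same value (3.0PE) from both grades.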

A  shorter  method  than  the  one  described  may  also  be  used. 
This  is  to  compute  the  PE  value  of  a  problem  once  for  all  from 
the  per  cent  of  a  large  sample — drawn  from  the  entire  group — 
who pass the problem. This plan is practically identical with
that which we have already described on page 102. It assumes
that the capacity which the scale is designed to measure is
distributed normally throughout the entire group. While probably
not as exact as the more elaborate method, it has the advantage
of simplicity and straightforwardness.

4. The Conversion of Judgments by Relative Position (or
Relative Merit) into σ or PE Positions on a Scale

The  preceding  paragraphs  have  dealt  with  the  construction 
of  performance  scales  built  up  on  the  principle  that  the  per  cent 
passing  (or  failing)  a  given  problem  is  the  best  index  of  the 
difficulty  of  that  problem.     It  sometimes  happens,  however, 

1 A  method  of  weighting  the  PE  values  of  a  problem  in  averaging  the  results 
from  the  different  grades  is  described  by  Woody  in  his  "Measurements  of  Some 
Achievements  in  Arithmetic." 



that the ability to be measured is of such a nature that
performance in it cannot be scored simply as correct or incorrect,
but must be determined by a comparison with other
performances of a like sort. This leads to the construction of product
scales. Handwriting scales, composition scales, and drawing scales
are examples of instruments in which the quality of the product
is measured, rather than its presence or absence in terms of a
per cent or number correct. For example, an individual's
handwriting is rated for merit by comparing it with "standard"
specimens of handwriting the quality of which is known.

Quality scales are constructed on the assumption that
equally often noticed differences (in merit or excellence) are
equal. The first step is to secure a large number of samples
of the thing to be measured, e.g., specimens of handwriting or
composition, ranging from very poor to excellent. The next
step is to have a large number of presumably able judges
arrange these specimens in order of merit, in this way comparing
each specimen with every other one. The number of times
each specimen is ranked above each other one is now reduced
to percentage terms, and this per cent is expressed as a PE
difference between the two specimens. Once these PE differences
have been determined, the specimens selected for the scale may be
expressed as so many PE above some arbitrary zero point. We may take
specimens 8 and 9 on the Hillegas Composition Scale¹ as an
illustration of the method. Hillegas had each of 202 judges
arrange a number of English compositions in order of merit.
An artificial composition was selected as being of zero merit,
and given the value 0 on the scale. Of the 202 judges, 136,
or 67.3%, ranked 9 as better than 8. From Table XI, we
know that a percentage difference of 67.3% indicates a PE
difference of .66PE, and this value, therefore, expresses the
amount by which 9 is better than 8. The value of 8 had
already been found to be 7.72PE above the 0 point on the
scale. Hence 9 is 7.72 + .66, or 8.38PE, above the zero
composition. The values of the compositions on the Hillegas
Scale as measured in PE values from zero, the differences
determined in terms of relative merit, are 0, 1.83, 2.60, 3.69, 4.74,
5.85, 6.75, 7.72, 8.38, 9.37. Note that the steps on this
scale are fairly regular, being approximately 1PE apart.

1 Hillegas, Milo B. A Scale for the Measurement of Quality in English
Composition by Young People. Teachers College, Columbia University, 1912.
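The conversion of judges' preferences into a PE difference can be sketched directly from the normal curve, assuming Python's `statistics.NormalDist` in place of Table XI (the helper name `pe_difference` is illustrative). With 136 of 202 judges, the exact curve gives about .665PE rather than the .66PE read from the table; the difference is a matter of rounding. The second part lists the successive steps between the scale values quoted above.

```python
from statistics import NormalDist

PE_IN_SIGMA = NormalDist().inv_cdf(0.75)   # 1PE is about .6745 sigma


def pe_difference(times_better: int, judges: int) -> float:
    """PE difference implied when `times_better` of `judges` rank X above Y."""
    proportion = times_better / judges
    return NormalDist().inv_cdf(proportion) / PE_IN_SIGMA


step = pe_difference(136, 202)   # specimen 9 over specimen 8
value_9 = 7.72 + step            # specimen 8 lies 7.72PE above the zero composition
print(f"9 - 8 = {step:.3f}PE; specimen 9 at {value_9:.2f}PE")

# Steps between successive compositions, from the scale values in the text.
values = [0, 1.83, 2.60, 3.69, 4.74, 5.85, 6.75, 7.72, 8.38, 9.37]
steps = [round(b - a, 2) for a, b in zip(values, values[1:])]
print("steps:", steps)
```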

5.  The  Scaling  of  Total  Scores  on  a  Test 

Before concluding this brief review of the methods of
constructing scales, we should mention several methods used for
scaling total scores on a test. The distinction between these
methods and those we have outlined is that in the former, instead
of scaling each separate element on the test for difficulty
(except possibly to secure an approximate order of difficulty),
we simply determine the difficulty value attained as a result of
doing correctly a certain number of test elements. In other
words, the score depends on the total number of questions answered
or problems worked, and the difficulty value of individual
problems is not considered as in (3) and (4) above. The three
methods¹ proposed for scaling total scores give, respectively,
(a) a percentile scale, (b) an age scale, and (c) a T-scale.

(a) We have already learned how to locate the percentile
values in a distribution of scores (pages 45-46). In a
percentile scale a child making a certain score (total number correct)
on a test is given a percentile rating of 20, 30, 70, etc., according
to his position in the distribution. The percentile method
assumes that the difference between percentiles of, say, 10 and
20 is the same as the difference between percentiles of 40 and
50: that percentile differences are equal throughout the scale.
There is considerable reason to doubt this assumption of equal
units on the percentile scale, however; and for this reason, while
practically very useful, the percentile scale is not entirely sound
theoretically.
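One reason for the doubt is easily shown. If the trait is normally distributed (an assumption, not something the percentile scale itself guarantees), equal percentile steps correspond to unequal distances on the baseline. The sketch below, using Python's `statistics.NormalDist`, measures the two intervals named in the text in σ units: the step from the 10th to the 20th percentile is roughly .44σ, while the step from the 40th to the 50th is roughly .25σ.

```python
from statistics import NormalDist

std = NormalDist()   # standard normal curve: mean 0, sigma 1


def sigma_gap(lower_percentile: float, upper_percentile: float) -> float:
    """Distance, in sigma units, between two percentiles of a normal curve."""
    return std.inv_cdf(upper_percentile / 100) - std.inv_cdf(lower_percentile / 100)


print(f"P10 to P20: {sigma_gap(10, 20):.3f} sigma")
print(f"P40 to P50: {sigma_gap(40, 50):.3f} sigma")
```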

(b) In the age scale, the mean number of points scored
on the test by unselected 7 year olds is assigned a score of 7, the mean
number of points scored by unselected 9 year olds a score of 9, and

1 See McCall, W. M. How to Experiment in Education, 1923, p. 95ff.



so on for other age groups. Scores which fall between age
groups are evaluated by interpolation. The age scale is widely
used and is easily interpreted. The chief drawback to its use
seems to be the difficulty of getting unselected samples for
determining the norms of the low and high age groups. Many
very young children are not in the schools, while many of the
older ones have, for one reason or another, been eliminated.
As a result, age scales are strictly accurate only within rather
narrow ranges of ability.
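Interpolation between age norms can be sketched as below. The norms are invented for illustration, and straight-line interpolation between adjacent ages is an assumption about what "interpolation" means here. The function refuses to extrapolate beyond the youngest and oldest norms, which is where the text notes the samples are least trustworthy.

```python
from bisect import bisect_right

# Hypothetical norms: mean points scored on the test by unselected children
# of each age. These numbers are invented for illustration only.
AGE_NORMS = [(7, 20.0), (8, 26.0), (9, 31.0), (10, 35.0)]


def age_score(points: float, norms=AGE_NORMS) -> float:
    """Convert a point score to an age score by linear interpolation."""
    ages = [age for age, _ in norms]
    means = [mean for _, mean in norms]
    if not means[0] <= points <= means[-1]:
        raise ValueError("score lies outside the range of the age norms")
    i = bisect_right(means, points) - 1   # last norm at or below the score
    if i == len(means) - 1:               # exactly at the oldest norm
        return float(ages[-1])
    (a0, m0), (a1, m1) = norms[i], norms[i + 1]
    return a0 + (a1 - a0) * (points - m0) / (m1 - m0)


print(age_score(28))   # between the 8 and 9 year norms
```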

(c) Recently McCall has suggested a method of scaling total
scores, the T-scale, which eliminates many of the defects of both
the percentile and the age scale methods. In this method, scores
are based on the σ of the distribution of scores made by
unselected 12 year olds. T-scores range from 0 to 100. The
zero point on the scale is taken at 5σ below the mean and the
100 point at 5σ above the mean. The unit of measure, or one
"T," is .1 of the σ of the distribution of unselected 12 year
olds. The mean T-score, therefore, is 50, and each 10 points
above or below this point represents 1σ of the 12 year old
distribution. In actual practice T-scores will be found to range
generally between 15 and 85. A person who stands at the mean
of 12 year olds on a given test has a T-score of 50; one who
stands 1σ above the mean, a T-score of 60; and one who stands
1σ below the mean of 12 year olds, a T-score of 40.¹
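The relation between raw scores and T-scores stated above can be sketched as follows. `t_score` applies the linear rule given in the text (50 at the 12 year old mean, 10 points per σ) to hypothetical norms of mean 40 and σ 8. McCall's full construction, described in Chapter X of his How to Measure in Education, involves more than this linear rule; `t_from_percentile` shows one normalizing variant, in which the per cent of 12 year olds falling below a score (strictly between 0 and 100) is turned into a σ distance on the normal curve. Treat both functions as illustrations, not as McCall's procedure.

```python
from statistics import NormalDist


def t_score(raw: float, mean_12: float, sigma_12: float) -> float:
    """Linear T-score: 50 at the mean of unselected 12 year olds, 10 points per sigma."""
    return 50 + 10 * (raw - mean_12) / sigma_12


def t_from_percentile(per_cent_below: float) -> float:
    """T-score from the per cent of unselected 12 year olds scoring below a score,
    treating that per cent as an area under the normal curve."""
    return 50 + 10 * NormalDist().inv_cdf(per_cent_below / 100)


# Hypothetical 12 year old norms: mean 40 points, sigma 8 points.
for raw in (32, 40, 48):
    print(raw, t_score(raw, 40, 8))
```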

The  construction  of  the  T-scale  has  been  described  in  great 
detail  by  McCall  in  Chapter  X  of  his  How  to  Measure  in 
Education,  and  in  consequence  only  the  most  important 
advantages  of  the  scale  need  be  considered  here.2  In  the 
first  place,  the  scale  covers  a  wide  range  of  ability  which  may 
be extended if necessary. Secondly, all T-scores are expressed
in terms of the same unit and with respect to the same zero
point, and the units are equal throughout the scale. Accordingly,
scores  from  different  tests  are  directly  comparable  and  may 

1  For  an  example,  see  the  Thorndike-McCall  Reading  Scales,  published  by 
Teachers  College,  Columbia  University. 

2  For  a  complete  discussion  of  the  advantages  of  the  T-Scale  over  the  age 
and  percentile  scales,  see  McCall,  How  to  Experiment  in  Education,  1923,  94ff. 



be combined by simple addition. Finally, a score of a given
size will always have the same meaning when referred to the
mean of unselected 12 year olds, which remains at 50.

V.  The  Transmutation  of  Measures  by  Relative  Position 
(in  Order  of  Merit)  into  Measures  in  Units  of 
Amount 

It is often very desirable, especially in the calculation of
coefficients of correlation, to be able to transmute measures
arranged in order of merit into measures in units of amount,
or "scores," on some linear scale. This can easily be
accomplished by means of tables, provided we can assume
"normality" in the trait for which the ranking has been made.
To take an example, let us suppose that we have 15 salesmen
ranked in order of merit for selling efficiency, the most
efficient ranked No. 1 and the least efficient ranked No. 15. Now
if we are justified in assuming that selling efficiency follows
the normal probability curve, we can, with the aid of Table
XIII, assign to each man a "selling score" on a scale of 10
or 100 points which will very probably represent his capacity as
a salesman much better than a rank of 2, 6, or 14. The problem
may be stated as follows:

Problem  (1) — Given  15  salesmen  ranked  in  order  of  merit 
by  their  sales-manager,  transmute  these  rankings  into  scores 
on  a  scale  of  10  points. 

The procedure is as follows. First, by means of a simple
formula,¹

$$\text{Per cent position} = \frac{100\,(R - .5)}{N} \qquad (12)$$

in  which  R  is  the  rank  of  the  individual  in  the  series,  and  N 
the  number  ranked,  we  determine  the  "  per  cent  position  "  of 
each  man.  Next,  from  Table  XIII  we  read  off  the  score  on  a 
scale  of  10  points.     Thus  Salesman  A  who  ranks  No.  1  (see  the 

1  This  formula  and  the  method  built  around  it  were  devised  by  Professor  Clark 
Hull.  See  Hull,  The  Computation  of  the  Pearson  r  from  Ranked  Data,  Journal 
of  Applied  Psychology,  1922,  6,  385. 



table below) has a per cent position of $\frac{100(1 - .5)}{15}$, or 3.34,
and his score from Table XIII is 8.5 (finer interpolation is
unnecessary). In like manner, Salesman B, who ranks No. 2, has
a per cent position of $\frac{100(2 - .5)}{15}$, or 10, and his score,
accordingly, is 7.5. The scores of the others, found in exactly the
same way, are given in the following table:


Salesmen    Rank    Per cent Position    Score (Scale 10)

   A          1            3.34                8.5
   B          2           10.00                7.5
   C          3           16.67                6.9
   D          4           23.34                6.4
   E          5           30.00                6.0
   F          6           36.67                5.7
   G          7           43.34                5.3
   H          8           50.00                5.0
   I          9           56.67                4.7
   J         10           63.34                4.3
   K         11           70.00                4.0
   L         12           76.67                3.6
   M         13           83.34                3.1
   N         14           90.00                2.5
   O         15           96.67                1.5

On  several  previous  occasions,  it  has  been  pointed  out  that 
the  assumption  of  normality  in  a  trait  or  capacity  implies  that 
differences  at  the  extremes  of  capacity  are  relatively  much 
greater  than  the  same  differences  around  the  average  or  mean. 
This  is  clearly  brought  out  in  the  table  above;  for  while  all 
differences  in  the  order  of  merit  series  equal  1,  the  differences 
between  the  transmuted  scores  vary  considerably,  being 
greatest  at  the  ends  of  the  series,  and  smallest  in  the  middle. 
The  difference  between  A  and  B,  for  example,  or  between 
N  and  O,  is  three  times  as  great  as  the  difference  between  G 
and  H.  Stated  differently,  we  may  say  that  it  is  three  times  as 
easy  to  move  from  H  to  G  (from  8th  to  7th  place)  as  from  B 
to  A  (from  2nd  to  1st  place). 
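The whole transmutation can be sketched in a few lines, assuming Python's `statistics.NormalDist` and assuming that Table XIII is built as the text explains in its note on the table's construction (a normal curve cut off at ±2.5σ, with the 5σ baseline mapped onto the 10-point scale). The function names are illustrative. Computed this way, the scores for the fifteen salesmen agree with the table above.

```python
from statistics import NormalDist

std = NormalDist()
LIMIT = 2.5                                   # curve assumed to end at +/-2.5 sigma
UPPER = std.cdf(LIMIT)                        # area below +2.5 sigma
TRUNCATED_AREA = UPPER - std.cdf(-LIMIT)      # area between -2.5 and +2.5 sigma


def per_cent_position(rank: int, n: int) -> float:
    """Formula (12): per cent position of rank R among N persons ranked."""
    return 100 * (rank - 0.5) / n


def hull_score(rank: int, n: int, scale: int = 10, digits: int = 1) -> float:
    """Transmute a rank into a score, assuming the trait is normally distributed."""
    share_above = per_cent_position(rank, n) / 100
    # Point on the baseline with `share_above` of the truncated curve to its right.
    sigma = std.inv_cdf(UPPER - share_above * TRUNCATED_AREA)
    # The 5-sigma baseline, -2.5 to +2.5, maps linearly onto 0..scale.
    return round(scale * (sigma + LIMIT) / (2 * LIMIT), digits)


if __name__ == "__main__":
    for letter, rank in zip("ABCDEFGHIJKLMNO", range(1, 16)):
        print(letter, rank, round(per_cent_position(rank, 15), 2), hull_score(rank, 15))
```

Passing `scale=100, digits=0` gives scores on a 100-point scale instead.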



TABLE XIII

[From Hull, Journal of Applied Psychology, 1922]

The Transmutation of an Order of Merit into Units of Amount or "Scores."

Let R represent the rank in the Order of Merit, and N the number
ranked. Then from the formula, Per cent position = $\frac{100(R - .5)}{N}$,
find the per cent position, and from it the score.

Example: If N = 25 and R = 3, Per cent position = $\frac{100(3 - .5)}{25}$,
or 10.00, and from the table the score is 7.5.

Per cent   Score      Per cent   Score      Per cent   Score

   .09      9.9        22.32      6.5        83.31      3.1
   .20      9.8        23.88      6.4        84.56      3.0
   .32      9.7        25.48      6.3        85.75      2.9
   .45      9.6        27.15      6.2        86.89      2.8
   .61      9.5        28.86      6.1        87.96      2.7
   .78      9.4        30.61      6.0        88.97      2.6
   .97      9.3        32.42      5.9        89.94      2.5
  1.18      9.2        34.25      5.8        90.83      2.4
  1.42      9.1        36.15      5.7        91.67      2.3
  1.68      9.0        38.06      5.6        92.45      2.2
  1.96      8.9        40.01      5.5        93.19      2.1
  2.28      8.8        41.97      5.4        93.86      2.0
  2.63      8.7        43.97      5.3        94.49      1.9
  3.01      8.6        45.97      5.2        95.08      1.8
  3.43      8.5        47.98      5.1        95.62      1.7
  3.89      8.4        50.00      5.0        96.11      1.6
  4.38      8.3        52.02      4.9        96.57      1.5
  4.92      8.2        54.03      4.8        96.99      1.4
  5.51      8.1        56.03      4.7        97.37      1.3
  6.14      8.0        58.03      4.6        97.72      1.2
  6.81      7.9        59.99      4.5        98.04      1.1
  7.55      7.8        61.94      4.4        98.32      1.0
  8.33      7.7        63.85      4.3        98.58       .9
  9.17      7.6        65.75      4.2        98.82       .8
 10.06      7.5        67.58      4.1        99.03       .7
 11.03      7.4        69.39      4.0        99.22       .6
 12.04      7.3        71.14      3.9        99.39       .5
 13.11      7.2        72.85      3.8        99.55       .4
 14.25      7.1        74.52      3.7        99.68       .3
 15.44      7.0        76.12      3.6        99.80       .2
 16.69      6.9        77.68      3.5        99.91       .1
 18.01      6.8        79.17      3.4       100.00       0
 19.39      6.7        80.61      3.3
 20.83      6.6        81.99      3.2


Another use to which Table XIII may be put is in the
combining of incomplete order of merit rankings. To illustrate
with a problem:

Problem  2 — Given  six  persons,  A,  B,  C,  D,  E,  and  F,  to 
be  ranked  for  honesty  by  three  judges.  Judge  1  knows  all  six 
well  enough  to  rank  them;  Judge  2  knows  only  three  well 
enough  to  rank  them;  and  Judge  3  knows  four  well  enough 
to  rank  them.  Can  we  obtain  a  fair  order  of  merit  for  all 
six  persons  by  combining  these  three  sets  of  rankings,  two  of 
which  are  incomplete? 

We  may  tabulate  the  data  as  follows: 

Persons              A     B     C     D     E     F

Judge 1's ranking    1     2     3     4     5     6
Judge 2's ranking   ..     2    ..     1    ..     3
Judge 3's ranking    2    ..     1    ..     3     4

Now, assuming that honesty is "normally distributed,"
it seems fair that A should get more credit for ranking first in
a list of six than D for ranking first in a list of three, or C for
ranking first in a list of four. In the order of merit rankings,
all three are given the same rank. But when we assign scores
to each person in accordance with his position in the list by
means of formula (12) and Table XIII, A gets 77 for his first
place, D gets 69 for his, and C gets 72 for his (see table below).¹

Persons              A     B     C     D     E     F

Judge 1's ranking    1     2     3     4     5     6
   Score            77    63    54    46    37    23
Judge 2's ranking   ..     2    ..     1    ..     3
   Score            ..    50    ..    69    ..    33
Judge 3's ranking    2    ..     1    ..     3     4
   Score            55    ..    72    ..    43    28

Sum of scores      132   113   126   115    80    84
Average score       66    57    63    58    40    28
Order of Merit       1     4     2     3     5     6

1 It is somewhat doubtful whether it is usually worth the trouble to
transmute orders of merit into scores as shown above and then combine them so as
to get a weighted order (see Garrett, H. E., An Empirical Study of the Various
Methods of Combining Incomplete Order of Merit Ratings. Journal of Educational
Psychology, 1924, XV, pp. 157-171). If it is deemed desirable to weight ratings,
however, the method given will prove useful.



The  other  ratings  are  transmuted  in  the  manner  shown  above. 
All  of  the  scores  are  then  combined  and  averaged  to  give  the 
final  weighted  order  of  merit  as  shown  in  the  table. 
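The combination of incomplete rankings can be sketched similarly. Here the scores are computed from the normal curve (cut at ±2.5σ, as Table XIII assumes) rather than read from the table, so several of them differ by a point or two from those printed above (C's first place among four comes out 73, for example, and F's third place among three, 31); the final order of merit is the same. The dictionary layout for each judge's ranking and the function names are illustrative assumptions.

```python
from statistics import NormalDist, fmean
from typing import Dict, List, Mapping, Sequence, Tuple

std = NormalDist()
UPPER = std.cdf(2.5)                       # curve assumed to end at +/-2.5 sigma
TRUNCATED_AREA = UPPER - std.cdf(-2.5)


def score_100(rank: int, n: int) -> int:
    """Score on a 100-point scale for rank R among N, via formula (12)."""
    share_above = (rank - 0.5) / n         # formula (12), as a fraction
    sigma = std.inv_cdf(UPPER - share_above * TRUNCATED_AREA)
    return round(20 * sigma + 50)          # -2.5..+2.5 sigma maps onto 0..100


def combine(rankings: Sequence[Mapping[str, int]]) -> List[Tuple[str, float]]:
    """Average each person's scores over the judges who ranked him, best first."""
    scores: Dict[str, List[int]] = {}
    for ranking in rankings:
        n = len(ranking)                   # each judge's list has its own N
        for person, rank in ranking.items():
            scores.setdefault(person, []).append(score_100(rank, n))
    averages = {person: fmean(s) for person, s in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


judges = [
    {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6},   # Judge 1
    {"D": 1, "B": 2, "F": 3},                           # Judge 2
    {"C": 1, "A": 2, "E": 3, "F": 4},                   # Judge 3
]
for place, (person, average) in enumerate(combine(judges), start=1):
    print(place, person, round(average, 1))
```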

With formula (12) and Table XIII it is possible to
transmute any set of ranks into scores on the assumption of a
normal distribution in the trait for which the ranking is made.
This is very useful in the case of those traits which are not
easily measured by ordinary methods, but for which individuals
may be arranged in an order of merit, as for example
athletic ability, personality, beauty, etc. It is also valuable
in correlation when a set of ranks is the only available "criterion"
for a given ability while the "independent" tests are
scored in ordinary test units.¹ Transmuted scores may be
combined, or averaged, like other test scores.

A word of explanation may be said in regard to the
construction of Table XIII. This table was derived from a table
of the theoretical frequencies of the normal frequency
distribution in which the curve was taken to end at ±2.5σ. The
baseline of the curve is 5σ, therefore, and may conveniently be
subdivided into 100 parts, each .05σ. The first .05σ from the
upper extreme limit of the curve takes in .09% of the
distribution and is scored 9.9 (or 99 on a scale of 100). The next
.05σ (bringing us to .10σ from the upper end of the curve) brings
the total to .20% of the entire distribution, and is scored 9.8, or 98,
and so on. In each case, the per cent position gives the fractional
part of the normal distribution (as a percentage of the curve lying
within ±2.5σ) which lies to the right of the given σ value
on the baseline. The σ values determine the score.
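The construction just described can be checked by computing the table afresh. In this sketch (Python's `statistics.NormalDist`; the function name is illustrative) a score s on the 10-point scale is placed at (s − 5)/2 σ, and its per cent position is the share of the truncated curve, as a percentage, lying to the right of that point. The computed values agree with most printed entries to within about .01; and because the truncated curve is symmetric, the entries for s and 10 − s always sum to 100, which gives a quick check on the printed figures.

```python
from statistics import NormalDist

std = NormalDist()
LIMIT = 2.5                                  # curve taken to end at +/-2.5 sigma
UPPER = std.cdf(LIMIT)
TRUNCATED_AREA = UPPER - std.cdf(-LIMIT)


def table_xiii_per_cent(score: float) -> float:
    """Per cent position for a score on the 10-point scale.

    A score s sits at (s - 5) / 2 sigma on the 5-sigma baseline; the per cent
    position is the share of the truncated curve lying to the right of it.
    """
    sigma = (score - 5) / 2
    return 100 * (UPPER - std.cdf(sigma)) / TRUNCATED_AREA


if __name__ == "__main__":
    for tenths in range(99, -1, -1):
        s = tenths / 10
        print(f"{table_xiii_per_cent(s):6.2f}  {s:.1f}")
```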

PROBLEMS 

1.  (a)  Plot  both  distributions  given  in  example  (2),  page  56  as 
frequency  polygons  and  histograms.  For  comparative 
purposes  plot  the  frequency  polygon  and  the  histogram  for 
each  distribution  with  respect  to  the  same  coordinate  axes: 
on  the  same  diagram. 
(b)  Calculate  a  measure  of  skewness  for  both  distributions. 

1  The  definition  of  a  criterion  and  its  value  in  determining  the  validity  of 
one  or  more  tests  is  discussed  at  length  in  Chapters  V  and  VI. 



2. Plot distribution A, example (2), page 56, as an ogive. Compare
the percentiles obtained from the graph with the calculated
values.

3. Assuming that trait X is completely determined by 6 factors (all
equal in value, similar, and independent, and each as likely to
be present as absent), plot the distribution which one would
most probably get from the measurement of trait X in an
unselected group of 1000 people.

4. In a random sample of 1000 cases, Average = 14.4, and σ = 2.5.
(a) What per cent of the cases lie between 12 and 16?
(b) What are the chances that any future case will be above 18?
(c) What are the chances that any future case will be below 8?

5. In an approximately normal distribution of 100 cases, Average =
29.74, Q(PE) = 3.18.
(a) What per cent of the cases lie between 24 and 25?
(b) What limits include the middle 60% of the cases?
(c) What limits include the lowest 5% of the cases?

6. In a certain test the 7th grade median is 28, with a Q of 4.8; and
the 8th grade median is 31.6, with a Q of 4.0. What per cent
of the 7th grade is above the median of the 8th grade?

7. A group of 12 year olds, two years ago, had a reading ability
expressed by an average of 40 and a σ of 3.6, and a composition
ability expressed by an average of 62 and a σ of 9.6. Today
the group has gained 12 in reading and 10.8 in composition.
How many times greater is the former than the latter gain?

8. Four problems, 1, 2, 3, and 4, are solved by 50%, 60%, 70%,
and 80%, respectively, of a large group. Compare the
difference in difficulty between 1 and 2 with the difference in
difficulty between 3 and 4.

9. In a college the 10 grades A+, A, A−; B+, B, B−; C+, C, C−;
and D are given. On the assumption that ability in
mathematics is distributed normally, how many men in a group of
500 Freshmen should receive each grade?

10. Five problems are passed by 15%, 34%, 50%, 62%, and 80%
of a large unselected group. If the zero point of ability is
taken at −3σ, what is the σ value of each problem as measured
from this point?



11. In a large group of competent judges, 88% rank composition A
as better than composition B; 65% rank B as better than C.
If C is known to have the PE value of 3.5 as measured from
the zero composition, i.e., the composition of zero merit, what
are the PE values of B and A as measured from this "zero"?

12. Twenty-five men on a football squad are ranked in order of merit
from 1 to 25 for general playing ability by the coach. Assuming
"normality" in the trait "general playing ability," transmute
these ranks into units of amount on a scale of 100 points.

Answers 

4.  (a)  57.04%.     (b)  749  in  10,000.      (c)  52  in  10,000. 

5.  (a)  4.8%.     (6)  25.76  and  33.72.     (c)  21.95  and  the  lower  limit 

of  the  distribution. 

6.  30.65%. 

7.  2 .  96  (approximately  3)  times  as  great. 

8.  Difference  between  1  and  2,  .25<j;  between  3  and  4,  .315a-. 

9.  Grades:         A+    A     A-    B+      B       B-     C+     C     C-    D 
No.  men 

receiving:      3    f  14      40      80      113      113      80      40      14      3 

10.  In  order:  4.04;  3.41;  3.00;  2.69;  2.16. 

11.  B,  4.07PE;  A,  5.82PE. 
12.
     Rank   Score        Rank   Score
       1     89           13     50
       2     80           14     48
       3     75           15     46
       4     71           16     44
       5     68           17     42
       6     65           18     39
       7     63           19     37
       8     61           20     35
       9     58           21     32
      10     56           22     29
      11     54           23     25
      12     52           24     20
                          25     11
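The answers to Exercises 9 and 10 can be checked numerically. The
following is a minimal Python sketch using the standard library's
`statistics.NormalDist` (Python 3.8 or later). It assumes, as the
answer to Exercise 9 implies, that the range of ability is taken as
−3σ to +3σ and cut into ten equal steps of .6σ, with the small areas
beyond ±3σ ignored. The function names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # unit normal curve: mean 0, sigma 1


def grade_counts(n_students=500, n_grades=10, half_range=3.0):
    """Exercise 9: split -3 sigma..+3 sigma into equal steps, top grade first."""
    step = 2 * half_range / n_grades  # .6 sigma per grade
    edges = [half_range - step * i for i in range(n_grades + 1)]
    # Area between successive edges times N; tails beyond +/-3 sigma are dropped.
    return [round(n_students * (Z.cdf(hi) - Z.cdf(lo)))
            for hi, lo in zip(edges, edges[1:])]


def sigma_from_zero(percent_passing, zero_point=-3.0):
    """Exercise 10: sigma distance of a problem above a zero point at -3 sigma."""
    # A problem passed by p% sits at the point exceeded by p% of the group.
    z = Z.inv_cdf(1 - percent_passing / 100)
    return round(z - zero_point, 2)


print(grade_counts())                                     # [3, 14, 40, 80, 113, 113, 80, 40, 14, 3]
print([sigma_from_zero(p) for p in (15, 34, 50, 62, 80)])  # [4.04, 3.41, 3.0, 2.69, 2.16]
```

Dropping the tails beyond ±3σ is what makes the extreme grades come
out as 3 rather than 4 men; including them would change only those
two cells.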

CHAPTER   III 
THE  RELIABILITY  OF  MEASURES 

I.  What  is  Meant  by  the  Reliability  of  a  Measure 

By the "true" measure of an individual's capacity in any trait
(for example, the true measure of his height, reaction time,
or intelligence), we mean the average of an infinite number of
measurements of that capacity made under precisely the same
conditions. Obviously, in actual practice we can never deal with
true measures as thus defined, for usually we must be satisfied
with a single measure, or at best with comparatively few
measures, of the given trait. We can, however, measure the
amount by which an obtained measure "most probably" varies
from its corresponding true measure; and this measure of
"probable divergence" serves as an index of the reliability of
the obtained measure, that is, of how good an approximation it
is to the true measure.

In like manner, the reliability of an obtained measure of a
group is determined by finding the probable divergence of the
obtained measure from the true measure of the group. The true
measure of a group (for example, the true average or the true σ)
is defined as the measure obtained by taking into account all of
the members of the group, and the true measure of difference
between two groups is the difference between their true means
or medians. To show just what is meant by the "true measure"
of a group, let us suppose that we could measure the height of
every 12 year old boy in the United States. If from this frequency
distribution of heights we should calculate a measure of central
tendency and a measure of variability (the average and σ, for
example), this average would be the true average height of
12 year old boys in the United States, and the σ would be the
true measure of scatter around this average. In the same way, if
we could measure the height of every 12 year old girl in the
United States, it would be possible to secure the true average
height of 12 year old girls in this country, and the true
variability around it. Moreover, knowing the true average height
of 12 year old boys and the true average height of 12 year old
girls, it would be a very simple matter to find the true difference
between the average heights of 12 year old boys and 12 year old
girls in the United States.

Unfortunately it is rarely, if ever, possible to measure all of
the individuals in a group or "population," and it is, of course,
impossible to take an infinite number of measures of a given
individual. We must be content, therefore, to deal with "samples"
selected from the total number of possible measures; and, as a
result of slight differences among the samples chosen, obtained
measures of central tendency and variability are often larger or
smaller than their corresponding true measures. Hence, whenever
we have measured an individual or a group, we must ask ourselves
this question: "How reliable a measure of capacity have I secured?
How well does it 'represent' the true measure which I should get
from a very large (infinite) number of measures of this individual,
or from measuring all of the individuals in the population from
which my group is taken?" This question will often lead to a
second: "How many measurements must I make in order to get a
result which shall meet a certain standard of reliability, i.e., show
a probable divergence from the true result which is less than
some given amount?"

The  purpose  of  the  following  sections  is  to  develop  methods 
which  will  enable  us  to  answer  these  questions.  First,  the 
reliability  of  the  mean  and  median  will  be  considered;  then 
the  reliability  of  the  measures  of  variability;  and  finally  the 
reliability  of  the  difference  between  two  measures.1 

1  The  method  of  finding  the  reliability  of  a  coefficient  of  correlation  is  given 
later  on  page  170. 




II.  The  Reliability  of  Measures  of  Central  Tendency 

1.  The  Reliability  of  the  Average  or  Mean 

A.  The Reliability of the Mean in Terms of its Standard Error (σav.)

Perhaps the simplest approach to the study of the reliability of
the average is to examine the factors upon which the reliability
of this measure must depend. Suppose that we wish to find the
average score of college freshmen in the United States on Army
Alpha. To measure the achievement of college freshmen in general
would require, in strict logic, that we test all of the freshmen in
the United States. However, this is a well-nigh impossible task,
and hence we must be satisfied with taking the records of as large
and random a sample of freshmen as we can secure. This means
that we cannot use freshmen from only a single institution or from
only one section of the country, and that we must guard against
selecting only those with low or high scholastic records. The more
successful we are in getting an "unselected" group, the more nearly
representative will this group be of all of the freshmen in the
country. Evidently, therefore, the reliability (the "representativeness")
of an average depends, for one thing, on how impartially we have
selected our sample.

Granted  a  fair  sample,  the  reliability  of  an  average  can  be 
shown  to  depend  upon  two  characteristics  of  the  distribution, 
(1)  the  number  of  cases,  and  (2)  the  variability  or  spread  of 
the  measures  within  the  sample. 

(1) It is clear that the number of cases must influence the
stability of an average, since the addition of even one extra
measure to a series will change the average unless the additional
case happens to coincide with it exactly. Moreover, the addition
of one case to a set of 10 measures will cause a greater change in
the obtained average (written "average(obt.)") than the addition
of one extra case to a set of 1000 measures, as each case counts
for less in the larger group. It has been shown empirically, as
well as theoretically,1 that the reliability of an average(obt.)
increases, not in proportion to the number of measures upon which
it is based, but rather in proportion to the square root of the
number of measures. Thus the average(obt.) of 25 measures of a
variable quantity is not 25 times, but √25 or 5 times, as reliable
as a single measure of the quantity. In like manner, the average
of 36 cases is not 4 times as reliable as the average of 9 cases,
but only twice as reliable, since √36 divided by √9 equals 2.
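The square-root rule can be illustrated with a small simulation.
The Python sketch below draws repeated samples of 9 and of 36 cases
from a hypothetical normal population (mean 0, σ 1; the population,
the number of repetitions, and the seeds are arbitrary choices for
illustration) and compares how widely the sample averages scatter.
Since reliability varies inversely with this scatter, the ratio
should come out near √36/√9 = 2, not 36/9 = 4.

```python
import random
from statistics import fmean, pstdev


def spread_of_averages(n_cases, n_samples=2000, seed=1):
    """Standard deviation of the averages of many random samples of size n_cases."""
    rng = random.Random(seed)
    averages = [fmean(rng.gauss(0.0, 1.0) for _ in range(n_cases))
                for _ in range(n_samples)]
    return pstdev(averages)


small = spread_of_averages(9, seed=1)
large = spread_of_averages(36, seed=2)
# Reliability is inversely related to this spread, so the ratio estimates
# how many times more reliable the 36-case average is than the 9-case one.
ratio = small / large
print(round(ratio, 2))  # close to 2, not 4
```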

(2) In addition to the size of the sample, the reliability of an
average must depend also upon the variability of the separate
measures around the obtained average. If the σ of the distribution
is large, the separate measures tend to scatter widely from the
average, and we are unable to say where those cases in the
population which we have not measured will most probably fall:
whether they will be close to, or far from, the obtained average.
On the other hand, if the σ is small, we may be fairly certain that
unmeasured cases will fall fairly close around the average. For
this reason, the reliability of an obtained average depends upon
the size of its σ: as σ increases, the reliability decreases.

We find, then, that the reliability of an average depends first
upon our having selected a fairly representative sample from the
larger group, or population, which we are studying. When this
condition has been met, and only then, the reliability of an
average can be measured mathematically in terms of its standard
error, that is, in terms of the number of cases and the σ of the
distribution (written σ(dis.)). The formula for the standard error
of an average or mean, written σav., is

$$\sigma_{av.} = \frac{\sigma_{(dis.)}}{\sqrt{N}} \qquad (13)$$

1 Yule: An Introduction to the Theory of Statistics, 1919, p. 257. For results
of experiment, see Fullerton and Cattell: On the Perception of Small Differences,
Publications of the University of Pennsylvania, Philosophical Series 2, 1892.



This is one of the most important, and most often used, of the
reliability formulas. Note that a decrease in σ(dis.), or an increase
in the size of N, will make the standard error numerically smaller.
A decrease in σav. means that the probable divergence of the
obtained average from the true average is just so much less; hence
the reliability of an average(obt.) increases as σav. decreases.

A  problem  will  illustrate  the  value  and  use  of  formula  (13). 

Problem (1): In 1883, the Anthropometric Committee of the
British Association found the average height of 8585 adult males
in the British Isles to be 67.46 inches, with a σ of 2.57 inches.1
How reliable is this average? What is its probable divergence
from the average which would have been secured had all adult
males in the British Isles been measured?

Applying formula (13), the standard error of the mean, σav., is
found to be .0277 inch. This result is interpreted in the following
way. The chances are 6826 in 10,000, or 68 in 100, that the
obtained average of 67.46 inches does not diverge from the true
average by more than ±1σav., i.e., by more than ±.0277 inch.
Stated in another way, the chances are 68 in 100 that the true
average lies within the limits 67.46 + .0277 and 67.46 − .0277, or
between 67.488 and 67.432 inches. We can be practically certain
that the true mean lies within the limits 67.46 ± 3 × .0277
(±3σav.), or between 67.543 and 67.377 inches (see Table X for
σ values).
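The arithmetic of Problem (1) can be reproduced in a few lines. This
is a minimal Python sketch of formula (13); the function name is
illustrative, and the ±1σ and ±3σ limits follow the interpretation
given in the text.

```python
import math


def standard_error_of_mean(sigma_dis, n):
    """Formula (13): sigma_av = sigma_dis / sqrt(N)."""
    return sigma_dis / math.sqrt(n)


mean, sigma_dis, n = 67.46, 2.57, 8585  # Problem (1), heights in inches
se = standard_error_of_mean(sigma_dis, n)
print(round(se, 4))                                      # 0.0277
print(round(mean - se, 3), round(mean + se, 3))          # 67.432 67.488 (68 in 100)
print(round(mean - 3 * se, 3), round(mean + 3 * se, 3))  # 67.377 67.543 (9973 in 10,000)
```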

1 Yule, An Introduction to the Theory of Statistics, 1919, pp. 112 and 141.

Just how the standard error measures the reliability of an average
may be shown most clearly, perhaps, by an illustration. Suppose
that we have measured the heights of 1000 groups of men, each
group containing 8585 men, the groups or samples being chosen
at random from the general population. The 1000 averages
obtained from these groups will tend to differ slightly from one
another owing to so-called errors of sampling (see page 143), and
hence not all samples will represent with equal accuracy the
population from which they have been drawn. Now suppose,
further, that it were possible to secure the average height of the
entire male population of the British Isles. If we should subtract
this true mean from each one of the 1000 obtained means, we
would obviously get 1000 differences, and these 1000 "measures"
(differences) would, according to the best assumption that we can
make, follow the normal probability curve (see page 83). In this
hypothetical distribution of differences we should have relatively
few large plus or minus deviations, and a relatively large number
of small plus, small minus, and zero deviations; in short, the
obtained means would hit close to the true mean more often than
they would miss it.

The average of this distribution of differences would fall (most
probably) at 0; for, other things being equal, this will be the
difference most often obtained (the maximum frequency) in
subtracting the true from the obtained means. The σ of this
distribution is given by the formula σ(dis.)/√N. In other words,
the standard error of the mean measures the spread of the
differences (obtained − true) around 0 as a central tendency; and
for this reason σav. is a measure of the probable divergence of the
obtained average from its corresponding true average.
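The hypothetical experiment can be imitated by simulation. In the
Python sketch below, a normal population with the mean and σ of
Problem (1) is treated as the "true" population; to keep the run
fast, the samples contain 100 cases rather than 8585 (an arbitrary,
hypothetical choice), and the seed is arbitrary. The sketch checks
that the σ of the obtained − true differences is close to
σ(dis.)/√N and that roughly 68% of the differences fall within
±1 standard error.

```python
import math
import random
from statistics import fmean, pstdev

TRUE_MEAN, SIGMA = 67.46, 2.57  # treated as the population values
N, GROUPS = 100, 1000           # hypothetical sample size and number of samples

rng = random.Random(42)
# Each entry is (obtained mean - true mean) for one random sample.
differences = [fmean(rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)) - TRUE_MEAN
               for _ in range(GROUPS)]

se = SIGMA / math.sqrt(N)                          # formula (13): .257 here
observed_spread = pstdev(differences)              # sigma of the differences
within_one_se = sum(abs(d) <= se for d in differences) / GROUPS

print(round(se, 3), round(observed_spread, 3))     # the two should be close
print(within_one_se)                               # should be near .68
```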

These results are represented graphically in Diagram XV, Fig. 1.
The 1000 differences between the 1000 obtained means and the
true mean are shown arranged in a normal frequency distribution
with its mean at 0 and σ equal to .0277. The heights of the
different ordinates represent the frequencies of the various
obtained − true differences; the maximum ordinate, at the mean,
represents the zero difference. Now we know that the range ±1σ,
measured off in the plus and minus directions from the mean,
includes the middle 68.26% of the cases in a normal distribution.
Hence we may say that the chances are 68 in 100 that the
difference between the obtained mean of 67.46 inches and the true
mean will not be greater than ±.0277 inch. Or, as stated above,
there are 68 chances in 100 that the true average lies within the
limits 67.46 + .0277 and 67.46 − .0277, or between 67.488 and
67.432 inches. Furthermore, we can be practically sure that the
true average will fall within the limits ±3σav. from the mean.
Three times ±.0277 is ±.0831; accordingly, there are 9973 chances
in 10,000 (see Table X) that the true average lies within the limits
67.46 ± .0831, or between 67.543 and 67.377 inches.


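[DIAGRAM XV (Figs. 1 to 6). Fig. 1: normal distribution of the
differences (obtained − true) between the sample means and the
true mean, centered at 0, with points marked at ±.0277 (±1σ) and
±.0831 (±3σ). Fig. 2: distribution of the differences between the
German and Danish averages, with 5000 cases above the mean.]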


The average height of our sample of 8585 British males has been
found to be 67.46 inches, with a standard error of .0277 inch. Let
us now proceed to the second question stated on page 119, viz.,
"How many measurements must I make in order to get a result
whose probable divergence from the true result is less than some
given amount?" Suppose, for example, that we wish to secure an
average which is twice as reliable as the average we now have:
how many cases will be required? Assuming that the spread in the
increased group, i.e., σ(dis.), remains approximately the same, all
that we need do in order to cut the standard error in two, and thus
double the reliability, is to place a 2 in the denominator of the
fraction

$$\frac{\sigma_{(dis.)}}{\sqrt{8585}}.$$

But 2√8585 becomes √(4 × 8585) when the 2 is placed under the
radical; accordingly, it is evident that 8585 must be multiplied by 4
in order to make σav. just 1/2 its original size. By analogy, to
double the reliability of any average we must multiply N by 4; to
triple the reliability, by 9; and so on. Assuming substantially the
same σ(dis.), the average obtained from 400 cases is twice as
reliable as the average got from 100, and the average from 900
cases three times as reliable as that from 100 cases.
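The reasoning above amounts to N(new) = N × k² for a k-fold gain in
reliability or, for a chosen standard error, N = (σ(dis.)/target)².
A minimal Python sketch, assuming (as the text does) that σ(dis.)
stays the same in the larger sample; the function names and the
target of .01 inch are illustrative only.

```python
import math


def cases_for_gain(n, k):
    """Cases needed to make an average k times as reliable (sigma_dis unchanged)."""
    return n * k ** 2


def cases_for_target_se(sigma_dis, target_se):
    """Smallest N for which sigma_dis / sqrt(N) does not exceed target_se."""
    return math.ceil((sigma_dis / target_se) ** 2)


print(cases_for_gain(8585, 2))          # 34340: twice as reliable as Problem (1)
print(cases_for_gain(100, 3))           # 900: three times as reliable as 100 cases
print(cases_for_target_se(2.57, 0.01))  # 66049 cases for a sigma_av of .01 inch
```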

B.  The  Reliability  of  the  Mean  in  Terms  of  the  PE  of  the  Average 

In measuring the reliability of an average, the PE of the average,
written PE(av.), may be used instead of σav. The PE(av.) is
interpreted in exactly the same way as σav. Its formula is derived
simply by multiplying formula (13) by .6745 (see page 121):

$$PE_{(av.)} = \frac{.6745\,\sigma_{(dis.)}}{\sqrt{N}} \qquad (14)$$

Applying this formula to our problem of heights, PE(av.) is found
to be .0187 inch. The chances are even, therefore, that the obtained
average of 67.46 inches does not differ from the true average by
more than ±.0187 inch. Moreover, since ±4PE includes practically
all of the cases in a normal distribution, we may be practically
certain (the chances are 99 in 100) that the true average lies within
the limits 67.46 ± 4 × .0187, or between 67.39 and 67.53 inches
(see Table XI for PE values).
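A companion sketch for formula (14), in Python. The constant .6745
is the σ distance of the quartile points of the normal curve, which
the standard library's `NormalDist().inv_cdf(0.75)` reproduces; the
function name is illustrative.

```python
import math
from statistics import NormalDist


def probable_error_of_mean(sigma_dis, n):
    """Formula (14): PE_av = .6745 * sigma_dis / sqrt(N)."""
    return 0.6745 * sigma_dis / math.sqrt(n)


print(round(NormalDist().inv_cdf(0.75), 4))  # 0.6745: half the cases lie within +/-1 PE

pe = probable_error_of_mean(2.57, 8585)      # Problem (1)
print(round(pe, 4))                                        # 0.0187
print(round(67.46 - 4 * pe, 2), round(67.46 + 4 * pe, 2))  # 67.39 67.53
```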

A comparison of the extreme limits within which we may be
practically sure that the true average will lie shows that the values
of these limits differ slightly when ±4PE instead of ±3σ are taken
as limiting points [see Problem (1) above]. This discrepancy is due
to the fact that ±3σ takes in 9973 of the 10,000 cases in the normal
distribution, while ±4PE takes in but 9930 cases (see Tables X and
XI). The σ limits, therefore, contain 43 more cases than the PE
limits; and while 43 cases in 10,000 may seem an insignificant
number (and is insignificant if taken from the middle of the
distribution), even so few cases as this have considerable
importance at the extremes of the distribution. This may be seen
in the fact that we must take ±4.45PE in order to have our PE
limits correspond exactly to ±3σ, since these limits include 9974
cases in 10,000.

It is customary, however, in measuring reliability to use ±4PE
instead of ±4.45PE as limits of practical certainty. In the first
place, ±4PE marks off limits within which the chances are very
great (9930 in 10,000) that the true average will fall. Furthermore,
the slight increase in reliability got by using ±4.45PE instead of
±4PE is not usually sufficient to offset the greater convenience of
the latter figure.
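The case counts compared above can be recomputed from the normal
curve itself rather than read from Tables X and XI. A minimal Python
sketch, using `statistics.NormalDist` as a stand-in for the tables
and taking 1 PE = .6745σ; the helper name is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()
PE_IN_SIGMA = 0.6745  # one PE expressed in sigma units


def cases_within(k_sigma):
    """Cases in 10,000 lying within +/-k_sigma of the mean of a normal curve."""
    return round(10000 * (2 * Z.cdf(k_sigma) - 1))


three_sigma = cases_within(3)
four_pe = cases_within(4 * PE_IN_SIGMA)
print(three_sigma, four_pe, three_sigma - four_pe)  # 9973 9930 43
print(round(3 / PE_IN_SIGMA, 2))                    # 4.45: the PE multiple matching 3 sigma
```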

2.  The  Reliability  of  the  Median 

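The formulas for measuring the reliability of an obtained median
are easily derived from those for measuring the reliability of the
mean. The σ(mdn.) and PE(mdn.) are 1.25331, or roughly 5/4, times
the σav. and PE(av.) respectively:

$$\sigma_{(mdn.)} = \frac{5}{4}\cdot\frac{\sigma_{(dis.)}}{\sqrt{N}} \qquad (15)$$

$$PE_{(mdn.)} = \frac{5}{4}\cdot\frac{.6745\,\sigma_{(dis.)}}{\sqrt{N}} \approx \frac{.8454\,\sigma_{(dis.)}}{\sqrt{N}} \qquad (16)$$

or

$$PE_{(mdn.)} = \frac{5}{4}\cdot\frac{Q}{\sqrt{N}} \qquad (16a)^{1}$$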

Formulas (15), (16), and (16a) are all used and interpreted in the
same way as the reliability formulas for the average or mean. A
problem will serve to show how the reliability of the median is
found.

1 This formula should be used when Q and not σ is given.

Problem (2): Measurement of 801 12 year old boys on the Trabue
Language Scale A¹ gave the following results: Median = 21.4;
Q = 4.9. What is the reliability of this median? How close is it to
the true median score of 12 year old boys?

From formula (16a), PE(mdn.) is found to be .2164. The chances are
50 in 100, therefore, that the true median does not differ from 21.4
by more than ±.2164. We may be practically certain that the true
median lies within the limits 21.4 ± 4 × .2164, or between 22.27
and 20.53.
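Problem (2) as a short Python sketch of formula (16a). The function
name is illustrative. The last line shows where the factor 1.25331
comes from: it is √(π/2), the ratio of the standard error of the
median to that of the mean in a normal distribution, which the text
rounds to 5/4.

```python
import math


def pe_median_from_q(q, n):
    """Formula (16a): PE_mdn = (5/4) * Q / sqrt(N)."""
    return 1.25 * q / math.sqrt(n)


pe = pe_median_from_q(4.9, 801)                          # Problem (2)
print(round(pe, 4))                                      # 0.2164
print(round(21.4 - 4 * pe, 2), round(21.4 + 4 * pe, 2))  # 20.53 22.27
print(round(math.sqrt(math.pi / 2), 5))                  # 1.25331
```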

Since σ(mdn.) and PE(mdn.) are both larger (approximately 1.25
times) than the corresponding measures of reliability of the
average(obt.), it is clear that the obtained average is always more
reliable than the obtained median of the same group. For this
reason the average is used whenever the highest reliability is
sought (see page 50).

III.  The  Reliability  of  Measures  of  Variability 
1.  The Standard Deviation, or σ

We have seen that the reliability of an obtained average or
obtained median is found by determining the probable divergence
of the obtained from the true measure. In the same way, the
reliability of an obtained σ or an obtained Q is measured by the
probable divergence of this measure from the true σ or the true Q,
viz., the σ or the Q which we should get from all possible measures
of the trait in question. The formula for finding the reliability of
an obtained σ is

$$\sigma_{\sigma} = \frac{\sigma_{(dis.)}}{\sqrt{2N}} \qquad (17)$$

In Problem (1), page 122, we found that for 8585 adult British
males the obtained σ (the σ taken around the average(obt.) of
67.46 inches) was 2.57 inches. The question may well be asked:
how reliable is this σ? How well does it represent the true σ which
we should get if deviations could be taken from the true average?
Substituting for σ(dis.) and N in formula (17), the value of σσ is
found to be .0196 inch. This means that the chances are 68 in 100
that 2.57 inches does not differ from the true σ by more than
±.0196 inch, and that the chances are 997 in 1000 that the σ(dis.)
does not differ from the true σ by more than 3 × ±.0196, or ±.0588
inch. We can be practically certain, then, that the true σ lies
within the limits 2.57 ± .0588, or between 2.63 and 2.51 inches.

1 Trabue, M. R., Completion Test Language Scales, 1916, p. 15.
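The same calculation for formula (17), as a minimal Python sketch
with an illustrative function name, using the σ of Problem (1).

```python
import math


def se_of_sigma(sigma_dis, n):
    """Formula (17): sigma_sigma = sigma_dis / sqrt(2N)."""
    return sigma_dis / math.sqrt(2 * n)


s = 2.57
se = se_of_sigma(s, 8585)                          # Problem (1) again
print(round(se, 4))                                # 0.0196
print(round(s - 3 * se, 2), round(s + 3 * se, 2))  # 2.51 2.63
```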

2.  The  Quartile  Deviation,  or  Q 

The reliability of the Q of a distribution is found from the
formula

$$\sigma_{Q} = \frac{.787\,\sigma_{(dis.)}}{\sqrt{N}} \qquad (18)$$

or, in terms of Q,

$$\sigma_{Q} = \frac{1.165\,Q}{\sqrt{N}} \qquad (18a)$$

The 801 12 year old boys who took the Trabue Completion Test,
Scale A (see page 127), had a median score of 21.4 points with a
Q of 4.9 points. What is the reliability of this Q? From formula
(18a), σQ is found to be .202. The chances are 68 in 100, therefore,
that 4.9, the obtained Q, does not differ from the true Q by more
than ±.202 point. And the chances are 9973 in 10,000 that the true
Q lies within the limits 4.9 ± 3 × .202, or between 5.5 and 4.3
points.
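Formula (18a) applied to the Trabue data, as a minimal Python
sketch; the constant 1.165 is taken from the formula above and the
function name is illustrative.

```python
import math


def se_of_q(q, n):
    """Formula (18a): sigma_Q = 1.165 * Q / sqrt(N)."""
    return 1.165 * q / math.sqrt(n)


q = 4.9
se = se_of_q(q, 801)                               # Trabue data, Problem (2)
print(round(se, 3))                                # 0.202
print(round(q - 3 * se, 1), round(q + 3 * se, 1))  # 4.3 5.5
```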

IV.  The Reliability of the Difference between Two Measures

1.  The  Reliability  of  the  Difference  between  Two  Averages 
A.  The Reliability of the Difference in Terms of σ(diff.)
Suppose that we wish to find whether there is any difference in
the performance of 10 year old boys and 10 year old girls on a
certain general intelligence test. The usual method of attacking
this problem is to select as large and as random a sample of
10 year old boys and 10 year old girls as possible, give them our
test, compute the average scores, and find the difference between
the two averages. If this difference is, let us say, several points in
favor of the girls, such a result would be evidence (on the face of
it) for believing that the average girl is better than the average
boy. Before drawing this conclusion definitely, however, we should
know how reliable the obtained difference is: what its probable
divergence is from the true difference which we should get if we
could subtract the true average of the boys from the true average
of the girls.1 Without this knowledge, we cannot rule out the
possibility that other groups of boys and girls, selected in the
same way as ours, would wipe out or even reverse the difference
found. One formula for calculating the reliability of an obtained
difference is

$$\sigma_{(diff.)} = \sqrt{\sigma^{2}_{av.1} + \sigma^{2}_{av.2}} \qquad (19)$$

in which σav.1 is the standard error of the first obtained average,
σav.2 is the standard error of the second obtained average, and
σ(diff.) is the standard error of the difference between the two
averages. Thus, to find the reliability of the difference between
two averages, we must first know the reliability of the averages
themselves.

Let  us  illustrate  the  use  and  value  of  formula  (19)  by  means 
of  a  problem. 

Problem (3): In a study of the intelligence of the foreign born
white draft during the Great War, a sample of 308 native born
Germans and a sample of 325 native born Danes were found to
test as follows on the "combined scale":2

    Country of Birth    No. of Cases    Average Score    σ(dis.)
    Germany                 308             13.88          2.43
    Denmark                 325             13.69          2.23

1  Simpler  methods  of  studying  the  significance  of  the  difference  between  two 
averages  are  given  in  Chapter  I,  p.  40. 

2  The  combined  scale  was  made  up  of  the  8  Alpha  tests,  the  Stanford-Binet, 
and  tests  4,  5,  6,  and  7  of  Beta.     The  maximum  score  was  25. 



The difference between the two obtained averages is seen to be
.19 in favor of the Germans. Is this a reliable difference? Would
further testing of other groups of Germans and Danes give
approximately the same difference, or is it probable that the
difference would be reduced to zero, or even reversed in favor of
the Danes? Stated more exactly, what is the probable divergence
of this difference from the true difference between Germans and
Danes? To answer these questions, we must find the reliability of
the averages of the Germans and the Danes, and from these the
reliability of the difference between the averages.

By formula (13) the standard errors of the two averages are:

For Germans:

$$\sigma_{av.} = \frac{2.43}{\sqrt{308}}, \text{ or } .1385.$$

For Danes:

$$\sigma_{av.} = \frac{2.23}{\sqrt{325}}, \text{ or } .1237.$$

Substituting these values in formula (19), we have

$$\sigma_{(diff.)} = \sqrt{(.1385)^{2} + (.1237)^{2}} = .1857.$$

The actual difference between the two averages is therefore .19,
and the standard error of this difference, σ(diff.), is .1857.
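The two steps of Problem (3) can be chained in a short Python
sketch. Formula (19), as written, combines the standard errors as if
the two averages were uncorrelated; that is assumed here, as befits
two separately drawn samples. The function names are illustrative.

```python
import math


def se_of_mean(sigma_dis, n):
    """Formula (13)."""
    return sigma_dis / math.sqrt(n)


def se_of_difference(se1, se2):
    """Formula (19), for two uncorrelated averages."""
    return math.hypot(se1, se2)  # sqrt(se1**2 + se2**2)


se_german = se_of_mean(2.43, 308)
se_dane = se_of_mean(2.23, 325)
se_diff = se_of_difference(se_german, se_dane)
print(round(se_german, 4), round(se_dane, 4), round(se_diff, 4))  # 0.1385 0.1237 0.1857
```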

An obtained difference is interpreted in terms of its standard
error in exactly the same way as an obtained average is interpreted
in terms of its standard error. Thus we may say that the chances
are 68 in 100 that the obtained difference of .19 does not diverge
from the true difference by more than ±.1857, and that the chances
are 99 in 100 that .19 does not differ from the true difference by
more than 3 × ±.1857, that is, by more than ±.56 (see Table X).

To sum up our findings so far, we may be almost certain that the
true difference between the averages of the Germans and Danes lies
within the limits .19 ± .56, or between −.37 and +.75. Note that the
lower limit of this range is negative, and in consequence there is at
least some chance that the true difference is less than zero, that
is, that the average of the Danes will sometimes actually be higher
than that of the Germans. In spite of the obtained difference in
favor of the Germans, we cannot be 100% sure that the true
difference between the average German and the average Dane is
greater than zero.

Just what, then, it may be asked, are the chances of a true
difference greater than zero between Germans and Danes? Before
answering this question, let us digress for the moment to consider
the following hypothetical situation.1 Suppose that we could secure
the averages on the combined scale of 1000 groups of native born
Germans and 1000 groups of native born Danes, the samples being
selected at random from the general population of native born
Germans and Danes and being roughly of the same size as the
samples we have. Suppose further that these groups could be
paired off so that we should have 1000 differences between the
obtained averages of Germans and Danes, these hypothetical
differences corresponding to the actually obtained difference of
.19. Now, according to the best assumption that we can make, this
distribution of differences would follow the normal probability
curve; the lower limit of the distribution would be at −.37, the
upper limit at .75, and the mean at .19, as shown in Diagram XV,
Fig. 2. The mean is taken at .19 because this is the difference
actually obtained, and hence may fairly be taken as the most
probable. Again, the chances are even that any other obtained
difference will be greater or less than .19; accordingly, the logical
place for this difference would seem to be at the mean. The σ of
this distribution of differences is .1857, the σ(diff.).

1 The argument here, which differs somewhat from that on page 123, is
believed to be better adapted to the present illustration than the other. The
two are essentially the same, however.

Now, to determine the chances that the true difference between
Germans and Danes is greater than zero, we divide .19, which is the
distance of the mean difference from the zero difference, by .1857,
the σ of the difference-distribution. This tells us how far the zero
difference is below the mean in σ terms. The quotient .19/.1857 is
1.02σ, and from Table X we find that in the normal curve 3461 cases
in 10,000 lie between the mean and 1.02σ. Adding in the 5000 cases
above the mean (see Diagram XV, Fig. 2) and translating cases into
"chances," it is clear that the chances are 8461 in 10,000 that the
true difference between the averages of Germans and Danes is
greater than zero. We may expect, therefore, when we compare
groups of Germans and Danes on the combined scale, that 84 times
in 100, or roughly 4 times in 5, the difference between the average
scores will be in favor of the Germans. This answers the question
put on page 131: "What are the chances of a true difference greater
than zero between the Germans and Danes?"

The obtained difference of .19 is sufficiently large to insure considerably more than an even chance of a true difference between Germans and Danes. It is not large enough, however, to guarantee that the Germans will always score higher, on the average, than the Danes. The further question arises, therefore: how much difference would be required to insure absolute reliability, that is, to guarantee that the Germans will always lead the Danes? This question is easily answered with the help of Fig. 2. If the point −3σ below the mean (the point taken at −.37) were the zero-difference point, the whole curve of differences would lie to the right of that point, and we should be practically certain of a true difference always greater than zero. To accomplish this, however, i.e., to shift the zero-difference point down to −.37, the mean difference would have to be .37 + .19, or .56. This new difference (D) divided by σ(diff.) would equal

$$\frac{D}{\sigma_{\text{diff.}}} = \frac{.56}{.1857} \approx 3,$$

or 3σ, and the chances would then be 9986.5 in 10,000 that the true difference between Germans and Danes on the combined scale will always be greater than zero.

We may summarize the preceding paragraphs as follows. The obtained difference between the averages of the Germans and Danes on the combined scale is found to be .19, or approximately 1/3 of what it should be (.56) to insure a completely reliable difference. The obtained difference is large enough, however, to guarantee that 4 times in 5 the average score of the native born Germans will be higher than the average score of the native born Danes.1
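The Germans-and-Danes arithmetic above can be checked with a short Python sketch. It assumes the standard library's `statistics.NormalDist` in place of Table X, and the function names are illustrative, not the author's. Because it works from the unrounded ratio (about 1.023 rather than 1.02), it gives roughly 8469 rather than 8461 chances in 10,000; the conclusion is unchanged.

```python
from statistics import NormalDist

def chances_true_difference_positive(d, sigma_diff):
    """Proportion of the normal 'distribution of differences' (mean d,
    standard deviation sigma_diff) that lies above zero."""
    return NormalDist().cdf(d / sigma_diff)

def difference_for_complete_reliability(sigma_diff, criterion=3.0):
    """Difference needed for D / sigma_diff to reach the 3-sigma criterion."""
    return criterion * sigma_diff

d, sigma_diff = 0.19, 0.1857          # Germans vs. Danes, combined scale
p = chances_true_difference_positive(d, sigma_diff)
needed = difference_for_complete_reliability(sigma_diff)

print(f"D/sigma(diff.) = {d / sigma_diff:.2f}")
print(f"chances in 10,000 = {p * 10000:.0f}")
print(f"difference needed = {needed:.2f}; obtained is {d / needed:.0%} of it")
```

The same function with a ratio of 3 (that is, `NormalDist().cdf(3)`) reproduces the 9986.5 in 10,000 quoted for the 3σ criterion.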

Once we understand what the σ(diff.) formula means, the reliability of an obtained difference in terms of "chances that the obtained difference represents a true difference greater than zero" may be conveniently read from Table XIV. For example, when D = .19 and σ(diff.) = .1857, so that D/σ(diff.) = 1.02, we find at once from the table that the chances are 84 in 100 that the true difference is greater than zero. Moreover, since a D/σ(diff.) of 3 means practically complete reliability, we know that a D/σ(diff.) of 1.02 is 1.02/3, or about 34%, of what it should be in order to insure a difference always greater than zero.

It is customary to take a D/σ(diff.) of 3 as indicative of complete reliability, since −3σ includes practically all of the cases in the "distribution of differences" below the mean (see Diagram XV, Fig. 2). A D/σ(diff.) greater than 3 is to be taken as indicating just so much added reliability.
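Table XIV is simply the area of the normal curve below D/σ(diff.), expressed as chances in 100. The sketch below regenerates its rows with Python's `statistics.NormalDist`, using the same ratio steps as the printed table. It prints two decimals instead of rounding to whole chances, so a reader can see that a few rows sit near a rounding boundary (for example .40 gives 65.54 and 1.60 gives 94.52, where the printed table shows 65 and 94). The helper names are illustrative.

```python
from statistics import NormalDist

def chances_in_100(ratio):
    """Chances in 100 that the true difference exceeds zero, given D/sigma(diff.)."""
    return 100 * NormalDist().cdf(ratio)

def table_xiv_ratios():
    """Ratios as printed: steps of .05 up to 1.50, then steps of .10 up to 3.00."""
    fine = [round(0.05 * i, 2) for i in range(0, 31)]        # .00 .. 1.50
    coarse = [round(1.5 + 0.1 * i, 2) for i in range(1, 16)]  # 1.60 .. 3.00
    return fine + coarse

for ratio in table_xiv_ratios():
    print(f"{ratio:4.2f}  {chances_in_100(ratio):6.2f}")
```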

B. The Reliability of the Difference in Terms of the PE(diff.)

The reliability of the difference between two obtained means may be measured by the PE(diff.) as well as by the σ(diff.). The formula for PE(diff.) is

$$PE_{\text{diff.}} = \sqrt{PE^2_{\text{av.}1} + PE^2_{\text{av.}2}} \qquad (20)$$

in which PE(av.1) and PE(av.2) are the PE's of the two given obtained averages. Formula (20) is interpreted in exactly the same manner as formula (19); a problem will illustrate its use.

1 Assuming that the samples used represent adequately (at least as adequately as the present samples) the population of native born Germans and Danes.

TABLE XIV

To Find the Chances of a True Difference Greater than Zero, Given the Actual Difference between the Two Obtained Measures, and the σ(diff.)

For example: D/σ(diff.) = 1.30 means that the chances are 90 in 100 that the true difference (the difference between the true measures) is greater than zero.

Note: The "chances in 100" increase so slowly after 1.50 that the D/σ(diff.) column increases thereafter by .10 instead of by .05.

D/σ(diff.)   Chances in 100      D/σ(diff.)   Chances in 100
   .00            50                1.15           87
   .05            52                1.20           88
   .10            54                1.25           89
   .15            56                1.30           90
   .20            58                1.35           91
   .25            60                1.40           92
   .30            62                1.45           93
   .35            64                1.50           93
   .40            65                1.60           94
   .45            67                1.70           96
   .50            69                1.80           96
   .55            71                1.90           97
   .60            73                2.00           98
   .65            74                2.10           98
   .70            76                2.20           99 (98.6)
   .75            77                2.30           99 (98.9)
   .80            79                2.40           99 (99.2)
   .85            80                2.50           99 (99.4)
   .90            82                2.60           99 (99.5)
   .95            83                2.70          100 (99.7)
  1.00            84                2.80          100 (99.74)
  1.05            85                2.90          100 (99.8)
  1.10            86                3.00          100 (99.9)

Problem (4): On the two halves of the Woodworth-Wells Substitution Test,1 timed separately, 200 Barnard Freshmen made the following records:

               Average (secs.)    σ(dis.)
First half          65.51          11.13
Second half         60.32          12.04

Is this gain in time from the first to the second half of the test sufficiently large to indicate a true difference in the time required to learn the key after practice, or would further testing with other groups probably reduce, or even reverse, the gain?

1 Carothers, F. E., Psychological Examination of College Students, Archives of Psychology, 46, 1921, p. 36.

TABLE XV

To Find the Chances of a True Difference Greater than Zero, Given the Actual Difference between the Two Measures and the PE(diff.)

For example: D/PE(diff.) = 1.10 means that there are 77 chances in 100 that the true difference (the difference between the true measures) is greater than zero.

Note: The "chances in 100" increase so slowly after 2.00 that the D/PE(diff.) column increases thereafter by .10 instead of by .05.

D/PE(diff.)  Chances in 100      D/PE(diff.)  Chances in 100
   .00            50                1.55           85
   .05            51                1.60           86
   .10            53                1.65           87
   .15            54                1.70           87
   .20            55                1.75           88
   .25            57                1.80           89
   .30            58                1.85           89
   .35            59                1.90           90
   .40            61                1.95           91
   .45            62                2.00           91
   .50            63                2.10           92
   .55            64                2.20           93
   .60            66                2.30           94
   .65            67                2.40           95
   .70            68                2.50           95
   .75            69                2.60           96
   .80            71                2.70           97 (96.6)
   .85            72                2.80           97
   .90            73                2.90           97 (97.5)
   .95            74                3.00           98 (97.9)
  1.00            75                3.10           98
  1.05            76                3.20           98 (98.5)
  1.10            77                3.30           99 (98.7)
  1.15            78                3.40           99 (98.9)
  1.20            79                3.50           99
  1.25            80                3.60           99
  1.30            81                3.70           99
  1.35            82                3.80           99 (99.5)
  1.40            83                3.90          100 (99.6)
  1.45            84                4.00          100 (99.7)
  1.50            84



First, to find the probable errors of the two averages:

First half:

$$PE_{\text{av.}1} = \frac{.6745 \times 11.13}{\sqrt{200}} = .5310 \qquad \text{by formula (14)}$$

Second half:

$$PE_{\text{av.}2} = \frac{.6745 \times 12.04}{\sqrt{200}} = .5743 \qquad \text{by formula (14)}$$

Substituting PE(av.1) and PE(av.2) in formula (20), we have

$$PE_{\text{diff.}} = \sqrt{(.5310)^2 + (.5743)^2} = .7822$$

The obtained difference, D, is 5.19 and the PE(diff.) is .7822. Therefore, D/PE(diff.) is 6.64, and since we find from Table XV (to be read exactly like Table XIV) that a D/PE(diff.) of 4 indicates complete reliability, it follows that our obtained difference is not only completely reliable, but is 2.64 PE (6.64 − 4.00), or about 66%, larger than it need be in order to insure a true difference greater than zero.

Just as it is customary to take a D/σ(diff.) of 3 as indicative of complete reliability, so a D/PE(diff.) must be at least 4 in order to insure complete reliability.
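A minimal Python sketch of the PE route for Problem (4). It assumes the factor .6745 that links one probable error to one standard deviation throughout this chapter, and uses `statistics.NormalDist` in place of Table XV by first converting D/PE(diff.) into σ units. Computed without intermediate rounding, PE(av.1) comes out .5308 rather than the printed .5310, which leaves the ratio of 6.64 unchanged. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

PE_FACTOR = 0.6745  # one probable error = .6745 standard deviations

def pe_mean(sigma_dis, n):
    """Probable error of a mean, formula (14)."""
    return PE_FACTOR * sigma_dis / sqrt(n)

def pe_difference(pe1, pe2):
    """Probable error of a difference between uncorrelated measures, formula (20)."""
    return sqrt(pe1 ** 2 + pe2 ** 2)

def chances_from_pe_ratio(ratio):
    """Proportion above zero, after re-expressing D/PE(diff.) in sigma units."""
    return NormalDist().cdf(PE_FACTOR * ratio)

pe1 = pe_mean(11.13, 200)   # first half
pe2 = pe_mean(12.04, 200)   # second half
pe_diff = pe_difference(pe1, pe2)
d = 65.51 - 60.32
ratio = d / pe_diff

print(f"PE(av.1)={pe1:.4f}  PE(av.2)={pe2:.4f}  PE(diff.)={pe_diff:.4f}")
print(f"D/PE(diff.) = {ratio:.2f}; excess over the 4 PE criterion = {ratio - 4:.2f} PE")
```

`chances_from_pe_ratio` also reproduces Table XV: a ratio of 1.10 gives about 77 chances in 100, and a ratio of 4.00 about 99.7.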


2. The Reliability of the Difference between Two Medians

The two formulas (19) and (20), used in finding the reliability of the difference between two means, may be used also for finding the reliability of the difference between two medians when written:

$$\sigma_{\text{diff.}} = \sqrt{\sigma^2_{\text{mdn.}1} + \sigma^2_{\text{mdn.}2}} \qquad (21)$$

and

$$PE_{\text{diff.}} = \sqrt{PE^2_{\text{mdn.}1} + PE^2_{\text{mdn.}2}} \qquad (22)$$



We may illustrate these formulas by a problem:

Problem (5): The following results were obtained from a group of 12 year old boys and a group of 12 year old girls (Grades III to VIII inclusive) on the Trabue Language Scale A.1

          N     Median     Q
Boys     801    21.40     4.9
Girls    448    22.80     5.3

The actual difference between the two medians is 1.4 points in favor of the girls. Assuming that the two groups are fairly unselected, is this difference sufficiently large to insure a true difference greater than zero in favor of the girls?

Since the measure of variability given is the Q, we shall use the formula for PE(diff.). First, to find the reliability of the two medians:

For girls: $PE_{\text{mdn.}} = \frac{5}{4} \cdot \frac{5.3}{\sqrt{448}} = .3130$ (by formula (16a))

For boys: $PE_{\text{mdn.}} = \frac{5}{4} \cdot \frac{4.9}{\sqrt{801}} = .2164$ (by formula (16a))

Substituting in (22), we have

$$PE_{\text{diff.}} = \sqrt{(.3130)^2 + (.2164)^2} = .3805$$

The obtained difference is 1.4 and the PE(diff.) is .3805. Therefore, D/PE(diff.) is 3.68, and from Table XV we find that the chances are 99.3 in 100 that there is a difference greater than zero between the true median scores of 12 year old boys and 12 year old girls. The obtained D/PE(diff.) of 3.68 is about 92% (3.68/4.00) of what it should be conventionally in order to guarantee complete reliability. However, it is sufficiently high to be taken, for all practical purposes, as completely reliable.

1 Completion-Test Language Scales, 1916, p. 15.
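The median computation in Problem (5) can be sketched the same way. Here formula (16a) is taken as PE(mdn.) = (5/4)·Q/√N, the form that reproduces the .3130 and .2164 worked out above; the .6745 conversion to σ units and `statistics.NormalDist` stand in for Table XV. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

PE_FACTOR = 0.6745  # one probable error = .6745 standard deviations

def pe_median_from_q(q, n):
    """Probable error of a median from Q, formula (16a): (5/4) * Q / sqrt(N)."""
    return 1.25 * q / sqrt(n)

def pe_difference(pe1, pe2):
    """Formula (22): probable error of the difference between two medians."""
    return sqrt(pe1 ** 2 + pe2 ** 2)

pe_girls = pe_median_from_q(5.3, 448)
pe_boys = pe_median_from_q(4.9, 801)
pe_diff = pe_difference(pe_girls, pe_boys)
d = 22.80 - 21.40
ratio = d / pe_diff
chances = 100 * NormalDist().cdf(PE_FACTOR * ratio)   # PE units -> sigma units

print(f"PE(diff.)={pe_diff:.4f}  D/PE(diff.)={ratio:.2f}  chances={chances:.1f} in 100")
print(f"the ratio is {ratio / 4:.0%} of the 4 PE criterion")
```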



V. Some Problems Which Involve Measures of Reliability

This Section is designed to illustrate a variety of problems which require in their solution the reliability formulas given in this Chapter and the frequency tables. For quick reference later, each group of examples is preceded by a general statement of the essential problem involved.

A. To Find the Probability That the True Average is Greater or Less than Some Designated Point on the Scale, or That it Falls within Given Limits

Problem (1): Given Average(obt.) = 30.2, σ(dis.) = 6.00, N = 100. On the assumption that this sample is fairly representative of the population from which it is drawn, (a) what is the reliability of the obtained average? (b) What are the chances that the true average is less than 29? (c) greater than 31.5? (d) that the true average lies between 28 and 31?

(a) From formula (13) we find that the σ(av.) is .6; hence the chances are 68 in 100 that the obtained average does not diverge from the true average by more than ±.6, i.e., that the true average falls between the limits 29.6 and 30.8. Moreover, the chances are 99.7 in 100 that 30.2 does not diverge from the true average by more than ±.6 × 3, or ±1.8; i.e., that the true average falls within the limits 28.4 and 32.0.

These results are represented graphically in Diagram XV, Fig. 3. This normal probability distribution represents the distribution of means that we should expect to get from a large number of random samples, selected in the same way as the sample we have.1 The central tendency of this hypothetical distribution of means is taken at 30.2, the actually obtained, and hence the most probable, mean. The standard deviation of the distribution is .6, the standard error of the given obtained mean.

1 See the discussion on pages 122-123.

(b) What are the chances that the true mean is less than 29? 29 lies 1.2 points, or 2σ, below the obtained mean of 30.2 (see Fig. 3). From Table X, we find that 4772 cases in 10,000 fall between the mean and 2σ in a normal distribution; accordingly, 5000 − 4772, or 228, cases must lie below −2σ. The chances are 228 in 10,000, therefore, that the true mean lies below (is less than) 29.

(c) What are the chances that the true mean is greater than 31.5? This score is 1.3 points, or 2.17σ, above the obtained mean. There are 4850 cases in 10,000 between the mean and 2.17σ in a normal distribution, and 5000 − 4850, or 150, cases above this point. Hence the chances are 150 in 10,000, or about 2 in 100, that the true mean is greater than 31.5 (i.e., lies above 2.17σ).

(d) What are the chances that the true mean lies between 28 and 31? 28 is 2.2 points, or −3.67σ, from the mean; and 31 is .8 of a point, or 1.33σ (.8/.6), from the mean. Between the mean and −3.67σ in a normal distribution are 4999 cases in 10,000, and between the mean and 1.33σ are 4082 cases in 10,000. Within the interval from −3.67σ to 1.33σ, therefore, we find 4999 + 4082, or 9081, cases. Stated as chances, there are about 91 chances in 100 that the true average lies between 28 and 31.
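Parts (a) to (d) of Problem (1) amount to reading areas from one normal curve centred on the obtained mean with σ equal to σ(av.). A sketch under that assumption, with `statistics.NormalDist` standing in for Table X; the variable names are illustrative. Working from unrounded z-values, the results differ from the table-based figures only in the last digit (about 151 rather than 150 in 10,000 for part (c)).

```python
from math import sqrt
from statistics import NormalDist

# Problem (1): obtained mean 30.2, sigma(dis.) = 6.00, N = 100
mean_obt, sigma_dis, n = 30.2, 6.00, 100
se_mean = sigma_dis / sqrt(n)                        # formula (13): .6

# Hypothetical distribution of sample means, centred on the obtained mean
sampling = NormalDist(mu=mean_obt, sigma=se_mean)

limits_68 = (mean_obt - se_mean, mean_obt + se_mean)              # (a) +/- 1 sigma
limits_3sigma = (mean_obt - 3 * se_mean, mean_obt + 3 * se_mean)  # (a) +/- 3 sigma

below_29 = sampling.cdf(29)                          # (b)
above_31_5 = 1 - sampling.cdf(31.5)                  # (c)
between_28_31 = sampling.cdf(31) - sampling.cdf(28)  # (d)

for label, p in [("below 29", below_29), ("above 31.5", above_31_5),
                 ("between 28 and 31", between_28_31)]:
    print(f"{label}: {10000 * p:.0f} in 10,000")
```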

Problem (2): Given Average(obt.) = 26.4, PE(av.) = 1.5. What are the chances that the true average of the group of which the given group is a random sample is (a) as large as 30? (b) as small as 24?

As in Problem (1), this situation may be represented by a normal probability curve, with the mean at 26.4 and PE equal to 1.5 (see Diagram XV, Fig. 4).

(a) What are the chances that the true average of the group is as large as 30? 30 is 3.6 points, or 2.4 PE, above the obtained average of 26.4. There are 4472 cases in 10,000 between the mean and 2.4 PE in a normal distribution (Table XI), and 5000 − 4472, or 528, cases above 2.4 PE, i.e., above 30. Hence the chances are 528 in 10,000, or about 5 in 100, that the true average is as large as (or larger than) 30.

(b) What are the chances that the true average is as small as 24? 24 lies 2.4 points, or −1.6 PE, from the mean. There are 3597 cases in 10,000 between the mean and −1.6 PE in a normal distribution, and 5000 − 3597, or 1403, cases below −1.6 PE. The chances are 1403 in 10,000, therefore, that the true average is as small as (or smaller than) 24.
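When the reliability is given as a PE, as in Problem (2), one way to reuse a σ-based normal curve is to convert the PE to a standard error first. This sketch assumes PE = .6745σ, as elsewhere in the chapter, and uses `statistics.NormalDist` in place of Table XI; the names are illustrative.

```python
from statistics import NormalDist

PE_FACTOR = 0.6745                       # one probable error = .6745 sigma
mean_obt, pe_mean = 26.4, 1.5
sigma_mean = pe_mean / PE_FACTOR         # convert the PE to a standard error
sampling = NormalDist(mu=mean_obt, sigma=sigma_mean)

as_large_as_30 = 1 - sampling.cdf(30)    # (a) 2.4 PE above the obtained mean
as_small_as_24 = sampling.cdf(24)        # (b) 1.6 PE below the obtained mean

print(f"(a) {10000 * as_large_as_30:.0f} in 10,000")
print(f"(b) {10000 * as_small_as_24:.0f} in 10,000")
```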

B. To Find the Probability That the Divergence of an Obtained Measure from its True Measure Will be within Given Limits

Problem (3): Given Average(obt.) = 152.7 and σ(av.) = 4.5. Find the probability that the given obtained average will not diverge (or vary) from the true by more than (a) 1 point, (b) 3 points, (c) 5 points, (d) 10 points.

(a) This is essentially the same problem as those under A, expressed in a slightly different way. To find the probability that the obtained average differs from the true by no more than +1 or −1, we must find the chances that the true mean lies within the limits 152.7 ± 1, i.e., between 151.7 and 153.7 (this is shown in Diagram XV, Fig. 5). A deviation of ±1 point is a deviation of ±1/4.5, or ±.222σ, from the obtained mean. From Table X we find that 880 cases in 10,000 in a normal distribution fall between the mean and +.222σ (or −.222σ). Accordingly, 880 × 2, or 1760, cases fall within the interval +.222σ to −.222σ, and the chances are 1760 in 10,000 that the obtained mean will not diverge from the true mean by more than ±1 point.

(b) Three points are ±3/4.5, or ±.667σ, from the mean. There are 2475 × 2, or 4950, cases within the interval .667σ measured off to the right and left of the mean. Hence there are 4950 chances in 10,000 that the obtained mean will not diverge from the true mean by more than ±3 points.

(c) Five points are ±5/4.5, or ±1.11σ, from the mean. Hence there are 3665 × 2, or 7330, chances in 10,000 that the obtained average will not differ from the true average by more than ±5 points.

(d) Ten points are ±10/4.5, or ±2.22σ, from the mean; accordingly, there are 4868 × 2, or 9736, chances in 10,000 that the obtained mean will not diverge from the true mean by more than ±10 points.
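Each part of Problem (3) takes the area of a symmetric band ±z around the mean of a normal curve, with z = limit / σ(av.). A minimal sketch, with `statistics.NormalDist` in place of Table X and an illustrative function name; results may differ from the table-based figures in the last digit (for example about 1759 rather than 1760 in 10,000 for ±1 point).

```python
from statistics import NormalDist

def chances_within(limit, sigma_mean):
    """Chances that an obtained mean lies within +/- limit of the true mean."""
    z = limit / sigma_mean
    return NormalDist().cdf(z) - NormalDist().cdf(-z)

for points in (1, 3, 5, 10):
    print(f"+/-{points:>2} points: {10000 * chances_within(points, 4.5):.0f} in 10,000")
```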

C. To Find the Probability That the True Difference between the Measures of Two Groups is Greater or Less than a Given Amount

Problem (4): The difference between two obtained means is 3, and σ(diff.) = 1.5. (a) What are the chances that the true difference between the means of the two groups is greater than 0? (b) greater than 1? (c) greater than 3?

(a) Zero difference is 3/1.5, or 2σ, below the mean of differences, viz., 3 (see Diagram XV, Fig. 6). There are 4772 cases in 10,000 between the mean of a normal distribution and 2σ. Accordingly, there are 5000 + 4772, or 9772, chances in 10,000 that the true difference is greater than zero. (Note that this result may be read directly from Table XIV, since D/σ(diff.) = 2.)

(b) One is (3 − 1)/1.5, or 1.33σ, below the mean. There are 4082 cases in 10,000 in a normal distribution between the mean and 1.33σ. The chances, therefore, are 5000 + 4082, or 9082, in 10,000 that the true difference is greater than 1.

(c) What are the chances that the true difference is greater than 3? The obtained difference of 3 has been placed at the mean of differences as the obtained, and hence the most probable, difference. The chances are even, therefore, or 50-50, that the true difference is greater (or less) than 3. Note that (3 − 3)/σ(diff.) is 0/1.5, or 0 (Table XIV).
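Problem (4) generalizes the earlier zero-difference question: the chance that the true difference exceeds some amount is the area of the distribution of differences (mean = obtained difference, σ = σ(diff.)) lying above that amount. A sketch under that assumption, with an illustrative function name and `statistics.NormalDist` standing in for Table X:

```python
from statistics import NormalDist

def chances_true_difference_exceeds(amount, d_obt, sigma_diff):
    """Chances that the true difference exceeds `amount`, treating the obtained
    difference as the mean of a normal distribution of differences."""
    return 1 - NormalDist(mu=d_obt, sigma=sigma_diff).cdf(amount)

for amount in (0, 1, 3):
    p = chances_true_difference_exceeds(amount, 3, 1.5)
    print(f"greater than {amount}: {10000 * p:.0f} in 10,000")
```

Part (b) comes out near 9088 rather than 9082 because the sketch uses 1.333σ instead of the rounded 1.33σ.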



VI. Limitations to Reliability Formulas, and Cautions to be Observed in Interpreting Them

The formulas which have been given in this chapter for calculating the standard errors of obtained measures of central tendency and variability make use of only two characteristics of the distribution from which the measure has been obtained, viz., σ(dis.), the spread of the measures, and N, the number of cases. It is obvious that, so far as the formulas themselves are concerned, there is nothing which would prevent our finding a standard error for a measure obtained from any group. Such a general and uncritical application of reliability formulas, however, will almost surely lead to erroneous conclusions, and for this reason it is necessary to indicate briefly some of the limitations to reliability formulas as well as some cautions to be observed in interpreting results secured from them.

(1) In the first place, in interpreting standard errors we always make the assumption that measures obtained from successive samples are distributed according to the normal probability curve. This assumption holds, however, only when the number of cases is large; it is not valid when the sample is small. Hence the significance of a measure of reliability is conditioned upon our having a sufficiently large number of cases. If N is less than 25, there is little sense or justification in using reliability measures. One simple and practical method of judging whether the sample is "sufficiently" large is to continue taking independent measures, or adding cases drawn at random, until the addition of extra cases fails to produce an appreciable fluctuation in the average or median. When this point is reached, the sample is probably large enough to be taken as fairly representative of the larger group from which it has been drawn. As a corollary, however, it must be recognized that mere numbers are not in themselves a guarantee of a representative sample.
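The "keep adding cases until the average stops fluctuating" check can be sketched as a stopping rule on the running mean. This is one possible operationalization, not a procedure given in the text: the batch size, the tolerance for an "appreciable" change, and the number of consecutive quiet batches are all hypothetical parameters, and the scores below are simulated stand-ins for real cases. As the corollary above warns, a stable average does not by itself make the sample representative.

```python
import random
from statistics import mean

def stable_sample_size(values, batch=25, tolerance=0.1, patience=3):
    """Add cases batch by batch until the running mean moves by less than
    `tolerance` for `patience` batches in a row.

    Returns (n, mean_at_n), or None if the data run out first.
    batch, tolerance and patience are illustrative choices, not rules
    taken from the text.
    """
    calm_batches = 0
    previous = None
    for n in range(batch, len(values) + 1, batch):
        current = mean(values[:n])
        if previous is not None and abs(current - previous) < tolerance:
            calm_batches += 1
            if calm_batches >= patience:
                return n, current
        else:
            calm_batches = 0
        previous = current
    return None

# Hypothetical scores drawn at random, standing in for cases added in batches.
rng = random.Random(1926)
scores = [rng.gauss(100, 15) for _ in range(2000)]
print(stable_sample_size(scores, tolerance=0.5))
```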

(2) A more serious limitation to the measures of reliability arises from the fact that standard and probable errors of obtained measures can be assumed to measure only those errors which result from fluctuations due to "random sampling." An illustration will make this term clear. On page 122 we found that the obtained average height of 8585 adult British males was 67.46 inches, with a standard error of .0277 inch. This means that the chances are 997 in 1000 that the true average height of British males lies between 67.38 and 67.54 inches (67.46 ± 3 × .0277). Now by "true average height" we mean the average height of all British males, of whom our group of 8585 is an attempted random sampling. If our group were perfectly representative, its average would equal the true average exactly. Except by chance, however, neither this sample nor another similarly selected, and of approximately the same size, will represent the entire population perfectly; furthermore, it is extremely unlikely that the averages calculated from successive samples will equal each other. Nevertheless, if the samples are actually random, and there are no large constant errors present, the calculated averages will tend to vary around the true average of the whole group within a comparatively small range. Variations like these, which arise from the fact that we must generally work with samples instead of the whole population, are called "errors of sampling."

The function of the standard and of the probable errors is to give a measure of this sampling error, i.e., of the probable amount of deviation to be expected in an obtained measure from the corresponding true measure, as a result of working with a single sample. In other words, the standard or probable error measures the error made in taking a sample as representative of the larger group or population. If the standard error of a given mean is small, it does not necessarily follow that the obtained mean is highly reliable; a small standard error indicates merely that the reliability is high in so far as fluctuations due to differences in sampling are concerned.

Reliability formulas give no measure of the effects of errors due to causes other than those which arise from sampling. Errors which arise from the failure to get a random sample, for example, are neither detected nor measured by these formulas. To illustrate this point, the average Army Alpha score made by 500 college men between the ages of 18 and 25 will not be representative of the male population of this age range. College men form a highly selected group, and in consequence, other samples of 500 drawn at random from the male population between the ages of 18 and 25 will return very different results from that of the college group. These differences in average score cannot be attributed to errors of sampling; to take this group as representative of the general male population between the ages of 18 and 25, and to calculate the standard error of its average, will lead to an entirely erroneous idea of the intelligence of the general population. (The given sample might, of course, serve very well as a group representative of the population of college men.)

Other variations not measured by the reliability formulas arise from errors due to practice, fatigue, coachability of tests, faulty technique in giving and scoring tests, and, in fact, errors due to a bias of any sort. Standard errors calculated for measures secured from samples which contain such errors will always be of doubtful value.

The careful study of successive samples, retests when practicable, care in controlling conditions, and the use of objective checks whenever possible will eliminate many of these troublesome and prolific sources of error. Assuming that constant errors are small or practically negligible, one of the simplest tests of the adequacy (the "representativeness") of a sample consists in taking several other groups of approximately the same size from the general population. If the measures calculated from these groups are of very nearly the same size, we may be reasonably assured that we have representative samples. If the similarity is not fairly close, we must continue adding cases until the successive samples are approximately similar. Oftentimes more information may be secured in regard to the reliability of our measures in this way than could be obtained from a blanket use of reliability formulas.

(3) In concluding this discussion, we should add one word in regard to the use of formulas which measure the reliability of the difference between two obtained measures, namely, σ(diff.) and PE(diff.). These formulas make allowance only for variable errors in the original measures, that is, for errors which arise in sampling. Constant errors in the original scores, and errors of the sort mentioned above, are not detected, nor is their influence measured. Furthermore, these formulas always assume that the measures or scores in the two series which are compared are uncorrelated (see page 288). These limitations must be borne in mind when using or interpreting differences in terms of the "true" difference. . . .

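VII. Summary of Reliability Formulas

1. The Reliability of Measures of Central Tendency

(1) The Average or Mean

1. $\sigma_{\text{aver.}} = \dfrac{\sigma_{\text{dis.}}}{\sqrt{N}}$   (13)

2. $PE_{\text{aver.}} = \dfrac{.6745\,\sigma_{\text{dis.}}}{\sqrt{N}}$   (14)

(2) The Median

1. $\sigma_{\text{mdn.}} = \dfrac{5}{4} \cdot \dfrac{\sigma_{\text{dis.}}}{\sqrt{N}}$   (15)

2. $PE_{\text{mdn.}}$ = [formula illegible in the source scan]   (16)

3. $PE_{\text{mdn.}} = \dfrac{5}{4} \cdot \dfrac{Q}{\sqrt{N}}$   (16a)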

2. The Reliability of Measures of Variability

(1) The Standard Deviation

1. $\sigma_{\sigma} = \dfrac{\sigma_{\text{dis.}}}{\sqrt{2N}}$   (17)


(2) The Quartile Deviation

[formula illegible in the source scan]   (18)

[formula illegible in the source scan]   (18a)

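3. The Reliability of the Difference between Two Measures

(1) The Average

1. $\sigma_{\text{diff.}} = \sqrt{\sigma^2_{\text{aver.}1} + \sigma^2_{\text{aver.}2}}$   (19)

2. $PE_{\text{diff.}} = \sqrt{PE^2_{\text{aver.}1} + PE^2_{\text{aver.}2}}$   (20)

(2) The Median

1. $\sigma_{\text{diff.}} = \sqrt{\sigma^2_{\text{mdn.}1} + \sigma^2_{\text{mdn.}2}}$   (21)

2. $PE_{\text{diff.}} = \sqrt{PE^2_{\text{mdn.}1} + PE^2_{\text{mdn.}2}}$   (22)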
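The legible formulas of this summary can be collected into a few helper functions. This sketch covers only (13), (14), (16a), (17) and the difference formulas (19) to (22), which share one form; like those formulas, it assumes the two measures being compared are uncorrelated. The function names are illustrative, and (16a) is taken in the (5/4)·Q/√N form used in Problem (5).

```python
from math import sqrt

PE_FACTOR = 0.6745  # one probable error = .6745 standard deviations

def sigma_mean(sigma_dis, n):
    """Formula (13): standard error of a mean."""
    return sigma_dis / sqrt(n)

def pe_mean(sigma_dis, n):
    """Formula (14): probable error of a mean."""
    return PE_FACTOR * sigma_dis / sqrt(n)

def pe_median_from_q(q, n):
    """Formula (16a): probable error of a median, from Q."""
    return 1.25 * q / sqrt(n)

def sigma_of_sigma(sigma_dis, n):
    """Formula (17): standard error of a standard deviation."""
    return sigma_dis / sqrt(2 * n)

def error_of_difference(e1, e2):
    """Formulas (19) to (22): sigma or PE of a difference between two
    uncorrelated means or medians."""
    return sqrt(e1 ** 2 + e2 ** 2)

# Check against Problem 3(c) below: a completely reliable difference is 3 sigma(diff.)
se1, se2 = sigma_mean(3.54, 100), sigma_mean(5.36, 225)
print(round(3 * error_of_difference(se1, se2), 2))   # 1.51
```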


PROBLEMS

Note: For uniformity in figuring "chances" in the following problems, take all σ and PE distances to three decimals and correct back to the second place. Count all fractions over one half as wholes and drop all under one half. For example, write 1.876σ as 1.88σ, .023 PE as .02 PE, etc.

1. Given that the obtained average is 26.4, σ is 3.2, and N is 100.
(a) What are the chances that the true average for the 10,000 from which the 100 cases measured are a random sampling will be greater than 27?
(b) That it will be between 26 and 27?
(c) What are the chances that the true variability will be between 3.1 and 3.3?
(d) That the true variability will be less than 3.5?

2. Given: Median = 72.40, Q = 12.84, N = 81.
(a) What are the chances that the true median of the population from which this random sample is drawn is above 75?
(b) That it lies between 70 and 74?
(c) What are the chances that the true Q is not greater than 15?
(d) That it lies between 10 and 14?


3. Given: Av. 1 = 29.6, σ(dis.) = 3.54, N = 100.
   Av. 2 = 28.4, σ(dis.) = 5.36, N = 225.
(a) Find the σ(av.) for both distributions.
(b) Find the reliability of the difference between the means.
(c) What difference would be completely reliable, assuming that the variability remains practically unchanged?

4. In Example 2, page 56, find the reliability of the difference between the means of distributions A and B [use the σ(diff.)].

5. Average(obt.) = K. PE(av.) = 3.5. What are the chances that the true average will not diverge from the obtained by more than (a) 1, (b) 3, (c) 10?

6. Given that Mdn. 1 − Mdn. 2 = 3.6 and PE(diff.) = 3.0.
(a) What are the chances that the true difference is less than 0?
(b) That it is 1 or more?
(c) What per cent is the obtained difference of the difference necessary for complete reliability?

7. Find the reliability of the average in
(a) Example 4, page 116.
(b) Example 5, page 116.

8. In a random sample of 100 cases each from the four groups A, B, C, and D, the following are obtained:
A. Average = 101, σ(dis.) = 10.0.
B. Average = 104, σ(dis.) = 11.0.
C. Average = 93, σ(dis.) = 9.6.
D. Average = 86, σ(dis.) = 8.5.
What are the chances that, in general, the average of
(a) the A's is better than the average of the B's?
(b) the A's is 5 better than the average of the C's?
(c) the A's is 10 better than the average of the D's?
What are the chances that
(a) a B will be better than the average A?
(b) a B will be better than the average C?
(c) a B will be better than the average D?


148       STATISTICS  IN  PSYCHOLOGY  AND  EDUCATION 

ANSWERS


1. 

(a)    3  in  100. 

(b)  86  in  100. 

(c)   34  in  100. 

(d)  91  in  100. 

2. 

(a)  16  in  100. 

(b)  55  in  100. 

(c)   90  in  100. 

(d) 71 in 100.

3. 

(a) σav. 1 = .354. σav. 2 = .357.

(b) 99 chances in 100 of a true difference.

(c) 1.51.

4.

92 chances in 100 of a true difference.

5. 

(a)  15  in  100. 

(b) 44 in 100.

(c)  95  in  100. 

6. 

(a)  21  in  100. 

(b) 72 in 100.

(c)  30%. 

7. 

(a) σav. = .0791.

(b) PEav. = .318.

8. (Table XIV)

(a) 222 in 10,000.

(b) 9846 in 10,000, or 99 in 100.

(c) 9999.277 in 10,000 (100%).

(a) 61 in 100.

(b) 84 in 100.

(c) 95 in 100.
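The arithmetic behind several of these answers can be checked with a short Python sketch. It rests on stated assumptions rather than on the book's own procedure: normal-curve areas come from `math.erf` in place of Table XIV; the two samples in each comparison are treated as independent, so that σ(diff.) = √(σ₁² + σ₂²); a difference of 3 σ(diff.) is taken as completely reliable (the criterion answer 3(c) implies); and PE is converted to σ with PE = .6745σ. The helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Proportion of the unit normal curve lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sigma_of_mean(sigma_dis, n):
    """Standard error of a mean: sigma of the distribution over sqrt(N)."""
    return sigma_dis / math.sqrt(n)

def sigma_of_difference(se_1, se_2):
    """Sigma of the difference between two means from independent samples."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

# Exercise 3: Av. 1 = 29.6 (sigma 3.54, N 100); Av. 2 = 28.4 (sigma 5.36, N 225).
se_1 = sigma_of_mean(3.54, 100)
se_2 = sigma_of_mean(5.36, 225)
sd_diff = sigma_of_difference(se_1, se_2)
chance_true_difference = normal_cdf((29.6 - 28.4) / sd_diff)
completely_reliable = 3 * sd_diff      # assumed criterion: 3 sigma(diff.)

# Exercise 6: obtained difference 3.6, PE(diff.) 3.0, and PE = .6745 sigma.
sigma_diff_6 = 3.0 / 0.6745
chance_below_zero = normal_cdf(-3.6 / sigma_diff_6)

# Exercise 8, second part: chance that a single B (mean 104, sigma 11.0)
# scores above the average A (101).
chance_b_above_avg_a = 1.0 - normal_cdf((101 - 104) / 11.0)

print(f"{se_1:.3f} {se_2:.3f} {chance_true_difference:.2f} {completely_reliable:.2f}")
print(f"{chance_below_zero:.2f} {chance_b_above_avg_a:.2f}")
```

Under these assumptions the printed values agree, to rounding, with answers 3(a) to 3(c), 6(a), and the 61 in 100 given for the first of the "a B" questions.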


CHAPTER  IV 
CORRELATION 

I.  What  is  Meant  by  Correlation 

Up to this point in our discussion we have concerned ourselves
chiefly with methods of computing statistical measures
which shall represent in a reliable way the performance of an
individual or a group in some defined capacity or trait. Frequently,
however, it is of greater importance to examine the
relation  of  some  capacity,  such  as  general  intelligence,  to 
some  other  capacity,  such  as  musical  ability,  than  to  measure 
performance  in  a  single  trait  alone.  For  example,  we  may 
ask  whether  there  is  any  relation  between  general  intelligence 
as  measured  by  a  standard  intelligence  test  and  scholastic 
achievement  as  measured  by  "  grades "  or  "  marks."  Or, 
more  specifically,  we  may  inquire  whether  an  individual  who 
gives  evidence  of  high  general  intelligence  tends  to  outstrip  the 
average  individual  in  school  work.  Again,  knowing  the  ability 
of  an  individual  in  one  test,  can  we  say  anything  about  his 
ability  in  another  and  different  test?  Are  certain  abilities 
highly  related,  and  others  relatively  independent?  These 
questions,  and  others  of  the  same  general  nature,  are  studied 
by  the  Method  of  Correlation. 

The  statistical  device  whereby  relationship  is  expressed 
on  a  quantitative  scale  is  called  the  "  coefficient  of  correlation," 
and  is  designated  by  the  letter  "  r." 

Let  us  consider  first  the  situation  where  the  correlation  is 
fixed  and  unchanging.  We  know  that  the  circumference  of 
a  circle  is  always  3.1416  times  its  diameter,  no  matter  how 
large or how small the circle, or in what part of the world we
find it. Each time that we increase or decrease the diameter
of  a  circle,  we  increase  or  decrease  the  circumference  by  just 
3.1416  times  the  same  amount.  In  short,  the  relation  is  fixed 
and  definite,  and  hence  we  say  that  the  " correlation"  between 
diameter  and  circumference  is  perfect,  and  that  r  is  equal  to 
1.00.  In  like  manner,  if  we  find  that  100  men  take  exactly  the 
same  arrangement  in  two  tests,  so  that  the  man  who  ranks  first 
(or  highest)  in  the  one  ranks  first  in  the  other,  the  man  who 
ranks  second  in  the  first  test  ranks  second  in  the  other,  and 
that  this  one-to-one  correspondence  holds  throughout  the 
entire  list,  the  correlation  here  is  perfect  also,  for  the  relative 
position  of  each  man  is  exactly  the  same  in  one  test  as  in  the 
other.     The  coefficient  of  correlation,  r,  is  equal  to  1.00. 
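A minimal Python check of the two perfect cases just described: one set of measures that is a constant multiple of another, and two identical rank orders. The function `pearson_r` is an illustrative implementation of the product-moment formula taken up in Section III, and the diameters are made-up values.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r: sum of deviation products over N * sigma_x * sigma_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum_xy / (n * sigma_x * sigma_y)

# Hypothetical circles: the circumference is always 3.1416 times the diameter.
diameters = [1.0, 2.5, 4.0, 7.0, 12.0]
circumferences = [3.1416 * d for d in diameters]
r_circle = pearson_r(diameters, circumferences)

# 100 men who hold exactly the same rank in two tests.
ranks_test_1 = list(range(1, 101))
ranks_test_2 = list(range(1, 101))
r_ranks = pearson_r(ranks_test_1, ranks_test_2)

print(round(r_circle, 6), round(r_ranks, 6))   # prints 1.0 1.0
```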

Now  let  us  consider  the  case  where  there  is  just  no  relation 
at  all.  Suppose  that  we  have  examined  100  college  seniors 
on  the  Army  Alpha  test  and  on  a  tapping  test.  The  average 
Alpha  score  for  the  whole  group  is  175,  and  the  average  tap- 
ping rate  is  185  taps  in  30  seconds.  Suppose  further,  that 
when  we  divide  our  group  into  three  equal  parts,  the  average 
Alpha  score  of  the  upper  one-third  is  190,  and  the  average 
tapping  rate  184;  the  average  Alpha  score  of  the  middle  third 
is  175  with  an  average  tapping  rate  of  186;  and  the  average 
Alpha  score  of  the  lowest  one-third  is  160  with  an  average 
tapping  rate  of  185.  Now  clearly  since  the  tapping  rate  is 
almost  identical  in  all  three  groups,  we  should  be  unable  to 
draw  any  conclusion  from  a  man's  tapping  rate  alone  as  re- 
gards his  probable  score  on  Alpha.  An  average  tapping  rate 
of,  say,  185  to  190,  is  as  liable  to  be  found  with  an  Alpha  score  of 
150  as  with  one  of  175  or  even  200.  We  should  be  as  well 
qualified,  then,  to  estimate  a  man's  Alpha  score  knowing  only 
his  tapping  rate  as  we  should  be  able  to  estimate  it  if  all  we 
knew  about  the  man  in  question  was  that  he  had  blue  eyes 
and  light  hair.  In  either  case  our  estimate  would  be  no  better 
than  a  guess.  There  is,  therefore,  little  or  no  correspond- 
ence in  the  degree  or  amount  of  capacity  possessed  by  a  given 
individual  in  the  traits  measured  by  the  two  tests,  and  the 


CORRELATION  151 

coefficient  of  correlation  r  will  equal  zero,  which  means  that 
there  is  just  no  correlation  present. 

So  far  we  have  indicated  that  perfect  relationship  may 
be  expressed  by  a  coefficient  of  1.00,  and  that  just  no  rela- 
tion by  a  coefficient  of  0.  Between  these  two  limits  we  may 
have relations of varying degree, indicated by such coefficients
as .30, .60, .90. In every case a coefficient between
0 and 1.00 implies some degree of positive association, the
degree of association depending on the size of the coefficient.

Relation  may  be  negative  as  well  as  positive,  however. 
That  is,  a  large  degree  of  one  ability  may  be  associated  with  a 
small  degree  of  another,  or  vice  versa.  When  this  inverse 
relation is perfect, r equals -1.00. To illustrate, suppose that
in a certain group of 25 boys, we find that the boy standing
highest in Latin ranks lowest in Shop Work; that the boy who
stands second in Latin stands next to the bottom in Shop Work;
and  that  any  given  boy  is  found  to  stand  exactly  the  same 
distance  from  the  top  of  the  group  in  Latin  as  he  stands  from  the 
bottom  of  the  group  in  Shop  Work.  Table  XVI  on  p.  152  will 
illustrate  the  situation. 

The correspondence here is fixed and definite enough, but
the relation is inverse. Hence the correlation, while perfect,
is negative, and the coefficient of correlation r equals -1.00.
Negative coefficients may range all the way from -1.00
up to 0, just as positive coefficients range from 1.00 down to 0.

Coefficients of correlation, then, may range up and down
on a scale which extends from -1.00 through 0 to +1.00. A
positive correlation indicates a positive relation or correspondence;
a zero correlation, the absence of relation; and a negative
correlation indicates an inverse relation. While for the sake of
simplicity we have illustrated above only perfect positive,
perfect negative, and zero correlation, only rarely do we get
coefficients at the extremes of the scale. In most cases
calculated coefficients will be found at intermediate points, e.g.,
at .90, .20, -.30, etc. Such intermediate values as these
are to be interpreted as "high" or "low" in a general way,
depending upon how close they are to ±1.00 or 0. A more
complete discussion of the meaning of a correlation coefficient
is given later on page 160.


TABLE XVI

To Illustrate a Correlation of -1.00

Boy   Standing in Latin   Standing in Shop Work
  1           1                    25
  2           2                    24
  3           3                    23
  4           4                    22
  5           5                    21
  6           6                    20
  7           7                    19
  8           8                    18
  9           9                    17
 10          10                    16
 11          11                    15
 12          12                    14
 13          13                    13
 14          14                    12
 15          15                    11
 16          16                    10
 17          17                     9
 18          18                     8
 19          19                     7
 20          20                     6
 21          21                     5
 22          22                     4
 23          23                     3
 24          24                     2
 25          25                     1
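For the ranks in Table XVI the coefficient can be checked exactly. This sketch swaps in the rank-difference formula (Spearman's), 1 - 6Σd²/(N(N² - 1)), which for ranks with no ties gives the same value as the product-moment r computed on those ranks. `rank_difference_r` is a hypothetical helper name, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def rank_difference_r(ranks_a, ranks_b):
    """Spearman's rank-difference formula, 1 - 6*sum(d**2) / (N*(N**2 - 1)).

    For ranks without ties this equals the product-moment r of the ranks.
    """
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))

latin = list(range(1, 26))                  # boy k stands k-th in Latin
shop_work = [26 - rank for rank in latin]   # and (26 - k)-th in Shop Work
r = rank_difference_r(latin, shop_work)
print(r)   # prints -1
```

Here Σd² = 5200 and N(N² - 1) = 15,600, so r = 1 - 2 = -1 exactly.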

II. The Coefficient of Correlation: What It Is, and What It Does

1.  The  Coefficient  of  Correlation  as  a  Ratio 

Instead of taking up directly the method of computing
an r, we shall first try in this section to give a clear notion
of just what an r represents and how it measures relationship.
The steps in the calculation of r by the "product-moment"
method, the standard method, will then be given in detail in
the next section.

Let us begin with Diagram XVI. This diagram, which is called a
"scatter diagram," represents the paired heights and weights of
120 college men. The construction of such a scatter diagram is
relatively simple. Along the left-hand margin, from bottom to top,
are laid off the steps of the height distribution, while along the
top of the diagram, from left to right, are laid off the steps of
the weight distribution. Each of the 120 men may now be located on
the diagram with respect both to his height and to his weight.
Suppose, for example, that a man weighs 68 kgs. and is 176 cms.
tall. His height locates him in the 3rd row from the top, and his
weight in the 5th column from the left. Accordingly, this man
belongs in the third "cell" of the 5th column, and a tally is put
in this cell. Note that in Diagram XVI there are 6 men and 6
tallies in this cell; that is, there are 6 men who weigh 65 to 69
kgs. and are 175 to 179 cms. tall. In the manner described, every
one of the 120 men has been located in some cell or square
according to the two attributes, height and weight. Along the
bottom of the diagram, in the Fx row, will be found the number of
men who fall within each weight column (weight is the X-variable,
page 60), while along the right-hand margin, in the Fy column, are
tabulated the numbers of men who fall within each height row
(height is the Y-variable, page 60). Of course, both the Fy column
and the Fx row total 120, the number of men in all. All of the
frequencies in each cell may be totaled and written in numerical
form as shown in the diagram. When only the total frequency in
each cell is given, a scatter diagram becomes a correlation table
(see Diagram XXI).

DIAGRAM XVI

To Show How Correlation May be Expressed as a Ratio

Rows: height in cms. (Y-variable). Columns: weight in kgs. (X-variable).

Height      45-49  50-54  55-59  60-64  65-69  70-74  75-79  80-84     Fy  Av. wt.
185-189                                                          1      1     82.5
180-184                       1      3      3      4      2      3     16     71.3
175-179                       4     11      6      3      2      2     28     66.4
170-174                2      9     11      8      2      1            33     62.8
165-169         1      5      7     10      3                          26     59.2
160-164         1      2      7      1      2                          13     57.9
155-159         1      1             1                                  3     54.2
Fx              3     10     28     37     22      9      5      6    120
Av. ht.     162.5  166.5  169.8  172.8  173.6  178.6  178.5  181.7

(A) Average height for given weight

Weight    Av. ht.
80-84     181.7
75-79     178.5
70-74     178.6
65-69     173.6
60-64     172.8
55-59     169.8
50-54     166.5
45-49     162.5

Increase in average height: 19.2 ÷ 6.55 = 2.93
Corresponding increase in actual weight: 37.5 ÷ 7.75 = 4.84
Ratio, 2.93/4.84 = .60

(B) Average weight for given height

Height    Av. wt.
185-189    82.5 }
180-184    71.3 } 71.9
175-179    66.4
170-174    62.8
165-169    59.2
160-164    57.9
155-159    54.2

Increase in average weight: 17.7 ÷ 7.75 = 2.28
Corresponding increase in height: 25 ÷ 6.55 = 3.82
Ratio, 2.28/3.82 = .60

Average height = 172.6 cms.    σ(ht.) = 6.55 cms.
Average weight = 63.4 kgs.     σ(wt.) = 7.75 kgs.
Ratio, σ(wt.)/σ(ht.) = 7.75/6.55 = 1.18
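The cell-locating rule described above can be sketched in Python. The interval limits (155 cms. and 45 kgs. at the low ends, steps of 5 on both scales) are read from Diagram XVI; the functions `locate` and `tally` are hypothetical names, and the three sample pairs in the usage lines are invented.

```python
from collections import Counter

HEIGHT_LOW, HEIGHT_ROWS = 155, 7   # rows 155-159 (bottom) up to 185-189 (top)
WEIGHT_LOW, WEIGHT_COLS = 45, 8    # columns 45-49 (left) to 80-84 (right)
STEP = 5                           # step-interval on both scales

def locate(height_cm, weight_kg):
    """Return (row counted from the top, column counted from the left), from 1."""
    row_from_bottom = int((height_cm - HEIGHT_LOW) // STEP)
    column = int((weight_kg - WEIGHT_LOW) // STEP)
    if not (0 <= row_from_bottom < HEIGHT_ROWS and 0 <= column < WEIGHT_COLS):
        raise ValueError("measurement lies outside the diagram")
    return HEIGHT_ROWS - row_from_bottom, column + 1

def tally(pairs):
    """Count the (height, weight) pairs that fall in each cell."""
    return Counter(locate(h, w) for h, w in pairs)

print(locate(176, 68))   # prints (3, 5): 3rd row from the top, 5th column
cells = tally([(176, 68), (178, 66), (171, 58)])   # invented pairs
print(cells[(3, 5)])     # prints 2
```

A `Counter` keyed by (row, column) is simply the correlation table in sparse form: cells nobody falls into are absent and read as 0.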

Several  important  facts  may  be  gleaned  from  the  scatter 
diagram  as  it  stands.  For  example,  we  are  able  to  classify 
all  the  men  in  a  given  weight-column  with  regard  to  height. 
In  the  3rd  column  we  find  28  men  all  of  whom  weigh  55  to 
59  kgs.  One  of  these  28  is  180  to  184  cms.  tall;  4  are  175 
to  179  cms.  tall;  9  are  170  to  174  cms.  tall;  7  are  165  to  169 
cms.  tall;  and  7  are  160  to  164  cms.  tall.  In  the  same  way 
we may classify all the men within any height-row according
to weight. In the row next to the bottom we find that
of  the  13  men  who  are  160  to  164  cms.  tall,  1  weighs  45  to 
49  kgs.;  2  weigh  50  to  54  kgs.;  7  weigh  55  to  59  kgs.;  1 
weighs  60  to  64  kgs.;  and  2  weigh  65  to  69  kgs.  It  is  fairly 
clear,  too,  that  the  " drift"  of  paired  heights  and  weights  is 
from  the  upper  right  section  of  the  diagram  (the  "high  score" 
end)  to  the  lower  left  hand  section  (the  "low  score"  end). 
That  is  to  say,  even  a  superficial  examination  of  the  diagram 
indicates,  in  general,  a  fairly  marked  tendency  for  tall,  medium, 
and  short  men  to  rank  high,  medium,  and  low,  respectively, 
on  the  weight  scale;  and  this  observation  holds,  in  spite  of  the 
scatter of heights or weights within any given "array" (an
array is the distribution of cases within a given column or row).
Without  any  further  evidence,  therefore,  we  should  probably 
be  willing  to  hazard  the  guess  that  the  correlation  between 
height  and  weight  is  positive  and  fairly  high. 

Suppose  that  we  go  a  step  further  and  calculate  the 
average  height  of  the  men  who  weigh  45  to  49  kgs. — the  men 
in  column  1.  The  average  height  of  these  3  men — using  the 
guessed  average  method  of  Chapter  I — is  162.5  cms.,  and  this 
figure  is  entered  at  the  bottom  of  the  diagram.  In  the  same 
way,  we  can  find  the  average  height  of  the  men  who  fall  in  each 
of  the  succeeding  weight-columns.  These  averages  are  tabu- 
lated under  (A)  and  from  the  summary  it  is  evident  that  for  an 
actual  weight  increase  of  approximately  37.5  kgs.1  (from  47.5 
to  85)  we  have  a  corresponding  increase  in  average  height  of  19 . 2 
cms. (from 162.5 to 181.7). Thus it is clear that in our group
of 120 college men, an increase of approximately 37.5 kgs. in
weight is paralleled by an increase of 19.2 cms. in average height.

Before  going  any  further  let  us  shift  from  height  to  weight, 
and  applying  the  same  method  as  above  find  the  increase  in 
average  weight  which  corresponds  to  the  actual  increase  in 
height.  Taking  the  bottom  row — the  3  men  155  to  159  cms. 
tall — we  find  that  the  average  weight  of  this  small  group  is 

1 The complete range is not taken into account because the data are scanty
at the ends of the distribution.

54.2 kgs. The average weight of the 13 men who are 160 to
164  cms.  tall  is  57.9  kgs.,  and  in  like  manner  the  average 
weight  of  each  height-row  may  be  found  and  entered  in  the 
" Average  Weight"  column.  Summarizing  the  results  for  the 
group  in  (B)  as  we  did  in  (A)  above,  we  find  that  along  with  an 
increase in height of 25 cms. (160 to 185) there goes a
corresponding increase in average weight of 17.7 kgs.1 (54.2 to
71.9).
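The averages of the arrays can be recomputed from the cell frequencies of Diagram XVI. This sketch represents every man by the midpoint of his interval (172.5 for 170-174, following the guessed-average convention used for the height distribution), which is an assumption about how the averages were formed; the names are illustrative. The column averages round to the figures in (A); the row averages agree with (B) to within rounding (the 180-184 row gives 71.25, shown in the diagram as 71.3).

```python
# Cell frequencies of Diagram XVI: rows are heights from 185-189 (top)
# down to 155-159; columns are weights from 45-49 (left) to 80-84.
TABLE = [
    [0, 0, 0, 0, 0, 0, 0, 1],    # 185-189
    [0, 0, 1, 3, 3, 4, 2, 3],    # 180-184
    [0, 0, 4, 11, 6, 3, 2, 2],   # 175-179
    [0, 2, 9, 11, 8, 2, 1, 0],   # 170-174
    [1, 5, 7, 10, 3, 0, 0, 0],   # 165-169
    [1, 2, 7, 1, 2, 0, 0, 0],    # 160-164
    [1, 1, 0, 1, 0, 0, 0, 0],    # 155-159
]
HEIGHT_MIDS = [187.5, 182.5, 177.5, 172.5, 167.5, 162.5, 157.5]
WEIGHT_MIDS = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]

def column_means():
    """Average height of the men in each weight column."""
    means = []
    for j in range(len(WEIGHT_MIDS)):
        freqs = [row[j] for row in TABLE]
        means.append(sum(f * h for f, h in zip(freqs, HEIGHT_MIDS)) / sum(freqs))
    return means

def row_means():
    """Average weight of the men in each height row."""
    return [sum(f * w for f, w in zip(row, WEIGHT_MIDS)) / sum(row) for row in TABLE]

print([round(m, 2) for m in column_means()])
print([round(m, 2) for m in row_means()])
```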

Now if the coefficient of correlation measures the mutual
dependence or the degree of correspondence between two sets of
scores or measures, we should expect the ratio

$$\frac{\text{increase in average height}}{\text{corresponding increase in weight}},
\quad\text{e.g.,}\quad \frac{19.2}{37.5},$$

to measure the correlation of height and weight, that is, to give
us r. And likewise, and for the same reasons, we should expect the
ratio

$$\frac{\text{increase in average weight}}{\text{corresponding increase in height}},
\quad\text{e.g.,}\quad \frac{17.7}{25},$$

also to measure the correlation. The two ratios work out, however,
to be .51 and .71 respectively, which means evidently that neither
is suitable as a measure of correlation, since the relation of
height to weight should certainly be the same as the relation of
weight to height in the same group.

1 The single F in the top row has been combined with the F of the row just
below to prevent overweighting.

The difficulty here, while not an obvious one, is easy to
understand once it has been pointed out: we have failed to take
account of the fact that the increases in height and weight, and
naturally the ratios formed from them, depend for their numerical
value upon the units which we have arbitrarily chosen for measuring
height and weight. Thus while we have measured height in cms. and
weight in kgs., it is clear that different units, say, of 1 mm. for
height and 1 kg. for weight, or of 1 inch for height and 1 lb. for
weight, would have given us very different ratios. In other words,
the ratios which give the change in average height with
corresponding change in weight, and the change in average weight
with corresponding increase in height, will vary according to the
units in which height and weight are measured, and we have no way
of telling which ratio (or what unit) is the right one. The best
way out of this difficulty is to express the changes in height and
weight in terms of the σ's of the height and weight distributions,
respectively. It will make no difference then in what units our
original measurements have been made, as changes in both height and
weight will be recorded in terms of σ. The σ of the height
distribution of our 120 men is 6.55 cms., and the σ of the weight
distribution is 7.75 kgs. (see Diagram XVI). Accordingly, if we
divide the increase in average height and the parallel increase in
weight by 6.55 and 7.75, respectively, the ratio

$$\frac{\text{increase in average height}}{\text{corresponding increase in weight}}
\quad\text{becomes}\quad \frac{19.2/6.55}{37.5/7.75} = \frac{2.93}{4.84} = .605$$

(see Diagram XVI). And in like manner, if we divide the increase in
average weight and the parallel increase in height by 7.75 and
6.55, respectively, the second ratio

$$\frac{\text{increase in average weight}}{\text{corresponding increase in height}}
\quad\text{becomes}\quad \frac{17.7/7.75}{25/6.55} = \frac{2.28}{3.82} = .60.$$

The two ratios are now practically equal, and either may be taken
as representing the coefficient of correlation1, that is, as giving
the degree of association between height and weight in our group of
120 men.
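The argument of the last three paragraphs can be replayed numerically. The increases (19.2, 37.5, 17.7, 25) and the two σ's come from Diagram XVI; the function names are illustrative. The raw ratios differ from each other and change when height is restated in millimetres, while the σ-unit ratios are unaffected by the change of unit and both land close to .60.

```python
SIGMA_HEIGHT_CM = 6.55   # sigma of the 120 heights (Diagram XVI)
SIGMA_WEIGHT_KG = 7.75   # sigma of the 120 weights (Diagram XVI)

def raw_ratio(rise_in_average, rise_in_other):
    """Ratio of two increases in whatever units they were measured in."""
    return rise_in_average / rise_in_other

def sigma_ratio(rise_in_average, sigma_average, rise_in_other, sigma_other):
    """Ratio of the same two increases after each is expressed in sigma units."""
    return (rise_in_average / sigma_average) / (rise_in_other / sigma_other)

# Raw ratios: about .51 and .71, and the first grows tenfold if height is in mm.
raw_height_on_weight = raw_ratio(19.2, 37.5)
raw_weight_on_height = raw_ratio(17.7, 25.0)
raw_height_on_weight_mm = raw_ratio(192.0, 37.5)

# Sigma-unit ratios: restating height in mm rescales its sigma too.
r_height_on_weight = sigma_ratio(19.2, SIGMA_HEIGHT_CM, 37.5, SIGMA_WEIGHT_KG)
r_weight_on_height = sigma_ratio(17.7, SIGMA_WEIGHT_KG, 25.0, SIGMA_HEIGHT_CM)
r_height_on_weight_mm = sigma_ratio(192.0, SIGMA_HEIGHT_CM * 10, 37.5, SIGMA_WEIGHT_KG)

print(f"{raw_height_on_weight:.2f} {raw_weight_on_height:.2f} {raw_height_on_weight_mm:.2f}")
print(f"{r_height_on_weight:.3f} {r_weight_on_height:.3f} {r_height_on_weight_mm:.3f}")
```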

This  method  of  finding  relationship  is  useful  for  demon- 
strating in  a  simple  way  what  the  ratio  which  we  call  the  coeffi- 
cient of  correlation  actually  does.  It  is,  however,  neither  a 
very  practical  nor  precise  method  of  finding  a  coefficient  of 
correlation  and  is  never  used  in  actual  practice.  Its  chief  lack 
of  precision  lies  in  the  fact  that  in  estimating  the  range  of 
scores  or  measures  in  either  or  both  distributions  (see  footnote, 
page  155)  we  are  often  uncertain  where  to  begin  or  end  the 
series,  due  to  the  fact  that  the  data  are  oftentimes  scanty  at 
the  extremes  of  the  distributions.  As  a  matter  of  fact,  the  coeffi- 
cient of  correlation  in  the  present  problem  was  first  found 

1 On a scale in which 1.00 denotes perfect relation.

by the method given later on in Section III, and proper
adjustment was then made in the ranges so as to give the correct r.

2.  Graphical  Representation  of  the  Coefficient  of  Correlation 

Not only can we represent the coefficient of correlation as
a ratio, but we can also demonstrate graphically what a
coefficient of correlation means. The correlation coefficient of
.60 found in Diagram XVI between height and weight is shown
graphically in Diagram XVII. In this diagram the distance
taken to represent one unit (consider the step-interval as the
unit) on the height scale and the distance taken to represent
one unit on the weight scale have been selected with due regard
for the difference in size of the two σ's, in order that changes
in height and weight may be comparable. This adjustment
is a very simple one. We know from Diagram XVI that
σ(wt.), which equals 7.75 kgs., is 1.18 times σ(ht.), which
equals 6.55 cms. (since 7.75/6.55 = 1.18). Hence it is only
necessary that we take each height-step 1.18 times the length
arbitrarily taken to represent one weight-step, in order that the
X and Y distances may be comparable. (Since the weight
distribution is laid off from left to right, and the height
distribution from bottom to top, the first may be referred to as
the X variable, and the second as the Y variable; see page 60.)
To take a simpler case, if the σ for height were twice as large
as the σ for weight, we should take each step on the height
scale just one-half as long as each step on the weight scale.

When the diagram has been laid out in the manner described
above, represent by a cross the mean height of the men in
each  array — each  weight  column  (these  mean  heights  may 
be  found  from  Diagram  XVI).  Next,  draw  a  vertical  line 
through  the  mean  of  the  distribution  of  120  weights,  and  a 
horizontal  line  through  the  mean  of  the  distribution  of  120 
heights.  [The  average  height  of  the  120  men  is  172.6  cms., 
and  their  average  weight  is  63.4  kgs.  (see  Diagram  XVI)]. 
With these two lines as coordinate axes, draw through their
intersection (the origin) a straight line which shall go through,
or as close as possible to, each of the crosses which have been
plotted. A rough, but fairly accurate, method of drawing such a
line is to stretch a black thread through the origin and shift it
back and forth until it touches as many crosses as possible. The
crosses at the extremes need not concern us very much, since they
are located from only a few cases. This sloping line, which may be
called the line of "best fit," describes better than any other
straight line the "run" of the crosses, that is, the increase in
average height which corresponds to the given increase in weight.
Accordingly, to find the correlation, simply find the ratio of the
distance of any point on this sloping line from the horizontal or
X-axis to the distance of the same point from the vertical or
Y-axis. For example, if a convenient point P is taken with
x = 5 cms., its y distance (measured by a mm. ruler) will be found
to be approximately 3 cms., and the ratio y/x is 3/5, or .60. In
like manner, the x and y coordinates of any other point on this
sloping line will be found to give the ratio y/x a value of .60.

Our sloping line pictures graphically the ratio 2.93/4.84, the
correlation of .60, which we worked out in (1) above. This line,
which will be known hereafter as the "regression line of height on
weight," has important properties which will be considered later
(page 173). Also in the following sections we shall give the
equation of this line, which will enable us to draw it in on the
diagram very much more accurately than can be done by the
trial-and-error method described on page 159.

[DIAGRAM XVII. Coefficient of Correlation Shown Graphically. Horizontal scale: weight in kgs. (X-variable), 45-49 to 80-84; vertical scale: height in cms. (Y-variable). Average weight line drawn through 63.4 kgs.; average height line drawn through 172.6 cms.]

It is a comparatively easy, though not a necessary, task to verify
the correlation coefficient of .60 found from the regression line of
height on weight by drawing in the second "regression line," that
of weight on height. This can be done by designating the means of
the different height-rows by circles in exactly the same manner in
which we marked the means of the weight-columns by crosses. (The
means of the rows may be obtained from Diagram XVI.) The mean of
the lowest row is 54.2, of the next above 57.9, etc. When all of
the circles have been correctly placed, we draw a straight line
which shall go through, or as close as possible to, each circle,
just as we did with the crosses above. Now if a point P' is taken
on this second line with y = 5 cms., its x distance will be found
to be approximately 3 cms., and the ratio x/y is .60. This relation
holds for any point on the line. Both regression lines, therefore,
give us the same measure of the correlation between height and
weight.
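The thread can be replaced by a computation. This sketch swaps in a weighted least-squares line through the origin, fitted to the crosses (and separately to the circles) after every value has been expressed in σ units, with each point weighted by the number of men in its array; interval midpoints stand for the intervals, which is an assumption. Under those assumptions the two fitted slopes equal each other and equal the product-moment r of the grouped table, about .596, which rounds to the .60 of the text. All names are illustrative.

```python
import math

# Cell frequencies of Diagram XVI: rows are heights from 185-189 (top) down
# to 155-159; columns are weights from 45-49 (left) to 80-84.
TABLE = [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 3, 3, 4, 2, 3],
    [0, 0, 4, 11, 6, 3, 2, 2],
    [0, 2, 9, 11, 8, 2, 1, 0],
    [1, 5, 7, 10, 3, 0, 0, 0],
    [1, 2, 7, 1, 2, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
]
HEIGHT_MIDS = [187.5, 182.5, 177.5, 172.5, 167.5, 162.5, 157.5]
WEIGHT_MIDS = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5]

n = sum(map(sum, TABLE))
col_f = [sum(row[j] for row in TABLE) for j in range(len(WEIGHT_MIDS))]
row_f = [sum(row) for row in TABLE]
mean_w = sum(f * w for f, w in zip(col_f, WEIGHT_MIDS)) / n
mean_h = sum(f * h for f, h in zip(row_f, HEIGHT_MIDS)) / n
sd_w = math.sqrt(sum(f * (w - mean_w) ** 2 for f, w in zip(col_f, WEIGHT_MIDS)) / n)
sd_h = math.sqrt(sum(f * (h - mean_h) ** 2 for f, h in zip(row_f, HEIGHT_MIDS)) / n)

def fitted_slope(points):
    """Weighted least-squares slope of a line through the origin.

    Each point is (frequency, x, y); the slope minimizes sum f * (y - b*x)**2.
    """
    return sum(f * x * y for f, x, y in points) / sum(f * x * x for f, x, _ in points)

# Crosses: mean height of each weight column, everything in sigma units.
crosses = []
for j, w in enumerate(WEIGHT_MIDS):
    col_mean = sum(row[j] * h for row, h in zip(TABLE, HEIGHT_MIDS)) / col_f[j]
    crosses.append((col_f[j], (w - mean_w) / sd_w, (col_mean - mean_h) / sd_h))

# Circles: mean weight of each height row, everything in sigma units.
circles = []
for i, h in enumerate(HEIGHT_MIDS):
    row_mean = sum(f * w for f, w in zip(TABLE[i], WEIGHT_MIDS)) / row_f[i]
    circles.append((row_f[i], (h - mean_h) / sd_h, (row_mean - mean_w) / sd_w))

slope_height_on_weight = fitted_slope(crosses)   # the ratio y/x along that line
slope_weight_on_height = fitted_slope(circles)   # the ratio x/y along that line
print(round(slope_height_on_weight, 3), round(slope_weight_on_height, 3))
```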

Diagram XVII is still further useful in showing just what a
correlation of 1.00, 0, or -1.00 is graphically. Suppose (1)
that the two regression lines in the figure move together until
they coincide in such a way as to make an angle of 45 degrees
with the horizontal or X-axis. The x value of any point on
this "compound" line will always equal its y value; hence
the ratios y/x and x/y are always equal to each other1 and r equals
1.00 (see Diagram XVIII). Accordingly, in perfect positive
correlation, all the crosses and all the circles in a correlation
diagram fall along a single straight line which runs from the upper
right-hand section of the diagram (the 1st quadrant) to the lower
left-hand section (the 3rd quadrant). The tallest man is the
heaviest, the next tallest the next heaviest, and throughout the
entire 120 the correspondence of height and weight is always
1 to 1.

[DIAGRAM XVIII]

1 This is true also because the compound regression line becomes the diagonal
of a square. Again, the tangent of an angle of 45° = 1.00.

Now suppose (2) that the first regression line, the line
through the means of the height arrays in the columns (through
the crosses), moves around until it coincides with the
X-axis, the line through the average of all the heights in the
table. And suppose again that the second regression line, the
line through the means of the weight arrays in the rows (through
the circles), moves around until it coincides with the
Y-axis, the line through the average of all the weights in the
table. The ratios y/x and x/y are now both equal to 0 (since in
the first case y, and in the second case x, equals 0), and r, the
coefficient of correlation, equals 0. The conclusion that r = 0
might also be drawn from the fact that under the conditions
described the average height is the same for the whole range of
weights and the average weight the same for the whole range
of heights. Hence, a man of average height is equally liable
to be heavy, medium, or light, and a man of average weight
equally liable to be tall, medium, or short. (Compare with the
case in which the average tapping rate was the same for very high,
high, and medium high Alpha scores, page 150.) A picture of zero
correlation is shown in Diagram XIX.

[DIAGRAM XIX]
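A small made-up illustration of the zero case: in the hypothetical scores below every weight occurs once with a short height and once with a tall one, so the average height is identical in each weight column, and the product-moment r comes out exactly 0. `pearson_r` is an illustrative implementation of the product-moment formula of Section III.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r: sum of deviation products over N * sigma_x * sigma_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum_xy / (n * sigma_x * sigma_y)

# Hypothetical weights (kgs.) and heights (cms.).
weights = [50, 50, 60, 60, 70, 70]
heights = [165, 180, 165, 180, 165, 180]

# Average height within each weight column: the same for every column.
column_averages = {
    w: sum(h for ww, h in zip(weights, heights) if ww == w) / weights.count(w)
    for w in sorted(set(weights))
}
r = pearson_r(weights, heights)
print(column_averages, r)
```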

Lastly, suppose (3) that the two regression lines swing
around until they run from the upper left-hand section (the
2nd quadrant) to the lower right-hand section (the 4th
quadrant). Now if the two lines again coincide so as to make
an angle of 45 degrees with the X-axis, as described in (1),
the x of any point on this compound line will always be equal in
size to the y of the same point, and the ratios y/x and x/y will
again always have the numerical value 1.00. A glance at the figure
will show, however, that either the x or the y of these ratios must
always be negative, and for this reason the ratios will always be
negative. The coefficient of correlation, therefore, equals
-1.00, and the relation is perfect but inverse. In perfect negative
correlation, it is clear then that all of the crosses and all of
the circles fall along a single straight line which runs from the
upper left to the lower right-hand corner of the diagram. The
tallest man in the group is the lightest, the next tallest the next
lightest, and as height decreases weight increases progressively.
(Diagram XX.)

[DIAGRAM XX]

The regression lines coincide only when the correlation is
perfect, positive or negative. For degrees of correlation
between these limits, the two regression lines are separate,
and take intermediate positions as shown in Diagram XVII
for an r = .60.

III.  The  Calculation  of  the  Coefficient  of  Correlation 
by  the  Product-Moment  Method 

1.  The  Product-Moment  Formula  When  Deviations  Are  Taken 
from  the  Guessed  Averages  of  the  Two  Distributions 

With the meaning of a coefficient of correlation firmly in mind as a result of the discussion of the last section, we are now ready to consider the calculation of r by the product-moment method.¹ Diagram XXI will serve as an illustration of the computations involved. This correlation table gives the paired heights and weights of 120 college men and is derived from the scatter diagram for the same data shown in Diagram XVI. The complete process of calculating r is outlined in the following steps. (Diagram XXI should be constantly referred to in the discussion that follows.)

Step  I 

Construct  a  scatter  diagram  and  from  it  a  correlation  table 
as  described  on  page  154. 

Step  II 

Guess an average for the height distribution (given in the Fy column), and draw double lines to mark off the row which contains the GA(ht.), as shown in Diagram XXI. Note that the average for the height distribution has been guessed at 172.5 (midpoint of interval 170-174) and that Dy's have been taken from this point. Now fill in the FDy and the FDy² columns. From the first column the correction Cy (cy in units of step) is obtained; this correction, together with the sum of the FDy² column, gives σy, the σ of the height distribution. The value of σy is 6.55 cms. (1.31 × 5); see the calculations in the Diagram.

¹ The r found by this method is often called the "Pearson r" after Prof. Karl Pearson, who devised the product-moment formula, following Bravais's earlier work.




DIAGRAM XXI

Calculation of the Product-Moment Coefficient of Correlation
between the heights and weights of 120 college men

Height in cms. (Y-variable) by rows; weight in kgs. (X-variable) by columns

                                                                                    Σx'y'
Height     45-49 50-54 55-59 60-64 65-69 70-74 75-79 80-84    Fy   Dy   FDy  FDy²     +     −
185-189                                                  1     1    3     3     9    12
180-184                    1     3     3     4     2     3    16    2    32    64    58     2
175-179                    4    11     6     3     2     2    28    1    28    28    26     4
170-174              2     9    11     8     2     1          33    0     0     0
165-169        1     5     7    10     3                      26   −1   −26    26    20     3
160-164        1     2     7     1     2                      13   −2   −26    52    28     4
155-159        1     1           1                             3   −3    −9    27    15
Fx             3    10    28    37    22     9     5     6   120          2   206   159    13
Dx            −3    −2    −1     0     1     2     3     4
FDx           −9   −20   −28     0    22    18    15    24    22
FDx²          27    40    28     0    22    36    45    96   294

ΣFDy = 63 − 61 = 2;   ΣFDx = 79 − 57 = 22;   Σx'y' = 159 − 13 = 146

Calculation of r:

$$ c_y = \tfrac{2}{120} = .017, \qquad c_y^2 = .0003, \qquad C_y = .017 \times 5 = .085 $$
$$ c_x = \tfrac{22}{120} = .183, \qquad c_x^2 = .0334, \qquad C_x = .183 \times 5 = .915 $$
$$ \sigma_y = \sqrt{\tfrac{206}{120} - .0003} = 1.31, \qquad \sigma_y = 1.31 \times 5 = 6.55 \text{ cms.} $$
$$ \sigma_x = \sqrt{\tfrac{294}{120} - .0334} = 1.55, \qquad \sigma_x = 1.55 \times 5 = 7.75 \text{ kgs.} $$
$$ r = \frac{\tfrac{146}{120} - .017 \times .183}{1.31 \times 1.55} = .60 $$
$$ PE_r = \frac{.6745\,[1 - (.60)^2]}{\sqrt{120}} = .04 \quad \text{(Table XVIII)} $$

Now guess an average for the weight distribution (given in the Fx row) and draw double lines to designate the column which contains the GA(wt.). The average of the weight distribution has been guessed at 62.5 (midpoint of interval 60-64) and Dx's have been taken from this point. Fill in the FDx and FDx² rows. From these rows the correction Cx (cx in units of step) and σx, the σ of the weight distribution, may be obtained. The value of σx is 7.75 kgs. (1.55 × 5); see the calculations in the Diagram.
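The Guessed Average arithmetic of Step II can be sketched in Python as follows. The frequencies are those of the Fy column and the Fx row of Diagram XXI; the function name sigma_by_guessed_average is a hypothetical helper, not something from the text. Carried without intermediate rounding, σx comes out at about 7.77 kgs. rather than the 7.75 obtained by rounding 1.55 before multiplying by 5.

```python
import math

def sigma_by_guessed_average(freqs, devs, step):
    """Hypothetical helper: correction c and sigma by the Guessed Average Method.

    freqs: frequency of each interval; devs: deviation of each interval from
    the guessed average, in step-intervals; step: width of one interval.
    Returns (c in steps, sigma in steps, sigma in the original units).
    """
    n = sum(freqs)
    c = sum(f * d for f, d in zip(freqs, devs)) / n          # correction, in steps
    mean_square = sum(f * d * d for f, d in zip(freqs, devs)) / n
    sigma_steps = math.sqrt(mean_square - c * c)
    return c, sigma_steps, sigma_steps * step

# Heights: Fy column of Diagram XXI, tallest interval (Dy = 3) first.
fy, dy = [1, 16, 28, 33, 26, 13, 3], [3, 2, 1, 0, -1, -2, -3]
# Weights: Fx row of Diagram XXI, lightest interval (Dx = -3) first.
fx, dx = [3, 10, 28, 37, 22, 9, 5, 6], [-3, -2, -1, 0, 1, 2, 3, 4]

cy, sy, sy_cm = sigma_by_guessed_average(fy, dy, 5)
cx, sx, sx_kg = sigma_by_guessed_average(fx, dx, 5)
print(f"cy = {cy:.3f}, sigma_y = {sy:.2f} steps = {sy_cm:.2f} cms.")
print(f"cx = {cx:.3f}, sigma_x = {sx:.2f} steps = {sx_kg:.2f} kgs.")
print(f"mean height = {172.5 + cy * 5:.1f}, mean weight = {62.5 + cx * 5:.1f}")
```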

Step  III 

The calculations in Step II simply repeat the familiar process of finding a σ by the Guessed Average Method (Chapter I, page 35). Our first new task is to fill in the Σx'y' column. The entries in this column may be either + or −, and hence two columns are provided under Σx'y', one for plus and one for minus entries.

The procedure for determining the entries in the Σx'y' column may be illustrated by taking the single entry in the only occupied cell in the topmost row. The deviation of this cell from the GA of the weight distribution, that is, its Dx, is 4 steps, and its deviation from the GA of the height distribution, its Dy, is 3 steps. Hence the product of the deviations of this cell from the two guessed averages, its "product-moment," is 4 × 3 or 12, and a small figure 12 is placed in the upper right hand corner of the cell.¹ Moreover, since the "product-moment" of the 1 frequency in this cell is 1(4 × 3) or 12 also, a figure 12 is placed in the lower left hand corner of the cell to denote the product of the deviations (or the product-deviation) of this single frequency from the two GA's. There are no other frequencies in the cells of this row, and 12 is placed at once in the Σx'y' column² under the + sign.

Now let us consider the next row from the top, taking the cells in order from right to left. The cell below the one whose product-deviation we have just found also deviates 4 steps from the GA of the weight distribution (its Dx = 4), but its deviation from the GA of the height distribution is only 2 steps (its Dy = 2). Hence the product-deviation of this cell is 4 × 2 or 8 [note the small (8) in the upper right hand corner of the cell], and since there are 3 frequencies in the cell, each with a product-deviation of 8, the final entry in the lower left hand corner of this cell is 3(4 × 2) or 24. In like manner, the product-deviation of the 2nd cell in the row is 6 (its Dx = 3 and its Dy = 2), and since there are 2 frequencies in the cell, the final entry is 2(3 × 2) or 12. Each of the 4 frequencies in the third cell has a product-deviation of 4 (the Dx of the cell is 2, and the Dy is 2 also), and the final cell entry is 4(2 × 2) or 16. In the 4th cell each of the 3 frequencies has a Dx of 1 and a Dy of 2, and the product-deviation is 3(1 × 2) or 6. The entry of the 5th cell, the cell in the GA(wt.) column, is 0, since Dx = 0, and of course 3(0 × 2) = 0. Notice particularly the entry in the last cell of this row, viz., −2. This negative entry results from the fact that the deviation of this cell from the GA(wt.), its Dx, is −1, and its Dy is 2; the product-deviation of its single frequency, therefore, is 1(−1 × 2) or −2. Now total separately the plus and minus x'y' products in this row. The results, 58 and −2, are entered separately in the Σx'y' column under the appropriate signs.

¹ We may take the coordinates of this cell to be x = 4 and y = 3. The first is obtained by counting over 4 steps from the vertical column containing the GA for weight, and the second by counting up 3 steps from the horizontal row containing the GA for height. In each case the unit of measurement is the step-interval.

² The primes (') on x and y indicate that all deviations are taken from the two GA's.

The final entries of the cells in the other rows in the table and the sums of the product-deviations of each row are obtained in the manner described above. It must be borne in mind in calculating the x'y' products that the product-deviations of all frequencies in the first and third quadrants are positive, while the product-deviations of all the frequencies in the second and fourth quadrants are negative (see page 162). Also remember that all frequencies in either the column containing the GA(wt.) or the row containing the GA(ht.) have 0 product-deviations, since in one case the Dx, and in the other the Dy, equals 0.

All frequencies in any given row have the same Dy, and for this reason the arithmetic of calculation may be considerably reduced if each frequency in the row is first multiplied by its Dx, and the sum of these products multiplied once for all by the common Dy. To illustrate, for the 2nd row from the bottom, taking the cells from right to left, when we multiply the frequency of each cell by its Dx the result is (2 × 1) + (1 × 0) + (7 × −1) + (2 × −2) + (1 × −3), or −12. Now multiplying this partial "deviation-sum" by the Dy of the whole row, i.e., by −2, we get 24 as the final Σx'y' entry for the row. This result checks the 28 and −4 entered separately in the Σx'y' column. This shorter method is useful in getting the total Σx'y' entry of a given row quickly. It is less easy to check for errors, however, than the method of getting the entry for each cell separately, illustrated on page 166.¹
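The cell-by-cell procedure of Step III and the row shortcut just described can be compared in a short Python sketch. The cell frequencies are read from Diagram XXI (rows from the 185-189 cm. interval down, columns from 45-49 kgs. across); the helper names are hypothetical. The assertion inside the loop checks that both methods give the same row total, and the grand total is the 146 used in Step IV.

```python
# Cell frequencies of Diagram XXI: rows run from Dy = 3 (185-189 cms.) down to
# Dy = -3 (155-159 cms.); columns run from Dx = -3 (45-49 kgs.) to Dx = 4 (80-84 kgs.).
TABLE = [
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 3, 3, 4, 2, 3],
    [0, 0, 4, 11, 6, 3, 2, 2],
    [0, 2, 9, 11, 8, 2, 1, 0],
    [1, 5, 7, 10, 3, 0, 0, 0],
    [1, 2, 7, 1, 2, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
]
DY = [3, 2, 1, 0, -1, -2, -3]
DX = [-3, -2, -1, 0, 1, 2, 3, 4]

def row_products_cellwise(row, dy):
    """Hypothetical helper: plus and minus totals of f * Dx * Dy, cell by cell."""
    products = [f * dx * dy for f, dx in zip(row, DX)]
    return sum(p for p in products if p > 0), sum(p for p in products if p < 0)

def row_products_shortcut(row, dy):
    """Hypothetical helper: sum of f * Dx for the row, multiplied once by Dy."""
    return dy * sum(f * dx for f, dx in zip(row, DX))

total = 0
for row, dy in zip(TABLE, DY):
    plus, minus = row_products_cellwise(row, dy)
    assert plus + minus == row_products_shortcut(row, dy)  # both methods agree
    total += plus + minus
    print(f"Dy = {dy:2d}:  +{plus:3d}  {minus:4d}")
print("Sum of x'y' =", total)
```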

Step  IV 

When the sums of the product-deviations of the rows have been entered in the Σx'y' column, the algebraic sum of the Σx'y' column may be obtained (in Diagram XXI, 159 − 13 = 146). The coefficient of correlation is then found by the formula:

$$ r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (23) $$

Substituting for Σx'y'/N, 146/120; for cx, .183; for cy, .017; and for σx and σy, 1.55 and 1.31, respectively (see Diagram XXI for figures), r is found to equal .60.

Notice that the terms cx, cy, σx and σy are all left in units of step-interval when substituted in formula (23). This is done simply because all product-deviations (x'y' values) are in step-units, and hence it is very much easier to keep all the other terms in the formula, and in consequence both numerator and denominator, in step-units. By this procedure the value of the fraction, the coefficient of correlation, is not changed, since every term in numerator and denominator is a product of two deviations, and converting to kgs. and cms. would multiply both by the same factor (5 × 5). The arithmetic, however, is considerably reduced.

¹ Printed charts for facilitating the calculation of coefficients of correlation by the product-moment method are now available. Examples are the Ruch-Stoddard Correlation Charts, University Bookstore, Iowa City, Iowa, and the Thurstone Correlation Data Sheet, C. H. Stoelting & Co., Chicago. The first of these gives the product-deviation of each cell printed on the chart. Otis has also devised a correlation chart based on the product-moment method which does away with the necessity of finding the x'y' products. This chart is published with directions for its use by the World Book Co., Yonkers, N. Y.
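Formula (23) itself is a one-line computation. In the Python sketch below (the function name is hypothetical) the rounded step-unit values from Diagram XXI give r ≈ .60, and converting every term to kgs. and cms. first leaves r unchanged, as the preceding paragraph says.

```python
import math

def r_from_guessed_averages(sum_xy, n, cx, cy, sigma_x, sigma_y):
    """Formula (23); all deviation terms must be in the same units."""
    return (sum_xy / n - cx * cy) / (sigma_x * sigma_y)

# Step-unit values from Diagram XXI.
r = r_from_guessed_averages(146, 120, 0.183, 0.017, 1.55, 1.31)
print(round(r, 2))

# Convert to kgs. and cms. first (step-interval = 5): every product of two
# deviations is multiplied by 25, so numerator and denominator scale alike.
r_units = r_from_guessed_averages(146 * 25, 120, 0.183 * 5, 0.017 * 5, 1.55 * 5, 1.31 * 5)
print(math.isclose(r, r_units))
```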

2. The Product-Moment Formula When Deviations Are Taken
from the Actual Averages of the Two Distributions

Since formula (23) assumes that all x and y deviations have been taken from the two guessed averages, it is necessary to correct Σx'y'/N by the amount of the two corrections, cx and cy. If deviations are taken from the actual averages of the two distributions instead of from the GA's, no correction is needed, as both cx and cy then equal 0. Thus, when deviations are taken from the two averages, formula (23) becomes

$$ r = \frac{\Sigma xy}{N \sigma_x \sigma_y} \qquad (24) $$

and this is the form in which the product-moment formula is usually written. The formula may be put in still another form. If we write √(Σx²/N) for σx and √(Σy²/N) for σy, the formula becomes (the N's cancel)

$$ r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}} \qquad (25) $$

in which the x and y deviations are from the averages, as in (24), and Σx² and Σy² are the sums of the squared deviations from the two averages.

Formula (23) should always be used when there are more than, say, 30 or 40 cases. Formula (25) may be used to advantage, however, with short series when the purpose of the experimenter is to find whether there is any relation present rather than to discover the degree of relation very accurately. No correlation table is required with formula (25). An illustration of the use of this formula is given in Table XVII, in which the problem is to find the correlation between the scores made on two tests of association by 12 adults.

TABLE XVII

To Illustrate the Calculation of r when Deviations are Taken
from the Averages of the Distributions

             Score in     Score in
Individual  Test 1 (X)   Test 2 (Y)      x       y      x²       y²       xy
A                   50           22    −12    −8.4     144    70.56    100.8
B                   53           25     −9    −5.4      81    29.16     48.6
C                   56           34     −6     3.6      36    12.96    −21.6
D                   58           28     −4    −2.4      16     5.76      9.6
E                   60           26     −2    −4.4       4    19.36      8.8
F                   61           30     −1     −.4       1      .16       .4
G                   61           32     −1     1.6       1     2.56     −1.6
H                   64           30      2     −.4       4      .16      −.8
I                   67           28      5    −2.4      25     5.76    −12.0
J                   70           34      8     3.6      64    12.96     28.8
K                   71           36      9     5.6      81    31.36     50.4
L                   73           40     11     9.6     121    92.16    105.6
Average           62.0         30.4
Sums                                                   578   282.92    317.0

$$ r = \frac{317.0}{\sqrt{578 \times 282.92}} = .78 \qquad PE_r = \frac{.6745\,[1 - (.78)^2]}{\sqrt{12}} = .08 $$

The steps in finding r may be outlined as follows:

Step  I 

Find the average of Test 1 and the average of Test 2. In the table the first average is 62.0, and the second, 30.4.

Step  II 

Find the deviation of each score in Test 1 from its average, 62, and enter it in column x. (The deviations from the average of the first test may be called x-deviations, those from the average of the second test, y-deviations.) Find the deviation of each score in Test 2 from its average, 30.4, and enter it in column y.

Step  III 

Square all x-deviations and all y-deviations, and enter these squares in columns x² and y², respectively.

Step  IV 

Multiply  the  corresponding  x  and  y  deviations  and  enter  these 
products  in  the  xy  column. 

Step  V 

Substitute Σxy (317), Σx² (578), and Σy² (282.92) in formula (25), as shown in Table XVII, and solve for r.
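For short series the five steps above reduce to a few lines of Python. This sketch uses the scores of Table XVII; the function name is hypothetical. It takes deviations from the exact average of Test 2 (30.417) rather than the rounded 30.4, which changes Σy² only in the third decimal place, so r still rounds to .78.

```python
import math

# Scores of the 12 adults (A to L) in Table XVII.
test1 = [50, 53, 56, 58, 60, 61, 61, 64, 67, 70, 71, 73]
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_from_actual_averages(xs, ys):
    """Formula (25): r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), deviations from means."""
    mx = sum(xs) / len(xs)                  # Step I: the two averages
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]               # Step II: x- and y-deviations
    dy = [y - my for y in ys]
    sum_x2 = sum(a * a for a in dx)         # Step III: squares
    sum_y2 = sum(b * b for b in dy)
    sum_xy = sum(a * b for a, b in zip(dx, dy))   # Step IV: products
    return sum_xy / math.sqrt(sum_x2 * sum_y2)    # Step V: formula (25)

r = r_from_actual_averages(test1, test2)
print(round(r, 2))
```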

IV. The Probable Error of a Coefficient of Correlation

The PE of an r may be found from the formula

$$ PE_r = \frac{.6745\,(1 - r^2)}{\sqrt{N}} \qquad (26) $$

If we substitute in formula (26) the r = .60 and the N = 120 of the height-weight problem (see Diagram XXI), PEr will equal .04.¹ This means that the chances are even that the "true" r falls within the limits .60 ± .04, or between .56 and .64; and that the chances are 9930 in 10,000 (Table XI) that the true r falls within the limits .60 ± 4 × .04, or between .44 and .76. By the true r is meant (see page 118) that r which we should expect to get between height and weight in the population from which our group of 120 is, presumably, a random sample.

To be reasonably sure that there is some correlation present, an obtained r should be at least 4 times its PE. For example, when r is exactly 4 times its PE, say r = .16 and PEr = .04, we can only be sure that the true r falls within the limits .16 ± 4 × .04, or between 0 and .32. It is customary, therefore, not to consider an r as reliable (as indicative of a correlation at least better than 0) unless it is at least 4 times its PE. To be certain of a low degree of correlation, an r should be 5 or 6 times its PE.
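Formula (26) and the statements about "even chances" and "9930 in 10,000" can be checked with the sketch below. It assumes, as the text does, that the sampling distribution of r is normal, and it uses the relation 1 PE = .6745σ together with math.erf from Python's standard library; the function names are hypothetical.

```python
import math

def pe_r(r, n):
    """Formula (26): probable error of a coefficient of correlation."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

def chance_within(k):
    """Chance that a normally distributed error falls within +/- k probable errors."""
    z = k * 0.6745                      # 1 PE = .6745 sigma
    return math.erf(z / math.sqrt(2))   # two-sided normal probability

pe = pe_r(0.60, 120)
print(f"PE = {pe:.3f}")
print(f"limits at 1 PE: {0.60 - pe:.2f} to {0.60 + pe:.2f}, chance {chance_within(1):.2f}")
print(f"limits at 4 PE: {0.60 - 4 * pe:.2f} to {0.60 + 4 * pe:.2f}, chance {chance_within(4):.4f}")
```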

We found in Chapter III that the reliability of the difference between two averages or two medians can be calculated by means of the formulas for σ(diff.) and PE(diff.) (see page 128). In the same way, the reliability of the difference between two obtained r's can be found from the size of the PE of their difference.

¹ If we know r and N, the PEr may be read directly, or by interpolation, from Table XVIII.


TABLE XVIII

Probable Errors of the Coefficient of Correlation for Various
Numbers of Measures (N) and for Various Values of r

Number of                  Correlation Coefficient r
Measures (N)    0.0    0.1    0.2    0.3    0.4    0.5    0.6
   20         .1508  .1493  .1448  .1373  .1267  .1131  .0965
   30         .1231  .1219  .1182  .1121  .1035  .0924  .0788
   40         .1067  .1056  .1024  .0971  .0896  .0800  .0683
   50         .0954  .0944  .0915  .0868  .0801  .0715  .0610
   70         .0806  .0798  .0774  .0734  .0677  .0605  .0516
  100         .0674  .0668  .0648  .0614  .0567  .0506  .0432
  150         .0551  .0546  .0529  .0501  .0463  .0413  .0352
  200         .0477  .0472  .0458  .0434  .0401  .0358  .0305
  250         .0426  .0421  .0409  .0387  .0358  .0319  .0272
  300         .0389  .0386  .0374  .0354  .0327  .0292  .0249
  400         .0337  .0334  .0324  .0307  .0283  .0253  .0216
  500         .0302  .0299  .0290  .0274  .0253  .0226  .0193
 1000         .0213  .0211  .0205  .0194  .0179  .0160  .0137

Number of                  Correlation Coefficient r
Measures (N)   0.65    0.7   0.75    0.8   0.85    0.9   0.95
   20         .0871  .0769  .0660  .0543  .0419  .0287  .0147
   30         .0711  .0628  .0539  .0444  .0342  .0234  .0120
   40         .0616  .0544  .0467  .0384  .0296  .0203  .0104
   50         .0551  .0486  .0417  .0343  .0265  .0181  .0093
   70         .0466  .0411  .0353  .0290  .0224  .0153  .0079
  100         .0391  .0345  .0294  .0242  .0187  .0128  .0066
  150         .0318  .0281  .0241  .0198  .0153  .0105  .0054
  200         .0275  .0243  .0209  .0172  .0133  .0091  .0047
  250         .0246  .0218  .0187  .0154  .0118  .0081  .0042
  300         .0225  .0199  .0170  .0140  .0108  .0074  .0038
  400         .0195  .0172  .0148  .0122  .0094  .0064  .0033
  500         .0174  .0154  .0132  .0109  .0084  .0057  .0029
 1000         .0123  .0109  .0093  .0077  .0059  .0041  .0021
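Every entry of Table XVIII follows from formula (26), so the table can be regenerated with a short loop. The Python sketch below uses hypothetical names; it reproduces the tabled values to four places, although a few printed entries differ from the computed ones by one unit in the last place, presumably from rounding in the original computation.

```python
import math

R_VALUES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
N_VALUES = [20, 30, 40, 50, 70, 100, 150, 200, 250, 300, 400, 500, 1000]

def pe_r(r, n):
    """Formula (26): probable error of r for a sample of n measures."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

def table_row(n):
    """One row of Table XVIII (both halves), rounded to four places."""
    return [round(pe_r(r, n), 4) for r in R_VALUES]

for n in N_VALUES:
    # Print in the table's style, with the leading zero dropped (.1508, not 0.1508).
    print(f"{n:5d} " + " ".join(f"{v:.4f}".lstrip("0") for v in table_row(n)))
```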

The formula for PE(diff.) between two r's is

$$ PE_{(\mathrm{diff.}\ r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2} \qquad (27) $$

in which PEr1 and PEr2 are the PE's of the two r's to be compared, and must first be obtained from formula (26).

The value of formula (27) may be illustrated by the following problem. Suppose that in a group of 100 eight year old boys the r between IQ and the A-cancellation test is .20 with a PE of .065, and that in a group of 110 eight year old girls the r between the same two tests is .25 with a PE of .06. The correlation is .05 higher for girls than for boys. Is this difference sufficiently large to indicate that the true correlation between IQ and the A-test is higher for 8 year old girls than for 8 year old boys? To answer this question, we must determine the PE of the difference between the two r's. From formula (27),

$$ PE_{(\mathrm{diff.}\ r_1 - r_2)} = \sqrt{(.065)^2 + (.06)^2} = .09 $$

and comparing the obtained difference of .05 with the PE(diff.), we find that

$$ \frac{D}{PE_{(\mathrm{diff.})}} = \frac{.05}{.09} = .556 $$

This means (see Table XV) that there are only 64 chances in 100 of a real difference, a difference greater than 0, between the true correlations of IQ and the A-test for 8 year old boys and girls. The difference of .05 is, therefore, quite unreliable. To be completely reliable, the obtained difference should be at least 4 × .09, or .36. (A difference is considered reliable when D/PE(diff.) is 4 or more; see page 133.) In the present case the obtained difference is only about 14 per cent of what it should be in order to guarantee a true difference between the r's of the boys and girls.
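The boys-girls comparison can be reproduced as follows. This Python sketch assumes, as formula (27) does, that the two r's come from independent groups and that their difference is normally distributed; it converts D/PE(diff.) to σ-units (1 PE = .6745σ) and takes the chance that the true difference exceeds 0. The function names are hypothetical. The result, about .65, agrees closely with the 64 chances in 100 read from Table XV.

```python
import math

def pe_diff(pe1, pe2):
    """Formula (27): PE of the difference between two independent r's."""
    return math.sqrt(pe1 ** 2 + pe2 ** 2)

def chance_of_real_difference(diff, pe_d):
    """Chance that the true difference exceeds 0, assuming a normal distribution."""
    z = (diff / pe_d) * 0.6745          # D/PE(diff.) converted to sigma-units
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

pe_d = pe_diff(0.065, 0.06)             # boys: r = .20; girls: r = .25
print(f"PE(diff.) = {pe_d:.4f}")
print(f"D/PE(diff.) = {0.05 / 0.09:.3f}")         # the text rounds PE(diff.) to .09
print(f"chance of a real difference = {chance_of_real_difference(0.05, 0.09):.2f}")
print(f"difference needed for D/PE(diff.) = 4: {4 * 0.09:.2f}")
```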

The formulas for PEr and PE(diff. r1−r2) are subject to the same restrictions and must be interpreted with the same caution as the other standard and probable error formulas (see Chapter III, page 145). In order to be of any real value as measures of reliability, PEr and PE(diff.) should be calculated for r's obtained from random and reasonably large samples. PE's found for r's obtained from small and obviously selected groups may give an entirely false picture of the observed coefficient's reliability, especially when the coefficient is large. An r of .90 found from 20 cases, for instance, is unreliable despite the fact that PEr = .03 (see Table XVIII). Another sample of 20 drawn from the same population might give an r one half as large.



V.  The  Regression  Equations 
1.  The  Regression  Equations  in  Deviation  Form 

We have already discovered (Diagram XVII) that there are two regression lines in a correlation table, and that the first "best fits" the means of the successive columns (the average heights, represented by crosses), while the second "best fits" the means of the rows (the average weights, represented by circles). These lines of "best fit" were seen to be of value in showing graphically the change in average height accompanying a given change in weight, and the change in average weight accompanying a given change in height. Moreover, we found that either line will measure the correlation directly when the x and y steps in the diagram have been laid out with due allowance for the difference in size of the σ's of the X and Y distributions.

This last use of the regression line is of little practical value, however. It is very much easier to draw up a correlation table without bothering about the difference in the two σ's, and find r by the product-moment formula as shown in Diagram XXI, than to try to estimate r from the regression lines. In fact, the real value of the regression lines is not to give r, but to enable us to "predict" an individual's "most probable" standing in a test or series of measures, given his standing in another test or series of measures.

We may describe briefly how this is done. Suppose that we wish to estimate a man's height from our correlation table, knowing his weight to be 68 kgs. Now the best possible "guess" that we can make of this man's height is to give the average height of all men who fall in the 65-69 weight interval. From Diagram XVI the "mean height" of the 22 men in this column is found to be 173.6 cms., and hence 173.6 cms. is the most likely height of a man who weighs 68 kgs. In like manner, the most probable height of a man who weighs 72 kgs. is 178.6 cms., the mean height of the 9 men who fall in the weight column 70-74 kgs. In general, then, the most probable height of any man is the mean of the heights of all the men in the group who weigh (approximately) the same as he, that is, who fall in the same weight column.¹ The line which best fits the mean heights of the successive weight-columns is the line which gives the change in average height with the change in weight (the line through the crosses in Diagram XVII). Given a man's weight, therefore, we can best "predict" his height from the regression line of height on weight; and by analogy, given a man's height, we can best predict his weight from the regression line of weight on height (the line through the circles in Diagram XVII).
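The "best guess" from a column mean can be sketched in Python, using the cell frequencies of the 65-69 and 70-74 kg. columns as read from Diagram XXI, the guessed average 172.5 cms., and the 5-cm. step-interval. The variable and function names are illustrative only.

```python
# Cell frequencies of two weight columns of Diagram XXI, listed from the
# tallest height interval (Dy = 3) down to the shortest (Dy = -3).
DY = [3, 2, 1, 0, -1, -2, -3]
COLUMNS = {
    "65-69": [0, 3, 6, 8, 3, 2, 0],
    "70-74": [0, 4, 3, 2, 0, 0, 0],
}
GA_HEIGHT, STEP = 172.5, 5  # guessed average (midpoint of 170-174) and step-interval

def column_mean_height(freqs):
    """Mean height of one weight column, worked from deviations about the GA."""
    n = sum(freqs)
    return GA_HEIGHT + STEP * sum(f * d for f, d in zip(freqs, DY)) / n

for label, freqs in COLUMNS.items():
    print(f"{label} kgs.: {sum(freqs)} men, mean height {column_mean_height(freqs):.1f} cms.")
```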

If we had the equations of the two regression lines, it would seem obvious that estimates could be made from these much more efficiently and quickly than from the plotted regression lines. For then, knowing a man's standing in the X-variable (his weight), we should be able, on substituting in the equation connecting X and Y, to find directly his most probable standing in the Y-variable (height). The equations of the two regression lines have been deduced by Prof. Karl Pearson, who took as his criterion the idea of the "best fitting" line. Pearson's method, briefly, was to find the equation of that line from which the sum of the squares of the deviations of the means in the different arrays (the rows or the columns) is the least possible.² There are, of course, two such lines. The one "best fits" the means of the rows, the other "best fits" the means of the columns.

The equation of the line drawn through the means of the columns (the crosses in Diagram XVII) is written in its simplest form³ as

$$ y = r\,\frac{\sigma_y}{\sigma_x}\,x \qquad (28) $$

¹ There is a certain error of estimate made in taking a man's most probable height as being the average of his weight-group. The method of finding the size of this error will be considered later, on page 183.

2  For  a  mathematical  treatment  of  the  application  of  the  Method  of  Least 
Squares  to  the  problem  of  deducing  the  regression  equations,  see  Jones,  A  First 
Course  in  Statistics,  1921,  pp.  106ff  and  271. 

³ A brief review of the equation of a straight line and of the method of plotting a simple linear equation is given in order to simplify the discussion of the regression equations.

Let X and Y be coordinate axes, or axes of reference. Now suppose that we are given the equation y = 2x and are required to represent the relation between x and y graphically. To do this we substitute values for x in the equation and compute the corresponding values of y. When x = 2, for example, y = 2 × 2 or 4; when x = 3, y = 2 × 3 or 6. In like manner, given any x value, we can compute the y which will "satisfy" the equation, that is, make the left side equal to the right. Now if the series of points determined from the pairs of x and y values given by the equation are plotted with respect to the X and Y axes (see Diagram XXII), they will be found to fall along a straight line, and this straight line will picture the relation of x and y, y = 2x. This line will pass through the origin, since when x = 0, y also equals 0. The equation y = 2x represents, then, a straight line which passes through the origin, and the relation of its points is such that y/x (called the slope of the line) always equals 2.

[Diagram XXII: the straight line y = 2x plotted through the origin O of the X and Y axes]

The general equation of any straight line which passes through the origin may be written y = mx, where m is the slope of the line. If we replace the m of the general formula by the expression r·σy/σx, we see at once that the regression equation in deviation form is simply the equation of a straight line which goes through the origin.

The expression r·σy/σx is called the regression coefficient and is often replaced in the equation by the expression byx or b12, so that (28) is sometimes written y = byx·x or y = b12·x.

If we substitute the values of r, σy, and σx, obtained from Diagram XXI, in formula (28), we have

$$ y = .60 \times \frac{6.55}{7.75}\,x \quad\text{or}\quad y = .51x $$

as the equation which measures the regression of height on weight. This equation represents a straight line through the origin, and hence it is a simple matter to plot it, as shown in Diagram XXIII. First, however, we must draw a vertical line through the point 63.4 kgs., the mean of all the weights (the X's) in the table, and a horizontal line through 172.6 cms., the mean of all the heights (the Y's) in the table. These two lines are the coordinate axes. Now since our plotted line must go through the origin [see note (3), page 175], only one other point is needed to determine it. If x = 2 (any value will do just as well), y becomes .51 × 2 or 1.02. To plot this point, measure out 2 units from the origin along the horizontal axis and go up 1.02 units from the same line. This will locate the point x = 2, y = 1.02. (Any convenient scale may be used for measuring off x and y distances; a mm. rule is useful.)

The line drawn through the point just located and the origin (0, 0) is the regression line of height on weight. From the equation, it is clear that a point on this line with an x-value of 1.00 has a corresponding y-value of .51 (substitute x = 1 in the equation and y = .51). This means that a deviation of 1 unit from the mean of the X's (from the vertical line drawn through the mean weight of the group) is accompanied by just .51 times as much deviation from the mean of the Y's (from the horizontal line drawn through the mean height of the group) (see Diagram XXIII). Put concretely, a man who stands 1 kg. above the average weight of the group is most probably .51 cm. above the mean height of the group also: if his weight is 64.4 kgs. (63.4 + 1.00), his height is probably 173.11 cms. (172.6 + .51). To take another example, the man who weighs 60 kgs. (3.4 kgs. below the mean weight) is most probably 170.87 cms. tall (1.73 cms. below the mean height). In this example, we substitute x = −3.4 in the equation, and y = −1.73. In general, then, we know from the regression equation that the most probable deviation of any individual in our group¹ from the mean

1  Or  in  the  population  from  which  our  group  of  120  is  drawn,  provided  the 
group  is  a  random  sample. 


CORRELATION 


177 


DIAGRAM XXIII

Illustrating Position of the Regression Lines, and Calculation
of the Regression Equations

(Calculation of r repeated from Diagram XXI)

[Correlation table of height in cms. (Y-variable) against weight in kgs. (X-variable), with the two regression lines drawn in. The cell frequencies and the Y-distribution columns are not recoverable from the scan.]

Weight in kgs.   45-49   50-54   55-59   60-64   65-69   70-74   75-79   80-84
Fx                   3      10      28      37      22       9       5       6    = 120
Dx                  -3      -2      -1       0       1       2       3       4
FDx                 -9     -20     -28       0      22      18      15      24    (-57, +79) = 22
FDx²                27      40      28       0      22      36      45      96    = 294

Σx'y' = 146

Calculation of r:

$c_y = \frac{2}{120} = .017$, $c_y^2 = .0003$, $c_y \times 5 = .085$

$c_x = \frac{22}{120} = .183$, $c_x^2 = .0334$, $c_x \times 5 = .915$

$$r = \frac{\frac{146}{120} - .017 \times .183}{1.31 \times 1.55} = .60 \qquad PE_r = .04$$

GA(Y) = 172.5, Aver.(Y) = 172.5 + .085 = 172.6

GA(X) = 62.5, Aver.(X) = 62.5 + .915 = 63.4

$$\sigma_y = \sqrt{\frac{206}{120} - .0003} \times 5 = 1.31 \times 5 = 6.55$$

$$\sigma_x = \sqrt{\frac{294}{120} - .0334} \times 5 = 1.55 \times 5 = 7.75$$

Calculation of Regression Equations:

I. Deviation Form:

(1) $y = .60 \times \frac{6.55}{7.75}x = .51x$

(2) $x = .60 \times \frac{7.75}{6.55}y = .71y$

II. Score Form:

(1) $Y - 172.6 = .51(X - 63.4)$, or $Y = .51X + 140.3$

(2) $X - 63.4 = .71(Y - 172.6)$, or $X = .71Y - 59.1$

Calculation of Standard Errors of Estimate:

$\sigma_{(\text{est. }Y)} = 6.55 \times .8 = 5.2$ cms.

$\sigma_{(\text{est. }X)} = 7.75 \times .8 = 6.20$ kgs.


178       STATISTICS  IN  PSYCHOLOGY  AND  EDUCATION 

height is just .51 as great as his deviation from the mean
weight. Hence, given a man's deviation from the mean weight,
we are able to predict his most probable deviation from the mean
height of the group.

The regression equation, $y = r\frac{\sigma_y}{\sigma_x}x$, is known as the regression
equation of Y on X in Deviation Form. Stated generally,
this equation measures the most probable deviation of any Y
measure from the mean Y corresponding to a known deviation
in the X measure from the mean X.
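The deviation-form equation lends itself to a short Python sketch. It is illustrative only: the names `b_yx` and `predict_height_deviation` are hypothetical, the inputs are the Diagram XXIII summary values, and the coefficient is rounded to two places as in the hand calculation, so the printed heights match the 173.11 and 170.87 cms. worked out above.

```python
# Summary values for the height-weight problem (Diagram XXIII).
r = 0.60        # coefficient of correlation between weight (X) and height (Y)
sigma_y = 6.55  # sigma of the heights, in cms.
sigma_x = 7.75  # sigma of the weights, in kgs.
mean_y = 172.6  # mean height, cms.
mean_x = 63.4   # mean weight, kgs.

# Regression coefficient of Y on X, rounded to two places as in the text.
b_yx = round(r * sigma_y / sigma_x, 2)   # 0.51


def predict_height_deviation(x_dev: float) -> float:
    """Most probable deviation from the mean height, given a deviation from the mean weight."""
    return b_yx * x_dev


for weight in (64.4, 60.0):
    x = weight - mean_x                  # deviation from the mean weight
    y = predict_height_deviation(x)      # most probable deviation from the mean height
    print(f"weight {weight} kgs.: x = {x:+.1f}, y = {y:+.2f}, height = {mean_y + y:.2f} cms.")
```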

The equation of the second regression line, drawn through
the means of the rows (the circles of Diagram XVII), is written

$$x = r\frac{\sigma_x}{\sigma_y}y \qquad (29)$$

This equation measures the regression of X on Y and, in the present
problem, of weight on height. The regression coefficient $r\frac{\sigma_x}{\sigma_y}$
is sometimes replaced by the expression $b_{xy}$ or $b_{21}$, so that
(29) is often written $x = b_{xy}y$ or $x = b_{21}y$.

If we substitute in (29) the values of r, $\sigma_x$, and $\sigma_y$ found
from Diagram XXI, we have

$$x = .60 \times \frac{7.75}{6.55}y \quad\text{or}\quad x = .71y,$$

as the equation which measures the regression of weight on
height. This equation, like the other, represents a straight line
through the origin; consequently, one point on the line
together with the origin (0, 0) is sufficient to plot the line.
Put y = 1 in the equation, and x will equal .71. Now plot
the point x = .71, y = 1.00 on the diagram, and draw the
regression line through this point and the origin (see Diagram
XXIII).

It is evident from the second regression equation that a
deviation of 1 cm. from the mean of all the heights (Y's) is
most probably accompanied by a deviation of .71 kg. from the
mean of all the weights (X's); or, put in a different way, the most
probable deviation of any man from the mean weight is just
.71 as great as his deviation from the mean height. A man
180 cms. tall, for example (7.4 cms. above the mean height),
most probably weighs 68.65 kgs. (5.25 kgs. above the
mean weight). (To get this result, substitute 7.4 for y in the
equation and solve for x.)

The equation $x = r\frac{\sigma_x}{\sigma_y}y$ is known as the regression equation
of X on Y in deviation form. To summarize briefly, it measures
the probable deviation of an X-measure from the average X
corresponding to a known deviation in the Y-measure from the
average Y.

Although there are two regression equations, both of
which involve x and y, the student must bear in mind the
important fact that the two equations cannot be used interchangeably,
and that neither can be used to predict both x
and y. The first regression equation, $y = r\frac{\sigma_y}{\sigma_x}x$, is to be
used only when y is to be predicted from x (when y is
the "dependent" variable), while the second regression equation,
$x = r\frac{\sigma_x}{\sigma_y}y$, is to be used only when x is to be predicted
from y (when x is the "dependent" variable).¹ There
are always two regression equations unless the correlation is
perfect. When r = 1.00, however, the equation $y = r\frac{\sigma_y}{\sigma_x}x$
becomes $y = \frac{\sigma_y}{\sigma_x}x$, or $\sigma_x y = \sigma_y x$, while the equation $x = r\frac{\sigma_x}{\sigma_y}y$
becomes $x = \frac{\sigma_x}{\sigma_y}y$, or $\sigma_x y = \sigma_y x$. The two equations are now
identical, and the regression lines coincide.

As  an  illustration  of  this  last  condition  suppose  that  the 

1 A dependent variable depends for its value on the other variable in the
equation. Thus in the equation $y = r\frac{\sigma_y}{\sigma_x}x$, y "depends" on the value given x,



correlation between height and weight is perfect, $\sigma_x$ and $\sigma_y$
remaining the same. The first regression equation would now
become $y = 1.00 \times \frac{6.55}{7.75}x$, or y = .85x, while the second regression
equation would become $x = 1.00 \times \frac{7.75}{6.55}y$, or x = 1.18y.
Algebraically, x = 1.18y is equivalent to y = .85x (since from
y = .85x we have $x = \frac{y}{.85}$, or x = 1.18y). Under the prescribed
conditions, therefore, we should have a single equation and a
single line, which would represent equally well a change (deviation)
in Y for a given change in X, or a change (deviation)
in X for a given change in Y. It may be added that when
r = 1.00, and in addition the two σ's are equal or are made
equal by the arrangement of the diagram, the single regression
line makes an angle of 45 degrees with the horizontal axis (see
Diagram XVIII, and the discussion on pages 161-162).
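The remark that the two equations cannot be used interchangeably can be checked numerically. The sketch below (the helper `regression_coefficients` is a hypothetical name) uses the σ's of Diagram XXIII; it relies on the identity that the two slopes multiply to r², so solving one equation for the other variable reproduces the second line only when r² = 1.

```python
import math

SIGMA_Y = 6.55  # sigma of heights, cms. (Diagram XXIII)
SIGMA_X = 7.75  # sigma of weights, kgs.


def regression_coefficients(r: float) -> tuple[float, float]:
    """Return (b_yx, b_xy), the slopes of equations (28) and (29)."""
    b_yx = r * SIGMA_Y / SIGMA_X   # y = b_yx * x
    b_xy = r * SIGMA_X / SIGMA_Y   # x = b_xy * y
    return b_yx, b_xy


for r in (0.60, 1.00):
    b_yx, b_xy = regression_coefficients(r)
    # Solving y = b_yx * x for x gives x = y / b_yx; that is the second
    # regression line only when b_xy equals 1 / b_yx, i.e. when r**2 == 1.
    coincide = math.isclose(b_xy, 1 / b_yx)
    print(f"r = {r:.2f}: b_yx = {b_yx:.2f}, b_xy = {b_xy:.2f}, "
          f"b_yx * b_xy = {b_yx * b_xy:.2f}, lines coincide: {coincide}")
```

With r = 1.00 the two printed slopes are .85 and 1.18, the reciprocal pair discussed above.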

2.  The  Regression  Equations  in  Score  Form 

In the preceding paragraphs the point was stressed that formulas
(28) and (29) are the equations of the regression lines in deviation
form: the values of x and y substituted in these equations
are deviations from the means of the X and Y distributions,
not actual scores or measures.¹ While equations in
deviation form are all that we actually need for purposes of prediction,
it is often very convenient to be able to estimate an individual's
actual score in Y, say, directly from his score in X without
the trouble of first converting the X-score into a deviation
from the mean X. This can be done very simply if we employ
the score form rather than the deviation form of the regression
equation. The conversion of deviation to score form may be
made as follows. Let the average of the Y's be denoted
by Y' and any Y-score by Y; then the y deviation of any
individual from the mean will be Y - Y' (the difference between

1  The  small  letters  x  and  y  are  used  to  denote  deviations  from  the  means  of 
the  X  and    Y  distributions.     The  large  letters  X  and    Y  denote  actual  scores. 



the score and the mean), or, in general, y = Y - Y'. In the
same way, we can show that, in general, x = X - X', where x
is the deviation of any X score from the mean X (X').

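Now substitute Y - Y' for y and X - X' for x in formulas
(28) and (29), and the two regression equations become

$$Y - Y' = r\frac{\sigma_y}{\sigma_x}(X - X') \quad\text{or}\quad Y = r\frac{\sigma_y}{\sigma_x}(X - X') + Y' \qquad (30)$$

and

$$X - X' = r\frac{\sigma_x}{\sigma_y}(Y - Y') \quad\text{or}\quad X = r\frac{\sigma_x}{\sigma_y}(Y - Y') + X' \qquad (31)$$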

These  are  the  equations  of  the  two  regression  lines  in  score 
form.  In  both  equations,  X  and  Y  now  represent  actual  scores 
and  not  deviations  from  the  means  of  the  two  distributions. 

If we substitute in (30) the values for Y', r, $\sigma_y$, $\sigma_x$, and X'
obtained from Diagram XXIII, the equation becomes

$$Y - 172.6 = .60 \times \frac{6.55}{7.75}(X - 63.4),$$

or, clearing of fractions,

$$Y = .51X + 140.3.$$

To illustrate the use of this equation, let us suppose that a man
in our group weighs 60 kgs. (X) and that we wish to estimate
his most probable height (Y). Substituting 60 for X in the
equation gives Y = 170.9; accordingly, the most probable
height of a man who weighs 60 kgs. is 170.9 cms.

If the problem is to predict weight instead of height, we
must use equation (31). Substituting the values for X', r,
$\sigma_y$, $\sigma_x$, and Y' in the second equation, we have

$$X - 63.4 = .60 \times \frac{7.75}{6.55}(Y - 172.6)$$

or

$$X = .71Y - 59.1.$$

Now, given a man 180 cms. tall, we find, putting 180 for Y in
the formula, that X = 68.7 kgs. Hence the most probable
weight of a man 180 cms. tall is 68.7 kgs.
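Converting from deviation to score form amounts to finding a slope and an intercept. The following is a minimal sketch with the hypothetical helper `score_form`; following the text, the slope is rounded to two places before the intercept is computed, which is why the intercepts agree with 140.3 and -59.1.

```python
def score_form(r, mean_dep, sigma_dep, mean_ind, sigma_ind, decimals=2):
    """Return (slope, intercept) so that Dep = slope * Ind + intercept.

    Follows formulas (30)/(31): Dep - mean_dep = r * sigma_dep / sigma_ind * (Ind - mean_ind).
    The slope is rounded to `decimals` places first, imitating the hand calculation.
    """
    slope = round(r * sigma_dep / sigma_ind, decimals)
    intercept = mean_dep - slope * mean_ind
    return slope, intercept


# Height (Y) on weight (X), and weight (X) on height (Y), Diagram XXIII values.
b_yx, a_yx = score_form(0.60, 172.6, 6.55, 63.4, 7.75)
b_xy, a_xy = score_form(0.60, 63.4, 7.75, 172.6, 6.55)

print(f"Y = {b_yx:.2f}X + {a_yx:.1f}")                        # Y = 0.51X + 140.3
print(f"X = {b_xy:.2f}Y - {-a_xy:.1f}")                       # X = 0.71Y - 59.1
print(f"height for 60 kgs.: {b_yx * 60 + a_yx:.1f} cms.")     # 170.9
print(f"weight for 180 cms.: {b_xy * 180 + a_xy:.1f} kgs.")   # 68.7
```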



It may seem strange to the student to talk of "predicting"
a man's height from his weight, when we already
know the height and weight of all 120 men in our group. Of
course, when we have both height and weight it is unnecessary
to convert one into the other. Suppose, however, that
all we know about a certain man is his weight and the fact
that he falls within the age-range of our group of 120 men.
Since we know the correlation between height and weight
in this group, it is possible from the regression equation to
predict the most probable height of our subject in lieu of
actually measuring him. In the same way, the regression
equation may be used to predict the height of any man in the
population from which our group is taken, provided our group
is a random sample of the larger group. The regression equations
hold only for the population from which the
sample group is drawn. We could not, for instance, estimate the
probable heights of children or of women from a regression
equation which had been worked out for men between the ages
of 18 and 25 (the age-range of the men in our group of 120).
Conversely, we could not expect regression equations
worked out for elementary-school children to hold for older groups.

Probably height and weight, since they are both easily
measured, do not show the value of the regression equations
as well as other and more complex traits do. To take a problem
of more direct interest, suppose that in a group of children
of approximately the same age the r between IQ and average
grades made in the first year of high school works out to
be .70. Now if we know the IQ of a child entering school
the next year, it is possible to estimate his probable
scholastic performance from the regression equation
worked out from the group of the previous year. This may
be extremely valuable in educational guidance. The same
thing is true of vocational guidance: we may be able on the
basis of test scores to predict the probable success of an individual
who contemplates entering a certain trade or profession,
and thus advise him more intelligently.



3.  The  Reliability  of  the  Predictions  Made  by  the  Regression 
Equations 

A. The Standard Error of Estimate, σ(est.) or S

We have constantly referred to the values of X and Y
"predicted" from the regression equations as being the "most
probable" values of the one variable accompanying the given
value of the other. The method of showing just how reliable,
i.e., how probable, our predicted values are is to calculate
their standard error of estimate, written σ(est.). To find the
accuracy with which we are able to estimate Y-values from
equation (30), we employ the formula¹

$$\sigma_{(\text{est. }Y)} = \sigma_y\sqrt{1 - r^2} \qquad (32)$$

in which $\sigma_y$ is the σ of the Y-distribution, and the "(est.)" is
to distinguish this σ from the expressions σ(dis.), σ(aver.), etc.; r is,
of course, the coefficient of correlation between X and Y.

Now from equation (30) we have found that a man weighing
60 kgs. is most probably 170.9 cms. tall (see page 181).
To find the reliability of this estimate, substitute in formula
(32) to find

$$\sigma_{(\text{est. }Y)} = 6.55 \times \sqrt{1 - .60^2} = 5.2.$$

We may now say that the most probable height of a man weighing
60 kgs. is 170.9 cms. with a σ(est.) of 5.2 cms., and that
the chances are 68 in 100 that the actual height of the given
individual falls within the limits 170.9 ± 5.2, or between 165.7
cms. and 176.1 cms. We may be practically certain that the
height of this man falls within the limits 170.9 ± 3 × 5.2, or
between 155.3 cms. and 186.5 cms.
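Formulas (32) and (33), together with the "68 in 100" and "practically certain" bands, can be sketched as follows. The function name `sigma_est` is hypothetical; the band limits use σ(est.) rounded to 5.2 as in the text, and the 68-in-100 reading assumes that the errors of estimate are normally distributed.

```python
import math


def sigma_est(sigma_dep: float, r: float) -> float:
    """Standard error of estimate, formulas (32) and (33): sigma * sqrt(1 - r**2)."""
    return sigma_dep * math.sqrt(1 - r ** 2)


r = 0.60
print(f"sigma(est. Y) = {sigma_est(6.55, r):.2f} cms.")   # 5.24
print(f"sigma(est. X) = {sigma_est(7.75, r):.2f} kgs.")   # 6.20

predicted_height = 170.9   # cms., for a man weighing 60 kgs.
s = 5.2                    # sigma(est. Y), rounded as in the text
for k, meaning in ((1, "68 in 100"), (3, "practically certain")):
    low, high = predicted_height - k * s, predicted_height + k * s
    print(f"{predicted_height} +/- {k} x {s} ({meaning}): {low:.1f} to {high:.1f} cms.")
```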

In order to find with what degree of accuracy we are able
to predict X values from equation (31), we use the formula²

$$\sigma_{(\text{est. }X)} = \sigma_x\sqrt{1 - r^2} \qquad (33)$$

in which $\sigma_x$ is the σ of the X-distribution.

1 σ(est. Y) is sometimes written $S_y$.

2 σ(est. X) is sometimes written $S_x$.



We have already found from formula (31) that the most
probable weight (X) of a man 180 cms. tall is 68.7 kgs. (see
page 181). To find the σ(est. X) of this prediction, we substitute
for $\sigma_x$ and r in formula (33):

$$\sigma_{(\text{est. }X)} = 7.75 \times \sqrt{1 - .60^2} = 6.2.$$

Hence the most probable weight of a man in our group (or in
the population from which it is drawn) who is 180 cms. tall is
68.7 kgs. with a σ(est.) of 6.2 kgs. The chances are 68 in 100
that the actual weight of this man falls within the limits
68.7 ± 6.2, or between 62.5 and 74.9 kgs. We may be practically
certain that his weight falls within the limits 68.7 ± 3 ×
6.2, or between 50.1 and 87.3 kgs.

B. The Probable Error of Estimate, PE(est.)

The PE(est.) may be used for estimating the accuracy of a
prediction instead of σ(est.). PE(est.) is obtained by simply
multiplying σ(est.) by the constant .6745. Thus

$$PE_{(\text{est. }Y)} = .6745\,\sigma_y\sqrt{1 - r^2} \qquad (34)$$

and

$$PE_{(\text{est. }X)} = .6745\,\sigma_x\sqrt{1 - r^2} \qquad (35)$$

The height of a man who weighs 60 kgs. has been estimated
to be 170.9 cms. with a σ(est. Y) of 5.2 cms. The PE(est. Y) of
this estimated height is .6745 × 5.2, or 3.5 cms. The chances
are even, therefore, that the actual height of this man falls
within the limits 170.9 ± 3.5, or between 167.4 and 174.4 cms.

In like manner, since the estimated weight of a man 180
cms. tall is 68.7 kgs. with a σ(est. X) of 6.2, the PE(est. X) of this
man's weight will be .6745 × 6.2, or 4.2 kgs. The chances are
even that this man's actual weight lies within the limits
68.7 ± 4.2, or between 64.5 and 72.9 kgs.
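The constant .6745 is the normal deviate that marks off the middle 50 per cent of a normal distribution, which is why ± PE(est.) gives "even chances" (again assuming normally distributed errors of estimate). This sketch checks the constant with Python's standard `statistics.NormalDist` and repeats the two intervals above; `pe_est` is a hypothetical name.

```python
from statistics import NormalDist

# 0.6745 (to four places) is the z-value with 25 per cent of a normal
# distribution above it, so +/- 0.6745 sigma encloses the middle half.
print(f"{NormalDist().inv_cdf(0.75):.4f}")   # 0.6745


def pe_est(sigma_est_value: float) -> float:
    """Probable error of estimate, formulas (34) and (35): 0.6745 * sigma(est.)."""
    return 0.6745 * sigma_est_value


for predicted, s, unit in ((170.9, 5.2, "cms."), (68.7, 6.2, "kgs.")):
    pe = pe_est(s)
    print(f"{predicted} +/- {pe:.1f}: {predicted - pe:.1f} to {predicted + pe:.1f} {unit}")
```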

The formulas for σ(est.) and PE(est.) measure the error made
in taking predicted instead of actual X and Y scores. Note
that when r = 1.00, $\sqrt{1 - r^2}$ is 0; consequently, since both
σ(est.) and PE(est.) are then zero, there is no error of prediction.
This result follows because all of the paired scores fall on the
one double regression line when r = 1.00¹ (see page 161).

An inspection of the formulas for σ(est.) and PE(est.) shows
that the accuracy of the prediction from the regression equations
depends upon the σ's of the two distributions ($\sigma_y$
and $\sigma_x$) and upon the degree of correlation between the two
traits. If the variability in Y, say, is small, and the correlation
between Y and X high (e.g., .90 to 1.00), values of Y can be
predicted from known values of X with a comparatively high
degree of accuracy. When the variability is large or the correlation
low, however, the prediction often becomes so unreliable
as to be almost valueless; and even with a fairly high coefficient,
the error of estimate is often large enough to limit the
usefulness of a prediction seriously. Thus, in spite of the fact that an
r of .60 is usually considered fairly substantial,² we can only
predict a man's height (Y), knowing his weight (X), within a
PE(est.) of 3.5 cms. In other words, the chances are only 50
in 100 that the actual height does not differ from the predicted
height by more than ±3.5 cms.
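Because σ(est.) is σ multiplied by √(1 - r²), the loss of accuracy as r falls can be tabulated directly. In this sketch `error_fraction` is a hypothetical name, and the values of r other than .60 are illustrative, applied hypothetically to the σy of 6.55 cms. from Diagram XXIII.

```python
import math


def error_fraction(r: float) -> float:
    """sqrt(1 - r**2): the part of sigma that remains as error of estimate."""
    return math.sqrt(1 - r ** 2)


SIGMA_Y = 6.55  # sigma of heights in Diagram XXIII, cms.

# Only r = .60 comes from the text; the other values are illustrative.
for r in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00):
    k = error_fraction(r)
    print(f"r = {r:.2f}: sigma(est.) = {k:.3f} sigma, "
          f"PE(est. Y) = {0.6745 * SIGMA_Y * k:.1f} cms.")
```

Even at r = .90, √(1 - r²) is about .44, so the error of estimate is still well over two fifths of the original σ.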

When using the regression equations for prediction, the
σ(est.) or the PE(est.) should always be given. In general, the
value of a prediction will depend, in addition to the size of
the error of estimate, upon the fineness of the units of measurement
and the purposes for which the prediction is made.

VI.  The  Complete  Solution  of  a  Correlation  Problem 

In Diagram XXIV will be found the complete solution of
a second correlation problem. The purpose of another
"model" problem, in addition to the height-weight problem
in Diagram XXIII, is to strengthen the student's grasp on correlation
by having him work through the steps in finding r
and  the  regression  equations  with  a  new  set  of  data.     Often- 

1 See Monroe, An Introduction to the Theory of Educational Measurements,
1923, pp. 351-353, for a graphical demonstration of the meaning of σ(est.).

2 See, however, the discussion of high and low correlation on page 288ff.




DIAGRAM  XXIV 

To  Illustrate  the  Complete  Solution  of  a  Correlation  Problem 


[Correlation table of IQ on the second test (Y-variable, class intervals 85-89 to 155-159, Dy from -6 to 8) against IQ on the first test (X-variable), with the two regression lines drawn in. The cell frequencies and the Σx' and Σx'y' entries for the individual rows are not recoverable from the scan.]

IQ First Test (X)  90-94  95-99  100-104  105-109  110-114  115-119  120-124  125-129  130-134  135-139  140-144  145-149  150-154
Dx                    -5     -4       -3       -2       -1        0        1        2        3        4        5        6        7
FDx                  -15    -12      -24      -28      -21        0       14       28       24       44       35       24       21   (-100, +190) = 90
FDx²                  75     48       72       56       21        0       14       56       72      176      175      144      147   = 1056

Y-distribution (Dy = 8, 7, ..., -6):
FDy:  24, 14, 12, 40, 24, 21, 26, 13, 0, -19, -24, -45, -24, -15, -6   (+174, -133) = 41
FDy²: 192, 98, 72, 200, 96, 63, 52, 13, 0, 19, 48, 135, 96, 75, 36   = 1195
Σx'y' = 1012

Calculation of r:

$c_y = \frac{41}{136} = .30$, $c_y^2 = .09$, $c_y \times 5 = 1.5$, $M_y = 117.5 + 1.5 = 119$

$c_x = \frac{90}{136} = .66$, $c_x^2 = .44$, $c_x \times 5 = 3.30$, $M_x = 117.5 + 3.30 = 120.8$

$$\sigma_y = \sqrt{\frac{1195}{136} - .09} \times 5 = 2.95 \times 5 = 14.75$$

$$\sigma_x = \sqrt{\frac{1056}{136} - .44} \times 5 = 2.71 \times 5 = 13.55$$

$$r = \frac{\frac{1012}{136} - .30 \times .66}{2.95 \times 2.71} = .91 \qquad PE_r = .01 \text{ (Table XVIII)}$$

Calculation of Regression Equations:

I. Deviation Form:

$y = .91 \times \frac{14.75}{13.55}x = .99x$

$x = .91 \times \frac{13.55}{14.75}y = .84y$

II. Score Form:

$Y - 119 = .99(X - 120.8)$, or $Y = .99X - .59$

$X - 120.8 = .84(Y - 119)$, or $X = .84Y + 20.84$

Calculation of PE(est.):

$PE_{(\text{est. }Y)} = .6745 \times 14.75 \times \sqrt{1 - (.91)^2} = 4.12\ (4)$

$PE_{(\text{est. }X)} = .6745 \times 13.55 \times \sqrt{1 - (.91)^2} = 3.79\ (4)$

Examples:

Let X = 100: Y = 99 - .59, or 98 ± 4

Let X = 120: Y = 118 ± 4

Let Y = 100: X = 84 + 20.84, or 104 ± 4



times  when  only  a  single  model  problem  is  given,  one  fails 
to understand certain points in the solution which another
entirely different problem will succeed in clearing up. A brief
discussion of the important points in the solution of this problem
will be given in the following paragraphs, which the student
should read with Diagram XXIV before him.

The problem is to find the relation between the IQ's of 136
children (of the same chronological age) as determined from two
individual intelligence tests. The correlation table has been
constructed from a scatter diagram as explained on page 154.
The first set of IQ's is the X-variable, and the second set of IQ's
the Y-variable. Since the calculations of the two averages,
$c_x$, $c_y$, $\sigma_x$, and $\sigma_y$ cover familiar ground and have been given
in detail on the diagram, they need not be repeated.

Note first, then, that the product-deviations in the Σx'y'
column have been taken from column 115-119 (the column
containing the GA of the X-distribution) and row 115-119
(the row containing the GA of the Y-distribution). The
entries in the Σx'y' column have been obtained by the shorter
method described on page 167: each cell frequency in a given
row has been multiplied by its Dx, and the sum of these partial
deviations entered in the column Σx'. This entry has then been
"weighted" (multiplied) once for all by the Dy of the whole
row. To illustrate, in the first row (reading from left to right)
we have (1 × 5) + (1 × 6) + (1 × 7), or 18, as the Σx' entry. (The
Dx's are 5, 6, and 7, respectively, and may be found from the
Dx row at the bottom of the diagram.) The common Dy is 8;
hence the Σx'y' entry is 18 × 8, or 144. Again, in the eighth row,
we have (3 × -1) + (2 × 0) + (3 × 1) + (3 × 2) + (1 × 3) + (1 × 4), or
13, as the Σx' entry. The Dy of this row is 1, and hence the
Σx'y' entry is 13. To take still another example, in the eleventh
row we have (2 × -3) + (3 × -2) + (3 × -1) + (2 × 0) + (2 × 1), or
-13, as the Σx'. Since the common Dy is -2, the Σx'y' entry
here is +26.
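The shorter method for the Σx'y' column can be written as a small function. This is a sketch: `row_product_sum` and `row_product_sum_cellwise` are hypothetical names, each row is given as (frequency, Dx) pairs for its occupied cells, and only the three rows worked in the text are used, since those are the rows whose entries the text gives.

```python
def row_product_sum(cells: list[tuple[int, int]], dy: int) -> int:
    """Σx'y' entry for one row by the shorter method.

    cells: (frequency, Dx) pairs for the occupied cells of the row.
    dy:    the Dy of the whole row.
    """
    sum_x = sum(f * dx for f, dx in cells)   # the Σx' entry for the row
    return sum_x * dy                         # weighted once for all by Dy


def row_product_sum_cellwise(cells: list[tuple[int, int]], dy: int) -> int:
    """The same entry computed cell by cell, as the sum of f * Dx * Dy."""
    return sum(f * dx * dy for f, dx in cells)


# The three rows worked in the text (Diagram XXIV).
rows = {
    "first": ([(1, 5), (1, 6), (1, 7)], 8),
    "eighth": ([(3, -1), (2, 0), (3, 1), (3, 2), (1, 3), (1, 4)], 1),
    "eleventh": ([(2, -3), (3, -2), (3, -1), (2, 0), (2, 1)], -2),
}
first, eighth, eleventh = (row_product_sum(c, dy) for c, dy in rows.values())
print(first, eighth, eleventh)   # 144 13 26
```

Factoring Dy out of each row is what makes the method "shorter": one multiplication per row instead of one per cell.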

After all of the Σx'y' entries have been made and the sum of
the column found, the calculation of r from formula (23) and of
PEr from formula (26) is simply a matter of substitution.
Remember that $c_x$, $c_y$, $\sigma_x$, and $\sigma_y$ are all left in units of step-interval
in the r formula (see page 167).
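The substitution into the r and PEr formulas can be sketched from the column totals alone. We assume here that formula (23) is the product-moment formula in step-interval units, r = (Σx'y'/N - cx·cy)/(σx'·σy'), and that formula (26) is PEr = .6745(1 - r²)/√N; neither formula is printed in this section, but both reproduce the values shown on Diagrams XXIII and XXIV. The function names are hypothetical.

```python
import math


def r_from_coded_sums(n, sum_fdx, sum_fdx2, sum_fdy, sum_fdy2, sum_xy):
    """Product-moment r from a correlation table, deviations in step-interval units.

    Assumed form of formula (23); c_x and c_y are the corrections for the
    guessed averages, and both sigmas stay in step-interval units.
    """
    cx = sum_fdx / n
    cy = sum_fdy / n
    sx = math.sqrt(sum_fdx2 / n - cx ** 2)
    sy = math.sqrt(sum_fdy2 / n - cy ** 2)
    return (sum_xy / n - cx * cy) / (sx * sy)


def pe_r(r, n):
    """Assumed form of formula (26): PE of r = 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Diagram XXIV (two IQ tests, N = 136) and Diagram XXIII (height-weight, N = 120).
r_iq = r_from_coded_sums(136, 90, 1056, 41, 1195, 1012)
r_hw = r_from_coded_sums(120, 22, 294, 2, 206, 146)
print(f"IQ tests: r = {r_iq:.2f}, PE = {pe_r(r_iq, 136):.2f}")        # r = 0.91, PE = 0.01
print(f"height-weight: r = {r_hw:.2f}, PE = {pe_r(r_hw, 120):.2f}")   # r = 0.60, PE = 0.04
```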

The regression equations in Deviation Form under I
have been found by substituting the values of r, $\sigma_x$, and $\sigma_y$ in
formulas (28) and (29), and the two straight lines which these
two equations represent have been plotted on the diagram.
So far as the actual solution of the problem is concerned, it is
unnecessary to plot these lines. They are of value, however,
in indicating whether the means of the X and Y arrays may be
fairly represented by straight lines, i.e., whether the regression
is apparently "linear." If the relation is not "straight-line,"
other methods must be employed in calculating the correlation
(see page 203).

The regression equations in Score Form have been found,
the one by substituting the two averages and the regression
coefficient of Y on X (.99) in formula (30), and the other by
substituting the two averages and the regression coefficient of
X on Y (.84) in formula (31). The calculation of the two
PE's of estimate is shown on the diagram. PE(est. Y) is found
from formula (34); PE(est. X) from formula (35).

Several examples have been given in the diagram to illustrate
the use of the regression equations in "prediction."
Note that an IQ of 100 on the first test (X) is most probably
accompanied by an IQ of 98 on the second test (Y), with a
PE(est. Y) of 4.12 (4) points. The chances are 50 in 100 that
the actual IQ on the second test falls within the limits 98 ± 4,
or between 94 and 102. An IQ of 120 on the first test (X) is
most probably accompanied by an IQ of 118 on the second
test (Y), and the PE(est. Y) is again 4 points. All predicted Y's
have the same error of estimate, no matter where on the scale
the Y may fall.
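The whole chain from summary values to predictions in Diagram XXIV fits in a few lines. This is a minimal sketch using the means, σ's, and r from the diagram; as on the diagram, the slope is rounded to .99 before the intercept is found, and `predict_y` is a hypothetical name.

```python
import math

# Summary values from Diagram XXIV (two IQ tests, N = 136).
r = 0.91
mean_x, sigma_x = 120.8, 13.55   # first test (X)
mean_y, sigma_y = 119.0, 14.75   # second test (Y)

# Slope rounded to two places before the intercept is found, as on the diagram.
b_yx = round(r * sigma_y / sigma_x, 2)            # .99
a_yx = mean_y - b_yx * mean_x                     # about -.59
pe_y = 0.6745 * sigma_y * math.sqrt(1 - r ** 2)   # formula (34)
pe_x = 0.6745 * sigma_x * math.sqrt(1 - r ** 2)   # formula (35)


def predict_y(x: float) -> float:
    """Most probable IQ on the second test for a given IQ on the first test."""
    return b_yx * x + a_yx


print(f"Y = {b_yx:.2f}X {a_yx:+.2f}")
print(f"PE(est. Y) = {pe_y:.2f}, PE(est. X) = {pe_x:.2f}")   # 4.12, 3.79
for x in (100, 120):
    print(f"X = {x}: Y = {predict_y(x):.0f} +/- {pe_y:.0f}")
```

The same PE(est. Y) is attached to every prediction, as the text notes.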

While the errors of estimate σ(est.) and PE(est.) have been
used hitherto for the purpose of giving the reliability of specific
predicted scores, they may also be interpreted in a more
general fashion. A PE(est. Y) of 4 points, for instance, may be
taken to mean that one half of the IQ's in test Y differ from the
values predicted from the IQ's in test X by ±4 points or more,
while the other half differ from their predicted values by less
than ±4 points.

In most correlation problems we are interested in predicting
the scores on only one test. (Y is usually taken as the
dependent, and X the independent variable.) For illustrative
purposes, however, an example is given in Diagram XXIV of
the prediction of an IQ in X from an IQ in Y. Thus for an
IQ(Y) of 100 we find the most probable IQ(X) to be 104, with
a PE(est. X) of 3.79 (4) points. The chances are 50 in 100
that the actual IQ(X) falls within the limits 104 ± 4 points, or
between 100 and 108.

VII.  Methods   of   Measuring    Correlation   Which   Take 
Account  Only  of  Relative  Position  or  Rank 

In many problems, especially in the fields of applied and
vocational psychology, the investigator finds that he must
work with data in which differences in capacity or merit are
expressed in ranks rather than in graded scores or measures.
To mention a few cases of this sort, we have individuals ranked
in order of merit for honesty, athletic ability, salesmanship,
or intelligence; and advertisements, colors, etc., ranked for
esthetic qualities, beauty, or individual preference. In computing
correlations from such material as this it is necessary
to use methods which take account only of the relative positions
or ranks. Also, when we have only a few scores (10 to
25, for example), it is often advisable to rank these in orders
of merit and compute the correlation by a rank method instead
of by the longer and more laborious product-moment method.
Coefficients of correlation calculated from a few cases are
nearly always unreliable, and of little value except in suggesting
the possible existence of relation, or as a preliminary
survey. In such cases, therefore, simple methods are recommended,
as they save much time and labor besides giving
results which are as good as those secured by more elaborate
methods.

In  the  present  Section  we  shall  consider  two  methods  of 
finding  the  correlation  when  the  data  to  be  correlated  have 
been  arranged  in  orders  of  merit.  These  methods  are  known 
respectively  as  (1)  the  Method  of  Rank-Differences,  and  (2) 
the  Method  of  Gains  or  the  Spearman  "  Footrule." 

1.  The  Method  of  Rank-Differences 

The method of rank-differences is illustrated in Table XIX.
The problem is to find the relation between the length of
service and the selling efficiency of 12 salesmen. The men are
listed in column 1, and in column 2, opposite the name of each
man, is given the number of years he has been in the service of
the company. In column 3, the men are ranked in order of
merit in accordance with the length of their service. For
example, G, who has been longest with the company, is ranked
1; C, the next longest, is ranked 2; and so on down the list.
Notice that both A and J have the same period of service, and
that each is ranked 7.5. Instead of ranking one 7 and the
other 8, or both 7 or both 8, we compromise by ranking both 7.5,
and F, who follows, 9.¹

In column 4 the men are ranked in order of merit for efficiency
by the sales manager. The most efficient man (C) is
ranked 1, the least efficient (B) is ranked 12. In column 5,
the difference (the "D") between each man's efficiency rank
and his years-of-service rank is entered, and in the next column
(6) each of these D's is squared. The correlation between the
two orders of merit may now be computed by substituting for
ΣD² and N in the formula

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} \qquad (36)$$

1  When  three  or  more  individuals  (or  specimens  of  any  sort)  are  tied — 
have  the  same  score — the  simplest  plan  is  to  give  them  all  the  median  order  of 
merit  rating.  Thus  three  individuals  who  are  5,  6,  and  7,  respectively,  are  all 
ranked  6,  and  the  next  following  8;  while  four  individuals  who  are  5,  6,  7,  and 
8,  are  all  ranked  6.5,  and  the  next  following  9. 




TABLE XIX

To Illustrate the Rank-Difference Method of Finding Correlation

(1)        (2)        (3)           (4)            (5)              (6)
Salesmen   Years of   Order of      Order of       Difference       Difference
           Service    Merit         Merit          between          Squared
                      (Service)     (Efficiency)   Ranks (D)        (D²)
A            5          7.5            6             1.5               2.25
B            2         11.5           12              .5                .25
C           10          2              1             1.0               1.00
D            8          4              9             5.0              25.00
E            6          6              8             2.0               4.00
F            4          9              5             4.0              16.00
G           12          1              2             1.0               1.00
H            2         11.5           10             1.5               2.25
I            7          5              3             2.0               4.00
J            5          7.5            7              .5                .25
K            9          3              4             1.0               1.00
L            3         10             11             1.0               1.00
N = 12                                                                58.00

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 58}{12(143)} = .80$$

From Table XX, r = .81.

$$PE_r = \frac{.7063(1 - .81^2)}{\sqrt{12}} = .07 \qquad \text{[See formula (37)]}$$


in which D represents the difference in the rank of an individual
in the two series, and ΣD² is the sum of the squares of all such
differences. N is, of course, the number of cases, and ρ is
the rank-order coefficient of correlation. ρ may be transmuted
into a product-moment r by means of Table XX.

Substituting  58  for  2D2  and  12  for  N  in  formula  (36),  we 
obtain  a  p  of  .80,  and  from  Table  XX  this  is  found  to  be 
equivalent  to  an  r  of  .81.  The  PE  of  an  r  found  from  a  p, 
is  about  5%  larger  than  the  PE  of   the   product-moment  r.1 


The formula is

$$PE_r = \frac{.7063(1 - r^2)}{\sqrt{N}} \qquad (37)$$


and since, in the present example, r = .81, PE_r = .07. Accordingly, the coefficient of correlation, though based on only 12 cases, is conventionally reliable. Whenever N is less than 30, however, the PE_r is probably much larger than the value given by the formula. In any case, r's and PE_r's secured from fewer than 30 cases should be accepted as tentative and interpreted with caution. In the present example, all that we are justified in concluding is that in our particular group of 12 men there is evidence of a close correspondence between rankings for efficiency and number of years employed.

¹ See Brown & Thomson, Essentials of Mental Measurement, 1921, p. 103.
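The arithmetic of Table XIX can be checked mechanically. The Python sketch below is an illustration, not the author's procedure: the function names are hypothetical, tied scores receive the median (for consecutive places, the mean) of the places they occupy, as the tie footnote directs, and r = .81 is read from Table XX rather than computed. The constant .7063 in formula (37) is the familiar .6745 enlarged by about 5%, in line with the remark on the PE of an r found from a ρ.

```python
from math import sqrt

def order_of_merit(scores):
    """Rank 1 = highest score; tied scores share the mean of the places they fill."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next score ties with the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end + 2) / 2   # mean of places start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def rank_difference_rho(ranks_a, ranks_b):
    """Formula (36): rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2

def pe_r(r, n):
    """Formula (37): probable error of an r obtained from a rho."""
    return 0.7063 * (1 - r * r) / sqrt(n)

years = [5, 2, 10, 8, 6, 4, 12, 2, 7, 5, 9, 3]          # salesmen A..L
efficiency = [6, 12, 1, 9, 8, 5, 2, 10, 3, 7, 4, 11]    # column 4 of Table XIX

service = order_of_merit(years)
rho, sum_d2 = rank_difference_rho(service, efficiency)
print(service)                     # should match column 3 of Table XIX
print(sum_d2, round(rho, 2))       # 58.0 0.8
print(round(pe_r(0.81, 12), 2))    # 0.07, using r = .81 from Table XX
```

Unrounded, ρ comes out as .797, so the .80 of the text is a rounded value.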

TABLE XX

A Table to Infer the Value of r from Any Given Value of ρ

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

   ρ   r            ρ   r            ρ   r            ρ   r
 .01   .0105      .26   .2714      .51   .5277      .76   .7750
 .02   .0209      .27   .2818      .52   .5378      .77   .7847
 .03   .0314      .28   .2922      .53   .5479      .78   .7943
 .04   .0419      .29   .3025      .54   .5580      .79   .8039
 .05   .0524      .30   .3129      .55   .5680      .80   .8135
 .06   .0628      .31   .3232      .56   .5781      .81   .8230
 .07   .0733      .32   .3335      .57   .5881      .82   .8325
 .08   .0838      .33   .3439      .58   .5981      .83   .8421
 .09   .0942      .34   .3542      .59   .6081      .84   .8516
 .10   .1047      .35   .3645      .60   .6180      .85   .8610
 .11   .1151      .36   .3748      .61   .6280      .86   .8705
 .12   .1256      .37   .3850      .62   .6379      .87   .8799
 .13   .1360      .38   .3953      .63   .6478      .88   .8893
 .14   .1465      .39   .4056      .64   .6577      .89   .8986
 .15   .1569      .40   .4158      .65   .6676      .90   .9080
 .16   .1674      .41   .4261      .66   .6775      .91   .9173
 .17   .1778      .42   .4363      .67   .6873      .92   .9269
 .18   .1882      .43   .4465      .68   .6971      .93   .9359
 .19   .1986      .44   .4567      .69   .7069      .94   .9451
 .20   .2091      .45   .4669      .70   .7167      .95   .9543
 .21   .2195      .46   .4771      .71   .7265      .96   .9635
 .22   .2299      .47   .4872      .72   .7363      .97   .9727
 .23   .2403      .48   .4973      .73   .7460      .98   .9818
 .24   .2507      .49   .5075      .74   .7557      .99   .9909
 .25   .2611      .50   .5176      .75   .7654     1.00   1.0000
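The entries of Table XX agree, to within rounding in the last place, with the relation r = 2 sin(πρ/6), so the table can be regenerated, or extended to intermediate values of ρ, in a few lines. A minimal Python sketch, assuming that relation; `r_from_rho` is a hypothetical name:

```python
from math import pi, sin

def r_from_rho(rho):
    # assumed relation behind Table XX: r = 2 sin(pi * rho / 6)
    return 2 * sin(pi * rho / 6)

for rho in (0.50, 0.80, 1.00):
    print(rho, round(r_from_rho(rho), 4))
# 0.5 0.5176
# 0.8 0.8135
# 1.0 1.0
```

Since the sine is an odd function, a negative ρ simply yields the corresponding negative r.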

2.  The  Method  of  Gains,  or  the  Spearman  Footrule 

A second method of computing correlation when the data are ranked in orders of merit is the Method of Gains, or the Spearman "Footrule." Table XXI illustrates the use of the Footrule with the data taken from Table XIX. It will be noticed that the first four columns are the same in both methods, i.e., each series is first arranged in an order of merit. From here on, however, the methods differ. The entries in column 5, which is headed G ("Gains"), are found by taking the plus differences, or gains in rank, of the 12 men in the efficiency rankings as compared with their service rankings. Thus A, who ranks 7.5 in "service" and 6 in "efficiency," has an increase in rank, or gain, of 1.5 in the second ranking over the first.¹ C, F, H, I, and J likewise register plus differences, or gains, in their efficiency rankings as compared with their service rankings. The total of the G column is 10.5. Note that if we compute the gains in rank of service over efficiency instead of efficiency over service, the same total is obtained; this is shown in column 6, marked G'. The two totals must agree, since each order of merit uses the ranks 1 to 12 and so sums to 78; the gains in one direction therefore exactly balance those in the other. It makes no difference, then, whether we figure gains of the first series over the second, or the other way round, second over first.


TABLE XXI

To Illustrate the Footrule Method of Finding Correlation

(1)        (2)        (3)             (4)             (5)            (6)
           Years of   Order of Merit  Order of Merit  G (Gains)      G' (Gains)
Salesmen   Service    (Service)       (Efficiency)    (4 over 3)     (3 over 4)
A           5          7.5             6              1.5
B           2         11.5            12                              .5
C          10            2             1              1.0
D           8            4             9                             5.0
E           6            6             8                             2.0
F           4            9             5              4.0
G          12            1             2                             1.0
H           2         11.5            10              1.5
I           7            5             3              2.0
J           5          7.5             7               .5
K           9            3             4                             1.0
L           3           10            11                             1.0
                                                      10.5           10.5

$$R = 1 - \frac{6\sum G}{N^2 - 1} = 1 - \frac{6 \times 10.5}{143} = .56$$

r (Table XXII) = .79

¹ Since the rankings run from 1 to 12, a rank of 6 is to be taken as higher than a rank of 7.5.



When the sum of the G column has been obtained, the correlation may be found from the formula,

$$R = 1 - \frac{6\sum G}{N^2 - 1} \qquad (38)$$

Substituting for ΣG its value 10.5, and for N its value 12, we get an R of .56. From Table XXII this R may be converted into an equivalent product-moment r of .79. Note that this value of r compares favorably with the r (found from ρ) of .81.
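The Footrule computation of Table XXI can be sketched the same way. This is an illustrative Python sketch with a hypothetical function name; the orders of merit are copied from columns 3 and 4 of Table XXI, and a "gain" is taken to mean a move to a smaller rank number, i.e., a better place.

```python
# Orders of merit for salesmen A..L, copied from Table XXI
service = [7.5, 11.5, 2, 4, 6, 9, 1, 11.5, 5, 7.5, 3, 10]   # column 3
efficiency = [6, 12, 1, 9, 8, 5, 2, 10, 3, 7, 4, 11]        # column 4

def footrule(first, second):
    """Return (sum of gains of `second` over `first`, reverse gains, R of formula 38)."""
    gains = sum(a - b for a, b in zip(first, second) if a > b)     # column 5, G
    reverse = sum(b - a for a, b in zip(first, second) if b > a)   # column 6, G'
    n = len(first)
    return gains, reverse, 1 - 6 * gains / (n * n - 1)

g, g_prime, r_footrule = footrule(service, efficiency)
print(g, g_prime, round(r_footrule, 2))   # 10.5 10.5 0.56
```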

TABLE XXII

A Table to Infer the Value of r from Any Given Value of R

   R   r            R   r            R   r            R   r
 .00   .000       .26   .429       .51   .742       .76   .937
 .01   .018       .27   .444       .52   .753       .77   .942
 .02   .036       .28   .458       .53   .763       .78   .947
 .03   .054       .29   .472       .54   .772       .79   .952
 .04   .071       .30   .486       .55   .782       .80   .956
 .05   .089       .31   .500       .56   .791       .81   .961
 .06   .107       .32   .514       .57   .801       .82   .965
 .07   .124       .33   .528       .58   .810       .83   .968
 .08   .141       .34   .541       .59   .818       .84   .972
 .09   .158       .35   .554       .60   .827       .85   .975
 .10   .176       .36   .567       .61   .836       .86   .979
 .11   .192       .37   .580       .62   .844       .87   .981
 .12   .209       .38   .593       .63   .852       .88   .984
 .13   .226       .39   .606       .64   .860       .89   .987
 .14   .242       .40   .618       .65   .867       .90   .989
 .15   .259       .41   .630       .66   .875       .91   .991
 .16   .275       .42   .642       .67   .882       .92   .993
 .17   .291       .43   .654       .68   .889       .93   .995
 .18   .307       .44   .666       .69   .896       .94   .996
 .19   .323       .45   .677       .70   .902       .95   .997
 .20   .338       .46   .689       .71   .908       .96   .998
 .21   .354       .47   .700       .72   .915       .97   .999
 .22   .369       .48   .711       .73   .921       .98   .9996
 .23   .384       .49   .721       .74   .926       .99   .9999
 .24   .399       .50   .732       .75   .932      1.00   1.0000
 .25   .414
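The entries of Table XXII agree, to within rounding in the last place, with the relation r = 2 cos[π(1 − R)/3] − 1. Assuming that relation, the conversion is a one-line function; `r_from_footrule` is a hypothetical name.

```python
from math import cos, pi

def r_from_footrule(R):
    # assumed relation behind Table XXII: r = 2 cos(pi * (1 - R) / 3) - 1
    return 2 * cos(pi * (1 - R) / 3) - 1

for R in (0.56, 0.76):
    print(R, round(r_from_footrule(R), 3))
# 0.56 0.791
# 0.76 0.937
```

The relation gives r = 0 at R = 0 and r = 1 at R = 1, the two ends of the table.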

The Footrule formula gives a rough estimate of the correlation, and is generally less accurate than the rank-difference formula. The coefficient R "has a large, though except in the case of zero correlation, not definitely known PE; does not vary between −1 and +1; is not comparable in meaning with the product-moment coefficient; and in general has none of the merits except brevity of the formula based on the squares of the differences in rank."¹ The Footrule can be employed to advantage, however, when the data are so meager or crude as to make a more refined method a waste of time; or it may be used in a preliminary survey to determine whether there is sufficient evidence of correlation to warrant the application of the product-moment method.

¹ See Kelley, T. L., Statistical Method, 1923, p. 193.

3. Summary of the Rank Methods

The product-moment method takes account of both the size of a score and its position in the series; the rank methods take account only of the position of the items in the series. For example, individuals who score 90, 86, and 70 on a given test must be ranked 1, 2, and 3 in order of merit, despite the fact that the difference between 90 and 86 is 4, and the difference between 86 and 70 is 16. The rank methods therefore indicate the presence of relationship rather than its extent. As a convenient rule, rank methods should ordinarily be used only when N is small, say fewer than 30. Of the two rank methods, the method of rank differences is to be preferred as the more accurate.

VIII. A Method of Measuring Relationship When the
Data are Grouped into Classes or Categories.
The Contingency Method

Sometimes we need to compute correlation when the facts in which we are interested cannot be conveniently measured, but can be grouped into classes or categories. To cite a few examples of such data, we can classify eye-color as blue, grey, or brown; temper as quick, even, or slow; athletic ability as good, average, or poor, when we are unable to measure such facts exactly. The methods of computing correlation given in the preceding sections are generally applied to facts which can be measured absolutely in terms of some common unit, or which, at least, can be ranked in order of merit; they do not ordinarily apply to data which can only be grouped into classes. Several methods are available for such material, however. One of the best of these is the Contingency Method developed by Prof. Karl Pearson.¹ In the contingency method relation is expressed by C, the Coefficient of Mean Square Contingency.

Table XXIII illustrates the method of drawing up a contingency table, and shows in detail the steps involved in finding C. The problem is to discover whether there is any "resemblance" (correlation) between the eye-color of father and son. There are 1000 cases. The data are tabulated in the same way as in a correlation table. Reading down the first column, for example, we find that out of a total of 358 blue-eyed fathers, 194 have blue-eyed sons; 83, grey-eyed sons; 25, dark grey or hazel-eyed sons; and 56, brown-eyed sons. In the first row, we find 335 blue-eyed sons, of whom 194 have blue-eyed fathers; 70, grey-eyed fathers; 41, dark grey or hazel-eyed fathers; and 30, brown-eyed fathers.

¹ Yule, G. U., An Introduction to the Theory of Statistics, 1919, p. 6-iff.

After the contingency table is completed, the first step in the calculation of C is to find an "independence value" for each cell. These values, the figures in parentheses in the cells, represent the number of fathers and sons (whose eye-color is given by the column and row, respectively, in which the cell lies) whom we should expect to find in any given cell in the absence of any actual association in the eye-color of father and son. For example, the observed number of blue-eyed fathers who have blue-eyed sons in our sample of 1000 is 194. If there were no correlation between the eye-color of father and son, we should still expect to find (335 × 358)/1000, or 120, blue-eyed fathers with blue-eyed sons by the operation of chance alone.¹ Again, the observed number of grey-eyed fathers who have blue-eyed sons is 70. In the absence of any real association, chance alone would account for (335 × 264)/1000, or 88, such cases in our sample of 1000. In like manner, "independence values" may be found for each cell by the simple process of multiplying together the totals of the row and column in which the cell lies and dividing this product by N, the number of cases. (See column 1, Table XXIII.)

¹ We find that 335/1000 of all the sons are blue-eyed. This proportion should hold for the sons of all fathers if there is no dependence of son on father in respect to eye-color. Hence 335/1000 of the 358 blue-eyed fathers should have blue-eyed sons by the operation of chance alone. The same argument applies to the other "independence values" also.

TABLE XXIII

To Illustrate the Calculation of C, the Coefficient of Mean Square Contingency. [From Yule, p. 70]

Observed frequencies, with independence values in parentheses:

                              Father's Eye Color
Son's Eye Color   Blue        Grey        Hazel       Brown       Totals
Blue              194 (120)    70 (88)     41 (60)     30 (66)      335
Grey               83 (102)   124 (75)     41 (51)     36 (56)      284
Hazel              25 (49)     34 (36)     55 (25)     23 (27)      137
Brown              56 (87)     36 (64)     43 (44)    109 (48)      244
Totals            358         264         180         198          1000

Column 1. Independence values (row total × column total ÷ N):

335 × 358/1000 = 120   335 × 264/1000 = 88   335 × 180/1000 = 60   335 × 198/1000 = 66
284 × 358/1000 = 102   284 × 264/1000 = 75   284 × 180/1000 = 51   284 × 198/1000 = 56
137 × 358/1000 = 49    137 × 264/1000 = 36   137 × 180/1000 = 25   137 × 198/1000 = 27
244 × 358/1000 = 87    244 × 264/1000 = 64   244 × 180/1000 = 44   244 × 198/1000 = 48

Column 2. Each cell frequency squared and divided by its independence value:

$$S = \frac{(194)^2}{120} + \frac{(70)^2}{88} + \frac{(41)^2}{60} + \cdots + \frac{(109)^2}{48} = 1270.8$$

N = 1000        S − N = 270.8

$$C = \sqrt{\frac{S - N}{S}} = \sqrt{\frac{270.8}{1270.8}} = .462$$

When the independence values have been calculated for each cell, the next step is to square each cell entry and divide the result by the independence value of that cell (see column 2). All quotients so found are totaled to give S (1270.8), and N (1000) is subtracted to give S − N. The coefficient of mean square contingency, C, may then be found from the formula,

$$C = \sqrt{\frac{S - N}{S}} \qquad (39)$$

In the present problem, C = .462.

The steps in the computation of C may be summarized as follows:

1. Construct a contingency table as shown in Table XXIII.

2. Determine the "independence value" for each cell by multiplying together the totals of the row and column in which the cell falls and dividing this product by N.

3. Square the number found in each cell, and divide this result by the independence value of that cell obtained in (2) above.

4. Sum the quotients obtained from (3). Call this total S.

5. Subtract N from S, giving S − N.

6. Divide S − N by S and extract the square root to get C, the coefficient of mean square contingency.
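Steps 1 to 6 translate directly into code. The Python sketch below uses a hypothetical function name and the observed frequencies of Table XXIII. Unlike the table, it keeps the independence values unrounded instead of rounding each to a whole number, so S comes out near 1270.35 rather than 1270.8, and C near .461 rather than .462.

```python
from math import sqrt

# Table XXIII: rows = son's eye color, columns = father's eye color
# (blue, grey, hazel, brown in both directions)
observed = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]

def contingency_c(table):
    """Return (S, C) for a contingency table, following steps 2 to 6."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            independence = row_totals[i] * col_totals[j] / n   # step 2
            s += f * f / independence                           # steps 3 and 4
    return s, sqrt((s - n) / s)                                 # steps 5 and 6, formula (39)

s, c = contingency_c(observed)
print(round(s, 2), round(c, 3))
```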

The fundamental principle underlying the Contingency Method is a comparison of the frequency of association (number of cases) actually found in each cell with the frequency of association which we should expect to find in the cells if the traits considered were completely unrelated (independent). If there is no correlation at all between the two variables in our contingency table, C = .00; if there is perfect correlation, C approaches 1.00 as a limit.

While in general no sign is attached to C, as this coefficient simply indicates whether the two traits are associated or independent, for interpretative purposes a minus sign may be affixed to a C if an inspection of the contingency table shows that marked degrees of the one trait are found with slight degrees of the other. Thus from an inspection of Table XXIII it is evident that slight pigmentation of eyes in the father is associated with slight pigmentation of eyes in the son, and hence in the present case C is clearly positive.¹ If marked pigmentation in the eyes of the father had been associated with slight pigmentation in the eyes of the son, C would have been negative. In other words, we must determine from the contingency table whether the correlation is positive or negative; C gives simply the degree of the relation.

¹ Note, for example, that 194 blue-eyed fathers have blue-eyed sons, while only 30 brown-eyed fathers have blue-eyed sons. Also, 109 brown-eyed fathers have brown-eyed sons, while only 56 blue-eyed fathers have brown-eyed sons. Other comparisons like these will show that the association between the degree of pigmentation in the eyes of father and son is positive.

One disadvantage of the contingency method lies in the fact that C does not remain constant, for the same data, when the number of classes in the table is increased. The C calculated from a 3 × 3-fold table will not ordinarily equal the C calculated from the same data arranged in, say, a 5 × 5-fold table. Moreover, the maximum value which C can take depends on the fineness of the classification employed. Yule¹ has shown that

when the number of classes = 2, C cannot exceed .707
when the number of classes = 3, C cannot exceed .816
when the number of classes = 4, C cannot exceed .866
when the number of classes = 5, C cannot exceed .894
when the number of classes = 6, C cannot exceed .913
when the number of classes = 7, C cannot exceed .926
when the number of classes = 8, C cannot exceed .935
when the number of classes = 9, C cannot exceed .943
when the number of classes = 10, C cannot exceed .949
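Yule's limits agree with the expression C_max = √((k − 1)/k) for a k × k-fold table. It follows from formula (39): with perfect association every case lies in a diagonal cell, each such cell then contributes exactly N to S, so S = kN. A short Python check (the function name is hypothetical):

```python
from math import sqrt

def max_contingency(k):
    # perfect association in a k x k table: each diagonal cell adds N to S,
    # so S = k * N and C = sqrt((S - N) / S) = sqrt((k - 1) / k)
    return sqrt((k - 1) / k)

for k in range(2, 11):
    print(k, round(max_contingency(k), 3))
```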

Yule has suggested, in the light of these facts, that we "restrict the use of the 'coefficient of contingency' to 5 × 5-fold or finer classifications" in order that the maximum value of C may be as near unity as possible. On the other hand, we must avoid too fine a classification, or C will be affected by slight or "casual irregularities of no physical significance"; in addition, the arithmetic will be needlessly increased.

Since the classification in Table XXIII is 4 × 4-fold, the value of C would very probably change somewhat if the number of classes were increased. The table will serve very well, however, as an illustration of the method and of the arithmetic involved in finding C. Moreover, as the maximum C from a 4 × 4-fold table is .866, and the C found from Table XXIII is .462, we are justified in concluding, in spite of the relative crudeness of our measures, that there is a medium positive correlation between pigmentation of eyes in father and son.

The relation of C to r, the product-moment coefficient of correlation, is of considerable importance. C may be taken as practically equivalent to r (1) when the grouping is relatively fine, 5 × 5-fold or finer; (2) when the sample is large; and (3) when we know, or are justified in assuming, that the traits which we are correlating are normally distributed. In case the first of these conditions is not fulfilled, Pearson² has given a correction for "broad categories" which should be used with 4 × 4-fold and less fine classifications if C is to be compared with r. For 5 × 5-fold or finer classifications this correction is usually small, and unless a very accurate measure of correlation is desired it may be disregarded and C taken as roughly equal to r.

¹ Yule, G. U., An Introduction to the Theory of Statistics, 1919, p. 66.

² Pearson, Karl, On the Measurement of the Influences of "Broad Categories" on Correlation. Biometrika, Vol. IX, 1913.


TABLE XXIV

To Illustrate the Calculation of C by Short Method
Boys: Ages 4½–5½ Years

                                  Weight in Pounds
Height in Inches     24–28   29–33   34–38   39–43   44–48   49–53   Total
45–47                                    1               2               3
42–44                                    4      35      21       5      65
39–41                            5      87      90       7       1     190
36–38                    1      18      72       8                      99
33–35                    5      15       5                              25
30–32                    2                                               2
Total                    8      38     169     133      30       6     384

Column 1: (1/8)[1/99 + 25/25 + 4/2] = .3762
Column 2: (1/38)[25/190 + 324/99 + 225/25] = .3264
Column 3: (1/169)[1/3 + 16/65 + 7569/190 + 5184/99 + 25/25] = .5549
Column 4: (1/133)[1225/65 + 8100/190 + 64/99] = .4671
Column 5: (1/30)[4/3 + 441/65 + 49/190] = .2792
Column 6: (1/6)[25/65 + 1/190] = .0650

P = 2.0688        P − 1 = 1.0688

$$C = \sqrt{\frac{P - 1}{P}} = \sqrt{\frac{1.0688}{2.0688}} = .719$$



The arithmetic involved in computing C may be lessened somewhat by combining the twofold process of (1) calculating independence values and (2) dividing the square of each cell frequency by its independence value. This Short Method of finding C is illustrated in Table XXIV. Note that the first occupied cell in the first column of the table has a frequency of 1 and an independence value of (99 × 8)/384, and that the cell frequency squared and divided by the independence value is (1² × 384)/(99 × 8). This quotient, viz., (1 × 384)/(99 × 8), is the contribution of this particular cell to the total S. In like manner the contribution to S of the next cell in this column is (5² × 384)/(25 × 8); and of the third and last cell, (2² × 384)/(2 × 8). These contributions from column 1 may be combined as follows,

$$\frac{384}{8}\left(\frac{1}{99} + \frac{25}{25} + \frac{4}{2}\right)$$

and the contribution of each of the other five columns to S may be found in exactly the same way. One further simplification may be made. Since N (384) is a common factor in each column, it may be left out of the computations entirely in calculating the contribution of each cell, as shown in the table. Then if the sum of all six columns is denoted by P,

$$C = \sqrt{\frac{P - 1}{P}}$$

directly.¹

By the Short Method, C is found to equal .719, and the coefficient of correlation for the same table will be found to be .709 (see page 216). The correspondence of C and r is somewhat closer here than is generally obtained, although the difference between C and r is never very great when the conditions prescribed on page 200 have been met. In the present case, N is fairly large, the classification is 6 × 6-fold, and the distributions of both height and weight are fairly normal.

¹ Since P = S/N, S = PN. Substituting PN for S in the formula C = √((S − N)/S) gives C = √((PN − N)/PN), or, removing the common factor N, C = √((P − 1)/P).

The steps in the computation of C by the Short Method may be summarized as follows (see Table XXIV):

1. Square the frequency in each cell of column 1, and divide each square by the total of the row in which the cell falls.

2. Add all of the results for column 1, and divide by the column total, a common factor. Record this partial sum.

3. Repeat (1) and (2) for each of the other columns in the table.

4. Call the sum of all partial sums P.

5. Find C from the formula C = √((P − 1)/P).
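The Short Method steps can be checked against Table XXIV with the Python sketch below. The function name is hypothetical, and the frequencies are those of Table XXIV as set out above. Carried to full precision, the column 1 partial sum is .37626 (the table shows .3762) and P is about 2.0689, so the agreement is to within a unit or so in the fourth decimal; C still rounds to .719.

```python
from math import sqrt

# Table XXIV: rows = height classes 45-47 down to 30-32 inches,
# columns = weight classes 24-28, 29-33, ..., 49-53 pounds
table = [
    [0, 0, 1, 0, 2, 0],
    [0, 0, 4, 35, 21, 5],
    [0, 5, 87, 90, 7, 1],
    [1, 18, 72, 8, 0, 0],
    [5, 15, 5, 0, 0, 0],
    [2, 0, 0, 0, 0, 0],
]

def short_method_c(freq):
    """Return (partial sums by column, P, C) following steps 1 to 5."""
    row_totals = [sum(row) for row in freq]
    partial_sums = []
    for column in zip(*freq):
        col_total = sum(column)
        # steps 1 and 2: sum of f^2 / row total, then divide by the column total
        part = sum(f * f / row_totals[i] for i, f in enumerate(column) if f)
        partial_sums.append(part / col_total)
    p = sum(partial_sums)                        # step 4
    return partial_sums, p, sqrt((p - 1) / p)    # step 5

partials, p, c = short_method_c(table)
print([round(x, 4) for x in partials])
print(round(p, 4), round(c, 3))
```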

In  many  problems  in  psychology  in  which  the  relation 
between  various  attributes,  whether  of  individuals  or  things, 
is  sought,  C  will  prove  of  considerable  value. 


IX.  Non-Linear  Relationship 

1.  The  Correlation  Ratio 

The relation which exists between the paired values of two sets of measures X and Y may be described in a general way as either "linear" or "non-linear." When the means of the arrays of successive columns or rows in a correlation table follow straight lines (exactly or approximately), the regression is called "linear," and the relation between the two sets of measures or scores is a "straight line relation." On the other hand, when the drift or trend of the means in the successive arrays cannot be described by a straight line, but can be properly represented only by a curve of some kind, the regression is called curvilinear, or in general non-linear, and the relation between the two variables is a "curved line relation."

Our previous discussion has been concerned entirely with cases in which the relation between X and Y was known to be linear and in which r gave a fair measure of the degree of correlation. Cases sometimes arise in psychological measurement, however, in which the relation between X and Y is clearly non-linear, and in such cases the coefficient of correlation r cannot be used, since the product-moment method assumes linear relationship. The reason for this may be stated briefly as follows. When a definitely curvilinear relation is represented by a straight line instead of by a curve, the scatter of the paired values is considerably greater about the straight line than about the curve. This results from the fact that the scatter about a curve joining the means of the successive arrays is necessarily less than the scatter about a straight line which has been "fitted" to these mean points. The less the scatter about the regression line or curve, the greater the degree of correlation; hence a coefficient of correlation calculated from a correlation table in which the regression is truly curvilinear will be materially less than the true correlation between the variables X and Y. (See Footnote 1.)

¹ A simple illustration will make clear just why this is true. The correlation between the following two short series (Table XXV) by the product-moment formula (formula 25) is .93. The true correlation, however, is 1.00, i.e., perfect, since the Y values are absolutely dependent on the X values: as X increases in steps of 1 (in arithmetic progression), Y doubles (increases in geometric progression).

TABLE XXV

Variable X      1       2       3       4       5
Variable Y     .25     .50    1.00    2.00    4.00

The reason why r is less than 1.00 is perfectly obvious as soon as we plot the paired X and Y values (see Diagram XXV). Since the relationship between X and Y is curvilinear, it cannot be described by a straight line. Consequently, when straight line relationship is assumed (as in the product-moment formula), the plotted points do not fall on the relation line, and r is less than 1.00, the true correlation between X and Y. In true curvilinear correlation, r is always less than η.

[DIAGRAM XXV: the paired X and Y values of Table XXV plotted; horizontal axis, X-variable.]

In order to measure non-linear relation, therefore, we need a more generalized coefficient than the coefficient of correlation, r; that is, we need a coefficient which will measure the concentration of the paired X and Y values about a relation curve, just as r measures the concentration of the paired values about a relation line. One such coefficient is the Correlation Ratio, devised by Prof. Karl Pearson, and designated by the symbol η (eta). Since eta is a general coefficient, it may be employed when the regression is linear as well as when it is non-linear. If the regression is linear (if the means of the arrays fall on straight lines), η will equal r; if the regression is non-linear (if the means do not fall on straight lines), η will be greater than r. In general, as long as the relation between Y and X is non-linear, η and r will differ, η always being greater than r. The coefficient of correlation, therefore, is seen to be simply a limiting value of the more general η, just as straight line relationship is simply a limiting case of curvilinear relation.

η is never negative; it varies from zero to 1.00. Whether the relation given by η is positive, negative, or a varying one must be determined, however, from the direction taken by the curve of relation, i.e., by inspection of the correlation diagram.
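The footnote's figure of .93 for Table XXV is easy to confirm. A minimal Python sketch follows; the function name is hypothetical, and it computes r from deviations about the two means, which is one form of the product-moment coefficient (formula 25 itself is not reproduced in this section).

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [0.25, 0.50, 1.00, 2.00, 4.00]   # Table XXV: Y doubles as X rises by 1

def product_moment_r(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

r = product_moment_r(x, y)
print(round(r, 3))   # 0.933: high, yet short of the perfect dependence of Y on X
```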


The process of calculating η from a correlation table in which the relation is definitely non-linear is shown in Diagram XXVI. The steps involved in finding the values to be substituted in the formula for η may be outlined as follows:

Step I

Construct a correlation table as shown in Diagrams XXIII and XXIV and described on page 154.

Step II

Find the average (Y') and the σ of the Y-distribution, using the Guessed Average Method described in Chapter I.

Step III

Compute the averages (Y'x) of the successive Y-arrays, i.e., the arrays of the columns. Enter these in the row marked Y'x.

Step IV

Find the deviation of each Y'x from the average of the whole table, Y'; that is, find (Y'x − Y') for each column.

Step V

Square each deviation, each (Y'x − Y'), and enter the results in the row marked (Y'x − Y')².

Step VI

Multiply or weight each (Y'x − Y')² by the fx of its column. In the first column, for example, multiply 15.52 [i.e., (Y'x − Y')²] by 20, its fx.

Step VII

Find the sum of the fx(Y'x − Y')² values. Divide this sum by N, and extract the square root. The result is σmy, the standard deviation of the means of the various columns about the arithmetic mean of all of the Y's.

Step VIII

Divide σmy by σy to get the correlation ratio ηyx. The formula for ηyx may be written,

$$\eta_{yx} = \frac{\sigma_{my}}{\sigma_y} \qquad (40)$$
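Steps II to VIII can be condensed into a small function. This Python sketch rests on stated assumptions: the frequency table is given as rows (Y-classes) by columns (X-classes), each Y-class is represented by a single value such as its midpoint, and the σ's are computed directly from those values rather than by the Guessed Average Method of Step II. The function name and the second example table are hypothetical, and the data of Diagram XXVI are not reproduced here.

```python
from math import sqrt

def correlation_ratio_yx(freq, y_values):
    """eta_yx from a correlation table.

    freq[i][j] = number of cases in Y-class i and X-column j;
    y_values[i] = the value (e.g. class midpoint) standing for Y-class i.
    """
    n = sum(sum(row) for row in freq)
    row_f = [sum(row) for row in freq]
    # Step II: mean and sigma of the whole Y-distribution
    mean_y = sum(f * y for f, y in zip(row_f, y_values)) / n
    sigma_y = sqrt(sum(f * (y - mean_y) ** 2 for f, y in zip(row_f, y_values)) / n)
    weighted = 0.0
    for column in zip(*freq):
        f_x = sum(column)
        if f_x == 0:
            continue  # an empty column contributes nothing
        # Step III: mean of this Y-array (column)
        mean_col = sum(f * y for f, y in zip(column, y_values)) / f_x
        # Steps IV-VI: squared deviation from the grand mean, weighted by f_x
        weighted += f_x * (mean_col - mean_y) ** 2
    sigma_my = sqrt(weighted / n)   # Step VII
    return sigma_my / sigma_y       # Step VIII, formula (40)

# Table XXV laid out as a correlation table: one case in each column.
table_xxv = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
print(correlation_ratio_yx(table_xxv, [0.25, 0.50, 1.00, 2.00, 4.00]))  # 1.0

# Hypothetical table whose columns share one Y-distribution: no relation.
flat = [[2, 2], [3, 3]]
print(correlation_ratio_yx(flat, [1, 2]))  # 0.0
```

For Table XXV, η reaches 1.00 while r is only .93, which is the contrast drawn in the footnote.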

If now we substitute in formula (40) the values of σmy and σy found from Diagram XXVI, the correlation ratio ηyx


CORRELATION 


207 


o 


f 
t-1 
d 

CO 

1-3 
w 

► 

H 

o 
2 


3 

> 

H 
O 
W 
H 

cc 

l-t 

o 
> 

D 

H 
W 

H 

Q 
> 
c-1 
n 
cj 
F 

O 

o 
*j 

i-3 
M 
H 

O 
O 

SI 
H 
F 
► 

M 

o 

> 

H 
•-< 
O 


x 


i_^ 

H* 

J »— r; 

Iw 

»-; 

N 

•  " 

-S  s 

L5- 

3 

■ 

1 

1 

* 

c» 

Number 

of 

pr 

)lilcrns  i 

vo 

■ked 

¥ 

-vari 

ib 

e 

*•- 

© 

w 

to 

ti 

»» 

C31 

OS     — } 

00 

o 

o 

!_l 

IS 

CO 

*- 

CT 

II 

& 

© 

'-■■ 
to 

1 

eo 
;o 

tp- 

o 

to 

o 

e» 

•o 

C!l 

EH 

«? 

H 

iS 

1 

-=s 

II 

fej 

h'- 

Oi 

*■ 

OS 

I"1 

io 

i— 

1 

to 

as 

'© 
en 

W 

w-. 

*"* 

o 

so 

OI 

to 

<=> 

«l 

*-£ 

1 

«= 

w 

1 

1 

II 

•J2 

© 

tO 

os 

'co 
oo 

c-» 

CS 

w 
cs 

CO 

L 

1 

(O 

H» 

h-1 

< 

■"pi 

1 

e>  o 

fc  Sj 

os 

1 

1 

II 

°»lc»l 
OI  .     1 

OS 

© 

CO 

I- 

a 

fO 

J, 

|_l 

<- 

|o| 

Ci 

CO 

is 
09 

O 
OS 

*" 

o 

SB 
\ 

■x 

CO 

O 

£0 

o 

I 

co 

II 

CH 

1 

\ 

\ 

SI 

C"S 

to 

tS 

1 

<! 

jg 

< 

o 

Is 

o 

co 

SO 

~3 

to 

F' 

rs 

--i 

**■ 

vi 

V 

si 

\ 

eg 

II 

00 

00 

,1 

*» 

i_i 

\ 

N 

i 

60 

N> 

< 

.3 

*- 

*■ 

•-S 

CO 

to 

'© 

o 

iO 

-q 

as 

ffl 

^^\C5 

fS 

■-s 

>o 

— 

II 

• 

GO 

o 

CO 

so 

IP- 

p 

o 

© 

- 

00 

Ol 

^. 

^v 

3S 

rf* 

(O 

O 

>D 

< 

— i 

,-v 

iS 

H 

>*- 

o" 

id 

o 

a 

p 

SO 

CO 

OS 

°-o 

© 

>*- 

CO 

«S 

;c 

ts 

o 

^ 

^<j 

is 

~J 

o 

S. 

~* 

00 

1! 

> 

\ 

o 

Ol 

cs 

es 

ht- 

jj, 

OS 

to 

,_, 

_ 

M 

^J 

JO 

M 

M 

M 

^ 

C' 

C7I 

-5 

M 

1-1 

H- 

~" 

C5 

rf* 

ss 

*- 

<i 

oo 

n- 

Si 

ts 

o 

<c 

os 

o 

1 

o 

1 

1 

1 

OS 

1 

1 

o 

- 

ts 

OS 

*- 

Ol 

cn 

-J 

CO 

to 

b 

«: 

< 
| 

* 

< 

1 

1 

1 

co 

1 

•j0 

1 

Ol 

1 

1 

CO 

*» 

to 

*- 

<-o 

(O 

to 

^ 

-J 

Ci 

CJ' 

to 

os 

ao 

*> 

to 

to 

CO 

o 

*» 

35 

CS 

o 

ol^ 

if 

,1 

? 

1 

cs 

or|-g 

w|§ 

b 

II 

■J 

*^ 

1 

e 

t^ 

1 

l 

J-1 

tS 

CO 

II 

.*■ 

cs 

> 

o 

-= 

s> 

o 

09 

to 

,_, 

_ 

*■ 

CO 

CO 

-1 

CO 

a 

a 

s 

en 

-a 

o 

ri 

CO 

to 

lo 

o 

c 

o> 

CO 

as 

^ 

o 

O 

C^l 

^2 

CO 

s> 

0 

e 

^ 

to 

CO 

o 

p 

«s' 

T9 

I 

o 

«K 

II 

w- 

II 

EO 

to 

OS 

>< 

208       STATISTICS  IN  PSYCHOLOGY  AND  EDUCATION 

becomes  .931.1  This  coefficient  shows  how  the  number  of 
problems  worked  (on  the  average)  in  a  certain  arithmetic  test 
(F)  is  related  to  the  grade  position  (X)  of  465  pupils.  The 
curve  which  describes  this  relation — the  curve  which  best 
marks  the  trend  or  "  drift  "  of  the  means  of  the  successive  Y 
arrays — has  been  drawn  in  on  the  figure.  Note  that  it  begins 
low  and  gradually  rises,  suddenly  bending  up  in  a  concave 
fashion. 

From the diagram alone it would seem clear enough that the regression of Y on X is non-linear. Further evidence of this may be found in the fact that the coefficient of correlation, r, calculated from this table (on the assumption, of course, of linear relationship) is .80, about .13 less than η_yx. The method of determining definitely whether regression is linear or non-linear in any table will be given in (3), following.

There are always two η's in every non-linear correlation table, just as there are always two regression coefficients, r(σ_y/σ_x) and r(σ_x/σ_y), in a table in which regression is linear. The one, written η_yx, refers to the regression of Y on X (Y is the dependent variable); the other, written η_xy, refers to the regression of X on Y (X is the dependent variable). The value of η_xy may be computed in exactly the same way as η_yx by substituting X for Y in the outline of "steps" given above. The formula is

$$\eta_{xy} = \frac{\sigma_{mx}}{\sigma_x} \qquad (42)$$

Unlike r, which has the same value in both regression equations [see formulas (28) and (29)], η_yx and η_xy will usually differ, their values depending on the degree of scatter about the curves joining the means of the Y- and X-arrays. In the present problem, for example, η_xy = .818, while η_yx = .931 as shown above. In the special case in which the regression is truly linear, η_yx and η_xy equal each other, and both equal r (see page 205).

¹ The PE of η may be found from the formula

$$PE_\eta = .6745\,\frac{1 - \eta^2}{\sqrt{N}} \qquad (41)$$

or from Table XVIII.
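The difference between η_yx and η_xy, and their agreement with r when regression is linear, can be illustrated with a short Python sketch. The two tiny tables are hypothetical, chosen only to make the arithmetic transparent. In the first, Y is high at both ends of X and low in the middle, so the Y-means are perfectly predictable from X (η_yx = 1), while the X-means are the same for every Y (η_xy = 0) and r is zero. In the second, the array means lie on a straight line and all three coefficients coincide. The function names are illustrative.

```python
import math
from collections import defaultdict

def eta(cells, dependent="y"):
    """Correlation ratio from a list of (x, y, frequency) cells.

    dependent="y" gives eta_yx (spread of the Y-means of the X-arrays);
    dependent="x" gives eta_xy (spread of the X-means of the Y-arrays).
    """
    d = 1 if dependent == "y" else 0   # position of the dependent variable
    a = 1 - d                          # position of the variable defining the arrays
    n = sum(c[2] for c in cells)
    mean = sum(c[d] * c[2] for c in cells) / n
    sd = math.sqrt(sum(c[2] * (c[d] - mean) ** 2 for c in cells) / n)
    freq, total = defaultdict(int), defaultdict(float)
    for c in cells:
        freq[c[a]] += c[2]
        total[c[a]] += c[d] * c[2]
    sd_means = math.sqrt(sum(freq[k] * (total[k] / freq[k] - mean) ** 2 for k in freq) / n)
    return sd_means / sd

def pearson_r(cells):
    """Product-moment r from the same (x, y, frequency) cells."""
    n = sum(f for _, _, f in cells)
    mx = sum(x * f for x, _, f in cells) / n
    my = sum(y * f for _, y, f in cells) / n
    cov = sum(f * (x - mx) * (y - my) for x, y, f in cells) / n
    sx = math.sqrt(sum(f * (x - mx) ** 2 for x, _, f in cells) / n)
    sy = math.sqrt(sum(f * (y - my) ** 2 for _, y, f in cells) / n)
    return cov / (sx * sy)

u = [(1, 1, 1), (2, 0, 1), (3, 1, 1)]                    # Y high at both ends of X
lin = [(10, 1, 1), (10, 2, 1), (20, 2, 1), (20, 3, 1)]   # array means on a straight line
print(round(eta(u, "y"), 3), round(eta(u, "x"), 3), round(pearson_r(u), 3))        # 1.0 0.0 0.0
print(round(eta(lin, "y"), 4), round(eta(lin, "x"), 4), round(pearson_r(lin), 4))  # 0.7071 0.7071 0.7071
```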

2. The Correction of "Raw" Eta

The value of η depends materially on the number of cases in the sample, and on the fineness of the grouping. As a general rule, η should never be calculated unless N is fairly large. When N is comparatively small or the number of arrays is large, Pearson¹ has given a correction which should be applied to the "raw" (i.e., calculated) value of η.

If we represent the number of arrays by k, the formula for "corrected eta" is

$$\text{corrected } \eta^2 = \frac{\eta^2 - \dfrac{k-3}{N}}{1 - \dfrac{k-3}{N}} \qquad (43)$$

(The η on the right-hand side of the equation is the "raw" eta.) If we apply this correction to the value of η_yx obtained in the present problem, substituting .931 for η_yx, 8 (the number of Y-arrays) for k, and 465 for N, so that (k − 3)/N = 5/465 = .011, we have

$$\text{corrected } \eta_{yx}^2 = \frac{(.931)^2 - .011}{1 - .011} = \frac{.856}{.989} = .865,$$

and η_yx = .930.

In the present case the correction is very small. If N is small, however, or k large, the raw eta may be considerably reduced.
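Formula (43) is easily put into code. A minimal sketch, assuming k counts the arrays actually used (the 8 columns of Diagram XXVI here); the function name is hypothetical, and the clamp at zero for very small samples is an added assumption, not part of Pearson's formula.

```python
import math

def corrected_eta(raw_eta, k, n):
    """Pearson's correction of a raw correlation ratio, formula (43).

    raw_eta: the calculated eta; k: number of arrays; n: number of cases.
    """
    c = (k - 3) / n
    corrected_sq = (raw_eta ** 2 - c) / (1 - c)
    # Assumption of this sketch: a negative corrected square (possible when n
    # is very small and k large) is reported as zero.
    return math.sqrt(max(corrected_sq, 0.0))

print(round(corrected_eta(0.931, k=8, n=465), 3))  # 0.93
# Same raw eta, but few cases and many arrays: the correction is much larger.
print(round(corrected_eta(0.931, k=20, n=40), 3))
```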

¹ Biometrika, 1923, 14, 412-417.

3. Test for Linearity of Regression

It is often difficult to tell from the appearance of a correlation table whether the regression is linear or non-linear, and in such cases it is best to calculate both r and η. As stated above, if the regression is strictly linear, η equals r; and the greater the departure from linearity, the greater the difference between η and r. A simple test of linearity is whether ζ (zeta), the difference η² − r², differs from zero by no more than might arise from fluctuations due to random sampling. To make this test, we must first find PE_ζ, given by the formula¹



$$PE_\zeta = .6745 \times \frac{2\sqrt{\zeta}}{\sqrt{N}} \times \sqrt{(1-\eta^2)^2 - (1-r^2)^2 + 1} \qquad (44)$$

The second radical in formula (44) is approximately equal to 1, and hence unless great accuracy is required we may write the formula simply as

$$PE_\zeta = .6745 \times \frac{2\sqrt{\zeta}}{\sqrt{N}} \qquad (45)$$

In the problem which we have been considering, η_yx = .930 and r = .80. Accordingly, ζ = (.930)² − (.80)², or .2249, and from formula (45) PE_ζ = .030.² Zeta, therefore, is about 7.5 times its PE (ζ/PE_ζ = .2249/.030), and there is no doubt as to the non-linearity of the regression. To determine whether a given ζ/PE_ζ denotes a real or simply a chance difference between η² and r², Table XV may be used conveniently.

If zeta is very small, or if both η and r are small, a simple test for linearity (Blakeman's test³) which does not require finding PE_ζ may be used. According to this test, when

$$N(\eta^2 - r^2) < 11.37 \qquad (46)$$

the regression is linear. In our problem, N(η² − r²) = 104.58, and the regression is clearly non-linear.

¹ This formula is due to Blakeman. See Yule, An Introduction to the Theory of Statistics, p. 352.

² Formula (44) gives PE_ζ as .028. The difference between the results given by formulas (44) and (45) is negligible here.

³ Blakeman, J., On Tests for Linearity of Regression, Biometrika, 4, 1906, pp. 332-350.
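The linearity tests of formulas (44) to (46) can be sketched as follows, using the figures of the present problem (η_yx = .930, r = .80, N = 465). The function name is hypothetical, and the sketch assumes η² ≥ r², so that ζ is not negative.

```python
import math

def linearity_test(eta, r, n):
    """Zeta, its PE (formulas (44) and (45)), and Blakeman's N(eta^2 - r^2), formula (46).

    Assumes eta**2 >= r**2, so that zeta is not negative.
    """
    zeta = eta ** 2 - r ** 2
    if zeta < 0:
        raise ValueError("eta squared is smaller than r squared")
    pe_approx = 0.6745 * 2 * math.sqrt(zeta) / math.sqrt(n)          # formula (45)
    radical = math.sqrt((1 - eta ** 2) ** 2 - (1 - r ** 2) ** 2 + 1)
    pe_full = pe_approx * radical                                     # formula (44)
    blakeman = n * zeta                                               # formula (46)
    return zeta, pe_full, pe_approx, blakeman

zeta, pe_full, pe_approx, blakeman = linearity_test(0.930, 0.80, 465)
print(round(zeta, 4), round(pe_full, 3), round(pe_approx, 3), round(blakeman, 2))
# 0.2249 0.028 0.03 104.58
print("linear" if blakeman < 11.37 else "non-linear")  # non-linear
```

The output reproduces the .030 and .028 of the text and its footnote, and Blakeman's criterion of 104.58.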

True non-linear relation is often met with in psychophysics, and in experiments dealing with fatigue, practice, forgetting, etc. Most mental and physical tests, however, have been found to exhibit linear relationship, and in consequence r has been employed in psychology and education to a much greater extent than η. If the regression is definitely non-linear, it makes considerable difference whether η or r is taken as the measure of relation. Unless the regression is clearly curvilinear, however, little error is introduced by taking r instead of η; and this is especially true if the correlation is low.

The  coefficient  of  correlation,  r,  is  superior  to  rj  in  that 
knowing  its  value  we  can  easily  write  the  equation  from  which 
the  value  of  the  dependent  variable  may  be  estimated  from  the 
independent.  This  is  not  possible  with  the  correlation-ratio. 
In  order  to  estimate  one  variable  from  the  other  in  non-linear 
relation,  a  curve  must  be  fitted  to  the  means  of  the  arrays  of 
the columns or rows.¹

¹ The subject of curve fitting is fully dealt with in more advanced books on statistics. See Jones, D. C., A First Course in Statistics, 1921, Chaps. XV, XVI, and XVII, for a fairly elementary discussion.

X. The Correction of a Coefficient of Correlation for "Attenuation"

The accuracy of any series of test scores or other measures of capacity is always conditioned by the number and size of the chance variations, or "errors of observation," present. The term "errors of observation" may be taken to include slight changes in technique and procedure on the part of the experimenter, as well as variations in the subjects due to fatigue, distraction, shifts in attention or attitude towards the test, and other minor fluctuations of different sorts. If the number of observations is large, errors of observation, since their effect is as likely to be in the negative as in the positive direction, will tend in the long run to cancel each other out as far as the average is concerned. Such errors, however, always tend to increase the σ of the distribution, and to decrease or "attenuate" a coefficient of correlation calculated between series in which they are present. For this reason, it is generally advisable to correct raw r's for observational errors, and special formulas have been devised to rule out their effect.¹

It is first necessary to make at least two independent measures of each capacity, and to find the self-correlation of each test.² This done, the r corrected for attenuation may be found from formula (47) given below. The complete procedure is as follows:

Let A and B represent the tests to be correlated.

Let A1 represent the 1st series of scores obtained in A.

Let A2 represent the 2nd series of scores obtained in A.

Let B1 represent the 1st series of scores obtained in B.

Let B2 represent the 2nd series of scores obtained in B.

Let r_AB represent the "true" correlation between tests A and B.

Let r_A1A2 represent the self-correlation of test A.

Let r_B1B2 represent the self-correlation of test B.

Let r_A1B2 represent the obtained correlation between A1 and B2.

Let r_A2B1 represent the obtained correlation between A2 and B1.

Then³

$$r_{AB} = \frac{\sqrt{r_{A_1B_2}\; r_{A_2B_1}}}{\sqrt{r_{A_1A_2}\; r_{B_1B_2}}} \qquad (47)$$

¹ See the two articles by C. Spearman: (a) The Proof and Measurement of Association between Two Things, American Journal of Psychology, 1904, Vol. XV, pp. 72-101; and (b) Demonstration of Formulae for True Measurement of Correlation, American Journal of Psychology, 1907, Vol. XVIII, pp. 161-169.

² See page 288.

³ See Yule, An Introduction to the Theory of Statistics, pp. 213-214, for discussion of this formula.



To illustrate the formula, suppose that A is a Following Directions Test and B a Mixed Relations Test, and that

r_A1A2 = .72    r_B1B2 = .75
r_A1B2 = .35    r_A2B1 = .42

Substituting in formula (47), we have

$$r_{AB} = \frac{\sqrt{.35 \times .42}}{\sqrt{.72 \times .75}} = .52;$$

or, correcting for observational errors, we raise the correlation from .35 and .42 (the obtained r's) to .52.

If we have only the one correlation between two given tests A and B, so that formula (47) is inapplicable, it is still possible to obtain an approximate correction for attenuation by dividing the "raw" coefficient by the geometric mean of the two "reliability coefficients."¹ Formula (47) then becomes

$$r_{AB} = \frac{r_{\text{raw}}}{\sqrt{r_{A_1A_2}\; r_{B_1B_2}}} \qquad (48)$$

where r_raw is the single obtained coefficient. Thus if the obtained correlation between tests A and B above had been .50, and the reliability coefficients, as before, .72 and .75, we could correct (approximately) for attenuation as follows:

$$r_{AB} = \frac{.50}{\sqrt{.72 \times .75}} = .68.$$

Corrected for attenuation, the obtained coefficient is increased from .50 to .68.
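Both corrections for attenuation reduce to a few lines of Python. A minimal sketch with hypothetical function names, reproducing the two examples above. Nothing in formulas (47) and (48) keeps the corrected value below 1.00 when the reliability coefficients are low, and the sketch does not cap it.

```python
import math

def attenuation_two_trials(r_a1b2, r_a2b1, r_a1a2, r_b1b2):
    """Formula (47): geometric mean of the cross-trial r's over that of the self-correlations."""
    return math.sqrt(r_a1b2 * r_a2b1) / math.sqrt(r_a1a2 * r_b1b2)

def attenuation_one_r(r_raw, r_a1a2, r_b1b2):
    """Formula (48): approximate correction when only one obtained r is available."""
    return r_raw / math.sqrt(r_a1a2 * r_b1b2)

print(round(attenuation_two_trials(0.35, 0.42, 0.72, 0.75), 2))  # 0.52
print(round(attenuation_one_r(0.50, 0.72, 0.75), 2))             # 0.68
```

The same function reproduces the answer .80 given for Problem 8 when called with the cross-trial r's .55 and .50 and the self-correlations .60 and .72.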


¹ See Spearman, C., American Journal of Psychology, 1904, Vol. XV, p. 271.

XI. Summary of Formulas Used in This Chapter

1. For Product-Moment r, deviations from GA's

$$r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (23)$$



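2. For Product-Moment r, deviations from actual averages

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y} \qquad (24)$$

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2\,\Sigma y^2}} \qquad (25)$$

3.

$$PE_r = .6745\,\frac{1 - r^2}{\sqrt{N}} \qquad (26)$$

4.

$$PE_{(\text{diff. } r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2} \qquad (27)$$

5. Regression Equations in Deviation Form

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x \qquad (28)$$

$$x = r\,\frac{\sigma_x}{\sigma_y}\,y \qquad (29)$$

6. Regression Equations in Score Form

$$Y = r\,\frac{\sigma_y}{\sigma_x}\,(X - X') + Y' \qquad (30)$$

$$X = r\,\frac{\sigma_x}{\sigma_y}\,(Y - Y') + X' \qquad (31)$$

7. Standard Errors of Estimate

$$\sigma_{(\text{est. } Y)} = \sigma_y\sqrt{1 - r^2} \qquad (32)$$

$$\sigma_{(\text{est. } X)} = \sigma_x\sqrt{1 - r^2} \qquad (33)$$

$$PE_{(\text{est. } Y)} = .6745\,\sigma_y\sqrt{1 - r^2} \qquad (34)$$

$$PE_{(\text{est. } X)} = .6745\,\sigma_x\sqrt{1 - r^2} \qquad (35)$$

8. Correlation Measured from "Ranks"

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} \qquad (36)$$

$$PE_\rho = .7063\,\frac{1 - \rho^2}{\sqrt{N}} \qquad (37)$$

$$R = 1 - \frac{6\,\Sigma G}{N^2 - 1} \qquad (38)$$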



9. Coefficient of Mean Square Contingency, C

$$C = \sqrt{\frac{\phi^2}{1 + \phi^2}} \qquad (39)$$

where φ² is the mean square contingency.

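10. Non-linear Regression

$$\eta_{yx} = \frac{\sigma_{my}}{\sigma_y} \qquad (40)$$

$$PE_\eta = .6745\,\frac{1 - \eta^2}{\sqrt{N}} \qquad (41)$$

$$\eta_{xy} = \frac{\sigma_{mx}}{\sigma_x} \qquad (42)$$

$$\text{corrected } \eta^2 = \frac{\eta^2 - \dfrac{k-3}{N}}{1 - \dfrac{k-3}{N}} \qquad (43)$$

$$PE_\zeta = .6745 \times \frac{2\sqrt{\zeta}}{\sqrt{N}} \times \sqrt{(1-\eta^2)^2 - (1-r^2)^2 + 1} \qquad (44)$$

$$PE_\zeta = .6745 \times \frac{2\sqrt{\zeta}}{\sqrt{N}} \quad \text{(approximately)} \qquad (45)$$

$$N(\eta^2 - r^2) < 11.37 \qquad (46)$$

11. Correction for Attenuation

$$r_{AB} = \frac{\sqrt{r_{A_1B_2}\; r_{A_2B_1}}}{\sqrt{r_{A_1A_2}\; r_{B_1B_2}}} \qquad (47)$$

$$r_{AB} = \frac{r_{\text{raw}}}{\sqrt{r_{A_1A_2}\; r_{B_1B_2}}} \qquad (48)$$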
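Formulas (36) and (38) can be sketched in Python and tried on the data of Problem 4 below. The sketch assumes that tied scores share the average of the ranks they occupy, that the best score gets rank 1 (the highest Alpha score, the fewest seconds in Cancellation), and that ΣG is the sum of the positive rank differences (as in Spearman's "footrule"). The function names are hypothetical.

```python
def average_ranks(scores, best_is_highest=True):
    """Rank scores 1..N (1 = best), giving tied scores the mean of their positions."""
    ordered = sorted(scores, reverse=best_is_highest)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + j + 2) / 2  # mean of positions i+1 .. j+1
        i = j + 1
    return [rank_of[s] for s in scores]

def rank_difference_rho(r1, r2):
    """Formula (36): rho = 1 - 6 * sum(D^2) / (N (N^2 - 1))."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def gains_r(r1, r2):
    """Formula (38): R = 1 - 6 * sum(G) / (N^2 - 1), G = positive rank differences."""
    n = len(r1)
    sum_g = sum(max(a - b, 0) for a, b in zip(r1, r2))
    return 1 - 6 * sum_g / (n ** 2 - 1)

# Problem 4 data, individuals Kp, My, Le, ..., Ws in order.
alpha = [185, 203, 188, 195, 176, 174, 158, 197, 176, 138, 126, 160, 151, 185, 185]
cancel = [110, 98, 118, 104, 112, 124, 119, 95, 94, 97, 110, 94, 126, 120, 118]
r_alpha = average_ranks(alpha)                           # highest score = rank 1
r_cancel = average_ranks(cancel, best_is_highest=False)  # fewest seconds = rank 1
print(round(rank_difference_rho(r_alpha, r_cancel), 3), round(gains_r(r_alpha, r_cancel), 2))
# 0.187 0.09
```

Both values agree with the answers given for Problem 4.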

PROBLEMS 

1. Find the coefficient of correlation (product-moment) between the following sets of Army Alpha and typewriting scores made by 100 students in a typewriting class. The typewriting scores are in number of words written per minute (with certain penalties). In tabulating scores, let typing be the Y-variable and Alpha the X-variable. Take the Y-step as 5 and the X-step as 10 units.

Typing (Y)  Alpha (X)    Typing (Y)  Alpha (X)    Typing (Y)  Alpha (X)
46          152          26          164          40          120
31          96           33          127          36          140
46          171          44          144          43          141
40          172          35          160          48          143
42          138          49          106          45          138
41          154          40          95           58          149
39          127          57          146          23          142
46          156          23          175          45          166
34          156          51          126          44          138
48          133          35          120          47          150
48          173          41          154          29          148
38          134          28          146          46          166
26          179          32          154          46          146
37          159          50          159          39          167
34          167          29          175          49          139
51          136          41          164          34          183
47          153          32          111          41          150
39          145          49          164          49          179
32          134          58          119          31          138
37          184          35          160          47          136
26          154          48          149          40          172
40          90           40          149          30          145
53          143          43          143          40          109
46          173          38          159          38          158
39          168          37          157          29          115
52          187          41          153          43          93
47          166          51          149          55          163
31          172          40          163          37          147
33          189          35          175          52          169
22          147          31          133          38          75
46          150          23          178          39          152
44          150          37          168          32          159
37          143          46          156          42          150
31          133

2. In the correlation table¹ given below, find

(a) the coefficient of correlation, and PE_r;

(b) the regression equations in score form, and the standard errors of estimate.

(c) What is the most probable height of a boy who weighs 30 pounds? 45 pounds?

¹ See Table XXIV for the C worked out for these data.


Boys: Ages 4.5 to 5.5 Years

                              Weight in Pounds (X)
Height in     24-28   29-33   34-38   39-43   44-48   49-53   Totals (Fy)
Inches (Y)
45-47                         1               2               3
42-44                         4       35      21      5       65
39-41                 5       87      90      7       1       190
36-38         1       18      72      8                       99
33-35         5       15      5                               25
30-32         2                                               2
Totals (Fx)   8       38      169     133     30      6       384

3. In the following correlation table,¹ find

(a) the coefficient of correlation, and the PE_r.

(b) What is the most probable grade of a pupil who makes 120 on Alpha?


                              Army Alpha IQ's
School        84 and  85-     90-     95-     100-    105-    110-    115-    120-    125 and Totals
Marks         lower   89      94      99      104     109     114     119     124     over
90 and over                           3       3       15      12      9       9       5       56
85-89                                 8       17      15      24      13      6       6       89
80-84                         4       6       22      21      20      10      5       1       89
75-79                         7       25      33      23      10      7       4               109
70-74                 4       10      18      14      22      12      1       1               82
65-69         1       3       3       12      7       8       8       1                       43
60-64                         2       5       3       1       1                               12
Totals        1       7       26      77      99      105     87      41      25      12      480

¹ From Otis, Statistical Methods in Educational Measurement, 1925, p. 315.



4. Find the correlation between the following test scores by

(a) the Rank-Difference Method, and

(b) the Method of Gains.

Individual   Intelligence Score   Cancellation Score
             (Alpha)              (A test + Number Group Checking Test)
Kp           185                  110
My           203                  98
Le           188                  118
Hy           195                  104
Sh           176                  112
Ld           174                  124
Sn           158                  119
St           197                  95
Wn           176                  94
Pe           138                  97
Gr           126                  110
Bn           160                  94
Gm           151                  126
Ly           185                  120
Ws           185                  118

(Note. Since the Cancellation scores are in seconds, the highest score (94) is numerically the lowest.)

5. Compute the coefficient of contingency, C, for the two tables given below, which show:

A. The resemblance between brothers in athletic capacity.¹

B. The resemblance between fathers and sons in temperament.²
185 

A
                Athletic Capacity: First Brother
Second Brother  Athletic      Betwixt       Non-athletic  Totals
Athletic        906           20            140           1066
Betwixt         20            76            9             105
Non-athletic    140           9             370           519
Totals          1066          105           519           1690

¹ From Yule, An Introduction to the Theory of Statistics, p. 74, after Pearson.

² From Brown and Thomson, The Essentials of Mental Measurement, 1921, p. 125. The coefficient of contingency is not usually calculated for tables having less than a 5 × 5-fold classification. These tables, however, will illustrate the method in a simple way.


B
               Fathers
Sons           Merry        Melancholy   Alternating  Even         Totals
Merry          122          8            81           67           278
Melancholy     10           2            7            10           29
Alternating    70           9            101          68           248
Even           58           6            66           45           175
Totals         260          25           255          190          730

6. The following correlation table gives the relation between the scores on the Thorndike College Entrance Intelligence Examination and the extra-curricular activities of 102 Columbia College students.¹

(a) Find η_yx for this table.

(b) Find r, and test the regression of Y on X for linearity.

Thorndike  Scores  (X) 


55- 
59 

60.- 
64 

65- 
69 

70- 

74 

75- 
79 

80- 

84 

85- 
89 

90- 
94 

95- 
99 

100- 
104 

Fy 

^ 

18-20 

2 

2 

4 

02 

]+3 

15-17 

2 

3 

1 

6 

> 

•|-H 

< 

c3 

12-14 

4 

6 

2 

2 

14 

9-11 

1 

2 

4 

4 

6 

7 

3 

27 

3 

6-8 

1 

6 

2 

2 

6 

2 

4 

1 

24 

3 
o 
i 

o3 

3-5 

1 

1 

3 

5 

3 

5 

1 

1 

20 

0-2 

1 

1 

1 

1 

1 

2 

7 

Totals 

Fx 

2 

2 

3 

16 

13 

20 

16 

15 

11 

4 

102 

¹ From Sommerville, R. C., Physical, Motor, and Sensory Traits, Archives of Psychology, 1924, 75, p. 101.



7. Verify the correlation-ratio η_xy of .82 given for Diagram XXVI (see page 209).

(a) Test the regression of X on Y for linearity.

(b) Plot the regression line (or curve) on the diagram.

8. Ma is the series of scores from one trial of a memory test.
Mb is the series of scores from a second trial of the same test.
Aa is the series of scores from one trial of an association test.
Ab is the series of scores from a second trial of this test.

The r's are as follows:

between Ma and Mb, .60;

between Mb and Aa, .50;

between Ma and Ab, .55;

between Aa and Ab, .72.

Find the r between M and A corrected for attenuation.

Answers

1. r = −.05; PE_r = .07.

2. (a) r = .709; PE_r = .017.

(b) Y = .4X + 24.42; X = 1.26Y − 11.66.

σ_(est. Y) = 1.79; σ_(est. X) = 3.18.

(c) 36.42 inches; 42.42 inches.

3. (a) r = .455; PE_r = .024.

(b) 85.4, with a PE_(est. Y) of 4.75.

4. (a) ρ = .187; r = .19; PE_r = .18.
(b) R = .09; r = .16.

5. A. C = .68. B. C = .16.

6. (a) η_yx = .43; η_yx (corrected) = .36.

(b) r = −.09. The regression is almost certainly non-linear.

8. r = .80.
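The product-moment r of Problem 2 can be checked from the grouped table with formula (25), placing each case at the midpoint of its class interval. This is a minimal sketch: it assumes integer class limits (so the midpoint of 24-28 pounds is 26 and of 30-32 inches is 31), and the function name is hypothetical. The result agrees with the answer r = .709 to within rounding, and formula (26) gives the PE_r of .017.

```python
import math

def grouped_r(table, x_mids, y_mids):
    """Product-moment r, formula (25), from a grouped correlation table.

    table[i][j] is the frequency in Y-class i (midpoint y_mids[i]) and
    X-class j (midpoint x_mids[j]); every case sits at its class midpoint.
    """
    cells = [(x, y, f)
             for row, y in zip(table, y_mids)
             for f, x in zip(row, x_mids) if f]
    n = sum(f for _, _, f in cells)
    mean_x = sum(x * f for x, _, f in cells) / n
    mean_y = sum(y * f for _, y, f in cells) / n
    sxy = sum(f * (x - mean_x) * (y - mean_y) for x, y, f in cells)
    sxx = sum(f * (x - mean_x) ** 2 for x, _, f in cells)
    syy = sum(f * (y - mean_y) ** 2 for _, y, f in cells)
    return sxy / math.sqrt(sxx * syy), n

# Problem 2: rows are heights 45-47 down to 30-32 inches,
# columns are weights 24-28 up to 49-53 pounds.
table = [
    [0, 0, 1, 0, 2, 0],
    [0, 0, 4, 35, 21, 5],
    [0, 5, 87, 90, 7, 1],
    [1, 18, 72, 8, 0, 0],
    [5, 15, 5, 0, 0, 0],
    [2, 0, 0, 0, 0, 0],
]
x_mids = [26, 31, 36, 41, 46, 51]
y_mids = [46, 43, 40, 37, 34, 31]
r, n = grouped_r(table, x_mids, y_mids)
pe_r = 0.6745 * (1 - r ** 2) / math.sqrt(n)  # formula (26)
print(round(r, 4), round(pe_r, 3))  # 0.7095 0.017
```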


CHAPTER V
PARTIAL AND MULTIPLE CORRELATION¹

I.  The  Meaning  of  Partial  and  Multiple  Correlation 

The coefficient of correlation between sets of test scores (or other series of measures) often represents not simply the degree of relationship existing between these measures in themselves, but the degree of this relation plus the indirect effect of other factors to which they are both related. For this reason, in measuring the correlation between two sets of measures, it is necessary that we eliminate or rule out as far as possible those uncontrolled factors which, through their common relation to the measures to be correlated, tend to raise or lower the "net" correlation. As an illustration of the effect on correlation of uncontrolled factors, suppose that the correlation between intelligence (i) and age (a) in a large group of children whose ages range from 7 to 14 years is r_ia; that the correlation between school achievement (s) and age (a) in the same group is r_sa; and that the correlation between intelligence (i) and school achievement (s) is r_is. Now this last coefficient, r_is, is not simply a measure of the influence of intelligence on school achievement, but is a measure of the influence of intelligence, plus the indirect effect of differences in age, on school achievement. In order to determine the relation between intelligence and school achievement uninfluenced by the age factor, it is necessary to rule out the effect of age differences. This can be accomplished in two ways: (1) by selecting children all of whom are of the same age, or (2) by finding a "partial" coefficient of correlation between intelligence and school standing. Such a partial coefficient is written r_is.a, and may be thought of as giving the net correlation between intelligence and school achievement for children of the same age, or as the net correlation between intelligence and school achievement with age constant. In short, a coefficient of partial correlation may be said to represent the net relation between two variables when one or more other variables which might increase or decrease the true correlation have been ruled out or held constant.

¹ The discussion of partial and multiple correlation given in this chapter follows Yule in method and nomenclature.

In  addition  to  its  value  as  a  device  whereby  we  are  able 
to  control  conditions  by  ruling  out  disturbing  factors,  partial 
correlation  is  highly  important  also  in  that  it  enables  us  to  build 
up  regression  equations  involving  three  or  more  variables  from 
which  a  test  score  (or  other  measure)  may  be  predicted  when 
we  know  the  corresponding  scores  made  on  the  other  tests. 
The  value  of  the  regression  equation  in  estimating  scores — its 
accuracy  as  a  predicting  instrument — may  be  determined  from 
the  "  multiple "  coefficient  of  correlation.1  This  coefficient 
gives  the  correlation  between  the  scores  actually  obtained  on  a 
given  test,  and  the  scores  on  the  same  test  predicted  by  the  re- 
gression equation  from  the  scores  made  on  two  or  more  correlated 
tests.  The  multiple  coefficient  of  correlation  may  be  thought 
of  also  as  giving  the  correlation  between  a  trait  (or  traits)  as 
measured  by  a  single  test,  and  the  same  trait  (or  traits)  as 
measured  by  a  number  of  tests  taken  together.  (The  multiple 
coefficient  will  be  best  understood  by  working  through  an  actual 
problem.) 

To summarize briefly, partial and multiple correlation may be considered as representing an important extension of the theory and technique of "simple" or two-variable correlation to include problems which involve three or more variables.

¹ σ(est.) also gives the accuracy of the regression equation in predicting single scores. (See page 183.)



II.  A  Correlation  Problem  Involving  Three  Variables 

The  simplest  and  most  straightforward  approach  to  an 
understanding  of  the  value  of  the  method  of  partial  and  mul- 
tiple correlation  and  of  the  technique  involved  is  by  way  of  an 
illustration.  In  the  present  section,  therefore,  is  shown  the 
application  of  partial  and  multiple  correlation  to  a  three-vari- 
able problem;  and  following  this,  the  general  formulas  and 
some  further  applications  of  the  method  are  considered. 

The  problem  selected  (Table  XXVI)  is  taken  from  a  study 
made by Professor Mark May¹ of the factors which influence
"  academic  success."  In  that  part  of  his  study  from  which  our 
example  is  taken,  May  wished  to  find  how  accurately  he  could 
"  predict "  the  academic  success  or  scholastic  achievement  of  450 
Syracuse  freshmen  from  a  knowledge  of  their  general  intelligence 
and  study  habits.  Academic  success  was  defined  specifically  as 
the  number  of  "  credit "  or  "honor"  points  obtained  by  a  student 
at  the  end  of  his  first  semester  in  college.  The  number  of  honor 
points  secured  depends  on  the  number  of  A,  B,  and  C  grades 
made  by  the  student  in  his  courses.  Thus  a  grade  of  A  carries  3 
honor  points;  a  grade  of  B,  2  honor  points;  a  grade  of  C,  1 
honor  point ;  and  a  grade  of  D,  which  is  a  passing  mark,  carries 
no  honor  point  credit.  The  maximum  number  of  points  which 
a  freshman  taking  the  "  regular  "  course  can  obtain  in  one 
semester  is  48. 

General  intelligence  was  measured  by  a  combination  of  the 
Miller  Mental  Ability  Test,  and  the  Dartmouth  Completion 
of  Definitions  Test.  The  Miller  Test  contains  120  items  and 
the  Dartmouth  Test  40,  so  that  the  maximum  "  raw  score  " 
was  160.  The  scores  of  the  450  students  ranged  from  50  to 
150,  the  distribution  being  fairly  normal. 

As  a  measure  of  industry  and  application,  it  was  decided  to 
take  the  number  of  hours  per  week  spent,  on  the  average,  in 
study.     Information  in  regard  to  study  habits  was  obtained 

1  May,  Mark  A.,  Predicting  Academic  Success,  Journal  of  Educational  Psy- 
chology, 1923,  Vol.  XIV,  7,  pp.  429-440. 



by  means  of  a  questionnaire  given  at  the  beginning  and  at  the 
middle  of  the  first  semester.  Among  other  items  of  informa- 
tion asked  for  in  the  questionnaire  were  such  things  as  the 
number  of  hours  spent  per  week  at  meals,  in  sleeping,  etc.  In 
this  way  an  attempt  was  made  to  have  the  student  think  that 
he  was  being  checked  up  on  the  distribution  of  his  total  time, 
and  not  on  his  study  habits  alone.  The  self-correlation  between 
the  two  statements— number  of  hours  spent  in  study — on  the 
first  and  second  questionnaires  was  .86,  which  indicates  a  very 
satisfactory  degree  of  reliability. 

As  previously  stated,  the  main  object  of  this  study  was  to 
find  how  accurately  the  number  of  honor  points  which  a  student 
receives  can  be  predicted  from  a  knowledge  of  his  study 
habits  and  his  general  intelligence.1  In  solving  this  problem, 
however,  it  is  necessary  to  find  the  partial  coefficient  which 
shows  to  what  extent  honor  points  are  related  to  general 
intelligence  when  the  variable  factor  of  study-hours  per  week 
is  held  constant;  and  also  the  partial  coefficient  which  shows 
to  what  extent  honor  points  are  related  to  study-hours  when 
the  variable  factor  of  general  intelligence  is  held  constant. 
This  information,  in  itself,  will  prove  to  be  of  considerable 
interest. The solution of the whole problem is given in the following series of steps; the necessary data and statistics will be found in Table XXVI.

Step I. Note that the mean and σ of each series of measures, and the intercorrelations, are first calculated. These intercorrelations are the usual product-moment r's, computed as shown in Chapter IV. The r between (1) honor points and (2) general intelligence, written r_12, is .60; the r between (1) honor points and (3) number of study hours, written r_13, is .32; and the r between (2) general intelligence and (3) number of study hours, i.e., r_23, is −.35. The low correlation between honor points and study-hours is of considerable interest;
1  Other  factors,  of  course,  such  as  health,  personality,  previous  preparation, 
etc.,  are  of  considerable  importance  in  determining  honor  points  as  May  indicates 
in  his  article.  The  two  factors  selected  were  chosen  simply  because  they  are 
not  only  important,  but  also  objective  and  measurable. 


PARTIAL  AND  MULTIPLE  CORRELATION  225 

but  probably  the  most  interesting  r  is  the  —  .35  between  study- 
hours  and  general  intelligence.  Evidently,  the  brighter  the 
student,  the  less  he  studies! 

Step II. The next step is to calculate the "net" correlation between (1) honor points and (2) general intelligence with the influence of (3) study-hours "partialed" out or held constant. This net, or partial, coefficient of correlation is written r12.3. The formula¹ for r12.3 is

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad \text{[Formula (49), page 232]}$$

Substitution of the values of r12, r13, and r23 in the formula gives r12.3 a value of .802. This means that if all of our 450 students studied exactly the same number of hours per week (i.e., if the number of study hours were constant), the coefficient of correlation between honor points earned and general intelligence scores would be .802 instead of .60, the obtained coefficient r12. In other words, if each student spent the same number of hours in study, there would be a much closer correspondence between general intelligence and honor points than there is when the number of study hours varies.
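Formula (49) for a first-order partial r is easy to check by machine. The short Python sketch below is illustrative only: the function name `partial_r` is our own, and the three zero-order r's are those of Table XXVI. It recomputes all three partial coefficients discussed in this step.

```python
import math


def partial_r(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial r between a and b with c held constant (formula 49)."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))


# Zero-order r's of Table XXVI: (1) honor points, (2) general intelligence, (3) study hours.
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = partial_r(r12, r13, r23)  # honor points and intelligence, study hours constant
r13_2 = partial_r(r13, r12, r23)  # honor points and study hours, intelligence constant
r23_1 = partial_r(r23, r12, r13)  # intelligence and study hours, honor points constant

print(f"r12.3 = {r12_3:.3f}   r13.2 = {r13_2:.3f}   r23.1 = {r23_1:.3f}")
```

Run as written, it prints 0.802, 0.707 and -0.715, agreeing with Table XXVI.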

The partial coefficient of correlation between (1) honor points and (3) hours spent in study for (2) general intelligence constant is given by the formula

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \qquad \text{[Formula (49)]}$$

Substitution of the values of r13, r12 and r23 gives a partial coefficient r13.2 = .707 as against a "raw" coefficient, r13, of .32. It is evident, therefore, that if our group were of the same degree of general intelligence² there would be a much closer correspondence between the number of honor points received and the number of hours spent in study than there is when the members of the group possess varying degrees of general intelligence; and this is certainly the result to be expected.

¹ The general formulas from which this and other formulas used in this section are derived will be found in Section III following.

² By "same degree of general intelligence" is meant the same score on the given general intelligence tests.


226       STATISTICS  IN  PSYCHOLOGY  AND  EDUCATION

The last partial coefficient of correlation, r23.1, is −.715. This coefficient gives the net correlation between (2) general intelligence and (3) study-hours for (1) honor points held constant, and is found from the formula

$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} \qquad \text{[Formula (49)]}$$

Like the two partial r's above, r23.1 may be interpreted to mean that the correlation between general intelligence score and hours spent in study, in a group in which every student has earned the same number of honor points, would be much higher (negatively) than the raw correlation between these same two factors in a randomly selected group, that is, a group in which the number of honor points received by different students varies. Thus we discover that the brighter students not only study less than the average and dull (since r23 = −.35), but that the brighter the student, the less he needs to study in order to reach a given standard of academic success, i.e., to secure a given number of honor points (since r23.1 = −.715).
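The phrase "held constant" can also be read another way, a standard property of partial correlation that the text does not itself state: r12.3 equals the ordinary r between the parts of X1 and X2 left over after each has been estimated by a straight line from X3. The sketch below checks this on made-up scores. The data-generating numbers and the helper names `pearson` and `residuals` are hypothetical assumptions, not the study's data.

```python
import math
import random


def pearson(x, y):
    """Product-moment r between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """What is left of y after subtracting its least-squares straight-line estimate from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


# Hypothetical synthetic scores for 450 students; NOT the data of the study.
random.seed(1)
x3 = [random.gauss(24, 6) for _ in range(450)]                              # study hours
x2 = [100 - 0.9 * h + random.gauss(0, 15) for h in x3]                      # intelligence
x1 = [0.4 * g + 0.6 * h - 40 + random.gauss(0, 6) for g, h in zip(x2, x3)]  # honor points

r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
by_formula = (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))  # formula (49)
by_residuals = pearson(residuals(x1, x3), residuals(x2, x3))             # X3 "held constant"
print(f"{by_formula:.6f}  {by_residuals:.6f}")
```

The two printed values agree, so for straight-line relations "if every student studied the same number of hours" amounts to correlating what is left after study hours have been allowed for.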

Step III. With the partial coefficients of correlation calculated, the next step is to write the regression equation from which the most probable number of honor points which a student will receive can be estimated, given his general intelligence score and the number of hours he spends in study per week. The regression equation for three variables is written in Deviation Form as follows [Formula (51)]:

$$x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3$$

In this formula x1 is the dependent variable and stands for honor points; x2 and x3 are the independent variables, and stand for general intelligence and study-hours respectively.¹ In Score Form the equation becomes [Formula (52)]

$$(X_1 - \text{Av.}\,X_1) = b_{12.3}\,(X_2 - \text{Av.}\,X_2) + b_{13.2}\,(X_3 - \text{Av.}\,X_3),$$

or, transposing and collecting terms,

$$X_1 = b_{12.3}\,X_2 + b_{13.2}\,X_3 + K \qquad (K \text{ a constant}).$$

It is clear that before we can use this equation we must find the values of the regression coefficients b12.3 and b13.2. These are found from the formulas [Formula (53)]

$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}} \qquad\text{and}\qquad b_{13.2} = r_{13.2}\,\frac{\sigma_{1.23}}{\sigma_{3.12}},$$

and as we already have the values of r12.3 and r13.2, it is only necessary to find σ1.23, σ2.13, and σ3.12 (the "partial" σ's) in order to replace the regression coefficients in the equation by numerical values.

Step IV. The values of the "partial" σ's are found from the formulas [Formula (50)]

$$\begin{aligned}
(1)\quad \sigma_{1.23} &= \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\\
(2)\quad \sigma_{2.13} &= \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2}\\
(3)\quad \sigma_{3.12} &= \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2}
\end{aligned}$$

Substituting the known values of the raw and partial r's in these formulas we get σ1.23 = 6.34, σ2.13 = 8.84, and σ3.12 = 3.97. (For calculations, see Table XXVI.)
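Step IV can be sketched in a few lines of Python, assuming the Table XXVI values. The helper `k(r)` (our own name) returns √(1 − r²), the quantity read from Table XXVII; σ2.13 and σ3.12 are taken in the same forms as above, so r23.1 is never needed.

```python
import math


def k(r: float) -> float:
    """sqrt(1 - r^2), the quantity tabled in Table XXVII."""
    return math.sqrt(1 - r * r)


r12, r13, r23 = 0.60, 0.32, -0.35   # zero-order r's, Table XXVI
s1, s2, s3 = 11.2, 15.8, 6.0        # sigma's of the three series, Table XXVI

r12_3 = (r12 - r13 * r23) / (k(r13) * k(r23))
r13_2 = (r13 - r12 * r23) / (k(r12) * k(r23))

s1_23 = s1 * k(r12) * k(r13_2)   # formula (50), form (1)
s2_13 = s2 * k(r23) * k(r12_3)   # form (2): uses r23 and r12.3 only
s3_12 = s3 * k(r23) * k(r13_2)   # form (3): uses r23 and r13.2 only
print(f"{s1_23:.2f}  {s2_13:.2f}  {s3_12:.2f}")
```

Carried at full precision, σ1.23 comes out about 6.33 rather than 6.34; the small difference arises only from the four-place √(1 − r²) values used in the hand calculation.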

Step V. From the partial σ's and the partial r's, the numerical values of the regression coefficients b12.3 and b13.2 are found to be .57 and 1.13, respectively. Hence we may now write the regression equation as

$$x_1 = .57\,x_2 + 1.13\,x_3;$$

or, multiplying both coefficients by a convenient constant (e.g., by 1.75), the number of honor points varies as 1 (score on the intelligence tests) + 2 (number of hours spent in study per week). It is evident from this equation that in so far as the general intelligence score and number of study hours per week determine the number of honor points received, their relative weight is as 1 : 2.

¹ Note the resemblance of this equation to the simple regression equation for two variables, $y = b_{yx}\,x$ (page 174). If x1 is put for y and x2 for x in this equation, we have $x_1 = b_{12}\,x_2$.
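A sketch of Step V under the same assumptions (Table XXVI means, σ's and r's; helper names our own). It also computes the Score Form constant K and shows the weights after multiplying by 1.75.

```python
import math


def k(r: float) -> float:
    """sqrt(1 - r^2), as in Table XXVII."""
    return math.sqrt(1 - r * r)


r12, r13, r23 = 0.60, 0.32, -0.35
M1, M2, M3 = 18.5, 100.6, 24.0      # means, Table XXVI
s1, s2, s3 = 11.2, 15.8, 6.0        # sigma's, Table XXVI

r12_3 = (r12 - r13 * r23) / (k(r13) * k(r23))
r13_2 = (r13 - r12 * r23) / (k(r12) * k(r23))
s1_23 = s1 * k(r12) * k(r13_2)
s2_13 = s2 * k(r23) * k(r12_3)
s3_12 = s3 * k(r23) * k(r13_2)

b12_3 = r12_3 * s1_23 / s2_13       # formula (53)
b13_2 = r13_2 * s1_23 / s3_12
K = M1 - b12_3 * M2 - b13_2 * M3    # constant of the Score Form
print(f"b12.3 = {b12_3:.3f}   b13.2 = {b13_2:.3f}   K = {K:.1f}")
print(f"weights times 1.75: {1.75 * b12_3:.2f} : {1.75 * b13_2:.2f}")
```

Unrounded, the sketch gives b12.3 ≈ .575, b13.2 ≈ 1.127 and K ≈ −66.4, close to the text's .57, 1.13 and −66, which rest on rounded intermediate values. The scaled weights come out near 1 : 2.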

TABLE XXVI

A Correlation Problem Involving Three Variables

Step I

(1) Honor Points      (2) General Intelligence      (3) Hours of Study per Week
M1 = 18.5             M2 = 100.6                    M3 = 24
σ1 = 11.2             σ2 = 15.8                     σ3 = 6

r12 = .60          r13 = .32          r23 = −.35

Step II. Calculation of Partial Coefficients of Correlation (see Note)*

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.60 - .32(-.35)}{.9474 \times .9367} = .802$$

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.32 - .60(-.35)}{.8 \times .9367} = .707$$

$$r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{-.35 - .32 \times .60}{.8 \times .9474} = -.715$$

* For √(1 − r²) values, use Table XXVII.

Step III. The Regression Equations

$$x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3 \quad \text{(Deviation Form)} \qquad (51)$$

or

$$X_1 = b_{12.3}\,X_2 + b_{13.2}\,X_3 + K \quad \text{(Score Form)} \qquad (52)$$

in which

$$b_{12.3} = r_{12.3}\,\frac{\sigma_{1.23}}{\sigma_{2.13}} \quad\text{and}\quad b_{13.2} = r_{13.2}\,\frac{\sigma_{1.23}}{\sigma_{3.12}} \qquad (53)$$

Step IV. Calculation of σ's

$$\begin{aligned}
(1)\quad \sigma_{1.23} &= \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} = 11.2 \times .8 \times .7072 = 6.34 \qquad (50)\\
(2)\quad \sigma_{2.13} &= \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} = 15.8 \times .9367 \times .5973 = 8.84\\
(3)\quad \sigma_{3.12} &= \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} = 6 \times .9367 \times .7072 = 3.97
\end{aligned}$$

Step V. The Regression Coefficients and Regression Equation

Substituting for r12.3, r13.2, σ1.23, σ2.13, σ3.12, we have

$$b_{12.3} = .802 \times \frac{6.34}{8.84} = .57; \qquad b_{13.2} = .707 \times \frac{6.34}{3.97} = 1.13.$$

Hence the regression equation becomes:

$$x_1 = .57\,x_2 + 1.13\,x_3 \quad \text{(Deviation Form)},$$

or

$$X_1 = .57\,X_2 + 1.13\,X_3 - 66 \quad \text{(Score Form)}.$$

Step VI. Calculation of the Standard Error of Estimate

$$\sigma_{(\text{est. }X_1)} = \sigma_{1.23} = 6.34 \qquad (54)$$

$$PE_{(\text{est. }X_1)} = .6745 \times 6.34 = 4.28 \qquad (55)$$

Step VII. The Coefficient of Multiple Correlation

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} = .824 \qquad (56)$$

Note. It should be noted that while the partial coefficient of correlation r23.1 is of interest as giving us the relation between general intelligence and hours spent in study for a constant number of honor points, it is unnecessary in the regression equation x1 = b12.3 x2 + b13.2 x3. In order to evaluate the constants b12.3 and b13.2 in this regression equation, we need only r12.3 and r13.2. In any problem involving three variables, only two partial coefficients of correlation need be computed if we are interested only in the prediction of X1 values from known values of X2 and X3.


TABLE XXVII

A Table to Infer the Value of √(1 − r²) from a Given Value of r

  r      √(1−r²)       r      √(1−r²)       r      √(1−r²)
 .00     1.0000       .34     .9404        .68     .7332
 .01      .9999       .35     .9367        .69     .7238
 .02      .9998       .36     .9330        .70     .7141
 .03      .9995       .37     .9290        .71     .7042
 .04      .9992       .38     .9250        .72     .6940
 .05      .9987       .39     .9208        .73     .6834
 .06      .9982       .40     .9165        .74     .6726
 .07      .9975       .41     .9121        .75     .6614
 .08      .9968       .42     .9075        .76     .6499
 .09      .9959       .43     .9028        .77     .6380
 .10      .9950       .44     .8980        .78     .6258
 .11      .9939       .45     .8930        .79     .6131
 .12      .9928       .46     .8879        .80     .6000
 .13      .9915       .47     .8827        .81     .5864
 .14      .9902       .48     .8773        .82     .5724
 .15      .9887       .49     .8717        .83     .5578
 .16      .9871       .50     .8660        .84     .5426
 .17      .9854       .51     .8602        .85     .5268
 .18      .9837       .52     .8542        .86     .5103
 .19      .9818       .53     .8480        .87     .4931
 .20      .9798       .54     .8417        .88     .4750
 .21      .9777       .55     .8352        .89     .4560
 .22      .9755       .56     .8285        .90     .4359
 .23      .9732       .57     .8216        .91     .4146
 .24      .9708       .58     .8146        .92     .3919
 .25      .9682       .59     .8074        .93     .3676
 .26      .9656       .60     .8000        .94     .3412
 .27      .9629       .61     .7924        .95     .3122
 .28      .9600       .62     .7846        .96     .2800
 .29      .9570       .63     .7766        .97     .2431
 .30      .9539       .64     .7684        .98     .1990
 .31      .9507       .65     .7599        .99     .1411
 .32      .9474       .66     .7513       1.00     .0000
 .33      .9440       .67     .7424

To write the regression equation in Score Form, we simply replace x1 by (X1 − 18.5), x2 by (X2 − 100.6), and x3 by (X3 − 24). The equation then becomes

$$X_1 = .57\,X_2 + 1.13\,X_3 - 66.$$



Given a student's general intelligence score (X2) and the number of hours he spends in study per week (X3), we can, from this equation, estimate the most probable number of honor points which he will receive in the first semester. By way of illustration, suppose that a student has a general intelligence score of 120 points and that he studies on the average 20 hours per week: how many honor points will he most probably receive during the first semester? Substituting X2 = 120 and X3 = 20 in the regression equation, we have

$$X_1 = .57 \times 120 + 1.13 \times 20 - 66 = 25.$$

The most probable number of honor points which this student will receive, therefore, using the given criteria as the basis of our estimate, is 25.
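The Score Form equation turns directly into a small prediction function. This is a sketch assuming the coefficients .57, 1.13 and −66 found above; `predict_honor_points` is a hypothetical name.

```python
def predict_honor_points(intelligence: float, study_hours: float) -> float:
    """Most probable honor points from X1 = .57 X2 + 1.13 X3 - 66 (Score Form)."""
    return 0.57 * intelligence + 1.13 * study_hours - 66


# The student of the illustration: intelligence score 120, 20 study hours per week.
estimate = predict_honor_points(120, 20)
print(round(estimate, 1))
```

The function returns 25 (up to floating-point rounding) for the student in the illustration; any other pair of scores is handled the same way.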

Step VI. This estimate, like every other "most probable" number of honor points predicted from the regression equation, has a certain "error of estimate." The standard error of estimate of all honor points, i.e., X1's, predicted from the regression equation X1 = b12.3 X2 + b13.2 X3 + K is designated σ(est. X1) and equals σ1.23 [see Formula (50)] directly. The probable error of estimate is

$$PE_{(\text{est. }X_1)} = .6745\,\sigma_{(\text{est. }X_1)}.$$

The standard error of estimate in the present problem is 6.34 points, and the PE(est. X1) is 4.28 points. In the illustration above, therefore, the 25 estimated honor points have a PE(est. X1) of 4.28 points, which means that the chances are even (50 in 100) that this student will receive (roughly) not less than 21 nor more than 29 honor points. The reliability of any other honor-point estimate made from the regression equation may be found in exactly the same way.
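The probable-error band for the same student can be sketched as follows, assuming, as the PE itself does, that errors of estimate are normally distributed. `NormalDist` from Python's standard `statistics` module is used only to show where the factor .6745 comes from.

```python
from statistics import NormalDist

sigma_est = 6.34               # sigma(est. X1) = sigma_1.23, Table XXVI
pe_est = 0.6745 * sigma_est    # formula (55)
estimate = 25.0                # the student with X2 = 120, X3 = 20

low, high = estimate - pe_est, estimate + pe_est
print(f"PE(est. X1) = {pe_est:.2f}; even chances of {low:.0f} to {high:.0f} honor points")

# .6745 is the normal deviate that encloses the middle 50 per cent of cases.
z = NormalDist().inv_cdf(0.75)
print(f"{z:.4f}")
```

This reproduces the PE of 4.28 and the even-chance range of 21 to 29 honor points.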

Step VII. The final step in the solution of our problem is to compute the coefficient of multiple correlation. This "multiple r," which is generally written R,¹ has been defined (see page 222) as the coefficient of correlation between the scores actually made on a given test and the scores on the same test predicted from the regression equation. Expressed more mathematically, R gives the correlation between the dependent variable X1 and the independent variables X2, X3, etc., taken together as a team. The formula for R when there are two independent variables is [Formula (56)]

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}}$$

In the present problem, R1(23) = .824. This means that if the most probable number of honor points which each student in our group of 450 will receive is predicted from the regression equation, the correlation between these 450 predicted scores and the 450 scores actually received will be .824. Multiple R, therefore, tells us how closely X1 is related to the combined action of X2 and X3, or, in the present instance, how closely honor points are related to general intelligence and number of hours spent in study per week, taken together.

¹ Multiple R must not be confused with the R of the Spearman Footrule formula, page 104.
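Multiple R was defined as the r between actual and predicted scores. The sketch below first evaluates formula (56) with the Table XXVI values, then checks the definition on hypothetical synthetic scores (not the study's data): it runs Steps I to V on them, predicts every X1, and correlates the predictions with the actual scores. All helper names and data-generating numbers are assumptions for illustration.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Standard deviation with N in the denominator, like the text's sigma."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sd(x) * sd(y))


def k(r):
    return math.sqrt(1 - r * r)


# Formula (56) with the Table XXVI values sigma_1.23 = 6.34 and sigma_1 = 11.2.
R_book = math.sqrt(1 - (6.34 / 11.2) ** 2)

# Hypothetical synthetic scores (NOT the study's data) for 450 students.
random.seed(7)
x2 = [random.gauss(100, 15) for _ in range(450)]
x3 = [30 - 0.1 * g + random.gauss(0, 6) for g in x2]
x1 = [0.3 * g + 1.0 * h + random.gauss(0, 8) - 30 for g, h in zip(x2, x3)]

# Steps I to V of Table XXVI applied to the synthetic scores.
r12, r13, r23 = pearson(x1, x2), pearson(x1, x3), pearson(x2, x3)
r12_3 = (r12 - r13 * r23) / (k(r13) * k(r23))
r13_2 = (r13 - r12 * r23) / (k(r12) * k(r23))
s1_23 = sd(x1) * k(r12) * k(r13_2)
s2_13 = sd(x2) * k(r23) * k(r12_3)
s3_12 = sd(x3) * k(r23) * k(r13_2)
b12_3, b13_2 = r12_3 * s1_23 / s2_13, r13_2 * s1_23 / s3_12
K = mean(x1) - b12_3 * mean(x2) - b13_2 * mean(x3)
predicted = [b12_3 * g + b13_2 * h + K for g, h in zip(x2, x3)]

R_formula = math.sqrt(1 - (s1_23 / sd(x1)) ** 2)   # formula (56)
R_direct = pearson(x1, predicted)                   # definition: r(actual, predicted)
print(f"Table XXVI: {R_book:.3f}; synthetic: {R_formula:.6f} vs {R_direct:.6f}")
```

The first figure reproduces the .824 of the text, and for the synthetic data the two routes to R agree.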


III. General Formulas for Use in Partial and Multiple Correlation

1. General Formulas for Partial r's

We have found (Table XXVI) that in a correlation problem involving three variables, the method of partial correlation enables us to find the net relation between two variables when a third is ruled out or held constant. In like manner, by an extension of the method of partial correlation, we can secure the net correlation between X1 and X2 when two or more variables have been ruled out or held constant. Thus the partial coefficient of correlation r12.34 means, by analogy to r12.3, that the correlation between X1 and X2 has been freed of the influence of both X3 and X4; and the partial coefficient of correlation r12.34...n means that the correlation between X1 and X2 has been freed (theoretically) of the influence of all the other variables, X3 through Xn.



In every partial coefficient of correlation the subscripts to the left of the point are called primary subscripts and denote the two variables whose correlation we are seeking. The subscripts to the right of the point are called secondary subscripts, and denote those variables which are to be ruled out or held constant.¹ The order of a partial r is determined by the number of its secondary subscripts: r12.3 or r13.2 or r23.1, for example, is a partial r of the first order, while "entire" or "total" r's, such as r12 or r13 or r23, are coefficients of zero order.

The general formula for partial r's of any order is written

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\;r_{2n.34\ldots(n-1)}}{\sqrt{1 - r_{1n.34\ldots(n-1)}^2}\;\sqrt{1 - r_{2n.34\ldots(n-1)}^2}} \qquad (49)$$

From formula (49) partial r's of any given order can be found. In a four-variable problem, for example, r12.34 may be written by reference to the formula as

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}},$$

that is to say, in terms of the partial r's of the first order. These first-order partial r's must then be computed by (49) from r's of zero order before the second-order r's can be evaluated. To find partial r's of a higher order, we must first express them in terms of the partial r's of the next lower order; these r's, in turn, in terms of r's of the next lower order; and so on until r's of zero order have been reached.² In other words, it is necessary to "work up" from zero-order r's whenever r's of any higher order are to be computed. Hence it is apparent that with each additional variable the arithmetic of calculation is greatly increased. As a result, unless the work is carefully planned, the calculations soon become extremely laborious. The PE of a partial r of any order may be found, like the PE of an "entire" r, by substituting in formula (26).

¹ The order in which the secondary subscripts are written is entirely immaterial, e.g., r12.34 = r12.43. The order of the primary subscripts is of importance, however, in telling us which variable is "dependent" and which "independent." Thus r12 means that X1 is dependent, i.e., is to be predicted from X2, while r21 means that X2 is dependent, i.e., is to be predicted from X1. The numerical value of r12 and r21 is, of course, the same.

² In calculating partial r's, use Table XXVII to get √(1 − r²) values.
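The "work up from zero order" procedure is naturally recursive: formula (49) removes the last secondary subscript and calls itself on the lower-order r's. A minimal sketch, assuming the correlation matrix is given as nested lists; `make_partial_r` is a hypothetical helper, and the four-variable matrix is invented for illustration. Caching each lower-order r means it is computed only once, the same economy the text urges for hand work.

```python
import math
from functools import lru_cache


def make_partial_r(R):
    """Return pr(i, j, held) giving r_ij.held of any order, built up from zero order.

    R is the zero-order correlation matrix as nested lists; variables are
    numbered 1..n as in the text, so R[i - 1][j - 1] is r_ij.
    """
    @lru_cache(maxsize=None)
    def pr(i, j, held):
        if not held:                      # zero order: read the entire r
            return R[i - 1][j - 1]
        rest, last = held[:-1], held[-1]  # formula (49): drop the last secondary subscript
        r_ij, r_il, r_jl = pr(i, j, rest), pr(i, last, rest), pr(j, last, rest)
        return (r_ij - r_il * r_jl) / math.sqrt((1 - r_il**2) * (1 - r_jl**2))

    return pr


# Table XXVI: r12.3 should come out near .802.
pr3 = make_partial_r([[1.0, 0.60, 0.32], [0.60, 1.0, -0.35], [0.32, -0.35, 1.0]])
print(f"r12.3 = {pr3(1, 2, (3,)):.3f}")

# Hypothetical four-variable matrix (illustrative values only).
R4 = [
    [1.00, 0.60, 0.32, 0.40],
    [0.60, 1.00, -0.35, 0.30],
    [0.32, -0.35, 1.00, 0.20],
    [0.40, 0.30, 0.20, 1.00],
]
pr4 = make_partial_r(R4)
print(f"r12.34 = {pr4(1, 2, (3, 4)):.4f}   r12.43 = {pr4(1, 2, (4, 3)):.4f}")
```

The second line illustrates the footnote: r12.34 and r12.43 agree, although they are reached through different first-order r's.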

2. General Formulas for Partial σ's of Any Order

Just as the correlation between two sets of scores or other measures can be determined when the influence of 1, 2, 3, ... n other factors is held constant, so the variability (the σ) of any set of scores can be found when the influence of 1, 2, 3, ... n factors is held constant. As an illustration of this, take σ1.23 of Table XXVI. This "partial σ" gives the variability of X1 (honor points) freed of the influence exerted by the two factors X2 (general intelligence) and X3 (average study-hours per week). The general formula for σ's of any order is

$$\sigma_{1.234\ldots n} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2}\cdots\sqrt{1 - r_{1n.23\ldots(n-1)}^2} \qquad (50)$$

This formula may be used to compute the net σ's in correlation problems which involve any number of variables. In a five-variable problem, for example, σ1.2345 is written

$$(1)\quad \sigma_{1.2345} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2}\,\sqrt{1 - r_{15.234}^2}$$

and by analogy to (1), or by reference to (50), the other σ's may be written:

$$\begin{aligned}
(2)\quad \sigma_{2.1345} &= \sigma_2\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23.1}^2}\,\sqrt{1 - r_{24.13}^2}\,\sqrt{1 - r_{25.134}^2}\\
(3)\quad \sigma_{3.1245} &= \sigma_3\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23.1}^2}\,\sqrt{1 - r_{34.12}^2}\,\sqrt{1 - r_{35.124}^2}\\
(4)\quad \sigma_{4.1235} &= \sigma_4\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{24.1}^2}\,\sqrt{1 - r_{34.12}^2}\,\sqrt{1 - r_{45.123}^2}\\
(5)\quad \sigma_{5.1234} &= \sigma_5\sqrt{1 - r_{15}^2}\,\sqrt{1 - r_{25.1}^2}\,\sqrt{1 - r_{35.12}^2}\,\sqrt{1 - r_{45.123}^2}
\end{aligned}$$

Each of these σ's measures the variability of a single factor when the effects of the other four are ruled out or held constant. All of them are σ's of the fourth order, since there are 4 secondary subscripts, and the order of a partial σ, like the order of a partial r, is determined by the number of its secondary subscripts.

By a simple rearrangement of the secondary subscripts any higher-order σ may be written in more than one way. A σ of the second order may be written in two ways: e.g., σ1.23, which is given on page 227 as

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2},$$

may also be written

$$\sigma_{1.32} = \sigma_1\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{12.3}^2}.$$

In like manner, σ2.13 may be written

$$(1)\quad \sigma_{2.13} = \sigma_2\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23.1}^2}, \qquad\text{or}\qquad (2)\quad \sigma_{2.31} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2},$$

and σ3.12 may be written

$$(1)\quad \sigma_{3.12} = \sigma_3\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23.1}^2}, \qquad\text{or}\qquad (2)\quad \sigma_{3.21} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2}.$$

The alternate forms of a partial σ are useful as a check on the arithmetic calculations, and also because they make unnecessary the calculation of otherwise unused and hence superfluous partial r's. Thus by using the second forms of σ2.13 and σ3.12 instead of the first (see Table XXVI) we make unnecessary the calculation of r23.1 so far as the computation of the σ's is concerned. Furthermore, if r23.1 is not used elsewhere in the problem, it need not be calculated at all (see page 228). Two partial r's are all that we need in order to write the regression equation in a three-variable problem.
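That the alternate forms of a partial σ agree can be checked numerically with the Table XXVI data. In this sketch (helper names our own) r23.1 is computed only so that the first forms can be compared with the second.

```python
import math


def k(r: float) -> float:
    """sqrt(1 - r^2), as in Table XXVII."""
    return math.sqrt(1 - r * r)


def partial_r(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial r of a and b with c held constant (formula 49)."""
    return (r_ab - r_ac * r_bc) / (k(r_ac) * k(r_bc))


r12, r13, r23 = 0.60, 0.32, -0.35   # Table XXVI
s1, s2, s3 = 11.2, 15.8, 6.0

r12_3 = partial_r(r12, r13, r23)
r13_2 = partial_r(r13, r12, r23)
r23_1 = partial_r(r23, r12, r13)    # needed only for the first forms below

forms = {
    "sigma_1.23": (s1 * k(r12) * k(r13_2), s1 * k(r13) * k(r12_3)),   # 1.23 vs 1.32
    "sigma_2.13": (s2 * k(r12) * k(r23_1), s2 * k(r23) * k(r12_3)),   # 2.13 vs 2.31
    "sigma_3.12": (s3 * k(r13) * k(r23_1), s3 * k(r23) * k(r13_2)),   # 3.12 vs 3.21
}
for name, (first, second) in forms.items():
    print(f"{name}: {first:.4f}  {second:.4f}")
```

Each pair prints the same value, which is why either form may serve as a check on the other.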

The number of alternate forms in which any higher-order σ may be written depends on the number of permutations which its secondary subscripts can take. We have seen that a second-order σ may be written in two ways: σ1.23 and σ1.32. In the same way, any σ of the third order, e.g., σ1.234, may be written in 6 ways: σ1.234, σ1.243, σ1.324, σ1.342, σ1.423, σ1.432. Any σ of the fourth order, e.g., σ1.2345, may be written in 24 ways, and any σ of the fifth order, e.g., σ1.23456, in 120 ways.¹

Fortunately we need only a very few of all of these possible arrangements. Care must nevertheless be taken that the correct forms are chosen, for just as the number of partial r's which must be computed in a 3-variable problem can be reduced by a judicious choice of σ formulas, so also in problems which contain more than 3 variables the number of partial r's may be considerably reduced by proper selection. And it is in the longer problems that a reduction of the number of partial r's to be computed counts most, since it is here that the calculations become laborious. The partial σ's which require the calculation of the minimum number of partial r's are given, for 4- and 5-variable problems, in the outline solutions on pages 240-244. These will be found useful for quick reference. By analogy to these, the σ formulas for problems which involve more than five variables can easily be selected.

¹ This follows from the law of permutations. The permutations of 4 things taken 4 at a time are 4P4 = 4 × 3 × 2 × 1 = 24, and the permutations of 5 things taken 5 at a time are 5P5 = 5 × 4 × 3 × 2 × 1 = 120. In general, the permutations of n things taken n at a time are nPn = n(n − 1)(n − 2) ... to n factors. See the chapter on Permutations and Combinations in any algebra textbook.

3. General Formulas for the Regression Equation, and Coefficients of Regression

The general regression equation, which expresses the relation between a single dependent variable, X1, and a number of independent variables, X2, X3, X4 ... Xn, may be written in Deviation Form as follows:

$$x_1 = b_{12.34\ldots n}\,x_2 + b_{13.24\ldots n}\,x_3 + \cdots + b_{1n.23\ldots(n-1)}\,x_n \qquad (51)$$

and in Score Form as

$$X_1 = b_{12.34\ldots n}\,X_2 + b_{13.24\ldots n}\,X_3 + \cdots + b_{1n.23\ldots(n-1)}\,X_n + K. \qquad (52)$$

The regression coefficients b12.34...n, b13.24...n, etc., give the weight or value to be attached to each independent variable when X1 is to be estimated from all of these in combination. Moreover, the regression coefficients indicate the weight which each independent variable has in determining X1 exclusive of the influence of the other variables, and hence we can tell from the regression equation just what part the score on each of several tests plays in determining the score on the test taken as the dependent variable.

The regression coefficients in a regression equation may be computed from the formula

$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.234\ldots n}}{\sigma_{2.134\ldots n}} \qquad (53)$$

If the problem involves only three variables, the regression equation becomes X1 = b12.3 X2 + b13.2 X3 + K. In this equation, the regression coefficients b12.3 and b13.2 are, like the partial r's r12.3 and r13.2, of the first order. The first, b12.3, equals r12.3 σ1.23/σ2.13, and the second, b13.2, equals r13.2 σ1.23/σ3.12 (see page 227 and Table XXVI). Regression equations which involve more than three variables are easily written by reference to formula (52), and their regression coefficients may be found from formula (53). In a five-variable problem, for example, the regression equation becomes

$$X_1 = b_{12.345}\,X_2 + b_{13.245}\,X_3 + b_{14.235}\,X_4 + b_{15.234}\,X_5 + K,$$

and the regression coefficients (b's of the third order) are

$$\begin{aligned}
b_{12.345} &= r_{12.345}\,\frac{\sigma_{1.2345}}{\sigma_{2.1345}}\\
b_{13.245} &= r_{13.245}\,\frac{\sigma_{1.2345}}{\sigma_{3.1245}}\\
b_{14.235} &= r_{14.235}\,\frac{\sigma_{1.2345}}{\sigma_{4.1235}}\\
b_{15.234} &= r_{15.234}\,\frac{\sigma_{1.2345}}{\sigma_{5.1234}}
\end{aligned}$$

Obviously, to compute these regression coefficients we must first compute the third-order partial r's and the necessary partial σ's. The calculation of the b's is then a matter of substitution.
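For more than three variables the partial-r route of formulas (49), (50) and (53) is one way to the b's. A different but equivalent technique, swapped in here and not described in the text, solves the least-squares normal equations on the correlation matrix for standard-score weights and then rescales each by σ1/σj. The sketch below, with hypothetical helper names, reproduces b12.3 and b13.2 of Table XXVI.

```python
def solve(A, y):
    """Gaussian elimination with partial pivoting; A is a square list of lists."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):                 # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def regression_coefficients(R, sigmas):
    """b_1j for predicting variable 1 from variables 2..n (lists are 0-indexed)."""
    n = len(R)
    Rxx = [[R[i][j] for j in range(1, n)] for i in range(1, n)]   # r's among predictors
    r1x = [R[0][j] for j in range(1, n)]                          # r's with X1
    betas = solve(Rxx, r1x)                                       # standard-score weights
    return [beta * sigmas[0] / sigmas[j + 1] for j, beta in enumerate(betas)]


R = [[1.0, 0.60, 0.32], [0.60, 1.0, -0.35], [0.32, -0.35, 1.0]]   # Table XXVI
b = regression_coefficients(R, [11.2, 15.8, 6.0])
print([round(v, 3) for v in b])
```

It prints about 0.575 and 1.127, the unrounded values behind the text's .57 and 1.13. The same function accepts a four- or five-variable correlation matrix unchanged.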



4. General Formulas for Standard and Probable Errors of Estimate

All X1 scores estimated from a regression equation have a standard error of estimate, σ(est. X1), which measures the error made in taking estimated instead of actual scores (see page 230). σ(est. X1) is found from the formula for σ1.234...n, as follows:

$$\sigma_{(\text{est. }X_1)} = \sigma_{1.234\ldots n} \qquad (54)$$

and

$$PE_{(\text{est. }X_1)} = .6745\,\sigma_{(\text{est. }X_1)} \qquad (55)$$

As σ1.234...n must always be computed in order to find the regression coefficients (see examples above), σ(est. X1) is known at once without further calculation. The value of a standard error of estimate has already been illustrated on page 230 from the data of Table XXVI. To repeat, we find in Table XXVI that the σ(est. X1) of any estimated number of honor points is 6.34, and that the PE(est. X1) is 4.28 points. Hence the chances are even that the "most probable," i.e., estimated, number of honor points received by any student, as found from the regression equation, will be in error by (roughly) 4 points or less. We may be practically certain that any estimated number of honor points is not in error by more than 4 × 4, or 16, honor points.

It may be shown by the method of least squares¹ that the standard error (or PE) of estimate is a minimum when the regression equation is used to estimate the X1 scores. For this reason, values of X1 predicted from the regression equation are said to be the "best" estimates of the actual X1 values which can be made from a linear equation which contains the given variables. The regression equation X1 = .57X2 + 1.13X3 − 66 (see page 230) will serve as an illustration of what is meant. Assuming that the relation between X1 and X2, X1 and X3, and X2 and X3 is linear in every case, X1 (honor points) can be estimated from this equation with a smaller error of estimate than from any other linear equation in X2 and X3.

¹ See Yule, An Introduction to the Theory of Statistics, p. 231.



5. General Formula for R, the Coefficient of Multiple Correlation

The correlation between a single dependent variable X1 and (n − 1) independent variables, e.g., X2, X3, X4 ... Xn, in combination is given by the formula

$$R_{1(23\ldots n)} = \sqrt{1 - \frac{\sigma_{1.23\ldots n}^2}{\sigma_1^2}} \qquad (56)$$

in which R1(23...n) is the coefficient of multiple correlation, σ1 is the σ of the dependent series of X1 scores, and σ1.23...n equals the standard error of estimate (see formula 54). When there are only three variables, the multiple coefficient of correlation is

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}};$$

when there are five variables,

$$R_{1(2345)} = \sqrt{1 - \frac{\sigma_{1.2345}^2}{\sigma_1^2}};$$

and in like manner the R for six, seven, or any number of variables may be written by reference to (56).

Since  the  error  of  estimate  is  a  minimum  when  the  regression 
equation  is  used  for  estimating  Ari  scores,  it  follows  that 
the  multiple  coefficient  of  correlation  R  gives  the  maximum 
correlation  obtainable  between  the  actual  X\  scores  and  X\ 
scores  estimated  from  a  knowledge  of  the  independent  vari- 
ables X2,  X3  .  .  •  Xn,  in  the  regression  equation.  R  is  valu- 
able, therefore,  as  indicating  how  effectively  a  given  com- 
bination of  measures  (or  "team  of  tests")  represents  the  actual 
values  of  X\  when  these  measures  are  combined  in  the  best 
possible  way.  R  is  always  positive  no  matter  what  the 
signs  in  the  regression  equation  may  be.  Errors  of  sampling, 
therefore,  do  not  neutralize  each  other  but  tend  to  become 
cumulative.  As  a  result,  the  PE  of  R — which  is  found  from  the 
same  formula  as  the  PE  of  any  product-moment  ?' — is  not  a 
fair  measure  of  the  coefficient's  validity.  To  test  the  validity 
of  an  obtained  R,  we  must  compare  it  with  the  value  of  that  R 
which  we  should  get  from  the  same  number  of  cases  and  the 
same  number  of  variables,  when  the  variables  are  uncorrected, 


PARTIAL  AND  MULTIPLE  CORRELATION  239 

i.e., with the R which would arise from fluctuations of sampling alone. The formula for this R is

$$R = \sqrt{\frac{n-1}{N-1}} \qquad (57)$$

in which n is the number of variables and N is the number of cases.1 To illustrate this formula, let us apply it to the three-variable problem in Table XXIV, in which n = 3 and N = 450. Substituting for N and n in the formula, we get an R equal to .07. Since the obtained R of .824 is more than ten times this chance value, its validity is highly satisfactory.
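The arithmetic can be checked in a few lines of Python. This sketch assumes the chance value of R is √((n − 1)/(N − 1)), with n variables and N cases. The function name `chance_r` is ours and does not come from the book.

```python
import math


def chance_r(n_variables: int, n_cases: int) -> float:
    """R to be expected from fluctuations of sampling alone: sqrt((n - 1) / (N - 1))."""
    return math.sqrt((n_variables - 1) / (n_cases - 1))


# Three-variable honor-points problem: n = 3 variables, N = 450 students
r_chance = chance_r(3, 450)
print(round(r_chance, 2))  # 0.07

# How many times larger the obtained R of .824 is than the chance value
print(round(0.824 / r_chance, 1))
```

Because the chance value shrinks roughly as 1/√N, a large group such as this one leaves very little room for a spurious R.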

If we replace σ1.23...n in formula (56) by its value in terms of the entire and partial r's [see formula (50)], we may write the general formula for R1(234...n) as follows:

$$R_{1(234\ldots n)} = \sqrt{1 - [(1-r^2_{12})(1-r^2_{13.2})\cdots(1-r^2_{1n.23\ldots(n-1)})]} \qquad (58)$$


Moreover, since a higher order σ may be written in a variety of ways, the number depending upon its order (see page 234), the alternate forms for R give us a valuable means of checking the accuracy of our arithmetical calculations. In a three-variable problem, for example, R1(23) may be written as

$$R_{1(23)} = \sqrt{1 - [(1-r^2_{12})(1-r^2_{13.2})]},$$

or

$$R_{1(32)} = \sqrt{1 - [(1-r^2_{13})(1-r^2_{12.3})]}.$$

In like manner, in a 4-variable problem R1(234) may be found from

$$R_{1(234)} = \sqrt{1 - [(1-r^2_{12})(1-r^2_{13.2})(1-r^2_{14.23})]},$$

and checked by

$$R_{1(342)} = \sqrt{1 - [(1-r^2_{13})(1-r^2_{14.3})(1-r^2_{12.34})]}.$$
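The two orderings agree exactly, not just approximately. Any discrepancy beyond rounding therefore points to an arithmetic slip. Below is a minimal Python check of the three-variable case. It uses r12 = .60 and r13 = .32, which are reported for the honor-points problem later in this section. The value r23 = -.35 is a hypothetical one, since r23 is not given in this section.

```python
import math


def partial_r(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial r_ab.c (formula 49)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def multiple_r(*chain: float) -> float:
    """R = sqrt(1 - (1 - r1^2)(1 - r2^2)...) for a chain of entire and partial r's (formula 58)."""
    product = 1.0
    for r in chain:
        product *= 1 - r ** 2
    return math.sqrt(1 - product)


# r12 and r13 as reported for honor points; r23 is HYPOTHETICAL, for illustration only
r12, r13, r23 = 0.60, 0.32, -0.35

R_1_23 = multiple_r(r12, partial_r(r13, r12, r23))  # R1(23): r12, then r13.2
R_1_32 = multiple_r(r13, partial_r(r12, r13, r23))  # R1(32): r13, then r12.3
print(f"{R_1_23:.4f} {R_1_32:.4f}")  # the two orderings agree
```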

1 Rosenow, Curt, The Analysis of Mental Functions, Psychological Monographs, 1917, Vol. XXIV, 5, p. 20.



6.  Outline  of  the  Formulas  Needed  in  Correlation  Problems 
Which  Involve  (a)  Four  Variables  and  (b)  Five  Variables 

In multiple correlation problems the main task is generally to find, with a minimum of time and calculation, the regression equation which expresses the relation of the dependent variable to the independent variables. For this purpose, when working with more than three variables, the simplest plan is to write down the formula for the required regression equation first. One then proceeds deductively to find those partial r's and higher order σ's which are necessary for computing the regression coefficients. The following outlines give, for four and for five variables, the formulas for getting the regression equation with a minimum amount of calculation. It is necessary, of course, that all zero order r's be computed before the partial correlation technique can be applied.

(a) Formulas for Four-Variable Problems

(1) Regression Equation. The regression equation for four variables is written by reference to formula (52) as follows:

$$\bar{X}_1 = b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4 + K$$

(2) Regression Coefficients. The three regression coefficients needed in (1) are found from formula (53):

$$b_{12.34} = r_{12.34}\,\frac{\sigma_{1.234}}{\sigma_{2.134}}, \qquad b_{13.24} = r_{13.24}\,\frac{\sigma_{1.234}}{\sigma_{3.124}}, \qquad b_{14.23} = r_{14.23}\,\frac{\sigma_{1.234}}{\sigma_{4.123}}$$

These regression coefficients evidently require the computation of 3 second order partial r's and 4 third order σ's.



(3) Partial r's.

(a) To find r12.34:

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}\,r_{24.3}}{\sqrt{1-r^2_{14.3}}\,\sqrt{1-r^2_{24.3}}}$$

We must find 3 first order partial r's, as follows:

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r^2_{13}}\,\sqrt{1-r^2_{23}}}, \qquad r_{14.3} = \frac{r_{14} - r_{13}\,r_{34}}{\sqrt{1-r^2_{13}}\,\sqrt{1-r^2_{34}}}, \qquad r_{24.3} = \frac{r_{24} - r_{23}\,r_{34}}{\sqrt{1-r^2_{23}}\,\sqrt{1-r^2_{34}}}$$

(b) To find r13.24:

$$r_{13.24} = \frac{r_{13.2} - r_{14.2}\,r_{34.2}}{\sqrt{1-r^2_{14.2}}\,\sqrt{1-r^2_{34.2}}}$$

We must find 3 first order partial r's, as follows:

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1-r^2_{12}}\,\sqrt{1-r^2_{23}}}, \qquad r_{14.2} = \frac{r_{14} - r_{12}\,r_{24}}{\sqrt{1-r^2_{12}}\,\sqrt{1-r^2_{24}}}, \qquad r_{34.2} = \frac{r_{34} - r_{23}\,r_{24}}{\sqrt{1-r^2_{23}}\,\sqrt{1-r^2_{24}}}$$

(c) To find r14.23:

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}\,r_{34.2}}{\sqrt{1-r^2_{13.2}}\,\sqrt{1-r^2_{34.2}}}$$

No partials of the first order are needed other than those already found.

[Note that a minimum of 9 partial r's must be computed: 3 of the second order and 6 of the first order. These 9 first and second order r's, together with the 6 zero order r's, make 15 coefficients of correlation required in all.]
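Here is a sketch of step (3) in Python, using hypothetical zero-order r's rather than the book's data. It also checks why (c) needs no new first-order r's. Eliminating variable 3 first and then variable 2 gives r14.32, built only from the partials of (a). This must equal r14.23, because the order of the secondary subscripts does not matter.

```python
import math


def partial(r_ab: float, r_ac: float, r_bc: float) -> float:
    """r_ab.c by formula (49); the inputs may themselves be partial r's of equal order."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# HYPOTHETICAL zero-order r's for variables 1-4 (not the book's data)
r12, r13, r14, r23, r24, r34 = 0.60, 0.32, 0.40, -0.35, 0.55, 0.10

# (a) three first-order r's, then r12.34
r12_3, r14_3, r24_3 = partial(r12, r13, r23), partial(r14, r13, r34), partial(r24, r23, r34)
r12_34 = partial(r12_3, r14_3, r24_3)

# (b) three more first-order r's, then r13.24
r13_2, r14_2, r34_2 = partial(r13, r12, r23), partial(r14, r12, r24), partial(r34, r23, r24)
r13_24 = partial(r13_2, r14_2, r34_2)

# (c) r14.23 reuses the first-order r's found in (b)
r14_23 = partial(r14_2, r13_2, r34_2)

# Check: eliminating 3 first and then 2, from the partials of (a), gives the same value
r14_32 = partial(r14_3, r12_3, r24_3)
print(f"r12.34 = {r12_34:.3f}, r13.24 = {r13_24:.3f}, r14.23 = {r14_23:.3f}, r14.32 = {r14_32:.3f}")
```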

(4) Standard Deviations. The four third order σ's required may be found from the following formulas. These make use of no partial r's other than those already computed in (3) above. From formula (50):

$$\sigma_{1.234} = \sigma_1\sqrt{1-r^2_{12}}\,\sqrt{1-r^2_{13.2}}\,\sqrt{1-r^2_{14.23}}$$

$$\sigma_{2.134}\ (\text{i.e., }\sigma_{2.341}) = \sigma_2\sqrt{1-r^2_{23}}\,\sqrt{1-r^2_{24.3}}\,\sqrt{1-r^2_{12.34}}$$

$$\sigma_{3.124}\ (\text{i.e., }\sigma_{3.241}) = \sigma_3\sqrt{1-r^2_{23}}\,\sqrt{1-r^2_{34.2}}\,\sqrt{1-r^2_{13.24}}$$

$$\sigma_{4.123}\ (\text{i.e., }\sigma_{4.321}) = \sigma_4\sqrt{1-r^2_{34}}\,\sqrt{1-r^2_{24.3}}\,\sqrt{1-r^2_{14.23}}$$

The  numerical  values  of  the  regression  coefficients  may  now  be 
computed  and  substituted  in  the  regression  equation. 
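Steps (2) to (4) can be sketched in Python as a chain: partial r's, then reduced σ's, then regression coefficients. All r's and σ's below are hypothetical. As a safeguard, the sketch compares the outline's coefficients with those obtained by solving the normal equations directly. That direct solution uses Cramer's rule, a technique not used in the text, and it should agree to rounding error.

```python
import math


def partial(r_ab, r_ac, r_bc):
    """r_ab.c by formula (49); inputs may be partial r's of equal order."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def reduced_sigma(sigma, *chain):
    """sigma * sqrt(1 - r^2) * ... for a chain of entire and partial r's (formula 50)."""
    for r in chain:
        sigma *= math.sqrt(1 - r ** 2)
    return sigma


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve m x = v for a 3x3 system by Cramer's rule (column k replaced by v)."""
    d = det3(m)
    return [det3([[v[i] if j == k else m[i][j] for j in range(3)] for i in range(3)]) / d
            for k in range(3)]


# HYPOTHETICAL inputs (not the book's data): zero-order r's and sigmas of X1..X4
r12, r13, r14, r23, r24, r34 = 0.60, 0.32, 0.40, -0.35, 0.55, 0.10
s1, s2, s3, s4 = 11.2, 15.0, 6.0, 8.0

# Step (3): the nine partial r's of the outline
r12_3, r14_3, r24_3 = partial(r12, r13, r23), partial(r14, r13, r34), partial(r24, r23, r34)
r13_2, r14_2, r34_2 = partial(r13, r12, r23), partial(r14, r12, r24), partial(r34, r23, r24)
r12_34 = partial(r12_3, r14_3, r24_3)
r13_24 = partial(r13_2, r14_2, r34_2)
r14_23 = partial(r14_2, r13_2, r34_2)

# Step (4): the four third-order sigmas, using only the r's above
s1_234 = reduced_sigma(s1, r12, r13_2, r14_23)
s2_134 = reduced_sigma(s2, r23, r24_3, r12_34)
s3_124 = reduced_sigma(s3, r23, r34_2, r13_24)
s4_123 = reduced_sigma(s4, r34, r24_3, r14_23)

# Step (2): regression coefficients, formula (53)
b = [r12_34 * s1_234 / s2_134, r13_24 * s1_234 / s3_124, r14_23 * s1_234 / s4_123]

# Cross-check: standard-score weights from the normal equations, converted to score units
beta = solve3([[1, r23, r24], [r23, 1, r34], [r24, r34, 1]], [r12, r13, r14])
b_direct = [beta[0] * s1 / s2, beta[1] * s1 / s3, beta[2] * s1 / s4]
print([round(x, 4) for x in b], [round(x, 4) for x in b_direct])
```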

(5) The Standard Error of Estimate, σ(est. X1). From formulas (54) and (55) we find:

$$\sigma_{(\text{est. } X_1)} = \sigma_{1.234} \qquad [\text{for the value of } \sigma_{1.234} \text{ see (4) above}]$$

$$PE_{(\text{est. } X_1)} = .6745\,\sigma_{(\text{est. } X_1)}$$

(6) Coefficient of Multiple Correlation, R. In a four-variable problem the multiple coefficient, R, is written R1(234) and may be found from formula (56):

$$R_{1(234)} = \sqrt{1 - \frac{\sigma^2_{1.234}}{\sigma^2_1}}$$

This formula may also be written as:

$$R_{1(234)} = \sqrt{1 - [(1-r^2_{12})(1-r^2_{13.2})(1-r^2_{14.23})]}$$

or as

$$R_{1(234)} = \sqrt{1 - [(1-r^2_{13})(1-r^2_{14.3})(1-r^2_{12.34})]}$$

(b) Formulas for Five-Variable Problems

(1) Regression Equation:

$$\bar{X}_1 = b_{12.345}X_2 + b_{13.245}X_3 + b_{14.235}X_4 + b_{15.234}X_5 + K \qquad (52)$$

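(2) Regression Coefficients:

$$b_{12.345} = r_{12.345}\,\frac{\sigma_{1.2345}}{\sigma_{2.1345}}, \qquad b_{13.245} = r_{13.245}\,\frac{\sigma_{1.2345}}{\sigma_{3.1245}}, \qquad b_{14.235} = r_{14.235}\,\frac{\sigma_{1.2345}}{\sigma_{4.1235}}, \qquad b_{15.234} = r_{15.234}\,\frac{\sigma_{1.2345}}{\sigma_{5.1234}} \qquad (53)$$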

(3) Partial r's. We compute 22 partial r's as follows (formula 49):

(a) To find r12.345, write it as r12.453. Then

$$r_{12.453} = \frac{r_{12.45} - r_{13.45}\,r_{23.45}}{\sqrt{1-r^2_{13.45}}\,\sqrt{1-r^2_{23.45}}}$$

To compute this r we need 3 partial r's of the second order, viz.:

$$r_{12.45} = \frac{r_{12.4} - r_{15.4}\,r_{25.4}}{\sqrt{1-r^2_{15.4}}\,\sqrt{1-r^2_{25.4}}}, \qquad r_{13.45} = \frac{r_{13.4} - r_{15.4}\,r_{35.4}}{\sqrt{1-r^2_{15.4}}\,\sqrt{1-r^2_{35.4}}}, \qquad r_{23.45} = \frac{r_{23.4} - r_{25.4}\,r_{35.4}}{\sqrt{1-r^2_{25.4}}\,\sqrt{1-r^2_{35.4}}}$$

To compute these 3 r's we need 6 r's of the first order, viz.: r12.4, r15.4, r13.4, r25.4, r23.4, r35.4.

(b) To find r13.245, write it as r13.452. Then

$$r_{13.452} = \frac{r_{13.45} - r_{12.45}\,r_{23.45}}{\sqrt{1-r^2_{12.45}}\,\sqrt{1-r^2_{23.45}}}$$

To compute this r we need no partial r's other than those already found in (a).

(c) To find r14.235, write it without change:

$$r_{14.235} = \frac{r_{14.23} - r_{15.23}\,r_{45.23}}{\sqrt{1-r^2_{15.23}}\,\sqrt{1-r^2_{45.23}}}$$

To compute this r we need 3 partial r's of the second order, viz.:

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}\,r_{34.2}}{\sqrt{1-r^2_{13.2}}\,\sqrt{1-r^2_{34.2}}}, \qquad r_{15.23} = \frac{r_{15.2} - r_{13.2}\,r_{35.2}}{\sqrt{1-r^2_{13.2}}\,\sqrt{1-r^2_{35.2}}}, \qquad r_{45.23} = \frac{r_{45.2} - r_{34.2}\,r_{35.2}}{\sqrt{1-r^2_{34.2}}\,\sqrt{1-r^2_{35.2}}}$$

To compute these r's we need 6 r's of the first order, viz.: r14.2, r13.2, r15.2, r34.2, r35.2, r45.2.

(d) To find r15.234, write it without change:

$$r_{15.234} = \frac{r_{15.23} - r_{14.23}\,r_{45.23}}{\sqrt{1-r^2_{14.23}}\,\sqrt{1-r^2_{45.23}}}$$

To compute this r we need no partials other than those already found in (c).

[Note that we must compute a minimum of 4 third order r's, 6 second order r's, and 12 first order r's, 22 in all.]
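The bookkeeping of (a) to (d) can be sketched as a memoized recursion on formula (49), eliminating the last secondary subscript at each step. The correlation matrix below is hypothetical and was chosen only so that every partial is well defined. The cache key keeps the order of elimination, so the cache holds exactly the partials the outline asks for. The sketch counts them and confirms that the order of elimination does not change r12.345.

```python
import math

# HYPOTHETICAL zero-order r's for variables 1-5 (not the book's data)
R0 = {
    (1, 2): 0.50, (1, 3): 0.30, (1, 4): 0.10, (1, 5): 0.05,
    (2, 3): -0.20, (2, 4): 0.20, (2, 5): 0.05,
    (3, 4): 0.10, (3, 5): 0.20, (4, 5): 0.30,
}
memo = {}  # every partial r computed, keyed by (a, b, secondary subscripts in order)


def partial(a, b, secondaries=()):
    """r_ab.secondaries by formula (49), eliminating the LAST secondary subscript."""
    a, b = min(a, b), max(a, b)
    if not secondaries:
        return R0[(a, b)]
    key = (a, b, secondaries)
    if key not in memo:
        *rest, c = secondaries
        rest = tuple(rest)
        r_ab, r_ac, r_bc = partial(a, b, rest), partial(a, c, rest), partial(b, c, rest)
        memo[key] = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    return memo[key]


# The four third-order r's, in the orders used in (a)-(d)
r12_345 = partial(1, 2, (4, 5, 3))  # written as r12.453
r13_245 = partial(1, 3, (4, 5, 2))  # written as r13.452
r14_235 = partial(1, 4, (2, 3, 5))
r15_234 = partial(1, 5, (2, 3, 4))
n_computed = len(memo)
print(n_computed)  # 22: 12 first order, 6 second order, 4 third order

# A different order of elimination gives the same r12.345
print(abs(r12_345 - partial(1, 2, (3, 4, 5))) < 1e-12)  # True
```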

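(4) Standard Deviations. The 5 fourth order σ's required may be found from the following forms, which make use of only those partial r's already computed in (3):

$$\sigma_{1.2345} = \sigma_1\sqrt{1-r^2_{12}}\,\sqrt{1-r^2_{13.2}}\,\sqrt{1-r^2_{14.23}}\,\sqrt{1-r^2_{15.234}} \qquad (50)$$

$$\sigma_{2.1345}\ (\text{i.e., }\sigma_{2.4531}) = \sigma_2\sqrt{1-r^2_{24}}\,\sqrt{1-r^2_{25.4}}\,\sqrt{1-r^2_{23.45}}\,\sqrt{1-r^2_{12.345}}$$

$$\sigma_{3.1245}\ (\text{i.e., }\sigma_{3.4521}) = \sigma_3\sqrt{1-r^2_{34}}\,\sqrt{1-r^2_{35.4}}\,\sqrt{1-r^2_{23.45}}\,\sqrt{1-r^2_{13.245}}$$

$$\sigma_{4.1235}\ (\text{i.e., }\sigma_{4.2351}) = \sigma_4\sqrt{1-r^2_{24}}\,\sqrt{1-r^2_{34.2}}\,\sqrt{1-r^2_{45.23}}\,\sqrt{1-r^2_{14.235}}$$

$$\sigma_{5.1234}\ (\text{i.e., }\sigma_{5.2341}) = \sigma_5\sqrt{1-r^2_{25}}\,\sqrt{1-r^2_{35.2}}\,\sqrt{1-r^2_{45.23}}\,\sqrt{1-r^2_{15.234}}$$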

(5) Standard Error of Estimate, σ(est. X1):

$$\sigma_{(\text{est. } X_1)} = \sigma_{1.2345} \qquad [\text{see (4) above for value}] \qquad (54)$$

$$PE_{(\text{est. } X_1)} = .6745\,\sigma_{(\text{est. } X_1)} \qquad (55)$$

(6) Coefficient of Multiple Correlation, R.

$$R_{1(2345)} = \sqrt{1 - \frac{\sigma^2_{1.2345}}{\sigma^2_1}} \qquad (56)$$

which may be written also as

$$R_{1(2345)} = \sqrt{1 - [(1-r^2_{12})(1-r^2_{13.2})(1-r^2_{14.23})(1-r^2_{15.234})]},$$

and checked by

$$R_{1(2345)} = \sqrt{1 - [(1-r^2_{14})(1-r^2_{15.4})(1-r^2_{13.45})(1-r^2_{12.345})]}.$$

IV. A Multiple Correlation Problem with Four Variables

In Section II we found that a student's honor points (X1) could be estimated with a considerable degree of accuracy from a knowledge of his general intelligence score (X2) and the number of hours he spends in study per week (X3). The PE(est. X1) made in estimating individual scores from this three-variable regression equation was found to be 4.28 points. The coefficient of multiple correlation, R1(23), which indicates in general how well the estimated scores represent the actual scores, was .824. Now suppose that we add to the two independent variables X2 and X3 a third factor X4, e.g., the quality of the preparatory work done by the student in High School.1 This will give us three independent variables from which to estimate the dependent variable, honor points. The question then arises: with how much greater accuracy will this additional factor enable us to predict academic success?

The answer to this question will be found in Table XXVIII, which gives a complete solution of this problem following the scheme outlined for four-variable problems in Section III(6). The following paragraphs give some additional discussion of procedure and methods, together with several points to be especially noted.

Remember, first of all, that the mean and the σ of each set of measures must be known, as well as their 6 intercorrelations,

1 This was measured by the average grade obtained in the work offered for entrance to College. May, Predicting Academic Success, Journal of Educational Psychology, Vol. XIV, 434-436.



r's of the zero order. Calculating these 6 intercorrelations is actually the most laborious part of solving a multiple correlation problem, since a separate correlation table must be drawn up for each r. This is so even though we have passed over it with little comment heretofore.

(1) The discussion from here on1 follows the outline given in (6) on page 240. Thus, before calculating any partial r's, we write the regression equation and from it deduce what partial r's and higher order σ's will be required.

(2) It is clear from the regression coefficients that, in order to evaluate the constants in the regression equation, we shall need three partial r's of the second order, viz., r12.34, r13.24, and r14.23. We shall also need four partial σ's of the third order, viz., σ1.234, σ2.134, σ3.124, and σ4.123. Only the partial r's actually required in the regression equation need be calculated.

(3) In order to find r12.34 we shall need three first order partial r's, viz., r12.3, r14.3, and r24.3. To find r13.24 we shall need, again, three first order partial r's, viz., r13.2, r14.2, and r34.2. To find the last second order partial, r14.23, no first order r's are required other than those already found. A minimum of 9 partial r's, therefore, is required in all.

The partial r12.34 gives the net correlation between (1) honor points and (2) general intelligence when both (3) study hours and (4) average High School grades have been eliminated as variable factors, or held constant. In like manner, r13.24 gives the net correlation between (1) honor points and (3) study hours when both (2) general intelligence and (4) average High School grades are held constant. The first second order partial r, r12.34, equals .764 and is but slightly reduced from r12.3, which equals .802. The second partial, r13.24 = .676, is also but slightly less than r13.2, which equals .707. This comparison of partial r's shows the relatively small influence of High School grades on the net correlation between (1) honor points and (3) study hours with general intelligence constant. It also shows the small influence of this factor on the net correlation
1  See  Table  XXVIII.     The  divisions  in  the  text  parallel  those  in  the  table. 



between (1) honor points and (2) general intelligence with study constant. Notice, however, what happens to the zero order coefficient of correlation between (1) honor points and (4) average High School grades, r14, which is .40: r14.2 = .246, r14.3 = .387, and r14.23 = .088. Evidently, nearly all of the correlation which appears between (1) honor points and (4) average High School grades may be attributed to the common dependence of these two factors on (2) general intelligence and, to a lesser degree, on (3) study hours.

(4) By using the forms given in (6), page 240, we are able to calculate the four third order σ's required by the regression coefficients without finding any additional partial r's (see page 234). These partial σ's, viz., σ1.234, σ2.134, etc., give the net variability of the distribution of measures denoted by the primary subscripts when the influence of all three of the other factors (secondary subscripts) has been excluded. To take a single example, σ1.234 is 6.31 as against a σ1 of 11.2. Concretely, suppose each of the 450 students in the group were exactly alike as regards (2) general intelligence, (3) study hours, and (4) average High School grades. The σ of their distribution of honor points would then be only a little more than half (6.31/11.2 = .56) of the observed σ, i.e., the σ of the group in which these factors vary.

The computation of the regression coefficients is simply a matter of combining the partial r's and σ's already found. When this has been done, we may substitute in the regression equation to find x1 = .55x2 + 1.07x3 + .083x4. Multiplying through by 12.5 (a convenient constant) and rounding, this becomes, approximately, 12.5 (honor points) = 7 (score on general intelligence test) + 13 (number of hours spent per week in study) + 1 (average High School grades). In Score Form the regression equation becomes X1 = .55X2 + 1.07X3 + .083X4 - 69.

It  is  clear  from  the  regression  equations  that  the  number 
of  hours  spent  in  study  has  twice  the  weight  of  the  score  on 
general  intelligence  test  and  thirteen  times  the  weight  of  the 
average  High  School  grades,  in  determining  the  number  of 



honor  points  which  a  student  will  most  probably  receive  at  the 
end  of  the  first  semester.  Apparently  (as  noted  above),  the 
average  High  School  grades  have  relatively  little  influence  on 
honor  points  as  compared  with  the  other  factors  in  the  equation. 
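The ratios quoted above can be checked directly from the score-form coefficients, as in the short sketch below. The dictionary layout is ours. Like the text's comparison, this one is made between raw-score coefficients, which carry the units of each test.

```python
# Score-form coefficients of the four-variable equation
weights = {"X2": 0.55, "X3": 1.07, "X4": 0.083}  # intelligence, study hours, HS grades

print(round(weights["X3"] / weights["X2"], 2))  # study hours vs. intelligence: about 2
print(round(weights["X3"] / weights["X4"], 1))  # study hours vs. HS grades: about 13

# Multiplying through by 12.5 and rounding gives the 7 : 13 : 1 form
scaled = {name: round(12.5 * w) for name, w in weights.items()}
print(scaled)
```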

(5) Still further evidence of the small importance of High School grades in improving the estimate of honor points is to be seen in the size of the PE(est. X1). The PE of estimate made in predicting honor points from the present equation is 4.26 points. This compares with a PE(est. X1) of 4.28 points made in using the regression equation which does not include High School grades (see page 230). Thus, knowing only a student's general intelligence score and the number of hours he spends in study per week, we can estimate his honor points with but slightly greater error than when we also know his average High School grade. It would seem apparent, therefore, that the work required to build up a regression equation which includes the latter factor is hardly worth while.
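The two probable errors follow from formula (55) and the σ(est. X1) values reported in this section: 6.31 with High School grades and 6.34 without. The sketch below also shows, using Python's standard `statistics.NormalDist`, why the factor is .6745. For normally distributed errors, the middle half of the cases lies within ±.6745σ.

```python
from statistics import NormalDist

PE_FACTOR = 0.6745  # formula (55): PE = .6745 * sigma


def probable_error(sigma_est: float) -> float:
    return PE_FACTOR * sigma_est


pe_with_x4 = probable_error(6.31)     # sigma(est. X1) = sigma_1.234
pe_without_x4 = probable_error(6.34)  # sigma(est. X1) = sigma_1.23
print(round(pe_with_x4, 2), round(pe_without_x4, 2))  # 4.26 4.28

# The middle 50 per cent of a normal distribution lies within +/- .6745 sigma
unit = NormalDist()
print(round(unit.cdf(PE_FACTOR) - unit.cdf(-PE_FACTOR), 3))  # 0.5
```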

(6) The multiple coefficient of correlation, R1(234), is .826, as compared with the R1(23) of .824. A comparison of these multiple coefficients further substantiates the conclusion that High School grades contribute practically nothing to the reliability of an honor point estimate.
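Both multiple coefficients can be recovered from formula (56) and the σ's reported in this section: σ1 = 11.2, σ1.23 = 6.34, and σ1.234 = 6.31. The sketch below does so.

```python
import math


def r_from_sigmas(sigma_1: float, sigma_est: float) -> float:
    """Formula (56): R = sqrt(1 - sigma_est^2 / sigma_1^2)."""
    return math.sqrt(1 - (sigma_est / sigma_1) ** 2)


sigma_1 = 11.2  # sigma of honor points
R_1_23 = r_from_sigmas(sigma_1, 6.34)   # without High School grades
R_1_234 = r_from_sigmas(sigma_1, 6.31)  # with High School grades
print(round(R_1_23, 3), round(R_1_234, 3))  # 0.824 0.826
```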

It will be of considerable interest to compare the reliability of our estimate of honor points when the factors are taken into account singly and in combination. In this way the "prognostic" value of the multiple regression equation, as shown by the size of σ(est. X1), will be more readily appreciated. The standard errors of estimate and the coefficients of correlation for the different factors, taken singly and in combination, are given below:

Dependent Variable: Honor Points (X1)

Regression Equation                     σ(est. X1)    Coefficient of Correlation
X1 = .43X2 - 24.76                         8.96       r12 = .60
X1 = .60X3 + 4.1                          10.61       r13 = .32
X1 = .57X2 + 1.13X3 - 66                   6.34       R1(23) = .824
X1 = .55X2 + 1.07X3 + .083X4 - 69          6.31       R1(234) = .826
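The first two rows follow from the two-variable standard error of estimate, σ1√(1 − r²), with σ1 = 11.2 as reported for honor points. The sketch below computes them.

```python
import math


def sigma_est(sigma_1: float, r: float) -> float:
    """Standard error of estimate from one predictor: sigma_1 * sqrt(1 - r^2)."""
    return sigma_1 * math.sqrt(1 - r ** 2)


sigma_1 = 11.2  # sigma of honor points
from_x2 = sigma_est(sigma_1, 0.60)  # general intelligence alone
from_x3 = sigma_est(sigma_1, 0.32)  # study hours alone
print(round(from_x2, 2), round(from_x3, 2))  # 8.96 10.61
```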


[Table XXVIII. Complete solution of the four-variable problem: honor points (X1) estimated from general intelligence (X2), hours of study per week (X3), and average High School grades (X4).]

The important fact here is that σ(est. X1) is considerably less, and the correlation considerably greater, when X2 and X3 are taken together than when either is taken alone. The standard error of estimate and R improve only very slightly when X4 is added to X2 and X3. The method of partial and multiple correlation can be extended to include other variables in addition to those we already have. Doing so would very probably reduce the σ(est. X1) of our problem still further and increase R, provided the new variables are correlated with honor points and are not largely accounted for by X2 and X3, as X4 proved to be.

Before working out a regression equation containing an added variable or variables, the "predictive value" of the "new" equation should be found by computing σ(est. X1) or R. Both can be obtained from the partial r's alone, by formulas (50) and (58), before any regression coefficients are calculated. This will enable us to determine the effect of adding another variable or variables, and whether σ(est. X1) is sufficiently reduced or R sufficiently increased to justify the additional calculation. In the present problem, for instance, either σ(est. X1) or R1(234) would have told us that average High School grades add practically nothing to the predictive value of a regression equation. That equation already contains the two variables general intelligence and number of hours spent, on the average, in study each week.

V. The Value and Use of Partial and Multiple Correlation

1.  The  Value  and  Use  of  Partial  Correlation  in  Analysis  and  in 
Causal  Investigations 
Partial  correlation  is  of  considerable  importance  in  the 
analysis  of  the  part  played  by  each  of  several  factors  in  a  total 
result,  inasmuch  as  it  enables  us  to  find  the  net  relationship 
between  two  sets  of  scores  or  measures  when  the  influence  of 
one  or  more  other  factors  is  excluded.  A  concrete  illustration 
of  this  use  of  partial  correlation  may  be  cited  from  the  work  of 
Cyril  Burt.1  Burt  wished  to  find  how  much  a  child's  mental 
age — as  given  by  the  Binet  tests — influenced  his  school  attain- 
ment.   His  subjects  were  300  children  from  7  to  14  years  old. 

1 Burt, Cyril, Mental and Scholastic Tests, London, 1921, pp. 180-184.



Each child's (1) MA (Binet) was found, together with (2) his scholastic achievement as measured by educational examinations and checked by teachers, and (3) his chronological age. The "entire" coefficient of correlation between Binet MA and scholastic achievement (r12) was .91. When chronological age (3) was held constant, the partial r (r12.3) between Binet MA and scholastic achievement dropped to .68.

This shows, in the first place, that age has a decided effect on the observed correlation between MA and school work: it tends to increase, or "dilate," the obtained r. This dilation arises because both MA and school attainment tend to increase with chronological age. Their common dependence on chronological age is therefore sufficient to bring about a considerable "boost" in the observed correlation. In the second place, r12.3 = .68 indicates that a substantial relation remains between MA and school work when age conditions are uniform. In other words, Binet MA (intelligence) is a substantial factor in a pupil's school attainment irrespective of his chronological age.

To take the analysis a step further, Burt found that the correlation between school work (2) and chronological age (3), r23, was .87. When the effect of Binet MA was held constant, the partial r between school work and chronological age (r23.1) was .49. According to Burt, the persistence of a fairly high relation between school work and chronological age when intelligence is eliminated offers confirmatory evidence of the "undue influence of age upon school classification." These illustrations make it clear that calculating the partial r's is the first step in analyzing the factors which determine school attainment. By an extension of this same method the influence of other factors may be excluded and net relations secured.
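Burt's correlation between MA and age (r13) is not quoted here. As a consistency check, the sketch below searches for the r13 that would reproduce both reported partials, r12.3 = .68 and r23.1 = .49, from r12 = .91 and r23 = .87. The implied value is an inference from rounded published figures, not a number Burt reports.

```python
import math


def partial(r_ab: float, r_ac: float, r_bc: float) -> float:
    """First-order partial r_ab.c (formula 49)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Values reported for Burt's 300 children: 1 = Binet MA, 2 = school work, 3 = age
r12, r23 = 0.91, 0.87
reported_r12_3, reported_r23_1 = 0.68, 0.49

best = None
for k in range(1000):                   # try r13 = .000, .001, ..., .999
    r13 = k / 1000
    r12_3 = partial(r12, r13, r23)      # MA with school work, age constant
    r23_1 = partial(r23, r12, r13)      # school work with age, MA constant
    if abs(r12_3) > 1 or abs(r23_1) > 1:
        continue                        # no real set of scores could yield these r's
    error = abs(r12_3 - reported_r12_3) + abs(r23_1 - reported_r23_1)
    if best is None or error < best[0]:
        best = (error, r13, r12_3, r23_1)

error, r13, r12_3, r23_1 = best
print(f"implied r13 = {r13:.3f}; r12.3 = {r12_3:.2f}; r23.1 = {r23_1:.2f}")
```

A single value of r13, a little above .83, reproduces both partials to two decimals. This suggests the published coefficients are mutually consistent.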

From  the  analyses  made  through  the  elimination  of  factors 
by  partial  correlation,  we  are  often  enabled  to  determine  exist- 
ing "causal"  relationships.     Thus  Phillips1  in  a  study  of  the 

1 Phillips, Frank M., Application of Partial Correlation to a Health Problem. Reprint No. 867 from Public Health Reports, Sept., 1923.



causes contributing to absence on account of sickness among government employees over the period of a year. He found that the observed correlation between absence (i.e., number of persons absent) and mean temperature on the day of absence (r_at) was -.37. Four factors were then held constant: (1) relative humidity at 8 a.m. on the day of absence; (2) relative humidity at noon of the previous day; (3) inches of rainfall on the day of absence; and (4) per cent of possible sunshine on the day of absence. The net correlation remaining between absence and temperature (r_at.1234) was -.39, practically the same as the original correlation. This was the only r of any size; the other r's, both entire and partial, were negligible. The obvious conclusion seems to be that, of the factors studied, temperature on the day of absence is the most important secondary or contributing cause of absence. (The sickness must be taken, of course, as the primary cause of absence.) Here and elsewhere let it be understood that partial correlation has absolutely nothing to say about "causes" as such. The conclusion as to which of two factors is the cause and which the effect is a matter of common sense analysis. In the illustration given, the distinction between cause and effect is obvious.

Another interesting example of the use of partial correlation in a causal investigation is found in the work of Reavis.1 This investigator undertook to ferret out the causes of attendance and non-attendance in rural schools. Certain factors were taken as having more or less effect on school attendance: (1) distance from school, (2) age-grade relation, (3) kind of work done by the pupils, (4) training, experience, etc., of teacher, (5) school equipment, and (6) kind of community. When partial correlation was applied to the problem, two entire coefficients were found to be the least reduced: that between attendance and distance, and that between attendance and kind of community. The first was lowered from -.45 to -.43, and the second from .30 to .28. Of all the factors selected, therefore, these two seem

1  Reavis,  George,  Factors  Controlling  Attendance  in  Rural  Schools.    Teachers 
College,  Columbia  University,  1920. 



to  have  the  most  direct  or  independent  influence  on  school 
attendance.  As  in  the  problem  cited  above,  the  distinction 
between  cause  and  effect  in  this  illustration  is  clear: — it  is 
evident  that  distance  from  school  and  kind  of  community  are 
the  causes  and  not  the  effects  of  attendance  or  non-attendance. 

2.  The  Value  of  the  Regression  Equation  in  Prediction  and 
Analysis 

The value of the regression equation is twofold:1 (1) In its usual form, it gives the weights to be assigned to each of several independent variables so that X1 (the dependent variable) may be predicted, or forecast, with minimum error (see page 237). (2) In its "special" form it may be used to analyze, within certain limits, a given capacity or ability. We shall consider these two uses of the regression equation in order.

(1) It has already been stated that the regression equation
enables us to combine two or more tests or other measures
(independent variables, X2, X3, . . . Xn) into a single value
(X1) in such a way as to give the best possible estimate of X1.
In the three-variable problem on page 228, for example, the
regression equation gives us the best possible forecast of the
number of honor points (X1) which a student will receive, when
we know his general intelligence score (X2) and the average
number of hours he spends per week in study (X3). Moreover,
once calculated, the regression equation may be used subsequently
to estimate other students' scores in X1 when only their
scores in X2 and X3 are known. The value of the regression
equation as a forecasting instrument is determined by the size
of the standard error of estimate and by the multiple coefficient
of correlation.
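As a rough illustration of how the standard error of estimate measures the forecasting value of a regression equation, the sketch below applies formulas (49), (50), (54), (55) and (56) from the summary at the end of this chapter to the data of Problem 6. It is a minimal Python sketch, not a procedure given in the text, and the function names are hypothetical. With these figures it reproduces the answers listed for Problem 6 (4.0, 4.3 and 3.5).

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_ab.c (formula 49 with one variable held constant)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


def se_one_predictor(sigma_1, r_1k):
    """sigma(est. X1) when X1 is forecast from a single test: sigma_1 * sqrt(1 - r^2)."""
    return sigma_1 * math.sqrt(1 - r_1k**2)


def se_two_predictors(sigma_1, r_12, r_13, r_23):
    """sigma(est. X1) = sigma_1.23 = sigma_1 * sqrt(1 - r12^2) * sqrt(1 - r13.2^2), formulas (50) and (54)."""
    r_13_2 = partial_r(r_13, r_12, r_23)  # r13.2: X2 held constant
    return sigma_1 * math.sqrt(1 - r_12**2) * math.sqrt(1 - r_13_2**2)


# Data of Problem 6: X1 is the criterion, X2 and X3 the two tests.
SIGMA_1, R_12, R_13, R_23 = 5.00, 0.60, 0.50, 0.20

se_x2 = se_one_predictor(SIGMA_1, R_12)
se_x3 = se_one_predictor(SIGMA_1, R_13)
se_both = se_two_predictors(SIGMA_1, R_12, R_13, R_23)
multiple_R = math.sqrt(1 - (se_both / SIGMA_1) ** 2)  # formula (56)
pe_both = 0.6745 * se_both  # formula (55)

print(f"X2 alone: {se_x2:.2f}  X3 alone: {se_x3:.2f}  both: {se_both:.2f}  R = {multiple_R:.2f}")
```

The smaller error for the pair than for either test alone is exactly the "better forecast" the text has in mind.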

A good illustration of the value of the regression equation in
forecasting, taken from a field other than psychology, is to be
found in the work of Moore in forecasting the cotton crop in

1 Kelley, T. L., Tables to Facilitate the Calculation of Partial Coefficients of
Correlation and Regression Equations, Bulletin of the University of Texas,
1916, 27, p. 7.



the  Southern  States.1  Taking  the  cotton  crop  in  Georgia  as 
the  dependent  variable  (to  cite  a  single  example)  and  the  May 
rainfall,  June  temperature,  and  August  temperature  as  inde- 
pendent variables,  Moore  built  up  a  regression  equation  from 
which  it  was  possible  to  get  a  better  forecast  of  the  crop  at  the 
end  of  August  than  the  official  method  of  the  U.  S.  Department 
of  Agriculture  could  obtain  from  the  condition  of  the  crop  in 
September.  (By  better  forecast  is  meant  a  smaller  error  of 
prediction.) 

In  addition  to  its  use  as  a  forecasting  instrument,  the  regres- 
sion equation  may  be  used  also  to  determine  the  value  or 
" weight"  which  each  test  in  a  battery  should  have  in  order 
that  the  composite  scores  obtained  from  the  battery  (group 
of  tests)  shall  be  the  best  possible  estimates  of  that  capacity 
which  the  whole  battery  of  tests  presumably  measures.  This 
is  essentially  the  same  problem  as  that  of  prediction  or  fore- 
casting discussed  in  the  last  paragraph.  Suppose,  by  way  of 
illustration,  that  the  problem  is  to  devise  a  group  test  for  measur- 
ing general  intelligence;  and  that  this  battery  is  to  consist  of 
four  tests.  The  first  step  is  to  secure  some  good  " criterion"  2 
of  general  intelligence.  This  may  be  (1)  school  grades,  (2) 
teachers'  estimates,  (3)  (1)  and  (2)  combined,  or  (4)  some 
standard  intelligence  examination,  as  for  example,  Stanford- 
Binet  or  Army  Alpha.  The  next  step  is  to  select  four  tests 
which  will  separately  give  (1)  high  correlations  with  the  criterion, 
and  (2)  low  correlations  with  each  other.3  These  two  condi- 
tions guarantee  that  each  test  will  measure  some  aspect  or  phase 
of  the  criterion ;  and  further  that  each  test  will  probably  measure 
a  different,  or  slightly  different,  phase  of  the  criterion,  since 
the  low  intercorrelations  will  prevent  much  duplication.  Let 
us call the criterion Xc and the four tests of the battery X1,
X2, X3, and X4. The regression equation in Score Form is

1  Moore,  H.  L.,  Forecasting  the  Yield  and  Price  of  Cotton,  1917,  pp.  108-115. 

2  See  page  266  for  definition  of  "  criterion." 

3 The ideal battery of tests would consist of tests which correlate as high as
possible with the criterion, and as low as possible with each other.



$$X_c = AX_1 + BX_2 + CX_3 + DX_4 + K,$$

in which A, B, C, D, the regression coefficients, are the "weights"
to be given the scores made on the four tests, and K is a numerical
constant. Now to take a very simple case, suppose that A = 1;
B = 2; C = 3; and D = 4. The regression equation then becomes

$$X_c = 1X_1 + 2X_2 + 3X_3 + 4X_4 + K,$$

which means that a subject's score on test No. 1 must be multiplied
by 1, his score on test No. 2 by 2, his score on test No. 3 by 3,
and his score on test No. 4 by 4 in order that his composite score
on the battery may give the "best" estimate of his score on Xc,
the criterion.
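Applying a set of score-form weights is simple arithmetic. The minimal sketch below uses the illustrative weights A = 1, B = 2, C = 3, D = 4 from the text; the subject's four raw scores and the constant K are hypothetical, since the text gives no values for either.

```python
def composite_score(scores, weights, k=0.0):
    """Score-form regression estimate X_c = A*X1 + B*X2 + ... + K."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per test score")
    return sum(w * x for w, x in zip(weights, scores)) + k


# Weights A, B, C, D from the illustration in the text.
WEIGHTS = (1, 2, 3, 4)
# Hypothetical raw scores on tests 1-4 and a hypothetical constant K.
subject_scores = (10, 20, 30, 40)
K = -50.0

estimate = composite_score(subject_scores, WEIGHTS, K)
print(estimate)
```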

The  regression  equation  may  be  said  to  furnish  the  ideal 
method  of  combining  several  tests  into  a  team,  since  each  test 
in  a  regression  equation  is  weighted  according  to  its  correlation 
with  the  criterion,  independently  of  the  other  tests  in  the  team 
or  battery.  Under  these  conditions  the  standard  error  of 
estimate is a minimum while the correlation of the predicted Xc
values  and  the  actual  Xc  values  (multiple  R)  is  the  maximum 
obtainable  with  the  given  set  of  tests.  R  tells  the  extent  to 
which  our  team  represents  the  criterion. 
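For the special case in which every test in a team correlates equally with the criterion (r_c) and equally with every other test (r_t), the regression weights are all equal and R² = k·r_c²/(1 + (k − 1)·r_t). That closed form is derived here under this equal-correlation assumption rather than taken from the text. The sketch below applies it to the figures of Problem 7 and agrees with the answers given there to about two decimals; it also shows that each added test raises R by less than the one before.

```python
import math


def team_R(k, r_c, r_t):
    """Multiple R for k tests that each correlate r_c with the criterion and r_t with one another.

    Assumes equal intercorrelations, so all regression weights are equal and
    R^2 = k * r_c**2 / (1 + (k - 1) * r_t).
    """
    if k < 1:
        raise ValueError("a team needs at least one test")
    return math.sqrt(k * r_c**2 / (1 + (k - 1) * r_t))


# Data of Problem 7: each test correlates .50 with the criterion and .20 with the others.
for k in (2, 3, 4):
    print(k, round(team_R(k, 0.50, 0.20), 3))
```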

(2) The only difference between the usual or "regular"
form of the regression equation and the "special" form to be
considered now is that in the special form, the σ's of all of the
different  tests  (or  other  measures)  are  taken  as  equal.  This 
procedure  eliminates  differences  in  the  size  of  the  test  units  as 
well  as  differences  in  "spread"  or  variability,  and  enables  us  to 
determine  (from  the  correlation  alone)  the  relative  weight  with 
which  each  independent  factor  "enters  into"  or  contributes  to 
the  dependent  variable  (the  criterion)  independently  of  the  other 
factors.  In  this  way,  an  analysis  can  be  made  of  the  impor- 
tance of  several  different  factors  in  some  final  result.  It  is  very 
important  to  remember,  however,  that  in  its  special  form,  the  re- 
gression equation  cannot  be  used  for  forecasting. 

We  may  illustrate  the  special  use  of  the  regression  equation 
with  data  taken  from  the  three-variable  problem  on  page  228. 
If X1, honor points, be taken as the criterion, while X2, general
intelligence,  and  X3,  average  number  of  hours  spent  in  study 



per  week  are,  as  before,  the  independent  variables,  the  usual 
or "regular" regression equation is written:

$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + K.$$

Replacing the b's in this equation by means of formula (53),

$$X_1 = r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}}X_2 + r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}}X_3 + K;$$

and replacing the partial σ's [by formula (50)], we have

$$X_1 = r_{12.3}\,\frac{\sigma_1\sqrt{1-r_{13}^2}\,\sqrt{1-r_{12.3}^2}}{\sigma_2\sqrt{1-r_{23}^2}\,\sqrt{1-r_{12.3}^2}}\,X_2 + r_{13.2}\,\frac{\sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2}}{\sigma_3\sqrt{1-r_{23}^2}\,\sqrt{1-r_{13.2}^2}}\,X_3 + K.$$

Substituting numerical values for the r's and putting σ1 = σ2 = σ3,
we have, to one decimal place,

$$X_1 = .8X_2 + .6X_3 + K.$$

This result may be interpreted to mean that in so far as the
two factors, general intelligence and number of hours spent on
the  average  in  study  per  week,  "enter  into"  the  ability  to  get 
honor  points,  they  contribute  with  the  relative  weight  of 
.8  :  .6  or  4  :  3.  It  must  be  clearly  understood  that  this  ratio 
refers  to  the  relative  contribution  of  the  two  factors  themselves 
to  the  final  result  and  not  to  the  relative  weights  of  their  scores. 
The  weight  to  be  assigned  each  score  is  found  from  the  regular 
regression  equation  given  on  page  229.  It  is  of  considerable 
interest,  however,  to  note  that  while  the  scores  on  the  general 
intelligence  test  and  number  of  study  hours  are  as  1:2,  the 
actual  contribution  of  these  two  factors  to  honor  points  (allow- 
ing for  differences  in  units,  variability,  etc.)  is  as  4  :  3.  Intel- 
ligence, therefore,  as  we  should  expect,  has  more  weight  than 
hours  spent  in  study  in  determining  the  hypothetical  ability 



which  we  have  called  " academic  success."  Much  of  the 
weight  which  study-hours  has  is  due  to  its  relatively  high 
negative correlation (−.35) with intelligence.
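The special-form weights can be checked numerically. Putting σ1 = σ2 = σ3 in the derivation above reduces the two weights to (r12 − r13·r23)/(1 − r23²) and (r13 − r12·r23)/(1 − r23²). The correlations of the page-228 problem are not repeated in this section, so the sketch assumes they are the ones listed for honor points, intelligence and study hours in Problem 4 (r12 = .60, r15 = .32, r25 = −.35), together with the σ's given there. Under that assumption it recovers the .8 : .6 special-form weights and the roughly 1 : 2 ratio of the regular score weights.

```python
def standard_score_weights(r_12, r_13, r_23):
    """Weights of X2 and X3 in the regression of X1 when sigma_1 = sigma_2 = sigma_3.

    Equivalent to r12.3 * sqrt(1 - r13^2) / sqrt(1 - r23^2) and
    r13.2 * sqrt(1 - r12^2) / sqrt(1 - r23^2).
    """
    denom = 1 - r_23**2
    return (r_12 - r_13 * r_23) / denom, (r_13 - r_12 * r_23) / denom


# Assumption: the page-228 correlations and sigmas are those given for the
# same variables (honor points, intelligence, study hours) in Problem 4.
R_12, R_13, R_23 = 0.60, 0.32, -0.35
SIGMA_1, SIGMA_2, SIGMA_3 = 11.2, 15.8, 6.0

beta_2, beta_3 = standard_score_weights(R_12, R_13, R_23)
# Regular (score-form) weights via formula (53): b = beta * sigma_1 / sigma_k.
b_2 = beta_2 * SIGMA_1 / SIGMA_2
b_3 = beta_3 * SIGMA_1 / SIGMA_3

print(f"special form: {beta_2:.2f} : {beta_3:.2f}")
print(f"regular form: {b_2:.2f} : {b_3:.2f}")
```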

In  concluding  this  discussion  of  partial  and  multiple  correla- 
tion, certain  limitations  to  the  use  of  the  method  should  be 
pointed  out.  In  the  first  place,  in  order  that  partial  coefficients 
of  correlation  be  valid,  it  is  necessary  that  all  of  the  zero  order 
coefficients  be  computed  from  data  in  which  the  regression  is 
linear.  Before  calculating  any  partial  r's,  we  should  make 
sure  that  all  zero  order  r's  have  linear  regression:  if  there  is 
any  doubt  as  to  linearity,  the  tests  given  on  page  209  should 
be  employed.  In  the  second  place,  the  number  of  cases  must 
be  large,  especially  if  there  are  a  number  of  variables,  otherwise 
partial  and  multiple  coefficients  will  have  little  significance. 
Coefficients  which  are  misleadingly  high  may  be  obtained 
when  studies  which  involve  many  variables  are  based  upon 
relatively  few  cases.  When  the  limitations  and  conditions 
mentioned  are  fully  recognized  and  met,  however,  partial  and 
multiple  correlation  furnishes  us  with  an  exact  and  powerful 
instrument  for  the  analysis  of  problems  which  arise  in  mental 
and  social  measurements. 

VI.  Spurious  Correlation1 

The  correlation  between  two  sets  of  test  scores  is  said  to  be 
"spurious"  when  it  is  due  in  whole  or  part  to  factors  other  than 
those  which  determine  performance  in  the  tests  themselves. 
In  general,  the  cause  of  spurious  correlation  may  be  said  to  lie 
in  a  failure  to  control  conditions;  and  the  most  usual  effect  of 
this lack of control is a "boosting" or inflation of the coefficient.
Some  of  the  more  general  situations  which  may  lead  to  spurious 
correlation  are  given  under  the  following  heads: 

1.  Spurious  Correlation  Due  to  the  Heterogeneity  of  Material 

We  have  already  found  occasion  to  show  elsewhere  (page 
221)  how  a  lack  of  uniformity  in  age  conditions  will  lead  to 

1 See also Chap. IV, p. 211.



correlation  which  is  too  high,  i.e.,  is  spurious.  Differences  in 
age  within  the  group  will  lead  to  a  distinctly  higher  correlation 
between  two  tests — when  the  test  scores  increase  with  age — 
than  the  correlation  which  we  should  obtain  in  a  single  age 
(a  homogeneous)  group.  To  cite  a  simple  case,  in  a  group 
of  boys  from  10  to  18  years  old,  a  substantial  correlation  will 
appear  between  strength  of  grip  and  length  of  forearm,  quite 
apart  from  any  real  relation,  due  solely  to  the  fact  that  both  of 
these  physical  attributes  increase  with  age. 

Failure  to  take  account  of  the  age  factor  is  a  prolific  source 
of  error  in  correlational  work.  In  stating  the  correlation 
between  two  tests,  or  the  reliability  coefficient  of  a  test,  we 
should  always  be  careful  to  specify  the  range  of  ages,  grades, 
etc.,  in  order  to  show  the  heterogeneity  of  the  group.  With- 
out this  information  an  r  per  se  is  practically  valueless. 

Many  other  factors  besides  age  may  lead  to  spurious  cor- 
relation. To  cite  a  familiar  example : 1  if  alcoholism,  degeneracy 
and  bad  heredity  are  all  positively  related,  the  r  between  alcohol- 
ism and  degeneracy  will  be  too  high  (due  to  the  indirect  effect 
of  heredity  on  both  factors)  unless  the  heredity  influences  are 
kept  constant.  Again,  to  take  another  example,  suppose  that 
we  have  found  the  scores  on  a  general  intelligence  examination 
and  a  cancellation  test  for  two  distinctly  different  groups, 
e.g.,  500  college  seniors  and  500  day  laborers;  and  that  the 
average  ability  in  both  tests  is  definitely  higher  in  the  college 
group.  Now  if  the  correlation  between  these  tests  is  zero  in 
each  group  taken  separately,  when  the  two  groups  are  combined 
a  positive  correlation  will  be  obtained  due  simply  to  the  hete- 
rogeneity of  the  composite  group.2  Such  a  correlation  is,  of 
course,  spurious. 
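The effect of pooling two groups can be seen with a toy example. The scores below are hypothetical: within each group the two tests are uncorrelated, but the second group is 5 points higher on both tests, and pooling the groups produces a high positive r that reflects only the difference between the groups.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: within each group the two tests are uncorrelated.
low_x, low_y = [0, 0, 1, 1], [0, 1, 0, 1]
SHIFT = 5  # the second group averages 5 points higher on both tests
high_x = [x + SHIFT for x in low_x]
high_y = [y + SHIFT for y in low_y]

r_low = pearson_r(low_x, low_y)
r_high = pearson_r(high_x, high_y)
r_combined = pearson_r(low_x + high_x, low_y + high_y)
print(r_low, r_high, round(r_combined, 3))
```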

To  be  valid,  it  is  clear  that  a  correlation  must  be  freed  of 
extraneous  influences  which  affect  the  homogeneity  of  the 
material.     When  such  influences  cannot  be  determined  quan- 

1  Kelley,  T.  L.,  Tables  to  Facilitate  the  Calculation  of  Partial  Coefficients  of 
Correlation  and  Regression  Equations,  Bull.  Univ.  Texas,  1916,  No.  27. 

2  Otis,  A.  S.,  Statistical  Method  in  Educational  Measurement,  1925,  pp.  334- 
336. 



titatively,  this  is  far  from  an  easy  task.  Provided,  however, 
the  factor  or  factors  producing  heterogeneity  are  measurable, 
their  influence  may  usually  be  allowed  for  by  the  method  of 
partial  correlation. 

2.  Spurious  Index  Correlation 

It can be shown1 that three variables X1, X2, and X3 may
be totally uncorrelated, and still a correlation between

$$Z_1 = \frac{X_1}{X_3} \quad\text{and}\quad Z_2 = \frac{X_2}{X_3}$$

may be obtained which is as large as .50. To take a
concrete  case,  if  two  individuals  observe  a  series  of  magnitudes 
(e.g.,  Galton  Bar  settings)  independently,  the  absolute  errors 
of  observation  (Xi  and  X2)  may  be  uncorrelated,  and  still  a 
distinct  correlation  appear  between  the  errors  made  by  the  two 
observers  when  these  are  expressed  as  per  cents  of  the  magnitude 
observed  (X3).  The  spurious  element  here  is,  of  course,  the 
common  factor,  X3,  in  the  denominator  of  the  ratios. 
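A small simulation shows the index effect. The sketch assumes three independent, normally distributed measures with the same mean (100) and the same relative variability (standard deviation 10); these settings and the random seed are hypothetical. The raw measures come out essentially uncorrelated, while the two ratios that share the denominator X3 correlate near .50, in line with the figure quoted above.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2024)
N = 50_000
# Three mutually independent measures with equal relative variability (sd/mean = .10).
x1 = [rng.gauss(100, 10) for _ in range(N)]
x2 = [rng.gauss(100, 10) for _ in range(N)]
x3 = [rng.gauss(100, 10) for _ in range(N)]

r_raw = pearson_r(x1, x2)  # near zero: X1 and X2 are independent
z1 = [a / c for a, c in zip(x1, x3)]
z2 = [b / c for b, c in zip(x2, x3)]
r_index = pearson_r(z1, z2)  # inflated by the common denominator X3 alone
print(round(r_raw, 3), round(r_index, 3))
```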

One of the commonest examples of spurious index correlation
in psychology is found in the correlation of IQ's obtained
from two different intelligence tests. If the IQ's of 500 children
ranging in age from 3 to 14 years are calculated from two tests
X1 and X2, the correlation between IQ(X1) and IQ(X2) will be
considerably increased because of the presence of the common factor
of chronological age X3 in the two series (since IQ = mental age / X3).

The spurious element here may be eliminated by holding constant
the common factor of age through partial correlation.

3. Spurious Correlation of a Single Test With a Composite of Which it is a Member

If the scores of several tests, X1, X2, X3, etc., are averaged
or added, and the composite scores, Xcom, correlated with the
scores of any single test X1, the correlation resulting will be too
high (spurious) because of the presence of X1 in the composite.

1 Yule, G. U., An Introduction to the Theory of Statistics, pp. 215-216.



The amount or degree of the spurious element is measured by
the ratio t/s, in which t = the number of elements in the single
test, and s = the number of elements in the composite1 (see page
293). To illustrate: there are 20 items in the Number Series
Completion Test of the Army Alpha, and 212 items in the whole
test. Now if there were no correlation at all between the Completion
items and the rest of Alpha, there would still be a spurious
correlation between the two tests equal to the ratio of the number
of items in Completion to the total number of items in Alpha,
i.e., 20/212 or .094. A correlation obtained between Completion
and  Alpha,  therefore,  will  be  too  high,  due  simply  to  the  inclu- 
sion of  the  Completion  items  in  both  sets  of  data. 

It  should  be  noted  that  when  several  tests  are  all  of  the 
same — or  approximately  the  same — length,  the  amount  of 
spurious  correlation  which  will  result  from  correlating  any 
single test with a composite of them all is approximately constant
(t/s is the same for each). For this reason it is valid to compare the
correlations  of  the  separate  tests  with  the  composite  in  order 
to  discover  which  tests  are  most  representative  of  the  capacity 
measured  by  them  all  (see  page  267). 

VII.  Summary  of  Formulas  in  Chapter  V 

1.  Partial  r's, 

$$r_{12.34\ldots n} = \frac{r_{12.34\ldots(n-1)} - r_{1n.34\ldots(n-1)}\,r_{2n.34\ldots(n-1)}}{\sqrt{1-r_{1n.34\ldots(n-1)}^2}\,\sqrt{1-r_{2n.34\ldots(n-1)}^2}} \qquad (49)$$
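Formula (49) builds each partial r from partial r's of the next lower order, so it translates directly into a recursive function. The sketch below is a minimal implementation under that reading; the function name and the 0-based indexing of the correlation matrix are conventions chosen here, not the text's. Applied to Problems 1 and 2 it gives .665, −.18 and .70, agreeing with the answers listed for those problems.

```python
import math


def partial_r(corr, i, j, controls=()):
    """Partial correlation r_ij.controls by repeated application of formula (49).

    corr is a square list of lists of zero-order r's; i, j and the entries
    of controls are 0-based indices into it.
    """
    if not controls:
        return corr[i][j]
    *rest, n = controls  # eliminate the last control variable last
    rest = tuple(rest)
    r_ij = partial_r(corr, i, j, rest)
    r_in = partial_r(corr, i, n, rest)
    r_jn = partial_r(corr, j, n, rest)
    return (r_ij - r_in * r_jn) / math.sqrt((1 - r_in**2) * (1 - r_jn**2))


# Problem 1: 0 = intelligence, 1 = school achievement, 2 = age.
corr_p1 = [[1.00, 0.80, 0.70],
           [0.80, 1.00, 0.60],
           [0.70, 0.60, 1.00]]
r_p1 = partial_r(corr_p1, 0, 1, (2,))

# Problem 2: 0 = Army Alpha, 1 = Cancellation, 2 = Controlled Association.
corr_p2 = [[1.00, 0.20, 0.70],
           [0.20, 1.00, 0.45],
           [0.70, 0.45, 1.00]]
r_alpha_canc = partial_r(corr_p2, 0, 1, (2,))
r_alpha_assoc = partial_r(corr_p2, 0, 2, (1,))
print(round(r_p1, 2), round(r_alpha_canc, 2), round(r_alpha_assoc, 2))
```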

2. Partial σ's,

$$\sigma_{1.234\ldots n} = \sigma_1\sqrt{1-r_{12}^2}\,\sqrt{1-r_{13.2}^2}\,\sqrt{1-r_{14.23}^2}\cdots\sqrt{1-r_{1n.23\ldots(n-1)}^2} \qquad (50)$$

3. Regression Equation, Deviation Form,

$$x_1 = b_{12.34\ldots n}x_2 + b_{13.24\ldots n}x_3 + \cdots + b_{1n.23\ldots(n-1)}x_n \qquad (51)$$

1  Musselman,  J.  R.,  Spurious  Correlation  Applied  to  Urn  Schemata,  Journal 
of  American  Statistical  Association,  Vol.  XVIII,  Sept.,  1923. 



4. Regression Equation, Score Form,

$$X_1 = b_{12.34\ldots n}X_2 + b_{13.24\ldots n}X_3 + \cdots + b_{1n.23\ldots(n-1)}X_n + K \qquad (52)$$

5. Regression Coefficients,

$$b_{12.34\ldots n} = r_{12.34\ldots n}\,\frac{\sigma_{1.234\ldots n}}{\sigma_{2.134\ldots n}} \qquad (53)$$

6. Standard Error of Estimate,

$$\sigma_{(\text{est. }X_1)} = \sigma_{1.234\ldots n} \qquad (54)$$

7. Probable Error of Estimate,

$$PE_{(\text{est. }X_1)} = .6745\,\sigma_{(\text{est. }X_1)} \qquad (55)$$

8. Multiple Coefficient of Correlation,

$$R_{1(23\ldots n)} = \sqrt{1 - \frac{\sigma_{1.23\ldots n}^2}{\sigma_1^2}} \qquad (56)$$

9. Formula for "Chance" R,

#  =  ^p. (57) 

10. Alternate formula for R,

$$R_{1(234\ldots n)} = \sqrt{1 - \left[(1-r_{12}^2)(1-r_{13.2}^2)\cdots(1-r_{1n.23\ldots(n-1)}^2)\right]} \qquad (58)$$
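Formula (58) can be evaluated with the same recursion as formula (49). The sketch below, again with hypothetical function names, treats row 0 of the correlation matrix as the criterion X1 and the remaining rows as X2 through Xn. For Problem 6 it gives R of about .71 and, through formula (56), σ(est. X1) of about 3.5; for the three-test team of Problem 7(a) it gives R of about .73.

```python
import math


def partial_r(corr, i, j, controls=()):
    """Partial correlation r_ij.controls by repeated application of formula (49)."""
    if not controls:
        return corr[i][j]
    *rest, n = controls
    rest = tuple(rest)
    r_ij = partial_r(corr, i, j, rest)
    r_in = partial_r(corr, i, n, rest)
    r_jn = partial_r(corr, j, n, rest)
    return (r_ij - r_in * r_jn) / math.sqrt((1 - r_in**2) * (1 - r_jn**2))


def multiple_R(corr):
    """R of variable 0 on variables 1..n-1 by formula (58).

    R^2 = 1 - (1 - r12^2)(1 - r13.2^2)...(1 - r1n.23...(n-1)^2),
    with the text's variable 1 stored at index 0.
    """
    product = 1.0
    for k in range(1, len(corr)):
        r = partial_r(corr, 0, k, tuple(range(1, k)))  # r_1k.23...(k-1)
        product *= 1 - r**2
    return math.sqrt(1 - product)


# Problem 6: criterion and two tests.
corr_p6 = [[1.00, 0.60, 0.50],
           [0.60, 1.00, 0.20],
           [0.50, 0.20, 1.00]]
R_p6 = multiple_R(corr_p6)
se_p6 = 5.00 * math.sqrt(1 - R_p6**2)  # formula (56) solved for sigma_1.23, with sigma_1 = 5.00

# Problem 7(a): three tests, each .50 with the criterion and .20 with one another.
corr_p7 = [[1.0, 0.5, 0.5, 0.5],
           [0.5, 1.0, 0.2, 0.2],
           [0.5, 0.2, 1.0, 0.2],
           [0.5, 0.2, 0.2, 1.0]]
R_p7 = multiple_R(corr_p7)
print(round(R_p6, 3), round(se_p6, 2), round(R_p7, 3))
```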

PROBLEMS 

1. The r for intelligence and school achievement in a group of children
8 to 14 years old is .80. The r for intelligence and age in the same
group is .70. The r for school achievement and age is .60.
What will be the correlation between intelligence and school
achievement in children of the same age?

2. The correlation between (1) Army Alpha and (2) Cancellation in a
group of 100 freshmen is .20. The correlation between (1) Army
Alpha and (3) Controlled Association in the same group is .70.
The correlation between (2) Cancellation and (3) Controlled
Association is .45. What is the net correlation between Alpha
and Cancellation in this group? Between Alpha and Controlled
Association? How do you interpret your results?



3. Given the following data:1

X1 = high school grade in mathematics.
X2 = grade in an English interest test.
X3 = grade in a history interest test.
X4 = grade in a mathematics interest test.

σ1 = 4.93    r12 = .20    r23 = .63
σ2 = 3.13    r13 = .15    r24 = .21
σ3 = 6.12    r14 = .24    r34 = .54
σ4 = 4.64

(a) Work out the regression equation of X1 on X2, X3, X4.
(b) What are the relative weights of the three tests, X2, X3, and
X4, in determining the score on X1?

4. The following records were secured from 450 Liberal Arts freshmen
at Syracuse University:2

Variables: (1) honor points; (2) intelligence; (3) average H. S. grades;
(4) units; (5) hours per week of study.

M1 = 18.5     M2 = 100.6    M3 = 79      M4 = 16.1    M5 = 24
σ1 = 11.2     σ2 = 15.8     σ3 = 7.5     σ4 = 1.5     σ5 = 6

r12 = .60     r23 = .36     r34 = .40    r45 = .25
r13 = .40     r24 = .20     r35 = .11
r14 = .22     r25 = −.35
r15 = .32

(a) Work out a regression equation with (1) honor points as the
dependent variable.
(b) If a student has an intelligence score of 110, a High School
average of 75, offers 15 units for entrance, and studies on the
average 25 hours per week, what is his most probable
number of honor points?

5.  Using  as  much  of  the  data  in  Example  (4)  as  is  necessary,  find 
how  many  hours  a  student  must  study  if  he  has  an  intelligence 
score  of  120,  and  wants  to  make  20  honor  points?     (Hint :  work 

1  Kelley,  T.  L.,  Educational  Guidance,  Teachers  College,  Contributions  to 
Education,  1914,  71,  p.  104. 

2  May,    Mark  A.,  Predicting  Academic  Success,   Journal   of  Educational 
Psychology,  1923,  Vol.  XIV,  7,  429-440. 



out  the  regression  equation  of  study  hours  on  honor  points  and 
intelligence  and  substitute  the  given  values  in  the  equation.) 

6. Let X1 be a criterion, and X2 and X3 two other tests. Correlations
and σ's are as follows:

r12 = .60    r23 = .20    σ1 = 5.00
r13 = .50                 σ2 = 10.00
                          σ3 = 8.00

How much more accurately can X1 be predicted from X2 and X3
in combination than from either alone?

7. Given a team of two tests, each of which correlates .50 with a
criterion, and a correlation of .20 between the two tests:
(a) How much would the addition of another test which correlates
.50 with the criterion and .20 with each of the other tests improve
the predictive value of the team?
(b) How much would the addition of two such tests improve the
predictive value of the team?

8. Two absolutely independent measures B and C completely determine
a third measure A. If B correlates .50 with A, what is
the correlation of C and A?

9. Using the data given in Example (1) above, analyze school achievement
in terms of intelligence and age. What is the relative
importance of the contribution made by these factors?

10.  A  group  test  contains  10  tests  with  a  total  of  200  items.  One  of 
the  tests  correlates  .60  with  the  composite  scores  on  the  battery. 
If  this  test  contains  15  items,  how  much  of  the  given  correlation 
is  spurious? 

Answers 

1.  r=.67. 

2. The r between Alpha and Cancellation is −.18; between Alpha
and Controlled Association, .70.

3. (a) x1 = .37x2 − .11x3 + .28x4.

(b) Grade in mathematics = 6.5 (grade in English interest test)
− 2 (grade in history interest test) + 5 (grade in mathematics
interest test).



4. (a) X1 = .58X2 + .14X3 − 1.03X4 + 1.10X5 − 62

(b) 24, with a PE(est. X1) of 4 points.

5. 18 hours, with a PE(est.) of 2.7 hours: 18 ± 2.7.

6. From X2 alone σ(est. X1) = 4.0
From X3 alone σ(est. X1) = 4.3
From X2 and X3 σ(est. X1) = 3.5

7. (a) R increases from .64 to .73.
(b) R increases from .64 to .79.

8. rAC = .866.

9. Intelligence and age contribute in the ratio (approximately) of
10 : 1.

10.   .075. 


CHAPTER  VI 

SOME  APPLICATIONS  OF  STATISTICAL  METHOD  AND 
TECHNIQUE  TO  TESTS  AND  TEST  RESULTS 

To  treat  properly  all  of  the  statistical  methods  which  may 
be  applied  to  tests  would  require  not  a  single  chapter  but  a 
volume  in  itself.  The  aim  of  the  present  chapter,  therefore,  is 
to  consider  simply  those  methods — having  to  do  largely  with 
correlation  and  reliability — which  are  deemed  essential  (1)  in 
the  treatment  of  ordinary  problems  involving  tests  and  (2)  as  a 
foundation  for  more  advanced  work  in  methods  of  treating  test 
results. 

I.  The  Validity  of  Test  Scores 

The  validity  of  any  measuring  instrument  depends  on  the 
fidelity  with  which  it  measures  whatever  it  purports  to  measure. 
A  yardstick  is  " valid"  when  measurements  made  by  it  can  be 
checked  by  other  measuring  instruments.  And  in  like  manner 
a  test  is  valid  when  the  capacity  which  it  measures  corresponds 
to  the  same  capacity  as  otherwise  objectively  measured  and 
defined. 

1.  Validity  Determined  through  Correlation  with  a  Criterion 

The  validity  of  a  test  is  usually  determined  by  finding  the 
correlation  between  the  test  and  some  independent  criterion. 
A  criterion  is  defined  as  that  measure  in  terms  of  which  the 
value  of  a  test  is  estimated  or  judged.  The  criterion  of  a 
general  intelligence  test,  for  example,  may  be  school  marks,  or 
ratings  for  intelligence,  or  some  other  test  believed  to  be  valid.1 

1 Stanford-Binet is often taken as a reliable criterion of general intelligence.
For example, see Herring Revision of Binet-Simon tests.




The  criterion  for  a  trade  test  is  actual  ability  in  the  trade.  A 
high  correlation  between  a  test  and  its  criterion  may  be  taken 
as  evidence  of  validity,  provided  both  the  test  and  the  criterion 
are  reliable.  Before  accepting  criterion-correlations  as  final, 
however,  we  must  know  the  reliability  of  our  test,  and  if  possi- 
ble, we  should  know  also  the  reliability  of  our  criterion.1 

2.  Indirect  Measures  of  Validity 

When  a  reliable  criterion  is  not  available,  indirect  methods 
must  be  employed  to  determine  validity.  One  indirect  method 
is  to  combine  the  scores  on  a  number  of  tests  of  the  same 
general  function  and  to  judge  as  best  (most  valid  for  the  func- 
tion) that  test  which  correlates  highest  with  the  average  of  all. 
Thus  Whitley  2  found  for  three  discrimination  tests,  Naming 
Colors,  Naming  Forms,  and  Naming  Objects,  the  following 

correlations : 3 

Average of all three tests with:
    Naming Colors     r = .67
    Naming Forms      r = .99
    Naming Objects    r = .96

She  concludes  that  "  Naming  Forms  seems  more  a  typical  test 
in  so  far  as  it  measures  an  ability  common  to  these  three  tests. " 
In  the  absence  of  an  independent  measure  of  the  function  the 
average  of  several  tests  of  that  function  may  be  taken  as  one 
criterion. 

A  second  indirect  method  of  measuring  validity  is  to  find 
correlations  between  the  given  test  and  other  tests,  in  this  way 
discovering  some  of  the  facts  which  the  test  does,  and  does  not, 
measure.  For  example,  tests  of  Controlled  Association,  e.g., 
Opposites, Logical Relations, etc., correlate much more highly
with  tests  of  general  intelligence  and  " reasoning"  than  with 
tests  of  Cancellation  or  Color-Naming.  The  first  group  of 
tests  is,  therefore,  a  better  (more  valid)  measure  of  the  capacity 

1 Kelley, T. L., The Reliability of Test Scores, Journal of Educational Research,
1921,  Vol.  3,  5,  p.  370. 

2  Tests  for  Individual  Differences,  Archives  of  Psychology,  1911,  19,  p.  78. 

3  The  "spurious"  element  here  is  constant  provided  the  tests  are  all  of 
practically  the  same  length  (see  page  261). 



measured  by  the  general  intelligence  and  reasoning  tests  than 
the  second  group.  (Indirect  measures  of  this  sort  are  advisable 
only  in  the  absence  of  more  direct  and  valid  criteria.) 

The  absence  of  valid  criteria  for  many  of  his  tests  forces  the 
careful  psychologist  to  define  tests  strictly  in  terms  of  what 
they  actually  do.  Hence  the  tendency  of  present-day  testers  is 
to  call  a  test  by  some  descriptive  name  rather  than  in  terms  of 
some more or less well-defined "mental function." Accordingly,
we have Opposites Tests, and Completion Tests rather
than  tests  of  Association  or  Reasoning. 

II.  The  Reliability  of  Test  Scores 

1.  The  Reliability  of  a  Test  as  Measured  by  Its  Self-Correlation 
A. The "Reliability Coefficient"

The  reliability  of  a  test  (or  of  any  measuring  instrument) 
is  determined  by  the  consistency  with  which  it  measures  the 
capacity  of  those  taking  it.  If  a  group  repeats  a  test  and  each 
individual  in  the  group  scores  close  to  his  first  record,  we  regard 
the  test  as  reliable.  If,  however,  there  are  large  positive  and 
negative  differences  between  the  scores  made  by  individuals  on 
the first and second giving of the test over and above the
practice effect,1 and if such differences occur in a large number
of cases, obviously the test is inconsistent and unreliable.
One  method  of  measuring  the  reliability  of  a  test  is  to  correlate 
the  scores  made  on  the  test  by  a  given  group  with  the  scores 
made  on  the  same  or  a  duplicate  test  by  the  same  group.  This 
is  the  method  of  self-correlation;  and  the  r  so  found  is  called 
the  "reliability  coefficient." 

When  the  reliability  coefficient  of  a  test  is  1.00,  the  test  is  an 
absolutely  accurate  measure  of  whatever  capacity  it  tests,  and 
when  the  reliability  coefficient  is  .00  the  test  has  just  no  relia- 
bility. The  lower  the  reliability  coefficient  the  less  the  reliability 
or  consistency  of  the  test  as  a  measuring  instrument. 

1  Practice,   since  it  serves  to  increase  all  scores  proportionally,   does  not 
affect  self-correlation.     It  does,  however,  introduce  a  constant  error. 



How high should self-correlation be in order to indicate a satisfactory reliability? This is an important question, and its answer depends largely on the nature of the test and on the size and variability of the group for whom the test is intended. Most makers of general intelligence tests demand a reliability coefficient of at least .90 between duplicate forms of their tests for unselected groups of the same chronological age. To be a reliable measure of capacity, a mental or physical test should, generally speaking, have a reliability coefficient of at least .80. This minimum will vary with the group, however, as the reliability coefficient is considerably affected by the range of scores made on the test (see page 271). For this reason, in giving the reliability coefficient of a test, the size and variability of the group measured should always be stated.

B.  Effect  on  Reliability  of  Lengthening  or  Repeating  the  Test 

If the self-correlation of a test is unsatisfactory, two courses are open: (1) we can lengthen the test until the reliability is greater; or (2) we can repeat the test and its duplicate twice each, average the two series of scores, and correlate these averages. If after (2) the reliability coefficient is still too low, we can repeat the test and its duplicate three, four, or as many times as is necessary to secure the desired reliability coefficient. To do either (1) or (2) empirically would require a considerable amount of time and labor; hence it is fortunate that a good estimate of the effect of (1) or (2) may be secured quickly by applying Spearman's (sometimes called Brown's¹) "prophecy" formula:

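$$ r_x = \frac{N r}{1 + (N - 1) r} \qquad (59) $$

in which r is the self-correlation of the test in its present form, N is the number of times the test is lengthened (or repeated), and rₓ is the predicted self-correlation of the lengthened (or repeated) test.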

To illustrate the application of this formula, suppose (a) that the self-correlation of a test is .70, and that we wish to know what effect doubling the length of the test will have on its reliability. Substituting r = .70 and N = 2 in the formula, and solving for rₓ, we have

$$ r_x = \frac{2 \times .70}{1 + (2 - 1) \times .70} = \frac{1.40}{1.70} = .82 $$

1 Brown, Wm., The Essentials of Mental Measurement, 1911, p. 102.

Doubling the test's length, therefore, increases the self-correlation from .70 to .82. Instead of doubling the length of the test, we may give it and its duplicate twice each, average the two scores made by each individual in the two series, and correlate these averages. The result will be the same (as far as purely statistical factors are concerned) as that obtained by doubling the length of the test.
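Formula (59) can be sketched in Python as follows. The only assumption is that r and N are plain numbers, and the function name `prophecy` is our own label rather than the author's. The first line reproduces example (a), in which doubling a test with r = .70 gives about .82. The loop shows the diminishing gains of further lengthening.

```python
def prophecy(r, n):
    """Formula (59): predicted reliability of a test lengthened (or repeated) n times,
    given the reliability r of the test in its present form."""
    return n * r / (1 + (n - 1) * r)


# Example (a): doubling a test whose self-correlation is .70.
print(round(prophecy(0.70, 2), 2))

# Gains shrink as n grows; by formula (59) the prediction approaches but never reaches 1.00.
for n in (1, 2, 3, 4, 5):
    print(n, round(prophecy(0.70, n), 3))
```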

The "prophecy" formula may be used in another way. Suppose (b) that the self-correlation of a test, or the correlation of the test and its duplicate, is .80. How much will the test have to be lengthened (or how many times repeated) in order to ensure a reliability coefficient (rₓ) of .95? Substituting r = .80 and rₓ = .95 in the formula, and solving for N,

$$ .95 = \frac{.80N}{1 + .80N - .80} = \frac{.80N}{.20 + .80N} $$

$$ .19 + .76N = .80N, \qquad .04N = .19 $$

$$ N = 4.75, \text{ or } 5 \text{ in whole numbers.} $$

The  test  must  be  5  times  its  present  length  or  repeated  (together 
with  its  duplicate)  5  times  in  order  to  raise  the  self-correlation 
from  .80  to  .95. 
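Solving formula (59) algebraically for N gives N = rₓ(1 − r) / [r(1 − rₓ)]. The sketch below applies this rearrangement to example (b). It then checks the rounded-up answer by feeding it back into formula (59). The function name `times_needed` is hypothetical.

```python
import math


def prophecy(r, n):
    """Formula (59): predicted reliability after lengthening the test n times."""
    return n * r / (1 + (n - 1) * r)


def times_needed(r, target):
    """Formula (59) solved for N: lengthening factor needed to raise r to target."""
    return target * (1 - r) / (r * (1 - target))


# Example (b): raise a self-correlation of .80 to .95.
n = times_needed(0.80, 0.95)
whole = math.ceil(n)  # a test can only be repeated a whole number of times
print(round(n, 2), whole, round(prophecy(0.80, whole), 3))
```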

When a test is increased in length, e.g., doubled or tripled, the items or questions added must be equal in reliability to the original test if the results from the prophecy formula are to be valid. Provided this condition is satisfied, it is evident that if we increased the length of a test indefinitely we could, theoretically, raise its self-correlation as close to 1.00 as we pleased. This seems scarcely reasonable, however; and there is evidence to indicate that while the reliability coefficient increases according to the formula for the first four or five pooled tests, thereafter it increases "more slowly than the prediction formula would lead us to expect."¹

C.  Coefficient  of  Reliability  from  One  Application  of  a  Test 

If a test has no duplicate and cannot well be repeated, we may measure the reliability of half of the test and then, by Spearman's formula, find the reliability of the whole test. The procedure is as follows. First, we make up two separate sets of scores by combining, say, alternate exercises in the test. For example, one set of scores may be the performance on the odd exercises, e.g., 1, 3, 5, etc.; the other set the performance on the even exercises, e.g., 2, 4, 6, etc.; or some other plan may be used.² These two sets of scores are then correlated to find the reliability coefficient of the half test. If the self-correlation of the half test so found is called rₕ, then by substituting N = 2 in Spearman's formula we can calculate the reliability of the whole test by the formula

$$ r_x = \frac{2 r_h}{1 + r_h} \qquad (60) $$


In using this formula we make the assumption that the halves of the test as we have made them up are approximately equivalent in difficulty and content.
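The odd-even procedure can be sketched in Python as follows. The right/wrong records of six pupils are invented for illustration, and the helper names are hypothetical. The code sums the odd and the even exercises for each pupil and correlates the two sets of totals to get rₕ. It then steps rₕ up with formula (60).

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(records):
    """Correlate odd-exercise totals with even-exercise totals (r_h), then step
    the half-test coefficient up to the whole test with formula (60)."""
    odd = [sum(p[0::2]) for p in records]   # exercises 1, 3, 5, ...
    even = [sum(p[1::2]) for p in records]  # exercises 2, 4, 6, ...
    r_h = pearson_r(odd, even)
    return r_h, 2 * r_h / (1 + r_h)


# Hypothetical right (1) / wrong (0) records of six pupils on eight exercises.
records = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 0, 0, 0],
]
r_half, r_whole = split_half_reliability(records)
print(f"half test: {r_half:.2f}, whole test: {r_whole:.2f}")
```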

D.  Dependence  of  the  Reliability  Coefficient  on  the  Size  and 
Variability  of  the  Group 

The coefficient of reliability obtained from a test and its duplicate given to the pupils of a single grade cannot be taken as indicating the same degree of reliability as an identical coefficient obtained from a group of pupils spread over several grades. This is because the heterogeneity (the size and spread) of the two groups is different.

1 Holzinger, Karl J., Note on the Use of Spearman's Prophecy Formula for Reliability, Journal of Educational Psychology, 1923, Vol. XIV, 5, pp. 301-305.

2 Ruch, G. M., and Del Manzo, M. C., The Downey Will Temperament Test; Analysis of its Reliability and Validity, Journal of Applied Psychology, 1923, Vol. VII, 1, p. 65.

Recently Kelley¹ has devised a formula from which, knowing the reliability coefficient of a test in, say, a group of pupils from a single grade, we can determine what the reliability coefficient of the same test must be in a group of pupils from several grades in order that the test be equally effective in both ranges. The formula is


$$ \frac{\sigma}{\Sigma} = \frac{\sqrt{1 - R}}{\sqrt{1 - r}} \qquad (61) $$


in which σ and Σ are the σ's of the scores in the small and large groups, respectively, and r and R are the reliability coefficients of the test in the small and large groups. To illustrate, suppose that in a single grade r = .50 and σ = 5.00, and that in a large group made up of children from grades 3 to 8, inclusive, Σ = 15. What R (i.e., reliability coefficient) must the test yield in the large group in order to be as effective there as in the small group? Substituting for σ, Σ, and r in the formula, 1 − R = (σ/Σ)²(1 − r) = (1/9)(.50) = .056, so that R = .94. This means that a reliability coefficient of .50 in the small group indicates the same degree of reliability as a reliability coefficient of .94 in the group in which the range of "talent" is three times as great.
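Solving formula (61) for R gives R = 1 − (σ/Σ)²(1 − r). The minimal Python sketch below implements that rearrangement and reproduces the example in the text. The function name is our own choice.

```python
def equivalent_reliability(r, sigma_small, sigma_large):
    """Formula (61) solved for R: the reliability a test must show in the group
    with spread sigma_large to be as effective as it is (reliability r) in the
    group with spread sigma_small."""
    return 1 - (sigma_small / sigma_large) ** 2 * (1 - r)


# Single grade: r = .50, sigma = 5; grades 3 to 8: Sigma = 15.
print(round(equivalent_reliability(0.50, 5.0, 15.0), 2))
```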

This formula may be used to determine whether a test is equally effective in part of the range (σ) as in the whole range (Σ), or in one range as in another. It also serves to make clear the necessity of always giving the size and spread of the group in stating and interpreting reliability coefficients.²

1 Kelley, T. L., The Reliability of Test Scores, Journal of Educational Research, 1921, Vol. III, 5, pp. 370-379.

2 Otis, A. S., Statistical Method in Educational Measurement, 1925, pp. 253-254.

2. The Index of Reliability

By an individual's "true" score in a test is meant the average of a very large number of measurements made of the given individual on the same or duplicate tests under precisely the same conditions. It has been shown¹ that the correlation between a series of obtained scores and their corresponding "true" scores may be found from the formula


$$ r_{(\text{obt. true})} = \sqrt{r_{12}} \qquad (62) $$


in which r₁₂ is the self-correlation, or reliability coefficient, obtained from duplicate forms of the test. Given the reliability coefficient, therefore, it is possible to find the coefficient of correlation between a set of obtained scores and their corresponding true scores. This coefficient, r(obt. true), is called the "index of reliability," and it is the highest correlation which the test can be expected to yield with any second measure, including its own duplicate. This follows from the fact that "the highest possible correlation which can be obtained (except as chance might occasionally lead to higher spurious correlation) between a test and a second measure is with that which truly represents what the test actually measures,—that is, the correlation between the test and the true scores of individuals in just such tests."² Since r₁₂ is usually less than 1.00, r(obt. true) is nearly always greater than r₁₂.

To illustrate the index of reliability, suppose that for a given group r₁₂ = .64. Then r(obt. true) = √.64, or .80, and .80 is the highest correlation which can be obtained (except by chance) between this test in its present form and any second measure. The index of reliability is a useful and easily interpreted measure of a test's reliability, since by simply extracting the square root of an obtained reliability coefficient we can find the maximum correlation which the test is capable of yielding. Thus, if r₁₂ = .25, so that r(obt. true) = √.25, or .50, it is obviously a waste of time to continue using the test without lengthening or otherwise improving it.
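The index of reliability is simply the square root of r₁₂ (formula 62), so a sketch needs only `math.sqrt`. The function name below is hypothetical. The code reproduces both figures used in the text.

```python
from math import sqrt


def index_of_reliability(r12):
    """Formula (62): correlation between obtained scores and true scores."""
    return sqrt(r12)


for r12 in (0.64, 0.25):
    print(f"r12 = {r12:.2f} -> index of reliability = {index_of_reliability(r12):.2f}")
```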

1 Kelley, T. L., A Simplified Method of Using Scaled Data for Purposes of Testing, School and Society, 1916, Vol. IV, 34, 71.

2 Kelley, T. L., The Reliability of Test Scores, Journal of Educational Research, 1921, Vol. III, 5, 327.



3. The Standard Error and Probable Error of Measurement
σ(M) and PE(M)

We have seen that the reliability of a test may be measured in terms of (1) its reliability coefficient, and (2) its index of reliability. Still another way of measuring the reliability of a test is to determine how closely a score obtained on the given test approximates its corresponding true score. (True scores have been defined on page 272.) An obtained score will usually differ in some degree from its corresponding true score because of two sorts of errors: constant errors and variable errors. Constant errors, since their weight is all in one direction, do not affect self-correlation, and can usually be ruled out or their influence measured. Variable errors, however, since they may be either positive or negative, are less easily eliminated than constant errors, and hence are more effective in producing departures of obtained scores from corresponding true scores.

The measurement of the influence of variable errors, therefore, becomes a matter of considerable importance. It may be done by calculating the standard error of measurement, written σ(M), which may be interpreted as a measure of the amount of variable error, or as a measure of the probable divergence of obtained scores from true scores after the elimination of constant errors. The σ(M) is derived directly from σ(est.) as follows. In the equation σ(est. 1) = σ₁√(1 − r₁₂²) (see formula 32), if σ₁ is the σ of the scores in test 1, and r₁₂ is the correlation between tests 1 and 2, then σ(est. 1) measures the accuracy with which individual scores in test 1 may be estimated from a knowledge of the corresponding scores in test 2. Now if the scores on test 2 are taken to represent true scores, and the scores on test 1 to represent obtained scores on the same test, the equation may be written


$$ \sigma_{(\text{est. obt.})} = \sigma_{\text{obt.}} \sqrt{1 - r^2_{(\text{obt. true})}} $$

But r(obt. true) = √r₁₂, and r²(obt. true) = r₁₂, the reliability coefficient.



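Hence, substituting these values in the above equation, and writing σ₁ (the σ of the obtained scores on test 1) for σ(obt.), we have

$$ \sigma_{(\text{est. obt.})} = \sigma_1 \sqrt{1 - r_{12}} $$

or, writing σ(M) for σ(est. obt.), finally,

$$ \sigma_{(M)} = \sigma_1 \sqrt{1 - r_{12}} \qquad (63) $$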

Formula (63) gives the standard error of measurement for a set of obtained scores. Given r₁₂, the reliability coefficient of the test, and σ₁, the σ of the test scores, we can from formula (63) measure the probable divergence of an obtained score from its corresponding true score.

Instead of σ(M) we may find PE(M), which is probably more often used, by the formula

$$ PE_{(M)} = .6745\,\sigma_1 \sqrt{1 - r_{12}} \qquad (64) $$

To illustrate the use of these formulas, suppose that in a group of 100 college men we obtain an average Army Alpha score of 150 with a σ of 15.00 points, and that the self-correlation of Alpha (found by correlating two forms) is .90. What are the σ(M) and PE(M)? Applying formula (63), we have

$$ \sigma_{(M)} = 15\sqrt{1 - .90} = 4.74 $$

and from (64),

$$ PE_{(M)} = .6745 \times 15\sqrt{1 - .90} = 3.20 $$

From the PE(M), we may interpret this result to mean that the chances are even that the true score of any individual in the group of 100 falls within the range obtained score ± 3.20. For an obtained score of 175, the chances are even that the true score of this particular man lies between 171.80 and 178.20. Expressed in another way, we may say that 50% of the obtained scores are in error (as compared with their true scores) by not more than ±3.20 points.
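Formulas (63) and (64) and the even-chance limits for an obtained score of 175 can be sketched in Python as follows. The function names are illustrative only, and the .6745 factor is taken from formula (64).

```python
from math import sqrt


def sigma_m(sigma1, r12):
    """Formula (63): standard error of measurement."""
    return sigma1 * sqrt(1 - r12)


def pe_m(sigma1, r12):
    """Formula (64): probable error of measurement (.6745 times sigma(M))."""
    return 0.6745 * sigma_m(sigma1, r12)


# Army Alpha example: sigma = 15.00, self-correlation .90.
se = sigma_m(15.0, 0.90)
pe = pe_m(15.0, 0.90)
obtained = 175
print(f"sigma(M) = {se:.2f}, PE(M) = {pe:.2f}")
# Even-chance (50%) limits for the true score of a man whose obtained score is 175.
print(f"limits: {obtained - pe:.2f} to {obtained + pe:.2f}")
```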

In the formulas for σ(M) and PE(M), the σ's of the test and its duplicate are assumed to be equal. If this is not at least approximately true, we must write these formulas as follows:

$$ \sigma_{(M)} = \frac{\sigma_1 + \sigma_2}{2} \sqrt{1 - r_{12}} \qquad (65) $$

and

$$ PE_{(M)} = .6745 \times \frac{\sigma_1 + \sigma_2}{2} \sqrt{1 - r_{12}} \qquad (66) $$

In the illustration above, if the σ obtained from the first form of Alpha and the σ obtained from the second form of Alpha had been 15 and 20, respectively, σ(M) and PE(M) would be

$$ \sigma_{(M)} = \frac{15 + 20}{2}\sqrt{1 - .90} = 5.53 $$

and

$$ PE_{(M)} = .6745 \times 5.53 = 3.73 $$
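When the two σ's differ, formula (65) simply replaces σ₁ by the mean of the two forms' σ's. The minimal sketch below uses a hypothetical function name and reproduces the Alpha figures.

```python
from math import sqrt


def sigma_m_unequal(sigma1, sigma2, r12):
    """Formula (65): standard error of measurement using the mean of the two forms' sigmas."""
    return (sigma1 + sigma2) / 2 * sqrt(1 - r12)


se = sigma_m_unequal(15.0, 20.0, 0.90)
pe = 0.6745 * se  # formula (66)
print(f"sigma(M) = {se:.2f}, PE(M) = {pe:.2f}")
```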

The student must be careful not to confuse the formulas for σ(est.) and PE(est.) with those for σ(M) and PE(M). The "estimate" formulas enable us to say with what degree of accuracy we can predict an individual's score on one test, knowing his score on a second (and usually a different) test. The actual prediction of the "most probable score" is made, of course, by means of the regression equation connecting the two tests. The σ(M) and PE(M) formulas, on the other hand, enable us to determine the probable divergence of an individual's obtained score from his corresponding true score, when we know (1) the σ and (2) the reliability coefficient of the test.

When tests are scored in different units, the σ(M) of one cannot be directly compared with the σ(M) of the other. We cannot compare directly, for example, the reliability of a score made on a tapping test (scored in number of taps made in 30 sec.) with the reliability of a score on a logical memory test (scored in number of items remembered). A simple method of overcoming this difficulty is to use a ratio similar to the coefficient of variation, V, described in Chapter I. Thus the ratio

$$ \frac{\sigma_{(M)}}{M} \quad \text{or} \quad \frac{PE_{(M)}}{M} $$

of one test (M being the mean score on that test) may be compared directly with the corresponding ratio of the other. In this way, the reliability of obtained scores on one test may be compared with the reliability of the obtained scores on another.


III.  Combining  the  Scores  from  Different  Tests 

When a number of different tests have been given to the same individual, it is often desirable to be able to combine the separate test scores into a composite score in order to express the individual's standing in the tests as a whole. The simplest procedure is, of course, to average the scores as they stand. In merely averaging results, however, two difficulties arise. The first is the difference in the size and kind of units employed in the tests. Many tests are given by the Amount-Limit Method: the work is completed (or as much as possible done) and the individual's performance is scored in terms of the time required. Many other tests are given by the Time-Limit Method: the time is fixed, and the subject's score is the number of items completed or the number of questions answered in the time allowed. It is obvious that scores obtained from tests given by these two methods cannot be combined directly.

A  second  difficulty  is  the  question  of  the  relative  influence 
or  "weight"  to  be  given  the  different  tests  in  the  composite 
score.  Simply  to  average  the  "raw"  (obtained)  scores  gives 
us  no  control  over  the  relative  importance  of  the  various  tests 
in  the  final  total  score.  For  although  it  is  often  assumed  that 
by  simply  averaging  results  we  avoid  the  troublesome  question 
of  weighting,  what  we  actually  do  in  such  cases  is  to  weight  quite 
drastically  without  knowing  what  the  weights  are.  With  these 
two  difficulties  in  mind,  let  us  examine  several  methods  which 
have  been  proposed  for  combining  separate  test  scores  into  a 
composite  score. 



1.  Combining  Test  Scores  by  Percentiles 

If the distribution of each of the separate tests which we have given is broken up into percentiles, it becomes an easy matter to combine the separate percentile rankings in the various tests, and thus secure a final percentile ranking for each individual. The method of calculating percentiles has already been considered (page 45). It is only necessary, then, to show how percentile rankings may be combined.


TABLE XXIX

Percentile Distributions for 9-Year-Olds on Three Tests. Method of Combining the Percentile Ratings of a Single Individual

                                               Percentiles
Tests                    0    10    20    30    40    50    60    70    80    90   100   S's Score   S's Perc. Rank
Picture Completion      62   240   297   325   372   407   440   450   499   577   646         445               65
Substitution           219   190   173   158   152   141   133   126   121   109    80         126               70
Seguin Form-Board       34    24    21    20    18    18    17    16    15    15    13          17               60

Median percentile rank: 65

Table XXIX gives the percentile tables for 9-year-olds on three tests of the Pintner-Paterson series of performance tests. The subject, a 9-year-old boy, made a score of 445 on Picture Completion, which gave him a percentile ranking of 65 (midway between 60 and 70) on this test. On Substitution, a score of 126 gave him a percentile ranking of 70; and on the Seguin Form Board a score of 17 gave him a percentile ranking of 60. The median of these three percentile rankings is 65, which indicates that the subject is somewhat above the average for his age. If the subject had been, say, 10 or 11 years old, percentile tables for those age distributions would have been used. As is evident from Table XXIX, the method of combining percentile rankings is simple and straightforward; it rules out the question of different units in the tests combined, and gives each test equal weight in the final score.
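The reading of Table XXIX can be sketched in Python. This illustration rests on two assumptions. First, each percentile table is treated as piecewise linear between the tabled points. Second, the Substitution and Form-Board rows fall as the percentile rises, presumably because those tests are scored in time, so that a lower score is better. The code therefore accepts a bracketing pair of points in either direction. Function and variable names are hypothetical.

```python
from statistics import median

PERCENTILES = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Score points at each percentile for 9-year-olds, copied from Table XXIX.
NORMS = {
    "Picture Completion": [62, 240, 297, 325, 372, 407, 440, 450, 499, 577, 646],
    "Substitution": [219, 190, 173, 158, 152, 141, 133, 126, 121, 109, 80],
    "Seguin Form-Board": [34, 24, 21, 20, 18, 18, 17, 16, 15, 15, 13],
}


def percentile_rank(score, points):
    """Interpolate linearly between the first pair of adjacent tabled points that
    brackets the score, whether the scores rise or fall with the percentile."""
    for i in range(len(points) - 1):
        s_lo, s_hi = points[i], points[i + 1]
        p_lo, p_hi = PERCENTILES[i], PERCENTILES[i + 1]
        if min(s_lo, s_hi) <= score <= max(s_lo, s_hi):
            if s_lo == s_hi:  # adjacent points tie: no interpolation possible
                return float(p_lo)
            return p_lo + (p_hi - p_lo) * (score - s_lo) / (s_hi - s_lo)
    raise ValueError("score lies outside the percentile table")


subject = {"Picture Completion": 445, "Substitution": 126, "Seguin Form-Board": 17}
ranks = {test: percentile_rank(s, NORMS[test]) for test, s in subject.items()}
print(ranks)
print("median percentile:", median(ranks.values()))
```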



2. Combining Test Scores by the Method of Median Mental Age

When the subjects are children, and age norms exist for the tests administered, it is a relatively easy matter to determine the MA of the subject in each test, and then find the median of these MA's. The median MA is the "composite score."

Tables giving the MA equivalents of scores on various tests have been published by many authors¹ and need not be reproduced here. The method of finding a median mental age for several tests is often very useful, and its results are easily interpreted. The method does not, however, apply to normal adults.

1 For example, see Whipple, Manual of Mental and Physical Tests, Vols. I and II, 1914; Pintner and Paterson, A Scale of Performance Tests, 1921; Pyle, W. H., The Examination of School Children, 1913.

3. Combining Tests Which Have Been Weighted According to the Variability of the Test Scores

When several tests have been given, all by the Time-Limit or all by the Amount-Limit Method, scores may be combined directly, the weight which each test score shall have in the composite score being determined in accordance with the variability of the test scores. An illustration will make the method clear. Suppose that in a given test in which the Average = 25 and σ = 5, subject A scores 20; and in another test in which the Average = 150 and σ = 15, A scores 160. Now if we simply add A's two scores, i.e., 20 + 160, to get 180, the score in the second test is given three times as much importance in this composite as the score in the first, since the spread, i.e., the σ, is three times greater in the second test. In order to give the two tests equal weight, we must equalize their spread or variability, and this can be done by multiplying the σ of the first test by 3 or dividing the σ of the second by 3. The same procedure must then be applied to the scores. By the first operation, our composite score becomes 20 × 3 + 160, or 220; by the second operation, the composite score becomes 20 + 160/3, or 73.33. In either composite both tests will now have equal weight.
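The arithmetic of this two-test illustration is sketched in Python below. The variable names are hypothetical. The code shows the raw total and the two ways of equalizing the σ's.

```python
# Test 1: Average 25, sigma 5.  Test 2: Average 150, sigma 15.  Subject A scores 20 and 160.
score_1, sigma_1 = 20, 5
score_2, sigma_2 = 160, 15

raw_total = score_1 + score_2              # test 2 silently carries 3 times the weight of test 1
ratio = sigma_2 / sigma_1                  # 3.0
scaled_up = score_1 * ratio + score_2      # multiply test 1 by 3
scaled_down = score_1 + score_2 / ratio    # or divide test 2 by 3
print(raw_total, scaled_up, round(scaled_down, 2))
```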

TABLE XXX

How to Combine Scores Weighted According to Variability

Data from 200 College Women. (From Carothers, F. E., Psychological Examination of College Students, Archives of Psychology, 1921, pp. 30-34.)

Tests                           Log. Memory    Log. Memory     Completion    Information     Vocabulary
                                   (recall)  (recognition)
                                          1              2              3              4              5
Average                                6.50          37.47          35.78         104.71          73.90
σ                                      1.76           7.69           4.36          26.79           7.60
Multiplier to give all
  tests equal weight                      5              1              2              ⅓              1
New σ                                  8.80           7.69           8.72           8.93           7.60
A's score                                 5             35             30            100             75
A's weighted score
  (all tests equal)                      25             35             60             34             75   Total = 229
A's weighted score, tests
  1 and 3 weighted 2,
  others 1                               50             35            120             34             75   Total = 314

In order to illustrate this method of combining scores in more detail, the average and the σ for each of five tests are given in Table XXX, together with the scores of subject A on each test. If A's scores are added as they stand, test 4 (Information) will be given 15 times the weight of test 1 (Logical Memory, recall) in the composite, since the σ for Information is 15 times the σ for Logical Memory, recall. Likewise, Information will have approximately 6 times the weight of Completion and roughly 3½ times the weight of Logical Memory, recognition, and of Vocabulary. It seems hardly probable that Information is as much superior in value as this to the other tests (in fact, it is possibly one of the least important), and hence a new weighting is clearly necessary. The simplest plan at the start will be to weight all of the tests equally, as shown in the table. If we multiply the σ of test 1 by 5, the σ of test 2 by 1, the σ of test 3 by 2, the σ of test 4 by ⅓, and the σ of test 5 by 1, we make all of the σ's approximately equal. Now if we multiply A's scores by these same "multipliers," the new test scores will all have the same weight in the final composite. In determining multipliers, the best plan is to keep them whole numbers, if practicable, and as small as possible. In Table XXX, for example, the σ's of tests 2 and 5 have been taken as standards because this gives the simplest multipliers for the other tests.

Suppose now that we had wished to give Logical Memory, recall, and Completion twice as much weight as the other tests in the composite. To accomplish this we should simply have multiplied the σ's of tests 1 and 3 by 10 and 4 instead of 5 and 2; i.e., we should have multiplied by enough to make their new σ's twice as large as the σ's of the other tests. Of course, when all of the tests have already been weighted 1, we need only double the scores on tests 1 and 3.

To  summarize  the  steps  in  the  method: 

(a) Find the average and the σ or Q of each test.

(b) If the tests are to have equal weight, multiply the σ or Q of each test by factors selected so as to make all of the new σ's or Q's equal. If some tests are to count more heavily than others, make their σ's or Q's proportionally larger.

(c) Multiply each S's score by the "multiplier" decided upon in (b), and add these new scores. Leave the result as a composite total, or average the new scores if there is some reason for working with smaller numbers.
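Steps (a) to (c) can be sketched in Python using the σ's, multipliers and A's scores of Table XXX. The multipliers are the ones chosen in the text and are not computed by any rule. The table enters A's weighted Information score as 34, whereas 100 × ⅓ is 33.3. The totals printed here (228.33 and 313.33) therefore run about two-thirds of a point below the table's 229 and 314.

```python
# Table XXX, tests 1-5 in order.
sigmas = [1.76, 7.69, 4.36, 26.79, 7.60]
a_scores = [5, 35, 30, 100, 75]
# Step (b): multipliers chosen in the text so every new sigma is near 8.
multipliers = [5, 1, 2, 1 / 3, 1]

new_sigmas = [round(m * s, 2) for m, s in zip(multipliers, sigmas)]

# Step (c): weight A's scores and total them (all tests equal).
equal = [m * a for m, a in zip(multipliers, a_scores)]

# Tests 1 and 3 count double: multipliers 10 and 4 instead of 5 and 2.
double_1_and_3 = [2 * m if i in (0, 2) else m for i, m in enumerate(multipliers)]
weighted = [m * a for m, a in zip(double_1_and_3, a_scores)]

print(new_sigmas)
print(round(sum(equal), 2), round(sum(weighted), 2))
```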

4.  Combining  Test  Scores  by  Converting  the  Scores  of  Different 
Tests  into  Comparable  Series 

As  mentioned  above,  the  chief  difficulties  in  combining  the 
scores  of  different  tests  arise  from  differences  in  the  units  in 
which  the  tests  are  scored  as  well  as  differences  in  variability 
among  the  tests  themselves.  We  have  already  considered  three 
ways  of  avoiding  these  difficulties.  Still  another  method  is  to 
convert  the  scores  of  the  different  tests  into  comparable 
distributions,  after  which  the  test  scores  may  be  combined 
directly. 

Two methods of combining tests in this way have been proposed, both of which assume that the distributions of test scores are normal or approximately normal. The more recent, suggested by Professor Clark Hull,¹ is to convert the scores from each test into a "standard" normal distribution in which the scores shall range from 0 to 100, with a mean at 50 and a σ of 14. [Individual scores rarely spread more than ±3.5σ above or below the average; hence, since 50/3.5 = 14.29, or approximately 14, the σ of this distribution may be taken as 14.00.] Conversion of the scores of a given test is readily made by the following scheme:

Let M = average of the given test.

Let σ = σ of the given test.

Let X₁ = individual's score on the given test.

Let 50 = average of the converted series.

Let 14 = σ of the converted series.

Let X = individual's score in the converted series.

Now if

$$ S = \frac{14}{\sigma} \quad \text{and} \quad K = 50 - MS, \quad \text{then} \quad X = K + S X_1 $$

To illustrate, suppose that in a given test the average is 16.00 and the σ is 3.5, and that subject A scores 18 on the test. What is A's converted score?

$$ S = \frac{14}{3.5} = 4.00, \qquad K = 50 - 16 \times 4 = -14.00 $$

Substituting in X = K + SX₁, X = −14 + 4 × 18 = 58.

A's  score,  therefore,  in  a  distribution  of  Average  =  50  and  a  =  14 
is  58.  In  other  words  (assuming  a  normal  distribution),  58  is 
as  far  above  the  average  of  the  distribution  whose  average  is 
50,  as  18  is  above  the  average  of  the  distribution  whose  average 
is  16.00. 
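Hull's scheme reduces to two constants per test. The minimal Python sketch below defaults to the mean of 50 and σ of 14 described above. The function name and its keyword arguments are our own. The call reproduces A's converted score of 58.

```python
def hull_convert(x1, mean, sigma, new_mean=50.0, new_sigma=14.0):
    """Convert a raw score x1 to a series with mean new_mean and sigma new_sigma,
    using X = K + S*X1 with S = new_sigma/sigma and K = new_mean - mean*S."""
    s = new_sigma / sigma
    k = new_mean - mean * s
    return k + s * x1


print(hull_convert(18, mean=16.0, sigma=3.5))  # 58.0
```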

An  illustration  will  serve  to  demonstrate  how  scores  may 
be  combined  by  this  method  (Table  XXXI). 

1 The Conversion of Test Scores into Series which shall have any Assigned Mean and Degree of Dispersion, Journal of Applied Psychology, 1922, Vol. VI, p. 299.




TABLE XXXI

                                  Test 1          Test 2
                           Word Building      Digit Span           Total
Average                            16.30             7.4
σ                                   4.90             1.3
A's score                          18.00             8.0
A's converted score                54.86           56.48           55.67


Taking test 1, Word-Building, first, we have from the formula above S = 14/4.9, or 2.86, and K = 50 − 16.30 × 2.86, or 3.38. Hence X = 3.38 + 2.86X₁, and substituting A's score of 18 for X₁ we have X = 54.86. In like manner, in test 2, Digit Span, S = 14/1.3, or 10.8, and K = 50 − 7.4 × 10.8, or −29.92. Accordingly, X = −29.92 + 10.8 × 8 (substituting A's score in Digit Span), or 56.48. Averaging A's scores in Word-Building and Digit Span, we have 55.67 as the composite score, which means that A is slightly above the average (50) in the two tests.

Since we have computed both K and S for each of the tests, all of the scores on Word-Building may be quickly converted into "new" scores by means of the formula X = 3.38 + 2.86X₁, and all of the scores on Digit Span converted into "new" scores by means of the formula X = −29.92 + 10.8X₁. In each case X₁ represents the actual score on the test.
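The same sketch can be applied to Table XXXI. It uses unrounded values of S (2.857 and 10.769) rather than the text's 2.86 and 10.8. As a result, the Digit Span score comes out 56.46 instead of 56.48, and the composite 55.66 instead of 55.67. The differences are purely a matter of rounding.

```python
def hull_convert(x1, mean, sigma, new_mean=50.0, new_sigma=14.0):
    """X = K + S*X1, with S = new_sigma/sigma and K = new_mean - mean*S."""
    s = new_sigma / sigma
    k = new_mean - mean * s
    return k + s * x1


word = hull_convert(18.0, mean=16.30, sigma=4.90)   # Word Building
digit = hull_convert(8.0, mean=7.4, sigma=1.3)      # Digit Span
composite = (word + digit) / 2
print(f"{word:.2f} {digit:.2f} {composite:.2f}")
```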

An earlier method of combining test scores, based on the same principles as the above plan, was outlined in 1912 by Professor Woodworth.¹ Woodworth's plan was to find the difference between a given individual's score on a test and the average score, i.e., $X - \text{Av.}_x$; divide this plus or minus difference ($\pm x$) by the σ of the test; and call the result, $x/\sigma$, the "reduced score."² Reduced scores found in this way for the same individual on several tests may be combined by simply averaging them; the weight of each test in the composite will be 1.00. To illustrate the method using the data of Table XXXI, A's score of 18 on the Word-Building test is 1.70 above the average, i.e., above 16.30; and dividing this deviation by the σ of the series (4.90) gives A a "reduced score" (a score expressed in σ units) of .347. On the Digit Span test, A's score of 8.00 is .6 above the average of the distribution, i.e., above 7.4; and dividing .6 by 1.3 we get a reduced score on Digit Span of .462. If we average these two reduced scores, A is found to stand .405 (in σ units) above the average of the group in the two tests. (Remember that this method, like the preceding one, assumes that the distributions of test scores are approximately normal.)

¹ Combining the Results of Several Tests, A Study in Statistical Method, Psychological Review, 1912, Vol. XIX, pp. 97-123.

² Note that in Woodworth's method the average is taken at 0 and σ as 1.00.
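A minimal Python sketch of Woodworth's reduced scores, using A's data from the example above. As an observation of ours rather than the text's, it also shows that if Hull's transmuted score is $50 + 14(X_1 - \text{Av.})/\sigma$ (as the worked Hull example implies), it is simply 50 plus 14 times the reduced score, so the two methods rank individuals identically. Unrounded, A's average reduced score is .404; the .405 above comes from averaging the rounded values .347 and .462.

```python
from statistics import fmean


def reduced_score(raw: float, mean: float, sd: float) -> float:
    """Woodworth's reduced score: the deviation from the average in sigma units."""
    return (raw - mean) / sd


reduced = [
    reduced_score(18.00, mean=16.30, sd=4.90),  # Word-Building
    reduced_score(8.0, mean=7.4, sd=1.3),       # Digit Span
]
composite = fmean(reduced)  # each test carries a weight of 1.00

# Assuming Hull's scale (average 50, S = 14 / sigma), the same composite reads:
hull_equivalent = 50 + 14 * composite

print([round(z, 3) for z in reduced], round(composite, 3), round(hull_equivalent, 2))
```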

Of  these  two  methods,  the  first  is  somewhat  the  simpler 
inasmuch  as  it  involves  only  plus  values  (all  transmuted  scores 
lie  between  0  and  100),  while  the  second  method  introduces 
plus  and  minus  values  which  are  nearly  always  fractions,  often 
small  in  size  and  inconvenient  to  handle.  Again,  a  composite 
score of 55.67 by Hull's method is probably more intelligible to
the  average  student  accustomed  to  think  in  per  cents,  than  an 
average  score  of  .405  found  by  Woodworth's  plan.  The  latter 
result  is  meaningful  only  to  those  who  have  had  considerable 
statistical  training. 

Woodworth's method has one particular advantage, however, which should be mentioned, viz., that when reduced scores have once been calculated for two or more tests, correlations between the tests may easily be found. The method of obtaining such correlations is illustrated in Table XXXII, which gives the reduced scores made by 10 adults on a Memory Span and an Information test, and the correlation between the two series. As shown in the table, the calculations are relatively simple. Since each individual's reduced score on Memory Span (X) is simply his x (i.e., his deviation from the average) divided by $\sigma_x$, and his reduced score on Information (Y) is, again, his y (i.e., his deviation from the average) divided by $\sigma_y$, the sum of the products $\left(\frac{x}{\sigma_x} \cdot \frac{y}{\sigma_y}\right)$ of the reduced scores of all of the ten individuals will give $\frac{\Sigma xy}{\sigma_x \sigma_y}$. We know from formula (24) (page 168) that

$$r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$$

Hence, the correlation between the two tests is obtained simply by dividing $\frac{\Sigma xy}{\sigma_x \sigma_y}$ (7.31) by N (10): that is, r equals .731.


TABLE XXXII

To Illustrate the Method of Finding Correlation from "Reduced Scores"

              Memory       Information   Reduced       Reduced       Product of
              Span (X)     (Y)           Score in X    Score in Y    Reduced Scores
Individuals   Score        Score         (x/σx)        (y/σy)        (xy/σxσy)
A               5            90           -1.19           .00           .000
B               9            60             .39         -1.45          -.566
C               8            90             .00           .00           .000
D               7            85            -.39          -.24           .094
E               6            70            -.79          -.97           .766
F              10           100             .79           .49           .387
G              12           130            1.58          1.94          3.065
H               6            80            -.79          -.49           .387
I               5            75           -1.19          -.73           .869
J              12           120            1.58          1.46          2.307
                                                                Σ =    7.309

Av.x = 8.0      σx = 2.53
Av.y = 90.00    σy = 20.62

r = Σxy/(Nσxσy) = 7.31/10 = .731

Note. This table is intended simply to illustrate the method. A product-moment r would not ordinarily be found for 10 cases.


The  student  should  bear  in  mind  when  using  either  of  these 
methods  that  neither  is  strictly  applicable  when  the  distributions 
are  considerably  skewed.  As  stated  above,  both  assume  that 
the  distributions  to  which  they  are  applied  are  normal  or 
approximately  normal. 
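Woodworth's shortcut can be checked on the data of Table XXXII with a Python sketch (standard library only; the variable names are ours). Reduced scores use the population σ, dividing by N, as the table does. Carried without rounding, the sum of products is about 7.286 and r about .729; the .731 in the table reflects the two-place reduced scores.

```python
from statistics import fmean, pstdev

# Table XXXII: Memory Span (X) and Information (Y) scores for individuals A to J.
memory_span = [5, 9, 8, 7, 6, 10, 12, 6, 5, 12]
information = [90, 60, 90, 85, 70, 100, 130, 80, 75, 120]


def reduced_scores(scores: list[float]) -> list[float]:
    """Deviation of each score from the average, in units of the population sigma."""
    mean, sigma = fmean(scores), pstdev(scores)
    return [(s - mean) / sigma for s in scores]


zx = reduced_scores(memory_span)
zy = reduced_scores(information)

sum_of_products = sum(a * b for a, b in zip(zx, zy))  # equals Σxy / (σx σy)
r = sum_of_products / len(zx)                          # divide by N

print(round(pstdev(memory_span), 2), round(pstdev(information), 2))  # 2.53 20.62
print(round(sum_of_products, 3), round(r, 3))
```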



IV. The σ of the Sum or Difference of Corresponding
Values of Two Series of Test Scores

If we know the correlation between two series of test scores, $X_1$ and $X_2$, and the σ's of the two series, it is possible to compute, in a simple way, the σ of the new composite series obtained by adding or subtracting the corresponding scores in the two original series. When the scores of the "new" distribution have been found by adding corresponding scores, the formula for $\sigma_s$ is¹

$$\sigma_s = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 + 2r\sigma_{x_1}\sigma_{x_2}} \qquad (67)$$

in which $\sigma_s$ denotes the σ of the "new" summed series, $\sigma_{x_1}$ is the σ of the $X_1$ scores, $\sigma_{x_2}$ is the σ of the $X_2$ scores, and r is the coefficient of correlation between $X_1$ and $X_2$. When the scores in the new distribution have been obtained by subtracting corresponding scores in the two tests, formula (67) becomes

$$\sigma_d = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 - 2r\sigma_{x_1}\sigma_{x_2}} \qquad (68)$$

in which $\sigma_d$ is the σ of the new difference series.

A problem will illustrate the use of these formulas. Let $X_1$ denote a Verb-Object Test and $X_2$ an Opposites Test. Then, given $\sigma_{x_1} = 11.18$, $\sigma_{x_2} = 9.00$, and $r_{x_1x_2} = .60$, what is the σ of the new series obtained (1) by adding the corresponding $X_1$ and $X_2$ scores, and (2) by subtracting the corresponding $X_1$ and $X_2$ scores? Substituting in formula (67), we have

$$\sigma_s = \sqrt{(11.18)^2 + (9.00)^2 + 2 \times .60 \times 11.18 \times 9.00}$$

or

$$\sigma_s = 18.07.$$

Thus, 18.07 is the σ of the $(X_1 + X_2)$ series. To find the σ of the $(X_1 - X_2)$ series, $\sigma_d$, we substitute in formula (68),

$$\sigma_d = \sqrt{(11.18)^2 + (9.00)^2 - 2 \times .60 \times 11.18 \times 9.00}$$

or

$$\sigma_d = 9.23.$$

¹ For a simple mathematical proof of this formula, see Yule, An Introduction to the Theory of Statistics, pp. 210-211.
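Formulas (67) and (68) as a short Python sketch, checked against the Verb-Object and Opposites problem; the function names are ours. Carried to more places, the σ of the sum is about 18.076 (18.07 above) and that of the difference 9.233. The last lines borrow the ten score pairs of Table XXXII purely as convenient data, to show that formula (67) reproduces the σ of the summed series exactly when σ and r are computed by the population formulas.

```python
from math import isclose, sqrt
from statistics import pstdev


def sigma_of_sum(s1: float, s2: float, r: float) -> float:
    """Formula (67): sigma of X1 + X2, given each sigma and their correlation r."""
    return sqrt(s1**2 + s2**2 + 2 * r * s1 * s2)


def sigma_of_difference(s1: float, s2: float, r: float) -> float:
    """Formula (68): sigma of X1 - X2."""
    return sqrt(s1**2 + s2**2 - 2 * r * s1 * s2)


def pearson_r(a: list[float], b: list[float]) -> float:
    """Product-moment r with population (divide-by-N) moments."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / n
    return cov / (pstdev(a) * pstdev(b))


sigma_s = sigma_of_sum(11.18, 9.00, 0.60)
sigma_d = sigma_of_difference(11.18, 9.00, 0.60)
print(round(sigma_s, 3), round(sigma_d, 3))

# Identity check on paired scores (Table XXXII data, used only as an example).
x1 = [5, 9, 8, 7, 6, 10, 12, 6, 5, 12]
x2 = [90, 60, 90, 85, 70, 100, 130, 80, 75, 120]
direct = pstdev([p + q for p, q in zip(x1, x2)])
via_formula = sigma_of_sum(pstdev(x1), pstdev(x2), pearson_r(x1, x2))
print(isclose(direct, via_formula))
```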



Formula  (68)  is  often  useful  when  a  test  has  been  repeated 
in  a  group  under  changed  conditions  and  the  variability  of  these 
changes, i.e., the σ of the differences between scores made on the
second  and  the  first  giving  of  the  test,  is  sought.  Except  that 
there  is  only  the  one  test  concerned,  the  method  is  identical 
with  that  of  the  problem  above.  The  chief  objection  to  the 
formula  is  that  the  r  between  the  scores  on  the  first  and  second 
giving  of  the  test  must  be  known.  For  this  reason,  unless  the 
r  is  wanted  for  other  purposes,  it  is  usually  easier  to  subtract 
the  corresponding  scores  and  derive  the  a  of  their  differences 
directly. 

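From the formula for the reliability of the average, $\sigma_{\text{av.}} = \frac{\sigma}{\sqrt{N}}$ (formula 13), we know that $\sigma_{(\text{dis.})} = \sqrt{N}\,\sigma_{\text{av.}}$. We may, therefore, write $\sqrt{N}\,\sigma_{\text{av.}\,x_1}$ instead of $\sigma_{x_1}$; $\sqrt{N}\,\sigma_{\text{av.}\,x_2}$ instead of $\sigma_{x_2}$; $\sqrt{N}\,\sigma_{\text{av.}\,s}$ instead of $\sigma_s$; and $\sqrt{N}\,\sigma_{\text{av.}\,d}$ instead of $\sigma_d$. Making these substitutions in formulas (67) and (68), we have (the N's cancel)

$$\sigma_{\text{av.}\,s} = \sqrt{\sigma_{\text{av.}\,x_1}^2 + \sigma_{\text{av.}\,x_2}^2 + 2r\,\sigma_{\text{av.}\,x_1}\sigma_{\text{av.}\,x_2}} \qquad (69a)$$

and

$$\sigma_{\text{av.}\,d} = \sqrt{\sigma_{\text{av.}\,x_1}^2 + \sigma_{\text{av.}\,x_2}^2 - 2r\,\sigma_{\text{av.}\,x_1}\sigma_{\text{av.}\,x_2}} \qquad (69b)$$

in which $\sigma_{\text{av.}\,s}$ is the σ of the average of the $(X_1 + X_2)$ series of scores, and $\sigma_{\text{av.}\,d}$ is the σ of the average of the $(X_1 - X_2)$ series of scores.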

Formulas (69a) and (69b) must always be used whenever there is any correlation between the $X_1$ and $X_2$ scores. If $X_1$ and $X_2$ are uncorrelated, that is, if r = .00, the third term under the radical disappears and (69a) and (69b) become

$$\sigma_{\text{av.}\,s} = \sqrt{\sigma_{\text{av.}\,x_1}^2 + \sigma_{\text{av.}\,x_2}^2} \qquad (70a)$$

and

$$\sigma_{\text{av.}\,d} = \sqrt{\sigma_{\text{av.}\,x_1}^2 + \sigma_{\text{av.}\,x_2}^2} \qquad (70b)$$

Now if we write $\sigma_{(\text{diff.})}$ instead of $\sigma_{\text{av.}\,d}$ in formula (70b), we at once recognize the familiar formula, $\sigma_{(\text{diff.})} = \sqrt{\sigma_{\text{av.}\,1}^2 + \sigma_{\text{av.}\,2}^2}$, which we have used heretofore for measuring the reliability of the difference between two averages, or, with appropriate changes, two σ's or two r's. It should always be remembered that $\sigma_{(\text{diff.})}$ is simply a special form of the more general formula (69b) and that it always assumes a zero correlation between $X_1$ and $X_2$.

The PE may be written for σ in any of the formulas given in this Section by making the substitution PE = .6745 × σ.
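A sketch of formulas (69b) and (70b) side by side in Python. The σ's come from the Verb-Object and Opposites problem above, but N = 100 is a hypothetical group size chosen only for illustration; the last line converts each result to a PE with the .6745 factor. With r = .60, ignoring the correlation overstates the σ of the difference between the averages by more than half.

```python
from math import sqrt

PE_FACTOR = 0.6745  # PE = .6745 * sigma


def sigma_av_diff(sigma_av_1: float, sigma_av_2: float, r: float = 0.0) -> float:
    """Formula (69b); with r = 0 it reduces to formula (70b), the usual sigma(diff.)."""
    return sqrt(sigma_av_1**2 + sigma_av_2**2 - 2 * r * sigma_av_1 * sigma_av_2)


N = 100                       # hypothetical number of cases
sigma_av_1 = 11.18 / sqrt(N)  # formula 13: sigma_av = sigma / sqrt(N)
sigma_av_2 = 9.00 / sqrt(N)

correlated = sigma_av_diff(sigma_av_1, sigma_av_2, r=0.60)  # formula (69b)
uncorrelated = sigma_av_diff(sigma_av_1, sigma_av_2)        # formula (70b), r taken as 0

print(round(correlated, 4), round(uncorrelated, 4))
print("PE:", round(PE_FACTOR * correlated, 4), round(PE_FACTOR * uncorrelated, 4))
```

Multiplying the first result by $\sqrt{N}$ recovers the $\sigma_d$ of 9.23 found above, as the cancellation of the N's requires.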

V. How to Interpret the Coefficient of Correlation
Between Two Tests

When  can  a  coefficient  of  correlation  be  considered  "high"? 
Is  an  r  of  .40  between  two  tests  evidence  of  "low"  or  "marked" 
relationship?  Questions  like  these,  and  many  others  which 
relate to the interpretation of a coefficient of correlation,
frequently arise in test work and must be answered if we would
understand  the  significance  of  an  obtained  r. 

The  effectiveness  of  an  r  as  a  measure  of  relation  may  be 
evaluated in several ways: (1) in terms of the standard error
of estimate; (2) in terms of the standard error of measurement;
and  (3)  in  terms  of  the  percentage  of  factors  common  to  the  two 
capacities  correlated.  Let  us  consider  these  three  approaches 
to  an  interpretation  of  r  before  attempting  to  lay  down  any 
general  rule  for  classifying  r's  as  "high,"  "medium,"  or  "low." 

1. The Interpretation of a Coefficient of Correlation in Terms of $\sigma_{(\text{est.})}$

The standard error of estimate, $\sigma_{(\text{est.})}$, is probably the most practicable way of evaluating the effectiveness of a coefficient of correlation. This follows from the fact that $\sigma_{(\text{est.}\,x_1)}$, which enables us to tell how accurately we can estimate an individual's score on test $X_1$ knowing his score on test $X_2$, depends on the r between the two tests. When r = 1.00, $\sigma_{(\text{est.}\,x_1)} = .00$, which means that we can predict a score in $X_1$ from a knowledge of $X_2$ with perfect accuracy, that is, with no error. To take the opposite extreme, when r = .00, $\sigma_{(\text{est.}\,x_1)} = \sigma_1$ directly, which means that we can only be certain that the predicted score lies somewhere within the limits of the $X_1$ distribution, i.e., within the limits Obtained Score $\pm 3\sigma$. In other words, the estimate from the distribution of $X_1$ alone is as good as the estimate made with the addition of $X_2$. As r decreases from 1.00 to 0, the standard error of estimate rapidly increases, so that predictions from the regression equation range all of the way from certainty to practically guesswork. The closeness of the correspondence denoted by an r, therefore, may be gauged by the size of $\sigma_{(\text{est.})}$.

We may illustrate with the following problem. Suppose that the correlation between two tests $X_1$ and $X_2$ is .60, and that $\sigma_{x_1} = 5.00$. Then $\sigma_{(\text{est.}\,x_1)}$ is $5 \times \sqrt{1 - .60^2}$, or 4.00, which is only 20% less than 5.00, the $\sigma_{(\text{est.}\,x_1)}$ for r = .00, i.e., for a minimum predictive value. The fraction of $\sigma_{x_1}$ that remains in $\sigma_{(\text{est.}\,x_1)}$ as r varies from .00 to 1.00 is given by the expression $\sqrt{1 - r^2}$, and hence it is possible to estimate the "predictive" value of an r from $\sqrt{1 - r^2}$ alone. This radical, $\sqrt{1 - r^2}$, has been designated by Kelley¹ the "coefficient of alienation," and is usually denoted by the letter k. k may be thought of as measuring the absence of relation between two variables $X_1$ and $X_2$, in the same way that r measures the presence of relation. Thus when k = 1.00, r = .00, and when k = .00, r = 1.00: the larger the coefficient of alienation, the greater the lack of relation and the less the value of the prediction. In order to show how the estimate improves as r increases, the k's for values of r from .00 to 1.00 are given in Table XXXIII.

It will be noted that r must be .866 before k is half way between perfect correlation and a guess, that is, before the standard error of estimate is reduced one-half. For r's of .30 and less, the coefficients of alienation are so large that the predictions based on them are but little better than a guess. Even with an r of .99, it will be noticed that the standard error of estimate is still about 1/7 as large as when k = 1.00. It is obvious, then, that in order to estimate individual scores with accuracy, the correlation should be at least .90.

¹ Kelley, T. L., Principles Underlying the Classification of Men, Journal of Applied Psychology, 1919, Vol. III, 1, p. 50.


TABLE XXXIII

Giving Coefficients of Alienation, k, for Values of r from .00 to 1.00

    r       k = √(1 - r²)        r       k = √(1 - r²)
   .00        1.0000            .80        .6000
   .10         .9950           .8660       .5000
   .20         .9798            .90        .4359
   .30         .9539            .95        .3122
   .40         .9165            .98        .1990
   .50         .8660            .99        .1411
   .60         .8000           1.00        .0000
   .70         .7141
 (.7071)       .7071
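A short Python sketch that recomputes Table XXXIII and the worked problem with σ = 5.00 and r = .60. It confirms that k for r = .90 is .4359 and that r must reach $\sqrt{.75} = .866$ before k falls to one-half. The function names are ours.

```python
from math import sqrt


def alienation(r: float) -> float:
    """Kelley's coefficient of alienation, k = sqrt(1 - r**2)."""
    return sqrt(1 - r**2)


def sigma_est(sigma: float, r: float) -> float:
    """Standard error of estimate: sigma * sqrt(1 - r**2)."""
    return sigma * alienation(r)


# Worked problem: sigma = 5.00 and r = .60 give 4.00, a 20% reduction.
print(round(sigma_est(5.00, 0.60), 2))

# The rows of Table XXXIII.
for r in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99, 1.0):
    print(f"{r:.2f}  {alienation(r):.4f}")

# The r at which k is exactly one-half: solve sqrt(1 - r**2) = .5 for r.
r_half = sqrt(1 - 0.5**2)
print(round(r_half, 4))
```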

2. The Interpretation of a Coefficient of Correlation in Terms
of the Standard Error of Measurement, $\sigma_{(M)}$

We have found (page 183) that the standard error of measurement enables us to estimate the probable divergence of an obtained score on a test from its corresponding true score. Moreover, since $\sigma_{(M)} = \sigma_1\sqrt{1 - r_{12}}$, the amount of this probable divergence will depend to a large degree upon the size of the self-correlation, $r_{12}$; accordingly, the value of $r_{12}$ as a measure of relation may be determined from the size of $\sigma_{(M)}$. When r = 1.00, for example, $\sigma_{(M)} = .00$, and every obtained score equals its true score exactly. When r = .00, on the other hand, $\sigma_{(M)} = \sigma_1$ (the σ of the distribution), and we can only be sure that the true score (corresponding to a given obtained score) lies somewhere within the limits of the distribution, i.e., within the limits $\pm 3\sigma$. In other words, when r = .00, the probable divergence of an obtained score from its true score is as great as it would be had we simply guessed that the true score lay somewhere in the distribution.

To illustrate, suppose that the reliability coefficient of a given test, $r_{12}$, is .80, and that $\sigma_1 = 10.00$. Then $\sigma_{(M)} = 10\sqrt{1 - .80}$, or 4.472; and since $\sigma_{(M)}$ is 10.00 when r = .00, evidently a reliability coefficient of .80 serves to reduce $\sigma_{(M)}$ to about 45% of what it would be in the event of a guess. The fraction of $\sigma_1$ that remains in $\sigma_{(M)}$ as r varies from 0 to 1.00 is given by the expression $\sqrt{1 - r_{12}}$. Hence this factor may be used to test the effectiveness of an obtained reliability coefficient, just as k tests the value of the r between two tests. In Table XXXIV the values of $\sqrt{1 - r_{12}}$ have been calculated for r's from .00 to 1.00.


TABLE XXXIV

Giving Values of √(1 - r₁₂) for Values of r from .00 to 1.00

    r       √(1 - r₁₂)         r       √(1 - r₁₂)
   .00        1.0000           .80        .4472
   .10         .9487           .90        .3162
   .20         .8944           .95        .2236
   .30         .8367           .98        .1414
   .40         .7746           .99        .1000
   .50         .7071          1.00        .0000
   .60         .6325
   .70         .5477
   .75         .5000

From Table XXXIV it is evident that the self-correlation of a test must be at least .75 before $\sqrt{1 - r_{12}}$ is half way between complete reliability and a guess. Even for an $r_{12}$ of .98, the divergence of a given score from its true score will fall within $\pm .1414$ of the σ of the test only 68 times in 100; in the remaining 32 cases it will be larger. Since even high reliability coefficients (e.g., .90 or above) indicate relatively large departures from perfect reliability, it is clear that a self-correlation of, say, .30 or .40 is almost valueless.
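The same checks for the standard error of measurement, as a Python sketch. It reproduces $\sigma_{(M)} = 4.472$ for $r_{12} = .80$ and $\sigma_1 = 10.00$, the .75 break-even point of Table XXXIV, and, assuming normally distributed errors of measurement (as the 68-in-100 figure does), the share of obtained scores expected within one $\sigma_{(M)}$ of their true scores. The function name is ours.

```python
from math import sqrt
from statistics import NormalDist


def sigma_measurement(sigma: float, r12: float) -> float:
    """Standard error of measurement: sigma * sqrt(1 - r12)."""
    return sigma * sqrt(1 - r12)


print(round(sigma_measurement(10.00, 0.80), 3))  # the worked example

# Reliability at which sqrt(1 - r12) falls to one-half: 1 - r12 = .25.
r12_half = 1 - 0.5**2
print(r12_half)

# Share of obtained scores within +/- one sigma(M) of the true score,
# assuming the errors of measurement are normally distributed.
z = NormalDist()
within_one = z.cdf(1) - z.cdf(-1)
print(round(within_one, 4))

# For r12 = .98, one sigma(M) expressed as a fraction of the test's sigma.
print(round(sigma_measurement(1.0, 0.98), 4))
```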

3.  Interpretation  of  a  Coefficient  of  Correlation  in  Terms  of  the 
Percentage  of  Common  (Overlapping)  Elements  or 
Factors 

It is sometimes helpful to regard a coefficient of correlation as a ratio which expresses, directly or indirectly, the percentage of elements or factors common to the tests which are correlated. Or again, r may be thought of as a device for indicating the extent to which the factors which determine capacity in the one test "overlap" those of another test.¹ Let us suppose that capacity in test X depends upon the presence or absence of a + c independent, elemental factors, and that capacity in test Y depends upon the presence or absence of b + c independent, elemental factors. The a factors determine X scores alone, the b factors Y scores alone, and the c factors are common to both X and Y. Moreover, let us suppose further that all factors, a, b, and c, are governed solely by the laws of chance, so that each factor is as likely to be present as absent, in the same way that a coin when tossed is as likely to fall heads as tails.

Now if we let $n_a$ = the total number of a factors, $n_b$ = the total number of b factors, and $n_c$ = the total number of c factors, it can be shown² that the correlation between X and Y is given by the formula:

$$r = \frac{n_c}{\sqrt{(n_a + n_c)(n_b + n_c)}} \qquad (71)$$

That is, the coefficient of correlation equals the number of common factors in X and Y divided by the geometrical mean of the total number of factors in X and Y. This situation is shown graphically in Diagram XXVII, in which X is determined by 8 factors, 4 a's and 4 c's, and Y by 11 factors, 7 b's and 4 c's. The correlation by formula (71) is

$$r = \frac{4}{\sqrt{(4 + 4)(7 + 4)}} = \frac{4}{\sqrt{8 \times 11}} = .426$$

DIAGRAM XXVII. X: a a a a c c c c; Y: c c c c b b b b b b b (the four c's are common to both); r = 4/√(8 × 11) = .426.

¹ The following is adapted from the discussion by Kelley, Statistical Method, pp. 189-190.

² See Kelley, Statistical Method, 1923, p. 190; or Brown, Wm., Essentials of Mental Measurement, 1911, pp. 79-80.




If the number of elementary factors determining the score in X equals exactly the number determining the score in Y, so that $n_a = n_b$, formula (71) becomes

$$r = \frac{n_c}{n_a + n_c} \qquad (72)$$

and the coefficient of correlation is now simply the decimal fraction which indicates what proportion of the causes influencing performance in X and Y are common to both. If t = the number of common factors ($n_c$) and s = the total number of factors present in X (or, equally, in Y), $n_a + n_c$, then r is simply t/s. (Remember that the factors in X and Y are assumed to be equal in number and influence.) This condition is illustrated in Diagram XXVIII. Since X is determined by 8 factors, 4 a's and 4 c's, and Y by 8 factors, 4 b's and 4 c's, the correlation by formula (72) is 4/8 or .50.

DIAGRAM XXVIII. X: a a a a c c c c; Y: c c c c b b b b; r = 4/8 = .50.

Now let us assume, lastly, that Y is completely determined by $n_c$ elements, and that X is determined by these same elements plus $n_a$ elements in addition ($n_b = 0$). Formula (71) then becomes

$$r = \frac{n_c}{\sqrt{n_c(n_a + n_c)}} \qquad (73)$$

and the coefficient of correlation equals the number of common elements in X and Y divided by the geometrical mean of the total number of factors in X and in Y. Diagram XXIX shows this graphically. Y is determined by 4 c's and X by these factors plus 4 a's in addition: the correlation, therefore, is $4/\sqrt{4 \times 8}$, or .707.

DIAGRAM XXIX. X: a a a a c c c c; Y: c c c c; r = 4/√(4 × 8) = .707.

If we square the r obtained from formula (73), we have

$$r^2 = \frac{n_c}{n_a + n_c} \qquad (74)$$

that is, the square of the coefficient gives the extent to which the elements in Y overlap those of X, or the proportion of elements in X which are also involved in Y. In Diagram XXIX note that Y overlaps X 50% and that $r^2$, i.e., $(.707)^2$, is .50, as it should be.¹ Moreover, since the coefficient of alienation will equal .707 when r = .707 (see Table XXXIII), it follows that an r of .707 (and not .50) should be taken as half of a perfect correlation.² On the same assumptions, an overlapping of 33⅓% common elements, i.e., $r^2 = .3333$, will give a correlation of .577, which is 1/3 of a perfect correlation; and an overlapping of 25% common elements, $r^2 = .25$, gives an r of .50, which is 1/4 of a perfect correlation. By analogy, an r of .30 or less implies so slight a degree of overlapping that there can be only a very small percentage of common elements ($r^2 = .09$, or 9%, at most).
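Formulas (71) to (74) all follow from one expression, which the Python sketch below evaluates for Diagrams XXVII to XXIX and for the 1/3 and 1/4 overlaps just mentioned. Like the text, it assumes independent elements of equal weight whose presence or absence is governed by chance; the function name is ours.

```python
from math import sqrt


def overlap_r(n_a: int, n_b: int, n_c: int) -> float:
    """Formula (71): r from the counts of specific (a, b) and common (c) factors."""
    return n_c / sqrt((n_a + n_c) * (n_b + n_c))


print(round(overlap_r(4, 7, 4), 3))  # Diagram XXVII, formula (71)
print(round(overlap_r(4, 4, 4), 3))  # Diagram XXVIII, n_a = n_b, formula (72)
print(round(overlap_r(4, 0, 4), 3))  # Diagram XXIX, n_b = 0, formula (73)

# Formula (74): with n_b = 0, r**2 equals the share of X's elements also in Y.
for n_a, n_c in ((4, 4), (2, 1), (3, 1)):
    r = overlap_r(n_a, 0, n_c)
    print(f"overlap {n_c / (n_a + n_c):.3f}  r {r:.3f}  r^2 {r * r:.3f}")
```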

The coefficient of correlation as a measure of the percentage of common factors may be seen to best advantage in series formed by tossing coins or throwing dice, in which the "overlapping" is arbitrarily determined and controlled at will. As an illustration, consider the correlation table in Diagram XXX, in which is shown the relation between two series of 500 successive throws of 12 pennies made in the following way: first, all 12 pennies were tossed, and the number of heads recorded and noted in the X column; then 5 coins were left lying and the remaining 7 were tossed again, and the number of heads in all 12 recorded and noted in column Y, opposite the X entry. By this scheme 5 coins (factors) contribute to each pair of tosses; and hence, according to formula (72), the correlation should be 5/12 or .417. By the product-moment formula the actual correlation between the two series is .424, which indicates a very close correspondence between actual and theoretical results. The situation existing in each pair of X and Y tosses is shown in the figure in Diagram XXX. If 4 coins had been left lying, the r would have been 4/12 or .333; if 6 had been left lying, r would have been 6/12 or .50, etc. A number of diagrams of the sort shown, in which the number of common factors (i.e., coins left lying) varies from 0 to 12, and r from 0 to 1.00, may be found in Pearl's Medical Biometry and Statistics, pages 294-300.

¹ This result has interesting implications. Thus if all of the elements in test $X_2$ are common to $X_1$ (e.g., a criterion), the extent to which $X_2$ overlaps $X_1$ is given by simply squaring the coefficient, $r_{x_1x_2}$. The assumption must be made, of course, that the scores in both tests are summations of independent and similar elements whose presence or absence is governed by chance alone.

² Woodworth, R. S., Combining the Results of Several Tests: A Study in Statistical Method, Psychological Review, 1912, XIX, p. 113. Hull, Clark, The Joint Yield from Teams of Tests, Journal of Educational Psychology, 1923, 14, pp. 396-406.

DIAGRAM XXX. Showing the number of heads in 500 successive throws of 12 pennies, in which 7 pennies were tossed in the second throw and 5 remained as they fell in the first throw of all 12 together (correlation table of heads in the first toss, X, against heads in the second toss, Y, 0 to 12 heads each; N = 500). From Pearl, R., Medical Biometry and Statistics, p. 297 (after Darbishire). X: a a a a a a a c c c c c; Y: c c c c c b b b b b b b; $n_a = n_b = 7$, $n_c = 5$; by formula (72), $r = n_c/(n_a + n_c) = 5/12 = .417$; by calculation (product-moment), r = .424.

DIAGRAM XXXI. Showing the results of 100 successive throws of dice, in the first throw of which (X) 5 dice were thrown, counted, and left down, and in each second throw of which (Y) 5 additional dice were thrown and counted together with the 5 left down (10 in all) (correlation table of first-throw totals, X, against second-throw totals, Y; N = 100). X: c c c c c; Y: c c c c c plus a a a a a; $n_c = 5$, $n_a = 5$; by formula (73), $r = n_c/\sqrt{n_c(n_a + n_c)} = 5/\sqrt{5 \times 10} = .707$; by calculation (product-moment), r = .694.
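The coin experiment is easy to repeat by simulation. The Python sketch below illustrates the scheme under stated assumptions; it does not reconstruct Darbishire's data. It assumes fair coins and a fixed random seed, and `coin_trials` is a hypothetical helper. Each pair keeps `kept` coins from the first toss and re-tosses the rest, so formula (72) predicts r = kept/12; with only 500 pairs the simulated r will wander a few hundredths from 5/12, much as the .424 does.

```python
import random
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def coin_trials(n_pairs: int, coins: int = 12, kept: int = 5, seed: int = 1):
    """Return (X, Y) head counts; Y keeps `kept` coins from X and re-tosses the rest."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_pairs):
        first = [rng.random() < 0.5 for _ in range(coins)]  # True means heads
        second = first[:kept] + [rng.random() < 0.5 for _ in range(coins - kept)]
        xs.append(sum(first))
        ys.append(sum(second))
    return xs, ys


xs, ys = coin_trials(500)
print(round(pearson_r(xs, ys), 3), round(5 / 12, 3))
```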

Now suppose that we calculate the correlation between two series of dice throws made according to the following scheme:¹ 5 dice are thrown, and the total is read and recorded in the X column; then 5 additional dice are thrown, and the total of all 10 (the 5 left and the 5 just thrown) is read and recorded in the Y column. If this is continued until 100 throws have been made, we shall have 100 X and 100 Y entries, each Y throw (of 10 dice) "overlapped" to the extent of 50% by its corresponding X throw (of 5 dice). And since all of the elements in X are completely contained in Y, the correlation between X and Y should, by formula (73), be $5/\sqrt{5 \times 10}$, or .707. (See Diagram XXXI and accompanying figure.) Actually, the correlation by the product-moment formula is .694, which indicates, again, a very close correspondence between actual and theoretical results. The square of this r gives us approximately .50 as the proportion of common elements in X and Y: that is, we have one half of a perfect correlation. (See page 294.)

¹ These throws were made by the writer.
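The dice scheme can be simulated the same way. In this Python sketch, fair six-sided dice and a fixed seed are assumptions, and `dice_trials` is a hypothetical helper. Y reuses X's 5 dice and adds 5 fresh ones, so formula (73) predicts $r = 5/\sqrt{5 \times 10} \approx .707$ and $r^2 \approx .50$.

```python
import random
from math import sqrt


def pearson_r(xs: list[int], ys: list[int]) -> float:
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def dice_trials(n_pairs: int, common: int = 5, added: int = 5, seed: int = 1):
    """X = total of `common` dice; Y = that same total plus `added` fresh dice."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_pairs):
        x = sum(rng.randint(1, 6) for _ in range(common))
        y = x + sum(rng.randint(1, 6) for _ in range(added))
        xs.append(x)
        ys.append(y)
    return xs, ys


xs, ys = dice_trials(100)
r = pearson_r(xs, ys)
print(round(r, 3), round(r * r, 3), round(5 / sqrt(5 * 10), 3))
```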

While formulas (71) to (74) are interesting and suggestive as giving us the means of interpreting a coefficient of correlation under certain special or restricted conditions, it would be a mistake to apply them generally, that is, to assume that by simply squaring the coefficient of correlation we can always determine the percentage of common factors or the amount of overlapping. It seems likely that the scores on most psychological tests, as well as many social and educational measurements, are the result of the combined action of many factors which are often dependent on each other, and probably interwoven in a relatively complex manner. At any rate, we do not know that a test score is simply the sum of a certain number of similar and independent elements.

Summary 

From  the  discussion  in  the  preceding  paragraphs,  it  is 
evident  that  even  with  correlation  coefficients  which  we  have 
been  accustomed  to  think  of  as  high,  the  departure  from  perfect 
correlation  is  considerable.  Strictly  speaking,  the  term  "high 
correlation "  should  be  applied  only  to  coefficients  which  are 
.95  or  above.  However,  in  mental,  social,  and  educational 
measurements  there  are  so  many  actual  and  potential  sources 
of  error  due  to  the  variability  of  the  material  dealt  with,  and 
the  relative  crudity  of  the  measurements  made,  that  very  few 
tests  indeed  could  meet  this  requirement.  Very  seldom  do 
correlations  between  tests  run  above  .70  or  .75;  and  hence  it 
is  probably  justifiable,  in  view  of  the  limitations  mentioned,  to 
regard  such  coefficients  as  high.  There  seems  to  be  fairly 
general  agreement  among  workers  with  tests  that  an 

r from .00 to ±.20 denotes indifferent or negligible relation.
r from ±.20 to ±.40 denotes low correlation: present but slight.
r from ±.40 to ±.70 denotes substantial or marked relationship.
r from ±.70 to ±1.00 denotes high relation.

This is a tentative classification which is to be taken as only generally true. The size of a correlation coefficient should always be evaluated with due regard for the material dealt with, the size of the sample, and its $PE_r$, no matter what its absolute value.
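The tentative scale above can be written as a small lookup, sketched here in Python. The text does not say which class a boundary value such as .20 or .70 belongs to; assigning it to the higher class is our assumption, and the function name is hypothetical. Such a label is no substitute for weighing the material, the size of the sample, and $PE_r$.

```python
def describe_r(r: float) -> str:
    """Label an r by the tentative scale (boundary values go to the higher class)."""
    size = abs(r)  # the scale applies to plus and minus values alike
    if not 0.0 <= size <= 1.0:
        raise ValueError("a correlation coefficient lies between -1.00 and +1.00")
    if size < 0.20:
        return "indifferent or negligible relation"
    if size < 0.40:
        return "low correlation: present but slight"
    if size < 0.70:
        return "substantial or marked relationship"
    return "high relation"


for r in (0.12, -0.35, 0.60, 0.731):
    print(f"{r:+.3f}  {describe_r(r)}")
```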

PROBLEMS 

1. The self-correlation of a certain test is .60.

(a) How much must the test be lengthened to raise the self-correlation to .90?
(b) What effect will doubling the test have on its reliability?

2.  Two  equivalent  half-scales  are  made  up  from  the  Downey  Will- 

Temperament  *  Test  in  the  following  way:  (1)  by  grouping  all 
odd-numbered  tests  in  one  half-scale,  and  all  even-numbered 
tests  in  the  other;  (2)  by  grouping  the  first  two  tests  of  every 
pattern  into  one  half-scale,  and  the  last  two  tests  into  another 
half-scale ;  (3)  by  grouping  the  first  and  last  tests  of  each  pattern 
into  one  half-scale,  and  the  second  and  third  tests  of  each  pattern 
into  a  second  half-scale. 
Reliability  coefficients  for  the  half -scale  were  found  as  follows  by 
the  three  methods : 

iV=146 


Method 

Reliability  Coefficient 

1 

.17 

2 

.31 

3 

.24 

Average 

.24 

What  is  the  reliability  of  the  whole  Downey  test? 
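Because each coefficient in Problem 2 belongs to a half-scale, the whole-test reliability comes from stepping each one up with the Spearman-Brown formula for a test of double length, r = 2r_h/(1 + r_h). A short sketch; the dictionary layout and function name are illustrative.

```python
half_scale_r = {1: 0.17, 2: 0.31, 3: 0.24}


def whole_test_r(r_half: float) -> float:
    """Spearman-Brown step-up from a half-test to the full test."""
    return 2 * r_half / (1 + r_half)


whole = {m: whole_test_r(r) for m, r in half_scale_r.items()}
for m, r in whole.items():
    print(f"Method {m}: r = {r:.2f}")
print(f"Mean of the stepped-up values: {sum(whole.values()) / len(whole):.2f}")
```

Averaging the three stepped-up values gives about .38; stepping up the average half-scale coefficient (.24) instead gives about .39.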

3. In a small group the reliability coefficient of a test is .55 and the σ of the test scores is 3.00. What must the self-correlation of this test be in a larger group whose σ is 5.00, in order to have the same degree of reliability?
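One way to work Problem 3, assuming the relation commonly attributed to Kelley, which holds the error of measurement σ√(1 - r) constant from group to group: σ₁√(1 - r₁) = σ₂√(1 - r₂). Solving for r₂ in a sketch (the function name is hypothetical):

```python
def reliability_in_new_group(r1: float, sd1: float, sd2: float) -> float:
    """Keep sd * sqrt(1 - r) fixed and solve for r in the second group."""
    return 1 - (sd1 / sd2) ** 2 * (1 - r1)


print(round(reliability_in_new_group(0.55, 3.00, 5.00), 2))  # 0.84
```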

4. The reliability coefficient of a test, as found in a large unselected group, is .92; the Average is 142 and σ is 16.00. If an individual makes 150 on the test,

(a) What is the PE of this score, i.e., the PE(M)?

(b) Within what range does the true score lie?

¹ Ruch, G. M., and Del Manzo, M. C., The Downey Will-Temperament Group Test: A Further Analysis of Its Reliability and Validity. Journal of Applied Psychology, Vol. VII, 1923, p. 65.



(c) In a second test of a different function, the reliability coefficient is .86; the average is 54 and σ is 10.00. In which test are the obtained scores the more reliable, i.e., closer to the true scores?
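A sketch for Problem 4, assuming the probable error of an obtained score is PE(M) = .6745·σ·√(1 - r) and that the true score is taken to lie within ±4 PE of the obtained score (the band that reproduces the answer given at the end of the chapter). Part (c) compares the two tests by the ratio PE/Average.

```python
import math


def pe_score(sd: float, r: float) -> float:
    """Probable error of an obtained score: .6745 * sd * sqrt(1 - r)."""
    return 0.6745 * sd * math.sqrt(1 - r)


pe1 = pe_score(16.00, 0.92)                 # first test
low, high = 150 - 4 * pe1, 150 + 4 * pe1    # assumed +/- 4 PE band
pe2 = pe_score(10.00, 0.86)                 # second test

print(f"(a) PE(M) = {pe1:.2f}")
print(f"(b) true score between {low:.1f} and {high:.1f}")
print(f"(c) PE/Av. = {pe1 / 142:.3f} (first) vs {pe2 / 54:.3f} (second)")
```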

5. The reliability coefficient of a test is .80. What is the maximum self-correlation obtainable with this test as it stands?
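The answer given for Problem 5 (.89) is √.80, the index of reliability: the correlation of the obtained scores with the true scores they estimate, which on the usual theory is the upper limit for the test as it stands. A one-line check (the function name is illustrative):

```python
import math


def index_of_reliability(r: float) -> float:
    """Square root of the reliability coefficient."""
    return math.sqrt(r)


print(round(index_of_reliability(0.80), 2))  # 0.89
```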

6. Given the following records (all in seconds) for 100 Barnard Freshmen,¹ and the scores made by individual A:

Tests         Coordinate   Tapping   Color Naming   Opposites
Average           82.7      376.3        57.0          51.1
SD                10.8       51.7         8.8          10.3
A's scores        85        350          62            40

(a) Combine A's scores by the method of variability weighting, all tests weighted 1.

(b) Combine A's scores weighting Coord. and Tapping 1 each, Color Naming 3, and Opposites 4.
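A sketch of Problem 6. The variability multipliers (1, 1/5, 1, 1) are the ones used in the answer at the end of the chapter; deriving them by expressing each SD in tens and rounding is only one convenient rule that happens to reproduce them, not necessarily the author's. Part (b) then multiplies the equalized scores by the stated weights.

```python
sds = {"Coordinate": 10.8, "Tapping": 51.7, "Color Naming": 8.8, "Opposites": 10.3}
a_scores = {"Coordinate": 85, "Tapping": 350, "Color Naming": 62, "Opposites": 40}

# Assumed rule: divide each score by its SD expressed in tens and rounded,
# which brings every SD to roughly 10 (here only Tapping is divided by 5).
multiplier = {t: 1 / round(sd / 10) for t, sd in sds.items()}


def composite(weights: dict) -> float:
    """Sum of A's scores after variability multipliers and extra weights."""
    return sum(a_scores[t] * multiplier[t] * weights[t] for t in a_scores)


equal = {t: 1 for t in a_scores}
unequal = {"Coordinate": 1, "Tapping": 1, "Color Naming": 3, "Opposites": 4}
print(composite(equal))    # 257.0
print(composite(unequal))  # 501.0
```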

7. Using the data in Example 6 above, combine A's scores by the two methods given on pages 282 and 283. Since all scores are in seconds, the higher the score numerically, the lower it actually is.
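Judging from the answers, the two methods in Problem 7 are sigma (standard) scores and Hull's scale. The sketch assumes Hull's scale puts the mean at 50 and one SD at 14 points, which reproduces the answers given; because a lower time is a better performance, each deviation is taken as Average minus score.

```python
tests = {  # name: (average, SD, A's score), all in seconds
    "Coordinate": (82.7, 10.8, 85),
    "Tapping": (376.3, 51.7, 350),
    "Color Naming": (57.0, 8.8, 62),
    "Opposites": (51.1, 10.3, 40),
}

# Sign reversed: a time below the average is a better-than-average performance.
z = {t: (avg - x) / sd for t, (avg, sd, x) in tests.items()}
hull = {t: round(50 + 14 * zt) for t, zt in z.items()}  # assumed M = 50, SD = 14

print({t: round(v, 3) for t, v in z.items()})
print("mean sigma score:", round(sum(z.values()) / len(z), 3))
print(hull, "mean Hull score:", sum(hull.values()) / len(hull))
```

Carried at full precision the mean sigma score comes to about .201; averaging the three-decimal values, as the printed answer does, gives .202.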

8. One hundred and fifty high school seniors make an average score of 120 on Army Alpha with a σ of 21.6. Two weeks later the group is praised for its performance (without, however, being told what the scores were) and given a second form of Alpha, on which the average score is 126 and the σ is 24.2. The r between the tests is .86.

(a) Is the effect of the incentive (praise) plus the practice effect sufficient to bring about a real increase in average score? How would you rule out the practice effect?

(b) Why is it necessary to have the correlation between the tests?
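Problem 8 turns on the standard error of a difference between two means obtained from the same group, which must allow for the correlation between the two testings: σ(diff) = √(σM1² + σM2² - 2·r·σM1·σM2), with σM = σ/√N. A sketch (function names are illustrative):

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean."""
    return sd / math.sqrt(n)


def se_diff_correlated(sd1: float, sd2: float, r: float, n: int) -> float:
    """SE of the difference between two means from the same N cases."""
    s1, s2 = se_mean(sd1, n), se_mean(sd2, n)
    return math.sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)


n = 150
diff = 126 - 120
se = se_diff_correlated(21.6, 24.2, 0.86, n)
print(f"sigma(diff) = {se:.2f}, D/sigma(diff) = {diff / se:.1f}")
```

Treating the two sets of scores as independent (r = 0) would give a σ(diff) of about 2.65 and a ratio of only about 2.3, which is why part (b) asks for the correlation.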

9. A battery of tests correlates .85 with a criterion. Assuming that performance on the battery is completely determined by X elements, and performance on the criterion by X + Y elements, to what extent may we say that the battery probably "overlaps" the criterion?

10. Interpret a coefficient of correlation r = .50 in three ways; do the same for r = .65.
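For Problem 9, if the battery depends on X elements and the criterion on X + Y elements, the correlation works out to √(X/(X + Y)), so the proportion of overlap is r²: .85² is about .72. Problem 10's "three ways" are not spelled out here; the sketch assumes three readings common in texts of the period: r² as the proportion of common elements, the coefficient of alienation k = √(1 - r²), and the index of forecasting efficiency E = 1 - k.

```python
import math


def interpretations(r: float) -> dict:
    """Three common readings of a correlation coefficient (assumed set)."""
    k = math.sqrt(1 - r * r)  # coefficient of alienation
    return {
        "common elements (r^2)": r * r,
        "alienation k": k,
        "forecasting efficiency E": 1 - k,
    }


print(f"Problem 9 overlap: {0.85 ** 2:.0%}")
for r in (0.50, 0.65):
    print(r, {name: round(v, 3) for name, v in interpretations(r).items()})
```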

¹ Carothers, F. E., The Psychological Examination of College Students. Archives of Psychology, 1921, No. 46, pp. 21ff.



Answers 

1. (a) 6 times.
(b) r = .75

2. Method 1: r = .29. Method 2: r = .47. Method 3: r = .39.
Average of all three methods: r = .38.

3.  r=.84. 

4. (a) PE(M) = 3.05.

(b) Between 162.2 and 137.8.

(c) In the first test. PE(M)/Av. = .021 (first test); PE(M)/Av. = .047 (second test).

5.  r=.89. 

6. (a) Taking as multipliers for the four tests 1, 1/5, 1, and 1, respectively, we have 257 as A's composite score.
(b) A's score is 501. (Since the measures of performance are in time units, the higher the numerical score, the lower the actual performance.)

7. A's scores are 47, 57, 42, and 65. Her average is 52.75. (Hull's method.)
A's scores are -.213, +.509, -.568, +1.078; her average is +.202. (This means that A stands .202σ above the average of the group on the four tests.)

8. (a) Yes. D/σ(diff) is 5+.

9.  About  72%  common  elements. 

REFERENCES 

The following books will be found helpful as general references:

1. Primer of Statistics, by W. P. and E. M. Elderton. A. & C. Black, Ltd., London. 1910.

2. Mental and Social Measurements, by Edward L. Thorndike. Published by Teachers College, Columbia University. 1912 (revised edition).

3. Statistical Methods Applied to Education, by Harold O. Rugg. Houghton Mifflin Company. 1917.


4. An Introduction to Statistical Methods, by Horace Secrist. Macmillan Company. 1917.

5. How to Measure in Education, by Wm. M. McCall. The Macmillan Company. 1922.

6. The Theory of Educational Measurements, by Walter Scott Monroe. Houghton Mifflin Company. 1923.

7. The Fundamentals of Statistics, by L. L. Thurstone. The Macmillan Company. 1925.

8. Statistical Method in Educational Measurement, by Arthur S. Otis. World Book Company. 1925.

More advanced books are:

1. Elements of Statistics, by A. L. Bowley. P. S. King and Son, London. 1920 (fourth edition).

2. An Introduction to the Theory of Statistics, by G. Udny Yule. Chas. Griffin and Company, London. 1919 (fifth edition).¹

3. Essentials of Mental Measurement, by W. M. Brown and G. H. Thomson. Cambridge University Press. 1920.

4. A First Course in Statistics, by D. Caradog Jones. G. Bell & Sons, London. 1921.

5. Statistical Method, by Truman L. Kelley. The Macmillan Company. 1923.

6. Handbook of Mathematical Statistics, by H. L. Rietz et al. Houghton Mifflin Company. 1924.

Aids to Computation:

1. Barlow's Tables of Squares, Cubes, Square Roots, Cube Roots, Reciprocals of numbers from 1 to 10,000. E. and F. N. Spon, Ltd., London. 1921.

2. Tables of √(1 - r²) and 1 - r² for use in Partial Correlation and Trigonometry, by John Rice Miner, Sc.D. Johns Hopkins Press. 1922.

¹ The book by Yule is a classic which should be known to every serious student of mental and social measurements.
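The entries in the table of squares and square roots that follows can be checked, or extended past 1000, with a few lines of Python. This sketch writes the square out in full and rounds the root to three decimals, as the table does; the helper name is illustrative.

```python
import math


def table_row(n: int) -> tuple:
    """(number, square, square root to three decimals) for one table row."""
    return n, n * n, f"{math.sqrt(n):.3f}"


for n in (2, 375, 964):
    print(*table_row(n))
```

Rounding a square root to three places never meets an exact tie except at perfect squares, where the root is a whole number, so the last printed digit is always determinate.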




Table  of  Squares  and  Square  Roots  of  the  Numbers  from  1  to  1000 


Number    Square  Square Root    Number    Square  Square Root
     1         1        1.000        51      2601        7.141
     2         4        1.414        52      2704        7.211
     3         9        1.732        53      2809        7.280
     4        16        2.000        54      2916        7.348
     5        25        2.236        55      3025        7.416
     6        36        2.449        56      3136        7.483
     7        49        2.646        57      3249        7.550
     8        64        2.828        58      3364        7.616
     9        81        3.000        59      3481        7.681
    10       100        3.162        60      3600        7.746
    11       121        3.317        61      3721        7.810
    12       144        3.464        62      3844        7.874
    13       169        3.606        63      3969        7.937
    14       196        3.742        64      4096        8.000
    15       225        3.873        65      4225        8.062
    16       256        4.000        66      4356        8.124
    17       289        4.123        67      4489        8.185
    18       324        4.243        68      4624        8.246
    19       361        4.359        69      4761        8.307
    20       400        4.472        70      4900        8.367
    21       441        4.583        71      5041        8.426
    22       484        4.690        72      5184        8.485
    23       529        4.796        73      5329        8.544
    24       576        4.899        74      5476        8.602
    25       625        5.000        75      5625        8.660
    26       676        5.099        76      5776        8.718
    27       729        5.196        77      5929        8.775
    28       784        5.292        78      6084        8.832
    29       841        5.385        79      6241        8.888
    30       900        5.477        80      6400        8.944
    31       961        5.568        81      6561        9.000
    32      1024        5.657        82      6724        9.055
    33      1089        5.745        83      6889        9.110
    34      1156        5.831        84      7056        9.165
    35      1225        5.916        85      7225        9.220
    36      1296        6.000        86      7396        9.274
    37      1369        6.083        87      7569        9.327
    38      1444        6.164        88      7744        9.381
    39      1521        6.245        89      7921        9.434
    40      1600        6.325        90      8100        9.487
    41      1681        6.403        91      8281        9.539
    42      1764        6.481        92      8464        9.592
    43      1849        6.557        93      8649        9.644
    44      1936        6.633        94      8836        9.695
    45      2025        6.708        95      9025        9.747
    46      2116        6.782        96      9216        9.798
    47      2209        6.856        97      9409        9.849
    48      2304        6.928        98      9604        9.899
    49      2401        7.000        99      9801        9.950
    50      2500        7.071       100     10000       10.000



Table  of  Squares  and  Square  Roots — Continued 


Number    Square  Square Root    Number    Square  Square Root
   101     10201       10.050       151     22801       12.288
   102     10404       10.100       152     23104       12.329
   103     10609       10.149       153     23409       12.369
   104     10816       10.198       154     23716       12.410
   105     11025       10.247       155     24025       12.450
   106     11236       10.296       156     24336       12.490
   107     11449       10.344       157     24649       12.530
   108     11664       10.392       158     24964       12.570
   109     11881       10.440       159     25281       12.610
   110     12100       10.488       160     25600       12.649
   111     12321       10.536       161     25921       12.689
   112     12544       10.583       162     26244       12.728
   113     12769       10.630       163     26569       12.767
   114     12996       10.677       164     26896       12.806
   115     13225       10.724       165     27225       12.845
   116     13456       10.770       166     27556       12.884
   117     13689       10.817       167     27889       12.923
   118     13924       10.863       168     28224       12.961
   119     14161       10.909       169     28561       13.000
   120     14400       10.954       170     28900       13.038
   121     14641       11.000       171     29241       13.077
   122     14884       11.045       172     29584       13.115
   123     15129       11.091       173     29929       13.153
   124     15376       11.136       174     30276       13.191
   125     15625       11.180       175     30625       13.229
   126     15876       11.225       176     30976       13.266
   127     16129       11.269       177     31329       13.304
   128     16384       11.314       178     31684       13.342
   129     16641       11.358       179     32041       13.379
   130     16900       11.402       180     32400       13.416
   131     17161       11.446       181     32761       13.454
   132     17424       11.489       182     33124       13.491
   133     17689       11.533       183     33489       13.528
   134     17956       11.576       184     33856       13.565
   135     18225       11.619       185     34225       13.601
   136     18496       11.662       186     34596       13.638
   137     18769       11.705       187     34969       13.675
   138     19044       11.747       188     35344       13.711
   139     19321       11.790       189     35721       13.748
   140     19600       11.832       190     36100       13.784
   141     19881       11.874       191     36481       13.820
   142     20164       11.916       192     36864       13.856
   143     20449       11.958       193     37249       13.892
   144     20736       12.000       194     37636       13.928
   145     21025       12.042       195     38025       13.964
   146     21316       12.083       196     38416       14.000
   147     21609       12.124       197     38809       14.036
   148     21904       12.166       198     39204       14.071
   149     22201       12.207       199     39601       14.107
   150     22500       12.247       200     40000       14.142



Table  of  Squares  and  Square  Roots — Continued 


Number    Square  Square Root    Number    Square  Square Root
   201     40401       14.177       251     63001       15.843
   202     40804       14.213       252     63504       15.875
   203     41209       14.248       253     64009       15.906
   204     41616       14.283       254     64516       15.937
   205     42025       14.318       255     65025       15.969
   206     42436       14.353       256     65536       16.000
   207     42849       14.387       257     66049       16.031
   208     43264       14.422       258     66564       16.062
   209     43681       14.457       259     67081       16.093
   210     44100       14.491       260     67600       16.125
   211     44521       14.526       261     68121       16.155
   212     44944       14.560       262     68644       16.186
   213     45369       14.595       263     69169       16.217
   214     45796       14.629       264     69696       16.248
   215     46225       14.663       265     70225       16.279
   216     46656       14.697       266     70756       16.310
   217     47089       14.731       267     71289       16.340
   218     47524       14.765       268     71824       16.371
   219     47961       14.799       269     72361       16.401
   220     48400       14.832       270     72900       16.432
   221     48841       14.866       271     73441       16.462
   222     49284       14.900       272     73984       16.492
   223     49729       14.933       273     74529       16.523
   224     50176       14.967       274     75076       16.553
   225     50625       15.000       275     75625       16.583
   226     51076       15.033       276     76176       16.613
   227     51529       15.067       277     76729       16.643
   228     51984       15.100       278     77284       16.673
   229     52441       15.133       279     77841       16.703
   230     52900       15.166       280     78400       16.733
   231     53361       15.199       281     78961       16.763
   232     53824       15.232       282     79524       16.793
   233     54289       15.264       283     80089       16.823
   234     54756       15.297       284     80656       16.852
   235     55225       15.330       285     81225       16.882
   236     55696       15.362       286     81796       16.912
   237     56169       15.395       287     82369       16.941
   238     56644       15.427       288     82944       16.971
   239     57121       15.460       289     83521       17.000
   240     57600       15.492       290     84100       17.029
   241     58081       15.524       291     84681       17.059
   242     58564       15.556       292     85264       17.088
   243     59049       15.588       293     85849       17.117
   244     59536       15.620       294     86436       17.146
   245     60025       15.652       295     87025       17.176
   246     60516       15.684       296     87616       17.205
   247     61009       15.716       297     88209       17.234
   248     61504       15.748       298     88804       17.263
   249     62001       15.780       299     89401       17.292
   250     62500       15.811       300     90000       17.321



Table  of  Squares  and  Square  Roots 


Number    Square  Square Root    Number    Square  Square Root
   301     90601       17.349       351    123201       18.735
   302     91204       17.378       352    123904       18.762
   303     91809       17.407       353    124609       18.788
   304     92416       17.436       354    125316       18.815
   305     93025       17.464       355    126025       18.841
   306     93636       17.493       356    126736       18.868
   307     94249       17.521       357    127449       18.894
   308     94864       17.550       358    128164       18.921
   309     95481       17.578       359    128881       18.947
   310     96100       17.607       360    129600       18.974
   311     96721       17.635       361    130321       19.000
   312     97344       17.664       362    131044       19.026
   313     97969       17.692       363    131769       19.053
   314     98596       17.720       364    132496       19.079
   315     99225       17.748       365    133225       19.105
   316     99856       17.776       366    133956       19.131
   317    100489       17.804       367    134689       19.157
   318    101124       17.833       368    135424       19.183
   319    101761       17.861       369    136161       19.209
   320    102400       17.889       370    136900       19.235
   321    103041       17.916       371    137641       19.261
   322    103684       17.944       372    138384       19.287
   323    104329       17.972       373    139129       19.313
   324    104976       18.000       374    139876       19.339
   325    105625       18.028       375    140625       19.365
   326    106276       18.055       376    141376       19.391
   327    106929       18.083       377    142129       19.416
   328    107584       18.111       378    142884       19.442
   329    108241       18.138       379    143641       19.468
   330    108900       18.166       380    144400       19.494
   331    109561       18.193       381    145161       19.519
   332    110224       18.221       382    145924       19.545
   333    110889       18.248       383    146689       19.570
   334    111556       18.276       384    147456       19.596
   335    112225       18.303       385    148225       19.621
   336    112896       18.330       386    148996       19.647
   337    113569       18.358       387    149769       19.672
   338    114244       18.385       388    150544       19.698
   339    114921       18.412       389    151321       19.723
   340    115600       18.439       390    152100       19.748
   341    116281       18.466       391    152881       19.774
   342    116964       18.493       392    153664       19.799
   343    117649       18.520       393    154449       19.824
   344    118336       18.547       394    155236       19.849
   345    119025       18.574       395    156025       19.875
   346    119716       18.601       396    156816       19.900
   347    120409       18.628       397    157609       19.925
   348    121104       18.655       398    158404       19.950
   349    121801       18.682       399    159201       19.975
   350    122500       18.708       400    160000       20.000



Table  of  Squares  and  Square  Roots — Continued 


Number    Square  Square Root    Number    Square  Square Root
   401    160801       20.025       451    203401       21.237
   402    161604       20.050       452    204304       21.260
   403    162409       20.075       453    205209       21.284
   404    163216       20.100       454    206116       21.307
   405    164025       20.125       455    207025       21.331
   406    164836       20.149       456    207936       21.354
   407    165649       20.174       457    208849       21.378
   408    166464       20.199       458    209764       21.401
   409    167281       20.224       459    210681       21.424
   410    168100       20.248       460    211600       21.448
   411    168921       20.273       461    212521       21.471
   412    169744       20.298       462    213444       21.494
   413    170569       20.322       463    214369       21.517
   414    171396       20.347       464    215296       21.541
   415    172225       20.372       465    216225       21.564
   416    173056       20.396       466    217156       21.587
   417    173889       20.421       467    218089       21.610
   418    174724       20.445       468    219024       21.633
   419    175561       20.469       469    219961       21.656
   420    176400       20.494       470    220900       21.679
   421    177241       20.518       471    221841       21.703
   422    178084       20.543       472    222784       21.726
   423    178929       20.567       473    223729       21.749
   424    179776       20.591       474    224676       21.772
   425    180625       20.616       475    225625       21.794
   426    181476       20.640       476    226576       21.817
   427    182329       20.664       477    227529       21.840
   428    183184       20.688       478    228484       21.863
   429    184041       20.712       479    229441       21.886
   430    184900       20.736       480    230400       21.909
   431    185761       20.761       481    231361       21.932
   432    186624       20.785       482    232324       21.954
   433    187489       20.809       483    233289       21.977
   434    188356       20.833       484    234256       22.000
   435    189225       20.857       485    235225       22.023
   436    190096       20.881       486    236196       22.045
   437    190969       20.905       487    237169       22.068
   438    191844       20.928       488    238144       22.091
   439    192721       20.952       489    239121       22.113
   440    193600       20.976       490    240100       22.136
   441    194481       21.000       491    241081       22.159
   442    195364       21.024       492    242064       22.181
   443    196249       21.048       493    243049       22.204
   444    197136       21.071       494    244036       22.226
   445    198025       21.095       495    245025       22.249
   446    198916       21.119       496    246016       22.271
   447    199809       21.142       497    247009       22.293
   448    200704       21.166       498    248004       22.316
   449    201601       21.190       499    249001       22.338
   450    202500       21.213       500    250000       22.361



Table  of  Squares  and  Square  Roots — Continued 


Number    Square  Square Root    Number    Square  Square Root
   501    251001       22.383       551    303601       23.473
   502    252004       22.405       552    304704       23.495
   503    253009       22.428       553    305809       23.516
   504    254016       22.450       554    306916       23.537
   505    255025       22.472       555    308025       23.558
   506    256036       22.494       556    309136       23.580
   507    257049       22.517       557    310249       23.601
   508    258064       22.539       558    311364       23.622
   509    259081       22.561       559    312481       23.643
   510    260100       22.583       560    313600       23.664
   511    261121       22.605       561    314721       23.685
   512    262144       22.627       562    315844       23.707
   513    263169       22.650       563    316969       23.728
   514    264196       22.672       564    318096       23.749
   515    265225       22.694       565    319225       23.770
   516    266256       22.716       566    320356       23.791
   517    267289       22.738       567    321489       23.812
   518    268324       22.760       568    322624       23.833
   519    269361       22.782       569    323761       23.854
   520    270400       22.804       570    324900       23.875
   521    271441       22.825       571    326041       23.896
   522    272484       22.847       572    327184       23.917
   523    273529       22.869       573    328329       23.937
   524    274576       22.891       574    329476       23.958
   525    275625       22.913       575    330625       23.979
   526    276676       22.935       576    331776       24.000
   527    277729       22.956       577    332929       24.021
   528    278784       22.978       578    334084       24.042
   529    279841       23.000       579    335241       24.062
   530    280900       23.022       580    336400       24.083
   531    281961       23.043       581    337561       24.104
   532    283024       23.065       582    338724       24.125
   533    284089       23.087       583    339889       24.145
   534    285156       23.108       584    341056       24.166
   535    286225       23.130       585    342225       24.187
   536    287296       23.152       586    343396       24.207
   537    288369       23.173       587    344569       24.228
   538    289444       23.195       588    345744       24.249
   539    290521       23.216       589    346921       24.269
   540    291600       23.238       590    348100       24.290
   541    292681       23.259       591    349281       24.310
   542    293764       23.281       592    350464       24.331
   543    294849       23.302       593    351649       24.352
   544    295936       23.324       594    352836       24.372
   545    297025       23.345       595    354025       24.393
   546    298116       23.367       596    355216       24.413
   547    299209       23.388       597    356409       24.434
   548    300304       23.409       598    357604       24.454
   549    301401       23.431       599    358801       24.474
   550    302500       23.452       600    360000       24.495



Table  of  Squares  and  Square  Roots — Continued 


Number    Square  Square Root    Number    Square  Square Root
   601    361201       24.515       651    423801       25.515
   602    362404       24.536       652    425104       25.534
   603    363609       24.556       653    426409       25.554
   604    364816       24.576       654    427716       25.573
   605    366025       24.597       655    429025       25.593
   606    367236       24.617       656    430336       25.612
   607    368449       24.637       657    431649       25.632
   608    369664       24.658       658    432964       25.652
   609    370881       24.678       659    434281       25.671
   610    372100       24.698       660    435600       25.690
   611    373321       24.718       661    436921       25.710
   612    374544       24.739       662    438244       25.729
   613    375769       24.759       663    439569       25.749
   614    376996       24.779       664    440896       25.768
   615    378225       24.799       665    442225       25.788
   616    379456       24.819       666    443556       25.807
   617    380689       24.839       667    444889       25.826
   618    381924       24.860       668    446224       25.846
   619    383161       24.880       669    447561       25.865
   620    384400       24.900       670    448900       25.884
   621    385641       24.920       671    450241       25.904
   622    386884       24.940       672    451584       25.923
   623    388129       24.960       673    452929       25.942
   624    389376       24.980       674    454276       25.962
   625    390625       25.000       675    455625       25.981
   626    391876       25.020       676    456976       26.000
   627    393129       25.040       677    458329       26.019
   628    394384       25.060       678    459684       26.038
   629    395641       25.080       679    461041       26.058
   630    396900       25.100       680    462400       26.077
   631    398161       25.120       681    463761       26.096
   632    399424       25.140       682    465124       26.115
   633    400689       25.159       683    466489       26.134
   634    401956       25.179       684    467856       26.153
   635    403225       25.199       685    469225       26.173
   636    404496       25.219       686    470596       26.192
   637    405769       25.239       687    471969       26.211
   638    407044       25.259       688    473344       26.230
   639    408321       25.278       689    474721       26.249
   640    409600       25.298       690    476100       26.268
   641    410881       25.318       691    477481       26.287
   642    412164       25.338       692    478864       26.306
   643    413449       25.357       693    480249       26.325
   644    414736       25.377       694    481636       26.344
   645    416025       25.397       695    483025       26.363
   646    417316       25.417       696    484416       26.382
   647    418609       25.436       697    485809       26.401
   648    419904       25.456       698    487204       26.420
   649    421201       25.475       699    488601       26.439
   650    422500       25.495       700    490000       26.458



Table  of  Squares  and  Square  Roots — Continued 


Number    Square  Square Root    Number    Square  Square Root
   701    491401       26.476       751    564001       27.404
   702    492804       26.495       752    565504       27.423
   703    494209       26.514       753    567009       27.441
   704    495616       26.533       754    568516       27.459
   705    497025       26.552       755    570025       27.477
   706    498436       26.571       756    571536       27.495
   707    499849       26.589       757    573049       27.514
   708    501264       26.608       758    574564       27.532
   709    502681       26.627       759    576081       27.550
   710    504100       26.646       760    577600       27.568
   711    505521       26.665       761    579121       27.586
   712    506944       26.683       762    580644       27.604
   713    508369       26.702       763    582169       27.622
   714    509796       26.721       764    583696       27.641
   715    511225       26.739       765    585225       27.659
   716    512656       26.758       766    586756       27.677
   717    514089       26.777       767    588289       27.695
   718    515524       26.796       768    589824       27.713
   719    516961       26.814       769    591361       27.731
   720    518400       26.833       770    592900       27.749
   721    519841       26.851       771    594441       27.767
   722    521284       26.870       772    595984       27.785
   723    522729       26.889       773    597529       27.803
   724    524176       26.907       774    599076       27.821
   725    525625       26.926       775    600625       27.839
   726    527076       26.944       776    602176       27.857
   727    528529       26.963       777    603729       27.875
   728    529984       26.981       778    605284       27.893
   729    531441       27.000       779    606841       27.911
   730    532900       27.019       780    608400       27.928
   731    534361       27.037       781    609961       27.946
   732    535824       27.055       782    611524       27.964
   733    537289       27.074       783    613089       27.982
   734    538756       27.092       784    614656       28.000
   735    540225       27.111       785    616225       28.018
   736    541696       27.129       786    617796       28.036
   737    543169       27.148       787    619369       28.054
   738    544644       27.166       788    620944       28.071
   739    546121       27.185       789    622521       28.089
   740    547600       27.203       790    624100       28.107
   741    549081       27.221       791    625681       28.125
   742    550564       27.240       792    627264       28.142
   743    552049       27.258       793    628849       28.160
   744    553536       27.276       794    630436       28.178
   745    555025       27.295       795    632025       28.196
   746    556516       27.313       796    633616       28.213
   747    558009       27.331       797    635209       28.231
   748    559504       27.350       798    636804       28.249
   749    561001       27.368       799    638401       28.267
   750    562500       27.386       800    640000       28.284



Table of Squares and Square Roots — Continued

Number    Square  Square Root    Number    Square  Square Root
   801    641601       28.302       851    724201       29.172
   802    643204       28.320       852    725904       29.189
   803    644809       28.337       853    727609       29.206
   804    646416       28.355       854    729316       29.223
   805    648025       28.373       855    731025       29.240
   806    649636       28.390       856    732736       29.257
   807    651249       28.408       857    734449       29.275
   808    652864       28.425       858    736164       29.292
   809    654481       28.443       859    737881       29.309
   810    656100       28.460       860    739600       29.326
   811    657721       28.478       861    741321       29.343
   812    659344       28.496       862    743044       29.360
   813    660969       28.513       863    744769       29.377
   814    662596       28.531       864    746496       29.394
   815    664225       28.548       865    748225       29.411
   816    665856       28.566       866    749956       29.428
   817    667489       28.583       867    751689       29.445
   818    669124       28.601       868    753424       29.462
   819    670761       28.618       869    755161       29.479
   820    672400       28.636       870    756900       29.496
   821    674041       28.653       871    758641       29.513
   822    675684       28.671       872    760384       29.530
   823    677329       28.688       873    762129       29.547
   824    678976       28.705       874    763876       29.563
   825    680625       28.723       875    765625       29.580
   826    682276       28.740       876    767376       29.597
   827    683929       28.758       877    769129       29.614
   828    685584       28.775       878    770884       29.631
   829    687241       28.792       879    772641       29.648
   830    688900       28.810       880    774400       29.665
   831    690561       28.827       881    776161       29.682
   832    692224       28.844       882    777924       29.698
   833    693889       28.862       883    779689       29.715
   834    695556       28.879       884    781456       29.732
   835    697225       28.896       885    783225       29.749
   836    698896       28.914       886    784996       29.766
   837    700569       28.931       887    786769       29.783
   838    702244       28.948       888    788544       29.799
   839    703921       28.965       889    790321       29.816
   840    705600       28.983       890    792100       29.833
   841    707281       29.000       891    793881       29.850
   842    708964       29.017       892    795664       29.866
   843    710649       29.034       893    797449       29.883
   844    712336       29.052       894    799236       29.900
   845    714025       29.069       895    801025       29.916
   846    715716       29.086       896    802816       29.933
   847    717409       29.103       897    804609       29.950
   848    719104       29.120       898    806404       29.967
   849    720801       29.138       899    808201       29.983
   850    722500       29.155       900    810000       30.000



Number  Square     Square Root    Number  Square     Square Root
901     81 18 01   30.017         951     90 44 01   30.838
902     81 36 04   30.033         952     90 63 04   30.854
903     81 54 09   30.050         953     90 82 09   30.871
904     81 72 16   30.067         954     91 01 16   30.887
905     81 90 25   30.083         955     91 20 25   30.903
906     82 08 36   30.100         956     91 39 36   30.919
907     82 26 49   30.116         957     91 58 49   30.935
908     82 44 64   30.133         958     91 77 64   30.952
909     82 62 81   30.150         959     91 96 81   30.968
910     82 81 00   30.166         960     92 16 00   30.984
911     82 99 21   30.183         961     92 35 21   31.000
912     83 17 44   30.199         962     92 54 44   31.016
913     83 35 69   30.216         963     92 73 69   31.032
914     83 53 96   30.232         964     92 92 96   31.048
915     83 72 25   30.249         965     93 12 25   31.064
916     83 90 56   30.265         966     93 31 56   31.081
917     84 08 89   30.282         967     93 50 89   31.097
918     84 27 24   30.299         968     93 70 24   31.113
919     84 45 61   30.315         969     93 89 61   31.129
920     84 64 00   30.332         970     94 09 00   31.145
921     84 82 41   30.348         971     94 28 41   31.161
922     85 00 84   30.364         972     94 47 84   31.177
923     85 19 29   30.381         973     94 67 29   31.193
924     85 37 76   30.397         974     94 86 76   31.209
925     85 56 25   30.414         975     95 06 25   31.225
926     85 74 76   30.430         976     95 25 76   31.241
927     85 93 29   30.447         977     95 45 29   31.257
928     86 11 84   30.463         978     95 64 84   31.273
929     86 30 41   30.480         979     95 84 41   31.289
930     86 49 00   30.496         980     96 04 00   31.305
931     86 67 61   30.512         981     96 23 61   31.321
932     86 86 24   30.529         982     96 43 24   31.337
933     87 04 89   30.545         983     96 62 89   31.353
934     87 23 56   30.561         984     96 82 56   31.369
935     87 42 25   30.578         985     97 02 25   31.385
936     87 60 96   30.594         986     97 21 96   31.401
937     87 79 69   30.610         987     97 41 69   31.417
938     87 98 44   30.627         988     97 61 44   31.432
939     88 17 21   30.643         989     97 81 21   31.448
940     88 36 00   30.659         990     98 01 00   31.464
941     88 54 81   30.676         991     98 20 81   31.480
942     88 73 64   30.692         992     98 40 64   31.496
943     88 92 49   30.708         993     98 60 49   31.512
944     89 11 36   30.725         994     98 80 36   31.528
945     89 30 25   30.741         995     99 00 25   31.544
946     89 49 16   30.757         996     99 20 16   31.559
947     89 68 09   30.773         997     99 40 09   31.575
948     89 87 04   30.790         998     99 60 04   31.591
949     90 06 01   30.806         999     99 80 01   31.607
950     90 25 00   30.822         1000    100 00 00  31.623
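Readers who wish to extend this table, or to check a doubtful entry, can regenerate any row by computation. The following is a minimal Python sketch and is not part of the original text. The helper name square_row is hypothetical, and its output simply imitates the conventions of the printed table: the square is written with its last four digits set off in two pairs (as in 64 16 01), and the square root is rounded to three decimal places.

```python
import math


def square_row(n: int) -> tuple[int, str, str]:
    """Return (number, square, square root) for one row of the table.

    Hypothetical helper: the square is laid out as in the printed table,
    with the last four digits set off in two pairs (e.g. "64 16 01"),
    and the square root is rounded to three decimal places.
    """
    sq = n * n
    # Leading digits, then the hundreds pair, then the units pair (zero-padded)
    square = f"{sq // 10000} {sq // 100 % 100:02d} {sq % 100:02d}"
    root = f"{math.sqrt(n):.3f}"
    return n, square, root


if __name__ == "__main__":
    # Print the rows for 801 through 805 in a fixed-width layout
    for n in range(801, 806):
        print("{:<8}{:<11}{}".format(*square_row(n)))
```

Run as a script, it prints the rows for 801 through 805; the first line is `801     64 16 01   28.302`.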

INDEX 


Italics  are  used  for  Reference  to  Definitions. 


Age-scale, 109, 110

Array, 155

Attenuation, 211; correction for, 212

Average, 8, 9, 28, 31, 50, 51; reliability of an, 121

Average deviation or AD, 22, 23, 32, 34, 35, 51, 52

Axes, coordinate, 60; use in correlation, 159, 175

Barlow's Tables, 302

Bias in sampling, 144. See Sampling.

Binomial expansion, 79; in probability, 77-80; graphic representation of, 80

Blakeman, J., test for linearity, 210

Bowley, A. L., 302

Bravais, 163

Brown, Wm., 269, 292

Brown and Thomson, 191, 218, 302

Burt, Cyril, 251

Carothers, F. E., 134, 280, 300

Central tendencies, 8-16; reliability of measures of, 120-127

Classification of measures into frequency distributions, 2-4

Class-interval. See Step-interval.

Coefficient of alienation, 289

Coefficient of contingency, 198; computation of, 198-199; comparison with correlation coefficient, 200; short method of computing, 201

Coefficient of correlation, 149; as a ratio, 152-153; represented graphically, 158-159; steps in computation of, from guessed average, 163-168; steps in computation of, from average, 169-170; reliability of, 170; interpretation of, 288-299. See also Correlation.

Coefficient of regression, 175, 178

Coefficient of variation, calculation of, 41-42

Coin tossing, in experiments on laws of chance, 79-81

Column diagram. See Histogram.

Comparison of groups in terms of central tendencies and variabilities, 42; in terms of overlapping, 45

Comparison of obtained distributions with normal probability curve, 81

Contingency method, 195-203. See also Coefficient of contingency.

Continuous series, 1; tabulation of measures in, 2-7

Correction, computation of correction, C, in Short Method, 31; for attenuation, 211

Correlation, 149-152; positive, negative, and zero, 150-151; graphic representation of, 161-162; construction of correlation table, 154; product-moment method of computing, 163-170; rank methods of computing, 189-195; spurious, 258; effect of errors of observation on, 211. See also Partial correlation and Multiple correlation.

Correlation-ratio; in non-linear relation, 204-205; steps in computing, 206; comparison with r to determine linearity of regression, 209-210; correction of "raw" eta, 209; reliability of, 208

Criterion, 266; value of, in determining validity of tests, 266-267

Cumulative errors, effect on multiple R, 238-239

Deciles, 45. See Percentiles.

Deviation. See Quartile deviation, Average deviation, and Standard deviation.

Dice throwing, in experiments on laws of chance, 80-81

Difference, reliability of, between measures of central tendency, 128-137; reliability of, between two r's, 171. See Standard and Probable error.

Discrete series, 2; median in, 12; short method applied to, 36

Elderton, W. P. and E. M., 301

Equation, of straight line, 175; plotting of linear, 176-178; of regression lines, in Deviation Form, 178-179; in Score Form, 180-182

Error, curve of, 83. See also Normal curve.

Errors, of sampling, 143; of observation, 211; constant, 274; variable, 274. See also Probable and Standard errors.

Footrule (Spearman's) in correlation, 192-195

Frequency distribution, three methods of constructing, 3-4

Frequency Polygon, 59-63; comparison with histogram, 65

Garrett, H. E., 114

Grades, method of, in correlation, 192. See also Footrule.

Graphic methods, of representing data, in a frequency distribution, 59-71; of representing correlation coefficient, 158-162

Grouping, in tabulation, 3; assumptions in, 5

Heterogeneity, effect of, on correlation, 259; on reliability, 271

Hillegas, Milo B., 108

Histogram, 63-66; comparison with frequency polygon, 65

Holzinger, Karl J., 271

Homogeneity of a group, 17

Hull, Clark, method of transmuting ranks, 111-115; method of combining tests, 282, 300

Index of reliability, 273

Jerome, Harry, 82

Jones, D. Caradog, 83, 84, 174, 211, 302




Kelley, T. L., 33, 195, 254, 259, 263, 267, 272, 273, 289, 292, 302

Law of normal frequency, 82

Line graphs, 72-73

Line of means, best fitting line, 160, 173; plotting of, 175; equation of, 175-182

Linearity of relation, 203; tests for, 209-210

May, Mark A., 223, 224, 244, 263

McCall, W. A., 109, 110, 302

Mean, Arithmetic. See Average.

Mean deviation. See Average deviation.

Median, 11, 12, 13, 38, 50; reliability of, 126

Methods of combining test scores, 277; by percentiles, 278; by median mental age, 279; by variability of test scores, 279-281; by conversion into comparable distributions, 281-284

Middle 50%, 21, 85

Midpoint of step, how to find, 6; as representative of all the scores on the step, 6

Midscore, in ungrouped discrete series when N is even, 12; when N is odd, 12

Miner, John Rice, 302

Monroe, W. S., 185, 302

Mode, 15, 16, 50

Moore, H. L., 255

Multiple coefficient of correlation, R, 222; computation of, 230-231; general formula for, 238; "chance" R, 239; alternate forms for, 239

Musselman, J. R., 261

Non-linear relation, 203-205

Normal curve, 74; deduction from binomial expansion, 80; why employed in psychological measurement, 81-84; properties of, 84-85; use in the solution of a variety of problems, 94ff; in test making, 101-109; in transmutation of ranks, 111-115; in measuring reliability, 123, 131

Normal probability curve. See Normal curve.

Normal frequency distribution, 83; illustrations of, 75

Ogive, 66; construction of, 67, 71; smoothing of, 68; in calculating percentiles, 69-70

Otis Correlation Chart, 167

Otis, A. S., 217, 259, 272, 302

Overlapping, in the measurement of groups, 44-45; of elements or factors in correlation, 291-299

Partial correlation, 221; illustration of, in three-variable problem, 223-231; notation in, 232; general formulas for use in, 231-240; models of four- and five-variable problems, 240-244; illustration of, in four-variable problem, 244-251; value of, in analysis and causal investigations, 251ff; limitations to use of, 258

Pearl, Raymond, 295, 297

Pearson, Karl, 163, 200, 205, 209

Percentile scale, 109; evaluation of, 209

Percentiles, calculation of, 45ff; percentile scores, 46; graphic method of finding, 69; method of combining scores from different tests, 278




Phillips, Frank M., 252

Pintner and Patterson, 49, 279

Probable error, relation to Q, 21; relation to other measures of variability in a normal distribution, 85; use in solution of problems, 94-109

Probable error, of an average, 125ff; of a median, 126; of a σ, 127; of a difference, 129; table for finding reliability of a difference in terms of, 135; of a coefficient of correlation, 170-171

Probable error of estimate, in prediction, 184-185; in partial and multiple correlation, 237

Probable error of measurement, 274-276

Product-moment method of finding r, deviations from GA, 163-168; deviations from average, 168-170

Pyle, W. H., 279

Quartile deviation (Q), 17, 18-22; in discrete series, 40; when to use, 50

Quartiles, Q1 and Q3, computation of, 18-19

r, Product-moment coefficient of correlation, formulas for, 167, 168. See Coefficient of correlation, and Correlation.

Random sample, 142-145. See also Sampling.

Range, 2, 17, 50

Rank difference method of computing correlation, 189ff; when to use, 195

Ranks, transmutation of, into units of amount, 111ff

Reavis, George, 253

Reduced scores, in combining test scores, 283-284; in computation of r, 285

Regression equations, deviation form, 174f; in score form, 180f; partial equations of, 235; non-linear, 203ff

Regression coefficients, 174, 178

Relative variability, measures of, 40. See also Coefficient of variation.

Reliability, measures of, 118-137; limitations to measures of, 142-145; coefficient of, 268-271; dependence of coefficient of, on size and variability of group, 271-272; index of, 273. See also Probable error and Standard error.

Rietz, H. L., et al., 302

Rosenow, Curt, 239

Ruch-Stoddard Correlation Sheet, 167

Ruch, G. M. and Del Manzo, M. C., 271, 299

Rugg, H. O., 301

Sampling, random, 120; errors of, 142-143; unreliability due to, 144; criteria of, 144

Scaling total scores, 109. See also Percentile scale, Age-scale, T-scale.

Scatter diagram, 154

Score, meaning of, 7

Secrist, Horace, 302

Semi-interquartile range, 21. See Quartile deviation.

Skewness, 86-89

Sommerville, R. C., 56, 219

Spearman, C., 212, 213




Spearman's Footrule, 192; prophecy formula, 269

Spurious Correlation, 258-261

Standard deviation (σ), 26, 27, 35; relation to other measures of variability, 85; reliability of, 127; general formulas for partial σ's, 233-235; of the sum or difference of corresponding values of two series of test scores, 286-288

Standard error, of an average, 121-125; of a median, 126; of a σ, 127; of a Q, 128; of a difference, 128-133; table for finding the reliability of a difference in terms of, 134; of a sum or difference, measures correlated, and uncorrelated, 187

Standard error of estimate, in prediction, 183; in partial and multiple correlation, 237; in interpreting r, 288-290

Standard error of measurement, 274-276; in interpreting r, 290-291

Step-interval, 2, 3, 4, 5; midpoint of, 5-6; assumptions with regard to data on, 5-6

Tables of frequencies of normal probability curve, in terms of σ, 91; in terms of PE, 93

Tabulation, of measures into frequency distribution, 3f; of correlation table, 154

Thorndike, E. L., 88, 301

Thurstone Correlation Sheet, 167

Thurstone, L. L., 302

Trabue, M. R., 127, 137

Transmutation of ranks into units of amount, 111

T-scale, 110

True scores, 118, 272-273

Validity, measurement of, in a test, 266-268

Variable errors, effect on r, 211; measurement of, 274-276

Variability, 16; causes of, 82, 88; comparison of groups with respect to, 42-44; coefficient of relative, 41; reliability of measures of, 127-128. See also Average deviation, Quartile deviation, and Standard deviation.

Weighting of tests, by variability of test scores, 279

Whitley, M. T., 267

Whipple, G. M., 279

Woody, Clifford, 104, 105, 107

Woodworth, R. S., method of combining tests, 283; use of "reduced scores" in computing r, 285

Yule, G. Udny, 80, 121, 122, 196, 200, 210, 212, 218, 221, 237, 286, 302

