BULLETIN NO. 11
BUREAU OF EDUCATIONAL RESEARCH
COLLEGE OF EDUCATION
RELATION OF SECTIONING A CLASS
TO THE EFFECTIVENESS OF
INSTRUCTION
by
Walter S. Monroe.
PRICE 15 CENTS
PUBLISHED BY THE UNIVERSITY OF ILLINOIS, URBANA
1922
PREFACE
The educational experiment reported in this bulletin
was initiated by the former Director of the Bureau
of Educational Research, and the data were collected under
his supervision. The present Director of the Bureau
is responsible for the tabulation of the data and for
the preparation of this report.
This investigation was made possible through the
cooperation of Superintendent Peter A. Mortenson and
of certain principals and teachers of the Chicago Public
Schools. Not only did they cooperate in the collection
of the data but they also made substantial contribu-
tions to the project by supplying test materials. The
writer is glad to acknowledge the indebtedness of the
Bureau of Educational Research to all who contributed
to this project.
Walter S. Monroe, Director
November 10, 1922
Relation of Sectioning a Class to the
Effectiveness of Instruction
The problem. The purpose of this educational experiment was
to determine the relative effect upon the achievements in certain
school subjects of three plans of sectioning a class. A "class" is
defined as the total number of children assigned to a teacher for in-
struction even though they may be divided into two or more groups
for instructional purposes. The three plans of sectioning a class
considered in this investigation are: (1) teaching a class as a single
unit; (2) dividing the class into two equal groups approximately
equivalent with respect to general intelligence; (3) dividing the class
into three equal groups approximately equivalent with respect to
general intelligence. When a class is taught as one group, all of the
pupils recite at the same time. Following the recitation there is a
period for study. Thus under this plan the work of the teacher al-
ternates between "hearing classes" and supervising the study of
the pupils. When a class is taught as two sections, one group
recites while the other group studies. In this case the teacher's
time is almost wholly devoted to "hearing classes." Any supervi-
sion of the study of the pupils is of necessity given incidentally and at
irregular intervals when the teacher is fortunate enough to have a
few minutes of leisure during a recitation period. When a class is
divided into three sections, the conditions are much the same except
that necessarily the length of the recitation periods is reduced. In
general pupils of one section study during the recitation periods
of the other two sections.
The specific problem of this investigation was to determine
the relative effect of these three plans of sectioning a class upon the
direct results of instruction in certain school subjects. In other
words this investigation sought to answer the question, "Which is
the best plan of sectioning a class?"
General plan of the experiment. If it were possible to secure
three groups of classes so that all factors which affect the results of
instruction were equivalent in the beginning of the experiment and
could be controlled throughout the experimental period, the simplest
procedure would be to have one group of classes taught as a unit,
another group taught in two sections and a third group in three sec-
tions. However, it would be difficult, if not impossible, to secure
exact equivalence of teaching ability and of pupil material. Our
facilities for measuring the ability of teachers are extremely crude
and at best it would be difficult to demonstrate that any differences
found in the results of instruction were not produced largely by
differences in teaching ability. It is true that we have a number of
general intelligence tests which might be used to measure the quality
of the pupil material. However, the limitations of these instruments
are such that one would be unable to interpret small differences in
the resulting achievements.
In order to avoid these two difficulties this experiment was
planned so that the same teacher should instruct a given class when
organized according to two different plans of sectioning. This,
necessarily, must be done during successive semesters. This proce-
dure insured the constancy of the teacher, although not necessarily
of teaching ability since the ability of a given teacher may vary
from semester to semester with different types of class organization.
In order that the pupil material might be the same for the two plans
of class organization one hundred percent promotion was secured
at the middle of the school year. Thus, a teacher who instructed
a class as one section during the first semester of this experiment
instructed the same pupils during the second semester but with the
class divided into two or three sections. Other teachers taught
classes organized according to other combinations of sectioning.
This general plan of the experiment makes the semester a vari-
able factor. It is possible that pupils may normally make greater
progress during one semester than during the other. Furthermore,
the gain of second trial scores over first trial scores is likely to be
much greater than the gain of third trial scores over second trial
scores simply because the pupils become acquainted with the testing
procedure. In order to balance these two variable factors it was
necessary to arrange experimental groups in pairs. Thus, corres-
ponding to an experimental group of classes which was taught as a
single section during the first semester and as three sections during
the second semester, there was another group of classes taught as
three sections during the first semester and as a single section during
the second semester. In dividing a class into sections the scores
yielded by the general intelligence tests were used to secure sec-
tions of approximately equivalent pupil material. Six experimental
groups of classes were organized as follows:
Group I. Classes taught as a single section during the first
semester and as three sections during the second semester.
Group II. Classes taught as three sections during the first
semester and as one section during the second semester.
Group III. Classes taught as one section during the first
semester and as two sections during the second semester.
Group IV. Classes taught as two sections during the first
semester and as one section during the second semester.
Group V. Classes taught as two sections during the first
semester and as three sections during the second semester.
Group VI. Classes taught as three sections during the first
semester and as two sections during the second semester.
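The paired arrangement of the six groups can be written down as a small lookup table. The following is a minimal sketch, not part of the original report; the names `DESIGN`, `PAIRS` and `is_mirror_pair` are illustrative assumptions. It records the number of sections used in each semester and checks that each pair of groups uses the same two plans in opposite order.

```python
# Sections per semester for each experimental group, as listed in the text:
# group -> (sections in first semester, sections in second semester).
DESIGN = {
    "I": (1, 3),
    "II": (3, 1),
    "III": (1, 2),
    "IV": (2, 1),
    "V": (2, 3),
    "VI": (3, 2),
}

# The pairing used to balance the semester and practice effects.
PAIRS = [("I", "II"), ("III", "IV"), ("V", "VI")]


def is_mirror_pair(a: str, b: str) -> bool:
    """True when group b uses group a's two plans in the reverse order."""
    return DESIGN[a] == tuple(reversed(DESIGN[b]))


for a, b in PAIRS:
    print(a, b, is_mirror_pair(a, b))
```

Within a mirrored pair each plan of sectioning occurs once in the first semester and once in the second, which is what allows the semester and practice effects to be balanced when the gains of the pair are averaged.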
So far as the writer knows, essentially the same methods of
instruction and subject-matter were followed in all of these groups.
The investigation was confined to Grades II, V, and VII in order to
reduce the labor and expense. As these grades are fairly representa-
tive of the three divisions of the elementary school, primary, inter-
mediate and grammar, it is not likely that different results would be
obtained in the other grades. The number of classes, the total en-
rollment, and the number of complete records in each experimental
group are given in Table I.
TABLE I. NUMBER OF CLASSES, TOTAL ENROLLMENT, AND NUMBER
OF COMPLETE RECORDS IN EACH OF THE EXPERIMENTAL GROUPS

                                            Group
Grade                          I     II    III     IV      V     VI  Total
II     Number of classes       7      4      3      6      7      3     30
       Total enrollment      348    201    138    288    324    162   1461
       Complete records      240    111    103    208    224     89    975
V      Number of classes       2      2      8      4      4      4     24
       Total enrollment       87     92    379    192    196    181   1127
       Complete records       70     72    326    133    157    143    901
VII    Number of classes       3      3      5      5      2*           18
       Total enrollment      141    140    244    214     91*          830
       Complete records      119    109    186    159     86*          659

* Grade VII has entries for only five groups. The fifth entry belongs to
Group V or Group VI; its column cannot be determined from this copy.
The data collected. Through the cooperation of Superintendent
Peter A. Mortenson of the Chicago Public Schools and of certain
principals and teachers, the Bureau of Educational Research carried
on this investigation during the school year of 1920-21. Experi-
mental classes were organized in sixteen elementary schools.1 For
measuring the general intelligence of the pupils the Pressey
Primer Scale was used in the second grade, and the Illinois General
Intelligence Scale in the other two grades. The achievements of the
pupils in the second grade were measured by means of the Pressey
Scale of Attainment No. 1. In the fifth and seventh grades achieve-
ments were measured by Monroe's Standardized Silent Reading
Tests, Revised, Monroe's General Survey Scale in Arithmetic, and
Buckingham's Problem Scale in Arithmetic, Divisions 1 and 2. The
general intelligence tests were given only at the beginning of the ex-
periment, October 11, 1920. Form 1 of the achievement tests was
given at this time. Form 2 of the achievement tests was adminis-
tered at the close of the first semester, February 3, 1921. At the
close of the experimental period, May 11, 1921, Form 1 was again
given.
The tests were administered by the teachers who also scored the
test papers and entered the scores upon individual record cards.
This, however, was done only after all of the teachers involved in the
experiment had been called together for the purpose of acquainting
them with the tests. In this explanation several tests were adminis-
tered to the teachers in exactly the same way as they were to be ad-
ministered to the pupils. In addition detailed instructions were
supplied to the teachers for all steps of the work. Since no compari-
sons were made between the scores yielded by tests administered by
different teachers it is felt that this procedure in the administration
of the tests does not seriously affect the results of the experiment.
Limitations of the experiment to be kept in mind in interpreting
the results. A number of conditions must be kept in mind in inter-
preting the results. In the first place practically all of the teachers
who cooperated in the investigation had been accustomed to teaching
classes in two sections. A few, perhaps 1 in 20, had taught a class as
a single section but, so far as the writer was informed, no teacher
had had any experience in instructing a class in three sections. Thus,
it is altogether likely that most of the teachers had acquired a techni-
que of instruction which would prove more successful with a class
divided into two sections than with a class divided into either one
or three sections. Furthermore, there appears to be a prejudice
against the division of a class into three sections. Thus, there is
introduced a factor which may be expected to produce greater achieve-
ments in classes taught as two sections than in classes taught as
either one or three sections. The effect of this factor is, however,
unknown but it should by all means be recognized in interpreting
the results.

1 These sixteen schools were the following: Brown, Dante, Douglas, Fiske, Jenner,
Julia Ward Howe, Morse, Otis, Pullman, Scanlan, Shields, Spry, Van Vlissingen, Ward,
Wentworth, and West Pullman.

The instruments used for measuring the achievements of the
pupils do not measure all achievements resulting from instruction.
They can be considered to do no more than measure representative
samples of the achievements within their respective fields. Outside
of silent reading and arithmetic, in which tests were given, there are
many important achievements of which no attempt was made to
secure direct measurements. It is, of course, possible that the
measures of achievements secured correlate closely enough with all
other achievements resulting from instruction, that a sufficiently
accurate index of all achievements is furnished for judging the re-
lative effectiveness of the instruction in the different experimental
groups. However, convincing experimental evidence on the point
is wanting and, for this reason, due caution must be exercised in
extending the conclusions of this experiment to school subjects other
than silent reading and arithmetic, as well as to the more subtle
outcomes engendered by the social contacts of the school room.
Finally, it must be remembered that this investigation was
carried on in classes enrolling approximately 45 pupils. Hence
it does not necessarily follow that the conclusions would apply to
classes enrolling 20 to 30 pupils. It is possible that this change in
the size of class might produce a complete reversal in the conclusions.
Method of summarizing data. After rejecting records which
were incomplete and obviously inaccurate, the scores yielded by an
application of a test were combined in a total distribution for each
experimental group. Thus, a distribution was formed of the first
trial scores made on Monroe's Standardized Silent Reading Tests,
Revised, by the group of fifth grade pupils enrolled in "classes taught
as a single section during the first semester and as three sections dur-
ing the second semester." In the same way distributions of scores
were formed for each of the experimental groups and for each appli-
cation of the test. The gain in achievement during the first semester
was found by subtracting the average score for the first trial of a
test from the average score of the second trial. The gain for the
second semester was found by subtracting the average score of the
second trial from that of the third trial. A second measure of gain
was secured by following a similar procedure with the median scores
but these gains are not given in this report as they were, in general,
in agreement with those calculated from the average scores.
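The gain computation described above amounts to taking differences of group averages between successive trials. A minimal sketch follows; the function name and the scores in the usage lines are hypothetical and are not data from the report.

```python
from statistics import mean


def semester_gains(trial1, trial2, trial3):
    """Return (first-semester gain, second-semester gain) for one group.

    Each gain is the difference between the average scores of two
    successive applications of a test.
    """
    m1, m2, m3 = mean(trial1), mean(trial2), mean(trial3)
    return m2 - m1, m3 - m2


# Hypothetical scores for one experimental group (not data from the report).
first = [10, 12, 14]
second = [15, 18, 21]
third = [17, 20, 26]
print(semester_gains(first, second, third))
```

Replacing `mean` with `statistics.median` would give the second measure of gain mentioned in the text.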
In calculating these gains no account was taken of the possible
non-equivalence of the different forms of the tests used. In fact no
accurate information concerning the equivalence of duplicate forms
is available except for Monroe's Standardized Silent Reading Tests,
Revised, and for Monroe's General Survey Scale in Arithmetic.
The duplicate forms of these two tests have been shown to be approx-
imately equivalent.2 However, since Form 1 of each test was used
twice and the average scores calculated from it were used both as
subtrahends and minuends, and since the gain for any plan of section-
ing is computed from both semesters the non-equivalence of Forms 1
and 2 of the tests used will not affect the comparisons of gains made
in the following tables.
The point scores yielded by the different tests are expressed in
terms of different units and from different zero points. Thus before
any combination from the results of the different tests can be made
it is necessary to express the gains in terms of a common unit. The
usual assumption in such cases is that the standard deviation of the
distribution of scores represents the same increment of ability for
one test as for another. On the basis of this assumption a total dis-
tribution for each test was secured by adding the distributions of
the six experimental groups within a grade. This was done for the
scores secured at each period of testing. The average of the three
standard deviations was assumed to represent the same increment of
ability for each test and was used as a divisor for reducing the gains
to the basis of a common unit. For example, during the first semester
the fifth grade pupils in Group I classes made a gain in arithmetic of
23.82 points. During the second semester they made a gain of 21.5
points. The average standard deviation of the arithmetic scores
in the fifth grade is 19.65. Using this as a divisor we secure as
quotients 1.21 and 1.09. In this manner the entries in Tables II,
III and IV were obtained. The two quotients whose calculation was
explained are given in Table III.
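The reduction to a common unit in the worked example above is a single division. The sketch below reproduces it with the figures quoted in the text; the names `AVERAGE_SD` and `to_sigma_units` are illustrative assumptions.

```python
# Average standard deviation of the fifth-grade arithmetic scores (from the text).
AVERAGE_SD = 19.65


def to_sigma_units(gain_points, average_sd):
    """Express a gain in raw points as a multiple of the average standard deviation."""
    return gain_points / average_sd


first_semester = to_sigma_units(23.82, AVERAGE_SD)
second_semester = to_sigma_units(21.5, AVERAGE_SD)
print(f"{first_semester:.2f} {second_semester:.2f}")  # prints "1.21 1.09"
```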
2 Monroe, W. S. Illinois Examination, University of Illinois Bulletin Vol. 19, No.
9, Bureau of Educational Research Bulletin No. 6. Urbana: University of Illinois,
1921. 70 p.

Tables II, III and IV are similar in structure and are to be read
in the same way. The gains for the different experimental groups
are arranged in pairs. In Table II, the gain for Group I on Test 1
when taught in classes of one section is 1.42. When taught in three
sections the gain is .55. The gain for Group II classes when taught
in one section is .90 and when taught in three sections it is 1.11.
The Group I classes were taught in one section during the first
semester but the Group II classes were taught in one section during
the second semester. This difference in time is largely responsible
for the differences in the size of the gains.

[Tables II, III and IV. Gains in achievement, expressed in units of the
standard deviation, for each experimental group under each of its two plans
of sectioning, with the average gains for each pair of groups (I and II,
III and IV, V and VI) and the differences between them. The table entries
are illegible in this copy.]

Interpretation of results. In interpreting the gains in Tables
II, III and IV it is necessary to keep in mind both the constant and
variable errors of measurement which are involved in the original
data as well as the chance variations in the gains due to sampling.
The variable errors of measurement in the original data depend upon
the reliability of the tests used. If we assume a coefficient of re-
liability3 of .84 for Test 1, it can be shown that the probable variable
error of measurement is approximately .25 when expressed in terms
of sigma which is the unit used in expressing the gains in Tables II,
III, and IV.4 A probable error of measurement of .25 means that
the scores for 50 percent of the pupils involve variable errors which
are less than .25. For the other 50 percent the variable errors will
be greater than .25. The presence of variable errors of measurement
affects the average of the scores as shown by the following formula
in which N is the number of scores upon which the average is based.
$$\text{P.E.}_{M\ \text{average}} = \frac{\text{P.E.}_M}{\sqrt{N}}$$
Substituting in this formula for Group I, we find the probable error
of measurement of the average ($\text{P.E.}_{M\ \text{average}}$) is .017; for Group II
it is .024. The gain 1.42 is the difference between the two averages.
3 The coefficient of reliability assumed here is probably higher than would be found
for this test. When based upon the scores of a single grade, the coefficient of re-
liability for Monroe's General Survey Scale in Arithmetic is approximately .85. For
Monroe's Standardized Silent Reading Test 1, Revised, the coefficients of reliability
are approximately .75 for rate and .65 for comprehension. For Test II they are
about .08 higher. The reliability of the other tests is not known.
4 The formula for the probable variable error of measurement is
$$\text{P.E.}_M = .6745\,\sigma\sqrt{1 - r}$$
where $r$ is the coefficient of reliability. In this case $\sigma = 1$.
The probable error of the difference of the two averages is given by
the following formula
$$\text{P.E.}_{\text{Dif.}} = \sqrt{\text{P.E.}_1^{2} + \text{P.E.}_2^{2}}$$
In this formula $\text{P.E.}_1$ and $\text{P.E.}_2$ stand for the probable errors of
measurement of the two averages whose difference is taken. In this
case $\text{P.E.}_1$ is equal to $\text{P.E.}_2$ since we have used the average of the
standard deviations of the several distributions in reducing the gains
to a comparable basis. Applying the above formula, we find that
the probable variable error of measurement to be associated with 1.42
is .024 and with .90 is .034. The formula for the probable error of
the sum of the two averages is the same as that for their difference.
Hence we may calculate the probable error of measurement to be
associated with the average gain 1.16 by taking one half of the
probable error of measurement of the sum of the two averages. The
$\text{P.E.}_M$ of the average gain 1.16 is .020.
Since the probable variable error of measurement depends only
upon the magnitude of the standard deviation of the scores and the
number of scores, we will obtain the same result for the gains of these
two groups when taught in classes of three sections. The probable
variable error of measurement of the difference (.33) may be calcu-
lated by the formula given above. It is .028.
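The chain of probable errors of measurement worked out above can be retraced numerically. The following is a minimal sketch, not the author's own computation: the function names are illustrative, and the group sizes (240 and 111) are an assumption taken from the complete second-grade records for Groups I and II in Table I. Because the report rounds at each step, the results agree with the printed figures only to within about .002.

```python
import math


def pe_of_average(pe_single, n):
    """Probable error of an average based on n scores."""
    return pe_single / math.sqrt(n)


def pe_of_difference(pe1, pe2):
    """Probable error of the difference (or sum) of two quantities."""
    return math.sqrt(pe1 ** 2 + pe2 ** 2)


PE_M = 0.25                       # probable variable error of one score, in sigma units (from the text)
N_GROUP_I, N_GROUP_II = 240, 111  # assumed: complete second-grade records in Table I

pe_avg_1 = pe_of_average(PE_M, N_GROUP_I)    # report: .017
pe_avg_2 = pe_of_average(PE_M, N_GROUP_II)   # report: .024

# Each gain is the difference of two averages with equal probable errors.
pe_gain_1 = pe_of_difference(pe_avg_1, pe_avg_1)   # report: .024
pe_gain_2 = pe_of_difference(pe_avg_2, pe_avg_2)   # report: .034

# The average gain is half the sum of the two gains.
pe_mean_gain = pe_of_difference(pe_gain_1, pe_gain_2) / 2   # report: .020

# The same value applies to the three-section average, so the difference
# of the two average gains combines two equal probable errors.
pe_diff = pe_of_difference(pe_mean_gain, pe_mean_gain)      # report: .028

for name, value in [("average I", pe_avg_1), ("average II", pe_avg_2),
                    ("gain I", pe_gain_1), ("gain II", pe_gain_2),
                    ("mean gain", pe_mean_gain), ("difference", pe_diff)]:
    print(f"{name}: {value:.3f}")
```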
This probable variable error of measurement is relatively small
in comparison with the gain .33, and in general when an average or
difference is three or four times its probable error it can be considered
significant. Hence, if we had to consider only the variable errors of
measurement we would be justified in asserting that this difference
was significant and could not be due to the presence of these errors
in our original data. However, it should be remembered that we
have been liberal in the estimate of the coefficient of reliability.
It is likely that the true value of the probable error is much
larger.
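The rule of thumb stated above, that a difference may be considered significant when it is three or four times its probable error, can be expressed as a simple ratio test. This is a minimal sketch with hypothetical function names; the `threshold` parameter is an assumption standing for the "three or four" of the text.

```python
def ratio_to_probable_error(value, probable_error):
    """How many probable errors the value amounts to."""
    return value / probable_error


def is_significant(value, probable_error, threshold=3.0):
    """Apply the 'three or four times the probable error' rule of thumb."""
    return abs(value) >= threshold * probable_error


# The difference .33 against its probable variable error of measurement .028.
print(round(ratio_to_probable_error(0.33, 0.028), 1))
print(is_significant(0.33, 0.028, threshold=4.0))
```

On these figures the difference passes even the stricter threshold of four, which is why the text turns to the liberality of the assumed reliability rather than to the arithmetic.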
Since all gains are expressed in terms of a common unit the prob-
able variable errors of measurement found for the entries under Test 1
will apply also to Tests 2, 3, and 4 provided we assume the same co-
efficient of reliability for these tests. The probable variable error of
measurement of the average is affected by the number of cases from
which the average is computed. Hence for the gains made by other
groups it will be slightly greater, since the number of scores is smaller
for those groups. In Table III the number of scores in Groups III
and IV is slightly larger. Hence a smaller probable variable error
of measurement will be found, but for all of the other groups it will
be larger than the one which we have considered in detail. In several
cases the difference in gains is so small that when compared with the
probable variable error of measurement it cannot be considered as
significant.
In addition to the variable errors of measurement, it is necessary
to consider the chance variations in the gains due to sampling even
when the sample has been chosen without bias. The probable error
of an average due to sampling is given by the following formula
$$\text{P.E.}_s = .6745\,\frac{\sigma_{\text{dist.}}}{\sqrt{N}}$$
Since sigma ($\sigma$) has been used as a unit in terms of which the gains
are expressed, $\sigma_{\text{dist.}}$ equals 1 for our calculations.5 In the case
of Group I, $\text{P.E.}_s$ = .044. The gain 1.42 is the difference between
two averages and hence it would be necessary to apply the formula
for the probable error of the difference of the two averages. This
being done we find that the $\text{P.E.}_s$ to be applied to the gain
(1.42) is .062. In case of Group II, $\text{P.E.}_s$ = .064 and for the differ-
ence between the two averages it is .090. For the average 1.16,
$\text{P.E.}_s$ = .055. For the difference .33, $\text{P.E.}_s$ = .078.
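The sampling errors quoted above follow from the same combination rule applied to the probable error of an average due to sampling. The sketch below is illustrative only: the function names are hypothetical, and the group sizes (240 and 111) are assumed to be the complete second-grade records for Groups I and II in Table I. The results match the printed values to within about .001, the remainder being rounding in the original.

```python
import math


def pe_sampling(n, sigma=1.0):
    """Probable error of an average due to sampling: .6745 * sigma / sqrt(n)."""
    return 0.6745 * sigma / math.sqrt(n)


def combine(pe1, pe2):
    """Probable error of the difference (or sum) of two quantities."""
    return math.sqrt(pe1 ** 2 + pe2 ** 2)


pe_s_1 = pe_sampling(240)   # Group I, report: .044
pe_s_2 = pe_sampling(111)   # Group II, report: .064

pe_gain_1 = combine(pe_s_1, pe_s_1)   # gain 1.42, report: .062
pe_gain_2 = combine(pe_s_2, pe_s_2)   # gain .90, report: .090

pe_mean_gain = combine(pe_gain_1, pe_gain_2) / 2   # average 1.16, report: .055
pe_diff = combine(pe_mean_gain, pe_mean_gain)      # difference .33, report: .078

for name, value in [("Group I", pe_s_1), ("Group II", pe_s_2),
                    ("gain I", pe_gain_1), ("gain II", pe_gain_2),
                    ("mean gain", pe_mean_gain), ("difference", pe_diff)]:
    print(f"{name}: {value:.3f}")
```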
When we consider the probable error due to sampling (.078)
in addition to the probable variable error of measurement (.028) the
difference (.33) would probably be significant and indicate a slight
superiority in achievement as measured by Test 1 for the pupils
taught in classes of one section, provided no other errors could be
considered to affect this difference. It is, however, necessary to
consider the constant errors of measurement. Their exact magni-
tude can not be known but their presence is evident. For example,
in Table II the gains on Test 1 for Groups I and II when taught as
one section are 1.42 and .90 respectively. The gain of 1.42 was made
during the first semester and is the difference between the first and
second trial scores. The gain of .90 was made during the second
semester and is the difference between the second and third trial
scores. Due to the pupils becoming acquainted with the tests and
the testing procedure, both of these gains involve a constant error.
This tends to make the obtained gain larger than the true gain, but
as the practice effect of the second trial scores over the first trial
scores is larger than that of the third trial over the second trial scores,
it is reasonably certain that the gain for Group I (1.42) contains the
larger constant error. The gains made by these two groups when
taught in classes of three sections are .55 and 1.11. Both of these
gains involve a constant error but in this case the larger constant
error is found in the gain for Group II. Each of the average gains
for these two groups (1.16 and .83) includes a relatively large constant
error but the two errors are much more nearly equal than those in-
cluded in the gains for each group separately. Hence, we are probably
justified in considering their difference (.33) to be relatively un-
affected by the presence of constant errors in any of our original data.

5 This is not the true value of $\sigma$. The variable errors of measurement tend to in-
crease the value of the obtained sigma. The relation is given by the formula
$$\sigma_{\text{true}} = \sigma_{\text{obtained}}\sqrt{r}$$
where $r$ is the coefficient of reliability.

However, the neutralization of the constant errors which seems
plausible, if not probable, in the case we have just considered does
not appear to have taken place in a number of the other differences
in this group of tables. With the exception of Groups I and II in
Table II, some of the differences are positive but others are negative
for each pair of groups. Although it is not impossible that a given plan
of sectioning a class might be more effective in one subject than in
another, the variations in the signs of the differences do not appear
to occur in such a way as to justify this explanation of the negative
gains. It is likely that a constant error was introduced in certain
groups of scores which was not neutralized in the difference. For
example, Group VI is shown by Test 2 to have made a larger gain
during the second semester when taught in two sections. Each of
the other tests shows a smaller gain for this semester and this we
should expect as the gain is the difference between the second and
third trial scores. The probable explanation of this condition is that
in some way a constant error was introduced in one set of scores yield-
ed by Test 2 for Group VI. An examination of Tables III and IV
reveals several similar instances. Hence, we are forced to the con-
clusion that at least certain sets of scores involve an unknown con-
stant error. The fact that this happened in certain cases tends
to make one suspicious of the presence of an unknown constant error
in other sets of scores even though evidence of its presence is lacking.
It is perhaps significant that in the case of the differences in
gains between classes taught as one section and classes taught in
three sections, eight gains are positive while six are negative. The
same situation prevails with respect to the gains made by classes
taught in one section when compared with the gains made by classes
taught in two sections. For classes taught in two sections compared
with classes taught in three sections, we have records only in the
second and fifth grades. Four of the differences are positive while
five are negative.
Conclusion. The facts presented in Tables II, III, and IV and
the errors they include appear to justify the conclusion that there is
no evidence of greater achievements being made by pupils when
taught in classes organized on the basis of one plan of sectioning than
in classes organized on a different plan of sectioning. Since the teach-
ers were more experienced in teaching classes in two sections and
probably preferred this plan of organization this condition might
appear to mean that the division of classes into two sections was the
least efficient of the three plans. However, in the writer's judgment
this conclusion is not justified. The most obvious inference, in his
opinion, to be drawn from the data of this experiment is that the
educational tests used do not yield sufficiently accurate and precise
measures of achievement to make possible the determination, under
the conditions of this experiment, of the best method of sectioning
a class. It is likely that the differences in the gains made during a
period of less than a semester are not large. This being the case it is
necessary either to extend the experimental period or to secure more
precise measures of achievement. The magnitude of the probable
variable error of measurement of the difference and also of the prob-
able error due to sampling can be decreased by increasing the number
of pupils in the experimental groups, but the constant errors are not
affected by any increase in the number of cases. Certain constant
errors are neutralized in the differences but, as we have shown, other
constant errors which occur in only certain sets of scores were not
eliminated. The presence of these constant errors is due to imper-
fections in the educational tests used. Therefore, it appears that
until our instruments for measuring achievements of school children
are materially improved we cannot expect such educational ex-
periments as the one described in this report to lead to reliable
conclusions.