MATHEMATICAL STATISTICS
By
8. S. WILKS
ERRATA

[Only the entries legible in this scan are reproduced below; the remaining entries of the original two-page errata table correct mathematical expressions that are too garbled to reconstruct and are omitted.]

Page  Line   In place of                  Read
6     15     (2)                          (2')
18    6      add                          and
45    10     2.93                         2.92
55    12     (k+x)(k+x-1)...(k+1)         (k+x-1)(k+x-2)...k
73    top    Distribution                 Distributions
79    5b     vandom                       random
98    7      diatritiution                distributions
121   1      (i = 1, 2)                   (i = 1, 2, ..., k)
121   4      characteristic               moment generating
128   1b     (c)                          (e)
148   10b    significant                  significance
158   8      y + bx                       y - bx
166   19     the likelihood               is the likelihood
185   1b     assumed                      not assumed
188   4      minimizing                   maximizing
207   1      (3n+3)                       (n+1)
220   1      in terms                     in terms of
225   12     of N'                        of N
226   5      (5.6)                        (5.5)
226   7      (5.12)                       (5.5)
227   12     dx.                          dx
254   1      polynimial                   polynomial
256   11b    indeterminant                indeterminate
258   3b     j-th column                  i-th column
260   17     The canonical correlation    The correlation
271   19     Cochran, G. C.               Cochran, W. G.
273   18     Valewis                      Valeurs

Lines marked "b" are counted from the bottom (in the original: "* As counted from the bottom"; "** As counted from the bottom of footnote").
MATHEMATICAL STATISTICS
By
S. S. WILKS
PRINCETON UNIVERSITY PRESS
Princeton, New Jersey
1947
Copyright, 1943, by
PRINCETON UNIVERSITY PRESS
PREFACE
Most of the mathematical theory of statistics in its present state has been
developed during the past twenty years. Because of the variety of scientific fields in
which statistical problems have arisen, the original contributions to this branch of
applied mathematics are widely scattered in scientific literature. Most of the theory
still exists only in original form.
During the past few years the author has conducted a two-semester course at
Princeton University for advanced undergraduates and beginning graduate students in which
an attempt has been made to give the students an introduction to the more recent develop-
ments in the mathematical theory of statistics. The subject matter for this course has
been gleaned, for the most part, from periodical literature. Since it is impossible to
cover in detail any large portion of this literature in two semesters, the course has
been held primarily to the basic mathematics of the material, with just enough problems
and examples for illustrative and examination purposes.
Except for Chapter XI, the contents of the present set of notes constitute the
basic subject matter which this course was designed to cover. Some of the material in
the author's Statistical Inference (1937) has been revised and included. In writing up
the notes an attempt has been made to be as brief and concise as possible and to keep to
the mathematics with a minimum of excursions into applied mathematical statistics problems.
An important topic which has been omitted is that of characteristic functions of
random variables, which, when used in Fourier inversions, provide a direct and powerful
method of determining certain sampling distributions and other random variable distribu-
tions. However, moment generating functions are used; they are more easily understood by
students at this level and are almost as useful as characteristic functions as far as
actual applications to mathematical statistics are concerned. Many specialized topics are
omitted, such as intraclass, tetrachoric and other specialized correlation problems,
semi-invariants, renewal theory, the Behrens-Fisher problem, special transformations of
population parameters and random variables, sampling from Poisson populations, etc. It is
the experience of the author that an effective way for handling many of these specialized
topics is to formulate them as problems for the students. If and when the present notes
are revised and issued in permanent form, such problems will be inserted at the ends of
sections and chapters. In the meantime, criticisms, suggestions, and notices of errors
will be gratefully received from readers.
Finally, the author wishes to express his indebtedness to Dr. Henry Scheffe,
Mr. T. W. Anderson, Jr. and Mr. D. F. Votaw, Jr. for their generous assistance in pre-
paring these notes. Most of the sections in the first seven chapters and several sections
in Chapters X and XI were prepared by these men, particularly the first two. Thanks are
due Mrs. W. M. Weber for her painstaking preparation of the manuscript for lithoprinting.
S. S. Wilks.
Princeton, New Jersey
April,
TABLE OF CONTENTS

CHAPTER I. INTRODUCTION  1

CHAPTER II. DISTRIBUTION FUNCTIONS
§2.1 Cumulative Distribution Functions  5
§2.11 Univariate Case  5
§2.12 Bivariate Case  8
§2.13 k-Variate Case  11
§2.2 Marginal Distributions  12
§2.3 Statistical Independence  13
§2.4 Conditional Probability  15
§2.5 The Stieltjes Integral  17
§2.51 Univariate Case  17
§2.52 Bivariate Case  20
§2.53 k-Variate Case  21
§2.6 Transformation of Variables  23
§2.61 Univariate Case  24
§2.62 Bivariate Case  24
§2.63 k-Variate Case  28
§2.7 Mean Value  29
§2.71 Univariate Case; Tchebycheff's Inequality  30
§2.72 Bivariate Case  31
§2.73 k-Variate Case  32
§2.74 Mean and Variance of a Linear Combination of Random Variables  33
§2.75 Covariance and Correlation between Two Linear Combinations of Random Variables  34
§2.76 The Moment Problem  35
§2.8 Moment Generating Functions  36
§2.81 Univariate Case  36
§2.82 Multivariate Case  39
§2.9 Regression  40
§2.91 Regression Functions  40
§2.92 Variance about Regression Functions  41
§2.93 Partial Correlation  42
§2.94 Multiple Correlation  42

CHAPTER III. SOME SPECIAL DISTRIBUTIONS
§3.1 Discrete Distributions  47
§3.11 Binomial Distribution  47
§3.12 Multinomial Distribution  50
§3.13 The Poisson Distribution  52
§3.14 The Negative Binomial Distribution  54
§3.2 The Normal Distribution  56
§3.21 The Univariate Case  56
§3.22 The Normal Bivariate Distribution  59
§3.23 The Normal Multivariate Distribution  63
§3.3 Pearson System of Distribution Functions  72
§3.4 The Gram-Charlier Series  76

CHAPTER IV. SAMPLING THEORY
§4.1 General Remarks  79
§4.2 Application of Theorems on Mean Values to Sampling Theory  80
§4.21 Distribution of Sample Mean  81
§4.22 Expected Value of Sample Variance  83
§4.3 Sampling from a Finite Population  83
§4.4 Representative Sampling  86
§4.41 Sampling when the p_i are known  87
§4.42 Sampling when the σ_i are also known  88
§4.5 Sampling Theory of Order Statistics  89
§4.51 Simultaneous Distribution of any k Order Statistics  89
§4.52 Distribution of Largest (or Smallest) Variate  91
§4.53 Distribution of Median  91
§4.54 Distribution of Sample Range  92
§4.55 Tolerance Limits  93
§4.6 Mean Values of Sample Moments when Sample Values are Grouped; Sheppard Corrections  94
§4.7 Appendix on Lagrange's Multipliers  97

CHAPTER V. SAMPLING FROM A NORMAL POPULATION
§5.1 Distribution of Sample Mean  98
§5.11 Distribution of Difference between Two Sample Means  100
§5.12 Joint Distribution of Means in Samples from a Normal Bivariate Distribution  100
§5.2 The χ²-Distribution  102
§5.21 Distribution of Sum of Squares of Normally and Independently Distributed Variables  102
§5.22 Distribution of the Exponent in a Multivariate Normal Distribution  103
§5.23 Reproductive Property of the χ²-Distribution  107
§5.24 Cochran's Theorem  107
§5.25 Independence of Mean and Sum of Squared Deviations from Mean in Samples from a Normal Population  108
§5.3 The "Student" t-Distribution  110
§5.4 Snedecor's F-Distribution  113
§5.5 Distribution of Second Order Sample Moments in Samples from a Bivariate Normal Distribution  116
§5.6 Independence of Second Order Moments and Means in Samples from a Normal Multivariate Distribution  120

CHAPTER VI. ON THE THEORY OF STATISTICAL ESTIMATION
§6.1 Confidence Intervals and Confidence Regions  122
§6.11 Case in which the Distribution Depends on only One Parameter  122
§6.12 Confidence Limits from Large Samples  127
§6.13 Confidence Intervals in the Case where the Distribution Depends on Several Parameters  130
§6.14 Confidence Regions  132
§6.2 Point Estimation; Maximum Likelihood Statistics  133
§6.21 Consistency  133
§6.22 Efficiency  134
§6.23 Sufficiency  135
§6.24 Maximum Likelihood Estimates  136
§6.3 Tolerance Interval Estimation
§6.4 The Fitting of Distribution Functions

CHAPTER VII. TESTS OF STATISTICAL HYPOTHESES
§7.1 Statistical Tests Related to Confidence Intervals  147
§7.2 Likelihood Ratio Tests  150
§7.3 The Neyman-Pearson Theory of Testing Hypotheses  152

CHAPTER VIII. NORMAL REGRESSION THEORY
§8.1 Case of One Fixed Variate  157
§8.2 The Case of k Fixed Variates  160
§8.3 A General Normal Regression Significance Test  166
§8.4 Remarks on the Generality of Theorem (A), §8.3  171
§8.41 Case 1  171
§8.42 Case 2  172
§8.43 Case 3  173
§8.5 The Minimum of a Sum of Squares of Deviations with Respect to Regression Coefficients which are Subject to Linear Restrictions  174

CHAPTER IX. APPLICATIONS OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS
§9.1 Testing for the Equality of Means of Normal Populations with the Same Variance  176
§9.2 Randomized Blocks or Two-way Layouts  177
§9.3 Three-way and Higher Order Layouts; Interaction  181
§9.4 Latin Squares  186
§9.5 Graeco-Latin Squares  190
§9.6 Analysis of Variance in Incomplete Layouts  192
§9.7 Analysis of Covariance  195

CHAPTER X. ON COMBINATORIAL STATISTICAL THEORY
§10.1 On the Theory of Runs  200
§10.11 Case of Two Kinds of Elements  200
§10.12 Case of k Kinds of Elements  205
§10.2 Application of Run Theory to Ordering Within Samples  206
§10.3 Matching Theory  208
§10.31 Case of Two Decks of Cards  208
§10.32 Case of Three or More Decks of Cards  212
§10.4 Independence in Contingency Tables  213
§10.41 The Partitional Approach  213
§10.42 Karl Pearson's Original Chi-Square Problem and its Application to Contingency Tables  217
§10.5 Sampling Inspection  220
§10.51 Single Sampling Inspection  221
§10.52 Double Sampling Inspection  224

CHAPTER XI. AN INTRODUCTION TO MULTIVARIATE STATISTICAL ANALYSIS
§11.1 The Wishart Distribution  226
§11.2 Reproductive Property of the Wishart Distribution  232
§11.3 The Independence of Means and Second Order Moments in Samples from a Normal Multivariate Population  233
§11.4 Hotelling's Generalized "Student" Test  234
§11.5 The Hypothesis of Equality of Means in Multivariate Normal Populations  238
§11.6 The Hypothesis of Independence of Sets of Variables in a Normal Multivariate Population
§11.7 Linear Regression Theory in Normal Multivariate Populations
§11.8 Remarks on Multivariate Analysis of Variance Theory  250
§11.9 Principal Components of a Total Variance  252
§11.10 Canonical Correlation Theory  257
§11.11 The Sampling Theory of the Roots of Certain Determinantal Equations  260
§11.111 Characteristic Roots of One Sample Variance-covariance Matrix  261
§11.112 Characteristic Roots of the Difference of Two Sample Variance-covariance Matrices  265
§11.113 Distribution of the Sample Canonical Correlations  268

LITERATURE FOR SUPPLEMENTARY READING  271
INDEX  279
CHAPTER I
INTRODUCTION
Modern statistical methodology may be conveniently divided into two broad
classes. To one of these classes belongs the routine collection, tabulation, and des-
cription of large masses of data per se, most of the work being reduced to high speed
mechanized procedures. Here elementary mathematical methods such as percentaging,
averaging, graphing, etc. are used for condensing and describing the data as it is.
To the other class belongs a methodology which has been developed for making predictions
or drawing inferences, from a given set or sample of observations, about a larger set or
population of potential observations. In this type of methodology, we find the mathe-
matical methods more advanced, with the theory of probability playing the fundamental
role. In this course, we shall be concerned with the mathematics of this second class
of methodology. It is natural that these mathematical methods should embody assumptions
and operations of a purely mathematical character which correspond to properties and
operations relating to the actual observations. The test of the applicability of the
mathematics in this field, as in any other branch of applied mathematics, consists in
comparing the predictions as calculated from the mathematical model with what actually
happens experimentally.*

Since probability theory is fundamental in this branch of mathematics, we
should examine informally at this point some notions which at least suggest a way of
setting up a probability theory. As far as the present discussion is concerned, perhaps
the best approach is to examine a few simple empirical situations and see how we would
proceed to idealize and to set up a theory. Suppose a die is thrown successively. If
we denote by X the number of dots appearing on the upper face of the die, then X will
take on one of the values 1, 2, 3, 4, 5, 6 at each throw. The variable X jumps from

*For an example of such a comparison, see Ch. 5 of Bortkiewicz' Die Iterationen, Springer,
Berlin, 1917.
value to value as the die is thrown successively, thus yielding a sequence of numbers
which appear to be quite haphazard or erratic in the order in which they occur. A sim-
ilar situation holds in tossing a coin successively where X is the number of heads in a
single toss. In this case a succession of tosses will yield a haphazard sequence of
0's and 1's. Similarly, if X is the blowing time in seconds of a fuse made under a
given set of specifications, then a sequence, let us say of every N-th fuse from a pro-
duction line, will yield a sequence of numbers (values of X) which will have this char-
acteristic of haphazardness or randomness if there is nothing in the manufacturing oper-
ations which will cause "peculiarities" in the sequence, such as excessive high or low
values, long runs of high or low values, etc. We make no attempt to define randomness
in observed sequences, except to describe it roughly as the erratic character of the
fluctuations usually found in sequences of measurements on operations repeatedly per-
formed under "essentially the same circumstances", as for example successively throwing
dice, tossing coins, drawing chips from a bowl, etc. In operations such as taking
fuses from a production line and making some measurement on each fuse (e. g. blowing
time) the resulting sequence of measurements frequently has "peculiarities" of the kind
mentioned above, thus lacking the characteristic of randomness. However, it has been
found that frequently a state of randomness similar to that produced by rolling dice,
drawing chips from a bowl, etc., can be obtained in such a process as mass production
by carefully controlling the production procedure.*
Now let us see what features of these empirical sequences which arise from
"randomizing" processes can be abstracted into a mathematical theory -- probability
theory. If we take the first n numbers in an empirical sequence of numbers X_1, X_2,
X_3, ..., X_n, ..., there will be a certain fraction of them, say F_n(x), less
than or equal to x, no matter what value of x is taken. For each value of x, 0 ≤ F_n(x) ≤ 1.
We shall refer to F_n(x) as the empirical cumulative distribution function of the numbers
X_1, X_2, X_3, ..., X_n, .... As x increases, F_n(x) will either increase or remain constant.
It is a matter of experience that as n becomes larger and larger, F_n(x) becomes more
and more stable, appearing to approach some limit, say F_∞(x), for each value of x.

*Shewhart has developed a statistical method of quality control in mass production engin-
eering which is essentially a practical empirical procedure for approximating a state of
randomness (statistical control, to use Shewhart's term) for a given measurement in a
sequence of articles from a production line, by successively identifying and eliminating
causes of peculiarities in the sequence back in the materials and manufacturing oper-
ations.
If any subsequence of the original sequence is chosen "at random" (i.e. according to any rule
which does not depend on the values of the X's), then a corresponding F_n(x) can be defined
for the subsequence, and again we know from experience that as n increases, F_n(x) for the
subsequence appears to approach the same limit for each value of x as in the original
sequence.
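The stabilizing behavior of F_n(x) described in the text is easy to exhibit with a modern simulation (a sketch of ours, not part of the original; the die-throwing process and the name empirical_cdf are our own choices):

```python
import random

def empirical_cdf(values, x):
    """F_n(x): the fraction of the observed values that are <= x."""
    return sum(v <= x for v in values) / len(values)

random.seed(0)
throws = [random.randint(1, 6) for _ in range(100_000)]  # simulated die throws

# As n grows, F_n(3) stabilizes near the idealized value F(3) = 3/6 = 0.5.
for n in (100, 1_000, 100_000):
    print(n, empirical_cdf(throws[:n], 3))

assert abs(empirical_cdf(throws, 3) - 0.5) < 0.01
```

With different seeds the early values of F_n(3) wander, but the value for large n settles near 1/2, which is exactly the empirical stability the text describes.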
Entirely similar experimental evidence exists for situations in which the em-
pirical sequences are sequences of pairs, triples, or sets of k numbers, rather than
sequences of single numbers. For example, a sequence of throws of pairs of dice would
give rise to a sequence of pairs of numbers; the resistance, capacity, and inductance of
each relay in a sequence of telephone relays from a carefully controlled production line
would yield a sequence of triples of measurements. In considering a random sequence of
pairs of numbers (X_11, X_21), (X_12, X_22), ..., (X_1n, X_2n), ..., we can let F_n(x_1, x_2)
be the proportion of pairs in the first n pairs in which the value of X_1 is less than or
equal to x_1 and the value of X_2 is less than or equal to x_2. We need not list all of the
properties of F_n(x_1, x_2), for they are straightforward extensions of those of F_n(x) con-
sidered above. The important point here is that as n increases, experience indicates
that F_n(x_1, x_2) appears to approach some limit F_∞(x_1, x_2) for each value of x_1 and of x_2.
In particular, suppose we group the numbers of an empirical random sequence
X_1, X_2, ..., X_n, ... (with empirical cumulative distribution function F_n(x)) into
pairs (or samples of two numbers), so as to make a new sequence of pairs of numbers
(X_1, X_2), (X_3, X_4), ..., (X_{2n-1}, X_{2n}), .... As before, we have an empirical cumu-
lative distribution function F_n(x_1, x_2) for this sequence of pairs. It is an experimental
fact that as n becomes larger and larger, F_n(x_1, x_2) behaves more and more nearly like the
product F_n(x_1) F_n(x_2). A similar situation is true for sequences of samples of three or
more numbers. As we shall see later, it is this product property that suggests a
way to set up a mathematical theory of sampling.
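The product property can be checked by simulation (again a modern sketch, not in the original; the uniform randomizing process and all variable names are our own assumptions):

```python
import random

random.seed(1)
xs = [random.random() for _ in range(200_000)]  # one random sequence
pairs = list(zip(xs[0::2], xs[1::2]))           # grouped into consecutive pairs

x1, x2 = 0.3, 0.7
# Joint empirical c.d.f. of the pairs, and the two marginal empirical c.d.f.'s.
f_joint = sum(a <= x1 and b <= x2 for a, b in pairs) / len(pairs)
f1 = sum(a <= x1 for a, _ in pairs) / len(pairs)
f2 = sum(b <= x2 for _, b in pairs) / len(pairs)

# For large n the joint proportion behaves like the product of the marginals.
assert abs(f_joint - f1 * f2) < 0.01
```

The same check with three-tuples or k-tuples behaves analogously, which is the observation the text abstracts into a theory of sampling.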
The matter of F_n(x) appearing to approach some function F_∞(x) as n increases is
purely an empirical phenomenon, and not a mathematical one, but it suggests a way of set-
ting up a mathematical model corresponding to any randomizing process which, upon repeated
application, will yield an empirical sequence of numbers. We postulate the existence of a
function F(x) (the properties of this function are given in §2.11) to serve as a mathe-
matical model for F_∞(x). In some situations such as coin tossing, dice throwing, etc.,
a complete numerical specification of F(x) can be proposed by combinatorial and other a
priori considerations. In other situations of a more purely statistical nature it may be
impossible to specify F(x) beyond a particular functional form involving certain parameters.
In attempting to relate the behavior of the empirical cumulative distribution
function F_n(x) to the mathematical abstraction F(x) one encounters at least two diffi-
culties: One is common to all mathematical theories of physical (chemical, biological,
sociological) phenomena employing limits: the mathematical process of passing through
an infinite number of steps is physically unrealizable, and is often impossible even as
a "thought-experiment". For example, let the reader consider the notion of mass or
charge density in the light of the fact that mass and charge are discrete. The other
difficulty is peculiar to probability theory in that the theory does not assert that
lim_{n→∞} F_n(x) = F(x), but that the approach is in a sense defined within the framework of the
theory itself: F_n(x) converges stochastically to F(x). Stochastic convergence is de-
fined in §4.21.
Once F(x) has been postulated, the mathematics begins, and it consists of carrying
out various mathematical manipulations on F(x) corresponding to certain operations which
can be performed on the sequence produced by the given randomizing process. The mathe-
matics then becomes a method of making predictions of what will happen if certain opera-
tions are applied to the sequence. For example, F(b) - F(a) is a prediction of the pro-
portion of times, in a large number of trials, that the given process will yield numbers
greater than a and less than or equal to b; ∫_{-∞}^{+∞} x dF(x) (taken in the Stieltjes sense,
§2.5) is a prediction of the average of numbers obtained in a long series of repeated
applications of the process; F(x_1)F(x_2) is a prediction of the proportion of samples of
pairs of numbers, out of a large number of such pairs, in which the first number is
≤ x_1 and the second ≤ x_2; ∬_R dF(x_1)dF(x_2), where R is the region in the x_1x_2-plane for
which A ≤ (x_1 + x_2)/2 ≤ B, is a prediction of the proportion of samples of pairs of numbers,
out of a large number of such pairs, in which the average of the sample pair lies be-
tween A and B. Many other examples could be given here, but these will perhaps illus-
trate the nature of the correspondence between the mathematical operations performed on
F(x) (i. e. probability theory) and calculations based on the results of repeated appli-
cations of a given randomizing process. The degree of correspondence, i. e. validity of
prediction, depends on the degree of randomness in the empirical sequence and on how
well the function F(x) has been chosen. That such predictions, correctly applied, have
practical validity has been experimentally verified many times.*
*See a study by N. V. Smirnoff, "Sur les écarts de la courbe de distribution empirique",
Recueil Mathématique, Moscow, vol. 6 (1939), pp. 25-26.
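Predictions of the kind just listed can be checked against a simulated randomizing process (a modern sketch of ours, not part of the original; the exponential distribution F(x) = 1 - e^{-x} is an assumed example):

```python
import math
import random

# Assumed model: F(x) = 1 - exp(-x) for x > 0, the exponential c.d.f.
def F(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

random.seed(2)
sample = [random.expovariate(1.0) for _ in range(100_000)]

# Prediction F(b) - F(a) vs. the observed proportion of values in (a, b].
a, b = 0.5, 2.0
predicted = F(b) - F(a)
observed = sum(a < x <= b for x in sample) / len(sample)
assert abs(predicted - observed) < 0.01

# The Stieltjes integral of x dF(x) (here equal to 1) predicts the long-run average.
mean = sum(sample) / len(sample)
assert abs(mean - 1.0) < 0.02
```

The agreement improves, in the stochastic sense discussed above, as the number of trials grows.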
CHAPTER II
DISTRIBUTION FUNCTIONS
In this chapter we outline the basic probability theory necessary for the work
of the course. The treatment is general, the study of important particular distributions
being postponed to the next chapter.
§2.1 Cumulative Distribution Functions

In the previous chapter we have introduced the notion of an empirical cumula-
tive distribution function (c. d. f.) F_n(x), and have indicated that it is an experimental
fact that F_n(x) appears to approach a limiting form F_∞(x) as n is increased. We now de-
fine a mathematical model F(x) for the intuitively apprehended F_∞(x) by laying down
postulates for distribution functions. Henceforth the term cumulative distribution
function (c. d. f.) will be used only in the sense defined below.
We shall find it convenient to use the following notations and definitions
from point set theory: P ∈ E signifies that the point P belongs to the set E. E_1 ⊃ E_2 is
read "E_1 contains E_2". The sum (or union) of E_1 and E_2 is the totality of points P for
which P ∈ E_1 or P ∈ E_2; we shall denote it by E_1 + E_2. The product (or intersection) of E_1 and
E_2 is the totality of points P for which P ∈ both E_1 and E_2; we write it E_1E_2. E_1 and E_2
are said to be disjoint if they have no points in common. The difference E_1 - E_2 is the
totality of points in E_1 not in E_2.
§2.11 Univariate Case

A c. d. f. F(x) is defined by the following postulates:
1) If x' < x'', then F(x'') - F(x') ≥ 0.
2) F(-∞) = 0, F(+∞) = 1.
The notation in (2) implies that the limits of F(x) exist as x → -∞ or +∞. Since (1)
means that F(x) is monotone, it follows that F(x) has at most an enumerable number of
discontinuities, and that the limits F(x+0), F(x-0) exist everywhere. The determination
of the values of F(x) at its discontinuities is really not essential, but it will be con-
venient to fix them by
3) F(x+0) = F(x).
It follows from (1) and (2) that F(x) is non-negative.

The relation between probability statements about a random variable* X and its
c. d. f. is determined by the following further postulates:
1') Pr(X ≤ x) = F(x).
The left member is read "the probability that X ≤ x." Let E_1, E_2, ..., be a finite or
enumerable number of disjoint point sets on the x-axis:
2') Pr(X ∈ E_1 + E_2 + ...) = Pr(X ∈ E_1) + Pr(X ∈ E_2) + ... This may be called the law of
complete additivity, and may be used to determine the term on the left side of the equa-
tion, or any term on the right, when all the other probabilities entering the equation
are known. For example, let I be the interval x' < x ≤ x'', I' be the interval -∞ < x ≤
x', I'' be the interval -∞ < x ≤ x''. Then
I'' = I' + I.
From (1')
Pr(X ∈ I') = F(x'), Pr(X ∈ I'') = F(x''),
and hence from (2') we may state the theorem
A) Pr(x' < X ≤ x'') = F(x'') - F(x').
In order to find the probability that X be equal to a given value x', take a sequence of
points a_1 < a_2 < a_3 < ... converging to x'. Let I be the interval a_1 < x ≤ x', and I_j be the inter-
val a_j < x ≤ a_{j+1}. Then
I = (x') + I_1 + I_2 + ....
Hence from (2'),
Pr(X ∈ I) = Pr(X = x') + Σ_{j=1}^∞ Pr(X ∈ I_j),
and from theorem (A),
F(x') - F(a_1) = Pr(X = x') + Σ_{j=1}^∞ [F(a_{j+1}) - F(a_j)].
*In this chapter it is convenient to denote a random variable by a capital letter, X,
etc., and the corresponding independent variable in the distribution function by the
corresponding lower case letter, x, etc. In later chapters we will drop this convention
when there is no danger of confusion.
Now
Σ_{j=1}^∞ [F(a_{j+1}) - F(a_j)] = lim_{n→∞} Σ_{j=1}^n [F(a_{j+1}) - F(a_j)] = lim_{n→∞} [F(a_{n+1}) - F(a_1)] = F(x'-0) - F(a_1).
Hence we have the theorem
B) Pr(X = x) = F(x) - F(x-0).
In a similar manner one may derive the following theorems:
C) Pr(x' < X < x'') = F(x''-0) - F(x'),
Pr(x' ≤ X ≤ x'') = F(x'') - F(x'-0),
Pr(x' ≤ X < x'') = F(x''-0) - F(x'-0).
D) 0 ≤ Pr(X ∈ E) ≤ 1 (for any set E for which the middle
member is defined).
E) Pr(-∞ < X < +∞) = 1.
Let E_1, E_2, ..., be sets which are not necessarily disjoint; then
F) Pr(X ∈ E_1 + E_2) = Pr(X ∈ E_1) + Pr(X ∈ E_2) - Pr(X ∈ E_1E_2),
Pr(X ∈ E_1 + E_2 + E_3) = Σ_j Pr(X ∈ E_j) - Σ_{j<k} Pr(X ∈ E_jE_k) + Pr(X ∈ E_1E_2E_3), etc.
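Theorem (B) identifies Pr(X = x) with the jump of F at x. A quick numerical check (a modern sketch of ours, not in the original; the particular c. d. f. and the helper prob_at are assumed for illustration), taking X to be the number of heads in a single toss of a fair coin:

```python
def F(x):
    """c.d.f. of a fair-coin indicator: jumps of 1/2 at x = 0 and at x = 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

def prob_at(x, eps=1e-9):
    """Theorem (B): Pr(X = x) = F(x) - F(x - 0), approximating the left limit."""
    return F(x) - F(x - eps)

assert prob_at(0) == 0.5    # jump at 0
assert prob_at(1) == 0.5    # jump at 1
assert prob_at(0.5) == 0.0  # no jump: Pr(X = 0.5) = 0
```

At a continuity point of F the jump is zero, so Pr(X = x) vanishes there, exactly as the discrete-case discussion below states.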
We now characterize two important classes of c. d. f.'s:
(i) Suppose that F(x) increases only by jumps; more precisely, suppose there exist a
finite or an enumerable set of points x_1, x_2, ..., and corresponding positive numbers p_1,
p_2, ..., Σ_j p_j = 1, such that F(x) = Σ p_j, summed over all j for which x_j ≤ x. We shall call
this the discrete* case. It may be shown from the theorem (B) that in this case
Pr(X = x_i) = p_i, while for any point x' ≠ any x_i, Pr(X = x') = 0.
If the number of x_i is finite, or more generally, if the x_i have no cluster
points except ±∞, then the graph of F(x) in this case is a step-function made up of
horizontal lines as shown in (a) of Figure 1. The jump at x = x_i is equal to p_i, the
*It should be noted that an empirical c. d. f. F_n(x) of an observation variable X has pro-
perties (1), (2), (3) of a c. d. f. (discrete case). F_n(x) does not have properties (1')
and (2'), although it has analogous properties. That is, corresponding to (1') we would
have Prop(X ≤ x) = F_n(x) (the proportion of values of X ≤ x is F_n(x)), and for (2') we would
have Prop(X ∈ E_1 + E_2 + ... + E_k) = Prop(X ∈ E_1) + Prop(X ∈ E_2) + ... + Prop(X ∈ E_k). Thus, in the case of
F_n(x), p_i would be the proportion of cases among the n values of X in which the observa-
tion variable X = x_i, and not the probability that X = x_i.
probability that X = x_i.
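A minimal sketch of the discrete case (our own modern illustration, not part of the original; the jump points and probabilities chosen here are those of a fair die):

```python
# Jump points x_j and probabilities p_j for a fair die: p_j = 1/6 at x_j = 1, ..., 6.
points = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

def F(x):
    """Discrete-case c.d.f.: the sum of p_j over all j with x_j <= x."""
    return sum(p for xj, p in zip(points, probs) if xj <= x)

assert F(0.5) == 0.0                 # below all jump points
assert abs(F(3) - 0.5) < 1e-12       # three jumps of 1/6 each
assert abs(F(6) - 1.0) < 1e-12       # total probability 1
assert F(2.5) == F(2)                # constant between jumps (step-function)
```

The graph of this F is the step-function of Figure 1(a), with jump p_i at each x_i.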
(ii) Another case is characterized by the existence of a function f(x) ≥ 0
such that
F(x) = ∫_{-∞}^{x} f(t) dt.
This is really a necessary and sufficient condition for the absolute continuity of F(x),
but instead of calling this the absolutely continuous case, we shall refer to it merely
as the continuous case. The graph of F(x) in this case is continuous as shown in (b) of
Figure 1. We shall call f(x) the probability density function of the random variable X.
The reader may show that in this case
Pr(x' ≤ X ≤ x'') = ∫_{x'}^{x''} f(t) dt,
and that the statement remains valid if one or both of the equality signs inside the par-
entheses on the left are deleted. If f(x) is continuous for x' < x < x'',
Pr(x' < X ≤ x'') = (x'' - x') f(x_1), where x' < x_1 < x'',
and if f(x) is continuous at x_0,
Pr(x_0 ≤ X ≤ x_0 + dx) = f(x_0) dx,
except for infinitesimals of higher order. The infinitesimal f(x)dx is sometimes called
the probability element of X.
The discrete and continuous cases thus defined obviously do not cover all uni-
variate c. d. f.'s, but we shall confine ourselves to these in the present course.
[Figure 1. (a) Graph of F(x) in the discrete case: a step-function with jump p_i at each point x_i. (b) Graph of F(x) in the continuous case: a continuous curve.]
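The continuous case can also be illustrated numerically (a modern sketch, not in the original; the density f(x) = 2x on (0, 1) is an assumed example, and F is computed by crude numerical integration):

```python
# Assumed density: f(x) = 2x on (0, 1), so that F(x) = x**2 there.
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def F(x, steps=100_000):
    """F(x) as the integral of f from -infinity to x (midpoint rule)."""
    if x <= 0.0:
        return 0.0
    hi = min(x, 1.0)
    h = hi / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

# Pr(a < X <= b) = F(b) - F(a) = b**2 - a**2 here.
a, b = 0.25, 0.5
assert abs((F(b) - F(a)) - (b * b - a * a)) < 1e-6

# The probability element: Pr(x0 <= X <= x0 + dx) is f(x0) dx up to higher order.
x0, dx = 0.5, 1e-4
assert abs((F(x0 + dx) - F(x0)) - f(x0) * dx) < 1e-6
```

The second assertion is the "except for infinitesimals of higher order" statement in the text: the discrepancy is of order dx².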
§2.12 Bivariate Case

Let J be a rectangle in the x_1, x_2 plane: x'_1 < x_1 ≤ x''_1, x'_2 < x_2 ≤ x''_2. Denote
by Δ_J F(x_1, x_2) the second difference
Δ_J F(x_1, x_2) = F(x''_1, x''_2) - F(x'_1, x''_2) - F(x''_1, x'_2) + F(x'_1, x'_2).
Then a c. d. f. F(x_1, x_2) is subjected to the following postulates:
1) Δ_J F(x_1, x_2) ≥ 0.
2) F(-∞, x_2) = F(x_1, -∞) = 0, F(+∞, +∞) = 1.
By letting x'_1 → -∞ in (1), we get with the aid of (2),
F(x_1, x''_2) - F(x_1, x'_2) ≥ 0 if x''_2 > x'_2,
and similarly
F(x''_1, x_2) - F(x'_1, x_2) ≥ 0 if x''_1 > x'_1,
so that F(x_1, x_2) is monotonic in each variable separately. Hence the limits F(x_1+0, x_2),
F(x_1, x_2+0) exist everywhere. It can be shown that F(x_1, x_2) is discontinuous in x_1 at
worst on an enumerable number of lines x_1 = constant, and similarly for x_2. If we let
x'_1 → -∞ and x'_2 → -∞ in (1), we get F(x''_1, x''_2) ≥ 0 because of (2). The values of
F(x_1, x_2) at its discontinuities are fixed by
3) F(x_1, x_2) = F(x_1+0, x_2) = F(x_1, x_2+0).
The tieup of probability statements about a vector random variable (X_1, X_2) with
two components with its c. d. f. is determined by the following further postulates:
1') Pr(X_1 ≤ x_1, X_2 ≤ x_2) = F(x_1, x_2).
Let E_1, E_2, ..., be disjoint sets; then
2') Pr((X_1, X_2) ∈ E_1 + E_2 + ...) = Pr((X_1, X_2) ∈ E_1) + Pr((X_1, X_2) ∈ E_2) + ...
By the methods of §2.11 the reader may verify the following theorems:
A) Pr((X_1, X_2) ∈ J) = Δ_J F(x_1, x_2),
where J and Δ_J are defined above.
B) Pr(x'_1 < X_1 ≤ x''_1, X_2 = x_2) = F(x''_1, x_2) + F(x'_1, x_2-0) - F(x'_1, x_2) - F(x''_1, x_2-0).
C) Pr(X_1 = x_1, X_2 = x_2) = F(x_1, x_2) + F(x_1-0, x_2-0) - F(x_1-0, x_2) - F(x_1, x_2-0).
It can be shown by methods beyond the level of this course that from the postulates (1'),
(2') the probability that (X_1, X_2) ∈ E is determined for a very general class of regions
called Borel-measurable regions.*
D) 0 ≤ Pr((X_1, X_2) ∈ E) ≤ 1.
E) Pr(-∞ < X_1 < +∞, -∞ < X_2 < +∞) = 1.
For sets E_1, E_2, ..., not necessarily disjoint,
F) Theorem (F) of §2.11 is valid.
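Theorem (A) says the probability of a rectangle J is the second difference of F over J. A small numerical sketch (ours, not the text's; the particular bivariate c. d. f. below, the product of two exponential c. d. f.'s, is an assumed example):

```python
import math

# Assumed bivariate c.d.f.: product of two exponential marginals.
def F(x1, x2):
    F1 = 1.0 - math.exp(-x1) if x1 > 0 else 0.0
    F2 = 1.0 - math.exp(-x2) if x2 > 0 else 0.0
    return F1 * F2

def rect_prob(a1, b1, a2, b2):
    """Theorem (A): Pr((X1, X2) in J) as the second difference of F over J."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

p = rect_prob(0.0, 1.0, 0.0, 1.0)
expected = (1.0 - math.exp(-1.0)) ** 2  # product form, by independence
assert abs(p - expected) < 1e-12
assert p >= 0.0                         # postulate (1): the second difference is >= 0
```

Postulate (1) is exactly the statement that rect_prob can never return a negative value for any genuine c. d. f.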
With a bivariate distribution function we shall be mainly interested in the discrete case and the continuous case, and occasionally a mixed case, all defined below. We remark again that these categories are not exhaustive.

i) The discrete** case is characterized by the existence of a finite or enumerable set of points (x_{1i},x_{2i}), i = 1,2,..., and associated positive numbers p_i (probabilities), Σ p_i = 1, such that F(x_1,x_2) = Σ p_j, summed for all j for which x_{1j} ≤ x_1 and x_{2j} ≤ x_2. From theorem (C) it follows that Pr(X_1 = x_{1i}, X_2 = x_{2i}) = p_i, and for any point (x_1,x_2) not in the set {(x_{1i},x_{2i})}, Pr(X_1 = x_1, X_2 = x_2) = 0.

ii) By the continuous case (see remarks in §2.11 about absolute continuity) we shall understand that in which there exists a function f(x_1,x_2) ≥ 0 such that

(a)  F(x_1,x_2) = ∫_{-∞}^{x_2} ∫_{-∞}^{x_1} f(ξ_1,ξ_2) dξ_1 dξ_2.

We may show that

Pr(X_1,X_2 ∈ J) = ∫∫_J f(x_1,x_2) dx_1 dx_2,
*In k-dimensional space a Borel-measurable region (or a Borel set) is one that is obtainable from half-open intervals or cells, x_i' < x_i ≤ x_i'', i = 1,2,...,k, by taking a finite or enumerable number of sums, differences, and products of such cells. A function f(x) is Borel-measurable if the set of values of x for which a < f(x) ≤ b is a Borel set, where a and b are any two real numbers. A Borel-measurable function of two or more variables is similarly defined.
**As in the case of one variable, it should be observed that an empirical c. d. f. F_n(x_1,x_2) of two observation variables X_1, X_2 has properties (1), (2), (3) of a c. d. f. for two random variables (discrete case). But in (1') and (2') one would use the term "proportion of cases" instead of the term "probability of". The p_i associated with the isolated points (x_{1i},x_{2i}) would be called the proportion of cases for which X_1 = x_{1i}, X_2 = x_{2i}, instead of the probability that X_1 = x_{1i}, X_2 = x_{2i}. The number of such points would be ≤ n, the number of observed pairs of values of X_1, X_2.

This comparison of an empirical c. d. f. and the case of discrete variables extends at once to the case of k variables discussed in §2.13.
and that the result is not invalidated if J is closed by the addition of its boundaries. From this it follows that, except for infinitesimals of higher order,

Pr(x_1 < X_1 ≤ x_1+dx_1, x_2 < X_2 ≤ x_2+dx_2) = f(x_1,x_2) dx_1 dx_2;

f(x_1,x_2) and f(x_1,x_2) dx_1 dx_2 are called respectively the p. d. f.* and the probability element of the random variables X_1, X_2.
iii) The mixed** case (X_1 continuous, X_2 discrete) is said to obtain if there exists a finite or enumerable set of lines x_2 = x_{2i}, i = 1,2,..., associated positive numbers p_{2i}, Σ p_{2i} = 1, and a non-negative function of x_1 and x_2 defined for all x_1 and for x_2 = x_{2i}, i = 1,2,..., which function we shall write as f(x_1|x_2), such that

∫_{-∞}^{+∞} f(ξ_1|x_{2i}) dξ_1 = 1,  i = 1,2,...,

and

F(x_1,x_2) = Σ p_{2j} ∫_{-∞}^{x_1} f(ξ_1|x_{2j}) dξ_1,  summed over all j for which x_{2j} ≤ x_2.

In the mixed case p_{2i} is the probability that the random point (X_1,X_2) will fall on the line x_2 = x_{2i}, and f(x_1|x_{2i}) dx_1 is the probability (to within terms of order dx_1) that x_1 < X_1 ≤ x_1+dx_1 if the random point falls on the line x_2 = x_{2i}.
It may be shown from our postulates that for any (B-meas.) region E in the x_1,x_2 plane we get in the three cases

i) Pr(X_1,X_2 ∈ E) = Σ p_j, summed over all j such that (x_{1j},x_{2j}) ∈ E;

ii) Pr(X_1,X_2 ∈ E) = ∫∫_E f(x_1,x_2) dx_1 dx_2;

iii) Pr(X_1,X_2 ∈ E) = Σ_i p_{2i} ∫_{E_{2i}} f(x_1|x_{2i}) dx_1, where E_{2i} is the projection on the x_1 axis of the part of the line x_2 = x_{2i} lying in E. (If the line does not intersect E, the corresponding integral is zero.)

By means of the Stieltjes integral (§2.5) these three cases may be brought under the single expression Pr(X_1,X_2 ∈ E) = ∫_E dF(x_1,x_2), which includes indeed the most general case.
2.13 k-Variate Case

A k-variate c. d. f. F(x_1,x_2,...,x_k) must satisfy the following three postulates: Let J be the k-dimensional cell x_i' < x_i ≤ x_i'', i = 1,2,...,k, and define the k-th difference
*Probability density function.

**The reader will understand this case better if he rereads this description after having mastered §2.4.
Δ_J F(x_1,x_2,...,x_k) = Δ_1 Δ_2 ··· Δ_{k-1} Δ_k F(x_1,x_2,...,x_k),

where the operators Δ_i are applied successively and denote

Δ_i F = F(x_1,...,x_i'',...,x_k) - F(x_1,...,x_i',...,x_k).

The postulates are:

1) Δ_J F(x_1,x_2,...,x_k) ≥ 0.

2) F(-∞,x_2,...,x_k) = F(x_1,-∞,x_3,...,x_k) = ... = F(x_1,...,x_{k-1},-∞) = 0,  F(+∞,+∞,...,+∞) = 1.

3) F(x_1,x_2,...,x_k) = F(x_1+0,x_2,...,x_k) = ... = F(x_1,...,x_{k-1},x_k+0).

As in the bivariate case it can be shown from (1) and (2) that F is monotonic in each variable separately and that F is monotonic (in the sense of (1)) in any set of variables if the remainder are held fixed.
A random vector variable X = (X_1,X_2,...,X_k) is said to have the c. d. f. F(x_1,x_2,...,x_k) (or the random variables X_1,X_2,...,X_k are said to be jointly distributed with the c. d. f.) if furthermore

1') Pr(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_k ≤ x_k) = F(x_1,x_2,...,x_k).

If E_1, E_2, ..., are a finite or enumerable number of disjoint sets,

2') Pr(X ∈ E_1+E_2+...) = Pr(X ∈ E_1) + Pr(X ∈ E_2) + ...

By the methods used before we may now generalize the theorems (A) to (F) of §2.11 and §2.12.

The discrete case and the continuous case are defined by obvious generalization of §2.12, and it is evident how mixed cases of various orders would now be defined.
2.2 Marginal Distributions

Suppose the joint c. d. f. of the random variables X_1,X_2 is F(x_1,x_2), and consider the probability that X_1 ≤ x_1, without any condition on X_2:

Pr(X_1 ≤ x_1) = Pr(X_1 ≤ x_1, X_2 < +∞) = F(x_1,+∞).

This is called the marginal distribution of X_1. We note that it is a bona fide distribution function as defined in §2.11; in fact, it is the univariate c. d. f. of X_1. Similarly, we define F(+∞,x_2) as the marginal distribution of X_2.

For the discrete case defined in §2.12 we then have

F(x_1,+∞) = Σ p_j, summed for all j such that x_{1j} ≤ x_1.
For the continuous case,

(a)  F(x_1,+∞) = ∫_{-∞}^{x_1} [ ∫_{-∞}^{+∞} f(ξ_1,ξ_2) dξ_2 ] dξ_1 = ∫_{-∞}^{x_1} f_1(ξ_1) dξ_1,

where

f_1(x_1) = ∫_{-∞}^{+∞} f(x_1,x_2) dx_2.

f_1(x_1) may be called the marginal p. d. f. of X_1.

In the trivariate case we get, besides the marginal distribution of each random variable separately, for example

F(x_1,+∞,+∞),

also marginal distributions of pairs of random variables, for example

F(x_1,x_2,+∞).

For a k-variate distribution one likewise defines marginal distributions of the random variables taken one at a time, in pairs, ..., k-1 at a time. We note that all these marginal distributions satisfy the postulates (1), (2), (3), (1'), (2') for a c. d. f.
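The computation in (a) can be mimicked numerically. The joint density below (f(x_1,x_2) = x_1 + x_2 on the unit square) is a made-up example, not one from the text; its marginal works out to f_1(x_1) = x_1 + 1/2, and the sketch checks this by integrating out x_2 with a midpoint rule:

```python
# Sketch (the joint density is a made-up example, not from the text): the
# marginal p. d. f. f1(x1) is obtained by integrating out x2 as in (a).
def f(x1, x2):
    # hypothetical joint p. d. f. on the unit square 0 < x1, x2 < 1
    return x1 + x2

def f1(x1, n=10_000):
    # integrate out x2 over (0, 1) by the midpoint rule
    return sum(f(x1, (j + 0.5) / n) for j in range(n)) / n

# analytically the marginal is f1(x1) = x1 + 1/2
for x1 in (0.2, 0.5, 0.9):
    assert abs(f1(x1) - (x1 + 0.5)) < 1e-6
print("marginal p. d. f. confirmed")
```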
2.3 Statistical Independence

If F(x_1,x_2) is the c. d. f. of X_1,X_2, then from §2.2,

F_1(x_1) = F(x_1,+∞),  F_2(x_2) = F(+∞,x_2)

are the marginal distributions of X_1 and X_2, respectively. We say that the random variables X_1,X_2 are independent in the probability sense, or statistically independent, if

(a)  F(x_1,x_2) = F_1(x_1) F_2(x_2).

It is easily seen that a necessary and sufficient condition for the statistical independence of X_1 and X_2 is that their joint c. d. f. factor into a function of x_1 alone times a function of x_2 alone, i. e.,

F(x_1,x_2) = G(x_1) H(x_2).

In order to see the probability implications of statistical independence, consider any two intervals I_1 and I_2 on the x_1- and x_2-axes, respectively,

I_1: x_1' < x_1 ≤ x_1'',  I_2: x_2' < x_2 ≤ x_2'',
and let J be the rectangle of points (x_1,x_2) satisfying both these inequalities. Then

(b)  Pr(X_1 ∈ I_1, X_2 ∈ I_2) = Pr(X_1 ∈ I_1) Pr(X_2 ∈ I_2).

For, by hypothesis, we have (a); hence

Pr(X_1,X_2 ∈ J) = Δ_J F(x_1,x_2) = F_1(x_1'')F_2(x_2'') + F_1(x_1')F_2(x_2') - F_1(x_1')F_2(x_2'') - F_1(x_1'')F_2(x_2').

After factoring the right member we easily get (b).
By the same method, and with the aid of Theorem (B) of §2.11 and Theorem (C) of §2.12, we get that if X_1 and X_2 are statistically independent, then

Pr(X_1 = x_1, X_2 = x_2) = Pr(X_1 = x_1) Pr(X_2 = x_2).

This is of importance for the discrete case. For the continuous case we may state the following result: If f(x_1,x_2) is the joint p. d. f. of X_1,X_2, if f_j(x_j) is the marginal p. d. f. of X_j, j = 1,2, and if X_1,X_2 are statistically independent, then

f(x_1,x_2) = f_1(x_1) f_2(x_2)

wherever f(x_1,x_2) is continuous. At the points of continuity we have, from equation (a) of §2.12,

f(x_1,x_2) = ∂²F(x_1,x_2)/∂x_1 ∂x_2 = (dF_1(x_1)/dx_1)(dF_2(x_2)/dx_2) = f_1(x_1) f_2(x_2),

the last step following from (a) of §2.2.

k random variables are said to be mutually (statistically) independent if their joint c. d. f. is of the form

F(x_1,x_2,...,x_k) = F_1(x_1) F_2(x_2) ... F_k(x_k),

where F_j(x_j) is the marginal distribution of X_j. Two random vector variables X_i = (X_{i1},X_{i2},...,X_{ik_i}), i = 1,2, are called statistically independent if the joint c. d. f. of the k_1 + k_2 components is the product of the marginal distributions of X_1 and X_2:

F(x_{11},...,x_{1k_1}; x_{21},...,x_{2k_2}) = F_1(x_{11},...,x_{1k_1}) F_2(x_{21},...,x_{2k_2}).

The definition of the statistical independence of n vector random variables is made as
the obvious generalization.
The concept of statistical independence is fundamental in sampling theory: n random variables are said to constitute a random sample from a population (§4.1) with c. d. f. F(x) if their joint c. d. f. is F(x_1)F(x_2)...F(x_n). If the population distribution is k-variate with c. d. f. F(x_1,x_2,...,x_k), then the n vector variables X_i = (X_{i1},X_{i2},...,X_{ik}), i = 1,2,...,n, are said to be a random sample if the joint c. d. f. of the set {X_{iα}} is

∏_{i=1}^{n} F(x_{i1},x_{i2},...,x_{ik}).
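The factorization (b) can be checked empirically. The sketch below (an illustration, not from the text) draws a random sample of independent uniform pairs and compares the relative frequency of a rectangle with the product of the one-dimensional frequencies:

```python
import random

# Sketch (illustrative, not from the text): for statistically independent
# variables, the rectangle probability (b) factors into the product of the
# one-dimensional probabilities.  Independent uniforms on (0, 1) are used.
random.seed(1)
n = 200_000
pairs = [(random.random(), random.random()) for _ in range(n)]

both = sum(0.2 < x1 <= 0.7 and 0.5 < x2 <= 0.9 for x1, x2 in pairs) / n
p1 = sum(0.2 < x1 <= 0.7 for x1, _ in pairs) / n
p2 = sum(0.5 < x2 <= 0.9 for _, x2 in pairs) / n

assert abs(both - p1 * p2) < 0.01     # factorization, empirically
assert abs(both - 0.5 * 0.4) < 0.01   # exact value 0.20 for uniforms
print("independence factorization confirmed")
```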
2.4 Conditional Probability

Let X be a random variable, and let R be any (Borel) set of points on the x-axis. Let E be any (Borel) subset of R. If Pr(X ∈ R) ≠ 0, we define the conditional probability Pr(X ∈ E | X ∈ R), read "the probability that X is in E, given that X is in R", as

(a)  Pr(X ∈ E | X ∈ R) = Pr(X ∈ E) / Pr(X ∈ R).

The definition (a) extends immediately to any finite number of random variables. For example, for two random variables X_1, X_2, R would represent a (Borel) set in the x_1,x_2 plane and E would be a subset of R.

Of particular interest is the case in which R is a set in the x_1,x_2 plane for which X_1 ∈ E_1, where E_1 is any (Borel) set in the domain of X_1, and E is the product or intersection set between R and a similar set for which X_2 ∈ E_2, where E_2 is any (Borel) set in the domain of X_2. Here we may write E = E_1E_2. The simplest case is that in which E_1 is an interval x_1' < x_1 ≤ x_1'' and E_2 is an interval x_2' < x_2 ≤ x_2''. Then R is the horizontal strip x_2' < x_2 ≤ x_2'', and E is the rectangle for which x_1' < x_1 ≤ x_1'' and x_2' < x_2 ≤ x_2''. In the present case, expression (a) may be written in the form

(b)  Pr(X_1 ∈ E_1 | X_2 ∈ E_2) = Pr(X_1,X_2 ∈ E_1E_2) / Pr(X_2 ∈ E_2).

Because of symmetry, we may also write

Pr(X_2 ∈ E_2 | X_1 ∈ E_1) = Pr(X_1,X_2 ∈ E_1E_2) / Pr(X_1 ∈ E_1).

In a similar manner we may write for the case of three variables

Pr(X_1 ∈ E_1, X_2 ∈ E_2 | X_3 ∈ E_3) = Pr(X_1,X_2,X_3 ∈ E_1E_2E_3) / Pr(X_3 ∈ E_3),

and so on for any number of variables. The relation (b) may of course be expressed in
terms of distribution functions. In particular, if X_1,X_2 have a bivariate p. d. f. f(x_1,x_2), and E_1 is the set

(c)  x_1' < x_1 ≤ x_1''

on the x_1-axis, and E_2 is the set

(d)  x < x_2 ≤ x + h

on the x_2-axis, then E is the rectangle in the x_1,x_2-plane defined by (c) and (d). Equation (b) becomes

(e)  Pr(X_1 ∈ E_1 | X_2 ∈ E_2) = [ ∫_{x_1'}^{x_1''} ∫_x^{x+h} f(ξ_1,ξ_2) dξ_2 dξ_1 ] / [ ∫_{-∞}^{+∞} ∫_x^{x+h} f(ξ_1,ξ_2) dξ_2 dξ_1 ],

if the denominator does not vanish. If f(x_1,x_2) is continuous in the rectangle E, the denominator may be written

h f_2(ξ_2), where x < ξ_2 ≤ x + h,

and the numerator,

h ∫_{x_1'}^{x_1''} f(x_1, η_2(x_1)) dx_1, where x < η_2(x_1) ≤ x + h.

(e) may then be written

(f)  ∫_{x_1'}^{x_1''} [ f(x_1, η_2(x_1)) / f_2(ξ_2) ] dx_1.

We note that the integrand, for fixed x and h, has the properties of a univariate p. d. f. We next assume that f_2(x_2) ≠ 0. Noting that Pr(x_1' < X_1 ≤ x_1'' | X_2 = x_2) is not defined by (b), we now define it as the limit of (e) as h → 0. The continuity we have already assumed is sufficient to justify taking limits under the integral sign in (f); the result is

Pr(x_1' < X_1 ≤ x_1'' | X_2 = x_2) = ∫_{x_1'}^{x_1''} f(x_1|x_2) dx_1,

where

(g)  f(x_1|x_2) = f(x_1,x_2) / f_2(x_2).
For fixed x_2, f(x_1|x_2) again has the properties of a univariate p. d. f.; it may be called the conditional p. d. f. of x_1, given x_2. We note that if X_1 and X_2 are statistically independent,

f(x_1|x_2) = f_1(x_1).

Likewise, if random variables X_{11},...,X_{1k}; X_{21},...,X_{2k} have a joint p. d. f. f(x_{11},...,x_{1k}; x_{21},...,x_{2k}), we define the conditional p. d. f.

(h)  f(x_{11},...,x_{1k} | x_{21},...,x_{2k}) = f(x_{11},...,x_{1k}; x_{21},...,x_{2k}) / ∫···∫ f(x_{11},...,x_{1k}; x_{21},...,x_{2k}) dx_{11}···dx_{1k},

if the denominator is not zero.
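A quick numerical check of (g): for a hypothetical joint density (f(x_1,x_2) = x_1 + x_2 on the unit square, not an example from the text), the conditional p. d. f. f(x_1|x_2) = f(x_1,x_2)/f_2(x_2) should integrate to 1 in x_1 for each fixed x_2:

```python
# Sketch (the joint density is a made-up example, not from the text): the
# conditional p. d. f. f(x1 | x2) = f(x1, x2) / f2(x2) integrates to 1 in x1
# for each fixed x2.
def f(x1, x2):
    # hypothetical joint p. d. f. on the unit square 0 < x1, x2 < 1
    return x1 + x2

n = 10_000

def f2(x2):
    # marginal p. d. f. of X2: integrate out x1 (midpoint rule)
    return sum(f((i + 0.5) / n, x2) for i in range(n)) / n

for x2 in (0.25, 0.75):
    total = sum(f((i + 0.5) / n, x2) / f2(x2) for i in range(n)) / n
    assert abs(total - 1.0) < 1e-9
print("conditional p. d. f.s integrate to 1")
```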
2.5 The Stieltjes Integral

An important tool in mathematical statistics, which often permits a common treatment of the discrete and continuous cases (and indeed the most general case), is the Stieltjes integral.
2.51 Univariate Case

We begin by defining the Stieltjes integral over a finite half-open interval a < x ≤ b: Suppose we have two functions, φ(x) continuous for a ≤ x ≤ b, and F(x) monotone for a ≤ x ≤ b. We subdivide (a,b) into subintervals I_j: (x_{j-1},x_j) by means of points x_0 = a < x_1 < x_2 < ... < x_m = b. In each interval we pick an arbitrary point ξ_j ∈ I_j. Denote by Δ_j F(x) the difference F(x_j) - F(x_{j-1}), and form the sum

S = Σ_{j=1}^{m} φ(ξ_j) Δ_j F(x).

If U_j is the maximum of φ(x) in I_j, and L_j the minimum, then

S_L ≤ S ≤ S_U,

where

S_U = Σ_j U_j Δ_j F(x),  S_L = Σ_j L_j Δ_j F(x).

Let ε = max_j (U_j - L_j). Then

(a)  0 ≤ S_U - S_L = Σ_j (U_j - L_j) Δ_j F(x) ≤ ε Σ_j Δ_j F(x) = ε [F(b) - F(a)].
Hence if the intervals I_j are further subdivided, and this process is continued in such a way that the norm of the subdivision, δ = max_j (x_j - x_{j-1}), approaches zero, then since φ(x) is uniformly continuous on (a,b), ε → 0, and hence S_U - S_L → 0. It is easily seen that S_L is non-decreasing, and S_U non-increasing, as the subdivision is made finer, and hence from (a), S approaches a limit. Since S_U and S_L are independent of the choice of the arbitrary point ξ_j in I_j, therefore from (a), lim_{δ→0} S is likewise independent of this choice. Furthermore, lim_{δ→0} S may be shown to be independent of the method of subdivision. We call this limit the Stieltjes integral of φ(x) with respect to F(x) over the range a < x ≤ b and denote it by

∫_a^b φ(x) dF(x) = lim_{δ→0} S.
Let us examine further the significance of the Stieltjes integral when F(x) is a c. d. f. in the discrete or continuous cases: Suppose that F(x) is a discrete c. d. f. with only a finite number n of jumps of amount p_k at the points a_k in the interval (a,b). We may assume that the points are ordered,

(b)  a < a_1 < a_2 < ... < a_n ≤ b.

Since the points a_k are isolated, eventually for any mode of continued subdivision each interval I_j will contain not more than one point a_k in its interior or as right end point. If I_j contains a_k, that is, if x_{j-1} < a_k ≤ x_j, denote it by I_{j_k}, and call the arbitrary point ξ in this interval ξ_{j_k}. Then

Δ_j F = p_k if I_j = I_{j_k},  Δ_j F = 0 if I_j is not an I_{j_k}.

Hence

S = Σ_{k=1}^{n} φ(ξ_{j_k}) p_k.

Now as the norm δ → 0, ξ_{j_k} → a_k, φ(ξ_{j_k}) → φ(a_k), and thus

(c)  ∫_a^b φ(x) dF(x) = Σ_{k=1}^{n} φ(a_k) p_k.

It will be noted that the continuity of φ(x) at the points a_k is essential. The result (c) may be shown to remain valid in the case where there is an infinite number of points of discontinuity of F(x) in (a,b).
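The result (c) can be illustrated by computing a Riemann-Stieltjes sum directly for a small discrete c. d. f. (the three-point distribution below is a made-up example) and comparing it with Σ φ(a_k) p_k:

```python
# Sketch (the three-point distribution is a made-up example): for a discrete
# c. d. f. with jumps p_k at points a_k, a Riemann-Stieltjes sum over (0, 3]
# reproduces the weighted sum in (c).
points = [(0.5, 0.2), (1.5, 0.5), (2.5, 0.3)]    # (a_k, p_k)

def F(x):
    # discrete c. d. f.: total jump at or below x
    return sum(p for a, p in points if a <= x)

def phi(x):
    return x * x

a, b, m = 0.0, 3.0, 30_000
S = 0.0
for j in range(m):
    xl = a + j * (b - a) / m
    xr = a + (j + 1) * (b - a) / m
    S += phi(xr) * (F(xr) - F(xl))   # xi_j taken as the right endpoint

exact = sum(phi(ak) * pk for ak, pk in points)
assert abs(S - exact) < 1e-3
print(exact)                          # 0.25*0.2 + 2.25*0.5 + 6.25*0.3 = 3.05
```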
In the continuous case, at points of continuity of the p. d. f. f(x) we have

dF(x)/dx = f(x),  dF(x) = f(x) dx,

and hence we might write heuristically

(d)  ∫_a^b φ(x) dF(x) = ∫_a^b φ(x) f(x) dx.

The relation (d) may be proved as follows: We first assume that f(x) is continuous on (a,b). Then in each interval I_j we pick as ξ_j the point for which

Δ_j F(x) = f(ξ_j)(x_j - x_{j-1}).

The existence of such a point is guaranteed by the mean value theorem. Then

S = Σ_j φ(ξ_j) f(ξ_j)(x_j - x_{j-1}).

But by the so-called fundamental theorem of the calculus (actually, the definition of the ordinary definite integral), the limit of this sum as the norm approaches zero is the right member of (d). The proof can be extended to the case where f(x) has discontinuities on (a,b).
We shall have need of the Stieltjes integral over an infinite interval. We define it as

(e)  ∫_{-∞}^{+∞} φ(x) dF(x) = lim ∫_a^b φ(x) dF(x)  as a → -∞ and b → +∞,

if and only if the limit exists as a → -∞ and b → +∞ independently. In more advanced work it is sometimes convenient to consider

(f)  lim_{T→+∞} ∫_{-T}^{+T} φ(x) dF(x).

This limit of course exists whenever (e) does, but the converse is false. (f) is called the Cauchy principal value of the infinite integral. Unless the contrary is explicitly stated, we shall always understand that the infinite integral connotes (e).

An intuitive explanation of the meaning of the Stieltjes integral will be given in §2.53, where we shall also indicate how the Stieltjes integral may be generalized over any range which is a Borel set E. In the univariate case, the various expressions for Pr(X ∈ E) introduced in §2.11 may then all be summarized under

Pr(X ∈ E) = ∫_E dF(x).
2.52 Bivariate Case

We limit our definition to the case where F(x_1,x_2) is a c. d. f. as defined in §2.12. Let J be the half-open cell

(a)  J: a_1 < x_1 ≤ b_1,  a_2 < x_2 ≤ b_2.

We assume φ(x_1,x_2) is continuous on J (boundaries included). By means of lines parallel to the axes, subdivide J into rectangles J_j, j = 1,2,...,m. Let the norm δ of the subdivision be the maximum of the lengths of the diagonals of the J_j. In each cell J_j pick a point (ξ_{1j},ξ_{2j}). Define Δ_j F(x_1,x_2), the second difference of F(x_1,x_2) over the j-th cell, as in §2.12, and form the sum

S = Σ_{j=1}^{m} φ(ξ_{1j},ξ_{2j}) Δ_j F(x_1,x_2).

By considering the upper and lower sums S_U and S_L, defined as in §2.51, we find again that lim_{δ→0} S exists, and define it to be the Stieltjes integral of φ with respect to F over J:

(b)  ∫_J φ(x_1,x_2) dF(x_1,x_2) = lim_{δ→0} S.

The remarks in §2.51 regarding the independence of (b) of the choice of (ξ_{1j},ξ_{2j}) and of the mode of subdivision remain valid.

As in §2.51 it may be shown that in the discrete case

∫_J φ(x_1,x_2) dF(x_1,x_2) = Σ_i φ(x_{1i},x_{2i}) p_i,

where (x_{1i},x_{2i}) are the points in J (excluding the left and lower boundaries) where the probabilities are p_i (see §2.12). In the continuous case we may derive

∫_J φ(x_1,x_2) dF(x_1,x_2) = ∫_{a_2}^{b_2} ∫_{a_1}^{b_1} φ(x_1,x_2) f(x_1,x_2) dx_1 dx_2.

In the mixed case defined in §2.12, and in the notation employed there, it may be shown that

∫_J φ(x_1,x_2) dF(x_1,x_2) = Σ_i p_{2i} ∫_{a_1}^{b_1} φ(x_1,x_{2i}) f(x_1|x_{2i}) dx_1,  summed for all i such that a_2 < x_{2i} ≤ b_2.
Denote by R_2 the entire x_1,x_2-space. We say that the improper integral

∫_{R_2} φ(x_1,x_2) dF(x_1,x_2)

exists if and only if the limit

lim ∫_J φ(x_1,x_2) dF(x_1,x_2)  as a_1, a_2 → -∞ and b_1, b_2 → +∞

exists, where J, a_i, b_i are related by (a), as a_1, a_2, b_1, b_2 independently become infinite (with the signs indicated).

A generalization of the Stieltjes integral to regions more general than rectangles will be given in §2.53.
2.53 k-Variate Case

We first define the Stieltjes integral over a half-open cell,

(a)  J: a_i < x_i ≤ b_i,  i = 1,2,...,k.

We assume that F(x_1,x_2,...,x_k) is a k-variate c. d. f. as defined in §2.13, and that φ(x_1,x_2,...,x_k) is continuous in J (and on its boundaries). By means of hyperplanes x_i = constant, i = 1,2,...,k, we subdivide J into cells J_j, j = 1,2,...,m. Let δ be the length of the longest of the diagonals of the cells J_j. Define Δ_j F, the k-th difference of F over the cell J_j, as in §2.13, and form the sum

S = Σ_{j=1}^{m} φ(ξ_{1j},...,ξ_{kj}) Δ_j F,

where (ξ_{1j},...,ξ_{kj}) is an arbitrary point in J_j. Under the hypotheses we have made, S converges to a limit independent of the choice of (ξ_{1j},...,ξ_{kj}) and of the mode of subdivision, as δ → 0. We define

∫_J φ(x_1,...,x_k) dF(x_1,...,x_k) = lim_{δ→0} S.

Let R_k be the entire x_1,...,x_k-space. The Stieltjes integral of φ with respect to F over R_k is defined as in §2.52.
Next, let us define the integral over a region K which is the sum of a finite or enumerable number of half-open cells J_i, i = 1,2,...:

∫_K φ dF = Σ_i ∫_{J_i} φ dF.

To define the integral over any (B-measurable) region E in R_k, we cover E with a region of the type K just considered, and then take as the integral over E the greatest lower bound of the integral over K for all possible K containing E:

∫_E φ dF = g.l.b._{K ⊃ E} ∫_K φ dF.
In terms of our general definition of the Stieltjes integral we see that

∫_a^b φ(x) dF(x) = ∫_I φ(x) dF(x)

only if I is the half-open interval a < x ≤ b. For the closed interval we would have to add φ(a)[F(a)-F(a-0)] = φ(a)Pr(X=a) to the left member; for the open interval, subtract φ(b)[F(b)-F(b-0)] = φ(b)Pr(X=b).

Specializing now to the discrete case, we may say that the most general such case can be described as follows: There is a finite or enumerable number of points (x_{1j},x_{2j},...,x_{kj}), j = 1,2,..., and associated positive numbers p_j, Σ_j p_j = 1, such that

F(x_1,...,x_k) = Σ p_j, summed over all j such that x_{1j} ≤ x_1, ..., x_{kj} ≤ x_k.

In this case

∫_E φ dF = Σ φ(x_{1s},...,x_{ks}) p_s, summed over all s such that (x_{1s},...,x_{ks}) ∈ E.

In the continuous case

∫_E φ dF = ∫_E φ f dV,

where dV is the volume element dx_1 dx_2 ... dx_k. In the most general case

∫_E dF = Pr(X ∈ E).
It is helpful for some of us to develop an intuitive feeling for the Stieltjes integral. Consider first an ordinary integral

∫_E h dV,

where h is continuous. We may conceive of the integral in a Leibnitzian (non-rigorous, but sometimes fruitful) sense: The k-dimensional volume E is partitioned into tiny volume elements dV. These are so small that the function h is "practically constant" over any dV. We multiply this "practically constant" value of h by the volume dV and sum over E. Now a c. d. f. F(x_1,...,x_k) defines a probability distribution over R_k, of which it is sometimes convenient to think as a mass distribution. We think of dF as being the amount of mass or probability in an infinitesimal volume element dV, whether it be concentrated at points, along curves or surfaces, or smeared out as a density. We weight the "practically constant" value of φ in dV with the amount dF of mass or probability, getting φ dF, and we sum over E. The reader may see that the definition of ∫_J φ dF over a half-open cell J is a rigorous polishing up of the process we have described: in place of dV we use the cell J_j; in place of dF we use Δ_j F, the probability that a random point be in J_j; we multiply not by the "practically constant" value of φ in J_j, but by any value it assumes in J_j; and finally, instead of merely summing, we take the limit of the sum.
2.6 Transformation of Variables

Suppose y = ψ(x) is a (B-meas.) function of x. Then if X is a random variable with c. d. f. F(x), Y = ψ(X) is also a random variable, with c. d. f. G(y) calculated as follows:

G(y) = Pr(Y ≤ y) = Pr(ψ(X) ≤ y) = ∫_{E_y} dF(x),

where E_y is the totality of points on the x-axis for which ψ(x) ≤ y.

More generally, suppose (X_1,X_2,...,X_k) is a random vector variable with c. d. f. F(x_1,x_2,...,x_k), and ψ_1,ψ_2,...,ψ_n are (B-meas.) functions of x_1,x_2,...,x_k, y_i = ψ_i(x_1,x_2,...,x_k). Then (Y_1,Y_2,...,Y_n), where Y_i = ψ_i(X_1,X_2,...,X_k), is a random vector variable with c. d. f.

G(y_1,y_2,...,y_n) = ∫_{E_{y_1,...,y_n}} dF(x_1,...,x_k),

where E_{y_1,...,y_n} is the region in R_k defined by ψ_i(x_1,x_2,...,x_k) ≤ y_i, i = 1,2,...,n.

It may be shown that if X_1,X_2 are random (possibly vector) variables, and if Y_1 = ψ_1(X_1), Y_2 = ψ_2(X_2) are (B-meas.) functions, then if X_1 and X_2 are statistically independent, so are the random variables Y_1 and Y_2.

Transformations of discrete variables offer no especial difficulties, so we consider in the following sections transformations in the continuous case.

The theorems obtained there are essentially corollaries to corresponding theorems on the transformation of integrals, single and multiple. Rigorous proofs of the theorems on integrals may be found in standard real variable texts. For the student in this course the insertion at this point of heuristic proofs which will strengthen his
intuitive grasp seems desirable, and accordingly we employ the infinitesimal arguments so useful in applied mathematics.
2.61 Univariate Case

Suppose X is a random variable with p. d. f. f(x). Let y = φ(x) be a monotone transformation having unique inverse x = φ⁻¹(y), and such that φ'(x) exists. Now consider a new random variable Y = φ(X). The problem here is to determine Pr(y < φ(X) < y+dy). Since y = φ(x) is monotone, it is clear that the values of x for which y < φ(x) < y+dy (dy > 0) will lie on an interval (x, x+dx) depending on y, where dx may be positive or negative according as φ(x) is monotone increasing or decreasing. Since x = φ⁻¹(y) is the inverse of the transformation y = φ(x), then, expressed in terms of y, the interval (x, x+dx) becomes (φ⁻¹(y), φ⁻¹(y+dy)). Hence the value of Pr(y < φ(X) < y+dy) is given by determining the value of Pr(x < X < x+dx) = Pr(φ⁻¹(y) < X < φ⁻¹(y+dy)) if dx > 0, or Pr(x+dx < X < x) = Pr(φ⁻¹(y+dy) < X < φ⁻¹(y)) if dx is negative. In either case the probability is, except for differentials of order higher than dy,

f(x)|dx|,

where x is to be expressed in terms of y. We may summarize as follows:
Theorem (A): Let X be a continuous random variable with probability element f(x)dx, and let y = φ(x) be a monotone transformation with inverse x = φ⁻¹(y) such that φ'(x) exists. Then, except for differentials of order higher than dy,

Pr(y < φ(X) < y+dy) = g(y) dy,

where g(y) = f(x)|dx/dy|, expressed in terms of y.

Example. Suppose

f(x)dx = e^{-x} dx for x > 0,  f(x)dx = 0·dx for x ≤ 0,

and that it is desired to find Pr(y < X² < y+dy), i. e., the probability element of Y = X², say g(y)dy. We have the transformation y = x², or x = √y, and hence

g(y)dy = e^{-x} |dx/dy| dy = e^{-√y} (1/(2√y)) dy.
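The example can be verified numerically. The sketch below (illustrative, not part of the text) checks g(y) = e^{-√y}/(2√y) in two ways: by quadrature, integrating g over (0,1) to reproduce Pr(Y ≤ 1) = Pr(X ≤ 1) = 1 - e^{-1}, and by squaring simulated exponential draws:

```python
import math
import random

def g(y):
    # density of Y = X^2 derived above, for X with p. d. f. e^{-x} on (0, inf)
    return math.exp(-math.sqrt(y)) / (2.0 * math.sqrt(y))

# Check 1: quadrature.  The midpoint rule sidesteps the integrable singularity
# at y = 0; the integral of g over (0, 1) should equal 1 - e^{-1} ≈ 0.632.
n = 200_000
quad = sum(g((i + 0.5) / n) / n for i in range(n))
assert abs(quad - (1 - math.exp(-1))) < 0.005

# Check 2: simulation.  Squaring exponential draws should give the same value.
random.seed(0)
draws = [random.expovariate(1.0) ** 2 for _ in range(100_000)]
frac = sum(d <= 1.0 for d in draws) / len(draws)
assert abs(frac - (1 - math.exp(-1))) < 0.01
print("probability element of Y = X^2 confirmed")
```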
2.62 Bivariate Case

Suppose

(a)  y_1 = y_1(x_1,x_2),  y_2 = y_2(x_1,x_2)

are functions of x_1, x_2 with continuous first partial derivatives. Let f(x_1,x_2) be the joint p. d. f. of X_1 and X_2. We shall assume further that the transformation (a) is one-to-one; that is, the relation between the x's and the y's is such that corresponding to each point in the x_1,x_2 plane (or that part of it for which the probability function f(x_1,x_2) ≠ 0) there is one and only one point in the y_1,y_2 plane, and each point in the y_1,y_2 plane which has a corresponding point in the x_1,x_2 plane has one and only one corresponding point in the x_1,x_2 plane, the relation between any point in the x_1,x_2 plane and its corresponding point in the y_1,y_2 plane being given by (a). Let the inverse of the transformation (a) be

(b)  x_1 = x_1(y_1,y_2),  x_2 = x_2(y_1,y_2),

and let the Jacobian of the transformation (b) be

(c)  J = ∂(x_1,x_2)/∂(y_1,y_2) = | ∂x_1/∂y_1  ∂x_1/∂y_2 |
                                 | ∂x_2/∂y_1  ∂x_2/∂y_2 |.

If X_1, X_2 are random variables, then Y_1 = y_1(X_1,X_2) and Y_2 = y_2(X_1,X_2) will also be random variables. The problem here is to determine the p. d. f. of Y_1 and Y_2, say g(y_1,y_2), from f(x_1,x_2), the p. d. f. of X_1, X_2, and the transformation (a). In other words, the problem is to determine

(d)  Pr(y_1 < Y_1 ≤ y_1+dy_1, y_2 < Y_2 ≤ y_2+dy_2)

to within terms of order dy_1 dy_2.

Consider the infinitesimal region R in the x_1,x_2 plane bounded by the curves whose equations are

(e)  y_1(x_1,x_2) = y_1,  y_1(x_1,x_2) = y_1+dy_1,  y_2(x_1,x_2) = y_2,  y_2(x_1,x_2) = y_2+dy_2,

where dy_1 > 0, dy_2 > 0.
The situation is represented in Figure 2. [Figure 2: the infinitesimal region R in the x_1,x_2 plane, bounded by the four curves (e), with vertices P_1, P_2, P_3, P_4.]
Now the probability (d) is given by

∫∫_R f(x_1,x_2) dx_1 dx_2.

By the mean value theorem for integrals the value of this integral is f(x_1*,x_2*) dA, where (x_1*,x_2*) is some point in R and dA is the area of R. We must now find an expression for dA.

If the coordinates of P_1 in Figure 2 are (x_1,x_2), then the coordinates of P_2, P_3, P_4 are

(f)  P_2: (x_1 + (∂x_1/∂y_1) dy_1,  x_2 + (∂x_2/∂y_1) dy_1),
     P_3: (x_1 + (∂x_1/∂y_2) dy_2,  x_2 + (∂x_2/∂y_2) dy_2),
     P_4: (x_1 + (∂x_1/∂y_1) dy_1 + (∂x_1/∂y_2) dy_2,  x_2 + (∂x_2/∂y_1) dy_1 + (∂x_2/∂y_2) dy_2),

except for infinitesimals of order higher than dy_1 and dy_2. To show this it is sufficient to consider only one point, say P_2. The exact coordinates of P_2 are

(x_1(y_1+dy_1, y_2),  x_2(y_1+dy_1, y_2)).

But x_1(y_1+dy_1,y_2) = x_1(y_1,y_2) + (∂x_1/∂y_1) dy_1 + terms of order (dy_1)² and higher, and x_2(y_1+dy_1,y_2) = x_2(y_1,y_2) + (∂x_2/∂y_1) dy_1 + terms of order (dy_1)² and higher. But (x_1(y_1,y_2), x_2(y_1,y_2)) are the coordinates of P_1, which have been indicated by (x_1,x_2), thus showing that the approximate coordinates of P_2 are those stated in (f). A similar argument holds for the approximate coordinates of P_3 and P_4.

It is clear that P_1, together with the points represented by the approximate coordinates of P_2, P_3, P_4 given by (f), form a parallelogram R'. Now it is known from coordinate geometry that if (x_1,x_2), (x_1'',x_2''), (x_1''',x_2''') are three vertices of a parallelogram, then the area of the parallelogram is given by the absolute value of the determinant

| 1  x_1     x_2     |
| 1  x_1''   x_2''   |
| 1  x_1'''  x_2'''  |.
Hence the area of the parallelogram R' is given by the absolute value of

(g)  | 1  x_1                      x_2                      |
     | 1  x_1 + (∂x_1/∂y_1) dy_1   x_2 + (∂x_2/∂y_1) dy_1   |
     | 1  x_1 + (∂x_1/∂y_2) dy_2   x_2 + (∂x_2/∂y_2) dy_2   |

   = [ (∂x_1/∂y_1)(∂x_2/∂y_2) - (∂x_1/∂y_2)(∂x_2/∂y_1) ] dy_1 dy_2 = J dy_1 dy_2.

But since the coordinates of the vertices of parallelogram R' differ from the corresponding coordinates of the corresponding vertices of R by terms of order higher than dy_1 or dy_2, it follows that the area of R (i. e., dA) differs from the area of R' by terms of order higher than dy_1 dy_2.
Since f(x_1,x_2), the p. d. f. of X_1, X_2, is continuous, we have that f(x_1*,x_2*) differs from f(x_1,x_2) by terms of order dy_1, dy_2, where (x_1*,x_2*) is any point in R. Therefore we have the result that the probability expressed by (d) is equal to

(h)  f(x_1,x_2) |J| dy_1 dy_2,

where the x's are to be expressed in terms of the y's by (b). It may be verified by the reader that

∂(x_1,x_2)/∂(y_1,y_2) = [ ∂(y_1,y_2)/∂(x_1,x_2) ]⁻¹,

the right member being expressed in terms of y_1, y_2 by (b).

We may summarize in the following:

Theorem (B): Let X_1, X_2 be two continuous random variables with p. d. f. f(x_1,x_2). Let y_1 = y_1(x_1,x_2), y_2 = y_2(x_1,x_2) be a transformation with a unique inverse x_1 = x_1(y_1,y_2), x_2 = x_2(y_1,y_2), such that the first partial derivatives of the y's with respect to the x's exist. If the random variables y_1(X_1,X_2) and y_2(X_1,X_2) are denoted by Y_1 and Y_2 respectively, then

Pr(y_1 < Y_1 ≤ y_1+dy_1, y_2 < Y_2 ≤ y_2+dy_2) = f(x_1,x_2) |J| dy_1 dy_2,

where x_1 and x_2 are expressed in terms of y_1, y_2 by (b), and J is given by (c).
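The reciprocal relation between ∂(x_1,x_2)/∂(y_1,y_2) and ∂(y_1,y_2)/∂(x_1,x_2) asserted above can be checked by finite differences; the linear map below (y_1 = x_1 + x_2, y_2 = x_1 - x_2) is a made-up illustration, not an example from the text:

```python
# Sketch (a made-up linear example, not from the text): check by central
# finite differences that d(x1,x2)/d(y1,y2) is the reciprocal of
# d(y1,y2)/d(x1,x2) for y1 = x1 + x2, y2 = x1 - x2, whose inverse is
# x1 = (y1 + y2)/2, x2 = (y1 - y2)/2.
def jac_det(f, u, v, h=1e-6):
    # 2x2 Jacobian determinant of f = (f1, f2) at (u, v)
    d11 = (f(u + h, v)[0] - f(u - h, v)[0]) / (2 * h)
    d12 = (f(u, v + h)[0] - f(u, v - h)[0]) / (2 * h)
    d21 = (f(u + h, v)[1] - f(u - h, v)[1]) / (2 * h)
    d22 = (f(u, v + h)[1] - f(u, v - h)[1]) / (2 * h)
    return d11 * d22 - d12 * d21

fwd = lambda x1, x2: (x1 + x2, x1 - x2)              # (y1, y2) as functions of x
inv = lambda y1, y2: ((y1 + y2) / 2, (y1 - y2) / 2)  # (x1, x2) as functions of y

J_fwd = jac_det(fwd, 0.3, 0.7)   # d(y1,y2)/d(x1,x2) = -2 (constant: map is linear)
J_inv = jac_det(inv, 1.0, -0.4)  # d(x1,x2)/d(y1,y2) = -1/2
assert abs(J_fwd * J_inv - 1.0) < 1e-6
print("Jacobians are reciprocal")
```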
Example: To illustrate the transformation problem for two random variables, suppose the probability element of X_1 and X_2 is

f(x_1,x_2) dx_1 dx_2 = (1/2π) e^{-(x_1² + x_2²)/2} dx_1 dx_2,

defined over the entire x_1,x_2 plane. To determine the p. d. f. of Y_1 and Y_2, where

Y_1 = √(X_1² + X_2²),  Y_2 = tan⁻¹(X_2/X_1).

The transformation involved here is

y_1 = √(x_1² + x_2²),  y_2 = tan⁻¹(x_2/x_1),

defined over that part of the y_1,y_2 plane for which y_1 > 0, 0 ≤ y_2 < 2π. The inverse of the transformation is

x_1 = y_1 cos y_2,  x_2 = y_1 sin y_2.

We have

J = | cos y_2   -y_1 sin y_2 |
    | sin y_2    y_1 cos y_2 |  = y_1.

Therefore by Theorem (B), the probability element of Y_1, Y_2 is

g(y_1,y_2) dy_1 dy_2 = (1/2π) y_1 e^{-y_1²/2} dy_1 dy_2.
2.63 k-Variate Case

Let the joint p. d. f. of X_1, X_2,...,X_k be f(x_1,x_2,...,x_k), and introduce new random variables Y_1, Y_2,...,Y_k by means of the one-to-one transformation

(a)  y_i = y_i(x_1,x_2,...,x_k),  i = 1,2,...,k.

Let the inverse (which will be unique) of this transformation be

(b)  x_i = x_i(y_1,y_2,...,y_k),  i = 1,2,...,k,

and its Jacobian

(c)  J = ∂(x_1,...,x_k)/∂(y_1,...,y_k) = det( ∂x_i/∂y_j ),  i,j = 1,2,...,k,

assuming, of course, that the first partial derivatives exist.
By pursuing an argument similar to that used in the bivariate case, we find that the probability element of the Y_i, say g(y_1,y_2,...,y_k) dy_1...dy_k, is given by

(d)  g(y_1,...,y_k) dy_1...dy_k = f(x_1,...,x_k) |J| dy_1...dy_k,

where the x's are to be expressed in terms of the y's by (b).

This covers the cases where the number n of new variables equals the number k of original variables. It can be shown that if n > k, there exists no p. d. f. for the n new variables. (Note here the complete generality of the treatment by means of the c. d. f. in §2.6.) If n < k the usual method of getting the p. d. f. of the new variables is to adjoin further variables to fill out the number of new variables to k, use the above procedure, and then "integrate out" the extra variables by getting the marginal distribution of the n variables whose p. d. f. is desired.
2.7 Mean Value

We begin with the definition of the mean value of a random variable in general and then consider in later sections the mean values of particular (random) functions of especial interest in statistics. If X is a random variable with c. d. f. F(x) we define the mean value of X as

(a)  E(X) = ∫_{-∞}^{+∞} x dF(x).

This is also called the expected value of X.

If Y = φ(X) is a continuous function of X, then the c. d. f. of Y is (§2.6)

G(y) = ∫_E dF(x),

where E is the set of points on the x-axis such that φ(x) ≤ y. From (a),

E(Y) = ∫_{-∞}^{+∞} y dG(y),

and this may be shown to be equivalent to

(b)  E[φ(X)] = ∫_{-∞}^{+∞} φ(x) dF(x).

If random variables X_1,X_2,...,X_k have the c. d. f. F(x_1,x_2,...,x_k), and y = φ(x_1,x_2,...,x_k) is continuous, then from the definition (a) it may be shown that
(c)  E[φ(X_1,X_2,...,X_k)] = ∫_{R_k} φ dF,

where R_k is the entire k-space. Of course, if the improper integral does not exist in the sense explained in §2.5 ff., we say that the mean value of φ does not exist. In the light of the intuitive discussion (§2.53) of the meaning of a Stieltjes integral, we see from (c) that the mean value of φ may be regarded as an average over k-space of the function φ, the average being taken over volume elements dV, with the weight assigned to each contribution being the total probability in dV.

For the discrete and continuous cases, the expressions (b) and (c) may be analyzed into the forms given in §2.51, §2.53.
2.71 Univariate Case; Tchebycheff's Inequality

The mean value of X^i,

μ'_i = E(X^i) = ∫_{-∞}^{+∞} x^i dF(x),    i = 0, 1, 2, ...,

is called the i-th moment of the distribution F(x) about the origin. μ'_0 = 1 for any F(x); μ'_1 = E(X) is called the mean of X, also the mean of the distribution, and is denoted by a. The i-th moment about the mean is defined to be

(a)    μ_i = E[(X - a)^i] = ∫_{-∞}^{+∞} (x - a)^i dF(x),    i = 0, 1, 2, ...

For any F(x), μ_0 = 1, μ_1 = 0. The variance of X, or the variance of the distribution, is defined to be μ_2, and is denoted by the special symbol σ_x²; σ_x > 0 is called the standard deviation of X or of the distribution. A formula for expressing μ_i in terms of μ'_i, μ'_{i-1}, ..., μ'_1 may be obtained by using the binomial theorem in (a) and then integrating term by term. In particular, we find that

σ_x² = μ'_2 - a².

An important theorem about arbitrary distributions with finite variance is contained in the Tchebycheff inequality:

(b)    Pr(|X - a| ≥ δσ_x) ≤ 1/δ².
To prove (b), we break up the integral for σ_x²:

(c)    σ_x² = ∫_{-∞}^{+∞} (x - a)² dF(x) = ∫_{I_1} + ∫_{I_2} + ∫_{I_3},

where the intervals I_1, I_2, I_3 are defined by

I_1:  -∞ < x ≤ a - δσ_x,
I_2:  a - δσ_x < x < a + δσ_x,
I_3:  a + δσ_x ≤ x < +∞.

Now in I_1 and I_3,

|x - a| ≥ δσ_x.

Hence

(d)    ∫_{I_1} (x - a)² dF(x) ≥ δ²σ_x² ∫_{I_1} dF(x).

Similarly,

(e)    ∫_{I_3} (x - a)² dF(x) ≥ δ²σ_x² ∫_{I_3} dF(x).

Finally,

(f)    ∫_{I_2} (x - a)² dF(x) ≥ 0.

Using (d), (e), (f) in (c), we get

σ_x² ≥ δ²σ_x² [∫_{I_1} dF(x) + ∫_{I_3} dF(x)] = δ²σ_x² Pr(|X - a| ≥ δσ_x).

This is easily seen to be equivalent to (b).
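As a numerical sketch (not from the text), the inequality (b) can be checked exactly on a small hypothetical discrete distribution by comparing the tail probability with the bound 1/δ²:

```python
import math

# hypothetical discrete distribution: values and their probabilities
xs = [0, 1, 2, 3, 10]
ps = [0.3, 0.3, 0.2, 0.15, 0.05]

a = sum(x * p for x, p in zip(xs, ps))               # mean
var = sum((x - a) ** 2 * p for x, p in zip(xs, ps))  # variance mu_2
sigma = math.sqrt(var)

d = 2.0                                              # delta
tail = sum(p for x, p in zip(xs, ps) if abs(x - a) >= d * sigma)
assert tail <= 1.0 / d ** 2                          # the Tchebycheff bound (b)
```

The bound is crude by design: here the exact tail probability is far below 1/δ², which is typical for distributions without heavy tails.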
2.72 Bivariate Case

For the distribution F(x_1, x_2) we define moments μ'_{ij} about the origin by

μ'_{ij} = E(X_1^i X_2^j) = ∫_{R_2} x_1^i x_2^j dF(x_1, x_2),    i, j = 0, 1, 2, ...,

where R_2 is the entire x_1 x_2-space. Since X_1 has the marginal distribution F_1(x_1), the mean of X_1 has already been defined in §2.71; we denote it by a_1. In view of the remarks in §2.7, we may calculate a_1 from either of the integrals

a_1 = ∫_{-∞}^{+∞} x_1 dF_1(x_1) = ∫_{R_2} x_1 dF(x_1, x_2).

Similar statements apply to a_2 = E(X_2). We note μ'_{00} = 1. The point (a_1, a_2) may be called the mean of the distribution. The moments μ_{ij} about the mean for F(x_1, x_2) are defined by

(a)    μ_{ij} = E[(X_1 - a_1)^i (X_2 - a_2)^j] = ∫_{R_2} (x_1 - a_1)^i (x_2 - a_2)^j dF(x_1, x_2),    i, j = 0, 1, 2, ...

For any F(x_1, x_2), μ_{00} = 1, μ_{10} = μ_{01} = 0. The variance of X_1 has already been defined in §2.71; we note that it is σ_{x_1}² = μ_{20}. Likewise, σ_{x_2}² = μ_{02}. The remaining second order moment, μ_{11}, is called the covariance of X_1 and X_2. The quotient

(b)    ρ_12 = μ_{11} / (σ_{x_1} σ_{x_2})

is called the correlation coefficient of X_1 and X_2. By means of the Schwarz inequality it may be shown that -1 ≤ ρ_12 ≤ 1. As an exercise the reader may show that if X_1 and X_2 are statistically independent, then ρ_12 = 0, but the converse is false.

The reader may also verify that a necessary and sufficient condition for ρ_12 = 1 is that all of the probability in the x_1 x_2-plane be concentrated along some straight line with positive slope. (For ρ_12 = -1 the slope must be negative.)

Formulas giving the moments about the mean in terms of the moments about the origin may again be obtained from (a); in particular, it is found that

μ_{20} = μ'_{20} - a_1²,    μ_{02} = μ'_{02} - a_2²,    μ_{11} = μ'_{11} - a_1 a_2,

and these expressions may then be substituted in (b) to evaluate the correlation coefficient in terms of the first and second order moments about the origin.
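The remark that ρ_12 = 0 does not imply independence can be illustrated with a small hypothetical discrete distribution in which X_2 = X_1² (complete dependence) yet the covariance μ_{11} vanishes:

```python
# Pr(X1 = x1, X2 = x2); note X2 = X1^2 on every mass point
joint = {(-1, 1): 0.25, (0, 0): 0.5, (1, 1): 0.25}

E = lambda g: sum(g(x1, x2) * p for (x1, x2), p in joint.items())
a1, a2 = E(lambda x1, x2: x1), E(lambda x1, x2: x2)
mu11 = E(lambda x1, x2: (x1 - a1) * (x2 - a2))        # covariance

assert mu11 == 0.0   # so the correlation coefficient is 0 ...
# ... yet X1 and X2 are not independent:
p_x1_0 = sum(p for (x1, _), p in joint.items() if x1 == 0)
p_x2_0 = sum(p for (_, x2), p in joint.items() if x2 == 0)
assert joint[(0, 0)] != p_x1_0 * p_x2_0
```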
2.73 k-Variate Case

The moments μ'_{i_1 i_2 ... i_k} of a distribution F(x_1, x_2, ..., x_k) about the origin are defined as

μ'_{i_1 i_2 ... i_k} = E(X_1^{i_1} X_2^{i_2} ··· X_k^{i_k}) = ∫_{R_k} x_1^{i_1} x_2^{i_2} ··· x_k^{i_k} dF,

where R_k is the complete k-space. For any F, μ'_{00...0} = 1. The mean of X_1, defined in §2.71, may now be seen to be μ'_{100...0}, and can be expressed also by means of integrals with respect to marginal distributions of various orders. We denote E(X_1) by a_1, and note that the above statements apply to a_2 = E(X_2), ..., a_k = E(X_k). The point (a_1, a_2, ..., a_k) is called the mean of the distribution, and the moments μ_{i_1 i_2 ... i_k} about the mean are defined to be

μ_{i_1 i_2 ... i_k} = E[(X_1 - a_1)^{i_1} ··· (X_k - a_k)^{i_k}] = ∫_{R_k} (x_1 - a_1)^{i_1} ··· (x_k - a_k)^{i_k} dF.

We note μ_{00...0} = 1. In order to simplify the notation, we specialize the following remarks to the variable X_1 or the pair X_1, X_2; their generalizations are obvious: μ_{100...0} = 0. The variance of X_1, defined in §2.71, is seen to be μ_{200...0}. The covariance of X_1 and X_2, defined in §2.72, is μ_{110...0}, and the correlation coefficient of X_1 and X_2 is

ρ_12 = μ_{110...0} / √(μ_{200...0} μ_{020...0}).

These quantities may all be expressed in terms of the first and second order moments about the origin.
2.74 Mean and Variance of a Linear Combination of Random Variables

Suppose we have k random variables X_1, X_2, ..., X_k, the c. d. f. of X_i being F_i(x_i). Let their joint c. d. f. be F(x_1, x_2, ..., x_k). F_i(x_i) is then the marginal distribution (§2.2) of X_i; if the X_i are mutually (statistically) independent,

F(x_1, x_2, ..., x_k) = ∏_{i=1}^k F_i(x_i),

but we shall not assume this. Let y = φ(x_1, x_2, ..., x_k) be a linear function,

(a)    y = Σ_{i=1}^k α_i x_i.

Then Y = φ(X_1, X_2, ..., X_k) = Σ_{i=1}^k α_i X_i is a random variable (§2.6); its c. d. f. G(y) is

G(y) = ∫_{R_y} dF(x_1, x_2, ..., x_k),

where R_y is the half-space defined by Σ_{i=1}^k α_i x_i ≤ y.

In accordance with the notation established in §2.73, denote the mean of X_i by a_i, its variance by σ_{x_i}², which we shall now abbreviate to σ_i², and the covariance of X_i and X_j by ρ_ij σ_i σ_j. Denote the mean of Y by a, its variance by σ_y².

It is helpful to note that E is a linear operator: if φ_1 and φ_2 are continuous functions of X_1, X_2, ..., X_k, and A and B are constants,

E(Aφ_1 + Bφ_2) = A E(φ_1) + B E(φ_2).

From this we get immediately, because of (a),

(b)    a = E(Y) = Σ_{i=1}^k α_i E(X_i) = Σ_{i=1}^k α_i a_i.

Note that for the validity of this result it is irrelevant whether or not the X_i are statistically independent.
Next let us calculate the variance of Y:

σ_y² = E[(Y - a)²] = E{[Σ_{i=1}^k α_i X_i - Σ_{i=1}^k α_i a_i]²} = E{[Σ_{i=1}^k α_i (X_i - a_i)]²}
     = Σ_{i=1}^k Σ_{j=1}^k α_i α_j E[(X_i - a_i)(X_j - a_j)],

so that

(c)    σ_y² = Σ_{i=1}^k Σ_{j=1}^k α_i α_j ρ_ij σ_i σ_j,

where ρ_ii = 1. If the X_i are mutually independent, then ρ_ij = 0 for i ≠ j, and

σ_y² = Σ_{i=1}^k α_i² σ_i².
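Formula (c) can be verified exactly on a small hypothetical joint distribution, by comparing the direct variance of Y = α_1 X_1 + α_2 X_2 with the expansion in variances and the covariance:

```python
# Pr(X1 = x1, X2 = x2) for a hypothetical two-variable discrete distribution
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}
a1, a2 = 2.0, -3.0                      # the coefficients alpha_1, alpha_2

def E(g):
    return sum(g(x1, x2) * p for (x1, x2), p in joint.items())

m1, m2 = E(lambda x1, x2: x1), E(lambda x1, x2: x2)
v1 = E(lambda x1, x2: (x1 - m1) ** 2)
v2 = E(lambda x1, x2: (x2 - m2) ** 2)
cov = E(lambda x1, x2: (x1 - m1) * (x2 - m2))

var_y_direct = E(lambda x1, x2: (a1*x1 + a2*x2 - (a1*m1 + a2*m2)) ** 2)
var_y_formula = a1**2 * v1 + a2**2 * v2 + 2 * a1 * a2 * cov   # formula (c)
assert abs(var_y_direct - var_y_formula) < 1e-12
```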
2.75 Covariance and Correlation between Two Linear Combinations of Random Variables

Suppose Y_1 and Y_2 are each linear combinations of random variables. The random variables in both combinations may be the same, or none of those appearing in Y_1 may appear in Y_2, or there may be an intermediate degree of overlapping. All of these cases may be covered by assuming that

Y_1 = Σ_{i=1}^k α_i X_i,    Y_2 = Σ_{i=1}^k β_i X_i,

where the α_i, β_i are constants and the X_i are random variables with joint c. d. f. F(x_1, x_2, ..., x_k). For example, the case of no overlapping would be obtained by requiring α_i β_i = 0, i = 1, 2, ..., k. If E(X_i) = a_i, then from (b) of §2.74,

E(Y_1) = Σ_{i=1}^k α_i a_i,    E(Y_2) = Σ_{i=1}^k β_i a_i.

Hence the covariance of Y_1 and Y_2 is

σ_{y_1 y_2} = E{[Y_1 - E(Y_1)][Y_2 - E(Y_2)]} = Σ_{i=1}^k Σ_{j=1}^k α_i β_j ρ_ij σ_i σ_j,

where σ_i² is the variance of X_i and ρ_ij σ_i σ_j is the covariance of X_i and X_j. Hence the correlation coefficient between Y_1 and Y_2 is

ρ_{y_1 y_2} = (Σ_{i=1}^k Σ_{j=1}^k α_i β_j ρ_ij σ_i σ_j) / (σ_{y_1} σ_{y_2}),

from (b) of §2.72 and (c) of §2.74. Special cases of this formula for the correlation coefficient are much used in education and psychology in connection with tests.
2.76 The Moment Problem

The general moment problem (univariate) is twofold: (i) given an infinite sequence of numbers 1, μ'_1, μ'_2, ..., does there exist a distribution with these numbers as moments? and if so, (ii) is the distribution unique? It is usually only problem (ii) that arises in statistics. It may be shown that whenever the moment generating function φ(θ) (see §2.8) exists for -h ≤ θ ≤ h, h > 0, there is a unique* distribution with the moments φ^(i)(0).

Necessary and sufficient conditions** for the unique determination of a distribution by its moments are extremely complicated, but the following theorem gives an easily applied sufficient condition of Carleman:

Theorem (A): A sufficient condition for the uniqueness of a distribution with moments μ'_i is that the series Σ_{m=1}^∞ (μ'_{2m})^{-1/2m} diverge.

For a multivariate distribution with moments μ'_{i_1 ... i_k}, define

(a)    λ_m = μ'_{2m,0,...,0} + μ'_{0,2m,0,...,0} + ··· + μ'_{0,...,0,2m}.

A sufficient condition of Cramér and Wold for uniqueness is Theorem (B), of which (A) may be regarded as a special case:

Theorem (B): If the series Σ_{m=1}^∞ (λ_m)^{-1/2m} diverges, where λ_m is defined by (a), then the distribution F(x_1, x_2, ..., x_k) is uniquely determined by its moments.

*φ then is analytic in a strip containing the imaginary axis, hence the characteristic function f(t) = φ(it) is analytic for all real t, and this is a sufficient condition for uniqueness in the moment problem: see P. Lévy, Théorie de l'addition des variables aléatoires, Monographies des probabilités, Paris, 1937, p. 41.
2.8 Moment Generating Functions

When the moment generating function (m. g. f.) of a distribution satisfies a certain condition given below, the moments of the distribution may easily be found by differentiation of the moment generating function. The use of the m. g. f. also permits the easy determination of the distribution of certain functions of certain random variables. We consider in detail the

2.81 Univariate Case

For any distribution F(x) we define the m. g. f. as

(a)    φ(θ) = E(e^{θX}) = ∫_{-∞}^{+∞} e^{θx} dF(x).

If we proceed heuristically, we may write

(b)    φ(θ) = ∫_{-∞}^{+∞} Σ_{i=0}^∞ (θ^i x^i / i!) dF(x) = Σ_{i=0}^∞ (θ^i / i!) ∫_{-∞}^{+∞} x^i dF(x) = Σ_{i=0}^∞ (θ^i / i!) μ'_i.

Let us now consider under what conditions μ'_i = φ^(i)(0).

In order that φ(θ), considered as a function of a real variable, possess derivatives at θ = 0, it is necessary that φ(θ) as defined by (a) exist in a neighborhood

**H. Hamburger, "Über eine Erweiterung des Stieltjesschen Momentenproblems", Math. Annalen, vol. 81 (1920), pp. 235-319, and vol. 82 (1921), pp. 120-164, 168-187.
-h ≤ θ ≤ h, h > 0. (Note that in any case φ(0) = 1 is defined by (a).) We see now that this restricts the class of functions F(x) under consideration. Our definition (§2.51) of the infinite integral ∫_{-∞}^{+∞} implies the existence of ∫_0^{+∞} and ∫_{-∞}^0. Hence as x → +∞, F(x) → 1 sufficiently rapidly so that

(c)    M_1 = ∫_0^{+∞} e^{hx} dF(x) < ∞,

and as x → -∞, F(x) → 0 sufficiently rapidly so that

(d)    M_2 = ∫_{-∞}^0 e^{-hx} dF(x) < ∞.

This means that F(x) possesses moments of all orders. To demonstrate the finiteness of

μ'_i = ∫_{-∞}^{+∞} x^i dF(x),

consider

∫_0^{+∞} x^i dF(x) = ∫_0^a x^i dF(x) + ∫_a^{+∞} (x^i e^{-hx}) e^{hx} dF(x).

Choose a so large that x^i e^{-hx} < 1 for x > a. Then the second term of the right member is less than M_1 defined by (c); the first term is certainly finite, and thus

∫_0^{+∞} x^i dF(x) < ∞.

Similarly, by use of (d) we may show

∫_{-∞}^0 |x|^i dF(x) < ∞,

and hence |μ'_i| < ∞ for all i.

We now state the heuristically obtained relation (b) in the form of

Theorem (A): If the m. g. f. φ(θ) of a c. d. f. F(x), as defined by (a), exists for -h ≤ θ ≤ h, where h > 0, then the i-th moment of F(x) about the origin is

μ'_i = φ^(i)(0),    i = 0, 1, 2, ...
The proof of this theorem may be based on the theory of the bilateral Laplace transform* and is beyond the level of this course.
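The content of Theorem (A) can be illustrated numerically (a sketch, not from the text): for the exponential distribution with density e^{-x} (x > 0), the m. g. f. is φ(θ) = 1/(1 - θ) for θ < 1, and the i-th moment is i!. Differentiating φ at θ = 0 by finite differences recovers these moments:

```python
def phi(t):
    # m.g.f. of the unit exponential distribution, valid for t < 1
    return 1.0 / (1.0 - t)

h = 1e-4
mu1 = (phi(h) - phi(-h)) / (2 * h)              # phi'(0)  -> first moment
mu2 = (phi(h) - 2 * phi(0) + phi(-h)) / h**2    # phi''(0) -> second moment
assert abs(mu1 - 1.0) < 1e-6    # 1! = 1
assert abs(mu2 - 2.0) < 1e-5    # 2! = 2
```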
The m. g. f., if it exists, is uniquely determined by (a). The converse** is stated in

Theorem (B): If F(x) has the m. g. f. φ(θ), and φ(θ) exists for -h ≤ θ ≤ h, h > 0, and if the c. d. f. G(x) has the same m. g. f., then G(x) ≡ F(x).

The reader may write out an expression for φ(θ) in the discrete case, which is a sum of terms, and an expression in the continuous case, which is an ordinary integral, by using the analysis of §2.51.

We note that if Y = ψ(X) is a continuous function of X, and G(y) is the c. d. f. of Y, then the m. g. f. of G(y) is

E(e^{θY}) = E(e^{θψ(X)}) = ∫_{-∞}^{+∞} e^{θψ(x)} dF(x).

If this exists for |θ| ≤ h (h > 0) and is recognized as the m. g. f. of a known distribution, then Theorem (B) determines G(y).
In certain problems, particularly in sampling theory, it is important to know the limiting form, as n → ∞, of the c. d. f. F_(n)(x) of a function X_n of n random variables. The m. g. f. offers a powerful method for determining the limit of this distribution. The method is to obtain the m. g. f. of X_n, say φ_(n)(θ); then if φ_(n)(θ) has a limiting form as n → ∞ which is the m. g. f. of some c. d. f. F(x), we may conclude under certain conditions that lim F_(n)(x) = F(x). More precisely, we shall state the following theorem without proof:***

Theorem (C): Let F_(n)(x) and φ_(n)(θ) be respectively the c. d. f. and m. g. f. of a random variable X_n (n = 1, 2, 3, ...). If φ_(n)(θ) exists for |θ| ≤ h for all n, and if there exists a function φ(θ) such that lim_{n→∞} φ_(n)(θ) = φ(θ) for |θ| ≤ h', then lim_{n→∞} F_(n)(x) = F(x), where F(x) is the c. d. f. of a random variable X with m. g. f. φ(θ).
*D. V. Widder, The Laplace Transform, Princeton University Press, 1941, p.

**If the integral defining φ(θ) exists on the real interval (-h, h), it exists for complex θ in the strip determined by the condition that the real part of θ be in the interval, and φ is an analytic function in the strip: see Widder, loc. cit. Hence if for F(x) and G(x) the moment generating functions coincide in the interval, they coincide in the strip. For coincidence in the strip there is a uniqueness theorem: Widder, p. 243.

***For proof, see J. H. Curtiss, "On the Theory of Moment Generating Functions", Annals of Math. Stat., Vol. 13, No. 4, pp.
2.82 Multivariate Case

The m. g. f. of a distribution F(x_1, x_2, ..., x_k) is defined to be

(a)    φ(θ_1, θ_2, ..., θ_k) = E(e^{Σ_{i=1}^k θ_i X_i}) = ∫_{R_k} e^{Σ_{i=1}^k θ_i x_i} dF.

We assume

(b)    φ exists for -h ≤ θ_i ≤ h, h > 0, i = 1, 2, ..., k,

and then may consider restrictions on F, analogous to those of §2.81, implied by (b). We state without proof

Theorem (A): Under the assumption (b),

μ'_{j_1 j_2 ... j_k} = [∂^{j_1 + j_2 + ··· + j_k} φ / ∂θ_1^{j_1} ∂θ_2^{j_2} ··· ∂θ_k^{j_k}]_{θ_1 = θ_2 = ··· = θ_k = 0}.

Theorem (B): If φ satisfies condition (b), it uniquely determines F.

Let F_i(x_i), with m. g. f. φ_i(θ_i), be the c. d. f.'s of mutually independent variables X_i, i = 1, 2, ..., k. Then the joint c. d. f. is

(c)    F(x_1, x_2, ..., x_k) = ∏_{i=1}^k F_i(x_i),

and the m. g. f. of F is

(d)    φ(θ_1, θ_2, ..., θ_k) = ∫_{R_k} e^{Σ_{i=1}^k θ_i x_i} ∏_{i=1}^k dF_i(x_i) = ∏_{i=1}^k φ_i(θ_i).

By the uniqueness Theorem (B) it follows that if the m. g. f. is (d), the distribution is (c).

Theorem (C): Suppose that random variables X_i, i = 1, 2, ..., k, have c. d. f.'s F_i(x_i) with m. g. f.'s φ_i(θ_i), and that all φ_i(θ_i) satisfy condition (b). Then the X_i are mutually independent if and only if the m. g. f. φ of the joint distribution F factors according to (d).

The theorem is also valid in the case where the X_i are vector variables (the θ_i are then also vectors).

If Y_i = ψ_i(X_1, X_2, ..., X_k), i = 1, 2, ..., t, are continuous functions, then a method of determining the joint c. d. f. G(y_1, y_2, ..., y_t) of the variables Y_i is to form the m. g. f. of G; it is

E[e^{Σ_{i=1}^t θ_i Y_i}] = ∫_{R_k} e^{Σ_{i=1}^t θ_i ψ_i(x_1, ..., x_k)} dF.

If this exists for |θ_i| ≤ h, h > 0, i = 1, 2, ..., t, it uniquely determines G(y_1, y_2, ..., y_t).
2.9 Regression

2.91 Regression Functions

If X_1, X_2 have the joint p. d. f. f(x_1, x_2), we define the regression function of X_1 on X_2 as the mean value of X_1 for a fixed value x_2 of X_2, i. e.,

(a)    a_{1·x_2} = E(X_1 | x_2) = ∫_{-∞}^{+∞} x_1 f(x_1 | x_2) dx_1,

where the conditional p. d. f. f(x_1 | x_2) is defined by (g) of §2.4. We note that the regression function (a) is a function of x_2 only. The graph of this function is called the regression curve. If the regression function is linear,

(b)    a_{1·x_2} = b x_2 + c,

then we say that we have a case of linear regression, and call b and c the regression coefficients. The reader may show that if X_1 and X_2 are statistically independent, then the regression of X_1 on X_2 is linear, with b = 0 and c = a_1, the mean of X_1. We remark that the regression of X_1 on X_2 may be linear, while that of X_2 on X_1 is not.

If X_1, X_2 are discrete random variables, then in the notation of §2.12, we define the regression of X_1 on X_2 only for X_2 = x_{2i}, i = 1, 2, ..., by

(c)    a_{1·x_{2i}} = Σ_j x_{1j} p_j / Σ_j p_j,

where both summations are made for all j such that x_{2j} = x_{2i}. For the mixed case described in §2.12, we define the regression of X_1 on X_2 by

(d)    a_{1·x_{2i}} = E(X_1 | X_2 = x_{2i}) = ∫_{-∞}^{+∞} x_1 f(x_1 | x_{2i}) dx_1.
We shall limit the discussion for more than two variables to the continuous case. For k random variables X_1, X_2, ..., X_k, let f(x_1 | x_2, x_3, ..., x_k) be the conditional p. d. f. defined by (h) of §2.4. Then we define the regression function of X_1 on X_2, X_3, ..., X_k to be

(e)    a_{1·x_2 x_3 ... x_k} = E(X_1 | X_i = x_i, i = 2, 3, ..., k) = ∫_{-∞}^{+∞} x_1 f(x_1 | x_2, x_3, ..., x_k) dx_1.

If this function of x_2, x_3, ..., x_k is linear,

(f)    a_{1·x_2 x_3 ... x_k} = Σ_{j=2}^k b_j x_j + c,

then the regression is said to be linear, and the b_j and c are called regression coefficients. Similarly, we may define the regression function of any X_i on the remaining X's. We note in conclusion that a regression function may always be regarded as the first moment of a conditional distribution.
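For the discrete case, the regression of X_1 on X_2 is just a column-by-column conditional mean. A small numerical sketch with hypothetical probabilities:

```python
# Pr(X1 = x1, X2 = x2) at four mass points (hypothetical values)
joint = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (2, 1): 0.4}

def regression(x2):
    # E(X1 | X2 = x2): sum x1 * Pr over the mass points with that x2,
    # divided by the marginal probability Pr(X2 = x2)
    num = sum(x1 * p for (x1, v2), p in joint.items() if v2 == x2)
    den = sum(p for (x1, v2), p in joint.items() if v2 == x2)
    return num / den

assert abs(regression(0) - 0.75) < 1e-12       # (0*0.1 + 1*0.3) / 0.4
assert abs(regression(1) - 0.8 / 0.6) < 1e-12  # (0*0.2 + 2*0.4) / 0.6
```

The two values regression(0) and regression(1) are the ordinates of the regression curve at x_2 = 0 and x_2 = 1.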
2.92 Variance about Regression Functions

The variance of X_1 for a fixed value x_2 of X_2 is defined as

(a)    σ²_{1·x_2} = ∫_{-∞}^{+∞} (x_1 - a_{1·x_2})² f(x_1 | x_2) dx_1.

σ²_{1·x_2} is, in general, a function of x_2, and its mean value with respect to x_2 is known as the variance of X_1 about the regression function of X_1 on X_2. That is, we have

(b)    σ²_{1·2} = ∫_{-∞}^{+∞} σ²_{1·x_2} f_2(x_2) dx_2 = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} (x_1 - a_{1·x_2})² f(x_1, x_2) dx_1 dx_2.

In the k-variate case, we have

(c)    σ²_{1·x_2 x_3 ... x_k} = ∫_{-∞}^{+∞} (x_1 - a_{1·x_2 x_3 ... x_k})² f(x_1 | x_2, x_3, ..., x_k) dx_1,

and the variance of X_1 about the regression function of X_1 on X_2, X_3, ..., X_k is

(d)    σ²_{1·23...k} = ∫_{-∞}^{+∞} ··· ∫_{-∞}^{+∞} (x_1 - a_{1·x_2 ... x_k})² f(x_1, x_2, ..., x_k) dx_1 dx_2 ··· dx_k.

The quantities given by (a), (b), (c) and (d) may be similarly defined for discrete and mixed cases, and also for empirical distributions.
2.93 Partial Correlation

Suppose X_1, X_2, ..., X_k is a set of random variables. The covariance between any two of the variables, say X_1 and X_2, for fixed values of any set of the remaining variables, say X_r, X_{r+1}, ..., X_k (2 < r ≤ k), is defined as

(a)    c_{12·x_r ... x_k} = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} (x_1 - a_{1·x_r ... x_k})(x_2 - a_{2·x_r ... x_k}) f(x_1, x_2 | x_r, ..., x_k) dx_1 dx_2.

Let C_{12·r(r+1)...k} be the mean value of c_{12·x_r ... x_k} with respect to X_r, X_{r+1}, ..., X_k:

(b)    C_{12·r(r+1)...k} = ∫_{-∞}^{+∞} ··· ∫_{-∞}^{+∞} (x_1 - a_{1·x_r ... x_k})(x_2 - a_{2·x_r ... x_k}) f(x_1, x_2, x_r, ..., x_k) dx_1 dx_2 dx_r ··· dx_k.

The partial correlation coefficient ρ_{12·r(r+1)...k} between X_1 and X_2 with respect to X_r, X_{r+1}, ..., X_k is defined as

(c)    ρ_{12·r(r+1)...k} = C_{12·r(r+1)...k} / (σ_{1·r(r+1)...k} σ_{2·r(r+1)...k}).

The quantities defined in (a), (b) and (c) extend to discrete and mixed cases.
2.94 Multiple Correlation

A procedure which is often carried out in statistics is that of determining best-fitting linear regression functions in the sense of least squares, even though the actual regression function is "not quite" linear. The procedure is perhaps more often carried out with an empirical c. d. f. F_n(x_1, x_2, ..., x_k) than with a probability c. d. f. Here, we shall only consider the case of a probability c. d. f. where the variables are all continuous. There will be analogous results for discrete and mixed cases (and also for the empirical distributions).

In this problem we let X_1, X_2, ..., X_k be random variables with c. d. f. F(x_1, x_2, ..., x_k), and determine the constants b_1, b_2, ..., b_k so that the mean value of the square of X_1 - (b_1 + Σ_{i=2}^k b_i X_i) is a minimum, i. e., so that

(a)    S = ∫_{-∞}^{+∞} ··· ∫_{-∞}^{+∞} (x_1 - b_1 - Σ_{i=2}^k b_i x_i)² dF(x_1, x_2, ..., x_k)

is a minimum.

The values of the b's which minimize S are given by solving the equations ∂S/∂b_i = 0 (i = 1, 2, ..., k). Writing out these equations, we have (after dividing each equation by -2):

(b)    a_1 = b_1 + Σ_{j=2}^k b_j a_j,
       c_{1i} = b_1 a_i + Σ_{j=2}^k b_j c_{ji},    i = 2, 3, ..., k,

where a_i = E(X_i) and c_{ij} = E(X_i X_j). Substituting the value of b_1 from the first equation into each of the remaining equations, and setting C_{ij} = c_{ij} - a_i a_j = E[(X_i - a_i)(X_j - a_j)], the covariance between X_i and X_j, we have the following equations to solve for b_2, b_3, ..., b_k:

(c)    C_{1i} = Σ_{j=2}^k b_j C_{ij},    i = 2, 3, ..., k,

from which we obtain, by using Cramer's rule for solving linear equations,

(d)    b_j = Σ_{i=2}^k C_{1i} C^{ij},    j = 2, 3, ..., k,

where

C^{ij} = (cofactor of C_{ij} in |C_{ij}|) / |C_{ij}|,

|C_{ij}| being the determinant of the covariances C_{ij}, i, j = 2, 3, ..., k. It is assumed, of course, that this determinant ≠ 0.
For the value of b_1 we therefore have

(e)    b_1 = a_1 - Σ_{j=2}^k b_j a_j.

The least squares regression function of X_1 on X_2, X_3, ..., X_k is thus

(f)    a_1 + Σ_{j=2}^k b_j (x_j - a_j),

where the values of the b's are given by (d) and (e).

If we substitute the minimizing values of the b's, given by (d) and (e), in (a), we obtain the minimum value of S:

(g)    Min(S) = E{[X_1 - a_1 - Σ_{j=2}^k b_j (X_j - a_j)]²} = C_{11} - 2 Σ_{j=2}^k b_j C_{1j} + Σ_{i=2}^k Σ_{j=2}^k b_i b_j C_{ij}.

If we sum the last expression first with respect to i, the normal equations (c) give Σ_{i=2}^k b_i C_{ij} = C_{1j}; hence the last term reduces to Σ_{j=2}^k b_j C_{1j}, which by (d) is the same as Σ_{i=2}^k Σ_{j=2}^k C_{1i} C_{1j} C^{ij}. Thus, denoting Min(S) by σ²_{1·23...k}, we have

(h)    σ²_{1·23...k} = C_{11} - Σ_{i=2}^k Σ_{j=2}^k C_{1i} C_{1j} C^{ij} = |C_{ij}| (i, j = 1, 2, ..., k) / |C_{ij}| (i, j = 2, 3, ..., k),

that is, the ratio of the determinant of all the covariances to the determinant of the covariances of X_2, X_3, ..., X_k alone.
To show that σ²_{1·23...k} may be expressed as this ratio of determinants, let us note that the determinant in the numerator may be expressed as

(i)    Σ_{j=1}^k C_{1j} η_{1j},

where η_{1j} is the cofactor of C_{1j} in the numerator determinant. Now, for i = 2, 3, ..., k,

(j)    η_{1i} = -η_{11} Σ_{j=2}^k C_{1j} C^{ji},

where the C^{ij} are formed, as before, from the cofactors of |C_{ij}| (i, j = 2, 3, ..., k), and η_{11} = |C_{ij}| (i, j = 2, 3, ..., k). Hence the numerator determinant may be expressed as

(k)    η_{11} [C_{11} - Σ_{i=2}^k Σ_{j=2}^k C_{1i} C_{1j} C^{ij}].

Dividing expression (k) by η_{11}, and remembering that C^{ij} = C^{ji} (i, j = 2, 3, ..., k), we therefore establish the fact that σ²_{1·23...k} may be expressed as the ratio of determinants given in (h). The quantity σ²_{1·23...k} is the variance of X_1 about the least-square linear regression function (f), and should not be confused with the variance about the true regression function defined in §2.92.

The correlation coefficient between X_1 and the regression function (f) is known as the multiple correlation coefficient between X_1 and X_2, X_3, ..., X_k, and is denoted by R_{1·23...k}. To obtain an expression for the multiple correlation coefficient, we first determine the covariance between X_1 and the function (f), which is

(l)    E{(X_1 - a_1) Σ_{j=2}^k b_j (X_j - a_j)} = Σ_{j=2}^k b_j C_{1j} = Σ_{i=2}^k Σ_{j=2}^k C_{1i} C_{1j} C^{ij}.

The variance of X_1 is C_{11}, and that of (f) is

E{[Σ_{j=2}^k b_j (X_j - a_j)]²} = Σ_{i=2}^k Σ_{j=2}^k b_i b_j C_{ij},

whose value is equal to the last expression in (g), and which has been reduced to Σ_{i=2}^k Σ_{j=2}^k C_{1i} C_{1j} C^{ij}. Hence the multiple correlation coefficient is

R_{1·23...k} = √[(Σ_{i=2}^k Σ_{j=2}^k C_{1i} C_{1j} C^{ij}) / C_{11}].

It will be observed from (h) that

R²_{1·23...k} = 1 - σ²_{1·23...k} / C_{11},

and hence, by §2.72, R²_{1·23...k} = 1 if, and only if, all of the probability in the k-dimensional space of the random variables lies on the least-square regression surface

x_1 = a_1 + Σ_{j=2}^k b_j (x_j - a_j).

It should be noted that a partial correlation coefficient between X_1 and X_2 with respect to X_r, X_{r+1}, ..., X_k could be determined for the case of a linear least-square regression function by replacing a_{1·x_r ... x_k} and a_{2·x_r ... x_k} by the corresponding linear least-square regression functions in determining C_{12·r(r+1)...k}.

Again, we remark that analogous results can be obtained by using an empirical c. d. f. F_n(x_1, x_2, ..., x_k) instead of a probability c. d. f. F(x_1, x_2, ..., x_k).
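For k = 3 (one dependent variable X_1 and two predictors X_2, X_3), the normal equations (c) are a 2×2 linear system that can be solved directly. A sketch with hypothetical covariance values C_{ij}:

```python
# hypothetical covariances C[(i, j)] = E[(X_i - a_i)(X_j - a_j)]
C = {(1, 1): 4.0, (1, 2): 1.2, (1, 3): 0.8,
     (2, 2): 1.0, (2, 3): 0.3, (3, 3): 2.0}

# solve C12 = b2*C22 + b3*C23, C13 = b2*C23 + b3*C33 by Cramer's rule
det = C[(2, 2)] * C[(3, 3)] - C[(2, 3)] ** 2
b2 = (C[(1, 2)] * C[(3, 3)] - C[(1, 3)] * C[(2, 3)]) / det
b3 = (C[(2, 2)] * C[(1, 3)] - C[(2, 3)] * C[(1, 2)]) / det

# the b's satisfy the normal equations (c)
assert abs(b2 * C[(2, 2)] + b3 * C[(2, 3)] - C[(1, 2)]) < 1e-12
assert abs(b2 * C[(2, 3)] + b3 * C[(3, 3)] - C[(1, 3)]) < 1e-12

# squared multiple correlation: R^2 = (sum_j b_j C_1j) / C_11
R2 = (b2 * C[(1, 2)] + b3 * C[(1, 3)]) / C[(1, 1)]
assert 0.0 <= R2 <= 1.0
```

The residual variance σ²_{1·23} of (h) is then C_{11}·(1 - R²).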
CHAPTER III

SOME SPECIAL DISTRIBUTIONS

In the present chapter, the notions of the preceding chapter will be exemplified by considering certain distributions that arise frequently in applied statistics. We shall begin by considering distributions for the discrete case. Since the distinction between the random variable X and the corresponding independent variable x of the distribution function has been made clear, we shall henceforth denote both by the lower case x unless this leads to ambiguity.
3.1 Discrete Distributions

3.11 Binomial Distribution

An important distribution function of a discrete variate is the binomial distribution, which may be derived in the following manner. Suppose the probability of a "success" in a trial is p and the probability of a "failure" is q = 1 - p. For example, the probability of a head in a toss of an "ideal" coin is 1/2, and the probability of not a head (a tail) is 1 - 1/2 = 1/2. We can represent these probabilities in functional form as f(α), where f(α) = p for α = 1, a success, and f(α) = q for α = 0, a failure. In other words, f(α) is the probability of obtaining α successes in a single trial.

The probability associated with n trials which are mutually independent in the probability sense is

f(α_1) · f(α_2) · ... · f(α_n).

The probability of x successes and n - x failures in a specified order, say α_1 = 1, α_2 = 1, ..., α_x = 1, α_{x+1} = 0, ..., α_n = 0, is

f(1)^x f(0)^{n-x} = p^x q^{n-x}.

The number of orders in which x successes and n - x failures can occur is the number of combinations of n objects taken x at a time, which is

(a)    nCx = n! / (x!(n-x)!).
These nCx orders are mutually exclusive events. Hence, to find the probability B(x), say, of exactly x successes irrespective of order, we add the probabilities for all of the nCx orders, thus obtaining

(b)    B(x) = [n! / (x!(n-x)!)] p^x q^{n-x}.

B(x) will be recognized as the (x+1)-st term in the expansion of (q+p)^n. This demonstrates that the sum of the probabilities is equal to unity, i. e.,

(q+p)^n = Σ_{x=0}^n B(x) = 1.

Hence Σ_{x'≤x} B(x') is clearly a c. d. f. F(x).

To derive the moments of the distribution B(x), we will find it convenient to use the m. g. f.

(c)    φ(θ) = E(e^{xθ}) = Σ_{x=0}^n e^{θx} B(x) = (q + pe^θ)^n.

The h-th moment of x can be expressed as

μ'_h = [∂^h φ / ∂θ^h]_{θ=0}.

In particular, the mean E(x) is

(d)    μ'_1 = [∂φ/∂θ]_{θ=0} = [npe^θ (q + pe^θ)^{n-1}]_{θ=0} = np,

and the second moment about zero is

μ'_2 = [∂²φ/∂θ²]_{θ=0} = [npe^θ (q + pe^θ)^{n-1} + n(n-1)p²e^{2θ} (q + pe^θ)^{n-2}]_{θ=0} = np + n(n-1)p².

Therefore, the variance is

(e)    σ² = np + n(n-1)p² - n²p² = np - np² = npq.
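The moments (d) and (e) can be checked by direct enumeration of B(x) for hypothetical values of n and p:

```python
from math import comb

n, p = 10, 0.3      # hypothetical parameters
q = 1 - p
B = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

assert abs(sum(B) - 1.0) < 1e-12                     # probabilities sum to 1
mean = sum(x * b for x, b in enumerate(B))
var = sum((x - mean) ** 2 * b for x, b in enumerate(B))
assert abs(mean - n * p) < 1e-9                      # (d): mean = np
assert abs(var - n * p * q) < 1e-9                   # (e): variance = npq
```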
Example: Applying the binomial distribution to the coin tossing problem, we have p = 1/2 and q = 1/2. The probability of x heads is

B(x) = [n! / (x!(n-x)!)] (1/2)^n.

The mean and variance are, respectively,

μ'_1 = n/2,    σ² = n/4.

In deducing B(x) we have assumed that p remains constant from trial to trial. If the probability is different for each trial, our conclusions must be modified. Let p_i be the probability of a success in the i-th trial (i = 1, 2, ..., n) and q_i = 1 - p_i the corresponding probability of a failure. Let

p̄ = (1/n) Σ_{i=1}^n p_i,    q̄ = (1/n) Σ_{i=1}^n q_i = 1 - p̄.

Then the expected value of x = Σ_{i=1}^n α_i, the total number of successes in n trials, is

E(x) = E(α_1) + ... + E(α_n) = p_1 + ... + p_n = np̄.

The variance of α_i is p_i q_i. Since the trials are independent, the variance of x = Σ α_i is Σ_{i=1}^n p_i q_i. Noting that p_i = p̄ + (p_i - p̄) and q_i = q̄ - (p_i - p̄), we can write the variance

(f)    σ² = Σ_{i=1}^n p_i q_i = np̄q̄ - Σ_{i=1}^n (p_i - p̄)².

This is obviously less than the variance, np̄q̄, we would find with constant probability p̄. When the probability is constant from trial to trial, the distribution is known as the Bernoulli case; when the probability varies, we have the Poisson case.

In §2.71 it was proved that if a variate x is distributed about the mean a with the variance σ², we have the Tchebycheff inequality

Pr(|x - a| ≥ δσ) ≤ 1/δ²

for any δ > 0. In the binomial distribution x has mean np and variance npq. Let us change to the variate r = x/n, the "relative frequency" of successes. We have E(r) = E(x)/n = np/n = p. Similarly, σ_r² = σ_x²/n² = pq/n. The Tchebycheff inequality states that

Pr(|r - p| ≥ δσ_r) ≤ 1/δ².
If we choose δ = λ/σ_r, this inequality becomes

(g)    Pr(|r - p| ≥ λ) ≤ σ_r²/λ² = pq/(nλ²) ≤ 1/(4nλ²),

since pq ≤ 1/4. Inequality (g) expresses what is known as the

Law of Large Numbers: For any given positive number λ, the probability that r will deviate from p by more than λ can be made arbitrarily small by choosing n sufficiently large.

Roughly speaking, the larger the value of n, the more the probability "piles up" around p (the mean of r), such that in the limit (as n → ∞) the probability is all piled up at p.

In the example of "ideal" coin tossing, r is the ratio of the number of heads to the total number of tosses. Then

Pr(|r - 1/2| ≥ λ) ≤ 1/(4nλ²).

Example: If λ = 0.1 and n = 100, we have Pr(|r - 1/2| ≥ 0.1) ≤ 1/4; in other words, the probability is less than 1/4 that the relative frequency of heads will deviate from 1/2 by more than 0.1.
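The bound (g) can be compared with the exact binomial tail for the coin example above (n = 100, λ = 0.1); the exact probability turns out to be far smaller than the Tchebycheff bound of 1/4:

```python
from math import comb

n, lam = 100, 0.1
# exact tail: Pr(|x/n - 1/2| > lam) for a fair coin
tail = sum(comb(n, x) * 0.5 ** n
           for x in range(n + 1) if abs(x / n - 0.5) > lam)

assert tail <= 1 / (4 * n * lam ** 2)   # the bound (g), here 1/4
assert tail < 0.05                      # the exact tail is much smaller
```

This illustrates that Tchebycheff's inequality is valid for every distribution with finite variance, but is usually very conservative for any particular one.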
3.12 Multinomial Distribution

An immediate generalization of the binomial distribution is the multinomial distribution. Suppose an event is characterized by a variate that can take on one and only one of k values, say y_1, y_2, ..., y_k. For example, if the event is the throw of a die and if y is the number of dots appearing on the top face, y can take on only one of the values 1, 2, 3, 4, 5, 6 in each throw. It should be noted that the k mutually exclusive kinds of events may not correspond to k values of a one-dimensional variable y. Thus, if C_1, C_2, ..., C_k are k kinds of events (e. g., the sides of a die may be colored rather than numbered), one and only one of which will occur in each trial, then we may let y be a vector with k components, such that the value of the vector for an event of type C_1 is (1, 0, 0, ..., 0), the value for one of type C_2 is (0, 1, 0, ..., 0), etc. For convenience, we could denote these values of the vector y by y_1, y_2, etc., and proceed as in the case where y_1, y_2, ..., y_k are different values of a one-dimensional variable y.

Let the probability of y being y_i be p_i, where Σ_{i=1}^k p_i = 1. The probability associated with n trials is

f(y^(1)) f(y^(2)) ··· f(y^(n)),

where each of the y's will have one of the values y_1, y_2, ..., y_k, and where f(y_i) = p_i (i = 1, 2, ..., k). We now wish to find the probability that x_1 of the y's are y_1's, x_2 of the y's are y_2's, etc. (Σ_{i=1}^k x_i = n).

The probability of x_1 events characterized by y_1, etc., occurring in a specified order, say y^(1) = y_1, ..., y^(x_1) = y_1, y^(x_1+1) = y_2, ..., y^(n) = y_k, is

p_1^{x_1} p_2^{x_2} ··· p_k^{x_k}.

The number of different orders in which we can get x_1 y_1's, etc., is the number of ways in which n objects can be permuted where x_1 are of type C_1, ..., x_k are of type C_k, that is,

n! / (x_1! x_2! ··· x_k!).

So the probability of x_1 y_1's, x_2 y_2's, etc., irrespective of the order in which they occur, is given by adding the probabilities of the various possible orders. We obtain

(a)    M(x_1, x_2, ..., x_k) = [n! / (x_1! x_2! ··· x_k!)] p_1^{x_1} p_2^{x_2} ··· p_k^{x_k}.

This may be recognized as the general term in the expansion of (p_1 + p_2 + ··· + p_k)^n. Hence, the sum of M(x_1, x_2, ..., x_k) over all partitions of n, that is, all sets of x_i with Σ x_i = n, is unity.

To find the means, variances, covariances, and higher moments, we set up the m. g. f.

(b)    φ(θ_1, θ_2, ..., θ_k) = E[e^{Σ_{i=1}^k θ_i x_i}] = Σ [n! / (x_1! ··· x_k!)] (p_1 e^{θ_1})^{x_1} ··· (p_k e^{θ_k})^{x_k} = (p_1 e^{θ_1} + p_2 e^{θ_2} + ··· + p_k e^{θ_k})^n.
The mean of x_i is

(c)    E(x_i) = μ'_i = [∂φ/∂θ_i]_{θ's = 0} = [n(p_1 e^{θ_1} + ··· + p_k e^{θ_k})^{n-1} p_i e^{θ_i}]_{θ's = 0} = np_i.

And

[∂²φ/∂θ_i²]_{θ's = 0} = np_i + n(n-1)p_i².

Therefore, the variance of x_i is

(d)    σ_{x_i}² = np_i + n(n-1)p_i² - n²p_i² = np_i(1 - p_i).

In a similar manner we find the covariance between x_i and x_j to be -np_i p_j. It is clear that the binomial distribution is the special case where k = 2.
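These moments can be checked by enumerating all terms of a hypothetical trinomial (k = 3) distribution:

```python
from math import factorial

n = 5
p = [0.2, 0.5, 0.3]          # hypothetical cell probabilities, sum to 1
E1 = E2 = E11 = E12 = 0.0
for x1 in range(n + 1):
    for x2 in range(n + 1 - x1):
        x3 = n - x1 - x2
        prob = (factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
                ) * p[0]**x1 * p[1]**x2 * p[2]**x3
        E1 += x1 * prob
        E2 += x2 * prob
        E11 += x1 * x1 * prob
        E12 += x1 * x2 * prob

assert abs(E1 - n * p[0]) < 1e-9                              # mean (c)
assert abs((E11 - E1**2) - n * p[0] * (1 - p[0])) < 1e-9      # variance (d)
assert abs((E12 - E1 * E2) + n * p[0] * p[1]) < 1e-9          # covariance
```

The negative covariance reflects the constraint Σ x_i = n: an excess of one kind of outcome forces a deficit in the others.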
3.13 The Poisson Distribution

The Poisson distribution is in a sense a particular limiting form of the binomial distribution. We shall deduce it from geometrical considerations. Let AB be a line segment of length L and CD a segment of length l contained in AB.

Figure 5: the segment AB, containing the subinterval CD.

Let the probability that a point taken at random falls on an interval of length du be du/L; that is, the p. d. f. of u is a constant. The probability of the point falling in CD is l/L. If we let n points fall at random on AB, the probability that exactly x of them fall on CD is given by the binomial distribution ((b) of §3.11):

B(x) = [n! / (x!(n-x)!)] (l/L)^x (1 - l/L)^{n-x}.

Now let n and L increase indefinitely in such a way that the average number of points per unit length is a finite number k ≥ 0, i. e., n/L → k. Now

B(x) = [n(n-1)···(n-x+1) / x!] (l/L)^x (1 - l/L)^{n-x}.

So the limiting value of B(x) for a given x is

lim B(x) = lim [n(n-1)···(n-x+1) / x!] (l/L)^x (1 - l/L)^{n-x} = (kl)^x e^{-kl} / x!.

Let kl = m, and we get the usual expression for the Poisson distribution

(a)    p(x) = m^x e^{-m} / x!.

The sum over all x is seen to be 1:

Σ_{x=0}^∞ p(x) = e^{-m} (1 + m + m²/2! + ···) = e^{-m} e^m = 1.
The m. g. f. is

(b)    φ(θ) = E(e^{θx}) = Σ_{x=0}^∞ e^{θx} m^x e^{-m} / x! = e^{-m} Σ_{x=0}^∞ (me^θ)^x / x! = e^{-m} e^{me^θ} = e^{m(e^θ - 1)}.

From this we derive the moments about zero in the customary manner:

(c)    E(x) = μ'_1 = [∂φ/∂θ]_{θ=0} = m,    E(x²) = μ'_2 = [∂²φ/∂θ²]_{θ=0} = m + m².

Therefore, the variance is equal to the mean,

(d)    σ² = m + m² - m² = m.

This argument given for one dimension immediately extends to two or more dimensions. For example, for two dimensions we would take AB and CD to be regions of the plane, the latter contained in the former, and k to be the limiting ratio of the number of points per unit area. The Poisson distribution is applicable to problems dealing with the occurrence of events in a time interval of a given length, such as the emission of rays from radioactive substances, certain traffic problems, demands for telephone service, and bacteria counts in cells.
Example: Let us consider the following problem as an example to which the Poisson distribution is applicable. If X-rays are considered as discrete quanta and if the absorption of k or more will kill a certain unicellular organism, what is the probability that an organism of a given size S on a given glass slide will escape death by X-rays after being exposed for t seconds? On the assumption that the projection of the organism of size S on a plane has an area of a, and m is the average number of rays striking an area of size a in t seconds, and the rays appear independently and at random, then the probability that x of the X-rays hit the organism in t seconds is

p(x) = \frac{m^{x}e^{-m}}{x!}.

Hence, the probability of survival is \sum_{x=0}^{k-1}p(x). The average number of rays absorbed by the survivors is

\sum_{x=0}^{k-1}x\,p(x)\Big/\sum_{x=0}^{k-1}p(x).
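Under the stated assumptions both quantities are finite sums that can be evaluated directly; a sketch (the values m = 2 and k = 5 are illustrative choices of ours):

```python
from math import exp, factorial

def poisson_pmf(x, m):
    return m**x * exp(-m) / factorial(x)

def survival_prob(m, k):
    # probability that fewer than k quanta are absorbed
    return sum(poisson_pmf(x, m) for x in range(k))

def mean_absorbed_by_survivors(m, k):
    # conditional mean: sum of x p(x) over x = 0..k-1, divided by the survival probability
    num = sum(x * poisson_pmf(x, m) for x in range(k))
    return num / survival_prob(m, k)

p_surv = survival_prob(2.0, 5)            # with m = 2 this equals 7 e^(-2)
avg = mean_absorbed_by_survivors(2.0, 5)  # necessarily between 0 and k-1
assert 0 < avg < 4
```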
3.14 The Negative Binomial Distribution
Another discrete distribution which is closely related to the Bernoulli binomial distribution is the negative binomial. If we expand, according to the binomial theorem,

(q - p)^{-k},

where q = 1 + p, k > 0, p > 0, we get as the general term

(a) q^{-k}\binom{k+x-1}{x}\left(\frac{p}{q}\right)^{x} = \binom{k+x-1}{x}\frac{p^{x}}{q^{k+x}}.

When we interpret this as a probability function of x, p(x), it is called the negative binomial distribution and is defined for x = 0, 1, 2, .... We notice that the sum of p(x) for all x is unity,

\sum_{x=0}^{\infty}p(x) = q^{-k}\sum_{x=0}^{\infty}\binom{k+x-1}{x}\left(\frac{p}{q}\right)^{x} = q^{-k}\left(1-\frac{p}{q}\right)^{-k} = (q-p)^{-k} = 1.
The m. g. f. is

(b) \phi(\theta) = \sum_{x=0}^{\infty}e^{\theta x}p(x) = q^{-k}\sum_{x=0}^{\infty}\binom{k+x-1}{x}\left(\frac{pe^{\theta}}{q}\right)^{x} = (q - pe^{\theta})^{-k}.

From this we find the mean

(c) E(x) = \left.\frac{\partial\phi}{\partial\theta}\right|_{\theta=0} = \left.kpe^{\theta}(q-pe^{\theta})^{-k-1}\right|_{\theta=0} = kp,

and

E(x^{2}) = \left.\frac{\partial^{2}\phi}{\partial\theta^{2}}\right|_{\theta=0} = kp + k(k+1)p^{2}.
Therefore the variance is

(d) \sigma^{2} = kp + k(k+1)p^{2} - k^{2}p^{2} = kp + kp^{2} = kpq.

The similarity of this m. g. f. and these moments to those of the positive binomial distribution should be noted.
It can easily be shown that a special limiting case of the negative binomial distribution is the Poisson law. If we let p \to 0 and k \to \infty in such a way that

\lim kp = m,

then

\lim p(x) = \lim\frac{(k+x-1)(k+x-2)\cdots k}{x!}\,\frac{p^{x}}{(1+p)^{k+x}} = \frac{m^{x}e^{-m}}{x!},

since (k+x-1)(k+x-2)\cdots k\;p^{x} \to m^{x} and (1+p)^{-(k+x)} \to e^{-m}.
If we make a change of parameters, we have the usual expression for the Polya-Eggenberger distribution. Let

k = \frac{h}{d}, \qquad p = d.

Then the distribution may be written as

(e) p(x) = (1+d)^{-h/d}\,\frac{h(h+d)(h+2d)\cdots(h+(x-1)d)}{x!\,(1+d)^{x}}.

This distribution, one of a number of contagious distributions, is useful in describing, for example, the probability of x cases of a given epidemic in a given locality.
If we interpret 1/q as the probability of a "success" and p/q as the probability of a "failure" in a trial, then it will be seen that (a) is the probability that x + k trials will be required to obtain k successes. For the probability of obtaining k - 1 successes and x failures in x + k - 1 trials is

\binom{x+k-1}{x}\left(\frac{1}{q}\right)^{k-1}\left(\frac{p}{q}\right)^{x}.

Now the last trial must be a success. Therefore, multiplying this probability by 1/q, the probability of success, we obtain (a), the probability that x + k trials will be required to obtain k successes.
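The pmf (a), its unit sum, and the moments kp and kpq can be checked directly; a minimal sketch (the function name and the values k = 4, p = 0.7 are ours):

```python
from math import comb

def negbin_pmf(x, k, p):
    # the general term (a): C(k+x-1, x) p^x / q^(k+x), with q = 1 + p
    q = 1 + p
    return comb(k + x - 1, x) * p**x / q**(k + x)

k, p = 4, 0.7
probs = [negbin_pmf(x, k, p) for x in range(400)]   # tail beyond 400 is negligible here
mean = sum(x * pr for x, pr in enumerate(probs))
var = sum(x * x * pr for x, pr in enumerate(probs)) - mean**2

assert abs(sum(probs) - 1) < 1e-9
assert abs(mean - k * p) < 1e-6            # E(x) = kp
assert abs(var - k * p * (1 + p)) < 1e-6   # sigma^2 = kpq
```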
3.2 The Normal Distribution

3.21 The Univariate Case. A very important distribution is the normal or Gaussian distribution

ke^{-h^{2}(x-c)^{2}},

defined over the range -\infty < x < \infty, where k, h, and c are constants. Various attempts
have been made to establish this distribution from postulates and other primitive assump-
tions. Gauss, for example, deduced it from the postulate of the arithmetic mean which
states, roughly, that for a set of equally valid observations of a quantity the arith-
metic mean is the most probable value. Pearson derived it as a solution of a certain
differential equation. It can be shown that it is the limiting distribution of the
Bernoulli binomial distribution. We shall not derive the normal distribution from more
basic considerations, but we shall observe that it arises under rather broad conditions
as a limiting distribution in many situations involving a large number of variates.
We can determine k in the distribution by requiring that the integral over the entire range be unity. If we let u = h(x-c), we wish

\int dF(x) = \frac{k}{h}\int_{-\infty}^{\infty}e^{-u^{2}}du = 1.

To evaluate the integral I = \int_{-\infty}^{\infty}e^{-u^{2}}du we observe that

I^{2} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-u^{2}-v^{2}}\,du\,dv.

Changing to polar coordinates u = r\cos\theta, v = r\sin\theta, we get

I^{2} = \int_{0}^{2\pi}\!\!\int_{0}^{\infty}re^{-r^{2}}\,dr\,d\theta = \pi.

Therefore, we take k = \frac{h}{\sqrt{\pi}}.
The mean of the distribution is

E(x) = c\,\frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-h^{2}(x-c)^{2}}dx + \frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty}(x-c)\,e^{-h^{2}(x-c)^{2}}dx.

The latter integral is zero because the integrand is an odd function of x - c. So

a = E(x) = c.

The variance is found by integration by parts,

\sigma^{2} = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty}(x-c)^{2}e^{-h^{2}(x-c)^{2}}dx = \frac{1}{2h^{2}}.

We usually write the normal distribution with c and h expressed in terms of a and \sigma^{2}, respectively, i. e.,

(a) f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}.

We shall refer to this distribution as N(a,\sigma^{2}).
To find higher moments (about the mean) it is convenient to use the m. g. f. of the normalized variate \frac{x-a}{\sigma}:

E\!\left(e^{\theta\frac{x-a}{\sigma}}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}e^{\theta\frac{x-a}{\sigma}}\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}dx.

Setting \frac{x-a}{\sigma} = y, the last integral becomes

\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{\theta y-\frac{y^{2}}{2}}dy = e^{\frac{\theta^{2}}{2}}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{(y-\theta)^{2}}{2}}dy = e^{\frac{\theta^{2}}{2}}.

Hence,

(b) \phi(\theta) = e^{\frac{\theta^{2}}{2}}.
It should be noticed that the normal distribution is symmetrical with respect to the line x = a, its mean. The smaller the value of \sigma^{2} is, the greater the concentration about the mean. In fact, \sigma is the distance from the mean to the points of inflection:

[Figure: the normal density f(x), with inflection points at a - \sigma and a + \sigma]

Because of its wide application and because of its theoretical importance, the normal distribution has been the origin of much of the terminology and many of the concepts in statistics.
The integral

\int_{x}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{u^{2}}{2}}du = 1 - F(x)

is widely tabulated; the ordinate

\frac{1}{\sqrt{2\pi}}e^{-\frac{x^{2}}{2}}

is also tabulated in many places. The value of x for which

\int_{-x}^{x}\frac{1}{\sqrt{2\pi}}e^{-\frac{u^{2}}{2}}du = \frac{1}{2}

is called the probable error and is approximately .6745.
It can be readily verified, by applying Theorem (C) of 4.21, that as n \to \infty the normalized variable \frac{x-np}{\sqrt{npq}}, where x is distributed according to the binomial law, has the limiting distribution N(0,1). For we may write

\frac{x-np}{\sqrt{npq}} = \frac{\sum_{i=1}^{n}(x_i-p)}{\sqrt{npq}},

where x_1, x_2, ..., x_n are independently distributed according to the law p(x) = p^{x}(1-p)^{1-x}, (x = 0 or 1). The mean of this distribution is E(x) = \sum_{x=0}^{1}x\,p^{x}(1-p)^{1-x} = p, and the variance is \sigma^{2} = \sum_{x=0}^{1}(x-p)^{2}p^{x}(1-p)^{1-x} = pq. The applicability of Theorem (C) of 4.21 is then obvious.
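This limiting behaviour can be seen numerically by comparing the exact c. d. f. of the normalized binomial variable with N(0,1); a rough sketch (the choice n = 2000, p = 0.3 is ours, and the pmf is computed through logarithms only to avoid overflow):

```python
from math import erf, exp, lgamma, log, sqrt

def binom_pmf(x, n, p):
    # binomial probability, computed via log-gamma to stay in floating range
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log(1 - p))

def binom_cdf_standardized(z, n, p):
    # P((x - np)/sqrt(npq) <= z) for x binomially distributed
    q = 1 - p
    cut = n * p + z * sqrt(n * p * q)
    return sum(binom_pmf(x, n, p) for x in range(n + 1) if x <= cut)

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 2000, 0.3
for z in (-1.0, 0.0, 1.0):
    assert abs(binom_cdf_standardized(z, n, p) - normal_cdf(z)) < 0.03
```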
3.22 The Normal Bivariate Distribution

The extension of the normal probability density function to the case of two variables, x_1 and x_2, is straightforward. We replace (x-a)^{2} by a quadratic form in x_1 - a_1 and x_2 - a_2. The distribution may be written
Ke^{-\frac{1}{2}Q},

where Q = A_{11}y_1^{2} + 2A_{12}y_1y_2 + A_{22}y_2^{2}, y_i = x_i - a_i, and K > 0, A_{11} > 0, A_{22} > 0, A_{12} are constants such that A_{11}A_{22} > A_{12}^{2}. These inequalities on the A's are necessary and sufficient conditions for Q to be a positive definite quadratic form in y_1 and y_2, i. e., Q > 0 unless y_1 = y_2 = 0. We wish to determine K so that the integral of the p. d. f. over the x_1x_2-plane is unity. The integral transforms to
(a) K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-\frac{1}{2}(A_{11}y_1^{2}+2A_{12}y_1y_2+A_{22}y_2^{2})}\,dy_1\,dy_2.

If we let z_1 = y_1 + \frac{A_{12}}{A_{11}}y_2, integrate z_1 and y_2 in (a) from -\infty to \infty, and use the fact that

\int_{-\infty}^{\infty}e^{-\frac{1}{2}ct^{2}}dt = \sqrt{\frac{2\pi}{c}}, \qquad c > 0,

we obtain for (a)

K\,\frac{2\pi}{\sqrt{A_{11}A_{22}-A_{12}^{2}}}.

If the integral is to be unity, we must choose

K = \frac{\sqrt{A_{11}A_{22}-A_{12}^{2}}}{2\pi} = \frac{\sqrt{\Delta}}{2\pi},
where \Delta is the determinant

\Delta = \begin{vmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{vmatrix}.

We may, therefore, write the distribution as

(b) f(x_1,x_2) = \frac{\sqrt{\Delta}}{2\pi}\,e^{-\frac{1}{2}Q},

where

Q = A_{11}(x_1-a_1)^{2} + 2A_{12}(x_1-a_1)(x_2-a_2) + A_{22}(x_2-a_2)^{2}.
In order to find the means, variances, and covariance of x_1 and x_2, it will be convenient to obtain the m. g. f. of (x_1-a_1) and (x_2-a_2), i. e.,

(c) \phi(\theta_1,\theta_2) = E\!\left(e^{\theta_1(x_1-a_1)+\theta_2(x_2-a_2)}\right) = \frac{\sqrt{\Delta}}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{\theta_1y_1+\theta_2y_2-\frac{1}{2}Q}\,dy_1\,dy_2.

Letting x_i - a_i = y_i and completing the square in the exponent, we have

\phi(\theta_1,\theta_2) = e^{\frac{1}{2}R}\,\frac{\sqrt{\Delta}}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-\frac{1}{2}Q'}\,dy_1\,dy_2,

where Q' is a positive definite quadratic form in the shifted variables and

(d) R = \frac{A_{22}\theta_1^{2}+A_{11}\theta_2^{2}-2A_{12}\theta_1\theta_2}{\Delta} = A^{11}\theta_1^{2} + 2A^{12}\theta_1\theta_2 + A^{22}\theta_2^{2}, \qquad A^{ij} = \frac{\text{cofactor of }A_{ij}\text{ in }\Delta}{\Delta}.

Making the change of variables which diagonalizes Q' and integrating with respect to z_1 and z_2, we obtain

(e) \phi(\theta_1,\theta_2) = e^{\frac{1}{2}\left(A^{11}\theta_1^{2}+2A^{12}\theta_1\theta_2+A^{22}\theta_2^{2}\right)}.
Now consider the problem of finding the mean values of x_1 and x_2. We have

E(x_1-a_1) = \left.\frac{\partial\phi}{\partial\theta_1}\right|_{\theta_1=\theta_2=0} = 0.

Hence E(x_1) = a_1. Similarly E(x_2) = a_2.

To find the variances and covariance of x_1 and x_2, we must take second derivatives. Thus to find the variance of x_1 we have

\sigma_1^{2} = E[(x_1-a_1)^{2}] = \left.\frac{\partial^{2}\phi}{\partial\theta_1^{2}}\right|_{\theta_1=\theta_2=0} = A^{11}.

Similarly,

\sigma_2^{2} = A^{22}.

For the covariance, we have

\sigma_{12} = E[(x_1-a_1)(x_2-a_2)] = \left.\frac{\partial^{2}\phi}{\partial\theta_1\partial\theta_2}\right|_{\theta_1=\theta_2=0} = A^{12}.
If the three equations

(f) \sigma_1^{2} = A^{11}, \qquad \sigma_2^{2} = A^{22}, \qquad \sigma_{12} = A^{12}

are solved for A_{11}, A_{12}, A_{22}, we obtain

(g) A_{11} = \frac{1}{\sigma_1^{2}(1-\rho^{2})}, \qquad A_{22} = \frac{1}{\sigma_2^{2}(1-\rho^{2})}, \qquad A_{12} = \frac{-\rho}{\sigma_1\sigma_2(1-\rho^{2})},

where \rho = \sigma_{12}/\sigma_1\sigma_2 is the correlation coefficient between x_1 and x_2.

We may summarize as follows:

Theorem (A): If x_1, x_2 are distributed according to the bivariate normal distribution

(h) f(x_1,x_2) = \frac{\sqrt{\Delta}}{2\pi}\,e^{-\frac{1}{2}\left[A_{11}(x_1-a_1)^{2}+2A_{12}(x_1-a_1)(x_2-a_2)+A_{22}(x_2-a_2)^{2}\right]},

the m. g. f. of (x_1-a_1) and (x_2-a_2) is given by (e); E(x_i) = a_i, (i = 1,2); the variance of x_i is A^{ii}, (i = 1,2), and the covariance between x_1 and x_2 is A^{12}. A_{11}, A_{22}, A_{12} are expressed in terms of the variances and the correlation coefficient between x_1 and x_2 by (g).

Expressing A_{11}, A_{12}, A_{22} in (h) in terms of \sigma_1^{2}, \sigma_2^{2}, and \rho, the distribution (h) may be written as
(i) f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^{2}}}\exp\!\left\{-\frac{1}{2(1-\rho^{2})}\left[\frac{(x_1-a_1)^{2}}{\sigma_1^{2}} - \frac{2\rho(x_1-a_1)(x_2-a_2)}{\sigma_1\sigma_2} + \frac{(x_2-a_2)^{2}}{\sigma_2^{2}}\right]\right\}.

The marginal distribution of (i) with respect to x_1 is the distribution of x_1. Thus integrating (i) with respect to x_2 we obtain as the distribution of x_1

f(x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-\frac{(x_1-a_1)^{2}}{2\sigma_1^{2}}}.

A similar expression holds for the distribution of x_2.
We would also like to know the conditional probability function

f(x_2|x_1) = \frac{f(x_1,x_2)}{f(x_1)}.

Substituting the expressions for f(x_1,x_2) and f(x_1) from (i) and the marginal distribution just found, respectively, we find

f(x_2|x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1-\rho^{2}}}\exp\!\left\{-\frac{\left[x_2-a_2-\rho\frac{\sigma_2}{\sigma_1}(x_1-a_1)\right]^{2}}{2\sigma_2^{2}(1-\rho^{2})}\right\}.

Thus, for a fixed value of x_1, x_2 is distributed according to N\!\left(a_2+\rho\frac{\sigma_2}{\sigma_1}(x_1-a_1),\ \sigma_2^{2}(1-\rho^{2})\right).

In a similar way we can show that the marginal distribution of x_2 is N(a_2,\sigma_2^{2}) and the conditional probability of x_1, given x_2, is N\!\left(a_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-a_2),\ \sigma_1^{2}(1-\rho^{2})\right).
It will be observed that if \rho = 0, the marginal and the conditional probability distributions of x_1 (or x_2) are identical.

Since the conditional distribution of x_2 is N\!\left(a_2+\rho\frac{\sigma_2}{\sigma_1}(x_1-a_1),\ \sigma_2^{2}(1-\rho^{2})\right), the mean value of x_2 for the interval (x_1, x_1+dx_1) is simply a_2+\rho\frac{\sigma_2}{\sigma_1}(x_1-a_1). So the regression function of x_2 on x_1 is linear, that is,

E(x_2|x_1) = a_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-a_1).

Similarly,

E(x_1|x_2) = a_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2-a_2).

Since \sigma_2^{2}(1-\rho^{2}) is the variance of x_2 about the mean a_2+\rho\frac{\sigma_2}{\sigma_1}(x_1-a_1) in the conditional probability distribution, the nearer \rho^{2} is to 1, the smaller is this variance. If \rho = 0, x_2 does not depend on x_1; the two variates are independent and

f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{(x_1-a_1)^{2}}{2\sigma_1^{2}}-\frac{(x_2-a_2)^{2}}{2\sigma_2^{2}}} = f(x_1)f(x_2).
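A quick numerical check of the marginal result: integrating the density (i) over x_2 by a crude midpoint rule should recover the N(a_1, \sigma_1^2) ordinate. A sketch, with illustrative parameter values of our choosing:

```python
from math import exp, pi, sqrt

# illustrative parameters (our choice)
a1, a2, s1, s2, rho = 1.0, -2.0, 1.5, 0.5, 0.8

def f_biv(x1, x2):
    # the bivariate density (i) above
    z = ((x1 - a1)**2 / s1**2
         - 2 * rho * (x1 - a1) * (x2 - a2) / (s1 * s2)
         + (x2 - a2)**2 / s2**2)
    return exp(-z / (2 * (1 - rho**2))) / (2 * pi * s1 * s2 * sqrt(1 - rho**2))

def marginal_x1(x1, h=0.001, span=8.0):
    # crude midpoint integration over x2, truncated at a2 +/- span*s2
    lo = a2 - span * s2
    n = int(2 * span * s2 / h)
    return sum(f_biv(x1, lo + (i + 0.5) * h) for i in range(n)) * h

def n_pdf(x, a, s):
    return exp(-(x - a)**2 / (2 * s**2)) / (sqrt(2 * pi) * s)

for x1 in (0.0, 1.0, 2.5):
    assert abs(marginal_x1(x1) - n_pdf(x1, a1, s1)) < 1e-5
```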
3.23 The Normal Multivariate Distribution

Let us now consider the extension of §3.22 to the case of k variates. Let

f(x_1,x_2,\ldots,x_k) = Ce^{-\frac{1}{2}\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)},

where \|A_{ij}\| is a symmetric, positive definite matrix, that is, A_{ij} = A_{ji} and

\sum_{i,j=1}^{k}A_{ij}t_it_j > 0 \quad \text{for real } t_i \text{ not all zero.}
We wish to determine C so that the integral over the entire range, -\infty < x_i < \infty, is unity. We must have

C\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{-\frac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = 1.

To evaluate this integral, we transform the variables. Let y_i = x_i - a_i. Then the integral becomes

\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{-\frac{1}{2}Q}\,dy_1\cdots dy_k,

where Q = \sum_{i,j=1}^{k}A_{ij}y_iy_j. Now we can write

Q = A_{11}\left(y_1 + \sum_{j=2}^{k}\frac{A_{1j}}{A_{11}}y_j\right)^{2} + \sum_{i,j=2}^{k}\left(A_{ij}-\frac{A_{1i}A_{1j}}{A_{11}}\right)y_iy_j.

Let

z_1 = y_1 + \sum_{j=2}^{k}\frac{A_{1j}}{A_{11}}y_j.
Then the integral becomes

\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{-\frac{1}{2}A_{11}z_1^{2}-\frac{1}{2}\sum_{i,j=2}^{k}A^{(1)}_{ij}y_iy_j}\,dz_1\,dy_2\cdots dy_k, \qquad A^{(1)}_{ij} = A_{ij}-\frac{A_{1i}A_{1j}}{A_{11}}.

The range of z_1 is -\infty < z_1 < \infty.

We should observe that the quadratic form \sum_{i,j=2}^{k}A^{(1)}_{ij}y_iy_j is again positive definite, that is,

\sum_{i,j=2}^{k}A^{(1)}_{ij}s_is_j > 0

for real s_i not all zero. For if there were such a set of s's for which this quadratic form were zero or negative, it would be implied that there is a set of t's, not all zero, for which \sum_{i,j=1}^{k}A_{ij}t_it_j \le 0.
We continue this process, in turn letting

z_2 = y_2 + \sum_{j=3}^{k}\frac{A^{(1)}_{2j}}{A^{(1)}_{22}}y_j, \quad \ldots, \quad z_k = y_k,

and correspondingly

A^{(2)}_{ij} = A^{(1)}_{ij} - \frac{A^{(1)}_{2i}A^{(1)}_{2j}}{A^{(1)}_{22}}, \quad \ldots, \quad A^{(k-1)}_{kk} = A^{(k-2)}_{kk} - \frac{\left(A^{(k-2)}_{k-1,k}\right)^{2}}{A^{(k-2)}_{k-1,k-1}}.

Each quadratic form in this sequence is positive definite by the foregoing argument. The integral becomes

\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{-\frac{1}{2}\left[A_{11}z_1^{2}+A^{(1)}_{22}z_2^{2}+\cdots+A^{(k-1)}_{kk}z_k^{2}\right]}\,dz_1\cdots dz_k.

The final quadratic form is positive definite, so A_{11} > 0, A^{(1)}_{22} > 0, \ldots, A^{(k-1)}_{kk} > 0. Hence we can integrate on each z in turn, using the fact that

\int_{-\infty}^{\infty}e^{-\frac{1}{2}ct^{2}}dt = \sqrt{\frac{2\pi}{c}}, \qquad c > 0.
Therefore, we get

C\,\frac{(2\pi)^{k/2}}{\sqrt{A_{11}A^{(1)}_{22}\cdots A^{(k-1)}_{kk}}} = 1.

To find the value of the product under the radical, let us evaluate by Lagrange's method (known also as pivotal condensation) the determinant of \|A_{ij}\|. If we subtract \frac{A_{12}}{A_{11}} times the first column from the second, etc., we get

(a) |A| = A_{11}\begin{vmatrix} A_{22}-\frac{A_{21}A_{12}}{A_{11}} & \cdots & A_{2k}-\frac{A_{21}A_{1k}}{A_{11}} \\ \vdots & & \vdots \\ A_{k2}-\frac{A_{k1}A_{12}}{A_{11}} & \cdots & A_{kk}-\frac{A_{k1}A_{1k}}{A_{11}} \end{vmatrix} = A_{11}\,\big|A^{(1)}_{ij}\big|.

Continuing in this way, we find the value of the determinant

|A| = A_{11}A^{(1)}_{22}A^{(2)}_{33}\cdots A^{(k-1)}_{kk}.

Therefore, the constant we are seeking is

C = \frac{\sqrt{|A|}}{(2\pi)^{k/2}},

and the normal multivariate p. d. f. is

(b) f(x_1,x_2,\ldots,x_k) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\,e^{-\frac{1}{2}\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}.
At this point we should notice some properties of positive definite quadratic forms and matrices. Since |A| = A_{11}A^{(1)}_{22}\cdots A^{(k-1)}_{kk}, |A| is positive, for each of the factors is a positive constant. Corresponding to each principal minor of \|A_{ij}\| of order h, there is a quadratic form in h variables. This quadratic form is again positive definite. For if there were a set of h t's (not all zero) making this form zero or negative, this set, with the (k-h) other t's zero, would do the same for \sum_{i,j=1}^{k}A_{ij}t_it_j. Since the determinant of a positive definite matrix is positive, it follows that every principal minor is positive. Conversely, if every principal minor is positive, the matrix or the quadratic form is positive definite, for then each A^{(i-1)}_{ii} is positive and the above process of reducing to a sum of squares may be carried out.
The transformation to the z's is linear, of the form

z_i = \sum_{j=1}^{k}b_{ij}y_j,

where b_{ij} = 0 for j < i and b_{ii} = 1. The process we have used proves the theorem that any positive definite quadratic form may be "diagonalized" by a real linear transformation. If we followed this by the transformation

w_i = \sqrt{A^{(i-1)}_{ii}}\;z_i,

we would have reduced the quadratic form to a sum of squares. This last is equivalent to

(c) w_i = \sqrt{A^{(i-1)}_{ii}}\sum_{j=1}^{k}b_{ij}y_j.
Now we wish to show that the mean is E(x_j) = a_j. To do this we differentiate both sides of the following equation with respect to a_h:

\frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{-\frac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = 1.

Since \frac{\partial}{\partial a_h}\left[-\frac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)\right] = \sum_{j=1}^{k}A_{hj}(x_j-a_j), the differentiation of the above equation gives us

\sum_{j=1}^{k}A_{hj}\,E(x_j-a_j) = 0

for h = 1, 2, \ldots, k. This gives us k homogeneous linear equations in the k unknowns E(x_j-a_j). Since the determinant of the coefficient matrix, |A|, is not equal to zero, the only solution to these equations is that all the unknowns be zero, E(x_j-a_j) = 0. So

(d) E(x_j) = a_j, \qquad j = 1, 2, \ldots, k.
Next we wish to show that the covariance of x_i and x_j is

E[(x_i-a_i)(x_j-a_j)] = A^{ij} = \frac{\text{cofactor of }A_{ij}\text{ in }\|A_{ij}\|}{|A|}.

To demonstrate this we differentiate with respect to A_{ij} both sides of the identity

\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{-\frac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = \frac{(2\pi)^{k/2}}{\sqrt{|A|}}.

Differentiating, we have

\int\!\!\cdots\!\int\left[-\frac{2-\delta_{ij}}{2}(x_i-a_i)(x_j-a_j)\right]e^{-\frac{1}{2}\sum A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = -\frac{2-\delta_{ij}}{2}\,\frac{(2\pi)^{k/2}}{|A|^{3/2}}\,(\text{cofactor of }A_{ij}),

where \delta_{ij} = 1 if i = j, and 0 if i \neq j. If we multiply both sides of this equation by -\frac{2}{2-\delta_{ij}}\,\frac{\sqrt{|A|}}{(2\pi)^{k/2}}, the left-hand side is E[(x_i-a_i)(x_j-a_j)] and the right-hand side is A^{ij}. So we have

(e) \sigma_i^{2} = E[(x_i-a_i)^{2}] = A^{ii},

(f) \sigma_{ij} = E[(x_i-a_i)(x_j-a_j)] = A^{ij}, \qquad i \neq j.

We may summarize as follows:

Theorem (A): If x_1, x_2, \ldots, x_k are distributed according to the normal multivariate distribution (b), then E(x_i) = a_i, \sigma_i^{2} = A^{ii}, and \sigma_{ij} = A^{ij}.
Now let us find the joint marginal distribution of x_1, x_2, \ldots, x_r (r < k). To do this we integrate out x_{r+1}, \ldots, x_k, getting

(g) g(x_1,\ldots,x_r) = \frac{\sqrt{|B_{uv}|}}{(2\pi)^{r/2}}\,e^{-\frac{1}{2}\sum_{u,v=1}^{r}B_{uv}(x_u-a_u)(x_v-a_v)}.

We can see this is true if we recall the procedure used in evaluating C: carrying out the reduction beginning with the last variable, if at any stage we had stopped integrating out the z's, we would have had remaining a normal multivariate distribution of the remaining x's.

We wish to find an expression for the B_{uv} in terms of the A_{ij}. We know that the value of E[(x_u-a_u)(x_v-a_v)] is A^{uv} if found from the original distribution and is B^{uv} if found from the marginal distribution. But these two expressions must be equal. Therefore

A^{uv} = B^{uv}.

Hence, to derive \|B_{uv}\| from \|A_{ij}\| we delete from \|A^{ij}\| the last k - r rows and columns (obtaining \|B^{uv}\|) and take the inverse of this matrix.

In particular, suppose r = 1. We find the distribution of x_1 to be

f(x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-\frac{(x_1-a_1)^{2}}{2\sigma_1^{2}}},

where

\sigma_1^{2} = A^{11} = \frac{\text{cofactor of }A_{11}\text{ in }|A|}{|A|}.

Similar distributions exist for the other x's.
This result gives us a simple method of finding the m. g. f. of (x_1-a_1), (x_2-a_2), \ldots, (x_k-a_k), defined by

(h) \phi(\theta_1,\theta_2,\ldots,\theta_k) = E\!\left(e^{\sum_{i=1}^{k}\theta_i(x_i-a_i)}\right) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{\sum_{i}\theta_i(x_i-a_i)-\frac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k.

Consider the expression

(i) \sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j) + 2\sum_{i=1}^{k}A_{0i}(x_0-a_0)(x_i-a_i) + A_{00}(x_0-a_0)^{2},

where A_0 = \|A_{ij}\|, (i,j = 0,1,2,\ldots,k), is taken positive definite. If we set A_{0i} = -\theta_i and (x_0-a_0) = 1, then it will be seen that the exponent in the integrand of (h) differs from -\frac{1}{2} times the expression (i) only by the additive term \frac{1}{2}A_{00}(x_0-a_0)^{2}. But integrating out x_1, \ldots, x_k from the (k+1)-variate normal law with matrix A_0 leaves, by the marginal result above, the normal distribution of x_0 with

(j) \sigma_0^{2} = \frac{1}{B_{00}}, \qquad B_{00} = \frac{|A_0|}{|A|}.

Now by the argument presented in §2.94, we may write

|A_0| = \begin{vmatrix} A_{00} & A_{01} & \cdots & A_{0k} \\ A_{10} & A_{11} & \cdots & A_{1k} \\ \vdots & & & \vdots \\ A_{k0} & A_{k1} & \cdots & A_{kk} \end{vmatrix} = |A|\left(A_{00} - \sum_{i,j=1}^{k}A_{0i}A_{0j}A^{ij}\right).

Therefore we have

(k) B_{00} = A_{00} - \sum_{i,j=1}^{k}A_{0i}A_{0j}A^{ij}.

Substituting this value of B_{00} in (j) and carrying out the integration in (h), we find that (h) reduces to

(l) \phi = e^{\frac{1}{2}\left[A_{00}-B_{00}\right](x_0-a_0)^{2}} = e^{\frac{1}{2}\sum_{i,j=1}^{k}A_{0i}A_{0j}A^{ij}(x_0-a_0)^{2}}.

Setting x_0-a_0 = 1 and A_{0i} = -\theta_i, we therefore obtain the following result:

Theorem (B): If x_1, x_2, \ldots, x_k are distributed according to the normal multivariate law (b), the m. g. f. of (x_1-a_1), (x_2-a_2), \ldots, (x_k-a_k) is

(m) \phi(\theta_1,\theta_2,\ldots,\theta_k) = e^{\frac{1}{2}\sum_{i,j=1}^{k}A^{ij}\theta_i\theta_j}.
The argument leading to Theorem (B) may be readily applied to show that any r (r \le k) linearly independent linear functions of (x_i-a_i), i = 1,2,\ldots,k, are distributed according to a normal r-variate distribution. To show this, let

(n) L_p = \sum_{i=1}^{k}l_{pi}(x_i-a_i), \qquad p = 1,2,\ldots,r,

be the r linearly independent linear functions, i. e., such that there exists no set of constants C_p, (p = 1,2,\ldots,r), not all zero, for which \sum_{p=1}^{r}l_{pi}C_p = 0, i = 1,2,\ldots,k. Let \phi(\theta_1,\theta_2,\ldots,\theta_r) be the m. g. f. of the L_p, i. e.,

(o) \phi(\theta_1,\ldots,\theta_r) = E\!\left(e^{\sum_{p=1}^{r}\theta_pL_p}\right) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}e^{\sum_{i}t_i(x_i-a_i)-\frac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k,

where t_i = \sum_{p=1}^{r}\theta_pl_{pi}. The value of this integral is given by (l) with x_0-a_0 = 1 and A_{0i} = -t_i. Thus

(p) \phi(\theta_1,\ldots,\theta_r) = e^{\frac{1}{2}\sum_{p,q=1}^{r}B_{pq}\theta_p\theta_q},

where B_{pq} = \sum_{i,j=1}^{k}A^{ij}l_{pi}l_{qj}.
Now consider the quadratic form

\sum_{p,q=1}^{r}B_{pq}\theta_p\theta_q = \sum_{i,j=1}^{k}A^{ij}\left(\sum_{p=1}^{r}\theta_pl_{pi}\right)\left(\sum_{q=1}^{r}\theta_ql_{qj}\right).

If \|A^{ij}\| is positive definite and if the L_p are linearly independent, then clearly \|B_{pq}\| is positive definite. We therefore have

Theorem (C): Let x_1, \ldots, x_k be distributed according to the normal multivariate law (b), and let L_p = \sum_{i=1}^{k}l_{pi}(x_i-a_i), (p = 1,2,\ldots,r), be linearly independent linear functions of the x_j. Then the L_p are distributed according to the normal r-variate law
f(L_1,\ldots,L_r) = \frac{1}{(2\pi)^{r/2}\sqrt{|B_{pq}|}}\,e^{-\frac{1}{2}\sum_{p,q=1}^{r}B^{pq}L_pL_q},

where \|B^{pq}\| is the inverse of the matrix \|B_{pq}\|, and B_{pq} = \sum_{i,j=1}^{k}A^{ij}l_{pi}l_{qj}.
Next let us find the conditional p. d. f.

f(x_1|x_2,\ldots,x_k) = \frac{f(x_1,\ldots,x_k)}{g(x_2,\ldots,x_k)},

where g(x_2,\ldots,x_k) is the marginal distribution of the last k - 1 variables. Using the marginal distribution found above, where now \|B^{pq}\| = \|A^{pq}\|, (p,q = 2,\ldots,k), and recalling that |A| = A_{11}\big|A^{(1)}_{ij}\big|, we get

(r) f(x_1|x_2,\ldots,x_k) = \sqrt{\frac{A_{11}}{2\pi}}\;e^{-\frac{A_{11}}{2}\left[(x_1-a_1)+\sum_{j=2}^{k}\frac{A_{1j}}{A_{11}}(x_j-a_j)\right]^{2}}.

Therefore, for fixed values of x_2, \ldots, x_k, we have x_1 normally distributed with variance \frac{1}{A_{11}} and mean

(s) E(x_1|x_2,\ldots,x_k) = a_1 - \sum_{j=2}^{k}\frac{A_{1j}}{A_{11}}(x_j-a_j).

The regression function for the multivariate normal distribution is linear.
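For k = 2 the conditional-mean formula (s) must agree with the bivariate regression of §3.22, and 1/A_{11} with \sigma_1^2(1-\rho^2); a small sketch verifying this by forming \|A_{ij}\| as the inverse of the covariance matrix (parameter values are ours):

```python
# pick variances and a correlation, then invert the 2x2 covariance matrix by hand
s1, s2, rho = 1.3, 0.7, -0.6
det = s1 * s1 * s2 * s2 * (1 - rho * rho)
A11 = s2 * s2 / det
A22 = s1 * s1 / det
A12 = -rho * s1 * s2 / det

# conditional mean slope and variance from the multivariate formulas (r), (s)
slope = -A12 / A11     # coefficient of (x2 - a2) in E(x1 | x2)
cond_var = 1 / A11     # variance of x1 given x2

assert abs(slope - rho * s1 / s2) < 1e-12              # matches rho*sigma1/sigma2
assert abs(cond_var - s1 * s1 * (1 - rho**2)) < 1e-12  # matches sigma1^2 (1 - rho^2)
```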
3.3 Pearson System of Distribution Functions
Thus far we have dealt with special distributions which arise under certain specified conditions. Several attempts have been made to develop a general system of distributions which can describe or closely approximate the true distribution of a random variable.

One of these systems, derived by Karl Pearson, is based upon the differential equation

(a) \frac{dy}{dx} = \frac{(x+a)y}{b+cx+dx^{2}}.

Depending on the values given the constants a, b, c, and d we get a wide variety of distribution functions as solutions of the differential equation. We get J-shaped and U-shaped curves, symmetrical and skewed curves, distributions with finite and infinite ranges.
The normal distribution may be obtained as a solution of the differential equation for c = d = 0 and b < 0. This function is Type VII of Pearson's twelve types of solutions.

Another special case we shall be interested in is d = 0. Then the equation is

\frac{dy}{dx} = \frac{(x+a)y}{b+cx}.

Writing this as

\frac{dy}{y} = \frac{dx}{c} + \left(\frac{ca-b}{c^{2}}\right)\frac{c\,dx}{b+cx},

we see the solution is

y = C\,e^{\frac{x}{c}}(b+cx)^{\frac{ca-b}{c^{2}}}.

Changing the constants, we have

y = Ke^{-\beta x}(x+\alpha)^{\nu-1}, \qquad \beta > 0,\ \nu > 0,

defined for -\alpha < x < \infty, where K is chosen so that K\int_{-\alpha}^{\infty}e^{-\beta x}(x+\alpha)^{\nu-1}dx = 1. This is the Pearson Type III distribution.

To determine K we make the indicated integration. Let z = \beta(x+\alpha). Then

K\int_{-\alpha}^{\infty}e^{-\beta x}(x+\alpha)^{\nu-1}dx = K'\int_{0}^{\infty}z^{\nu-1}e^{-z}dz,

where K' = Ke^{\alpha\beta}\beta^{-\nu}. Therefore we choose K' so that

K'\int_{0}^{\infty}z^{\nu-1}e^{-z}dz = 1.

This last integral is an important function of the exponent \nu denoted by \Gamma(\nu), the gamma function of \nu.
To evaluate \Gamma(\nu) we integrate by parts, using z^{\nu-1} as u and e^{-z}dz as dv:

\Gamma(\nu) = \int_{0}^{\infty}z^{\nu-1}e^{-z}dz = \left[-z^{\nu-1}e^{-z}\right]_{0}^{\infty} + (\nu-1)\int_{0}^{\infty}z^{\nu-2}e^{-z}dz = (\nu-1)\,\Gamma(\nu-1).

This gives us a recursion or a functional equation for \Gamma(\nu). If \nu is an integer,

(b) \Gamma(\nu) = (\nu-1)(\nu-2)\cdots 2\cdot 1\cdot\Gamma(1).

Since

\Gamma(1) = \int_{0}^{\infty}e^{-z}dz = 1,

we have for \nu an integer,

\Gamma(\nu) = (\nu-1)!\,.

It is also easy to evaluate \Gamma(\nu) if \nu is an integer plus \frac{1}{2}. For

\Gamma\!\left(\tfrac{1}{2}\right) = \int_{0}^{\infty}z^{-\frac{1}{2}}e^{-z}dz = 2\int_{0}^{\infty}e^{-u^{2}}du = \sqrt{\pi},

and we have

(c) \Gamma(\nu) = (\nu-1)(\nu-2)\cdots\tfrac{1}{2}\,\sqrt{\pi}.

In general for \nu > 0, \Gamma(\nu) has a finite value, and in any interval (a,b) of values of \nu (0 < a < b), \Gamma(\nu) is continuous. \Gamma(\nu) has a minimum for \nu = 1.46163.
With this determination of K, the Pearson Type III distribution is

(d) f(x) = \frac{\beta^{\nu}}{\Gamma(\nu)}\,e^{-\beta(x+\alpha)}(x+\alpha)^{\nu-1}, \qquad -\alpha < x < \infty.

This distribution for the case \alpha = 0 and \beta = \frac{1}{2} is known as the \chi^{2}-distribution with 2\nu degrees of freedom and is one of the most important distributions in statistics. It and certain applications will be studied in detail in Chapter V.

It will be convenient at this point to find the moment-generating function of the distribution (d) when \alpha = 0. We have

\phi(\theta) = E(e^{\theta x}) = \frac{\beta^{\nu}}{\Gamma(\nu)}\int_{0}^{\infty}e^{\theta x}x^{\nu-1}e^{-\beta x}dx = \frac{\beta^{\nu}}{(\beta-\theta)^{\nu}}.

Therefore, for \beta-\theta > 0, we have

\phi(\theta) = \left(1-\frac{\theta}{\beta}\right)^{-\nu}.

For \beta = \frac{1}{2}, we have

(e) \phi(\theta) = (1-2\theta)^{-\nu},

which is the m. g. f. for the \chi^{2}-distribution with 2\nu degrees of freedom.
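The m. g. f. just found can be checked against a direct numerical integration of the Type III density with \alpha = 0; a sketch (the step size, cutoff, and parameter values are arbitrary choices of ours):

```python
from math import exp, gamma

def type3_mgf_numeric(theta, beta, nu, h=0.001, hi=150.0):
    # midpoint-rule value of E(e^(theta x)) for
    # f(x) = beta^nu / Gamma(nu) * e^(-beta x) x^(nu-1), x > 0
    c = beta**nu / gamma(nu)
    total = 0.0
    for i in range(int(hi / h)):
        x = (i + 0.5) * h
        total += exp(theta * x) * x**(nu - 1) * exp(-beta * x)
    return c * total * h

beta, nu, theta = 0.5, 3.0, 0.1      # beta = 1/2: chi-square with 6 d.f.
exact = (1 - theta / beta)**(-nu)    # equals (1 - 2 theta)^(-nu) here
assert abs(type3_mgf_numeric(theta, beta, nu) - exact) < 1e-3
```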
Next let us consider the solution of the differential equation (a) when dx^{2} + cx + b has two real roots, say g and h (g < h), both different from -a. Then, using partial fractions, we can write the equation as

\frac{dy}{dx} = \frac{(x+a)y}{d(x-g)(x-h)} = y\left(\frac{A}{x-g} - \frac{B}{h-x}\right),

where A and B are functions of g, h, a, and d which we do not need to determine. The solution of this equation is

y = C(x-g)^{A}(h-x)^{B},

where C is a constant of integration. We wish to determine C so that

C\int_{g}^{h}(x-g)^{A}(h-x)^{B}dx = 1.

If we let x = g + (h-g)v, the integral becomes

C(h-g)^{A+B+1}\int_{0}^{1}v^{A}(1-v)^{B}dv.
Because we will need the result later, let us evaluate the integral, namely

\int_{0}^{1}v^{n_1-1}(1-v)^{n_2-1}dv,

which is known as the Beta function of n_1 and n_2, B(n_1,n_2). We wish to show that this is

(f) B(n_1,n_2) = \frac{\Gamma(n_1)\Gamma(n_2)}{\Gamma(n_1+n_2)}.

To do this we consider the product \Gamma(n_1)\Gamma(n_2), where

\Gamma(n_1) = \int_{0}^{\infty}x^{n_1-1}e^{-x}dx,

and similarly for \Gamma(n_2). Letting x = s^{2}, we get

\Gamma(n_1) = 2\int_{0}^{\infty}s^{2n_1-1}e^{-s^{2}}ds.

So we can express \Gamma(n_1)\Gamma(n_2) as the double integral

\Gamma(n_1)\Gamma(n_2) = 4\int_{0}^{\infty}\!\!\int_{0}^{\infty}s^{2n_1-1}t^{2n_2-1}e^{-s^{2}-t^{2}}\,ds\,dt.

If we change to polar coordinates,

s = r\cos\theta, \qquad t = r\sin\theta,

this integral over the positive quadrant of the st-plane becomes

4\int_{0}^{\pi/2}\!\!\int_{0}^{\infty}\cos^{2n_1-1}\theta\,\sin^{2n_2-1}\theta\;r^{2n_1+2n_2-1}e^{-r^{2}}\,dr\,d\theta.

Now

2\int_{0}^{\infty}r^{2n_1+2n_2-1}e^{-r^{2}}dr = \Gamma(n_1+n_2).

If we let \cos^{2}\theta = x, we get

2\int_{0}^{\pi/2}\cos^{2n_1-1}\theta\,\sin^{2n_2-1}\theta\,d\theta = \int_{0}^{1}x^{n_1-1}(1-x)^{n_2-1}dx = B(n_1,n_2).

Combining these results, we have

\Gamma(n_1)\Gamma(n_2) = \Gamma(n_1+n_2)\,B(n_1,n_2),

thus proving our desired result.
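The identity (f) is easy to test numerically against a midpoint-rule evaluation of the Beta integral; a brief sketch (argument pairs chosen by us for illustration):

```python
from math import gamma

def beta_numeric(n1, n2, steps=200_000):
    # midpoint rule for the integral of v^(n1-1) (1-v)^(n2-1) over (0, 1)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += v**(n1 - 1) * (1 - v)**(n2 - 1)
    return total * h

for n1, n2 in [(2.5, 3.0), (1.0, 4.0), (3.5, 1.5)]:
    exact = gamma(n1) * gamma(n2) / gamma(n1 + n2)
    assert abs(beta_numeric(n1, n2) - exact) < 1e-4
```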
Therefore, the Type I distribution may be written in the general form

f(x) = \frac{\Gamma(A+B+2)}{\Gamma(A+1)\Gamma(B+1)\,(h-g)^{A+B+1}}\,(x-g)^{A}(h-x)^{B}, \qquad g < x < h.

There are twelve types of Pearson distributions.

[Figure 5: graphs of several representative Pearson-type distributions]
3.4 The Gram-Charlier Series

Another rather general system of distribution functions, known as the Gram-Charlier series, is based upon the normal distribution and its derivatives. Instead of a number of distributions of different functional forms, this system is composed of an infinite series of terms of a certain kind. Charlier gave a theoretical argument for this system from his development of the hypothesis of elementary errors. We shall regard it, however, as a distribution which has been found satisfactory for fitting or "smoothing" certain empirical distributions.
The generator of this series is the Gaussian or normal distribution. Let

(a) \phi_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}},

and let

(b) \phi_i(x) = \frac{d^{i}}{dx_1^{i}}\,\phi_0(x), \qquad i = 1,2,\ldots,

where x_1 = \frac{x-a}{\sigma}. Then the Gram-Charlier series is

(c) f(x) = b_0\phi_0(x) + b_1\phi_1(x) + b_2\phi_2(x) + \cdots = \phi_0(x)\left\{b_0 - b_1x_1 + b_2[x_1^{2}-1] - \cdots\right\},

where H_n(z) is the nth Hermite polynomial

H_n(z) = z^{n} - \frac{n(n-1)}{2}z^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4}z^{n-4} - \cdots.

By choosing the a, \sigma, and b's properly we obtain a wide variety of distribution functions, which are asymptotic to the x-axis at both ends of the range.
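These Hermite polynomials also satisfy the three-term recursion H_{n+1}(z) = zH_n(z) - nH_{n-1}(z) (a standard fact, not stated in the text above); a sketch checking the recursion against the explicit expansion:

```python
def hermite(n, z):
    # probabilists' Hermite polynomial via H_{n+1} = z H_n - n H_{n-1}
    h0, h1 = 1.0, z
    if n == 0:
        return h0
    for m in range(1, n):
        h0, h1 = h1, z * h1 - m * h0
    return h1

z = 1.7
assert abs(hermite(2, z) - (z**2 - 1)) < 1e-12
assert abs(hermite(3, z) - (z**3 - 3*z)) < 1e-12
assert abs(hermite(4, z) - (z**4 - 6*z**2 + 3)) < 1e-12
```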
Since

\int_{-\infty}^{\infty}f(x)\,dx = b_0,

we choose b_0 = 1. The mean is

\int_{-\infty}^{\infty}x\,f(x)\,dx = a - \sigma b_1.

If a in the expression for x_1 is taken as the mean of the distribution f(x), then b_1 = 0. Taking a as the mean of the distribution, we find

\int_{-\infty}^{\infty}(x-a)^{2}f(x)\,dx = \sigma^{2} + 2\sigma^{2}b_2.

If \sigma, in the expression for x_1, is chosen as the standard deviation of f(x), then b_2 = 0.
It is easily found that the third and fourth moments are

(d) \int_{-\infty}^{\infty}(x-a)^{3}f(x)\,dx = -6\sigma^{3}b_3,

(e) \int_{-\infty}^{\infty}(x-a)^{4}f(x)\,dx = 3\sigma^{4} + 24\sigma^{4}b_4.
Similarly, higher moments can be found. Equations (d) and (e) and similar ones for higher moments give equations for determining the b's in terms of moments. The problem of fitting distributions by the use of moments, however, will be discussed in §6.4.
CHAPTER IV
SAMPLING THEORY
4.1 General Remarks
Suppose x is a random variable with c. d. f. F(x). In accordance with the statement made at the end of §2.3, we define a random sample O_n of size n of values of x from a population with c. d. f. F(x) as a set of n random variables x_1, x_2, ..., x_n with c. d. f.

(a) F(x_1)F(x_2)\cdots F(x_n).

We note that a random sample consists of statistically independent random variables all having the same c. d. f. It is often convenient to think of x_1 as the value of x in the first "drawing" from the population, x_2 as the value of x in the second "drawing", etc.
In the theory of sampling, we are usually interested in c.d.f.'s of one or more functions of the n random variables comprising the sample. Thus, suppose g(x_1,x_2,\ldots,x_n) is such a sample function (Borel measurable). We are interested in determining the c. d. f. of g, i. e., \Pr[g(x_1,x_2,\ldots,x_n) \le g], the value of which is obtained by performing the Stieltjes integration

(b) \int\!\!\cdots\!\int_{R}dF(x_1)\cdots dF(x_n),

where R is the region in the n-dimensional space of the x's for which g(x_1,x_2,\ldots,x_n) \le g.

Similarly, if g_i(x_1,x_2,\ldots,x_n), (i = 1,2,\ldots,k), k \le n, are k Borel measurable functions, we are interested in determining \Pr(g_i(x_1,x_2,\ldots,x_n) \le g_i, (i = 1,2,\ldots,k)).
The random variable x may be a vector with r components, say x^{(1)}, x^{(2)}, \ldots, x^{(r)}, with c. d. f. F(x^{(1)},x^{(2)},\ldots,x^{(r)}). In this case the sample O_n would consist of n random vectors (x_\alpha^{(1)},x_\alpha^{(2)},\ldots,x_\alpha^{(r)}), \alpha = 1,2,\ldots,n, (a total of nr random variables) with c. d. f.

\prod_{\alpha=1}^{n}F(x_\alpha^{(1)},x_\alpha^{(2)},\ldots,x_\alpha^{(r)}).

Again, the sampling problem is to determine the c. d. f. of one or more (Borel measurable) functions of the nr random variables involved. For example, here one may wish to determine the probability theory of such functions as \bar{x}^{(1)}, \sum_{\alpha=1}^{n}(x_\alpha^{(i)}-\bar{x}^{(i)})(x_\alpha^{(j)}-\bar{x}^{(j)}), where \bar{x}^{(i)} = \frac{1}{n}\sum_{\alpha=1}^{n}x_\alpha^{(i)}, i,j = 1,2,\ldots,r, and other symmetrical functions.
In mathematical statistics one is usually interested in relatively simple sample
functions, such as averages, ratios, sums of squares, correlation coefficients, etc. One
is able to obtain simple expressions for sampling distributions for such functions only in certain special cases which will be considered in this and in later chapters. However, one is able to obtain moments of some of the simpler g functions, such as averages, average
sum of squares, etc., under broader conditions. Some of these cases will also be con-
sidered.
4.2 Application of Theorems on Mean Values to Sampling Theory
This section consists of the application of results of §§2.71-2.75 to cases of interest in sampling theory. No assumptions are made about the population distribution
except the existence of first and second moments.
4.21 Distribution of Sample Mean

Let O_n: (x_1,x_2,\ldots,x_n) be a sample from a population with an arbitrary distribution for which the first moment \mu_1' = a exists. Let \bar{x} be the mean of the sample,

\bar{x} = \sum_{i=1}^{n}x_i/n.

Then from equation (b) of §2.74, we have that the expected value of \bar{x} is

E(\bar{x}) = \sum_{i=1}^{n}a_i/n = a,

since a_i = E(x_i) = a. If, furthermore, the population distribution F(x) has a finite variance \sigma^{2}, then since each x_i has the c. d. f. F(x_i), and the x_i are mutually independent, we get from (d) of §2.74 that the variance of \bar{x} is

\sigma_{\bar{x}}^{2} = \sigma^{2}/n.
We gather these results into

Theorem (A): If \bar{x} is the mean of a sample of size n from a population with arbitrary c. d. f. F(x), then if the mean a of F(x) exists,

E(\bar{x}) = a,

and if F(x) has finite variance \sigma^{2}, the variance of \bar{x} is

\sigma_{\bar{x}}^{2} = \sigma^{2}/n.
Having computed the mean and variance of \bar{x} we may now apply Tchebycheff's inequality (§2.71):

(a) \Pr(|\bar{x}-a| > \delta\sigma/\sqrt{n}) \le 1/\delta^{2}.

Let \epsilon be an arbitrary positive number, and define \delta from \delta\sigma/\sqrt{n} = \epsilon. Then (a) may be written

(b) \Pr(|\bar{x}-a| > \epsilon) \le \sigma^{2}/n\epsilon^{2}.

Now a random variable X_n which is defined for n = 1,2,3,\ldots is said to converge stochastically to a value A if

\Pr(|X_n-A| < \epsilon) \to 1 \text{ as } n \to \infty \text{ for every fixed } \epsilon > 0.

Letting n \to \infty in (b) we get

Theorem (B): For an arbitrary population with finite variance, the sample mean converges stochastically to the population mean.
For the sample of size n let G_n(\bar{x}) be the c. d. f. of \bar{x}. From Theorem (A) we see that the limiting form of G_n is the step function

\lim_{n\to\infty}G_n(\bar{x}) = \begin{cases} 0 & \text{for } \bar{x} < a, \\ 1 & \text{for } \bar{x} > a. \end{cases}

In order to "spread out" again the probability which all "piles up" at \bar{x} = a, we might consider the distribution of z = (\bar{x}-a)/h(n), where the function h(n) is chosen so as to keep the variance of z from approaching zero. From (d) of §2.74, we see that

\sigma_z^{2} = \frac{\sigma^{2}}{n\,h^{2}(n)}.

Hence if we choose h(n) = \sigma n^{-\frac{1}{2}}, the variable z has zero mean and unit variance for all n. A beautiful result about the limiting distribution of z as n \to \infty, regardless of the population distribution, is contained in the central limit theorem:

Theorem (C): For an arbitrary population with mean a and finite variance \sigma^{2}, the c. d. f. G_n(z) of

z = \frac{\sqrt{n}(\bar{x}-a)}{\sigma}

approaches the normal distribution N(0,1) as n \to \infty,
(c) G_n(z) \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}e^{-\frac{t^{2}}{2}}dt \quad \text{as } n \to \infty,

uniformly in z.
We make the proof for the case where the m. g. f. \psi(\theta) of the original distribution exists for |\theta| < h, h > 0. Then for |\theta| < h\sigma, the m. g. f. \phi(\theta) of y = (x-a)/\sigma also exists, for \phi(\theta) = e^{-a\theta/\sigma}\psi(\theta/\sigma). Finally, let \Phi(\theta) be the m. g. f. of z:

\Phi(\theta) = E(e^{\theta z}) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty}\exp\!\left[\theta\sum_{i=1}^{n}(x_i-a)/\sqrt{n}\,\sigma\right]dF(x_1)\cdots dF(x_n)

= \left\{\int_{-\infty}^{\infty}\exp\!\left[\theta(x-a)/\sqrt{n}\,\sigma\right]dF(x)\right\}^{n} = \left\{\phi(\theta/\sqrt{n})\right\}^{n}.

Now

\phi(\theta/\sqrt{n}) = \phi(0) + \phi'(0)\,\frac{\theta}{\sqrt{n}} + \phi''(u)\,\frac{\theta^{2}}{2n},

where u lies between 0 and \theta/\sqrt{n}. \phi''(u) is continuous at u = 0, hence \phi''(u) = \phi''(0) + \eta(u), where \eta(u) \to 0 as u \to 0. We recall \phi^{(i)}(0) is the ith moment of y about the origin, so \phi(0) = 1, \phi'(0) = 0, \phi''(0) = 1, and

(d) \Phi(\theta) = \left[1 + \frac{\theta^{2}}{2n} + \frac{\theta^{2}}{2n}\,\eta(\theta_1/\sqrt{n})\right]^{n},

where 0 < \theta_1 < \theta < h\sqrt{n} or -h\sqrt{n} < \theta < \theta_1 < 0. Now choose any \theta and hold it fixed; (d) is valid for n > \theta^{2}/h^{2}. Letting n \to \infty, for every fixed \theta,

\lim_{n\to\infty}\Phi(\theta) = e^{\frac{\theta^{2}}{2}},

which is the m. g. f. for N(0,1). Therefore from Theorem (C) of §2.91, the limiting distribution G_n(z) is given by (c) above.
While the above proof based on the generating function can be shortened, we have purposely given it in a way which permits of generalization to distributions of which it is assumed only that the second moment exists. In this general case one employs, instead of the m. g. f., the characteristic function \Phi^{*}(t) of the distribution, which is related to the generating function \Phi(\theta) by \Phi^{*}(t) = \Phi(it). This always exists for all real t. The argument follows the above step by step, and at the end one appeals to a theorem analogous to (C) of §2.91, which states that if the limit of the characteristic function is the characteristic function of some continuous c. d. f. F*(x), then the limit of the c. d. f. is F*(x) uniformly for all x.
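Theorem (C) can also be illustrated by simulation: for a uniform population, the empirical c. d. f. of z = \sqrt{n}(\bar{x}-a)/\sigma is already close to N(0,1) at moderate n. A sketch (the population, sample size, trial count, and seed are arbitrary choices of ours):

```python
import random
from math import erf, sqrt

random.seed(12345)

def G_n(z, n, trials=20_000):
    # empirical c. d. f. of z = sqrt(n)(xbar - a)/sigma for a uniform(0,1) population
    a, sigma = 0.5, sqrt(1 / 12.0)   # mean and s. d. of uniform(0,1)
    hits = 0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        if sqrt(n) * (xbar - a) / sigma <= z:
            hits += 1
    return hits / trials

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

for z in (-1.0, 0.0, 1.0):
    assert abs(G_n(z, 30) - normal_cdf(z)) < 0.02
```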
4.22 Expected Value of Sample Variance

For the sample \(O_n : (x_1, x_2, \ldots, x_n)\), call S the sum of squared deviations from the sample mean,
\[
S = \sum_{\alpha=1}^{n}(x_\alpha - \bar{x})^2 = \sum_{\alpha=1}^{n} x_\alpha^2 - n\bar{x}^2.
\]
Recalling that E is a linear operator, we get
\[
E(S) = \sum_{\alpha=1}^{n} E(x_\alpha^2) - nE(\bar{x}^2).
\]
Now if the population distribution F(x) has mean a and finite variance \(\sigma^2\),
\[
E(x_\alpha^2) = [\mu_2' \text{ of } F(x)] = \sigma^2 + a^2,
\]
\[
E(\bar{x}^2) = [\mu_2' \text{ of c.d.f. of } \bar{x}] = \sigma_{\bar{x}}^2 + a^2 = a^2 + \sigma^2/n.
\]
Thus
\[
E(S) = (n-1)\sigma^2.
\]
We note that \(E(S/n) \neq \sigma^2\), but if we define
\[
s^2 = S/(n-1),
\]
then
\[
E(s^2) = \sigma^2.
\]
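A quick Monte Carlo check of this result (a sketch, not from the text; the sample size and variance are arbitrary) shows that averaging \(s^2 = S/(n-1)\) recovers \(\sigma^2\), while \(S/n\) is biased low by the factor \((n-1)/n\).

```python
# Sketch: with n = 5 and sigma^2 = 4, E(s^2) = 4 but E(S/n) = 3.2.
import random

random.seed(2)
n, trials, sigma = 5, 40000, 2.0

mean_s2 = 0.0
mean_S_over_n = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    S = sum((x - xbar) ** 2 for x in xs)
    mean_s2 += S / (n - 1) / trials
    mean_S_over_n += S / n / trials
```

A small n is chosen deliberately so the bias of \(S/n\) is clearly visible.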
4.3 Sampling from a Finite Population

Suppose that a population has a finite number N of elements, each characterized by a number \(x^{(i)}\), \(i = 1, 2, \ldots, N\), and that we draw a random sample \(O_n : (x_1, x_2, \ldots, x_n)\) without replacement. The sample may be represented by a point \((x_1, x_2, \ldots, x_n)\) in n dimensions, the possible values of \(x_\alpha\) being \(x^{(1)}, x^{(2)}, \ldots, x^{(N)}\), \(\alpha = 1, 2, \ldots, n\). To simplify the discussion, let us assume that the values of the \(x^{(i)}\) are distinct, \(i = 1, 2, \ldots, N\). Then \(\Pr(x_\alpha = x_\beta \text{ for } \alpha \neq \beta) = 0\). Hence we may think of the range of the sample point as being all points of the lattice \(x_\alpha = x^{(1)}, x^{(2)}, \ldots, x^{(N)}\), \(\alpha = 1, 2, \ldots, n\), but we must ascribe probability zero to any point for which \(x_\alpha = x_\beta\), \(\alpha \neq \beta\). By a random sample we mean that all points of this lattice, barring the exceptional points just mentioned, have the same probability p. To enumerate the points with probability p, we note that to obtain such a point we may choose \(x_1\) in N ways, \(x_2\) in N-1 ways, ..., \(x_n\) in N-n+1 ways. The number of points with probability p is thus \(N(N-1)\cdots(N-n+1)\). Since the total probability of the points of the lattice must add up to unity, we have

(a) \(p = [N(N-1)\cdots(N-n+1)]^{-1}\),

\(p(x_1, x_2, \ldots, x_n) = p\,\delta_{x_1\cdots x_n}\),

where \(\delta_{x_1\cdots x_n} = 0\) if any two \(x_\alpha\) are equal, and \(= 1\) if all \(x_\alpha\) are distinct.

Define the mean a and the variance \(\sigma^2\) of the population from
\[
a = \frac{1}{N}\sum_{i=1}^{N} x^{(i)}, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\bigl(x^{(i)} - a\bigr)^2.
\]
Here we shall consider the problem of determining the mean and variance of the mean of a random sample from this population. Let \(\bar{x}\) be the sample mean,
\[
\bar{x} = \frac{1}{n}\sum_{\alpha=1}^{n} x_\alpha.
\]
We note that the \(x_\alpha\) are not independent; it will later be seen that the correlation between \(x_\alpha\) and \(x_\beta\) is not zero, but we may nevertheless use the formula (f) of 2.74, as pointed out there. Thus

(b) \(\displaystyle E(\bar{x}) = \frac{1}{n}\sum_{\alpha=1}^{n} E(x_\alpha).\)

To calculate \(E(x_\alpha)\) we desire the marginal distribution of \(x_\alpha\). Suppose \(\alpha = 1\). Then \(\Pr(x_1 = x^{(i)})\) is the sum of the probability over all lattice points for which \(x_1 = x^{(i)}\); that is, it is p times the number of lattice points for which \(x_1 = x^{(i)}\) and no two of \(x_1, x_2, \ldots, x_n\) are equal. To compute this number, note that we may choose \(x_1\) in only one way, then \(x_2\) in N-1 ways (\(x_2 \neq x^{(i)}\)), then \(x_3\) in N-2 ways (\(x_3 \neq x^{(i)}\) or \(x_2\)), etc.; so the desired number is \((N-1)(N-2)\cdots(N-n+1)\). The marginal probability of \(x_1\) is thus seen to be
\[
\Pr(x_1 = x^{(i)}) = (N-1)(N-2)\cdots(N-n+1)\,p = \frac{1}{N}
\]
from (a). We get
\[
E(x_1) = \sum_i x^{(i)}\Pr(x_1 = x^{(i)}) = \sum_i x^{(i)}/N = a.
\]
Similarly,
\[
E(x_\alpha) = a, \qquad \alpha = 1, 2, \ldots, n,
\]
and substituting in (b), we find
\[
E(\bar{x}) = a.
\]
To calculate \(\sigma_{\bar{x}}^2\) we use formula (c) of 2.74,
\[
(c)\qquad n^2\sigma_{\bar{x}}^2 = \sum_{\alpha=1}^{n}\sigma_{x_\alpha}^2 + \sum_{\alpha\neq\beta}\rho_{\alpha\beta}\,\sigma_{x_\alpha}\sigma_{x_\beta}.
\]
Employing again the marginal distribution of \(x_\alpha\), we get for the variance of \(x_\alpha\)
\[
\sigma_{x_\alpha}^2 = E(x_\alpha^2) - [E(x_\alpha)]^2 = \sum_i \bigl[x^{(i)}\bigr]^2\Pr(x_1 = x^{(i)}) - a^2 = \sum_i \bigl[x^{(i)}\bigr]^2/N - a^2,
\]
\[
(d)\qquad \sigma_{x_\alpha}^2 = \sigma^2.
\]
To find \(\rho_{\alpha\beta}\) for \(\alpha \neq \beta\), we use the joint marginal distribution of \(x_\alpha\) and \(x_\beta\). To simplify the notation, let \(\alpha = 1\), \(\beta = 2\). Then \(\Pr(x_1 = x^{(i)}, x_2 = x^{(j)};\ i \neq j)\) is p times the number of points for which \(x_1 = x^{(i)}\), \(x_2 = x^{(j)} \neq x^{(i)}\), and no two of \(x_1, x_2, \ldots, x_n\) are equal. To enumerate these points, note that we may choose \(x_1, x_2\) in only one way, then \(x_3\) in N-2 ways, \(x_4\) in N-3 ways, etc. Hence
\[
\Pr(x_1 = x^{(i)}, x_2 = x^{(j)}) = (N-2)(N-3)\cdots(N-n+1)\,p = \frac{1}{N(N-1)},
\]
\[
E[(x_1 - a)(x_2 - a)] = \sum_{i\neq j}\bigl(x^{(i)} - a\bigr)\bigl(x^{(j)} - a\bigr)\Pr(x_1 = x^{(i)}, x_2 = x^{(j)}) = -\frac{\sigma^2}{N-1}.
\]
Likewise,
\[
(e)\qquad \rho_{\alpha\beta} = -1/(N-1) \quad \text{if } \alpha \neq \beta.
\]
Combining (c), (d), (e), we have
\[
n^2\sigma_{\bar{x}}^2 = n\sigma^2 - n(n-1)\sigma^2/(N-1),
\]
\[
\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}.
\]
We note that for \(n = N\), \(\sigma_{\bar{x}}^2 = 0\), that \(\sigma_{\bar{x}}^2\) is a monotonic increasing function of N, and that as \(N \to \infty\), \(\sigma_{\bar{x}}^2 \to \sigma^2/n\) for fixed n.
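The finite-population correction \((N-n)/(N-1)\) can be verified directly (a sketch, not from the text; the population of ten integers is an arbitrary choice).

```python
# Sketch: sampling n = 4 of N = 10 distinct values without replacement;
# the variance of the sample mean should equal (sigma^2/n)*(N-n)/(N-1).
import random

random.seed(3)
pop = list(range(1, 11))
N, n = len(pop), 4
a = sum(pop) / N
sigma2 = sum((x - a) ** 2 for x in pop) / N     # population variance

trials = 50000
means = []
for _ in range(trials):
    s = random.sample(pop, n)                   # without replacement
    means.append(sum(s) / n)

emp_mean = sum(means) / trials
emp_var = sum((m - emp_mean) ** 2 for m in means) / trials
theory_var = (sigma2 / n) * (N - n) / (N - 1)
```

For this population \(a = 5.5\), \(\sigma^2 = 8.25\), and the formula gives \(\sigma_{\bar{x}}^2 = 1.375\), noticeably below the with-replacement value \(\sigma^2/n = 2.0625\).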
4.4 Representative Sampling

Suppose we have a population \(\pi\) consisting of k mutually exclusive sub-populations \(\pi_1, \pi_2, \ldots, \pi_k\), each with c. d. f. \(F_i(x)\). If X is drawn at random from \(\pi\), let
\[
p_i = \Pr(X \text{ from } \pi_i), \qquad \sum_{i=1}^{k} p_i = 1.
\]
To find the c. d. f. of X we may proceed as follows:
\[
F(x) = \Pr(X \le x) = \sum_{i=1}^{k}\Pr(X \text{ from } \pi_i)\cdot\Pr(X \le x \mid X \text{ from } \pi_i) = \sum_{i=1}^{k} p_iF_i(x),
\]
so that
\[
dF(x) = \sum_{i=1}^{k} p_i\,dF_i(x).
\]
Denoting the mean of F(x) by a, and its variance by \(\sigma^2\), we calculate
\[
a = \int_{-\infty}^{+\infty} x\,dF(x) = \sum_{i=1}^{k} p_i\int_{-\infty}^{+\infty} x\,dF_i(x) = \sum_{i=1}^{k} p_ia_i,
\]
where \(a_i\) is the mean of \(F_i(x)\), and
\[
\sigma^2 = \int_{-\infty}^{+\infty} x^2\,dF(x) - a^2 = \sum_{i=1}^{k} p_i\int_{-\infty}^{+\infty} x^2\,dF_i(x) - a^2,
\]
where \(\sigma_i^2\) is the variance of \(F_i(x)\). This may be written
\[
\sigma^2 = \sum_{i=1}^{k} p_i\sigma_i^2 + \sum_{i=1}^{k} p_i(a_i - a)^2.
\]
From 4.21 we have that if \(\bar{x}\) is the mean of a sample of size n drawn at random from \(\pi\), then
\[
E(\bar{x}) = a, \qquad (a)\quad \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}.
\]
4.41 Sampling when the \(p_i\) are known

We suppose the probabilities \(p_i\) are known (the means \(a_i\) are assumed throughout to be unknown). Let us draw a sample \(O_n\) consisting of the following sub-samples: \(O^{(1)}\) (\(n_1\) elements from \(\pi_1\)), \(O^{(2)}\) (\(n_2\) elements from \(\pi_2\)), ..., \(O^{(k)}\) (\(n_k\) elements from \(\pi_k\)); \(\sum_i n_i = n\). Call \(\bar{x}_R\) the mean of \(O_n\), and \(\bar{x}_i\) the mean of \(O^{(i)}\). Then
\[
(a)\qquad \bar{x}_R = \sum_{i=1}^{k}\frac{n_i}{n}\,\bar{x}_i.
\]
If we use \(\bar{x}_R\) as an estimate of the mean a of \(\pi\), we would like to have \(E(\bar{x}_R) = a\). Since we do not know the \(a_i\), we require
\[
\sum_{i=1}^{k}\frac{n_i}{n}\,a_i = \sum_{i=1}^{k} p_ia_i
\]
for all \(a_i\), and this uniquely determines the \(n_i\) as
\[
n_i = np_i.
\]
If \(n_i = np_i\), then \(O_n\) is called a representative sample from \(\pi\). The advantages of representative sampling over random sampling from \(\pi\) are implicit in

Theorem (A): The variance \(\sigma_{\bar{x}_R}^2\) of the mean \(\bar{x}_R\) of a representative sample and the variance \(\sigma_{\bar{x}}^2\) of the mean \(\bar{x}\) of a random sample of the same size have the following relationship:
\[
\sigma_{\bar{x}_R}^2 \le \sigma_{\bar{x}}^2,
\]
the equality holding only when all \(a_i\) are equal.

To prove the theorem, we calculate
\[
(b)\qquad \sigma_{\bar{x}_R}^2 = \sum_{i=1}^{k}\sigma_{\bar{x}_i}^2\,(n_i/n)^2
\]
from (a) and the mutual independence of the \(\bar{x}_i\). Now
\[
\sigma_{\bar{x}_i}^2 = \sigma_i^2/n_i = \sigma_i^2/np_i.
\]
Therefore
\[
\sigma_{\bar{x}_R}^2 = \frac{1}{n}\sum_{i=1}^{k} p_i\sigma_i^2.
\]
Hence (a) of 4.4 may be written
\[
\sigma_{\bar{x}}^2 = \sigma_{\bar{x}_R}^2 + \frac{1}{n}\sum_{i=1}^{k} p_i(a_i - a)^2,
\]
and the theorem follows.
4.42 Sampling when the \(\sigma_i\) are also known

We employ the same notation as in 4.41. If we use the mean \(\bar{x}_R\) of the sample to estimate a, we have just seen that the \(n_i\) are uniquely determined by the requirement \(E(\bar{x}_R) = a\). Suppose however that we use as an estimate of a the statistic
\[
(a)\qquad y = \sum_{i=1}^{k} c_i\bar{x}_i.
\]
How should we choose the \(n_i\), for fixed \(n = \sum_i n_i\), so that
\[
(b)\qquad E(y) = a,
\]
and \(\sigma_y^2\) is minimum (for the class of statistics satisfying (a) and (b))? The method of 4.41 shows that we must take \(c_i = p_i\). Then
\[
(c)\qquad \sigma_y^2 = \sum_{i=1}^{k}\frac{p_i^2\sigma_i^2}{n_i}.
\]
The problem is now to find the \(n_i\) which minimize (c) subject to the condition that \(\sum_{i=1}^{k} n_i = n\). Treating the \(n_i\) as though they were continuous variables, and following the method of Lagrange, we form
\[
G = \sum_{i=1}^{k}\frac{p_i^2\sigma_i^2}{n_i} + \lambda^2\Bigl(\sum_{i=1}^{k} n_i - n\Bigr)
\]
and set
\[
\partial G/\partial n_i = 0, \qquad i = 1, 2, \ldots, k.
\]
We get
\[
-p_i^2\sigma_i^2/n_i^2 + \lambda^2 = 0,
\]
\[
(d)\qquad n_i = p_i\sigma_i/\lambda.
\]
To evaluate \(\lambda\), sum the equations (d) for \(i = 1, 2, \ldots, k\), and solve:
\[
\lambda = \frac{1}{n}\sum_{i=1}^{k} p_i\sigma_i.
\]
The minimizing \(n_i\) are thus
\[
n_i = np_i\sigma_i\Big/\sum_{j=1}^{k} p_j\sigma_j.
\]
Putting these back in (c), we find the minimum variance to be
\[
\sigma_y^2 = \frac{1}{n}\Bigl(\sum_{i=1}^{k} p_i\sigma_i\Bigr)^2.
\]
With the help of the Schwarz inequality,
\[
\Bigl(\sum_i a_ib_i\Bigr)^2 \le \Bigl(\sum_i a_i^2\Bigr)\Bigl(\sum_i b_i^2\Bigr)
\]
(the equality holding only if the \(a_i\) are proportional to the \(b_i\)), where we let \(a_i = p_i^{1/2}\), \(b_i = p_i^{1/2}\sigma_i\), we obtain

Theorem (A): \(\sigma_y^2 \le \sigma_{\bar{x}_R}^2\),

the equality holding only if all \(\sigma_i\) are equal.
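The three variance formulas just derived can be compared directly (a sketch, not from the text; the stratum weights, means, and standard deviations are invented for illustration).

```python
# Sketch: variance of the estimated mean under random sampling,
# representative allocation n_i = n*p_i, and the optimal allocation
# n_i proportional to p_i*sigma_i, for one made-up three-stratum population.
p = [0.5, 0.3, 0.2]        # stratum probabilities (assumed)
a_i = [1.0, 3.0, 8.0]      # stratum means (assumed)
s_i = [1.0, 2.0, 6.0]      # stratum standard deviations (assumed)
n = 100

a = sum(pi * ai for pi, ai in zip(p, a_i))
# sigma^2 = within-stratum part + between-stratum part, as in 4.4
sigma2 = sum(pi * si ** 2 for pi, si in zip(p, s_i)) \
       + sum(pi * (ai - a) ** 2 for pi, ai in zip(p, a_i))

var_random = sigma2 / n                                       # (a) of 4.4
var_repr = sum(pi * si ** 2 for pi, si in zip(p, s_i)) / n    # 4.41
var_opt = sum(pi * si for pi, si in zip(p, s_i)) ** 2 / n     # minimum of (c)
```

The ordering \(\sigma_y^2 \le \sigma_{\bar{x}_R}^2 \le \sigma_{\bar{x}}^2\) holds strictly here because the stratum means and standard deviations both differ.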
4.5 Sampling Theory of Order Statistics

4.51 Simultaneous Distribution of any k Order Statistics. Suppose \(O_n : (x_1, x_2, \ldots, x_n)\) is a sample of size n from a population with probability element f(x)dx, and that \(x_1, x_2, \ldots, x_n\) are arranged in ascending order of magnitude. These ordered values of x will be referred to as order statistics; more specifically, \(x_\alpha\) will be called the \(\alpha\)-th order statistic. Let \(r_1, r_2, \ldots, r_k\) be k integers such that \(1 \le r_1 < r_2 < \cdots < r_k \le n\). The problem to be considered here is that of finding the probability element of \(x_{r_1}, x_{r_2}, \ldots, x_{r_k}\), i. e.
\[
(a)\qquad f(x_{r_1}, x_{r_2}, \ldots, x_{r_k})\,dx_{r_1}\,dx_{r_2}\cdots dx_{r_k}.
\]
Let \(I_1, I_2, I_3, \ldots, I_{2k+1}\) be the 2k+1 intervals
\[
(b)\qquad (-\infty, x_{r_1}),\ (x_{r_1}, x_{r_1}\!+dx_{r_1}),\ (x_{r_1}\!+dx_{r_1}, x_{r_2}),\ (x_{r_2}, x_{r_2}\!+dx_{r_2}),\ \ldots,\ (x_{r_k}\!+dx_{r_k}, +\infty),
\]
and let
\[
\int_{I_i} f(x)\,dx = q_i, \qquad i = 1, 2, \ldots, 2k+1.
\]
The problem of finding the probability element (a) is identical with that of finding the probability (to terms of order \(dx_{r_1}dx_{r_2}\cdots dx_{r_k}\)) that, if a sample of n elements is drawn from a multinomial population with classes \(I_1, I_2, \ldots, I_{2k+1}\), then \(r_1 - 1\) elements will fall in \(I_1\), 1 element in \(I_2\), \(r_2 - r_1 - 1\) elements in \(I_3\), 1 element in \(I_4\), ..., and \(n - r_k\) elements in \(I_{2k+1}\). It follows from the multinomial law (3.12) that the probability of such a partition is
\[
(c)\qquad \frac{n!}{(r_1-1)!\,1!\,(r_2-r_1-1)!\,1!\cdots(n-r_k)!}\;q_1^{\,r_1-1}q_2\,q_3^{\,r_2-r_1-1}q_4\cdots q_{2k+1}^{\,n-r_k}.
\]
Substituting the values of the \(q_i\) and noting that, to within terms of order \(dx_{r_i}\),
\[
(d)\qquad \int_{I_{2i}} f(x)\,dx = f(x_{r_i})\,dx_{r_i} \quad\text{and}\quad \int_{I_{2i+1}} f(x)\,dx = \int_{x_{r_i}}^{x_{r_{i+1}}} f(x)\,dx,
\]
we have
\[
(e)\qquad f(x_{r_1}, \ldots, x_{r_k})\,dx_{r_1}\cdots dx_{r_k} = \frac{n!}{(r_1-1)!\,(r_2-r_1-1)!\cdots(n-r_k)!}\Bigl[\int_{-\infty}^{x_{r_1}}\!f(x)\,dx\Bigr]^{r_1-1}\Bigl[\int_{x_{r_1}}^{x_{r_2}}\!f(x)\,dx\Bigr]^{r_2-r_1-1}\!\!\cdots\Bigl[\int_{x_{r_k}}^{+\infty}\!f(x)\,dx\Bigr]^{n-r_k} f(x_{r_1})\,dx_{r_1}\cdots f(x_{r_k})\,dx_{r_k}.
\]
The distribution function (e) has many applications, some of which will now be considered briefly.

4.52 Distribution of Largest (or Smallest) Variate

In this case \(k = 1\), \(r_1 = n\); (e) of 4.51 then becomes the probability element of the largest element \(x_n\),
\[
n\Bigl[\int_{-\infty}^{x_n} f(x)\,dx\Bigr]^{n-1} f(x_n)\,dx_n,
\]
a similar expression holding for the probability element of the smallest element.
4.53 Distribution of Median

In this case let the number of elements in the sample be 2n+1. We would then have \(k = 1\), \(r_1 = n+1\), and (e) of 4.51 will be the probability element of the sample median \(x_{n+1}\). Denoting the median by \(\tilde{x}\), we have
\[
(a)\qquad \frac{(2n+1)!}{(n!)^2}\Bigl[\int_{-\infty}^{\tilde{x}} f(x)\,dx\Bigr]^{n}\Bigl[\int_{\tilde{x}}^{\infty} f(x)\,dx\Bigr]^{n} f(\tilde{x})\,d\tilde{x}.
\]
The asymptotic distribution of the median for large n may be derived from (a). If \(x_0\) is the population median, then \(\int_{-\infty}^{x_0} f(x)\,dx = \tfrac{1}{2}\). Therefore
\[
\int_{-\infty}^{\tilde{x}} f(x)\,dx = \tfrac{1}{2} + \int_{x_0}^{\tilde{x}} f(x)\,dx \quad\text{and}\quad \int_{\tilde{x}}^{\infty} f(x)\,dx = \tfrac{1}{2} - \int_{x_0}^{\tilde{x}} f(x)\,dx,
\]
and hence (a) may be written as
\[
(b)\qquad \frac{(2n+1)!}{2^{2n}(n!)^2}\Bigl[1 - 4\Bigl(\int_{x_0}^{\tilde{x}} f(x)\,dx\Bigr)^2\Bigr]^{n} f(\tilde{x})\,d\tilde{x}.
\]
We may write \(\int_{x_0}^{\tilde{x}} f(x)\,dx = \bar{f}\,(\tilde{x} - x_0)\), where
\[
\min_{x\in I} f(x) \le \bar{f} \le \max_{x\in I} f(x)
\]
and I is the interval \((x_0, \tilde{x})\) or \((\tilde{x}, x_0)\). Let \(\sqrt{n}(\tilde{x} - x_0) = y\). Then (b) becomes
\[
(c)\qquad \frac{(2n+1)!}{2^{2n}(n!)^2\sqrt{n}}\Bigl[1 - \frac{4\bar{f}^{\,2}y^2}{n}\Bigr]^{n} f(x_0 + y/\sqrt{n})\,dy.
\]
We now choose any value of y, hold it fixed, and let \(n \to \infty\). If f(x) is continuous at \(x = x_0\) and \(f(x_0) \neq 0\), then \(f(x_0 + y/\sqrt{n}) \to f(x_0)\), \(\bar{f} \to f(x_0)\), and with the help of Stirling's formula for the factorials we thus get as the limit of (c), as \(n \to \infty\),
\[
(d)\qquad \frac{1}{\sqrt{2\pi}\,\sigma_y}\,e^{-y^2/2\sigma_y^2}\,dy,
\]
where \(\sigma_y^2 = 1/8[f(x_0)]^2\). Hence the median \(\tilde{x}\) in samples of size 2n+1 is asymptotically normally distributed with mean \(x_0\) and variance \(1/8n[f(x_0)]^2\). It is of interest to note that this asymptotic distribution depends only on the \(x_0\) and \(f(x_0)\) of the population.

Example: For the normal distribution
\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-a)^2/2\sigma^2}
\]
we have \(x_0 = a\), \(f(x_0) = 1/\sqrt{2\pi}\,\sigma\). Therefore, the variance \(\sigma_{\tilde{x}}^2\) of \(\tilde{x}\) in samples of size 2n+1 from a normal distribution with variance \(\sigma^2\) is \(\pi\sigma^2/4n\), approximately. It will be recalled from 4.21 that the variance \(\sigma_{\bar{x}}^2\) of the mean of a sample of size 2n+1 is \(\sigma^2/(2n+1)\). Hence, for large samples from a normal population, the mean has smaller variance than the median.

In a similar manner one could treat the problem of finding the sampling distribution of the lower quartile of a sample (the (n+1)st element in rank order in a sample of size 4n+3), and other particular order statistics.
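The comparison in the example above is readily checked by simulation (a sketch, not from the text; sample size and trial count are arbitrary).

```python
# Sketch: in N(0,1) samples of size 2n+1 = 101, the variance of the
# median should be near pi/(4n) = 0.0157, larger than the mean's 1/101.
import math
import random
import statistics

random.seed(6)
n, trials = 50, 20000
meds, means = [], []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(2 * n + 1)]
    meds.append(statistics.median(xs))
    means.append(statistics.fmean(xs))

var_med = statistics.pvariance(meds)
var_mean = statistics.pvariance(means)
theory_med = math.pi / (4 * n)      # pi*sigma^2/4n with sigma = 1
```

The ratio of the two variances is close to \(\pi/2 \approx 1.57\), the classical efficiency deficit of the median relative to the mean for normal populations.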
4.54 Distribution of Sample Range

The joint distribution of the largest and smallest values of x in the sample is given by (e) of 4.51 with \(k = 2\), \(r_1 = 1\), \(r_2 = n\). We have
\[
(a)\qquad n(n-1)\Bigl[\int_{x_1}^{x_n} f(x)\,dx\Bigr]^{n-2} f(x_1)\,f(x_n)\,dx_1\,dx_n.
\]
To obtain the distribution of the sample range R, we make the transformation
\[
(b)\qquad x_1 = S, \qquad x_n = S + R,
\]
and integrate the resulting distribution with respect to S.

Example: Suppose x has the rectangular distribution
\[
(c)\qquad f(x) = 1/r, \quad 0 < x < r; \qquad = 0, \text{ otherwise}.
\]
We have for (a),
\[
(d)\qquad n(n-1)\,r^{-n}(x_n - x_1)^{n-2}\,dx_1\,dx_n.
\]
Applying transformation (b) and integrating with respect to S from 0 to r - R, we obtain as the probability element of the range in samples of size n from the rectangular distribution
\[
(e)\qquad n(n-1)\,r^{-n}R^{n-2}(r - R)\,dR.
\]
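From (e), R/r follows a Beta(n-1, 2) law, so \(E(R) = r(n-1)/(n+1)\); this consequence can be checked by simulation (a sketch, not from the text; the parameters are arbitrary).

```python
# Sketch: mean range of n = 6 draws from U(0, r) with r = 2 should be
# r*(n-1)/(n+1) = 10/7, per the density n(n-1) r^-n R^(n-2) (r-R).
import random

random.seed(7)
n, r, trials = 6, 2.0, 40000
ranges = []
for _ in range(trials):
    xs = [random.uniform(0.0, r) for _ in range(n)]
    ranges.append(max(xs) - min(xs))

emp_ER = sum(ranges) / trials
theory_ER = r * (n - 1) / (n + 1)
```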
4.55 Tolerance Limits

The joint distribution of the smallest and largest values of x in the sample is given by (a) of 4.54. Now suppose we set
\[
(a)\qquad \int_{-\infty}^{x_1} f(x)\,dx = u, \qquad \int_{x_1}^{x_n} f(x)\,dx = v.
\]
We have
\[
\frac{\partial(u, v)}{\partial(x_1, x_n)} = f(x_1)\,f(x_n),
\]
and hence the joint distribution of u and v is
\[
(b)\qquad n(n-1)\,v^{n-2}\,du\,dv,
\]
and the region of non-zero probability density is the triangle bounded by \(u = 0\), \(v = 0\), \(u + v = 1\). The probability element (b) clearly does not depend on the probability density function f(x). Integrating with respect to u from 0 to 1-v, we find the probability element of v to be
\[
(c)\qquad n(n-1)\,v^{n-2}(1 - v)\,dv.
\]
It will be seen that v is the amount of the probability in the distribution f(x) included between \(x_1\) and \(x_n\) (or, statistically speaking, it is the proportion of the population included between \(x_1\) and \(x_n\), i. e. between the least and greatest values of a sample of size n). From expression (c) one can determine the sample size n such that the probability is \(\alpha\) that at least 100β% of the population will be included between the least and greatest value of the sample. Such a value of n would be obtained by solving the following equation for n:
\[
(d)\qquad n(n-1)\int_{\beta}^{1} v^{n-2}(1 - v)\,dv = \alpha,
\]
or
\[
(e)\qquad n\beta^{n-1} - (n-1)\beta^{n} = 1 - \alpha.
\]
Example: For \(\alpha = .95\) and \(\beta = .99\), we find \(n \approx 473\). Thus, if a sample of 473 cases is drawn from a population in which the random variable x is continuous, the probability is .95 that the least and greatest values of x in the sample will include at least 99% of the population.
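Equation (e) has no closed-form solution for n, but a short search reproduces the figure in the example (a sketch, not from the text).

```python
# Sketch: find the smallest n with n*beta^(n-1) - (n-1)*beta^n <= 1-alpha,
# i.e. probability >= alpha that the sample extremes cover 100*beta%
# of the population.  For alpha = .95, beta = .99 this gives n = 473.
alpha, beta = 0.95, 0.99

def miss_prob(n, beta):
    # Pr(v < beta) = n*beta^(n-1) - (n-1)*beta^n, from integrating (c)
    return n * beta ** (n - 1) - (n - 1) * beta ** n

n = 2
while miss_prob(n, beta) > 1 - alpha:
    n += 1
```

`miss_prob` is decreasing in n for these values, so the loop stops at the smallest admissible sample size.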
\(x_1\) and \(x_n\) are examples of tolerance limits. More generally, two functions of the sample values, say \(L_1(x_1, x_2, \ldots, x_n)\) and \(L_2(x_1, x_2, \ldots, x_n)\), will be called 100β% distribution-free tolerance limits at probability level \(\alpha\) if
\[
(f)\qquad \Pr\Bigl(\int_{L_1}^{L_2} f(x)\,dx \ge \beta\Bigr) = \alpha
\]
for all possible probability density functions f(x).

If the functional form of f(x) is known but depends on one or more parameters \(\theta_1, \theta_2, \ldots, \theta_h\), and if \(L_1\) and \(L_2\) are such that (f) holds for all possible values of the parameters, we shall call \(L_1\) and \(L_2\) 100β% parameter-free tolerance limits at probability level \(\alpha\).

If we denote by \(u_1, u_2, \ldots, u_{n+1}\) the quantities
\[
\int_{-\infty}^{x_1} f(x)\,dx,\quad \int_{x_1}^{x_2} f(x)\,dx,\quad \ldots,\quad \int_{x_n}^{+\infty} f(x)\,dx,
\]
respectively, it is easy to verify, in a manner similar to our treatment of the distribution of u and v, that the probability element of \(u_1, u_2, \ldots, u_n\) is
\[
n!\,du_1\,du_2\cdots du_n,
\]
a result which is independent of f(x). The domain over which this density function is defined is the region for which \(u_i \ge 0\) \((i = 1, 2, \ldots, n)\) and \(\sum_{i=1}^{n} u_i \le 1\).
4.6 Mean Values of Sample Moments when Sample Values are Grouped; Sheppard Corrections

Suppose that x is a continuous random variable having probability element f(x)dx, and that \(O_n\) is a sample from a population having this distribution. Let the x axis be divided into non-overlapping intervals of equal length δ, suppose \(I_0\) is the interval including the origin, and let h be the x-coordinate of the center of \(I_0\). Denote the intervals by \(\ldots, I_{-2}, I_{-1}, I_0, I_1, I_2, \ldots\), where the end points of \(I_i\) are \(\bigl(h + (i - \tfrac{1}{2})\delta,\ h + (i + \tfrac{1}{2})\delta\bigr)\), \(i = \ldots, -2, -1, 0, 1, 2, \ldots\). Let
\[
(a)\qquad p_i = \int_{I_i} f(x)\,dx,
\]
the probability associated with \(I_i\). If f(x) is identically zero outside some finite interval there will be only a finite number of non-zero \(p_i\); otherwise the \(p_i\) will form a convergent series. Let \(n_i\) be the number of x's in \(O_n\) falling into \(I_i\), and let the value of each of these x's be replaced by \(h + i\delta\), the midpoint of \(I_i\). Let \({}_{(\delta)}M_r'\) be the r-th "grouped" moment of the sample, defined as follows:
\[
(b)\qquad {}_{(\delta)}M_r' = \frac{1}{n}\sum_i n_i\,(h + i\delta)^r.
\]
It will be noted that \({}_{(\delta)}M_r'\) is the "grouped" analogue of
\[
(c)\qquad M_r' = \frac{1}{n}\sum_{\alpha=1}^{n} x_\alpha^r,
\]
where \(x_1, x_2, \ldots, x_n\) are the values of x in the sample. In fact \(M_r' = \lim_{\delta\to 0}\,{}_{(\delta)}M_r'\). It is easy to verify that \(E(M_r') = \mu_r'\), where
\[
(d)\qquad \mu_r' = \int_{-\infty}^{\infty} x^r f(x)\,dx.
\]
The problem to be considered here is that of finding \(E({}_{(\delta)}M_r')\), where h is a continuous random variable distributed uniformly (i. e. with probability element \(\frac{1}{\delta}\,dh\)) on the interval \((-\tfrac{1}{2}\delta, \tfrac{1}{2}\delta)\). For a given δ, the random variables involved in the grouping problem are the \(n_i\) and h. The conditional probability law of the \(n_i\), given h, is the multinomial distribution
\[
(e)\qquad \frac{n!}{\cdots n_{-2}!\,n_{-1}!\,n_0!\,n_1!\,n_2!\cdots}\;\cdots p_{-2}^{\,n_{-2}}\,p_{-1}^{\,n_{-1}}\,p_0^{\,n_0}\,p_1^{\,n_1}\,p_2^{\,n_2}\cdots,
\]
where the summation in any expectation extends over all positive integral or zero values of the \(n_i\) such that \(\sum_i n_i = n\). The m. g. f. of \({}_{(\delta)}M_r'\) is
\[
(g)\qquad \phi(\theta) = E\bigl(e^{\theta\,{}_{(\delta)}M_r'}\bigr) = \frac{1}{\delta}\int_{-\delta/2}^{\delta/2}\Bigl[\sum_i p_i\,e^{\theta(h+i\delta)^r/n}\Bigr]^n dh.
\]
If the m. g. f. does not exist, then the characteristic function (obtained by replacing \(\theta\) by \(i\theta\), \(i = \sqrt{-1}\)) will exist, since the \(p_i\) are positive and form a convergent series if there is not a finite number of them. We now have
\[
(h)\qquad E\bigl({}_{(\delta)}M_r'\bigr) = \phi'(0) = \frac{1}{\delta}\int_{-\delta/2}^{\delta/2}\sum_i p_i\,(h+i\delta)^r\,dh.
\]
Making use of (a) we may write
\[
(i)\qquad E\bigl({}_{(\delta)}M_r'\bigr) = \frac{1}{\delta}\sum_i\int_{-\delta/2}^{\delta/2}\int_{I_i} f(x)\,(h+i\delta)^r\,dx\,dh.
\]
Setting \(h + i\delta = y\), we have
\[
(j)\qquad E\bigl({}_{(\delta)}M_r'\bigr) = \frac{1}{\delta}\int_{-\infty}^{\infty}\int_{y-\frac{1}{2}\delta}^{y+\frac{1}{2}\delta} f(x)\,y^r\,dx\,dy.
\]
Interchanging the order of integration, we obtain
\[
(k)\qquad E\bigl({}_{(\delta)}M_r'\bigr) = \frac{1}{\delta}\int_{-\infty}^{\infty} f(x)\int_{x-\frac{1}{2}\delta}^{x+\frac{1}{2}\delta} y^r\,dy\,dx = \frac{1}{(r+1)\delta}\int_{-\infty}^{\infty} f(x)\Bigl[\bigl(x+\tfrac{1}{2}\delta\bigr)^{r+1} - \bigl(x-\tfrac{1}{2}\delta\bigr)^{r+1}\Bigr]dx.
\]
In particular, for r = 1, 2, 3, (k) becomes
\[
E\bigl({}_{(\delta)}M_1'\bigr) = \int_{-\infty}^{\infty} x f(x)\,dx = \mu_1',
\]
\[
E\bigl({}_{(\delta)}M_2'\bigr) = \int_{-\infty}^{\infty}\Bigl(x^2 + \frac{\delta^2}{12}\Bigr)f(x)\,dx = \mu_2' + \frac{\delta^2}{12},
\]
\[
E\bigl({}_{(\delta)}M_3'\bigr) = \int_{-\infty}^{\infty}\Bigl(x^3 + \frac{\delta^2}{4}\,x\Bigr)f(x)\,dx = \mu_3' + \frac{\delta^2}{4}\,\mu_1'.
\]
It will be noted that \({}_{(\delta)}M_1'\), \(\bigl({}_{(\delta)}M_2' - \frac{\delta^2}{12}\bigr)\), and \(\bigl({}_{(\delta)}M_3' - \frac{\delta^2}{4}\,{}_{(\delta)}M_1'\bigr)\) are unbiased (6.21) estimates of \(\mu_1', \mu_2', \mu_3'\). The quantities \(\frac{\delta^2}{12}\), \(\frac{\delta^2}{4}\,{}_{(\delta)}M_1'\) are called Sheppard corrections of \({}_{(\delta)}M_2'\) and \({}_{(\delta)}M_3'\). Such corrections can be obtained for higher values of r by further use of (k). Similarly one can determine Sheppard corrections for grouped moments about the sample mean.
4.7 Appendix on Lagrange's Multipliers

We frequently encounter the problem of finding the extreme (maximum or minimum) value of a function \(g(x_1, \ldots, x_n)\) subject to side conditions
\[
(a)\qquad \phi_i(x_1, \ldots, x_n) = 0, \qquad i = 1, \ldots, k < n.
\]
To insure the independence of the conditions (a) we assume that, for some set of k of the x's, the Jacobian of \(\phi_1, \ldots, \phi_k\) with respect to those x's does not vanish at the extremum. To simplify the notation, assume these are \(x_1, \ldots, x_k\). At the extremum, \(dg = 0\):
\[
(b)\qquad \sum_{i=1}^{n}\frac{\partial g}{\partial x_i}\,dx_i = 0,
\]
where \(dx_1, \ldots, dx_k\) are functions of \(dx_{k+1}, \ldots, dx_n\), determined by \(d\phi_j = 0\), i. e.,
\[
(c)\qquad \sum_{i=1}^{n}\frac{\partial\phi_j}{\partial x_i}\,dx_i = 0, \qquad j = 1, \ldots, k,
\]
and \(dx_{k+1}, \ldots, dx_n\) are completely arbitrary numbers. In order that (b) be satisfied for all \(dx_1, \ldots, dx_n\) which are arbitrary except that they must satisfy (c), a necessary and sufficient condition is that the equation (b) be a linear combination of the equations (c), i. e., that for some \(\lambda_1, \ldots, \lambda_k\),
\[
(d)\qquad \frac{\partial g}{\partial x_i} = -\sum_{j=1}^{k}\lambda_j\,\frac{\partial\phi_j}{\partial x_i}, \qquad i = 1, \ldots, n.
\]
We see that the conditions (d) are obtained if we employ the following rule: To minimize g subject to (a), form the function
\[
G(x_1, \ldots, x_n;\ \lambda_1, \ldots, \lambda_k) = g + \sum_{j=1}^{k}\lambda_j\phi_j
\]
and set
\[
(e)\qquad \frac{\partial G}{\partial x_i} = 0, \qquad i = 1, \ldots, n.
\]
The equations (a) and (e) constitute a system of n+k equations in the n+k unknowns \(x_1, \ldots, x_n;\ \lambda_1, \ldots, \lambda_k\). For an extremum it is necessary that \(x_1, \ldots, x_n\) satisfy these equations. In most applications in statistics the question of sufficiency can be settled in an obvious way.
CHAPTER V

SAMPLING FROM A NORMAL POPULATION

Since the normal distribution appears in such a wide variety of problems, we shall consider in detail certain sampling problems from such a distribution. Many distributions are important in statistics for the reason that they arise in connection with sampling from a normal universe. In the present chapter, we shall only consider certain sampling problems, deriving certain sampling distributions. The application of these sampling problems to problems of significance tests, statistical estimation, etc., will be made in later chapters.

5.1 Distribution of Sample Mean

An important property of the normal distribution is the so-called reproductive property. We wish to demonstrate that a linear function of normally distributed variates is again normally distributed. Suppose \(x_1, x_2, \ldots, x_n\) are distributed independently according to \(N(a_1, \sigma_1^2)\), \(N(a_2, \sigma_2^2)\), ..., \(N(a_n, \sigma_n^2)\), respectively. Let us find the distribution of the linear form \(L = l_1x_1 + l_2x_2 + \cdots + l_nx_n\). According to the results of 2.74, the expected value of L is
\[
(a)\qquad E(l_1x_1 + l_2x_2 + \cdots + l_nx_n) = l_1E(x_1) + l_2E(x_2) + \cdots + l_nE(x_n) = l_1a_1 + l_2a_2 + \cdots + l_na_n.
\]
The joint distribution of the x's is
\[
(b)\qquad \frac{1}{(2\pi)^{n/2}\sigma_1\cdots\sigma_n}\,e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - a_i)^2/\sigma_i^2}\,dx_1\cdots dx_n.
\]
From this we shall find the moment generating function of the linear form minus its mean, L - E(L):
\[
(c)\qquad \phi(\theta) = E\bigl(e^{\theta(L - E(L))}\bigr) = \frac{1}{(2\pi)^{n/2}\sigma_1\cdots\sigma_n}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} e^{\theta\sum_i l_i(x_i - a_i) - \frac{1}{2}\sum_i(x_i - a_i)^2/\sigma_i^2}\,dx_1\cdots dx_n
= \prod_{i=1}^{n} e^{\frac{1}{2}\theta^2 l_i^2\sigma_i^2} = e^{\frac{1}{2}\theta^2\sum_{i=1}^{n} l_i^2\sigma_i^2}.
\]
This is the moment generating function for the probability element
\[
(d)\qquad \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-y^2/2\sigma^2}\,dy,
\]
where
\[
\sigma^2 = \sum_{i=1}^{n} l_i^2\sigma_i^2.
\]
Therefore L is distributed according to \(N\bigl(\sum_i l_ia_i,\ \sum_i l_i^2\sigma_i^2\bigr)\). We have the

Theorem (A): If \(x_1, x_2, \ldots, x_n\) are independently distributed according to \(N(a_1, \sigma_1^2)\), \(N(a_2, \sigma_2^2)\), ..., \(N(a_n, \sigma_n^2)\), respectively, then any linear function of the x's, \(L = l_1x_1 + l_2x_2 + \cdots + l_nx_n\), is distributed according to \(N\bigl(\sum_i l_ia_i,\ \sum_i l_i^2\sigma_i^2\bigr)\).
From this result we can easily derive the distribution of the mean of a sample. Consider a sample, \(O_n\), of n observations \(x_1, x_2, \ldots, x_n\). The x's are independently distributed, each according to \(N(a, \sigma^2)\). If we take \(l_1 = l_2 = \cdots = l_n = \frac{1}{n}\), the linear form L is simply \(\bar{x}\), the mean of the sample. Its expected value is
\[
(e)\qquad \tfrac{1}{n}a + \tfrac{1}{n}a + \cdots + \tfrac{1}{n}a = a;
\]
its variance is
\[
\tfrac{1}{n^2}\sigma^2 + \tfrac{1}{n^2}\sigma^2 + \cdots + \tfrac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n}.
\]
Therefore, we have the following corollary to Theorem (A):

Corollary (A₁): If \(O_n : x_1, x_2, \ldots, x_n\) is a sample from the normal population \(N(a, \sigma^2)\), then the sample mean \(\bar{x}\) is distributed according to \(N\bigl(a, \frac{\sigma^2}{n}\bigr)\).
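The corollary is easily confirmed by simulation (a sketch, not from the text; the population parameters and sample size are arbitrary).

```python
# Sketch: means of n = 9 draws from N(2, 9) should have mean 2 and
# variance 9/9 = 1, per Corollary (A1).
import random
import statistics

random.seed(10)
a, sigma, n, trials = 2.0, 3.0, 9, 30000
xbars = [statistics.fmean(random.gauss(a, sigma) for _ in range(n))
         for _ in range(trials)]

emp_mean = statistics.fmean(xbars)
emp_var = statistics.pvariance(xbars)
```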
5.11 Distribution of Difference between Two Sample Means

Suppose we have two samples, \(O_n\) and \(O_{n'}\), of n and n' observations drawn from normal populations \(N(a, \sigma^2)\) and \(N(a', \sigma'^2)\), respectively. Then the two sample means, \(\bar{x}\) and \(\bar{x}'\), are distributed according to \(N(a, \sigma^2/n)\) and \(N(a', \sigma'^2/n')\), respectively. To find the distribution of the difference of the two means, let us consider the linear function \(\bar{x} - \bar{x}'\). In this case \(l_1 = 1\), \(l_2 = -1\); so the expected value of the linear form is
\[
(a)\qquad a - a',
\]
and its variance is
\[
\frac{\sigma^2}{n} + \frac{\sigma'^2}{n'}.
\]
We therefore have the following corollary to Theorem (A):

Corollary (A₂): If \(O_n : x_1, x_2, \ldots, x_n\) and \(O_{n'} : x_1', x_2', \ldots, x_{n'}'\) are samples from the populations \(N(a, \sigma^2)\) and \(N(a', \sigma'^2)\), respectively, then \(\bar{x} - \bar{x}'\) is distributed according to \(N\bigl(a - a',\ \frac{\sigma^2}{n} + \frac{\sigma'^2}{n'}\bigr)\), where \(\bar{x}\) and \(\bar{x}'\) are the means of \(O_n\) and \(O_{n'}\), respectively.
5.12 Joint Distribution of Means in Samples from a Normal Bivariate Distribution

Let us consider a sample \(O_n : (x_{1\alpha}, x_{2\alpha};\ \alpha = 1, 2, \ldots, n)\) from the bivariate distribution
\[
(a)\qquad \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,\exp\Bigl\{-\frac{1}{2(1-\rho^2)}\Bigl[\frac{(x_1-a_1)^2}{\sigma_1^2} - 2\rho\frac{(x_1-a_1)(x_2-a_2)}{\sigma_1\sigma_2} + \frac{(x_2-a_2)^2}{\sigma_2^2}\Bigr]\Bigr\}\,dx_1\,dx_2.
\]
Let \(\bar{x}_i = \frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}\), \(i = 1, 2\). We wish to determine the joint distribution of \(\bar{x}_1\) and \(\bar{x}_2\). To do this, we determine the m. g. f. of \((\bar{x}_1 - a_1)\) and \((\bar{x}_2 - a_2)\), i. e.
\[
(b)\qquad \phi(\theta_1, \theta_2) = E\bigl(e^{\theta_1(\bar{x}_1 - a_1) + \theta_2(\bar{x}_2 - a_2)}\bigr),
\]
a 2n-fold integral over the joint distribution of the sample. But we know from (d) and (e) of 3.22 that if we set \(x_i - a_i = y_i\) inside [ ], the resulting expression inside [ ] will be the m. g. f. of \((x_1 - a_1)\) and \((x_2 - a_2)\) evaluated at \(\theta_1/n\), \(\theta_2/n\). Therefore, the m. g. f. of \((\bar{x}_1 - a_1)\) and \((\bar{x}_2 - a_2)\) is
\[
(c)\qquad \phi(\theta_1, \theta_2) = e^{\frac{1}{2n}\bigl(\sigma_1^2\theta_1^2 + 2\rho\sigma_1\sigma_2\theta_1\theta_2 + \sigma_2^2\theta_2^2\bigr)}.
\]
Since \(e^{\frac{1}{2}(\sigma_1^2\theta_1^2 + 2\rho\sigma_1\sigma_2\theta_1\theta_2 + \sigma_2^2\theta_2^2)}\) is the m. g. f. for \((x_1 - a_1)\) and \((x_2 - a_2)\) in distribution (b) of 3.22, it follows that the distribution of \((\bar{x}_1 - a_1)\), \((\bar{x}_2 - a_2)\) (having m. g. f. (c)) is
\[
(d)\qquad \frac{n}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,\exp\Bigl\{-\frac{n}{2(1-\rho^2)}\Bigl[\frac{(\bar{x}_1-a_1)^2}{\sigma_1^2} - 2\rho\frac{(\bar{x}_1-a_1)(\bar{x}_2-a_2)}{\sigma_1\sigma_2} + \frac{(\bar{x}_2-a_2)^2}{\sigma_2^2}\Bigr]\Bigr\}\,d\bar{x}_1\,d\bar{x}_2.
\]
We therefore have

Theorem (B): If \(x_1\) and \(x_2\) are distributed jointly according to the normal bivariate law (a), and if \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means of the \(x_{1\alpha}\) and the \(x_{2\alpha}\), respectively, in a sample \(O_n : (x_{i\alpha},\ i = 1, 2;\ \alpha = 1, 2, \ldots, n)\) from such a distribution, then \(\bar{x}_1\) and \(\bar{x}_2\) are also distributed according to a normal bivariate distribution, given by (d).

Theorem (B) extends at once to the case of means in a sample from a k-variate normal population with distribution (b) of 3.23. The distribution of the means in this case is
\[
(e)\qquad \frac{n^{k/2}\sqrt{|A|}}{(2\pi)^{k/2}}\,e^{-\frac{n}{2}\sum_{i,j=1}^{k} A_{ij}(\bar{x}_i - a_i)(\bar{x}_j - a_j)}\,d\bar{x}_1\cdots d\bar{x}_k.
\]
5.2 The χ²-Distribution

The χ²-distribution function with m degrees of freedom is defined as
\[
f_m(\chi^2)\,d(\chi^2) = \frac{1}{2^{m/2}\Gamma(m/2)}\,(\chi^2)^{\frac{m}{2}-1}\,e^{-\chi^2/2}\,d(\chi^2), \qquad 0 < \chi^2 < \infty.
\]
This distribution arises very frequently in connection with sampling theory of quadratic forms of normally distributed variables. We shall consider some of the important cases in this chapter and others in Chapters VIII and IX.

The integrals \(\int_0^{\chi_0^2} f_m(\chi^2)\,d\chi^2\) and \(\int_{\chi_0^2}^{\infty} f_m(\chi^2)\,d\chi^2\) are tabulated in many places for various values of m and \(\chi_0^2\). When we let \(\chi^2/2 = t\), the latter integral is transformed into the Incomplete Gamma Function, of which extensive tables have been computed by Karl Pearson.
5.21 Distribution of Sum of Squares of Normally and Independently Distributed Variables

The simplest sample statistic which is distributed according to the χ²-law is the sum of squares of variates independently distributed according to the same normal law with zero mean. Let us use the method of moment generating functions to find the distribution of \(\chi^2 = \sum_{i=1}^{n} x_i^2\), where each \(x_i\) (\(i = 1, 2, \ldots, n\)) is independently distributed according to N(0,1). The joint distribution of the x's is
\[
\frac{1}{(2\pi)^{n/2}}\,e^{-\frac{1}{2}\sum_{i=1}^{n} x_i^2}\,dx_1\cdots dx_n.
\]
Now let us find the moment generating function of \(\sum_i x_i^2\):
\[
\phi(\theta) = E\bigl(e^{\theta\sum_i x_i^2}\bigr) = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} e^{-\frac{1}{2}(1-2\theta)\sum_i x_i^2}\,dx_1\cdots dx_n = (1-2\theta)^{-n/2}, \qquad \text{for }\theta < \tfrac{1}{2}.
\]
But this is the moment generating function of the Pearson Type III distribution ((e) of 3.3) for an appropriate choice of its parameters. Therefore by uniqueness Theorem (B), 2.81, we have

Theorem (A): If \(O_n : x_1, x_2, \ldots, x_n\) is a sample from N(0,1), the function \(\chi^2 = \sum_{i=1}^{n} x_i^2\) is distributed according to the χ²-law with n degrees of freedom, i. e. according to \(f_n(\chi^2)\,d(\chi^2)\).

From this result it follows that, if \(x_1, x_2, \ldots, x_n\) are distributed independently according to \(N(a, \sigma^2)\), then \(\chi^2 = \sum_i (x_i - a)^2/\sigma^2\) is distributed according to \(f_n(\chi^2)\,d(\chi^2)\).

We can readily determine the moments of the χ²-distribution from its moment generating function. We expand \(\phi(\theta)\) in a power series:
\[
(c)\qquad \phi(\theta) = (1-2\theta)^{-n/2} = 1 + \frac{n}{2}(2\theta) + \frac{\frac{n}{2}\bigl(\frac{n}{2}+1\bigr)}{2!}(2\theta)^2 + \cdots + \frac{\frac{n}{2}\bigl(\frac{n}{2}+1\bigr)\cdots\bigl(\frac{n}{2}+r-1\bigr)}{r!}(2\theta)^r + \cdots.
\]
Then we find the moments about zero
\[
(d)\qquad \mu_r' = n(n+2)(n+4)\cdots(n+2r-2).
\]
The mean is n and the variance is
\[
\sigma^2 = \mu_2' - \mu_1'^2 = n(n+2) - n^2 = 2n.
\]
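The mean and variance just computed from the m. g. f. can be checked by simulation (a sketch, not from the text; the degrees of freedom and trial count are arbitrary).

```python
# Sketch: sums of n = 7 squared N(0,1) draws should have mean 7 and
# variance 2*7 = 14, matching the chi-square moments.
import random
import statistics

random.seed(11)
n, trials = 7, 40000
chis = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
        for _ in range(trials)]

emp_mean = statistics.fmean(chis)
emp_var = statistics.pvariance(chis)
```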
5.22 Distribution of the Exponent in a Multivariate Normal Distribution

Now let us consider a normal multivariate distribution of k variates with zero means
\[
(a)\qquad \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\,e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}x_ix_j}\,dx_1\cdots dx_k,
\]
and let us find the distribution of the quadratic form \(\sum_{i,j=1}^{k} A_{ij}x_ix_j\). To do this we find the moment generating function of the quadratic form,
\[
(b)\qquad \phi(\theta) = E\bigl(e^{\theta\sum_{i,j} A_{ij}x_ix_j}\bigr) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} e^{-\frac{1}{2}(1-2\theta)\sum_{i,j} A_{ij}x_ix_j}\,dx_1\cdots dx_k.
\]
It follows from 3.23 that
\[
\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} e^{-\frac{1}{2}\sum_{i,j} A_{ij}x_ix_j}\,dx_1\cdots dx_k = \frac{(2\pi)^{k/2}}{\sqrt{|A|}};
\]
applying this with \(A_{ij}\) replaced by \((1-2\theta)A_{ij}\), whose determinant is \((1-2\theta)^k|A|\), the above integration yields
\[
(c)\qquad \phi(\theta) = (1-2\theta)^{-k/2},
\]
which, as will be seen from (e) in 3.3, is the m. g. f. for a χ²-distribution with k degrees of freedom.

We therefore have

Theorem (A): If \(x_1, x_2, \ldots, x_k\) are distributed according to the normal multivariate law (a), then \(\sum_{i,j=1}^{k} A_{ij}x_ix_j = \chi^2\), say, is distributed according to \(f_k(\chi^2)\,d(\chi^2)\).

More generally, the quadratic form \(\sum_{i,j=1}^{k} A_{ij}(x_i - a_i)(x_j - a_j)\) from the distribution
\[
\frac{\sqrt{|A|}}{(2\pi)^{k/2}}\,e^{-\frac{1}{2}\sum_{i,j=1}^{k} A_{ij}(x_i - a_i)(x_j - a_j)}\,dx_1\cdots dx_k
\]
has the χ²-distribution with k degrees of freedom.
5.23 Reproductive Property of the χ²-Distribution

In the same way that the normal distribution possesses the reproductive property, so also does the χ²-distribution. Suppose we have \(\chi_1^2, \chi_2^2, \ldots, \chi_k^2\) distributed according to \(f_{m_1}(\chi_1^2)\), \(f_{m_2}(\chi_2^2)\), ..., \(f_{m_k}(\chi_k^2)\), respectively. From the joint distribution of these variates, let us find the moment generating function of the sum \(\sum_{i=1}^{k}\chi_i^2\), assuming independence:
\[
\phi(\theta) = E\bigl(e^{\theta\sum_i \chi_i^2}\bigr) = \prod_{i=1}^{k}(1-2\theta)^{-m_i/2} = (1-2\theta)^{-m/2},
\]
where \(m = \sum_{i=1}^{k} m_i\); \(\phi(\theta)\) is the m. g. f. for a χ²-distribution with m degrees of freedom. Therefore, we have the following

Theorem (A): If \(\chi_1^2, \chi_2^2, \ldots, \chi_k^2\) are independently distributed according to χ²-laws with \(m_1, m_2, \ldots, m_k\) degrees of freedom, respectively, then \(\sum_i \chi_i^2\) is distributed according to a χ²-law with \(\sum_i m_i\) degrees of freedom.
5.24 Cochran's Theorem

Cochran's theorem states certain conditions under which a set of quadratic forms are independently distributed according to χ²-laws if the variables of the quadratic forms are independently distributed, each according to N(0,1). To prove this theorem, we need several algebraic theorems which will be stated as lemmas.

Lemma 1: If q is a quadratic form, \(\sum_{\alpha,\beta=1}^{n} a_{\alpha\beta}x_\alpha x_\beta\), of order n and rank r, there exists a linear transformation \(z_\alpha = \sum_\beta b_{\alpha\beta}x_\beta\) (\(\alpha = 1, 2, \ldots, r\)) such that \(\sum_{\alpha,\beta} a_{\alpha\beta}x_\alpha x_\beta = \sum_{\alpha=1}^{r} c_\alpha z_\alpha^2\), where the \(c_\alpha\) are +1 or -1.

In 3.25 we exhibited a linear transformation that would do this for a positive definite quadratic form. The reader may extend that demonstration to prove Lemma 1.*

*A proof of Lemma 1 is given in M. Bôcher, Introduction to Higher Algebra, Macmillan, New York, 1907.
Lemma 2: If \(\sum_{\alpha,\beta=1}^{n} a_{\alpha\beta}x_\alpha x_\beta\) is transformed into \(\sum_{\alpha=1}^{n} c_\alpha z_\alpha^2\) by a linear transformation \(z_\alpha = \sum_\beta b_{\alpha\beta}x_\beta\) (\(\alpha = 1, 2, \ldots, n\)), then
\[
|a_{\alpha\beta}| = |c_\alpha\delta_{\alpha\beta}|\,|b_{\alpha\beta}|^2.
\]
This lemma can be readily verified from the fact that \(a_{\alpha\beta} = \sum_\gamma c_\gamma b_{\gamma\alpha}b_{\gamma\beta}\) and by using the rule for multiplying determinants.

Lemma 3: Suppose we have k quadratic forms \(q_1, q_2, \ldots, q_k\) in \(x_1, x_2, \ldots, x_n\), of ranks \(n_1, n_2, \ldots, n_k\), respectively, and suppose \(\sum_{i=1}^{k} q_i = \sum_{\alpha=1}^{n} x_\alpha^2\). Then a necessary and sufficient condition that there exist a non-singular linear transformation \(z_\alpha = \sum_\beta c_{\alpha\beta}x_\beta\) (\(\alpha = 1, 2, \ldots, \sum_i n_i\)) such that
\[
q_1 = z_1^2 + \cdots + z_{n_1}^2, \quad q_2 = z_{n_1+1}^2 + \cdots + z_{n_1+n_2}^2, \quad \ldots, \quad q_k = z_{n_1+\cdots+n_{k-1}+1}^2 + \cdots + z_n^2,
\]
is that \(n = n_1 + n_2 + \cdots + n_k\).

Proof: The necessity of the condition is obvious, since \(\sum_i n_i\) must be equal to n in order for the transformation to be non-singular.

Now consider the sufficiency. We assume \(n = n_1 + n_2 + \cdots + n_k\). By Lemma 1 there is a linear transformation \(y_\alpha^{(1)} = \sum_\beta b_{\alpha\beta}^{(1)}x_\beta\) (\(\alpha = 1, \ldots, n_1\)) such that \(q_1 = \sum_{\alpha=1}^{n_1} c_\alpha^{(1)}\bigl(y_\alpha^{(1)}\bigr)^2\), where each \(c_\alpha^{(1)}\) is +1 or -1. In the same way we know there exist transformations
\[
y_\alpha^{(2)} = \sum_\beta b_{\alpha\beta}^{(2)}x_\beta, \quad \ldots, \quad y_\alpha^{(k)} = \sum_\beta b_{\alpha\beta}^{(k)}x_\beta,
\]
such that \(q_i = \sum_\alpha c_\alpha^{(i)}\bigl(y_\alpha^{(i)}\bigr)^2\), \(i = 2, \ldots, k\). Let us denote \(y_\alpha^{(1)}\) by \(z_\alpha\) for \(\alpha = 1, 2, \ldots, n_1\), \(y_\alpha^{(2)}\) by \(z_{n_1+\alpha}\) for \(\alpha = 1, \ldots, n_2\), etc., and correspondingly denote the coefficients \(b_{\alpha\beta}^{(i)}\) by \(c_{\alpha\beta}\). Combining all of the linear transformations, we may write
\[
z_\alpha = \sum_{\beta=1}^{n} c_{\alpha\beta}x_\beta \qquad (\alpha = 1, 2, \ldots, n).
\]
Then \(q_1 = \sum_{\alpha=1}^{n_1} c_\alpha z_\alpha^2\), \(q_2 = \sum_{\alpha=n_1+1}^{n_1+n_2} c_\alpha z_\alpha^2\), etc., and
\[
\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{i=1}^{k} q_i = \sum_{\alpha=1}^{n} c_\alpha z_\alpha^2.
\]
By Lemma 2, \(|\delta_{\alpha\beta}| = |c_\alpha\delta_{\alpha\beta}|\,|c_{\alpha\beta}|^2\) (where \(\delta_{\alpha\beta}\) is 1 if \(\alpha = \beta\) and is 0 if \(\alpha \neq \beta\)). This reduces to
\[
1 = \pm\,|c_{\alpha\beta}|^2,
\]
since each \(c_\alpha = \pm 1\); and because the \(c_{\alpha\beta}\) are real, \(|c_{\alpha\beta}|^2 = 1\), so \(|c_{\alpha\beta}| \neq 0\).

This fact tells us that the n linear forms are independent and constitute a non-singular linear transformation. From the identity
\[
\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{\alpha=1}^{n} c_\alpha z_\alpha^2
\]
we deduce that \(\sum_\alpha c_\alpha z_\alpha^2\) is positive definite, since \(\sum_\alpha x_\alpha^2\) is positive definite. Hence each \(c_\alpha = +1\). This proves the sufficiency of the condition \(n = n_1 + n_2 + \cdots + n_k\). It is interesting to observe that \(|c_{\alpha\beta}| = \pm 1\) and that \(\sum_\gamma c_{\gamma\alpha}c_{\gamma\beta} = \delta_{\alpha\beta}\); that is, the transformation is orthogonal.
Cochran's theorem follows readily from this algebraic theorem.

Theorem (A) (Cochran's Theorem): If \(x_\alpha\) (\(\alpha = 1, 2, \ldots, n\)) are independently distributed according to N(0,1), and if \(\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{i=1}^{k} q_i\), where \(q_i\) is a quadratic form of rank \(n_i\), a necessary and sufficient condition that the \(q_i\) be independently distributed according to \(f_{n_i}(\chi^2)\) is that \(\sum_i n_i = n\).

Proof: Assume \(\sum_i n_i = n\), and find the joint m. g. f. of the \(q_i\):
\[
E\bigl(e^{\sum_i \theta_iq_i}\bigr) = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} e^{\sum_i \theta_iq_i - \frac{1}{2}\sum_\alpha x_\alpha^2}\,dx_1\cdots dx_n.
\]
Now transform the x's to z's by Lemma 3, noting that the Jacobian is unity:
\[
\frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\int_{-\infty}^{\infty} e^{-\frac{1}{2}\bigl[(1-2\theta_1)(z_1^2+\cdots+z_{n_1}^2)\,+\,\cdots\,+\,(1-2\theta_k)(z_{n-n_k+1}^2+\cdots+z_n^2)\bigr]}\,dz_1\cdots dz_n = \prod_{i=1}^{k}(1-2\theta_i)^{-n_i/2},
\]
which is the m. g. f. of k independent χ²-distributions with \(n_1, n_2, \ldots, n_k\) degrees of freedom, thus establishing the sufficiency of the condition.

The converse assumes that
\[
E\bigl(e^{\sum_i \theta_iq_i}\bigr) = \prod_{i=1}^{k}(1-2\theta_i)^{-n_i/2}.
\]
Since \(\sum_i q_i = \sum_\alpha x_\alpha^2\), the left-hand side of the equation becomes the m. g. f. of \(\sum_\alpha x_\alpha^2\) (which has a χ²-distribution with n degrees of freedom) when \(\theta_1 = \theta_2 = \cdots = \theta_k = \theta\). So the equation becomes
\[
(1-2\theta)^{-n/2} = (1-2\theta)^{-\frac{1}{2}\sum_i n_i}.
\]
Hence \(\sum_i n_i = n\), and the theorem is proved.
5.25 Independence of Mean and Sum of Squared Deviations from Mean in Samples
From a Normal Population
As an application of Cochran's Theorem, we shall show that the sample mean and
sum of squares of deviations about the mean in a sample from a normal population are inde-
pendent and have % 2 - distributions. Consider a sample O n :x 1 ,x 2 , .. ,,x n drawn from a normal
population N( 0, 1 ) . Then
(a)
Let
t
Tl Y-l Tl T T1
1
>
V. SAMPLING FROM A NORMAL POPULATION
10Q
and
- nx 2
q_2 is of rank 1, for in the matrix of q_2, every element of which is 1/n, any minor of order two,

    | 1/n  1/n |
    | 1/n  1/n |,

is zero, but each element is different from zero. The determinant of the matrix of q_1 is

    D = | 1 - 1/n    -1/n    ...    -1/n   |
        |  -1/n    1 - 1/n   ...    -1/n   |
        |  ..............................  |
        |  -1/n     -1/n     ...  1 - 1/n  |.

Subtracting the first row from each of the others, we get

    D = | 1 - 1/n   -1/n   ...   -1/n |
        |   -1        1    ...     0  |
        |   -1        0    ...     0  |
        |  .........................  |
        |   -1        0    ...     1  |.

Next we add each of the remaining columns to the first and find

    D = |  0   -1/n   ...   -1/n |
        |  0     1    ...     0  |
        |  ....................  |
        |  0     0    ...     1  | = 0,

for all the elements of the first column are zero. If we use this method of evaluation on any principal minor of order n - 1, we get
    M = 1 - (n-1)/n = 1/n ≠ 0.

Hence the rank of q_1 is n - 1. Using Cochran's Theorem we conclude that Σ(x_α - x̄)² and n x̄² are independently distributed according to f_{n-1}(χ²) and f_1(χ²), respectively.
If x is distributed according to N(a,σ²), then (x - a)/σ is distributed according to N(0,1). Hence, we have proved the following corollary to Cochran's Theorem:

If O_n: x_1, ..., x_n is a sample from N(a,σ²), then Σ_{α=1}^n (x_α - x̄)²/σ² and n(x̄ - a)²/σ² are independently distributed according to f_{n-1}(χ²) and f_1(χ²). It also follows that s² = Σ_α (x_α - x̄)²/(n - 1) and x̄ are independently distributed.
It should be pointed out that one could establish the fact that Σ(x_α - x̄)²/σ² and √n(x̄ - a)/σ for a sample from N(a,σ²) are independently distributed according to f_{n-1}(χ²) and N(0,1), respectively, by verifying that the m.g.f.

    φ(θ_1, θ_2) = E(e^{θ_1 Σ(x_α - x̄)²/σ² + θ_2 √n(x̄ - a)/σ})

factors into the product of the two m.g.f.'s.
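The independence just proved can be illustrated empirically. The following sketch (not in the original; sample size, replication count, seed and tolerances are illustrative assumptions) draws many normal samples and checks that the sample mean and the sum of squared deviations are uncorrelated, and that the latter averages to n - 1:

```python
import math
import random

# Empirical check that xbar and sum (x_a - xbar)^2 are uncorrelated for
# normal samples (they are in fact independent); settings are arbitrary.
random.seed(2)
n, reps = 8, 20_000
means, ssqs = [], []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(x) / n
    means.append(xbar)
    ssqs.append(sum((xi - xbar) ** 2 for xi in x))
mb = sum(means) / reps
sb = sum(ssqs) / reps
cov = sum((a - mb) * (b - sb) for a, b in zip(means, ssqs)) / reps
var_m = sum((a - mb) ** 2 for a in means) / reps
var_s = sum((b - sb) ** 2 for b in ssqs) / reps
r = cov / math.sqrt(var_m * var_s)
print(round(r, 3), round(sb, 2))  # r near 0; mean of ssq near n - 1
```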
5.3 The "Student" t-Distribution

Next we shall derive the distribution of the ratio of two independent variates, one normally distributed and the other distributed according to the χ²-law. Let ξ be a variate distributed according to N(0,1) and let χ² be distributed according to f_m(χ²). If these are independently distributed, the joint probability element is

    [1 / (√(2π) 2^{m/2} Γ(m/2))] (χ²)^{(m/2)-1} e^{-(ξ² + χ²)/2} dξ d(χ²).

Let us change variables to

    t = ξ / √(χ²/m),    u = χ².

Then

    ξ = t √(u/m),    -∞ < t < ∞,    0 < u < ∞.

The Jacobian of this transformation is

    J = √(u/m).

Hence the joint distribution of t and u is

    [1 / (√(2πm) 2^{m/2} Γ(m/2))] u^{(m-1)/2} e^{-(u/2)(1 + t²/m)} du dt.

To find the marginal distribution of t, we integrate out u,

    ∫_0^∞ u^{(m-1)/2} e^{-(u/2)(1 + t²/m)} du,

and obtain

(a)    g_m(t) = [Γ((m+1)/2) / (√(mπ) Γ(m/2))] (1 + t²/m)^{-(m+1)/2}.
The distribution (a) is called the "Student" t-distribution with m degrees of freedom. Values of t_ε have been tabulated such that

    Pr(-t_ε ≤ t ≤ t_ε) = ε,

for ε = .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, .98, .99 and m = 1, 2, 3, ..., 30, in R. A. Fisher's Statistical Methods for Research Workers.
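In place of a table lookup, t_ε can be computed directly from the density (a). The sketch below (not part of the original; the integration step count and search bracket are arbitrary choices) integrates g_m(t) by Simpson's rule and locates t_ε by bisection:

```python
import math

# Compute t_eps with Pr(-t_eps <= t <= t_eps) = eps for the "Student"
# density g_m(t); numerical settings are illustrative.
def g(t, m):
    c = math.gamma((m + 1) / 2) / (math.sqrt(m * math.pi) * math.gamma(m / 2))
    return c * (1.0 + t * t / m) ** (-(m + 1) / 2)

def mass_to(c, m, steps=2000):
    # integral of g from 0 to c by Simpson's rule
    if c == 0.0:
        return 0.0
    h = c / steps
    s = g(0.0, m) + g(c, m)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h, m)
    return s * h / 3

def t_eps(eps, m):
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if mass_to(mid, m) < eps / 2:   # half the central mass lies in (0, t_eps)
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_eps(0.95, 10), 3))  # the tabled value for m = 10 is 2.228
```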
The application of this distribution to sampling theory is immediate. As an important application consider a sample O_n from N(a,σ²). Then

    ξ = (x̄ - a)√n / σ

is distributed according to N(0,1) and

    χ² = Σ_α (x_α - x̄)² / σ²

is independently distributed according to f_{n-1}(χ²). The ratio

    t = ξ / √(χ²/(n-1)) = (x̄ - a)√n / √(Σ_α (x_α - x̄)²/(n-1)) = (x̄ - a)√n / s

is, therefore, distributed according to g_{n-1}(t).
The quantity t and its sampling theory, which marked a new step in statistical inference, were first investigated by Gosset, who, without rigorously proving his result, suggested the above distribution of t in a paper published in 1908 under the name of "Student". A rigorous proof was supplied by R. A. Fisher in 1926. The essential feature of t is that both it and its distribution are functionally independent of σ.
The "Student" distribution may also be used in connection with two samples. Let O_{n_1}: (x_{1α}, α = 1, 2, ..., n_1) and O_{n_2}: (x_{2α}, α = 1, 2, ..., n_2) be samples from N(a_1,σ²) and N(a_2,σ²), respectively. Let x̄_1 and x̄_2 be the sample means and s_1² = Σ_α (x_{1α} - x̄_1)²/(n_1 - 1) and s_2² = Σ_α (x_{2α} - x̄_2)²/(n_2 - 1). Then

    [(x̄_1 - x̄_2) - (a_1 - a_2)] / [σ √(1/n_1 + 1/n_2)]

is distributed according to N(0,1) and

    [(n_1 - 1)s_1² + (n_2 - 1)s_2²] / σ²

is distributed independently according to f_{n_1+n_2-2}(χ²). Hence, the ratio

    t = [(x̄_1 - x̄_2) - (a_1 - a_2)] / √{[(n_1 - 1)s_1² + (n_2 - 1)s_2²](1/n_1 + 1/n_2)/(n_1 + n_2 - 2)}

is distributed according to g_{n_1+n_2-2}(t).
It can be verified by the reader that

    lim_{m→∞} g_m(t) = (1/√(2π)) e^{-t²/2}.
5.4 Snedecor's F-Distribution

Now let us consider the distribution of the ratio of two quantities independently distributed according to χ²-distributions. Let χ_1² and χ_2² be independently distributed according to f_{m_1}(χ_1²) and f_{m_2}(χ_2²), respectively. The joint distribution is

    [1 / (2^{(m_1+m_2)/2} Γ(m_1/2) Γ(m_2/2))] (χ_1²)^{(m_1/2)-1} (χ_2²)^{(m_2/2)-1} e^{-(χ_1² + χ_2²)/2} d(χ_1²) d(χ_2²).

Let us make the change of variables

    F = (χ_1²/m_1) / (χ_2²/m_2),    v = χ_2².

Then

    χ_1² = (m_1/m_2) F v,    0 < F < ∞,    0 < v < ∞,

and the Jacobian of the transformation is

    J = (m_1/m_2) v.

The distribution of the transformed variates is

    [1 / (2^{(m_1+m_2)/2} Γ(m_1/2) Γ(m_2/2))] (m_1/m_2)^{m_1/2} F^{(m_1/2)-1} v^{((m_1+m_2)/2)-1} e^{-(v/2)(1 + (m_1/m_2)F)} dF dv.

Integrating out the extraneous variable v, we get the distribution of F,

(a)    [Γ((m_1+m_2)/2) / (Γ(m_1/2) Γ(m_2/2))] (m_1/m_2)^{m_1/2} F^{(m_1/2)-1} (1 + (m_1/m_2) F)^{-(m_1+m_2)/2}.

This distribution, known as Snedecor's F-distribution with m_1 and m_2 degrees of freedom, will be denoted by h_{m_1,m_2}(F).
Values of F_ε have been tabulated such that

    Pr(F ≤ F_ε) = ε,

for ε = .99, .95 and all combinations of (m_1,m_2) from (1,1) to (12,30) and for certain combinations from (14,32) to (500,1000), in Snedecor's Statistical Methods.
The moments about zero are easily obtained. Since the above is a distribution function, the integral over the entire range of F is unity, and, hence,

    ∫_0^∞ F^{(m_1/2)-1} (1 + (m_1/m_2) F)^{-(m_1+m_2)/2} dF = (m_2/m_1)^{m_1/2} Γ(m_1/2) Γ(m_2/2) / Γ((m_1+m_2)/2).

Using this fact, we get the r-th moment by integration:

(b)    E(F^r) = ∫_0^∞ F^r h_{m_1,m_2}(F) dF = (m_2/m_1)^r Γ((m_1/2) + r) Γ((m_2/2) - r) / [Γ(m_1/2) Γ(m_2/2)],

for r < m_2/2.
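Formula (b) is easy to evaluate with the gamma function; for r = 1 it reduces to E(F) = m_2/(m_2 - 2). A small check (not in the original; the degrees of freedom below are arbitrary choices):

```python
import math

# r-th moment of the F-distribution per formula (b); valid for r < m2/2.
def f_moment(r, m1, m2):
    return ((m2 / m1) ** r
            * math.gamma(m1 / 2 + r) * math.gamma(m2 / 2 - r)
            / (math.gamma(m1 / 2) * math.gamma(m2 / 2)))

m1, m2 = 4, 10
print(f_moment(1, m1, m2), m2 / (m2 - 2))  # both equal 1.25
```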
By a simple change of variable the F-distribution may be changed into a Type I distribution (the integrand of the Beta function times a constant). Let

    x = (m_1/m_2) F / (1 + (m_1/m_2) F).

Then

    F = (m_2/m_1) x/(1 - x)    and    dF = (m_2/m_1) dx/(1 - x)²,

and h_{m_1,m_2}(F) dF transforms into

(c)    [1 / B(m_1/2, m_2/2)] x^{(m_1/2)-1} (1 - x)^{(m_2/2)-1} dx.

It should be pointed out that the square of Student's t with m degrees of freedom is simply distributed as h_{1,m}(t²).

If we make the change of variable

(d)    z = (1/2) log F,

we obtain R. A. Fisher's z-distribution.
Example 1: As an example of the applications of the F-distribution, consider two samples O_{n_1}: (x_{1α}, α = 1, 2, ..., n_1) and O_{n_2}: (x_{2α}, α = 1, 2, ..., n_2) from populations N(a_1,σ²) and N(a_2,σ²), respectively. Let

    s_i² = Σ_α (x_{iα} - x̄_i)² / (n_i - 1),    i = 1, 2.

Then

    F = s_1²/s_2²

is distributed according to h_{n_1-1,n_2-1}(F).
Example 2: Suppose O_{n_1}, O_{n_2}, ..., O_{n_k} are k samples from N(a_1,σ²), N(a_2,σ²), ..., N(a_k,σ²), respectively. Then

    Σ_{i=1}^k Σ_α (x_{iα} - x̄_i)² / σ²

has a χ²-distribution with n - k (Σ_i n_i = n) degrees of freedom. Since

    Σ_{i=1}^k n_i (x̄_i - a_i)² / σ²

is distributed according to f_k(χ²) and independently of the former sum, it follows from these facts that the ratio

    F = [Σ_i n_i (x̄_i - a_i)² / k] / [Σ_i Σ_α (x_{iα} - x̄_i)² / (n - k)]

is distributed according to h_{k,n-k}(F).

If the k samples come from normal populations with the same mean as well as the same variance,

    √(n_i) (x̄_i - a) / σ

is distributed according to N(0,1) and, therefore,

    Σ_{i=1}^k n_i (x̄_i - x̄)² / σ²

(x̄ being the mean of all n observations) has the χ²-distribution with k - 1 degrees of freedom. From this fact, and since the x̄_i are distributed independently of the s_i², it follows that

    F = [Σ_i n_i (x̄_i - x̄)² / (k - 1)] / [Σ_i Σ_α (x_{iα} - x̄_i)² / (n - k)]

has the distribution h_{k-1,n-k}(F).
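The last ratio is the familiar one-way analysis-of-variance statistic. A simulation sketch (not in the original; sample sizes, seed and tolerance are illustrative) draws k samples with a common mean and variance and checks that the long-run average of F is near the mean (n-k)/(n-k-2) of h_{k-1,n-k}:

```python
import random

# Simulate F = [between-group SS/(k-1)] / [within-group SS/(n-k)] for
# k normal samples with equal means and variances; settings arbitrary.
random.seed(5)
k, m = 3, 10           # k samples of size m each
n = k * m
reps = 10_000
total = 0.0
for _ in range(reps):
    samples = [[random.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)]
    means = [sum(s) / m for s in samples]
    grand = sum(sum(s) for s in samples) / n
    between = sum(m * (mi - grand) ** 2 for mi in means) / (k - 1)
    within = sum(sum((xi - mi) ** 2 for xi in s)
                 for s, mi in zip(samples, means)) / (n - k)
    total += between / within
print(round(total / reps, 2))  # near (n-k)/(n-k-2) = 27/25 = 1.08
```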
5.5 Distribution of Second Order Sample Moments in Samples from a Bivariate Normal Distribution

Let us consider a sample O_n: (x_{1α}, x_{2α}, α = 1, 2, ..., n) from the bivariate normal distribution (a) in 5.12. The probability density function for the sample is the product of the n density elements. We shall find the m.g.f. of the sample variances and of twice the covariance,

    φ(θ_11, θ_12, θ_22) = E(e^{θ_11 a_11 + 2θ_12 a_12 + θ_22 a_22}),

where a_ij = Σ_α (x_{iα} - x̄_i)(x_{jα} - x̄_j), i, j = 1, 2. We have
    φ(θ_11, θ_12, θ_22) = [|A_ij|^{n/2} / (2π)^n] ∫_{-∞}^∞ ... ∫_{-∞}^∞ exp[Σ_{i,j} θ_ij a_ij - (1/2) Σ_{i,j} A_ij Σ_{α=1}^n (x_{iα} - a_i)(x_{jα} - a_j)] Π_{i,α} dx_{iα},

which exists if the θ_ij are sufficiently small. The determinant of the matrix of the quadratic form in the exponent is of order 2n and is

    |B| = | C  D  D  ...  D |
          | D  C  D  ...  D |
          | D  D  C  ...  D |
          | . . . . . . . . |
          | D  D  D  ...  C |,

where C is a 2 × 2 block of elements as follows:

    C = | A_11 - 2θ_11(1 - 1/n)    A_12 - 2θ_12(1 - 1/n) |
        | A_12 - 2θ_12(1 - 1/n)    A_22 - 2θ_22(1 - 1/n) |,

and D is a 2 × 2 block of elements as follows:

    D = | (2/n)θ_11    (2/n)θ_12 |
        | (2/n)θ_12    (2/n)θ_22 |.

If in |B| the first row of elements is subtracted from the third, fifth, etc., and the second row is subtracted from the fourth, sixth, etc., and if in the resulting determinant to the first column is added the third, fifth, etc., and to the second column is added the fourth, sixth, etc., we find that

    |B| = |A_ij| · |A_ij - 2θ_ij|^{n-1}.

Hence the m.g.f. of a_11, a_22 and 2a_12 is

(a)    φ(θ_11, θ_12, θ_22) = |A_ij|^{(n-1)/2} |A_ij - 2θ_ij|^{-(n-1)/2}.
Now if we can find a function f(a_11, a_12, a_22) such that

(b)    φ(θ_11, θ_12, θ_22) = ∫∫∫_R e^{θ_11 a_11 + 2θ_12 a_12 + θ_22 a_22} f(a_11, a_12, a_22) da_11 da_12 da_22,

where R is the region in the space of the a_ij for which a_11 > 0, a_22 > 0, -1 < a_12/√(a_11 a_22) < 1, then f(a_11, a_12, a_22) will be the p.d.f. of the a_ij. The uniqueness of the solution can be argued from the multivariate analogue of Theorem (B) of 2.81.

Denoting |A_ij|^{(n-1)/2} by A^{(n-1)/2} and A_ij - 2θ_ij by S_ij, and choosing values of the θ_ij small enough for |S_ij| to be positive definite, we can write

    |S_ij|^{-(n-1)/2} = (S_11 S_22)^{-(n-1)/2} (1 - k²)^{-(n-1)/2},    where k² = S_12²/(S_11 S_22),

and we can expand (1 - k²)^{-(n-1)/2} into the infinite series

    Σ_{j=0}^∞ [Γ((n-1)/2 + j) / (Γ((n-1)/2) j!)] k^{2j}.

But

    S_11^{-(n-1)/2 - j} = [1 / (2^{(n-1)/2 + j} Γ((n-1)/2 + j))] ∫_0^∞ a_11^{(n-1)/2 + j - 1} e^{-(1/2) S_11 a_11} da_11,

and a similar expression holds for S_22^{-(n-1)/2 - j}. Therefore, we may write
(e)    φ = Σ_{j=0}^∞ ∫_0^∞ ∫_0^∞ (a_11 a_22)^{(n-1)/2 + j - 1} e^{θ_11 a_11 + θ_22 a_22 - (1/2)(A_11 a_11 + A_22 a_22)} [ ] da_11 da_22,

where [ ] collects the remaining constant factors. If in [ ] we make use of the formula Γ(a)Γ(a + 1/2) = √π Γ(2a)/2^{2a-1}, we may write [ ] as

(f)    [the remaining display is illegible in this copy].

But from the definition of the Beta function, 3.3,

    B(j + 1/2, (n-2)/2) = ∫_0^1 t^{j - 1/2} (1 - t)^{(n-4)/2} dt = ∫_{-1}^1 r^{2j} (1 - r²)^{(n-4)/2} dr,

since terms for odd powers of r vanish upon integration. Making use of this value of [ ] in (e) we have

(g)    φ(θ_11, θ_12, θ_22) = ∫_0^∞ ∫_0^∞ ∫_{-1}^1 e^{θ_11 a_11 + 2θ_12 r √(a_11 a_22) + θ_22 a_22} f da_11 dr da_22.

Setting r √(a_11 a_22) = a_12, (g) can be expressed as (b), where

(h)    f(a_11, a_12, a_22) = [A^{(n-1)/2} / (2^{n-1} √π Γ((n-1)/2) Γ((n-2)/2))] (a_11 a_22 - a_12²)^{(n-4)/2} e^{-(1/2)(A_11 a_11 + 2A_12 a_12 + A_22 a_22)}.
As we mentioned earlier, the uniqueness of this p.d.f. may be argued from the multivariate analogue of Theorem (B) 2.81.

The sampling distribution of the correlation coefficient r may be found by setting a_12 = r √(a_11 a_22) in (h), expanding e^{-A_12 r √(a_11 a_22)} into an infinite series, and integrating with respect to a_11 and a_22; we obtain as the probability element of r a series expression

(i)    [the display is illegible in this copy],

where ρ = -A_12/√(A_11 A_22) is the correlation coefficient of the population.

If ρ = 0, the distribution of r is simply

(j)    [1 / B(1/2, (n-2)/2)] (1 - r²)^{(n-4)/2} dr.
The distribution (h) may be generalized to the case of a sample from a k-variate normal distribution given by (b) in 3.23. The distribution for the k-variate case, which will be derived in Chapter XI, is

(k)    f({a_ij}) = [|A_ij|^{(n-1)/2} |a_ij|^{(n-k-2)/2} e^{-(1/2) Σ_{i,j} A_ij a_ij}] / [π^{k(k-1)/4} 2^{k(n-1)/2} Π_{i=1}^k Γ((n-i)/2)],

where a_ij = Σ_α (x_{iα} - x̄_i)(x_{jα} - x̄_j), (x_{iα}) (i, j = 1, 2, ..., k; α = 1, 2, ..., n) being the sample. Clearly, n > k for this distribution to exist.

This is a very important distribution function and is fundamental in the theory of normal multivariate statistical analysis. It is known as the Wishart distribution.
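One elementary consequence of (k) that is easy to verify numerically is E(a_ij) = (n-1)Σ_ij, where Σ_ij is the population covariance matrix. The sketch below (not in the original; the bivariate covariance structure, sample size, seed and tolerances are illustrative assumptions) checks this for k = 2:

```python
import random

# Check E(a_ij) = (n-1)*Sigma_ij for bivariate normal samples with
# unit variances and correlation rho; all settings are illustrative.
random.seed(6)
n, rho, reps = 12, 0.6, 20_000
s11 = s12 = s22 = 0.0
for _ in range(reps):
    x1, x2 = [], []
    for _ in range(n):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        x1.append(z1)
        x2.append(rho * z1 + (1 - rho * rho) ** 0.5 * z2)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 += sum((a - m1) ** 2 for a in x1)
    s12 += sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s22 += sum((b - m2) ** 2 for b in x2)
print(round(s11 / reps, 2), round(s12 / reps, 2), round(s22 / reps, 2))
# near (n-1)*1 = 11, (n-1)*rho = 6.6, (n-1)*1 = 11
```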
5.6 Independence of Second Order Moments and Means in Samples from a Normal Multivariate Distribution

In 5.25 it was shown that in samples of size n from a normal distribution N(a,σ²), the quantities (1/σ²) Σ_α (x_α - x̄)² and √n(x̄ - a)/σ were independently distributed according to f_{n-1}(χ²) and N(0,1), respectively.

In the case of samples of size n from the k-variate normal distribution (b), 3.23, the two sets of quantities a_ij = Σ_α (x_{iα} - x̄_i)(x_{jα} - x̄_j) (i, j = 1, 2, ..., k) and x̄_i (i = 1, 2, ..., k) are independently distributed according to (k), 5.5, and (e), 5.12, respectively. A straightforward method of establishing the independence of the two systems is by evaluating the moment generating function of the a_ij (i ≤ j) and the √n(x̄_i - a_i),

    φ(θ_ij; θ_i) = E(e^{Σ_{i,j} θ_ij a_ij + Σ_i θ_i √n(x̄_i - a_i)}),

where θ_ij = θ_ji, which turns out to be a product of the form φ_1(θ_ij) · φ_2(θ_i), i.e. the two sets of variates are independently distributed.
CHAPTER VI

ON THE THEORY OF STATISTICAL ESTIMATION

Let O_n be a sample from a population whose c.d.f. depends on h parameters θ_1, θ_2, ..., θ_h. Suppose the functional form of the c.d.f. is known, but the true values of the parameters are unknown. A fundamental problem in the theory of statistical estimation is the following: On the basis of the evidence of O_n, can we assign an interval for one of the parameters, say θ_1, and then state with a given amount of confidence (the meaning of this phrase will have to be defined) that the true value of θ_1 lies in this interval? More generally, can we make similar statements regarding a subset of the parameters, say θ_1, θ_2, ..., θ_m, m ≤ h, and a region in the parameter space? These problems are discussed in 6.1. If instead of assigning on the basis of O_n an interval of values in which we estimate the true parameter value to be contained, we wish to assign a single value, the problem is more difficult: We can hardly hope that our "point estimate" will coincide exactly with the true value: In what sense can such an estimate be said to be "good"? How can "good" estimates be found? These questions are considered in 6.2. Closely related to the problem of point estimation of one or more parameters are questions of curve fitting; these are taken up in 6.4.

The problems described above may be called parametric problems in statistical estimation. There are also non-parametric cases of statistical estimation. One of these is the problem of tolerance limits, which may be formulated as follows: Suppose a sample O_n is from a population in which the random variable x is continuous. Can we determine functions L_1 and L_2 of the x's in the sample such that we can state with a given probability that 100p% of the x's in the population will be included in the interval (L_1, L_2), no matter what the population distribution is? or no matter what the values of the parameters are if the functional form of the distribution is known? This problem is discussed in 6.3. Some of the underlying sampling theory is discussed in 4.55.

6.1 Confidence Intervals and Confidence Regions

In this section we consider the estimation of one or more parameters by means of statements that the parameter lies, or the parameters lie, in a certain region of the parameter space. The discussion of the example of 6.11 should be carefully studied: while this will not be repeated elsewhere, the analogous considerations pertain in every case taken up in 6.11-6.13.
VI. ON THE THEORY OF STATISTICAL ESTIMATION

6.11 Case in which the Distribution Depends on Only One Parameter

It will be clearest if we begin by means of an example (range of a rectangular distribution): Let R be the range of a sample O_n from a population with the p.d.f.

    f(x;θ) = 1/θ, when 0 ≤ x ≤ θ, and 0, otherwise.

It has been shown in 4.54 that the p.d.f. of R is

    f_n(R;θ) = n(n-1) θ^{-n} R^{n-2} (θ - R),    0 ≤ R ≤ θ.

If we introduce the function

    ψ = R/θ,

we find that the distribution of this function of sample and parameter is independent of the true value of the parameter; its p.d.f. is

    g(ψ) = n(n-1) ψ^{n-2} (1 - ψ),    0 ≤ ψ ≤ 1.

We pick a positive number ε < 1 (it is customary to take ε = .95 or .99) and define ψ_ε from

    ∫_{ψ_ε}^1 g(ψ) dψ = ε.

Then regardless of the true value of θ,

    Pr(ψ_ε ≤ R/θ ≤ 1) = ε,

which is equivalent to the statement

(a)    Pr(R ≤ θ ≤ R/ψ_ε) = ε.

It should be noted that R is the random variable in this statement and not θ. The interval δ: (R, R/ψ_ε) is called a confidence interval for θ, and ε is called the confidence coefficient. Let us examine the significance of the probability statement (a):
First of all, (a) does not mean that if we take the value of R from a specific sample, say R = R_1, then the probability that

(b)    R_1 ≤ θ ≤ R_1/ψ_ε

is ε: For, θ is not a random variable; it is a constant, even if unknown, and hence the statement (b) is true or false; if (b) is true the probability is unity, and if false, zero; in no case is it ε. The situation is analogous to the random drawing (with replacement) of balls from the classical urn, in which the proportion of white balls is ε, of black balls, 1 - ε. After we have drawn a ball the randomness of the process is over; the particular ball drawn is either black or white, and probability statements, aside from the trivial ones of 0 or 1, are no longer possible. However, if we draw a large number of balls we may expect that the percentage of white balls drawn will closely approximate 100ε. More precisely: The law of large numbers (3.11) tells us that the proportion of white balls drawn converges stochastically to ε as the number of drawings is increased.

We now see the practical significance of the probability statement (a): If we always use confidence coefficient ε and always assert that the true value of the parameter θ (it need not always be the same parameter) lies in the interval obtained by putting the sample values into the confidence interval, then in the long run (i.e. in repeated sampling) the percentage of correct statements can be expected to be very close to 100ε. Again more precisely, we should say that the probability that the proportion of correct statements departs from ε by more than a fixed amount h > 0 approaches zero as the number of statements (i.e. number of samples) is increased, no matter how small h.
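The long-run interpretation of (a) lends itself to direct simulation. The sketch below (not in the original; n, θ, ε, seed and trial count are illustrative assumptions) solves ∫_{ψ_ε}^1 g(ψ)dψ = 1 - nψ_ε^{n-1} + (n-1)ψ_ε^n = ε for ψ_ε by bisection, then checks the coverage frequency of the interval (R, R/ψ_ε):

```python
import random

# Coverage check for the confidence interval (R, R/psi_eps) of the
# rectangular-distribution example; settings are illustrative.
def psi_eps(eps, n):
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        mass = 1 - n * mid ** (n - 1) + (n - 1) * mid ** n  # Pr(psi >= mid)
        if mass > eps:       # mass decreases as psi grows
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

random.seed(7)
eps, n, theta = 0.95, 10, 3.0
p = psi_eps(eps, n)
trials = 5_000
covered = 0
for _ in range(trials):
    x = [random.uniform(0.0, theta) for _ in range(n)]
    R = max(x) - min(x)
    if R <= theta <= R / p:
        covered += 1
print(covered / trials)  # close to eps = 0.95
```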
In general, if a distribution depends on one parameter θ, and if we have two functions θ̲(O_n) and θ̄(O_n) which depend on the sample O_n but not on θ, so that the interval

    δ(O_n): (θ̲(O_n), θ̄(O_n))

is a random interval, then if the probability that the random interval δ cover the true value of the parameter is ε,

    Pr{θ ∈ δ(O_n)} = ε,

whatever be the true value θ, we call δ(O_n) a confidence interval for θ, and ε the confidence coefficient. We shall sometimes refer to the pair θ̲, θ̄ of random variables as confidence limits. This terminology is due to Neyman.

The method of finding confidence intervals that was employed in the example is worth noting: It depends on finding a function ψ of O_n and θ whose distribution is independent of θ. If the function ψ is monotone and continuous in θ, then the relation

    Pr(ψ_ε ≤ ψ(O_n,θ) ≤ 1) = ε

can be inverted to read

    Pr(θ ∈ δ(O_n)) = ε,
where δ(O_n) is the confidence interval. Another perhaps more direct method of determining confidence intervals is as follows:

Suppose T(x_1, ..., x_n) is a function of a sample O_n: (x_1, x_2, ..., x_n) from a population with distribution element f(x;θ)dx, such that the probability element of T is g(T;θ)dT. Suppose the range of values of T having non-zero probability density is (a,b), and suppose the range of possible values of θ is (α,β).* Suppose two continuous monotone increasing functions T̲(θ) and T̄(θ) exist such that

(d)    ∫_a^{T̲(θ)} g(T;θ) dT = p,    ∫_{T̄(θ)}^b g(T;θ) dT = q,

where p and q are positive such that p + q = 1 - ε. Assume that g(T;θ) is such that T̲ and T̄ each ranges from a to b as θ ranges from α to β. Then for a given value of T, let θ̄ and θ̲ be the values of θ for which T̲(θ) = T, T̄(θ) = T, respectively. Then (θ̲, θ̄) is a confidence interval for θ with confidence coefficient ε. That (θ̲, θ̄) is a confidence interval follows from the relation

    Pr(T̲(θ) ≤ T ≤ T̄(θ)) = ε,

which, because of the continuous monotonic character of T̲(θ) and T̄(θ), may be inverted and written as Pr(θ̲ ≤ θ ≤ θ̄) = ε. It should be noted that we may obtain confidence limits for each value of p if functions T̲(θ) and T̄(θ) of the required kind exist for each p. The question arises as to which value of p is "best". This would depend, of course, on what definition of "best" we choose. In those cases where the mean value of the length of the confidence interval is a function which factors in the form h_1(p)h_2(θ), common sense suggests that we should choose p so that the mean length is a minimum. In the case of large samples, the definition of "best" confidence intervals is fairly direct (see 6.12).

We may represent confidence intervals obtained by this process graphically as follows:

*We permit a or α to be -∞, b or β to be +∞.
Figure 6

Suppose the true value of θ is θ_0. For any sample value of T, the corresponding confidence interval is formed as follows: Draw a line parallel to the θ-axis, defined by T = sample value. Let A, B be the points of intersection of this line with the two curves, as indicated in Fig. 6. The confidence interval is the projection of the segment AB on the θ-axis. The confidence interval will cover the true value θ_0 if and only if the segment AB crosses the line θ = θ_0, that is, if and only if T falls in the range (T̲(θ_0), T̄(θ_0)). But the probability of T falling in this interval is precisely ε. We thus have Pr(θ̲ ≤ θ_0 ≤ θ̄) = ε. The discussion and conclusion hold for any θ_0 in the range (α,β).
This method, for example, has been applied by R. A. Fisher to the problem of determining the confidence limits for ρ from the distribution (i), 5.5, of the correlation coefficient r. Fisher uses the term fiducial limits instead of confidence limits.

The idea involved in this method has also been applied to cases where T is a discrete random variable, to obtain approximate confidence limits for the parameter involved. In this case the = signs in the analogue of (d) for the discrete case are replaced by ≤ signs, and the largest value of T̲(θ) and smallest value of T̄(θ) are obtained satisfying the inequalities. T̲(θ) and T̄(θ) will be step-functions, and the approximate confidence limits are obtained by drawing a smooth curve through the graphs of the step-functions. For example, Clopper and Pearson (Biometrika, Vol. 26 (1934), pp. 404-413) have applied the method to the problem of determining approximate confidence limits for the binomial probability parameter p from the statistic x in the binomial distribution C_n^x p^x q^{n-x} (x = 0, 1, 2, ..., n), and Ricker (Journal of the American Statistical Association, Vol. 32 (1937), pp. 349-356) has applied the method to the Poisson distribution (m^x/x!) e^{-m}, where m is the parameter and x the statistic. A method of determining confidence limits for θ from large samples based on the likelihood function is given in 6.12.
6.12 Confidence Limits from Large Samples

Suppose x has c.d.f. F(x,θ), where θ is a parameter. Let O_n be a sample of size n from a population having this c.d.f. Let P(O_n,θ) be the likelihood function, i.e.

(a)    P(O_n,θ) = Π_{i=1}^n f(x_i,θ),

where f(x,θ) is the p.d.f. if x is a continuous variable, and is simply probability if x is a discrete variable.*

We recall the first method of obtaining confidence intervals given in 6.11, which depends on finding a function ψ of O_n and θ whose distribution is independent of θ. That a function of the desired type for large samples may be obtained from the likelihood function P(O_n,θ) may be concluded by use of the central limit Theorem (C) of 4.21. The central limit theorem applies to a sum (the average), so we replace the product in (a) by a sum by taking logarithms:

(b)    log P(O_n,θ) = Σ_{i=1}^n y_i,

where y_i = log f(x_i,θ) may be regarded as a random variable for any fixed θ. To apply the central limit theorem we need E(y) and σ_y², where y = log f(x,θ). Now

(c)    E(y) = ∫_{-∞}^{+∞} log f(x,θ) d_x F(x,θ),

where d_x F(x,θ) = f(x,θ)dx in the continuous case, and the integral (c) becomes a sum in the discrete case. The calculation (c) does not give a simple result, but it is clear that if we employed z = ∂y/∂θ, then

    E(z) = ∫_{-∞}^{+∞} [∂ log f(x,θ)/∂θ] d_x F(x,θ),

and in the continuous case this becomes

    E(z) = ∫_{-∞}^{+∞} [∂f(x,θ)/∂θ] dx.

*If x is discrete, f(x,θ) = F(x,θ) - F(x-0,θ).
If the order of integration and differentiation may be interchanged,

(d)    E(z) = 0.

Let us now assume (d) to be true in any case, and furthermore that A² = E(z²) is finite. Differentiating (b) we get

    ∂ log P/∂θ = Σ_{i=1}^n z_i,

and hence

    (1/n) ∂ log P/∂θ = z̄.

Applying the central limit theorem to z̄ we have that

    [1/(√n A)] ∂ log P(O_n,θ)/∂θ

is asymptotically distributed according to N(0,1). We summarize in

Theorem (A): If E[∂ log f(x,θ)/∂θ] = 0, and A² = E{[∂ log f(x,θ)/∂θ]²} is finite, then

    [1/(√n A)] ∂ log P(O_n,θ)/∂θ

is asymptotically distributed according to N(0,1).

Hence we have approximately, for large n,

(e)    Pr(-d ≤ [1/(√n A)] ∂ log P(O_n,θ)/∂θ ≤ d) = ε,

where d is chosen so that

    (1/√(2π)) ∫_{-d}^d e^{-y²/2} dy = ε.
Now if [1/(√n A)] ∂ log P/∂θ is monotone in θ, we may invert in (e) and write the result

(f)    Pr(θ̲ ≤ θ ≤ θ̄) = ε.

The asymptotic confidence intervals (f) furnished by [1/(√n A)] ∂ log P/∂θ are optimum in the following sense: The mean value of {∂/∂θ [(1/(√n A)) ∂ log P/∂θ]}² is greater than that of any other function G(O_n,θ) of the sample which has N(0,1) as its limiting distribution.* This maximum property of the mean squared rate of change with respect to θ implies shortest average confidence intervals in a certain sense, since confidence intervals are obtained by taking the inverse of [1/(√n A)] ∂ log P/∂θ with respect to θ.
Example: Suppose samples of size n are drawn from a population having the binomial distribution

    f(x,p) = dF(x,p) = p^x (1-p)^{1-x},    x = 0, 1.

In a sample of size n

    P(O_n,p) = p^{n_1} (1-p)^{n-n_1},

where n_1 = Σ_{i=1}^n x_i. We verify that E(∂ log f/∂p) = 0 and calculate

    A² = E[(∂ log f/∂p)²] = 1/[p(1-p)],

and

    [1/(√n A)] ∂ log P/∂p = [√(p(1-p))/√n] [n_1/p - (n-n_1)/(1-p)] = (n_1 - np)/√(np(1-p)).

Therefore, to find approximate confidence limits with confidence coefficient ε, we invert the expression

    Pr(-d ≤ (n_1 - np)/√(np(1-p)) ≤ d) = ε,

*For proof for the case where G(O_n,θ) is of the form h(x̄,θ), where G(O_n,θ) is asymptotically distributed according to N(0,1), see S. S. Wilks, Annals of Math. Stat., Vol. 9 (1938), pp. 166-175, and for more general results, see A. Wald, Annals of Math. Stat., Vol. 13 (1942), pp. 127-137.

obtaining

    Pr(p̲ ≤ p ≤ p̄) = ε,

where p̲ and p̄ are given as the roots of the quadratic (n_1 - np)² = n d² p(1 - p).
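Solving that quadratic in p is routine. The sketch below (not in the original; the sample figures n = 100, n_1 = 30 and d = 1.96 are illustrative assumptions) expands (n_1 - np)² = n d² p(1-p) into standard form and returns the two roots:

```python
import math

# Roots of (n1 - n*p)^2 = n*d^2*p*(1-p), i.e. of
#   p^2*(n^2 + n*d^2) - p*(2*n*n1 + n*d^2) + n1^2 = 0,
# giving the approximate confidence limits for p.  Inputs illustrative.
def binomial_limits(n1, n, d):
    a = n * n + n * d * d
    b = -(2 * n * n1 + n * d * d)
    c = n1 * n1
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

lo, hi = binomial_limits(30, 100, 1.96)
print(round(lo, 3), round(hi, 3))  # prints 0.219 0.396
```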
6.13 Confidence Intervals in the Case where the Distribution Depends on Several Parameters

Suppose that the c.d.f. of the population depends on parameters θ_1, θ_2, ..., θ_h, and we wish to estimate θ_1. If there exist functions θ̲_1(O_n), θ̄_1(O_n) of the sample, such that the probability that the random interval

    δ(O_n): (θ̲_1(O_n), θ̄_1(O_n))

cover the true value of θ_1 does not depend on the true values of θ_1, θ_2, ..., θ_h,

    Pr{θ_1 ∈ δ(O_n)} = ε, independent of θ_1, θ_2, ..., θ_h,

then we say that δ(O_n) is a confidence interval for θ_1 with confidence coefficient ε. (The parameters θ_2, θ_3, ..., θ_h are sometimes called nuisance parameters.)

Example 1 (Mean of a normal population): If O_n is a sample from a population with distribution N(a,σ²), then in the notation of 5.3,

    t = √n (x̄ - a)/s

has the t-distribution g_{n-1}(t) with n - 1 degrees of freedom. Define t_ε from

    ∫_{-t_ε}^{t_ε} g_{n-1}(t) dt = ε.

Then

    Pr(-t_ε ≤ t ≤ t_ε) = ε

whatever be the true values of a and σ². Hence (x̄ - t_ε s/√n, x̄ + t_ε s/√n) is a confidence interval for a with confidence coefficient ε.
Example 2 (Difference of means of two normal populations known to have the same variance): Let O_{n_i}: (x_{i1}, x_{i2}, ..., x_{in_i}) be a sample of size n_i from N(a_i,σ²), i = 1, 2. Let

    x̄_i = Σ_{α=1}^{n_i} x_{iα}/n_i,    S_i = Σ_{α=1}^{n_i} (x_{iα} - x̄_i)²,    d = x̄_1 - x̄_2,    a = a_1 - a_2.

Then by 5.25, S_i/σ² has the χ²-distribution with n_i - 1 degrees of freedom; hence (5.23) S/σ², where S = S_1 + S_2, has the χ²-distribution with n_1 + n_2 - 2 degrees of freedom. Furthermore, y = (d - a)/[σ²(n_1^{-1} + n_2^{-1})]^{1/2} has the distribution N(0,1), and since y and S/σ² are statistically independent (5.25), it follows from 5.3 that σy/[S/(n_1 + n_2 - 2)]^{1/2} has the t-distribution with n_1 + n_2 - 2 degrees of freedom, g_{n_1+n_2-2}(t). Defining t_ε from

    ∫_{-t_ε}^{t_ε} g_{n_1+n_2-2}(t) dt = ε,

we find by the method of Example 1 that a confidence interval for a is (d - t_ε s', d + t_ε s'), where

    s' = [S(n_1^{-1} + n_2^{-1})/(n_1 + n_2 - 2)]^{1/2}.
Example 3 (Variance of a normal distribution): Let O_n be a sample from N(a,σ²). Let

    S = Σ_{i=1}^n (x_i - x̄)².

Then (5.25) S/σ² has the χ²-distribution f_{n-1}(χ²) with n - 1 degrees of freedom. Let χ_1², χ_2² be two points on the range (0,∞) such that

    ∫_{χ_1²}^{χ_2²} f_{n-1}(χ²) d(χ²) = ε.

We find that (S/χ_2², S/χ_1²) is a confidence interval for σ² with confidence coefficient ε.
Example 4 (Ratio of the variances of two normal distributions): Let O_{n_i}: (x_{i1}, ..., x_{in_i}) be a sample of size n_i from N(a_i,σ_i²), i = 1, 2. Let

    s_i² = Σ_{α=1}^{n_i} (x_{iα} - x̄_i)²/(n_i - 1),    T = s_1²/s_2²,    θ = σ_1²/σ_2².

Since (n_i - 1)s_i²/σ_i², for i = 1, 2, are independently distributed according to χ²-distributions with n_i - 1 degrees of freedom respectively (s_1² and s_2² being statistically independent), it follows from 5.4 that T/θ has the F-distribution h_{n_1-1,n_2-1}(F) with n_1 - 1 and n_2 - 1 degrees of freedom. Pick a pair of limits F_1, F_2 so that

    ∫_{F_1}^{F_2} h_{n_1-1,n_2-1}(F) dF = ε.

Then a confidence interval for θ is (T/F_2, T/F_1).
6.14 Confidence Regions

We suppose that the population distribution depends on parameters θ_1, θ_2, ..., θ_h. We denote the parameter point (θ_1, θ_2, ..., θ_h) by Θ, and the entire h-dimensional space of admissible parameter values by Ω. If δ(O_n) is a random region in Ω which depends on the sample O_n, but not on the unknown parameter point Θ, and if the probability that the random region δ(O_n) cover the true parameter point Θ is ε, independent of Θ,

    Pr{Θ ∈ δ(O_n)} = ε, independent of Θ,

then we say that δ(O_n) is a confidence region for Θ, with confidence coefficient ε.

It may be desired to estimate only a subset θ_1, θ_2, ..., θ_m, m < h, of the h parameters (the remaining parameters are called nuisance parameters). Denote the m-dimensional space of Θ': (θ_1, θ_2, ..., θ_m) by Ω'. If δ'(O_n) is a random region in Ω' such that

    Pr{Θ' ∈ δ'(O_n)} = ε, independent of Θ,

whatever be the true value Θ, then δ'(O_n) is said to be a confidence region for Θ' with confidence coefficient ε.
Example: Suppose O_{n_1} and O_{n_2} are samples from normal populations N(a_1,σ²) and N(a_2,σ²), respectively. We know from 5.25 that S_1/σ², S_2/σ² (defined in Example 2, 6.13), and

    n_1(x̄_1 - a_1)²/σ²,    n_2(x̄_2 - a_2)²/σ²

are independently distributed according to χ²-laws with n_1 - 1, n_2 - 1, 1, 1 degrees of freedom respectively. By 5.23 it follows that

    (S_1 + S_2)/σ²    and    [n_1(x̄_1 - a_1)² + n_2(x̄_2 - a_2)²]/σ²

are independently distributed according to χ²-laws with n_1 + n_2 - 2 and 2 degrees of freedom respectively. Hence if we set

    F = [n_1(x̄_1 - a_1)² + n_2(x̄_2 - a_2)²](n - 2) / [2(S_1 + S_2)],

then F is distributed according to h_{2,n-2}(F), where n = n_1 + n_2.

Therefore if F_ε is chosen so that

    ∫_0^{F_ε} h_{2,n-2}(F) dF = ε,

we may say that

    Pr{[n_1(x̄_1 - a_1)² + n_2(x̄_2 - a_2)²](n - 2) / [2(S_1 + S_2)] ≤ F_ε} = ε,

which is equivalent to the statement that

    Pr{(a_1,a_2) ∈ δ(O_n)} = ε,

where δ(O_n) is the region in the (a_1,a_2) plane bounded by the random ellipse with equation

    n_1(x̄_1 - a_1)² + n_2(x̄_2 - a_2)² = 2(S_1 + S_2) F_ε / (n - 2).

In other words, the probability is ε that this ellipse will cover the true parameter point (a_1,a_2).
6.2 Point Estimation: Maximum Likelihood Statistics

Throughout this section we consider the point estimation of a parameter θ in the c.d.f. of a population. There may be other unknown parameters present; if so, we denote these by θ_2, θ_3, ..., θ_h. A statistic is any function T(O_n) of the sample, not depending on θ, or on any other parameters if such are present. Point estimation consists of the use of a single statistic for estimating the parameter; confidence intervals, we recall, involve two statistics, the end-points of the confidence interval, satisfying certain conditions (6.1). Desirable conditions for statistics used as point estimates have been given by R. A. Fisher: An optimum estimate satisfies the criteria of consistency, efficiency, and sufficiency, defined below. A method which sometimes yields optimum statistics is Fisher's method of maximum likelihood.

6.21 Consistency

A statistic T(O_n) is said to be a consistent estimate of θ if T converges stochastically (4.21) to θ as n → ∞. From Theorem (B) of 4.21 we know that whenever the population has finite variance, the sample mean is a consistent estimate of the population mean. We remark that consistency is purely an asymptotic property. If for every n, E(T) = θ, then we say that the statistic T is unbiased. It follows from Theorem (A) of 4.21 that the sample mean is always an unbiased estimate of the population mean (whenever the latter exists). The following theorem enables us to recognize the consistency
of statistics in many cases:

Theorem (A): A sufficient condition that T be a consistent statistic for estimating θ is that E(T) → θ and σ_T² → 0 as n → ∞.

To prove the theorem, write T(O_n) = T_n, and take any ε > 0. We need to show that

    Pr(|T_n - θ| > ε) → 0 as n → ∞.

Since E(T_n) → θ, there exists an N such that

    |E(T_n) - θ| < ε/2 for n > N.

We note that for n > N the interval of T_n values |T_n - E(T_n)| ≤ ε/2 is always contained in the interval |T_n - θ| ≤ ε. Hence the probability of T_n falling outside the latter interval is ≤ the probability of T_n falling outside the former:

    Pr(|T_n - θ| > ε) ≤ Pr[|T_n - E(T_n)| > ε/2] = Pr[|T_n - E(T_n)| > (ε/2σ_{T_n}) σ_{T_n}].

By Tchebycheff's inequality (2.71) the last expression is ≤ (2σ_{T_n}/ε)², and by hypothesis this → 0 as n → ∞ for all fixed ε > 0.
Example: Let $O_n$ be a sample from a population with an arbitrary c.d.f. about which we assume only that the fourth moment $\mu_4$ about the mean exists, and consider $s^2$ as an estimate of $\sigma^2$, where

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$

Then we know $E(s^2) = \sigma^2$, and by use of the well-known formula for the variance of $s^2$, it follows from Theorem (A) that $s^2$ is a consistent statistic for estimating $\sigma^2$. If we apply the same theorem to

$$(s')^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2/n,$$

we find it is also a consistent estimate of $\sigma^2$, but unlike $s^2$, it is biased.
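As a numerical illustration of this example, the two estimates can be compared by simulation. The sketch below is illustrative only and assumes a normal population with $\sigma^2 = 4$; the identity $(s')^2 = s^2(n-1)/n$ makes the bias visible at small $n$ while both averages settle near $\sigma^2$ as $n$ grows.

```python
import random

def variance_estimates(xs):
    """Return (s2, sp2): the unbiased estimate S/(n-1) and the biased S/n."""
    n = len(xs)
    xbar = sum(xs) / n
    S = sum((x - xbar) ** 2 for x in xs)
    return S / (n - 1), S / n

random.seed(0)
sigma2 = 4.0  # true population variance (normal population is an assumption of this sketch)
for n in (10, 100, 2000):
    reps = 500
    s2_mean = sp2_mean = 0.0
    for _ in range(reps):
        xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        s2, sp2 = variance_estimates(xs)
        s2_mean += s2 / reps
        sp2_mean += sp2 / reps
    # E(s2) stays near 4 for every n; E((s')^2) = 4(n-1)/n, so its bias vanishes as n grows
    print(n, round(s2_mean, 2), round(sp2_mean, 2))
```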
6.22 Efficiency

$T(O_n)$ is said to be an efficient estimate of $\theta$ if

(i) $\sqrt{n}(T - \theta)$ is asymptotically distributed according to $N(0, \mu)$ with $\mu < \infty$,

(ii) for any other statistic $T'(O_n)$ such that $\sqrt{n}(T' - \theta)$ is asymptotically distributed according to $N(0, \mu')$, $\mu \le \mu'$.

Since the asymptotic mean and variance of $T$ are $\theta$ and $\mu/n$, respectively, it follows from Theorem (A) of 6.21 that (i) implies the consistency of $T$. The efficiency of $T'$ in estimating $\theta$ is defined by $E = \mu/\mu'$.

Example: Consider the sample mean $\bar{x}$ and the sample median $\tilde{x}$ of $O_n$ from $N(a, \sigma^2)$ as estimates of $a$. We have from 5.11 that $\sqrt{n}(\bar{x} - a)$ is distributed according to $N(0, \sigma^2)$, and from 4.53 that $\sqrt{n}(\tilde{x} - a)$ is asymptotically distributed according to $N(0, \tfrac{\pi}{2}\sigma^2)$. Hence $\bar{x}$ is more efficient than $\tilde{x}$. However, to prove $\bar{x}$ "efficient" it would be necessary to verify condition (ii) of the definition. This example may be generalized as follows: if $O_n$ is from a population with p.d.f. $f(x)$, if the population median $= a$ (the population mean), and if $f(x)$ is continuous at $a$, then using the results of 4.53 on the asymptotic distribution of $\tilde{x}$, we find that $\bar{x}$ is a more efficient estimate of $a$ than $\tilde{x}$ if $\sigma < [2f(a)]^{-1}$; $\tilde{x}$ is more efficient if $\sigma > [2f(a)]^{-1}$.
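The $\pi/2$ ratio of asymptotic variances in the normal case can be observed numerically. The following sketch is illustrative only (odd $n$ so the median is an order statistic; the sample sizes and seed are arbitrary choices): the ratio of the sampling variances of median and mean should sit near $\pi/2 \approx 1.57$.

```python
import random
import statistics

def mean_and_median(n, rng):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(xs), statistics.median(xs)

rng = random.Random(1)
n, reps = 101, 4000
pairs = [mean_and_median(n, rng) for _ in range(reps)]
v_mean = statistics.pvariance([p[0] for p in pairs])
v_median = statistics.pvariance([p[1] for p in pairs])
# asymptotically Var(median)/Var(mean) -> pi/2 for a normal population
print(round(v_median / v_mean, 2))
```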
6.23 Sufficiency

$T$ is said to be a sufficient statistic for estimating $\theta$ if for any other statistic $T'$, the conditional distribution $f(T'|T)$ of $T'$, given $T$, is independent of $\theta$. (We use the same notation $f(T'|T)$ whether the population is continuous or discrete.) Thus, expected values, moments and other probability calculations about $T'$, given $T$, will be calculated from $f(T'|T)$ and hence will not depend on $\theta$, but they will depend on $T$ in general. Or, in Fisher's terminology, a sufficient statistic "exhausts the information" in a sample. We note that sufficiency, unlike consistency and efficiency, is not merely an asymptotic property.

A convenient method of spotting sufficient statistics is embodied in

Theorem (A)*: If the population distribution is continuous, let $P(O_n; \theta, \theta_2, \ldots, \theta_h)$ be the p.d.f. of $O_n$; if the distribution is discrete, let $P(O_n; \theta, \theta_2, \ldots, \theta_h)$ be the discrete probability of $O_n$. In either case a necessary and sufficient condition that $T$ be a sufficient statistic for estimating $\theta$ is that the function $P$ factor in the following manner:

$$P(O_n; \theta, \theta_2, \ldots, \theta_h) = g_1(T; \theta, \theta_2, \ldots, \theta_h) \cdot g_2(O_n; \theta_2, \theta_3, \ldots, \theta_h).$$

A sufficient set of statistics with regard to a set of parameters may be defined and an analogue of Theorem (A) obtained for that case; see Neyman and Pearson, Statistical Research Memoirs, Vol. 1 (1936), pp. 119-121.

*For proof, see J. Neyman, Giornale dell'Istituto Italiano degli Attuari, Vol. 6 (1934), pp. 320-334.
Example 1: Suppose $O_n$ is from $N(a, \sigma^2)$. Then

$$P(O_n; a, \sigma^2) = e^{-n(\bar{x}-a)^2/2\sigma^2} \cdot (2\pi\sigma^2)^{-n/2} e^{-S/2\sigma^2},$$

where $S = \sum_{i=1}^{n}(x_i - \bar{x})^2$. (Here and in the following examples the factors corresponding to $g_1$ and $g_2$ of Theorem (A) are separated by a dot.) Hence $\bar{x}$ is a sufficient statistic for estimating $a$. In this case there is no sufficient statistic for $\sigma^2$, but it is easily shown that $\bar{x}$, $S/(n-1)$ are a sufficient set of (unbiased) statistics for $a$ and $\sigma^2$.

Example 2: For $O_n$ from $N(0, \theta)$,

$$P(O_n; \theta) = (2\pi\theta)^{-n/2} e^{-S'/2\theta} \cdot 1,$$

where $S' = \sum_{i=1}^{n} x_i^2$. Hence $S'$ is a sufficient statistic for estimating $\theta$. $S'/n$ is an unbiased (see Example 2, 6.24) sufficient statistic.
Example 3: Suppose the population has the discrete distribution

$$p(x; \theta) = \theta^x e^{-\theta}/x!, \qquad x = 0, 1, 2, \ldots$$

We recall from 3.15 that $E(x) = \theta$. For a sample $O_n$: $(x_1, x_2, \ldots, x_n)$,

$$P(O_n; \theta) = \prod_{i=1}^{n} \theta^{x_i} e^{-\theta}/x_i!.$$

If we write this

$$P(O_n; \theta) = \left(\theta^{\sum x_i} e^{-n\theta}\right) \cdot \left(1/\prod x_i!\right),$$

we see that $\sum x_i$ is a sufficient statistic for estimating $\theta$. Since $E(\sum x_i) = n\theta$, it follows that $\bar{x} = \sum x_i/n$ is an unbiased sufficient statistic.
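The factorization of Theorem (A) can be checked numerically for this Poisson example: dividing the likelihood by the factor $g_1(\sum x_i; \theta)$ must leave a quantity that does not involve $\theta$. A small sketch (the sample values are arbitrary):

```python
import math

def poisson_likelihood(xs, theta):
    return math.prod(theta ** x * math.exp(-theta) / math.factorial(x) for x in xs)

def g1(t, n, theta):
    # the factor that depends on the data only through T = sum(xs)
    return theta ** t * math.exp(-n * theta)

xs = [2, 0, 3, 1, 1]
t, n = sum(xs), len(xs)
for theta in (0.5, 1.0, 2.7):
    g2 = poisson_likelihood(xs, theta) / g1(t, n, theta)
    print(theta, g2)  # g2 = 1/(2! 0! 3! 1! 1!) = 1/12 for every theta
```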
6.24 Maximum Likelihood Estimates

The function $P(O_n; \theta, \theta_2, \ldots, \theta_h)$ defined in Theorem (A) of 6.23, when considered as a function of the parameter point $\theta, \theta_2, \ldots, \theta_h$, for fixed $O_n$, is called the likelihood of the parameter point. If the likelihood function $P$ has a unique maximum at $\theta = \hat{\theta}(O_n)$, $\theta_2 = \hat{\theta}_2(O_n)$, $\ldots$, $\theta_h = \hat{\theta}_h(O_n)$, then the set of statistics $\hat{\theta}, \hat{\theta}_2, \ldots, \hat{\theta}_h$ is called the maximum likelihood estimate of the parameter point.
Let us consider the case of one parameter, say $\theta$. In Theorem (A), 6.12, it was shown that under certain conditions the quantity $\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial \theta}$ is asymptotically distributed according to $N(0,1)$. Let us assume that $\hat{\theta}$ is the value of $\theta$ which maximizes $P$ and that we can make the following expansion about $\theta = \hat{\theta}$:

$$\text{(a)}\qquad \frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial \theta} = \frac{1}{\sqrt{n}A}\left(\frac{\partial \log P}{\partial \theta}\right)_{\hat{\theta}} + \frac{1}{\sqrt{n}A}\left(\frac{\partial^2 \log P}{\partial \theta^2}\right)_{\hat{\theta}}(\theta - \hat{\theta}) + \frac{1}{2\sqrt{n}A}\left(\frac{\partial^3 \log P}{\partial \theta^3}\right)_{\tilde{\theta}}(\theta - \hat{\theta})^2 = -AV + UV + \frac{1}{\sqrt{n}}WV^2,$$

where $\tilde{\theta}$ is on the interval $(\theta, \hat{\theta})$, and

$$V = \sqrt{n}(\theta - \hat{\theta}), \qquad U = \frac{1}{A}\left[A^2 + \frac{1}{n}\left(\frac{\partial^2 \log P}{\partial \theta^2}\right)_{\hat{\theta}}\right], \qquad W = \frac{1}{2A}\cdot\frac{1}{n}\left(\frac{\partial^3 \log P}{\partial \theta^3}\right)_{\tilde{\theta}}.$$

We have employed the fact that $\partial \log P/\partial \theta$ vanishes for $\theta = \hat{\theta}$. Now from Theorem (A), 6.12,

$$\text{(b)}\qquad \Pr\left(\frac{1}{\sqrt{n}A}\frac{\partial \log P}{\partial \theta} < d\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{d} e^{-t^2/2}\,dt + \eta,$$

where $\eta \to 0$ as $n \to \infty$. Making use of (a) we may write

$$\text{(c)}\qquad \Pr\left(-AV + UV + \frac{1}{\sqrt{n}}WV^2 < d\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{d} e^{-t^2/2}\,dt + \eta.$$

Considering $U$, $V$, $W$ as three random variables, the left side of (c) states that the probability of $(U, V, W)$ falling into a certain region in this space is given by the expression on the right. Now let us assume (1) that $\left(\frac{1}{n}\frac{\partial^2 \log P}{\partial \theta^2}\right)_{\hat{\theta}}$ converges stochastically to $-A^2$ (implying that $U$ converges stochastically to $0$) as $n \to \infty$, (2) that $\left(\frac{1}{n}\frac{\partial^3 \log P}{\partial \theta^3}\right)_{\tilde{\theta}}$ (and hence $2AW$) converges stochastically to some finite number $K$, and (3) that $V$ has some limiting non-degenerate p.d.f. as $n \to \infty$ (i.e., has a c.d.f. which is continuous); then the limiting form of the distribution in the $U$, $V$, $W$ space as $n \to \infty$ is a one-dimensional p.d.f. along the straight line

$$U = 0, \qquad W = \frac{K}{2A}.$$

The p.d.f. on this line is that of the limiting distribution of $V$. Hence,

$$\lim_{n \to \infty} \Pr\left(-AV + UV + \frac{1}{\sqrt{n}}WV^2 < d\right) = \lim_{n \to \infty} \Pr(-AV < d).$$
The equality of the two expressions for $A^2$ is a reasonable assumption, as the reader will see from the following discussion. We have

$$\text{(d)}\qquad \frac{\partial \log f}{\partial \theta} = \frac{1}{f}\frac{\partial f}{\partial \theta}.$$

Differentiating this with respect to $\theta$, we get

$$\text{(e)}\qquad \frac{\partial^2 \log f}{\partial \theta^2} = -\frac{1}{f^2}\left(\frac{\partial f}{\partial \theta}\right)^2 + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2} = -\left(\frac{\partial \log f}{\partial \theta}\right)^2 + \frac{1}{f}\frac{\partial^2 f}{\partial \theta^2}.$$

Substituting (d) into (e), multiplying by $f$, and integrating with respect to $x$ from $-\infty$ to $+\infty$, we have

$$E\left(\frac{\partial^2 \log f}{\partial \theta^2}\right) = -E\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right] + \int_{-\infty}^{\infty} \frac{\partial^2 f}{\partial \theta^2}\,dx.$$

Now if we may interchange the order of integration and differentiation in the right member, then the last integral is seen to be equal to

$$\int_{-\infty}^{\infty} \frac{\partial^2 f}{\partial \theta^2}\,dx = \frac{\partial^2}{\partial \theta^2}\int_{-\infty}^{\infty} f\,dx = \frac{\partial^2}{\partial \theta^2}(1) = 0,$$

so that $E(\partial^2 \log f/\partial \theta^2) = -E[(\partial \log f/\partial \theta)^2]$.
We may summarize in the following

Theorem (A): Let $O_n$: $(x_1, x_2, \ldots, x_n)$ be a sample from a population with c.d.f. $F(x; \theta)$. Let $P(O_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$ be the likelihood function, where $f(x; \theta)$ is the p.d.f. of $x$ if $x$ is a continuous variable and the probability of $x$ if $x$ is discrete. Let $P(O_n; \theta)$ have a unique maximum at $\theta = \hat{\theta}$, and assume

$$\text{(i)}\qquad A^2 = E\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right] = -E\left(\frac{\partial^2 \log f}{\partial \theta^2}\right), \qquad 0 < A^2 < \infty,$$

and that as $n \to \infty$

(ii) $\left(\frac{1}{n}\frac{\partial^2 \log P}{\partial \theta^2}\right)_{\hat{\theta}}$ converges stochastically to $-A^2$,

(iii) $\left(\frac{1}{n}\frac{\partial^3 \log P}{\partial \theta^3}\right)_{\tilde{\theta}}$ converges stochastically to a finite $K$,

(iv) $\sqrt{n}(\hat{\theta} - \theta)$ has a limiting non-degenerate p.d.f.

Then $\sqrt{n}(\hat{\theta} - \theta)$ is distributed asymptotically according to $N(0, 1/A^2)$.

Under fairly general conditions, which will not be given here, it can be shown that if $\bar{\theta}$ is any other statistic such that $\sqrt{n}(\bar{\theta} - \theta)$ is asymptotically distributed according to $N(0, B^2)$, then $B^2 \ge 1/A^2$.
In the present case, where the c.d.f. $F(x; \theta)$ depends on only one parameter, it is often possible to transform from the old parameter $\theta$ to a new parameter $\phi$ so that the asymptotic variance of $\hat{\phi}$, the maximum likelihood estimate of $\phi$, will be independent of the parameter. Let $A^2$ be defined as before, let $\phi = h(\theta)$, a function to be determined, and define

$$B^2 = E\left[\left(\frac{\partial \log f}{\partial \phi}\right)^2\right] = A^2\left(\frac{d\theta}{d\phi}\right)^2.$$

We will try to determine the function $h(\theta)$ so that $B^2$ is a given positive constant. We have

$$\phi = \frac{1}{B}\int A\,d\theta + C,$$

where $C$ is an arbitrary constant. If the last equation determines $\phi$ as a monotonic continuous function of $\theta$, then since $P(O_n; \theta)$ has a unique maximum for $\theta = \hat{\theta}$, clearly $P(O_n; h^{-1}(\phi))$ has a unique maximum for $\phi = \hat{\phi} = h(\hat{\theta})$. By Theorem (A) the asymptotic variance of $\hat{\phi}$, the maximum likelihood estimate of $\phi$, will be $(nB^2)^{-1}$, which is independent of $\phi$. As an illustration the reader can verify that in Example 2 below, $\phi = \log\theta$ is a new parameter of the desired type.

Theorem (A), 6.12, and Theorem (A) of the present section can both be extended to the case of several parameters. In the case of several parameters it may be shown under conditions analogous to those in Theorem (A) that for large $n$, $\sqrt{n}(\hat{\theta}_1 - \theta_1)$, $\sqrt{n}(\hat{\theta}_2 - \theta_2)$, $\ldots$, $\sqrt{n}(\hat{\theta}_h - \theta_h)$ are asymptotically distributed according to a normal multivariate distribution with variance-covariance matrix $\|\sigma_{ij}\|$ given by $\|A_{ij}\|^{-1}$, where

$$A_{ij} = E\left(\frac{\partial \log f}{\partial \theta_i}\cdot\frac{\partial \log f}{\partial \theta_j}\right) = -E\left(\frac{\partial^2 \log f}{\partial \theta_i\,\partial \theta_j}\right).$$
Example 1: Suppose $O_n$ is from $N(a, 1)$:

$$f(x; a) = (2\pi)^{-1/2} e^{-(x-a)^2/2},$$

$$\log f = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}(x-a)^2,$$

$$\partial \log f/\partial a = x - a.$$

If we use the first of the two expressions for $A^2$ we get

$$A^2 = E[(x-a)^2] = \sigma^2 = 1.$$

To use the second expression we would have to take the expected value of

$$-\partial^2 \log f/\partial a^2 = 1,$$

and we note we get the same result. To find $\hat{a}$ we inspect

$$P(O_n; a) = (2\pi)^{-n/2} e^{-\frac{1}{2}[n(\bar{x}-a)^2 + S]}$$

and see that this is maximum when the exponent is minimum, that is, for

$$\hat{a} = \bar{x}.$$

Theorem (A) says that $\sqrt{n}(\bar{x} - a)$ is asymptotically distributed according to $N(0, 1)$. In the present case this is the exact distribution.
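The maximization can also be carried out numerically. The following sketch (grid search over simulated data; the true mean 3, sample size, and grid are arbitrary choices for illustration) confirms that the log likelihood of a sample from $N(a, 1)$ peaks at the sample mean.

```python
import math
import random

def log_likelihood(a, xs):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - a) ** 2 for x in xs)

random.seed(2)
xs = [random.gauss(3.0, 1.0) for _ in range(200)]
xbar = sum(xs) / len(xs)
# crude grid maximization of the log likelihood over a in [2, 4]
grid = [2.0 + 0.001 * k for k in range(2001)]
a_hat = max(grid, key=lambda a: log_likelihood(a, xs))
print(abs(a_hat - xbar) < 0.002)  # the grid maximizer sits at the sample mean
```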
Example 2: For $O_n$ from $N(0, \theta)$,

$$f(x; \theta) = (2\pi\theta)^{-1/2} e^{-x^2/2\theta},$$

$$\log f = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log\theta - \tfrac{1}{2}x^2/\theta,$$

$$\partial \log f/\partial \theta = \tfrac{1}{2}(-1/\theta + x^2/\theta^2),$$

$$A^2 = \tfrac{1}{4}E[(-1/\theta + x^2/\theta^2)^2].$$

Let us see whether it may not be easier to calculate $A^2$ from the other formula:

$$A^2 = -E\left(\frac{\partial^2 \log f}{\partial \theta^2}\right) = -E\left(\tfrac{1}{2}\theta^{-2} - x^2/\theta^3\right) = -\tfrac{1}{2}\theta^{-2} + E(x^2/\theta)/\theta^2.$$

Since $x^2/\theta$ has the $\chi^2$-distribution with $k = 1$ degrees of freedom, its mean is $k = 1$. Hence

$$A^2 = -\tfrac{1}{2}\theta^{-2} + \theta^{-2} = \tfrac{1}{2}\theta^{-2}.$$

Now

$$P(O_n; \theta) = (2\pi\theta)^{-n/2} e^{-\frac{1}{2}S'/\theta},$$

where $S' = \sum x_i^2$. Differentiating

$$\log P = -\tfrac{n}{2}\log(2\pi) - \tfrac{n}{2}\log\theta - \tfrac{1}{2}S'/\theta$$

with respect to $\theta$, we get

$$\frac{\partial \log P}{\partial \theta} = \tfrac{1}{2}(-n/\theta + S'/\theta^2).$$

Equating this to zero and solving for $\theta$, we find

$$\hat{\theta} = S'/n.$$

By Theorem (A), $\sqrt{n}(\hat{\theta} - \theta)$ is asymptotically distributed according to $N(0, 2\theta^2)$. Since $S'/\theta$ actually has the $\chi^2$-distribution with $n$ degrees of freedom, its exact mean and variance are $n$ and $2n$, respectively; hence the asymptotic mean and variance given by Theorem (A) turn out to be the exact mean and variance. However, the exact distribution of $n\hat{\theta}/\theta$ is the $\chi^2$-distribution with $n$ degrees of freedom, and not a normal distribution.
Example 3: As an illustration of the method of obtaining maximum likelihood estimates when the distribution is discrete, consider again the sample of Example 3, 6.23. We may write

$$P(O_n; \theta) = \theta^{n\bar{x}} e^{-n\theta} U,$$

where

$$U = 1/\prod_{i=1}^{n} x_i!$$

is independent of $\theta$. To find $\hat{\theta}$ we set $\partial P/\partial \theta = 0$ and solve for $\theta$:

$$\log P = n\bar{x}\log\theta - n\theta + \log U,$$

$$\partial P/\partial \theta = P\cdot(n\bar{x}/\theta - n) = 0,$$

$$\hat{\theta} = \bar{x}.$$

This we have already shown to be an unbiased sufficient statistic. We calculate

$$\log f(x, \theta) = x\log\theta - \theta - \log x!,$$

$$\partial^2 \log f/\partial \theta^2 = -x/\theta^2,$$

$$A^2 = -E(-x/\theta^2) = 1/\theta.$$

Thus Theorem (A) tells us that $\sqrt{n}(\bar{x} - \theta)$ is asymptotically distributed according to $N(0, \theta)$.
Example 4: In this example we illustrate the method of maximum likelihood for obtaining estimates when more than one parameter is present in the population distribution. Suppose $O_n$ is from $N(a, \theta)$. Then

$$P(O_n; a, \theta) = (2\pi\theta)^{-n/2} e^{-\frac{1}{2}[n(\bar{x}-a)^2 + S]/\theta},$$

where

$$S = \sum_{i=1}^{n}(x_i - \bar{x})^2.$$

To find the estimates $\hat{a}$, $\hat{\theta}$, we set

$$\partial P/\partial a = \partial P/\partial \theta = 0$$

and solve for $a$ and $\theta$:

$$\log P = -\tfrac{1}{2}n\log(2\pi) - \tfrac{1}{2}n\log\theta - \tfrac{1}{2}[n(\bar{x}-a)^2 + S]/\theta,$$

$$\partial P/\partial a = P[n(\bar{x}-a)/\theta] = 0,$$

$$\partial P/\partial \theta = \tfrac{1}{2}P\left\{-n/\theta + [n(\bar{x}-a)^2 + S]/\theta^2\right\} = 0.$$

The solutions of these equations are easily found to be

$$\hat{a} = \bar{x}, \qquad \hat{\theta} = S/n.$$

As we have previously noted, these are both consistent estimates, but the latter is biased.

Let us compute the asymptotic variance-covariance matrix of $\sqrt{n}(\hat{a}-a)$ and $\sqrt{n}(\hat{\theta}-\theta)$ as given in the generalization stated below Theorem (A):

$$\log f = -\tfrac{1}{2}\log\theta - \tfrac{1}{2}(x-a)^2/\theta - \tfrac{1}{2}\log(2\pi),$$

from which, taking expected values of the negative second derivatives,

$$\left\|\begin{matrix} A_{aa} & A_{a\theta} \\ A_{\theta a} & A_{\theta\theta} \end{matrix}\right\| = \left\|\begin{matrix} \dfrac{1}{\theta} & 0 \\ 0 & \dfrac{1}{2\theta^2} \end{matrix}\right\|.$$

Hence the asymptotic variance-covariance matrix is

$$\left\|\begin{matrix} A_{aa} & A_{a\theta} \\ A_{\theta a} & A_{\theta\theta} \end{matrix}\right\|^{-1} = \left\|\begin{matrix} \theta & 0 \\ 0 & 2\theta^2 \end{matrix}\right\|.$$

It is easily verified that the entries in the last matrix are exact with the exception of the (2,2) entry, whose exact value is $2\theta^2(n-1)/n$.
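A simulation can check both the closed-form estimates and the asymptotic variances $\theta$ and $2\theta^2$ just computed. The sketch below (normal population with $a = 1$, $\theta = 4$; the sample size, replication count, and seed are arbitrary choices) scales the observed sampling variances by $n$:

```python
import random
import statistics

def mle_normal(xs):
    """Maximum likelihood estimates for N(a, theta): a_hat = xbar, theta_hat = S/n."""
    n = len(xs)
    a_hat = statistics.fmean(xs)
    theta_hat = sum((x - a_hat) ** 2 for x in xs) / n
    return a_hat, theta_hat

rng = random.Random(3)
a, theta, n, reps = 1.0, 4.0, 400, 1500
estimates = [mle_normal([rng.gauss(a, theta ** 0.5) for _ in range(n)]) for _ in range(reps)]
na_var = n * statistics.pvariance([e[0] for e in estimates])
nt_var = n * statistics.pvariance([e[1] for e in estimates])
print(round(na_var, 2))  # near theta = 4
print(round(nt_var, 1))  # near 2 * theta**2 = 32
```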
6.3 Tolerance Interval Estimation
In the foregoing sections we have discussed two methods of estimating one or more parameters in distribution functions from samples: the method of confidence intervals and the method of point estimation based on the method of maximum likelihood. If the original parameters, say $\theta_1, \theta_2, \ldots, \theta_h$, are transformed to new parameters $\phi_1, \phi_2, \ldots, \phi_h$ by any one-to-one transformation $\phi_i = \phi_i(\theta_1, \theta_2, \ldots, \theta_h)$, $i = 1, 2, \ldots, h$, which is continuous and possesses first derivatives, we may apply both methods of estimation as before to the problem of estimating the new parameters. In fact, it can be readily verified that the maximum likelihood estimates of the $\phi_i$ are $\phi_i(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_h)$, $i = 1, 2, \ldots, h$, where $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_h$ are maximum likelihood estimates of the $\theta_i$. A specific case of transforming a single parameter was discussed in 6.24; the problem there was to find a function of $\theta$ having a maximum likelihood estimate whose variance in large samples (to terms of order $1/n$) does not depend on this function of $\theta$.
Another problem of estimating a function of the parameter which deserves special comment is that of setting tolerance limits (see 4.55). This problem is as follows: Suppose $f(x, \theta)\,dx$ is the probability element of $x$, where $\theta$ is the parameter. For a given $0 < \beta' < 1$ let $L_1$ and $L_2$ be such that

$$\int_{-\infty}^{L_1} f(x, \theta)\,dx = \frac{1-\beta'}{2}, \qquad \int_{L_2}^{\infty} f(x, \theta)\,dx = \frac{1-\beta'}{2}.$$

$L_1$ and $L_2$ are continuous functions of $\beta'$ and $\theta$; denote them by $L_1(\theta, \beta')$, $L_2(\theta, \beta')$. From the discussion in the paragraph above it follows that in a sample of size $n$ the likelihood estimates of $L_1(\theta, \beta')$ and of $L_2(\theta, \beta')$ are $L_1(\hat{\theta}, \beta')$ and $L_2(\hat{\theta}, \beta')$, which are completely expressible in terms of the sample when the functional form of $f(x, \theta)$ is given. Now the integral

$$\text{(a)}\qquad v = \int_{L_1(\hat{\theta}, \beta')}^{L_2(\hat{\theta}, \beta')} f(x, \theta)\,dx$$

is a random variable which represents the proportion of the population for which $L_1(\hat{\theta}, \beta') < x < L_2(\hat{\theta}, \beta')$. Assume that the distribution function of the integral (a) is independent of $\theta$. If $\hat{\theta}$ converges stochastically to $\theta$ as $n \to \infty$ (which is implied by assumption (iv), Theorem (A), 6.24), then $L_1(\hat{\theta}, \beta')$ and $L_2(\hat{\theta}, \beta')$ converge stochastically to $L_1(\theta, \beta')$ and $L_2(\theta, \beta')$, respectively, and hence the integral (a) converges stochastically to $\beta'$. Therefore, for a given $\beta$ on the interval $(0,1)$ and $\alpha$ on the same interval, and choosing some $\beta'$ on the interval $(\beta, 1)$, one can choose an $n$, say $n'$, such that for $n > n'$

$$\text{(b)}\qquad \Pr(v > \beta) > \alpha,$$
no matter what value $\theta$ may have. For some values of $\beta$ and $\alpha$, particularly those near unity (e.g., .95 or .99), there exists a smallest $n$, say $n_0$, such that

$$\text{(c)}\qquad \Pr(v \ge \beta) > \alpha.$$

Therefore, under this condition, $L_1(\hat{\theta}, \beta')$ and $L_2(\hat{\theta}, \beta')$ are $100\beta\%$ parameter-free tolerance limits at probability level $\alpha$ (see 4.55)*. The interval $L_1(\hat{\theta}, \beta')$, $L_2(\hat{\theta}, \beta')$ on the x-axis may be referred to as a tolerance interval based on samples of size $n_0$ for covering or estimating at least $100\beta\%$ of the values of $x$ of the population, with a probability of at least $\alpha$. These results may be extended to the case in which two or more parameters are involved in the distribution function of $x$.
It is evident that there are many ways of choosing tolerance limits as functions of $\hat{\theta}$ so that statement (b) can be made; e.g., $L_1$ and $L_2$ could be determined by cutting off unequal probabilities from the tails of the distribution function $f(x, \theta)$ rather than equal probabilities.

The reader should note carefully the distinction between a confidence interval statement (6.11) about a population parameter and a tolerance interval statement (in this section and 4.55) about a population proportion. It will be seen, however, in the last example of 6.12 that the confidence statement about the proportion $p$ in a binomial population is closely analogous to a tolerance interval statement about a population proportion in the case of a population with a continuous random variable.
As an example of tolerance limits of this type involving two parameters, consider a sample $O_n$: $(x_1, x_2, \ldots, x_n)$ drawn from the normal distribution $N(a, \sigma^2)$. Let $\bar{x}$ be the sample mean and $s^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$. Let $t_{\beta'}$ be such that

$$\text{(d)}\qquad \int_{-t_{\beta'}}^{t_{\beta'}} g_{n-1}(t)\,dt = \beta',$$

where $g_{n-1}(t)$ is the "Student" distribution with $n-1$ degrees of freedom (see 5.3). Let $L_1 = \bar{x} - t_{\beta'}\sqrt{\tfrac{n+1}{n}}\,s$ and $L_2 = \bar{x} + t_{\beta'}\sqrt{\tfrac{n+1}{n}}\,s$. The proportion of the population having values of $x$ on the tolerance interval $\bar{x} \pm t_{\beta'}\sqrt{\tfrac{n+1}{n}}\,s$ is

$$\text{(e)}\qquad v = \int_{L_1}^{L_2} N(a, \sigma^2)\,dx.$$

The distribution function of this integral is not known. However, it has been shown** that the mean value of this integral is $\beta'$. Its variance has been determined only for large samples, for which it is $t_{\beta'}^2 e^{-t_{\beta'}^2}/(\pi n)$ to terms of order $1/n$.

*For details of the approach to tolerance limits for large samples when the functional form of $f(x, \theta)$ is known, see A. Wald, "Setting of Tolerance Limits when the Sample is Large", Annals of Math. Stat., Vol. 13 (1942).

**S. S. Wilks, "Determination of Sample Size for Setting Tolerance Limits", Annals of Math. Stat., Vol. 12 (1941), pp. 91-95.
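The random coverage $v$ of this normal tolerance interval can be simulated. In the sketch below the t-multiplier 1.6766 (an approximate two-sided 90% point of Student's t with 49 degrees of freedom) and the factor $\sqrt{(n+1)/n}$ are assumptions of the sketch rather than values taken from the text; the average of $v$ over many samples should then sit near $\beta' = 0.90$.

```python
import random
import statistics
from statistics import NormalDist

def coverage(n, t_mult, rng, a=0.0, sigma=1.0):
    """Population proportion inside xbar +/- t_mult * sqrt((n+1)/n) * s for one sample."""
    xs = [rng.gauss(a, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sqrt(S/(n-1))
    half = t_mult * ((n + 1) / n) ** 0.5 * s
    nd = NormalDist(a, sigma)
    return nd.cdf(xbar + half) - nd.cdf(xbar - half)

rng = random.Random(4)
n = 50
t_90 = 1.6766  # assumed two-sided 90% point of Student t with 49 d.f. (approximate)
vs = [coverage(n, t_90, rng) for _ in range(3000)]
print(round(statistics.fmean(vs), 2))  # mean coverage near beta' = 0.90
```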
In the discussion thus far, it has been assumed that the functional form of the population distribution $f(x, \theta)$ is known but the value of $\theta$ is unknown. From the point of view of practical statistics, the case in which $x$ is a continuous random variable with an unknown distribution is perhaps more important than the case in which the functional form is known. This case has been treated in 4.55*.
6.4 The Fitting of Distribution Functions

The problem of fitting distribution functions is as follows: Let $F(x; \theta_1, \theta_2, \ldots, \theta_h)$ be a c.d.f. depending on the $h$ parameters $\theta_1, \theta_2, \ldots, \theta_h$, and let $O_n$ be a sample of size $n$ from a population having this c.d.f. Consider the values $x_1, x_2, \ldots, x_n$ of the sample as $n$ values of a variable $x$. From these $n$ values we can construct an "empirical" c.d.f., say $F_n(x)$. The problem of fitting $F(x; \theta_1, \ldots, \theta_h)$ to $F_n(x)$ is that of determining $\theta_1, \theta_2, \ldots, \theta_h$ so that $F(x; \theta_1, \ldots, \theta_h)$ is approximately equal to $F_n(x)$ in some sense.

The method of maximum likelihood provides one method of determining values of $\theta_1, \theta_2, \ldots, \theta_h$, by maximizing the likelihood $\prod_{i=1}^{n} f(x_i; \theta_1, \theta_2, \ldots, \theta_h)$ with respect to the $\theta$'s. Clearly the values assigned to the parameters by this method are precisely their maximum likelihood estimates $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_h$ (6.24). This method of fitting is best in the sense that for large $n$, the variance of each $\hat{\theta}_i$ is less than or equal to that of any other consistent and asymptotically normally distributed estimate of $\theta_i$.
Another method of fitting which is easy to apply in many problems is the method of moments. This method consists of equating the moments

$$\mu_i'(\theta_1, \theta_2, \ldots, \theta_h) = m_i', \qquad i = 1, 2, \ldots, h,$$

and solving for $\theta_1, \theta_2, \ldots, \theta_h$ (assuming $\mu_i'$ exists for $i = 1, 2, \ldots, h$), where

$$\mu_i' = \int_{-\infty}^{\infty} x^i f(x; \theta_1, \ldots, \theta_h)\,dx, \qquad m_i' = \frac{1}{n}\sum_{\alpha=1}^{n} x_\alpha^i.$$

In the case of fitting certain distributions, for example the normal distribution, the binomial and Poisson distributions, the two methods of fitting yield the same results.

*For further details, not given here, the reader is referred to S. S. Wilks, loc. cit., and also S. S. Wilks, "Statistical Prediction with Special Reference to the Problem of Tolerance Limits", Annals of Math. Stat., Vol. 13 (1942). An extension of the notion of tolerance limits to two or more variables is to be presented in a forthcoming paper in the Annals of Math. Stat. by A. Wald.
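For the normal distribution the coincidence of the two methods is easy to verify directly: matching $\mu_1'$ and $\mu_2'$ gives the sample mean and the second sample moment about the mean, exactly the maximum likelihood values of Example 4, 6.24. A small sketch (the sample values are arbitrary):

```python
import statistics

def fit_normal_moments(xs):
    """Method of moments for N(a, sigma^2): equate the first two moments."""
    m1 = statistics.fmean(xs)
    m2 = statistics.fmean([x * x for x in xs])
    return m1, m2 - m1 * m1  # a_hat, sigma2_hat

def fit_normal_mle(xs):
    """Maximum likelihood for N(a, sigma^2): xbar and S/n."""
    a_hat = statistics.fmean(xs)
    return a_hat, sum((x - a_hat) ** 2 for x in xs) / len(xs)

xs = [1.2, 0.7, 2.4, 1.9, 0.3, 1.1]
print(fit_normal_moments(xs))
print(fit_normal_mle(xs))  # the two methods agree for the normal distribution
```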
CHAPTER VII

TESTS OF STATISTICAL HYPOTHESES

Suppose the distribution function of a population depends on parameters $\theta_1, \theta_2, \ldots, \theta_h$. We assume the functional form of the distribution to be known, but not the true parameter values. Let $\Omega$ be the $h$-dimensional space of admissible parameter values. Denote the parameter point by $\Theta$. Let $\omega$ be a specified point set in $\Omega$; it may be of dimensionality $0, 1, \ldots$, up to $h$. In this chapter we consider tests of the statistical hypothesis

$$H_0\colon \Theta \text{ in } \omega.$$

A test of $H_0$ is a procedure for accepting or rejecting $H_0$ on the evidence afforded by a sample from the population. A more precise definition of a test will be given in 7.3. As a general rule one sets up a test with the hope of rejecting the hypothesis, and for this reason the hypothesis is often called a null hypothesis in such cases. Thus, if one desires confirmation of a suspicion that two populations have different means, one takes as $H_0$ the hypothesis that the means are equal, and if $H_0$ is rejected by the test, then one's suspicion is confirmed on the basis of the test used.

Statistical hypotheses are classified as follows: If $\omega$ is a single point of $\Omega$, that is, if $H_0$ states $\Theta = \Theta_0$, then $H_0$ is called simple; in any other case $H_0$ is called composite.
7.1 Statistical Tests Related to Confidence Intervals

Consider the case where $H_0$ specifies the value of only one parameter $\theta_1$:

$$H_0\colon \theta_1 = \theta_1^0.$$

If the population distribution depends on no other parameters, this is a simple hypothesis; if other parameters $\theta_2, \ldots, \theta_h$ are present, $H_0$ is composite, $\omega$ being the $(h-1)$-dimensional subspace (hyperplane) in $\Omega$ defined by $\theta_1 = \theta_1^0$. If confidence intervals $\delta(O_n)$ for $\theta_1$ are available, then one may proceed as follows: Form $\delta(O_n)$ for the sample $O_n$, and reject $H_0$ unless $\delta(O_n)$ covers $\theta_1^0$. If $\epsilon$ is the confidence coefficient, then

$$\Pr(\text{rejecting } H_0 \text{ if it is true}) = 1 - \Pr\left[\theta_1^0 \in \delta(O_n) \,\middle|\, \theta_1 = \theta_1^0\right] = 1 - \epsilon.$$
The quantity $\alpha$, one minus the confidence coefficient, is called the significance level of the test. It will be noted that when confidence intervals for $\theta_1$ are known, then a whole family of tests is at hand: a test exists for every $\theta_1^0$, that is, for every admissible value of $\theta_1$. We remark that beyond the statement $\Pr(\text{rejecting } H_0 \text{ if true}) = \alpha$, no further property of the test can be deduced from the definition of confidence intervals. One might ask about the $\Pr(\text{accepting } H_0 \text{ if false})$, that is, of accepting $H_0$ when $\theta_1$ has some other value than $\theta_1^0$, but the significance level tells us nothing about this*. As will be seen in the examples below, our method usually leads us to the calculation of a certain statistic, say $T$, and $H_0$ is rejected if $T$ falls in a certain range $R$. Suppose, for example, that $R$ is the range $T > T_0$, and that $T$ possesses the p.d.f. $f(T)$ if $\theta_1 = \theta_1^0$. In certain cases it is sometimes said that $\alpha = \Pr(\text{finding a value of } T \text{ less probable than } T_0 \text{ if } H_0 \text{ is true})$. This really does not motivate the test any better: if by "$T_1$ is less probable than $T_2$" we mean $f(T_1) < f(T_2)$, then the same test can be made with other statistics $S = \phi(T)$, and the relation "less probable" is not invariant under such transformations**.

It should be noted that confidence intervals give us a far more complete judgement about the parameter $\theta_1$ than significance tests. We also remark that if confidence regions (6.14) for the set $\theta_1, \theta_2, \ldots, \theta_m$ are available, then so are significance tests for the hypothesis

$$H_0\colon \theta_1 = \theta_1^0,\; \theta_2 = \theta_2^0,\; \ldots,\; \theta_m = \theta_m^0.$$

$H_0$ is simple if $m = h$, composite if $m < h$.
Example 1: Suppose that on the basis of the sample $O_n$ from a population with the distribution $N(a, \sigma^2)$, where $a$ and $\sigma^2$ are unknown, we wish to test the (Student) hypothesis

$$H_0\colon a = a_0.$$

This is a composite hypothesis: the space $\Omega$ of admissible parameter points $(a, \sigma^2)$ is

*See 7.3.

**This may be shown by considering the signs of $f'(T)$ and $g'(S)$, where $g(S)$ is the p.d.f. of $S$.
7.2 VII. TESTS OF STATISTICAL HYPOTHESES
$$\text{(b)}\qquad P_0 = P(O_n; a_0, \theta) = (2\pi\theta)^{-n/2} e^{-\frac{1}{2}[n(\bar{x}-a_0)^2 + S]/\theta},$$

$$\log P_0 = -\tfrac{1}{2}n\log(2\pi) - \tfrac{1}{2}n\log\theta - \tfrac{1}{2}[n(\bar{x}-a_0)^2 + S]/\theta,$$

$$\partial P_0/\partial \theta = \tfrac{1}{2}P_0\left\{-n/\theta + [n(\bar{x}-a_0)^2 + S]/\theta^2\right\}.$$

Equating this to zero and solving for $\theta$, we get

$$\hat{\theta} = (\bar{x}-a_0)^2 + S/n,$$

and substituting this into (b), we find

$$P_0(O_n) = \left\{2\pi[(\bar{x}-a_0)^2 + S/n]\right\}^{-n/2} e^{-n/2}.$$

Hence

$$\lambda = \left[1 + n(\bar{x}-a_0)^2/S\right]^{-n/2}.$$

The distribution of $\lambda$ under the assumption that $H_0$ is true is independent of the unknown $\theta$; in fact

$$\lambda = \left[1 + t^2/(n-1)\right]^{-n/2},$$

where

$$t = \sqrt{n}(\bar{x}-a_0)/s$$

has the t-distribution $g_{n-1}(t)$ with $n-1$ degrees of freedom. Let $t_0^2$ correspond to $\lambda = \lambda_0$. Then $\lambda \le \lambda_0$ if and only if $|t| \ge t_0$. To get

$$\Pr(\lambda \le \lambda_0) = \alpha,$$

we define $t_0$ from

$$2\int_{t_0}^{\infty} g_{n-1}(t)\,dt = \alpha.$$

The likelihood ratio test for $H_0$ is seen to be the same as the (Student) test of Example 1, 7.1.
In many cases the asymptotic distribution of the likelihood ratio is given by

Theorem (A)*: Suppose the c.d.f. of the population depends on parameters $\theta_1, \theta_2, \ldots, \theta_h$, and that $\lambda$ is the likelihood ratio for the hypothesis

$$H_0\colon \theta_1 = \theta_1^0,\; \theta_2 = \theta_2^0,\; \ldots,\; \theta_m = \theta_m^0,$$

where $m \le h$. Then under certain regularity conditions the asymptotic distribution of $-2\log\lambda$, under the assumption that $H_0$ is true, is the $\chi^2$-distribution with $m$ degrees of freedom.

In the above example we may write

$$\lambda = \left(1 + \tfrac{1}{2}t^2/N\right)^{-N}\left(1 + \tfrac{1}{2}t^2/N\right)^{-\frac{1}{2}},$$

where $N = \tfrac{1}{2}(n-1)$. Hence as $n \to \infty$, $N \to \infty$, and

$$-2\log\lambda \to t^2.$$

Since the asymptotic distribution of $t$ is $N(0,1)$, the asymptotic distribution of $t^2$ is the $\chi^2$-distribution with one degree of freedom, and this accords with Theorem (A).
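The convergence $-2\log\lambda \to t^2$ in this example can be checked directly from the closed form of the likelihood ratio; a brief sketch:

```python
import math

def minus_two_log_lambda(t, n):
    """-2 log of the likelihood ratio [1 + t^2/(n-1)]^(-n/2) for the Student hypothesis."""
    return n * math.log(1.0 + t * t / (n - 1))

t = 2.0
for n in (10, 100, 10000):
    print(n, round(minus_two_log_lambda(t, n), 4))
# the values approach t^2 = 4, consistent with the chi-square(1) limit of Theorem (A)
```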
7.3 The Neyman-Pearson Theory of Testing Hypotheses

In the notation introduced at the beginning of Chapter VII, consider the hypothesis

$$H_0\colon \Theta \text{ in } \omega.$$

In many problems (for instance, all the examples we have considered in 7.1) several tests, or a whole family of tests, are available, and the question arises: which is the "best" test? For the comparison of tests, Neyman and Pearson have introduced the concept of the power of a test. We approach this concept through the following steps:

First, we note that any test consists of the choice of a (B-meas.) region $w$ in the sample space and the rule that we reject $H_0$ if and only if the sample point falls in $w$. $w$ is called the critical region of the test. The power of the test is defined to be the probability that we reject $H_0$. This is a function of the critical region $w$ (a set function of $w$) and of the parameter point $\Theta$ (a point function of $\Theta$). We write it $P(w|\Theta)$ and note it is

$$P(w|\Theta) = \Pr(O_n \text{ in } w \mid \Theta).$$

The interpretation of the power function is based on the following observation: In using a test of $H_0$, two types of error are possible (exhaustive and mutually exclusive): (I) We may reject $H_0$ when it is true. (II) We may accept $H_0$ when it is false, i.e., when $\Theta$ is a point not in $\omega$. We call these respectively Type I and Type II errors. Now a Type I error can only occur if the true $\Theta$ is in $\omega$. Hence the probability of making a Type I error if $\Theta$ is in $\omega$ is

$$\text{(a)}\qquad \Pr(O_n \text{ in } w \mid \Theta \text{ in } \omega) = P(w|\Theta) \quad \text{for } \Theta \text{ in } \omega.$$

A Type II error can be committed only if $\Theta$ is not in $\omega$. The probability of making a Type II error if $\Theta$ is not in $\omega$ is

$$\Pr(O_n \text{ not in } w \mid \Theta \text{ not in } \omega) = 1 - \Pr(O_n \text{ in } w \mid \Theta \text{ not in } \omega) = 1 - P(w|\Theta).$$

The significance of the power of a test is now seen to be the following: for $\Theta$ in $\omega$, $P(w|\Theta)$ is the probability of committing a Type I error; for $\Theta$ not in $\omega$, $P(w|\Theta)$ is the probability of avoiding a Type II error. We illustrate this discussion with an example of a one-parameter case.

*The regularity conditions are the same as those for the multi-parameter analogue of Theorem (A), 6.24.
Suppose $O_n$ is from $N(a, 1)$, and that we wish to test the hypothesis

$$H_0\colon a = a_0.$$

Let $u_1$, $u_2$ be any two numbers, $-\infty \le u_1 < u_2 \le +\infty$, such that

$$\text{(b)}\qquad (2\pi)^{-\frac{1}{2}}\int_{u_1}^{u_2} e^{-u^2/2}\,du = 1 - \alpha.$$

Consider the test which consists of rejecting $H_0$ if

$$\text{(c)}\qquad \sqrt{n}(\bar{x}-a_0) < u_1 \quad \text{or} \quad \sqrt{n}(\bar{x}-a_0) > u_2.$$

The critical region $w$ of the test is the part of the sample space defined by (c), that is, the region outside a certain pair of parallel hyperplanes (if $u_1 = -\infty$, or $u_2 = +\infty$, $w$ is a half-space). Let us calculate the power of the test:

$$P(w|a) = \Pr[\sqrt{n}(\bar{x}-a_0) < u_1 \text{ or } \sqrt{n}(\bar{x}-a_0) > u_2 \mid a] = 1 - \Pr[u_1 \le \sqrt{n}(\bar{x}-a_0) \le u_2 \mid a].$$

Now if the true parameter value is $a$,

$$u = \sqrt{n}(\bar{x}-a)$$

*An elementary discussion of a simple case with several parameters may be found in a paper by H. Scheffé, "On the ratio of the variances of two normal populations", Annals of Mathematical Statistics, Vol. 13 (1942), No. 4.
has the distribution $N(0,1)$. Write

$$\sqrt{n}(\bar{x}-a_0) = u + \sqrt{n}(a-a_0).$$

Then

$$P(w|a) = 1 - \Pr[u_1 - \sqrt{n}(a-a_0) \le u \le u_2 - \sqrt{n}(a-a_0) \mid a],$$

$$\text{(d)}\qquad P(w|a) = 1 - (2\pi)^{-\frac{1}{2}}\int_{u_1 - \sqrt{n}(a-a_0)}^{u_2 - \sqrt{n}(a-a_0)} e^{-u^2/2}\,du.$$

Each choice of the pair of limits $u_1$, $u_2$ satisfying (b) gives a test of $H_0$. Let us now consider the class $C$ of tests thus determined, and try to find which is the "best" test of the class $C$.

We note first that for all tests of the class,

$$\Pr(\text{Type I error}) = P(w|a_0) = \alpha,$$

from (d) and (b). This is what we have previously called the significance level of the test. To compare the tests we might consider the graphs of $P(w|a)$ against $a$ for the various tests; the graph for a given test is called the power curve of the test. We have seen that for every test of the class $C$, the power curve passes through the point $(a_0, \alpha)$. To find the shape of the power curve, we might plot points from (d), but by elementary methods we reach the following conclusions: the slope of the power curve corresponding to $(u_1, u_2)$ is zero if and only if $a$ is equal to

$$\text{(e)}\qquad a_m = a_0 + (u_1 + u_2)/(2\sqrt{n}).$$

As $a \to +\infty$, $P(w|a) \to 1$, unless $u_2 = +\infty$, in which case $P \to 0$. As $a \to -\infty$, again $P \to 1$, unless $u_1 = -\infty$, in which case $P \to 0$. Also, $0 < P < 1$. Except for the cases $u_1 = -\infty$ or $u_2 = +\infty$, the power curve must then behave as follows: rising from a minimum at the value $a = a_m$ given by (e), it increases monotonically on either side, approaching the asymptote $P = 1$ as $a \to \pm\infty$. It may be shown from (d) and (e) that its behavior is symmetrical with respect to the line $a = a_m$. In the exceptional cases, for $u_1 = -\infty$, $P$ increases monotonically from 0 to 1 as $a$ increases from $-\infty$ to $+\infty$; for $u_2 = +\infty$, $P$ decreases monotonically from 1 to 0. Some power curves are sketched in the figure:
Figure 7

(i) is for the test with $u_1 = -\infty$, (ii) for $u_2 = +\infty$, (iii) has its minimum at $a_0$, (iv) to the left of $a_0$, (v) to the right. All the tests of class $C$ have power curves lying in the region between the curves (i) and (ii).
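Power curves of this kind are easy to evaluate with the normal c.d.f. The sketch below ($n = 25$, $\alpha = .05$, symmetric limits — all assumed for illustration) reproduces two facts about these curves: the power at $a_0$ equals the significance level $\alpha$, and the symmetric test's power curve has its minimum at $a_m = a_0$.

```python
from statistics import NormalDist

def power(a, a0, n, u1, u2):
    """P(w|a) from (d): probability of rejecting H0 when the true mean is a."""
    nd = NormalDist()
    shift = (n ** 0.5) * (a - a0)
    return 1.0 - (nd.cdf(u2 - shift) - nd.cdf(u1 - shift))

a0, n, alpha = 0.0, 25, 0.05
u2 = NormalDist().inv_cdf(1 - alpha / 2)  # symmetric (unbiased) choice u1 = -u2
u1 = -u2
print(round(power(a0, a0, n, u1, u2), 3))        # the significance level alpha
print(round(power(a0 + 0.5, a0, n, u1, u2), 3))  # power against the alternative a = 0.5
```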
Aa far aa the probability of avoiding errors of Type I Is concerned, the tests
are equivalent, aince the curves all pass through (a ,a). For a ^ a , we recall that the
ordinate on the curve is the probability of avoiding a Type II error. For two tests of
H Q , say T 1 and T g , with critical regions w 1 and w ? , we say that T 1 is more powerful than
T 2 for testing H Q : a - a Q against an alternative a a 1 ^ a Q if P(w 1 |a f ) > P(w 2 la ! ).
Thia means that if the true parameter value is a 1 , the probability of avoiding a Type II
error la greater in uaing T 1 than T 2 - Now for altervatives a > a , the power curve (i)
liea above all other power curves of tests of claaa C, that ia, the teat obtained by taking
u- - -oo la the most powerfi.il of the class C for all alternatives a > a . Hence this
would be the best test of the class to use in a situation where we do not mind accepting
H if the true a < a , but want the most sensitive teat of the class for rejecting H
when the true a > a . On the other hand, we see that this teat is the worst of the lot,
that la, the leaat powerful, for teating H Q against alternatives a < a Q . For these alter-
natives the teat with power curve (ii), obtained by taking u ? = +00, is the most powerful.
There la thua no teat which la uniformly most powerful of the class C for all alternatives
-oo < a < 4-00 1
The situation described in the last sentence is the common one. To deal with it
Neyman and Pearson defined an unbiaaed test aa one for which P(w|a) is minimum for a = a .
The argument against biased tests In a situation where we are interested in teatJng a
hypotheaia against all poaalble alternatives is that for a biased tost Pr( accepting H )
la greater if a has certain values ^ a Q than if a - a .
If we set a_m = a₀ in (e) we find
that the unbiased test of the class C is that for which u₁ = -u₂. This is the test of the
class C we should prefer, barring the "one-sided" situations where the tests with power
curves (i) and (ii) are appropriate.
This serves to illustrate the comparison of tests by use of their power func-
tions. Beyond this description of the underlying ideas of the Neyman-Pearson theory, it
is not feasible to go into it further except for a few remarks: If one considers, instead
of the class C, the more inclusive class of all tests with critical regions w for which
P(w|a₀) = α, there is again no uniformly most powerful test. However, the unbiased test
obtained above is actually the uniformly most powerful unbiased test of this broader
class.

Leaving the one-parameter case now, we recall that the definition of the power
of a test and its meaning in terms of the probability of committing Type I and Type II
errors was given for the multiparameter case at the beginning of this section. Methods
of finding optimum critical regions in the light of these concepts have been given by
Neyman, Pearson, Wald and others, but there is still much work to be done. The problems
of defining and finding "best" confidence intervals are related to those of "best" tests;
the groundwork for such a theory has been laid by Neyman*. In conclusion, we recall the
assumption made at the beginning of Chapter VII: that the functional form of the distri-
bution is known for every possible parameter point. It is clear that in the application
of the theory the calculations for the gain in efficiency by using a "best" test in pref-
erence to some other test will be invalidated if those calculations have been made for a
distribution other than the true distribution. The whole theory introduced above pre-
sumes knowledge of the functional form of the distribution.

*J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability", Phil. Trans. Roy. Soc. London, Series A, Vol. 236 (1937), pp. 333-380.
CHAPTER VIII
NORMAL REGRESSION THEORY
In §2.9 certain ideas and definitions in regression theory were set forth and
discussed. In the present chapter we shall consider sampling problems and tests of sta-
tistical hypotheses which arise in an important special type of regression theory which
we shall refer to as normal regression theory. To be more specific, we shall assume that
y is a random variable distributed according to N(Σ_{p=1}^k a_p x_p, σ²), where x₁,...,x_k are fixed
variates, and consider samples of size n from such a distribution. N(Σ a_p x_p, σ²) is a
conditional probability law of the form f(y|x₁,x₂,...). A sample of size n will con-
sist of n sets of values (y_α; x_1α, x_2α,..., x_kα), α = 1,2,...,n, where y₁,...,y_n are n ran-
dom variables, but where the x_pα, p = 1,2,...,k, α = 1,...,n, are fixed variates and not
random variables. We shall consider such problems as estimating (by confidence intervals
and point estimation according to principles set forth in §6.1 and §6.2) values of the
a's and σ² from the sample, and of testing certain statistical hypotheses regarding the
a's. We shall also consider applications of normal regression theory to certain problems
in analysis of variance, including row-column and Latin square lay-outs.
8.1 Case of One Fixed Variate
In order to fix our ideas in the regression problem, we shall first consider in
detail the case in which y is distributed according to N(a+bx, σ²). Let O_n: (y_α|x_α),
α = 1,2,...,n > 2, be a sample of size n from a population having this distribution. The
probability element for the sample is

(a)    dF(y₁,...,y_n) = [(1/(√(2π)σ))ⁿ e^{-(1/(2σ²)) Σ_α (y_α - a - b x_α)²}] dy₁···dy_n.
Maximizing the likelihood function (that enclosed in [ ]) with respect to σ², a, b, we
find in accordance with §6.2 that â and b̂ are given by solving

(b)    Σ_α y_α - ân - b̂ Σ_α x_α = 0,
       Σ_α x_α y_α - â Σ_α x_α - b̂ Σ_α x_α² = 0,

and σ̂² is given by

(c)    σ̂² = (1/n) Σ_α (y_α - â - b̂ x_α)².

Solving (b) we obtain

(d)    â = ȳ - b̂x̄,    b̂ = Σ_α (x_α - x̄)(y_α - ȳ) / Σ_α (x_α - x̄)².

In order to be able to solve (b), we must have Σ_α (x_α - x̄)² ≠ 0. Now â and b̂ are linear func-
tions of y₁,...,y_n, and it follows from Theorem (C) of §5.23 that â and b̂ are jointly dis-
tributed according to a normal bivariate law with

       E(b̂) = b,    E(â) = a,

       σ²_b̂ = σ² / Σ_α (x_α - x̄)²,    σ²_â = σ² [1/n + x̄² / Σ_α (x_α - x̄)²],

       cov(â, b̂) = -σ² x̄ / Σ_α (x_α - x̄)².

The sum of squares in the exponent of (a) may be written as

       (1/σ²) Σ_α (y_α - a - b x_α)² = q₁ + q₂,

where

       q₁ = nσ̂²/σ²,
       q₂ = (1/σ²) [n(â - a)² + 2(â - a)(b̂ - b) Σ_α x_α + (b̂ - b)² Σ_α x_α²].
It is evident from (b) that â - a, b̂ - b are homogeneous linear functions of the (y_α - a - bx_α),
(α = 1,2,...,n). Also y_α - â - b̂x_α = y_α - a - bx_α - (â - a) - (b̂ - b)x_α, which is a homogeneous
linear function of the (y_α - a - bx_α), (α = 1,2,...,n). We know that (1/σ²) Σ_α (y_α - a - bx_α)² is distri-
buted according to the χ²-law with n degrees of freedom, and that q₂ (which is the expo-
nent in the joint normal distribution of â - a and b̂ - b) is distributed according to a
χ²-law with 2 degrees of freedom. Therefore, it follows from Cochran's theorem, §5.24,
that q₁ is distributed according to the χ²-law with n - 2 degrees of freedom.
We may summarize in the following

Theorem (A): Let O_n: (y_α|x_α), α = 1,2,...,n, where the x_α are not all equal,
be a sample of size n from a population with the distribution N(a+bx, σ²). Then:

(1) The maximum likelihood estimates â, b̂ and σ̂² of a, b, σ², respectively, are given by
(b) and (c).

(2) â - a and b̂ - b are jointly normally distributed with zero means and variance-covar-
iance matrix given by

       (σ² / Σ_α (x_α - x̄)²) · | Σ_α x_α²/n   -x̄ |
                               |    -x̄        1  |

(3) nσ̂²/σ² is distributed according to the χ²-law with n - 2 degrees of free-
dom, and σ̂² is distributed independently of â and b̂.

One may readily set up confidence limits for a or b on the basis of the "Student"
t-distribution. For example, from §5.3 it follows that

       t = (b̂ - b) √(Σ_α (x_α - x̄)²) / √(nσ̂²/(n - 2))

is distributed according to s_{n-2}(t), from which confidence limits can be set up for b, or
the statistical hypothesis can be tested that b has some specified value b₀ (e.g., 0,
which corresponds to the hypothesis that y is independent of x). A similar treatment holds
for a.
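The estimates (d), (c) and the Student ratio for b are direct to compute. The following is our own minimal Python sketch of §8.1 (the function names are ours, not the text's):

```python
import math

def fit_normal_regression(x, y):
    # maximum likelihood estimates (d) and (c) for the model N(a + b*x, sigma^2)
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = ybar - b_hat * xbar
    sigma2_hat = sum((yi - a_hat - b_hat * xi) ** 2
                     for xi, yi in zip(x, y)) / n
    return a_hat, b_hat, sigma2_hat

def student_t_for_b(x, y, b0=0.0):
    # the "Student" ratio for testing b = b0, with n - 2 degrees of freedom
    n = len(x)
    a_hat, b_hat, sigma2_hat = fit_normal_regression(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return (b_hat - b0) * math.sqrt(sxx) / math.sqrt(n * sigma2_hat / (n - 2))
```

For example, the four points x = (0,1,2,3), y = (1,3,4,7) give â = 0.9, b̂ = 1.9 and σ̂² = 0.175.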
8.2 The Case of k Fixed Variates

Suppose y is distributed according to N(Σ_{p=1}^k a_p x_p, σ²). Let O_n: (y_α|x_1α, x_2α,...,
x_kα), α = 1,2,...,n > k, be a sample of size n from this distribution. The probability ele-
ment for the sample is

(a)    dF(y₁,...,y_n) = [(1/(√(2π)σ))ⁿ e^{-(1/(2σ²)) Σ_α (y_α - Σ_p a_p x_pα)²}] dy₁···dy_n.

There is no loss of generality in considering the mean of y_α as a homogeneous linear func-
tion of x_1α,..., x_kα, for by choosing one of the x's, say x_1α = 1 for all α, we can
reduce our results so as to cover the case in which the mean value of y is not homogeneous
in the fixed variates, i.e., of the form (a₁ + a₂x₂ + ... + a_k x_k). The results for the homogeneous
case are simpler than for the non-homogeneous case from the point of view of notation be-
cause of greater symmetry.
The maximum likelihood estimates of a₁,...,a_k and σ², found by maximizing the
quantity in [ ] in (a), are given by the following equations:

(b)    Σ_α x_pα (y_α - Σ_q â_q x_qα) = 0,    (p = 1,2,...,k),

(c)    σ̂² = (1/n) Σ_α (y_α - Σ_p â_p x_pα)².

Let a_pq = Σ_α x_pα x_qα, a_0p = a_p0 = Σ_α x_pα y_α, a_00 = Σ_α y_α², (p, q = 1,2,...,k). We may write the
equations (b) as

(d)    Σ_p â_p a_pq = a_0q,    (q = 1,2,...,k).

If the determinant |a_pq| ≠ 0, then it follows from §2.94 that the solution of (d) is

(e)    â_p = Σ_q a^pq a_0q,    (p = 1,2,...,k),

where ||a^pq|| = ||a_pq||⁻¹. It should be noted that ||a_pq|| is positive definite, and hence
|a_pq| ≠ 0, if the x_pα are linearly independent (i.e., if there exists no set of real numbers
C_p (p = 1,2,...,k), not all zero, for which Σ_p x_pα C_p = 0 for all α). For consider the
quadratic form

       Σ_{p,q} a_pq C_p C_q = Σ_α (Σ_p x_pα C_p)².

If the x_pα are linearly independent, clearly Σ_α (Σ_p x_pα C_p)², and hence Σ_{p,q} a_pq C_p C_q,
cannot vanish. Now the a_0p, and hence the â_p, are linear functions of the random variables
y₁, y₂,..., y_n. Therefore, the â_p are distributed according to a normal k-variate
distribution. The variance of â_p is σ² a^pp; similarly, the covariance of â_p and â_q is σ² a^pq.
It will be noted that (b) can be written as

       Σ_q a_pq (â_q - a_q) = Σ_α x_pα (y_α - Σ_q a_q x_qα),    (p = 1,2,...,k),

which shows that the (â_p - a_p), (p = 1,2,...,k), are homogeneous linear functions of the
y_α - Σ_p a_p x_pα, (α = 1,2,...,n). The y_α - Σ_p â_p x_pα, (α = 1,2,...,n), are also homogeneous linear
functions of the y_α - Σ_p a_p x_pα. Now

       (1/σ²) Σ_α (y_α - Σ_p a_p x_pα)² = q₁ + q₂,

where

(f)    q₁ = nσ̂²/σ²,    q₂ = (1/σ²) Σ_{p,q} a_pq (â_p - a_p)(â_q - a_q).

Hence, q₁ and q₂ are homogeneous quadratic forms in the (y_α - Σ_p a_p x_pα). Since the â_p - a_p are
distributed according to a k-variate normal law with variance-covariance matrix
σ²||a_pq||⁻¹, it follows from §5.22 that q₂ is distributed according to the χ²-law with
k degrees of freedom. Similarly, we know that

       (1/σ²) Σ_α (y_α - Σ_p a_p x_pα)²

is distributed according to the χ²-law with n degrees of freedom. Therefore, by
Cochran's Theorem, §5.24, q₁ is distributed according to the χ²-law with n - k degrees of
freedom and independently of q₂ (i.e., of the â_p - a_p).
Consider the sum of squares in (c); we may write

       nσ̂² = Σ_α (y_α - Σ_p â_p x_pα)² = a_00 - Σ_p â_p a_0p.

But it follows from §2.94 that this expression reduces to the ratio of two determinants,

(g)    nσ̂² = | a_00  a_01 ... a_0k |   /   | a_11 ... a_1k |
              | a_10  a_11 ... a_1k |       |  ·         ·  |
              |  ·      ·        ·  |       | a_k1 ... a_kk |
              | a_k0  a_k1 ... a_kk |
We may summarize in

Theorem (A): Let O_n: (y_α|x_1α, x_2α,..., x_kα), (α = 1,2,...,n), be a sample of
size n from a population with distribution N(Σ_p a_p x_p, σ²), where the x_pα,
(α = 1,2,...,n), are linearly independent. Then:

(1) The maximum likelihood estimates of the a_p and σ² are given by (e) and (c).

(2) The quantities (â_p - a_p), (p = 1,2,...,k), are distributed according to a k-variate
normal law with zero means and variance-covariance matrix σ²||a_pq||⁻¹.

(3) The quantity (1/σ²) Σ_α (y_α - Σ_p â_p x_pα)², where nσ̂² = Σ_α (y_α - Σ_p â_p x_pα)² may be expressed
as in (g) as the ratio of two determinants, is distributed according to a χ²-law with n - k
degrees of freedom, and independently of the (â_p - a_p).

Making use of the results as stated in Theorem (A), one may set up confidence
limits (or a significance test) for any a_p by setting up the appropriate Student ratio.
Or one may set up confidence limits for σ² by using q₁. Confidence regions may be set up
for all of the a_p, or any sub-set of them, by setting up a Snedecor F ratio, in which the
numerator sum of squares is the exponent in the normal distribution of the corresponding
set of â_p and the denominator sum of squares is q₁.
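The normal equations (d), their solution (e), and the determinant-ratio form (g) of nσ̂² can be checked numerically. The sketch below is our own Python illustration (names are ours); the determinant identity is the Schur-complement fact underlying (g).

```python
import numpy as np

def normal_equation_fit(X, y):
    # X is n-by-k with rows (x_1a, ..., x_ka); solves equations (d) of section 8.2
    A = X.T @ X                      # the matrix ||a_pq||
    a0 = X.T @ y                     # the vector (a_01, ..., a_0k)
    a_hat = np.linalg.solve(A, a0)   # maximum likelihood estimates (e)
    n = len(y)
    sigma2_hat = np.sum((y - X @ a_hat) ** 2) / n
    return a_hat, sigma2_hat

def nsigma2_by_determinants(X, y):
    # the ratio of determinants (g): n * sigma2_hat = |bordered matrix| / |a_pq|
    A = X.T @ X
    a0 = X.T @ y
    a00 = float(y @ y)
    top = np.block([[np.array([[a00]]), a0[None, :]],
                    [a0[:, None], A]])
    return np.linalg.det(top) / np.linalg.det(A)
```

For the data of §8.1 written as a two-column design (a column of ones and the x's), both routes give nσ̂² = 0.7.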
An Alternative Proof of the Independence of q₁ and q₂.

The proof which has been given for establishing the independence of the two
above expressions in the probability sense depends upon Cochran's Theorem. The indepen-
dence can also be established by the use of moment generating functions. Let φ(θ₁, θ₂)
be the moment generating function defined as

       φ(θ₁, θ₂) = E(e^{θ₁q₁ + θ₂q₂}),

where q₁ and q₂ are defined in (f). If we can show that

       φ(θ₁, θ₂) = (1 - 2θ₁)^{-(n-k)/2} (1 - 2θ₂)^{-k/2},

then it follows by Theorem (B) in §2.81 that q₁ and q₂ are independently
distributed according to χ²-laws with n - k and k degrees of freedom respectively.

Making the transformation z_α = (1/σ)(y_α - Σ_p a_p x_pα), α = 1,2,...,n, we may write

       q₁ + q₂ = Σ_α z_α²,    q₂ = Σ_{α,β} A_αβ z_α z_β,

where

       A_αβ = Σ_{p,q} a^pq x_pα x_qβ.

The probability element associated with the sample is given by (a); in terms of the z_α it
becomes (1/√(2π))ⁿ e^{-(1/2) Σ_α z_α²} dz₁···dz_n. For the m.g.f. we have

       φ(θ₁, θ₂) = (1/√(2π))ⁿ ∫_{-∞}^{+∞}···∫_{-∞}^{+∞} e^{-(1/2) Σ_{α,β} B_αβ z_α z_β} dz₁···dz_n,
where

       B_αβ = M δ_αβ + N A_αβ,    M = 1 - 2θ₁,    N = 2(θ₁ - θ₂).

The value of the n-tuple integral is 1/√B̄, where B̄ is the determinant |B_αβ|. To evaluate
B̄, let us augment it as follows:

       B̄ = | M+NA_11   NA_12  ...  NA_1n    x_11 ... x_k1 |
            | NA_21   M+NA_22  ...  NA_2n    x_12 ... x_k2 |
            |   ·         ·            ·       ·        ·  |
            | NA_n1     NA_n2  ... M+NA_nn   x_1n ... x_kn |
            |   0         0    ...    0       1   ...   0  |
            |   ·         ·            ·       ·        ·  |
            |   0         0    ...    0       0   ...   1  |

the lower right-hand block being the k-rowed identity matrix, so that the augmented
determinant equals B̄. Suppose the (n+p)-th column is multiplied by

       C_pα = -N Σ_q a^pq x_qα,    (p = 1,2,...,k),

and added to the α-th column, (α = 1,2,...,n). Since A_βα = Σ_p x_pβ (Σ_q a^pq x_qα), the
upper left-hand block reduces to M times the identity, and we obtain

       B̄ = | M   0  ...  0      x_11 ... x_k1 |
            | 0   M  ...  0      x_12 ... x_k2 |
            | ·            ·       ·        ·  |
            | 0   0  ...  M      x_1n ... x_kn |
            | C_11  ...  C_1n      1  ...   0  |
            | ·                    ·        ·  |
            | C_k1  ...  C_kn      0  ...   1  |

Now suppose the α-th column (α = 1,2,...,n) is multiplied by -x_pα/M and added to the
(n+p)-th column (p = 1,2,...,k). Noting that Σ_α x_pα x_qα = a_pq and Σ_q a^pq a_qp' = δ_pp',
we find that B̄ reduces to

       B̄ = | M   0  ...  0        0      ...     0     |
            | 0   M  ...  0        0      ...     0     |
            | ·            ·       ·              ·     |
            | 0   0  ...  M        0      ...     0     |
            | C_11 ... C_1n     1+N/M     ...     0     |
            | ·                    ·              ·     |
            | C_k1 ... C_kn        0      ...   1+N/M   |

that is, B̄ = Mⁿ(1 + N/M)ᵏ = M^{n-k}(M + N)ᵏ. Therefore, since M + N = 1 - 2θ₂, we have

       φ(θ₁, θ₂) = B̄^{-1/2} = (1 - 2θ₁)^{-(n-k)/2} (1 - 2θ₂)^{-k/2},

which concludes the argument that q₁ and q₂ are independently distributed according to
χ²-laws with n - k and k degrees of freedom respectively.
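The determinant identity B̄ = M^{n-k}(M+N)ᵏ established by the column operations above can be verified numerically: with A_αβ = Σ a^pq x_pα x_qβ, the n-by-n matrix A is X(X'X)⁻¹X'. A hedged sketch of the check, in our own Python:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 7, 3
X = rng.standard_normal((n, k))           # fixed variates, linearly independent
A = X @ np.linalg.inv(X.T @ X) @ X.T      # A_ab = sum_pq a^{pq} x_pa x_qb

def det_B(M, N):
    # determinant of B_ab = M*delta_ab + N*A_ab
    return float(np.linalg.det(M * np.eye(n) + N * A))
```

For any M and N the computed determinant agrees with M^(n-k)·(M+N)^k, which is what makes the m.g.f. factor as claimed.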
Remarks on the Generality of the Linear Regression Function. The regression
function Σ_{p=1}^k a_p x_p is much more general than it might appear at first. For example, if
x_p = t^{p-1}, the regression function would be the polynomial Σ_p a_p t^{p-1}, in which case we
would have a random variable y having as its mean value the function of t given by
Σ_p a_p t^{p-1}. The estimates â_p would, of course, have the same form as those given by (e),
except that x_pα would be replaced by t_α^{p-1} in calculating the a_pq and a_0p.

Again, we might have, for k = 2m+1, x₁ = 1, x₂ = sin t, x₃ = cos t, x₄ = sin 2t,
..., x_{2m+1} = cos mt, in which case the mean value of y is a harmonic function of the form
a₁ + a₂ sin t + a₃ cos t + ... + a_{2m+1} cos mt. The procedure for obtaining â₁, â₂,...,
â_{2m+1} is as before given by (e).

Another example: Suppose k = 2 and x_1α = 1, x_2α = 0 for α = 1,2,...,n₁, and
x_1α = 0, x_2α = 1 for α = n₁+1,...,n₁+n₂. The sample O_n: (y_α|x_1α, x_2α), α = 1,2,...,n₁+n₂,
drawn from N(a₁x₁ + a₂x₂, σ²) is equivalent to two independent samples O_{n₁}: (y₁,...,y_{n₁})
and O_{n₂}: (y_{n₁+1},...,y_{n₁+n₂}), of sizes n₁ and n₂ respectively, drawn from N(a₁, σ²)
and N(a₂, σ²) respectively. This example extends readily to the case of several independent samples.

Curvilinear regression is also a special case. For example, for quadratic re-
gression in two variables, say u and v, we would let x₁ = 1, x₂ = u, x₃ = v, x₄ = u²,
x₅ = v², x₆ = uv.
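The polynomial special case can be sketched numerically: the fixed variates x_p = t^(p-1) form a Vandermonde design, and the normal equations (d) are solved exactly as in §8.2. A minimal illustration in our own Python (function name ours):

```python
import numpy as np

def fit_polynomial(t, y, degree):
    # regression on x_p = t^(p-1), p = 1, ..., degree+1, via the normal equations
    X = np.vander(np.asarray(t, float), degree + 1, increasing=True)
    a_hat = np.linalg.solve(X.T @ X, X.T @ np.asarray(y, float))
    return a_hat
```

When the y's lie exactly on a polynomial, the estimates recover its coefficients; otherwise they give the least-squares (maximum likelihood) fit.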
8.3 A General Normal Regression Significance Test

The following general significance test frequently arises in normal regression
theory: A sample O_n: (y_α|x_1α, x_2α,..., x_kα), α = 1,2,...,n, is assumed to be drawn from a
population with distribution N(Σ_{p=1}^k a_p x_p, σ²), and it is desired to test the hypothesis
that a_{r+1}, a_{r+2},..., a_k (r < k) have specified values, say a_{r+1,0}, a_{r+2,0},..., a_{k,0}, respec-
tively, no matter what values a₁, a₂,..., a_r and σ² may have. For example, all specified
values may be zero, in which case the problem is to test the hypothesis that y, which is
assumed to be distributed according to N(Σ a_p x_p, σ²), is actually independent of x_{r+1},..., x_k.

In order to determine the test function (of the y_α and x_pα) for testing the
hypothesis we shall make use of the method of likelihood ratios discussed in §7.2.
The probability element of the sample is

(a)    dF(y₁,...,y_n) = P(O_n; a₁,...,a_k, σ²) dy₁···dy_n,

where

(b)    P(O_n; a₁,...,a_k, σ²) = (1/(√(2π)σ))ⁿ e^{-(1/(2σ²)) Σ_α (y_α - Σ_p a_p x_pα)²}

is the likelihood function.

Let Ω be the (k+1)-dimensional parameter space for which σ² > 0, -∞ < a_p < +∞,
p = 1,2,...,k, and let ω be the (r+1)-dimensional subspace of Ω for which a_{r+1} = a_{r+1,0},
a_{r+2} = a_{r+2,0},..., a_k = a_{k,0}. If H₀ denotes the hypothesis to be tested, then H₀ is the
hypothesis that the true parameter point lies in ω, where the admissible points are those
in Ω.

The likelihood ratio λ for testing H₀ is given by

(c)    λ = max_ω P(O_n; a₁,...,a_k, σ²) / max_Ω P(O_n; a₁,...,a_k, σ²),

where the denominator is the maximum of P(O_n; a₁,...,a_k, σ²) for variations of the parameters
over Ω and the numerator is the maximum for variations of the parameters over ω. To find
the maximum of the likelihood function (b) over Ω, we follow the ordinary procedure of
taking the first derivative of the likelihood function with respect to each parameter,
setting the derivatives equal to zero. We find that the maximizing values are
(d)    σ̂_Ω² = (1/n) Σ_α (y_α - Σ_p â_p x_pα)²,    â_p = Σ_q a^pq a_0q,

p = 1,2,...,k, as given in §8.2. Substituting these in the likelihood function we find

       P(O_n; â₁, â₂,..., â_k, σ̂_Ω²) = (1/(2πσ̂_Ω²))^{n/2} e^{-n/2}.

Similarly, by maximizing the likelihood over ω, we set a_g = a_{g,0}, (g = r+1,...,k),
and differentiate with respect to σ², a₁,...,a_r, obtaining

(e)    σ̂_ω² = (1/n) Σ_α (ȳ_α - Σ_{u=1}^r â_u' x_uα)²,

where ȳ_α = y_α - Σ_{g=r+1}^k a_{g,0} x_gα and the â_u' satisfy Σ_{v=1}^r a_uv â_v' = Σ_α x_uα ȳ_α,
(u = 1,2,...,r). Substituting in the likelihood function, we find

       P(O_n; â₁',..., â_r', a_{r+1,0},..., a_{k,0}, σ̂_ω²) = (1/(2πσ̂_ω²))^{n/2} e^{-n/2}.

Therefore

(f)    λ = (σ̂_Ω² / σ̂_ω²)^{n/2}.

Now it is clear that σ̂_ω² ≥ σ̂_Ω², since nσ̂_Ω² is the minimum of Σ_α (y_α - Σ_p a_p x_pα)² for
variations of a₁,...,a_k, while nσ̂_ω² is the minimum for variations of a₁,...,a_r, for the fixed
values a_{r+1,0},..., a_{k,0} of a_{r+1},...,a_k. Now let

       q₁ = nσ̂_Ω²/σ²,    q₂ = n(σ̂_ω² - σ̂_Ω²)/σ².

The difference n(σ̂_ω² - σ̂_Ω²) is simply the further reduction in the sum of squares
Σ_α (y_α - Σ_p a_p x_pα)² obtainable by varying a_{r+1},...,a_k in addition to a₁, a₂,...,a_r.
Expressed in terms of q₁ and q₂,

(g)    λ = (1 + q₂/q₁)^{-n/2}.

Thus λ is a single-valued function of q₂/q₁, which means that q₂/q₁ is equivalent to λ as
a test function. The nearer the value of λ to unity, the smaller the value of q₂ as
compared with q₁. To complete the problem of setting up a test for testing H₀, we must
now obtain the distribution of λ (or q₂/q₁) under the assumption that the hypothesis
H₀ is true, i.e., that O_n has been drawn from N(Σ a_p x_p, σ²) for a_{r+1} = a_{r+1,0},...,
a_k = a_{k,0}.

We shall now show that if H₀ is true then q₁ and q₂ are distributed indepen-
dently according to χ²-laws with n - k and k - r degrees of freedom respectively.

The probability element for the sample O_n from the population having distribu-
tion N(Σ a_p x_p, σ²) is given by (a). Now, as we have seen in §8.2, the sum of squares in
the exponent can be written as

(h)    Σ_α (y_α - Σ_p a_p x_pα)² = nσ̂_Ω² + Σ_{p,q} a_pq (â_p - a_p)(â_q - a_q).

The second expression in (h) may be written as

(i)    Σ_{u,v=1}^r a_uv L_u L_v + Σ_{g,h=r+1}^k b_gh (â_g - a_g)(â_h - a_h),

where the L_u, (u = 1,2,...,r), are linear functions of the (â_p - a_p) and where
||b_gh|| = ||a^gh||⁻¹, g, h = r+1,...,k, a^gh being the element in the g-th row and h-th
column of ||a_pq||⁻¹, p, q = 1,2,...,k. See §2.9.
To verify the statement that expression (i) is equal to the second expression in
(h), let us denote â_p - a_p by d_p and let L_u = d_u + Σ_{g=r+1}^k l_ug d_g. We must then determine
the l_ug and the b_gh so that

(j)    Σ_{p,q=1}^k a_pq d_p d_q = Σ_{u,v=1}^r a_uv L_u L_v + Σ_{g,h=r+1}^k b_gh d_g d_h,

that is, an identity in the d's.

Taking ∂²/∂d_v ∂d_g of both sides of this identity we get

(k)    Σ_{u=1}^r a_vu l_ug = a_vg,    (v = 1,2,...,r; g = r+1,...,k),

and hence

(l)    l_ug = Σ_{v=1}^r ā^uv a_vg,

where ||ā^uv|| = ||a_uv||⁻¹, (u, v = 1,2,...,r). Taking ∂²/∂d_g ∂d_h of both sides of (j) we get

(m)    a_gh = Σ_{u,v=1}^r a_uv l_ug l_vh + b_gh,    (g, h = r+1,...,k).

Using (k) and (l) we find that (m) reduces to

(n)    b_gh = a_gh - Σ_{u,v=1}^r a_gu ā^uv a_vh.

Referring to §2.94 it will be seen that

(p)    b_gh = | a_11 ... a_1r  a_1h |   /   | a_11 ... a_1r |
              |  ·        ·      ·  |       |  ·         ·  |
              | a_r1 ... a_rr  a_rh |       | a_r1 ... a_rr |
              | a_g1 ... a_gr  a_gh |

which is equal to the term in the g-th row and h-th column in the inverse of ||a^gh||,
(g, h = r+1,...,k).
Making use of the fact that the sum of the squares in the exponent of the like-
lihood function in (a) is

(q)    Σ_α (y_α - Σ_p a_p x_pα)² = nσ̂_Ω² + expression (i),

it is now clear that by maximizing the likelihood function for variations of a₁,...,a_r,
σ² in ω, we find

(r)    nσ̂_ω² = nσ̂_Ω² + Σ_{g,h=r+1}^k b_gh (â_g - a_{g,0})(â_h - a_{h,0}),

since the first expression in (i) vanishes when a₁,...,a_r are varied so as to maximize the
likelihood function.

Remembering that when the likelihood function is maximized with respect to a₁,
...,a_k, σ² over Ω we obtain σ̂_Ω² as given in (d), we clearly have

(s)    n(σ̂_ω² - σ̂_Ω²) = Σ_{g,h=r+1}^k b_gh (â_g - a_{g,0})(â_h - a_{h,0}),

which is a function of the â_g, (g = r+1,...,k), which, as we have seen in §8.2, are distri-
buted independently of nσ̂_Ω²/σ² = q₁, q₁ being distributed according to the χ²-law with n - k
degrees of freedom. But n(σ̂_ω² - σ̂_Ω²)/σ² = q₂ is seen to be the exponent in the joint
distribution of the â_g when H₀ is true. Hence by §5.23, q₂ is distributed according to the
χ²-law with k - r degrees of freedom.

Since q₂ and q₁ are distributed independently according to χ²-laws with k - r
and n - k degrees of freedom when H₀ is true, it follows that the quantity

       F = q₂(n - k) / (q₁(k - r))

is a Snedecor ratio distributed according to the law h_{k-r, n-k}(F) (see §5.4) when H₀ is
true. Now

       λ = [1 + (k - r)F/(n - k)]^{-n/2},

which shows that λ is a single-valued function of F, and hence F is equivalent to λ for
testing the hypothesis H₀, the upper tail of the F-distribution being used for determining
critical values of F for various significance levels. It can be shown that this test is
unbiased (see §7.3), although we shall not demonstrate this fact here.*
We may summarize our results in the following fundamental theorem in normal
regression theory:

Theorem (A): Let O_n: (y_α|x_1α, x_2α,..., x_kα) be a sample from a population with
distribution N(Σ_{p=1}^k a_p x_p, σ²), where the x_pα are linearly independent. Let H₀ be the statistical
hypothesis that the true parameter point (a₁, a₂,..., a_k, σ²) belongs to ω: -∞ < a_u < +∞,
(u = 1,2,...,r), σ² > 0, a_{r+1} = a_{r+1,0},..., a_k = a_{k,0}, which is a subset of the admissible set
Ω: -∞ < a_p < +∞, p = 1,2,...,k, σ² > 0. Let σ̂_Ω² be the maximum likelihood estimate of σ²
for variations of the parameters over Ω, and σ̂_ω² the maximum likelihood estimate of σ² for
variations of the parameters over ω. Then:

(1) nσ̂_Ω² = Σ_α (y_α - Σ_p â_p x_pα)², where the â_p are given by (d).

(2) nσ̂_ω² = nσ̂_Ω² + Σ_{g,h=r+1}^k b_gh (â_g - a_{g,0})(â_h - a_{h,0}), where the â_p are given by (d), and ||b_gh|| is the
inverse of the matrix obtained by deleting the first r rows and columns of ||a^pq||,
p, q = 1,2,...,k.

(3) The quantities

       q₁ = nσ̂_Ω²/σ²,    q₂ = n(σ̂_ω² - σ̂_Ω²)/σ²

are independently distributed according to χ²-laws with n - k and k - r degrees of

*See J. F. Daly, "On the Unbiased Character of Likelihood Ratio Tests for Independence
in Normal Systems", Annals of Math. Stat., Vol. 11 (1940).
freedom, respectively, when H₀ is true.

(4) The likelihood criterion λ for testing H₀ is given by

       λ = [1 + (k - r)F/(n - k)]^{-n/2},

where F = q₂(n - k)/(q₁(k - r)) is Snedecor's ratio, which is distributed according to
h_{k-r, n-k}(F) when H₀ is true.
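The F ratio of this test is computed from two residual sums of squares: one from fitting all k coefficients, one from fitting only the first r with the remaining coefficients held at their specified values. A hedged sketch in our own Python (names ours; the specified values default to zero):

```python
import numpy as np

def regression_F(X, y, r, a_spec=None):
    # F ratio of the general normal regression test for
    # H0: a_{r+1} = ... = a_k = specified values (zero by default)
    n, k = X.shape
    if a_spec is None:
        a_spec = np.zeros(k - r)
    # minimum sum of squares over the full space Omega
    resid_full = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    s_big = resid_full @ resid_full                    # n * sigma_Omega^2
    # over omega: fix a_{r+1},...,a_k and refit the first r coefficients
    y_adj = y - X[:, r:] @ a_spec
    resid_red = y_adj - X[:, :r] @ np.linalg.lstsq(X[:, :r], y_adj, rcond=None)[0]
    s_small = resid_red @ resid_red                    # n * sigma_omega^2
    F = (s_small - s_big) * (n - k) / (s_big * (k - r))
    lam = (s_big / s_small) ** (n / 2)                 # likelihood ratio (f)
    return F, lam
```

The returned pair also lets one confirm numerically the identity λ = [1 + (k-r)F/(n-k)]^(-n/2) of the theorem.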
8.4 Remarks on the Generality of Theorem (A), 8.3

In order to emphasize the generality of Theorem (A), §8.3, we shall discuss
briefly several cases of particular interest.

8.41 Case 1. It frequently happens that the following statistical hypothesis
is to be tested on basis of a sample O_n: (y_α|x_1α, x_2α,..., x_kα) assumed to have been drawn
from a population with distribution N(Σ_p a_p x_p, σ²):

       Ω: -∞ < a_p < +∞, σ² > 0, p = 1,2,...,k;

       ω: region in Ω for which Σ_{p=1}^k c_gp a_p = 0, σ² > 0, g = r+1, r+2,...,k,

where the sets of constants c_gp are linearly independent. In other words, the hypothesis to be
tested here is that there are k - r linear restrictions among the a_p, given that the
sample is from a population with distribution N(Σ a_p x_p, σ²). Denoting this statistical
hypothesis by H', we may verify from Theorem (A) that the likelihood criterion for testing
H' is of the same form as λ for H₀, where σ²q₁ is the minimum of S = Σ_α (y_α - Σ_p a_p x_pα)² for
variations of a₁,...,a_k over Ω, and σ²q₂ is the difference between the minimum of S over ω
and the minimum of S over Ω. As in the case of H₀, q₁ and q₂ for H' are independently
distributed according to χ²-laws with n - k and k - r degrees of freedom respectively,
when H' is true. To verify that this is true, we transform the a_p as follows:

       a_u' = a_u,    u = 1,2,...,r;
       a_g' = Σ_{p=1}^k c_gp a_p,    g = r+1,...,k.

We may write this transformation as

       a_p' = Σ_{q=1}^k c_pq a_q,    p = 1,2,...,k,

where, for g = 1,2,...,r, c_gp = 1 if p = g and c_gp = 0 otherwise.
Without loss of generality we may assume the c_gp to be such that |c_pq| ≠ 0. Hence

       a_p = Σ_{q=1}^k c'_pq a_q',    where ||c'_pq|| = ||c_pq||⁻¹,

and Σ_p a_p x_pα = Σ_q a_q' x'_qα, where x'_qα = Σ_p c'_pq x_pα. Therefore H' may be expressed as
the statistical hypothesis:

       Ω: -∞ < a_p' < +∞, σ² > 0, p = 1,2,...,k;

       ω: region in Ω for which a_g' = 0, σ² > 0, g = r+1,...,k,

which is to be tested on basis of the sample O_n: (y_α|x'_1α, x'_2α,..., x'_kα), α = 1,2,...,n,
drawn from a population with distribution N(Σ_q a_q' x_q', σ²). Theorem (A) is immediately
applicable to H' as expressed in this form.
8.42 Case 2. The following statistical hypothesis, say H'', frequently arises,
to be tested on basis of a sample O_n from N(Σ_p a_p x_p, σ²):

       Ω: -∞ < a_p < +∞, σ² > 0, p = 1,2,...,k;

       ω: region in Ω for which a_p = Σ_{u=1}^r c_pu a_u', σ² > 0, p = 1,2,...,k.

In other words, the hypothesis H'' is that the a_p can each be expressed linearly in terms
of r (< k) parameters a₁',..., a_r', where the c_pu are given. By using the transformation
a_p = Σ_{q=1}^k c_pq a_q', where the c_pq, (q = r+1,...,k; p = 1,2,...,k), are further given numbers
such that |c_pq| ≠ 0, we can express H'' as follows:

       Ω: -∞ < a_p' < +∞, σ² > 0, p = 1,2,...,k;

       ω: a_g' = 0, σ² > 0, g = r+1,...,k,

to be tested on basis of the sample O_n: (y_α|x'_1α,..., x'_kα), α = 1,2,...,n, where
x'_qα = Σ_p c_pq x_pα, drawn from a population with distribution N(Σ_q a_q' x_q', σ²). This case
is clearly covered by Theorem (A).

In this case σ²q₁ is the minimum of S = Σ_α (y_α - Σ_p a_p x_pα)² for variations of a₁,
a₂,...,a_k over Ω, while σ²q₂ is the difference between the minimum of S for variations of
the a_p over Ω and that of S for variations of the a_p over ω (i.e., for unrestricted varia-
tions of the a_u', u = 1,2,...,r, when the a_p are replaced by Σ_u c_pu a_u' in S and the a_g' are set
equal to 0, g = r+1,...,k). When H'' is true,
q₁ and q₂ are independently distributed according to χ²-laws with n - k and k - r degrees
of freedom, respectively.
8.43 Case 3. The following variant of the hypothesis H₀ of §8.3 arises in
such problems as randomized blocks (see §9.2), Latin squares (see §9.4), etc., to be
tested on basis of a sample O_n: (y_α|x_1α,..., x_kα), α = 1,2,...,n. Denoting this hypothesis
by H''', it is specified as follows:

       Ω: -∞ < a_p < +∞, σ² > 0, p = 1,2,...,k,
          with the a_p restricted by the r₁ linearly inde-
          pendent conditions Σ_{p=1}^k d_λp a_p = 0, λ = 1,2,...,r₁;

       ω: the subspace in Ω for which Σ_{p=1}^k d_λp a_p = 0, λ = 1,2,...,r₂, where r₁ < r₂ < k.

H''' is the hypothesis that the a_p satisfy r₂ - r₁ further linear restrictions, assuming
that r₁ linear restrictions are fulfilled, linear independence being assumed throughout.
In this case σ²q₁ is the minimum of S = Σ_α (y_α - Σ_p a_p x_pα)² for variations of the a_p over Ω
(i.e., for variations of the a_p subject to the restrictions Σ_p d_λp a_p = 0, λ = 1,2,...,r₁),
while σ²q₂ is the difference between this minimum and that for variations of the a_p over ω
(i.e., for variations of the a_p subject to the restrictions Σ_p d_λp a_p = 0, λ = 1,2,...,r₂).
When H''' is true, q₁ and q₂ are independently distributed according to χ²-laws with
n - k + r₁ and r₂ - r₁ degrees of freedom respectively.

That this case is covered by Theorem (A) may be seen by considering the non-
singular transformation of the a_p, Σ_{p=1}^k d_qp a_p = a_q', q = 1,2,...,k, where the d_qp are given
numbers such that |d_qp| ≠ 0. We have a_p = Σ_q d'_pq a_q', which transforms Σ_p a_p x_pα into
Σ_q a_q' x'_qα, where x'_qα = Σ_p d'_pq x_pα. Now under Ω the regression function is
Σ_{q=r₁+1}^k a_q' x'_qα, and we therefore specify H''' as

       Ω: -∞ < a_q' < +∞, σ² > 0, q = r₁+1,...,k
          (a₁',..., a_{r₁}' being assumed 0 from the outset);

       ω: subspace in Ω for which a_q' = 0, q = r₁+1,...,r₂.

The applicability of Theorem (A) is now obvious.
8.5 The Minimum of a Sum of Squares of Deviations with Respect to Regression
Coefficients which are Subject to Linear Restrictions

It will be noted in §§8.3 and 8.4 that frequently we have to find the minimum
of

(a)    S = Σ_α (y_α - Σ_{p=1}^k a_p x_pα)²

for variations of the a_p, when the a_p are subject to one or more linear restrictions.
The object of this section is to give an explicit expression for the minimum of the sum of
squares, under such conditions, as a ratio of two determinants.

Let us consider the problem of finding the minimum of the sum of squares (a),
when the a_p are subject to the linear restrictions

(b)    Σ_{p=1}^k c_up a_p = 0,    (u = 1,2,...,r < k).

We shall use the method of Lagrange, §4.7, and write

(c)    F(a₁, a₂,...,a_k; λ₁, λ₂,...,λ_r) = S + 2 Σ_{u=1}^r λ_u (Σ_{p=1}^k c_up a_p).

It is necessary that

(d)    ∂F/∂a_q = 0,    (q = 1,2,...,k),

in order for S to have an extremum (in this case a minimum). Performing the differentia-
tion, these equations may be written as

(e)    Σ_{p=1}^k a_pq a_p + Σ_{u=1}^r c_uq λ_u - a_0q = 0,    (q = 1,2,...,k),

where the a_pq, a_0q (and a_00) are defined in (d) of §8.2. Multiplying each of (e) by a_q and
summing from q = 1 to k, we get, by virtue of the restrictions (b),

(f)    Σ_{p,q} a_pq a_p a_q = Σ_q a_0q a_q.

Expanding S, we obtain

(g)    S = a_00 - 2 Σ_q a_0q a_q + Σ_{p,q} a_pq a_p a_q,

and making use of (f),

(h)    S = a_00 - Σ_q a_0q a_q.

Rewriting (h) and (e) with a₀ = 1, and using the conditions (b), we obtain the following
homogeneous linear equations in the 1 + k + r quantities a₀, a₁, a₂,...,a_k, λ₁,...,λ_r:

(i)    (S - a_00)a₀ + Σ_q a_0q a_q = 0,
       -a_0q a₀ + Σ_p a_pq a_p + Σ_u c_uq λ_u = 0,    (q = 1,2,...,k),
       Σ_p c_up a_p = 0,    (u = 1,2,...,r).

In order for these equations to have a non-vanishing solution, the determinant
of the 1 + k + r equations must satisfy the well-known condition of being 0, i.e.,

       | S - a_00   a_01 ... a_0k   0    ...  0    |
       | -a_01      a_11 ... a_1k   c_11 ... c_r1  |
       |   ·          ·        ·      ·        ·   |
       | -a_0k      a_1k ... a_kk   c_1k ... c_rk  |   =  0.
       |   0        c_11 ... c_1k   0    ...  0    |
       |   ·          ·        ·                   |
       |   0        c_r1 ... c_rk   0    ...  0    |

Treating the first column as a sum of two columns as indicated and employing the
usual rule for expressing the determinant as the sum of two determinants, we find the min-
imum value of S to be given by

       S_min = A / A_00,

where

(ii)   A = | a_00   a_01 ... a_0k   0    ...  0    |
           | a_01   a_11 ... a_1k   c_11 ... c_r1  |
           |   ·      ·        ·      ·        ·   |
           | a_0k   a_1k ... a_kk   c_1k ... c_rk  |
           |   0    c_11 ... c_1k   0    ...  0    |
           |   ·      ·        ·                   |
           |   0    c_r1 ... c_rk   0    ...  0    |

and A_00 is the minor of a_00 in A.

It should be noted that the values of the a_p and λ_u which yield the extremum of
F (or the values of the a_p which yield the minimum value of S) are given by the last k + r
linear equations in (i) with a₀ = 1.
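Both routes to the constrained minimum, the Lagrange system (i) and the determinant ratio A/A₀₀, can be checked against one another numerically. The sketch below is our own Python (names ours); C holds the restriction coefficients c_up, one row per restriction.

```python
import numpy as np

def constrained_min_S(X, y, C):
    # minimum of S = sum_a (y_a - sum_p a_p x_pa)^2 subject to C a = 0,
    # via the Lagrange equations (e): [A C'; C 0](a; lambda) = (a0; 0)
    A = X.T @ X
    a0 = X.T @ y
    k, r = A.shape[0], C.shape[0]
    M = np.block([[A, C.T], [C, np.zeros((r, r))]])
    sol = np.linalg.solve(M, np.concatenate([a0, np.zeros(r)]))
    a = sol[:k]
    return float(y @ y - a0 @ a)       # (h): S_min = a_00 - sum_q a_0q a_q

def constrained_min_by_determinants(X, y, C):
    # the ratio of determinants S_min = A / A_00 of section 8.5
    A = X.T @ X
    a0 = X.T @ y
    k, r = A.shape[0], C.shape[0]
    big = np.block([
        [np.array([[float(y @ y)]]), a0[None, :], np.zeros((1, r))],
        [a0[:, None], A, C.T],
        [np.zeros((r, 1)), C, np.zeros((r, r))],
    ])
    return float(np.linalg.det(big) / np.linalg.det(big[1:, 1:]))
```

For instance, fitting y = (1,3,4,7) on an intercept and x = (0,1,2,3) under the single restriction a₁ + a₂ = 0 gives S_min = 161/6 by both computations.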
CHAPTER IX

APPLICATIONS OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS

In this chapter we shall consider some applications of normal regression theory,
together with the general significance test embodied in Theorem (A), §8.3, to certain
problems in the field of statistical analysis known as analysis of variance. This field
of analysis is due primarily to R. A. Fisher.

9.1 Testing for the Equality of Means of Normal Populations with the Same
Variance

Suppose (y_pα), α = 1,2,...,n_p, p = 1,2,...,k, are samples from N(a₁, σ²),
N(a₂, σ²),..., N(a_k, σ²) respectively, and that it is desired to test the statistical
hypothesis H(a₁ = a₂ = ... = a_k) specified as follows:

       Ω: -∞ < a_p < +∞, σ² > 0, p = 1,2,...,k;

       ω: a_p = a, -∞ < a < +∞, σ² > 0, p = 1,2,...,k.

In other words, H is the statistical hypothesis that all of the samples are drawn from
normal populations with identical means, given that the populations are normal and have
equal variances. The probability element for the k samples is

(a)    [(1/(√(2π)σ))ⁿ e^{-(1/(2σ²)) Σ_p Σ_α (y_pα - a_p)²}] ∏ dy_pα,    n = n₁ + n₂ + ... + n_k.

Maximizing the likelihood function (i.e., the expression in [ ]) for variations of the
parameters over Ω, we obtain

(b)    â_p = ȳ_p,    σ̂_Ω² = (1/n) Σ_p Σ_α (y_pα - ȳ_p)²,

where ȳ_p = (1/n_p) Σ_α y_pα, the mean of the y's in the p-th sample.
Maximizing the likelihood function for variations of the parameters over ω, we find

(c)    â = ȳ,    σ̂_ω² = (1/n) Σ_p Σ_α (y_pα - ȳ)²,

where ȳ = (1/n) Σ_p Σ_α y_pα. Now q₁ and q₂ of Theorem (A), §8.3, are as follows:

(d)    q₁ = nσ̂_Ω²/σ² = (1/σ²) Σ_p Σ_α (y_pα - ȳ_p)²,
       q₂ = n(σ̂_ω² - σ̂_Ω²)/σ² = (1/σ²) Σ_p n_p (ȳ_p - ȳ)².

Assuming H(a₁ = a₂ = ... = a_k) is true (i.e., that a₁ = a₂ = ... = a_k), it follows from Theorem
(A), §8.3, that q₁ and q₂ are independently distributed according to χ²-laws with n - k
and k - 1 degrees of freedom respectively. Hence

(e)    F = (n - k)q₂ / ((k - 1)q₁)

is distributed according to h_{k-1, n-k}(F) dF.

To see exactly how this problem is an application of Theorem (A), the reader
should refer to §8.41, Case 1. It will be noted that the set of k samples, O_{n₁}, O_{n₂},...,
O_{n_k}, can be regarded as a single sample of size n (n = n₁ + ... + n_k) from a population with dis-
tribution N(Σ_{p=1}^k a_p x_p, σ²), where x₁ = 1, x₂ = ... = x_k = 0 for O_{n₁}; x₂ = 1,
x₁ = x₃ = ... = x_k = 0 for O_{n₂}; and so on. The hypothesis is that all the a_p are equal, i.e.,
that the a_p satisfy the k - 1 linearly independent restrictions a₁ - a_q = 0, q = 2,3,...,k.
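The quantities in (d) and (e) amount to the familiar between- and within-sample sums of squares. A minimal Python sketch of the computation, in our own notation:

```python
def one_way_F(samples):
    # F ratio (e) of section 9.1 for H: all population means equal;
    # samples is a list of k lists of observations
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # sigma^2 * q1: within-sample sum of squares, n - k degrees of freedom
    within = sum(sum((y - m) ** 2 for y in s) for s, m in zip(samples, means))
    # sigma^2 * q2: between-sample sum of squares, k - 1 degrees of freedom
    between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    return ((n - k) * between) / ((k - 1) * within)
```

Under H the ratio follows the Snedecor distribution h_{k-1, n-k}(F), and large values are significant.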
9.2 Randomized Blocks or Two-way Layouts

Suppose y_ij, (i = 1,2,...,r; j = 1,2,...,s), are random variables independently
distributed according to N(m + R_i + C_j, σ²), where Σ_{i=1}^r R_i = Σ_{j=1}^s C_j = 0, and that we
wish to test on basis of the y_ij the hypothesis H[(C_j) = 0] specified as follows:

       Ω: -∞ < m, R_i, C_j < +∞, σ² > 0, i = 1,2,...,r; j = 1,2,...,s,
          subject to Σ_i R_i = Σ_j C_j = 0;

       ω: the subspace in Ω obtained by setting each C_j = 0.

The ω space is simply the subspace in Ω for which the C_j are all 0. The probability ele-
ment for the sample (i.e., the y_ij) is

(a)    [(1/(√(2π)σ))^{rs} e^{-(1/(2σ²)) Σ_{i,j} (y_ij - m - R_i - C_j)²}] ∏_{i,j} dy_ij.

The sum of squares in the exponent of (a) may be written as
178 IX. APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS S9.2
s - A.[
^9 J
(b)
" g (y ij~^i"*j^ )2 +
where y - j^Z_y 1 j, y ia ^A y^ y.j - ~^_y 1: . Maximizing the likelihood function In [ ]
(which la equivalent .to minimizing S as far as m, the R^ and C* are concerned) for varia-
y
luiv
tions of the parameters over -fl, we find
m - y, iL - y, - y, C, - y , - y,
Maximizing the likelihood function for variations of the parameters over ω, we simply set each of the C_j equal to zero and maximize for variations of σ², m, R_i (subject to Σ R_i = 0). We find

m̂ = ȳ,   R̂_i = ȳ_{i·} − ȳ,   σ̂²_ω = (1/rs) Σ_{i,j} (y_{ij} − ȳ_{i·})².

It may be readily verified that q₁ and q₂ of Theorem (A), 8.3, are given by

q₁ = rs σ̂²_Ω/σ² = (1/σ²) Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} + ȳ)²,   q₂ = rs(σ̂²_ω − σ̂²_Ω)/σ² = (r/σ²) Σ_j (ȳ_{·j} − ȳ)².
It follows from Theorem (A), 8.3 (see Case 3, §8.43), that q₁ and q₂ are independently distributed according to χ²-laws with (r−1)(s−1) and (s−1) degrees of freedom respectively, when H[(C_j)=0] is true. Hence, under the same conditions,

F = (r−1) q₂/q₁ = (r−1) · r Σ_j (ȳ_{·j} − ȳ)² / Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} + ȳ)²

is distributed according to h_{(s−1),(r−1)(s−1)}(F) dF, and is equivalent to the likelihood ratio criterion for testing H[(C_j)=0], using the upper tail of the distribution for obtaining critical values of F for given significance levels.
In an entirely similar manner we may derive an F test for the hypothesis H[(R_i)=0] defined as follows:

Ω:  Same as Ω in the definition of H[(C_j)=0].

ω:  The subspace in Ω obtained by setting each R_i = 0.

Following steps similar to those followed for H[(C_j)=0], we find for H[(R_i)=0]

(c)    F = (s−1) · s Σ_i (ȳ_{i·} − ȳ)² / Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} + ȳ)²,

which will be distributed according to h_{(r−1),(r−1)(s−1)}(F) dF, when H[(R_i)=0] is true.
The applicability of Theorem (A), 8.3, in testing H[(C_j)=0] is evident when it is noted that under Ω the y_{ij} can be regarded as a sample of size rs from a population having a distribution of the form N(Σ_{p=1}^{r+s+1} a_p x_p, σ²) in which there are two homogeneous linear conditions on the a_p (the a_p being written in place of the m, R_i, C_j, and each x_p having the value 0 or 1), whereas under ω there would be s + 1 linear conditions on the a_p (or s − 1 linear conditions in addition to those already imposed under Ω). Both H[(C_j)=0] and H[(R_i)=0] come under Case 3, §8.43.
If the y_{ij}, i = 1, 2, ..., r; j = 1, 2, ..., s, are considered in a rectangular array with i referring to rows and j to columns, then it will be seen that we are assuming that y_{ij} is a normally distributed random variable with a mean which is the sum of three parts: a general constant m, a specific constant R_i associated with the i-th row and a specific constant C_j associated with the j-th column (where Σ_i R_i = Σ_j C_j = 0). The variance is assumed to be independent of i and j. Statistically speaking, R_i is often referred to as the effect (or main effect) due to the i-th row, and C_j the effect (or main effect) due to the j-th column. H[(R_i)=0] is therefore the hypothesis that row effects are zero no matter what the values of m and the column effects may be. The quantity Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} + ȳ)² is often referred to as the "error" or "residual" sum of squares after row and column effects are removed, and when divided by (r−1)(s−1) the resulting expression provides an unbiased estimate of σ² no matter what the values of m, the R_i and C_j. The quantity s Σ_i (ȳ_{i·} − ȳ)² is usually referred to as the sum of squares due to rows, and when divided by r − 1, the resulting quotient provides an unbiased estimate of σ² (and, as we have seen, independent of that obtained by using the "error" sum of squares) if the R_i = 0, no matter what values the C_j and m may have.
A similar statement holds for r Σ_j (ȳ_{·j} − ȳ)². It can be shown by Cochran's Theorem and by the use of moment generating functions, although we shall not do so here, that

(1/σ²) Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} + ȳ)²,   (s/σ²) Σ_i (ȳ_{i·} − ȳ)²,   (r/σ²) Σ_j (ȳ_{·j} − ȳ)²

are independently distributed according to χ²-laws with (r−1)(s−1), (r−1), (s−1) degrees of freedom respectively, if the R_i and C_j are all zero, and furthermore the sum of the three quantities is (1/σ²) Σ_{i,j} (y_{ij} − ȳ)², which is distributed according to the χ²-law with rs − 1 degrees of freedom if each R_i and each C_j is zero.
These various sums of squares together with their degrees of freedom are commonly set forth in an analysis of variance table as follows:

Variation Due to    Sum of Squares                                    Degrees of Freedom
Rows                S_R = s Σ_i (ȳ_{i·} − ȳ)²                         r − 1
Columns             S_C = r Σ_j (ȳ_{·j} − ȳ)²                         s − 1
Error               S_E = Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} + ȳ)²     (r − 1)(s − 1)
Total               S = Σ_{i,j} (y_{ij} − ȳ)²                         rs − 1
The main facts regarding the constituents of this table may be summarized as follows:

(1) S = S_R + S_C + S_E.

(2) S_R/σ², S_C/σ², S_E/σ² are independently distributed according to χ²-laws with (r−1), (s−1), (r−1)(s−1) degrees of freedom respectively if all R_i and C_j are zero.

(3) F = (s−1) S_R / S_E is distributed according to h_{(r−1),(r−1)(s−1)}(F) dF when H[(R_i)=0] is true.

(4) F = (r−1) S_C / S_E is distributed according to h_{(s−1),(r−1)(s−1)}(F) dF when H[(C_j)=0] is true.

(5) S_E/σ² is distributed according to the χ²-law with (r−1)(s−1) degrees of freedom for any parameter point in Ω (i.e. no matter what values the R_i and C_j may have).

(6) S/σ² is distributed according to the χ²-law with rs − 1 degrees of freedom if all R_i and C_j are zero.
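Facts (1), (3) and (4) can be checked numerically; the following sketch, with an invented 4 × 3 layout, computes the entries of the analysis of variance table in modern notation.

```python
# Two-way layout (randomized blocks): y[i][j] ~ N(m + R_i + C_j, sigma^2).
# The data below are invented, for illustration only.
y = [
    [10.0, 12.0, 11.0],
    [11.5, 13.0, 12.5],
    [ 9.0, 10.5, 10.0],
    [12.0, 14.0, 13.5],
]
r, s = len(y), len(y[0])

grand = sum(map(sum, y)) / (r * s)
row_mean = [sum(row) / s for row in y]
col_mean = [sum(y[i][j] for i in range(r)) / r for j in range(s)]

S_R = s * sum((m - grand) ** 2 for m in row_mean)        # rows, r-1 d.f.
S_C = r * sum((m - grand) ** 2 for m in col_mean)        # columns, s-1 d.f.
S_E = sum((y[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
          for i in range(r) for j in range(s))           # error, (r-1)(s-1) d.f.
S = sum((y[i][j] - grand) ** 2
        for i in range(r) for j in range(s))             # total, rs-1 d.f.

F_rows = (s - 1) * S_R / S_E                             # fact (3)
F_cols = (r - 1) * S_C / S_E                             # fact (4)
```

Each F is referred to the upper tail of the corresponding F-distribution with the degrees of freedom listed in the table.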
The theory discussed in this section has been widely used in what are called designed experiments, particularly in agricultural science. For example, rows in our rectangular array may be associated with r different varieties of wheat, columns with s different types of fertilizer, and y_{ij} with the yield of wheat on the plot of soil associated with the i-th variety and j-th fertilizer, it being assumed that plots are of the same size and the soil homogeneous for all plots. In such an application, we emphasize that the fundamental assumptions are that the yield on the plot associated with the i-th variety and the j-th fertilizer may be regarded as a normally distributed random variable having mean value of the form m + R_i + C_j (where Σ_i R_i = Σ_j C_j = 0), and a variance σ² which has the same value for all i and j. The question of whether the assumptions are tenable in any given case is one for the individual applying the method to settle. In this example H[(C_j)=0] would be the hypothesis that fertilizer effects on yield are all equal, no matter what the variety effects may be.
9.3 Three-way and Higher Order Layouts; Interaction
The analysis presented in 9.2 can be extended to three-way and higher order layouts. In this section we shall consider in detail the three-way layout. Let y_{ijk} (i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t) be random variables distributed independently according to

(a)    N(m + I_{ijk}, σ²),

where

(b)    I_{ijk} = I_{ijo} + I_{iok} + I_{ojk} + I_{ioo} + I_{ojo} + I_{ook},

where each set of I's on the right hand side of (b) is such that when summed over each index the sum is zero. Thus there are (r−1)(s−1) linearly independent constants in the set {I_{ijo}}, (r−1) such constants in the set {I_{ioo}}, with similar statements holding for the remaining sets. For convenience, we may consider y_{ijk} as a random variable associated with the cell in the i-th row, j-th column and k-th layer of a three-dimensional rectangular array of cells. The mean value of y_{ijk} is given in (a), in which the I_{ioo}, the I_{ojo}, and the I_{ook} are row, column and layer main effects, respectively; the I_{ijo} are row-column interactions, the I_{iok} row-layer interactions, and the I_{ojk} are column-layer interactions.*
The probability element of the y_{ijk} is

(c)    [ (1/(√(2π) σ))^{rst} e^( −(1/2σ²) Σ_{i,j,k} (y_{ijk} − m − I_{ijk})² ) ] Π dy_{ijk}.

The sum of squares in the exponent of (c) is

(d)    S = Σ_{i,j,k} (y_{ijk} − m − I_{ijk})².
Now let

ȳ = (1/rst) Σ_{i,j,k} y_{ijk},

ȳ_{i··} = (1/st) Σ_{j,k} y_{ijk}, with similar meanings for ȳ_{·j·} and ȳ_{··k},

(e)  ȳ_{ij·} = (1/t) Σ_k y_{ijk}, with similar meanings for ȳ_{i·k} and ȳ_{·jk},

Y_{i··} = ȳ_{i··} − ȳ, with similar meanings for Y_{·j·} and Y_{··k},
*These are called first-order interactions.
Y_{ij·} = ȳ_{ij·} − ȳ_{i··} − ȳ_{·j·} + ȳ, with similar meanings for Y_{i·k} and Y_{·jk},

S_{··o} = t Σ_{i,j} (Y_{ij·} − I_{ijo})², with similar meanings for S_{·o·} and S_{o··},

S_{·oo} = st Σ_i (Y_{i··} − I_{ioo})², with similar meanings for S_{o·o} and S_{oo·},

S_{···} = Σ_{i,j,k} (y_{ijk} − ȳ_{ij·} − ȳ_{i·k} − ȳ_{·jk} + ȳ_{i··} + ȳ_{·j·} + ȳ_{··k} − ȳ)²,   S_{ooo} = rst(ȳ − m)².

Let S⁰_{··o} be the value of S_{··o} with each I_{ijo} = 0, with similar meanings for S⁰_{·o·}, S⁰_{o··}, S⁰_{·oo}, S⁰_{o·o}, S⁰_{oo·}.
We may write

S = Σ_{i,j,k} [ (y_{ijk} − ȳ_{ij·} − ȳ_{i·k} − ȳ_{·jk} + ȳ_{i··} + ȳ_{·j·} + ȳ_{··k} − ȳ) + (Y_{ij·} − I_{ijo}) + (Y_{i·k} − I_{iok}) + (Y_{·jk} − I_{ojk}) + (Y_{i··} − I_{ioo}) + (Y_{·j·} − I_{ojo}) + (Y_{··k} − I_{ook}) + (ȳ − m) ]².

Squaring the quantity in [ ], keeping the expressions within the parentheses intact, and summing with respect to i, j, k, we obtain

(g)    S = S_{···} + S_{··o} + S_{·o·} + S_{o··} + S_{·oo} + S_{o·o} + S_{oo·} + S_{ooo}.
It follows from Cochran's Theorem, 5.24 (and can also be shown by moment-generating functions), that the eight sums of squares on the right side of (g), each divided by σ², are independently distributed according to χ²-laws with (r−1)(s−1)(t−1), (r−1)(s−1), (r−1)(t−1), (s−1)(t−1), (r−1), (s−1), (t−1), 1 degrees of freedom, respectively, if the y_{ijk} are distributed according to (a).
The sums of squares in (g) provide the basis for testing various hypotheses concerning the interactions I_{ijo}, I_{iok}, I_{ojk} and the main effects I_{ioo}, I_{ojo}, I_{ook}. For example, suppose we wish to test the hypothesis that row-column interaction is zero (i.e. each I_{ijo} = 0) no matter what the row-layer and column-layer interactions and main effects may be. This hypothesis, say H[(I_{ijo})=0], may be specified as follows:

(h)  Ω:  −∞ < m, I_{ijo}, I_{iok}, I_{ojk}, I_{ioo}, I_{ojo}, I_{ook} < +∞, σ² > 0, for all i, j, k, the sum of the I's in each set over any index being 0.

ω:  Subspace of Ω obtained by setting each I_{ijo} = 0.
Maximizing the likelihood in (c) for variations of the parameters over Ω, we find

rst σ̂²_Ω = S_{···},

and maximizing the likelihood for variations of the parameters over ω, we find

rst σ̂²_ω = S_{···} + S⁰_{··o}.

It should be noted that in maximizing the likelihood over Ω we obtain as maximum likelihood estimates of I_{ijo}, I_{iok}, I_{ojk}, I_{ioo}, I_{ojo}, I_{ook} the quantities Y_{ij·}, Y_{i·k}, Y_{·jk}, Y_{i··}, Y_{·j·}, Y_{··k}, respectively. When the hypothesis H[(I_{ijo})=0] is true it follows from Theorem (A), 8.3 (see Case 3, §8.43), that

q₁ = S_{···}/σ²,   q₂ = S⁰_{··o}/σ²

are independently distributed according to χ²-laws with (r−1)(s−1)(t−1) and (r−1)(s−1) degrees of freedom, respectively. Hence the F-ratio for testing this hypothesis is

F = (t−1) S⁰_{··o} / S_{···},

which is distributed according to h_{(r−1)(s−1),(r−1)(s−1)(t−1)}(F) dF when H[(I_{ijo})=0] is true. In a similar manner F-ratios can be set up for testing the hypothesis of zero row-layer or zero column-layer interaction.
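The computation of q₁ and q₂ for H[(I_{ijo})=0] may be sketched as follows; the r × s × t observations here are randomly generated, purely for illustration.

```python
import random

# Test of zero row-column interaction in an r x s x t layout (invented data).
random.seed(0)
r, s, t = 3, 4, 2
y = [[[random.gauss(0.0, 1.0) for _ in range(t)] for _ in range(s)]
     for _ in range(r)]

def mean(v):
    return sum(v) / len(v)

g = mean([y[i][j][k] for i in range(r) for j in range(s) for k in range(t)])
yi = [mean([y[i][j][k] for j in range(s) for k in range(t)]) for i in range(r)]
yj = [mean([y[i][j][k] for i in range(r) for k in range(t)]) for j in range(s)]
yk = [mean([y[i][j][k] for i in range(r) for j in range(s)]) for k in range(t)]
yij = [[mean(y[i][j]) for j in range(s)] for i in range(r)]
yik = [[mean([y[i][j][k] for j in range(s)]) for k in range(t)] for i in range(r)]
yjk = [[mean([y[i][j][k] for i in range(r)]) for k in range(t)] for j in range(s)]

# sigma^2 * q1 = S... : error sum of squares, (r-1)(s-1)(t-1) d.f.
S_err = sum((y[i][j][k] - yij[i][j] - yik[i][k] - yjk[j][k]
             + yi[i] + yj[j] + yk[k] - g) ** 2
            for i in range(r) for j in range(s) for k in range(t))
# sigma^2 * q2 = S0..o : row-column interaction sum of squares, (r-1)(s-1) d.f.
S_rc = t * sum((yij[i][j] - yi[i] - yj[j] + g) ** 2
               for i in range(r) for j in range(s))

F = (t - 1) * S_rc / S_err
```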
The constituents in (g) also provide a method of testing the hypothesis of no interaction between rows and columns in a two-way layout from t (t ≥ 2) replications of the layout. This hypothesis amounts to the hypothesis that effects due to rows and columns are additive on the mean value of the y_{ijk}, in which case the mean value of y_{ijk} is of the form m + I_{ioo} + I_{ojo}. In this problem we consider y_{ijk} (i = 1, 2, ..., r; j = 1, 2, ..., s) as the variables associated with the k-th replicate, and assume the mean value of y_{ijk} to be m + I_{ijo} + I_{ioo} + I_{ojo}. The problem is to test the hypothesis that each I_{ijo} = 0. This hypothesis, which will be called H'[(I_{ijo})=0], is specified as follows:

Ω:  −∞ < m, I_{ijo}, I_{ioo}, I_{ojo} < +∞, σ² > 0, for each i and j, where the sum of the I's in each set over each index is 0.

ω:  The subspace of Ω obtained by setting each I_{ijo} = 0.
Maximizing the likelihood function in (c) for variations of the parameters over Ω, we find

rst σ̂²_Ω = S_{···} + S⁰_{·o·} + S⁰_{o··} + S⁰_{oo·},

and similarly

rst σ̂²_ω = S_{···} + S⁰_{··o} + S⁰_{·o·} + S⁰_{o··} + S⁰_{oo·}.

By Theorem (A), 8.3, it follows that

q₁ = (S_{···} + S⁰_{·o·} + S⁰_{o··} + S⁰_{oo·})/σ²,   q₂ = S⁰_{··o}/σ²

are independently distributed according to χ²-laws with rs(t−1) and (r−1)(s−1) degrees of freedom, respectively, when H'[(I_{ijo})=0] is true, and hence under the same assumptions

(p)    F = rs(t−1) S⁰_{··o} / [ (r−1)(s−1)(S_{···} + S⁰_{·o·} + S⁰_{o··} + S⁰_{oo·}) ]

is distributed according to h_{(r−1)(s−1), rs(t−1)}(F) dF.
In a similar manner, the existence of second-order interaction in a three-way layout may be tested on the basis of replications of the three-way layout. This problem, however, leads us into four-way layouts and the details must be left to the reader.
Suppose we are interested in testing the hypothesis H[(I_{ioo})=0] that the I_{ioo} are all zero, no matter what the interactions and main effects due to columns and layers may be. This hypothesis may be specified as follows:

(q)  Ω:  Same as Ω in (h).

ω:  Subspace of Ω for which each I_{ioo} = 0.

We have

rst σ̂²_Ω = S_{···}

and

rst σ̂²_ω = S_{···} + S⁰_{·oo},

and hence by Theorem (A), 8.3,
q₁ = S_{···}/σ²,   q₂ = S⁰_{·oo}/σ²

are independently distributed according to χ²-laws with (r−1)(s−1)(t−1) and (r−1) degrees of freedom, respectively, when H[(I_{ioo})=0] is true, and the F-ratio for the hypothesis is

F = (s−1)(t−1) S⁰_{·oo} / S_{···},

which is distributed according to h_{(r−1),(r−1)(s−1)(t−1)}(F) dF. Similar tests exist for testing the hypothesis that the I_{ojo} = 0 or that the I_{ook} = 0.
Suppose the interactions I_{ijo}, I_{ojk}, and I_{iok} are all zero and that it is desired to test the hypothesis that the main effects due to rows are 0, i.e., I_{ioo} = 0. This hypothesis, say H'[(I_{ioo})=0], may be specified as follows:

(s)  Ω:  −∞ < m, I_{ioo}, I_{ojo}, I_{ook} < +∞, σ² > 0, the sum of the I's in each set being 0.

ω:  Subspace of Ω obtained by setting each I_{ioo} = 0.

We find

rst σ̂²_Ω = S_{···} + S⁰_{··o} + S⁰_{·o·} + S⁰_{o··},

and hence

q₁ = rst σ̂²_Ω/σ² = (S_{···} + S⁰_{··o} + S⁰_{·o·} + S⁰_{o··})/σ²,   q₂ = rst(σ̂²_ω − σ̂²_Ω)/σ² = S⁰_{·oo}/σ²,

which are distributed independently according to χ²-laws with rst − r − s − t + 2 and r − 1 degrees of freedom, respectively, when H'[(I_{ioo})=0] is true. The F-ratio is

F = (rst − r − s − t + 2) S⁰_{·oo} / [ (r−1)(S_{···} + S⁰_{··o} + S⁰_{·o·} + S⁰_{o··}) ],

which has the distribution h_{(r−1),(rst−r−s−t+2)}(F) dF when H'[(I_{ioo})=0] is true.
The difference between the F-ratio for testing H[(I_{ioo})=0] and that for testing H'[(I_{ioo})=0] should be noted. In the first hypothesis the interactions are not assumed to be zero, and in the second one the interactions are assumed to be zero. The sum of squares in the denominator of F for the first hypothesis is simply S_{···}, while it is S_{···} + S⁰_{··o} + S⁰_{·o·} + S⁰_{o··} in the F for the second hypothesis. The terms S⁰_{··o}, S⁰_{·o·}, S⁰_{o··} are commonly known as interaction sums of squares, and the process of adding these to the error sum of squares S_{···} in the case of testing H'[(I_{ioo})=0] is often referred to as confounding first-order interactions with error. Of course, the hypothesis may be set up in such a way that only two (or even only one) of the interaction sums of squares will be confounded with error. The term confounding as it is commonly used is more general than it is in the sense used above. For example, if layer effects (I_{ook}) are assumed to be zero throughout the hypothesis specified by (s), we would have found not only all first-order interaction sums of squares but also the layer effect sum of squares S⁰_{oo·} confounded with S_{···}.
There are many hypotheses which can be tested on the basis of the S's on the right hand side of (g), and we shall make no attempt to catalogue them here. It is perhaps sufficient to summarize the constituents of the various possible tests in the following analysis of variance table (the Σ extending over all values of i, j, k in each case):
Variation Due To            Sum of Squares                                                                  Degrees of Freedom
Rows                        S⁰_{·oo} = Σ (ȳ_{i··} − ȳ)²                                                     r − 1
Columns                     S⁰_{o·o} = Σ (ȳ_{·j·} − ȳ)²                                                     s − 1
Layers                      S⁰_{oo·} = Σ (ȳ_{··k} − ȳ)²                                                     t − 1
Row-Column Interaction      S⁰_{··o} = Σ (ȳ_{ij·} − ȳ_{i··} − ȳ_{·j·} + ȳ)²                                 (r − 1)(s − 1)
Row-Layer Interaction       S⁰_{·o·} = Σ (ȳ_{i·k} − ȳ_{i··} − ȳ_{··k} + ȳ)²                                 (r − 1)(t − 1)
Column-Layer Interaction    S⁰_{o··} = Σ (ȳ_{·jk} − ȳ_{·j·} − ȳ_{··k} + ȳ)²                                 (s − 1)(t − 1)
Error                       S_{···} = Σ (y_{ijk} − ȳ_{ij·} − ȳ_{i·k} − ȳ_{·jk} + ȳ_{i··} + ȳ_{·j·} + ȳ_{··k} − ȳ)²   (r − 1)(s − 1)(t − 1)
Total                       S = Σ (y_{ijk} − ȳ)²                                                            rst − 1
9.4 Latin Squares

Suppose y_{ij} (i, j = 1, 2, ..., r) are random variables distributed according to N(m + R_i + C_j + T_t, σ²), where Σ_{i=1}^{r} R_i = Σ_{j=1}^{r} C_j = Σ_{t=1}^{r} T_t = 0, such that each T_t occurs in conjunction with each R_i once and only once, and with each C_j once and only once, each R_i occurring once and only once in conjunction with each C_j. Such an arrangement of combinations of attributes is known as a Latin Square arrangement. For a given r there are many such arrangements, each of which can be represented in a square array in which the R_i would be row effects, the C_j column effects and the T_t treatment effects. For example, when r = 4, the following is a Latin Square arrangement of row, column and treatment effects:

Fisher and Yates (Statistical Tables, Oliver and Boyd, Edinburgh, 1938) have tabulated Latin Squares up to size 12 by 12.
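A Latin Square arrangement of any size r can be written down cyclically; the following sketch constructs one (this cyclic square is an illustration of the definition, not one of the Fisher-Yates tabulated squares) and checks the defining property.

```python
# Cyclic construction: the cell in row i, column j receives treatment (i + j) mod r.
r = 4
square = [[(i + j) % r for j in range(r)] for i in range(r)]

# Defining property: each treatment occurs exactly once in every row
# and exactly once in every column.
rows_ok = all(sorted(row) == list(range(r)) for row in square)
cols_ok = all(sorted(square[i][j] for i in range(r)) == list(range(r))
              for j in range(r))
```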
Now consider the following hypothesis, say H[(T_t)=0], to be tested on the basis of the sample:

Ω:  −∞ < m, R_i, C_j, T_t < +∞, σ² > 0, Σ_i R_i = Σ_j C_j = Σ_t T_t = 0.

ω:  Subspace in Ω obtained by setting each T_t = 0.

In other words, we wish to test the hypothesis that the T_t are all zero, assuming that the y_{ij} are distributed according to N(m + R_i + C_j + T_t, σ²). The probability element of the y_{ij} is

(a)    [ (1/(√(2π) σ))^{r²} e^( −(1/2σ²) S ) ] Π dy_{ij}.

The sum of squares S in the exponent may be written as

S = Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} − ȳ_{(t)} + 2ȳ)² + r Σ_i (ȳ_{i·} − ȳ − R_i)² + r Σ_j (ȳ_{·j} − ȳ − C_j)² + r Σ_t (ȳ_{(t)} − ȳ − T_t)² + r²(ȳ − m)²,
where ȳ = (1/r²) Σ_{i,j} y_{ij}, ȳ_{i·} = (1/r) Σ_j y_{ij}, ȳ_{·j} = (1/r) Σ_i y_{ij} and ȳ_{(t)} = (1/r) Σ^{(t)} y_{ij}, Σ^{(t)} denoting summation over all cells (i and j) in the Latin Square array in which T_t occurs. Let S⁰_R be the value of S_R = r Σ_i (ȳ_{i·} − ȳ − R_i)² when the R_i = 0, with similar meanings for S⁰_C and S⁰_T.
Maximizing the likelihood function in (a) for variations of the parameters over Ω, we find

m̂ = ȳ,   R̂_i = ȳ_{i·} − ȳ,   Ĉ_j = ȳ_{·j} − ȳ,   T̂_t = ȳ_{(t)} − ȳ.

Maximizing the likelihood for variations of the parameters over ω, we set T_t = 0 (t = 1, 2, ..., r) and maximize for variations of m, R_i, C_j, σ² (Σ R_i = Σ C_j = 0). We find the maximizing values of m, the R_i and C_j to be the same as those obtained by maximizing over Ω, and
(c)    q₁ = r² σ̂²_Ω/σ² = S_E/σ²,   q₂ = r²(σ̂²_ω − σ̂²_Ω)/σ² = S⁰_T/σ².

It follows from Theorem (A), 8.3 (see Case 3, §8.43), that q₁ and q₂ are independently distributed according to χ²-laws with (r−1)(r−2) and (r−1) degrees of freedom respectively when H[(T_t)=0] is true, and hence

(d)    F = (r−2) q₂/q₁ = (r−2) S⁰_T / S_E

is distributed according to h_{(r−1),(r−1)(r−2)}(F) dF and is equivalent to the likelihood ratio criterion for testing H[(T_t)=0], it being understood, of course, that critical values of F for a given significance level are obtained by using the upper tail of the F distribution.
In a similar manner, if H[(R_i)=0] denotes the hypothesis for which Ω is identical with that for H[(T_t)=0] while ω is the subspace in Ω for which R₁ = R₂ = ... = R_r = 0, then we obtain for F the following:

(e)    F = (r−2) S⁰_R / S_E,

which is distributed according to h_{(r−1),(r−2)(r−1)}(F) dF when H[(R_i)=0] is true.

An entirely similar hypothesis, say H[(C_j)=0], may be defined by considering ω as the subspace in Ω for which C₁ = C₂ = ... = C_r = 0, and an F similar to (e) is obtained with the same distribution as that of F defined by (e).
We may summarize in the following analysis of variance table:

Variation Due to    Sum of Squares                                              Degrees of Freedom
Rows                S⁰_R = r Σ_i (ȳ_{i·} − ȳ)²                                  r − 1
Columns             S⁰_C = r Σ_j (ȳ_{·j} − ȳ)²                                  r − 1
Treatments          S⁰_T = r Σ_t (ȳ_{(t)} − ȳ)²                                 r − 1
Error               S_E = Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} − ȳ_{(t)} + 2ȳ)²    (r − 1)(r − 2)
Total               S = Σ_{i,j} (y_{ij} − ȳ)²                                   r² − 1
The main properties relating to the constituents of this table are the following:

(1) S = S⁰_R + S⁰_C + S⁰_T + S_E.

(2) S⁰_R/σ², S⁰_C/σ², S⁰_T/σ², S_E/σ² are independently distributed according to χ²-laws with r−1, r−1, r−1, (r−1)(r−2) degrees of freedom respectively, when all R_i, C_j and T_t are zero.

(3) F = (r−2) S⁰_R / S_E is distributed according to h_{(r−1),(r−1)(r−2)}(F) dF when H[(R_i)=0] is true.

(4) F = (r−2) S⁰_C / S_E is distributed according to h_{(r−1),(r−1)(r−2)}(F) dF when H[(C_j)=0] is true.

(5) F = (r−2) S⁰_T / S_E is distributed according to h_{(r−1),(r−1)(r−2)}(F) dF when H[(T_t)=0] is true.

(6) S_E/σ² is distributed according to the χ²-law with (r−1)(r−2) degrees of freedom for any parameter point in Ω (i.e. no matter what values m and the R_i, C_j, T_t may have).

(7) S/σ² is distributed according to the χ²-law with r² − 1 degrees of freedom when all R_i, C_j and T_t are zero.
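Property (1), the orthogonal splitting of the total sum of squares in a Latin Square, can be verified numerically. In the sketch below the treatment assignment is a cyclic r × r square and the observations are random; both are invented for illustration.

```python
import random

# Latin-square layout: treatment in cell (i, j) is (i + j) mod r.
random.seed(1)
r = 4
treat = [[(i + j) % r for j in range(r)] for i in range(r)]
y = [[random.gauss(0.0, 1.0) for _ in range(r)] for _ in range(r)]

g = sum(map(sum, y)) / r**2                                   # grand mean
yi = [sum(y[i]) / r for i in range(r)]                        # row means
yj = [sum(y[i][j] for i in range(r)) / r for j in range(r)]   # column means
yt = [sum(y[i][j] for i in range(r) for j in range(r)
          if treat[i][j] == t) / r for t in range(r)]         # treatment means

S0_R = r * sum((m - g) ** 2 for m in yi)                      # rows, r-1 d.f.
S0_C = r * sum((m - g) ** 2 for m in yj)                      # columns, r-1 d.f.
S0_T = r * sum((m - g) ** 2 for m in yt)                      # treatments, r-1 d.f.
S_E = sum((y[i][j] - yi[i] - yj[j] - yt[treat[i][j]] + 2 * g) ** 2
          for i in range(r) for j in range(r))                # error, (r-1)(r-2) d.f.
S = sum((y[i][j] - g) ** 2 for i in range(r) for j in range(r))
```

The exact splitting S = S⁰_R + S⁰_C + S⁰_T + S_E holds because in a Latin Square the row, column and treatment classifications are mutually orthogonal.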
The reader will find it instructive to verify that S_E is the minimum of Σ_{i,j}(y_{ij} − m − R_i − C_j − T_t)² for variations of the m, R_i, C_j and T_t subject to the restrictions Σ_i R_i = Σ_j C_j = Σ_t T_t = 0, and can be obtained by applying formula (k) of 8.5, noting that all x are 0 or 1.

As in the case of two- and three-way and higher order layouts, Latin Square layouts have been widely used in agricultural experiments. For example, in studying the effects of r types of fertilizer on yields of a certain variety of wheat, it is common to lay out a square array of r² plots of equal area and to associate row and column effects with variations in fertility of soil and associate treatments with different fertilizers. The main assumption in such an application is that variation in fertility of soil from plot to plot is such that the yield on the plot in the i-th row and j-th column may be regarded as a normally distributed random variable y_{ij} with mean value of the form m + R_i + C_j + T_t (where Σ_i R_i = Σ_j C_j = Σ_t T_t = 0, T_t being the effect of the t-th treatment) and variance σ² which is the same for all plots.

Latin Square layouts have also been tried out in other fields, for example in industrial research.
9.5 Graeco-Latin Squares

Higher order Latin Squares, known as Graeco-Latin Squares, may be treated in much the same manner as Latin Squares. A Graeco-Latin square involving, for example, a four-way classification may be defined as follows: Let {α_i}, {β_i}, {γ_i}, {δ_i}, i = 1, 2, ..., r, be four sets of mutually exclusive attributes. Let r² objects be arranged in such a way that r of the objects have attribute α_i, r have attribute β_i, r have attribute γ_i, and r have attribute δ_i, i = 1, 2, ..., r, and in such a way that exactly one object has the combination of attributes (α_i, β_j), i, j = 1, 2, ..., r, exactly one has the combination (α_i, γ_j), and so on for each of the combinations (α_i, δ_j), (β_i, γ_j), (β_i, δ_j), (γ_i, δ_j). We may conveniently allow the α_i to refer to rows, β_j to columns, γ_t to treatments in an ordinary Latin Square, and let δ_u refer to the fourth classification. Let y_{ij} (i, j = 1, 2, ..., r) be random variables distributed according to N(m + R_i + C_j + T_t + U_u, σ²), where R_i, C_j, T_t, U_u are effects due to α_i, β_j, γ_t, δ_u, and where Σ_i R_i = Σ_j C_j = Σ_t T_t = Σ_u U_u = 0. As a matter of fact, we may consider the four-way classification Graeco-Latin square as a superposition of the two Latin squares {α_i}, {β_i}, {γ_i} and {α_i}, {β_i}, {δ_i}, the α and β referring to rows and columns in both cases, the γ_t as treatments in the first Latin square, and the δ_u as treatments in the second Latin square, such that when the two Latin squares are superimposed each γ_t will occur with each δ_u exactly once. Two Latin squares which have this property are said to be orthogonal. A set of r − 1 mutually orthogonal Latin squares
is said to form a complete set of mutually orthogonal Latin squares, and when superimposed they would form an (r+1)-way classification Graeco-Latin square. Complete sets of orthogonal Latin squares exist when r is a prime integer and also for certain other values, e.g. r = 4, 8, 9. The sum of squares S in the likelihood function is

S = Σ_{i,j} (y_{ij} − m − R_i − C_j − T_t − U_u)²

and may be written as

S = Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} − ȳ_{(t)} − ȳ_{(u)} + 3ȳ)² + r Σ_i (ȳ_{i·} − ȳ − R_i)² + r Σ_j (ȳ_{·j} − ȳ − C_j)² + r Σ_t (ȳ_{(t)} − ȳ − T_t)² + r Σ_u (ȳ_{(u)} − ȳ − U_u)² + r²(ȳ − m)²,

where ȳ_{i·}, ȳ_{·j}, ȳ, ȳ_{(t)} are as defined in 9.4 and ȳ_{(u)} is the average of all y_{ij} having mean values involving U_u. Let S⁰_R be the value of S_R when the R_i = 0, with similar meanings for S⁰_C, S⁰_T and S⁰_U.
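The orthogonality property just described is easy to check by machine; the two cyclic 3 × 3 squares below (an invented illustration, not taken from the text) are orthogonal, so their superposition is a Graeco-Latin square.

```python
# Two cyclic 3x3 Latin squares; superimposing them yields each ordered
# pair of treatments exactly once, i.e. a Graeco-Latin square.
r = 3
A = [[(i + j) % r for j in range(r)] for i in range(r)]       # gamma treatments
B = [[(i + 2 * j) % r for j in range(r)] for i in range(r)]   # delta treatments

pairs = {(A[i][j], B[i][j]) for i in range(r) for j in range(r)}
orthogonal = len(pairs) == r * r
```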
As before, we may define hypotheses H[(R_i)=0], H[(C_j)=0], H[(T_t)=0], and H[(U_u)=0], all with the same Ω parameter space given by

Ω:  −∞ < m, R_i, C_j, T_t, U_u < +∞, σ² > 0, Σ R_i = Σ C_j = Σ T_t = Σ U_u = 0,

but with ω parameter spaces obtained by setting each R_i = 0, each C_j = 0, each T_t = 0, and each U_u = 0, respectively. The F ratios for these four hypotheses may be written down by the reader in terms of S_E, S⁰_R, S⁰_C, S⁰_T, and S⁰_U.

The analysis of variance table for the four-way Graeco-Latin square turns out to be as follows:
Variation Due to    Sum of Squares                                                      Degrees of Freedom
The α_i             S⁰_R = r Σ_i (ȳ_{i·} − ȳ)²                                          r − 1
The β_j             S⁰_C = r Σ_j (ȳ_{·j} − ȳ)²                                          r − 1
The γ_t             S⁰_T = r Σ_t (ȳ_{(t)} − ȳ)²                                         r − 1
The δ_u             S⁰_U = r Σ_u (ȳ_{(u)} − ȳ)²                                         r − 1
Error               S_E = Σ_{i,j} (y_{ij} − ȳ_{i·} − ȳ_{·j} − ȳ_{(t)} − ȳ_{(u)} + 3ȳ)²  (r − 1)(r − 3)
Total               S = Σ_{i,j} (y_{ij} − ȳ)²                                           r² − 1

where ȳ, ȳ_{i·}, ȳ_{·j}, ȳ_{(t)} have the same meanings as for the Latin Square and ȳ_{(u)} is the mean of all y_{ij} having attribute δ_u.
The properties of the constituents of this table are very similar to those of the constituents of the table pertaining to the ordinary Latin Square and therefore we shall not write them down. The reader may verify that S_E is the minimum of Σ_{i,j}(y_{ij} − m − R_i − C_j − T_t − U_u)², subject to the restrictions Σ R_i = Σ C_j = Σ T_t = Σ U_u = 0, and is obtainable from formula (k), 8.5.

Extensions to higher order Graeco-Latin squares and complete sets of Latin squares are straightforward.
9.6 Analysis of Variance in Incomplete Layouts
The results which have been presented in 9.2-9.5 depend on complete or bal-
anced layouts in the sense that there is exactly one random variable corresponding to each
cell of the layout, or in the sense of orthogonality exemplified by Latin Squares, Graeco-
Latin squares, and complete sets of Latin squares. Because of this element of balance the
sums of squares arising in connection with the various hypotheses are relatively simple.
The problem to be considered here is that of deriving sums of squares appropriate to tests
of hypotheses in case there are arbitrary numbers of random variables associated with the
various cells.
First let us consider the case of a two-way layout. Let y₁, y₂, ..., y_n be the random variables of the sample such that each y belongs to one row and one column in an r by s layout. If a y, say y_α, belongs to the i-th row and j-th column, we assume it to be distributed according to N(m + R_i + C_j, σ²), where Σ_i R_i = Σ_j C_j = 0. We may rewrite this distribution as N(m + Σ_{i=1}^{r} R_i x_{1iα} + Σ_{j=1}^{s} C_j x_{2jα}, σ²), where for a given α the x_{1iα} (i = 1, 2, ..., r) are all zero except for the value of i corresponding to the row within which y_α occurs, and similarly the x_{2jα} (j = 1, 2, ..., s) are all zero except for the value of j corresponding to the column within which y_α occurs.
The likelihood function for the sample y₁, ..., y_n is

(a)    (1/(√(2π) σ))^n e^( −(1/2σ²) Σ_{α=1}^{n} (y_α − m − Σ_i R_i x_{1iα} − Σ_j C_j x_{2jα})² ),

and the sum of squares in the exponent of this likelihood function is

(b)    S = Σ_{α=1}^{n} (y_α − m − Σ_i R_i x_{1iα} − Σ_j C_j x_{2jα})².

Now suppose we consider the hypothesis that the C_j are all 0. This hypothesis H[(C_j)=0] may be specified as follows:

(c)  Ω:  −∞ < m, R_i, C_j < +∞, σ² > 0 (all i and j), Σ_i R_i = Σ_j C_j = 0.

ω:  Subspace in Ω obtained by setting each C_j = 0.
Maximizing the likelihood function for variations of the parameters in Ω, we find from 8.5 that the values of m, the R_i and C_j which minimize S are given by the linear equations

(d)    Σ_α y_α − n m − Σ_i n_{i·} R_i − Σ_j n_{·j} C_j = 0,

       Σ_α^{(i)} y_α − n_{i·} m − n_{i·} R_i − Σ_j n_{ij} C_j = 0,   i = 1, 2, ..., r,

       Σ_α^{(j)} y_α − n_{·j} m − Σ_i n_{ij} R_i − n_{·j} C_j = 0,   j = 1, 2, ..., s,

together with the restrictions Σ_i R_i = Σ_j C_j = 0, where Σ_α y_α denotes summation of all y_α, Σ_α^{(i)} y_α denotes summation of all y_α in the i-th row, Σ_α^{(j)} y_α denotes summation of all y_α in the j-th column, n_{ij} is the number of y_α falling in the cell at the intersection of the i-th row and j-th column, n_{i·} = Σ_j n_{ij} and
n_{·j} = Σ_i n_{ij}. It follows from 8.5 that the minimum of S for variations of the m, R_i and C_j in Ω is given by A/A_{oo}, where

(e)    A = the bordered determinant of the system (d), formed from Σ_α y_α², the sums Σ_α y_α, Σ_α^{(i)} y_α (i = 1, ..., r), Σ_α^{(j)} y_α (j = 1, ..., s), and the coefficients n, n_{i·}, n_{·j}, n_{ij} appearing in (d), bordered by the rows and columns of 0's and 1's which express the restrictions Σ_i R_i = Σ_j C_j = 0,

and A_{oo} is the minor of Σ_α y_α² in A. Hence

n σ̂²_Ω = A/A_{oo}.
Maximizing the likelihood function for variations of the parameters over ω, we find that the maximizing values of m and the R_i are given by the equations which result from setting all the C_j equal to zero in (d) and deleting the last s equations. Similarly,

n σ̂²_ω = A'/A'_{oo},

where A' is obtained by deleting the last s + 2 rows and columns from A with the exception of the next to the last row and column, and A'_{oo} is the minor of Σ_α y_α² in A'.
It follows from Theorem (A), 8.3, that

q₁ = n σ̂²_Ω/σ²   and   q₂ = n(σ̂²_ω − σ̂²_Ω)/σ²

are distributed independently according to χ²-laws with n − r − s + 1 and s − 1 degrees of freedom respectively when H[(C_j)=0] is true. The F ratio is therefore

(f)    F = (n − r − s + 1)(σ̂²_ω − σ̂²_Ω) / [(s − 1) σ̂²_Ω] = (n − r − s + 1)(A'A_{oo} − AA'_{oo}) / [(s − 1) A A'_{oo}],

which has the distribution h_{(s−1),(n−r−s+1)}(F) dF when H[(C_j)=0] is true. The reader may verify that if n_{ij} = 1 (all i and j) then n = rs and we have the complete two-way layout discussed in 9.2, and in this case the F ratio reduces to that given for H[(C_j)=0] in 9.2.
The extension of the foregoing treatment to higher order layouts is straightforward and will not be considered in detail. It is perhaps sufficient to note that in the case of higher-order layouts we would have several sets, say q, of classifications, the u-th classification consisting of p_u mutually exclusive categories, such that each y_α in the sample would belong to exactly one category in each classification. If we denote the mean effect (on y_α) of the v-th category of the u-th classification by I_{uv}, where Σ_{v=1}^{p_u} I_{uv} = 0, u = 1, 2, ..., q (or more generally several linear restrictions may be applied to the I_{uv} for each u), then the mean value of y_α may be expressed as m + Σ_{u=1}^{q} Σ_{v=1}^{p_u} I_{uv} x_{uvα}, where for each value of u, x_{uvα} (α = 1, 2, ..., n) is unity for only one value of v and zero otherwise; the value of v for which x_{uvα} is unity being that corresponding to the category (of the u-th classification) within which y_α falls. The problem of testing the hypothesis that the I_{uv} for the u-th classification (u-th classification effects) are all zero amounts to setting up a determinant corresponding to A in (e) based on q classifications instead of 2, and performing operations similar to those performed on A to obtain A_{oo}, A', and A'_{oo}. The reader will find it instructive to work through the details of setting up A, q₁, q₂, and F for the case of a three-way layout when the hypothesis to be tested is that the main effects due to one of the classifications are zero. He will also find it profitable to treat the ordinary Latin square as a three-way layout by this method and show that the F obtained for testing the hypothesis of no treatment effects is identical with that given by (d) in 9.4. The generality of this procedure should be carefully noted by the reader, because not only can all of the results previously discussed in this chapter be obtained by this procedure, but tests for the existence of interaction between two or more classifications in incomplete or unbalanced layouts may be deduced by applying the procedure.
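The least-squares computation underlying this section can be sketched for a small unbalanced two-way layout. Here the determinantal ratios are replaced by a direct least-squares fit of the full (Ω) and restricted (ω) models, which yields the same minimized sums of squares; numpy is assumed available, and the data and cell counts are invented.

```python
import numpy as np

# Unbalanced two-way layout: fit m + R_i + C_j by least squares using the
# 0/1 design variables x_1ia, x_2ja of the text (invented data, unequal n_ij).
r, s = 3, 2
obs = [(0, 0, 10.0), (0, 0, 11.0), (0, 1, 12.0),
       (1, 0,  9.0), (1, 1, 13.0), (1, 1, 12.5),
       (2, 0,  8.0), (2, 1, 11.0), (2, 1, 11.5), (2, 0, 8.5)]
n_obs = len(obs)

# Columns: m, R_1..R_{r-1}, C_1..C_{s-1}; the restrictions sum R_i = sum C_j = 0
# are imposed by writing R_r = -(R_1 + ... + R_{r-1}) and similarly for C_s.
X = []
for i, j, _ in obs:
    row = [1.0]
    row += [1.0 if i == a else (-1.0 if i == r - 1 else 0.0) for a in range(r - 1)]
    row += [1.0 if j == b else (-1.0 if j == s - 1 else 0.0) for b in range(s - 1)]
    X.append(row)
X = np.array(X)
yv = np.array([v for _, _, v in obs])

beta, *_ = np.linalg.lstsq(X, yv, rcond=None)        # Omega fit
resid_ss = float(np.sum((yv - X @ beta) ** 2))       # n * sigma^2_Omega

X0 = X[:, :r]                                        # omega: each C_j = 0
beta0, *_ = np.linalg.lstsq(X0, yv, rcond=None)
resid_ss_omega = float(np.sum((yv - X0 @ beta0) ** 2))

# (f): F = (n - r - s + 1) q2 / ((s - 1) q1)
F = (n_obs - r - s + 1) * (resid_ss_omega - resid_ss) / ((s - 1) * resid_ss)
```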
9.7 Analysis of Covariance

Throughout all of the discussion in 9.2-9.6 we have assumed the mean value of the random variable in each case to consist of the sum of a general constant (which is the same for all random variables) and constants referring to rows, columns, treatments, interaction, etc. It frequently happens that there are practical situations which suggest that the mean value of the random variable should include linear functions of one or more fixed variates (see 8.2) in addition to the sum of constants of the type mentioned above. For example, if y_{ij} refers to the yield of wheat in a plot in the i-th row and j-th column of a two-way layout, not only should the mean value of y_{ij} include a general constant and row and column effects, but also a linear effect of the number of plants on this plot, say x_{ij}. The mean value would then be of the form m + a x_{ij} + R_i + C_j, with the usual conditions on the R_i and C_j. The object of this section is to examine what modifications of 9.2-9.6 should be made in order to take one or more fixed variates into account in the mean value of the random variables involved.
Let us return to the two-way layout discussed in §9.2 and assume that the mean
value of y_ij depends linearly not only on m, R_i and C_j but also on a fixed variate x_ij.
In other words, assume that the y_ij are random variables independently distributed ac-
cording to N(m + a·x_ij + R_i + C_j, σ²), where Σ_i R_i = Σ_j C_j = 0. The question arises
as to what forms the F-ratios take for testing the hypothesis that the C_j are all zero, or
the hypothesis that the R_i are all zero, when the parameter space is the (r+s+1)-dimensional
space for which −∞ < a, m, R_i, C_j < +∞, σ² > 0. The probability element of the y_ij is
exactly that given in (a), §9.2, with y_ij replaced by y_ij − a·x_ij. Making this substitution
in (b), §9.2, we see that the sum of squares in the exponent of this probability element
(for any point in Ω) may be broken down into the following components:
(a)  S = Σ_{i,j}(Y'_ij − aX'_ij)² + s·Σ_i(Y_i· − aX_i· − R_i)² + r·Σ_j(Y_·j − aX_·j − C_j)²
         + rs(ȳ − ax̄ − m)²,

where Y'_ij = y_ij − y_i· − y_·j + ȳ, with a similar meaning for X'_ij, and Y_i· = y_i· − ȳ,
with similar meanings for Y_·j, X_i·, X_·j. The first sum of squares on the right in (a) may
be written as

(b)  Σ_{i,j}(Y'_ij − aX'_ij)² = Σ_{i,j}(Y'_ij − âX'_ij)² + (a − â)²·Σ_{i,j}X'_ij²,

where â = Σ_{i,j}X'_ij·Y'_ij / Σ_{i,j}X'_ij². Making the substitution (b) in (a) we obtain
5 sums of squares which, when divided by σ², are (by Cochran's theorem) distributed
independently according to χ²-laws with (r−1)(s−1)−1, 1, r−1, s−1, 1 degrees of freedom,
respectively.

Now suppose we wish to test the hypothesis H_1[(C_j) = 0], which is specified as
follows:

(c)  Ω: the y_ij independently distributed according to N(m + a·x_ij + R_i + C_j, σ²),
        −∞ < a, m, R_i, C_j < ∞, σ² > 0 (all i, j);
     ω: the subspace in Ω obtained by setting each C_j = 0.
Maximizing the likelihood function for variations of the parameters in Ω, which
is equivalent to minimizing S as far as variations of a, m, R_i, C_j are concerned, we
obtain

(d)  R̂_i = Y_i· − âX_i·,  Ĉ_j = Y_·j − âX_·j,  m̂ = ȳ − âx̄,

with â as defined in (b). The sum of squares in the exponent of the probability element
for any point in ω (i.e., all C_j = 0) may be expressed in terms of the following
components:

(e)  S_ω = Σ_{i,j}[(Y'_ij + Y_·j) − a(X'_ij + X_·j)]² + s·Σ_i(Y_i· − aX_i· − R_i)²
           + rs(ȳ − ax̄ − m)².

Maximizing the likelihood function for variations of the parameters in ω amounts to mini-
mizing S_ω as far as m, a, R_i are concerned. We find

(f)  â_ω = Σ_{i,j}(X'_ij + X_·j)(Y'_ij + Y_·j) / Σ_{i,j}(X'_ij + X_·j)²,
     R̂_i = Y_i· − â_ω X_i·,  m̂ = ȳ − â_ω x̄.

By Theorem (A), §8.3, it follows that rs·σ̂_Ω²/σ² and rs(σ̂_ω² − σ̂_Ω²)/σ², where rs·σ̂_Ω² and
rs·σ̂_ω² denote the minimized values of S and S_ω, are independently distributed according
to χ²-laws with (r−1)(s−1)−1 and s−1 degrees of freedom, respectively, when H_1[(C_j) = 0]
is true. Hence the F-ratio for this hypothesis is

     F = [(r−1)(s−1)−1]/(s−1) · (σ̂_ω² − σ̂_Ω²)/σ̂_Ω²,

which has the distribution h_{s−1,(r−1)(s−1)−1}(F) dF when the hypothesis is true.
It should be noted that rs·σ̂_Ω² and rs·σ̂_ω² can each be expressed in terms of deter-
minants (see (g), §8.2).
In a similar manner we may define the hypothesis H_1[(R_i) = 0] by replacing C_j = 0 by
R_i = 0 in the specification of ω. We find that σ̂_Ω² remains the same, but σ̂_ω² for this
hypothesis is identical with that for H_1[(C_j) = 0] after replacing Y_·j and X_·j by Y_i·
and X_i·, respectively. The F-ratio for H_1[(R_i) = 0] is distributed according to
h_{r−1,(r−1)(s−1)−1}(F) dF.
The constituents which are used in making up the F-ratios for testing the two
hypotheses considered above may be set forth in the following analysis of covariance table:

  Variation      Sums of Squares and Cross Products                    Degrees of
  Due To         (y)               (x, y)              (x)             Freedom

  Rows           s·Σ_i Y_i·²       s·Σ_i X_i·Y_i·      s·Σ_i X_i·²     r − 1
  Columns        r·Σ_j Y_·j²       r·Σ_j X_·jY_·j      r·Σ_j X_·j²     s − 1
  Error          Σ Y'_ij²          Σ X'_ijY'_ij        Σ X'_ij²        (r−1)(s−1)
  Total          Σ (y_ij−ȳ)²       Σ (x_ij−x̄)(y_ij−ȳ)  Σ (x_ij−x̄)²     rs − 1
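The quantities entering the F-ratio can also be assembled numerically. The following
sketch is not part of the original text: the function name ancova_F, the use of least
squares in place of the determinant formulas of §8.2, and the illustrative data are all
our own. It computes the F-ratio for H_1[(C_j) = 0] by minimizing S under Ω (rows, columns,
and covariate) and under ω (columns deleted):

```python
import numpy as np

# Sketch (not from the text): F-ratio for H1[(C_j) = 0] in an r x s layout
# with one fixed variate, via residual sums of squares of least-squares fits.
def ancova_F(y, x):
    r, s = y.shape
    rows = np.repeat(np.arange(r), s)
    cols = np.tile(np.arange(s), r)

    def rss(with_cols):
        # Design: general mean, covariate, r-1 row dummies, (s-1 column dummies).
        mats = [np.ones(r * s), x.ravel()]
        mats += [(rows == i).astype(float) for i in range(r - 1)]
        if with_cols:
            mats += [(cols == j).astype(float) for j in range(s - 1)]
        design = np.column_stack(mats)
        beta, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
        resid = y.ravel() - design @ beta
        return float(resid @ resid)

    rss_omega_big = rss(True)    # plays the role of rs * sigma-hat^2 under Omega
    rss_omega_sm = rss(False)    # plays the role of rs * sigma-hat^2 under omega
    df_error = (r - 1) * (s - 1) - 1
    return (rss_omega_sm - rss_omega_big) / (s - 1) / (rss_omega_big / df_error)

# Hypothetical data: strong column effects should give a large F.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
y = 2.0 * x + np.array([0.0, 5.0, 10.0]) + rng.normal(scale=0.1, size=(4, 3))
F = ancova_F(y, x)
```

Here rss(True) corresponds to the minimized S of the text and the difference
rss(False) − rss(True) to rs(σ̂_ω² − σ̂_Ω²).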
The results obtained for the case of one fixed variate may be extended in a
rather straightforward manner to the case of k fixed variates, where k < (r−1)(s−1). Thus,
if k fixed variates x_pij, p = 1,2,...,k, are taken into account linearly in our two-way
layout, we would begin by replacing y_ij in the probability element in (a), §9.2, by
(y_ij − Σ_p a_p x_pij) and follow a procedure similar to that for the case of one fixed
variate. Thus, in place of aX'_ij, aX_i·, aX_·j, ax̄ in (a) we would have Σ_p a_p X'_pij,
Σ_p a_p X_pi·, Σ_p a_p X_p·j, Σ_p a_p x̄_p, respectively, where the meanings of X'_pij,
X_pi·, X_p·j, x̄_p are obvious.
The reader will find it instructive to carry out the details in arriving at F-ratios for
testing the hypotheses H_k[(C_j) = 0] and H_k[(R_i) = 0], which are the k-fixed-variate
analogues of H_1[(C_j) = 0] and H_1[(R_i) = 0], respectively.
The procedure which we have outlined for introducing fixed variates linearly
into the mean value of the random variables in a two-way layout extends in a straight-
forward manner to three-way layouts, Latin squares, Graeco-Latin squares, and to incom-
plete or non-orthogonal layouts of the type discussed in §9.6. We shall have to leave
the matter of carrying out details as exercises for the reader. Because of the generality
of §9.6 it is perhaps worth while to remark, without going through the details of proof,
that if one fixed variate is introduced linearly into the mean value of y_α, which would
amount to replacing m by m + ax_α in (a), §9.6, the effect on the determinant Δ as defined
in (e), §9.6, would be to insert another row and column into Δ as second row and second
column, the elements of this row and column being the corresponding sums of products
involving the x_α, reading left to right in the row, and reading top to bottom in the
column. This augmented determinant has its own Δ_00, Δ_1, Δ'_00 (see §9.6), which are
obtained by operations analogous to those used in obtaining Δ_00, Δ_1, Δ'_00 from Δ in
§9.6. The extension of our procedure to the problem of linearly taking into account k
fixed variates in the mean value of y_α in §9.6 is straightforward and will be left to
the reader.
CHAPTER X
ON COMBINATORIAL STATISTICAL THEORY
Many problems in distribution or sampling theory in statistics reduce to combin-
atorial considerations. For example, the derivation of the binomial distribution (§3.11)
depends on the determination of the number of distinct orders in which x p's and n−x q's
can be multiplied together, and similarly the derivation of the multinomial distribution
(§3.12) depends on the enumeration of the number of distinct orders in which n_1 p_1's,
n_2 p_2's, ..., n_k p_k's can be multiplied together, where Σ_i p_i = 1 and Σ_i n_i = n.
A majority of the combinatorial problems of the drawing-balls-from-urns variety involve
direct applications of permutation and combination formulas, which in turn are often
simply expressible in terms of binomial and multinomial coefficients. The theory of
sampling from a finite population (§4.3) is based on the use of binomial and multinomial
coefficients and their use as weights in various averaging operations. The sampling
theory of order statistics (§4.5) is a direct application of the multinomial distribution
law to probability functions of continuous random variables.
The object of the present chapter is to discuss some of the more complicated dis-
tribution problems in combinatorial statistical theory which are of particular interest in
applied mathematical statistics. More specifically, we shall present some results on the
theory of runs, the theory of matching and its application to testing independence in con-
tingency tables, Pearson's original χ²-problem, and inspection sampling.
10.1 On the Theory of Runs
Suppose we have an arbitrary sequence of n elements, each element being one of
several mutually exclusive kinds. Each sequence of elements of one kind is called a run.
The simplest case is that in which there are two kinds of objects. We shall consider this
case in detail, and also present briefly some results for the case of several kinds of
elements.
10.11 Case of Two Kinds of Elements

Suppose we have n_1 a's and n_2 b's (n_1 + n_2 = n). Let r_1j denote the number of runs
of a's of length j and r_2j the number of runs of b's of length j. For example, if
the arrangement is

     aaabbaabaabbab ,

then r_11 = 1, r_12 = 2, r_13 = 1, r_21 = 2, r_22 = 2, and the other r's are zero. It
should be observed that Σ_j j·r_1j = n_1, the number of a's, and also Σ_j j·r_2j = n_2.
Let r_1 = Σ_j r_1j and r_2 = Σ_j r_2j denote the total numbers of runs of a's and b's,
respectively. For a given set of numbers r_11, r_12, r_13, ..., there are
r_1!/(r_11! r_12! ⋯ r_1n_1!) ways of arranging the r_1 runs of a's. And for a specified
set r_2j there are r_2!/(r_21! r_22! ⋯ r_2n_2!) ways of arranging the r_2 runs of b's.
It is clear that r_1 cannot differ from r_2 by more than unity, for if
it did two runs of one kind of element would have to be adjacent, but this is contrary to
the definition of runs. If r_1 = r_2, a given arrangement of runs of a's can be fitted into
a given arrangement of runs of b's in two ways, either with a run of a's first or with a
run of b's first. We define the function F(r_1, r_2) to be the number of ways of arranging
r_1 objects of one kind and r_2 objects of another so that no two adjacent objects are of
the same kind. Clearly,

(a)  F(r_1, r_2) = 0  if |r_1 − r_2| > 1,
                 = 1  if |r_1 − r_2| = 1,
                 = 2  if r_1 = r_2.
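The run counts in the worked example above can be checked mechanically. A short sketch,
not part of the original text (the function name run_counts is ours):

```python
from itertools import groupby
from collections import Counter

# Count r_ij: the number of runs of each kind of each length in a sequence.
def run_counts(seq):
    counts = Counter()
    for kind, grp in groupby(seq):
        counts[(kind, len(list(grp)))] += 1   # counts[(kind, j)] = r_kind,j
    return counts

c = run_counts("aaabbaabaabbab")
```

For the arrangement above this gives r_11 = 1, r_12 = 2, r_13 = 1, r_21 = 2, r_22 = 2,
as stated, and Σ_j j·r_1j recovers n_1 = 8.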
The total number of ways of getting the set r_ij (i = 1, 2; j = 1, 2, ..., n_i) is

     [r_1!/(r_11! ⋯ r_1n_1!)] · [r_2!/(r_21! ⋯ r_2n_2!)] · F(r_1, r_2).

Since there are n!/(n_1! n_2!) possible arrangements of a's and b's, the joint distribu-
tion function of the given set r_ij (all possible arrangements given equal weight) is

(b)  p(r_ij) = [r_1!/(r_11! ⋯ r_1n_1!)] · [r_2!/(r_21! ⋯ r_2n_2!)] · F(r_1, r_2) / [n!/(n_1! n_2!)].
Now let us determine the joint distribution of the r_1j and r_2. To do this we sum (b)
with respect to the r_2j; that is, we wish to sum r_2!/(r_21! ⋯ r_2n_2!) over all
partitions of n_2, i.e., over all r_2j such that Σ_j j·r_2j = n_2 and Σ_j r_2j = r_2.
In order to do this, consider

     (x + x² + x³ + ⋯)^{r_2} = x^{r_2}(1 − x)^{−r_2} = x^{r_2} Σ_t [(r_2+t−1)!/(t!(r_2−1)!)] x^t.

It is evident that the coefficient of x^{n_2} in the initial expression is the sum
Σ r_2!/(r_21! ⋯ r_2n_2!) that we desire. The coefficient of x^{n_2} in the final
expression is the coefficient of the term for which r_2 + t = n_2, i.e., t = n_2 − r_2.
Therefore the desired sum is

     (r_2 + (n_2−r_2) − 1)!/[(n_2−r_2)!(r_2−1)!] = (n_2−1)!/[(r_2−1)!(n_2−r_2)!].

Hence, the joint distribution function of the r_1j and r_2 is

(c)  p(r_1j, r_2) = [r_1!/(r_11! ⋯ r_1n_1!)] · (n_2−1)!/[(r_2−1)!(n_2−r_2)!] · F(r_1, r_2)
                    / [n!/(n_1! n_2!)].
Now we sum out r_2. By (a), the only values of r_2 contributing are r_1 − 1, r_1, and
r_1 + 1, and we get

     Σ_{r_2} F(r_1, r_2) · (n_2−1)!/[(r_2−1)!(n_2−r_2)!] = (n_2+1)!/[r_1!(n_2+1−r_1)!].

This gives us the joint distribution function of the r_1j:

(d)  p(r_1j) = [r_1!/(r_11! ⋯ r_1n_1!)] · (n_2+1)!/[r_1!(n_2+1−r_1)!] / [n!/(n_1! n_2!)],

with a similar expression holding for the joint distribution of the r_2j.
Another important distribution is the joint distribution of r_1 and r_2. We get
this by summing out the r_1j in (c), just as we summed (b) with respect to the r_2j to ob-
tain (c). The result is

(e)  p(r_1, r_2) = (n_1−1)!/[(r_1−1)!(n_1−r_1)!] · (n_2−1)!/[(r_2−1)!(n_2−r_2)!] · F(r_1, r_2)
                   / [n!/(n_1! n_2!)].

Finally, we find the distribution function of r_1 by summing (e) with respect to r_2, ob-
taining

(f)  p(r_1) = (n_1−1)!/[(r_1−1)!(n_1−r_1)!] · (n_2+1)!/[r_1!(n_2+1−r_1)!] / [n!/(n_1! n_2!)].
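Distribution (f) can be verified by direct enumeration for small n_1, n_2. A sketch, not
from the text (the values n_1 = 4, n_2 = 5 are arbitrary): over all n!/(n_1! n_2!)
arrangements, the number with r_1 runs of a's should equal the numerator of (f), i.e.
C(n_1−1, r_1−1)·C(n_2+1, r_1):

```python
from math import comb
from itertools import combinations, groupby

# Enumerate all C(n, n1) arrangements of n1 a's and n2 b's and tally the
# number of runs of a's, then compare with the counting formula behind (f).
n1, n2 = 4, 5
n = n1 + n2
counts = {}
for pos in combinations(range(n), n1):
    seq = ["a" if i in pos else "b" for i in range(n)]
    r1 = sum(1 for kind, _ in groupby(seq) if kind == "a")
    counts[r1] = counts.get(r1, 0) + 1
formula = {r1: comb(n1 - 1, r1 - 1) * comb(n2 + 1, r1) for r1 in counts}
```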
The distribution of the total number of runs of a's and b's is of considerable
interest in applications of run theory. It is used as a test for randomness of the ar-
rangement of a's and b's; the smaller the total number of runs, the more untenable the hy-
pothesis of randomness. Let u = r_1 + r_2, the total number of runs. To find the distribu-
tion of u we must sum (e) over all points in the r_1, r_2 plane for which u = r_1 + r_2. We
have two cases: (1) u = 2k (even) and (2) u = 2k−1 (odd). To find the probability that
u = 2k, we note there is only one point in the r_1, r_2 plane for which u = r_1 + r_2 = 2k
where F(r_1, r_2) ≠ 0, and that point is (k, k). When u = r_1 + r_2 = 2k−1 there are two
points at which F(r_1, r_2) ≠ 0, namely (k, k−1) and (k−1, k). Hence from (e) we have at
once (using the notation C(m, n) = m!/[n!(m−n)!]):

(g)  Pr(u = 2k) = 2·C(n_1−1, k−1)·C(n_2−1, k−1) / C(n_1+n_2, n_1),

     Pr(u = 2k−1) = [C(n_1−1, k−1)·C(n_2−1, k−2) + C(n_1−1, k−2)·C(n_2−1, k−1)]
                    / C(n_1+n_2, n_1).
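As a check that the two expressions in (g) together form a proper probability distribution,
they can be summed over all attainable k. A sketch, not from the text (n_1 = 5, n_2 = 7
chosen arbitrarily; Python's math.comb returns 0 when the lower index exceeds the upper,
so vanishing terms drop out automatically):

```python
from math import comb

n1, n2 = 5, 7
denom = comb(n1 + n2, n1)

def pr_even(k):   # Pr(u = 2k)
    return 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1) / denom

def pr_odd(k):    # Pr(u = 2k - 1)
    return (comb(n1 - 1, k - 1) * comb(n2 - 1, k - 2)
            + comb(n1 - 1, k - 2) * comb(n2 - 1, k - 1)) / denom

# Even u runs over k = 1..n1; odd u over k = 2..n1+1 (u = 2k-1 up to 2n1+1).
total = sum(pr_even(k) for k in range(1, n1 + 1)) \
      + sum(pr_odd(k) for k in range(2, n1 + 2))
```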
This distribution was derived by Stevens* and also by Wald and Wolfowitz**, and
the function Pr(u ≤ u′) = Σ_{u≤u′} p(u) has been tabulated by Swed and Eisenhart*** for
n_1 ≤ n_2 (n_1 = m, n_2 = n in their notation) from the case n_1 = 2, n_2 = 20 to
n_1 = 19, n_2 = 20, for various values of u′.
Another probability function of considerable interest in the application of the
theory of runs is the probability of getting at least one run of a's of length s or
greater, or in other words the probability that at least one of the variables r_1s,
r_1,s+1, r_1,s+2, ... in the distribution (d) is ≥ 1. Mosteller**** has solved this
problem for the case n_1 = n_2 = n. To obtain this probability we put n_1 = n_2 = n in
(d), thus obtaining

(h)  p(r_1j) = [r_1!/(r_11! ⋯ r_1n!)] · (n+1)!/[r_1!(n+1−r_1)!] / [(2n)!/(n! n!)],

and sum over all terms such that at least one of the variables r_1s, r_1,s+1, ... is ≥ 1.
We can accomplish the same thing by summing over all terms such that all of these
variables are zero, and subtracting the result from unity. To do this we must sum the
multinomial coefficient in (h) over all values of r_11, ..., r_1n such that
r_1s = r_1,s+1 = ⋯ = r_1n = 0,
*W. L. Stevens, "Distribution of Groups in a Sequence of Alternatives", Annals of
Eugenics, Vol. IX (1939).
**A. Wald and J. Wolfowitz, "On a Test of Whether Two Samples are from the Same Popula-
tion", Annals of Math. Stat., Vol. XI (1940).
***Frieda S. Swed and C. Eisenhart, "Tables for Testing Randomness of Grouping in a
Sequence of Alternatives", Annals of Math. Stat., Vol. XIV (1943).
****Frederick Mosteller, "Note on an Application of Runs to Quality Control Charts",
Annals of Math. Stat., Vol. XII (1941).
Σ_j j·r_1j = n, Σ_j r_1j = r_1, and then sum with respect to r_1. It will be noted that
the sum of the multinomial coefficients under these conditions is given by the coefficient
of x^n in the formal expansion of

     (x + x² + ⋯ + x^{s−1})^{r_1} = x^{r_1}(1 − x^{s−1})^{r_1}(1 − x)^{−r_1}
                                  = x^{r_1}(1 − x^{s−1})^{r_1} Σ_t C(r_1+t−1, t) x^t,

which is

     Σ_j (−1)^j C(r_1, j) C(n − j(s−1) − 1, r_1 − 1).

The desired probability of at least one run of length s or greater is therefore

(i)  Pr(at least one r_1j ≥ 1, j ≥ s)
       = 1 − [Σ_{r_1} C(n+1, r_1) Σ_j (−1)^j C(r_1, j) C(n − 1 − j(s−1), r_1 − 1)] / C(2n, n),

the summation on r_1 extending from the largest integer contained in n/(s−1) to n. Applying
similar
methods to each of the multinomial coefficients in (b), Mosteller has shown that the prob-
ability of getting at least one run of a's or b's of length s or greater is

(j)  Pr(at least one r_1j or r_2j ≥ 1, j ≥ s) = 1 − B / C(2n, n),

where B is obtained by replacing each of the two multinomial coefficients in (b) (with
n_1 = n_2 = n) by a sum of the form

     Σ_j (−1)^j C(r_1, j) C(n − 1 − j(s−1), r_1 − 1),

the r_1 summation being similar to that in (i). Mosteller has tabulated the smallest value
of s for which each of the probabilities (i) and (j) is ≤ .05 and ≤ .01 for 2n = 10, 20, 30,
40, 50.
In order to indicate how to find moments of run variables, let us consider the
case of r_1. We shall first find the factorial moments E(r_1^{(i)}), where

     r_1^{(i)} = r_1(r_1 − 1) ⋯ (r_1 − i + 1),

for they are easier to find than ordinary moments in the present problem. From them the
ordinary moments may be found, since E(r_1^{(i)}) is a linear function of the first i
ordinary moments. Letting i = 1, 2, ..., s, we obtain a system of s linear equations which
may be solved to obtain the ordinary moments as linear functions of the factorial moments.
We have

(k)  E(r_1^{(i)}) = Σ_{r_1} r_1^{(i)} p(r_1),

with p(r_1) given by (f). In order to evaluate (k) we use the following identity:

(l)  Σ_i A!/[(C+i)!(A−C−i)!] · B!/[i!(B−i)!] = (A+B)!/[(C+B)!(A−C)!],

which follows at once by equating coefficients of x^{A−C} in the expansion of

(m)  (1 + x)^A (1 + x)^B = (1 + x)^{A+B}.

Therefore we have, upon substituting p(r_1) from (f) into (k), simplifying, and using (l),

     E(r_1^{(i)}) = [n_1!/(n_1−i)!] · [(n_2+1)!/(n_2+1−i)!] · [(n−i)!/n!].

From this result we find

     E(r_1) = n_1(n_2 + 1)/n.

A similar expression holds for E(r_2).
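The mean E(r_1) = n_1(n_2+1)/n can be checked by complete enumeration. A sketch, not from
the text (n_1 = 4, n_2 = 6 chosen arbitrarily):

```python
from itertools import combinations, groupby

# Average the number of runs of a's over all C(n, n1) equally likely
# arrangements and compare with n1 * (n2 + 1) / n.
n1, n2 = 4, 6
n = n1 + n2
total_runs = 0
narr = 0
for pos in combinations(range(n), n1):
    seq = ["a" if i in pos else "b" for i in range(n)]
    total_runs += sum(1 for kind, _ in groupby(seq) if kind == "a")
    narr += 1
mean_r1 = total_runs / narr   # should equal 4 * 7 / 10 = 2.8
```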
If the a's and b's are regarded as elements in a sample of size n from a bi-
nomial population in which p and q represent the probabilities associated with a and b,
respectively, then n_1, the number of a's, is a random variable distributed according to
the binomial law C(n, n_1) p^{n_1} q^{n_2}. The probability laws analogous to (b), (c),
(d), (e), (f) when n_1 is regarded as a random variable in this manner are simply obtained
by multiplying each of these probability laws by C(n, n_1) p^{n_1} q^{n_2}.
10.12 Case of k Kinds of Elements

The theory of runs has been extended to the case of several kinds of elements by
Mood*. If there are k kinds of elements, say a_1, a_2, ..., a_k, denote by r_ij the number
of runs of a_i of length j, and let r_i be the total number of runs of a_i. Mood has shown
that

*A. M. Mood, "The Theory of Runs", Annals of Math. Stat., Vol. XI (1940).
the joint distribution law of the r_ij is given by

(a)  p(r_ij) = [∏_{i=1}^k r_i!/(r_i1! r_i2! ⋯ r_in_i!)] · F(r_1, r_2, ..., r_k)
               / [n!/(n_1! n_2! ⋯ n_k!)],

where F(r_1, r_2, ..., r_k) is the number of ways r_1 objects of one kind, r_2 objects of a
second kind, and so on, can be arranged so that no two adjacent objects are of the same
kind; F(r_1, ..., r_k) is the coefficient of x_1^{r_1} x_2^{r_2} ⋯ x_k^{r_k} in the
expansion of a generating function given in Mood's paper. The argument for establishing
(a) is very similar to that for the case k = 2 and will not be repeated. Mood showed that
the joint distribution function of r_1, r_2, ..., r_k is given by

(c)  p(r_1, r_2, ..., r_k) = [∏_{i=1}^k (n_i−1)!/((r_i−1)!(n_i−r_i)!)] · F(r_1, r_2, ..., r_k)
                             / [n!/(n_1! n_2! ⋯ n_k!)],

which we state without proof. Various moment formulas and asymptotic distribution func-
tions have been derived by Mood in the paper cited.
If instead of holding n_1, n_2, ..., n_k fixed in the run problem for k kinds of
elements we allow the n's to be random variables with probability function π(n_1, n_2, ..., n_k)
(e.g., the multinomial distribution with Σ_i n_i = n), the run distribution functions (a) and
(c) would simply be multiplied by π(n_1, n_2, ..., n_k).
10.2 Application of Run Theory to Ordering Within Samples

Suppose x_1, x_2, ..., x_{2n+1} is a sample from a population in which x is a
continuous random variable. Let x̃ be the median value of x in the sample. Let each
sample value of x < x̃ be called a and each sample value of x > x̃ be called b. There are
n a's and n b's in the sample, ignoring the median (which is neither). Now suppose we
consider all possible orders in which the sample x's could have been drawn (ignoring the
median in each case). It is clear that all of the run distribution functions (b), (c),
(d), (e), (f) are applicable, for n_1 = n_2 = n, to this aggregate of possible orders of the
x's (i.e., a's and b's) in the sample. If there is an even number, say 2n, of items in the
sample, we can take any number between the two middle values of x in the sample as a num-
ber for dividing the x's into a's and b's, and our run theory is immediately applicable to
this case with n_1 = n_2 = n. In general, if in a sample of size kn + k − 1 we choose the
(n+1)th, (2n+2)th, (3n+3)th, ..., (k−1)(n+1)th values of x, in increasing order of magnitude,
as points of division, and let all x's less than the (n+1)th x be denoted by a_1, those
between the (n+1)th and (2n+2)th by a_2, and so on, we then reduce our sample to n a_1's,
n a_2's, ..., n a_k's. Ignoring the k − 1 x's used for division points, it is clear that
run theory for k kinds of objects is applicable to the aggregate of all possible orders in
which the sample x's could occur (ignoring the x's used for division points). The points of
division can, of course, be taken so as to yield an arbitrary number of a_1's, a_2's, etc.
By classifying the values of x in a sample into a's and b's (or more generally
into a_1's, a_2's, ..., a_k's) and using the theory of runs, we have a basis for testing the
hypothesis of randomness in the sample as far as order is concerned. The more commonly
used tests of the hypothesis of randomness based on run theory are:

(1) Number of runs of a's, for which the distribution is (f), §10.11. For given
    values of n_1 and n_2, the test consists in finding the largest value of r_1 (the
    number of runs of a's), say r̄_1, for which Pr(r_1 ≤ r̄_1) ≤ ε, e.g., for ε = .05. A
    similar statement may be made concerning runs of b's.

(2) Total number of runs of a's and of b's, having distribution (g), §10.11. Again,
    the test consists in finding the largest value of u, say ū, for which
    Pr(u ≤ ū) ≤ ε, for given values of n_1 and n_2.

(3) At least one run of a's (or b's) of at least length s, for n_1 = n_2 = n, based
    on the distribution (i), §10.11. The test consists of finding the smallest
    value of s for which probability (i) is ≤ ε.

(4) At least one run of either a's or b's of at least length s, for n_1 = n_2 = n,
    based on the distribution (j), §10.11. The test consists of finding the small-
    est s for which probability (j) is ≤ ε.

The distribution theory of each of these tests has been determined under the
assumption that the hypothesis of randomness is true, with a view to controlling only
Type I (see §7.3) errors. Type II errors for these tests have never been investigated,
i.e., the probability theory of the tests when some alternative weighting scheme (other than
equal weights) is used for the different possible arrangements of a's and b's.
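Test (2) can be sketched in a few lines. The following is not from the text (the function
name and structure are ours); it finds the largest ū with Pr(u ≤ ū) ≤ ε under the
distribution (g) of §10.11:

```python
from math import comb

# Critical value u-bar for the total-number-of-runs test: the largest u-bar
# with Pr(u <= u-bar) <= eps under randomness.
def run_test_critical_value(n1, n2, eps=0.05):
    denom = comb(n1 + n2, n1)

    def p(u):
        if u % 2 == 0:
            k = u // 2
            return 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1) / denom
        k = (u + 1) // 2          # u = 2k - 1
        return (comb(n1 - 1, k - 1) * comb(n2 - 1, k - 2)
                + comb(n1 - 1, k - 2) * comb(n2 - 1, k - 1)) / denom

    cum, ubar = 0.0, None
    for u in range(2, n1 + n2 + 1):
        cum += p(u)
        if cum <= eps:
            ubar = u
        else:
            break
    return ubar

ubar = run_test_critical_value(10, 10)   # n1 = n2 = 10, eps = .05
```

For n_1 = n_2 = 10 this reproduces the kind of critical value tabulated by Swed and
Eisenhart.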
It should be noted by the reader that the theory of runs developed in §10.11 is
not applicable to the following type of problem of reducing a sample to two kinds of ele-
ments: Suppose x_1, x_2, ..., x_n are elements of a sample from a population with a continuous
distribution function. Consider an arbitrary order of these n x's, and between each suc-
cessive pair of elements write a if the left member of the pair is smaller than the right,
and b if it is larger. We have then reduced the sample to n − 1 a's and b's. We may de-
fine runs of a's and b's as before, but the theory of arrangements of the a's and b's as
defined from the corresponding arrangements of the x's, and hence the distribution theory
of runs of this type, is an unsolved problem in combinatorial statistics.
10.3 Matching Theory

A problem which frequently arises in combinatorial statistics is one which may
be conveniently described by an example of card matching. Suppose each of two decks of
ordinary playing cards is shuffled, and let a card be dealt from each deck. If the two
cards are of the same suit let us call the result a match. Let this procedure be contin-
ued until the entire 52 pairs of cards are dealt. There will be a total number of
matches, say h. Each possible permutation of one deck compared with each possible per-
mutation of the second deck will yield a value of h between 0 and 52, inclusive. There-
fore, if we consider all of these possible permutations with equal weight, we inquire as
to what will be the distribution function of h in this set of permutations. Similarly, if
we consider three decks D_1, D_2, and D_3 of cards to be shuffled and matched, we would have
triple matches and three varieties of double matches. A triple match would occur if the
three cards in a single dealing from the three decks were of the same suit. As for
double matches, they would occur between decks D_1, D_2, between D_1, D_3, and between
D_2, D_3. The problem arises as to what will be the distribution of triple matches and of
the three varieties of double matches.

Extensions of the problem to more than three decks, and to decks with arbitrary
numbers of cards in each suit and an arbitrary number of suits, suggest themselves at once.
In this section we shall present some techniques for dealing with this problem without
attempting to be exhaustive. It will be convenient to continue our discussion in card
terminology, for no particular advantage is gained in introducing more general terminology.
The generality of the results for objects or elements other than cards is obvious.
10.31 Case of Two Decks of Cards

Suppose we have a deck D_1 of n cards, each card belonging to one and only one
of the k suits C_1, C_2, ..., C_k. Let n_11, n_12, ..., n_1k (Σ_i n_1i = n) be the numbers
of cards belonging to C_1, C_2, ..., C_k, respectively. Let D_2 be another deck of n cards,
each card belonging to one and only one of the classes C_1, C_2, ..., C_k. Let n_21, n_22,
..., n_2k (Σ_i n_2i = n) be the numbers of cards in D_2 belonging to C_1, C_2, ..., C_k,
respectively.

The problem is to determine the probability of obtaining h matches under the
assumption of random pairing of the cards. In other words, we wish to find the number of
ways N(h) the two decks of cards can be arranged so as to obtain exactly h matches. Dividing
this number by N, the total number of ways the two decks can be arranged, we obtain the
probability of obtaining h matches under random pairing. The value of N is simply the
total number of ways the two decks can be permuted, and is given by the product of two
multinomial coefficients:

(a)  N = [n!/(n_11! n_12! ⋯ n_1k!)] · [n!/(n_21! n_22! ⋯ n_2k!)].
To determine N(h), consider the enumerating function*

(b)  φ = (Σ_{i,j=1}^k x_i y_j e^{δ_ij θ})^n,

where δ_ij = 1 if i = j, and δ_ij = 0 if i ≠ j. We associate the auxiliary variables
x_1, x_2, ..., x_k with the suits C_1, C_2, ..., C_k, respectively, of the first deck, and
the auxiliary variables y_1, y_2, ..., y_k with the corresponding suits of the second deck.
φ is the product of n identical expressions, each expression consisting of the sum of k²
terms, each term being a product of an x and a y. The term x_i y_j e^{δ_ij θ} in any one
of the n factors corresponds to the event of a card in suit C_i of the first deck being
paired against a card in suit C_j of the second deck. If i = j we have a match, and e^θ
occurs as a factor. Now suppose we pick a typical term in the product given in (b). Such
a term would be of the form

(c)  (x_{i_1} y_{j_1} e^{δ_{i_1 j_1} θ})(x_{i_2} y_{j_2} e^{δ_{i_2 j_2} θ}) ⋯
     (x_{i_n} y_{j_n} e^{δ_{i_n j_n} θ}).

This general term corresponds to the event of n pairings as follows: a pairing between
C_{i_1} of D_1 and C_{j_1} of D_2; a pairing between C_{i_2} of D_1 and C_{j_2} of D_2;
...; and a pairing between C_{i_n} of D_1 and C_{j_n} of D_2. Now if the compositions of
D_1 and D_2 are specified as n_11, n_12, ..., n_1k and n_21, n_22, ..., n_2k, respectively,
then it follows that the only terms in the expansion of (b) which have any meaning for
pairings of these two decks of cards are those of the form

(d)  e^{hθ} x_1^{n_11} x_2^{n_12} ⋯ x_k^{n_1k} y_1^{n_21} y_2^{n_22} ⋯ y_k^{n_2k},

where h is an integer such that 0 ≤ h ≤ n. It should be noted that such terms may not

*Various authors have considered various enumerating functions, but the one which we shall
use was devised by I. L. Battin, "On the Problem of Multiple Matching", Annals of Math.
Stat., Vol. XIII (1942). Battin's function is relatively easy to handle and has the ad-
vantage of representing the two decks of cards symmetrically in the notation and opera-
tions. It extends readily to the case of several decks of cards. The reader should re-
fer to Battin's paper for a fairly extensive bibliography on the matching problem.
exist for some values of h between 0 and n, which means that it is not always possible to
have any arbitrary number of matches for given deck compositions. The term given in (d)
corresponds to some arrangement of the two decks of cards such that there are exactly h
matches. In general there are many such terms. Therefore, if we expand φ and determine
the coefficient of the expression given by (d), we obtain the value of N(h), the number of
ways in which h matches can occur. To simplify our notation, let K_h(φ) denote the opera-
tion of taking the coefficient of expression (d) in the expansion of φ. We may rewrite φ
as

(e)  φ = [(Σ_i x_i)(Σ_i y_i) + (e^θ − 1) Σ_i x_i y_i]^n.

Expanding, we have

(f)  φ = Σ_{g=0}^n (n!/[g!(n−g)!]) [(Σ_i x_i)(Σ_i y_i)]^g [(e^θ − 1) Σ_i x_i y_i]^{n−g}.

Expanding the expression in [ ], we have

(g)  [(e^θ − 1) Σ_i x_i y_i]^{n−g}
       = (Σ_i x_i y_i)^{n−g} Σ_{p=0}^{n−g} ((n−g)!/[p!(n−g−p)!]) (−1)^{n−g−p} e^{pθ}.

Inserting this expression into (f), and expanding (Σ_i x_i)^g (Σ_i y_i)^g (Σ_i x_i y_i)^{n−g},
we find

(h)  N(h) = K_h(φ) = Σ_g (−1)^{n−h−g} (n!/[g!(n−g)!]) ((n−g)!/[h!(n−g−h)!]) V_g M_g,

where

     V_g = (g!)² (n−g)!,   M_g = Σ ∏_{i=1}^k 1/[(n_1i − s_i)!(n_2i − s_i)! s_i!],

the summation extending over all positive integral (or zero) values of the s_i such that
Σ_i s_i = n − g and n_1i − s_i ≥ 0, n_2i − s_i ≥ 0, i = 1, 2, ..., k. The probability P(h)
of obtaining h matches is therefore N(h)/N, where N is given by (a).
For the case k = 2, the probability of h matches reduces to a hypergeometric
form: if a = ½(h + n_11 − n_22) denotes the number of matches occurring in suit C_1, then

     P(h) = C(n_11, a) C(n_12, n_21 − a) / C(n, n_21).

Unless h is such that, for given values of n_11 and n_22, n_11 + n_22 + h and
n_11 + n_22 − h are positive even integers or 0, then P(h) = 0.
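For decks small enough to enumerate completely, the distribution of h can be tabulated
directly and its mean compared with Σ_i n_1i n_2i / n. A sketch, not from the text (the
suit compositions below are arbitrary):

```python
from itertools import permutations
from collections import Counter

# Enumerate all orderings of deck2 against a fixed deck1 and tally the
# number of suit matches h; the mean should be (1/n) * sum_i n_1i * n_2i.
deck1 = [1, 1, 2, 3]          # n_11 = 2, n_12 = 1, n_13 = 1
deck2 = [1, 2, 2, 3]          # n_21 = 1, n_22 = 2, n_23 = 1
n = len(deck1)
dist = Counter()
for perm in permutations(deck2):
    h = sum(c1 == c2 for c1, c2 in zip(deck1, perm))
    dist[h] += 1
total = sum(dist.values())
mean_h = sum(h * c for h, c in dist.items()) / total
# (2*1 + 1*2 + 1*1) / 4 = 5/4
```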
Greville* has given the distribution of h in a slightly different form and by
another method.
Moments of the random variable h can be found directly from the enumerating
function φ. We have

(k)  E(h^p) = (1/N) × [coefficient of x_1^{n_11} x_2^{n_12} ⋯ x_k^{n_1k}
              y_1^{n_21} y_2^{n_22} ⋯ y_k^{n_2k} in the expansion of (∂^p φ/∂θ^p)_{θ=0}].

The reader will find it instructive to carry out this operation for p = 1, 2, and
find that

     E(h) = (1/n) Σ_i n_1i n_2i,

(l)  σ_h² = E(h²) − [E(h)]²,

where E(h²) = E(h) + [Σ_{i≠j} n_1i n_2i n_1j n_2j + Σ_i n_1i(n_1i−1) n_2i(n_2i−1)] / [n(n−1)].
It should be noted that our results can be readily extended to the case of two
decks of cards in which the total numbers of cards are different, or where one or more of
the suits may have no cards at all. To consider the case of unequal total numbers of
cards, say n_1 in deck D_1 and n_2 in deck D_2 where, without loss of generality, we can let
n_1 > n_2, we simply add to D_2 n_1 − n_2 dummy cards, and consider them as a new suit. We
would thus have k + 1 suits of cards, where the (k+1)-th suit is empty in D_1, i.e.,
n_1,k+1 = 0, n_2,k+1 = n_1 − n_2. The procedure from here on is just as before. The case in
which some of the suits are empty in one or both decks is taken into account by specifying
the values of the corresponding n_1i or n_2i as 0 in expanding φ and collecting terms.
The reader should note that if a score s_ij is assigned to a pairing in which the
D_1 card belongs to the i-th suit and the D_2 card belongs to the j-th suit, then one can
find the distribution of the total score T in n pairings (i.e., when the two decks are
paired against each other) under the assumption of random matching, by replacing δ_ij by
s_ij in (b) and finding the coefficient of

     e^{Tθ} x_1^{n_11} x_2^{n_12} ⋯ x_k^{n_1k} y_1^{n_21} y_2^{n_22} ⋯ y_k^{n_2k}

in the expansion of the resulting expression. The procedure for finding E(T) and σ_T² and
higher moments is the same as that for dealing with the moments of h, with s_ij substituted
for δ_ij.

*T. N. E. Greville, "The Frequency Distribution of a General Matching Problem", Annals
of Math. Stat., Vol. XII (1941).
10*32 Case of Three or More Decks of Cards
Suppose we have a third deck of cards, say D,. Let the numbers of cards be-
longing to suits C^ Cg,...,^ be n , n- 2 ,..,n* k . A triple match has been defined as one
In which the triplet of cards (one from each deck) are of the same ault. A double match
between D 1 and D g will occur when the cards from D 1 and D 2 In a triplet are of the same
suit but different f!om the suit of the card from D, In the triplet. Double matches be-
tween Dj, D. and D 2 , D. are similarly defined. If In the complete set of n triplets from
the three decks we let h 125 be the number of triple matches, h 12 the number of double
matches between D^ D g , with similar meanings for h 15 and h 23> we may obtain the distribu-
tions and moments of the h f s from the following enumerating function:
*ljk 6 123 * 'VlS * 4 ik 6 13 + V25 -
(a)
where 6* ^ -1, if 1 - j - k, and otherwise. The remaining 4 'a are defined as for the
2-deck problem. By following an argument similar to that for the 2-deck problem, It will
be noted that the number of ways in which the three decks of cards can be permuted so as
to obtain h 12 - triple matches, and h 12 , h 15 , h g5 double matches between D^ D 2 ; D 1 , D,;
and Dg, D^, respectively, is given by the coefficient of
where
Q x^ Xg . tX^ . y^ y 2 7^ ^\ z % *** z k
in the expansion of t.
This coefficient, and hence the joint probability law of the h's, is rather cumbersome and will not be given here. As in the case of the 2-deck problem we may find moments and joint moments of the h's by performing differentiations on φ with respect to the ε's, that is,

(c)   E\bigl(h_{123}^{r_1} h_{12}^{r_2} h_{13}^{r_3} h_{23}^{r_4}\cdots\bigr) = \frac{1}{N}\Bigl[\text{Coeff. of } Q \text{ in } \frac{\partial^{\,r_1+r_2+r_3+r_4}\phi}{\partial\epsilon_{123}^{r_1}\,\partial\epsilon_{12}^{r_2}\,\partial\epsilon_{13}^{r_3}\,\partial\epsilon_{23}^{r_4}}\Bigr],

where N is the total number of equally likely permutations of the three decks.
The mean values of the h's are the following:

E(h_{123}) = \sum_{i=1}^{k}\frac{n_i^3}{n^2}, \qquad E(h_{12}) = \sum_{i=1}^{k}\frac{n_i^2(n-n_i)}{n^2},

with similar expressions for E(h_13) and E(h_23). The reader may refer to Battin's paper for second moments.
The extension of our technique to the problem of determining the distribution
and moments of the numbers of hits for various orders of multiple matching when more than
three decks of cards are involved is immediate. The extension of the results to the case
of decks of unequal numbers of cards, empty suits, etc., when three or more decks are con-
sidered, is straightforward.
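These mean values can be checked numerically. The sketch below is mine, not from the text; it assumes the random-permutation model described above, under which the means work out to Σ n_i³/n² for triple matches and Σ n_i²(n − n_i)/n² for (D_1, D_2) double matches. It fixes the first deck and enumerates all arrangements of the other two decks for a tiny example.

```python
from itertools import permutations
from fractions import Fraction

def exact_match_means(deck):
    """Average numbers of triple matches and of (D1, D2) double matches over
    all equally likely arrangements of decks 2 and 3, deck 1 held fixed."""
    arrangements = list(permutations(deck))
    t123 = t12 = Fraction(0)
    for d2 in arrangements:
        for d3 in arrangements:
            t123 += sum(1 for a, b, c in zip(deck, d2, d3) if a == b == c)
            t12 += sum(1 for a, b, c in zip(deck, d2, d3) if a == b != c)
    count = len(arrangements) ** 2
    return t123 / count, t12 / count

deck = [0, 0, 1, 1]                    # n = 4 cards, two suits of 2 cards each
n = len(deck)
counts = [deck.count(s) for s in set(deck)]
e123 = Fraction(sum(c ** 3 for c in counts), n ** 2)
e12 = Fraction(sum(c ** 2 * (n - c) for c in counts), n ** 2)
m123, m12 = exact_match_means(deck)
print(m123, e123, m12, e12)
```

Enumerating both decks costs (n!)² arrangements, so this is a correctness check for very small n, not a computational method.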
10.4 Independence in Contingency Tables.
In this section we shall consider the problem of testing the independence of a two-way classification on the basis of a sample of n elements, each element belonging to one and only one of the classes A_1, A_2, ..., A_r and to one and only one of the classes B_1, B_2, ..., B_s. In the sample, let n_ij be the number of elements belonging to A_i and B_j. Let Σ_j n_ij = n_i., Σ_i n_ij = n_.j, Σ_{i,j} n_ij = n. The number of elements belonging to A_i is n_i., and the number belonging to B_j is n_.j. The problem is to test the hypothesis of the independence of the A and B classifications. We shall consider two approaches to this problem. The first (10.41) is a pure combinatorial approach based on partition theory, in which the set of all possible partitions of n into rs components n_ij satisfying the marginal conditions listed above is investigated. The second approach (10.42), which is Karl Pearson's original treatment of the problem, is an application of the theory of sampling from a multinomial population consisting of the rs classes (A_iB_j), i = 1,2,...,r; j = 1,2,...,s.
10.41 The Partitional Approach
In this section we shall consider the problem of determining the number of ways of partitioning the integer n into rs non-negative integers n_ij (i = 1,2,...,r; j = 1,2,...,s) such that Σ_j n_ij = n_i. and Σ_i n_ij = n_.j are fixed. The technique discussed in §10.3 can be extended so as to accomplish this enumeration. We shall then find the mean values of certain functions of the n_ij over this set of partitions.
We may represent the n_ij, n_i., n_.j, and n in the following contingency table:

(a)
                B_1     B_2    ...   B_s     Total
        A_1     n_11    n_12   ...   n_1s    n_1.
        A_2     n_21    n_22   ...   n_2s    n_2.
        ...     ...     ...    ...   ...     ...
        A_r     n_r1    n_r2   ...   n_rs    n_r.
        Total   n_.1    n_.2   ...   n_.s    n
Consider the enumerating function

(b)   \phi = \prod_{i=1}^{r}\Bigl(\sum_{j=1}^{s} x_j e^{\epsilon_{ij}}\Bigr)^{n_{i\cdot}},

which is the product of n factors, n_1. of which are Σ_j x_j e^{ε_1j}, n_2. of which are Σ_j x_j e^{ε_2j}, and so on. A typical term in the expansion of this product of n factors is of the form

(c)   \prod_{i=1}^{r}\prod_{j=1}^{s}\bigl(x_j e^{\epsilon_{ij}}\bigr)^{n_{ij}},

where n_ij is the number of times x_j e^{ε_ij} is taken from the n_i. factors Σ_j x_j e^{ε_ij}. Each such term corresponds to one way of partitioning n into the set n_ij so that Σ_j n_ij = n_i..
To find the total number of ways of partitioning n into the given set of n_ij we must determine how many individual terms in the expansion of (b) are identical with (c). In other words, we are to find the coefficient of

(d)   \prod_{j=1}^{s} x_j^{n_{\cdot j}}\; e^{\sum_{i,j} n_{ij}\epsilon_{ij}}

in the expansion of (b). Expanding each of the terms (Σ_j x_j e^{ε_ij})^{n_i.}, i = 1, 2, ..., r, by the multinomial law, multiplying the results, and taking the coefficient of the expression (d), we find at once that the number of partitions of n into the sets of values n_ij, subject to the marginal conditions Σ_j n_ij = n_i. and Σ_i n_ij = n_.j, is

(e)   \prod_{i=1}^{r} n_{i\cdot}! \Big/ \prod_{i,j} n_{ij}!.
The total number of ways of partitioning n, subject to the marginal conditions mentioned above, is

(f)   n! \Big/ \prod_{j=1}^{s} n_{\cdot j}!.

Therefore the probability of partitioning n into the particular set of values n_ij, assuming all ways of making partitions (subject to the marginal conditions) equally likely, is given by the ratio of expression (e) to expression (f).
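As a sketch (mine, not from the text), the ratio (e)/(f) can be computed exactly with rational arithmetic; for a 2×2 table the probabilities over all tables with fixed margins must sum to unity.

```python
from fractions import Fraction
from math import factorial

def table_probability(table):
    """(e)/(f): probability of a particular r x s table when all partitions
    of n with the same margins are equally likely."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = 1
    for m in rows + cols:
        num *= factorial(m)
    den = factorial(sum(rows))
    for r in table:
        for x in r:
            den *= factorial(x)
    return Fraction(num, den)

# All 2x2 tables with row margins (3, 2) and column margins (2, 3) are
# indexed by n_11 = 0, 1, 2; their probabilities must sum to 1.
probs = [table_probability([[n11, 3 - n11], [2 - n11, n11]]) for n11 in range(3)]
print(probs, sum(probs))
```

The ratio (e)/(f) is the familiar hypergeometric probability of a table with the given margins.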
The moments of the n_ij may be found directly from their probability law. Consider first the problem of determining the h-th factorial moment of a particular n_ij, say n_αβ. We have

(g)   E\bigl(n_{\alpha\beta}^{(h)}\bigr) = \sum n_{\alpha\beta}(n_{\alpha\beta}-1)\cdots(n_{\alpha\beta}-h+1)\;\frac{\prod_i n_{i\cdot}!\;\prod_j n_{\cdot j}!}{n!\;\prod_{i,j} n_{ij}!},

where Σ denotes summation over all values of the n_ij subject to the usual marginal conditions. Now when h = 0, we know that the right-hand side of (g) is simply the sum of the probability function of the n_ij over all possible values of the n_ij and is therefore unity, which amounts to the statement that

(h)   \sum \frac{\prod_i n_{i\cdot}!\;\prod_j n_{\cdot j}!}{n!\;\prod_{i,j} n_{ij}!} = 1.

Now the right-hand side of (g) may be written as

(i)   \frac{n_{\alpha\cdot}^{(h)}\;n_{\cdot\beta}^{(h)}}{n^{(h)}}\;\sum \frac{\prod_i n'_{i\cdot}!\;\prod_j n'_{\cdot j}!}{n'!\;\prod_{i,j} n'_{ij}!},

where n'_i. = n_i. for all i except i = α, with n'_α. = n_α. − h; n'_ij = n_ij for all i, j except i = α, j = β, with n'_αβ = n_αβ − h; and n' = n − h. Now perform the summation indicated in (i) over all values of the n'_ij subject to the conditions Σ_j n'_ij = n'_i. and Σ_i n'_ij = n'_.j, where n'_.j = n_.j except when j = β, with n'_.β = n_.β − h. It follows from (h) that the value of this sum is unity.
Therefore we have

(j)   E\bigl(n_{\alpha\beta}^{(h)}\bigr) = \frac{n_{\alpha\cdot}^{(h)}\;n_{\cdot\beta}^{(h)}}{n^{(h)}}.

It is clear that h must be less than each of the numbers n_α. and n_.β. For h = 1 and 2 we have

(k)   E(n_{\alpha\beta}) = \frac{n_{\alpha\cdot}\,n_{\cdot\beta}}{n}, \qquad E\bigl(n_{\alpha\beta}^{(2)}\bigr) = \frac{n_{\alpha\cdot}^{(2)}\,n_{\cdot\beta}^{(2)}}{n^{(2)}}.

Hence

(l)   \sigma^2_{n_{\alpha\beta}} = \frac{n_{\alpha\cdot}\,n_{\cdot\beta}\,(n-n_{\alpha\cdot})(n-n_{\cdot\beta})}{n^{2}(n-1)}.
By a similar argument one can find joint factorial moments of two or more of the n_ij. For example, for α ≠ γ and β ≠ δ,

(m)   E\bigl(n_{\alpha\beta}^{(h)}\,n_{\gamma\delta}^{(g)}\bigr) = \frac{n_{\alpha\cdot}^{(h)}\,n_{\gamma\cdot}^{(g)}\,n_{\cdot\beta}^{(h)}\,n_{\cdot\delta}^{(g)}}{n^{(h+g)}},

a similar expression holding for α ≠ γ, β = δ. The restrictions on the sizes of g and h are obvious. These moments can also be found directly from the enumerating function φ by carrying out appropriate differentiations with respect to the ε_ij, then setting the ε's equal to 0 and collecting appropriate coefficients.
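A quick exact check of the factorial moments in (k) (my code, not from the text): enumerate all 2×2 tables with margins (4, 3) and (2, 5) and compute E(n_11) and E(n_11^(2)) with rational arithmetic.

```python
from fractions import Fraction
from math import factorial

rows, cols = (4, 3), (2, 5)
n = sum(rows)

def prob(t):
    """(e)/(f) for a 2 x 2 table t with the margins above."""
    num = 1
    for m in rows + cols:
        num *= factorial(m)
    den = factorial(n)
    for r in t:
        for x in r:
            den *= factorial(x)
    return Fraction(num, den)

mean = Fraction(0)      # E(n_11)
fact2 = Fraction(0)     # E(n_11 (n_11 - 1))
for n11 in range(0, min(rows[0], cols[0]) + 1):
    t = [[n11, rows[0] - n11], [cols[0] - n11, rows[1] - cols[0] + n11]]
    mean += prob(t) * n11
    fact2 += prob(t) * n11 * (n11 - 1)
print(mean, fact2)   # formulas (k) give 4*2/7 = 8/7 and 4*3*2*1/(7*6) = 4/7
```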
The criterion which Karl Pearson defined for testing the hypothesis of row-column independence in r by s contingency tables is the following:

(n)   \chi^2 = \sum_{i,j} \frac{\bigl(n_{ij} - n_{i\cdot}n_{\cdot j}/n\bigr)^2}{n_{i\cdot}n_{\cdot j}/n},

which is a quadratic form in the n_ij. It should be noted that χ² is simply the sum of the squared differences between each n_ij and its mean value (under the assumption of independence or "randomness"), each squared difference weighted inversely by the mean value of the corresponding n_ij. This inverse weighting scheme suggests itself fairly readily in the Pearson approach to be considered in 10.42. The mean value of χ² may easily be found by making use of formulas (k) and (l), and is
(o)   E(\chi^2) = \frac{n}{n-1}\,(r-1)(s-1).

By using formulas (j) and (m) for the appropriate values of g and h, the variance and higher moments of χ² may be found.
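The mean value of χ² can be verified exactly for a small table (sketch of mine, not from the text): for a 2×2 table with the margins below, E(χ²) should equal n(r − 1)(s − 1)/(n − 1) = 7/6.

```python
from fractions import Fraction
from math import factorial

rows, cols = (4, 3), (2, 5)
n = sum(rows)

def prob(t):
    """Probability (e)/(f) of a 2 x 2 table t with the margins above."""
    num = 1
    for m in rows + cols:
        num *= factorial(m)
    den = factorial(n)
    for r in t:
        for x in r:
            den *= factorial(x)
    return Fraction(num, den)

def chi2(t):
    """Pearson's criterion: sum of (n_ij - mean)^2 / mean over all cells."""
    s = Fraction(0)
    for i in range(2):
        for j in range(2):
            e = Fraction(rows[i] * cols[j], n)   # mean value of n_ij
            s += (t[i][j] - e) ** 2 / e
    return s

mean_chi2 = Fraction(0)
for n11 in range(0, min(rows[0], cols[0]) + 1):
    t = [[n11, rows[0] - n11], [cols[0] - n11, rows[1] - cols[0] + n11]]
    mean_chi2 += prob(t) * chi2(t)
print(mean_chi2)   # n(r-1)(s-1)/(n-1) = 7/6
```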
10.42 Karl Pearson's Original Chi-Square Problem and Its Application to Contingency Tables
Suppose π is a multinomial population in which each element belongs to one and only one of the classes C_1, C_2, ..., C_k. Let p_1, p_2, ..., p_k (Σ p_i = 1) be the probabilities associated with C_1, C_2, ..., C_k respectively. In a sample of size n let n_1, n_2, ..., n_k be the numbers of elements falling into C_1, C_2, ..., C_k respectively. We have seen (§3.12) that the probability law of the n_i is

(a)   \frac{n!}{n_1!\,n_2!\cdots n_k!}\; p_1^{n_1} p_2^{n_2}\cdots p_k^{n_k}.

It was shown in 3.12 that E(n_i) = np_i. In view of the Central Limit Theorem (4.21) it is clear that the limiting distribution as n → ∞ of each of the quantities

\frac{n_i - np_i}{\sqrt{np_i(1-p_i)}}

is N(0, 1). Now let us investigate the limiting joint distribution of the set
x_i = \frac{n_i - np_i}{\sqrt{n}}, \qquad i = 1, 2, \ldots, k.

Since Σ_i x_i = 0, only k − 1 of the x_i are functionally independent. It is sufficient to consider the limiting joint distribution of the first k − 1 of the x_i. The m.g.f. of x_1, x_2, ..., x_{k−1} is

(b)   \phi = E\bigl(e^{\sum_{i=1}^{k-1}\theta_i x_i}\bigr) = \sum \frac{n!}{n_1!\cdots n_k!}\,p_1^{n_1}\cdots p_k^{n_k}\,e^{\sum_{i=1}^{k-1}\theta_i(n_i-np_i)/\sqrt{n}} = e^{-\sqrt{n}\sum_{i=1}^{k-1}\theta_i p_i}\Bigl(p_k+\sum_{i=1}^{k-1}p_i e^{\theta_i/\sqrt{n}}\Bigr)^{n}.
Expanding each of the exponentials in (b) and taking logarithms, we have

(c)   \log\phi = -\sqrt{n}\sum_{i=1}^{k-1}\theta_i p_i + n\log\Bigl(1+\sum_{i=1}^{k-1}\frac{p_i\theta_i}{\sqrt{n}}+\sum_{i=1}^{k-1}\frac{p_i\theta_i^2}{2n}+O(n^{-3/2})\Bigr).

Therefore we have

(d)   \lim_{n\to\infty}\phi = e^{\frac12\sum_{i,j=1}^{k-1}\lambda_{ij}\theta_i\theta_j},

where λ_ij = p_i δ_ij − p_i p_j, i, j = 1, 2, ..., k−1, with δ_ij = 1 for i = j and 0 for i ≠ j. Making use of the multivariate analogue of Theorem (C), 2.81, it follows that the limiting probability element for the distribution of the x_i is

(e)   \frac{1}{(2\pi)^{(k-1)/2}\,|\lambda_{ij}|^{1/2}}\;e^{-\frac12\sum_{i,j=1}^{k-1}\lambda^{ij}x_i x_j}\,dx_1\cdots dx_{k-1},

where ||λ^{ij}|| = ||λ_ij||^{−1}. It may be readily verified by the reader that

(f)   \lambda^{ij} = \frac{\delta_{ij}}{p_i} + \frac{1}{p_k},

and hence

(g)   \sum_{i,j=1}^{k-1}\lambda^{ij}x_i x_j = \sum_{i=1}^{k-1}\frac{x_i^2}{p_i} + \frac{1}{p_k}\Bigl(\sum_{i=1}^{k-1}x_i\Bigr)^{2} = \sum_{i=1}^{k}\frac{x_i^2}{p_i},

where x_k = −(x_1 + x_2 + ⋯ + x_{k−1}).
We have seen (5.22) that if x_1, x_2, ..., x_{k−1} are random variables having distribution (e), then Σ λ^{ij} x_i x_j is distributed according to a χ²-law with k − 1 degrees of freedom. Now if we replace x_i by (n_i − np_i)/√n in (g), denoting the result by χ², we obtain

(h)   \chi^2 = \sum_{i=1}^{k}\frac{(n_i-np_i)^2}{np_i}.

We conclude that the limiting distribution of χ² is identical with the distribution of Σλ^{ij}x_ix_j where the x_i are distributed according to (e); that is to say, the limiting distribution of the expression in (h) is the χ²-law with k − 1 degrees of freedom. A rigorous proof of this statement is beyond the scope of this course, but it is a consequence of the following theorem, which will be stated without proof:
Theorem (A): Let x_1^{(n)}, x_2^{(n)}, ..., x_r^{(n)} be random variables having a joint c.d.f. for each n greater than some n_0. Let the limiting joint c.d.f. as n → ∞ be P(x_1, x_2, ..., x_r). Let g(x_1, x_2, ..., x_r) be a (Borel measurable) function of x_1, x_2, ..., x_r. Then the limiting c.d.f., say P(ḡ), of g(x_1^{(n)}, x_2^{(n)}, ..., x_r^{(n)}) as n → ∞ is given by

P(\bar g) = \int_R dP(x_1, x_2, \ldots, x_r),

where R is the region in the x-space for which g(x_1, x_2, ..., x_r) < ḡ.
We may summarize our results, therefore, in the following theorem, which is, in fact, a corollary of Theorem (A):

Theorem (B): Let n_1, n_2, ..., n_k be a sample of size n from the multinomial distribution (a). Then the limiting distribution of \sum_{i=1}^{k}\frac{(n_i-np_i)^2}{np_i} as n → ∞ is the χ²-distribution with k − 1 degrees of freedom.
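Theorem (B) is easy to probe by simulation (my sketch, not from the text; the class probabilities are arbitrary illustrative values). With k = 4 classes the mean of the statistic in (h) should settle near k − 1 = 3.

```python
import random

def pearson_chi2(counts, probs, n):
    """The statistic of (h): sum over classes of (n_i - n p_i)^2 / (n p_i)."""
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def sample_multinomial(rng, n, probs):
    """Draw class counts for n independent trials with the given probabilities."""
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for j, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[j] += 1
                break
        else:
            counts[-1] += 1
    return counts

rng = random.Random(7)
probs = [0.1, 0.2, 0.3, 0.4]
n, reps = 400, 1000
avg = sum(pearson_chi2(sample_multinomial(rng, n, probs), probs, n)
          for _ in range(reps)) / reps
print(avg)   # should be near k - 1 = 3
```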
Now let us consider the contingency problem described in the introduction of §10.4. In this case the multinomial population consists of rs classes (A_iB_j), i = 1,2,...,r; j = 1,2,...,s. Let the probability associated with (A_iB_j) be p_ij (Σ_{i,j} p_ij = 1). It follows at once by Theorem (B) that

(i)   \sum_{i,j}\frac{(n_{ij}-np_{ij})^2}{np_{ij}}

has as its limiting distribution, as n → ∞, the χ²-law with rs − 1 degrees of freedom. If the p_ij were known a priori, then the test given in (i) could be used for testing the hypothesis that the sample originated from a multinomial distribution having these values of p_ij. If the A and B classifications are independent in the probability sense, then p_ij = p_i q_j (Σ_i p_i = 1, Σ_j q_j = 1). If the p_i and q_j were known a priori then (i), with p_ij = p_i q_j, can, of course, be used to test the hypothesis that the sample came from a multinomial population with probabilities p_i q_j.
But suppose neither the p_i nor q_j are known a priori, and that we wish merely to test the hypothesis of independence of the A and B classifications. Karl Pearson proposed the following test for this hypothesis:

(j)   \chi^2 = \sum_{i,j}\frac{\bigl(n_{ij}-n_{i\cdot}n_{\cdot j}/n\bigr)^2}{n_{i\cdot}n_{\cdot j}/n},

where the n_i. and n_.j are defined in 10.41.
If we let

x_{ij} = \frac{n_{ij} - n p_i q_j}{\sqrt{n}}

and express the n_ij in (j) in terms of the x_ij, we obtain, to terms which vanish in probability as n → ∞,

(k)   \chi^2 = \sum_{i,j}\frac{\bigl(x_{ij} - x_{i\cdot}q_j - x_{\cdot j}p_i\bigr)^2}{p_i q_j},

where x_{i.} = Σ_j x_{ij} and x_{.j} = Σ_i x_{ij}.

By following an argument similar to that used in determining the limiting distribution (e) of the x_i, i = 1,2,...,k−1, we may find the limiting distribution of the x_ij (all i, j except i = r, j = s) to be multivariate normal. From this limiting distribution one finds that the limiting distribution of Σ_{i,j}(x_ij − x_i. q_j − x_.j p_i)²/(p_i q_j) is the χ²-distribution with (r−1)(s−1) degrees of freedom. By an argument similar to that embodied in Theorem (A) we may make the following statement:
Theorem (C): Let the n_ij be a sample of size n from a multinomial population with the mutually exclusive classes (A_iB_j), i = 1,2,...,r; j = 1,2,...,s, in which the probability associated with (A_iB_j) is p_i q_j. Let χ² be defined as in (j). Then the limiting distribution of χ² as n → ∞ is the χ²-distribution with (r−1)(s−1) degrees of freedom.
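Theorem (C) can likewise be probed by simulation (my code, not from the text; p and q are arbitrary illustrative values). With r = 2 and s = 3 the statistic in (j) should average near (r − 1)(s − 1) = 2 when the classifications are in fact independent.

```python
import random

def chi2_independence(table):
    """Pearson's criterion (j): sum of (n_ij - n_i. n_.j / n)^2 / (n_i. n_.j / n)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def draw(rng, probs):
    """Sample an index according to the given probabilities."""
    u, acc = rng.random(), 0.0
    for idx, p in enumerate(probs):
        acc += p
        if u < acc:
            return idx
    return len(probs) - 1

rng = random.Random(11)
p, q = [0.4, 0.6], [0.2, 0.3, 0.5]   # independent row and column classifications
n, reps = 300, 1000
total = 0.0
for _ in range(reps):
    t = [[0] * len(q) for _ in p]
    for _ in range(n):
        t[draw(rng, p)][draw(rng, q)] += 1
    total += chi2_independence(t)
avg = total / reps
print(avg)   # near (r-1)(s-1) = 2
```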
The reader may verify that the likelihood ratio criterion for testing the hypothesis specified by

H: p_{ij} = p_i q_j,  p_i > 0, q_j > 0,  Σ_i p_i = Σ_j q_j = 1,

that is, the hypothesis that the A and B classifications are independent, is given by

\lambda = \prod_{i,j}\Bigl(\frac{n_{i\cdot}\,n_{\cdot j}}{n\,n_{ij}}\Bigr)^{n_{ij}}.

It follows from Theorem (A), 7.2, that when the hypothesis of independence is true, the limiting distribution of −2 log λ is the χ²-distribution with (r−1)(s−1) degrees of freedom.
10.5 Sampling Inspection
In a mass production process, suppose articles are produced in lots of N articles each, and suppose each article, upon inspection, can be classified as defective or non-defective. It is often uneconomical to carry out a program of 100% inspection. As an alternative, sampling methods of inspection applicable to each lot have been developed which have the property of guaranteeing that the percentage of defectives remaining after applying the sampling inspection procedure in the long run (i.e. to a large number of lots) is not more than some preassigned value. Such sampling methods have been developed and put into operation by Dodge and Romig* of the Bell Telephone Laboratories. It should be pointed out that these sampling methods are essentially screening devices for reducing defectives after production, and are not devices for removing the causes of defectives. Methods for detecting the existence of causes of such defectives must be introduced further back into the production operations. In particular, statistical quality control methods,** originally introduced by Shewhart, have been found useful in connection with this problem.
The mathematical problem involved in sampling inspection is one in combinatorial statistics. Dodge and Romig have developed two types of inspection sampling, single sampling and double sampling, which will be considered in turn. From a mathematical point of view, many sampling inspection schemes can be devised which guarantee the quality of outgoing products in the sense mentioned above.
10.51 Single Sampling Inspection
Let p be the fraction of defectives in a lot L_N of size N. The number of defectives will be pN. Now let a sample O_n of size n be drawn from L_N. Giving all possible samples of size n equal weight, the probability of obtaining m defectives (and n − m non-defectives, or conforming articles) in O_n is

(a)   P_{m,n;pN,N} = \frac{\binom{pN}{m}\binom{N-pN}{n-m}}{\binom{N}{n}}, \qquad m = 0, 1, 2, \ldots, r,

where r is the smaller of n and Np. Let

(b)   F(c; p, N, n) = \sum_{m=0}^{c} P_{m,n;pN,N}.

It is easy to verify that if any two values of p and p' (pN and p'N being integers) are such that p < p', then

(c)   F(c; p, N, n) > F(c; p', N, n).
*H. F. Dodge and H. G. Romig, "A Method of Sampling Inspection", Bell System Technical Journal, Vol. VIII (1929), and "Single Sampling and Double Sampling Inspection Tables", Bell System Technical Journal, Vol. XX (1941).

**See "Guide for Quality Control and Control Chart Method of Analyzing Data" (1941) and "Control Chart Method of Controlling Quality During Production" (1942), American Standards Association, New York.
Let p_t be the lot tolerance fraction defective, i.e. the maximum allowable fraction defective in a lot, which is arbitrarily chosen in advance (e.g., .01 or .05). Let

P_C = F(c; p_t, N, n).

P_C is known as the consumer's risk; it is (approximately) the probability that a lot with lot tolerance fraction defective p_t will be accepted without 100% inspection. It follows from (c) that if the lot fraction defective p exceeds p_t, then the probability of accepting such a lot on the basis of the sample is less than the consumer's risk. The probability of subjecting a lot with fraction defective actually equal to p (the process average) to 100% inspection is

(d)   P_p = 1 - F(c; p, N, n),

which is called the producer's risk. It will be noted from (c) that the smaller the value of p, the smaller will be the producer's risk.
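A minimal computational sketch of (b) and (d) (mine, not from Dodge and Romig; the lot parameters are made-up illustrative values):

```python
from math import comb

def accept_prob(c, p, N, n):
    """F(c; p, N, n) of (b): chance of at most c defectives in a sample of n
    drawn without replacement from a lot of N articles containing pN defectives."""
    D = round(p * N)
    return sum(comb(D, m) * comb(N - D, n - m)
               for m in range(0, min(c, D, n) + 1)) / comb(N, n)

N, n, c = 1000, 80, 2
consumer_risk = accept_prob(c, 0.05, N, n)        # p_t = .05
producer_risk = 1 - accept_prob(c, 0.01, N, n)    # process average p = .01
print(consumer_risk, producer_risk)
```

The inequality (c) shows up as `accept_prob` being a decreasing function of p for fixed c, N, n.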
The reader should observe that producer and consumer risks are highly analogous to Type I and Type II errors, respectively (see §7.3), in the theory of testing statistical hypotheses as developed by Neyman and Pearson. In fact, historically speaking, the concept of producer and consumer risks in sampling inspection may be considered as the forerunner of the concept of Type I and Type II errors in the theory of testing statistical hypotheses.
Now suppose we make the following rules of action with reference to a sampled lot, where c is chosen for given values of P_C, p_t, N, n:
(1) Inspect a sample of n articles.
(2) If the number of defectives in the sample does not exceed c, accept the lot.
(3) If the number of defectives in the sample exceeds c, inspect the remainder of the lot.
(4) Replace all defectives found by conforming articles.
Now let us consider the problem of determining the mean value of the fraction of defectives remaining in a lot having fraction defective p, after applying rules of action (1) to (4).

The probability of obtaining m defectives in a sample of size n is given by (a). If these m defectives are replaced by conforming articles and the sample is returned to the lot, the lot will contain pN − m defectives. Hence the probability of accepting a lot with pN − m defectives remaining is given by (a), m = 0, 1, 2, ..., c. The probability of inspecting the lot 100% is 1 − F(c; p, N, n), which, of course, is also the probability of accepting a lot with no defectives remaining. Therefore the mean value of the fraction of defectives remaining after applying rules (1) to (4) is

(e)   \bar p = \sum_{m=0}^{c}\frac{pN-m}{N}\,P_{m,n;pN,N}.
The statistical interpretation of (e) is as follows: if a large number of lots, each with fraction defective p, are inspected according to rules (1) to (4), then the average fraction defective in all of these lots after inspection is p̄. For given values of c, n, and N, p̄ is a function of p, defined for those values of p for which Np is an integer, which has a maximum with respect to p. This maximum, denoted by p̄_L, is called the average outgoing quality limit. It can be shown that the larger the value of p beyond the value maximizing p̄, the smaller will be the value of p̄. The reason for this, of course, is that the greater the value of p, the greater the probability that each lot will have to be inspected 100%. If the consumer's risk, n, and N are chosen in advance, then, of course, c, and hence p̄_L, is determined. Thus, we are able to make the following statistical interpretation of these results:

If rules (1), (2), (3) and (4) are followed for lot after lot and for given values of c, n, N, the average fraction defective per lot after inspection never exceeds p̄_L, no matter what fractions defective exist in the lots before the inspection.
It is clear that there are various combinations of values of c and n, each having a p̄ whose maximum with respect to p is (approximately) p̄_L.
The mean value of the number of articles inspected per lot, for lots having fraction defective p, is given by

(f)   I = n + (N-n)\bigl(1 - F(c; p, N, n)\bigr),

since n (the number in the sample) will be inspected in every lot and N − n (the remainder of the lot) will be inspected if the number of defectives in the sample exceeds c.
Thus, we have two methods of specifying consumer protection: (i) lot quality protection, obtained by specifying the lot tolerance fraction defective p_t and the consumer's risk P_C; (ii) average quality protection, in which the average outgoing quality limit p̄_L is specified.

By considering the various combinations of values of c and n corresponding to a given consumer's risk (or to a given average outgoing quality limit), there is, in general, a unique combination, for a given p and N, for which I is smaller than for any other. Such a combination of values of n and c, together with a value of p as near to its actual value in the incoming lots as one can obtain, is, from a practical point of view, the combination to use, since the amount of inspection is reduced to a minimum.
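The search for p̄_L can be sketched as follows (my code; N, n, c are arbitrary illustrative values, not Dodge-Romig table entries): compute p̄ of (e) at every admissible p = D/N and take the maximum.

```python
from math import comb

def aoq(c, D, N, n):
    """p-bar of (e): expected remaining fraction defective for a lot of N
    articles with D = pN defectives, sample size n, acceptance number c."""
    return sum((D - m) / N * comb(D, m) * comb(N - D, n - m) / comb(N, n)
               for m in range(0, min(c, D, n) + 1))

N, n, c = 500, 50, 1
curve = [(D / N, aoq(c, D, N, n)) for D in range(0, N - n + 1)]
p_star, aoql = max(curve, key=lambda pair: pair[1])
print(p_star, aoql)   # maximizing p and the average outgoing quality limit
```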
Extensive tabulations of pairs of values of c and n, for consumer's risk 0.10, for values of N from 1 to 100,000, for lot tolerance fraction defective from .005 to .10, and for process average from .00005 to .05, all of the variables broken down into suitable groupings, have been prepared by Dodge and Romig. They have also made tabulations of pairs of values of c and n for given values of the average outgoing quality limit p̄_L from .001 to .10, for values of N from 1 to 100,000, and for values of the process average from .00002 to .10. Numerous approximations have been made to formulas (a), (b), (d), (e) and (f) for computation purposes, which the reader may refer to in the papers cited. For example, it is easy to verify that the Poisson law e^{−pn}(pn)^m/m! is a good approximation to (a) if p and n/N are both small, say < 0.10.
10.52 Double Sampling Inspection
In double sampling inspection from a given lot of size N, the procedure for taking action regarding a given lot is as follows:

(1) A first sample of size n_1 is drawn from the lot.
(2) If the number of defectives in the first sample does not exceed c_1, the lot is accepted without further sampling.
(3) If the number of defectives in the first sample exceeds c_2, inspect the remainder of the lot.
(4) If the number of defectives in the first sample exceeds c_1 but not c_2, inspect a second sample of n_2 pieces.
(5) If the total number of defectives in both samples does not exceed c_2, accept the lot.
(6) If the total number of defectives in both samples exceeds c_2, inspect the remainder of the lot.
(7) Replace all defectives found by conforming articles.

As in the case of single sampling, we have two kinds of consumer protection: (i) lot quality protection, and (ii) average quality protection.
The consumer's risk, the probability of accepting a lot with fraction defective p_t without 100% inspection, is given by

(a)   P_C = \sum_{m=0}^{c_1} P_{m,n_1;p_tN,N} + \sum_{m=c_1+1}^{c_2}\;\sum_{m'=0}^{c_2-m} P_{m,n_1;p_tN,N}\;P_{m',n_2;\,p_tN-m,\,N-n_1}.

The single sum in this formula is simply the probability of accepting the lot on the basis of the first sample (i.e. Step (2)), and the double sum is the probability of accepting the lot on the basis of the first and second samples combined (i.e. Step (5)), after having failed to accept on the basis of the first sample alone.
The mean value of the fraction of defectives per lot remaining after the defectives have been removed by the double sampling procedure, for lots having fraction defective p originally, is given by

(b)   \bar p = \sum_{m=0}^{c_1}\frac{pN-m}{N}\,P_{m,n_1;pN,N} + \sum_{m=c_1+1}^{c_2}\;\sum_{m'=0}^{c_2-m}\frac{pN-m-m'}{N}\,P_{m,n_1;pN,N}\;P_{m',n_2;\,pN-m,\,N-n_1}.

The mean value of the number of articles inspected per lot, for lots having fraction defective p, is

(c)   I = n_1 + n_2\Bigl(1-\sum_{m=0}^{c_1}P_{m,n_1;pN,N}\Bigr) + (N-n_1-n_2)(1-P_a),

where P_a is the value of the probability given in (a) with p_t replaced by p.
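The acceptance probability in (a) can be sketched directly (my code; the lot and plan parameters are made-up illustrative values):

```python
from math import comb

def hyp(m, n, D, N):
    """P_{m,n;D,N}: probability of m defectives in a sample of n drawn
    from a lot of N articles containing D defectives."""
    if m < 0 or m > D or m > n or n - m > N - D:
        return 0.0
    return comb(D, m) * comb(N - D, n - m) / comb(N, n)

def double_accept(p, N, n1, n2, c1, c2):
    """Acceptance probability under the double-sampling rules (1)-(6)."""
    D = round(p * N)
    pa = sum(hyp(m, n1, D, N) for m in range(0, c1 + 1))   # accepted on sample 1
    for m in range(c1 + 1, c2 + 1):                        # second sample needed
        for m2 in range(0, c2 - m + 1):                    # total defectives <= c2
            pa += hyp(m, n1, D, N) * hyp(m2, n2, D - m, N - n1)
    return pa

pa_low = double_accept(0.02, 500, 40, 40, 1, 3)
pa_high = double_accept(0.05, 500, 40, 40, 1, 3)
print(pa_low, pa_high)   # acceptance falls as the lot fraction defective rises
```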
For given values of N, n_1, n_2, c_1, c_2, it is clear that p̄ is a function of p, defined for those values of p for which Np is an integer, and has a maximum value p̄_L, the average outgoing quality limit. For a given value of N there are many values of n_1, n_2, c_1, c_2 which will yield the same value of p̄_L (approximately), or will yield the same consumer's risk (approximately) for a given lot tolerance fraction defective. Dodge and Romig have arbitrarily chosen as the basis for the relationship between the n's and the c's the following rule: determine n_1 and n_2 such that, for given values of c_1 and c_2, n_1 and c_1 (as sample size and allowable defect number) provide the same consumer's risk (approximately) as n_1 + n_2 and c_2 (as sample size and allowable defect number). The sense in which "approximately" is used is due to nearest-integer restrictions. Even after this restriction there is enough choice left among combinations of n_1, n_2, c_1, c_2 to minimize I as given by (c). To determine the n's and c's under these conditions for given N and given consumer's risk (or average outgoing quality limit) involves a considerable amount of computation. Dodge and Romig have prepared tables for double sampling analogous to those described at the end of 10.51 for single sampling.
For a given amount of consumer protection, a smaller average amount of inspection is required under double sampling than under single sampling, particularly for large lots and a low process average fraction defective p.
CHAPTER XI
AN INTRODUCTION TO MULTIVARIATE STATISTICAL ANALYSIS
A considerable amount of work has been done in recent years in the theory of sampling from normal multivariate populations and in the theory of testing statistical hypotheses relating to normal multivariate distributions. The two basic distribution functions underlying all of this work are the sample mean distribution, (e) in §5.12, and the Wishart distribution, (k) in §5.5, of the second order sample moments. We have given a derivation of the distribution of means (§5.12) and a derivation of the Wishart distribution for the case of samples from a bivariate normal population (§5.5). The general Wishart distribution was given in §5.5 without proof.
In the present chapter we shall present a geometric derivation of the Wishart
distribution, and consider applications of this distribution in deriving sampling distri-
butions of several multivariate statistical functions and test criteria. The few sections
which follow must be considered merely as an introduction to normal multivariate statis-
tical theory. The reader interested in further material in this field is referred to the
Bibliography for supplementary reading.
11.1 The Wishart Distribution
In §5.5 we presented a derivation of the joint distribution of the second order moments in samples from a bivariate distribution. The general Wishart distribution* was stated in (k) of §5.5. We shall now present a derivation of this distribution.

Let x_iα, i = 1, 2, ..., k; α = 1, 2, ..., n, be a sample of n observations from the k-variate normal population having the p.d.f.

(a)   f(x_1, x_2, \ldots, x_k) = \frac{A^{1/2}}{(2\pi)^{k/2}}\;e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)},
*John Wishart, "The Generalized Product Moment Distribution in Samples from a Normal Multivariate Population", Biometrika, Vol. 20A, pp. 32-52. A proof based on the method of characteristic functions has also been given by J. Wishart and M. S. Bartlett, "The Generalized Product Moment Distribution", Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 260-270.
where A is the determinant of the positive definite matrix ||A_ij||, and where we take the means a_i = 0. Let

b_{ij} = \sum_{\alpha=1}^{n} x_{i\alpha}x_{j\alpha}.

Clearly b_ij = b_ji, so that there are only k(k+1)/2 distinct b_ij. The b_ij/n may be referred to as second order sample moments. Our problem is to obtain the joint p.d.f. of the b_ij. The joint p.d.f. of the x_iα (i = 1, 2, ..., k; α = 1, 2, ..., n) is given by
(c)   \prod_{\alpha=1}^{n} f(x_{1\alpha}, \ldots, x_{k\alpha}) = \frac{A^{n/2}}{(2\pi)^{nk/2}}\;e^{-\frac12\sum_{i,j}A_{ij}b_{ij}}.

Now, the probability element of the b_ij is given by

(d)   P(b_{11}, b_{12}, \ldots, b_{kk}) = \frac{A^{n/2}}{(2\pi)^{nk/2}}\int_R e^{-\frac12\sum_{i,j}A_{ij}\sum_\alpha x_{i\alpha}x_{j\alpha}}\;\prod_{i,\alpha}dx_{i\alpha},

where R is the region in the kn-dimensional space of the x_iα for which

(e)   b_{ij} < \sum_{\alpha=1}^{n} x_{i\alpha}x_{j\alpha} < b_{ij} + db_{ij}, \qquad i \le j.

Within terms of order Π db_ij, the probability given by (d) may be written as

(f)   \frac{A^{n/2}}{(2\pi)^{nk/2}}\;e^{-\frac12\sum_{i,j}A_{ij}b_{ij}}\int_R \prod_{i,\alpha}dx_{i\alpha}.
Our problem now reduces to the integration of Π dx_iα over the region R. Let f_1(b_11) db_11 be the volume element for which b_11 < Σ_α x_1α² < b_11 + db_11; f_2(b_21, b_22 | b_11) db_21 db_22 the volume element for which b_2i < Σ_α x_2α x_iα < b_2i + db_2i, i = 1, 2, for a fixed value of b_11; with a similar meaning for f_3(b_31, b_32, b_33 | b_11, b_21, b_22) db_31 db_32 db_33, and so on. Then the volume element for which b_ij < Σ_α x_iα x_jα < b_ij + db_ij, that is, the integral in (f) (to terms of order Π db_ij), is given by the product

(g)   f_1(b_{11})\,db_{11}\;f_2(b_{21}, b_{22}\,|\,b_{11})\,db_{21}\,db_{22}\cdots f_k(b_{k1}, \ldots, b_{kk}\,|\,b_{11}, \ldots, b_{k-1,k-1})\,db_{k1}\cdots db_{kk}.
Now, consider the problem of determining the expression for

(h)   f_m(b_{m1}, b_{m2}, \ldots, b_{mm}\,|\,b_{11}, \ldots, b_{m-1,m-1})\,db_{m1}\,db_{m2}\cdots db_{mm}.
We note that in f_m the quantities b_ij = Σ_α x_iα x_jα, i, j = 1, 2, ..., m−1, are fixed. Geometrically, we may represent P_i(x_i1, x_i2, ..., x_in), i = 1, 2, ..., k, as k points in an n-dimensional space. √b_ii is the distance between the i-th point P_i and the origin O, while b_ij/√(b_ii b_jj) is the cosine of the angle between the vectors OP_i and OP_j. Fixing the b_ij, i, j = 1, 2, ..., m−1, means fixing the relative positions of the vectors OP_1, OP_2, ..., OP_{m−1}. The vector OP_m is free to vary in such a way that

(i)   b_{mi} < \sum_{\alpha=1}^{n} x_{m\alpha}x_{i\alpha} < b_{mi} + db_{mi}, \qquad i = 1, 2, \ldots, m,
and we wish to find the volume of the region over which P_m is free to vary. If n = m, we have as many vectors as dimensions and we can find our volume element by making the transformation

b_{mi} = \sum_{\alpha=1}^{m} x_{m\alpha}x_{i\alpha} \qquad (i = 1, 2, \ldots, m).

The Jacobian is

(j)   \frac{\partial(b_{m1}, \ldots, b_{mm})}{\partial(x_{m1}, \ldots, x_{mm})} = \begin{vmatrix} x_{11} & x_{12} & \cdots & x_{1m}\\ x_{21} & x_{22} & \cdots & x_{2m}\\ \vdots & & & \vdots\\ 2x_{m1} & 2x_{m2} & \cdots & 2x_{mm} \end{vmatrix}.
The absolute value of the determinant |x_ij| is the volume of the parallelotope based on the edges OP_1, OP_2, ..., OP_m. By taking the positive square root of the square of this determinant we may overcome the difficulty of sign. Thus the square of |x_ij| is the Gram determinant |b_ij|, i, j = 1, 2, ..., m, and hence the absolute value of the Jacobian (j) is 2√|b_ij|. Therefore, we have

(k)   \prod_{\alpha=1}^{m} dx_{m\alpha} = \frac{1}{2\sqrt{|b_{ij}|}}\,\prod_{i=1}^{m} db_{mi}.

Hence the differential element on the right in (k), obtained by taking all values of x_mα for which b_mi < Σ_α x_mα x_iα < b_mi + db_mi, is a function of the volume of the parallelotope and of the differentials db_mi of the b_mi.
It can be shown that √|b_ij| is the volume of the parallelotope T_m based on the edges OP_1, OP_2, ..., OP_m, for any number of dimensions n ≥ m.* If n exceeds m, then P_m is free to vary within an (n−m+1)-dimensional spherical shell, as will be noted by examining the inequalities in (i). One of these inequalities (i = m) represents an n-dimensional spherical shell of thickness db_mm, the remaining inequalities representing pairs of parallel (n−1)-dimensional planes, where in general no two pairs are parallel to each other. The volume included between any arbitrary pair of planes, e.g.

b_{mi} < \sum_\alpha x_{m\alpha}x_{i\alpha} < b_{mi} + db_{mi} \qquad (i < m),

is an n-dimensional slab of thickness db_mi/√b_ii. The intersection of the (m−1) pairs of (n−1)-dimensional planes and the n-dimensional spherical shell yields an (n−m+1)-dimensional spherical shell. Now the inner surface of this shell (or any spherical surface concentric with the inner surface) is perpendicular to the differentials db_m1, db_m2, ..., db_mm. This is evident upon examining the manner in which the (n−m+1)-dimensional spherical shell mentioned above is obtained as the common intersection of the m−1 parallel pairs of (n−1)-dimensional planes and the n-dimensional

*For example, see D. M. Y. Sommerville, An Introduction to the Geometry of n Dimensions, Methuen, London (1929), Chapter 8.
There is also another geometrical interpretation of |b_ij| for any n ≥ m, which is of considerable interest. The x_iα (i = 1, 2, ..., m; α = 1, 2, ..., n) may be regarded as n points P_α (α = 1, 2, ..., n) in m dimensions. If we take any m of these n points, say P_{α_r}(x_{iα_r}, i = 1, 2, ..., m), r = 1, 2, ..., m, together with the origin as the (m+1)-st point, then the square of the volume of the parallelotope based on OP_{α_1}, OP_{α_2}, ..., OP_{α_m} is given by |x_{iα_r}|². This follows from the discussion between (i) and (j). Now there are _nC_m ways of choosing m points from the n points P_α, and hence there are _nC_m parallelotopes which can be formed in a manner similar to that discussed above. It can be shown that |b_ij| = Σ' |x_{iα_r}|², where Σ' denotes summation over all _nC_m parallelotopes thus formed. The proof of this follows by mathematical induction, increasing the number of points from m to n successively by unity. In the case m = 1, we have n points in one dimension and |b_ij| = b_11 = Σ_α x_1α², the sum of squares of the distances of each point from the origin. In the general case, |b_ij| is the sum of squares of the volumes of all _nC_m m-dimensional parallelotopes which can be constructed from the n given points, using the origin as one vertex in each parallelotope. |b_ij| may be referred to as the generalized sum of squares.
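This identity is the Cauchy-Binet formula, and it is easy to verify numerically. The sketch below (mine, not from the text) checks |b_ij| against the sum of squared parallelotope volumes for a small integer example.

```python
from itertools import combinations

def det(mat):
    """Integer determinant by cofactor expansion (adequate for tiny matrices)."""
    if len(mat) == 1:
        return mat[0][0]
    return sum((-1) ** j * mat[0][j] * det([row[:j] + row[j + 1:] for row in mat[1:]])
               for j in range(len(mat)))

# m = 2 variates observed at n = 4 points; row i is (x_i1, ..., x_in).
x = [[1, 2, 0, -1],
     [3, 1, 1, 2]]
m, n = len(x), len(x[0])

# Gram determinant |b_ij| with b_ij = sum_a x_ia x_ja
b = [[sum(x[i][a] * x[j][a] for a in range(n)) for j in range(m)] for i in range(m)]
gram = det(b)

# Sum of squared parallelotope volumes over all nCm choices of m columns
total = sum(det([[x[i][a] for a in cols] for i in range(m)]) ** 2
            for cols in combinations(range(n), m))
print(gram, total)   # both 81 for this example
```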
spherical shell. Therefore, the rectangular volume element db_m1 db_m2 ⋯ db_mm is perpendicular to the inner surface of the (n−m+1)-dimensional spherical shell. The thickness of this shell is given by the differential element (1/(2√|b_ij|)) Π db_mi in (k). Therefore, by multiplying this thickness by the inner surface content of our (n−m+1)-dimensional shell, we obtain the volume of the shell to terms of order Π db_mi. The radius of this inner surface is equal to the distance h from P_m to the (m−1)-dimensional space formed by OP_1, OP_2, ..., OP_{m−1}. This is, perhaps, seen most readily by noting that the inner surface of our shell is obtained by taking the intersections of Σ_α x_mα x_iα = b_mi, i = 1, 2, ..., m (we are assuming, of course, that all db_mi are > 0). The center of the sphere having this surface must clearly lie in the intersection of the (m−1) (n−1)-dimensional planes Σ_α x_mα x_iα = b_mi, i = 1, 2, ..., m−1. This intersection point lies in the space of the vectors OP_1, OP_2, ..., OP_{m−1}, and the line between this point and P_m is perpendicular to each of the first m−1 vectors, which is equivalent to the statement that the center of the (n−m+1)-dimensional shell is at the point where the perpendicular from P_m intersects the (m−1)-dimensional space formed by the remaining m−1 vectors, OP_1, OP_2, ..., OP_{m−1}. The volume of T_m, the parallelotope formed from OP_1, OP_2, ..., OP_m, is √|b_ij| (i, j = 1, 2, ..., m) = V_m, say, and that of T_{m−1}, the parallelotope formed from OP_1, OP_2, ..., OP_{m−1}, is √|b_ij| (i, j = 1, 2, ..., m−1) = V_{m−1}, say. Using T_{m−1} as the base of T_m and h as the height, we must have V_m = h V_{m−1}, or h = V_m / V_{m−1}.
Now the volume of an n-dimensional sphere of radius r is

(l)   $\int\cdots\int_{x_1^2+\cdots+x_n^2\,\le\, r^2}dx_1\cdots dx_n = \dfrac{\pi^{\frac n2}\,r^{n}}{\Gamma\!\left(\frac n2+1\right)},$

and the surface content of the sphere is obtained by taking the derivative of this ex-
pression with respect to r, which is found to be

(m)   $\dfrac{2\pi^{\frac n2}\,r^{n-1}}{\Gamma\!\left(\frac n2\right)}.$

The integral in (l) may be readily evaluated by integrating immediately with
respect to $x_n$, then introducing polar-type substitutions in the remaining variables
and integrating with respect to the appropriate θ at each stage.
The surface content of the inner surface of our spherical shell is therefore
(n)   $\dfrac{2\pi^{\frac{n-m+1}{2}}}{\Gamma\!\left(\frac{n-m+1}{2}\right)}\left(\dfrac{V_m}{V_{m-1}}\right)^{n-m},$

and the content of the spherical shell is obtained by multiplying expression (n) by the
thickness $\frac{1}{2V_m}\prod_{i=1}^{m}db_{mi}$. Therefore, we finally obtain as the expression for the function
in (h),

(o)   $\dfrac{\pi^{\frac{n-m+1}{2}}}{\Gamma\!\left(\frac{n-m+1}{2}\right)}\,\dfrac{V_m^{\,n-m-1}}{V_{m-1}^{\,n-m}}\prod_{i=1}^{m}db_{mi}.$
Letting m take on the values 1, 2, ..., k in (o) and multiplying the results, we obtain the
following expression for (g):

(p)   $\dfrac{\pi^{\frac{kn}{2}-\frac{k(k-1)}{4}}}{\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m+1}{2}\right)}\,|b_{ij}|^{\frac{n-k-1}{2}}\prod_{i\le j}db_{ij},$

which is the value of $\int_R\prod_{i,\alpha}dx_{i\alpha}$ in (f) to terms of order $\prod_{i\le j}db_{ij}$. We therefore finally
obtain the Wishart distribution:
(q)   $w(b_{ij};A_{ij})\prod_{i\le j}db_{ij} = \dfrac{\left(\frac{|A_{ij}|}{2^k}\right)^{\frac n2}|b_{ij}|^{\frac{n-k-1}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m+1}{2}\right)}\; e^{-\frac12\sum_{i,j=1}^{k}A_{ij}b_{ij}}\prod_{i\le j}db_{ij},$

which is defined over the region in the $b_{ij}$ space for which $||b_{ij}||$ is positive semi-
definite, that is, over all values of the $b_{ij}$ for which $|b_{ij}|$ and all principal minors of
all orders are ≥ 0. In order for the distribution to exist it is clear that n+1−k > 0.
Since $\int w(b_{ij};A_{ij})\prod_{i\le j}db_{ij} = 1$, where the integration is taken over the space of the
$b_{ij}$, it is clear that

(r)   $\int |b_{ij}|^{\frac{n-k-1}{2}}\; e^{-\frac12\sum_{i,j}A_{ij}b_{ij}}\prod_{i\le j}db_{ij} = \dfrac{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m+1}{2}\right)}{\left(\frac{|A_{ij}|}{2^k}\right)^{\frac n2}}.$
Replacing $A_{ij}$ by $A_{ij}-2\theta_{ij}$ ($\theta_{ij}=\theta_{ji}$) in (r), then multiplying the result by

$\dfrac{\left(\frac{|A_{ij}|}{2^k}\right)^{\frac n2}}{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m+1}{2}\right)},$

we obtain the m. g. f. of the $b_{ii}$ and $2b_{ij}$ (i<j), which has the value

(s)   $\dfrac{|A_{ij}|^{\frac n2}}{|A_{ij}-2\theta_{ij}|^{\frac n2}}.$

The m. g. f. of the $\sum_{\alpha}x^2_{i\alpha}$ and $2\sum_{\alpha}x_{i\alpha}x_{j\alpha}$ (i<j), as determined from (c) by multiplying (c) by

$e^{\sum_{i\le j}\theta_{ij}b_{ij}}$

and integrating over the entire kn-dimensional space of the x's, is also given by (s).
Therefore, if one were given the function (q) in advance, one could argue by the multi-
variate analogue of Theorem (B), §2.81, that it is the distribution function of the
$b_{ij}$, where the p. d. f. of the $x_{i\alpha}$ is given by (c).
The Wishart distribution (q) may be regarded as a generalization of the χ²-
distribution to the case of vectors with k components. In fact, for k = 1, the quantity
$A_{11}b_{11}$ is distributed according to the χ²-distribution with n degrees of freedom. In this
case $b_{11}$ is the sum of squares of the n sample values of $x_1$, while in the k-variate case
$b_{ii}$ is the sum of squares of the n sample values of the $x_i$ (the i-th component of the
vector $(x_1,x_2,\dots,x_k)$) and $b_{ij}$ (i≠j) is the inner product or bilinear form between the n
sample values of the $x_i$ and $x_j$. As in the case of the χ²-distribution, the Wishart dis-
tribution has a reproductive property to be considered in the next section.
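The k = 1 specialization can be checked directly against the χ² law. A minimal sketch (Python; scipy is assumed available, and the particular values of A and n are arbitrary) compares the density (q) for k = 1 with the density implied by the statement that $A_{11}b_{11}$ is χ² with n degrees of freedom:

```python
import math
from scipy.stats import chi2

# Density (q) specialized to k = 1:
# w(b; A) = (A/2)^{n/2} b^{(n-2)/2} e^{-Ab/2} / Gamma(n/2).
def wishart_k1_pdf(b, A, n):
    return (A / 2) ** (n / 2) * b ** ((n - 2) / 2) * math.exp(-A * b / 2) / math.gamma(n / 2)

A, n = 2.5, 7   # A plays the role of 1/sigma^2 in the one-variable case
for b in (0.5, 1.0, 3.0):
    # If A*b is chi-square with n d.f., the density of b is A * chi2.pdf(A*b, n).
    assert math.isclose(wishart_k1_pdf(b, A, n), A * chi2.pdf(A * b, n), rel_tol=1e-9)
print("k = 1 Wishart density agrees with the chi-square law")
```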
11.2 Reproductive Property of the Wishart Distribution.
The reproductive property of Wishart distributions is very useful in multivari-
ate statistical theory, and it may be stated in the following theorem:
Theorem (A). Let $b^{(1)}_{ij}, b^{(2)}_{ij},\dots,b^{(p)}_{ij}$ (i, j = 1,2,...,k) be p systems of random
variables distributed independently according to Wishart distributions (p. d. f.'s)

$w_{n_t,k}(b^{(t)}_{ij};A_{ij}),\qquad (t = 1,2,\dots,p),$

respectively. Let $b_{ij} = \sum_{t=1}^{p}b^{(t)}_{ij}$ and $n = \sum_{t=1}^{p}n_t$. Then the $b_{ij}$ are distributed according to
the Wishart p. d. f. $w_{n,k}(b_{ij};A_{ij})$.
To prove this theorem, we determine the m. g. f. $\phi(\theta_{ij})$, ($\theta_{ij}=\theta_{ji}$), of the $b_{ii}$
and $2b_{ij}$ (i<j). We have
(a)   $\phi(\theta_{ij}) = E\!\left(e^{\sum_{i\le j}\theta_{ij}b_{ij}}\right) = \prod_{t=1}^{p}E\!\left(e^{\sum_{i\le j}\theta_{ij}b^{(t)}_{ij}}\right).$

But

(b)   $E\!\left(e^{\sum_{i\le j}\theta_{ij}b^{(t)}_{ij}}\right) = \dfrac{|A_{ij}|^{\frac{n_t}{2}}}{|A_{ij}-2\theta_{ij}|^{\frac{n_t}{2}}},$

and therefore

$\phi(\theta_{ij}) = \dfrac{|A_{ij}|^{\frac n2}}{|A_{ij}-2\theta_{ij}|^{\frac n2}},$

which is the m. g. f. for the Wishart p. d. f. $w_{n,k}(b_{ij};A_{ij})$,
which we conclude, by the multivariate analogue of Theorem (B), §2.81, to be the dis-
tribution of the $b_{ij}$ $\left(=\sum_{t}b^{(t)}_{ij}\right)$.
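The algebraic kernel of the reproductive property is simply that Gram matrices add when independent samples from the same population are pooled. A small numerical sketch (numpy; the sample sizes and seed are ours):

```python
import numpy as np

# Two independent samples of sizes n1, n2 from the same k-variate
# population; b^{(t)}_{ij} is the Gram matrix of sample t.
rng = np.random.default_rng(1)
k, n1, n2 = 3, 6, 9
X1 = rng.standard_normal((k, n1))
X2 = rng.standard_normal((k, n2))

B1, B2 = X1 @ X1.T, X2 @ X2.T   # Wishart with n1 and n2 d.f. respectively
pooled = np.hstack([X1, X2])    # one sample of n1 + n2 columns

# The sum b^{(1)}_{ij} + b^{(2)}_{ij} is exactly the Gram matrix of the
# pooled sample, which is Wishart with n1 + n2 degrees of freedom.
print(np.allclose(B1 + B2, pooled @ pooled.T))
```

This exhibits why the degrees of freedom add in Theorem (A): the summed system of second-order moments is itself the moment system of a single sample of size $n_1+n_2$.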
11.3 The Independence of Means and Second Order Moments in Samples from a
Normal Multivariate Population
Suppose $O_n$: $(x_{i\alpha},\ i = 1,2,\dots,k;\ \alpha = 1,2,\dots,n)$ is a sample from the normal multivar-
iate population having p. d. f.

(a)   $f(x_1,x_2,\dots,x_k) = \dfrac{|A_{ij}|^{\frac12}}{(2\pi)^{\frac k2}}\; e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)}.$

The p. d. f. of the sample is

(b)   $\dfrac{|A_{ij}|^{\frac n2}}{(2\pi)^{\frac{kn}{2}}}\; e^{-\frac12\sum_{i,j=1}^{k}A_{ij}c_{ij}},$

where $c_{ij} = \sum_{\alpha=1}^{n}(x_{i\alpha}-a_i)(x_{j\alpha}-a_j)$. We may write

(c)   $c_{ij} = a_{ij} + n(\bar x_i-a_i)(\bar x_j-a_j),$

where $a_{ij} = \sum_{\alpha=1}^{n}(x_{i\alpha}-\bar x_i)(x_{j\alpha}-\bar x_j)$ and $\bar x_i = \frac1n\sum_{\alpha=1}^{n}x_{i\alpha}$.
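The decomposition (c) is an exact algebraic identity and can be verified on any sample. A minimal sketch (numpy; the means and seed are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 3, 8
a = np.array([0.5, -1.0, 2.0])   # illustrative population means a_i
X = rng.standard_normal((k, n)) + a[:, None]

xbar = X.mean(axis=1)
# c_ij: product sums about the population means a_i
C = (X - a[:, None]) @ (X - a[:, None]).T
# a_ij: product sums about the sample means
A_mat = (X - xbar[:, None]) @ (X - xbar[:, None]).T

# c_ij = a_ij + n (xbar_i - a_i)(xbar_j - a_j)
print(np.allclose(C, A_mat + n * np.outer(xbar - a, xbar - a)))
```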
The $a_{ij}$ are distributed according to the Wishart distribution (q), §11.1, with
n replaced by n−1. It was shown in §5.12 for the case k = 2 that the $\bar x_i$ are distributed
according to the normal bivariate law (d), §5.12, and it was remarked that in the general
case, the distribution of the $\bar x_i$ is given by (e), §5.12. The proof of (e), §5.12, may be
carried out by evaluating the m. g. f. of the $(\bar x_i-a_i)$, i. e.

(d)   $E\!\left(e^{\sum_i\theta_i(\bar x_i-a_i)}\right) = \int\cdots\int e^{\sum_i\theta_i(\bar x_i-a_i)}\, f(x_{i\alpha})\prod_{i,\alpha}dx_{i\alpha},$

where $f(x_{i\alpha})$ is given by (b), the integration being over the entire kn-dimensional space
of the $x_{i\alpha}$. The evaluation of this integral may be carried out as an extension of the
case of k = 2, §5.12. The details are left to the reader. In order to show that the $a_{ij}$
have the Wishart distribution with n replaced by n−1, it is sufficient to show that the
m. g. f. of the $a_{ii}$ and $2a_{ij}$ (i<j) is $|A_{ij}|^{\frac{n-1}{2}}|A_{ij}-2\theta_{ij}|^{-\frac{n-1}{2}}$. The problem of doing
this is a direct extension to the k-variate case of the procedure followed for k = 2 in
§5.5. We shall have to leave the details to the reader.
Just as in the 1- and 2-variable cases discussed in §5.6, the $a_{ij}$ and $\bar x_i$
are independently distributed systems. A fairly direct verification of this, although
tedious, is to evaluate the joint m. g. f. of the $a_{ij}$ and $\bar x_i$ and note that it factors.
11.4 Hotelling's Generalized "Student" Test
Suppose a sample $O_n$ is drawn from a normal multivariate population with distri-
bution (a) in §11.3, and that it is desired to test the hypothesis $H(a_i=a_{i0})$ that the $a_i$
have specified values $a_{i0}$ (i = 1,2,...,k), no matter what values the $A_{ij}$ may have. This
hypothesis may be specified as follows:

(a)   Ω: the space of the $A_{ij}$ such that $||A_{ij}||$ is positive definite,
      and $-\infty < a_i < +\infty$, i = 1,2,...,k.
      ω: the subspace of Ω for which $a_i = a_{i0}$, i = 1,2,...,k.

It will be noted that this is the k-variate analogue of the "Student" statistical hypoth-
esis discussed in §7.2 for one variable, which is simply the hypothesis that a sample from
a normal population comes from one having a specified mean, no matter what the variance
may be.
The likelihood function for testing the hypothesis $H(a_i=a_{i0})$ is given by (b)
in §11.3.
Maximizing the likelihood function for variations of the $A_{ij}$ and $a_i$ over Ω,
we find

(b)   $\hat a_i = \bar x_i,\qquad ||\hat A_{ij}|| = n\,||a_{ij}||^{-1},$

and hence the maximum of the likelihood for variations of the parameters over Ω is

(c)   $\dfrac{e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\left|\frac{a_{ij}}{n}\right|^{\frac n2}}.$

Similarly, the maximum of the likelihood for variations of the parameters over ω
(i. e. for variations of the $A_{ij}$ with $a_i = a_{i0}$) is found to be

(d)   $\dfrac{e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\left|\frac{c_{0ij}}{n}\right|^{\frac n2}},$

where $c_{0ij} = c_{ij}$ in (b), §11.3, with $a_i = a_{i0}$.
The likelihood ratio for testing $H(a_i=a_{i0})$ is the ratio of expression (d) to
expression (c), i. e.

(e)   $\lambda = \left(\dfrac{|a_{ij}|}{|c_{0ij}|}\right)^{\frac n2}.$
Clearly, we may use $\lambda^{2/n} = Y$, say, as a test criterion for $H(a_i=a_{i0})$, since it is a single-
valued function of λ. To complete the derivation of our test, we must determine the dis-
tribution of Y when $H(a_i=a_{i0})$ is true. We shall obtain this distribution by first finding
its moments. Now, we know from §11.1 that the joint distribution of the $c_{0ij}$ is the
Wishart distribution

(f)   $w_{n,k}(c_{0ij};A_{ij}) = \dfrac{\left(\frac{|A_{ij}|}{2^k}\right)^{\frac n2}|c_{0ij}|^{\frac{n-k-1}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m+1}{2}\right)}\; e^{-\frac12\sum_{i,j}A_{ij}c_{0ij}}.$

The g-th moment of $|c_{0ij}|$ is obtained in the following way. Since the integral of the
function (f) over the space S of the $c_{0ij}$ is unity, we have
(g)   $\int_S |c_{0ij}|^{\frac{n-k-1}{2}}\; e^{-\frac12\sum_{i,j}A_{ij}c_{0ij}}\prod_{i\le j}dc_{0ij} = \dfrac{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m+1}{2}\right)}{\left(\frac{|A_{ij}|}{2^k}\right)^{\frac n2}}.$

Replacing n by n+2g in (g), then multiplying by

$\dfrac{\left(\frac{|A_{ij}|}{2^k}\right)^{\frac n2}}{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m+1}{2}\right)},$

we obtain an expression on the left which defines $E(|c_{0ij}|^g)$, and its value is given on the
right. That is,

(h)   $E(|c_{0ij}|^g) = \left(\dfrac{2^k}{|A_{ij}|}\right)^{g}\prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-m+1}{2}+g\right)}{\Gamma\!\left(\frac{n-m+1}{2}\right)}.$

But the $c_{0ij}$ are functions of the $a_{ij}$
and $\bar x_i$, since

(i)   $c_{0ij} = a_{ij} + n(\bar x_i-a_{i0})(\bar x_j-a_{j0}).$

Therefore, the expectation in (h) may also be written as an integral over the joint dis-
tribution of the $a_{ij}$ (Wishart, with n−1 degrees of freedom) and the $\bar x_i$ (normal); that is,

(j)   $\int |a_{ij}+n(\bar x_i-a_{i0})(\bar x_j-a_{j0})|^{g}\; w_{n-1,k}(a_{ij};A_{ij})\, f(\bar x_i)\prod_{i\le j}da_{ij}\prod_i d\bar x_i = \left(\dfrac{2^k}{|A_{ij}|}\right)^{g}\prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-m+1}{2}+g\right)}{\Gamma\!\left(\frac{n-m+1}{2}\right)},$

where $f(\bar x_i)$ denotes the joint normal p. d. f. of the $\bar x_i$.
Dividing both members of (j) by the expression in [ ] — the constant factor of
$w_{n-1,k}(a_{ij};A_{ij})$ — then replacing n by n+2h, except in
the distribution of the $\bar x_i$ (the n's here being easily removable by changing variables
$\sqrt n(\bar x_i-a_{i0}) = y_i$, say), then multiplying the resulting equation by [ ], we obtain, as the first
member, an expression defining $E(|c_{0ij}|^{g}|a_{ij}|^{h})$, and its value is given by the second
member; thus

(k)   $E(|c_{0ij}|^{g}|a_{ij}|^{h}) = \left(\dfrac{2^k}{|A_{ij}|}\right)^{g+h}\prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-m}{2}+h\right)\Gamma\!\left(\frac{n-m+1}{2}+g+h\right)}{\Gamma\!\left(\frac{n-m}{2}\right)\Gamma\!\left(\frac{n-m+1}{2}+h\right)}.$
Clearly this moment will exist for all integers g and h for which all arguments of the
gamma functions are > 0. Setting g = −h, we obtain as the h-th moment* of Y,
(l)   $E(Y^h) = \dfrac{\Gamma\!\left(\frac n2\right)\Gamma\!\left(\frac{n-k}{2}+h\right)}{\Gamma\!\left(\frac{n-k}{2}\right)\Gamma\!\left(\frac n2+h\right)}.$

This moment may be written as

(m)   $E(Y^h) = \dfrac{\Gamma\!\left(\frac n2\right)}{\Gamma\!\left(\frac{n-k}{2}\right)\Gamma\!\left(\frac k2\right)}\int_0^1 x^{\frac{n-k}{2}+h-1}(1-x)^{\frac k2-1}\,dx.$

Therefore the h-th moment of Y (h = 0,1,2,...) is identical with the h-th moment of a
variable x having probability element

(n)   $\dfrac{\Gamma\!\left(\frac n2\right)}{\Gamma\!\left(\frac{n-k}{2}\right)\Gamma\!\left(\frac k2\right)}\, x^{\frac{n-k}{2}-1}(1-x)^{\frac k2-1}\,dx.$

It follows from Theorem (A), §2.76, on the uniqueness of distributions from moments that
Y is distributed according to the probability law (n).
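The agreement of (l) with the Beta integral in (m) can be confirmed numerically with log-gamma arithmetic and quadrature (Python; scipy assumed available; n = 12, k = 4 are arbitrary choices of ours):

```python
from math import lgamma, exp, isclose
from scipy.integrate import quad

n, k = 12, 4

def moment_l(h):
    # E(Y^h) as given by (l)
    return exp(lgamma(n / 2) + lgamma((n - k) / 2 + h)
               - lgamma((n - k) / 2) - lgamma(n / 2 + h))

# normalizing constant of the Beta law (n)
const = exp(lgamma(n / 2) - lgamma((n - k) / 2) - lgamma(k / 2))
for h in range(5):
    # the Beta integral appearing in (m)
    val, _ = quad(lambda x: x ** ((n - k) / 2 + h - 1) * (1 - x) ** (k / 2 - 1), 0, 1)
    assert isclose(moment_l(h), const * val, rel_tol=1e-8)
print("(l) agrees with the Beta integral (m)")
```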
Making use of the fact that

$c_{0ij} = a_{ij} + n(\bar x_i-a_{i0})(\bar x_j-a_{j0}),$

and letting $y_i = \sqrt n\,(\bar x_i-a_{i0})$, we may write

(o)   $|c_{0ij}| = -\begin{vmatrix} -1 & y_1 & y_2 & \cdots & y_k \\ 0 & c_{011} & c_{012} & \cdots & c_{01k} \\ \vdots & \vdots & & & \vdots \\ 0 & c_{0k1} & c_{0k2} & \cdots & c_{0kk} \end{vmatrix},$

as may be seen by expanding the determinant by its first column.
*For more applications of the foregoing technique of finding moments of ratios of deter-
minants, see S. S. Wilks, "Certain Generalizations in the Analysis of Variance",
Biometrika, Vol. 24 (1932), pp. 471-494.
Multiplying the first row by $-y_1$ and adding to the second; multiplying the first row by
$-y_2$ and adding to the third; and so on, we may write the determinant as

(p)   $|c_{0ij}| = -\begin{vmatrix} -1 & y_1 & y_2 & \cdots & y_k \\ y_1 & a_{11} & a_{12} & \cdots & a_{1k} \\ y_2 & a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & & \vdots \\ y_k & a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix}.$

It follows from the argument leading to expression (k), §3.23, that the expression (p) may
be written as $|a_{ij}|\left[1+\sum_{i,j=1}^{k}a^{ij}y_iy_j\right]$, where $a^{ij}$ is the element in the i-th row and
j-th column of $||a_{ij}||^{-1}$; and substituting the value of $y_i$ we are finally able
to write
(q)   $|c_{0ij}| = |a_{ij}|\left(1+\dfrac{T^2}{n-1}\right),$

where T² is Hotelling's* Generalized Student Ratio, which can be written down explicitly
in terms of the $a^{ij}$ and the $\bar x_i - a_{i0}$ in an obvious way. Hence

(r)   $Y = \dfrac{|a_{ij}|}{|c_{0ij}|} = \dfrac{1}{1+\dfrac{T^2}{n-1}},$

and the distribution of T² can be found at once by applying the transformation (r) to the
probability element (n) (with x replaced by Y). The result is

$\dfrac{\Gamma\!\left(\frac n2\right)}{\Gamma\!\left(\frac{n-k}{2}\right)\Gamma\!\left(\frac k2\right)}\,\dfrac{\left(\frac{T^2}{n-1}\right)^{\frac k2-1}}{\left(1+\frac{T^2}{n-1}\right)^{\frac n2}}\,\dfrac{dT^2}{n-1}.$
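The determinant identity (q), and hence the criterion (r), can be checked numerically on any sample. The sketch below (numpy; the data, seed, and hypothesized means are ours) computes T² directly and compares $|c_{0ij}|$ with $|a_{ij}|(1+T^2/(n-1))$:

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 3, 10
a0 = np.zeros(k)                  # hypothesized means a_{i0}
X = rng.standard_normal((k, n))

xbar = X.mean(axis=1)
A_mat = (X - xbar[:, None]) @ (X - xbar[:, None]).T   # a_ij
C0 = (X - a0[:, None]) @ (X - a0[:, None]).T          # c_{0ij}

# T^2 = n(n-1) * sum over i,j of a^{ij}(xbar_i - a_{i0})(xbar_j - a_{j0})
d = xbar - a0
T2 = n * (n - 1) * d @ np.linalg.solve(A_mat, d)

# identity (q): |c_0| = |a| (1 + T^2/(n-1)); hence Y = |a|/|c_0| as in (r)
lhs = np.linalg.det(C0)
rhs = np.linalg.det(A_mat) * (1 + T2 / (n - 1))
print(np.isclose(lhs, rhs))
```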
11.5 The Hypothesis of Equality of Means in Multivariate Normal Populations
Suppose $O_{n_t}$: $(x^t_{i\alpha},\ i = 1,2,\dots,k;\ \alpha = 1,2,\dots,n_t;\ t = 1,2,\dots,p)$ are p samples from the
normal k-variate populations

(a)   $\dfrac{|A_{ij}|^{\frac12}}{(2\pi)^{\frac k2}}\; e^{-\frac12\sum_{i,j=1}^{k}A_{ij}(x_i-a^t_i)(x_j-a^t_j)},$

and that it is desired to test the following hypothesis:
*H. Hotelling, "The Generalization of Student's Ratio", Annals of Math. Stat., Vol. 2
(1931), pp. 359-378.
(b)   Ω: the space of the $A_{ij}$ and $a^t_i$ such that $||A_{ij}||$ is positive definite
      and $-\infty < a^t_i < \infty$, i = 1,2,...,k; t = 1,2,...,p.
      ω: the subspace of Ω for which $a^1_i = a^2_i = \cdots = a^p_i = a_i$,
      where $-\infty < a_i < \infty$, i = 1,2,...,k.

Denoting this hypothesis by $H(a^1_i = a^2_i = \cdots = a^p_i)$, it is simply the hypothesis that the samples
come from k-variate normal populations having identical sets of means, given that they
come from k-variate normal populations with the same variance-covariance matrix. It
should be noted that this hypothesis is the multivariate analogue of that treated in §9.1.
Let

(c)   $a^t_{ij} = \sum_{\alpha=1}^{n_t}(x^t_{i\alpha}-\bar x^t_i)(x^t_{j\alpha}-\bar x^t_j),\qquad \bar x^t_i = \frac{1}{n_t}\sum_{\alpha=1}^{n_t}x^t_{i\alpha},$

and

(d)   $a_{ij} = \sum_{t=1}^{p}a^t_{ij},$

where

(e)   $a'_{ij} = \sum_{t=1}^{p}\sum_{\alpha=1}^{n_t}(x^t_{i\alpha}-\bar x_i)(x^t_{j\alpha}-\bar x_j)$

and

(f)   $\bar x_i = \frac1n\sum_{t=1}^{p}n_t\bar x^t_i,\qquad n = \sum_{t=1}^{p}n_t.$

The $a'_{ij}/n$ are the second-order product moments in the pool of all samples, and similarly
$\bar x_i$ is the mean of the i-th variate in the pool of all samples.
The likelihood function for all samples is

(g)   $\dfrac{|A_{ij}|^{\frac n2}}{(2\pi)^{\frac{kn}{2}}}\; e^{-\frac12\sum_{i,j=1}^{k}A_{ij}\sum_{t=1}^{p}\sum_{\alpha=1}^{n_t}(x^t_{i\alpha}-a^t_i)(x^t_{j\alpha}-a^t_j)}.$
Maximizing the likelihood function for variations of all parameters over Ω, we obtain

$\hat a^t_i = \bar x^t_i,\qquad ||\hat A_{ij}|| = n\,||a_{ij}||^{-1},$

and the maximum of the likelihood turns out to be

(h)   $\dfrac{e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\left|\frac{a_{ij}}{n}\right|^{\frac n2}}.$

Similarly, maximizing the likelihood function for variations of the parameters over ω,
we obtain

(i)   $\hat a_i = \bar x_i,\qquad ||\hat A_{ij}|| = n\,||a'_{ij}||^{-1},$

and the maximum of the function turns out to be

(j)   $\dfrac{e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\left|\frac{a'_{ij}}{n}\right|^{\frac n2}}.$

Hence the likelihood ratio for testing $H(a^1_i=a^2_i=\cdots=a^p_i)$ is the ratio of (j) to (h), i. e.

(k)   $\lambda = \left(\dfrac{|a_{ij}|}{|a'_{ij}|}\right)^{\frac n2}.$
(k)
Again we may use A ' n = Z, say, as our teat criterion. To find the diatrlbution
of Z, we proceed aa In 11.4 by the method of momenta. Noting that the a^. are diatrl-
buted according to the Wlahart diatrlbution w 1 v( a jM A ii) we have, aimllar to (h),
511 .k,
(^ + g>
(1) E(|a, ,| g ) = -J
Now it may be verified that

(m)   $a'_{ij} = a_{ij} + m_{ij},$

where $m_{ij} = \sum_{t=1}^{p}n_t(\bar x^t_i-\bar x_i)(\bar x^t_j-\bar x_j)$. Since the $a_{ij}$ are functions only of the $a^t_{ij}$, the
$a^t_{ij}$ and $\bar x^t_i$ being independently distributed systems, it follows that the $a_{ij}$ and the $\bar x^t_i$
are independently distributed systems. The $a^t_{ij}$ are distributed according to Wishart dis-
tributions $w_{n_t-1,k}(a^t_{ij};A_{ij})$, t = 1,2,...,p, and it follows from the reproductive property of
the Wishart distribution that the $a_{ij}$ are distributed according to $w_{n-p,k}(a_{ij};A_{ij})$.
Therefore, by using the joint distribution of the $a_{ij}$ and $\bar x^t_i$ and following steps similar
to those yielding (k) in §11.4, we find
(n)   $E(|a_{ij}|^{h}|a'_{ij}|^{g}) = \left(\dfrac{2^k}{|A_{ij}|}\right)^{g+h}\prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-p-m+1}{2}+h\right)\Gamma\!\left(\frac{n-m}{2}+g+h\right)}{\Gamma\!\left(\frac{n-p-m+1}{2}\right)\Gamma\!\left(\frac{n-m}{2}+h\right)}.$

The h-th moment of Z is given by setting g = −h. We find

(o)   $E(Z^h) = \prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-p-m+1}{2}+h\right)\Gamma\!\left(\frac{n-m}{2}\right)}{\Gamma\!\left(\frac{n-p-m+1}{2}\right)\Gamma\!\left(\frac{n-m}{2}+h\right)}.$

It should be noted that for the case of two samples (i. e. p = 2), the h-th moment of Z
reduces to

(p)   $E(Z^h) = \dfrac{\Gamma\!\left(\frac{n-1}{2}\right)\Gamma\!\left(\frac{n-k-1}{2}+h\right)}{\Gamma\!\left(\frac{n-k-1}{2}\right)\Gamma\!\left(\frac{n-1}{2}+h\right)},$

and hence the distribution of Z in this case is the same as that of Y with n replaced by
n−1. In the two-sample case, it should be remembered that $n = n_1+n_2$, the sum of the
two sample numbers.
For the case of p = 3, the h-th moment of Z is

(q)   $E(Z^h) = \prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-m-2}{2}+h\right)\Gamma\!\left(\frac{n-m}{2}\right)}{\Gamma\!\left(\frac{n-m-2}{2}\right)\Gamma\!\left(\frac{n-m}{2}+h\right)}.$

Making use of the formula

(r)   $\Gamma(2x) = \dfrac{2^{2x-1}}{\sqrt\pi}\,\Gamma(x)\,\Gamma\!\left(x+\tfrac12\right),$

(q) reduces to

(s)   $E(Z^h) = \dfrac{\Gamma(n-2)\,\Gamma(n-k-2+2h)}{\Gamma(n-k-2)\,\Gamma(n-2+2h)},$

from which we infer the distribution of Z to be identical with that of x², where x is
distributed according to

(t)   $\dfrac{\Gamma(n-2)}{\Gamma(n-k-2)\,\Gamma(k)}\; x^{n-k-3}(1-x)^{k-1}\,dx.$

Setting $Z = x^2$, we find $dx = \tfrac12 Z^{-\frac12}\,dZ$, and hence the distribution of Z for the case of
three samples is

(u)   $\dfrac{\Gamma(n-2)}{2\,\Gamma(n-k-2)\,\Gamma(k)}\; Z^{\frac{n-k-2}{2}-1}\left(1-Z^{\frac12}\right)^{k-1}dZ.$
The distributions for p = 4 and 5 turn out to be relatively simple also.
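The reduction of (q) to (s) by the duplication formula (r) can be confirmed numerically with log-gamma arithmetic (Python; the values n = 15, k = 4 are arbitrary choices of ours):

```python
from math import lgamma, exp, isclose

# h-th moment of Z for p = 3 samples: the product form (q) and the
# reduced two-gamma form (s) obtained via the duplication formula (r).
def product_form(n, k, h):
    s = 0.0
    for m in range(1, k + 1):
        s += (lgamma((n - m - 2) / 2 + h) + lgamma((n - m) / 2)
              - lgamma((n - m - 2) / 2) - lgamma((n - m) / 2 + h))
    return exp(s)

def reduced_form(n, k, h):
    return exp(lgamma(n - 2) + lgamma(n - k - 2 + 2 * h)
               - lgamma(n - k - 2) - lgamma(n - 2 + 2 * h))

n, k = 15, 4
for h in range(4):
    assert isclose(product_form(n, k, h), reduced_form(n, k, h), rel_tol=1e-9)
print("(q) reduces to the two-gamma form (s)")
```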
11.6 The Hypothesis of Independence of Sets of Variates in a Normal Multi-
variate Population
Suppose $O_n$: $(x_{i\alpha},\ i = 1,2,\dots,k;\ \alpha = 1,2,\dots,n)$ is a sample from a normal multivariate
population with distribution (a) in §11.3. Let the variates $x_i$ be grouped into r groups
as follows: $G_1$: $(x_1,x_2,\dots,x_{k_1})$, $G_2$: $(x_{k_1+1},x_{k_1+2},\dots,x_{k_1+k_2})$, ..., $G_r$:
$(x_{k_1+\cdots+k_{r-1}+1},\dots,x_k)$, where $k = k_1+k_2+\cdots+k_r$. The problem we wish to consider is that
of deriving a test for the hypothesis that these groups of variates are mutually indepen-
dent, i. e. that $A_{ij} = 0$ for all i, j not belonging to the same group of variates. Let
$||A'_{ij}||$ denote the value of $||A_{ij}||$ when all $A_{ij}$ are 0 for i, j belonging to different
groups of variates. The hypothesis to be tested may then be specified as follows:
(a)   Ω: the space of the $A_{ij}$ and $a_i$ such that $||A_{ij}||$ is positive
      definite and $-\infty < a_i < +\infty$.
      ω: the subspace of Ω for which $||A_{ij}|| = ||A'_{ij}||$.

We denote this hypothesis by $H(||A_{ij}|| = ||A'_{ij}||)$. Maximizing the likelihood function
(b) in §11.3 for variations of the parameters over Ω, we find the maximum to be

(b)   $\dfrac{e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\left|\frac{a_{ij}}{n}\right|^{\frac n2}}.$

The maximum of the likelihood function for variations of the parameters under ω is

(c)   $\dfrac{e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\left|\frac{a'_{ij}}{n}\right|^{\frac n2}},$

where $a'_{ij} = a_{ij}$ if i, j belong to the same group of variates and $a'_{ij} = 0$ if i, j belong
to different groups of variates. Clearly $|a'_{ij}|$ is equal to the product of r mutually ex-
clusive principal minors $\prod_{u=1}^{r}|a_{i_u j_u}|$, the u-th minor being the determinant of all $a_{ij}$
associated with the u-th group of variates. Similarly $|A'_{ij}| = \prod_{u=1}^{r}|A_{i_u j_u}|$. The likeli-
hood ratio for testing $H(||A_{ij}|| = ||A'_{ij}||)$ is, therefore,

(d)   $\lambda = \left(\dfrac{|a_{ij}|}{|a'_{ij}|}\right)^{\frac n2}.$
Denoting $\lambda^{2/n}$ by W, which may clearly be used as the test criterion in place of λ, we de-
termine the distribution of W by the method of moments.
It should be noted that if we factor $\sqrt{a_{ii}}$ out of the i-th row and i-th column
(i = 1,2,...,k) of each of the two determinants $|a_{ij}|$ and $|a'_{ij}|$, and use the fact that
$r_{ij} = a_{ij}/\sqrt{a_{ii}a_{jj}}$ is the sample correlation coefficient between the i-th and j-th variates,
we may write $W = |r_{ij}|/|r'_{ij}|$,
where $r_{ii} = 1$, and $r'_{ij} = r_{ij}$ if i and j both belong to the same group of variates, and
$r'_{ij} = 0$ if i and j belong to different groups of variates.
To find the moments of W, let us divide the $a_{ij}$ into two classes: (A) those
for which i and j correspond to different groups of variates, and (B) all others. Let
the product of differentials of the $a_{ij}$ in (A) be $dV_A$, with a similar meaning for $dV_B$.
Now it is evident that if we integrate the Wishart distribution $w_{n-1,k}(a_{ij};A'_{ij})$ with re-
spect to all $a_{ij}$ in Class (A), we will obtain the product of Wishart distributions

$\prod_{u=1}^{r}w_{n-1,k_u}(a_{i_u j_u};A_{i_u j_u}),$

since this integration simply yields the joint distribution of the $a_{ij}$ in Class (B), which
we know to be independently distributed in sets $a_{i_u j_u}$ (u = 1,2,...,r) when $||A_{ij}|| =
||A'_{ij}||$, each set being distributed according to a Wishart law. Hence we must have
(e)   $\int \dfrac{\left(\frac{|A'_{ij}|}{2^k}\right)^{\frac{n-1}{2}}|a_{ij}|^{\frac{n-k-2}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma\!\left(\frac{n-m}{2}\right)}\; e^{-\frac12\sum_{i,j}A'_{ij}a_{ij}}\, dV_A = \prod_{u=1}^{r}\dfrac{\left(\frac{|A_{i_u j_u}|}{2^{k_u}}\right)^{\frac{n-1}{2}}|a_{i_u j_u}|^{\frac{n-k_u-2}{2}}}{\pi^{\frac{k_u(k_u-1)}{4}}\prod_{m=1}^{k_u}\Gamma\!\left(\frac{n-m}{2}\right)}\; e^{-\frac12\sum_{i_u,j_u}A_{i_u j_u}a_{i_u j_u}}.$
Let both members of (e) be multiplied by $\prod_{u=1}^{r}|a_{i_u j_u}|^{-h}$ (which is constant as far
as the $a_{ij}$ in Class (A) are concerned), then replace n by n+2h throughout (e), then
multiply throughout by

(f)   $\left(\dfrac{|A'_{ij}|}{2^{k}}\right)^{-h}\prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-m}{2}+h\right)}{\Gamma\!\left(\frac{n-m}{2}\right)},$

then integrate with respect to the $a_{ij}$ in (B). It will be seen that the first member in
(e) after these operations will be the integral expression defining $E(W^h)$, and the second
member will be the value of $E(W^h)$. We find

(g)   $E(W^h) = \prod_{m=1}^{k}\dfrac{\Gamma\!\left(\frac{n-m}{2}+h\right)}{\Gamma\!\left(\frac{n-m}{2}\right)}\;\prod_{u=1}^{r}\prod_{m=1}^{k_u}\dfrac{\Gamma\!\left(\frac{n-m}{2}\right)}{\Gamma\!\left(\frac{n-m}{2}+h\right)}.$
As a special case, suppose we wish to test the hypothesis that $x_1$ is independent
of the set $x_2,x_3,\dots,x_k$. In this case r = 2, $k_1 = 1$, $k_2 = k-1$. The W criterion is

(h)   $W = \dfrac{|r_{ij}|}{r^{11}} = 1-R^2,$

where $r^{11}$ is the minor of the element in the first row and column of $|r_{ij}|$, and R is the
sample multiple correlation coefficient between $x_1$ and $x_2,x_3,\dots,x_k$. The h-th moment of
W for this case is found from (g) to be

(i)   $E(W^h) = \dfrac{\Gamma\!\left(\frac{n-1}{2}\right)\Gamma\!\left(\frac{n-k}{2}+h\right)}{\Gamma\!\left(\frac{n-k}{2}\right)\Gamma\!\left(\frac{n-1}{2}+h\right)}.$
Following the procedure used in inferring the distribution of Y in §11.4 from its h-th
moments, we find the probability element of W to be

(j)   $\dfrac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n-k}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}\; W^{\frac{n-k}{2}-1}(1-W)^{\frac{k-1}{2}-1}\,dW.$

Setting $W = 1-R^2$, we easily find the distribution law of $R^2$, the square of the sample
multiple correlation coefficient between $x_1$ and $x_2,x_3,\dots,x_k$, to be

(k)   $\dfrac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n-k}{2}\right)\Gamma\!\left(\frac{k-1}{2}\right)}\;(1-R^2)^{\frac{n-k}{2}-1}(R^2)^{\frac{k-1}{2}-1}\,dR^2,$

when the hypothesis of independence of $x_1$ and $x_2,x_3,\dots,x_k$ is true, i. e. when the $A_{1j} = 0$
(j = 2,3,...,k), which is equivalent to having the multiple correlation coefficient equal to
zero in the population. This result was first obtained by R. A. Fisher, who also
later* derived the distribution of R² in samples from a normal multivariate population
having an arbitrary multiple correlation coefficient.
Distributions of W for various special cases involving two and three groups of
variates have been given by Wilks**.
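The relation (h) between W and the multiple correlation coefficient is an exact algebraic identity, checkable on any data set (numpy; the data, seed, and dimensions are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
k, n = 4, 20
X = rng.standard_normal((k, n))

# sample correlation matrix r_ij (rows of X are the variates)
r = np.corrcoef(X)

# W = |r| / (minor of the (1,1) element), per (h)
W = np.linalg.det(r) / np.linalg.det(r[1:, 1:])

# R^2: squared multiple correlation of x_1 on x_2,...,x_k, computed
# from an ordinary least-squares fit of the centered data.
y = X[0] - X[0].mean()
Z = (X[1:] - X[1:].mean(axis=1, keepdims=True)).T
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
R2 = 1 - np.sum((y - Z @ coef) ** 2) / np.sum(y ** 2)

print(np.isclose(W, 1 - R2))
```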
11.7 Linear Regression Theory in Normal Multivariate Populations
The theorems and other results presented in Chapters VIII and IX can be extended
to the case in which the dependent variable y is a vector with an arbitrary number of
components (say $y_1,y_2,\dots,y_s$), each component being distributed normally about a linear
function of the fixed variates $x_1,x_2,\dots,x_k$. In this section we shall state without
proof the multivariate analogues of the important theorems in Chapter VIII. The details
of the proofs*** of these theorems are rather tedious and can be carried out as exten-
sions of the proofs for the case of one variable.
Suppose $y_1,y_2,\dots,y_s$ are distributed according to the normal multivariate dis-
tribution

(a)   $\dfrac{|A_{ij}|^{\frac12}}{(2\pi)^{\frac s2}}\; e^{-\frac12\sum_{i,j=1}^{s}A_{ij}(y_i-b_i)(y_j-b_j)},$

where

(b)   $b_i = \sum_{p=1}^{k}a_{ip}x_p,$

the $x_p$ being fixed variates. Let $O_n$: $(y_{i\alpha}|x_{p\alpha};\ i = 1,2,\dots,s;\ p = 1,2,\dots,k;\ \alpha = 1,2,\dots,n)$ be
*R. A. Fisher, "The General Sampling Distribution of the Multiple Correlation Coefficient",
Proc. Roy. Soc. London, Vol. 121 (1928), pp. 654-673.
An alternative derivation has also been given by S. S. Wilks, "On the Sampling Distribu-
tion of the Multiple Correlation Coefficient", Annals Math. Stat., Vol. 3 (1932),
pp. 196-203.
**S. S. Wilks, "On the Independence of k Sets of Normally Distributed Statistical Vari-
ables", Econometrica, Vol. 3 (1935), pp. 309-326.
***Proofs and extensions of many of the results may be found in one or more of the follow-
ing papers:
M. S. Bartlett, "On the Theory of Statistical Regression", Proc. Royal Soc. Edinburgh,
Vol. 53 (1933), pp. 260-283.
P. L. Hsu, "On Generalized Analysis of Variance", Biometrika, Vol. 31 (1940), pp. 221-
237.
D. N. Lawley, "A Generalization of Fisher's z", Biometrika, Vol. 30 (1938), pp. 180-187.
W. G. Madow, "Contributions to the Theory of Multivariate Statistical Analysis", Trans.
Amer. Math. Soc., Vol. 44 (1938), pp. 454-495.
S. S. Wilks, "Moment-Generating Operators for Determinants of Product Moments in Samples
from a Normal System", Annals of Math., Vol. 35 (1934), pp. 312-340.
a sample from a population having distribution (a). The likelihood function associated
with this sample is

(c)   $\dfrac{|A_{ij}|^{\frac n2}}{(2\pi)^{\frac{ns}{2}}}\; e^{-\frac12\sum_{i,j=1}^{s}A_{ij}c_{ij}},$

where

(d)   $c_{ij} = \sum_{\alpha=1}^{n}\Big(y_{i\alpha}-\sum_{p=1}^{k}a_{ip}x_{p\alpha}\Big)\Big(y_{j\alpha}-\sum_{q=1}^{k}a_{jq}x_{q\alpha}\Big).$

Let

(e)   $\bar c_{ip} = \sum_{\alpha=1}^{n}y_{i\alpha}x_{p\alpha},\qquad \bar{\bar c}_{pq} = \sum_{\alpha=1}^{n}x_{p\alpha}x_{q\alpha}.$

Clearly $c_{ij} = c_{ji}$ and $\bar{\bar c}_{pq} = \bar{\bar c}_{qp}$. For a given value of i, let $\hat a_{ip}$ be the solution
of the equations

(f)   $\sum_{q=1}^{k}\hat a_{iq}\,\bar{\bar c}_{qp} = \bar c_{ip},\qquad (p = 1,2,\dots,k),$

that is,

(g)   $\hat a_{ip} = \sum_{q=1}^{k}\bar c_{iq}\,\bar{\bar c}^{\,qp},$

where $\bar{\bar c}^{\,qp}$ is the element in the q-th row and p-th column of $||\bar{\bar c}_{pq}||^{-1}$, and let

(h)   $s_{ij} = \sum_{\alpha=1}^{n}\Big(y_{i\alpha}-\sum_{p=1}^{k}\hat a_{ip}x_{p\alpha}\Big)\Big(y_{j\alpha}-\sum_{q=1}^{k}\hat a_{jq}x_{q\alpha}\Big).$

Furthermore, let

(i)   $||\hat A_{ij}|| = n\,||s_{ij}||^{-1}.$

The essential functional and probability properties of the quantities defined
in (d), (e), (f), (g), (h) and (i) may be stated in the following theorems:
Theorem (A): For all values of the $a_{ip}$,

(j)   $c_{ij} = s_{ij} + \sum_{p,q=1}^{k}(a_{ip}-\hat a_{ip})(a_{jq}-\hat a_{jq})\,\bar{\bar c}_{pq}.$

Theorem (B): $|s_{ij}|$ may be written as the ratio of determinants

(k)   $|s_{ij}| = \left|\begin{matrix} \sum_{\alpha}y_{i\alpha}y_{j\alpha} & \bar c_{ip} \\ \bar c_{jq} & \bar{\bar c}_{pq} \end{matrix}\right|\Big/\,|\bar{\bar c}_{pq}|,$

where the determinant in the numerator is of order s+k (i, j = 1,2,...,s; p, q = 1,2,...,k).
Theorem (C): If $O_n$: $(y_{i\alpha}|x_{p\alpha})$ is a sample from a population having distribution
(a), then if the $x_{p\alpha}$ are such that $||\bar{\bar c}_{pq}||$ is positive definite, the $s_{ij}$ are distributed
according to the Wishart distribution

$w_{n-k,s}(s_{ij};A_{ij}),$

and independently of the $\hat a_{ip}$ (i = 1,2,...,s; p = 1,2,...,k), which are distributed according
to the normal ks-variate distribution law

(l)   $\dfrac{\sqrt D}{(2\pi)^{\frac{ks}{2}}}\; e^{-\frac12\sum_{i,j=1}^{s}\sum_{p,q=1}^{k}A_{ij}\bar{\bar c}_{pq}(\hat a_{ip}-a_{ip})(\hat a_{jq}-a_{jq})},$

where D is the ks-order determinant $|A_{ij}\bar{\bar c}_{pq}|$ and has the value $|A_{ij}|^{k}\,|\bar{\bar c}_{pq}|^{s}$.
The multivariate analogue of the general linear regression hypothesis stated in
§8.3 may be specified as follows:

(m)   Ω: the space for which $|A_{ij}|$ is positive definite
      and $-\infty < a_{ip} < \infty$, i = 1,2,...,s; p = 1,2,...,k.
      ω: the subspace of Ω for which $a_{ip} = a^{0}_{ip}$,
      p = r+1, r+2, ..., k; i = 1,2,...,s.

Let us denote this hypothesis by $H(a_{ip}=a^{0}_{ip})$. It is the hypothesis that the last k−r
regression coefficients corresponding to $y_i$ (i = 1,2,...,s) have specified values $a^{0}_{ip}$.
If the $a^{0}_{ip} = 0$, our hypothesis is that each $y_i$ is independent of $x_{r+1},x_{r+2},\dots,x_k$.
The likelihood ratio λ for testing this hypothesis (as obtained by maximizing
the likelihood (c) for variations of the parameters over Ω, maximizing for varia-
tions of the parameters over ω, and taking the ratio of the two maxima) turns out to be
given by $\lambda = U^{\frac n2}$, where

(n)   $U = \dfrac{|s_{ij}|}{|s'_{ij}|}.$
The form of $s'_{ij}$ may be seen from the following considerations:
In view of Theorem (A), when the likelihood function (c) is maximized for vari-
ations of the parameters over ω, we may consider the maximizing process in two steps:
First, with respect to the $a_{ip}$ parameters over ω (holding the $A_{ij}$ fixed). Here we fix
$a_{ip} = a^{0}_{ip}$ (i = 1,2,...,s; p = r+1,...,k) in (j) and minimize the second term on the right
side of (j) with respect to $a_{ip}$ (i = 1,2,...,s; p = 1,2,...,r). The coefficient of $A_{ij}$ in the
right hand side of (j) after this minimizing step is $s'_{ij}$, where

(o)   $s'_{ij} = s_{ij} + m_{ij},$

where $m_{ij}$ results from the second term of the right hand side of (j). We next maximize
for variations of the $A_{ij}$, after maximizing with respect to the $a_{ip}$ (over ω). It
will be seen that the maximizing values $\hat A'_{ij}$ of $A_{ij}$ are obtainable after the first maxi-
mizing step (i. e. with respect to the $a_{ip}$ over ω), and are given by

(p)   $||\hat A'_{ij}|| = n\,||s'_{ij}||^{-1}.$

It will be noted that the form of $s'_{ij}$ is similar to that of $s_{ij}$, and is given by

(q)   $s'_{ij} = \sum_{\alpha=1}^{n}\Big(y'_{i\alpha}-\sum_{p=1}^{r}\hat a'_{ip}x_{p\alpha}\Big)\Big(y'_{j\alpha}-\sum_{q=1}^{r}\hat a'_{jq}x_{q\alpha}\Big),$

where

(r)   $y'_{i\alpha} = y_{i\alpha}-\sum_{p=r+1}^{k}a^{0}_{ip}x_{p\alpha},$

and where the $\hat a'_{ip}$ are given by solving the equations

(s)   $\sum_{q=1}^{r}\hat a'_{iq}\,\bar{\bar c}_{qp} - \sum_{\alpha=1}^{n}y'_{i\alpha}x_{p\alpha} = 0,\qquad (p = 1,2,\dots,r).$

The $m_{ij}$ in (o) are functions of the $\hat a_{ip}$ which are distributed independently of the $s_{ij}$.
In fact, it can be shown that the $m_{ij}$ are of the form $\sum_{u=1}^{k-r}\xi_{iu}\xi_{ju}$, where the $\xi_{iu}$
(i = 1,2,...,s) are linear functions of the $\hat a_{ip}$ distributed according to

$\dfrac{|A_{ij}|^{\frac12}}{(2\pi)^{\frac s2}}\; e^{-\frac12\sum_{i,j=1}^{s}A_{ij}\xi_{iu}\xi_{ju}},$

and furthermore the sets $\xi_{iu}$ (u = 1,2,...,k−r) are independently distributed, and are dis-
tributed independently of the $s_{ij}$, when $H(a_{ip}=a^{0}_{ip})$ is true. If the $a^{0}_{ip} = 0$ (i = 1,2,...,s;
p = r+1,...,k), then it follows from Theorem (B) that $|s'_{ij}|$ may be expressed as the ratio
of two determinants as follows:
(t)   $|s'_{ij}| = \left|\begin{matrix} \sum_{\alpha}y_{i\alpha}y_{j\alpha} & \bar c_{ip} \\ \bar c_{jq} & \bar{\bar c}_{pq} \end{matrix}\right|\Big/\,|\bar{\bar c}_{pq}|,\qquad p, q = 1,2,\dots,r.$
Now the problem of determining the distribution of U when $H(a_{ip}=a^{0}_{ip})$ is true
is, therefore, reduced to that of determining the distribution of the ratio of determin-
ants

(u)   $U = \dfrac{|s_{ij}|}{|s_{ij}+m_{ij}|},$

where the $s_{ij}$ are distributed according to the Wishart distribution

(v)   $w_{n-k,s}(s_{ij};A_{ij}),$

and the $\xi_{iu}$ are distributed according to

(w)   $\prod_{u=1}^{k-r}\dfrac{|A_{ij}|^{\frac12}}{(2\pi)^{\frac s2}}\; e^{-\frac12\sum_{i,j=1}^{s}A_{ij}\xi_{iu}\xi_{ju}},$

the $s_{ij}$ and $\xi_{iu}$ being independently distributed systems.
The simplest procedure for finding the distribution of U is perhaps by the
method of moments. The method of finding the moments of U is entirely similar to that of
finding the moments of Y and Z in §11.4 and §11.5, respectively. The h-th moment is
given by

(x)   $E(U^h) = \prod_{m=1}^{s}\dfrac{\Gamma\!\left(\frac{n-k-m+1}{2}+h\right)\Gamma\!\left(\frac{n-r-m+1}{2}\right)}{\Gamma\!\left(\frac{n-k-m+1}{2}\right)\Gamma\!\left(\frac{n-r-m+1}{2}+h\right)},$

from which one may infer the distribution of U in any given case by methods illustrated
in §11.5. We may summarize our remarks in the following theorem, which is the multivariate
analogue of Theorem (A), §8.3.
Theorem (D): Let $O_n$: $(y_{i\alpha}|x_{p\alpha})$ be a sample of size n from the population having
distribution (c). Let $H(a_{ip}=a^{0}_{ip})$ be the statistical hypothesis specified by (m), and
let $U = \lambda^{2/n}$, where λ is the likelihood ratio for testing the hypothesis. Then

(y)   $U = \dfrac{|s_{ij}|}{|s_{ij}+m_{ij}|},$
where $s_{ij}$ is defined by (h), and $m_{ij}$ by (o) and (g); and if $H(a_{ip}=a^{0}_{ip})$ is true, the h-th
moment of U is given by (x).
It should be observed that U is a generalized form of the ratio
in Theorem (A), §8.3. In fact, when s = 1, then $s_{11} = n\hat\sigma^2_\Omega$ and $s_{11}+m_{11} = n\hat\sigma^2_\omega$.
It may be verified that Theorem (D) is general enough to cover multivariate
analogues of Case 1 (§8.41), Case 2 (§8.42) and Case 3 (§8.43). The essential point to be
noted in all of these cases is that k represents the number of functionally independent
$a_{ip}$ (for each i) involved in specifying Ω and r (< k) represents the number of function-
ally independent $a_{ip}$ (for each i) involved in specifying ω.
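The criterion (n) and the determinant form of Theorem (B) can be illustrated with an ordinary least-squares computation (numpy; the dimensions, data, and the helper name `residual_matrix` are ours, and the hypothesis taken is $a^0_{ip} = 0$):

```python
import numpy as np

rng = np.random.default_rng(5)
s, k, r, n = 2, 4, 2, 15          # s response components, k fixed variates
X = rng.standard_normal((k, n))   # x_{p alpha}, the fixed variates
Y = rng.standard_normal((s, n))   # y_{i alpha}

def residual_matrix(Y, X):
    # s_ij for the least-squares regression of each y_i on the rows of X
    A_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
    E = Y - A_hat @ X
    return E @ E.T

S_full = residual_matrix(Y, X)         # s_ij, all k regressors
S_omega = residual_matrix(Y, X[:r])    # s'_ij under a_ip = 0 for p > r

U = np.linalg.det(S_full) / np.linalg.det(S_omega)
print(0.0 < U <= 1.0)

# Theorem (B)/(t): |s'_ij| as a ratio of determinants of moment matrices
M = np.block([[Y @ Y.T, Y @ X[:r].T],
              [X[:r] @ Y.T, X[:r] @ X[:r].T]])
print(np.isclose(np.linalg.det(S_omega),
                 np.linalg.det(M) / np.linalg.det(X[:r] @ X[:r].T)))
```

Since $s'_{ij} = s_{ij} + m_{ij}$ with $||m_{ij}||$ non-negative definite, U always falls in (0, 1].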
11.8 Remarks on Multivariate Analysis of Variance Theory
The application of normal linear regression theory to the various analysis of
variance problems discussed in Chapter IX can be extended in every instance to the case
in which the dependent variable y is a vector of several components. In all such multi-
variate extensions, U in §11.7 plays the role in making significance tests analogous to
Snedecor's F (or, to be more precise, $1/(1+cF)$ for an appropriate constant c) in the
single-variable case (Theorem (A), §8.3).
The reader will note that the problem treated in §11.5 is an example of multi-
variate analysis of variance, and is the multivariate analogue of the problem treated in
§9.1.
To illustrate how the extension would be made in a randomized block layout
with r' rows and s' columns, let us consider the case in which y has two components $y_1$
and $y_2$. Let $y_{1ij}$ and $y_{2ij}$ be the values of $y_1$ and $y_2$ corresponding to the i-th row and
j-th column of our layout, i = 1,2,...,r'; j = 1,2,...,s'.
The distributional assumption for the $y_{1ij}$ and $y_{2ij}$ is that $(y_{1ij}-m_1-R_{1i}-C_{1j})$
and $(y_{2ij}-m_2-R_{2i}-C_{2j})$ are jointly distributed
according to a normal bivariate law with zero means and variance-covariance matrix

$\begin{Vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{Vmatrix}.$
Now suppose we wish to test the hypothesis that the "column effects" are zero
for both $y_1$ and $y_2$. This hypothesis may be specified as follows:

(a)   Ω: $-\infty < m_1, m_2, R_{1i}, R_{2i}, C_{1j}, C_{2j} < \infty$, with
      $\begin{Vmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{Vmatrix}$ positive definite.
      ω: the subspace in Ω obtained by setting
      each $C_{1j} = 0$ and each $C_{2j} = 0$.

This hypothesis, which may be denoted by $H[(C_{1j},C_{2j})=0]$, is clearly the 2-variable ana-
logue of $H[(C_j)=0]$ in §9.2.
Let $\bar y_{1i\cdot}$, $\bar y_{1\cdot j}$, $\bar y_1$, $S_{11R}$, $S_{11C}$, $S_{11E}$ have meanings as functions of $y_{111}$,
$y_{112},\dots,y_{1r's'}$ similar to those of $\bar y_{i\cdot}$, $\bar y_{\cdot j}$, $\bar y$, $S_R$, $S_C$, $S_E$ as functions of $y_{11},y_{12},$
$\dots,y_{r's'}$. Let $\bar y_{2i\cdot}$, $\bar y_{2\cdot j}$, $\bar y_2$, $S_{22R}$, $S_{22C}$, $S_{22E}$ have similar meanings as functions of
$y_{211},\dots,y_{2r's'}$. Let

(b)   $S_{12C} = r'\sum_{j}(\bar y_{1\cdot j}-\bar y_1)(\bar y_{2\cdot j}-\bar y_2),\qquad S_{12E} = \sum_{i,j}(y_{1ij}-\bar y_{1i\cdot}-\bar y_{1\cdot j}+\bar y_1)(y_{2ij}-\bar y_{2i\cdot}-\bar y_{2\cdot j}+\bar y_2).$

It may be verified that the likelihood ratio λ for testing the hypothesis
$H[(C_{1j},C_{2j})=0]$ is given by $U_0^{\frac n2}$, where

(c)   $U_0 = \dfrac{\begin{vmatrix} S_{11E} & S_{12E}\\ S_{12E} & S_{22E}\end{vmatrix}}{\begin{vmatrix} S_{11E}+S_{11C} & S_{12E}+S_{12C}\\ S_{21E}+S_{21C} & S_{22E}+S_{22C}\end{vmatrix}}.$
It follows from Theorem (D), §11.7, that the h-th moment of $U_0$ when
$H[(C_{1j},C_{2j})=0]$ is true is the special case of (x), §11.7, obtained by setting s = 2,
k−r = s'−1, r = r', n = r's', i. e.

(d)   $E(U_0^h) = \prod_{m=1}^{2}\dfrac{\Gamma\!\left(\frac{(r'-1)(s'-1)-m+1}{2}+h\right)\Gamma\!\left(\frac{(r'-1)(s'-1)+s'-m}{2}\right)}{\Gamma\!\left(\frac{(r'-1)(s'-1)-m+1}{2}\right)\Gamma\!\left(\frac{(r'-1)(s'-1)+s'-m}{2}+h\right)};$

using formula (r) in §11.5, this reduces to

(e)   $E(U_0^h) = \dfrac{\Gamma\!\left((r'-1)(s'-1)-1+2h\right)\Gamma\!\left((r'-1)(s'-1)+s'-2\right)}{\Gamma\!\left((r'-1)(s'-1)-1\right)\Gamma\!\left((r'-1)(s'-1)+s'-2+2h\right)},$
from which we can easily obtain the distribution of $U_0$ by the method used in deriving the
distribution of Z in (u), §11.5.
The extension of the hypothesis specified in (a) and the corresponding $U_0$ to
the case in which y has several components, say $y_1,y_2,\dots,y_s$, is immediate. Similar re-
sults hold for testing the hypothesis that "row effects" are zero.
We cannot go into further details. The illustration given above will perhaps
indicate how Theorem (D) can be used as a basis of significance tests for multivariate
analysis of variance problems arising in three-way layouts, Latin squares, Graeco-Latin
squares, etc.
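The computation of $U_0$ in (c) for a randomized block layout can be sketched as follows (numpy; the layout sizes, data, and helper name are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
rp, sp = 4, 5                       # r' rows, s' columns
Y1 = rng.standard_normal((rp, sp))  # component y_1, entry (i, j)
Y2 = rng.standard_normal((rp, sp))  # component y_2, entry (i, j)

def SC_SE(Ya, Yb):
    # column and error (interaction) sums of products for two components
    ra, ca, ma = Ya.mean(1, keepdims=True), Ya.mean(0, keepdims=True), Ya.mean()
    rb, cb, mb = Yb.mean(1, keepdims=True), Yb.mean(0, keepdims=True), Yb.mean()
    SC = rp * np.sum((ca - ma) * (cb - mb))
    SE = np.sum((Ya - ra - ca + ma) * (Yb - rb - cb + mb))
    return SC, SE

S11C, S11E = SC_SE(Y1, Y1)
S12C, S12E = SC_SE(Y1, Y2)
S22C, S22E = SC_SE(Y2, Y2)

E = np.array([[S11E, S12E], [S12E, S22E]])
C = np.array([[S11C, S12C], [S12C, S22C]])
U0 = np.linalg.det(E) / np.linalg.det(E + C)
print(0.0 < U0 <= 1.0)
```

Because both 2×2 matrices are non-negative definite sums of products, $U_0$ always lies in (0, 1], with small values telling against the hypothesis of no column effects.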
11.9 Principal Components of a Total Variance
Suppose $x_1,x_2,\dots,x_k$ are distributed according to the normal multivariate* law
(a) in §11.3. The probability density is constant on each member of the family or nest
of k-dimensional ellipsoids

(a)   $\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j) = C,$

where 0 < C < ∞. The ellipsoids in this family all have the same center $(a_1,a_2,\dots,a_k)$
and are similarly situated with respect to their principal axes; that is, their longest
axes lie on the same line, their second longest axes lie on the same line, etc. (assuming
each has a longest, second longest, ..., axis).
Our problem here is to determine the directions of the various principal axes,
and the relative lengths of the principal axes for any given ellipsoid in the family (the
ratios of lengths are, in fact, the same for each member of the family). We must first
define analytically what is meant by principal axes. For convenience, we make the follow-
ing translation of coordinates:

(b)   $y_i = x_i - a_i,\qquad (i = 1,2,\dots,k).$

The equation (a) now becomes

(c)   $\sum_{i,j=1}^{k}A_{ij}y_iy_j = C.$

*The theory of principal axes and principal components as discussed in this section (in-
cluding no sampling theory) can be carried through formally without assuming that the
random variables $x_1,x_2,\dots,x_k$ are distributed according to a normal multivariate law.
However, this law is of sufficient interest to justify our use of it throughout the sec-
tion. Some sampling theory of principal components under the assumption of normality
will be presented in §11.11.
If P: (y_1, y_2, ..., y_k) represents any point on this ellipsoid, then the squared distance D²
between P and the center is Σ_i y_i². Now if we allow P to move continuously over the
ellipsoid, there will, in general, be 2k points at which the rate of change of D² with
respect to the coordinates of P will be zero, i.e. there are 2k extrema for D² under
these conditions. These points occur in pairs, the points in each pair being symmetric-
ally located with respect to the center. The k line segments connecting the points in
each pair are called principal axes. In the case of two variables, i.e. k = 2, our el-
lipsoids are simply ellipses, and the principal axes are the major and minor axes. We
shall determine the points in the k-variate case and show that the principal axes are
mutually perpendicular.
It follows from 4.7 that the problem of finding the extrema of D² for varia-
tions of P over (c) is equivalent to finding unrestricted extrema of the function

(d)    F = Σ_i y_i² − λ (Σ_{i,j} A^{ij} y_i y_j − C)

for variations of the y_i and λ. Following the Lagrange method, we must have

(e)    ∂F/∂y_i = 0,   (i = 1, 2, ..., k),

and also equation (c) satisfied. Performing the differentiations in (e), we obtain the
following equations

(f)    y_i − λ Σ_j A^{ij} y_j = 0,   (i = 1, 2, ..., k).

Suppose we multiply the i-th equation by A_{ih}, i = 1, 2, ..., k, and sum with respect to i. We
have

(g)    Σ_i A_{ih} y_i − λ Σ_j Σ_i A_{ih} A^{ij} y_j = 0.

Since Σ_i A_{ih} A^{ij} = 1, if j = h, and 0, if j ≠ h, (g) reduces so that it may be written as

(h)    Σ_i A_{ij} y_i − λ y_j = 0.

Allowing j to take values 1, 2, ..., k, it is now clear that equations (h) are equivalent
to (f) for finding the extrema. In order that (h) have solutions other than zero, it is
necessary for

(i)    | A_{11} − λ    A_{12}     ...    A_{1k}    |
       | A_{21}        A_{22} − λ ...    A_{2k}    |
       | ........................................ |
       | A_{k1}        A_{k2}     ...    A_{kk} − λ |  = 0.
This equation is a polynomial of degree k, usually called the characteristic equation
of the matrix ||A_{ij}||. It can be shown that the roots of (i) are all real*. If the
roots are all distinct, let them be λ_1, λ_2, ..., λ_k. The direction of the principal axis
corresponding to λ_g is given by substituting λ_g in (h) and solving** for the y_j,
j = 1, 2, ..., k. Let the values of the y_i (i = 1, 2, ..., k) corresponding to λ_g be y_{gi}
(i = 1, 2, ..., k), and let the direction cosines of the g-th principal axis be c_{gi} (defined by
c_{gi} = y_{gi} / √(Σ_j y_{gj}²)). Hence, we have from (f)

(j)    y_{gi} − λ_g Σ_j A^{ij} y_{gj} = 0,   (i = 1, 2, ..., k).

It is now clear that if y_{gi} are solutions of (j), then −y_{gi} are solutions also. Multi-
plying the i-th equation by y_{gi} and summing with respect to i, we find

(k)    Σ_i y_{gi}² − λ_g Σ_{i,j} A^{ij} y_{gi} y_{gj} = 0,

or

(l)    Σ_i y_{gi}² = λ_g C.

Therefore, the squared length of half the g-th principal axis is λ_g C. If we consider the
h-th principal axis (g ≠ h), we have

(m)    y_{hi} − λ_h Σ_j A^{ij} y_{hj} = 0,   (i = 1, 2, ..., k).

If the i-th equation in (j) be multiplied by y_{hi}/λ_g and summed with respect to i, and if
the i-th equation in (m) be multiplied by y_{gi}/λ_h and summed with respect to i, we obtain,
upon combining the two resulting equations,

(n)    (1/λ_g − 1/λ_h) Σ_i y_{gi} y_{hi} = 0.

Since λ_g ≠ λ_h, this equation implies that Σ_i y_{gi} y_{hi} = 0, which means that the g-th and
h-th (g ≠ h) principal axes are perpendicular, i.e. all principal axes are mutually per-
pendicular.
Suppose we change to a new set of variables defined as follows
*See M. Bôcher, Introduction to Higher Algebra. MacMillan Co., (1929), p. 170.
**For an iterative method of solving the equations, together with a more detailed treat-
ment of principal components than we can consider here, see H. Hotelling, "Analysis of
a Complex of Statistical Variables into Principal Components", Jour. of Educ. Psych.,
Vol. 24, (1933), pp. 417-441, 498-520.
(o)    z_g = Σ_i c_{gi} y_i,   (g = 1, 2, ..., k).

Multiplying the g-th equation by c_{gj}, using the fact that Σ_g c_{gi} c_{gj} = δ_{ij} (for a set
of mutually orthogonal unit vectors), and summing with respect to g, we find

(p)    y_j = Σ_g c_{gj} z_g.

Substituting in the equation of the ellipsoid (c), we have

(q)    Σ_{g,h} (Σ_{i,j} A^{ij} c_{gi} c_{hj}) z_g z_h = C.

Now it follows from the argument leading to (n) that Σ_{i,j} A^{ij} c_{gi} c_{hj} = 0 (if g ≠ h) and
= 1/λ_g (if g = h). Hence the equation of the ellipsoid in the new coordinates is

(r)    Σ_g z_g²/λ_g = C.
The Jacobian of the transformation (p) is |c_{gj}|, which has the value ±1, as one can see by
squaring the determinant. Hence, if the (x_i − a_i) are distributed according to the normal
multivariate law (a), §11.3, and since (p) transforms the quadratic form in (a) into (r),
then the z_g are independently distributed, z_g having variance λ_g. But from (o) we also have,
by taking variances of both sides,

(s)    λ_g = Σ_{i,j} c_{gi} c_{gj} A_{ij}.

Summing with respect to g, and using the fact that Σ_g c_{gi} c_{gj} = δ_{ij}, we have

(t)    Σ_g λ_g = Σ_i A_{ii}.

In other words, the sum of the variances of the y_i (i = 1, 2, ..., k) is equal to the sum of
the variances of the z_g (g = 1, 2, ..., k). λ_1, λ_2, ..., λ_k are called principal components of
the total variance. It will be observed that z_g is constant on (k−1)-dimensional planes
perpendicular to the g-th principal axis, g = 1, 2, ..., k.
We may summarize in the following

Theorem (A): Let y_1, y_2, ..., y_k be random variables distributed according to the
normal multivariate distribution

(a)    (√|A^{ij}| / (2π)^{k/2}) e^{−(1/2) Σ_{i,j} A^{ij} y_i y_j} dy_1 ... dy_k.
Let the roots of the characteristic equation |A_{ij} − λ δ_{ij}| = 0 be λ_1 ≥ λ_2 ≥ ... ≥ λ_k. Let c_{gi}
(i = 1, 2, ..., k) be the direction cosines of the g-th principal axis of Σ_{i,j} A^{ij} y_i y_j = C,
and let

(u)    z_g = Σ_i c_{gi} y_i,   (g = 1, 2, ..., k).

Then

(1) The direction cosines are given by c_{gi} = y_{gi} / √(Σ_j y_{gj}²), where the y_{gi}
satisfy the equations Σ_j A_{ij} y_{gj} − λ_g y_{gi} = 0.

(2) The length of half the g-th principal axis is √(λ_g C).

(3) The principal axes are mutually perpendicular.

(4) The transformation (u) transforms the probability element (a) into

(v)    [1 / ((2π)^{k/2} (λ_1 λ_2 ... λ_k)^{1/2})] e^{−(1/2) Σ_g z_g²/λ_g} dz_1 ... dz_k,

the z_g being independently distributed.

(5) Σ_i A_{ii} = Σ_g λ_g, i.e. the sum of the variances of the y_i is equal to the sum of
the variances of the z_g.
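Conclusions (3) and (5) of Theorem (A) are easy to check numerically. The following sketch is not part of the original text; it assumes NumPy, and the matrix A below is an arbitrary illustrative variance-covariance matrix ||A_{ij}||. The roots λ_g and direction cosines c_{gi} are obtained from a standard symmetric eigenvalue routine:

```python
import numpy as np

# An arbitrary symmetric positive definite matrix, standing in for the
# variance-covariance matrix ||A_ij|| (illustrative values only).
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.8],
              [0.5, 0.8, 2.0]])

# Roots of the characteristic equation |A_ij - lam delta_ij| = 0, in
# descending order, with the corresponding direction cosines c_gi.
lam, vecs = np.linalg.eigh(A)        # eigh: for symmetric matrices
order = np.argsort(lam)[::-1]
lam = lam[order]
C = vecs[:, order].T                 # row g holds the cosines of the g-th axis

# (3): the principal axes are mutually perpendicular.
print(np.allclose(C @ C.T, np.eye(3)))     # True

# (5): sum of the variances of the y_i equals sum of the variances of the z_g.
print(np.isclose(np.trace(A), lam.sum()))  # True
```

The trace identity in (5) is exactly what the eigendecomposition preserves, so the check holds for any symmetric positive definite choice of A.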
If two of the roots of (i) are equal, we should have an indeterminate situation
with reference to two of the principal axes. In this case, there will be a two-dimen-
sional space, i.e. a plane, perpendicular to each of the remaining principal axes, such
that the intersection of this plane with (c) is a circle. Similar remarks can be made
about higher multiplicities of roots.

As a simple example in multiplicity of roots, the reader will find it instruc-
tive to consider the case in which the variance of y_i (i = 1, 2, ..., k) is σ² and the covari-
ance between y_i and y_j is σ²ρ. Equation (i) becomes

[σ²(1−ρ) − λ]^{k−1} [σ²(1 + (k−1)ρ) − λ] = 0.
There are roots of two magnitudes, one being σ²(1−ρ) with multiplicity k−1; the other
being σ²(1+(k−1)ρ) with multiplicity 1. It is convenient in this case to think of one
long principal axis (if ρ > 0) and k−1 short ones all equal (although indeterminate in
direction). If ρ > 0, then it is clear that the long axis increases as k increases, while
the short axes remain the same. Thus the variance of the z_g (which is a linear function
of the y_i by transformation (u)) corresponding to the longest axis increases with k.
This property of increasing variance of the linear function of several positively inter-
correlated variables associated with the longest axis is fundamental in the scientific
construction of examinations, certain kinds of indices, etc. By continuity considera-
tions one can verify that the property holds, roughly speaking, even when the variances
(as well as the covariances) of the variables depart slightly from each other.
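The two root magnitudes in the equicorrelated case can be verified directly. This is an illustrative sketch (NumPy assumed; the values of k, σ², and ρ below are arbitrary):

```python
import numpy as np

# Equicorrelated case: Var(y_i) = sigma2, Cov(y_i, y_j) = sigma2 * rho.
k, sigma2, rho = 5, 2.0, 0.4
A = sigma2 * ((1 - rho) * np.eye(k) + rho * np.ones((k, k)))

roots = np.sort(np.linalg.eigvalsh(A))   # ascending order

# k-1 equal "short" roots sigma2(1-rho) and one "long" root sigma2(1+(k-1)rho).
print(np.round(roots, 6))   # approximately [1.2 1.2 1.2 1.2 5.2]
```

Increasing k with ρ > 0 fixed inflates only the last (long) root, which is the property of positively intercorrelated variables discussed above.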
11.10 Canonical Correlation Theory
Let x_1, x_2, ..., x_k be random variables divided into two sets S_1: (x_1, x_2, ..., x_{k_1})
and S_2: (x_{k_1+1}, ..., x_{k_1+k_2}) (k_1 + k_2 = k). We shall assume that k_1 ≤ k_2. Let L_1 and L_2 be
arbitrary linear functions of the two groups of variates, respectively, i.e.

(a)    L_1 = Σ_i l_{1i} x_i,    L_2 = Σ_p l_{2p} x_p.

The correlation coefficient between L_1 and L_2 (see 2.75) is given by

(b)    R_12 = Σ_{i,p} l_{1i} l_{2p} A_{ip} / √(Σ_{i,j} l_{1i} l_{1j} A_{ij} · Σ_{p,q} l_{2p} l_{2q} A_{pq}),

where i and j in the summations range over the values 1, 2, ..., k_1, while p and q range over
the values k_1+1, k_1+2, ..., k_1+k_2. ||A_{ip}|| is the covariance matrix between the variables
in S_1 and those in S_2; ||A_{ij}|| is the variance-covariance matrix for variables in S_1; a
similar meaning holding for ||A_{pq}||.
Now suppose we consider the problem of varying the l_{1i} and l_{2p} so as to maxi-
mize the correlation coefficient R_12 (actually to find extrema of R_12, among which there
will be a maximum). Corresponding to any given solution of this problem, say l_{1i}, l_{2p}
(i = 1, 2, ..., k_1; p = k_1+1, ..., k_1+k_2), there are infinitely many solutions of the form a l_{1i},
b l_{2p}, where a and b are any two constants of the same sign. To overcome this difficulty,
it is sufficient to seek a solution for fixed values of the variances of L_1 and L_2, which,

*This problem was first considered by H. Hotelling, "Relations Between Two Sets of Variates",
Biometrika, Vol. 28 (1936), pp. 321-377.
for convenience we may take as 1. This is equivalent to the determination of the extrema
of R_12 for variations of the l_{1i} and l_{2p}, subject to the conditions

(c)    Σ_{i,j} l_{1i} l_{1j} A_{ij} = 1,    Σ_{p,q} l_{2p} l_{2q} A_{pq} = 1.

By Lagrange's method this amounts to finding the extrema of the function

(d)    F = Σ_{i,p} l_{1i} l_{2p} A_{ip} − (λ/2)(Σ_{i,j} l_{1i} l_{1j} A_{ij} − 1) − (μ/2)(Σ_{p,q} l_{2p} l_{2q} A_{pq} − 1),

where λ and μ are divided by 2 for convenience. The l_{1i} and l_{2p} must satisfy the equa-
tions

(e)    ∂F/∂l_{1i} = 0,   (i = 1, 2, ..., k_1),

(f)    ∂F/∂l_{2p} = 0,   (p = k_1+1, ..., k_1+k_2),

which are

(g)    Σ_p A_{ip} l_{2p} − λ Σ_j A_{ij} l_{1j} = 0,

(h)    Σ_i A_{ip} l_{1i} − μ Σ_q A_{pq} l_{2q} = 0.
Multiplying (g) by l_{1i} and summing with respect to i, then multiplying (h) by l_{2p} and
summing with respect to p, and using (c), we obtain

(i)    λ = μ.

Therefore, putting μ = λ in (h), we obtain a system of k linear homogeneous equations in
the l_{1i} and l_{2p}. In order to have a solution not identically zero, the k-th order deter-
minant of the equations (g) and (h) must vanish. That is,

(j)    | −λ A_{ij}    A_{ip}    |
       | A_{pj}       −λ A_{pq} |  = 0.

If we factor √A_{ii} out of the i-th row and i-th column (i = 1, 2, ..., k_1) and √A_{pp} out of the
p-th row and p-th column (p = k_1+1, ..., k_1+k_2), we find that (j) is equivalent to

(k)    | −λ ρ_{ij}    ρ_{ip}    |
       | ρ_{pj}       −λ ρ_{pq} |  = 0,
where the ρ's are correlation coefficients, and ρ_{ii} = ρ_{pp} = 1. It can be shown* that the
roots of (k) are all real, since the determinant (k) is the discriminant of Q_1 − λQ_2,
where Q_2 is the sum of the two quadratic forms in (c), and hence is positive definite.
If the determinant in (k) is expanded by Laplace's method by the first k_1 columns (or
rows), it is clear from the resulting expansion that (k) is a polynomial of degree k_1+k_2
in λ in which the lowest power of λ is λ^{k_2−k_1}. Hence, by factoring out λ^{k_2−k_1}, we are left with
a polynomial f(λ) in λ of degree 2k_1. Now any term in the Laplace expansion of (k) (by
the first k_1 columns) is the product of a determinant of order k_1 and one of order k_2.
If the first determinant has r rows chosen from the upper left-hand block of (k), then
the second determinant will contain k_2 − (k_1 − r) rows from the lower left-hand block of
(k). The product of these two determinants will therefore have λ^{k_2−k_1+2r} as a factor.
Therefore, by factoring λ^{k_2−k_1} from each term in the Laplace expansion of (k), it is clear
that the resulting polynomial, that is f(λ), will contain only even powers of λ. There-
fore, the 2k_1 roots of f(λ) = 0 are real and of the form ±λ_1, ±λ_2, ..., ±λ_{k_1}, where each λ_u
is ≥ 0. Let the 2k_1 roots be denoted by ρ_u (u = 1, 2, ..., 2k_1). Let l_{u1i}, l_{u2p} be the solutions of the equations
(g) and (h) corresponding to the root ρ_u (u = 1, 2, ..., 2k_1), and let L_{u1}, L_{u2} be the values
of L_1 and L_2 in (a) corresponding to the solutions l_{u1i}, l_{u2p}. Remembering that μ = λ,
and inserting the u-th root ρ_u in (g) and (h), we must have

(l)    Σ_p A_{ip} l_{u2p} − ρ_u Σ_j A_{ij} l_{u1j} = 0,

(m)    Σ_i A_{ip} l_{u1i} − ρ_u Σ_q A_{pq} l_{u2q} = 0.

Multiplying (l) by l_{u1i} and summing with respect to i, and making use of the fact that
Σ_{i,j} l_{u1i} l_{u1j} A_{ij} = 1, we find

(n)    Σ_{i,p} l_{u1i} l_{u2p} A_{ip} − ρ_u = 0.

The first term in (n) is simply the correlation coefficient between L_{u1} and L_{u2}, and its
value is ρ_u. If u is even, then the correlation between L_{u1} and L_{u2} is equal to that be-
tween L_{(u+1)1} and −L_{(u+1)2} (or −L_{(u+1)1} and L_{(u+1)2}). It can be easily verified that the
correlation between L_{(2i)1} and L_{(2j)2} (i ≠ j) is zero. Hotelling has called L_{u1} and L_{u2}
the u-th canonical variates, and ρ_u the canonical correlation coefficient between the
canonical variates L_{u1} and L_{u2}. Hence, the canonical correlations and therefore the roots
*M. Bôcher, loc. cit., p. 170.
of the equation (k) lie on the interval (−1, +1). If there exists a single largest root,
it is the one such that when it is substituted in (g) and (h) we obtain solutions (i.e.
values of l_{1i} and l_{2p}) which, used in (a), will give the linear functions having maximum
correlation. For further details on canonical correlation theory, the reader is referred
to Hotelling's paper.
We may summarize our results in the following

Theorem (A): Let S_1: (x_1, x_2, ..., x_{k_1}) and S_2: (x_{k_1+1}, ..., x_k) be two sets of ran-
dom variables, where k = k_1 + k_2 (k_1 ≤ k_2). Let L_1 and L_2, as defined in (a), be linear func-
tions of the variables in S_1 and S_2, respectively, such that the variances of L_1 and L_2
are unity. Let R_12 be the correlation coefficient between L_1 and L_2. Then

(1) There are at most 2k_1 distinct extrema of R_12 for variations of the l_{1i} and l_{2p}
in L_1 and L_2;

(2) These extrema correspond to the 2k_1 roots of equation (k), which lie on the
interval (−1, +1) and are symmetrically spaced with respect to the origin;

(3) The value of R_12 corresponding to the u-th root ρ_u of (k) is equal in value to
ρ_u itself (the u-th canonical correlation coefficient);

(4) The canonical correlation coefficient between the two canonical variates corres-
ponding to any two numerically different values of ρ_u is zero.
The reader should note that no assumptions have been made about the distribution
function of the two sets of random variables, S_1 and S_2. We are able to maintain this
degree of generality as long as we are considering canonical correlation theory of popu-
lations. However, the statistical value of this theory may be questionable if the distri-
butions of the x's in S_1 and S_2 depart radically from the normal multivariate law. Again,
in studying sampling theory of canonical correlations, progress has been made only for the
case of sampling from normal multivariate populations. Some of the sampling results are
given in 11.11.
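As a sketch of how the quantities of this section are computed in practice (NumPy assumed; the data below are an arbitrary simulated sample, and the reduction of the determinantal equation to an ordinary eigenvalue problem for S11⁻¹S12 S22⁻¹S21 is a standard equivalent form, not the book's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
k1, k2, n = 2, 3, 500                      # arbitrary sizes with k1 <= k2
X = rng.standard_normal((n, k1 + k2))      # simulated sample; S1 = first k1 columns

S = np.cov(X, rowvar=False)                # sample variance-covariance matrix
S11, S12 = S[:k1, :k1], S[:k1, k1:]
S21, S22 = S[k1:, :k1], S[k1:, k1:]

# The 2*k1 nonzero roots of the determinantal equation come in pairs +/- rho_u,
# where the rho_u^2 are the eigenvalues of S11^-1 S12 S22^-1 S21.
M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
rho2 = np.sort(np.linalg.eigvals(M).real)[::-1]
rho = np.sqrt(rho2)                        # canonical correlation coefficients
print(rho)
```

Since the two simulated sets are actually independent, the sample canonical correlations here are small; by Theorem (A) they always lie on the interval (−1, +1), and only the k_1 nonnegative members of each ± pair need be computed.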
11.11 The Sampling Theory of the Roots of Certain Determinantal Equations
In the treatment of the theory of principal components (11.9) and of the canon-
ical correlation theory (11.10), it was found that the roots of certain determinantal
equations, in which the matrices are variance-covariance matrices, played fundamental roles.
In testing hypotheses concerning principal components, canonical correlations, and allied
topics, we are interested in the roots of the analogous equations in which the matrices
are sample variance-covariance matrices. In the following sections, we shall derive the
distributions of the roots of several sample determinantal equations when the samples
are drawn from certain special multivariate normal populations. The distribution theory
of the roots for more general assumptions has not yet been developed.
11.111 Characteristic Roots of One Sample Variance-covariance Matrix.

Let us consider a sample O_n: (x_{iα}; i = 1, 2, ..., k; α = 1, 2, ..., n) from a normal multi-
variate population whose variance-covariance matrix has one root λ of multiplicity k.
The variance-covariance matrix of the population is then of the form

| λ  0  ...  0 |
| 0  λ  ...  0 |
| ............ |
| 0  0  ...  λ |

and its inverse is

| 1/λ  0   ...  0   |
| 0    1/λ ...  0   |
| ................. |
| 0    0   ...  1/λ |

The p.d.f. of this population is

(a)    (2πλ)^{−k/2} e^{−(1/2λ) Σ_i (x_i − a_i)²}.

Let a_{ij} = Σ_α (x_{iα} − x̄_i)(x_{jα} − x̄_j), where x̄_i = (1/n) Σ_α x_{iα}. Then the a_{ij} are distributed
according to w_{n−1, k}(a_{ij}; λ δ_{ij}), where δ_{ij} = 1 if i = j, and 0 if i ≠ j. We are interested in finding
the distribution of the roots of

(b)    | a_{11} − l    a_{12}     ...    a_{1k}    |
       | a_{21}        a_{22} − l ...    a_{2k}    |
       | ........................................ |
       | a_{k1}        a_{k2}     ...    a_{kk} − l |  = 0,
These distributions and their derivations were first published in the papers by R. A.
Fisher, "The Sampling Distribution of Some Statistics Obtained from Non-linear Equa-
tions", Annals of Eugenics, Vol. 9 (1939), pp. 238-249, and by P. L. Hsu, "On the Dis-
tribution of Roots of Certain Determinantal Equations", Annals of Eugenics, Vol. 9
(1939), pp. 250-258. The derivations used in this section were developed by A. M.
Mood (unpublished).
which is analogous to (i) §11.9. For a geometrical interpretation of these roots, the
reader is referred to 11.9.

In 11.9, it was shown that for a matrix ||A_{ij}|| there is a set of numbers c_{gi}
(g, i = 1, 2, ..., k) (direction cosines of the principal axes of the family of ellipsoids
Σ_{i,j} A^{ij} y_i y_j = C) such that the transformation

z_g = Σ_i c_{gi} y_i

will yield

(c)    Σ_{i,j} A^{ij} y_i y_j = Σ_g z_g²/λ_g,

where the λ_g are roots of |A_{ij} − λ δ_{ij}| = 0. Expressing the z_g in terms of the y_i in the
right member of (c), we get

Σ_g z_g²/λ_g = Σ_{i,j} (Σ_g c_{gi} c_{gj}/λ_g) y_i y_j.

Hence,

A^{ij} = Σ_g c_{gi} c_{gj}/λ_g.

In a similar manner we can find numbers γ_{hi} (h, i = 1, 2, ..., k) to express a_{ij} as

(d)    a_{ij} = Σ_h l_h γ_{hi} γ_{hj},

where the l_h are the roots of (b) and the γ_{hi} are elements of an orthogonal matrix
||γ_{hi}||; that is, Σ_i γ_{hi} γ_{h'i} = δ_{hh'} and Σ_h γ_{hi} γ_{hj} = δ_{ij}. The l_h and the γ_{hi} depend only on
the a_{ij}. We can get the simultaneous distribution of the l_h and γ_{hi} by substituting (d) in
w_{n−1, k}(a_{ij}; λ δ_{ij}) and multiplying by the Jacobian of the transformation (d). Ordering the
l_h so that l_1 ≥ l_2 ≥ ... ≥ l_k ≥ 0, the Jacobian is

(e)    (l_1 − l_2)(l_1 − l_3) ... (l_1 − l_k)(l_2 − l_3) ... (l_{k−1} − l_k) ψ(γ_{hi}),

where ψ(γ_{hi}) is a function of the γ_{hi} only, not involving the l_h. This can be verified in
the following way. It is clear from (d) that the Jacobian will be a polynomial in the
l_h; in fact, it will be a polynomial of degree k(k−1)/2, for there are k(k−1)/2 independent
elements in ||γ_{hi}||. If l_i = l_j (i ≠ j), the transformation (d) will not be uniquely de-
termined, and hence the Jacobian will be zero, since when a transformation is not
(locally) unique, the Jacobian is zero. This fact implies that we can factor out terms
of the form (l_i − l_j). There are k(k−1)/2 such terms, and when they have been factored out, what
remains is independent of the l_h, since the Jacobian is a polynomial of degree k(k−1)/2.
Noting that Σ_i a_{ii} = Σ_h l_h and that |a_{ij}| = Π_h l_h, we can write the distribution of the l_h and
the γ_{hi} as

(f)    C Π_h l_h^{(n−k−2)/2} e^{−(1/2λ) Σ_h l_h} Π_{i<j} (l_i − l_j) ψ(γ_{hi}) Π_h dl_h dγ,

where C is a constant and dγ denotes the product of the differentials of the independent
γ_{hi}. To derive the distribution of the l_h alone, we integrate (f) with respect to the γ_{hi}
over the space of the γ_{hi} for which ||γ_{hi}|| is orthogonal, obtaining

(g)    K Π_h l_h^{(n−k−2)/2} e^{−(1/2λ) Σ_h l_h} Π_{i<j} (l_i − l_j) Π_h dl_h.
The constant K is determined by the condition that the integral of (g) over
the space R of the l_h (l_1 ≥ l_2 ≥ ... ≥ l_k ≥ 0) is unity. To find K, let us first define

(h)    φ(r) = ∫_R (Π_i l_i)^r e^{−(1/2λ) Σ_i l_i} Π_{i<j} (l_i − l_j) Π_i dl_i.

We note that

(i)    K = 1/φ((n−k−2)/2).

Since Π_i l_i = |a_{ij}|, the moments of |a_{ij}| may be expressed as ratios of values of φ; but
from the Wishart distribution (see (h) in 11.1) these moments are known explicitly.
Equating the two expressions, the evaluation of φ((n−k−2)/2) is reduced to that of φ(0),
and φ(0) itself can be evaluated by making use of (r) in §11.5. Carrying out these
evaluations, we find that

K = π^{k/2} / [(2λ)^{k(n−1)/2} Π_{i=1}^k Γ(i/2) Γ((n−i)/2)].

Substituting in (g), we finally obtain as the distribution element of the characteristic
roots of (b)

(m)    [π^{k/2} / ((2λ)^{k(n−1)/2} Π_{i=1}^k Γ(i/2) Γ((n−i)/2))] Π_{i=1}^k l_i^{(n−k−2)/2} e^{−(1/2λ) Σ_i l_i} Π_{i<j} (l_i − l_j) dl_1 ... dl_k,

where l_1 ≥ l_2 ≥ ... ≥ l_k ≥ 0.
It can be shown fairly readily, by making appropriate orthogonal transformations,
that if the sample O_n: (x_{iα}; i = 1, 2, ..., k; α = 1, 2, ..., n) is from the normal multivariate
population (a) in 11.3 for the case in which the characteristic roots of the matrix
equation |A_{ij} − λ δ_{ij}| = 0 are all equal to λ, say, then the characteristic roots of (b) are also
distributed according to (m).

We may summarize in the following

Theorem (A): Let O_n: (x_{iα}; i = 1, 2, ..., k; α = 1, 2, ..., n) be a sample from a normal
multivariate population for which the characteristic roots of the variance-covariance
matrix are all equal to λ. Let a_{ij} (i, j = 1, 2, ..., k) be the second order sample product sums
as defined below (a). Let l_1, l_2, ..., l_k be the roots (in descending order of magnitude)
of |a_{ij} − l δ_{ij}| = 0. The joint probability element of the l_i (i = 1, 2, ..., k) is given by (m).
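A quick simulation illustrates the setup of Theorem (A). This sketch assumes NumPy, and the values of k, n, and λ are arbitrary; the roots l_i of |a_{ij} − l δ_{ij}| = 0 are simply the ordered eigenvalues of the matrix of second order sample product sums:

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, lam = 3, 20, 2.0           # arbitrary: all population roots equal to lam

X = rng.normal(scale=np.sqrt(lam), size=(n, k))
D = X - X.mean(axis=0)           # deviations from the sample means
a = D.T @ D                      # second order sample product sums a_ij

# Roots l_1 >= l_2 >= ... >= l_k >= 0 of |a_ij - l delta_ij| = 0.
l = np.sort(np.linalg.eigvalsh(a))[::-1]
print(l)
```

With n > k the matrix a is positive definite with probability one, so the roots are distinct and strictly positive, as the ordering l_1 ≥ l_2 ≥ ... ≥ l_k ≥ 0 in the theorem requires.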
11.112 Characteristic Roots of the Difference of Two Sample Variance-covariance
Matrices.
Let us consider two samples O_{n_1}: (x¹_{iα}; i = 1, 2, ..., k; α = 1, 2, ..., n_1) and O_{n_2}: (x²_{iα};
i = 1, 2, ..., k; α = 1, 2, ..., n_2) (n_1 > k, n_2 > k) drawn from the same normal multivariate popula-
tion

(a)    (√|A^{ij}| / (2π)^{k/2}) e^{−(1/2) Σ_{i,j} A^{ij} (x_i − a_i)(x_j − a_j)}.

Let a^t_{ij} = Σ_α (x^t_{iα} − x̄^t_i)(x^t_{jα} − x̄^t_j), (t = 1, 2). In this section, we shall derive the distribu-
tion of the roots of

(b)    |e (a¹_{ij} + a²_{ij}) − a¹_{ij}| = 0.

In 11.9, we have seen that there is a linear transformation

(c)    x_i − a_i = Σ_g c_{gi} z_g,   (i = 1, 2, ..., k),

such that

(d)    Σ_{i,j} A^{ij} (x_i − a_i)(x_j − a_j) = Σ_g z_g²/λ_g.

Now let

w_g = z_g/√λ_g,

i.e.

(e)    x_i − a_i = Σ_g c_{gi} √λ_g w_g.

Then the w_g are independently and normally distributed with zero means and unit variances.
The transformation (e), when performed on the sample values, gives us

x^t_{iα} − x̄^t_i = Σ_g c_{gi} √λ_g (w^t_{gα} − w̄^t_g),

where the w^t_{gα} are the correspondingly transformed sample values. Now equation (b) becomes

|Σ_{g,h} c_{gi} c_{hj} √(λ_g λ_h) [e (b¹_{gh} + b²_{gh}) − b¹_{gh}]| = 0,

where b^t_{gh} = Σ_α (w^t_{gα} − w̄^t_g)(w^t_{hα} − w̄^t_h). Clearly the roots of

|e (b¹_{gh} + b²_{gh}) − b¹_{gh}| = 0

are the same as those of equation (b). Note that the b^t_{gh} are functions of the w^t_{gα}, such
that, for each value of g, t, and α, w^t_{gα} is distributed according to a normal law with zero
mean and unit variance, the w^t_{gα} being independently distributed. This shows that we lose
no generality by assuming that A_{ij} = 1, if i = j, and 0 if i ≠ j.

Under this assumption, the a^t_{ij} have the distribution

(g)    w_{n_1−1, k}(a¹_{ij}; δ_{ij}) w_{n_2−1, k}(a²_{ij}; δ_{ij}).

From a theorem in algebra* we know that there is a transformation of the a^t_{ij} such that

(h)    a¹_{ij} = Σ_h e_h u_{hi} u_{hj},    a¹_{ij} + a²_{ij} = Σ_h u_{hi} u_{hj},

where the e_h are the roots of (b) (arranged, say, in descending order of magnitude).
*See M. Bôcher, loc. cit., p. 171.
The e_h and the u_{hi} are functions of the a^t_{ij}; hence, their distribution may be found by sub-
stituting (h) in (g) and multiplying by the Jacobian of the transformation (h). By following a procedure
similar to that of 11.111, we can show that the Jacobian of (h) is

(i)    Π_{i<j} (e_i − e_j) ψ(u_{hi}),

where ψ(u_{hi}) is a function of the u_{hi}, independent of the e_h. Hence, the simultaneous distri-
bution of the e_h and u_{hi} is

(j)    C' Π_h e_h^{(n_1−k−2)/2} (1 − e_h)^{(n_2−k−2)/2} Π_{i<j} (e_i − e_j) · |u_{hi}|^{n_1+n_2−2k−4} e^{−(1/2) Σ_{h,i} u_{hi}²} ψ(u_{hi}) Π_h de_h du,

where C' is a constant.
Noting that |a¹_{ij} + a²_{ij}| = |u_{hi}|², we see that (j) factors into a function of the e_h and a
function of the u_{hi},

(k)    C Π_h e_h^{(n_1−k−2)/2} (1 − e_h)^{(n_2−k−2)/2} Π_{i<j} (e_i − e_j) Π_h de_h · g(u_{hi}) du,

where C is a constant. On integrating with respect to the u_{hi}, we get the marginal distri-
bution of the e_h,

(l)    K Π_h e_h^{(n_1−k−2)/2} (1 − e_h)^{(n_2−k−2)/2} Π_{i<j} (e_i − e_j) Π_h de_h.

K is a constant to be determined so that the integral of (l) over the range of the e_h is unity.
Following a procedure similar to that used in determining K in 11.111, we evaluate K in
(l) and obtain as the distribution element of the e_h
(m)    [π^{k/2} Π_{i=1}^k Γ((n_1+n_2−i−1)/2) / (Γ(i/2) Γ((n_1−i)/2) Γ((n_2−i)/2))] Π_{i=1}^k e_i^{(n_1−k−2)/2} (1 − e_i)^{(n_2−k−2)/2} Π_{i<j} (e_i − e_j) de_1 ... de_k.

It should be emphasized that distribution (m) holds for the roots of (b) where
the a¹_{ij} and the a²_{ij} are any two sets of random variables distributed independently ac-
cording to the Wishart distributions

(n)    w_{n_1−1, k}(a¹_{ij}; A_{ij}),    w_{n_2−1, k}(a²_{ij}; A_{ij}),

where n_1 and n_2 are both > k. In fact, we may summarize our results in

Theorem (A): Let a¹_{ij} and a²_{ij} be independently distributed according to the
Wishart distributions (n). Let e_1, e_2, ..., e_k (in descending order) be the roots of the
equation |e(a¹_{ij} + a²_{ij}) − a¹_{ij}| = 0. Then the joint probability element of the e_i (i = 1, 2, ..., k)
is given by (m), where the range of the e's is 1 ≥ e_1 ≥ e_2 ≥ ... ≥ e_k ≥ 0.
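The range statement of Theorem (A) can be checked by simulation. This sketch assumes NumPy; the sample sizes and dimension are arbitrary, and the roots of |e(a¹_{ij}+a²_{ij}) − a¹_{ij}| = 0 are computed as the ordinary eigenvalues of (a¹+a²)⁻¹a¹:

```python
import numpy as np

rng = np.random.default_rng(2)
k, n1, n2 = 3, 15, 12            # arbitrary sizes with n1 > k, n2 > k

def product_sums(X):
    """Second order sample product sums of the columns of X."""
    D = X - X.mean(axis=0)
    return D.T @ D

a1 = product_sums(rng.standard_normal((n1, k)))
a2 = product_sums(rng.standard_normal((n2, k)))

# Roots e_1 >= ... >= e_k of |e(a1_ij + a2_ij) - a1_ij| = 0, obtained as
# eigenvalues of the (generally non-symmetric) matrix (a1 + a2)^-1 a1.
e = np.sort(np.linalg.eigvals(np.linalg.solve(a1 + a2, a1)).real)[::-1]
print(e)
```

Because a¹ and a¹+a² are both positive definite here, the roots fall strictly inside (0, 1), in agreement with the range 1 ≥ e_1 ≥ ... ≥ e_k ≥ 0.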
11.113 Distribution of the Sample Canonical Correlations.
Corresponding to the population canonical correlations discussed in 11.10,
there are canonical correlations of a sample. In this section, we shall determine the
distribution of the sample canonical correlations when the smaller set of variates has a
normal multivariate distribution independent of the other set.

Consider a sample O_n: (x_{uα}; u = 1, 2, ..., k_1+k_2; α = 1, 2, ..., n) from a population where
the first k_1 variates have a normal distribution and the remaining k_2 variates are dis-
tributed independently of the first k_1 (k_1 ≤ k_2). Let

a_{uv} = Σ_α (x_{uα} − x̄_u)(x_{vα} − x̄_v),   (u, v = 1, 2, ..., k_1+k_2).

The canonical correlations of the sample are defined as the roots of

(a)    | −l a_{ij}    a_{ip}    |
       | a_{pj}       −l a_{pq} |  = 0,   (i, j = 1, 2, ..., k_1;  p, q = k_1+1, ..., k_1+k_2).

Multiplying each of the first k_1 columns by l and then factoring l out of each of the last
k_2 rows, we see that (a) is equivalent to

(b)    | −l² a_{ij}    a_{ip}  |
       | a_{pj}        −a_{pq} |  = 0,
except for a factor of l^{k_2−k_1}. Since we are not interested in the roots which are
identically zero, we shall confine our attention to the roots of (b).

Let a^{pq} be the element corresponding to a_{pq} in the inverse of ||a_{pq}||
(p, q = k_1+1, ..., k_1+k_2). After multiplication on the left by

(c)    | δ_{ij}    0      |
       | 0         a^{pq} |,

equation (b) becomes

(d)    | −l² a_{ij}            a_{ip}    |
       | Σ_q a^{pq} a_{qj}     −δ_{pq}   |  = 0.

Since each element in the lower right-hand block is −δ_{pq}, (d) can be reduced to

(e)    | l² a_{ij} − Σ_{p,q} a_{ip} a^{pq} a_{qj} | = 0.

The roots of (e) (which are also roots of (b)) are the sample canonical correlation co-
efficients. Let the squared roots (in descending order) be l_1², l_2², ..., l_{k_1}². Write

(f)    b_{hj} = Σ_{p,q} a_{hp} a^{pq} a_{qj},   (h, j = 1, 2, ..., k_1).

If we consider the x_{pα} (α = 1, 2, ..., n; p = k_1+1, ..., k_1+k_2) as fixed, with ||a_{pq}|| positive
definite, then a_{hj} and b_{hj} are bilinear forms in the x_{hα} and x_{jβ} (α, β = 1, 2, ..., n;
h, j = 1, 2, ..., k_1): a_{hj} = Σ_{α,β} G_{αβ} x_{hα} x_{jβ}, where ||G_{αβ}|| is of rank n−1, and b_{hj} may be
written as b_{hj} = Σ_{α,β} H_{αβ} x_{hα} x_{jβ}, where ||H_{αβ}|| is of rank k_2, ||H_{αβ}|| being a function of
the fixed x_{pα}. a_{hj} and b_{hj} are, therefore, bilinear forms in the x_{hα} and x_{jβ} having
matrices which do not depend on h and j. By Cochran's
Theorem, we know that there is an orthogonal transformation which, applied to the x_{iα},
reduces a_{hj} and b_{hj} simultaneously to sums of products of transformed variables y, the y's
being normally and independently distributed with zero means and equal variances.
Thus (e) may be written in the form

|l² (b_{ij} + c_{ij}) − b_{ij}| = 0,

where c_{ij} = a_{ij} − b_{ij}, the c_{ij} and b_{ij} being independently distributed according to the Wishart
distributions w_{n−k_2−1, k_1}(c_{ij}; B_{ij}) and w_{k_2, k_1}(b_{ij}; B_{ij}), where ||B_{ij}|| is some positive
definite matrix.

Therefore, it follows from the results of 11.112 that the squares of the roots
of (e) (i.e. the squares of the canonical correlation coefficients) have the distribution
(m), where n_1 = k_2+1, n_2 = n−k_2, and k = k_1. That is, the distribution is

(g)    [π^{k_1/2} Π_{i=1}^{k_1} Γ((n−i)/2) / (Γ(i/2) Γ((k_2+1−i)/2) Γ((n−k_2−i)/2))] Π_{i=1}^{k_1} (l_i²)^{(k_2−k_1−1)/2} (1 − l_i²)^{(n−k_1−k_2−2)/2} Π_{i<j} (l_i² − l_j²) dl_1² ... dl_{k_1}²,

where e_i = l_i², i = 1, 2, ..., k_1, in (m).
We may summarize our results in

Theorem (A): Let O_n: (x_{uα}; u = 1, 2, ..., k_1+k_2; k_1 ≤ k_2; α = 1, 2, ..., n) be a sample
from a population in which the first k_1 variates are distributed according to a normal
multivariate distribution, but independently of the remaining k_2 variates (which may have
any arbitrary distribution or may be "fixed" variates). Let a_{uv} (u, v = 1, 2, ..., k_1+k_2) be
the second order sample product sums as defined above (a), and let l_1², l_2², ..., l_{k_1}² be the
squared roots (squared canonical correlation coefficients) of equation (b). The
distribution element of the l_i² (i = 1, 2, ..., k_1) is given by (g), where e_i = l_i², and where
the range of the l²'s is such that 1 ≥ l_1² ≥ l_2² ≥ ... ≥ l_{k_1}² ≥ 0.
LITERATURE FOR SUPPLEMENTARY READING
1. American Standards Association: "Guide for Quality Control and Control Chart Method
of Analyzing Data" (1941) and "Control Chart Method of Controlling Quality
During Production" (1942), American Standards Association, New York.

2. Anderson, R. L.: "Distribution of the Serial Correlation Coefficient", Annals of
Math. Stat., Vol. 13 (1942), pp. 1 - 13.

3. Bartlett, M. S.: "On the Theory of Statistical Regression", Proc. Royal Soc. of
Edinburgh, Vol. 53 (1933), pp. 260 - 283.

4. Bartlett, M. S.: "The Effect of Non-Normality on the t Distribution", Proc. Camb.
Phil. Soc., Vol. 31 (1935), pp. 223 - 231.

5. Battin, I. L.: "On the Problem of Multiple Matching", Annals of Math. Stat., Vol.
13 (1942), pp. 294 - 305.

6. Bôcher, M.: Introduction to Higher Algebra. MacMillan, New York (1907).

7. Bortkiewicz, L. von: Die Iterationen. Berlin, Springer, (1917).

8. Brown, George W.: "Reduction of a Certain Class of Statistical Hypotheses", Annals
of Math. Stat., Vol. 11 (1940), pp. 254 - 270.

9. Camp, B. H.: "A New Generalization of Tchebycheff's Inequality", Bull. Amer. Math.
Soc., Vol. 28 (1922), pp. 427 -

10. Cochran, W. G.: "The Distribution of Quadratic Forms in a Normal System, with
Applications to the Analysis of Covariance", Proc. Camb. Phil. Soc., Vol. 30
(1934), pp. 178 - 191.

11. Copeland, A. H.: "Point Set Theory Applied to the Random Selection of the Digits of
an Admissible Number", Amer. Jour. Math., Vol. 58 (1936), pp. 181 - 192.

12. Craig, C. C.: "On the Composition of Dependent Elementary Errors", Annals of Math.,
Vol. 33 (1932), pp. 184 - 206.

13. Craig, A. T.: "On the Distribution of Certain Statistics", Amer. Jour. Math., Vol.
54 (1932), pp. 353 - 366.

14. Cramér, H. and Wold, H.: "Some Theorems on Distribution Functions", Jour. London
Math. Soc., Vol. 11 (1936), pp. 290 - 294.

15. Curtiss, J. H.: "On the Theory of Moment Generating Functions", Annals of Math.
Stat., Vol. 13 (1942), pp. 430 - 433.

16. Daly, J. F.: "On the Unbiased Character of Likelihood Ratio Tests for Independence
in Normal Systems", Annals of Math. Stat., Vol. 11 (1940), pp. 1 - 32.
17. Darmois, G.: Statistique Mathématique. Paris, Doin, 1928.

18. Deming, W. E., and Birge, R. T.: "On the Statistical Theory of Errors", Rev.
Modern Phys., Vol. 6 (1934), pp. 122 - 161.

19. Dodd, E. L.: "Probability as Expressed by Asymptotic Limits of Pencils of Sequences",
Bull. Amer. Math. Soc., Vol. 36 (1930), pp. 299 - 305.

20. Dodd, E. L.: "The Length of Cycles Which Result from the Graduation of Chance
Elements", Annals of Math. Stat., Vol. 10 (1939), pp. 254 - 264.

21. Dodge, H. F., and Romig, H. G.: "A Method of Sampling Inspection", Bell System
Tech. Jour., Vol. VIII (1929).

22. Dodge, H. F., and Romig, H. G.: "Single Sampling and Double Sampling Inspection
Tables", Bell System Tech. Jour., Vol. XX (1941).

23. Doob, J. L.: "Probability and Statistics", Trans. Amer. Math. Soc., Vol. 36 (1934),
pp. 759 - 775.

24. Feller, W.: "On the Integral Equation of Renewal Theory", Annals of Math. Stat.,
Vol. 12 (1941), pp. 243 - 267.

25. Fertig, J. W.: "On a Method of Testing the Hypothesis that an Observed Sample of n
Variables and of Size N has been drawn from a Specified Population of the Same
Number of Variables", Annals of Math. Stat., Vol. 7 (1936), pp. 113 - 163.

26. Fertig, J. W.: "The Testing of Certain Hypotheses by means of Lambda Criteria with
Particular Reference to Physiological Research", Biometric Bulletin, Vol. 1
(1936), pp. 45 - 82.

27. Fisher, R. A. and Yates, F.: Statistical Tables for Biological, Agricultural and
Medical Research. London, Oliver and Boyd, 1938.

28. Fisher, R. A.: "On the Interpretation of χ² from Contingency Tables, and the Calcu-
lation of P", Jour. Roy. Stat. Soc., Vol. 85 (1922), pp. 87 - 94.

29. Fisher, R. A.: "Frequency Distribution of the Values of the Correlation Coefficient
in Samples from an Indefinitely Large Population", Biometrika, Vol. 10 (1915),
pp. 507 - 521.

30. Fisher, R. A.: "On the Mathematical Foundations of Theoretical Statistics", Phil.
Trans. Roy. Soc. London, Series A, Vol. 222 (1921), pp. 309 - 368.

31. Fisher, R. A.: "On a Distribution Yielding Error Functions of Several Well Known
Statistics", Proc. Internat. Cong. of Math., Toronto (1924), pp. 805 - 813.

32. Fisher, R. A.: "The Theory of Statistical Estimation", Proc. Camb. Phil. Soc.,
Vol. 22 (1925), pp. 700 - 725.

33. Fisher, R. A.: "Applications of 'Student's' Distribution", Metron, Vol. 5 (1926),
pp. 90 - 104.

34. Fisher, R. A.: "The General Sampling Distribution of the Multiple Correlation
Coefficient", Proc. Roy. Soc. London, Series A, Vol. 121 (1928), pp. 654 - 673.
35- Fisher, R. A.: "Inverse Probability ", Proc. Camb. Phil. Soc., Vol. 26 (1930), pp.
528 - 535.
36. Fisher, R. A.: "The Concepts of Inverse Probability and Fiducial Probability Re-
ferring to Unknown Parameters", Proc. Roy. Soc. London , Series A, Vol. 139
(1933), PP. 3^3 - 3^8,
37. Fisher, R. A.: "The Fiducial Argument in Statistical Inference", Annals of Eugenics,
Vol. 6 (1935), PP. 391 - 398.
38. Fisher, R. A.: The Design of Experiments. London, Oliver and Boyd, 1935-
39. Fisher, R. A.: "The Sampling Distribution of Some Statistics Obtained from Non-
Linear Equations", Annala of Eugenics, Vol. 9 (1939), pp. 238 - 2^9.
1*0. Fisher, R. A.: Statistical Methods for Research Workers. 8th Ed., London, Oliver
and Boyd, 191*1 .
in. Fry, T. C.: Probability and its Engineering Uses. Van Nostrand Co., 1928.
1*2. Girahick, M. A,: "On the Sampling Theory of the Roots of Determinantal Equations",
Annals of Math. Stat., Vol. 10 (1939), pp. 203 - 22U.
43. Greville, -T. N. E. : "The Frequency Distribution of a General Matching Problem",
Annals of Math* Stat., Vol. 12 (19^1), PP 350 - 35 1 *.
M. Gumbel, E. J. : "Les Valewla Extremes des Distributions Statist iques", Annales de
I'Inatitut a. Poincar (1935).
45. Hamburger, H. : "Uber eine Erweiterung des Stleltzeaachen Momentenproblems " , Math.
Annalen. Vol. 81 (1920) pp. 235 - 319, and Vol. 8? (1921), pp. 120 - 165, 168 -
187.
46. Hotelling, H.: "The Generalization of Student's Ratio", Annals of Math. Stat.,
Vol. 2 (1931), pp. 359 - 378.
47. Hotelling, H.: "Analysis of a Complex of Statistical Variables into Principal Com-
ponents", Jour. Ed. Psych., Vol. 24 (1933), pp. 417 - 441, pp. 498 - 520.
48. Hotelling, H.: "Relations between Two Sets of Variates", Biometrika, Vol. 28 (1936),
pp. 321 - 377.
49. Hotelling, H.: "Experimental Determination of the Maximum of a Function", Annals of
Math. Stat., Vol. 12 (1941), pp. 20 - 45.
50. Hsu, P. L.: "On the Distribution of Roots of Certain Determinantal Equations",
Annals of Eugenics, Vol. 9 (1939), pp. 250 - 258.
51. Hsu, P. L.: "On Generalized Analysis of Variance", Biometrika, Vol. 31 (1940), pp.
221 - 237.
52. Ingham, A. E.: "An Integral which Occurs in Statistics", Proc. Camb. Phil. Soc.,
Vol. 29 (1933), pp. 271 - 276.
53. Irwin, J. O.: "Mathematical Theorems Involved in the Analysis of Variance", Jour.
Roy. Stat. Soc., Vol. 94 (1931), pp. 284 - 300.
54. Irwin, J. O. and others: "Recent Advances in Mathematical Statistics", Jour. Roy.
Stat. Soc., Vol. 95 (1932), Vol. 97 (1934), Vol. 99 (1936).
55. Kamke, E.: Einführung in die Wahrscheinlichkeitstheorie. Leipzig, Hirzel, 1932.
56. Kendall, M. G. and Smith, B. B.: "The Problem of m Rankings", Annals of Math. Stat.,
Vol. 10 (1939), pp. 275 - 287.
57. Keynes, J. M.: A Treatise on Probability. MacMillan, London, 1921.
58. Kolodziejczyk, S.: "On an Important Class of Statistical Hypotheses", Biometrika,
Vol. 27 (1935), pp. 161 - 190.
59. Koopman, B. O.: "On Distributions Admitting a Sufficient Statistic", Trans. Amer.
Math. Soc., Vol. 39 (1936), pp. 399 - 409.
60. Koopmans, T.: On Modern Sampling Theory. Lectures delivered at Oslo, 1935, (unpub-
lished).
61. Koopmans, T.: "Serial Correlation and Quadratic Forms in Normal Variables", Annals
of Math. Stat., Vol. 13 (1942), pp. 14 - 33.
62. Kullback, S.: "An Application of Characteristic Functions to the Distribution Prob-
lem in Statistics", Annals of Math. Stat., Vol. 5 (1934), pp. 264 - 307.
63. Lawley, D. N.: "A Generalization of Fisher's z", Biometrika, Vol. 30 (1938), pp.
180 - 187.
64. Lévy, P.: Théorie de l'Addition des Variables Aléatoires. (Monographies des prob-
abilités, fasc. 1) Gauthier-Villars, 1937.
65. Lotka, Alfred J.: "A Contribution to the Theory of Self-Renewing Aggregates, with
Special Reference to Industrial Replacement", Annals of Math. Stat., Vol. 10
(1939), pp. 1 - 25.
66. Madow, W. G.: "Contributions to the Theory of Multivariate Statistical Analysis",
Trans. Amer. Math. Soc., Vol. 44 (1938), pp. 454 - 495.
67. Mises, R. von: Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und
Theoretischen Physik. Leipzig, Deuticke, 1931.
68. Mood, A. M.: "The Distribution Theory of Runs", Annals of Math. Stat., Vol. 11
(1940), pp. 367 - 392.
69. Mosteller, Frederick: "Note on an Application of Runs to Quality Control Charts",
Annals of Math. Stat., Vol. 12 (1941), pp. 228 - 232.
70. Neumann, J. von: "Distribution of the Ratio of the Mean Square Successive Difference
to the Variance", Annals of Math. Stat., Vol. 12 (1941), pp. 367 - 395.
71. Neyman, J. and Pearson, E. S.: "On the Use and Interpretation of Certain Test
Criteria for Purposes of Statistical Inference", Biometrika, Vol. 20A (1928),
pp. 175 - 240, pp. 263 - 294.
72. Neyman, J. and Pearson, E. S.: "On the Problem of the Most Efficient Tests of
Statistical Hypotheses", Phil. Trans. Roy. Soc., London, Ser. A, Vol. 231 (1933),
p. 289.
73. Neyman, J. and Pearson, E. S.: "The Testing of Statistical Hypotheses in Relation
to Probabilities a priori", Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 492 -
510.
74. Neyman, J. and Pearson, E. S.: Statistical Research Memoirs. University College,
London, Vol. 1 (1936), Vol. 2 (1937).
75. Neyman, J.: "On the Two Different Aspects of the Representative Method: The Method
of Stratified Sampling and the Method of Purposive Selection", Jour. Roy. Stat.
Soc., Vol. 97 (1934), pp. 558 - 625.
76. Neyman, J.: "Su un Teorema Concernente le Cosiddette Statistiche Sufficienti",
Giornale dell'Istituto Italiano degli Attuari, Vol. 6 (1934), pp. 320 - 334.
77. Neyman, J.: "Outline of a Theory of Statistical Estimation based on the Classical
Theory of Probability", Phil. Trans. Roy. Soc., London, Ser. A, Vol. 236 (1937),
pp. 333 - 380.
78. Perron, O.: Die Lehre von den Kettenbrüchen. Leipzig, Teubner, 1929.
79. Pearson, K.: "On the Criterion that a Given Set of Deviations from the Probable in
the Case of Correlated Variables is Such that It Can Reasonably be Supposed to
have Arisen from Random Sampling", Phil. Mag. 5th Ser., Vol. 50 (1900), pp. 157
- 175.
80. Pearson, K.: Tables for Statisticians and Biometricians. Cambridge University
Press, 1914.
81. Pearson, K.: Tables of the Incomplete Gamma Function. Cambridge University Press,
1922.
82. Pearson, K.: Tables of the Incomplete Beta Function. Cambridge University Press,
1932.
83. Pomey, J. B.: Calcul des Probabilités. Paris, Gauthier-Villars, 1936.
84. Reichenbach, H.: Wahrscheinlichkeitslehre. Leiden, Sijthoff, 1935.
85. Rider, P. R.: "A Survey of the Theory of Small Samples", Annals of Math., Vol. 31
(1930), pp. 577 - 628.
86. Rietz, H. L.: Mathematical Statistics. Open Court Publishing Co., Chicago, 1927.
87. Sasuly, M.: Trend Analysis in Statistics. The Brookings Institution, Washington.
88. Scheffé, H.: "On the ratio of the variances of two normal populations", Annals of
Math. Stat., Vol. 13 (1942), pp. 371 - 388.
89. Shewhart, W. A.: Economic Control of Quality of Manufactured Product. Van Nostrand,
1931.
90. Shewhart, W. A.: Statistical Method from the Viewpoint of Quality Control. U. S.
Department of Agriculture, Washington, 1939.
91. Smirnoff, V. I.: "Sur les Écarts de la Courbe de Distribution Empirique", Recueil
Mathématique, Moscow, Vol. 6 (1939), pp. 25 - 26.
92. Snedecor, G. W.: Calculation and Interpretation of Analysis of Variance and
Covariance. Collegiate Press, Ames, Iowa, 1934.
93. Sommerville, D. M. Y.: An Introduction to the Geometry of N Dimensions. London,
Methuen, 1929.
94. Stevens, W. L.: "Distribution of Groups in a Sequence of Alternatives", Annals of
Eugenics, Vol. IX (1939).
95. "Student": "The Probable Error of a Mean", Biometrika, Vol. 6 (1908), pp. 1 - 25.
96. Swed, Frieda S., and Eisenhart, C.: "Tables for Testing Randomness of Grouping in
a Sequence of Alternatives", Annals of Math. Stat., Vol. 14 (1943).
97. Wald, A. and Wolfowitz, J.: "On a Test of Whether Two Samples are from the Same
Population", Annals of Math. Stat., Vol. 11 (1940), pp. 147 - 162.
98. Wald, A.: "Contributions to the Theory of Statistical Estimation and Testing
Hypotheses", Annals of Math. Stat., Vol. 10 (1939), pp. 299 - 326.
99. Wald, A.: Lectures on the Analysis of Variance and Covariance. Columbia University.
100. Wald, A.: Notes on the Theory of Statistical Estimation and of Testing Hypotheses.
Columbia University (1941).
101. Wald, A.: "Asymptotically Shortest Confidence Intervals", Annals of Math. Stat.,
Vol. 13 (1942), pp. 127 - 137.
102. Wald, A.: "Setting of Tolerance Limits when the Sample is Large", Annals of Math.
Stat., Vol. 13 (1942), pp. 389 - 399.
103. Welch, B. L.: "Some Problems in the Analysis of Regression among k Samples of Two
Variables", Biometrika, Vol. 27 (1935), pp. 145 - 160.
104. Whittaker, E. T. and Watson, G. N.: A Course in Modern Analysis. 4th ed., Cambridge
University Press, 1927.
105. Whittaker, E. T. and Robinson, G.: The Calculus of Observations. London, Blackie
and Son, 1932.
106. Widder, D. V.: The Laplace Transform. Princeton University Press, 1941.
107. Wiener, N.: The Fourier Integral. Cambridge University Press, 1933.
108. Wilks, S. S.: "Certain Generalizations in the Analysis of Variance", Biometrika,
Vol. 24 (1932), pp. 471 - 494.
109. Wilks, S. S.: "On the Sampling Distribution of the Multiple Correlation Coefficient",
Annals of Math. Stat., Vol. 3 (1932), pp. 196 - 203.
110. Wilks, S. S.: "Moment-generating Operators for Determinants of Product Moments in
Samples from a Normal System", Annals of Math., Vol. 35 (1934), pp. 312 - 340.
111. Wilks, S. S.: "On the Independence of k Sets of Normally Distributed Statistical
Variables", Econometrica, Vol. 3 (1935), pp. 309 - 326.
112. Wilks, S. S.: "The Likelihood Test of Independence in Contingency Tables", Annals
of Math. Stat., Vol. 6 (1935), pp. 190 - 195.
113. Wilks, S. S.: "Shortest Average Confidence Intervals from Large Samples", Annals
of Math. Stat., Vol. 9 (1938), pp. 166 - 175.
114. Wilks, S. S.: "Analysis of Variance and Covariance of Non-Orthogonal Data", Metron,
Vol. XIII (1938), pp. 141 - 154.
115. Wilks, S. S.: "Determination of Sample Size for Setting Tolerance Limits", Annals
of Math. Stat., Vol. 12 (1941), pp. 91 - 96.
116. Wilks, S. S.: "Statistical Prediction with Special Reference to the Problem of
Tolerance Limits", Annals of Math. Stat., Vol. 13 (1942), pp. 400 - 409.
117. Wishart, J.: "The Generalized Product Moment Distribution in Samples from a Normal
Multivariate Population", Biometrika, Vol. 20A (1928), pp. 32 - 52.
118. Wishart, J. and Bartlett, M. S.: "The Generalized Product Moment Distribution in a
Normal Distribution", Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 260 - 270.
119. Wishart, J. and Fisher, R. A.: "The Arrangement of Field Experiments and the
Statistical Reduction of the Results", Imp. Bur. Soil Sci., 1930. (Tech.
Comm. 10.)
120. Wolfowitz, J.: "Additive Partition Functions and a Class of Statistical Hypotheses",
Annals of Math. Stat., Vol. 13 (1942), pp. 247 - 279.
121. Yates, F.: "The Principles of Orthogonality and Confounding in Replicated Experi-
ments", Jour. Agric. Science, Vol. 23 (1933), pp. 108 - 145.
122. Yates, F.: "Complex Experiments", Jour. Roy. Stat. Soc., Supplement, Vol. 2 (1935),
pp. 181 - 247.
123. Yule, G. U.: An Introduction to the Theory of Statistics. 10th Ed., London, Griffin,
1936.
INDEX
Analysis of covariance, 195
extension to several fixed
variates, 199
Analysis of covariance table, 198
Analysis of variance, 176
for incomplete lay-outs, 195
for Graeco-Latin square, 192
for Latin square, 189
for randomized blocks, 180
for two-way layout, 180
for three-way layout, 186
multivariate extension of, 250
Average outgoing quality limit, 225
Average quality protection, 223
Beta function, 75
Binomial distribution, 47
Bernoulli case, 49
moment generating function of, 48
negative, 56
Poisson case, 49
Binomial population, confidence limits
of p in large samples from, 129
Borel-measurable point set, 10
Canonical correlation coefficient, 259
Canonical correlation coefficients,
distribution of, in samples, 270
Canonical variate, 259
C. d. f. (cumulative distribution
function), 5
Central limit theorem, 81
Characteristic equation of a variance-
covariance matrix,
Characteristic function, 82
Characteristic roots
of difference of two sample variance-
covariance matrices, distribution
of, 268
of sample variance-covariance matrix,
distribution of, 264
Chi-square distribution, 102
moment generating function of, 74
moments of, 103
reproductive property of, 105
Chi-square problem, Pearson's original, 217
Cochran's Theorem, 107
Complete additivity, law of, 6
Component quadratic forms, resolving
quadratic into, 168
Conditional probability, 15
Conditional probability density function, 17
for normal bivariate distribution, 62
for normal multivariate distribution, 71
Confidence coefficient, 124
Confidence interval, 124
Confidence limits, 124
from large samples, 127
graphical representation of, 126
of difference between means of two normal
populations with same variance, 130
of mean of normal population, 130
of p in large samples from binomial
population, 129
of range of rectangular population, 123
of ratio of variances of two normal
populations, 131
of regression coefficients, 159
of variance of normal population, 131
Confidence region, 132
Confounding, 186
Contagious distribution function, 55
Consistency of estimate, 133
Consumer's risk, 222
Contingency table, 214
Chi-square test of independence in, 216
likelihood ratio test for independence
in, 220
Continuous distribution function,
bivariate case, 10
univariate case, 8
Convergence, stochastic, 81
Correlation coefficient, 32
between two linear functions of random
variables, 34
canonical, 260
canonical, distribution of, 270
distribution of, 120
Correlation coefficient (con't)
multiple, 45
multiple, distribution of, in samples
from normal multivariate popula-
tion, 244
partial, 42
Covariance, 32
analysis of, 195
between two linear functions of
random variables, 34
Critical region of a statistical test, 1
Cumulative distribution function,
bivariate case, 8
k-variate case, 11
continuous case, 10
continuous, univariate case, 8
discrete, bivariate case, 10
empirical, 2
mixed case, 11
postulates for, bivariate case, 9
postulates for, k-variate case, 12
postulates for, univariate case, 5
univariate case, 5
Curve fitting,
by maximum likelihood, 145
by moments, 145
Curvilinear regression, 166
Difference between two sample means,
distribution of, 100
Difference of point sets, 5
Discrete distribution function,
bivariate case, 10
univariate case, 7
Disjoint point sets, 5
Distribution function,
binomial, 47
contagious, 55
cumulative, bivariate case, 8
cumulative, univariate case, 5
discrete, univariate case, 7
limiting, of maximum likelihood esti-
mates in large samples, 138
marginal, 12
multinomial, 51
negative binomial, 54
normal bivariate, 59
normal multivariate, 63
Distribution function (con't)
normal or Gaussian, 56
of canonical correlation coefficients, 270
of characteristic roots of difference of
sample variance-covariance matrices, 268
of characteristic roots of sample
variance-covariance matrix, 264
of correlation coefficient, 120
of difference between means of two samples
from a normal population, 100
of exponent in normal multivariate popula-
tion, 104
of Fisher's z, 115
of Hotelling's generalized "Student"
ratio, 238
of largest variate in sample, 91
of likelihood ratio for generalized "Student"
statistical hypothesis, 238
of linear function of normally distributed
variables, 99
of means in samples from a normal bivariate
population, 100, 101
of means in samples from a normal multi-
variate population, 101
of median of sample, 91
of multiple correlation coefficient in
samples from normal multivariate popula-
tion, 244
of number of correct matchings in random
matching, 210
of number of trials required to obtain a
given number of "successes", 55
of order statistics, 90
of range of sample, 92
of regression coefficients, k fixed variates,
162
of regression coefficients, one fixed
variate, 159
of runs, 201
of sample mean, limiting, in large samples,
81
of second order sample moments in samples
from normal bivariate population, 116
of smallest variate in sample, 91
of Snedecor's F ratio, 114
of "Student's" ratio, 110
of sums of squares in samples from normal
population, 102
of total number of runs, 203
Poisson, 52
Polya-Eggenberger, 56
Type I, 76
Distribution function (con't)
Type III, 72
Wishart, 120
Wishart, geometric derivation of, 227
Distribution functions, Pearson system
of, 72
Efficiency of estimates, 134
Equality of means,
of normal populations, test of, 176
test for, in normal multivariate popu-
lation, 238
Estimation,
by intervals, 122
by points, 135
Estimates,
consistency of, 133
efficiency of, 134
maximum likelihood, 136
optimum, 133
sufficiency of, 135
unbiased, 133
Expected value, 28
Factorial moments, 204
F distribution, Snedecor's, 114
Fiducial limits, 126
Finite population, sampling from, 83
Fisher's z distribution, 115
Fixed variate, 16
Gamma function, 73
Gaussian distribution function, 56
Generalized sum of squares, 229
Graeco-Latin square, 190
analysis of variance for, 192
Gram-Charlier series, 76
Grouping, corrections for, 94
Harmonic analysis, 166
Hermite polynomials, 77
Hotelling's generalized "Student"
ratio, 238
Incomplete lay-outs, 192
Independence,
linear, 160
in probability sense, 13
of mean and sum of squared deviations
in samples from normal population, 108
of means and second order sample moments
in samples from normal bivariate
population, 120
Independence (con't)
of means and second order moments in
samples from normal multivariate
population, 120, 233
of sets of variates, test for, in normal
multivariate population, 242
mutual, 14
statistical, 13
Inspection, sampling, 220
Interaction,
first order, 181
second order, 184
Jacobian of a transformation,
for k variables, 28
for two variables, 25
Joint moments of several random variables, 31
Lagrange multipliers, 97
Laplace transform, 38
Large numbers, law of, 50
Large samples, confidence limits from, 127
Largest variate in sample, distribution of, 91
Latin square, 186
analysis of variance for, 189
complete set of, 191
orthogonal, 190
Law of complete additivity, 6
Law of large numbers, 50
Least square regression function, 44
variance about, 44
Least squares, 43
Likelihood, 136
Likelihood ratio, 150
Likelihood ratio test, 150
in large samples, 151
for equality of means in normal multivariate
populations, 238
for general linear regression statistical
hypothesis for normal multivariate
population, 247
for general normal regression statistical
hypothesis, 170
for independence in contingency tables, 220
for independence of sets of variates in
normal multivariate population, 242
for "Student" hypothesis, 150
for the statistical hypothesis that means
in a normal multivariate population have
specified values, 235
Limiting form of cumulative distribu-
tion function as determined by
limiting form of moment generating
function, 38
Linear combination of random variables,
mean and variance of, 33
Linear combinations of random variables,
covariance and correlation coef-
ficient between, 34
Linear functions of normally distributed
variables, distribution of, 99
Linear independence, 160
Linear regression, 40
generality of, 165
Linear regression statistical hypothesis,
likelihood ratio test for, in normal
multivariate populations, 247
Lot quality protection, 223
Marginal distribution function, 12
Matching theory,
for three or more decks of cards, 212
for two decks of cards, 208
Matrix, 63
Maximum likelihood, curve fitting by, 145
Maximum likelihood estimate, 136
Maximum likelihood estimates,
distribution of, in large samples, 138
of transformed parameters, 139
Mean of independent random variables,
moment generating function of, 82
Mean value, 29
of linear function of random
variables, 33
of sample mean, 80
of sample variance, 83
Means,
distribution of difference between, in
samples from normal population, 100
distribution of, in samples from a normal
bivariate population, 100
distribution of, in samples from a normal
multivariate population, 101
Median of sample, distribution of, 91
M. g. f. (moment generating function), 36
Moment generating function, 36
of binomial distribution, 48
of Chi-square distribution, 74
of mean of independent random
variables, 82
of multinomial distribution, 51
of negative binomial distribution, 54
Moment generating function (con't)
of normal bivariate distribution, 60
of normal distribution, 57
of normal multivariate distribution, 70
of Poisson distribution, 53
of second order moments in samples from a
normal bivariate population, 118
Moment Problem, 35
Moments,
curve-fitting by, 45
factorial, 204
joint, of several random variables, 31
of a random variable, 30
Multinomial distribution, 51
moment generating function of, 51
Multiple correlation, 42
Multiple correlation coefficient, distribution
of, in samples from normal multivariate
population, 244
Negative binomial distribution, 54
moment generating function of, 54
Neyman-Pearson theory of statistical tests,
152
Normal bivariate distribution, 59
conditional probability density function
for, 62
moment generating function of, 60
regression function for, 62
distribution of means in samples from, 101
distribution of second order moments in
samples from, 116
independence of means and second order
moments in samples from, 120
Normal distribution, 56
moment generating function of, 58
reproductive property of, 98
Normally distributed variables, distribution
of linear function of, 99
Normal multivariate distribution, 63
conditional probability density function
for, 71
distribution of exponent in, 104
distribution of subset of variables in, 68
moment generating function of, 70
regression function for, 71
variance-covariance matrix of, 68
Normal multivariate population,
distribution of means in samples from, 101
distribution of multiple correlation
coefficient in samples from, 244
Normal multivariate population (con't)
distribution of second order moments
in samples from a, 232
general linear regression statistical
hypothesis for, 247
generalized "Student" test for, 254
independence of means and second
order moments in samples from, 120, 233
test for independence of sets of
variables in, 242
Normal multivariate populations, test
for equality of means in, 238
Normal population,
distribution of means in samples
from, 100
distribution of sums of squares in
samples from, 102
independence of mean and sum of squared
deviations in samples from, 108
Normal populations,
distribution of difference between means
in samples from, 100
test of equality of means of several, 176
Normal regression, 157
fundamental theorem on testing hypothesis
in, 170
k fixed variates, 160
one fixed variate, 159
Nuisance parameters, 150
Null hypothesis, 147
Optimum estimate, 155
Order statistics,
distribution theory of, 89
Ordering within samples, test for randomness
of, 207
Parallelogram, area of,
Parallelotope, volume of, 228
Partial correlation, 42
coefficient, 42
P. d. f. (probability density function), 8
Pearson system of distribution functions, 72
Pearson's original Chi-square problem, 217
Point set,
difference, 5
product, 5
sum, 5
Poisson distribution, 52
moment generating function of, 53
Polya-Eggenberger distribution, 55
Population parameter, admissible set of
values of, 147
Population parameters,
interval estimation of, 122
point estimation of, 153
Positive definite matrix, 63
Positive definite quadratic form,
k variables, 63
two variables, 59
Power curve of a statistical test, 154
Power of a statistical test, 152
Principal axes, 252
direction cosines of, 256
relative lengths of, 256
Principal components of a variance, 255
Probability, conditional, 15
Probability density function, 8
bivariate case, 11
conditional, 17
Probability element, 8
Probable error, 58
Producer's risk, 222
Product of point sets, 5
Quadratic form,
positive definite, 59
resolving, into component quadratic
forms, 168
Quality control, statistical, 221
Quality limit, average outgoing, 223
Quality protection,
average, 223
lot, 224
Randomized blocks, 177
Randomness, 2
Randomness of ordering within samples,
test for, 207
Random sample, definition of, 79
Random variable, definition of, 6
Range of sample, distribution of, 92
Rectangular population,
confidence limits of range of, 123
distribution of range in samples from a, 92
Regression, 40
Regression coefficient, 40