


MATHEMATICAL STATISTICS 



By 



S. S. WILKS



MATHEMATICAL STATISTICS

by S. S. Wilks

ERRATA

Page  Line   In place of                         Read
6     15     (2)                                 (2')
9     4      x_1 > −∞                            x_j > −∞
18    6      add                                 and
22    1b     f(x_1, x_2, ..., x_k)               F(x_1, x_2, ..., x_k)
45    10     2.93                                2.92
55    12     (k+x)(k+x−1)...(k+1)                (k+x−1)(k+x−2)...k
55    19     (1+d)^h                             (1+d)^d
56    1b     √h                                  h
73    top    Distribution                        Distributions
79    5b     vandom                              random
90    13     n−r_k−1                             n−r_k
90    16     n−r_k−1                             n−r_k
90    21     n−r_i−1                             n−r_i
92    5      f(x)                                f(x_0)
98    7      diatritiution                       distributions
121   1      (i = 1, 2)                          (i = 1, 2, ..., k)
121   4      characteristic                      moment generating
128   1b     (c)                                 (e)
148   10b    significant                         significance
152   1b     w                                   ω
158   8      y + bx                              y − bx
165   3      (M − N)                             (M + N)
166   19     the likelihood                      is the likelihood
167   2      y                                   y_α
168   1b     (i)                                 (j)
185   1b     assumed                             not assumed
186   1      different from zero                 zero
188   4      minimizing                          maximizing
192   1b     i = 1, 2, ..., a                    i = 1, 2, ..., r
193   8      the R_i are                         the C_j are
197   14     R_{i,∞}                             S_{i,∞}
204   9      from 1 to                           from
207   1      (3n+3)                              (n+1)
210   15     (−1)^g                              (−1)^{n−h−g}
220   1      in terms                            in terms of
225   12     of N'                               of N
226   5      (5.6)                               (5.5)
226   7      (5.12)                              (5.5)
227   12     dx.                                 dx
254   1      polynimial                          polynomial
256   11b    indeterminant                       indeterminate
258   3b     j-th column                         i-th column
260   17     The canonical correlation           The correlation
271   19     Cochran, G. C.                      Cochran, W. G.
273   18     Valewis                             Valeurs

Line numbers followed by "b" are counted from the bottom of the page (or, for entries occurring in footnotes, from the bottom of the footnote).



MATHEMATICAL STATISTICS 



By 



S. S. WILKS 



PRINCETON UNIVERSITY PRESS 
Princeton, New Jersey 

1947 



Copyright, 1943, by 
PRINCETON UNIVERSITY PRESS 



PREFACE 

Most of the mathematical theory of statistics in its present state has been developed during the past twenty years. Because of the variety of scientific fields in which statistical problems have arisen, the original contributions to this branch of applied mathematics are widely scattered in scientific literature. Most of the theory still exists only in original form.

During the past few years the author has conducted a two-semester course at Princeton University for advanced undergraduates and beginning graduate students in which an attempt has been made to give the students an introduction to the more recent developments in the mathematical theory of statistics. The subject matter for this course has been gleaned, for the most part, from periodical literature. Since it is impossible to cover in detail any large portion of this literature in two semesters, the course has been held primarily to the basic mathematics of the material, with just enough problems and examples for illustrative and examination purposes.

Except for Chapter XI, the contents of the present set of notes constitute the basic subject matter which this course was designed to cover. Some of the material in the author's Statistical Inference (1937) has been revised and included. In writing up the notes an attempt has been made to be as brief and concise as possible and to keep to the mathematics with a minimum of excursions into applied mathematical statistics problems.

An important topic which has been omitted is that of characteristic functions of random variables, which, when used in Fourier inversions, provide a direct and powerful method of determining certain sampling distributions and other random variable distributions. However, moment generating functions are used; they are more easily understood by students at this level and are almost as useful as characteristic functions as far as actual applications to mathematical statistics are concerned. Many specialized topics are omitted, such as intraclass, tetrachoric and other specialized correlation problems, semi-invariants, renewal theory, the Behrens-Fisher problem, special transformations of population parameters and random variables, sampling from Poisson populations, etc. It is the experience of the author that an effective way of handling many of these specialized topics is to formulate them as problems for the students. If and when the present notes are revised and issued in permanent form, such problems will be inserted at the ends of sections and chapters. In the meantime, criticisms, suggestions, and notices of errors will be gratefully received from readers.

Finally, the author wishes to express his indebtedness to Dr. Henry Scheffe, Mr. T. W. Anderson, Jr. and Mr. D. F. Votaw, Jr. for their generous assistance in preparing these notes. Most of the sections in the first seven chapters and several sections in Chapters X and XI were prepared by these men, particularly the first two. Thanks are due Mrs. W. M. Weber for her painstaking preparation of the manuscript for lithoprinting.



S. S. Wilks. 



Princeton, New Jersey 
April, 



TABLE OF CONTENTS

CHAPTER I. INTRODUCTION 1

CHAPTER II. DISTRIBUTION FUNCTIONS

§2.1 Cumulative Distribution Functions 5
§2.11 Univariate Case 5
§2.12 Bivariate Case 8
§2.13 k-Variate Case 11
§2.2 Marginal Distributions 12
§2.3 Statistical Independence 13
§2.4 Conditional Probability 15
§2.5 The Stieltjes Integral 17
§2.51 Univariate Case 17
§2.52 Bivariate Case 20
§2.53 k-Variate Case 21
§2.6 Transformation of Variables 23
§2.61 Univariate Case 24
§2.62 Bivariate Case 24
§2.63 k-Variate Case 28
§2.7 Mean Value 29
§2.71 Univariate Case; Tchebycheff's Inequality 30
§2.72 Bivariate Case 31
§2.73 k-Variate Case 32
§2.74 Mean and Variance of a Linear Combination of Random Variables 33
§2.75 Covariance and Correlation between Two Linear Combinations of Random Variables 34
§2.76 The Moment Problem 35
§2.8 Moment Generating Functions 36
§2.81 Univariate Case 36
§2.82 Multivariate Case 39
§2.9 Regression 40
§2.91 Regression Functions 40
§2.92 Variance about Regression Functions 41
§2.93 Partial Correlation 42
§2.94 Multiple Correlation 42

CHAPTER III. SOME SPECIAL DISTRIBUTIONS

§3.1 Discrete Distributions 47
§3.11 Binomial Distribution 47
§3.12 Multinomial Distribution 50
§3.13 The Poisson Distribution 52
§3.14 The Negative Binomial Distribution 54
§3.2 The Normal Distribution 56
§3.21 The Univariate Case 56
§3.22 The Normal Bivariate Distribution 59
§3.23 The Normal Multivariate Distribution 63
§3.3 Pearson System of Distribution Functions 72
§3.4 The Gram-Charlier Series 76

CHAPTER IV. SAMPLING THEORY

§4.1 General Remarks 79
§4.2 Application of Theorems on Mean Values to Sampling Theory 80
§4.21 Distribution of Sample Mean 81
§4.22 Expected Value of Sample Variance 83
§4.3 Sampling from a Finite Population 83
§4.4 Representative Sampling 86
§4.41 Sampling when the p_i are known 87
§4.42 Sampling when the σ_i are also known 88
§4.5 Sampling Theory of Order Statistics 89
§4.51 Simultaneous Distribution of any k Order Statistics 89
§4.52 Distribution of Largest (or Smallest) Variate 91
§4.53 Distribution of Median 91
§4.54 Distribution of Sample Range 92
§4.55 Tolerance Limits 93
§4.6 Mean Values of Sample Moments when Sample Values are Grouped; Sheppard Corrections 94
§4.7 Appendix on Lagrange's Multipliers 97

CHAPTER V. SAMPLING FROM A NORMAL POPULATION

§5.1 Distribution of Sample Mean 98
§5.11 Distribution of Difference between Two Sample Means 100
§5.12 Joint Distribution of Means in Samples from a Normal Bivariate Distribution 100
§5.2 The χ²-Distribution 102
§5.21 Distribution of Sum of Squares of Normally and Independently Distributed Variables 102
§5.22 Distribution of the Exponent in a Multivariate Normal Distribution 103
§5.23 Reproductive Property of the χ²-Distribution 107
§5.24 Cochran's Theorem 108
§5.25 Independence of Mean and Sum of Squared Deviations from Mean in Samples from a Normal Population 108
§5.3 The "Student" t-Distribution 110
§5.4 Snedecor's F-Distribution 113
§5.5 Distribution of Second Order Sample Moments in Samples from a Bivariate Normal Distribution 116
§5.6 Independence of Second Order Moments and Means in Samples from a Normal Multivariate Distribution 120

CHAPTER VI. ON THE THEORY OF STATISTICAL ESTIMATION

§6.1 Confidence Intervals and Confidence Regions 122
§6.11 Case in which the Distribution Depends on only One Parameter 122
§6.12 Confidence Limits from Large Samples 127
§6.13 Confidence Intervals in the Case where the Distribution Depends on Several Parameters 130
§6.14 Confidence Regions 132
§6.2 Point Estimation; Maximum Likelihood Statistics 133
§6.21 Consistency 133
§6.22 Efficiency 134
§6.23 Sufficiency 135
§6.24 Maximum Likelihood Estimates 136
§6.3 Tolerance Interval Estimation
§6.4 The Fitting of Distribution Functions

CHAPTER VII. TESTS OF STATISTICAL HYPOTHESES

§7.1 Statistical Tests Related to Confidence Intervals 147
§7.2 Likelihood Ratio Tests 150
§7.3 The Neyman-Pearson Theory of Testing Hypotheses 152

CHAPTER VIII. NORMAL REGRESSION THEORY

§8.1 Case of One Fixed Variate 157
§8.2 The Case of k Fixed Variates 160
§8.3 A General Normal Regression Significance Test 166
§8.4 Remarks on the Generality of Theorem (A), §8.3 171
§8.41 Case 1 171
§8.42 Case 2 172
§8.43 Case 3 173
§8.5 The Minimum of a Sum of Squares of Deviations with Respect to Regression Coefficients which are Subject to Linear Restrictions 174

CHAPTER IX. APPLICATIONS OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS

§9.1 Testing for the Equality of Means of Normal Populations with the Same Variance 176
§9.2 Randomized Blocks or Two-way Layouts 177
§9.3 Three-way and Higher Order Layouts; Interaction 181
§9.4 Latin Squares 186
§9.5 Graeco-Latin Squares 190
§9.6 Analysis of Variance in Incomplete Layouts 192
§9.7 Analysis of Covariance 195

CHAPTER X. ON COMBINATORIAL STATISTICAL THEORY

§10.1 On the Theory of Runs 200
§10.11 Case of Two Kinds of Elements 200
§10.12 Case of k Kinds of Elements 205
§10.2 Application of Run Theory to Ordering Within Samples 206
§10.3 Matching Theory 208
§10.31 Case of Two Decks of Cards 208
§10.32 Case of Three or More Decks of Cards 212
§10.4 Independence in Contingency Tables 213
§10.41 The Partitional Approach 213
§10.42 Karl Pearson's Original Chi-Square Problem and its Application to Contingency Tables 217
§10.5 Sampling Inspection 220
§10.51 Single Sampling Inspection 221
§10.52 Double Sampling Inspection 224

CHAPTER XI. AN INTRODUCTION TO MULTIVARIATE STATISTICAL ANALYSIS

§11.1 The Wishart Distribution 226
§11.2 Reproductive Property of the Wishart Distribution 232
§11.3 The Independence of Means and Second Order Moments in Samples from a Normal Multivariate Population 233
§11.4 Hotelling's Generalized "Student" Test 234
§11.5 The Hypothesis of Equality of Means in Multivariate Normal Populations 238
§11.6 The Hypothesis of Independence of Sets of Variables in a Normal Multivariate Population
§11.7 Linear Regression Theory in Normal Multivariate Populations
§11.8 Remarks on Multivariate Analysis of Variance Theory 250
§11.9 Principal Components of a Total Variance 252
§11.10 Canonical Correlation Theory 257
§11.11 The Sampling Theory of the Roots of Certain Determinantal Equations 260
§11.111 Characteristic Roots of One Sample Variance-covariance Matrix 261
§11.112 Characteristic Roots of the Difference of Two Sample Variance-covariance Matrices 265
§11.113 Distribution of the Sample Canonical Correlations 268

LITERATURE FOR SUPPLEMENTARY READING 271

INDEX 279



CHAPTER I 
INTRODUCTION 

Modern statistical methodology may be conveniently divided into two broad classes. To one of these classes belongs the routine collection, tabulation, and description of large masses of data per se, most of the work being reduced to high speed mechanized procedures. Here elementary mathematical methods such as percentaging, averaging, graphing, etc. are used for condensing and describing the data as it is. To the other class belongs a methodology which has been developed for making predictions or drawing inferences, from a given set or sample of observations, about a larger set or population of potential observations. In this type of methodology, we find the mathematical methods more advanced, with the theory of probability playing the fundamental role. In this course, we shall be concerned with the mathematics of this second class of methodology. It is natural that these mathematical methods should embody assumptions and operations of a purely mathematical character which correspond to properties and operations relating to the actual observations. The test of the applicability of the mathematics in this field, as in any other branch of applied mathematics, consists in comparing the predictions as calculated from the mathematical model with what actually happens experimentally.*

Since probability theory is fundamental in this branch of mathematics, we should examine informally at this point some notions which at least suggest a way of setting up a probability theory. As far as the present discussion is concerned, perhaps the best approach is to examine a few simple empirical situations and see how we would proceed to idealize and to set up a theory. Suppose a die is thrown successively. If we denote by X the number of dots appearing on the upper face of the die, then X will take on one of the values 1, 2, 3, 4, 5, 6 at each throw. The variable X jumps from

*For an example of such a comparison, see Ch. 5 of Bortkiewicz, Die Iterationen, Springer, Berlin, 1917.






value to value as the die is thrown successively, thus yielding a sequence of numbers which appear to be quite haphazard or erratic in the order in which they occur. A similar situation holds in tossing a coin successively, where X is the number of heads in a single toss. In this case a succession of tosses will yield a haphazard sequence of 0's and 1's. Similarly, if X is the blowing time in seconds of a fuse made under a given set of specifications, then a sequence, let us say of every N-th fuse from a production line, will yield a sequence of numbers (values of X) which will have this characteristic of haphazardness or randomness if there is nothing in the manufacturing operations which will cause "peculiarities" in the sequence, such as excessive high or low values, long runs of high or low values, etc. We make no attempt to define randomness in observed sequences, except to describe it roughly as the erratic character of the fluctuations usually found in sequences of measurements on operations repeatedly performed under "essentially the same circumstances", as for example successively throwing dice, tossing coins, drawing chips from a bowl, etc. In operations such as taking fuses from a production line and making some measurement on each fuse (e. g. blowing time) the resulting sequence of measurements frequently has "peculiarities" of the kind mentioned above, thus lacking the characteristic of randomness. However, it has been found that frequently a state of randomness similar to that produced by rolling dice, drawing chips from a bowl, etc., can be obtained in such a process as mass production by carefully controlling the production procedure.*

Now let us see what features of these empirical sequences which arise from "randomizing" processes can be abstracted into a mathematical theory -- probability theory. If we take the first n numbers in an empirical sequence of numbers X_1, X_2, X_3, ..., X_n, ..., there will be a certain fraction of them, say F_n(x), less than or equal to x, no matter what value of x is taken. For each value of x, 0 ≤ F_n(x) ≤ 1. We shall refer to F_n(x) as the empirical cumulative distribution function of the numbers X_1, X_2, X_3, ..., X_n, .... As x increases, F_n(x) will either increase or remain constant. It is a matter of experience that as n becomes larger and larger F_n(x) becomes more and more stable, appearing to approach some limit, say F_∞(x), for each value of x.
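For concreteness, a minimal sketch of how F_n(x) is computed and how it stabilizes as n grows (Python is used here; the simulated die throws and the point x = 3 are assumptions made only for this illustration):

    import random

    def empirical_cdf(xs, x):
        """F_n(x): the fraction of the observed numbers that are <= x."""
        return sum(1 for v in xs if v <= x) / len(xs)

    random.seed(0)
    throws = [random.randint(1, 6) for _ in range(10000)]  # successive die throws

    # F_n(3) should stabilize near 3/6 = 0.5 as n increases.
    for n in (10, 100, 1000, 10000):
        print(n, empirical_cdf(throws[:n], 3))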



*Shewhart has developed a statistical method of quality control in mass production engineering which is essentially a practical empirical procedure for approximating a state of randomness (statistical control, to use Shewhart's term) for a given measurement in a sequence of articles from a production line, by successively identifying and eliminating causes of peculiarities in the sequence back in the materials and manufacturing operations.






If any subsequence of the original sequence is chosen "at random" (i.e. according to any rule which does not depend on the values of the X's) then a corresponding F_n(x) can be defined for the subsequence, and again we know from experience that as n increases, F_n(x) for the subsequence appears to approach the same limit for each value of x as in the original sequence.

Entirely similar experimental evidence exists for situations in which the empirical sequences are sequences of pairs, triples, or sets of k numbers, rather than sequences of single numbers. For example, a sequence of throws of pairs of dice would give rise to a sequence of pairs of numbers; the resistance, capacity, and inductance of each relay in a sequence of telephone relays from a carefully controlled production line would yield a sequence of triples of measurements. In considering a random sequence of pairs of numbers (X_{11}, X_{21}), (X_{12}, X_{22}), ..., (X_{1n}, X_{2n}), ..., we can let F_n(x_1, x_2) be the proportion of pairs in the first n pairs in which the value of X_1 is less than or equal to x_1 and the value of X_2 is less than or equal to x_2. We need not list all of the properties of F_n(x_1, x_2), for they are straightforward extensions of those of F_n(x) considered above. The important point here is that as n increases, experience indicates that F_n(x_1, x_2) appears to approach some limit F_∞(x_1, x_2) for each value of x_1 and of x_2.

In particular, suppose we group the numbers of an empirical random sequence X_1, X_2, ..., X_n, ... (with empirical cumulative distribution function F_n(x)) into pairs (or samples of two numbers), so as to make a new sequence of pairs of numbers (X_1, X_2), (X_3, X_4), ..., (X_{2n−1}, X_{2n}), .... As before, we have an empirical cumulative distribution function F_n(x_1, x_2) for this sequence of pairs. It is an experimental fact that as n becomes larger and larger, F_n(x_1, x_2) behaves more and more nearly like the product F_n(x_1) F_n(x_2). A similar situation is true for sequences of samples of three or more numbers. As we shall see later, it is this product property that suggests a way to set up a mathematical theory of sampling.
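A quick numerical sketch of this product property (again in Python; the simulated die throws and the points x_1 = 3, x_2 = 5 are chosen only for illustration):

    import random

    random.seed(1)
    xs = [random.randint(1, 6) for _ in range(20000)]
    pairs = list(zip(xs[0::2], xs[1::2]))   # group the sequence into pairs

    x1, x2 = 3, 5
    # Empirical c. d. f. of the pairs, compared with the product of the
    # univariate empirical c. d. f.'s of the original sequence.
    joint = sum(1 for a, b in pairs if a <= x1 and b <= x2) / len(pairs)
    f1 = sum(1 for v in xs if v <= x1) / len(xs)
    f2 = sum(1 for v in xs if v <= x2) / len(xs)
    print(joint, f1 * f2)   # nearly equal for large n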

The matter of F_n(x) appearing to approach some function F_∞(x) as n increases is purely an empirical phenomenon, and not a mathematical one, but it suggests a way of setting up a mathematical model corresponding to any randomizing process which, upon repeated application, will yield an empirical sequence of numbers. We postulate the existence of a function F(x) (the properties of this function are given in §2.11) to serve as a mathematical model for F_∞(x). In some situations such as coin tossing, dice throwing, etc., a complete numerical specification of F(x) can be proposed by combinatorial and other a priori considerations. In other situations of a more purely statistical nature it may be impossible to specify F(x) beyond a particular functional form involving certain parameters.






In attempting to relate the behavior of the empirical cumulative distribution function F_n(x) to the mathematical abstraction F(x) one encounters at least two difficulties: One is common to all mathematical theories of physical (chemical, biological, sociological) phenomena employing limits: the mathematical process of passing through an infinite number of steps is physically unrealizable, and is often impossible even as a "thought-experiment". For example, let the reader consider the notion of mass or charge density in the light of the fact that mass and charge are discrete. The other difficulty is peculiar to probability theory in that the theory does not assert that lim_{n→∞} F_n(x) = F(x), but that the approach is in a sense defined within the framework of the theory itself: F_n(x) converges stochastically to F(x). Stochastic convergence is defined in §4.21.

Once F(x) has been postulated, the mathematics begins and it consists of carrying out various mathematical manipulations on F(x) corresponding to certain operations which can be performed on the sequence produced by the given randomizing process. The mathematics then becomes a method of making predictions of what will happen if certain operations are applied to the sequence. For example, F(b) − F(a) is a prediction of the proportion of times, in a large number of trials, that the given process will yield numbers greater than a and less than or equal to b; ∫_{−∞}^{+∞} x dF(x) (taken in the Stieltjes sense, §2.5) is a prediction of the average of numbers obtained in a long series of repeated applications of the process; F(x_1)F(x_2) is a prediction of the proportion of samples of pairs of numbers, out of a large number of such pairs, in which the first number is ≤ x_1 and the second ≤ x_2; ∫∫_R dF(x_1) dF(x_2), where R is the region in the x_1x_2-plane for which A ≤ (x_1 + x_2)/2 ≤ B, is a prediction of the proportion of samples of pairs of numbers, out of a large number of such pairs, in which the average of the sample pair lies between A and B. Many other examples could be given here but these will perhaps illustrate the nature of the correspondence between the mathematical operations performed on F(x) (i. e. probability theory) and calculations based on the results of repeated applications of a given randomizing process. The degree of correspondence, i. e. validity of prediction, depends on the degree of randomness in the empirical sequence and on how well the function F(x) has been chosen. That such predictions, correctly applied, have practical validity has been experimentally verified many times.
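For instance, under the fair-die model (each face carrying probability 1/6 -- an assumption made only for this illustration), these predictions take concrete values:

    F(4) − F(2) = 4/6 − 2/6 = 1/3 (predicted proportion of throws with 2 < X ≤ 4),

    ∫_{−∞}^{+∞} x dF(x) = Σ_{k=1}^{6} k · (1/6) = 3.5 (predicted long-run average).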



*See a study by V. I. Smirnoff, "Sur les écarts de la courbe de distribution empirique", Recueil Mathématique, Moscow, vol. 6 (1939), pp. 25-26.



CHAPTER II 
DISTRIBUTION FUNCTIONS 

In this chapter we outline the basic probability theory necessary for the work 
of the course. The treatment is general, the study of important particular distributions 
being postponed to the next chapter. 

2.1 Cumulative Distribution Functions 

In the previous chapter we have introduced the notion of an empirical cumulative distribution function (c. d. f.) F_n(x), and have indicated that it is an experimental fact that F_n(x) appears to approach a limiting form F_∞(x) as n is increased. We now define a mathematical model F(x) for the intuitively apprehended F_∞(x) by laying down postulates for distribution functions. Henceforth the term cumulative distribution function (c. d. f.) will be used only in the sense defined below.

We shall find it convenient to use the following notations and definitions from point set theory: P ∈ E signifies that the point P belongs to the set E. E_1 ⊃ E_2 is read "E_1 contains E_2". The sum (or union) of E_1 and E_2 is the totality of points P for which P ∈ E_1 or P ∈ E_2; we shall denote it by E_1 + E_2. The product (or intersection) of E_1 and E_2 is the totality of points P for which P belongs to both E_1 and E_2; we write it E_1E_2. E_1 and E_2 are said to be disjoint if they have no points in common. The difference E_1 − E_2 is the totality of points in E_1 not in E_2.

2.11 Univariate Case

A c. d. f. F(x) is defined by the following postulates:

1) If x' < x'', then F(x'') − F(x') ≥ 0.

2) F(−∞) = 0, F(+∞) = 1.

The notation in (2) implies that the limits of F(x) exist as x → −∞ or +∞. Since (1) means that F(x) is monotone, it follows that F(x) has at most an enumerable number of discontinuities, and that the limits F(x+0), F(x−0) exist everywhere. The determination of the values of F(x) at its discontinuities is really not essential, but it will be convenient to fix them by

3) F(x+0) = F(x).






It follows from (1) and (2) that F(x) is non-negative.

The relation between probability statements about a random variable* X and its c. d. f. is determined by the following further postulates:

1') Pr(X ≤ x) = F(x).

The left member is read "the probability that X ≤ x." Let E_1, E_2, ..., be a finite or enumerable number of disjoint point sets on the x-axis:

2') Pr(X ∈ E_1 + E_2 + ...) = Pr(X ∈ E_1) + Pr(X ∈ E_2) + ... This may be called the law of complete additivity, and may be used to determine the term on the left side of the equation, or any term on the right, when all the other probabilities entering the equation are known. For example, let I be the interval x' < x ≤ x'', I' be the interval −∞ < x ≤ x', I'' be the interval −∞ < x ≤ x''. Then

    I'' = I' + I.

From (1')

    Pr(X ∈ I') = F(x'), Pr(X ∈ I'') = F(x''),

and hence from (2') we may state the theorem

A) Pr(x' < X ≤ x'') = F(x'') − F(x').



In order to find the probability that X be equal to a given value x', take a sequence of points a_1 < a_2 < a_3 < ... converging to x'. Let I be the interval a_1 < x ≤ x', and I_j be the interval a_j < x ≤ a_{j+1}. Then

    I = I_1 + I_2 + ... + {x'}.

Hence from (2'),

    Pr(X ∈ I) = Pr(X = x') + Σ_{j=1}^{∞} Pr(X ∈ I_j),

and from theorem (A),

    F(x') − F(a_1) = Pr(X = x') + Σ_{j=1}^{∞} [F(a_{j+1}) − F(a_j)].

*In this chapter it is convenient to denote a random variable by a capital letter, X, etc., and the corresponding independent variable in the distribution function by the corresponding lower case letter, x, etc. In later chapters we will drop this convention when there is no danger of confusion.






Now

    Σ_{j=1}^{∞} [F(a_{j+1}) − F(a_j)] = lim_{n→∞} Σ_{j=1}^{n} [F(a_{j+1}) − F(a_j)] = lim_{n→∞} [F(a_{n+1}) − F(a_1)] = F(x'−0) − F(a_1).

Hence we have the theorem

B) Pr(X = x) = F(x) − F(x−0).

In a similar manner one may derive the following theorems:

C) Pr(x' < X < x'') = F(x''−0) − F(x'),
   Pr(x' ≤ X ≤ x'') = F(x'') − F(x'−0),
   Pr(x' ≤ X < x'') = F(x''−0) − F(x'−0).

D) 0 ≤ Pr(X ∈ E) ≤ 1 (for any set E for which the middle member is defined).

E) Pr(−∞ < X < +∞) = 1.
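As a simple check of theorem (B), take F(x) to be the c. d. f. of a fair die (jumps of 1/6 at x = 1, 2, ..., 6, chosen only for illustration). Then

    Pr(X = 3) = F(3) − F(3−0) = 3/6 − 2/6 = 1/6,

while at any point x which is not one of the six jump points, F(x) = F(x−0) and Pr(X = x) = 0.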

Let E_1, E_2, ..., be sets which are not necessarily disjoint; then

F) Pr(X ∈ E_1 + E_2) = Pr(X ∈ E_1) + Pr(X ∈ E_2) − Pr(X ∈ E_1E_2),

   Pr(X ∈ E_1 + E_2 + E_3) = Σ Pr(X ∈ E_j) − Σ Pr(X ∈ E_iE_j) + Pr(X ∈ E_1E_2E_3), etc.



We now characterize two important classes of c. d. f.'s:

(i) Suppose that F(x) increases only by jumps; more precisely, suppose there is a finite or an enumerable set of points x_1, x_2, ..., and corresponding positive numbers p_1, p_2, ..., Σp_j = 1, such that F(x) = Σp_j, summed over all j for which x_j ≤ x. We shall call this the discrete case. It may be shown from the theorem (B) that in this case Pr(X = x_i) = p_i, while for any point x' ≠ any x_i, Pr(X = x') = 0.

If the number of x_i is finite, or more generally, if the x_i have no cluster points except ±∞, then the graph of F(x) in this case is a step-function made up of horizontal lines as shown in (a) of Figure 1. The jump at x = x_i is equal to p_i, the



*It should be noted that an empirical c. d. f. F_n(x) of an observation variable X has properties (1), (2), (3) of a c. d. f. (discrete case). F_n(x) does not have properties (1') and (2'), although it has analogous properties. That is, corresponding to (1') we would have Prop(X ≤ x) = F_n(x) (proportion of values of X ≤ x is F_n(x)) and for (2') we would have Prop(X ∈ E_1 + E_2 + ... + E_k) = Prop(X ∈ E_1) + Prop(X ∈ E_2) + ... + Prop(X ∈ E_k). Thus, in the case of F_n(x), p_i would be the proportion of cases among the n values of X in which the observation variable X = x_i, and not the probability that X = x_i.






probability that X = x_i.

(ii) Another case is characterized by the existence of a function f(x) ≥ 0 such that

    F(x) = ∫_{−∞}^{x} f(ξ) dξ.

This is really a necessary and sufficient condition for the absolute continuity of F(x), but instead of calling this the absolutely continuous case, we shall refer to it merely as the continuous case. The graph of F(x) in this case is continuous as shown in (b) of Figure 1. We shall call f(x) the probability density function of the random variable X. The reader may show that in this case

    Pr(x' ≤ X ≤ x'') = ∫_{x'}^{x''} f(ξ) dξ,

and that the statement remains valid if one or both of the equality signs inside the parentheses on the left are deleted. If f(x) is continuous for x' ≤ x ≤ x'',

    Pr(x' ≤ X ≤ x'') = (x'' − x') f(x_1), where x' < x_1 < x'',

and if f(x) is continuous at x_0,

    Pr(x_0 ≤ X ≤ x_0 + dx) = f(x_0) dx,

except for infinitesimals of higher order. The infinitesimal f(x)dx is sometimes called the probability element of X.
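For example, if f(x) = e^{−x} for x > 0 and f(x) = 0 elsewhere (a density used again in §2.61 below), then F(x) = 1 − e^{−x} for x > 0, and

    Pr(1 < X ≤ 2) = F(2) − F(1) = e^{−1} − e^{−2} ≈ 0.2325,

while the probability element at x_0 = 1 is f(1)dx = e^{−1}dx.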

The discrete and continuous cases thus defined obviously do not cover all univariate c. d. f.'s, but we shall confine ourselves to these in the present course.
[Figure 1: (a) graph of F(x) in the discrete case -- a step function with jump p_i at x = x_i; (b) graph of F(x) in the continuous case.]
2.12 Bivariate Case

Let J be a rectangle in the x_1, x_2 plane, x'_1 < x_1 ≤ x''_1, x'_2 < x_2 ≤ x''_2. Denote by Δ²_J F(x_1, x_2) the second difference

    Δ²_J F(x_1, x_2) = F(x''_1, x''_2) − F(x'_1, x''_2) − F(x''_1, x'_2) + F(x'_1, x'_2).

Then a c. d. f. F(x_1, x_2) is subjected to the following postulates:

1) Δ²_J F(x_1, x_2) ≥ 0.

2) F(−∞, x_2) = F(x_1, −∞) = 0, F(+∞, +∞) = 1.

By letting x'_1 → −∞ in (1), we get with the aid of (2),

    F(x_1, x''_2) − F(x_1, x'_2) ≥ 0 if x''_2 > x'_2,

and similarly

    F(x''_1, x_2) − F(x'_1, x_2) ≥ 0 if x''_1 > x'_1,

so that F(x_1, x_2) is monotonic in each variable separately. Hence the limits F(x_1+0, x_2), F(x_1, x_2+0) exist everywhere. It can be shown that F(x_1, x_2) is discontinuous in x_1 at worst on an enumerable number of lines x_1 = constant, and similarly for x_2. If we let x'_1 → −∞ and x'_2 → −∞ in (1), we get F(x''_1, x''_2) ≥ 0 because of (2). The values of F(x_1, x_2) at its discontinuities are fixed by

3) F(x_1, x_2) = F(x_1+0, x_2) = F(x_1, x_2+0).

The tieup of probability statements about a vector random variable (X_1, X_2) with two components with its c. d. f. is determined by the following further postulates:

1') Pr(X_1 ≤ x_1, X_2 ≤ x_2) = F(x_1, x_2).

Let E_1, E_2, ..., be disjoint sets; then

2') Pr(X_1, X_2 ∈ E_1 + E_2 + ...) = Pr(X_1, X_2 ∈ E_1) + Pr(X_1, X_2 ∈ E_2) + ...

By methods of §2.11 the reader may verify the following theorems:

A) Pr(X_1, X_2 ∈ J) = Δ²_J F(x_1, x_2),

where J and Δ²_J are defined above.

B) Pr(x'_1 < X_1 ≤ x''_1, X_2 = x_2) = F(x''_1, x_2) + F(x'_1, x_2−0) − F(x'_1, x_2) − F(x''_1, x_2−0).

C) Pr(X_1 = x_1, X_2 = x_2) = F(x_1, x_2) + F(x_1−0, x_2−0) − F(x_1−0, x_2) − F(x_1, x_2−0).

It can be shown by methods beyond the level of this course that from the postulates (1'), (2') the probability that X_1, X_2 ∈ E is determined for a very general class of regions


called Borel-measurable* regions.

D) 0 ≤ Pr(X_1, X_2 ∈ E) ≤ 1.

E) Pr(−∞ < X_1 < +∞, −∞ < X_2 < +∞) = 1.

For sets E_1, E_2, ..., not necessarily disjoint,

F) Theorem (F) of §2.11 is valid.

With a bivariate distribution function we shall be mainly interested in the discrete case and the continuous case, and occasionally a mixed case, all defined below. We remark again that these categories are not exhaustive.

i) The discrete** case is characterized by the existence of a finite or enumerable set of points (x_{1i}, x_{2i}), i = 1, 2, ..., and associated positive numbers p_i (probabilities), Σp_i = 1, such that F(x_1, x_2) = Σp_j, summed for all j for which x_{1j} ≤ x_1 and x_{2j} ≤ x_2. From theorem (C) it follows that Pr(X_1 = x_{1i}, X_2 = x_{2i}) = p_i, and for any point (x'_1, x'_2) not in the set {(x_{1i}, x_{2i})}, Pr(X_1 = x'_1, X_2 = x'_2) = 0.



ii) By the continuous case (see remarks in §2.11 about absolute continuity) we shall understand that in which there exists a function f(x_1, x_2) ≥ 0 such that

(a) F(x_1, x_2) = ∫_{−∞}^{x_2} ∫_{−∞}^{x_1} f(ξ_1, ξ_2) dξ_1 dξ_2.

We may show that

    Pr(X_1, X_2 ∈ J) = ∫∫_J f(x_1, x_2) dx_1 dx_2,

*In k-dimensional space a Borel-measurable region (or a Borel set) is one that is obtainable from half-open intervals or cells, x'_i < x_i ≤ x''_i, i = 1, 2, ..., k, by taking a finite or enumerable number of sums, differences, and products of such cells. A function f(x) is Borel-measurable if the set of values of x for which a < f(x) ≤ b is a Borel set, where a and b are any two real numbers. A Borel-measurable function of two or more variables is similarly defined.

**As in the case of one variable, it should be observed that an empirical c. d. f. F_n(x_1, x_2) of two observation variables X_1, X_2 has properties (1), (2), (3) of a c. d. f. for two random variables (discrete case). But, in (1') and (2') one would use the term "proportion of cases" instead of the term "probability". The p_i associated with the isolated points (x_{1i}, x_{2i}) would be called the proportion of cases for which X_1 = x_{1i}, X_2 = x_{2i}, instead of the probability that X_1 = x_{1i}, X_2 = x_{2i}. The number of such points would be ≤ n, the number of observed pairs of values of X_1, X_2.

This comparison of an empirical c. d. f. and the case of discrete variables extends at once to the case of k variables discussed in §2.13.



and that the result is not invalidated if J is closed by the addition of its boundaries. From this it follows that, except for infinitesimals of higher order,

    Pr(x_1 ≤ X_1 ≤ x_1 + dx_1, x_2 ≤ X_2 ≤ x_2 + dx_2) = f(x_1, x_2) dx_1 dx_2.

f(x_1, x_2) and f(x_1, x_2)dx_1dx_2 are called respectively the p. d. f.* and the probability element of the random variables X_1, X_2.

iii) The mixed** case (X_1 continuous, X_2 discrete) is said to obtain if there exists a finite or enumerable set of lines x_2 = x_{2i}, i = 1, 2, ..., associated positive numbers p_{2i}, Σp_{2i} = 1, and a non-negative function of x_1 and x_2 defined for all x_1 and for x_2 = x_{2i}, i = 1, 2, ..., which function we shall write as f(x_1|x_{2i}), such that

    ∫_{−∞}^{+∞} f(ξ_1|x_{2i}) dξ_1 = 1, i = 1, 2, ...,

and

    F(x_1, x_2) = Σ p_{2j} ∫_{−∞}^{x_1} f(ξ_1|x_{2j}) dξ_1, summed over all j for which x_{2j} ≤ x_2.

In the mixed case p_{2i} is the probability that the random point (X_1, X_2) will fall on the line x_2 = x_{2i}, and f(x_1|x_{2i})dx_1 is the probability (to within terms of order dx_1) that x_1 < X_1 ≤ x_1 + dx_1 if the random point falls on the line x_2 = x_{2i}.

It may be shown from our postulates that for any (B-meas.) region E in the x_1x_2-plane we get in the three cases

    i) Pr(X_1, X_2 ∈ E) = Σ p_j, summed over all j such that (x_{1j}, x_{2j}) ∈ E;

    ii) Pr(X_1, X_2 ∈ E) = ∫∫_E f(x_1, x_2) dx_1 dx_2;

    iii) Pr(X_1, X_2 ∈ E) = Σ_i p_{2i} ∫_{E_{2i}} f(x_1|x_{2i}) dx_1, where E_{2i} is the projection on the x_1-axis of the part of the line x_2 = x_{2i} lying in E. (If the line does not intersect E, the corresponding integral is zero.)

By means of the Stieltjes integral (§2.5) these three cases may be brought under the single expression Pr(X_1, X_2 ∈ E) = ∫_E dF(x_1, x_2), which includes indeed the most general case.
2.13 k-Variate Case

A k-variate c. d. f. F(x_1, x_2, ..., x_k) must satisfy the following three postulates: Let J be the k-dimensional cell x'_i < x_i ≤ x''_i, i = 1, 2, ..., k, and define the k-th difference

*probability density function

**The reader will understand this case better if he rereads this description after having mastered §2.4.



    Δ^k_J F(x_1, x_2, ..., x_k) = Δ_1Δ_2···Δ_{k−1}Δ_k F(x_1, x_2, ..., x_k),

where the operators Δ_i are applied successively and denote

    Δ_i F(..., x_i, ...) = F(..., x''_i, ...) − F(..., x'_i, ...).

1) Δ^k_J F(x_1, x_2, ..., x_k) ≥ 0.

2) F(−∞, x_2, ..., x_k) = F(x_1, −∞, x_3, ..., x_k) = ... = F(x_1, ..., x_{k−1}, −∞) = 0, F(+∞, +∞, ..., +∞) = 1.

3) F(x_1, ..., x_i, ..., x_k) = F(x_1, ..., x_i+0, ..., x_k), i = 1, 2, ..., k.

As in the bivariate case it can be shown from (1) and (2) that F is monotonic in each variable separately and that F is monotonic (in the sense of (1)) in any set of variables if the remainder are held fixed.

A random vector variable X = (X_1, X_2, ..., X_k) is said to have the c. d. f. F(x_1, x_2, ..., x_k) -- or the random variables X_1, X_2, ..., X_k are said to be jointly distributed with the c. d. f. -- if furthermore

1') Pr(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_k ≤ x_k) = F(x_1, x_2, ..., x_k).

If E_1, E_2, ..., are a finite or enumerable number of disjoint sets,

2') Pr(X ∈ E_1 + E_2 + ...) = Pr(X ∈ E_1) + Pr(X ∈ E_2) + ...

By the methods used before we may now generalize the theorems (A) to (F) of §2.11 and §2.12.

The discrete case and the continuous case are defined by obvious generalization of §2.12, and it is evident how mixed cases of various orders would now be defined.

2.2 Marginal Distributions

Suppose the joint c. d. f. of the random variables X_1, X_2 is F(x_1, x_2), and consider the probability that X_1 ≤ x_1, without any condition on X_2:

    Pr(X_1 ≤ x_1) = Pr(X_1 ≤ x_1, X_2 < +∞) = F(x_1, +∞).

This is called the marginal distribution of X_1. We note that it is a bona fide distribution function as defined in §2.11; in fact, it is the univariate c. d. f. of X_1. Similarly, we define F(+∞, x_2) as the marginal distribution of X_2.

For the discrete case defined in §2.12 we then have

    F(x_1, +∞) = Σp_j, summed for all j such that x_{1j} ≤ x_1.



For the continuous case,

(a) F(x_1, +∞) = ∫_{−∞}^{x_1} ∫_{−∞}^{+∞} f(ξ_1, ξ_2) dξ_2 dξ_1 = ∫_{−∞}^{x_1} f_1(ξ_1) dξ_1,

where

    f_1(x_1) = ∫_{−∞}^{+∞} f(x_1, ξ_2) dξ_2.

f_1(x_1) may be called the marginal p. d. f. of X_1.

In the trivariate case we get besides the marginal distribution of each random variable separately, for example,

    F(x_1, +∞, +∞),

also marginal distributions of pairs of random variables, for example,

    F(x_1, x_2, +∞).

For a k-variate distribution one likewise defines marginal distributions of the random variables taken one at a time, in pairs, ..., k−1 at a time. We note that all these marginal distributions satisfy the postulates (1), (2), (3), (1'), (2') for a c. d. f.
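As an illustration (the density is an assumption made only for this example), let f(x_1, x_2) = e^{−x_1−x_2} for x_1, x_2 > 0 and 0 elsewhere. Then

    f_1(x_1) = ∫_0^{+∞} e^{−x_1−ξ_2} dξ_2 = e^{−x_1}, x_1 > 0,

so the marginal distribution of X_1 is F(x_1, +∞) = 1 − e^{−x_1}.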

2.3 Statistical Independence

If F(x_1, x_2) is the c. d. f. of X_1, X_2, then from §2.2,

    F_1(x_1) = F(x_1, +∞), F_2(x_2) = F(+∞, x_2)

are the marginal distributions of X_1 and X_2, respectively. We say that the random variables X_1, X_2 are independent in the probability sense, or statistically independent, if

(a) F(x_1, x_2) = F_1(x_1) F_2(x_2).

It is easily seen that a necessary and sufficient condition for the statistical independence of X_1 and X_2 is that their joint c. d. f. factor into a function of x_1 alone times a function of x_2 alone, i. e.,

    F(x_1, x_2) = G(x_1) H(x_2).

In order to see the probability implications of statistical independence, consider any two intervals I_1 and I_2 on the x_1- and x_2-axes, respectively,

    I_1: x'_1 < x_1 ≤ x''_1, I_2: x'_2 < x_2 ≤ x''_2,



and let J be the rectangle of points (x_1, x_2) satisfying both these inequalities. Then

(b) Pr(X_1 ∈ I_1, X_2 ∈ I_2) = Pr(X_1 ∈ I_1) Pr(X_2 ∈ I_2).

For, by hypothesis, we have (a); hence

    Pr(X_1, X_2 ∈ J) = Δ²_J F(x_1, x_2) = F_1(x''_1)F_2(x''_2) + F_1(x'_1)F_2(x'_2) − F_1(x'_1)F_2(x''_2) − F_1(x''_1)F_2(x'_2).

After factoring the right member we easily get (b).

By the same method, and with the aid of Theorem (B) of §2.11 and Theorem (C) of §2.12, we get that if X_1 and X_2 are statistically independent, then

    Pr(X_1 = x_1, X_2 = x_2) = Pr(X_1 = x_1) Pr(X_2 = x_2).

This is of importance for the discrete case. For the continuous case we may state the following result: If f(x_1, x_2) is the joint p. d. f. of X_1, X_2, if f_j(x_j) is the marginal p. d. f. of X_j, j = 1, 2, and if X_1, X_2 are statistically independent, then

    f(x_1, x_2) = f_1(x_1) f_2(x_2)

wherever f(x_1, x_2) is continuous. At the points of continuity, we have from equation (a) of §2.12,

    f(x_1, x_2) = ∂²F(x_1, x_2)/∂x_1∂x_2 = ∂²[F_1(x_1)F_2(x_2)]/∂x_1∂x_2 = (dF_1(x_1)/dx_1)(dF_2(x_2)/dx_2) = f_1(x_1) f_2(x_2),

the last step following from (a) of §2.2.

k random variables are said to be mutually (statistically) independent if their joint c. d. f. is of the form

    F(x_1, x_2, ..., x_k) = F_1(x_1)F_2(x_2)···F_k(x_k),

where F_j(x_j) is the marginal distribution of X_j. Two random vector variables X_i = (X_{i1}, X_{i2}, ..., X_{ik_i}), i = 1, 2, are called statistically independent if the joint c. d. f. of the k_1 + k_2 components is the product of the marginal distributions of X_1 and X_2:

    F(x_{11}, ..., x_{1k_1}; x_{21}, ..., x_{2k_2}) = F_1(x_{11}, ..., x_{1k_1}) F_2(x_{21}, ..., x_{2k_2}).

The definition of the statistical independence of n vector random variables is made as



the obvious generalization.

The concept of statistical independence is fundamental in sampling theory: n random variables are said to constitute a random sample from a population (§4.1) with c. d. f. F(x) if their joint c. d. f. is F(x_1)F(x_2)···F(x_n). If the population distribution is k-variate with c. d. f. F(x_1, x_2, ..., x_k), then the n vector variables X_i = (X_{i1}, X_{i2}, ..., X_{ik}), i = 1, 2, ..., n, are said to be a random sample if the joint c. d. f. of the set {X_{iα}} is

    ∏_{i=1}^{n} F(x_{i1}, x_{i2}, ..., x_{ik}).
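Continuing the example of §2.2 (an illustration only): the joint c. d. f. there is F(x_1, x_2) = (1 − e^{−x_1})(1 − e^{−x_2}) for x_1, x_2 > 0, which factors into a function of x_1 alone times a function of x_2 alone; hence X_1 and X_2 are statistically independent. Likewise a random sample X_1, X_2, ..., X_n from the population with c. d. f. F(x) = 1 − e^{−x} has joint c. d. f. ∏_{i=1}^{n}(1 − e^{−x_i}).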

2.4 Conditional Probability

Let X be a random variable, and let R be any (Borel) set of points on the x-axis. Let E be any (Borel) subset of R. If Pr(X ∈ R) ≠ 0, we define the conditional probability Pr(X ∈ E | X ∈ R), read "the probability that X is in E, given that X is in R", as

(a) Pr(X ∈ E | X ∈ R) = Pr(X ∈ E) / Pr(X ∈ R).

The definition (a) extends immediately to any finite number of random variables. For example, for two random variables X_1, X_2, R would represent a (Borel) set in the x_1x_2-plane and E would be a subset of R.
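As a small worked example (the uniform distribution is assumed only for illustration): let X be uniformly distributed on (0, 1), R the set 0 < x ≤ 1/2, and E the set 0 < x ≤ 1/4. Then

    Pr(X ∈ E | X ∈ R) = Pr(X ∈ E) / Pr(X ∈ R) = (1/4) / (1/2) = 1/2.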

Of particular interest is the case in which R is a set in the x_1x_2-plane for which X_1 ∈ E_1, where E_1 is any (Borel) set in the domain of X_1, and E is the product or intersection set between R and a similar set for which X_2 ∈ E_2, where E_2 is any (Borel) set in the domain of X_2. Here we may write E = E_1E_2. The simplest case is that in which E_1 is an interval x'_1 < x_1 ≤ x''_1 and E_2 is an interval x'_2 < x_2 ≤ x''_2. Then R is the horizontal strip x'_2 < x_2 ≤ x''_2, and E is the rectangle for which x'_1 < x_1 ≤ x''_1 and x'_2 < x_2 ≤ x''_2. In the present case, expression (a) may be written in the form

(b) Pr(X_1 ∈ E_1 | X_2 ∈ E_2) = Pr(X_1, X_2 ∈ E) / Pr(X_2 ∈ E_2).

Because of symmetry, we may also write

    Pr(X_2 ∈ E_2 | X_1 ∈ E_1) = Pr(X_1, X_2 ∈ E) / Pr(X_1 ∈ E_1).

In a similar manner we may write for the case of three variables

    Pr(X_1 ∈ E_1 | X_2 ∈ E_2, X_3 ∈ E_3) = Pr(X_1, X_2, X_3 ∈ E_1E_2E_3) / Pr(X_2 ∈ E_2, X_3 ∈ E_3),

and so on for any number of variables. The relation (b) may of course be expressed in


terms of distribution functions. In particular, if X_1, X_2 have a bivariate p. d. f. f(x_1, x_2), and E_1 is the set

(c) x'_1 ≤ x_1 ≤ x''_1

on the x_1-axis, and E_2 is the set

(d) x ≤ x_2 ≤ x + h

on the x_2-axis, then E is the rectangle in the x_1x_2-plane defined by (c) and (d). Equation (b) becomes

(e) Pr(x'_1 ≤ X_1 ≤ x''_1 | x ≤ X_2 ≤ x + h) = [∫_x^{x+h} ∫_{x'_1}^{x''_1} f(ξ_1, ξ_2) dξ_1 dξ_2] / [∫_x^{x+h} f_2(ξ_2) dξ_2],

if the denominator does not vanish. If f(x_1, x_2) is continuous in the rectangle E, the denominator may be written

    h f_2(ξ_2), where x < ξ_2 < x + h,

and the numerator,

    h ∫_{x'_1}^{x''_1} f(x_1, η_2(x_1)) dx_1, where x < η_2(x_1) < x + h.

(e) may then be written

(f) ∫_{x'_1}^{x''_1} [f(x_1, η_2(x_1)) / f_2(ξ_2)] dx_1.

We note that the integrand, for fixed x and h, has the properties of a univariate p. d. f. We next assume that f_2(x_2) ≠ 0. Noting that Pr(x'_1 ≤ X_1 ≤ x''_1 | X_2 = x_2) is not defined by (b), we now define it as the limit of (e) as h → 0. The continuity we have already assumed is sufficient to justify our taking limits under the integral sign in (f); the result is

    ∫_{x'_1}^{x''_1} f(x_1 | x_2) dx_1,

where

(g) f(x_1 | x_2) = f(x_1, x_2) / f_2(x_2).



For fixed x_2, f(x_1 | x_2) again has the properties of a univariate p. d. f.; it may be called the conditional p. d. f. of x_1, given x_2. We note that if X_1 and X_2 are statistically independent,

    f(x_1 | x_2) = f_1(x_1).

Likewise, if random variables X_{11}, ..., X_{1k}; X_{21}, ..., X_{2k} have a joint p. d. f. f(x_{11}, ..., x_{1k}; x_{21}, ..., x_{2k}) we define the conditional p. d. f.

(h) f(x_{11}, ..., x_{1k} | x_{21}, ..., x_{2k}) = f(x_{11}, ..., x_{1k}; x_{21}, ..., x_{2k}) / f_2(x_{21}, ..., x_{2k}),

where

    f_2(x_{21}, ..., x_{2k}) = ∫···∫ f(ξ_{11}, ..., ξ_{1k}; x_{21}, ..., x_{2k}) dξ_{11}···dξ_{1k},

if the denominator is not zero.
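As a worked instance of (g) (the density is chosen only for illustration), let f(x_1, x_2) = x_1 + x_2 on the unit square 0 < x_1, x_2 < 1 and 0 elsewhere. Then f_2(x_2) = ∫_0^1 (x_1 + x_2) dx_1 = 1/2 + x_2, and

    f(x_1 | x_2) = (x_1 + x_2) / (1/2 + x_2), 0 < x_1 < 1,

which, for each fixed x_2, is indeed a univariate p. d. f. in x_1.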

2.5 The Stieltjes Integral

An important tool in mathematical statistics, which often permits a common treatment of the discrete and continuous cases (and indeed the most general case), is the Stieltjes integral.

2.51 Univariate Case

We begin by defining the Stieltjes integral over a finite half-open interval a < x ≤ b: Suppose we have two functions, φ(x) continuous for a ≤ x ≤ b, and F(x) monotone for a ≤ x ≤ b. We subdivide (a, b) into subintervals I_j: (x_{j−1}, x_j) by means of points x_0 = a < x_1 < x_2 < ... < x_m = b. In each interval we pick an arbitrary point ξ_j ∈ I_j. Denote by Δ_j F(x) the difference F(x_j) − F(x_{j−1}), and form the sum

    S = Σ_{j=1}^{m} φ(ξ_j) Δ_j F(x).

If U_j is the maximum of φ(x) in I_j, and L_j the minimum, then

    S_L ≤ S ≤ S_U,

where

    S_L = Σ_j L_j Δ_j F(x), S_U = Σ_j U_j Δ_j F(x).

Let ε = max_j (U_j − L_j). Then

(a) 0 ≤ S_U − S_L = Σ_j (U_j − L_j) Δ_j F(x) ≤ ε Σ_j Δ_j F(x) = ε [F(b) − F(a)].



Hence if the intervals I_j are further subdivided, and this process is continued in such a way that the norm of the subdivision, δ = max_j (x_j − x_{j−1}), approaches zero, then since φ(x) is uniformly continuous on (a, b), ε → 0, and hence

    S_U − S_L → 0.

It is easily seen that S_L is non-decreasing, and S_U non-increasing, as the subdivision is made finer, and hence from (a), S approaches a limit. Since S_L and S_U are independent of the choice of the arbitrary point ξ_j in I_j, therefore from (a), lim_{δ→0} S is likewise independent of this choice. Furthermore, lim_{δ→0} S may be shown to be independent of the method of subdivision. We call this limit the Stieltjes integral of φ(x) with respect to F(x) over the range a < x ≤ b and denote it by

    ∫_a^b φ(x) dF(x) = lim_{δ→0} S.

Let us examine further the significance of the Stieltjes integral when F(x) is a c. d. f. in the discrete or continuous cases: Suppose that F(x) is a discrete c. d. f. with only a finite number n of jumps of amount p_k at the points a_k in the interval (a, b). We may assume that the points are ordered,

(b) a < a_1 < a_2 < ... < a_n ≤ b.

Since the points a_k are isolated, eventually for any mode of continued subdivision, each interval I_j will contain not more than one point a_k in its interior or as right end point. If I_j contains a_k, that is if x_{j−1} < a_k ≤ x_j, denote it by I_{j_k}, and call the arbitrary point ξ in this interval, ξ_{j_k}. Then

    Δ_j F(x) = p_k if I_j = I_{j_k}, and Δ_j F(x) = 0 if I_j is not an I_{j_k}.

Hence

    S = Σ_{k=1}^{n} φ(ξ_{j_k}) p_k.

Now as the norm δ → 0, ξ_{j_k} → a_k, φ(ξ_{j_k}) → φ(a_k), and thus

(c) ∫_a^b φ(x) dF(x) = Σ_{k=1}^{n} φ(a_k) p_k.

It will be noted that the continuity of φ(x) at the points a_k is essential. The result (c) may be shown to remain valid in the case where there is an infinite number of points of discontinuity of F(x) in (a, b).
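A small numerical sketch of (c) (Python again; the fair-die c. d. f. and the choice φ(x) = x² are assumptions made for illustration). A Riemann-Stieltjes sum over a fine subdivision reproduces Σφ(a_k)p_k:

    def F(x):
        """C. d. f. of a fair die: jumps of 1/6 at the points 1, 2, ..., 6."""
        return sum(1 for a in range(1, 7) if a <= x) / 6

    def phi(x):
        return x * x

    # S = sum of phi(xi_j) * [F(x_j) - F(x_{j-1})] over a fine subdivision
    # of the half-open interval (0, 7], taking xi_j as the right end point.
    m, a, b = 100000, 0.0, 7.0
    S, prev = 0.0, F(a)
    for j in range(1, m + 1):
        x = a + (b - a) * j / m
        cur = F(x)
        S += phi(x) * (cur - prev)
        prev = cur

    exact = sum(phi(k) / 6 for k in range(1, 7))   # formula (c)
    print(S, exact)   # both give 91/6 = 15.1666...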



In the continuous case at points of continuity of the p. d. f. f(x) we have

    dF(x)/dx = f(x), dF(x) = f(x)dx,

and hence we might write heuristically

(d) ∫_a^b φ(x) dF(x) = ∫_a^b φ(x) f(x) dx.

The relation (d) may be proved as follows: We first assume that f(x) is continuous on (a, b). Then in each interval I_j we pick as ξ_j the point for which

    Δ_j F(x) = F(x_j) − F(x_{j−1}) = f(ξ_j)(x_j − x_{j−1}).

The existence of such a point is guaranteed by the mean value theorem. Then

    S = Σ_j φ(ξ_j) f(ξ_j)(x_j − x_{j−1}).

But by the so-called fundamental theorem of the calculus (actually, the definition of the ordinary definite integral), the limit of this sum as the norm approaches zero is the right member of (d). The proof can be extended to the case where f(x) has discontinuities on (a, b).

We shall have need of the Stieltjes integral over an infinite interval. We define it as

(e) ∫_{−∞}^{+∞} φ(x) dF(x) = lim_{a→−∞, b→+∞} ∫_a^b φ(x) dF(x)

if and only if the limit exists as a → −∞ and b → +∞ independently. In more advanced work it is sometimes convenient to consider

(f) lim_{T→+∞} ∫_{−T}^{+T} φ(x) dF(x).

This limit of course exists whenever (e) does, but the converse is false. (f) is called the Cauchy principal value of the infinite integral. Unless the contrary is explicitly stated, we shall always understand that the infinite integral connotes (e).

An intuitive explanation of the meaning of the Stieltjes integral will be given in §2.53, where we shall also indicate how the Stieltjes integral may be generalized over any range which is a Borel set E. In the univariate case, the various expressions for Pr(X ∈ E) introduced in §2.11 may then all be summarized under

    Pr(X ∈ E) = ∫_E dF(x).



2.52 Bivariate Case

We limit our definition to the case where F(x_1, x_2) is a c. d. f. as defined in §2.12. Let J be the half-open cell

(a) J: a_1 < x_1 ≤ b_1, a_2 < x_2 ≤ b_2.

We assume φ(x_1, x_2) is continuous on J (boundaries included). By means of lines parallel to the axes, subdivide J into rectangles J_j, j = 1, 2, ..., m. Let the norm δ of the subdivision be the maximum of the lengths of the diagonals of J_j. In each cell J_j pick a point (ξ_{1j}, ξ_{2j}). Define Δ²_{J_j} F(x_1, x_2), the second difference of F(x_1, x_2) over the j-th cell, as in §2.12, and form the sum

    S = Σ_{j=1}^{m} φ(ξ_{1j}, ξ_{2j}) Δ²_{J_j} F(x_1, x_2).

By considering the upper and lower sums S_U and S_L, defined as in §2.51, we find again that lim_{δ→0} S exists, and define it to be the Stieltjes integral of φ with respect to F over J:

(b) ∫_J φ(x_1, x_2) dF(x_1, x_2) = lim_{δ→0} S.

The remarks in §2.51 regarding the independence of (b) of the choice of (ξ_{1j}, ξ_{2j}) and of the mode of subdivision remain valid.

As in §2.51 it may be shown that in the discrete case

    ∫_J φ(x_1, x_2) dF(x_1, x_2) = Σ φ(x_{1i}, x_{2i}) p_i,

where (x_{1i}, x_{2i}) are the points in J (excluding the left and lower boundaries) where the probabilities are p_i (see §2.12). In the continuous case we may derive

    ∫_J φ(x_1, x_2) dF(x_1, x_2) = ∫_{a_2}^{b_2} ∫_{a_1}^{b_1} φ(x_1, x_2) f(x_1, x_2) dx_1 dx_2.

In the mixed case defined in §2.12, and in the notation employed there, it may be shown that

    ∫_J φ(x_1, x_2) dF(x_1, x_2) = Σ_i p_{2i} ∫_{a_1}^{b_1} φ(x_1, x_{2i}) f(x_1 | x_{2i}) dx_1, summed for all i such that a_2 < x_{2i} ≤ b_2.

Denote by R_2 the entire x_1x_2-space. We say that the improper integral

    ∫_{R_2} φ(x_1, x_2) dF(x_1, x_2)



exists if and only if the limit

    lim ∫_J φ(x_1, x_2) dF(x_1, x_2)

exists, where J, a_i, b_i are related by (a), as a_1, a_2 → −∞ and b_1, b_2 → +∞ independently (with the signs indicated).

A generalization of the Stieltjes integral to regions more general than rectangles will be given in §2.53.

2.53 k-Variate Case

We first define the Stieltjes integral over a half-open cell,

(a) J: a_i < x_i ≤ b_i, i = 1, 2, ..., k.

We assume that F(x_1, x_2, ..., x_k) is a k-variate c. d. f. as defined in §2.13, and that φ(x_1, x_2, ..., x_k) is continuous in J (and on its boundaries). By means of hyperplanes x_i = constant, i = 1, 2, ..., k, we subdivide J into cells J_j, j = 1, 2, ..., m. Let δ be the length of the longest of the diagonals of the cells J_j. Define Δ^k_{J_j} F, the k-th difference of F over the cell J_j, as in §2.13, and form the sum

    S = Σ_{j=1}^{m} φ(ξ_{1j}, ..., ξ_{kj}) Δ^k_{J_j} F,

where (ξ_{1j}, ..., ξ_{kj}) is an arbitrary point in J_j. Under the hypotheses we have made, S converges to a limit independent of the choice of (ξ_{1j}, ..., ξ_{kj}) and of the mode of subdivision, as δ → 0. We define

    ∫_J φ(x_1, ..., x_k) dF(x_1, ..., x_k) = lim_{δ→0} S.

Let R_k be the entire x_1...x_k-space. The Stieltjes integral of φ with respect to F over R_k is defined as in §2.52.

Next, let us define the integral over a region K which is the sum of a finite or enumerable number of half-open cells J_i, i = 1, 2, ...,

    ∫_K φ dF = Σ_i ∫_{J_i} φ dF.

To define the integral over any (B-measurable) region E in R_k we cover E with a region of the type K just considered, and then take as the integral over E the greatest lower bound of the integral over K for all possible K containing E:


    ∫_E φ dF = g.l.b._{K ⊃ E} ∫_K φ dF.

In terms of our general definition of the Stieltjes integral we see that

    ∫_a^b φ(x) dF(x) = ∫_I φ(x) dF(x)

only if I is the half-open interval a < x ≤ b. For the closed interval we would have to add φ(a)[F(a) − F(a−0)] = φ(a)Pr(X = a) to the left member; for the open interval, subtract φ(b)[F(b) − F(b−0)] = φ(b)Pr(X = b).

Specializing now to the discrete case, we may say that the most general such case can be described as follows: There is a finite or enumerable number of points (x_{1j}, x_{2j}, ..., x_{kj}), j = 1, 2, ..., and associated positive numbers p_j, Σp_j = 1, such that

    F(x_1, ..., x_k) = Σp_j, summed over all j such that x_{1j} ≤ x_1, ..., x_{kj} ≤ x_k.

In this case

    ∫_E φ dF = Σ φ(x_{1s}, ..., x_{ks}) p_s, summed over all s such that (x_{1s}, ..., x_{ks}) ∈ E.

In the continuous case

    ∫_E φ dF = ∫_E φ f dV,

where dV is the volume element dx_1dx_2...dx_k. In the most general case

    ∫_E dF = Pr(X ∈ E).

It is helpful for some of us to develop an intuitive feeling for the Stieltjes integral. Consider first an ordinary integral

    ∫_E h dV,

where h is continuous. We may conceive of the integral in a Leibnitzian (non-rigorous, but sometimes fruitful) sense: The k-dimensional volume E is partitioned into tiny volume elements dV. These are so small that the function h is "practically constant" over any dV. We multiply this "practically constant" value of h by the volume dV and sum over E. Now a c. d. f. F(x_1, ..., x_k) defines a probability distribution over R_k of

which it is sometimes convenient to think as a mass distribution. We think of dF as being the amount of mass or probability in an infinitesimal volume element dV, whether it be concentrated at points, along curves or surfaces, or smeared out as a density. We weight the "practically constant" value of φ in dV with the amount dF of mass or probability, getting φdF, and we sum over E. The reader may see that the definition of ∫_J φdF over a half-open cell J is a rigorous polishing up of the process we have described: In place of dV we use the cell J_j, in place of dF we use Δ^k_{J_j}F, the probability that a random point be in J_j; we multiply not by the "practically constant" value of φ in J_j, but by any value it assumes in J_j, and finally, instead of merely summing, we take the limit of the sum.

2.6 Transformation of Variables

Suppose $y = \psi(x)$ is a (B-meas.) function of $x$. Then if $X$ is a random variable with c. d. f. $F(x)$, $Y = \psi(X)$ is also a random variable with c. d. f. $G(y)$ calculated as follows:

$$G(y) = \Pr(Y \le y) = \Pr(\psi(X) \le y) = \int_{E_y} dF(x),$$

where $E_y$ is the totality of points on the x-axis for which $\psi(x) \le y$.

More generally, suppose $(X_1,X_2,\ldots,X_k)$ is a random vector variable with c. d. f. $F(x_1,x_2,\ldots,x_k)$, and $\psi_1,\psi_2,\ldots,\psi_n$ are (B-meas.) functions of $x_1,x_2,\ldots,x_k$, $y_i = \psi_i(x_1,x_2,\ldots,x_k)$. Then $(Y_1,Y_2,\ldots,Y_n)$, where $Y_i = \psi_i(X_1,X_2,\ldots,X_k)$, is a random vector variable with c. d. f.

$$G(y_1,y_2,\ldots,y_n) = \int_{E_{y_1,y_2,\ldots,y_n}} dF,$$

where $E_{y_1,y_2,\ldots,y_n}$ is the region in $R_k$ defined by $\psi_i(x_1,x_2,\ldots,x_k) \le y_i$, $i = 1,2,\ldots,n$.

It may be shown that if $X_1$, $X_2$ are random (possibly vector) variables, and that if $Y_1 = \psi_1(X_1)$, $Y_2 = \psi_2(X_2)$ are (B-meas.) functions, then if $X_1$ and $X_2$ are statistically independent, so are the random variables $Y_1$ and $Y_2$.

Transformations of discrete variables offer no especial difficulties, so we consider in the following sections transformations in the continuous case.

The theorems obtained there are essentially corollaries to corresponding theorems on the transformation of integrals, single and multiple. Rigorous proofs of the theorems on integrals may be found in standard real variable texts. For the student in this course the insertion at this point of heuristic proofs which will strengthen his intuitive grasp seems desirable, and accordingly we employ the infinitesimal arguments so useful in applied mathematics.
2.61 Univariate Case

Suppose $X$ is a random variable with p. d. f. $f(x)$. Let $y = \phi(x)$ be a monotone transformation having unique inverse $x = \phi^{-1}(y)$, and such that $\phi'(x)$ exists. Now consider a new random variable $Y = \phi(X)$. The problem here is to determine $\Pr(y < \phi(X) < y+dy)$. Now since $y = \phi(x)$ is monotone, it is clear that the values of $x$ for which $y < \phi(x) < y+dy$ ($dy > 0$) will lie on an interval $(x, x+dx)$ depending on $y$, where $dx$ may be positive or negative depending on whether $\phi(x)$ is monotone increasing or decreasing. Since $x = \phi^{-1}(y)$ by the inverse of the transformation $y = \phi(x)$, then expressed in terms of $y$, the interval $(x, x+dx)$ becomes $(\phi^{-1}(y), \phi^{-1}(y+dy))$. Hence the value of $\Pr(y < \phi(X) < y+dy)$ is given by determining the value of $\Pr(x < X < x+dx) = \Pr(\phi^{-1}(y) < X < \phi^{-1}(y+dy))$ if $dx > 0$, or $\Pr(x+dx < X < x) = \Pr(\phi^{-1}(y+dy) < X < \phi^{-1}(y))$ if $dx$ is negative. In either case the probability is, except for differentials of order higher than $dy$,

$$f(x)\,|dx|,$$

where $x$ is to be expressed in terms of $y$. We may summarize as follows:

Theorem (A): Let $X$ be a continuous random variable with probability element $f(x)dx$, and let $y = \phi(x)$ be a monotone transformation with inverse $x = \phi^{-1}(y)$ such that $\phi'(x)$ exists. Then except for differentials of order higher than $dy$,

$$\Pr(y < \phi(X) < y+dy) = g(y)\,dy, \qquad \text{where } g(y) = f(x)\left|\frac{dx}{dy}\right| \text{ expressed in terms of } y.$$

Example. Suppose

$$f(x)\,dx = e^{-x}dx, \quad x \ge 0; \qquad = 0\cdot dx, \quad x < 0,$$

and that it is desired to find $\Pr(y < X^2 < y+dy)$, i. e., the probability element of $y$, say $g(y)dy$. We have the transformation $y = x^2$, or $x = \sqrt{y}$, and hence

$$g(y)\,dy = e^{-x}\left|\frac{dx}{dy}\right|dy = e^{-\sqrt{y}}\,\frac{1}{2\sqrt{y}}\,dy.$$
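The change-of-variable rule in Theorem (A) and the example above are easy to check numerically. The following Python sketch (ours, not part of the original text) simulates $Y = X^2$ for $X$ with density $e^{-x}$ and compares interval frequencies with $g(y) = e^{-\sqrt{y}}/(2\sqrt{y})$:

```python
# Minimal Monte Carlo check of the example: Y = X^2 with X ~ density e^(-x).
import random, math

random.seed(1)
n = 200_000
ys = [random.expovariate(1.0) ** 2 for _ in range(n)]

def g(y):
    # the derived density g(y) = e^(-sqrt(y)) / (2*sqrt(y))
    return math.exp(-math.sqrt(y)) / (2.0 * math.sqrt(y))

for a, b in [(0.5, 0.6), (1.0, 1.1), (4.0, 4.2)]:
    freq = sum(a < y <= b for y in ys) / n      # empirical Pr(a < Y <= b)
    approx = g(0.5 * (a + b)) * (b - a)         # g(midpoint) * interval length
    print(f"({a},{b}]: empirical {freq:.4f}  vs  g*dy {approx:.4f}")
```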

2.62 Bivariate Case

Suppose

(a) $y_1 = y_1(x_1,x_2), \qquad y_2 = y_2(x_1,x_2)$

are functions of $x_1$, $x_2$ with continuous first partial derivatives. Let $f(x_1,x_2)$ be the joint p. d. f. of $X_1$ and $X_2$. We shall assume further that the transformation (a) is one-to-one; that is, the relation between the x's and y's is such that corresponding to each point in the $x_1x_2$ plane (or that part of it for which the probability function $f(x_1,x_2) \ne 0$) there is one and only one point in the $y_1y_2$ plane, and each point in the $y_1y_2$ plane which has a corresponding point in the $x_1x_2$ plane has one and only one corresponding point in the $x_1x_2$ plane, the relation between any point in the $x_1x_2$ plane and its corresponding point in the $y_1y_2$ plane being given by (a). Let the inverse of the transformation (a) be

(b) $x_1 = x_1(y_1,y_2), \qquad x_2 = x_2(y_1,y_2).$

Let the Jacobian of the transformation (b) be

(c) $J = \dfrac{\partial(x_1,x_2)}{\partial(y_1,y_2)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} \\[4pt] \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix}.$

If $X_1$, $X_2$ are random variables, then $Y_1 = y_1(X_1,X_2)$ and $Y_2 = y_2(X_1,X_2)$ will also be random variables. The problem here is to determine the p. d. f. of $Y_1$ and $Y_2$, say $g(y_1,y_2)$, from $f(x_1,x_2)$, the p. d. f. of $X_1$, $X_2$, and the transformation (a). In other words, the problem is to determine

(d) $\Pr(y_1 < Y_1 < y_1{+}dy_1,\; y_2 < Y_2 < y_2{+}dy_2) = g(y_1,y_2)\,dy_1 dy_2$

to within terms of order $dy_1 dy_2$.

Consider the infinitesimal region $R$ in the $x_1,x_2$ plane bounded by the curves whose equations are

(e) $y_1(x_1,x_2) = y_1,\quad y_1(x_1,x_2) = y_1 + dy_1,\quad y_2(x_1,x_2) = y_2,\quad y_2(x_1,x_2) = y_2 + dy_2,$

where $dy_1 > 0$, $dy_2 > 0$.
The situation is represented in Figure 2. 




Figure 2 



Now the probability (d) is given by

$$\iint_R f(x_1,x_2)\,dx_1 dx_2.$$

By the mean value theorem for integrals the value of this integral is $f(x_1^0,x_2^0)\,dA$, where $(x_1^0,x_2^0)$ is some point in $R$ and $dA$ is the area of $R$. We must now find an expression for $dA$.

If the coordinates of $P_1$ in Figure 2 are $(x_1,x_2)$ then the coordinates of $P_2$, $P_3$, $P_4$ are

(f)
$$P_2:\;\Big(x_1 + \frac{\partial x_1}{\partial y_1}dy_1,\; x_2 + \frac{\partial x_2}{\partial y_1}dy_1\Big), \qquad
P_3:\;\Big(x_1 + \frac{\partial x_1}{\partial y_2}dy_2,\; x_2 + \frac{\partial x_2}{\partial y_2}dy_2\Big),$$
$$P_4:\;\Big(x_1 + \frac{\partial x_1}{\partial y_1}dy_1 + \frac{\partial x_1}{\partial y_2}dy_2,\; x_2 + \frac{\partial x_2}{\partial y_1}dy_1 + \frac{\partial x_2}{\partial y_2}dy_2\Big),$$

except for infinitesimals of order higher than $dy_1$ and $dy_2$. To show this it is sufficient to consider only one point, say $P_2$. The coordinates of $P_2$ are given by (b) when $y_1$ is replaced by $y_1 + dy_1$. We have

$$x_1 = x_1(y_1{+}dy_1,\,y_2), \qquad x_2 = x_2(y_1{+}dy_1,\,y_2).$$

But $x_1(y_1{+}dy_1,y_2) = x_1(y_1,y_2) + \frac{\partial x_1}{\partial y_1}dy_1 +$ terms of order $(dy_1)^2$ and higher, and $x_2(y_1{+}dy_1,y_2) = x_2(y_1,y_2) + \frac{\partial x_2}{\partial y_1}dy_1 +$ terms of order $(dy_1)^2$ and higher. But $(x_1(y_1,y_2),\,x_2(y_1,y_2))$ are the coordinates of $P_1$, which have been indicated by $(x_1,x_2)$, thus showing that the approximate coordinates of $P_2$ are those stated in (f). A similar argument holds for the approximate coordinates of $P_3$ and $P_4$.

It is clear that $P_1$, together with the points represented by the approximate coordinates of $P_2$, $P_3$, $P_4$ given by (f), form a parallelogram $R'$. Now it is known from coordinate geometry that if $(x_1',x_2')$, $(x_1'',x_2'')$, $(x_1''',x_2''')$ are three vertices of a parallelogram, then the area of the parallelogram is given by the absolute value of the determinant

$$\begin{vmatrix} 1 & x_1' & x_2' \\ 1 & x_1'' & x_2'' \\ 1 & x_1''' & x_2''' \end{vmatrix}.$$

Hence the area of the parallelogram $R'$ is given by the absolute value of

(g)
$$\begin{vmatrix} 1 & x_1 & x_2 \\[2pt] 1 & x_1 + \dfrac{\partial x_1}{\partial y_1}dy_1 & x_2 + \dfrac{\partial x_2}{\partial y_1}dy_1 \\[6pt] 1 & x_1 + \dfrac{\partial x_1}{\partial y_2}dy_2 & x_2 + \dfrac{\partial x_2}{\partial y_2}dy_2 \end{vmatrix}
= \left(\frac{\partial x_1}{\partial y_1}\frac{\partial x_2}{\partial y_2} - \frac{\partial x_1}{\partial y_2}\frac{\partial x_2}{\partial y_1}\right) dy_1 dy_2 = J\,dy_1 dy_2.$$

But since the coordinates of the vertices of parallelogram $R'$ differ from the corresponding coordinates of the corresponding vertices of $R$ by terms of order higher than $dy_1$ or $dy_2$, it follows that the area of $R$ (i. e., $dA$) differs from the area of $R'$ by terms of order higher than $dy_1 dy_2$.

Since $f(x_1,x_2)$, the p. d. f. of $X_1$, $X_2$, is continuous, we have that $f(x_1^0,x_2^0)$ differs from $f(x_1,x_2)$ by terms of order $dy_1$, $dy_2$, where $(x_1,x_2)$ is any point in $R$. Therefore we have the result that the probability expressed by (d) is equal to

(h) $f(x_1,x_2)\,|J|\,dy_1 dy_2,$

where the x's are to be expressed in terms of y's by (b). It may be verified by the reader that

$$\frac{\partial(x_1,x_2)}{\partial(y_1,y_2)} = \left[\frac{\partial(y_1,y_2)}{\partial(x_1,x_2)}\right]^{-1}.$$

We may summarize in the following:

Theorem (B): Let $X_1$, $X_2$ be two continuous random variables with p. d. f. $f(x_1,x_2)$. Let $y_1 = y_1(x_1,x_2)$, $y_2 = y_2(x_1,x_2)$ be a transformation with a unique inverse $x_1 = x_1(y_1,y_2)$, $x_2 = x_2(y_1,y_2)$, such that the first partial derivatives of the y's with respect to the x's exist. If the random variables $y_1(X_1,X_2)$ and $y_2(X_1,X_2)$ are denoted by $Y_1$ and $Y_2$ respectively, then

$$\Pr(y_1 < Y_1 < y_1{+}dy_1,\; y_2 < Y_2 < y_2{+}dy_2) = f(x_1,x_2)\,|J|\,dy_1 dy_2,$$

where $x_1$ and $x_2$ are expressed in terms of $y_1$, $y_2$ by (b), and $J$ is given by (c).

Example: To illustrate the transformation problem for two random variables, suppose the probability element of $X_1$ and $X_2$ is

$$f(x_1,x_2)\,dx_1 dx_2 = \frac{1}{2\pi}\,e^{-\frac{1}{2}(x_1^2 + x_2^2)}\,dx_1 dx_2,$$

defined over the entire $x_1,x_2$ plane. To determine the p. d. f. of $Y_1$ and $Y_2$, where

$$Y_1 = \sqrt{X_1^2 + X_2^2}, \qquad Y_2 = \tan^{-1}\frac{X_2}{X_1}.$$

The transformation involved here is

$$y_1 = \sqrt{x_1^2 + x_2^2}, \qquad y_2 = \tan^{-1}\frac{x_2}{x_1},$$

defined over that part of the $y_1,y_2$ plane for which $y_1 \ge 0$, $0 \le y_2 < 2\pi$. The inverse of the transformation is

$$x_1 = y_1 \cos y_2, \qquad x_2 = y_1 \sin y_2.$$

We have

$$J = \begin{vmatrix} \cos y_2 & \sin y_2 \\ -y_1 \sin y_2 & y_1 \cos y_2 \end{vmatrix} = y_1.$$

Therefore by Theorem (B), the probability element of $Y_1$, $Y_2$ is

$$g(y_1,y_2)\,dy_1 dy_2 = \frac{1}{2\pi}\,y_1 e^{-\frac{1}{2}y_1^2}\,dy_1 dy_2.$$

2.63 k-Variate Case

Let the joint p. d. f. of $X_1, X_2,\ldots,X_k$ be $f(x_1,x_2,\ldots,x_k)$, and introduce new random variables $Y_1, Y_2,\ldots,Y_k$ by means of the one-to-one transformation

(a) $y_i = y_i(x_1,x_2,\ldots,x_k), \quad i = 1,2,\ldots,k.$

Let the inverse (which will be unique) of this transformation be

(b) $x_i = x_i(y_1,y_2,\ldots,y_k), \quad i = 1,2,\ldots,k,$

and its Jacobian

(c) $J = \dfrac{\partial(x_1,\ldots,x_k)}{\partial(y_1,\ldots,y_k)} = \left|\dfrac{\partial x_i}{\partial y_j}\right|,$

assuming, of course, that the first partial derivatives exist.

By pursuing an argument similar to that used in the bivariate case, we find that the probability element of the $Y_i$, say $g(y_1,y_2,\ldots,y_k)\,dy_1\ldots dy_k$, is given by

(d) $g(y_1,\ldots,y_k)\,dy_1\ldots dy_k = f(x_1,\ldots,x_k)\,|J|\,dy_1\ldots dy_k,$

where the x's are to be expressed in terms of y's by (b).

This covers the cases where the number n of new variables equals the number k of original variables. It can be shown that if n > k, there exists no p. d. f. for the n new variables. (Note here the complete generality of the treatment by means of the c. d. f. in §2.6.) If n < k the usual method of getting the p. d. f. of the new variables is to adjoin further variables to fill out the number of new variables to k, use the above procedure, and then "integrate out" the extra variables by getting the marginal distribution of the n variables whose p. d. f. is desired.

2.7 Mean Value

We begin with the definition of the mean value of a random variable in general and then consider in later sections the mean values of particular (random) functions of especial interest in statistics. If $X$ is a random variable with c. d. f. $F(x)$ we define the mean value of $X$ as

(a) $E(X) = \displaystyle\int_{-\infty}^{+\infty} x\,dF(x).$

This is also called the expected value of $X$.

If $Y = \phi(X)$ is a continuous function of $X$, then the c. d. f. of $Y$ is (§2.6)

$$G(y) = \int_{E_y} dF(x),$$

where $E_y$ is the set of points on the x-axis such that $\phi(x) \le y$. From (a),

$$E(Y) = \int_{-\infty}^{+\infty} y\,dG(y),$$

and this may be shown to be equivalent to

(b) $E[\phi(X)] = \displaystyle\int_{-\infty}^{+\infty} \phi(x)\,dF(x).$

If random variables $X_1,X_2,\ldots,X_k$ have the c. d. f. $F(x_1,x_2,\ldots,x_k)$, and $y = \phi(x_1,x_2,\ldots,x_k)$ is continuous, then from the definition (a) it may be shown that

(c) $E[\phi(X_1,X_2,\ldots,X_k)] = \displaystyle\int_{R_k} \phi\,dF,$

where $R_k$ is the entire k-space. Of course if the improper integral does not exist in the sense explained in 2.5 ff., we say that the mean value of $\phi$ does not exist. In the light of the intuitive discussion (2.53) of the meaning of a Stieltjes integral, we see from (c) that the mean value of $\phi$ may be regarded as an average over k-space of the function $\phi$, the average being taken over volume elements $dV$, and the weight assigned to each contribution being the total probability in $dV$.

For the discrete and continuous cases, the expressions (b) and (c) may be analyzed into the forms given in 2.51, 2.53.

2.71 Univariate Case; Tchebycheff's Inequality

The mean value of $X^i$,

$$\mu_i' = E(X^i) = \int_{-\infty}^{+\infty} x^i\,dF(x), \qquad i = 0,1,2,\ldots,$$

is called the i-th moment of the distribution $F(x)$ about the origin. $\mu_0' = 1$ for any $F(x)$; $\mu_1' = E(X)$ is called the mean of $X$, also the mean of the distribution, and is denoted by $a$. The i-th moment about the mean is defined to be

(a) $\mu_i = E[(X-a)^i] = \displaystyle\int_{-\infty}^{+\infty} (x-a)^i\,dF(x), \qquad i = 0,1,2,\ldots$

For any $F(x)$, $\mu_0 = 1$, $\mu_1 = 0$. The variance of $X$, or the variance of the distribution, is defined to be $\mu_2$, and is denoted by the special symbol $\sigma_x^2$. $\sigma_x > 0$ is called the standard deviation of $X$ or of the distribution. A formula for expressing $\mu_i$ in terms of $\mu_i', \mu_{i-1}',\ldots,\mu_1'$ may be obtained by using the binomial theorem in (a) and then integrating. In particular, we find that

$$\sigma_x^2 = \mu_2 = \mu_2' - a^2.$$

An important theorem about arbitrary distributions with finite variance is contained in the Tchebycheff inequality:

(b) $\Pr(|X-a| \ge \delta\sigma_x) \le 1/\delta^2.$



To prove (b) we break up the integral for $\sigma_x^2$:

(c) $\sigma_x^2 = \displaystyle\int_{-\infty}^{+\infty} (x-a)^2\,dF(x) = \int_{I_1} + \int_{I_2} + \int_{I_3},$

where the intervals $I_1$, $I_2$, $I_3$ are defined by

$$I_1:\; -\infty < x \le a - \delta\sigma_x, \qquad I_2:\; a - \delta\sigma_x < x < a + \delta\sigma_x, \qquad I_3:\; a + \delta\sigma_x \le x < +\infty.$$

Now in $I_1$ and $I_3$, $|x-a| \ge \delta\sigma_x$. Hence

(d) $\displaystyle\int_{I_1} (x-a)^2\,dF(x) \ge \delta^2\sigma_x^2 \int_{I_1} dF(x).$

Similarly,

(e) $\displaystyle\int_{I_3} (x-a)^2\,dF(x) \ge \delta^2\sigma_x^2 \int_{I_3} dF(x).$

Finally,

(f) $\displaystyle\int_{I_2} (x-a)^2\,dF(x) \ge 0.$

Using (d), (e), (f) in (c), we get

$$\sigma_x^2 \ge \delta^2\sigma_x^2 \left[\int_{I_1} dF(x) + \int_{I_3} dF(x)\right] = \delta^2\sigma_x^2\,\Pr(|X-a| \ge \delta\sigma_x).$$

This is easily seen to be equivalent to (b).
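The inequality (b) is easy to illustrate by simulation. In this sketch (ours; the exponential distribution and the values of $\delta$ are arbitrary choices) the empirical tail probability always falls under the bound $1/\delta^2$:

```python
# Compare Pr(|X - a| >= delta*sigma) with the Tchebycheff bound 1/delta^2
# for an exponential variate, whose mean and standard deviation are both 1.
import random

random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]
a, sigma = 1.0, 1.0

for delta in (1.5, 2.0, 3.0):
    p = sum(abs(x - a) >= delta * sigma for x in xs) / n
    print(f"delta={delta}: empirical {p:.4f} <= bound {1/delta**2:.4f}")
```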
2.72 Bivariate Case

For the distribution $F(x_1,x_2)$ we define moments $\mu_{ij}'$ about the origin by

$$\mu_{ij}' = E(X_1^i X_2^j) = \int_{R_2} x_1^i x_2^j\,dF(x_1,x_2), \qquad i,j = 0,1,2,\ldots,$$

where $R_2$ is the entire $x_1x_2$-space. Since $X_1$ has the marginal distribution $F_1(x_1)$, the mean of $X_1$ has already been defined in 2.71; we denote it by $a_1$. In view of the remarks in 2.7, we may calculate $a_1$ from either of the integrals

$$a_1 = \int_{-\infty}^{+\infty} x_1\,dF_1(x_1) = \int_{R_2} x_1\,dF(x_1,x_2).$$

Similar statements apply to $a_2 = E(X_2)$. We note $\mu_{00}' = 1$. The point $(a_1,a_2)$ may be called the mean of the distribution. The moments $\mu_{ij}$ about the mean for $F(x_1,x_2)$ are defined by

(a) $\mu_{ij} = E[(X_1-a_1)^i(X_2-a_2)^j] = \displaystyle\int_{R_2} (x_1-a_1)^i(x_2-a_2)^j\,dF(x_1,x_2), \qquad i,j = 0,1,2,\ldots$

For any $F(x_1,x_2)$, $\mu_{00} = 1$, $\mu_{10} = \mu_{01} = 0$. The variance of $X_1$ has already been defined in §2.71; we note that it is $\sigma_{x_1}^2 = \mu_{20}$. Likewise, $\sigma_{x_2}^2 = \mu_{02}$. The remaining second order moment $\mu_{11}$ is called the covariance of $X_1$ and $X_2$. The quotient

(b) $\rho_{12} = \mu_{11}/\sigma_{x_1}\sigma_{x_2}$

is called the correlation coefficient of $X_1$ and $X_2$. By means of the Schwarz inequality it may be shown that $-1 \le \rho_{12} \le 1$. As an exercise the reader may show that if $X_1$ and $X_2$ are statistically independent, then $\rho_{12} = 0$, but the converse is false.

The reader may also verify that a necessary and sufficient condition for $\rho_{12} = 1$ is that all of the probability in the $x_1x_2$ plane be concentrated along some straight line with positive slope. (For $\rho_{12} = -1$ the slope must be negative.)

Formulas giving the moments about the mean in terms of the moments about the origin may again be obtained from (a); in particular, it is found that

$$\mu_{20} = \mu_{20}' - a_1^2, \qquad \mu_{02} = \mu_{02}' - a_2^2, \qquad \mu_{11} = \mu_{11}' - a_1 a_2,$$

and these expressions may then be substituted in (b) to evaluate the correlation coefficient in terms of the first and second order moments about the origin.

2.73 k-Variate Case

The moments $\mu'$ of a distribution $F(x_1,x_2,\ldots,x_k)$ about the origin are defined as

$$\mu'_{i_1 i_2 \ldots i_k} = E(X_1^{i_1} X_2^{i_2} \cdots X_k^{i_k}) = \int_{R_k} x_1^{i_1} x_2^{i_2} \cdots x_k^{i_k}\,dF,$$

where $R_k$ is the complete k-space. For any $F$, $\mu'_{00\ldots0} = 1$. The mean of $X_1$ defined in §2.71 may now be seen to be $\mu'_{100\ldots0}$, and can be expressed also by means of integrals with respect to marginal distributions of various orders. We denote $E(X_1)$ by $a_1$, and note that the above statements apply to $a_2 = E(X_2),\ldots,a_k = E(X_k)$. The point $(a_1,a_2,\ldots,a_k)$ is called the mean of the distribution, and the moments $\mu$ about the mean are defined to be

$$\mu_{i_1 i_2 \ldots i_k} = E\Big[\prod_{l=1}^k (X_l - a_l)^{i_l}\Big] = \int_{R_k} \prod_{l=1}^k (x_l - a_l)^{i_l}\,dF.$$

We note $\mu_{00\ldots0} = 1$. In order to simplify the notation, we specialize the following remarks to the variable $X_1$ or the pair $X_1,X_2$; their generalizations are obvious: The variance of $X_1$, defined in §2.71, is seen to be $\mu_{200\ldots0}$. The covariance of $X_1$ and $X_2$, defined in 2.72, is $\mu_{110\ldots0}$, and the correlation coefficient of $X_1$ and $X_2$ is

$$\rho_{12} = \mu_{110\ldots0}\big/\sqrt{\mu_{200\ldots0}\,\mu_{020\ldots0}}.$$

These quantities may all be expressed in terms of the first and second order moments about the origin.

2.74 Mean and Variance of a Linear Combination of Random Variables

Suppose we have k random variables $X_1,X_2,\ldots,X_k$, the c. d. f. of $X_i$ being $F_i(x_i)$. Let their joint c. d. f. be $F(x_1,x_2,\ldots,x_k)$. $F_i(x_i)$ is then the marginal distribution (2.2) of $X_i$; if the $X_i$ are mutually (statistically) independent,

$$F(x_1,x_2,\ldots,x_k) = \prod_{i=1}^k F_i(x_i),$$

but we shall not assume this. Let $y = \phi(x_1,x_2,\ldots,x_k)$ be a linear function,

(a) $y = \sum_{i=1}^k \alpha_i x_i.$

Then $Y = \phi(X_1,X_2,\ldots,X_k) = \sum_{i=1}^k \alpha_i X_i$ is a random variable (2.6); its c. d. f. $G(y)$ is

$$G(y) = \int_{E_y} dF(x_1,x_2,\ldots,x_k),$$

where $E_y$ is the half-space defined by $\sum_{i=1}^k \alpha_i x_i \le y$.

In accordance with the notation established in §2.73, denote the mean of $X_i$ by $a_i$, its variance by $\sigma_{x_i}^2$, which we shall now abbreviate to $\sigma_i^2$, and the covariance of $X_i$ and $X_j$ by $\rho_{ij}\sigma_i\sigma_j$. Denote the mean of $Y$ by $a$, its variance by $\sigma_y^2$.

It is helpful to note that $E$ is a linear operator: if $\phi_1$ and $\phi_2$ are continuous functions of $X_1,X_2,\ldots,X_k$, and $A$ and $B$ are constants,

$$E(A\phi_1 + B\phi_2) = A\,E(\phi_1) + B\,E(\phi_2).$$

From this we get immediately, because of (a),

(b) $a = E(Y) = \sum_{i=1}^k \alpha_i E(X_i) = \sum_{i=1}^k \alpha_i a_i.$

Note that for the validity of this result it is irrelevant whether or not the $X_i$ are statistically independent.

Next let us calculate the variance of $Y$:

$$\sigma_y^2 = E[(Y-a)^2] = E\Big\{\Big[\sum_{i=1}^k \alpha_i X_i - \sum_{i=1}^k \alpha_i a_i\Big]^2\Big\} = E\Big\{\Big[\sum_{i=1}^k \alpha_i (X_i - a_i)\Big]^2\Big\},$$

i. e.,

(c) $\sigma_y^2 = \sum_{i,j=1}^k \alpha_i \alpha_j \rho_{ij}\sigma_i\sigma_j,$

where $\rho_{ii} = 1$. If the $X_i$ are mutually independent, then $\rho_{ij} = 0$ for $i \ne j$, and

$$\sigma_y^2 = \sum_{i=1}^k \alpha_i^2 \sigma_i^2.$$
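In matrix notation (ours, not the text's), (b) and (c) read $E(Y) = \alpha' a$ and $\sigma_y^2 = \alpha' \Sigma \alpha$, where $\Sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$ is the covariance matrix. A short numerical check with arbitrary numbers:

```python
# Verify E(Y) = alpha'a and var(Y) = alpha' Sigma alpha by sampling.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, -2.0, 0.5])
mean = np.array([0.0, 1.0, 2.0])
cov = np.array([[1.0, 0.3, 0.0],      # Sigma_ij = rho_ij * sigma_i * sigma_j
                [0.3, 2.0, 0.5],
                [0.0, 0.5, 1.5]])

x = rng.multivariate_normal(mean, cov, size=200_000)
y = x @ alpha
print("E(Y):   ", y.mean(), " vs ", alpha @ mean)
print("var(Y): ", y.var(),  " vs ", alpha @ cov @ alpha)
```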



2.75 Covariance and Correlation between Two Linear Combinations of Random Variables

Suppose $Y_1$ and $Y_2$ are each linear combinations of random variables. The random variables in both combinations may be the same, or none of those appearing in $Y_1$ may appear in $Y_2$, or there may be an intermediate degree of overlapping. All of these cases may be covered by assuming that

$$Y_1 = \sum_{i=1}^k \alpha_i X_i, \qquad Y_2 = \sum_{i=1}^k \beta_i X_i,$$

where the $\alpha_i$, $\beta_i$ are constants and the $X_i$ are random variables with joint c. d. f. $F(x_1,x_2,\ldots,x_k)$. For example, the case of no overlapping would be obtained by requiring $\alpha_i\beta_i = 0$, $i = 1,2,\ldots,k$. If $E(X_i) = a_i$, then from (b) of §2.74,

$$E(Y_1) = \sum_{i=1}^k \alpha_i a_i, \qquad E(Y_2) = \sum_{i=1}^k \beta_i a_i.$$

Hence the covariance of $Y_1$ and $Y_2$ is

$$E\{[Y_1 - E(Y_1)][Y_2 - E(Y_2)]\} = \sum_{i,j=1}^k \alpha_i \beta_j \rho_{ij}\sigma_i\sigma_j,$$

where $\sigma_i^2$ is the variance of $X_i$ and $\sigma_i\sigma_j\rho_{ij}$ is the covariance of $X_i$ and $X_j$. Hence the correlation coefficient between $Y_1$ and $Y_2$ is

$$\rho_{Y_1 Y_2} = \frac{\sum_{i,j} \alpha_i \beta_j \rho_{ij}\sigma_i\sigma_j}{\sqrt{\Big(\sum_{i,j} \alpha_i \alpha_j \rho_{ij}\sigma_i\sigma_j\Big)\Big(\sum_{i,j} \beta_i \beta_j \rho_{ij}\sigma_i\sigma_j\Big)}},$$

from (b) of §2.72 and (c) of §2.74. Special cases of this formula for the correlation coefficient are much used in education and psychology in connection with tests.
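The covariance just obtained is, in the same matrix notation as before (ours), $\alpha' \Sigma \beta$. A brief check:

```python
# Verify cov(Y1, Y2) = alpha' Sigma beta by sampling.
import numpy as np

rng = np.random.default_rng(1)
alpha = np.array([1.0, 1.0, 0.0])
beta = np.array([0.0, 1.0, -1.0])
cov = np.array([[1.0, 0.2, 0.1],
                [0.2, 1.0, 0.3],
                [0.1, 0.3, 1.0]])

x = rng.multivariate_normal(np.zeros(3), cov, size=200_000)
y1, y2 = x @ alpha, x @ beta
print(np.cov(y1, y2)[0, 1], " vs ", alpha @ cov @ beta)
```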

2.76 The Moment Problem

The general moment problem (univariate) is twofold: (i) given an infinite sequence of numbers $1, \mu_1', \mu_2',\ldots$, does there exist a distribution with these numbers as moments? and if so, (ii) is the distribution unique? It is usually only the problem (ii) that arises in statistics. It may be shown that whenever the moment generating function $\phi(\theta)$ (see §2.8) exists for $-h \le \theta \le h$, $h > 0$, there is a unique* distribution with the moments $\phi^{(i)}(0)$.

Necessary and sufficient conditions for the unique determination of a distribution by its moments are extremely complicated,** but the following theorem gives an easily applied sufficient condition of Carleman:

Theorem (A): A sufficient condition for the uniqueness of a distribution with moments $\mu_i'$ is that the series $\sum_{m=1}^{\infty} (\mu_{2m}')^{-1/2m}$ diverge.

For a multivariate distribution with moments $\mu'_{i_1\ldots i_k}$ define

(a) $\Lambda_m = \mu'_{2m\,0\ldots0} + \mu'_{0\,2m\,0\ldots0} + \cdots + \mu'_{0\ldots0\,2m}.$

A sufficient condition of Cramér and Wold*** for uniqueness is Theorem (B), of which (A) may be regarded as a special case:

Theorem (B): If the series $\sum_{m=1}^{\infty} (\Lambda_m)^{-1/2m}$ diverges, where $\Lambda_m$ is defined by (a), then the distribution $F(x_1,x_2,\ldots,x_k)$ is uniquely determined by its moments.

*$\phi$ then is analytic in a strip containing the imaginary axis, hence the characteristic function $f(t) = \phi(it)$ is analytic for all real $t$, and this is a sufficient condition for uniqueness in the moment problem: see P. Lévy, Théorie de l'addition des variables aléatoires, Monographies des probabilités, Paris, 1937, p. 41.
**H. Hamburger, "Über eine Erweiterung des Stieltjesschen Momentenproblems", Math. Annalen, vol. 81 (1920), pp. 235-319, and vol. 82 (1921), pp. 120-164, 168-187.
***H. Cramér and H. Wold, "Some theorems on distribution functions", Jour. London Math. Soc., vol. 11 (1936), pp. 290-294.

2.8 Moment Generating Functions

When the moment generating function (m. g. f.) of a distribution satisfies a certain condition given below, then the moments of the distribution may easily be found by differentiation of the moment generating function. The use of the m. g. f. also permits the easy determination of the distribution of certain functions of certain random variables. We consider in detail the

2.81 Univariate Case

For any distribution $F(x)$ we define the m. g. f. as

(a) $\phi(\theta) = E(e^{\theta X}) = \displaystyle\int_{-\infty}^{+\infty} e^{\theta x}\,dF(x).$

If we proceed heuristically, we may write

(b) $\phi(\theta) = \displaystyle\int_{-\infty}^{+\infty} \sum_{i=0}^{\infty} \frac{\theta^i x^i}{i!}\,dF(x) = \sum_{i=0}^{\infty} \frac{\theta^i}{i!}\int_{-\infty}^{+\infty} x^i\,dF(x) = \sum_{i=0}^{\infty} \mu_i'\,\frac{\theta^i}{i!}.$

Let us now consider under what conditions $\mu_i' = \phi^{(i)}(0)$.

In order that $\phi(\theta)$, considered as a function of a real variable, possess derivatives at $\theta = 0$, it is necessary that $\phi(\theta)$ as defined by (a) exist in a neighborhood $-h \le \theta \le h$, $h > 0$. (Note that in any case $\phi(0) = 1$ is defined by (a).) We see now that this restricts the class of functions $F(x)$ under consideration. Our definition (2.51) of the infinite integral $\int_{-\infty}^{+\infty}$ implies the existence of $\int_0^{+\infty}$ and $\int_{-\infty}^0$. Hence as $x \to +\infty$, $F(x) \to 1$ sufficiently rapidly so that

(c) $M_1 = \displaystyle\int_0^{+\infty} e^{hx}\,dF(x) < \infty,$

and as $x \to -\infty$, $F(x) \to 0$ sufficiently rapidly so that

(d) $M_2 = \displaystyle\int_{-\infty}^0 e^{-hx}\,dF(x) < \infty.$

This means that $F(x)$ possesses moments of all orders. To demonstrate the finiteness of

$$\int_{-\infty}^{+\infty} x^i\,dF(x),$$

consider

$$\int_0^{+\infty} x^i\,dF(x) = \int_0^a x^i\,dF(x) + \int_a^{+\infty} (x^i e^{-hx})\,e^{hx}\,dF(x).$$

Choose $a$ so large that $x^i e^{-hx} < 1$ for $x > a$. Then the second term of the right member is less than $M_1$ defined by (c); the first term is certainly finite, and thus

$$\int_0^{+\infty} x^i\,dF(x) < \infty.$$

Similarly by use of (d) we may show

$$\int_{-\infty}^0 |x|^i\,dF(x) < \infty,$$

and hence $|\mu_i'| < \infty$ for all $i$.

We now state the heuristically obtained relation (b) in the form of

Theorem (A): If the m. g. f. $\phi(\theta)$ of a c. d. f. $F(x)$, as defined by (a), exists for $-h \le \theta \le h$, where $h > 0$, then the i-th moment of $F(x)$ about the origin is

$$\mu_i' = \phi^{(i)}(0), \qquad i = 0,1,2,\ldots$$

The proof of this theorem may be based on the theory of the bilateral Laplace transform* and is beyond the level of this course.

The m. g. f. if it exists is uniquely determined by (a). The converse is stated in

Theorem (B): If $F(x)$ has the m. g. f. $\phi(\theta)$, and $\phi(\theta)$ exists for $-h \le \theta \le h$, $h > 0$, and if the c. d. f. $G(x)$ has the same m. g. f., then $G(x) \equiv F(x)$.**

The reader may write out an expression for $\phi(\theta)$ in the discrete case, which is a sum of terms, and an expression in the continuous case, which is an ordinary integral, by using the analysis of 2.51.
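Theorem (A) can be exercised symbolically. In the sketch below (our example; the exponential distribution is not the text's illustration) the m. g. f. is $1/(1-\theta)$ for $\theta < 1$, and its derivatives at $\theta = 0$ return the moments $\mu_i' = i!$:

```python
# Moments by differentiating the m.g.f. of f(x) = e^(-x), x >= 0.
import sympy as sp

theta, x = sp.symbols('theta x')
phi = sp.integrate(sp.exp(theta * x) * sp.exp(-x), (x, 0, sp.oo),
                   conds='none')        # = 1/(1 - theta), valid for theta < 1
for i in range(1, 5):
    print(i, sp.diff(phi, theta, i).subs(theta, 0))   # prints i!
```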

We note that if $Y = \psi(X)$ is a continuous function of $X$, and $G(y)$ is the c. d. f. of $Y$, then the m. g. f. of $G(y)$ is

$$E(e^{\theta Y}) = E(e^{\theta\psi(X)}) = \int_{-\infty}^{+\infty} e^{\theta\psi(x)}\,dF(x).$$

If this exists for $|\theta| \le h$ ($h > 0$) and is recognized as the m. g. f. of a known distribution, then Theorem (B) determines $G(y)$.

In certain problems, particularly in sampling theory, it is important to know the limiting form of a c. d. f. $F_{(n)}(x)$, as $n \to \infty$, of a function $X_n$ of n random variables. The m. g. f. offers a powerful method for determining the limit of this distribution. The method is to obtain the m. g. f. of $X_n$, say $\phi_{(n)}(\theta)$; then if $\phi_{(n)}(\theta)$ has a limiting form as $n \to \infty$ which is the m. g. f. of some c. d. f. $F(x)$, we may conclude under certain conditions that $\lim F_{(n)}(x) = F(x)$. More precisely we shall state the following theorem without proof.***

Theorem (C): Let $F_{(n)}(x)$ and $\phi_{(n)}(\theta)$ be respectively the c. d. f. and m. g. f. of a random variable $X_n$ ($n = 1,2,3,\ldots$). If $\phi_{(n)}(\theta)$ exists for $|\theta| \le h$ for all $n$, and if there exists a function $\phi(\theta)$ such that $\lim_{n\to\infty} \phi_{(n)}(\theta) = \phi(\theta)$ for $|\theta| \le h' < h$, then $\lim_{n\to\infty} F_{(n)}(x) = F(x)$, where $F(x)$ is the c. d. f. of a random variable $X$ with m. g. f. $\phi(\theta)$.



*D. V. Widder, The Laplace Transform, Princeton University Press, 1941.
**If the integral defining $\phi(\theta)$ exists on the real interval $(-h,h)$, it exists for complex $\theta$ in the strip determined by the condition that the real part of $\theta$ be in the interval, and $\phi$ is an analytic function in the strip: see Widder, loc. cit. Hence if for $F(x)$ and $G(x)$ the moment generating functions coincide in the interval, they coincide in the strip. For coincidence in the strip there is a uniqueness theorem: Widder, p. 243.
***For proof, see J. H. Curtiss, "On the Theory of Moment Generating Functions", Annals of Math. Stat., Vol. 13, No. 4.



2.82 Multivariate Case

The m. g. f. of a distribution $F(x_1,x_2,\ldots,x_k)$ is defined to be

(a) $\phi(\theta_1,\ldots,\theta_k) = E\big(e^{\sum_{i=1}^k \theta_i x_i}\big) = \displaystyle\int_{R_k} e^{\sum_i \theta_i x_i}\,dF.$

We assume

(b) $\phi$ exists for $-h \le \theta_i \le h$, $h > 0$, $i = 1,2,\ldots,k$,

and then may consider restrictions on $F$, analogous to those of 2.81, implied by (b). We state without proof

Theorem (A): Under the assumption (b),

$$\mu'_{j_1 j_2 \ldots j_k} = \left[\frac{\partial^{\,j_1+j_2+\cdots+j_k}\phi}{\partial\theta_1^{j_1}\,\partial\theta_2^{j_2}\cdots\partial\theta_k^{j_k}}\right]_{\theta_1=\theta_2=\cdots=\theta_k=0}.$$

Theorem (B): If $\phi$ satisfies condition (b), it uniquely determines $F$.

Let $F_i(x_i)$, with m. g. f. $\phi_i(\theta_i)$, be the c. d. f.'s of mutually independent variables $X_i$, $i = 1,2,\ldots,k$. Then the joint c. d. f. is

(c) $F(x_1,x_2,\ldots,x_k) = \prod_{i=1}^k F_i(x_i),$

and the m. g. f. of $F$ is

$$\int_{R_k} e^{\sum_i \theta_i x_i} \prod_{i=1}^k dF_i(x_i) = \prod_{i=1}^k \int_{-\infty}^{+\infty} e^{\theta_i x_i}\,dF_i(x_i),$$

i. e.,

(d) $\phi(\theta_1,\ldots,\theta_k) = \prod_{i=1}^k \phi_i(\theta_i).$

By the uniqueness Theorem (B) it follows that if the m. g. f. is (d), the distribution is (c).

Theorem (C): Suppose that random variables $X_i$, $i = 1,2,\ldots,k$, have c. d. f.'s $F_i(x_i)$ with m. g. f.'s $\phi_i(\theta_i)$, and that all $\phi_i(\theta_i)$ satisfy condition (b). Then the $X_i$ are mutually independent if and only if the m. g. f. $\phi$ of the joint distribution $F$ factors according to (d).



The theorem is also valid in the case where the $X_i$ are vector variables (then the $\theta_i$ are also vectors).

If $Y_i = \psi_i(X_1,X_2,\ldots,X_k)$, $i = 1,2,\ldots,t$, are continuous functions, then a method of determining the joint c. d. f. $G(y_1,y_2,\ldots,y_t)$ of the variables $Y_i$ is to form the m. g. f. of $G$; it is

$$\phi(\theta_1,\ldots,\theta_t) = E\big(e^{\sum_{i=1}^t \theta_i Y_i}\big).$$

If this exists for $|\theta_i| \le h$, $h > 0$, $i = 1,2,\ldots,t$, it uniquely determines $G(y_1,y_2,\ldots,y_t)$.
2.9 Regression

2.91 Regression Functions

If $X_1$, $X_2$ have the joint p. d. f. $f(x_1,x_2)$, we define the regression function $a_{1\cdot x_2}$ of $X_1$ on $X_2$ as the mean value of $X_1$ for a fixed value $x_2$ of $X_2$, i. e.

(a) $a_{1\cdot x_2} = E(X_1 \mid x_2) = \displaystyle\int_{-\infty}^{+\infty} x_1 f(x_1 \mid x_2)\,dx_1,$

where the conditional p. d. f. $f(x_1 \mid x_2)$ is defined by (g) of 2.14. We note that the regression function (a) is a function of $x_2$ only. The graph of this function is called the regression curve. If the regression function is linear,

(b) $a_{1\cdot x_2} = b x_2 + c,$

then we say that we have a case of linear regression, and call $b$ and $c$ the regression coefficients. The reader may show that if $X_1$ and $X_2$ are statistically independent, then the regression of $X_1$ on $X_2$ is linear, with $b = 0$ and $c = a_1$, the mean of $X_1$. We remark that the regression of $X_1$ on $X_2$ may be linear, while that of $X_2$ on $X_1$ is not.

If $X_1$, $X_2$ are discrete random variables, then in the notation of 2.12, we define the regression of $X_1$ on $X_2$ only for $X_2 = x_{2i}$, $i = 1,2,\ldots$, by

(c) $a_{1\cdot x_{2i}} = \dfrac{\sum_j x_{1j}\,p_j}{\sum_j p_j},$

where both summations are made for all $j$ such that $x_{2j} = x_{2i}$. For the mixed case described in 2.12, we define the regression of $X_1$ on $X_2$ by

(d) $a_{1\cdot x_{2i}} = E(X_1 \mid X_2 = x_{2i}) = \displaystyle\int_{-\infty}^{+\infty} x_1 f(x_1 \mid x_{2i})\,dx_1.$

We shall limit the discussion for more than two variables to the continuous case. For k random variables $X_1,X_2,\ldots,X_k$, let $f(x_1 \mid x_2,x_3,\ldots,x_k)$ be the conditional p. d. f. defined by (h) of 2.14. Then we define the regression function of $X_1$ on $X_2,X_3,\ldots,X_k$ to be

(e) $a_{1\cdot x_2 x_3 \ldots x_k} = E(X_1 \mid X_i = x_i,\; i = 2,3,\ldots,k) = \displaystyle\int_{-\infty}^{+\infty} x_1 f(x_1 \mid x_2,x_3,\ldots,x_k)\,dx_1.$

If this function of $x_2,x_3,\ldots,x_k$ is linear,

(f) $a_{1\cdot x_2 x_3 \ldots x_k} = \sum_{j=2}^k b_j x_j + c,$

then the regression is said to be linear, and the $b_j$ and $c$ are called regression coefficients. Similarly, we may define the regression function of any $X_i$ on the remaining X's. We note in conclusion that a regression function may always be regarded as the first moment of a conditional distribution.

2.92 Variance about Regression Functions

The variance of $X_1$ for a fixed value $x_2$ of $X_2$ is defined as

(a) $\sigma^2_{1\cdot x_2} = \displaystyle\int_{-\infty}^{\infty} (x_1 - a_{1\cdot x_2})^2 f(x_1 \mid x_2)\,dx_1.$

$\sigma^2_{1\cdot x_2}$ is, in general, a function of $x_2$, and its mean value $\sigma^2_{1\cdot 2}$ with respect to $x_2$ is known as the variance of $X_1$ about the regression function of $X_1$ on $X_2$. That is, we have

(b) $\sigma^2_{1\cdot 2} = \displaystyle\int_{-\infty}^{\infty} \sigma^2_{1\cdot x_2}\,f_2(x_2)\,dx_2 = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} (x_1 - a_{1\cdot x_2})^2 f(x_1,x_2)\,dx_1 dx_2.$

In the k-variate case, we have

(c) $\sigma^2_{1\cdot x_2 x_3 \ldots x_k} = \displaystyle\int_{-\infty}^{\infty} (x_1 - a_{1\cdot x_2 \ldots x_k})^2 f(x_1 \mid x_2,x_3,\ldots,x_k)\,dx_1,$

and the variance of $X_1$ about the regression function of $X_1$ on $X_2,X_3,\ldots,X_k$ is

(d) $\sigma^2_{1\cdot 23\ldots k} = \displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} (x_1 - a_{1\cdot x_2 \ldots x_k})^2 f(x_1,x_2,\ldots,x_k)\,dx_1 dx_2 \ldots dx_k.$

The quantities given by (a), (b), (c) and (d) may be similarly defined for discrete and mixed cases, and also for empirical distributions.

2.93 Partial Correlation

Suppose $X_1, X_2,\ldots,X_k$ is a set of random variables. The covariance between any two of the variables, say $X_1$ and $X_2$, for fixed values of any set of the remaining variables, say $X_r, X_{r+1},\ldots,X_k$ ($2 < r \le k$), is defined as

(a) $C_{12\cdot x_r x_{r+1} \ldots x_k} = \displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} (x_1 - a_{1\cdot x_r \ldots x_k})(x_2 - a_{2\cdot x_r \ldots x_k})\,f(x_1,x_2 \mid x_r,\ldots,x_k)\,dx_1 dx_2.$

Let $C_{12\cdot r(r+1)\ldots k}$ be the mean value of $C_{12\cdot x_r \ldots x_k}$ with respect to $X_r,\ldots,X_k$:

(b) $C_{12\cdot r(r+1)\ldots k} = \displaystyle\int\!\!\cdots\!\!\int C_{12\cdot x_r \ldots x_k}\,f(x_r,\ldots,x_k)\,dx_r \ldots dx_k$
$$= \int\!\!\cdots\!\!\int (x_1 - a_{1\cdot x_r \ldots x_k})(x_2 - a_{2\cdot x_r \ldots x_k})\,f(x_1,x_2,x_r,\ldots,x_k)\,dx_1 dx_2\,dx_r \ldots dx_k.$$

The partial correlation coefficient $\rho_{12\cdot r(r+1)\ldots k}$ between $X_1$, $X_2$ with respect to $X_r, X_{r+1},\ldots,X_k$ is defined as

(c) $\rho_{12\cdot r(r+1)\ldots k} = \dfrac{C_{12\cdot r(r+1)\ldots k}}{\sqrt{C_{11\cdot r(r+1)\ldots k}\,C_{22\cdot r(r+1)\ldots k}}}.$

The quantities defined in (a), (b) and (c) extend to discrete and mixed cases.

2.94 Multiple Correlation

A procedure which is often carried out in statistics is that of determining best-fitting linear regression functions in the sense of least squares even though the actual regression function is "not quite" linear. The procedure is perhaps more often carried out with an empirical c. d. f. $F_n(x_1,x_2,\ldots,x_k)$ than with a probability c. d. f. Here, we shall only consider the case of a probability c. d. f. where the variables are all continuous. There will be analogous results for discrete and mixed cases (and also for the empirical distributions).

In this problem we let $X_1, X_2,\ldots,X_k$ be random variables with c. d. f. $F(x_1,x_2,\ldots,x_k)$ and determine the constants $b_1, b_2,\ldots,b_k$ so that the mean value of the square of $X_1 - b_1 - \sum_{i=2}^k b_i X_i$ is a minimum, i. e., so that

(a) $S = \displaystyle\int\!\!\cdots\!\!\int_{R_k} \Big(x_1 - b_1 - \sum_{i=2}^k b_i x_i\Big)^2 dF(x_1,x_2,\ldots,x_k) = E\Big[\Big(X_1 - b_1 - \sum_{i=2}^k b_i X_i\Big)^2\Big]$

is a minimum.

The values of the b's which minimize $S$ are given by solving the equations $\partial S/\partial b_i = 0$ ($i = 1,2,\ldots,k$). Writing out these equations we have (after dividing each equation by $-2$):

(b) $a_1 - b_1 - \sum_{j=2}^k b_j a_j = 0; \qquad c_{1i} - b_1 a_i - \sum_{j=2}^k b_j c_{ij} = 0, \quad i = 2,3,\ldots,k,$

where $a_i = E(X_i)$ and $c_{ij} = E(X_i X_j)$. Substituting the value of $b_1$ from the first equation into each of the remaining equations, and setting $C_{ij} = c_{ij} - a_i a_j = E[(X_i - a_i)(X_j - a_j)]$, the covariance between $X_i$ and $X_j$, we have the following equations to solve for $b_2, b_3,\ldots,b_k$:

(c) $\sum_{j=2}^k C_{ij} b_j = C_{1i}, \qquad i = 2,3,\ldots,k,$

from which we obtain by using Cramer's rule for solving linear equations

(d) $b_i = \dfrac{\sum_{j=2}^k C_{1j}\,C^*_{ji}}{|\bar C|}, \qquad i = 2,3,\ldots,k,$

where $C^*_{ji}$ is the cofactor of $C_{ji}$ in $|\bar C|$, $|\bar C| = |C_{ij}|$ ($i,j = 2,3,\ldots,k$) being the determinant of the covariances of $X_2,\ldots,X_k$. It is assumed, of course, that this determinant $\ne 0$.



For the value of $b_1$ we therefore have

(e) $b_1 = a_1 - \sum_{i=2}^k b_i a_i.$

The least squares regression function of $X_1$ on $X_2,X_3,\ldots,X_k$ is thus

(f) $b_1 + \sum_{i=2}^k b_i x_i = a_1 + \sum_{i=2}^k b_i (x_i - a_i),$

where the values of the b's are given by (d) and (e).

If we substitute the minimizing values of the b's, given by (d) and (e), in (a) we obtain the minimum value of $S$:

(g) $\operatorname{Min}(S) = E\Big[\Big(X_1 - a_1 - \sum_{i=2}^k b_i (X_i - a_i)\Big)^2\Big] = C_{11} - \sum_{i,j=2}^k \dfrac{C_{1i} C_{1j} C^*_{ji}}{|\bar C|}.$

For, expanding the square and using (c), $\operatorname{Min}(S) = C_{11} - \sum_{i=2}^k b_i C_{1i}$; if we substitute for $b_i$ from (d) and note that $\sum_{j=2}^k C_{ij} C^*_{ji'} = |\bar C|$ if $i = i'$ and $= 0$ if $i \ne i'$, the last term reduces to the double sum written in (g). Thus denoting $\operatorname{Min}(S)$ by $\sigma^2_{1\cdot 23\ldots k}$, we have



(h) $\sigma^2_{1\cdot 23\ldots k} = \dfrac{|C|}{|\bar C|},$

where $|C|$ is the determinant

$$|C| = \begin{vmatrix} C_{11} & C_{12} & \cdots & C_{1k} \\ C_{21} & C_{22} & \cdots & C_{2k} \\ \vdots & \vdots & & \vdots \\ C_{k1} & C_{k2} & \cdots & C_{kk} \end{vmatrix}$$

and $|\bar C| = |C_{ij}|$ ($i,j = 2,3,\ldots,k$).

To show that $\sigma^2_{1\cdot 23\ldots k}$ may be expressed as this ratio of determinants, let us note that the determinant in the numerator may be expressed as



(i) $|C| = \sum_{j=1}^k C_{1j}\,\eta_{1j},$

where $\eta_{1j}$ is the cofactor of $C_{1j}$ in the numerator determinant; in particular $\eta_{11} = |\bar C|$. Now, for $i = 2,3,\ldots,k$,

(j) $\eta_{1i} = -\sum_{j=2}^k C_{1j}\,C^*_{ji},$

where $C^*_{ji}$ is the cofactor of $C_{ji}$ in the determinant $|\bar C| = |C_{ij}|$ ($i,j = 2,3,\ldots,k$). Hence the numerator determinant may be expressed as

(k) $|C| = C_{11}\,|\bar C| - \sum_{i,j=2}^k C_{1i} C_{1j}\,C^*_{ji}.$

Dividing expression (k) by $\eta_{11} = |\bar C|$ and comparing with (g), we therefore establish the fact that $\sigma^2_{1\cdot 23\ldots k}$ may be expressed as the ratio of determinants given in (h). The quantity $\sigma^2_{1\cdot 23\ldots k}$ is the variance of $X_1$ about the least-square linear regression function (f), and should not be confused with $\sigma^2_{1\cdot 23\ldots k}$ as defined in 2.92.

The correlation coefficient between $X_1$ and the regression function (f) is known as the multiple correlation coefficient between $X_1$ and $X_2,X_3,\ldots,X_k$ and is denoted by $R_{1\cdot 23\ldots k}$. To obtain an expression for the multiple correlation coefficient, we first determine the covariance between $X_1$ and the function (f), which is

(l) $E\Big[(X_1 - a_1)\sum_{i=2}^k b_i (X_i - a_i)\Big] = \sum_{i=2}^k b_i C_{1i} = \sum_{i,j=2}^k \dfrac{C_{1i} C_{1j} C^*_{ji}}{|\bar C|}.$

The variance of $X_1$ is $C_{11}$, and that of (f) is

$$E\Big[\Big(\sum_{i=2}^k b_i (X_i - a_i)\Big)^2\Big],$$

whose value, by the argument used for (g), also reduces to $\sum_{i,j=2}^k C_{1i} C_{1j} C^*_{ji}/|\bar C|$. Hence the multiple correlation coefficient is

$$R_{1\cdot 23\ldots k} = \sqrt{\frac{\sum_{i,j=2}^k C_{1i} C_{1j} C^*_{ji}}{C_{11}\,|\bar C|}}.$$

It will be observed from (h) that

$$R^2_{1\cdot 23\ldots k} = 1 - \frac{\sigma^2_{1\cdot 23\ldots k}}{C_{11}} = 1 - \frac{|C|}{C_{11}\,|\bar C|},$$

and hence by 2.72, $R^2_{1\cdot 23\ldots k} = 1$ if, and only if, all of the probability in the k-dimensional space of the random variables lies on the least-square regression surface

$$x_1 - a_1 = \sum_{i=2}^k b_i (x_i - a_i).$$

It should be noted that a partial correlation coefficient between $X_1$ and $X_2$ with respect to $X_r, X_{r+1},\ldots,X_k$ could be determined for the case of a linear least-square regression function by replacing $a_{1\cdot x_r \ldots x_k}$ and $a_{2\cdot x_r \ldots x_k}$ by the corresponding linear least-square regression functions in determining $C_{12\cdot r(r+1)\ldots k}$.

Again, we remark that analogous results can be obtained by using an empirical c. d. f. $F_n(x_1,x_2,\ldots,x_k)$ instead of a probability c. d. f. $F(x_1,x_2,\ldots,x_k)$.
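The computations of this section reduce to linear algebra on the covariance matrix. A sketch (ours; the matrix below is hypothetical) of equations (c), (h), and the multiple correlation coefficient:

```python
# Least-squares regression coefficients, residual variance, and multiple
# correlation R_{1.23...k} from a given covariance matrix C.
import numpy as np

C = np.array([[4.0, 1.2, 0.8],     # hypothetical covariances of X1, X2, X3
              [1.2, 1.0, 0.3],
              [0.8, 0.3, 1.0]])

C11 = C[0, 0]
C1r = C[0, 1:]                     # covariances of X1 with X2,...,Xk
Crr = C[1:, 1:]                    # covariance matrix of X2,...,Xk

b = np.linalg.solve(Crr, C1r)                          # equations (c)
resid_var = np.linalg.det(C) / np.linalg.det(Crr)      # equation (h)
R2 = 1.0 - resid_var / C11                             # sigma^2 = C11*(1 - R^2)
print("b =", b)
print("sigma^2_{1.23} =", resid_var, " check:", C11 - C1r @ b)
print("R_{1.23} =", np.sqrt(R2))
```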



CHAPTER III 

SOME SPECIAL DISTRIBUTIONS 

In the present chapter, the notions of the preceding chapter will be exemp- 
lified by considering certain distributions that arise frequently in applied statistics. 
We shall begin by considering distributions for the discrete case. Since the distinction 
between the random variable X and the corresponding independent variable x of the dis- 
tribution function has been made clear, we shall henceforth denote both by the lower 
case x unless this leads to ambiguity. 
3.1 Discrete Distributions

3.11 Binomial Distribution

An important distribution function of a discrete variate is the binomial distribution, which may be derived in the following manner. Suppose the probability of a "success" in a trial is $p$ and the probability of a "failure" is $q = 1 - p$. For example, the probability of a head in a toss of an "ideal" coin is $\frac12$ and the probability of not a head (a tail) is $1 - \frac12 = \frac12$. We can represent these probabilities in functional form $f(\alpha)$, where $f(\alpha) = p$ for $\alpha = 1$, a success, and $f(\alpha) = q$ for $\alpha = 0$, a failure. In other words $f(\alpha)$ is the probability of obtaining $\alpha$ successes in a single trial.

The probability associated with n trials which are mutually independent in the probability sense is

$$f(\alpha_1)\cdot f(\alpha_2)\cdot\;\cdots\;\cdot f(\alpha_n).$$

The probability of x successes and n − x failures in a specified order, say $\alpha_1 = 1, \alpha_2 = 1,\ldots,\alpha_x = 1, \alpha_{x+1} = 0,\ldots,\alpha_n = 0$, is

$$f(1)^x f(0)^{n-x} = p^x q^{n-x}.$$

The number of orders in which x successes and n − x failures can occur is the number of combinations of n objects taken x at a time, which is

(a) $_nC_x = \dfrac{n!}{x!\,(n-x)!}.$



These $_nC_x$ orders are mutually exclusive events. Hence, to find the probability $B(x)$, say, of exactly x successes irrespective of order, we add the probabilities for all of the $_nC_x$ orders, thus obtaining

(b) $B(x) = {_nC_x}\,p^x q^{n-x}.$

$B(x)$ will be recognized as the (x+1)-st term in the expansion of $(q+p)^n$. This demonstrates that the sum of the probabilities is equal to unity, i. e.

$$(q+p)^n = \sum_{x=0}^n B(x) = 1.$$

Hence $F(x) = \sum_{x' \le x} B(x')$ is clearly a c. d. f.

To derive the moments of the distribution $B(x)$ we will find it convenient to use the m. g. f.

(c) $\phi(\theta) = E(e^{x\theta}) = \sum_{x=0}^n e^{x\theta}\,{_nC_x}\,p^x q^{n-x} = (q + pe^{\theta})^n.$

The h-th moment of x can be expressed as

$$\mu_h' = \left[\frac{d^h \phi}{d\theta^h}\right]_{\theta=0}.$$

In particular the mean $E(x)$ is

(d) $E(x) = \left[\dfrac{d\phi}{d\theta}\right]_{\theta=0} = \left[n p e^{\theta}(q + pe^{\theta})^{n-1}\right]_{\theta=0} = np,$

and the second moment about zero is

$$\mu_2' = \left[\frac{d^2\phi}{d\theta^2}\right]_{\theta=0} = \left[n p e^{\theta}(q + pe^{\theta})^{n-1} + n(n-1)p^2 e^{2\theta}(q + pe^{\theta})^{n-2}\right]_{\theta=0} = np + n(n-1)p^2.$$

Therefore, the variance is

(e) $\sigma^2 = np + n(n-1)p^2 - n^2p^2 = np - np^2 = npq.$
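A direct numerical check (ours) of the mean and variance just derived, summing over the exact binomial probabilities:

```python
# Mean and variance of the binomial distribution computed from its pmf.
from math import comb

n, p = 12, 0.3
q = 1.0 - p
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))
print(mean, "vs np  =", n * p)
print(var,  "vs npq =", n * p * q)
```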



Example: Applying the binomial distribution to the coin tossing problem, we have $p = \frac12$ and $q = \frac12$. The probability of x heads is

$$B(x) = {_nC_x}\left(\tfrac12\right)^n.$$

The mean and variance are, respectively,

$$\mu_1' = \frac{n}{2}, \qquad \sigma^2 = \frac{n}{4}.$$

In deducing $B(x)$ we have assumed that $p$ remains constant from trial to trial. If the probability is different for each trial, our conclusions must be modified. Let $p_i$ be the probability of a success in the i-th trial ($i = 1,2,\ldots,n$) and $q_i = 1 - p_i$ the corresponding probability of a failure. Let

$$\bar p = \frac{1}{n}\sum_{i=1}^n p_i, \qquad \bar q = \frac{1}{n}\sum_{i=1}^n q_i = 1 - \bar p.$$

Then the expected value of $x = \sum_{i=1}^n \alpha_i$, the total number of successes in n trials, is

$$E(x) = E(\alpha_1) + \cdots + E(\alpha_n) = p_1 + \cdots + p_n = n\bar p.$$

The variance of $\alpha_i$ is $p_i q_i$. Since the trials are independent, the variance of $x = \sum \alpha_i$ is $\sum_{i=1}^n p_i q_i$. Noting that $p_i = \bar p + (p_i - \bar p)$ and $q_i = \bar q - (p_i - \bar p)$ we can write the variance

(f) $\sigma^2 = \sum_{i=1}^n p_i q_i = n\bar p\bar q - \sum_{i=1}^n (p_i - \bar p)^2.$

This is obviously less than the variance, $n\bar p\bar q$, we found above. When the probability is constant from trial to trial, the distribution is known as the Bernoulli case; when the probability varies, we have the Poisson case.

In 2.71 it was proved that if a variate x is distributed about the mean $a$ with the variance $\sigma^2$, we have the Tchebycheff inequality

$$\Pr(|x-a| \ge \delta\sigma) \le \frac{1}{\delta^2}$$

for any $\delta > 0$. In the binomial distribution x has mean $np$ and variance $npq$. Let us change to the variate $r = x/n$, the "relative frequency" of successes. We have $E(r) = E(x/n) = np/n = p$. Similarly, $\sigma_r^2 = \sigma^2/n^2 = pq/n$. The Tchebycheff inequality states that

$$\Pr\Big(|r - p| \ge \delta\sqrt{pq/n}\Big) \le \frac{1}{\delta^2}.$$

If we choose $\delta = \lambda\sqrt{n/pq}$, this inequality becomes

(g) $\Pr(|r - p| \ge \lambda) \le \dfrac{pq}{n\lambda^2}.$

Inequality (g) expresses what is known as the

Law of Large Numbers: For any given positive number $\lambda$, the probability that r will deviate from p by more than $\lambda$ can be made arbitrarily small by choosing n sufficiently large.

Roughly speaking, the larger the value of n, the more the probability "piles up" around p (the mean of r), such that in the limit (as $n \to \infty$) the probability is all piled up at p.

In the example of "ideal" coin tossing r is the ratio of number of heads to total number of tosses. Then

$$\Pr\Big(\big|r - \tfrac12\big| \ge \lambda\Big) \le \frac{1}{4n\lambda^2}.$$

Example: If $\lambda = 0.1$ and $n = 100$, we have $\Pr(|r - \frac12| \ge 0.1) \le \frac14$; in other words, the probability is less than $\frac14$ that the relative frequency of heads will deviate from $\frac12$ by more than 0.1.
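A small simulation (ours) makes the law of large numbers and the conservativeness of bound (g) concrete:

```python
# Empirical Pr(|r - 1/2| > lambda) for coin tossing vs. the bound pq/(n*lambda^2).
import random

random.seed(2)
lam = 0.1
for n in (100, 1_000, 10_000):
    trials = 1_000
    bad = sum(abs(sum(random.random() < 0.5 for _ in range(n)) / n - 0.5) > lam
              for _ in range(trials)) / trials
    bound = 0.25 / (n * lam ** 2)
    print(f"n={n}: empirical {bad:.4f}, Tchebycheff bound {bound:.4f}")
```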

3.12 Multinomial Distribution

An immediate generalization of the binomial distribution is the multinomial distribution. Suppose an event is characterized by a variate that can take on one and only one of k values, say $y_1, y_2,\ldots,y_k$. For example, if the event is the throw of a die and if y is the number of dots appearing on the top face, y can take on only one of the values 1, 2, 3, 4, 5, 6 in each throw. It should be noted that the k mutually exclusive kinds of events may not correspond to k values of a one-dimensional variable y. Thus, if $C_1, C_2,\ldots,C_k$ are k kinds of events (e. g., the sides of a die may be colored rather than numbered), one and only one of which will occur in each trial, then we may let y be a vector with k components $(y^{(1)},y^{(2)},\ldots,y^{(k)})$, such that the value of the vector for an event of type $C_1$ is $(1,0,0,\ldots,0)$, the value for one of type $C_2$ is $(0,1,0,\ldots,0)$, etc. For convenience, we could denote these values of the vector y by $y_1$, $y_2$, etc., and proceed as in the case where $y_1, y_2,\ldots,y_k$ are different values of a one-dimensional variable y.

Let the probability of y being $y_i$ be $p_i$, where $\sum p_i = 1$. The probability associated with n trials is

$$f(y^{(1)})\,f(y^{(2)})\cdots f(y^{(n)}),$$

where each of the y's will have one of the values $y_1, y_2,\ldots,y_k$, and $f(y_i) = p_i$ ($i = 1,2,\ldots,k$). We now wish to find the probability that $x_1$ of the y's are $y_1$'s, $x_2$ of the y's are $y_2$'s, etc. ($\sum_i x_i = n$).

The probability of $x_1$ events characterized by $y_1$, etc., occurring in a specified order, say $y^{(1)} = y_1,\ldots,y^{(x_1)} = y_1,\; y^{(x_1+1)} = y_2,\ldots,y^{(n)} = y_k$, is

$$p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.$$

The number of different orders in which we can get $x_1$ $y_1$'s, etc., is the number of ways in which n objects can be permuted where $x_1$ are of type $C_1,\ldots,x_k$ are of type $C_k$, that is,

$$\frac{n!}{x_1!\,x_2!\cdots x_k!}.$$

So the probability of $x_1$ $y_1$'s, $x_2$ $y_2$'s, etc., irrespective of the order in which they occur is given by adding the probabilities of the various possible orders. We obtain

(a) $M(x_1,x_2,\ldots,x_k) = \dfrac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1} p_2^{x_2}\cdots p_k^{x_k}.$

This may be recognized as the general term in the expansion of $(p_1 + p_2 + \cdots + p_k)^n$. Hence, the sum of $M(x_1,x_2,\ldots,x_k)$ over all partitions of n, that is, all sets of $x_i$ with $\sum x_i = n$, is unity.

To find the means, variances, covariances, and higher moments we set up the m. g. f.

(b) $\phi(\theta_1,\theta_2,\ldots,\theta_k) = E\big[e^{\sum_i \theta_i x_i}\big] = \sum \dfrac{n!}{x_1!\cdots x_k!}\,(p_1 e^{\theta_1})^{x_1}\cdots(p_k e^{\theta_k})^{x_k} = (p_1 e^{\theta_1} + p_2 e^{\theta_2} + \cdots + p_k e^{\theta_k})^n.$

The mean of $x_i$ is

(c) $E(x_i) = \left[\dfrac{\partial\phi}{\partial\theta_i}\right]_{\theta\text{'s}=0} = \left[n p_i e^{\theta_i}(p_1 e^{\theta_1} + \cdots + p_k e^{\theta_k})^{n-1}\right]_{\theta\text{'s}=0} = np_i.$

And

$$\left[\frac{\partial^2\phi}{\partial\theta_i^2}\right]_{\theta\text{'s}=0} = np_i + n(n-1)p_i^2.$$

Therefore, the variance of $x_i$ is

(d) $\sigma_i^2 = np_i + n(n-1)p_i^2 - n^2 p_i^2 = np_i(1 - p_i).$

In a similar manner we find the covariance between $x_i$ and $x_j$ to be $-np_i p_j$. It is clear that the binomial distribution is the special case when $k = 2$.
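A sampling check (ours) of these multinomial moments:

```python
# Means np_i, variances np_i(1-p_i), covariance -n*p_i*p_j by simulation.
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, np.array([0.2, 0.3, 0.5])
counts = rng.multinomial(n, p, size=200_000)

print("means     :", counts.mean(axis=0), " vs ", n * p)
print("variances :", counts.var(axis=0), " vs ", n * p * (1 - p))
print("cov(x1,x2):", np.cov(counts[:, 0], counts[:, 1])[0, 1],
      " vs ", -n * p[0] * p[1])
```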

3.13 The Poisson Distribution

The Poisson distribution is in a sense a particular limiting form of the binomial distribution. We shall deduce it from geometrical considerations. Let AB be a line segment of length L and CD a segment of length l contained in AB.

[Figure 3: the segment AB with interior points C, D marking the subsegment CD]

Let the probability that a point taken at random falls on an interval of length du be du/L; that is, the p. d. f. of u is a constant. The probability of the point falling in CD is l/L. If we let n points fall at random on AB, the probability that exactly x of them fall on CD is given by the binomial distribution ((b) of §3.11)

$$B(x) = \frac{n!}{x!\,(n-x)!}\left(\frac{l}{L}\right)^x\left(1 - \frac{l}{L}\right)^{n-x}.$$

Now let n and L increase indefinitely in such a way that the average number of points per unit length is a finite number $k \ne 0$, i. e., $n/L \to k$. Now

$$B(x) = \frac{n(n-1)\cdots(n-x+1)}{x!}\left(\frac{l}{L}\right)^x\left(1 - \frac{l}{L}\right)^{n-x}.$$
So the limiting value of $B(x)$ for a given x is

$$\lim B(x) = \lim \frac{n(n-1)\cdots(n-x+1)}{x!}\left(\frac{l}{L}\right)^x\left(1 - \frac{l}{L}\right)^{n-x} = \frac{(kl)^x e^{-kl}}{x!}.$$

Let $kl = m$, and we get the usual expression for the Poisson distribution

(a) $p(x) = \dfrac{m^x e^{-m}}{x!}.$

The sum over all x is seen to be 1,

$$\sum_{x=0}^{\infty} p(x) = e^{-m}\Big(1 + m + \frac{m^2}{2!} + \cdots\Big) = e^{-m} e^{m} = 1.$$

The m. g. f. is

(b) $\phi(\theta) = E(e^{\theta x}) = \sum_{x=0}^{\infty} e^{\theta x}\,\dfrac{m^x e^{-m}}{x!} = e^{-m}\sum_{x=0}^{\infty} \dfrac{(m e^{\theta})^x}{x!} = e^{m(e^{\theta}-1)}.$

From this we derive the moments about zero in the customary manner:

(c) $E(x) = \left[\dfrac{d\phi}{d\theta}\right]_{\theta=0} = \left[m e^{\theta} e^{m(e^{\theta}-1)}\right]_{\theta=0} = m, \qquad E(x^2) = \left[\dfrac{d^2\phi}{d\theta^2}\right]_{\theta=0} = m + m^2.$

Therefore, the variance is equal to the mean,

(d) $\sigma^2 = m + m^2 - m^2 = m.$



This argument given for one dimension immediately extends to two or more dimensions. For example, for two dimensions we would take AB and CD to be regions of the plane, the latter contained in the former, and k to be the limiting ratio of the number of points per unit area. The Poisson distribution is applicable to problems dealing with occurrence of events in a time interval of a given length, such as emission of rays from radioactive substances, certain traffic problems, demands for telephone service, and bacteria counts in cells.

Example: Let us consider the following problem as an example to which the Poisson distribution is applicable. If X-rays are considered as discrete quanta and if the absorption of k or more will kill a certain unicellular organism, what is the probability that an organism of a given size S on a given glass slide will escape death by X-rays after being exposed for t seconds? On the assumption that the projection of the organism of size S on a plane has an area of a, and m is the average number of rays striking an area of size a in t seconds, and the rays appear independently and at random, then the probability that x of the X-rays hit the organism in t seconds is

$$p(x) = \frac{m^x e^{-m}}{x!}.$$

Hence, the probability of survival is $\sum_{x=0}^{k-1} p(x)$. The average number of rays absorbed by the survivors is

$$\sum_{x=0}^{k-1} x\,p(x)\Big/\sum_{x=0}^{k-1} p(x).$$
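The X-ray example in numbers (our hypothetical values of m and k):

```python
# Survival probability and mean dose absorbed by survivors, Poisson model.
from math import exp, factorial

m, k = 3.0, 5                      # assumed: mean hits m = 3, lethal dose k = 5
p = [m**x * exp(-m) / factorial(x) for x in range(k)]

survival = sum(p)                               # sum_{x < k} p(x)
mean_absorbed = sum(x * p[x] for x in range(k)) / survival
print("Pr(survive) =", survival)
print("mean rays absorbed by a survivor =", mean_absorbed)
```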



3.14 The Negative Binomial Distribution

Another discrete distribution which is closely related to the Bernoulli binomial distribution is the negative binomial. If we expand, according to the binomial theorem,

$$(q - p)^{-k},$$

where $q = 1 + p$, $k > 0$, $p > 0$, we get as the general term

(a) $p(x) = q^{-k}\,\dfrac{k(k+1)\cdots(k+x-1)}{x!}\left(\dfrac{p}{q}\right)^x.$

When we interpret this as a probability function of x, $p(x)$, it is called the negative binomial distribution and is defined for $x = 0, 1, 2,\ldots$ We notice that the sum of $p(x)$ for all x is unity,

$$\sum_{x=0}^{\infty} p(x) = q^{-k}\Big(1 - \frac{p}{q}\Big)^{-k} = (q - p)^{-k} = 1.$$

The m. g. f. is

(b) $\phi(\theta) = \sum_{x=0}^{\infty} e^{\theta x}\,p(x) = q^{-k}\Big(1 - \dfrac{p e^{\theta}}{q}\Big)^{-k} = (q - p e^{\theta})^{-k}.$

From this we find the mean

(c) $E(x) = \left[\dfrac{d\phi}{d\theta}\right]_{\theta=0} = \left[k p e^{\theta}(q - p e^{\theta})^{-k-1}\right]_{\theta=0} = kp,$

since $q - p = 1$, and

$$\left[\frac{d^2\phi}{d\theta^2}\right]_{\theta=0} = kp + k(k+1)p^2.$$

Therefore the variance is

(d) $\sigma^2 = kp + k(k+1)p^2 - k^2p^2 = kp + kp^2 = kpq.$

The similarity of this m. g. f. and these moments to those of the positive binomial distribution should be noted.

It can easily be shown that a special limiting case of the negative binomial distribution is the Poisson law. If we let $p \to 0$ and $k \to \infty$ in such a way that

$$\lim kp = m,$$

then

$$\lim \phi(\theta) = \lim_{k\to\infty} (q - p e^{\theta})^{-k} = \lim_{k\to\infty} \Big[1 - \frac{kp(e^{\theta}-1)}{k}\Big]^{-k} = e^{m(e^{\theta}-1)},$$

which is the m. g. f. of the Poisson distribution.

If we make a change of parameters, we have the usual expression for the Pólya-Eggenberger distribution. Let

$$k = \frac{h}{d}, \qquad p = d.$$

Then the distribution may be written as

(e) $p(x) = (1+d)^{-h/d}\,\dfrac{h(h+d)(h+2d)\cdots(h+(x-1)d)}{x!}\,(1+d)^{-x}.$

This distribution, one of a number of contagious distributions, is useful in describing, for example, the probability of x cases of a given epidemic in a given locality.

If we interpret $1/q$ as the probability of a "success" and $p/q$ as the probability of a "failure" in a trial, then it will be seen that (a) is the probability that $x + k$ trials will be required to obtain k successes. For the probability of obtaining $k - 1$ successes and x failures in $x + k - 1$ trials is

$$\frac{(x+k-1)!}{(k-1)!\,x!}\left(\frac{1}{q}\right)^{k-1}\left(\frac{p}{q}\right)^x.$$

Now the last trial must be a success. Therefore, multiplying this probability by $1/q$, the probability of success, we obtain (a), the probability that $x + k$ trials will be required to obtain k successes.
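The waiting-time reading of (a) can be checked by simulation (ours; the values of k and p below are arbitrary):

```python
# p(x) should equal the probability that exactly x failures precede the
# k-th success when each trial succeeds with probability 1/q, q = 1 + p.
import random
from math import comb

random.seed(4)
k, p = 3, 0.5
q = 1.0 + p
succ = 1.0 / q                     # success probability per trial

def failures_before_kth_success():
    fails = wins = 0
    while wins < k:
        if random.random() < succ:
            wins += 1
        else:
            fails += 1
    return fails

n = 200_000
sim = [failures_before_kth_success() for _ in range(n)]
for x in range(4):
    exact = comb(x + k - 1, x) * succ**k * (p / q)**x    # equation (a)
    print(x, sum(s == x for s in sim) / n, " vs ", exact)
```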

3.2 The Normal Distribution

3.21 The Univariate Case. A very important distribution is the normal or Gaussian distribution

$$k e^{-h^2(x-c)^2},$$

defined over the range $-\infty < x < \infty$, where k, h, and c are constants. Various attempts have been made to establish this distribution from postulates and other primitive assumptions. Gauss, for example, deduced it from the postulate of the arithmetic mean, which states, roughly, that for a set of equally valid observations of a quantity the arithmetic mean is the most probable value. Pearson derived it as a solution of a certain differential equation. It can be shown that it is the limiting distribution of the Bernoulli binomial distribution. We shall not derive the normal distribution from more basic considerations, but we shall observe that it arises under rather broad conditions as a limiting distribution in many situations involving a large number of variates.

We can determine k in the distribution by requiring that the integral over the entire range be unity. If we let $u = h(x-c)$, we wish

$$\int dF(x) = \frac{k}{h}\int_{-\infty}^{\infty} e^{-u^2}\,du = 1.$$

To evaluate the integral $I = \int_{-\infty}^{\infty} e^{-u^2}du$ we observe that

$$I^2 = \int_{-\infty}^{\infty} e^{-u^2}du \int_{-\infty}^{\infty} e^{-v^2}dv = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-(u^2+v^2)}\,du\,dv.$$

Changing to polar coordinates $u = r\cos\theta$, $v = r\sin\theta$, we get

$$I^2 = \int_0^{2\pi}\!\!\int_0^{\infty} r e^{-r^2}\,dr\,d\theta = \pi.$$

Therefore, we take $k = h/\sqrt{\pi}$.

The mean of the distribution is

$$E(x) = \frac{h}{\sqrt\pi}\int_{-\infty}^{\infty} x\,e^{-h^2(x-c)^2}dx = \frac{h}{\sqrt\pi}\int_{-\infty}^{\infty} (x-c)\,e^{-h^2(x-c)^2}dx + c\,\frac{h}{\sqrt\pi}\int_{-\infty}^{\infty} e^{-h^2(x-c)^2}dx.$$

The former integral is zero because the integrand is an odd function of $x - c$. So

$$a = E(x) = c.$$

The variance is found by integration by parts,

$$\sigma^2 = \frac{h}{\sqrt\pi}\int_{-\infty}^{\infty} (x-c)^2 e^{-h^2(x-c)^2}dx = \frac{1}{2h^2}.$$

We usually write the normal distribution with c and h expressed in terms of $a$ and $\sigma^2$, respectively, i. e.,

(a) $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-a)^2}{2\sigma^2}}.$

We shall refer to this distribution as $N(a,\sigma^2)$.

To find higher moments (about the mean) it is convenient to use the m. g. f. of the normalized variate $\frac{x-a}{\sigma}$:

$$E\big(e^{\theta(\frac{x-a}{\sigma})}\big) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{\theta\frac{x-a}{\sigma}}\,e^{-\frac{(x-a)^2}{2\sigma^2}}dx.$$

Setting $\frac{x-a}{\sigma} = y$, the last integral becomes

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{\theta y - \frac12 y^2}dy = e^{\frac12\theta^2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac12(y-\theta)^2}dy = e^{\frac12\theta^2}.$$

Hence,

(b) $\phi(\theta) = e^{\frac12\theta^2}.$



It should be noticed that the normal distribution is symmetrical with respect to the line $x = a$, its mean. The smaller the value of $\sigma^2$ is, the greater the concentration about the mean. In fact $\sigma$ is the distance from the mean to the points of inflection of $f(x)$.

[Figure 4: the normal p. d. f. $f(x)$, showing ordinates at $a-\sigma$, $a$, and $a+\sigma$]

Because of its wide application and because of its theoretical importance, the normal distribution has been the origin of much of the terminology and many of the concepts in statistics.

The integral

$$\frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-\frac12 u^2}du = 1 - F(x)$$

is widely tabulated; the ordinate

$$\frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}$$

is also tabulated in many places. The value of x for which

$$\frac{1}{\sqrt{2\pi}}\int_{-x}^{x} e^{-\frac12 u^2}du = \frac12$$

is called the probable error, and is approximately 0.6745.

It can be readily verified by applying Theorem (C) of §2.81 that as $n \to \infty$ the normalized variable $\frac{x - np}{\sqrt{npq}}$, where x is distributed according to the binomial law, has the limiting distribution $N(0,1)$. For we may write

$$\frac{x - np}{\sqrt{npq}} = \sum_{i=1}^n \frac{x_i - p}{\sqrt{npq}},$$

where $x_1,x_2,\ldots,x_n$ are independently distributed according to the law $p(x) = p^x(1-p)^{1-x}$ ($x = 0$ or 1). The mean of this distribution is $E(x) = \sum_{x=0}^1 x\,p^x(1-p)^{1-x} = p$, and the variance is $\sigma^2 = \sum_{x=0}^1 (x-p)^2 p^x(1-p)^{1-x} = pq$. The applicability of Theorem (C), §2.81, is then obvious.
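The probable-error constant can be recovered numerically (our sketch), writing the normal integral with the error function and bisecting:

```python
# Find x with Pr(-x < X < x) = 1/2 for X ~ N(0,1); expect about 0.6745.
from math import erf, sqrt

def central_prob(x):               # Pr(-x < X < x) = erf(x / sqrt(2))
    return erf(x / sqrt(2.0))

lo, hi = 0.0, 2.0
for _ in range(60):                # bisection for central_prob(x) = 1/2
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if central_prob(mid) < 0.5 else (lo, mid)
print("probable error ~", 0.5 * (lo + hi))   # ~0.674490
```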



3.22 The Normal Bivariate Distribution

The extension of the normal probability density function to the case of two variables, $x_1$ and $x_2$, is straightforward. We replace $(x-a)^2$ by a quadratic form in $x_1 - a_1$ and $x_2 - a_2$. The distribution may be written

$$K e^{-\frac12 Q},$$

where $Q = A_{11}y_1^2 + 2A_{12}y_1y_2 + A_{22}y_2^2$, $y_i = x_i - a_i$, and $K > 0$, $A_{11} > 0$, $A_{22} > 0$, $A_{12}$ are constants such that $A_{11}A_{22} > A_{12}^2$. These inequalities on the A's are necessary and sufficient conditions for Q to be a positive definite quadratic form in $y_1$ and $y_2$, i. e., $Q > 0$ unless $y_1 = y_2 = 0$. We wish to determine K so that the integral of the p. d. f. over the $x_1,x_2$-plane is unity. The integral transforms to

(a) $K\displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-\frac12(A_{11}y_1^2 + 2A_{12}y_1y_2 + A_{22}y_2^2)}\,dy_1 dy_2.$

If we let $y_1 + \frac{A_{12}}{A_{11}}y_2 = z_1$, and integrate $z_1$ and $y_2$ in (a) from $-\infty$ to $+\infty$, and use the fact that

$$\int_{-\infty}^{\infty} e^{-\frac12 c u^2}du = \sqrt{2\pi/c}, \qquad c > 0,$$

we obtain for (a)

$$K\,\frac{2\pi}{\sqrt{A_{11}A_{22} - A_{12}^2}}.$$

If the integral is to be unity, we must choose

$$K = \frac{\sqrt{A_{11}A_{22} - A_{12}^2}}{2\pi} = \frac{\sqrt{|A|}}{2\pi},$$

where $|A|$ is the determinant

$$\begin{vmatrix} A_{11} & A_{12} \\ A_{12} & A_{22} \end{vmatrix}.$$

We may, therefore, write the distribution as



(b) $f(x_1,x_2) = \dfrac{\sqrt{|A|}}{2\pi}\,e^{-\frac12\sum_{i,j=1}^2 A_{ij}(x_i-a_i)(x_j-a_j)}.$

In order to find the means, variances, and covariance of $x_1$ and $x_2$, it will be convenient to obtain the m. g. f. of $(x_1-a_1)$ and $(x_2-a_2)$, i. e.

(c) $\phi(\theta_1,\theta_2) = E\big(e^{\theta_1(x_1-a_1)+\theta_2(x_2-a_2)}\big).$

Letting $x_i - a_i = y_i$, we have

$$\phi(\theta_1,\theta_2) = \frac{\sqrt{|A|}}{2\pi}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{\theta_1 y_1 + \theta_2 y_2 - \frac12 Q}\,dy_1 dy_2 = e^{\frac12 R}\,\frac{\sqrt{|A|}}{2\pi}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} e^{-\frac12 Q'}\,dy_1 dy_2,$$

where

$$R = \frac{A_{22}\theta_1^2 + A_{11}\theta_2^2 - 2A_{12}\theta_1\theta_2}{|A|} = A^{11}\theta_1^2 + A^{22}\theta_2^2 + 2A^{12}\theta_1\theta_2,$$

$A^{ij}$ denoting the cofactor of $A_{ij}$ in $|A|$, divided by $|A|$, and $Q'$ is Q with each $y_i$ replaced by $z_i = y_i - \sum_j A^{ij}\theta_j$. Making this change of variables and integrating with respect to $z_1$ and $z_2$, we obtain

(e) $\phi(\theta_1,\theta_2) = e^{\frac12(A^{11}\theta_1^2 + 2A^{12}\theta_1\theta_2 + A^{22}\theta_2^2)}.$

Now consider the problem of finding the mean values of $x_1$ and $x_2$. We have

$$E(x_1 - a_1) = \left[\frac{\partial\phi}{\partial\theta_1}\right]_{\theta_1=\theta_2=0} = 0.$$

Hence $E(x_1) = a_1$. Similarly $E(x_2) = a_2$.



To find the variances and covariance of $x_1$ and $x_2$, we must take second derivatives. Thus to find the variance of $x_1$ we have

$$\sigma_1^2 = E[(x_1-a_1)^2] = \left[\frac{\partial^2\phi}{\partial\theta_1^2}\right]_{\theta_1=\theta_2=0} = A^{11}.$$

Similarly,

$$\sigma_2^2 = A^{22}.$$

For the covariance, we have

$$\sigma_{12} = E[(x_1-a_1)(x_2-a_2)] = \left[\frac{\partial^2\phi}{\partial\theta_1\partial\theta_2}\right]_{\theta_1=\theta_2=0} = A^{12}.$$

If the three equations

(f) $\sigma_1^2 = A^{11}, \qquad \sigma_2^2 = A^{22}, \qquad \sigma_{12} = \rho\sigma_1\sigma_2 = A^{12}$

are solved for $A_{11}$, $A_{22}$, $A_{12}$, we obtain

(g) $A_{11} = \dfrac{1}{\sigma_1^2(1-\rho^2)}, \qquad A_{22} = \dfrac{1}{\sigma_2^2(1-\rho^2)}, \qquad A_{12} = \dfrac{-\rho}{\sigma_1\sigma_2(1-\rho^2)}.$



We may summarize as follows:

Theorem (A): If $x_1$, $x_2$ are distributed according to the bivariate normal distribution

(h) $f(x_1,x_2) = \dfrac{\sqrt{|A|}}{2\pi}\,e^{-\frac12\sum_{i,j=1}^2 A_{ij}(x_i-a_i)(x_j-a_j)},$

the m. g. f. of $(x_1-a_1)$ and $(x_2-a_2)$ is given by (e); $E(x_i) = a_i$ ($i = 1,2$); the variance of $x_i$ is $A^{ii}$ ($i = 1,2$); and the covariance between $x_1$ and $x_2$ is $A^{12}$. $A_{11}$, $A_{22}$, $A_{12}$ are expressed in terms of the variances and the correlation coefficient between $x_1$ and $x_2$ by (g).

Expressing $A_{11}$, $A_{12}$, $A_{22}$ in (h) in terms of $\sigma_1^2$, $\sigma_2^2$ and $\rho$, the distribution (h) may be written as



(i) $f(x_1,x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,\exp\left\{-\dfrac{1}{2(1-\rho^2)}\left[\dfrac{(x_1-a_1)^2}{\sigma_1^2} - 2\rho\,\dfrac{(x_1-a_1)(x_2-a_2)}{\sigma_1\sigma_2} + \dfrac{(x_2-a_2)^2}{\sigma_2^2}\right]\right\}.$

The marginal distribution of (i) with respect to $x_1$ is the distribution of $x_1$. Thus integrating (i) with respect to $x_2$ we obtain as the distribution of $x_1$

$$f_1(x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-\frac{(x_1-a_1)^2}{2\sigma_1^2}},$$

i. e., $N(a_1,\sigma_1^2)$. A similar expression holds for the distribution of $x_2$.

We would also like to know the conditional probability function $f(x_2 \mid x_1) = f(x_1,x_2)/f_1(x_1)$. Substituting the expressions for $f(x_1,x_2)$ and $f_1(x_1)$ from (i) and the formula just obtained, respectively, we find

$$f(x_2 \mid x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1-\rho^2}}\,\exp\left\{-\frac{\big[x_2 - a_2 - \rho\frac{\sigma_2}{\sigma_1}(x_1-a_1)\big]^2}{2\sigma_2^2(1-\rho^2)}\right\}.$$

Thus, for a fixed value of $x_1$, $x_2$ is distributed according to $N\big(a_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-a_1),\; \sigma_2^2(1-\rho^2)\big)$.

In a similar way we can show that the marginal distribution of $x_2$ is $N(a_2,\sigma_2^2)$ and the conditional probability of $x_1$, given $x_2$, is $N\big(a_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2-a_2),\; \sigma_1^2(1-\rho^2)\big)$. It will be observed that if $\rho = 0$, the marginal and the conditional probability distributions of $x_1$ (or $x_2$) are identical.

Since the conditional distribution of $x_2$ is $N\big(a_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-a_1),\; \sigma_2^2(1-\rho^2)\big)$, the mean value of $x_2$ for the interval $(x_1, x_1+dx_1)$ is simply $a_2 + \rho\frac{\sigma_2}{\sigma_1}(x_1-a_1)$. So the regression function of $x_2$ on $x_1$ is linear, that is,

$$a_{2\cdot x_1} = a_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x_1 - a_1).$$

Similarly

$$a_{1\cdot x_2} = a_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - a_2).$$

Since $\sigma_2^2(1-\rho^2)$ is the variance of $x_2$ about the mean $a_{2\cdot x_1}$ in the conditional probability distribution, the nearer $\rho^2$ is to 1, the smaller is this variance. If $\rho = 0$, $x_2$ does not depend on $x_1$; the two variates are independent and

$$f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{(x_1-a_1)^2}{2\sigma_1^2} - \frac{(x_2-a_2)^2}{2\sigma_2^2}} = f_1(x_1)\,f_2(x_2).$$
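A sampling sketch (ours) of the linear regression just derived: the conditional mean of $x_2$ in a thin slice about $x_1 = x$ tracks $a_2 + \rho\frac{\sigma_2}{\sigma_1}(x - a_1)$:

```python
# Check the bivariate normal regression function by conditioning on slices.
import numpy as np

rng = np.random.default_rng(5)
a1, a2, s1, s2, rho = 1.0, -1.0, 2.0, 1.0, 0.6
cov = [[s1**2, rho*s1*s2], [rho*s1*s2, s2**2]]
x1, x2 = rng.multivariate_normal([a1, a2], cov, size=400_000).T

for x in (-1.0, 1.0, 3.0):
    sel = np.abs(x1 - x) < 0.05                  # thin slice around x1 = x
    print(x, x2[sel].mean(), " vs ", a2 + rho * (s2 / s1) * (x - a1))
```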



3.23 The Normal Multivariate Distribution

Let us now consider the extension of §3.22 to the case of k variates. Let

$$f(x_1,x_2,\ldots,x_k) = C\,e^{-\frac12\sum_{i,j=1}^k A_{ij}(x_i-a_i)(x_j-a_j)},$$

where $\|A_{ij}\|$ is a symmetric, positive definite matrix, that is, $A_{ij} = A_{ji}$ and $\sum_{i,j=1}^k A_{ij}t_it_j > 0$ for real $t_i$ not all zero.

We wish to determine C so that the integral over the entire range, $-\infty < x_i < \infty$, is unity. We must have

$$C\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac12\sum A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\ldots dx_k = 1.$$

To evaluate this integral, we transform the variables. Let $y_i = x_i - a_i$. Then

$$\frac{1}{C} = \int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac12 Q}\,dy_1\ldots dy_k,$$

where $Q = \sum_{i,j=1}^k A_{ij}y_iy_j$. Now we can write

$$Q = A_{11}\Big[y_1 + \sum_{j=2}^k \frac{A_{1j}}{A_{11}}y_j\Big]^2 + \sum_{i,j=2}^k \Big(A_{ij} - \frac{A_{1i}A_{1j}}{A_{11}}\Big)y_iy_j.$$

Let

$$z_1 = y_1 + \sum_{j=2}^k \frac{A_{1j}}{A_{11}}y_j, \qquad A^{(1)}_{ij} = A_{ij} - \frac{A_{1i}A_{1j}}{A_{11}}.$$

III. SOME SPECIAL DISTRIBUTIONS 



Then 



1 

7- 



...J 



.dy k . 



-CD -03 



The range of z 1 la -oo < z^ < oo . 

We should observe that the quadratic form Is again positive definite, that Is, 



A,J? 



for real s^ not all zero. For If therp were such a set of s's for which this quadratic 
fonn were zero or negative, it would be implied that there is a set of t ! s for which 



We continue this process, in turn letting 



+ J 



41' 



and correspondingly 



v<2) 



(D A (D 



.(k-2) .(k-2) 
(k-1) A(k-2) Vi I .k ^k^k-i 

' - 



Each quadratic form in this sequence is positive definite by the foregoing argument. The 
integral becomes 



1 

TJ 



T ? -^ 

J ... I e 



dz r ..dz k . 



-00 -OD 



The final quadratic form is positive definite, so A n > o, 
we can integrate on each 2 in turn, using the fact that 



> o, 



1 ' > o. Hence 



e 

-OD 



-cx 



^ 
! c 



3*23 



III. SOME SPECIAL DISTRIBUTIONS 



Therefore, we get 



1 
TJ 






(21T) 



(D 



To find the value of w, let us evaluate by Lagranges 1 method (known alao as 
pivotal condensation) the determinant of II AJ || 



A 21 A 22 " A 2k 



- A, 



1 A, 

f\f* m 



^22 



K1 



If we subtract A 12 times the firsL column from the second, etc., we get 



IA! - A, 



1 .... 

A 21 A A 21 A 12 . . . . 
22"" S 

A 



AT; 



A kk " 



A 2i A ik 



A ki A ik 






(a) 



Continuing in this way, we find the value of the determinant 

- 



Therefore, the constant we are seeking is 



(2TT) 



and the normal multivariate p. d. f. is 



(b) 



iTJAT 
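A minimal numerical sketch (not from the original text; the matrix is an arbitrary positive definite example) of the pivotal-condensation identity $|A| = A_{11}A_{22}^{(1)}\cdots A_{kk}^{(k-1)}$, i.e. the determinant as a product of the successive pivots:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]])   # assumed symmetric positive definite matrix

M = A.copy()
pivots = []
k = len(M)
for i in range(k):
    pivots.append(M[i, i])                       # pivot A_ii^{(i-1)}
    for j in range(i + 1, k):
        # condense: M[j,l] <- M[j,l] - M[j,i]*M[i,l]/M[i,i]
        M[j, i:] -= M[j, i] / M[i, i] * M[i, i:]

print(np.prod(pivots), np.linalg.det(A))         # the two agree
```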



At this point we should notice some properties of positive definite quadratic forms and matrices. Since $|A| = A_{11}A_{22}^{(1)}\cdots A_{kk}^{(k-1)}$, $|A|$ is positive, for each of the factors is a positive constant. Corresponding to each principal minor of $||A_{ij}||$ of order h, there is a quadratic form in h variables. This quadratic form is again positive definite. For if there were a set of h t's (not all zero) making this form zero or negative, this set together with the $(k-h)$ other t's set equal to zero would do the same for $\sum_{i,j=1}^{k}A_{ij}t_it_j$. Since the determinant of a positive definite matrix is positive, it follows that every principal minor is positive. Conversely, if every principal minor is positive the matrix or the quadratic form is positive definite, for then each $A_{ii}^{(i-1)}$ is positive and the above process of reducing to a sum of squares may be carried out.

The transformation to the z's is linear, of the form

$$z_i = \sum_j b_{ij}y_j,$$

where $b_{ij} = 0$ for $j < i$ and $b_{ii} = 1$. The process we have used proves the theorem that any positive definite quadratic form may be "diagonalized" by a real linear transformation. If we followed this by the transformation

$$w_i = \sqrt{A_{ii}^{(i-1)}}\, z_i,$$

we would have reduced the quadratic form to a sum of squares. This last is equivalent to

(c) $w_i = \sqrt{A_{ii}^{(i-1)}}\, \sum_j b_{ij}y_j .$

Now we wish to show that the mean of $x_j$ is $a_j$. To do this we differentiate both sides of the following equation with respect to $a_h$:

$$\frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = 1 .$$

Since $\dfrac{\partial}{\partial a_h}\Big[-\tfrac{1}{2}\sum_{i,j}A_{ij}(x_i-a_i)(x_j-a_j)\Big] = \sum_{j=1}^{k}A_{hj}(x_j-a_j)$, the differentiation of the above equation gives us

$$\frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} \Big[\sum_{j=1}^{k}A_{hj}(x_j-a_j)\Big] e^{-\frac{1}{2}\sum A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = 0,$$

so

$$\sum_{j=1}^{k}A_{hj}\,E(x_j-a_j) = 0, \qquad h = 1,2,\ldots,k .$$

This gives us k homogeneous linear equations in the k unknowns $E(x_j-a_j)$. Since the determinant of the coefficient matrix, $|A|$, is not equal to zero, the only solution to these equations is that all the unknowns be zero. So

(d) $E(x_j) = a_j, \qquad j = 1,2,\ldots,k .$

Next we wish to show that the covariance of $x_i$ and $x_j$ is

$$\sigma_{ij} = E[(x_i-a_i)(x_j-a_j)] = A^{ij} = \frac{\text{cofactor of } A_{ij} \text{ in } ||A_{ij}||}{|A|} .$$

To demonstrate this we differentiate with respect to $A_{ij}$ both sides of the identity

$$\frac{\sqrt{|A|}}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}\sum A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k = 1 .$$

Differentiating, and noting that $\partial|A|/\partial A_{ij} = (2-\delta_{ij})\cdot(\text{cofactor of } A_{ij})$, where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$, we have

$$\frac{(2-\delta_{ij})\,(\text{cofactor of } A_{ij})}{2|A|} - \frac{2-\delta_{ij}}{2}\,E[(x_i-a_i)(x_j-a_j)] = 0 .$$

If we multiply both sides of this equation by $2/(2-\delta_{ij})$, the left-hand side becomes $A^{ij} - E[(x_i-a_i)(x_j-a_j)]$. So we have

(e) $\sigma_i^2 = E[(x_i-a_i)^2] = A^{ii},$

(f) $\sigma_{ij} = E[(x_i-a_i)(x_j-a_j)] = A^{ij}.$

We may summarize as follows:

Theorem (A): If $x_1, x_2, \ldots, x_k$ are distributed according to the normal multivariate distribution (b), then $E(x_i) = a_i$, $\sigma_i^2 = A^{ii}$, and $\sigma_{ij} = A^{ij}$.
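A Monte-Carlo sketch of Theorem (A) (not from the original text; the matrix and means are arbitrary choices): the covariances of the density (b) are the entries of the inverse of $||A_{ij}||$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 1.0, 0.5],
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.5]])       # assumed positive definite ||A_ij||
a = np.array([1.0, 0.0, -1.0])        # mean vector a_i

cov = np.linalg.inv(A)                # ||A^{ij}||: the theoretical covariances
x = rng.multivariate_normal(a, cov, size=200_000)

print(np.round(np.cov(x, rowvar=False), 3))  # sample covariances
print(np.round(cov, 3))                      # A^{ij}: the two agree
```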



Now let us find the joint marginal distribution of $x_1, x_2, \ldots, x_r$ $(r < k)$. To do this we integrate out $x_{r+1}, \ldots, x_k$, getting

(g) $g(x_1,\ldots,x_r) = \dfrac{\sqrt{|B|}}{(2\pi)^{r/2}}\, e^{-\frac{1}{2}\sum_{u,v=1}^{r}B_{uv}(x_u-a_u)(x_v-a_v)}.$

We can see this is true if we recall the procedure used in evaluating $1/C$: if at any stage we had integrated out the z's, we would have had remaining a normal multivariate distribution of the x's.

We wish to find an expression of the $B_{uv}$ in terms of the $A_{ij}$. We know that the value of $E[(x_u-a_u)(x_v-a_v)]$ is $A^{uv}$ if found from the original distribution and is $B^{uv}$ if found from the marginal distribution. But these two expressions must be equal. Therefore

$$A^{uv} = B^{uv}.$$

Hence, to derive $||B_{uv}||$ from $||A_{ij}||$ we delete from $||A^{ij}||$ the last $k-r$ rows and columns (obtaining $||B^{uv}||$) and take the inverse of this matrix.

In particular, suppose $r = 1$. We find the distribution of $x_1$ to be

$$f_1(x_1) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x_1-a_1)^2}{2\sigma_1^2}},$$

where $\sigma_1^2 = A^{11} = (\text{cofactor of } A_{11} \text{ in } |A|)/|A|$. Similar distributions exist for the other x's.

This result gives us a simple method of finding the m. g. f. of $(x_1-a_1), (x_2-a_2), \ldots, (x_k-a_k)$, defined by

(h) $\phi(\theta_1,\ldots,\theta_k) = E\big(e^{\sum_i \theta_i(x_i-a_i)}\big) = \dfrac{\sqrt{|A|}}{(2\pi)^{k/2}}\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{\sum_i\theta_i(x_i-a_i)-\frac{1}{2}\sum A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k .$

Consider the expression

(i) $\dfrac{\sqrt{|A|}}{(2\pi)^{k/2}}\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[2\sum_{i=1}^{k}A_{0i}(x_0-a_0)(x_i-a_i)+\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)+A_{00}(x_0-a_0)^2\right]}\,dx_1\cdots dx_k,$

where $A_0 = ||A_{ij}||$ $(i,j = 0,1,2,\ldots,k)$ is positive definite.

If we set $A_{0i} = -\theta_i$ and $(x_0-a_0) = 1$, then it will be seen that the expression (i) is, apart from the factor $e^{-\frac{1}{2}A_{00}}$, exactly the same as that defining $\phi(\theta_1,\theta_2,\ldots,\theta_k)$. But the expression in [ ] may be written

(j) $B_{00}(x_0-a_0)^2 + \sum_{i,j=1}^{k}A_{ij}(x_i-a_i+c_i)(x_j-a_j+c_j),$

where $B_{00} = A_{00} - \sum_{i,j=1}^{k}A_{0i}A_{0j}A^{ij}$ and $c_i = (x_0-a_0)\sum_j A^{ij}A_{0j}$. Now, by the argument presented in §2.91, we may write

$$\begin{vmatrix} A_{00} & A_{01} & \cdots & A_{0k}\\ A_{10} & A_{11} & \cdots & A_{1k}\\ \vdots & & & \vdots\\ A_{k0} & A_{k1} & \cdots & A_{kk} \end{vmatrix} = \Big(A_{00} - \sum_{i,j=1}^{k}A_{0i}A_{0j}A^{ij}\Big)\,|A| .$$

Therefore we have

(k) $B_{00} = A_{00} - \sum_{i,j=1}^{k}A_{0i}A_{0j}A^{ij} = \dfrac{|A_0|}{|A|}.$

Substituting this value of $B_{00}$ in (j) and the expression for (j) in (i), and integrating, we find that (i) reduces to

(l) $e^{-\frac{1}{2}\left(A_{00}-\sum_{i,j=1}^{k}A_{0i}A_{0j}A^{ij}\right)(x_0-a_0)^2}.$

Setting $x_0-a_0 = 1$ and $A_{0i} = -\theta_i$, and cancelling the common factor $e^{-\frac{1}{2}A_{00}}$, we therefore obtain the following result:

Theorem (B): If $x_1, x_2, \ldots, x_k$ are distributed according to the normal multivariate law (b), the m. g. f. of $(x_1-a_1), (x_2-a_2), \ldots, (x_k-a_k)$ is

(m) $\phi(\theta_1,\theta_2,\ldots,\theta_k) = e^{\frac{1}{2}\sum_{i,j=1}^{k}A^{ij}\theta_i\theta_j}.$



The argument leading to Theorem (B) may be readily applied to show that any r $(r \le k)$ linearly independent linear functions of $(x_i-a_i)$, $i = 1,2,\ldots,k$, are distributed according to a normal r-variate distribution. To show this, let

(n) $L_p = \sum_{i=1}^{k} l_{pi}(x_i-a_i), \qquad p = 1,2,\ldots,r,$

be the r linearly independent linear functions, i. e., such that there exists no set of constants $c_p$ $(p = 1,2,\ldots,r)$, not all zero, for which $\sum_p l_{pi}c_p = 0$, $i = 1,2,\ldots,k$. Let $\phi(\theta_1,\theta_2,\ldots,\theta_r)$ be the m. g. f. of the $L_p$, i. e.,

(o) $\phi(\theta_1,\ldots,\theta_r) = E\big(e^{\sum_p \theta_p L_p}\big) = \dfrac{\sqrt{|A|}}{(2\pi)^{k/2}}\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{\sum_i t_i(x_i-a_i)-\frac{1}{2}\sum A_{ij}(x_i-a_i)(x_j-a_j)}\,dx_1\cdots dx_k,$

where $t_i = \sum_{p=1}^{r}\theta_p l_{pi}$. The value of this integral is given by (l) with $x_0-a_0 = 1$, $A_{0i} = -t_i$. Thus

(p) $\phi(\theta_1,\ldots,\theta_r) = e^{\frac{1}{2}\sum_{p,q=1}^{r}B_{pq}\theta_p\theta_q},$

where $B_{pq} = \sum_{i,j=1}^{k}A^{ij}l_{pi}l_{qj}$.

Now consider the quadratic form

$$\sum_{p,q=1}^{r}B_{pq}\theta_p\theta_q = \sum_{i,j=1}^{k}A^{ij}\Big(\sum_p \theta_p l_{pi}\Big)\Big(\sum_q \theta_q l_{qj}\Big).$$

If $||A^{ij}||$ is positive definite and if the $L_p$ are linearly independent, then clearly $||B_{pq}||$ is positive definite. We therefore have

Theorem (C): Let $x_1,\ldots,x_k$ be distributed according to the normal multivariate law (b), and let $L_p = \sum_i l_{pi}(x_i-a_i)$ $(p = 1,2,\ldots,r)$ be linearly independent linear functions of the $x_i$. Then the $L_p$ are distributed according to the normal r-variate law

(q) $\dfrac{1}{(2\pi)^{r/2}\sqrt{|B_{pq}|}}\, e^{-\frac{1}{2}\sum_{p,q=1}^{r}B^{pq}L_pL_q},$

where $||B^{pq}||$ is the inverse of the matrix $||B_{pq}||$, and $B_{pq} = \sum_{i,j=1}^{k}A^{ij}l_{pi}l_{qj}$.
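A short simulation sketch of Theorem (C) (not from the original text; the covariance matrix and coefficients $l_{pi}$ are arbitrary): the covariance matrix of the linear functions is $B_{pq} = \sum_{i,j} l_{pi}A^{ij}l_{qj}$, i.e. $lCl'$ with $C = A^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.array([[2.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.5]])          # assumed covariance ||A^{ij}||
a = np.zeros(3)
l = np.array([[1.0, -1.0, 0.0],          # l_{pi}: two independent linear functions
              [0.5,  0.5, 1.0]])

x = rng.multivariate_normal(a, C, size=200_000)
L = (x - a) @ l.T                        # samples of L_1, L_2

print(np.round(l @ C @ l.T, 3))              # theoretical B_pq
print(np.round(np.cov(L, rowvar=False), 3))  # sample covariance: agrees
```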

Next let us find the conditional p. d. f.

$$f(x_1\,|\,x_2,\ldots,x_k) = \frac{f(x_1,x_2,\ldots,x_k)}{g(x_2,\ldots,x_k)},$$

where $g(x_2,\ldots,x_k)$ is the marginal distribution of the last $k-1$ variables. Using the marginal distribution found above, where now $||B_{pq}||$ is the inverse of $||A^{pq}||$ $(p,q = 2,\ldots,k)$, we get

(r) $f(x_1\,|\,x_2,\ldots,x_k)\,dx_1 = \dfrac{\sqrt{A_{11}}}{\sqrt{2\pi}}\, e^{-\frac{A_{11}}{2}\left[x_1-a_1+\sum_{j=2}^{k}\frac{A_{1j}}{A_{11}}(x_j-a_j)\right]^2} dx_1 .$

Therefore, for fixed values of $x_2,\ldots,x_k$, we have $x_1$ normally distributed with variance $1/A_{11}$ and mean

(s) $E(x_1\,|\,x_2,\ldots,x_k) = a_1 - \sum_{j=2}^{k}\dfrac{A_{1j}}{A_{11}}(x_j-a_j).$

The regression function for the multivariate normal distribution is linear.



3.3 Pearson System of Distribution Functions

Thus far we have dealt with special distributions which arise under certain specified conditions. Several attempts have been made to develop a general system of distributions which can describe or closely approximate the true distribution of a random variable.

One of these systems, derived by Karl Pearson, is based upon the differential equation

(a) $\dfrac{dy}{dx} = \dfrac{(x+a)y}{b+cx+dx^2}.$

Depending on the values given the constants a, b, c, and d, we get a wide variety of distribution functions as solutions of the differential equation. We get J-shaped and U-shaped curves, symmetrical and skewed curves, distributions with finite and infinite ranges.

The normal distribution may be obtained as a solution of the differential equation for $c = d = 0$ and $b < 0$. This function is Type VII of Pearson's twelve types of solutions.

Another special case we shall be interested in is $d = 0$. Then the equation is

$$\frac{dy}{dx} = \frac{(x+a)y}{b+cx}.$$

Writing this as

$$\frac{dy}{y} = \frac{dx}{c} + \left(\frac{ca-b}{c}\right)\frac{dx}{b+cx},$$

we see the solution is

$$y = C_1\, e^{x/c}\,(b+cx)^{(ca-b)/c^2}.$$

Changing the constants, we have

$$y = K\, e^{-\beta x}(x+\alpha)^{\nu-1}, \qquad \beta > 0,\ \nu > 0,$$

defined for $-\alpha < x < \infty$, where K is chosen so that $K\int_{-\alpha}^{\infty} e^{-\beta x}(x+\alpha)^{\nu-1}\,dx = 1$. This is the Pearson Type III distribution.

To determine K we make the indicated integration. Let $z = \beta(x+\alpha)$.

Then

$$K\int_{-\alpha}^{\infty} e^{-\beta x}(x+\alpha)^{\nu-1}\,dx = K'\int_{0}^{\infty} z^{\nu-1}e^{-z}\,dz = 1,$$

where $K' = K e^{\alpha\beta}/\beta^{\nu}$. Therefore we choose $K'$ so that

$$K' = \frac{1}{\int_{0}^{\infty} z^{\nu-1}e^{-z}\,dz}.$$

This last integral is an important function of the exponent $\nu$, denoted by $\Gamma(\nu)$, the gamma function of $\nu$.

To evaluate $\Gamma(\nu)$ we integrate by parts, using $z^{\nu-1}$ as u and $e^{-z}dz$ as dv:

$$\Gamma(\nu) = \int_{0}^{\infty} z^{\nu-1}e^{-z}\,dz = \Big[-z^{\nu-1}e^{-z}\Big]_{0}^{\infty} + (\nu-1)\int_{0}^{\infty} z^{\nu-2}e^{-z}\,dz = (\nu-1)\Gamma(\nu-1).$$

This gives us a recursion or a functional equation for $\Gamma(\nu)$. If $\nu$ is an integer,

(b) $\Gamma(\nu) = (\nu-1)(\nu-2)\cdots 2\cdot 1\cdot\Gamma(1).$

Since

$$\Gamma(1) = \int_{0}^{\infty} e^{-z}\,dz = 1,$$

we have for $\nu$ an integer,

$$\Gamma(\nu) = (\nu-1)!\,.$$

It is also easy to evaluate $\Gamma(\nu)$ if $\nu$ is an integer plus $\tfrac{1}{2}$. For

$$\Gamma\!\left(\tfrac{1}{2}\right) = \int_{0}^{\infty} z^{-\frac{1}{2}}e^{-z}\,dz = 2\int_{0}^{\infty} e^{-t^2}\,dt = \int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi},$$

and we have

(c) $\Gamma(\nu) = (\nu-1)(\nu-2)\cdots\tfrac{1}{2}\cdot\sqrt{\pi}.$

In general for $\nu > 0$, $\Gamma(\nu)$ has a finite value, and in any interval $(a,b)$ of values of $\nu$ $(0 < a < b)$, $\Gamma(\nu)$ is continuous. $\Gamma(\nu)$ has a minimum for $\nu = 1.46163$.
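A quick numerical sketch (not from the original text) of the two closed forms (b) and (c) against the library gamma function:

```python
import math

# Integer case: Gamma(n) = (n-1)!
for n in range(2, 7):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Integer-plus-1/2 case: Gamma(v) = (v-1)(v-2)...(1/2) * sqrt(pi)
v = 3.5
prod, t = 1.0, v - 1.0
while t > 0:
    prod *= t
    t -= 1.0          # factors 2.5, 1.5, 0.5
print(prod * math.sqrt(math.pi), math.gamma(v))  # the two agree
```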



With this determination of K, the Pearson Type III distribution is

(d) $f(x) = \dfrac{\beta^{\nu}}{\Gamma(\nu)}\, e^{-\beta(x+\alpha)}(x+\alpha)^{\nu-1}, \qquad -\alpha < x < \infty .$

This distribution for the case $\alpha = 0$ and $\beta = \tfrac{1}{2}$ is known as the $\chi^2$-distribution with $2\nu$ degrees of freedom and is one of the most important distributions in statistics. It and certain applications will be studied in detail in Chapter V.

It will be convenient at this point to find the moment-generating function of the distribution (d) when $\alpha = 0$. We have

$$\phi(\theta) = E(e^{\theta x}) = \frac{\beta^{\nu}}{\Gamma(\nu)}\int_{0}^{\infty} e^{-(\beta-\theta)x}x^{\nu-1}\,dx = \frac{\beta^{\nu}}{(\beta-\theta)^{\nu}}.$$

Therefore, for $\beta-\theta > 0$, we have

$$\phi(\theta) = \left(1-\frac{\theta}{\beta}\right)^{-\nu}.$$

For $\beta = \tfrac{1}{2}$, we have

(e) $\phi(\theta) = (1-2\theta)^{-\nu},$

which is the m. g. f. for the $\chi^2$-distribution with $2\nu$ degrees of freedom.

Next let us consider the solution of the differential equation (a) when $dx^2 + cx + b$ has two real roots, say g and h $(g < h)$, both different from $-a$. Then, using partial fractions, we can write the equation as

$$\frac{dy}{dx} = \frac{(x+a)y}{d(x-g)(x-h)} = y\left(\frac{A}{x-g} - \frac{B}{h-x}\right),$$

where A and B are functions of g, h, a and d which we do not need to determine. The solution of this equation is

$$y = C(x-g)^{A}(h-x)^{B},$$

where C is a constant of integration. We wish to determine C so that

$$C\int_{g}^{h}(x-g)^{A}(h-x)^{B}\,dx = 1 .$$

If we let $x = g + (h-g)v$, the integral becomes

$$C(h-g)^{A+B+1}\int_{0}^{1} v^{A}(1-v)^{B}\,dv .$$

Because we will need the result later, let us evaluate the integral, namely

$$\int_{0}^{1} v^{n_1-1}(1-v)^{n_2-1}\,dv,$$

which is known as the Beta function of $n_1$ and $n_2$, $B(n_1,n_2)$. We wish to show that this is

(f) $B(n_1,n_2) = \dfrac{\Gamma(n_1)\Gamma(n_2)}{\Gamma(n_1+n_2)}.$

To do this we consider the product $\Gamma(n_1)\Gamma(n_2)$, where

$$\Gamma(n_1) = \int_{0}^{\infty} x^{n_1-1}e^{-x}\,dx,$$

and similarly for $\Gamma(n_2)$. Letting $x = s^2$, we get

$$\Gamma(n_1) = 2\int_{0}^{\infty} s^{2n_1-1}e^{-s^2}\,ds .$$

So we can express $\Gamma(n_1)\Gamma(n_2)$ as the double integral

$$\Gamma(n_1)\Gamma(n_2) = 4\int_{0}^{\infty}\!\!\int_{0}^{\infty} s^{2n_1-1}t^{2n_2-1}e^{-s^2-t^2}\,ds\,dt .$$

If we change to polar coordinates,

$$s = r\cos\theta, \qquad t = r\sin\theta,$$

this integral over the positive quadrant of the st-plane becomes

$$4\int_{0}^{\pi/2}\!\!\int_{0}^{\infty} \cos^{2n_1-1}\theta\,\sin^{2n_2-1}\theta\; r^{2n_1+2n_2-1}e^{-r^2}\,dr\,d\theta .$$

Now

$$2\int_{0}^{\infty} r^{2n_1+2n_2-1}e^{-r^2}\,dr = \Gamma(n_1+n_2).$$

If we let $\cos^2\theta = x$, we get

$$2\int_{0}^{\pi/2}\cos^{2n_1-1}\theta\,\sin^{2n_2-1}\theta\,d\theta = \int_{0}^{1} x^{n_1-1}(1-x)^{n_2-1}\,dx = B(n_1,n_2).$$

Combining these results, we have

$$\Gamma(n_1)\Gamma(n_2) = \Gamma(n_1+n_2)\,B(n_1,n_2),$$

thus proving our desired result.

Therefore, the Type I distribution may be written in the general form

$$y = \frac{\Gamma(A+B+2)}{\Gamma(A+1)\Gamma(B+1)(h-g)^{A+B+1}}\,(x-g)^{A}(h-x)^{B}.$$
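A one-line numerical check (not from the original text; the exponents are arbitrary) of the Beta–Gamma identity (f):

```python
import math
from scipy.integrate import quad

n1, n2 = 2.5, 4.0
integral, _ = quad(lambda v: v**(n1 - 1) * (1 - v)**(n2 - 1), 0.0, 1.0)
closed = math.gamma(n1) * math.gamma(n2) / math.gamma(n1 + n2)
print(integral, closed)   # agree to quadrature accuracy
```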



There are twelve types of Pearson distributions. Several representative ones are graphed in Figure 5.

[Figure 5: graphs of several representative Pearson distribution types.]
3.4 The Gram-Charlier Series

Another rather general system of distribution functions, known as the Gram-Charlier series, is based upon the normal distribution and its derivatives. Instead of a number of distributions of different functional forms, this system is composed of an infinite series of terms of a certain kind. Charlier gave a theoretical argument for this system from his development of the hypothesis of elementary errors. We shall regard it, however, as a distribution which has been found satisfactory for fitting or "smoothing" certain empirical distributions.

The generator of this series is the Gaussian or normal distribution. Let

(a) $\phi_0(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-a)^2}{2\sigma^2}},$

and let

(b) $\phi_i(x) = \sigma^i\,\dfrac{d^i\phi_0(x)}{dx^i}, \qquad i = 1,2,\ldots,$

where $x_1 = (x-a)/\sigma$. Then the Gram-Charlier series is

(c) $f(x) = b_0\phi_0(x) + b_1\phi_1(x) + b_2\phi_2(x) + \cdots = \phi_0(x)\left[b_0 - b_1x_1 + b_2(x_1^2-1) - \cdots\right],$

since $\phi_n(x) = (-1)^nH_n(x_1)\phi_0(x)$, where $H_n(z)$ is the nth Hermite polynomial

$$H_n(z) = z^n - \frac{n(n-1)}{2}z^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot 4}z^{n-4} - \cdots .$$

By choosing the a, $\sigma$, and b's properly we obtain a wide variety of distribution functions which are asymptotic to the x-axis at both ends of the range. Since

$$\int_{-\infty}^{\infty} f(x)\,dx = b_0,$$

we choose $b_0 = 1$. The mean is

$$\int_{-\infty}^{\infty} xf(x)\,dx = a - \sigma b_1 .$$

If a in the expression for $x_1$ is taken as the mean of the distribution $f(x)$, then $b_1 = 0$. Taking a as the mean of the distribution, we find

$$\int_{-\infty}^{\infty} (x-a)^2 f(x)\,dx = \sigma^2 + 2\sigma^2 b_2 .$$

If $\sigma$, in the expression for $x_1$, is chosen as the standard deviation of $f(x)$, then $b_2 = 0$. It is easily found that the third and fourth moments are then

(d) $\mu_3 = -6\sigma^3 b_3,$

(e) $\mu_4 = 3\sigma^4 + 24\sigma^4 b_4 .$

Similarly, higher moments can be found. Equations (d) and (e) and similar ones for higher moments give equations for determining the b's in terms of moments. The problem of fitting distributions by the use of moments, however, will be discussed in §6.4.


CHAPTER IV 
SAMPLING THEORY 

4.1 General Remarks

Suppose x is a random variable with c. d. f. F(x). In accordance with the statement made at the end of §2.3, we define a random sample $O_n$ of size n of values of x from a population with c. d. f. F(x) as a set of n random variables $x_1, x_2, \ldots, x_n$ with c. d. f.

(a) $F(x_1)\cdot F(x_2)\cdots F(x_n).$

We note that a random sample consists of statistically independent random variables all having the same c. d. f. It is often convenient to think of $x_1$ as the value of x in the first "drawing" from the population, $x_2$ as the value of x in the second "drawing", etc.

In the theory of sampling, we are usually interested in c. d. f.'s of one or more functions of the n random variables comprising the sample. Thus, suppose $g(x_1,x_2,\ldots,x_n)$ is such a sample function (Borel measurable). We are interested in determining the c. d. f. of g, i. e., $\Pr[g(x_1,x_2,\ldots,x_n) \le g]$, the value of which is obtained by performing the Stieltjes integration

(b) $\displaystyle\int\!\!\cdots\!\!\int_R dF(x_1)\cdots dF(x_n),$

where R is the region in the n-dimensional space of the x's for which $g(x_1,x_2,\ldots,x_n) \le g$.

Similarly, if $g_i(x_1,x_2,\ldots,x_n)$ $(i = 1,2,\ldots,k)$, $k \le n$, are k Borel measurable functions, we are interested in determining $\Pr(g_i(x_1,x_2,\ldots,x_n) \le g_i,\ i = 1,2,\ldots,k)$.

The random variable x may be a vector with r components, say $x^{(1)}, x^{(2)}, \ldots, x^{(r)}$, with c. d. f. $F(x^{(1)},x^{(2)},\ldots,x^{(r)})$. In this case the sample $O_n$ would consist of n random vectors $(x_\alpha^{(1)},x_\alpha^{(2)},\ldots,x_\alpha^{(r)})$, $\alpha = 1,2,\ldots,n$ (a total of nr random variables), with c. d. f.

$$\prod_{\alpha=1}^{n} F\big(x_\alpha^{(1)},x_\alpha^{(2)},\ldots,x_\alpha^{(r)}\big).$$

Again, the sampling problem is to determine the c. d. f. of one or more (Borel measurable) functions of the nr random variables involved. For example, here one may wish to determine the probability theory of such functions as

$$\bar{x}^{(i)} = \frac{1}{n}\sum_{\alpha=1}^{n} x_\alpha^{(i)}, \qquad \frac{1}{n}\sum_{\alpha=1}^{n}\big(x_\alpha^{(i)}-\bar{x}^{(i)}\big)\big(x_\alpha^{(j)}-\bar{x}^{(j)}\big), \qquad i,j = 1,2,\ldots,r,$$

and other symmetrical functions.

In mathematical statistics one is usually interested in relatively simple sample functions, such as averages, ratios, sums of squares, correlation coefficients, etc. One is able to obtain simple expressions for sampling distributions for such functions only in certain special cases, which will be considered in this and in later chapters. However, one is able to obtain moments of some of the simpler g functions, such as averages, average sums of squares, etc., under broader conditions. Some of these cases will also be considered.

4.2 Application of Theorems on Mean Values to Sampling Theory

This section consists of the application of results of §§2.71-2.75 to cases of interest in sampling theory. No assumptions are made about the population distribution except the existence of first and second moments.

4.21 Distribution of Sample Mean

Let $O_n:(x_1,x_2,\ldots,x_n)$ be a sample from a population with an arbitrary distribution for which the first moment $\mu_1' = a$ exists. Let $\bar{x}$ be the mean of the sample,

$$\bar{x} = \sum_{i=1}^{n} x_i/n .$$

Then from equation (b) of §2.74, we have that the expected value of $\bar{x}$ is

$$E(\bar{x}) = \sum_i a_i'/n = a,$$

since $a_i' = E(x_i) = a$. If furthermore the population distribution F(x) has a finite variance $\sigma^2$, then since each $x_i$ has the c. d. f. $F(x_i)$, and the $x_i$ are mutually independent, we get from (d) of §2.74 that the variance of $\bar{x}$ is

$$\sigma_{\bar{x}}^2 = \sigma^2/n .$$

We gather these results into

Theorem (A): If $\bar{x}$ is the mean of a sample of size n from a population with arbitrary c. d. f. F(x), then if the mean a of F(x) exists,

$$E(\bar{x}) = a,$$

and if F(x) has finite variance $\sigma^2$, the variance of $\bar{x}$ is

$$\sigma_{\bar{x}}^2 = \sigma^2/n .$$


Having computed the mean and variance of $\bar{x}$ we may now apply Tchebycheff's inequality (§2.71):

(a) $\Pr\!\big(|\bar{x}-a| > \delta\sigma/\sqrt{n}\big) \le 1/\delta^2 .$

Let $\epsilon$ be an arbitrary positive number, and define $\delta$ from $\delta\sigma/\sqrt{n} = \epsilon$. Then (a) may be written

(b) $\Pr\!\big(|\bar{x}-a| > \epsilon\big) \le \sigma^2/n\epsilon^2 .$

Now a random variable $X_n$ which is defined for $n = 1,2,3,\ldots$ is said to converge stochastically to a value A if

$$\Pr(|X_n-A| < \epsilon) \to 1 \text{ as } n \to \infty \text{ for every fixed } \epsilon > 0 .$$

Letting $n \to \infty$ in (b) we get

Theorem (B): For an arbitrary population with finite variance, the sample mean converges stochastically to the population mean.

For the sample of size n let $G_n(x)$ be the c. d. f. of $\bar{x}$. From Theorem (A) we see that the limiting form of $G_n$ is the step function

$$\lim_{n\to\infty} G_n(x) = \begin{cases} 0 & \text{for } x < a,\\ 1 & \text{for } x > a .\end{cases}$$

In order to "spread out" again the probability which all "piles up" at $x = a$, we might consider the distribution of $z = (\bar{x}-a)/h(n)$, where the function h(n) is chosen so as to keep the variance of z from approaching zero. From (d) of §2.74, we see that

$$\sigma_z^2 = \frac{\sigma^2}{n[h(n)]^2} .$$

Hence if we choose $h(n) = \sigma n^{-\frac{1}{2}}$, the variable z has zero mean and unit variance for all n. A beautiful result about the limiting distribution of z as $n \to \infty$, regardless of the population distribution, is contained in the central limit theorem:

Theorem (C): For an arbitrary population with mean a and finite variance $\sigma^2$, the c. d. f. $G_n(z)$ of

$$z = \frac{(\bar{x}-a)\sqrt{n}}{\sigma}$$

approaches the normal distribution N(0,1) as $n \to \infty$,


(c) $G_n(z) \to \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-\infty}^{z} e^{-t^2/2}\,dt$ as $n \to \infty$,

uniformly in z.

We make the proof for the case where the m. g. f. $\psi(\theta)$ of the original distribution exists for $|\theta| < h$, $h > 0$. Then for $|\theta| < h$, the m. g. f. $\phi(\theta)$ of $y = (x-a)/\sigma$ also exists, for $\phi(\theta) = e^{-a\theta/\sigma}\psi(\theta/\sigma)$. Finally, let $\Phi(\theta)$ be the m. g. f. of z:

$$\Phi(\theta) = E(e^{\theta z}) = \int_{-\infty}^{+\infty}\!\!\cdots\!\!\int_{-\infty}^{+\infty} \exp\!\Big[\theta\sum_i(x_i-a)/\sqrt{n}\,\sigma\Big]\,dF(x_1)\cdots dF(x_n)$$

$$= \left\{\int_{-\infty}^{+\infty} \exp\!\big[\theta(x-a)/\sqrt{n}\,\sigma\big]\,dF(x)\right\}^n = \big[\phi(\theta/\sqrt{n})\big]^n .$$

Now

$$\phi(u) = \phi(0) + \phi'(0)u + \tfrac{1}{2}\phi''(u_1)u^2,$$

where $0 < u_1 < u < h$ if $u > 0$, and $-h < u < u_1 < 0$ if $u < 0$. $\phi''(u)$ is continuous at $u = 0$, hence $\phi''(u) = \phi''(0) + \eta(u)$, where $\eta(u) \to 0$ as $u \to 0$. We recall that $\phi^{(i)}(0)$ is the i-th moment of y about the origin, so $\phi(0) = 1$, $\phi'(0) = 0$, $\phi''(0) = 1$, and

(d) $\Phi(\theta) = \left[1 + \dfrac{\theta^2}{2n}\big(1+\eta(\theta_1/\sqrt{n})\big)\right]^n,$

where $0 < \theta_1 < \theta < h\sqrt{n}$ or $-h\sqrt{n} < \theta < \theta_1 < 0$. Now choose any $\theta$ and hold it fixed; (d) is valid for $n > \theta^2/h^2$. Letting $n \to \infty$, for every fixed $\theta$,

$$\lim_{n\to\infty}\Phi(\theta) = e^{\theta^2/2},$$

which is the m. g. f. for N(0,1). Therefore from Theorem (C) of §2.91, the limiting distribution of $G_n(z)$ is given by (c) above.

While the above proof based on the generating function can be shortened, we have purposely given it in a way which permits of generalization to distributions of which it is assumed only that the second moment exists. In this general case one employs, instead of the m. g. f., the characteristic function $\bar{\Phi}(t)$ of the distribution, which is related to the generating function $\Phi(\theta)$ by $\bar{\Phi}(t) = \Phi(it)$. This always exists for all real t. The argument follows the above step by step, and at the end one appeals to a theorem analogous to (C) of §2.91, which states that if the limit of the characteristic function is the characteristic function of some continuous c. d. f. F*(x), then the limit of the c. d. f. is F*(x), uniformly for all x.
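A simulation sketch of Theorem (C) (not from the original text; the exponential population and sizes are arbitrary choices): standardized means from a decidedly non-normal population approach N(0,1).

```python
import numpy as np

rng = np.random.default_rng(2)
a, sigma = 1.0, 1.0                    # exponential(1): mean 1, s.d. 1
n, reps = 200, 100_000

x = rng.exponential(scale=1.0, size=(reps, n))
z = (x.mean(axis=1) - a) * np.sqrt(n) / sigma

print(z.mean(), z.var())               # close to 0 and 1
print(np.mean(z <= 1.96))              # close to Phi(1.96) = 0.975
```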

4.22 Expected Value of Sample Variance

For the sample $O_n:(x_1,x_2,\ldots,x_n)$, call S the sum of squared deviations from the sample mean,

$$S = \sum_{\alpha=1}^{n}(x_\alpha-\bar{x})^2 = \sum_\alpha x_\alpha^2 - n\bar{x}^2 .$$

Recalling that E is a linear operator, we get

$$E(S) = \sum_\alpha E(x_\alpha^2) - nE(\bar{x}^2).$$

Now if the population distribution F(x) has mean a and finite variance $\sigma^2$,

$$E(x_\alpha^2) = [\mu_2' \text{ of } F(x)] = \sigma^2 + a^2,$$
$$E(\bar{x}^2) = [\mu_2' \text{ of c. d. f. of } \bar{x}] = \sigma_{\bar{x}}^2 + a^2 = a^2 + \sigma^2/n .$$

Thus

$$E(S) = n(\sigma^2+a^2) - n(a^2+\sigma^2/n) = (n-1)\sigma^2 .$$

We note that $E(S/n) \ne \sigma^2$, but if we define

$$s^2 = S/(n-1),$$

then

$$E(s^2) = \sigma^2 .$$
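A quick simulation sketch (not from the original text; population and sizes arbitrary) of the unbiasedness of $s^2 = S/(n-1)$ versus the downward bias of $S/n$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, sigma2 = 5, 200_000, 4.0

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
S = ((x - x.mean(axis=1, keepdims=True))**2).sum(axis=1)

print(np.mean(S / (n - 1)))   # ~ 4.0 = sigma^2
print(np.mean(S / n))         # ~ 4.0*(n-1)/n = 3.2, biased low
```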

4.3 Sampling from a Finite Population

Suppose that a population has a finite number N of elements, each characterized by a number $x = x^{(i)}$, $i = 1,2,\ldots,N$, and that we draw a random sample $O_n:(x_1,x_2,\ldots,x_n)$ without replacement. The sample may be represented by a point $(x_1,x_2,\ldots,x_n)$ in n dimensions, the possible values of $x_\alpha$ being $x^{(1)}, x^{(2)},\ldots,x^{(N)}$, $\alpha = 1,2,\ldots,n$. To simplify the discussion, let us assume that the values of the $x^{(i)}$ are distinct, $i = 1,2,\ldots,N$. Then $\Pr(x_\alpha = x_\beta$ for $\alpha \ne \beta) = 0$. Hence we may think of the range of the sample point as being all points of the lattice $x_\alpha = x^{(1)}, x^{(2)},\ldots,x^{(N)}$, $\alpha = 1,2,\ldots,n$, but we must ascribe to any point for which $x_\alpha = x_\beta$, $\alpha \ne \beta$, the probability zero. By a random sample we mean that all points of this lattice, barring the exceptional points just mentioned, have the same probability p. To enumerate the points with probability p, we note that to obtain such a point, we may choose $x_1$ in N ways, $x_2$ in N-1 ways, ..., $x_n$ in N-n+1 ways. The number of points with probability p is thus $N(N-1)\cdots(N-n+1)$. Since the total probability of the points of the lattice must add up to unity, we have

(a) $p = [N(N-1)\cdots(N-n+1)]^{-1},$

$$p(x_1,x_2,\ldots,x_n) = p\cdot\epsilon_{x_1x_2\cdots x_n},$$

where

$$\epsilon_{x_1x_2\cdots x_n} = \begin{cases} 0 & \text{if any two } x_\alpha \text{ are equal},\\ 1 & \text{if all } x_\alpha \text{ are distinct}.\end{cases}$$

Define the mean a and the variance $\sigma^2$ of the population from

$$a = \sum_{i=1}^{N} x^{(i)}/N, \qquad \sigma^2 = \sum_{i=1}^{N}\big(x^{(i)}-a\big)^2/N .$$

Here we shall consider the problem of determining the mean and variance of the mean of a random sample from this population. Let $\bar{x}$ be the sample mean,

$$\bar{x} = \sum_{\alpha=1}^{n} x_\alpha/n .$$

We note that the $x_\alpha$ are not independent (it will later be seen that the correlation between $x_\alpha$ and $x_\beta$ is not zero), but we may nevertheless use the formula (f) of §2.74 as pointed out there. Thus

(b) $E(\bar{x}) = \dfrac{1}{n}\sum_{\alpha=1}^{n} E(x_\alpha).$

To calculate $E(x_\alpha)$ we desire the marginal distribution of $x_\alpha$. Suppose $\alpha = 1$. Then $\Pr(x_1 = x^{(i)})$ is the sum of the probability over all lattice points for which $x_1 = x^{(i)}$; that is, it is p times the number of lattice points for which $x_1 = x^{(i)}$ and no two of $x_1,\ldots,x_n$ are equal. To compute this number, note that we may choose $x_1$ in only one way, then $x_2$ in N-1 ways ($x_2 \ne x^{(i)}$), then $x_3$ in N-2 ways ($x_3 \ne x^{(i)}$ or $x_2$), etc.; so the desired number is $(N-1)(N-2)\cdots(N-n+1)$. The marginal probability of $x_1$ is thus seen to be

$$\Pr(x_1 = x^{(i)}) = (N-1)(N-2)\cdots(N-n+1)\cdot p = \frac{1}{N}$$

from (a). We get

$$E(x_1) = \sum_i x^{(i)}\Pr(x_1 = x^{(i)}) = \sum_i x^{(i)}/N = a .$$

Similarly,

$$E(x_\alpha) = a, \qquad \alpha = 1,2,\ldots,n,$$

and substituting in (b), we find

$$E(\bar{x}) = a .$$

To calculate $\sigma_{\bar{x}}^2$ we use formula (c) of §2.74,

(c) $\sigma_{\bar{x}}^2 = \dfrac{1}{n^2}\Big[\sum_\alpha \sigma_\alpha^2 + \sum_{\alpha\ne\beta}\rho_{\alpha\beta}\sigma_\alpha\sigma_\beta\Big].$

Employing again the marginal distribution of $x_\alpha$, we get for the variance of $x_\alpha$

$$\sigma_\alpha^2 = E(x_\alpha^2) - [E(x_\alpha)]^2 = \sum_i \big(x^{(i)}\big)^2\Pr(x_\alpha = x^{(i)}) - a^2 = \sum_i \big(x^{(i)}\big)^2/N - a^2,$$

(d) $\sigma_\alpha = \sigma .$

To find $\rho_{\alpha\beta}$ for $\alpha \ne \beta$, we use the joint marginal distribution of $x_\alpha$ and $x_\beta$. To simplify the notation, let $\alpha = 1$, $\beta = 2$. Then $\Pr(x_1 = x^{(i)}, x_2 = x^{(j)};\, i \ne j)$ is p times the number of points for which $x_1 = x^{(i)}$, $x_2 = x^{(j)} \ne x^{(i)}$, and no two of $x_1,x_2,\ldots,x_n$ are equal. To enumerate these points, note that we may choose $x_1$, $x_2$ in only one way, then $x_3$ in N-2 ways, $x_4$ in N-3 ways, etc. Hence

$$\Pr(x_1 = x^{(i)}, x_2 = x^{(j)};\, i\ne j) = (N-2)(N-3)\cdots(N-n+1)\cdot p = \frac{1}{N(N-1)},$$

and

$$E[(x_1-a)(x_2-a)] = \sum_{i\ne j}\frac{(x^{(i)}-a)(x^{(j)}-a)}{N(N-1)} = -\frac{\sigma^2}{N-1},$$

since $\sum_{i\ne j}(x^{(i)}-a)(x^{(j)}-a) = \big[\sum_i(x^{(i)}-a)\big]^2 - \sum_i(x^{(i)}-a)^2 = -N\sigma^2$. Likewise,

(e) $\rho_{\alpha\beta} = -\dfrac{1}{N-1}$ if $\alpha \ne \beta .$

Combining (c), (d), (e), we have

$$\sigma_{\bar{x}}^2 = \frac{1}{n^2}\Big[n\sigma^2 - \frac{n(n-1)\sigma^2}{N-1}\Big] = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} .$$

We note that for $n = N$, $\sigma_{\bar{x}}^2 = 0$, that $\sigma_{\bar{x}}^2$ is a monotonic increasing function of N, and that as $N \to \infty$, $\sigma_{\bar{x}}^2 \to \sigma^2/n$ for fixed n.
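An exhaustive check (not from the original text; the five population values are arbitrary) of the finite-population variance formula on a small population, enumerating every equally likely ordered sample:

```python
import numpy as np
from itertools import permutations

pop = np.array([2.0, 5.0, 7.0, 11.0, 13.0])
N, n = len(pop), 3
a = pop.mean()
sigma2 = ((pop - a)**2).mean()

means = [np.mean(s) for s in permutations(pop, n)]  # all lattice points
print(np.var(means))                       # exact variance of xbar
print(sigma2 / n * (N - n) / (N - 1))      # formula: the two agree
```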

4.4 Representative Sampling

Suppose we have a population $\pi$ consisting of k mutually exclusive sub-populations $\pi_1, \pi_2, \ldots, \pi_k$, each with c. d. f. $F_i(x)$, that is,

$$\pi = \pi_1 + \pi_2 + \cdots + \pi_k .$$

If X is drawn at random from $\pi$, let

$$p_i = \Pr(X \text{ from } \pi_i), \qquad \sum_{i=1}^{k} p_i = 1 .$$

To find the c. d. f. of X we may proceed as follows:

$$F(x) = \Pr(X \le x) = \sum_i \Pr(X \text{ from } \pi_i)\cdot\Pr(X \le x \mid X \text{ from } \pi_i) = \sum_i p_iF_i(x).$$

Denoting the mean of F(x) by a, and its variance by $\sigma^2$, we calculate

$$a = \int_{-\infty}^{+\infty} x\,dF(x) = \sum_i p_i\int_{-\infty}^{+\infty} x\,dF_i(x) = \sum_i p_ia_i,$$

where $a_i$ is the mean of $F_i(x)$. Similarly,

$$\sigma^2 + a^2 = \int_{-\infty}^{+\infty} x^2\,dF(x) = \sum_i p_i\int_{-\infty}^{+\infty} x^2\,dF_i(x) = \sum_i p_i\big(\sigma_i^2+a_i^2\big),$$

where $\sigma_i^2$ is the variance of $F_i(x)$. This may be written

$$\sigma^2 = \sum_i p_i\sigma_i^2 + \sum_i p_i(a_i-a)^2 .$$

From §4.21 we have that if $\bar{x}$ is the mean of a sample of size n drawn at random from $\pi$, then

$$E(\bar{x}) = a,$$

(a) $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n} = \dfrac{1}{n}\Big[\sum_i p_i\sigma_i^2 + \sum_i p_i(a_i-a)^2\Big].$

4.41 Sampling when the p_i are known

We suppose the probabilities $p_i$ are known (the means $a_i$ are assumed throughout to be unknown). Let us draw a sample $O_n$ consisting of the following sub-samples: $O^{(1)}$ ($n_1$ elements from $\pi_1$), $O^{(2)}$ ($n_2$ elements from $\pi_2$), ..., $O^{(k)}$ ($n_k$ elements from $\pi_k$); $\sum_i n_i = n$. Call $\bar{x}_R$ the mean of $O_n$, and $\bar{x}_i$ the mean of $O^{(i)}$. Then

(a) $\bar{x}_R = \sum_i \dfrac{n_i}{n}\,\bar{x}_i .$

If we use $\bar{x}_R$ as an estimate of the mean a of $\pi$, we would like to have

$$E(\bar{x}_R) = \sum_i \frac{n_i}{n}a_i = a = \sum_i p_ia_i .$$

Since we do not know the $a_i$, we require this for all $a_i$, and this uniquely determines the $n_i$ as

$$n_i = np_i .$$

If $n_i = np_i$, then $O_n$ is called a representative sample from $\pi$. The advantages of representative sampling over random sampling from $\pi$ are implicit in

Theorem (A): The variance $\sigma_{\bar{x}_R}^2$ of the mean $\bar{x}_R$ of a representative sample and the variance $\sigma_{\bar{x}}^2$ of the mean $\bar{x}$ of a random sample of the same size have the following relationship:

$$\sigma_{\bar{x}_R}^2 \le \sigma_{\bar{x}}^2,$$

the equality holding only when all $a_i$ are equal.

To prove the theorem, we calculate

(b) $\sigma_{\bar{x}_R}^2 = \sum_i \sigma_{\bar{x}_i}^2\,(n_i/n)^2$

from (a) and the mutual independence of the $\bar{x}_i$. Now

$$\sigma_{\bar{x}_i}^2 = \frac{\sigma_i^2}{n_i} = \frac{\sigma_i^2}{np_i} .$$

Therefore

$$\sigma_{\bar{x}_R}^2 = \frac{1}{n}\sum_i p_i\sigma_i^2 .$$

Hence (a) of §4.4 may be written

$$\sigma_{\bar{x}}^2 = \sigma_{\bar{x}_R}^2 + \frac{1}{n}\sum_i p_i(a_i-a)^2,$$

and the theorem follows.

4.42 Sampling when the σ_i are also known

We employ the same notation as in §4.41. If we use the mean $\bar{x}_R$ of the sample to estimate a, we have just seen that the $n_i$ are uniquely determined by the requirement

$$E(\bar{x}_R) = a .$$

Suppose however that we use as an estimate of a the statistic

(a) $y = \sum_i c_i\bar{x}_i .$

How should we choose the $n_i$, for fixed $n = \sum_i n_i$, so that

(b) $E(y) = a,$

and $\sigma_y^2$ is minimum (for the class of statistics satisfying (a) and (b))? The method of §4.41 shows that we must take $c_i = p_i$. Then

(c) $\sigma_y^2 = \sum_i \dfrac{p_i^2\sigma_i^2}{n_i} .$

The problem is now to find the $n_i$ which minimize (c) subject to the condition that $\sum_i n_i = n$. Treating the $n_i$ as though they were continuous variables, and following the method of Lagrange, we form

$$G = \sum_i \frac{p_i^2\sigma_i^2}{n_i} + \lambda^2\Big(\sum_i n_i - n\Big)$$

and set

$$\partial G/\partial n_i = 0, \qquad i = 1,2,\ldots,k .$$

We get

$$-\frac{p_i^2\sigma_i^2}{n_i^2} + \lambda^2 = 0,$$

(d) $n_i = p_i\sigma_i/\lambda .$

To evaluate $\lambda$, sum the equations (d) for $i = 1,2,\ldots,k$, and solve:

$$\lambda = \frac{1}{n}\sum_j p_j\sigma_j .$$

The minimizing $n_i$ are thus

$$n_i = \frac{np_i\sigma_i}{\sum_j p_j\sigma_j} .$$

Putting these back in (c), we find the minimum variance to be

$$\sigma_y^2 = \frac{1}{n}\Big(\sum_i p_i\sigma_i\Big)^2 .$$

With the help of the Schwarz inequality,

$$\Big(\sum_i a_ib_i\Big)^2 \le \Big(\sum_i a_i^2\Big)\Big(\sum_i b_i^2\Big)$$

(the equality holding only if the $a_i$ are proportional to the $b_i$), where we let $a_i = p_i^{1/2}$, $b_i = p_i^{1/2}\sigma_i$, we obtain

Theorem (A): $\sigma_y^2 \le \sigma_{\bar{x}_R}^2,$

the equality holding only if all $\sigma_i$ are equal.
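A numerical sketch (not from the original text; the strata probabilities and standard deviations are arbitrary) of the minimizing allocation $n_i = np_i\sigma_i/\sum_j p_j\sigma_j$ and the variance comparison of Theorem (A):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])        # known p_i
sigma = np.array([1.0, 4.0, 2.0])    # known sigma_i
n = 100

n_opt = n * p * sigma / np.sum(p * sigma)
print(np.round(n_opt, 1))            # optimal (continuous) sub-sample sizes

var_opt = (p * sigma).sum()**2 / n   # (sum p_i sigma_i)^2 / n
var_rep = (p * sigma**2).sum() / n   # representative sampling variance
print(var_opt, var_rep)              # var_opt <= var_rep, per Theorem (A)
```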

4.5 Sampling Theory of Order Statistics

4.51 Simultaneous Distribution of any k Order Statistics. Suppose $O_n:(x_1,x_2,\ldots,x_n)$ is a sample of size n from a population with probability element $f(x)dx$, and that $x_1,x_2,\ldots,x_n$ are arranged in ascending order of magnitude. These ordered values of x will be referred to as order statistics; more specifically, $x_\alpha$ will be called the $\alpha$-th order statistic. Let $r_1, r_2, \ldots, r_k$ be k integers such that $1 \le r_1 < r_2 < \cdots < r_k \le n$. The problem to be considered here is that of finding the probability element of $x_{r_1}, x_{r_2}, \ldots, x_{r_k}$, i. e.,

(a) $f(x_{r_1},x_{r_2},\ldots,x_{r_k})\,dx_{r_1}dx_{r_2}\cdots dx_{r_k}.$

Let $I_1, I_2, I_3, \ldots, I_{2k+1}$ be the 2k+1 intervals

(b) $(-\infty, x_{r_1}),\ (x_{r_1}, x_{r_1}+dx_{r_1}),\ (x_{r_1}+dx_{r_1}, x_{r_2}),\ (x_{r_2}, x_{r_2}+dx_{r_2}),\ \ldots,\ (x_{r_k}+dx_{r_k}, +\infty),$

and let

$$\int_{I_i} f(x)\,dx = q_i, \qquad i = 1,2,\ldots,2k+1 .$$

The problem of finding the probability element (a) is identical with that of finding the probability (to terms of order $dx_{r_1}dx_{r_2}\cdots dx_{r_k}$) that if a sample of n elements is drawn from a multinomial population with classes $I_1, I_2, \ldots, I_{2k+1}$, then $r_1-1$ elements will fall in $I_1$, 1 element in $I_2$, $r_2-r_1-1$ elements in $I_3$, 1 element in $I_4$, ..., and $n-r_k$ elements in $I_{2k+1}$. It follows from the multinomial law (§3.12) that the probability of such a partition is

(c) $\dfrac{n!}{(r_1-1)!\,1!\,(r_2-r_1-1)!\,1!\cdots(n-r_k)!}\; q_1^{r_1-1}q_2\,q_3^{r_2-r_1-1}q_4\cdots q_{2k+1}^{n-r_k}.$

Substituting the values of the $q_i$ and noting that, to within terms of order $dx_{r_i}$,

(d) $\displaystyle\int_{x_{r_i}}^{x_{r_i}+dx_{r_i}} f(x)\,dx = f(x_{r_i})\,dx_{r_i}$ and $\displaystyle\int_{x_{r_i}+dx_{r_i}}^{x_{r_{i+1}}} f(x)\,dx = \int_{x_{r_i}}^{x_{r_{i+1}}} f(x)\,dx,$

we have

(e) $f(x_{r_1},\ldots,x_{r_k})\,dx_{r_1}\cdots dx_{r_k}$
$$= \frac{n!}{(r_1-1)!(r_2-r_1-1)!\cdots(n-r_k)!}\left[\int_{-\infty}^{x_{r_1}}\!\!f(x)dx\right]^{r_1-1}\!\!\left[\int_{x_{r_1}}^{x_{r_2}}\!\!f(x)dx\right]^{r_2-r_1-1}\!\!\!\cdots\left[\int_{x_{r_k}}^{+\infty}\!\!f(x)dx\right]^{n-r_k}\!\! f(x_{r_1})dx_{r_1}\cdots f(x_{r_k})dx_{r_k}.$$

The distribution function (e) has many applications, some of which will now be considered briefly.

4.52 Distribution of Largest (or Smallest) Variate

In this case $k = 1$, $r_1 = n$; (e) of §4.51 then becomes the probability element of the largest element $x_n$,

$$n\left[\int_{-\infty}^{x_n} f(x)\,dx\right]^{n-1} f(x_n)\,dx_n,$$

a similar expression holding for the probability element of the smallest element.
4.53 Distribution of Median

In this case let the number of elements in the sample be 2n+1. We would then have $k = 1$, $r_1 = n+1$, and (e) of §4.51 will be the probability element of the sample median $x_{n+1}$. Denoting the median by $\tilde{x}$, we have

(a) $\dfrac{(2n+1)!}{(n!)^2}\left[\displaystyle\int_{-\infty}^{\tilde{x}} f(x)dx\right]^n\left[\displaystyle\int_{\tilde{x}}^{\infty} f(x)dx\right]^n f(\tilde{x})\,d\tilde{x}.$

The asymptotic distribution of the median for large n may be derived from (a). If $x_0$ is the population median then $\int_{-\infty}^{x_0} f(x)dx = \tfrac{1}{2}$. Therefore

$$\int_{-\infty}^{\tilde{x}} f(x)dx = \tfrac{1}{2} + \int_{x_0}^{\tilde{x}} f(x)dx \quad\text{and}\quad \int_{\tilde{x}}^{\infty} f(x)dx = \tfrac{1}{2} - \int_{x_0}^{\tilde{x}} f(x)dx,$$

and hence (a) may be written as

(b) $\dfrac{(2n+1)!}{2^{2n}(n!)^2}\left[1 - 4\Big(\displaystyle\int_{x_0}^{\tilde{x}} f(x)dx\Big)^2\right]^n f(\tilde{x})\,d\tilde{x}.$

We may write $\int_{x_0}^{\tilde{x}} f(x)dx = \bar{f}\cdot(\tilde{x}-x_0)$, where

$$\min_{x\in I} f(x) \le \bar{f} \le \max_{x\in I} f(x)$$

and I is the interval $(x_0,\tilde{x})$ or $(\tilde{x},x_0)$. Let $\sqrt{n}(\tilde{x}-x_0) = y$. Then (b) becomes



(c) $\dfrac{(2n+1)!}{2^{2n}(n!)^2\sqrt{n}}\left[1 - \dfrac{4\bar{f}^2y^2}{n}\right]^n f\!\big(x_0+y/\sqrt{n}\big)\,dy .$

We now choose any value of y, hold it fixed, and let $n \to \infty$. If f(x) is continuous at $x = x_0$ and $f(x_0) \ne 0$, then $f(x_0+y/\sqrt{n}) \to f(x_0)$, $\bar{f} \to f(x_0)$, and with the help of Stirling's formula for the factorials, we thus get as the limit of (c) as $n \to \infty$,

(d) $\dfrac{1}{\sqrt{2\pi}\,\sigma_y}\, e^{-y^2/2\sigma_y^2}\, dy,$

where $\sigma_y^2 = 1/8[f(x_0)]^2$. Hence the median $\tilde{x}$ in samples of size 2n+1 is asymptotically normally distributed with mean $x_0$ and variance $1/8n[f(x_0)]^2$. It is of interest to note that this asymptotic distribution depends only on the $x_0$ and $f(x_0)$ of the population.

Example: For the normal distribution

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-a)^2}{2\sigma^2}}$$

we have $x_0 = a$, $f(x_0) = 1/\sqrt{2\pi}\,\sigma$. Therefore, the variance $\sigma_{\tilde{x}}^2$ of $\tilde{x}$ in samples of size 2n+1 from a normal distribution with variance $\sigma^2$ is $\pi\sigma^2/4n$, approximately. It will be recalled from §4.21 that the variance $\sigma_{\bar{x}}^2$ of the mean of a sample of size 2n+1 is $\sigma^2/(2n+1)$. Hence, for large samples from a normal population, the mean has smaller variance than the median.

In a similar manner one could treat the problem of finding the sampling distribution of the lower quartile of a sample (the (n+1)st element in rank order in a sample of size 4n+3), and other particular order statistics.
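A simulation sketch (not from the original text; the sizes are arbitrary) of the median/mean comparison above for normal samples of size 2n+1:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, sigma = 50, 100_000, 1.0
size = 2 * n + 1

x = rng.normal(0.0, sigma, size=(reps, size))
print(np.var(np.median(x, axis=1)), np.pi * sigma**2 / (4 * n))  # ~ equal
print(np.var(np.mean(x, axis=1)),   sigma**2 / size)             # ~ equal, smaller
```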

4.54 Distribution of Sample Range

The joint distribution of the largest and smallest values of x in the sample is given by (e) of §4.51 with $k = 2$, $r_1 = 1$, $r_2 = n$. We have

(a) $n(n-1)\left[\displaystyle\int_{x_1}^{x_n} f(x)\,dx\right]^{n-2} f(x_1)f(x_n)\,dx_1dx_n .$

To obtain the distribution of the sample range R, we make the transformation

(b) $x_1 = S, \qquad x_n = S + R,$

and integrate the resulting distribution with respect to S.

Example: Suppose x has the rectangular distribution

(c) $f(x) = 1/r,\ \ 0 < x < r;\qquad = 0, \text{ otherwise}.$

We have for (a),

(d) $n(n-1)\,r^{-n}(x_n-x_1)^{n-2}\,dx_1dx_n .$

Applying transformation (b) and integrating with respect to S from 0 to $r-R$, we obtain as the probability element of the range in samples of size n from the rectangular distribution

(e) $n(n-1)\,r^{-n}R^{n-2}(r-R)\,dR .$

4.55 Tolerance Limits

The joint distribution of the smallest and largest values of x in the sample is given by (a) of §4.54. Now suppose we set

(a) $u = \displaystyle\int_{-\infty}^{x_1} f(x)\,dx, \qquad v = \displaystyle\int_{x_1}^{x_n} f(x)\,dx .$

We have

$$\frac{\partial(u,v)}{\partial(x_1,x_n)} = f(x_1)\cdot f(x_n),$$

and hence the joint distribution of u and v is

(b) $n(n-1)\,v^{n-2}\,du\,dv,$

and the region of non-zero probability density is the triangle bounded by $u = 0$, $v = 0$, $u+v = 1$. The probability element (b) clearly does not depend on the probability density function f(x). Integrating with respect to u from 0 to $1-v$, we find the probability element of v to be

(c) $n(n-1)\,v^{n-2}(1-v)\,dv .$

It will be seen that v is the amount of the probability in the distribution f(x) included between $x_1$ and $x_n$ (or, statistically speaking, it is the proportion of the population included between $x_1$ and $x_n$, i. e., between the least and greatest values of a sample of size n). From expression (c) one can determine the sample size n such that the probability is $\alpha$ that at least $100\beta\%$ of the population will be included between the least and greatest values of the sample. Such a value of n would be obtained by solving the following equation for n:

(d) $n(n-1)\displaystyle\int_{\beta}^{1} v^{n-2}(1-v)\,dv = \alpha,$

or

(e) $n\beta^{n-1} - (n-1)\beta^{n} = 1-\alpha .$

Example: For $\alpha = .95$ and $\beta = .99$, we find $n \approx 473$. Thus, if a sample of 473 cases is drawn from a population in which the random variable x is continuous, the probability is .95 that the least and greatest values of x in the sample will include

at least 99% of the population.

$x_1$ and $x_n$ are examples of tolerance limits. More generally, two functions of the sample values, say $L_1(x_1,x_2,\ldots,x_n)$ and $L_2(x_1,x_2,\ldots,x_n)$, will be called $100\beta\%$ distribution-free tolerance limits at probability level $\alpha$ if

(f) $\Pr\!\left(\displaystyle\int_{L_1}^{L_2} f(x)\,dx \ge \beta\right) = \alpha,$

for all possible probability density functions f(x).

If the functional form of f(x) is known but depends on one or more parameters $\theta_1,\theta_2,\ldots,\theta_h$, and if $L_1$ and $L_2$ are such that (f) holds for all possible values of the parameters, we shall call $L_1$ and $L_2$ $100\beta\%$ parameter-free tolerance limits at probability level $\alpha$.

If we denote by $u_1, u_2, \ldots, u_n$ the quantities

$$\int_{-\infty}^{x_1} f(x)\,dx,\quad \int_{x_1}^{x_2} f(x)\,dx,\quad \ldots,\quad \int_{x_{n-1}}^{x_n} f(x)\,dx,$$

respectively, it is easy to verify, in a manner similar to our treatment of the distribution of u and v, that the probability element of $u_1, u_2, \ldots, u_n$ is

$$n!\;du_1du_2\cdots du_n,$$

a result which is independent of f(x). The domain over which this density function is defined is the region for which $u_i \ge 0$ $(i = 1,2,\ldots,n)$ and $\sum_i u_i \le 1$.
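A tiny solver (not from the original text) for equation (e), reproducing the worked example $\alpha = .95$, $\beta = .99$:

```python
alpha, beta = 0.95, 0.99

# n*beta**(n-1) - (n-1)*beta**n decreases in n; find the first n at or below 1-alpha
n = 2
while n * beta**(n - 1) - (n - 1) * beta**n > 1 - alpha:
    n += 1
print(n)   # 473
```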

4.6 Mean Values of Sample Moments when Sample Values are Grouped; Sheppard Corrections

Suppose that x is a continuous random variable having probability element $f(x)dx$, and that $O_n$ is a sample from a population having this distribution. Let the x-axis be divided into non-overlapping intervals of equal length $\delta$, suppose $I_0$ is the interval including the origin, and let h be the x-coordinate of the center of $I_0$. Denote the intervals by $\ldots, I_{-2}, I_{-1}, I_0, I_1, I_2, \ldots$, where the end points of $I_i$ are $\big(h+(i-\tfrac{1}{2})\delta,\ h+(i+\tfrac{1}{2})\delta\big)$, $i = \ldots,-2,-1,0,1,2,\ldots$. Let

(a) $p_i = \displaystyle\int_{I_i} f(x)\,dx,$

the probability associated with $I_i$. If f(x) is identically zero outside some finite interval there will be only a finite number of non-zero $p_i$; otherwise the $p_i$ will form a convergent series. Let $n_i$ be the number of x's in $O_n$ falling into $I_i$, and let the value of each of these x's be replaced by $h+i\delta$, the midpoint of $I_i$. Let ${}_\delta M_r'$ be the r-th "grouped" moment of the sample, defined as follows:

(b) ${}_\delta M_r' = \dfrac{1}{n}\sum_i n_i(h+i\delta)^r .$

It will be noted that ${}_\delta M_r'$ is the "grouped" analogue of

(c) $M_r' = \dfrac{1}{n}\sum_{\alpha=1}^{n} x_\alpha^r,$

where $x_1,x_2,\ldots,x_n$ are the values of x in the sample. In fact, $M_r' = \lim_{\delta\to 0}\,{}_\delta M_r'$. It is easy to verify that $E(M_r') = \mu_r'$, where

(d) $\mu_r' = \displaystyle\int_{-\infty}^{\infty} x^rf(x)\,dx .$

The problem to be considered here is that of finding $E({}_\delta M_r')$ when h is a continuous random variable distributed uniformly (i. e. with probability element $\frac{1}{\delta}dh$) on the interval $(-\tfrac{1}{2}\delta, \tfrac{1}{2}\delta)$. For a given $\delta$, the random variables involved in the grouping problem are the $n_i$ and h. The conditional probability law of the $n_i$, given h, is the multinomial distribution

(e) $\dfrac{n!}{\cdots n_{-1}!\,n_0!\,n_1!\cdots}\;\cdots p_{-1}^{n_{-1}}p_0^{n_0}p_1^{n_1}\cdots,$

where the summation in any mean value taken over the $n_i$ extends over all positive integral or zero values of the $n_i$ such that $\sum_i n_i = n$. The m. g. f. of ${}_\delta M_r'$ is

(g) $\phi(\theta) = E\big(e^{\theta\,{}_\delta M_r'}\big) = \dfrac{1}{\delta}\displaystyle\int_{-\delta/2}^{\delta/2}\Big(\sum_i p_i\,e^{\theta(h+i\delta)^r/n}\Big)^n dh .$

If the m. g. f. does not exist, then the characteristic function (obtained by replacing $\theta$ by $\theta\sqrt{-1}$) will exist, since the $p_i$ are positive and form a convergent series if there is not a finite number of them. Since $E(n_i\,|\,h) = np_i$, we now have

(h) $E({}_\delta M_r') = \dfrac{1}{\delta}\displaystyle\int_{-\delta/2}^{\delta/2}\sum_i p_i(h+i\delta)^r\,dh .$

Making use of (a) we may write

(i) $E({}_\delta M_r') = \dfrac{1}{\delta}\sum_i\displaystyle\int_{-\delta/2}^{\delta/2}\int_{I_i} f(x)(h+i\delta)^r\,dx\,dh .$

Setting $h+i\delta = y$, we have

(j) $E({}_\delta M_r') = \dfrac{1}{\delta}\sum_i\displaystyle\int_{i\delta-\frac{1}{2}\delta}^{i\delta+\frac{1}{2}\delta}\int_{y-\frac{1}{2}\delta}^{y+\frac{1}{2}\delta} f(x)\,y^r\,dx\,dy = \dfrac{1}{\delta}\displaystyle\int_{-\infty}^{\infty}\int_{y-\frac{1}{2}\delta}^{y+\frac{1}{2}\delta} f(x)\,y^r\,dx\,dy .$

Interchanging the order of integration, we obtain

(k) $E({}_\delta M_r') = \dfrac{1}{\delta}\displaystyle\int_{-\infty}^{\infty} f(x)\left[\int_{x-\frac{1}{2}\delta}^{x+\frac{1}{2}\delta} y^r\,dy\right]dx = \displaystyle\int_{-\infty}^{\infty} f(x)\,\frac{(x+\tfrac{1}{2}\delta)^{r+1}-(x-\tfrac{1}{2}\delta)^{r+1}}{(r+1)\delta}\,dx .$

In particular, for $r = 1, 2, 3$, (k) becomes

$$E({}_\delta M_1') = \int_{-\infty}^{\infty} xf(x)\,dx = \mu_1',$$

$$E({}_\delta M_2') = \int_{-\infty}^{\infty}\Big(x^2+\frac{\delta^2}{12}\Big)f(x)\,dx = \mu_2' + \frac{\delta^2}{12},$$

$$E({}_\delta M_3') = \int_{-\infty}^{\infty}\Big(x^3+\frac{\delta^2}{4}x\Big)f(x)\,dx = \mu_3' + \frac{\delta^2}{4}\mu_1' .$$

It will be noted that ${}_\delta M_1'$, $\big({}_\delta M_2' - \tfrac{\delta^2}{12}\big)$, and $\big({}_\delta M_3' - \tfrac{\delta^2}{4}{}_\delta M_1'\big)$ are unbiased (§6.21) estimates of $\mu_1'$, $\mu_2'$, $\mu_3'$. The quantities $\tfrac{\delta^2}{12}$ and $\tfrac{\delta^2}{4}{}_\delta M_1'$ are called Sheppard corrections of ${}_\delta M_2'$ and ${}_\delta M_3'$. Such corrections can be obtained for higher values of r by further use of (k). Similarly one can determine Sheppard corrections for grouped moments about the sample mean, as defined by

$${}_\delta M_r = \frac{1}{n}\sum_i n_i\big(h+i\delta-{}_\delta M_1'\big)^r .$$
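A simulation sketch (not from the original text; distribution and grid width arbitrary) of the $\delta^2/12$ correction: with a random grouping offset $h \sim U(-\delta/2,\delta/2)$, the grouped second moment exceeds $\mu_2'$ by $\delta^2/12$ on average.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, reps = 0.5, 1000, 2000          # d plays the role of delta

vals = []
for _ in range(reps):
    x = rng.normal(0.0, 1.0, size=n)
    h = rng.uniform(-d / 2, d / 2)
    grouped = h + d * np.round((x - h) / d)   # midpoint h + i*d of each x's interval
    vals.append(np.mean(grouped**2))

print(np.mean(vals), 1 + d**2 / 12)   # ~ mu_2' + d^2/12 = 1.0208
```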



4.7 Appendix on Lagrange's Multipliers

We frequently encounter the problem of finding the extreme (maximum or minimum) value of a function $g(x_1,\ldots,x_n)$ subject to side conditions

(a) $\phi_i(x_1,\ldots,x_n) = 0, \qquad i = 1,\ldots,k < n .$

To insure the independence of the conditions (a) we assume that for some set of k of the x's,

$$\frac{\partial(\phi_1,\ldots,\phi_k)}{\partial(x_1,\ldots,x_k)} \ne 0$$

at the extremum. To simplify the notation, assume these are $x_i$, $i = 1,\ldots,k$. At the extremum, $dg = 0$,

(b) $\sum_{i=1}^{n}\dfrac{\partial g}{\partial x_i}\,dx_i = 0,$

where $dx_1,\ldots,dx_k$ are functions of $dx_{k+1},\ldots,dx_n$, determined by $d\phi_j = 0$, i. e.,

(c) $\sum_{i=1}^{n}\dfrac{\partial\phi_j}{\partial x_i}\,dx_i = 0, \qquad j = 1,\ldots,k,$

and $dx_{k+1},\ldots,dx_n$ are completely arbitrary numbers. In order that (b) be satisfied for all $dx_1,\ldots,dx_n$ which are arbitrary except that they must satisfy (c), a necessary and sufficient condition is that the equation (b) be a linear combination of the equations (c), i. e., that for some $\lambda_1,\ldots,\lambda_k$,

(d) $\dfrac{\partial g}{\partial x_i} = \sum_{j=1}^{k}\lambda_j\dfrac{\partial\phi_j}{\partial x_i}, \qquad i = 1,\ldots,n .$

We see that the conditions (d) are obtained if we employ the following rule: To minimize g subject to (a), form the function

$$G(x_1,\ldots,x_n;\lambda_1,\ldots,\lambda_k) = g - \sum_{j=1}^{k}\lambda_j\phi_j$$

and set

(e) $\dfrac{\partial G}{\partial x_i} = 0, \qquad i = 1,\ldots,n .$

The equations (a) and (e) constitute a system of n+k equations in the n+k unknowns $x_1,\ldots,x_n;\lambda_1,\ldots,\lambda_k$. For an extremum it is necessary that $x_1,\ldots,x_n$ satisfy these equations. In most applications in statistics the question of sufficiency can be settled in an obvious way.



CHAPTER V 
SAMPLING FROM A NORMAL POPULATION 

Since the normal distribution appears in such a wide variety of problems, we shall consider in detail certain sampling problems from such a distribution. Many distributions are important in statistics for the reason that they arise in connection with sampling from a normal universe. In the present chapter, we shall only consider certain sampling problems, deriving certain sampling distributions. The application of these sampling results to problems of significance tests, statistical estimation, etc., will be made in later chapters.

5.1 Distribution of Sample Mean

An important property of the normal distribution is the so-called reproductive property. We wish to demonstrate that a linear function of normally distributed variates is again normally distributed. Suppose $x_1, x_2, \ldots, x_n$ are distributed independently according to $N(a_1,\sigma_1^2)$, $N(a_2,\sigma_2^2)$, ..., $N(a_n,\sigma_n^2)$, respectively. Let us find the distribution of the linear form $L = l_1x_1 + l_2x_2 + \cdots + l_nx_n$. According to the results of §2.74, the expected value of L is

(a) $E(l_1x_1+l_2x_2+\cdots+l_nx_n) = l_1E(x_1) + l_2E(x_2) + \cdots + l_nE(x_n) = l_1a_1 + l_2a_2 + \cdots + l_na_n .$

The joint distribution of the x's is

(b) $\dfrac{1}{(2\pi)^{n/2}\sigma_1\cdots\sigma_n}\, e^{-\frac{1}{2}\sum_i\frac{(x_i-a_i)^2}{\sigma_i^2}} .$

From this we shall find the moment generating function of the linear form minus its mean, $L - E(L)$:

(c) $\phi(\theta) = E\big(e^{\theta[L-E(L)]}\big) = \dfrac{1}{(2\pi)^{n/2}\sigma_1\cdots\sigma_n}\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{\theta\sum_i l_i(x_i-a_i)-\frac{1}{2}\sum_i\frac{(x_i-a_i)^2}{\sigma_i^2}}\,dx_1\cdots dx_n$

$$= \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_i}\int_{-\infty}^{\infty} e^{\theta l_i(x_i-a_i)-\frac{(x_i-a_i)^2}{2\sigma_i^2}}\,dx_i = e^{\frac{1}{2}\theta^2\sum_i l_i^2\sigma_i^2} .$$

This is the moment generating function for the probability element

(d) $\dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{y^2}{2\sigma^2}}\,dy,$

where

$$\sigma^2 = \sum_{i=1}^{n} l_i^2\sigma_i^2 .$$

Therefore L is distributed according to $N\big(\sum_i l_ia_i,\ \sum_i l_i^2\sigma_i^2\big)$. We have the

Theorem (A): If $x_1, x_2, \ldots, x_n$ are independently distributed according to $N(a_1,\sigma_1^2)$, $N(a_2,\sigma_2^2)$, ..., $N(a_n,\sigma_n^2)$, respectively, then any linear function of the x's, $L = l_1x_1 + l_2x_2 + \cdots + l_nx_n$, is distributed according to $N\big(\sum_i l_ia_i,\ \sum_i l_i^2\sigma_i^2\big)$.

From this result we can easily derive the distribution of the mean of a sample. Consider a sample, $O_n$, of n observations $x_1,x_2,\ldots,x_n$. The x's are independently distributed, each according to $N(a,\sigma^2)$. If we take $l_1 = l_2 = \cdots = l_n = \tfrac{1}{n}$, the linear form L is simply $\bar{x}$, the mean of the sample. Its expected value is

(e) $\tfrac{1}{n}a + \tfrac{1}{n}a + \cdots + \tfrac{1}{n}a = a;$

its variance is

$$\sigma_{\bar{x}}^2 = \sum_{i=1}^{n}\frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n} .$$

Therefore, we have the following corollary to Theorem (A):

Corollary (A₁): If $O_n: x_1,x_2,\ldots,x_n$ is a sample from the normal population $N(a,\sigma^2)$, then the sample mean $\bar{x}$ is distributed according to $N\big(a,\tfrac{\sigma^2}{n}\big)$.

5.11 Distribution of Difference between Two Sample Means

Suppose we have two samples, $O_n$ and $O_{n'}$, of n and n' observations drawn from normal populations $N(a,\sigma^2)$ and $N(a',\sigma'^2)$, respectively. Then the two sample means, $\bar{x}$ and $\bar{x}'$, are distributed according to $N(a,\sigma^2/n)$ and $N(a',\sigma'^2/n')$, respectively. To find the distribution of the difference of the two means, let us consider the linear function $\bar{x} - \bar{x}'$. In this case $l_1 = 1$, $l_2 = -1$; so the expected value of the linear form is

(a) $a - a',$

and its variance is

$$\frac{\sigma^2}{n} + \frac{\sigma'^2}{n'} .$$

We therefore have the following corollary to Theorem (A):

Corollary (A₂): If $O_n: x_1,x_2,\ldots,x_n$ and $O_{n'}: x_1',x_2',\ldots,x_{n'}'$ are samples from the populations $N(a,\sigma^2)$ and $N(a',\sigma'^2)$, respectively, then $\bar{x}-\bar{x}'$ is distributed according to $N\big(a-a',\ \tfrac{\sigma^2}{n}+\tfrac{\sigma'^2}{n'}\big)$, where $\bar{x}$ and $\bar{x}'$ are the means of $O_n$ and $O_{n'}$, respectively.
5.12 Joint Distribution of Means in Samples from a Normal Bivariate Distribution

Let us consider a sample $O_n\,(x_{1\alpha},x_{2\alpha};\ \alpha = 1,2,\ldots,n)$ from the bivariate distribution

(a) $f(x_1,x_2) = \dfrac{\sqrt{|A|}}{2\pi}\, e^{-\frac{1}{2}\sum_{i,j=1}^{2}A_{ij}(x_i-a_i)(x_j-a_j)} .$

Let $\bar{x}_i = \tfrac{1}{n}\sum_\alpha x_{i\alpha}$, $i = 1,2$. We wish to determine the joint distribution of $\bar{x}_1, \bar{x}_2$. To do this, we determine the m. g. f. of $(\bar{x}_1-a_1)$ and $(\bar{x}_2-a_2)$, i. e.,

(b) $\phi(\theta_1,\theta_2) = E\big(e^{\theta_1(\bar{x}_1-a_1)+\theta_2(\bar{x}_2-a_2)}\big) = \prod_{\alpha=1}^{n} E\big(e^{\frac{\theta_1}{n}(x_{1\alpha}-a_1)+\frac{\theta_2}{n}(x_{2\alpha}-a_2)}\big).$

But we know from (d) and (e) of §3.22 that if we set $\theta_i/n$ in place of $\theta_i$, each factor inside the product will be

$$e^{\frac{1}{2n^2}\left[A^{11}\theta_1^2+2A^{12}\theta_1\theta_2+A^{22}\theta_2^2\right]} .$$

Therefore, the m. g. f. of $(\bar{x}_1-a_1)$ and $(\bar{x}_2-a_2)$ is

(c) $\phi(\theta_1,\theta_2) = e^{\frac{1}{2n}\left[A^{11}\theta_1^2+2A^{12}\theta_1\theta_2+A^{22}\theta_2^2\right]} .$

Since $e^{\frac{1}{2}\sum A^{ij}\theta_i\theta_j}$ is the m. g. f. for $(x_1-a_1)$ and $(x_2-a_2)$ in the distribution (a), and (c) is this m. g. f. with $A_{ij}$ replaced by $nA_{ij}$, it follows that the distribution of $(\bar{x}_1-a_1)$, $(\bar{x}_2-a_2)$ (having m. g. f. (c)) is

(d) $\dfrac{n\sqrt{|A|}}{2\pi}\, e^{-\frac{n}{2}\sum_{i,j=1}^{2}A_{ij}(\bar{x}_i-a_i)(\bar{x}_j-a_j)}\,d\bar{x}_1d\bar{x}_2 .$

We therefore have

Theorem (B): If $x_1$ and $x_2$ are distributed jointly according to the normal bivariate law (a), and if $\bar{x}_1$ and $\bar{x}_2$ are sample means of the $x_{1\alpha}$ and the $x_{2\alpha}$, respectively, in a sample $O_n\,(x_{i\alpha},\ i = 1,2;\ \alpha = 1,2,\ldots,n)$ from such a distribution, then $\bar{x}_1$ and $\bar{x}_2$ are also distributed according to a normal bivariate distribution, given by (d).

Theorem (B) extends at once to the case of means in a sample from a k-variate normal population with distribution (b) of §3.23. The distribution of the means in this case is

(e) $\dfrac{n^{k/2}\sqrt{|A|}}{(2\pi)^{k/2}}\, e^{-\frac{n}{2}\sum_{i,j=1}^{k}A_{ij}(\bar{x}_i-a_i)(\bar{x}_j-a_j)} .$

5.2 The χ²-Distribution

The $\chi^2$-distribution function with m degrees of freedom is defined as

$$f_m(\chi^2)\,d(\chi^2) = \frac{1}{2^{m/2}\Gamma(m/2)}\, e^{-\chi^2/2}\,(\chi^2)^{\frac{m}{2}-1}\,d(\chi^2).$$

This distribution arises very frequently in connection with the sampling theory of quadratic forms of normally distributed variables. We shall consider some of the important cases in this chapter and others in Chapters VIII and IX.

The integrals $\int_0^{\chi_0^2} f_m(\chi^2)\,d\chi^2$ and $\int_{\chi_0^2}^{\infty} f_m(\chi^2)\,d\chi^2$ are tabulated in many places for various values of m and $\chi_0^2$. When we let $\chi^2/2 = t$, the latter integral is transformed into the incomplete gamma function, of which extensive tables have been computed by Karl Pearson.

5.21 Distribution of Sum of Squares of Normally and Independently Distributed Variables

The simplest sample statistic which is distributed according to the $\chi^2$-law is the sum of squares of variates independently distributed according to the same normal law with zero mean. Let us use the method of moment generating functions to find the distribution of $\chi^2 = \sum_{i=1}^{n} x_i^2$, where each $x_i$ $(i = 1,2,\ldots,n)$ is independently distributed according to N(0,1). The joint distribution of the x's is

(a) $\dfrac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_i x_i^2} .$

Now let us find the moment generating function of $\sum_i x_i^2$:

(b) $\phi(\theta) = E\big(e^{\theta\sum_i x_i^2}\big) = \dfrac{1}{(2\pi)^{n/2}}\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}(1-2\theta)\sum_i x_i^2}\,dx_1\cdots dx_n = (1-2\theta)^{-n/2},$

for $\theta < \tfrac{1}{2}$.

But this is the moment generating function of the Pearson Type III distribution ((e) of §3.3) when $\beta = \tfrac{1}{2}$, $\alpha = 0$, and $\nu = \tfrac{n}{2}$. Therefore by the uniqueness Theorem (B), §2.81, we have

Theorem (A): If $O_n: x_1,x_2,\ldots,x_n$ is a sample from N(0,1), the function $\chi^2 = \sum_{i=1}^{n} x_i^2$ is distributed according to the $\chi^2$-law with n degrees of freedom, i. e.,

$$f_n(\chi^2)\,d(\chi^2) = \frac{1}{2^{n/2}\Gamma(n/2)}\, e^{-\chi^2/2}\,(\chi^2)^{\frac{n}{2}-1}\,d(\chi^2).$$

From this result it follows that, if $x_1,x_2,\ldots,x_n$ are distributed independently according to $N(a,\sigma^2)$, then $\chi^2 = \sum_i (x_i-a)^2/\sigma^2$ is distributed according to $f_n(\chi^2)\,d(\chi^2)$.

We can readily determine the moments of the $\chi^2$-distribution from its moment generating function. We expand $\phi(\theta)$ in a power series:

(c) $\phi(\theta) = 1 + \dfrac{n}{2}(2\theta) + \dfrac{\frac{n}{2}(\frac{n}{2}+1)}{2!}(2\theta)^2 + \cdots + \dfrac{\frac{n}{2}(\frac{n}{2}+1)\cdots(\frac{n}{2}+r-1)}{r!}(2\theta)^r + \cdots .$

Then we find the moments about zero

(d) $\mu_r' = n(n+2)(n+4)\cdots(n+2r-2) .$

The mean is n and the variance is

$$\sigma^2 = n(n+2) - n^2 = 2n .$$
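A quick simulation sketch (not from the original text; n and the replication count are arbitrary) of the mean n and variance 2n just derived:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps = 7, 200_000

chi2 = (rng.normal(size=(reps, n))**2).sum(axis=1)
print(chi2.mean(), n)          # ~ 7
print(chi2.var(), 2 * n)       # ~ 14
```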

5.22 Distribution of the Exponent in a Multivariate Normal Distribution

Now let us consider a normal multivariate distribution of k variates with zero means,

(a) $f(x_1,\ldots,x_k) = \dfrac{\sqrt{|A|}}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}\sum_{i,j=1}^{k}A_{ij}x_ix_j},$

and let us find the distribution of the quadratic form $\sum_{i,j}A_{ij}x_ix_j$. To do this we find the moment generating function of the quadratic form,

(b) $\phi(\theta) = E\big(e^{\theta\sum A_{ij}x_ix_j}\big) = \dfrac{\sqrt{|A|}}{(2\pi)^{k/2}}\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}(1-2\theta)\sum A_{ij}x_ix_j}\,dx_1\cdots dx_k .$

It follows from §3.23 that

$$\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}\sum A_{ij}x_ix_j}\,dx_1\cdots dx_k = \frac{(2\pi)^{k/2}}{\sqrt{|A|}};$$

since the matrix of the form $(1-2\theta)\sum A_{ij}x_ix_j$ has determinant $(1-2\theta)^k|A|$, the above integration yields

(c) $\phi(\theta) = (1-2\theta)^{-k/2},$

which, as will be seen from (e) in §3.3, is the m. g. f. for a $\chi^2$-distribution with k degrees of freedom.

We therefore have

Theorem (A): If $x_1,x_2,\ldots,x_k$ are distributed according to the normal multivariate law (a), then $\sum_{i,j=1}^{k}A_{ij}x_ix_j = \chi^2$, say, is distributed according to $f_k(\chi^2)$.

More generally, the quadratic form $\sum_{i,j=1}^{k}A_{ij}(x_i-a_i)(x_j-a_j)$ from the distribution (b) of §3.23 has the $\chi^2$-distribution with k degrees of freedom.

5.23 Reproductive Property of the χ²-Distribution

In the same way that the normal distribution possesses the reproductive property, so also does the $\chi^2$-distribution. Suppose we have $\chi_1^2, \chi_2^2, \ldots, \chi_k^2$ distributed according to $f_{m_1}(\chi_1^2)$, $f_{m_2}(\chi_2^2)$, ..., $f_{m_k}(\chi_k^2)$, respectively. From the joint distribution of these variates, let us find the moment generating function of the sum $\sum_{i=1}^{k}\chi_i^2$, assuming independence:

$$\phi(\theta) = E\big(e^{\theta\sum_i\chi_i^2}\big) = \prod_{i=1}^{k} E\big(e^{\theta\chi_i^2}\big) = \prod_{i=1}^{k}(1-2\theta)^{-m_i/2} = (1-2\theta)^{-m/2},$$

where $m = \sum_i m_i$. $\phi(\theta)$ is the m. g. f. for a $\chi^2$-distribution with m degrees of freedom. Therefore, we have the following

Theorem (A): If $\chi_1^2, \chi_2^2, \ldots, \chi_k^2$ are independently distributed according to $\chi^2$-laws with $m_1, m_2, \ldots, m_k$ degrees of freedom, respectively, then $\sum_i \chi_i^2$ is distributed according to a $\chi^2$-law with $\sum_i m_i$ degrees of freedom.

5.24 Cochran's Theorem

Cochran's theorem states certain conditions under which a set of quadratic forms are independently distributed according to $\chi^2$-laws if the variables of the quadratic forms are independently distributed, each according to N(0,1). To prove this theorem, we need several algebraic theorems which will be stated as lemmas.

Lemma 1: If q is a quadratic form, $\sum_{\alpha,\beta=1}^{n} a_{\alpha\beta}x_\alpha x_\beta$, of order n and rank r, there exists a linear transformation $z_\alpha = \sum_\beta b_{\alpha\beta}x_\beta$ $(\alpha = 1,2,\ldots,r)$ such that $\sum a_{\alpha\beta}x_\alpha x_\beta = \sum_{\alpha=1}^{r} c_\alpha z_\alpha^2$, where the $c_\alpha$ are +1 or -1.

In §3.23 we exhibited a linear transformation that would do this for a positive definite quadratic form. The reader may extend that demonstration to prove Lemma 1.*

*A proof of Lemma 1 is given in M. Bôcher, Introduction to Higher Algebra, Macmillan, New York, 1907.

Lemma 2: If $\sum_{\alpha,\beta=1}^{n} A_{\alpha\beta}x_\alpha x_\beta$ is transformed into $\sum_{\alpha,\beta=1}^{n} a_{\alpha\beta}z_\alpha z_\beta$ by a linear transformation $z_\alpha = \sum_\beta b_{\alpha\beta}x_\beta$ $(\alpha = 1,2,\ldots,n)$, then

$$|A_{\alpha\beta}| = |a_{\alpha\beta}|\cdot|b_{\alpha\beta}|^2 .$$

This lemma can be readily verified from the fact that $A_{\gamma\delta} = \sum_{\alpha,\beta} a_{\alpha\beta}b_{\alpha\gamma}b_{\beta\delta}$ and by using the rule for multiplying determinants.

Lemma 3: Suppose we have k quadratic forms, $q_1, q_2, \ldots, q_k$, in $x_1, x_2, \ldots, x_n$, of ranks $n_1, n_2, \ldots, n_k$, respectively, and suppose $\sum_{i=1}^{k} q_i = \sum_{\alpha=1}^{n} x_\alpha^2$. Then a necessary and sufficient condition that there exist a non-singular linear transformation $z_\alpha = \sum_\beta c_{\alpha\beta}x_\beta$ $(\alpha = 1,2,\ldots,\sum_i n_i)$ such that

$$q_1 = z_1^2 + \cdots + z_{n_1}^2,\quad q_2 = z_{n_1+1}^2 + \cdots + z_{n_1+n_2}^2,\quad \ldots,\quad q_k = z_{n-n_k+1}^2 + \cdots + z_n^2,$$

is that $n = n_1 + n_2 + \cdots + n_k$.

Proof: The necessity of the condition is obvious, since $\sum_i n_i$ must be equal to n in order for the transformation to be non-singular.

Now consider the sufficiency condition. We assume $n = n_1 + n_2 + \cdots + n_k$. By Lemma 1 there is a linear transformation $y_\alpha^{(1)} = \sum_\beta b_{\alpha\beta}^{(1)}x_\beta$ such that $q_1 = \sum_{\alpha=1}^{n_1} c_\alpha\big(y_\alpha^{(1)}\big)^2$, where $c_\alpha = +1$ or $-1$. In the same way we know there exist transformations

$$y_\alpha^{(2)} = \sum_\beta b_{\alpha\beta}^{(2)}x_\beta,\quad \ldots,\quad y_\alpha^{(k)} = \sum_\beta b_{\alpha\beta}^{(k)}x_\beta$$

such that $q_2 = \sum_\alpha c_\alpha^{(2)}\big(y_\alpha^{(2)}\big)^2$, etc. In other words we have $n_1$ linear forms $y_\alpha^{(1)}$ $(\alpha = 1,\ldots,n_1)$ reducing $q_1$ to such a sum, $n_2$ linear forms reducing $q_2$, etc.

Let us denote $y_\alpha^{(1)}$ by $z_\alpha$ for $\alpha = 1,2,\ldots,n_1$; $y_\alpha^{(2)}$ by $z_{n_1+\alpha}$ for $\alpha = 1,\ldots,n_2$, etc.; and similarly $b_{\alpha\beta}^{(1)}$ by $c_{\alpha\beta}$ for $\alpha = 1,\ldots,n_1$ $(\beta = 1,\ldots,n)$, $b_{\alpha\beta}^{(2)}$ by $c_{n_1+\alpha,\beta}$, etc. Combining all of the linear transformations, we may write

$$z_\alpha = \sum_{\beta=1}^{n} c_{\alpha\beta}x_\beta \qquad (\alpha = 1,2,\ldots,n).$$

Then $q_1 = \sum_{\alpha=1}^{n_1} c_\alpha z_\alpha^2$, $q_2 = \sum_{\alpha=n_1+1}^{n_1+n_2} c_\alpha z_\alpha^2$, etc., and

$$\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{\alpha=1}^{n} c_\alpha z_\alpha^2 .$$

By Lemma 2, $|\delta_{\alpha\beta}| = |c_\alpha\delta_{\alpha\beta}|\cdot|c_{\alpha\beta}|^2$ (where $\delta_{\alpha\beta}$ is 1 if $\alpha = \beta$ and is 0 if $\alpha \ne \beta$). This reduces to

$$1 = \Big(\prod_\alpha c_\alpha\Big)\cdot|c_{\alpha\beta}|^2 .$$

Since the $c_\alpha = \pm 1$, this equation is $1 = \pm|c_{\alpha\beta}|^2$, and because the $c_{\alpha\beta}$ are real, $|c_{\alpha\beta}|^2 > 0$; hence $\prod_\alpha c_\alpha = +1$ and $|c_{\alpha\beta}| = \pm 1 \ne 0$.

This fact tells us that the n linear forms are independent and constitute a non-singular linear transformation. From the identity

$$\sum_\alpha x_\alpha^2 = \sum_\alpha c_\alpha z_\alpha^2$$

we deduce that $\sum_\alpha c_\alpha z_\alpha^2$ is positive definite, since $\sum_\alpha x_\alpha^2$ is positive definite. Hence each $c_\alpha = +1$. This proves the sufficiency of the condition $n = n_1 + n_2 + \cdots + n_k$. It is interesting to observe that $|c_{\alpha\beta}| = \pm 1$ and that $\sum_\alpha c_{\alpha\gamma}c_{\alpha\delta} = \delta_{\gamma\delta}$, that is, the transformation is orthogonal.

Cochran's theorem follows readily from this algebraic theorem.

Theorem (A) (Cochran's Theorem): If $x_\alpha$ $(\alpha = 1,2,\ldots,n)$ are independently distributed according to N(0,1), and if $\sum_{\alpha=1}^{n} x_\alpha^2 = \sum_{i=1}^{k} q_i$, where $q_i$ is a quadratic form of rank $n_i$, a necessary and sufficient condition that the $q_i$ be independently distributed according to $f_{n_i}(\chi^2)$ is that $\sum_i n_i = n$.

Proof: Assume $\sum_i n_i = n$, and find the m. g. f. of the $q_i$. We have

$$E\big(e^{\sum_i\theta_iq_i}\big) = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{\sum_i\theta_iq_i-\frac{1}{2}\sum_\alpha x_\alpha^2}\,dx_1\cdots dx_n .$$

Now transform the x's to z's by Lemma 3, noting that the Jacobian is unity:

$$E\big(e^{\sum_i\theta_iq_i}\big) = \frac{1}{(2\pi)^{n/2}}\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[(1-2\theta_1)(z_1^2+\cdots+z_{n_1}^2)+\cdots+(1-2\theta_k)(z_{n-n_k+1}^2+\cdots+z_n^2)\right]}\,dz_1\cdots dz_n = \prod_{i=1}^{k}(1-2\theta_i)^{-n_i/2},$$

which is the m. g. f. of k independent $\chi^2$-distributions with $n_1,n_2,\ldots,n_k$ degrees of freedom, thus establishing the sufficiency condition.

The converse assumes that

$$E\big(e^{\sum_i\theta_iq_i}\big) = \prod_{i=1}^{k}(1-2\theta_i)^{-n_i/2} .$$

Since $\sum_i q_i = \sum_\alpha x_\alpha^2$, the left-hand side of the equation becomes the m. g. f. of $\sum_\alpha x_\alpha^2$ (which has a $\chi^2$-distribution with n degrees of freedom) when $\theta_1 = \theta_2 = \cdots = \theta_k = \theta$. So the equation becomes

$$(1-2\theta)^{-\frac{n}{2}} = (1-2\theta)^{-\frac{1}{2}\sum_i n_i} .$$

Hence $\sum_i n_i = n$, and the theorem is proved.
5.25 Independence of Mean and Sum of Squared Deviations from Mean in Samples 

From a Normal Population 

As an application of Cochran's Theorem, we shall show that the sample mean and 
sum of squares of deviations about the mean in a sample from a normal population are inde- 
pendent and have % 2 - distributions. Consider a sample O n :x 1 ,x 2 , .. ,,x n drawn from a normal 
population N( 0, 1 ) . Then 

(a) 

Let 

t 

Tl Y-l Tl T T1 

1 



> 



V. SAMPLING FROM A NORMAL POPULATION 



10Q 



and 



- nx 2 



$q_2$ is of rank 1, for in the matrix of $q_2$, every element of which is $\frac{1}{n}$, any minor of order two,

$$\begin{vmatrix} \frac{1}{n} & \frac{1}{n} \\ \frac{1}{n} & \frac{1}{n} \end{vmatrix},$$

is zero, but each element is different from zero. The determinant of the matrix of $q_1$ is

$$D = \begin{vmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{vmatrix}.$$

Subtracting the first row from each of the others, we get

$$D = \begin{vmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -1 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ -1 & 0 & \cdots & 1 \end{vmatrix}.$$

Next we add each column to the first and find

$$D = \begin{vmatrix} 0 & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{vmatrix} = 0,$$

for all the elements of the first column are zero. If we use this method of evaluation on any principal minor of order $n-1$, we get

$$M = \begin{vmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{vmatrix} = \frac{1}{n} \neq 0.$$

Hence the rank of $q_1$ is $n-1$. Using Cochran's Theorem we conclude that $\sum_\alpha (x_\alpha-\bar{x})^2$ and $n\bar{x}^2$ are independently distributed according to $f_{n-1}(\chi^2)$ and $f_1(\chi^2)$, respectively.



If $x$ is distributed according to $N(a,\sigma^2)$, then $(x-a)/\sigma$ is distributed according to $N(0,1)$. Hence, we have proved the following corollary to Cochran's Theorem:

If $O_n : x_1, \ldots, x_n$ is a sample from $N(a,\sigma^2)$, then $\sum_{\alpha=1}^n (x_\alpha-\bar{x})^2/\sigma^2$ and $n(\bar{x}-a)^2/\sigma^2$ are independently distributed according to $f_{n-1}(\chi^2)$ and $f_1(\chi^2)$. It also follows that $s^2 = \sum_{\alpha=1}^n (x_\alpha-\bar{x})^2/(n-1)$ and $\bar{x}$ are independently distributed.

It should be pointed out that one could establish the fact that $\sum_\alpha (x_\alpha-\bar{x})^2/\sigma^2$ and $(\bar{x}-a)\sqrt{n}/\sigma$ for a sample from $N(a,\sigma^2)$ are independently distributed according to $f_{n-1}(\chi^2)$ and $N(0,1)$, respectively, by verifying that the m.g.f.

$$\phi(\theta_1,\theta_2) = E\Big(e^{\theta_1\sum_\alpha (x_\alpha-\bar{x})^2/\sigma^2\;+\;\theta_2(\bar{x}-a)\sqrt{n}/\sigma}\Big)$$

factors into the product of the two separate m.g.f.'s.
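A short simulation in the same spirit (a sketch; the parameter values, seed and repetition count are arbitrary) exhibits the corollary: the two quadratic forms follow $f_{n-1}(\chi^2)$ and $f_1(\chi^2)$, and $\bar{x}$ and $s^2$ are uncorrelated over repeated samples.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    a, sigma, n, reps = 3.0, 2.0, 8, 20000
    X = rng.normal(a, sigma, size=(reps, n))
    xbar = X.mean(axis=1)
    q1 = ((X - xbar[:, None]) ** 2).sum(axis=1) / sigma**2   # ~ chi^2 with n-1 d.f.
    q2 = n * (xbar - a) ** 2 / sigma**2                      # ~ chi^2 with 1 d.f.
    print(stats.kstest(q1, "chi2", args=(n - 1,)).pvalue)
    print(stats.kstest(q2, "chi2", args=(1,)).pvalue)
    s2 = q1 * sigma**2 / (n - 1)                             # s^2 = sum of squares/(n-1)
    print(np.corrcoef(xbar, s2)[0, 1])                       # near 0: independent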



5.3 The "Student" t-Distribution

Next we shall derive the distribution of the ratio of two independent variates, one normally distributed and the other distributed according to the $\chi^2$-law. Let $\xi$ be a variate distributed according to $N(0,1)$ and let $\chi^2$ be distributed according to $f_m(\chi^2)$. If these are independently distributed, the joint probability element is

$$\frac{1}{\sqrt{2\pi}}\,e^{-\frac{\xi^2}{2}}\cdot\frac{(\chi^2)^{\frac{m}{2}-1}\,e^{-\frac{\chi^2}{2}}}{2^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\; d\xi\, d(\chi^2).$$

Let us change variables to

$$t = \frac{\xi\sqrt{m}}{\sqrt{\chi^2}}, \qquad u = \chi^2.$$

Then

$$\xi = t\sqrt{\frac{u}{m}}, \qquad \chi^2 = u, \qquad -\infty < t < \infty, \quad 0 < u < \infty.$$




The Jacobian of this transformation is

$$J = \sqrt{\frac{u}{m}}.$$

Hence the joint distribution of $t$ and $u$ is

$$\frac{1}{\sqrt{2\pi m}\;2^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\; u^{\frac{m-1}{2}}\, e^{-\frac{u}{2}\left(1+\frac{t^2}{m}\right)}\; du\, dt.$$

To find the marginal distribution of $t$, we integrate out $u$:

$$g_m(t)\,dt = \frac{dt}{\sqrt{2\pi m}\;2^{\frac{m}{2}}\,\Gamma(\frac{m}{2})}\int_0^\infty u^{\frac{m-1}{2}}\, e^{-\frac{u}{2}\left(1+\frac{t^2}{m}\right)}\, du,$$

which gives

$$\textrm{(a)} \qquad g_m(t) = \frac{\Gamma\big(\frac{m+1}{2}\big)}{\sqrt{m\pi}\;\Gamma(\frac{m}{2})}\,\Big(1+\frac{t^2}{m}\Big)^{-\frac{m+1}{2}}.$$



This is called the "Student" t-distribution with $m$ degrees of freedom. Values of $t_\varepsilon$ have been tabulated such that

$$\Pr(|t| \le t_\varepsilon) = \int_{-t_\varepsilon}^{t_\varepsilon} g_m(t)\,dt = \varepsilon,$$

for $\varepsilon$ = .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, .98, .99 and $m$ = 1, 2, 3, ..., 30, in R. A. Fisher's Statistical Methods for Research Workers.

The application of this distribution to sampling theory is immediate. As an important application consider a sample $O_n$ from $N(a,\sigma^2)$. Then

$$\xi = \frac{(\bar{x}-a)\sqrt{n}}{\sigma}$$

is distributed according to $N(0,1)$ and

$$\frac{\sum_{\alpha=1}^n (x_\alpha-\bar{x})^2}{\sigma^2}$$

is independently distributed according to $f_{n-1}(\chi^2)$. The ratio

$$t = \frac{(\bar{x}-a)\sqrt{n}}{s} = \frac{\sqrt{n(n-1)}\,(\bar{x}-a)}{\sqrt{\sum_\alpha (x_\alpha-\bar{x})^2}}$$

is, therefore, distributed according to $g_{n-1}(t)$.

The quantity $t$ and its sampling theory, which marked a new step in statistical inference, were first investigated by Gosset, who, without rigorously proving his result, suggested the above distribution of $t$ in a paper published in 1908 under the name of "Student". A rigorous proof was supplied by R. A. Fisher in 1926. The essential feature of $t$ is that both it and its distribution are functionally independent of $\sigma$.
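As a numerical illustration of the preceding result (a sketch; the sample size, seed and parameter values are arbitrary), the ratio $t$ computed from simulated samples agrees with $g_{n-1}(t)$ whatever $\sigma$ may be, and the tabulated point $t_\varepsilon$ captures the stated probability:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    a, sigma, n, reps = 0.0, 1.7, 12, 20000
    X = rng.normal(a, sigma, size=(reps, n))
    xbar, s = X.mean(axis=1), X.std(axis=1, ddof=1)
    t = np.sqrt(n) * (xbar - a) / s                     # the "Student" ratio; sigma cancels
    print(stats.kstest(t, "t", args=(n - 1,)).pvalue)   # consistent with g_{n-1}(t)
    t_eps = stats.t.ppf(0.975, n - 1)                   # t_eps for eps = .95
    print(np.mean(np.abs(t) <= t_eps))                  # close to .95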

The "Student" distribution may also be used in connection with two samples. Let $O_{n_1} : (x_{1\alpha},\ \alpha = 1,2,\ldots,n_1)$ and $O_{n_2} : (x_{2\alpha},\ \alpha = 1,2,\ldots,n_2)$ be samples from $N(a_1,\sigma^2)$ and $N(a_2,\sigma^2)$, respectively. Let $\bar{x}_1$ and $\bar{x}_2$ be the sample means and $s_1^2 = \sum_\alpha (x_{1\alpha}-\bar{x}_1)^2/(n_1-1)$, $s_2^2 = \sum_\alpha (x_{2\alpha}-\bar{x}_2)^2/(n_2-1)$. Then

$$\xi = \frac{\bar{x}_1-\bar{x}_2-(a_1-a_2)}{\sigma\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$

is distributed according to $N(0,1)$ and

$$\chi^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{\sigma^2}$$

is distributed independently according to $f_{n_1+n_2-2}(\chi^2)$. Hence, the ratio

$$t = \frac{\bar{x}_1-\bar{x}_2-(a_1-a_2)}{\sqrt{\big(\frac{1}{n_1}+\frac{1}{n_2}\big)\dfrac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}}$$

is distributed according to $g_{n_1+n_2-2}(t)$.
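The two-sample ratio can be checked against a library implementation (a sketch; the data are simulated under $a_1 = a_2$, so the ratio reduces to the pooled-variance statistic; scipy's equal-variance test computes the same quantity):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x1 = rng.normal(5.0, 2.0, size=14)   # both populations share sigma = 2
    x2 = rng.normal(5.0, 2.0, size=9)
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    t = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))  # a_1 - a_2 = 0 here
    print(t)
    print(stats.ttest_ind(x1, x2, equal_var=True).statistic)        # agrees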



It can be verified by the reader that

$$\lim_{m\to\infty} g_m(t) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{t^2}{2}}.$$

5.4 Snedecor's F-Distribution

Now let us consider the distribution of the ratio of two quantities independently distributed according to $\chi^2$-distributions. Let $\chi_1^2$ and $\chi_2^2$ be independently distributed according to $f_{m_1}(\chi_1^2)$ and $f_{m_2}(\chi_2^2)$, respectively. The joint distribution is

$$\frac{(\chi_1^2)^{\frac{m_1}{2}-1}(\chi_2^2)^{\frac{m_2}{2}-1}\,e^{-\frac{1}{2}(\chi_1^2+\chi_2^2)}}{2^{\frac{m_1+m_2}{2}}\,\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\; d(\chi_1^2)\,d(\chi_2^2).$$

Let us make the change of variables

$$F = \frac{\chi_1^2/m_1}{\chi_2^2/m_2}, \qquad v = \chi_2^2.$$

Then

$$\chi_1^2 = \frac{m_1}{m_2}\,Fv, \qquad \chi_2^2 = v, \qquad 0 < F < \infty, \quad 0 < v < \infty,$$

and the Jacobian of the transformation is

$$J = \frac{m_1}{m_2}\,v.$$

The distribution of the transformed variates is

$$\frac{\big(\frac{m_1}{m_2}\big)^{\frac{m_1}{2}}\,F^{\frac{m_1}{2}-1}\,v^{\frac{m_1+m_2}{2}-1}\,e^{-\frac{v}{2}\left(1+\frac{m_1}{m_2}F\right)}}{2^{\frac{m_1+m_2}{2}}\,\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\; dF\,dv.$$

Integrating out the extraneous variable $v$, we get the distribution of $F$,

$$\textrm{(a)} \qquad h_{m_1,m_2}(F) = \frac{\Gamma\big(\frac{m_1+m_2}{2}\big)}{\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\,\Big(\frac{m_1}{m_2}\Big)^{\frac{m_1}{2}}\,F^{\frac{m_1}{2}-1}\,\Big(1+\frac{m_1}{m_2}F\Big)^{-\frac{m_1+m_2}{2}}.$$



This distribution, known as Snedecor's F-distribution with $m_1$ and $m_2$ degrees of freedom, will be denoted by $h_{m_1,m_2}(F)$.

Values of $F_\varepsilon$ have been tabulated such that

$$\Pr(F \le F_\varepsilon) = \int_0^{F_\varepsilon} h_{m_1,m_2}(F)\,dF = \varepsilon,$$

for $\varepsilon$ = .99, .95 and all combinations of $(m_1,m_2)$ from (1,1) to (12,30) and for certain combinations from (14,32) to (500,1000), in Snedecor's Statistical Methods.
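The density (a) can be coded directly and compared with a library implementation (a sketch; the degrees of freedom below are arbitrary choices):

    import numpy as np
    from scipy import stats
    from scipy.special import gammaln

    def h(F, m1, m2):
        # h_{m1,m2}(F) coded directly from (a), via log-gammas for numerical stability
        logc = (gammaln((m1 + m2) / 2) - gammaln(m1 / 2) - gammaln(m2 / 2)
                + (m1 / 2) * np.log(m1 / m2))
        return np.exp(logc + (m1 / 2 - 1) * np.log(F)
                      - ((m1 + m2) / 2) * np.log1p(m1 * F / m2))

    F = np.linspace(0.05, 5, 100)
    print(np.max(np.abs(h(F, 5, 11) - stats.f.pdf(F, 5, 11))))   # ~ 0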

The moments about zero are easily obtained. Since the above is a distribution function, the integral over the entire range of $F$ is unity, and, hence,

$$\int_0^\infty F^{\frac{r_1}{2}-1}\Big(1+\frac{r_1}{r_2}F\Big)^{-\frac{r_1+r_2}{2}}dF = \Big(\frac{r_2}{r_1}\Big)^{\frac{r_1}{2}}\,\frac{\Gamma(\frac{r_1}{2})\,\Gamma(\frac{r_2}{2})}{\Gamma\big(\frac{r_1+r_2}{2}\big)}$$

for any positive $r_1$, $r_2$. Using this fact (with $r_1 = m_1+2r$, $r_2 = m_2-2r$), we get $\mu_r'$ by integration:

$$\textrm{(b)} \qquad E(F^r) = \frac{\Gamma\big(\frac{m_1+m_2}{2}\big)}{\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})}\Big(\frac{m_1}{m_2}\Big)^{\frac{m_1}{2}}\int_0^\infty F^{\frac{m_1}{2}+r-1}\Big(1+\frac{m_1}{m_2}F\Big)^{-\frac{m_1+m_2}{2}}dF = \Big(\frac{m_2}{m_1}\Big)^r\,\frac{\Gamma(\frac{m_1}{2}+r)\,\Gamma(\frac{m_2}{2}-r)}{\Gamma(\frac{m_1}{2})\,\Gamma(\frac{m_2}{2})},$$

for $r < \frac{m_2}{2}$.

By a simple change of variable the F-distribution may be changed into a Type I distribution (the integrand of the Beta function times a constant). Let

$$x = \frac{\frac{m_1}{m_2}F}{1+\frac{m_1}{m_2}F}.$$

Then

$$F = \frac{m_2}{m_1}\cdot\frac{x}{1-x}, \qquad dF = \frac{m_2}{m_1}\cdot\frac{dx}{(1-x)^2},$$

and $h_{m_1,m_2}(F)\,dF$ transforms into

$$\frac{x^{\frac{m_1}{2}-1}(1-x)^{\frac{m_2}{2}-1}}{B\big(\frac{m_1}{2},\frac{m_2}{2}\big)}\; dx.$$

It should be pointed out that the square of Student's $t$ with $m$ degrees of freedom is simply distributed as $h_{1,m}(t^2)$.

If we make the change of variable

$$\textrm{(d)} \qquad z = \tfrac{1}{2}\log F,$$

we obtain R. A. Fisher's z-distribution.

Example 1: As an example of the applications of the F-distribution, consider two samples $O_{n_1} : (x_{1\alpha},\ \alpha = 1,2,\ldots,n_1)$ and $O_{n_2} : (x_{2\alpha},\ \alpha = 1,2,\ldots,n_2)$ from populations $N(a_1,\sigma^2)$ and $N(a_2,\sigma^2)$, respectively. Let

$$s_i^2 = \frac{1}{n_i-1}\sum_{\alpha=1}^{n_i}(x_{i\alpha}-\bar{x}_i)^2, \qquad i = 1,2.$$

Then

$$F = \frac{s_1^2}{s_2^2}$$

is distributed according to $h_{n_1-1,\,n_2-1}(F)$.

Example 2: Suppose $O_{n_1}, O_{n_2}, \ldots, O_{n_k}$ are $k$ samples from $N(a_1,\sigma^2), N(a_2,\sigma^2), \ldots, N(a_k,\sigma^2)$, respectively. Then

$$\frac{1}{\sigma^2}\sum_{i=1}^k\sum_{\alpha=1}^{n_i}(x_{i\alpha}-\bar{x}_i)^2$$

has a $\chi^2$-distribution with $n-k$ $\big(\sum_i n_i = n\big)$ degrees of freedom. Since

$$\frac{n_i(\bar{x}_i-a_i)^2}{\sigma^2}$$

is distributed as $f_1(\chi^2)$, the sum

$$\frac{1}{\sigma^2}\sum_{i=1}^k n_i(\bar{x}_i-a_i)^2$$

is distributed according to $f_k(\chi^2)$ and independently of the former sum. From these facts it follows that the ratio

$$F = \frac{\sum_i n_i(\bar{x}_i-a_i)^2\big/k}{\sum_i\sum_\alpha (x_{i\alpha}-\bar{x}_i)^2\big/(n-k)}$$

is distributed according to $h_{k,\,n-k}(F)$.

If the $k$ samples come from normal populations with the same mean as well as the same variance,

$$\frac{\sqrt{n_i}\,(\bar{x}_i-a)}{\sigma}$$

is distributed according to $N(0,1)$ and, therefore,

$$\frac{1}{\sigma^2}\sum_{i=1}^k n_i(\bar{x}_i-\bar{x})^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^k n_i\bar{x}_i,$$

has the $\chi^2$-distribution with $k-1$ degrees of freedom. From this fact, and since the $\bar{x}_i$ are distributed independently of the $s_i^2$, it follows that

$$F = \frac{\sum_i n_i(\bar{x}_i-\bar{x})^2\big/(k-1)}{\sum_i\sum_\alpha (x_{i\alpha}-\bar{x}_i)^2\big/(n-k)}$$

has the distribution $h_{k-1,\,n-k}(F)$.
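The last ratio is the usual analysis-of-variance statistic, and the computation can be sketched numerically (sample sizes and parameters are arbitrary; scipy's f_oneway returns the same value):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    samples = [rng.normal(10.0, 3.0, size=m) for m in (8, 12, 10)]  # same mean and variance
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand = np.concatenate(samples).mean()
    between = sum(len(s) * (s.mean() - grand) ** 2 for s in samples) / (k - 1)
    within = sum(((s - s.mean()) ** 2).sum() for s in samples) / (n - k)
    F = between / within                       # distributed h_{k-1, n-k}(F) under equal means
    print(F, stats.f.sf(F, k - 1, n - k))      # statistic and its upper-tail probability
    print(stats.f_oneway(*samples).statistic)  # same value from scipy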

5.5 Distribution of Second Order Sample Moments in Samples from a Bivariate Normal Distribution

Let us consider a sample $O_n : (x_{1\alpha}, x_{2\alpha};\ \alpha = 1,2,\ldots,n)$ from the distribution (a) in §5.12. The probability density function for the sample is the product of the $n$ bivariate normal probability densities. We shall find the m.g.f. of the sample variances and twice the covariance,

$$\phi(\theta_{11},\theta_{12},\theta_{22}) = E\big(e^{\theta_{11}a_{11}+2\theta_{12}a_{12}+\theta_{22}a_{22}}\big),$$

where

$$a_{ij} = \sum_{\alpha=1}^n (x_{i\alpha}-\bar{x}_i)(x_{j\alpha}-\bar{x}_j), \qquad i,j = 1,2.$$

We have



$$\phi(\theta_{11},\theta_{12},\theta_{22}) = \frac{|A_{ij}|^{\frac{n}{2}}}{(2\pi)^n}\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} e^{\theta_{11}a_{11}+2\theta_{12}a_{12}+\theta_{22}a_{22}\;-\;\frac{1}{2}\sum_{\alpha=1}^n\sum_{i,j}A_{ij}(x_{i\alpha}-a_i)(x_{j\alpha}-a_j)}\;\prod_{\alpha=1}^n dx_{1\alpha}\,dx_{2\alpha}.$$

The exponent is a quadratic form in the $2n$ variables $x_{1\alpha}, x_{2\alpha}$. The determinant of the matrix of this quadratic form is of order $2n$ and is

$$|B| = \begin{vmatrix} C & D & D & \cdots & D \\ D & C & D & \cdots & D \\ D & D & C & \cdots & D \\ \vdots & & & & \vdots \\ D & D & D & \cdots & C \end{vmatrix},$$

where $C$ is a $2\times 2$ block of elements as follows:

$$C = \begin{pmatrix} A_{11}-2\theta_{11}\big(1-\frac{1}{n}\big) & A_{12}-2\theta_{12}\big(1-\frac{1}{n}\big) \\ A_{12}-2\theta_{12}\big(1-\frac{1}{n}\big) & A_{22}-2\theta_{22}\big(1-\frac{1}{n}\big) \end{pmatrix},$$

and $D$ is a $2\times 2$ block of elements as follows:

$$D = \begin{pmatrix} \frac{2}{n}\theta_{11} & \frac{2}{n}\theta_{12} \\ \frac{2}{n}\theta_{12} & \frac{2}{n}\theta_{22} \end{pmatrix}.$$

If in $|B|$ the first row of elements is subtracted from the third, fifth, etc., and the second row is subtracted from the fourth, sixth, etc., and if in the resulting determinant to the first column is added the third, fifth, etc., and to the second column is added the fourth, sixth, etc., we find that

$$|B| = |A_{ij}-2\theta_{ij}|^{\,n-1}\,|A_{ij}|,$$

which exists if the $\theta_{ij}$ are sufficiently small. Evaluating the Gaussian integral, the m.g.f. of the $a_{11}$, $a_{22}$ and $2a_{12}$ is therefore

$$\textrm{(a)} \qquad \phi(\theta_{11},\theta_{12},\theta_{22}) = |A_{ij}|^{\frac{n}{2}}\,|B|^{-\frac{1}{2}} = |A_{ij}|^{\frac{n-1}{2}}\;|A_{ij}-2\theta_{ij}|^{-\frac{n-1}{2}}.$$

Now if we can find a function $f(a_{11},a_{12},a_{22})$ such that

$$\textrm{(b)} \qquad \phi(\theta_{11},\theta_{12},\theta_{22}) = \iiint_R e^{\theta_{11}a_{11}+2\theta_{12}a_{12}+\theta_{22}a_{22}}\,f(a_{11},a_{12},a_{22})\;da_{11}\,da_{12}\,da_{22},$$

where $R$ is the region in the space of the $a_{ij}$ for which $a_{11} > 0$, $a_{22} > 0$, $-1 < \frac{a_{12}}{\sqrt{a_{11}a_{22}}} < 1$, then $f(a_{11},a_{12},a_{22})$ will be the p.d.f. of the $a_{ij}$. The uniqueness of the solution can be argued from the multivariate analogue of Theorem (B) of §2.81.

Denoting $|A_{ij}|$ by $A$ and $A_{ij}-2\theta_{ij}$ by $\bar{A}_{ij}$, and choosing values of the $\theta_{ij}$ small enough for $\|\bar{A}_{ij}\|$ to be positive definite, we can write

$$\phi = A^{\frac{n-1}{2}}\big(\bar{A}_{11}\bar{A}_{22}\big)^{-\frac{n-1}{2}}\big(1-k^2\big)^{-\frac{n-1}{2}}, \qquad k^2 = \frac{\bar{A}_{12}^2}{\bar{A}_{11}\bar{A}_{22}},$$

and we can expand $(1-k^2)^{-\frac{n-1}{2}}$ into the infinite series $\sum_{i=0}^{\infty}\frac{\Gamma(\frac{n-1}{2}+i)}{\Gamma(\frac{n-1}{2})\,i!}\,k^{2i}$. But

$$\bar{A}_{11}^{-\left(\frac{n-1}{2}+i\right)} = \frac{1}{\Gamma(\frac{n-1}{2}+i)\,2^{\frac{n-1}{2}+i}}\int_0^\infty a_{11}^{\frac{n-1}{2}+i-1}\,e^{-\frac{1}{2}\bar{A}_{11}a_{11}}\,da_{11},$$

and a similar expression holds for $\bar{A}_{22}^{-\left(\frac{n-1}{2}+i\right)}$. Therefore, we may write

$$\textrm{(e)} \qquad \phi = A^{\frac{n-1}{2}}\sum_{i=0}^\infty\Big[\frac{1}{\Gamma(\frac{n-1}{2})\,i!\;\Gamma(\frac{n-1}{2}+i)\,2^{\,n-1+2i}}\Big]\,\bar{A}_{12}^{2i}\int_0^\infty\!\!\int_0^\infty(a_{11}a_{22})^{\frac{n-1}{2}+i-1}\,e^{-\frac{1}{2}\bar{A}_{11}a_{11}-\frac{1}{2}\bar{A}_{22}a_{22}}\,da_{11}\,da_{22}.$$

If in the bracketed factor $[\;]$ we make use of the duplication formula $\Gamma(2s) = \frac{2^{2s-1}}{\sqrt{\pi}}\Gamma(s)\Gamma(s+\tfrac{1}{2})$, and note, from the definition of the Beta function, §3.3, that

$$\int_{-1}^{1} r^{2i}\big(1-r^2\big)^{\frac{n-4}{2}}\,dr = \int_0^1 t^{\,i-\frac{1}{2}}(1-t)^{\frac{n-2}{2}-1}\,dt = \frac{\Gamma(i+\frac{1}{2})\,\Gamma(\frac{n-2}{2})}{\Gamma(\frac{n-1}{2}+i)},$$

we may write $[\;]$ as

$$\textrm{(f)} \qquad [\;] = \frac{1}{\sqrt{\pi}\;\Gamma(\frac{n-1}{2})\,\Gamma(\frac{n-2}{2})\,2^{\,n-1}\,(2i)!}\int_{-1}^{1} r^{2i}\big(1-r^2\big)^{\frac{n-4}{2}}\,dr,$$

terms in odd powers of $r$ vanishing upon integration. Making use of this value of $[\;]$ in (e), and summing the series $\sum_i\big(\bar{A}_{12}\sqrt{a_{11}a_{22}}\,r\big)^{2i}/(2i)!$, together with the vanishing odd terms, into the exponential, we have

$$\textrm{(g)} \qquad \phi = \int_0^\infty\!\!\int_0^\infty\!\!\int_{-1}^{1}\frac{A^{\frac{n-1}{2}}\,(a_{11}a_{22})^{\frac{n-3}{2}}\,(1-r^2)^{\frac{n-4}{2}}}{2^{\,n-1}\sqrt{\pi}\;\Gamma(\frac{n-1}{2})\,\Gamma(\frac{n-2}{2})}\;e^{-\frac{1}{2}\bar{A}_{11}a_{11}-\frac{1}{2}\bar{A}_{22}a_{22}-\bar{A}_{12}\sqrt{a_{11}a_{22}}\,r}\;dr\,da_{11}\,da_{22}.$$

Setting $a_{12} = r\sqrt{a_{11}a_{22}}$, (g) can be expressed as (b), where

$$\textrm{(h)} \qquad f(a_{11},a_{12},a_{22}) = \frac{|A_{ij}|^{\frac{n-1}{2}}\,\big(a_{11}a_{22}-a_{12}^2\big)^{\frac{n-4}{2}}}{2^{\,n-1}\sqrt{\pi}\;\Gamma(\frac{n-1}{2})\,\Gamma(\frac{n-2}{2})}\;e^{-\frac{1}{2}\left(A_{11}a_{11}+2A_{12}a_{12}+A_{22}a_{22}\right)}.$$
As we mentioned earlier, the uniqueness of this p.d.f. may be argued from the multivariate analogue of Theorem (B), §2.81.

The sampling distribution of the correlation coefficient $r$ may be found by setting $a_{12} = r\sqrt{a_{11}a_{22}}$ in (h), expanding $e^{-A_{12}r\sqrt{a_{11}a_{22}}}$ into an infinite series, and integrating with respect to $a_{11}$ and $a_{22}$; we obtain as the probability element of $r$

$$\textrm{(i)} \qquad \frac{2^{\,n-3}}{\pi\,(n-3)!}\,\big(1-\rho^2\big)^{\frac{n-1}{2}}\big(1-r^2\big)^{\frac{n-4}{2}}\sum_{j=0}^\infty\frac{(2\rho r)^j}{j!}\,\Gamma^2\Big(\frac{n-1+j}{2}\Big)\;dr,$$

where $\rho = -A_{12}/\sqrt{A_{11}A_{22}}$ is the correlation coefficient of the population.

If $\rho = 0$, the distribution of $r$ is simply

$$\textrm{(j)} \qquad \frac{\Gamma(\frac{n-1}{2})}{\sqrt{\pi}\,\Gamma(\frac{n-2}{2})}\,\big(1-r^2\big)^{\frac{n-4}{2}}\;dr.$$

The distribution (h) may be generalized to the case of a sample from a k-variate normal distribution given by (b) in §5.23. The distribution for the k-variate case, which will be derived in Chapter XI, is

$$\textrm{(k)} \qquad f(\{a_{ij}\}) = \frac{|A_{ij}|^{\frac{n-1}{2}}\;|a_{ij}|^{\frac{n-k-2}{2}}\;e^{-\frac{1}{2}\sum_{i,j}A_{ij}a_{ij}}}{2^{\frac{k(n-1)}{2}}\;\pi^{\frac{k(k-1)}{4}}\;\prod_{i=1}^k\Gamma\big(\frac{n-i}{2}\big)},$$

where $a_{ij} = \sum_{\alpha=1}^n(x_{i\alpha}-\bar{x}_i)(x_{j\alpha}-\bar{x}_j)$, with $(x_{1\alpha},\ldots,x_{k\alpha};\ \alpha = 1,2,\ldots,n)$ being the sample. Clearly, $n > k$ for this distribution to exist.

This is a very important distribution function and is fundamental in the theory of normal multivariate statistical analysis. It is known as the Wishart distribution.
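The statement of (k) can be checked by simulation for $k = 2$ (a sketch; the covariance matrix, $n$ and repetition count are arbitrary; scipy's wishart with $n-1$ degrees of freedom and scale equal to the population covariance describes the matrix of the $a_{ij}$):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, k = 20, 2
    Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])   # population covariance matrix
    reps = 10000
    A = np.empty((reps, k, k))
    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
        D = X - X.mean(axis=0)
        A[r] = D.T @ D              # a_ij = sum over the sample of deviation products
    print(A.mean(axis=0) / (n - 1))  # ~ Sigma, since E(a_ij) = (n-1) Sigma_ij
    # a_11 / Sigma_11 should be chi^2 with n-1 degrees of freedom:
    print(stats.kstest(A[:, 0, 0] / Sigma[0, 0], "chi2", args=(n - 1,)).pvalue)
    W = stats.wishart(df=n - 1, scale=Sigma).rvs(size=reps, random_state=rng)
    print(W.mean(axis=0) / (n - 1))  # matches the empirical mean above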

5.6 Independence of Second Order Moments and Means in Samples from a Normal Multivariate Distribution

In §5.25 it was shown that in samples of size $n$ from a normal distribution $N(a,\sigma^2)$, the quantities $(1/\sigma^2)\sum_\alpha(x_\alpha-\bar{x})^2$ and $\sqrt{n}(\bar{x}-a)/\sigma$ were independently distributed according to $f_{n-1}(\chi^2)$ and $N(0,1)$, respectively.

In the case of samples of size $n$ from the k-variate normal distribution (b), §5.23, the two sets of quantities $a_{ij} = \sum_\alpha(x_{i\alpha}-\bar{x}_i)(x_{j\alpha}-\bar{x}_j)$ $(i,j = 1,2,\ldots,k)$ and $\bar{x}_i$ $(i = 1,2,\ldots,k)$ are independently distributed according to (k), §5.5, and (e), §5.12, respectively. A straightforward method of establishing the independence of the two systems is by evaluating the moment generating function of the $a_{ij}$ $(i \le j)$ and the $(\bar{x}_i-a_i)\sqrt{n}$:

$$\phi = E\Big(e^{\sum_{i\le j}\theta_{ij}a_{ij}\;+\;\sum_i\theta_i(\bar{x}_i-a_i)\sqrt{n}}\Big),$$

where $\theta_{ij} = \theta_{ji}$, which turns out to be a product of the form $\phi_1(\theta_{ij})\cdot\phi_2(\theta_i)$, i.e., the two sets of variates are independently distributed.
CHAPTER VI 
ON THE THEORY OF STATISTICAL ESTIMATION 

Let $O_n$ be a sample from a population whose c.d.f. depends on $h$ parameters $\theta_1, \theta_2, \ldots, \theta_h$. Suppose the functional form of the c.d.f. is known, but the true values of the parameters are unknown. A fundamental problem in the theory of statistical estimation is the following: On the basis of the evidence of $O_n$, can we assign an interval for one of the parameters, say $\theta_1$, and then state with a given amount of confidence (the meaning of this phrase will have to be defined) that the true value of $\theta_1$ lies in this interval? More generally, can we make similar statements regarding a subset of the parameters, say $\theta_1, \theta_2, \ldots, \theta_m$, $m \le h$, and a region in the parameter space? These problems are discussed in §6.1. If instead of assigning on the basis of $O_n$ an interval of values in which we estimate the true parameter value to be contained, we wish to assign a single value, the problem is more difficult: We can hardly hope that our "point estimate" will coincide exactly with the true value. In what sense can such an estimate be said to be "good"? How can "good" estimates be found? These questions are considered in §6.2. Closely related to the problem of point estimation of one or more parameters are questions of curve fitting; these are taken up in §6.4.

The problems described above may be called parametric problems in statistical estimation. There are also non-parametric cases of statistical estimation. One of these is the problem of tolerance limits, which may be formulated as follows: Suppose a sample $O_n$ is from a population in which the random variable $x$ is continuous. Can we determine functions $L_1$ and $L_2$ of the $x$'s in the sample such that we can state with a given probability that $100p\%$ of the $x$'s in the population will be included in the interval $(L_1, L_2)$, no matter what the population distribution is? Or no matter what the values of the parameters are if the functional form of the distribution is known? This problem is discussed in §6.3. Some of the underlying sampling theory is discussed in §4.55.

6.1 Confidence Intervals and Confidence Regions

In this section we consider the estimation of one or more parameters by means of statements that the parameter lies, or the parameters lie, in a certain region of the parameter space. The discussion of the example of §6.11 should be carefully studied: while this will not be repeated elsewhere, the analogous considerations pertain in every case taken up in §§6.11-6.13.



6.11 Case in which the Distribution Depends on Only One Parameter

It will be clearest if we begin by means of an example (range of a rectangular distribution): Let $R$ be the range of a sample $O_n$ from a population with the p.d.f.

$$f(x;\theta) = 1/\theta \text{ when } 0 \le x \le \theta, \quad \text{and } 0 \text{ otherwise.}$$

It has been shown in §4.54 that the p.d.f. of $R$ is

$$f_n(R;\theta) = n(n-1)\,\theta^{-n}\,R^{n-2}(\theta-R), \qquad 0 \le R \le \theta.$$

If we introduce the function

$$\psi = R/\theta,$$

we find that the distribution of this function of sample and parameter is independent of the true value of the parameter; its p.d.f. is

$$g(\psi) = n(n-1)\,\psi^{n-2}(1-\psi), \qquad 0 \le \psi \le 1.$$

We pick a positive number $\varepsilon < 1$ (it is customary to take .95 or .99) and define $\psi_\varepsilon$ from

$$\int_{\psi_\varepsilon}^1 g(\psi)\,d\psi = \varepsilon.$$

Then regardless of the true value of $\theta$,

$$\Pr\big(\psi_\varepsilon \le R/\theta \le 1\big) = \varepsilon,$$

which is equivalent to the statement

$$\textrm{(a)} \qquad \Pr\big(R \le \theta \le R/\psi_\varepsilon\big) = \varepsilon.$$

It should be noted that $R$ is the random variable in this statement and not $\theta$. The interval $\delta : (R,\,R/\psi_\varepsilon)$ is called a confidence interval for $\theta$, and $\varepsilon$ is called the confidence coefficient. Let us examine the significance of the probability statement (a):

First of all, (a) does not mean that if we take the value of $R$ from a specific sample, say $R = R'$, that the probability that

$$\textrm{(b)} \qquad R' \le \theta \le R'/\psi_\varepsilon$$

is $\varepsilon$: For, $\theta$ is not a random variable; it is a constant, even if unknown, and hence the statement (b) is true or false. If (b) is true the probability is unity, and if false, zero; in no case is it $\varepsilon$. The situation is analogous to the random drawing (with replacement) of balls from the classical urn, in which the proportion of white balls is $\varepsilon$, of black balls, $1-\varepsilon$. After we have drawn a ball the randomness of the process is over, the particular ball drawn is either black or white, and probability statements, aside from the trivial one that $p = 0$ or $1$, are no longer possible. However, if we draw a large number of balls we may expect that the percentage of white balls drawn will closely approximate $100\varepsilon$. More precisely: The law of large numbers (§3.11) tells us that the proportion of white balls drawn converges stochastically to $\varepsilon$ as the number of drawings is increased.

We now see the practical significance of the probability statement (a): If we always use confidence coefficient $\varepsilon$ and always assert that the true value of the parameter $\theta$ (it need not always be the same parameter) lies in the interval obtained by putting the sample values into the confidence interval, then in the long run (i.e., in repeated sampling) the percentage of correct statements can be expected to be very close to $100\varepsilon$. Again more precisely, we should say that the probability that the proportion of correct statements departs from $\varepsilon$ by more than a fixed amount $h > 0$ approaches zero as the number of statements (i.e., number of samples) is increased, no matter how small $h$.
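The repeated-sampling interpretation can be made concrete for the rectangular example (a sketch; the true $\theta$, $n$ and $\varepsilon$ below are arbitrary): solving $\int_{\psi_\varepsilon}^1 g(\psi)\,d\psi = \varepsilon$ numerically and counting how often the random interval $(R,\,R/\psi_\varepsilon)$ covers $\theta$ gives a proportion close to $\varepsilon$.

    import numpy as np
    from scipy.optimize import brentq

    def psi_eps(n, eps):
        # G(psi) = n psi^{n-1} - (n-1) psi^n is the c.d.f. of psi = R/theta;
        # solve G(psi_eps) = 1 - eps, so that Pr(psi_eps <= R/theta) = eps
        G = lambda p: n * p ** (n - 1) - (n - 1) * p ** n
        return brentq(lambda p: G(p) - (1 - eps), 1e-9, 1 - 1e-9)

    rng = np.random.default_rng(6)
    theta, n, eps = 4.0, 10, 0.95
    pe = psi_eps(n, eps)
    hits = 0
    for _ in range(20000):
        x = rng.uniform(0, theta, n)
        R = x.max() - x.min()
        hits += (R <= theta <= R / pe)   # does (R, R/psi_eps) cover theta?
    print(hits / 20000)                  # close to eps = .95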

In general, if a distribution depends on one parameter $\theta$, and if we have two functions $\underline{\theta}(O_n)$ and $\bar{\theta}(O_n)$ which depend on the sample $O_n$ but not on $\theta$, so that the interval

$$\delta(O_n) : \big(\underline{\theta}(O_n),\,\bar{\theta}(O_n)\big)$$

is a random interval, then, if the probability that the random interval $\delta$ cover the true value of the parameter is $\varepsilon$,

$$\Pr\{\theta \in \delta(O_n)\} = \varepsilon,$$

whatever be the true value $\theta$, we call $\delta(O_n)$ a confidence interval for $\theta$, and $\varepsilon$ the confidence coefficient. We shall sometimes refer to the pair $\underline{\theta}$, $\bar{\theta}$ of random variables as confidence limits. This terminology is due to Neyman.

The method of finding confidence intervals that was employed in the example is worth noting: It depends on finding a function $\psi$ of $O_n$ and $\theta$ whose distribution is independent of $\theta$. If the function $\psi$ is monotone and continuous in $\theta$, then a relation of the form

$$\Pr\big(\psi_1 \le \psi(O_n,\theta) \le \psi_2\big) = \varepsilon$$

can be inverted to read

$$\Pr\big(\theta \in \delta(O_n)\big) = \varepsilon,$$

where $\delta(O_n)$ is the confidence interval. Another, perhaps more direct, method of determining confidence intervals is as follows:

Suppose $T(x_1,\ldots,x_n)$ is a function of a sample $O_n : (x_1, x_2, \ldots, x_n)$ from a population with distribution element $f(x;\theta)dx$, such that the probability element of $T$ is $g(T;\theta)dT$. Suppose the range of values of $T$ having non-zero probability density is $(a,b)$, and suppose the range of possible values of $\theta$ is $(\alpha,\beta)$.* Suppose two continuous monotone increasing functions $\underline{T}(\theta)$ and $\bar{T}(\theta)$ exist such that

$$\textrm{(d)} \qquad \int_a^{\underline{T}(\theta)} g(T;\theta)\,dT = p, \qquad \int_{\bar{T}(\theta)}^b g(T;\theta)\,dT = q,$$

where $p$ and $q$ are positive such that $p + q = 1 - \varepsilon$. Assume that $g(T;\theta)$ is such that $\underline{T}$ and $\bar{T}$ each ranges from $a$ to $b$ as $\theta$ ranges from $\alpha$ to $\beta$. Then for a given value of $T$, let $\bar{\theta}$ and $\underline{\theta}$ be the values of $\theta$ for which $\underline{T}(\bar{\theta}) = T$, $\bar{T}(\underline{\theta}) = T$, respectively. Then $(\underline{\theta},\bar{\theta})$ is a confidence interval for $\theta$ with confidence coefficient $\varepsilon$. That $(\underline{\theta},\bar{\theta})$ is a confidence interval follows from the relation

$$\Pr\big(\underline{T}(\theta) \le T \le \bar{T}(\theta)\big) = \varepsilon,$$

which, because of the continuous monotonic character of $\underline{T}(\theta)$ and $\bar{T}(\theta)$, may be inverted and written as $\Pr(\underline{\theta} \le \theta \le \bar{\theta}) = \varepsilon$. It should be noted that we may obtain confidence limits for each value of $p$ if functions $\underline{T}(\theta)$ and $\bar{T}(\theta)$ of the required kind exist for each $p$. The question arises as to which value of $p$ is "best". This would depend, of course, on what definition of "best" we choose. In those cases where the mean value of the length of the confidence interval is a function which factors in the form $h_1(p)h_2(\theta)$, common sense suggests that we should choose $p$ so that the mean length is a minimum. In the case of large samples, the definition of "best" confidence intervals is fairly direct (see §6.12).

We may represent confidence intervals obtained by this process graphically as follows:



*We permit $a$ or $\alpha$ to be $-\infty$, $b$ or $\beta$ to be $+\infty$.







[Figure 6: the curves $T = \underline{T}(\theta)$ and $T = \bar{T}(\theta)$ plotted against $\theta$, with the true value $\theta_0$ marked on the $\theta$-axis.]

Suppose the true value of $\theta$ is $\theta_0$. For any sample value of $T$, the corresponding confidence interval is formed as follows: Draw a line parallel to the $\theta$-axis, defined by $T$ = sample value. Let A, B be the points of intersection of this line with the two curves as indicated in Fig. 6. The confidence interval is the projection of the segment AB on the $\theta$-axis. The confidence interval will cover the true value $\theta_0$ if and only if the segment AB crosses the line $\theta = \theta_0$, that is, if and only if $T$ falls in the range $\underline{T}(\theta_0)$, $\bar{T}(\theta_0)$. But the probability of $T$ falling in this interval is precisely $\varepsilon$. We thus have $\Pr(\underline{\theta} \le \theta_0 \le \bar{\theta}) = \varepsilon$. The discussion and conclusion hold for any $\theta_0$ in the range $(\alpha,\beta)$.

This method, for example, has been applied by R. A. Fisher to the problem of determining the confidence limits for $\rho$ from the distribution (i), §5.5, of the correlation coefficient $r$. Fisher uses the term fiducial limits instead of confidence limits.

The idea involved in this method has also been applied to cases where $T$ is a discrete random variable, to obtain approximate confidence limits for the parameter involved. In this case the equality signs in the analogue of (d) for the discrete case are replaced by inequality signs, and the largest value of $\underline{T}(\theta)$ and the smallest value of $\bar{T}(\theta)$ are obtained satisfying the inequalities. $\underline{T}(\theta)$ and $\bar{T}(\theta)$ will be step-functions, and the approximate confidence limits are obtained by drawing a smooth curve through the graphs of the step-functions. For example, Clopper and Pearson (Biometrika, Vol. 26 (1934), pp. 404-413) have applied the method to the problem of determining approximate confidence limits for the binomial probability parameter $p$ from the statistic $x$ in the binomial distribution


$C_x^n\,p^x q^{n-x}$ $(x = 0,1,2,\ldots,n)$, and Ricker (Journal of the American Statistical Association, Vol. 32 (1937), pp. 349-356) has applied the method to the Poisson distribution $\frac{m^x e^{-m}}{x!}$, where $m$ is the parameter and $x$ the statistic. A method of determining confidence limits for $\theta$ from large samples based on the likelihood function is given in §6.12.

6.12 Confidence Limits from Large Samples

Suppose $x$ has c.d.f. $F(x,\theta)$, where $\theta$ is a parameter. Let $O_n$ be a sample of size $n$ from a population having this c.d.f. Let $P(O_n,\theta)$ be the likelihood function, i.e.

$$\textrm{(a)} \qquad P(O_n,\theta) = \prod_{i=1}^n f(x_i,\theta),$$

where $f(x,\theta)$ is the p.d.f. if $x$ is a continuous variable, and is simply the probability of $x$ if $x$ is a discrete variable.*

We recall the first method of obtaining confidence intervals given in §6.11, which depends on finding a function $\psi$ of $O_n$ and $\theta$ whose distribution is independent of $\theta$. That a function of the desired type for large samples may be obtained from the likelihood function $P(O_n,\theta)$ may be concluded by use of the central limit Theorem (C) of §4.21. The central limit theorem applies to a sum (the average), so we replace the product in (a) by a sum by taking logarithms:

$$\textrm{(b)} \qquad \log P(O_n,\theta) = \sum_{i=1}^n y_i,$$

where $y_i = \log f(x_i,\theta)$ may be regarded as a random variable for any fixed $\theta$. To apply the central limit theorem we need $E(y)$ and $\sigma_y$, where $y = \log f(x,\theta)$. Now

$$\textrm{(c)} \qquad E(y) = \int_{-\infty}^{+\infty}\log f(x,\theta)\,d_xF(x,\theta),$$

where $d_xF(x,\theta) = f(x,\theta)dx$ in the continuous case, and the integral (c) becomes a sum in the discrete case. The calculation (c) does not give a simple result, but it is clear that if we employed $z = \partial y/\partial\theta$, then

$$E(z) = \int_{-\infty}^{+\infty}\frac{\partial\log f(x,\theta)}{\partial\theta}\,d_xF(x,\theta),$$

and in the continuous case this becomes

$$E(z) = \int_{-\infty}^{+\infty}\frac{\partial f(x,\theta)}{\partial\theta}\,dx.$$

*If $x$ is discrete, $f(x,\theta) = F(x,\theta) - F(x-0,\theta)$.



If the order of integration and differentiation may be interchanged,

$$\textrm{(d)} \qquad E(z) = 0.$$

Let us now assume (d) to be true in any case, and furthermore that $A^2 = E(z^2)$ is finite. Differentiating (b) we get

$$\frac{\partial\log P}{\partial\theta} = \sum_{i=1}^n z_i, \qquad z_i = \frac{\partial\log f(x_i,\theta)}{\partial\theta},$$

and hence

$$\frac{1}{\sqrt{n}\,A}\,\frac{\partial\log P}{\partial\theta} = \frac{\sqrt{n}\,\bar{z}}{A}.$$

Applying the central limit theorem to $\bar{z}$ we have that

$$\frac{1}{\sqrt{n}\,A}\,\frac{\partial\log P(O_n,\theta)}{\partial\theta}$$

is asymptotically distributed according to $N(0,1)$. We summarize in

Theorem (A): If

$$E\Big[\frac{\partial}{\partial\theta}\log f(x,\theta)\Big] = 0, \quad \text{and} \quad A^2 = E\Big\{\Big[\frac{\partial}{\partial\theta}\log f(x,\theta)\Big]^2\Big\}$$

is finite, then

$$\frac{1}{\sqrt{n}\,A}\,\frac{\partial}{\partial\theta}\log P(O_n,\theta)$$

is asymptotically distributed according to $N(0,1)$.

Hence we have, approximately, for large $n$,

$$\textrm{(e)} \qquad \Pr\Big(-d \le \frac{1}{\sqrt{n}\,A}\,\frac{\partial\log P}{\partial\theta} \le d\Big) = \varepsilon,$$

where $d$ is chosen so that

$$\varepsilon = \frac{1}{\sqrt{2\pi}}\int_{-d}^d e^{-\frac{y^2}{2}}\,dy.$$

Now if $\frac{1}{\sqrt{n}A}\frac{\partial\log P}{\partial\theta}$ is monotone in $\theta$, we may invert in (e) and write the result



$$\textrm{(f)} \qquad \Pr(\underline{\theta} \le \theta \le \bar{\theta}) = \varepsilon.$$

The asymptotic confidence intervals (f) furnished by $\frac{1}{\sqrt{n}A}\frac{\partial\log P}{\partial\theta}$ are optimum in the following sense: The mean value of $\Big[\frac{\partial}{\partial\theta}\Big(\frac{1}{\sqrt{n}A}\frac{\partial\log P}{\partial\theta}\Big)\Big]^2$ is greater than that of any other function $G(O_n,\theta)$ of the sample which has $N(0,1)$ as its limiting distribution.* This maximum property of the mean squared rate of change with respect to $\theta$ implies shortest average confidence intervals in a certain sense, since confidence intervals are obtained by taking the inverse of $\frac{1}{\sqrt{n}A}\frac{\partial\log P}{\partial\theta}$ with respect to $\theta$.

Example: Suppose samples of size $n$ are drawn from a population having the binomial distribution

$$f(x,p) = p^x(1-p)^{1-x}, \qquad x = 0,1.$$

In a sample of size $n$,

$$P(O_n,p) = p^{n_1}(1-p)^{n-n_1},$$

where $n_1 = \sum_{i=1}^n x_i$. We verify that $E(\partial\log f/\partial p) = 0$ and calculate

$$A^2 = E\Big[\Big(\frac{x}{p}-\frac{1-x}{1-p}\Big)^2\Big] = \frac{1}{p(1-p)},$$

and

$$\frac{1}{\sqrt{n}\,A}\,\frac{\partial\log P}{\partial p} = \frac{\sqrt{p(1-p)}}{\sqrt{n}}\Big(\frac{n_1}{p}-\frac{n-n_1}{1-p}\Big) = \frac{(\bar{x}-p)\sqrt{n}}{\sqrt{p(1-p)}}.$$

Therefore, to find approximate confidence limits with confidence coefficient $\varepsilon$, we invert the expression

$$\Pr\Big(-d \le \frac{(\bar{x}-p)\sqrt{n}}{\sqrt{p(1-p)}} \le d\Big) = \varepsilon,$$

*For proof for the case where $G(O_n)$ is of the form $h(\bar{x},\theta)$ and is asymptotically distributed according to $N(0,1)$, see S. S. Wilks, Annals of Math. Stat., Vol. 9 (1938), pp. 166-175, and for more general results, see A. Wald, Annals of Math. Stat., Vol. 13 (1942), pp. 127-137.



obtaining

$$\Pr(\underline{p} \le p \le \bar{p}) = \varepsilon,$$

where $\underline{p}$ and $\bar{p}$ are given as the roots of the quadratic $n(\bar{x}-p)^2 = d^2\,p(1-p)$.
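Solving this quadratic is a one-line computation (a sketch; the helper name and the sample values are illustrative):

    import numpy as np
    from scipy import stats

    def binom_limits(xbar, n, eps):
        # roots of n (xbar - p)^2 = d^2 p (1 - p), i.e.
        # (n + d^2) p^2 - (2 n xbar + d^2) p + n xbar^2 = 0
        d = stats.norm.ppf((1 + eps) / 2)
        a, b, c = n + d**2, -(2 * n * xbar + d**2), n * xbar**2
        disc = np.sqrt(b**2 - 4 * a * c)
        return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

    print(binom_limits(xbar=0.3, n=100, eps=0.95))   # roughly (0.22, 0.40)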
6.13 Confidence Intervals in the Case where the Distribution Depends on Several Parameters

Suppose that the c.d.f. of the population depends on parameters $\theta_1, \theta_2, \ldots, \theta_h$, and we wish to estimate $\theta_1$. If there exist functions $\underline{\theta}_1(O_n)$, $\bar{\theta}_1(O_n)$ of the sample, such that the probability that the random interval

$$\delta(O_n) : \big(\underline{\theta}_1(O_n),\,\bar{\theta}_1(O_n)\big)$$

cover the true value of $\theta_1$ does not depend on the true values of $\theta_1, \theta_2, \ldots, \theta_h$,

$$\Pr\{\theta_1 \in \delta(O_n)\} = \varepsilon, \quad \text{independent of } \theta_1, \theta_2, \ldots, \theta_h,$$

then we say that $\delta(O_n)$ is a confidence interval for $\theta_1$ with confidence coefficient $\varepsilon$. (The parameters $\theta_2, \theta_3, \ldots, \theta_h$ are sometimes called nuisance parameters.)

Example 1 (Mean of a normal population): If $O_n$ is a sample from a population with distribution $N(a,\sigma^2)$, then, in the notation of §5.3,

$$t = \sqrt{n}\,(\bar{x}-a)/s$$

has the t-distribution $g_{n-1}(t)$ with $n-1$ degrees of freedom. Define $t_\varepsilon$ from

$$\int_{-t_\varepsilon}^{t_\varepsilon} g_{n-1}(t)\,dt = \varepsilon.$$

Then

$$\Pr(-t_\varepsilon \le t \le t_\varepsilon) = \varepsilon$$

whatever be the true values of $a$ and $\sigma^2$. Hence $(\bar{x}-t_\varepsilon s/\sqrt{n},\ \bar{x}+t_\varepsilon s/\sqrt{n})$ is a confidence interval for $a$ with confidence coefficient $\varepsilon$.

Example 2 (Difference of means of two normal populations known to have the same variance): Let $O_{n_i} : (x_{i1}, x_{i2}, \ldots, x_{in_i})$ be a sample of size $n_i$ from $N(a_i,\sigma^2)$, $i = 1,2$. Let

$$S_i = \sum_{\alpha=1}^{n_i}(x_{i\alpha}-\bar{x}_i)^2, \qquad d = \bar{x}_1-\bar{x}_2, \qquad a = a_1-a_2.$$

Then by §5.25, $S_i/\sigma^2$ has the $\chi^2$-distribution with $n_i-1$ degrees of freedom; hence (§5.23) $S/\sigma^2$, where $S = S_1+S_2$, has the $\chi^2$-distribution with $n_1+n_2-2$ degrees of freedom. Furthermore, $y = (d-a)/[\sigma^2(n_1^{-1}+n_2^{-1})]^{1/2}$ has the distribution $N(0,1)$, and since $y$ and $S/\sigma^2$ are statistically independent (§5.25), it follows from §5.3 that $\sigma y/[S/(n_1+n_2-2)]^{1/2}$ has the t-distribution with $n_1+n_2-2$ degrees of freedom, $g_{n_1+n_2-2}(t)$. Defining $t_\varepsilon$ from

$$\int_{-t_\varepsilon}^{t_\varepsilon} g_{n_1+n_2-2}(t)\,dt = \varepsilon,$$

we find by the method of Example 1 that a confidence interval for $a$ is $(d-t_\varepsilon s',\ d+t_\varepsilon s')$, where

$$s' = \Big[\frac{S\,(n_1^{-1}+n_2^{-1})}{n_1+n_2-2}\Big]^{1/2}.$$
Example 3 (Variance of a normal distribution): Let $O_n$ be a sample from $N(a,\sigma^2)$. Let

$$S = \sum_{i=1}^n(x_i-\bar{x})^2.$$

Then (§5.25)

$$\frac{S}{\sigma^2}$$

has the $\chi^2$-distribution $f_{n-1}(\chi^2)$ with $n-1$ degrees of freedom. Let $\chi_1^2$, $\chi_2^2$ be two points on the range $(0,\infty)$ such that

$$\int_{\chi_1^2}^{\chi_2^2} f_{n-1}(\chi^2)\,d(\chi^2) = \varepsilon.$$

We find that $(S/\chi_2^2,\ S/\chi_1^2)$ is a confidence interval for $\sigma^2$ with confidence coefficient $\varepsilon$.



Example 4 (Ratio of the variances of two normal distributions): Let $O_{n_i} : (x_{i1}, \ldots, x_{in_i})$ be a sample of size $n_i$ from $N(a_i,\sigma_i^2)$, $i = 1,2$. Let

$$T = \frac{s_1^2}{s_2^2}, \qquad \theta = \frac{\sigma_1^2}{\sigma_2^2}.$$

Since $(n_i-1)s_i^2/\sigma_i^2$, for $i = 1,2$, are independently distributed according to $\chi^2$-distributions with $n_i-1$ degrees of freedom respectively (the two samples being statistically independent), it follows from §5.4 that $T/\theta$ has the F-distribution $h_{n_1-1,\,n_2-1}(F)$ with $n_1-1$ and $n_2-1$ degrees of freedom. Pick a pair of limits $F_1$, $F_2$ so that

$$\int_{F_1}^{F_2} h_{n_1-1,\,n_2-1}(F)\,dF = \varepsilon.$$

Then a confidence interval for $\theta$ is $(T/F_2,\ T/F_1)$.
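A sketch of Example 4 in code (parameters arbitrary; $F_1$ and $F_2$ are taken with equal tail probabilities, one of the many admissible choices discussed in §6.11):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    x1 = rng.normal(0.0, 3.0, size=15)      # sigma_1^2 = 9
    x2 = rng.normal(0.0, 2.0, size=12)      # sigma_2^2 = 4, true ratio 2.25
    n1, n2 = len(x1), len(x2)
    T = x1.var(ddof=1) / x2.var(ddof=1)
    eps = 0.95                              # equal-tail choice for F_1, F_2
    F1 = stats.f.ppf((1 - eps) / 2, n1 - 1, n2 - 1)
    F2 = stats.f.ppf((1 + eps) / 2, n1 - 1, n2 - 1)
    print(T / F2, T / F1)                   # confidence interval for sigma_1^2 / sigma_2^2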

6.14 Confidence Regions

We suppose that the population distribution depends on parameters $\theta_1, \theta_2, \ldots, \theta_h$. We denote the parameter point $(\theta_1, \theta_2, \ldots, \theta_h)$ by $\Theta$, and the entire h-dimensional space of admissible parameter values by $\Omega$. If $\delta(O_n)$ is a random region in $\Omega$ which depends on the sample $O_n$, but not on the unknown parameter point $\Theta$, and if the probability that the random region $\delta(O_n)$ cover the true parameter point is independent of $\Theta$,

$$\Pr\{\Theta \in \delta(O_n)\} = \varepsilon, \quad \text{independent of } \Theta,$$

then we say that $\delta(O_n)$ is a confidence region for $\Theta$, with confidence coefficient $\varepsilon$.

It may be desired to estimate only a subset $\theta_1, \theta_2, \ldots, \theta_m$, $m < h$, of the $h$ parameters (the remaining parameters are called nuisance parameters). Denote the m-dimensional space of $\Theta' : (\theta_1, \theta_2, \ldots, \theta_m)$ by $\Omega'$. If $\delta'(O_n)$ is a random region in $\Omega'$ such that

$$\Pr\{\Theta' \in \delta'(O_n)\} = \varepsilon, \quad \text{independent of } \Theta,$$

whatever be the true value $\Theta$, then $\delta'(O_n)$ is said to be a confidence region for $\Theta'$ with confidence coefficient $\varepsilon$.



Example: Suppose $O_{n_1}$ and $O_{n_2}$ are samples from normal populations $N(a_1,\sigma^2)$ and $N(a_2,\sigma^2)$, respectively. We know from §5.25 that $S_1/\sigma^2$, $S_2/\sigma^2$ (defined in Example 2, §6.13),

$$\frac{n_1(\bar{x}_1-a_1)^2}{\sigma^2}, \qquad \frac{n_2(\bar{x}_2-a_2)^2}{\sigma^2}$$

are independently distributed according to $\chi^2$-laws with $n_1-1$, $n_2-1$, 1, 1 degrees of freedom respectively. By §5.23 it follows that

$$\frac{S_1+S_2}{\sigma^2}, \qquad \frac{n_1(\bar{x}_1-a_1)^2+n_2(\bar{x}_2-a_2)^2}{\sigma^2}$$

are independently distributed according to $\chi^2$-laws with $n_1+n_2-2$ and 2 degrees of freedom respectively. Hence if we set

$$F = \frac{\big[n_1(\bar{x}_1-a_1)^2+n_2(\bar{x}_2-a_2)^2\big]\big/\,2}{(S_1+S_2)\big/(n-2)},$$

then $F$ is distributed according to $h_{2,\,n-2}(F)$, where $n = n_1+n_2$.

Therefore, if $F_\varepsilon$ is chosen so that

$$\int_0^{F_\varepsilon} h_{2,\,n-2}(F)\,dF = \varepsilon,$$

we may say that

$$\Pr\Big(n_1(\bar{x}_1-a_1)^2+n_2(\bar{x}_2-a_2)^2 \le \frac{2(S_1+S_2)}{n-2}\,F_\varepsilon\Big) = \varepsilon,$$

which is equivalent to the statement that

$$\Pr\{(a_1,a_2) \in \delta(O_n)\} = \varepsilon,$$

where $\delta(O_n)$ is the region in the $(a_1,a_2)$ plane bounded by the random ellipse with equation

$$n_1(\bar{x}_1-a_1)^2+n_2(\bar{x}_2-a_2)^2 = \frac{2(S_1+S_2)}{n-2}\,F_\varepsilon.$$

In other words, the probability is $\varepsilon$ that this ellipse will cover the true parameter point $(a_1,a_2)$.
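The coverage probability of the random ellipse can be verified by simulation (a sketch; all parameter values are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    a1, a2, sigma = 1.0, -2.0, 1.5
    n1, n2, eps, reps = 10, 14, 0.95, 20000
    n = n1 + n2
    Feps = stats.f.ppf(eps, 2, n - 2)
    hits = 0
    for _ in range(reps):
        x1 = rng.normal(a1, sigma, n1)
        x2 = rng.normal(a2, sigma, n2)
        S = ((x1 - x1.mean()) ** 2).sum() + ((x2 - x2.mean()) ** 2).sum()
        lhs = n1 * (x1.mean() - a1) ** 2 + n2 * (x2.mean() - a2) ** 2
        hits += lhs <= 2 * S * Feps / (n - 2)   # does the ellipse cover (a1, a2)?
    print(hits / reps)                           # close to eps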

6.2 Point Estimation: Maximum Likelihood Statistics

Throughout this section we consider the point estimation of a parameter $\theta$ in the c.d.f. of a population. There may be other unknown parameters present; if so, we denote these by $\theta_2, \theta_3, \ldots, \theta_h$. A statistic is any function $T(O_n)$ of the sample, not depending on $\theta$, or on any other parameters if such are present. Point estimation consists of the use of a single statistic for estimating the parameter; confidence intervals, we recall, involve two statistics, the end-points of the confidence interval, satisfying certain conditions (§6.1). Desirable conditions for statistics used as point estimates have been given by R. A. Fisher: An optimum estimate satisfies the criteria of consistency, efficiency, and sufficiency, defined below. A method which sometimes yields optimum statistics is Fisher's method of maximum likelihood.

6.21 Consistency

A statistic $T(O_n)$ is said to be a consistent estimate of $\theta$ if $T$ converges stochastically (§4.21) to $\theta$ as $n \to \infty$. From Theorem (B) of §4.21 we know that whenever the population has finite variance, the sample mean is a consistent estimate of the population mean. We remark that consistency is purely an asymptotic property. If for every $n$, $E(T) = \theta$, then we say that the statistic $T$ is unbiased. It follows from Theorem (A) of §4.21 that the sample mean is always an unbiased estimate of the population mean (whenever the latter exists). The following theorem enables us to recognize the consistency


of statistics in many cases:

Theorem (A): A sufficient condition that $T$ be a consistent statistic for estimating $\theta$ is that $E(T) \to \theta$ and $\sigma_T^2 \to 0$ as $n \to \infty$.

To prove the theorem, write $T(O_n) = T_n$, and fix any $\epsilon > 0$. We need to show that

$$\Pr(|T_n-\theta| > \epsilon) \to 0 \quad \text{as } n \to \infty.$$

Since $E(T_n) \to \theta$, there exists an $N$ such that

$$|E(T_n)-\theta| < \tfrac{\epsilon}{2} \quad \text{for } n > N.$$

We note that for $n > N$ the interval of $T_n$ values $|T_n-E(T_n)| \le \frac{\epsilon}{2}$ is always contained in the interval $|T_n-\theta| \le \epsilon$. Hence the probability of $T_n$ falling outside the latter interval is at most the probability of $T_n$ falling outside the former:

$$\Pr(|T_n-\theta| > \epsilon) \le \Pr\big[|T_n-E(T_n)| > \tfrac{\epsilon}{2}\big] = \Pr\big[|T_n-E(T_n)| > \big(\tfrac{\epsilon}{2\sigma_{T_n}}\big)\sigma_{T_n}\big].$$

By Tchebycheff's inequality (§2.71) the last expression is at most $(2\sigma_{T_n}/\epsilon)^2$, and by hypothesis this $\to 0$ as $n \to \infty$ for all fixed $\epsilon > 0$.

Example: Let $O_n$ be a sample from a population with an arbitrary c.d.f. about which we assume only that the fourth moment $\mu_4$ about the mean exists, and consider $s^2$ as an estimate of $\sigma^2$, where

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2.$$

Then we know $E(s^2) = \sigma^2$, and by use of the well known formula for the variance of $s^2$ it follows from Theorem (A) that $s^2$ is a consistent statistic for estimating $\sigma^2$. If we apply the same theorem to

$$(s')^2 = \frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2,$$

we find it is also a consistent estimate of $\sigma^2$, but, unlike $s^2$, it is biased.

6.22 Efficiency

$T(O_n)$ is said to be an efficient estimate of $\theta$ if

(i) $\sqrt{n}\,(T-\theta)$ is asymptotically distributed according to $N(0,\mu)$ with $\mu < \infty$,

(ii) for any other statistic $T'(O_n)$ such that $\sqrt{n}\,(T'-\theta)$ is asymptotically distributed according to $N(0,\mu')$, $\mu \le \mu'$.

Since the asymptotic mean and variance of $T$ are $\theta$ and $\mu/n$, respectively, it follows from Theorem (A) of §6.21 that (i) implies the consistency of $T$. The efficiency of $T'$ in estimating $\theta$ is defined by $E = \mu/\mu'$.

Example: Consider the sample mean $\bar{x}$ and the sample median $\tilde{x}$ of $O_n$ from $N(a,\sigma^2)$ as estimates of $a$. We have from §5.11 that $\sqrt{n}(\bar{x}-a)$ is distributed according to $N(0,\sigma^2)$, and from §4.53 that $\sqrt{n}(\tilde{x}-a)$ is asymptotically distributed according to $N(0,\frac{\pi}{2}\sigma^2)$. Hence $\bar{x}$ is more efficient than $\tilde{x}$. However, to prove $\bar{x}$ "efficient" it would be necessary to verify condition (ii) of the definition. This example may be generalized as follows: If $O_n$ is from a population with p.d.f. $f(x)$, if the population median $= a$ (the population mean), and if $f(x)$ is continuous at $a$, then, using the results of §4.53 on the asymptotic distribution of $\tilde{x}$, we find that $\bar{x}$ is a more efficient estimate of $a$ than $\tilde{x}$ if $\sigma < [2f(a)]^{-1}$; $\tilde{x}$ is more efficient if $\sigma > [2f(a)]^{-1}$.
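The variance comparison in this example is readily seen in simulation (a sketch; the factor $\pi/2 \approx 1.571$ appears as the ratio of the two printed values):

    import numpy as np

    rng = np.random.default_rng(9)
    a, sigma, n, reps = 0.0, 1.0, 201, 20000
    X = rng.normal(a, sigma, size=(reps, n))
    print(n * X.mean(axis=1).var())         # ~ sigma^2 = 1       (mean)
    print(n * np.median(X, axis=1).var())   # ~ (pi/2) sigma^2 ~ 1.571  (median)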

6.23 Sufficiency

$T$ is said to be a sufficient statistic for estimating $\theta$ if for any other statistic $T'$, the conditional distribution $f(T'|T)$ of $T'$, given $T$, is independent of $\theta$. (We use the same notation $f(T'|T)$ whether the population is continuous or discrete.) Thus, expected values, moments and other probability calculations about $T'$, given $T$, will be calculated from $f(T'|T)$ and hence will not depend on $\theta$, but they will depend on $T$ in general. Or, in Fisher's terminology, a sufficient statistic "exhausts the information" in a sample. We note that sufficiency, unlike consistency and efficiency, is not merely an asymptotic property.

A convenient method of spotting sufficient statistics is embodied in

Theorem (A)*: If the population distribution is continuous, let $P(O_n;\theta,\theta_2,\ldots,\theta_h)$ be the p.d.f. of $O_n$; if the distribution is discrete, let $P(O_n;\theta,\theta_2,\ldots,\theta_h)$ be the discrete probability of $O_n$. In either case a necessary and sufficient condition that $T$ be a sufficient statistic for estimating $\theta$ is that the function $P$ factor in the following manner:

$$P(O_n;\theta,\theta_2,\ldots,\theta_h) = g_1(T;\theta,\theta_2,\ldots,\theta_h)\cdot g_2(O_n;\theta_2,\theta_3,\ldots,\theta_h).$$

A sufficient set of statistics with regard to a set of parameters may be defined, and an analogue of Theorem (A) obtained for that case; see Neyman and Pearson, Statistical Research Memoirs, Vol. 1 (1936), pp. 119-121.

*For proof, see J. Neyman, Giornale dell'Istituto Italiano degli Attuari, Vol. 6 (1935), pp. 320-334.



Example 1: Suppose $O_n$ is from $N(a,\sigma^2)$. Then

$$P(O_n;a,\sigma^2) = \Big(e^{-\frac{n(\bar{x}-a)^2}{2\sigma^2}}\Big)\cdot\Big((2\pi\sigma^2)^{-\frac{n}{2}}\,e^{-\frac{S}{2\sigma^2}}\Big),$$

where

$$S = \sum_{i=1}^n(x_i-\bar{x})^2.$$

(Here and in the following examples the factors corresponding to $g_1$ and $g_2$ of Theorem (A) are separated by a dot.) Hence $\bar{x}$ is a sufficient statistic for estimating $a$. In this case there is no sufficient statistic for $\sigma^2$ alone, but it is easily shown that $\bar{x}$, $S/(n-1)$ are a sufficient set of (unbiased) statistics for $a$ and $\sigma^2$.

Example 2: For $O_n$ from $N(0,\theta)$,

$$P(O_n;\theta) = \Big((2\pi\theta)^{-\frac{n}{2}}\,e^{-\frac{S'}{2\theta}}\Big)\cdot 1,$$

where $S' = \sum_{i=1}^n x_i^2$. Hence $S'$ is a sufficient statistic for estimating $\theta$. $S'/n$ is an unbiased (see Example 2, §6.24) sufficient statistic.

Example 3: Suppose the population has the discrete distribution

$$p(x;\theta) = \theta^x e^{-\theta}/x!, \qquad x = 0,1,2,\ldots$$

We recall from §3.15 that $E(x) = \theta$. For a sample $O_n : (x_1, x_2, \ldots, x_n)$,

$$P(O_n;\theta) = \prod_{i=1}^n \theta^{x_i}e^{-\theta}/x_i!\,.$$

If we write this

$$P(O_n;\theta) = \Big(\theta^{\sum_i x_i}\,e^{-n\theta}\Big)\cdot\Big(1\big/\textstyle\prod_i x_i!\Big),$$

we see that $\sum_i x_i$ is a sufficient statistic for estimating $\theta$. Since $E\big(\sum_i x_i\big) = n\theta$, it follows that $\bar{x} = \sum_i x_i/n$ is an unbiased sufficient statistic.
6.24 Maximum Likelihood Estimates

The function $P(O_n;\theta,\theta_2,\ldots,\theta_h)$ defined in Theorem (A) of §6.23, when considered as a function of the parameter point $\theta,\theta_2,\ldots,\theta_h$ for fixed $O_n$, is called the likelihood of the parameter point. If the likelihood function $P$ has a unique maximum at $\theta = \hat{\theta}(O_n)$, $\theta_2 = \hat{\theta}_2(O_n)$, ..., $\theta_h = \hat{\theta}_h(O_n)$, then the set of statistics $\hat{\theta},\hat{\theta}_2,\ldots,\hat{\theta}_h$ is called the maximum likelihood estimate of the parameter point.



Let us consider the case of one parameter, say $\theta$. In Theorem (A), §6.12, it was shown that under certain conditions the quantity $\frac{1}{\sqrt{n}A}\frac{\partial\log P}{\partial\theta}$ is asymptotically distributed according to $N(0,1)$. Let us assume that $\hat{\theta}$ is the value of $\theta$ which maximizes $P$ and that we can make the following expansion about $\theta = \hat{\theta}$:

$$\textrm{(a)} \qquad \frac{1}{\sqrt{n}A}\frac{\partial\log P}{\partial\theta} = \frac{1}{\sqrt{n}A}\Big(\frac{\partial^2\log P}{\partial\theta^2}\Big)_{\hat{\theta}}(\theta-\hat{\theta}) + \frac{1}{2\sqrt{n}A}\Big(\frac{\partial^3\log P}{\partial\theta^3}\Big)_{\theta^*}(\theta-\hat{\theta})^2 = -AV + UV + \frac{1}{\sqrt{n}}WV^2,$$

where $\theta^*$ is on the interval $(\theta,\hat{\theta})$, and

$$\textrm{(b)} \qquad V = \sqrt{n}\,(\theta-\hat{\theta}), \qquad U = \frac{1}{A}\Big[\frac{1}{n}\Big(\frac{\partial^2\log P}{\partial\theta^2}\Big)_{\hat{\theta}} + A^2\Big], \qquad W = \frac{1}{2A}\cdot\frac{1}{n}\Big(\frac{\partial^3\log P}{\partial\theta^3}\Big)_{\theta^*}.$$

We have employed the fact that $\partial P/\partial\theta$ vanishes for $\theta = \hat{\theta}$. Now from Theorem (A), §6.12,

$$\Pr\Big(\frac{1}{\sqrt{n}A}\frac{\partial\log P}{\partial\theta} < d\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^d e^{-\frac{y^2}{2}}\,dy + \eta_n,$$

where $\eta_n \to 0$ as $n \to \infty$. Making use of (a) we may write

$$\textrm{(c)} \qquad \Pr\Big(-AV + UV + \frac{1}{\sqrt{n}}WV^2 < d\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^d e^{-\frac{y^2}{2}}\,dy + \eta_n.$$

Considering U, V, W as three random variables, the left side of (c) states that the probability of (U, V, W) falling into a certain region in this space is given by the expression on the right. Now let us assume (1) that $\big(\frac{1}{n}\frac{\partial^2\log P}{\partial\theta^2}\big)_{\hat{\theta}}$ converges stochastically to $-A^2$ as $n \to \infty$ (implying that U converges stochastically to 0), (2) that $\big(\frac{1}{n}\frac{\partial^3\log P}{\partial\theta^3}\big)_{\theta^*}$ (and hence $2AW$) converges stochastically to some finite number K, and (3) that V has some limiting non-degenerate p.d.f. as $n \to \infty$ (i.e., has a c.d.f. which is continuous). Then the limiting form of the distribution function in the U, V, W space as $n \to \infty$ is a one-dimensional p.d.f. along the straight line

$$U = 0, \qquad W = \frac{K}{2A}.$$



The p.d.f. on this line is that of the limiting distribution of V. Hence,

$$\lim_{n\to\infty}\Pr\Big(-AV + UV + \frac{1}{\sqrt{n}}WV^2 < d\Big) = \lim_{n\to\infty}\Pr(-AV < d).$$

The equality of the two expressions for $A^2$ is a reasonable assumption, as the reader will see from the following discussion:

$$\textrm{(d)} \qquad \frac{\partial\log f}{\partial\theta} = \frac{1}{f}\,\frac{\partial f}{\partial\theta}.$$

Differentiating this with respect to $\theta$, we get

$$\textrm{(e)} \qquad \frac{\partial^2\log f}{\partial\theta^2} = \frac{1}{f}\,\frac{\partial^2 f}{\partial\theta^2} - \frac{1}{f^2}\Big(\frac{\partial f}{\partial\theta}\Big)^2.$$

Substituting (d) into (e), and integrating with respect to $x$ from $-\infty$ to $+\infty$, we have

$$E\Big(\frac{\partial^2\log f}{\partial\theta^2}\Big) + E\Big[\Big(\frac{\partial\log f}{\partial\theta}\Big)^2\Big] = \int_{-\infty}^{+\infty}\frac{\partial^2 f}{\partial\theta^2}\,dx.$$

Now, if we may interchange the order of integration and differentiation in the right member, then

$$\int_{-\infty}^{+\infty}\frac{\partial^2 f}{\partial\theta^2}\,dx = \frac{\partial^2}{\partial\theta^2}\int_{-\infty}^{+\infty}f\,dx = \frac{\partial^2}{\partial\theta^2}(1) = 0,$$

so that $E\big(\frac{\partial^2\log f}{\partial\theta^2}\big) = -E\big[\big(\frac{\partial\log f}{\partial\theta}\big)^2\big] = -A^2$.

We may summarize in the following

Theorem (A): Let $O_n : (x_1, x_2, \ldots, x_n)$ be a sample from a population with c.d.f. $F(x;\theta)$. Let $P(O_n;\theta) = \prod_{i=1}^n f(x_i;\theta)$ be the likelihood function, where $f(x;\theta)$ is the p.d.f. if $x$ is a continuous variable, and the probability of $x$ if $x$ is discrete. Let $P(O_n;\theta)$ have a unique maximum at $\theta = \hat{\theta}$, and assume

(i) $E\Big(\dfrac{\partial\log f}{\partial\theta}\Big) = 0$, and $A^2 = E\Big[\Big(\dfrac{\partial\log f}{\partial\theta}\Big)^2\Big] = -E\Big(\dfrac{\partial^2\log f}{\partial\theta^2}\Big)$ is finite,

and that as $n \to \infty$

(ii) $\Big(\dfrac{1}{n}\dfrac{\partial^2\log P}{\partial\theta^2}\Big)_{\hat{\theta}}$ converges stochastically to $-A^2$,

(iii) $\Big(\dfrac{1}{n}\dfrac{\partial^3\log P}{\partial\theta^3}\Big)_{\theta^*}$ converges stochastically to a finite K,

(iv) $\sqrt{n}\,(\hat{\theta}-\theta)$ has a limiting non-degenerate p.d.f.

Then $\sqrt{n}\,(\hat{\theta}-\theta)$ is distributed asymptotically according to $N\big(0,\frac{1}{A^2}\big)$.

Under fairly general conditions, which will not be given here, it can be shown that if $\tilde{\theta}$ is any other statistic such that $\sqrt{n}(\tilde{\theta}-\theta)$ is asymptotically distributed according to $N(0,B^2)$, then $B^2 \ge \frac{1}{A^2}$.

In the present case, where the c.d.f. $F(x;\theta)$ depends on only one parameter, it is often possible to transform from the old parameter $\theta$ to a new parameter $\phi$ such that the asymptotic variance of $\hat{\phi}$, the maximum likelihood estimate of $\phi$, will be independent of the parameter. Let $A^2$ be defined as before, let $\phi = h(\theta)$, a function to be determined, and define

$$B^2 = A^2\Big/\Big(\frac{dh}{d\theta}\Big)^2.$$

We will try to determine the function $h(\theta)$ so that $B^2$ is a given positive constant. We have

$$\phi = \frac{1}{B}\int A\,d\theta + C,$$

where $C$ is an arbitrary constant. If the last equation determines $\phi$ as a monotonic continuous function of $\theta$, then, since $P(O_n;\theta)$ has a unique maximum for $\theta = \hat{\theta}$, clearly $P(O_n;h^{-1}(\phi))$ has a unique maximum for $\phi = \hat{\phi} = h(\hat{\theta})$. By Theorem (A) the asymptotic variance of $\hat{\phi}$, the maximum likelihood estimate of $\phi$, will be $(nB^2)^{-1}$, which is independent of $\phi$. As an illustration the reader can verify that in Example 2 below, $\phi = \log\theta$ is a new parameter of the desired type.

Theorem (A), §6.12, and Theorem (A) of the present section can both be extended to the case of several parameters. In the case of several parameters it may be shown, under conditions analogous to those in Theorem (A), that for large $n$, $\sqrt{n}(\hat{\theta}_1-\theta_1), \sqrt{n}(\hat{\theta}_2-\theta_2), \ldots, \sqrt{n}(\hat{\theta}_h-\theta_h)$ are asymptotically distributed according to a normal multivariate distribution with variance-covariance matrix $\|\sigma_{\hat{\theta}_i\hat{\theta}_j}\|$ given by $\|A_{ij}\|^{-1}$, where

$$A_{ij} = E\Big(\frac{\partial\log f}{\partial\theta_i}\cdot\frac{\partial\log f}{\partial\theta_j}\Big).$$
Example 1: Suppose $O_n$ is from $N(a,1)$:

$$f(x;a) = (2\pi)^{-\frac{1}{2}}\,e^{-\frac{1}{2}(x-a)^2},$$
$$\log f = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}(x-a)^2,$$
$$\partial\log f/\partial a = x-a.$$

If we use the first of the two expressions for $A^2$ we get

$$A^2 = E[(x-a)^2] = \sigma^2 = 1.$$

To use the second expression we would have to take the expected value of

$$-\partial^2\log f/\partial a^2 = 1,$$

and we note we get the same result. To find $\hat{a}$ we inspect

$$P(O_n;a) = (2\pi)^{-\frac{n}{2}}\,e^{-\frac{1}{2}[n(\bar{x}-a)^2+S]}$$

and see that this is maximum when the exponent is minimum, that is, for

$$\hat{a} = \bar{x}.$$

Theorem (A) says that $\sqrt{n}(\bar{x}-a)$ is asymptotically distributed according to $N(0,1)$. In the present case this is the exact distribution.

Example 2: For $O_n$ from $N(0,\theta)$,

$$f(x;\theta) = (2\pi\theta)^{-\frac{1}{2}}\,e^{-\frac{x^2}{2\theta}},$$
$$\log f = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log\theta - \tfrac{1}{2}x^2/\theta,$$
$$\partial\log f/\partial\theta = \tfrac{1}{2}\big(-1/\theta + x^2/\theta^2\big),$$
$$A^2 = \tfrac{1}{4}E\big[(-1/\theta + x^2/\theta^2)^2\big].$$

Let us see whether it may not be easier to calculate $A^2$ from the other formula:

$$A^2 = -E\Big(\frac{\partial^2\log f}{\partial\theta^2}\Big) = -E\big(\tfrac{1}{2}\theta^{-2} - x^2/\theta^3\big) = -\tfrac{1}{2}\theta^{-2} + E(x^2/\theta)\big/\theta^2.$$

Since $x^2/\theta$ has the $\chi^2$-distribution with $k = 1$ degrees of freedom, its mean is $k = 1$. Hence

$$A^2 = -\tfrac{1}{2}\theta^{-2} + \theta^{-2} = \tfrac{1}{2}\theta^{-2}.$$



Now

$$P(O_n;\theta) = (2\pi)^{-\frac{n}{2}}\,\theta^{-\frac{n}{2}}\,e^{-\frac{S'}{2\theta}},$$

where $S' = \sum_i x_i^2$. Differentiating

$$\log P = -\tfrac{n}{2}\log(2\pi) - \tfrac{n}{2}\log\theta - \tfrac{1}{2}S'/\theta$$

with respect to $\theta$, we get

$$\frac{\partial\log P}{\partial\theta} = \tfrac{1}{2}\big(-n/\theta + S'/\theta^2\big).$$

Equating this to zero and solving for $\theta$, we find

$$\hat{\theta} = S'/n.$$

By Theorem (A), $\sqrt{n}(\hat{\theta}-\theta)$ is asymptotically distributed according to $N(0,2\theta^2)$. Since $S'/\theta$ actually has the $\chi^2$-distribution with $n$ degrees of freedom, its exact mean and variance are $n$ and $2n$, respectively; hence the asymptotic mean and variance given by Theorem (A) turn out to be the exact mean and variance. However, the exact distribution of $n\hat{\theta}/\theta$ is the $\chi^2$-distribution with $n$ degrees of freedom, and not a normal distribution.

Example 3: As an illustration of the method of obtaining maximum likelihood estimates when the distribution is discrete, consider again the sample of Example 3, §6.23. We may write

$$P(O_n;\theta) = \theta^{\,n\bar{x}}\,e^{-n\theta}\,U,$$

where

$$U = 1\Big/\prod_{i=1}^n x_i!$$

is independent of $\theta$. To find $\hat{\theta}$ we set $\partial P/\partial\theta = 0$ and solve for $\theta$:

$$\log P = n\bar{x}\log\theta - n\theta + \log U,$$
$$\partial P/\partial\theta = P\cdot\big(n\bar{x}/\theta - n\big) = 0,$$
$$\hat{\theta} = \bar{x}.$$

This we have already shown to be an unbiased sufficient statistic. We calculate

$$\log f(x,\theta) = x\log\theta - \theta - \log x!,$$
$$\partial^2\log f/\partial\theta^2 = -x/\theta^2,$$
$$A^2 = -E(-x/\theta^2) = 1/\theta.$$

Thus Theorem (A) tells us that $\sqrt{n}(\bar{x}-\theta)$ is asymptotically distributed according to $N(0,\theta)$.
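The closed form $\hat{\theta} = \bar{x}$ can be checked against a direct numerical maximization of the likelihood (a sketch; the optimizer and bounds are incidental choices):

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import gammaln

    rng = np.random.default_rng(10)
    theta_true = 3.5
    x = rng.poisson(theta_true, size=200)

    def neg_log_lik(theta):
        # -log P(O_n; theta) for the Poisson sample; log x! = gammaln(x + 1)
        return -(np.sum(x) * np.log(theta) - len(x) * theta - np.sum(gammaln(x + 1)))

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
    print(res.x, x.mean())     # numerical maximum agrees with xbar
    print(x.mean() / len(x))   # estimated asymptotic variance theta/n of theta-hat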



Example 4: In this example we illustrate the method of maximum likelihood for obtaining estimates when more than one parameter is present in the population distribution. Suppose $O_n$ is from $N(a,\theta)$. Then

$$P(O_n;a,\theta) = (2\pi\theta)^{-\frac{n}{2}}\,e^{-\frac{1}{2}[n(\bar{x}-a)^2+S]/\theta},$$

where

$$S = \sum_{i=1}^n(x_i-\bar{x})^2.$$

To find the estimates $\hat{a}$, $\hat{\theta}$, we set

$$\partial P/\partial a = \partial P/\partial\theta = 0$$

and solve for $a$ and $\theta$:

$$\log P = -\tfrac{n}{2}\log(2\pi) - \tfrac{n}{2}\log\theta - \tfrac{1}{2}[n(\bar{x}-a)^2+S]/\theta,$$
$$\partial P/\partial a = P\cdot[n(\bar{x}-a)/\theta] = 0,$$
$$\partial P/\partial\theta = \tfrac{1}{2}P\cdot\big[-n/\theta + [n(\bar{x}-a)^2+S]/\theta^2\big] = 0.$$

The solutions of these equations are easily found to be

$$\hat{a} = \bar{x}, \qquad \hat{\theta} = S/n.$$

As we have previously noted, these are both consistent estimates, but the latter is biased.

Let us compute the asymptotic variance-covariance matrix of $\sqrt{n}(\hat{a}-a)$ and $\sqrt{n}(\hat{\theta}-\theta)$ as given in the generalization stated below Theorem (A):

$$\log f = -\tfrac{1}{2}\log\theta - \tfrac{1}{2}(x-a)^2/\theta - \tfrac{1}{2}\log(2\pi),$$
$$\frac{\partial\log f}{\partial a} = \frac{x-a}{\theta}, \qquad \frac{\partial\log f}{\partial\theta} = -\frac{1}{2\theta} + \frac{(x-a)^2}{2\theta^2}.$$

Hence the asymptotic variance-covariance matrix is

$$\begin{pmatrix} A_{aa} & A_{a\theta} \\ A_{\theta a} & A_{\theta\theta} \end{pmatrix}^{-1} = \begin{pmatrix} \frac{1}{\theta} & 0 \\ 0 & \frac{1}{2\theta^2} \end{pmatrix}^{-1} = \begin{pmatrix} \theta & 0 \\ 0 & 2\theta^2 \end{pmatrix}.$$

It is easily verified that the entries in the last matrix are exact, with the exception of the (2,2) entry, whose exact value is $2\theta^2(n-1)/n$.
6.3 Tolerance Interval Estimation 

In the foregoing sections we have discussed two methods of estimating one or 
more parameters in distribution functions from samples: the method of confidence inter- 
vals and the method of point estimation based on the method of maximum likelihood. If 



the original parameters, say $\theta_1, \theta_2, \ldots, \theta_h$, are transformed to new parameters $\phi_1, \phi_2, \ldots, \phi_h$ by any one-to-one transformation $\phi_i = \phi_i(\theta_1,\theta_2,\ldots,\theta_h)$, $i = 1,2,\ldots,h$, which is continuous and possesses first derivatives, we may apply both methods of estimation as before to the problem of estimating the new parameters. In fact, it can be readily verified that the maximum likelihood estimates of the $\phi_i$ are $\phi_i(\hat{\theta}_1,\hat{\theta}_2,\ldots,\hat{\theta}_h)$, $i = 1,2,\ldots,h$, where $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_h$ are maximum likelihood estimates of the $\theta_i$. A specific case of transforming a single parameter was discussed in §6.24; the problem there was to find a function of $\theta$ having a maximum likelihood estimate whose variance in large samples (to terms of order $\frac{1}{n}$) does not depend on this function of $\theta$.

Another problem of estimating a function of the parameter which deserves special comment is that of setting tolerance limits (see §4.55). This problem is as follows: Suppose $f(x,\theta)dx$ is the probability element of $x$, where $\theta$ is the parameter. For a given $0 < p' < 1$ let $L_1$ and $L_2$ be such that

$$\int_{-\infty}^{L_1} f(x,\theta)\,dx = \frac{1-p'}{2}, \qquad \int_{L_2}^{\infty} f(x,\theta)\,dx = \frac{1-p'}{2}.$$

$L_1$ and $L_2$ are continuous functions of $p'$ and $\theta$; denote them by $L_1(\theta,p')$, $L_2(\theta,p')$. From the discussion in the paragraph above it follows that in a sample of size $n$ the likelihood estimates of $L_1(\theta,p')$ and of $L_2(\theta,p')$ are $L_1(\hat{\theta},p')$ and $L_2(\hat{\theta},p')$, which are completely expressible in terms of the sample, when the functional form of $f(x,\theta)$ is given. Now the integral

$$\textrm{(a)} \qquad v = \int_{L_1(\hat{\theta},p')}^{L_2(\hat{\theta},p')} f(x,\theta)\,dx$$

is a random variable which represents the proportion of the population for which $L_1(\hat{\theta},p') < x < L_2(\hat{\theta},p')$. Assume that the distribution function of the integral (a) is independent of $\theta$. If $\hat{\theta}$ converges stochastically to $\theta$ as $n \to \infty$ (which is implied by assumption (iv), Theorem (A), §6.24), then $L_1(\hat{\theta},p')$ and $L_2(\hat{\theta},p')$ converge stochastically to $L_1(\theta,p')$ and $L_2(\theta,p')$, respectively, and hence the integral (a) converges stochastically to $p'$. Therefore, for a given $p$ on the interval $(0,1)$, and $\varepsilon$ on the same interval, and choosing some $p'$ on the interval $(p,1)$, one can choose an $n$, say $n'$, such that for $n > n'$

$$\textrm{(b)} \qquad \Pr(v > p) > \varepsilon,$$

no matter what value $\theta$ may have. For some values of $p$ and $\varepsilon$, particularly those near unity (e.g., .95 or .99), there exists a smallest $n$, say $n_0$, such that

$$\textrm{(c)} \qquad \Pr(v \ge p) \ge \varepsilon.$$

Therefore, under this condition $L_1(\hat{\theta},p')$ and $L_2(\hat{\theta},p')$ are $100p\%$ parameter-free tolerance limits at probability level $\varepsilon$ (see §4.55). The interval $L_1(\hat{\theta},p')$, $L_2(\hat{\theta},p')$ on the x-axis may be referred to as a tolerance interval, based on samples of size $n_0$, for covering or estimating at least $100p\%$ of the values of $x$ of the population, with a probability of at least $\varepsilon$. These results may be extended to the case in which two or more parameters are involved in the distribution function of $x$.

It is evident that there are many ways of choosing tolerance limits as functions of $\hat{\theta}$ so that statement (b) can be made; e.g., $L_1$ and $L_2$ could be determined by cutting off unequal probabilities from the tails of the distribution function $f(x,\theta)$ rather than equal probabilities.

The reader should note carefully the distinction between a confidence interval statement (§6.11) about a population parameter and a tolerance interval statement (in this section and §4.55) about a population proportion. It will be seen, however, in the last example of §6.12, that the confidence statement about the proportion $p$ in a binomial population is closely analogous to a tolerance interval statement about a population proportion in the case of a population with a continuous random variable.

As an example of tolerance limits of this type involving two parameters, consider a sample $O_n : (x_1, x_2, \ldots, x_n)$ drawn from the normal distribution $N(a,\sigma^2)$. Let $\bar{x}$ be the sample mean and $s^2 = \frac{1}{n-1}\sum(x_i-\bar{x})^2$. Let $t_{p'}$ be such that

$$\int_{-t_{p'}}^{t_{p'}} g_{n-1}(t)\,dt = p',$$

where $g_{n-1}(t)$ is the "Student" distribution with $n-1$ degrees of freedom (see §5.3). Let $L_1 = \bar{x} - t_{p'}\sqrt{\frac{n+1}{n}}\,s$ and $L_2 = \bar{x} + t_{p'}\sqrt{\frac{n+1}{n}}\,s$. The proportion of the population having values of $x$ on the tolerance interval $\bar{x} \pm t_{p'}\sqrt{\frac{n+1}{n}}\,s$ is

$$\textrm{(e)} \qquad v = \int_{L_1}^{L_2} N(a,\sigma^2)\,dx.$$

The distribution function of this integral is not known. However, it has been shown**



*For details of the approach to tolerance limits for large samples when the functional form of $f(x,\theta)$ is known, see A. Wald, "Setting of Tolerance Limits when the Sample is Large", Annals of Math. Stat., Vol. 13 (1942).

**S. S. Wilks, "Determination of Sample Sizes for Setting Tolerance Limits", Annals of Math. Stat., Vol. 12 (1941), pp. 91-96.



that the mean value of this integral is $p'$. Its variance has been determined only for large samples, and is $t_{p'}^2\,e^{-t_{p'}^2}\big/(\pi n)$ to terms of order $\frac{1}{n}$.

In the discussion thus far, it has been assumed that the functional form of the population distribution $f(x,\theta)$ is known but the value of $\theta$ is unknown. From the point of view of practical statistics, the case in which $x$ is a continuous random variable with an unknown distribution is perhaps more important than the case in which the functional form is known. This case has been treated in §4.55.*

*For further details, not given here, the reader is referred to S. S. Wilks, loc. cit., and also S. S. Wilks, "Statistical Prediction with Special Reference to the Problem of Tolerance Limits", Annals of Math. Stat., Vol. 13 (1942). An extension of the notion of tolerance limits to two or more variables is to be presented in a forthcoming paper in the Annals of Math. Stat. by A. Wald.
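For the normal tolerance limits of the example above, a simulation sketch (parameters arbitrary; the factor $\sqrt{(n+1)/n}$ follows the definition of $L_1$, $L_2$ given in the text) shows the mean of the random proportion $v$ lying near $p'$:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    a, sigma, n, p_prime = 10.0, 2.0, 30, 0.95
    t_p = stats.t.ppf((1 + p_prime) / 2, n - 1)   # two-sided t point for p'
    vs = []
    for _ in range(5000):
        x = rng.normal(a, sigma, n)
        half = t_p * np.sqrt((n + 1) / n) * x.std(ddof=1)
        L1, L2 = x.mean() - half, x.mean() + half
        # v = proportion of the population falling inside (L1, L2)
        vs.append(stats.norm.cdf(L2, a, sigma) - stats.norm.cdf(L1, a, sigma))
    print(np.mean(vs))                            # mean value of v, near p'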

6.4 The Fitting of Distribution Functions

The problem of fitting of distribution functions is as follows: Let $F(x,\theta_1,\theta_2,\ldots,\theta_h)$ be a c.d.f. depending on the $h$ parameters $\theta_1, \theta_2, \ldots, \theta_h$, and let $O_n$ be a sample of size $n$ from a population having this c.d.f. Consider the values $x_1, x_2, \ldots, x_n$ of the sample as $n$ values of a variable $x$. From these $n$ values we can construct an "empirical" c.d.f., say $F_n(x)$. The problem of fitting $F(x,\theta_1,\ldots,\theta_h)$ to $F_n(x)$ is that of determining $\theta_1, \theta_2, \ldots, \theta_h$ so that $F(x,\theta_1,\ldots,\theta_h)$ is approximately equal to $F_n(x)$ in some sense.

The method of maximum likelihood provides one method of determining values of $\theta_1, \theta_2, \ldots, \theta_h$: by maximizing the likelihood $\prod_{i=1}^n f(x_i,\theta_1,\theta_2,\ldots,\theta_h)$ with respect to the $\theta$'s. Clearly, the values assigned to the parameters by this method are precisely their maximum likelihood estimates $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_h$ (§6.24). This method of fitting is best in the sense that for large $n$, the variance of each $\hat{\theta}_i$ is less than or equal to that of any other consistent and asymptotically normally distributed estimate of $\theta_i$.

Another method of fitting which is easy to apply in many problems is the method of moments. This method consists of equating the moments

$$\mu_i'(\theta_1,\theta_2,\ldots,\theta_h) = m_i', \qquad i = 1,2,\ldots,h,$$

and solving for $\theta_1, \theta_2, \ldots, \theta_h$ (assuming $\mu_i'$ exists for $i = 1,2,\ldots,h$), where

$$\mu_i' = \int_{-\infty}^{\infty} x^i\,dF(x,\theta_1,\ldots,\theta_h), \qquad m_i' = \frac{1}{n}\sum_{\alpha=1}^n x_\alpha^i.$$

In the case of fitting certain distributions, for example the normal distribution, the binomial and Poisson distributions, the two methods of fitting yield the same results.
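For the normal distribution the agreement of the two methods is immediate (a sketch): equating $\mu_1' = a$ and $\mu_2' = \sigma^2 + a^2$ to the first two sample moments returns exactly the maximum likelihood estimates $\bar{x}$ and $\sum(x_i-\bar{x})^2/n$.

    import numpy as np

    rng = np.random.default_rng(12)
    x = rng.normal(4.0, 3.0, size=500)
    # method of moments: equate mu_1' = a and mu_2' = sigma^2 + a^2 to m_1', m_2'
    m1 = x.mean()
    m2 = (x ** 2).mean()
    a_hat, var_hat = m1, m2 - m1 ** 2
    print(a_hat, var_hat)                           # moment estimates
    print(x.mean(), ((x - x.mean()) ** 2).mean())   # identical to the ML estimates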



CHAPTER VII 
TESTS OF STATISTICAL HYPOTHESES 

Suppose the distribution function of a population depends on parameters $\theta_1, \theta_2, \ldots, \theta_h$. We assume the functional form of the distribution to be known, but not the true parameter values. Let $\Omega$ be the h-dimensional space of admissible parameter values. Denote the parameter point by $\Theta$. Let $\omega$ be a specified point set in $\Omega$: it may be of dimensionality $0, 1, \ldots,$ up to $h$. In this chapter we consider tests of the statistical hypothesis

$$H_0 : \Theta \in \omega.$$

A test of $H_0$ is a procedure for accepting or rejecting $H_0$ on the evidence afforded by a sample from the population. A more precise definition of a test will be given in §7.3. As a general rule one sets up a test with the hope of rejecting the hypothesis, and for this reason the hypothesis is often called a null hypothesis in such cases. Thus, if one desires confirmation of a suspicion that two populations have different means, one takes as $H_0$ the hypothesis that the means are equal, and if $H_0$ is rejected by the test, then one's suspicion is confirmed on the basis of the test used.

Statistical hypotheses are classified as follows: If $\omega$ is a single point of $\Omega$, that is, if $H_0$ states $\Theta = \Theta_0$, then $H_0$ is called simple; in any other case $H_0$ is called composite.

7.1 Statistical Tests Related to Confidence Intervals

Consider the case where $H_0$ specifies the value of only one parameter $\theta_1$:

$$H_0:\ \theta_1 = \theta_1^0.$$

If the population distribution depends on no other parameters, this is a simple hypothesis;
if other parameters $\theta_2, \ldots, \theta_h$ are present, $H_0$ is composite, $\omega$ being the (h-1)-dimensional
subspace (hyperplane) in $\Omega$ defined by $\theta_1 = \theta_1^0$. If confidence intervals $\delta(O_n)$ for $\theta_1$ are
available, then one may proceed as follows: Form $\delta(O_n)$ for the sample $O_n$, and reject $H_0$
unless $\delta(O_n)$ covers $\theta_1^0$. If $\epsilon$ is the confidence coefficient, then

$$\Pr(\text{rejecting } H_0 \text{ if it is true}) = 1 - \Pr\{\theta_1^0 \in \delta(O_n)\,|\,\theta_1 = \theta_1^0\} = 1 - \epsilon.$$

The quantity $\alpha = 1 - \epsilon$ is called the significance level of the test. It will be noted
that when confidence intervals for $\theta_1$ are known, then a whole family of tests is at hand:
a test exists for every $\theta_1^0$, that is, for every admissible value of $\theta_1$. We remark that be-
yond the statement $\Pr(\text{rejecting } H_0 \text{ if true}) = \alpha$, no further property of the test can be
deduced from the definition of confidence intervals. One might ask about the Pr(accepting
$H_0$ if false), that is, accepting $H_0$ when $\theta_1$ has some other value than $\theta_1^0$, but the signif-
icance level tells us nothing about this*. As will be seen in the examples below, our
method usually leads us to the calculation of a certain statistic, say T, and $H_0$ is re-
jected if T falls in a certain range R. Suppose, for example, that R is the range $T > T_0$,
and that T possesses the p. d. f. f(T) if $\theta_1 = \theta_1^0$. In certain cases it is sometimes said
that $\alpha$ = Pr(finding a value of T less probable than $T_0$ if $H_0$ is true). This really does
not motivate the test any better: if by "$T_1$ is less probable than $T_2$" we mean $f(T_1) <
f(T_2)$, then the same test can be made with other statistics $S = \phi(T)$, and the relation
"less probable" is not invariant under such transformations**.

It should be noted that confidence intervals give us a far more complete judge-
ment about the parameter $\theta_1$ than significance tests. We also remark that if confidence
regions (6.14) for the set $\theta_1, \theta_2, \ldots, \theta_m$ are available, then so are significance tests
for the hypothesis

$$H_0:\ \theta_1 = \theta_1^0,\ \theta_2 = \theta_2^0,\ \ldots,\ \theta_m = \theta_m^0.$$

$H_0$ is simple if m = h, composite if m < h.



Example 1: Suppose that on the basis of the sample $O_n$ from a population with
the distribution $N(a,\sigma^2)$, where a and $\sigma^2$ are unknown, we wish to test the (Student)
hypothesis

$$H_0:\ a = a_0.$$

This is a composite hypothesis: the space $\Omega$ of admissible parameter points $(a, \sigma^2)$ is



*See 7.3.

**This may be shown by considering the signs of f'(T) and g'(S), where g(S) is the p. d.
f. of S.






(b) $P_\omega = P(O_n; a_0, \theta) = (2\pi\theta)^{-\frac{n}{2}}\, e^{-[n(\bar x-a_0)^2+S]/2\theta}$,

where $\theta = \sigma^2$ and $S = \sum_\alpha (x_\alpha-\bar x)^2$, so that

$$\log P_\omega = -\tfrac{1}{2}\, n \log (2\pi) - \tfrac{1}{2}\, n \log \theta - \tfrac{1}{2}[n(\bar x-a_0)^2+S]/\theta,$$

$$\frac{\partial \log P_\omega}{\partial \theta} = -\tfrac{1}{2}\,\frac{n}{\theta} + \tfrac{1}{2}\,\frac{n(\bar x-a_0)^2+S}{\theta^2}.$$

Equating this to zero and solving for $\theta$, we get

$$\hat\theta = (\bar x-a_0)^2 + S/n,$$

and substituting this into (b), we find

$$P_\omega(O_n) = \{2\pi[(\bar x-a_0)^2+S/n]\}^{-\frac{n}{2}}\, e^{-\frac{n}{2}}.$$

Hence

$$\lambda = [1+n(\bar x-a_0)^2/S]^{-\frac{n}{2}}.$$

The distribution of $\lambda$ under the assumption that $H_0$ is true is independent of the un-
known $\theta$; in fact

$$\lambda = [1+t^2/(n-1)]^{-\frac{n}{2}},$$

where

$$t = \sqrt{n}(\bar x-a_0)/s, \qquad s^2 = S/(n-1),$$

has the t-distribution $g_{n-1}(t)$ with n-1 degrees of freedom. Let $t^2 = t_0^2$ correspond
to $\lambda = \lambda_0$. Then $\lambda \le \lambda_0$ if and only if $|t| \ge t_0$. To get

$$\Pr(\lambda \le \lambda_0) = \alpha,$$

we define $t_0$ from

$$2\int_{t_0}^{\infty} g_{n-1}(t)\,dt = \alpha.$$

The likelihood ratio test for $H_0$ is seen to be the same as the (Student) test of
Example 1, 7.1.
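The identity $\lambda = [1+t^2/(n-1)]^{-n/2}$ can be checked numerically. The following Python sketch, with an arbitrary simulated sample (not data from the text), computes $\lambda$ directly from its definition and again through the t statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, a0 = 20, 0.0
x = rng.normal(0.5, 1.0, size=n)      # arbitrary sample

xbar = x.mean()
S = np.sum((x - xbar) ** 2)           # S = sum (x_alpha - xbar)^2
s = np.sqrt(S / (n - 1))              # sample s with divisor n-1
t = np.sqrt(n) * (xbar - a0) / s      # Student t with n-1 d.f.

lam = (1.0 + n * (xbar - a0) ** 2 / S) ** (-n / 2)   # likelihood ratio
lam_from_t = (1.0 + t ** 2 / (n - 1)) ** (-n / 2)
print(lam, lam_from_t)   # equal: lambda is a monotone function of |t|
```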

In many cases the asymptotic distribution of the likelihood ratio is given by

Theorem (A): Suppose the c. d. f. of the population depends on parameters $\theta_1,
\theta_2, \ldots, \theta_h$, and that $\lambda$ is the likelihood ratio for the hypothesis

$$H_0:\ \theta_1 = \theta_1^0,\ \theta_2 = \theta_2^0,\ \ldots,\ \theta_m = \theta_m^0,$$

where m < h. Then under certain regularity conditions* the asymptotic distribution of
$-2\log\lambda$, under the assumption that $H_0$ is true, is the $\chi^2$-distribution with m degrees of
freedom.

In the above example we may write

$$\lambda = \Big(1+\tfrac{1}{2}t^2/N\Big)^{-N}\Big(1+\tfrac{1}{2}t^2/N\Big)^{-\frac{1}{2}},$$

where $N = \tfrac{1}{2}(n-1)$. Hence as $n \to \infty$, $N \to \infty$, and

$$-2\log\lambda \to t^2.$$

Since the asymptotic distribution of t is N(0,1), the asymptotic distribution of $t^2$
is the $\chi^2$-distribution with one degree of freedom, and this accords with Theorem (A).
7.3 The Neyman-Pearson Theory of Testing Hypotheses

In the notation introduced at the beginning of Chapter VII, consider the hypo-
thesis

$$H_0:\ \Theta \in \omega.$$

In many problems (for instance, all the examples we have considered in 7.1) several
tests, or a whole family of tests, are available, and the question arises, which is the
"best" test? For the comparison of tests, Neyman and Pearson have introduced the concept
of the power of a test. We approach this concept through the following steps:

First, we note that any test consists of the choice of a (B-meas.) region w in
the sample space and the rule that we reject $H_0$ if and only if the sample point falls
in w. w is called the critical region of the test. The power of the test is defined to
be the probability that we reject $H_0$. This is a function of the critical region w (a set
function of w) and of the parameter point $\Theta$ (a point function of $\Theta$). We write it
$P(w|\Theta)$ and note that

$$P(w|\Theta) = \Pr(O_n \in w\,|\,\Theta).$$

The interpretation of the power function is based on the following observation: In using
a test of $H_0$, two types of error are possible (exhaustive and mutually exclusive): (I)
We may reject $H_0$ when it is true. (II) We may accept $H_0$ when it is false, i. e., when
$\Theta$ is a point not in $\omega$. We call these respectively Type I and Type II errors. Now a

*The regularity conditions are the same as those for the multi-parameter analogue of
Theorem (A), 6.24.




Type I error can only occur if the true $\Theta \in \omega$. Hence the probability of making a Type I
error if $\Theta \in \omega$ is

(a) $\Pr(O_n \in w\,|\,\Theta \in \omega) = P(w|\Theta)$ for $\Theta \in \omega$.

A Type II error can be committed only if $\Theta \notin \omega$. The probability of making a Type II error
if $\Theta \notin \omega$ is

$$\Pr(O_n \notin w\,|\,\Theta \notin \omega) = 1-\Pr(O_n \in w\,|\,\Theta \notin \omega) = 1-P(w|\Theta).$$

The significance of the power of a test is now seen to be the following: For $\Theta \in \omega$, $P(w|\Theta)$
is the probability of committing a Type I error; for $\Theta \notin \omega$, $P(w|\Theta)$ is the probability of
avoiding a Type II error. We illustrate this discussion with an example of a one-para-
meter case.*



Suppose $O_n$ is from N(a,1), and that we wish to test the hypothesis

$$H_0:\ a = a_0.$$

Let $u_1$, $u_2$ be any two numbers, $-\infty \le u_1 < u_2 \le +\infty$, such that

(b) $\displaystyle\int_{u_1}^{u_2} (2\pi)^{-\frac{1}{2}} e^{-\frac{1}{2}u^2}\,du = 1 - \alpha.$

Consider the test which consists of rejecting $H_0$ if

(c) $\sqrt{n}(\bar x-a_0) < u_1$ or $\sqrt{n}(\bar x-a_0) > u_2$.

The critical region w of the test is the part of the sample space defined by (c), that is,
the region outside a certain pair of parallel hyperplanes (if $u_1 = -\infty$, or $u_2 = +\infty$, w is
a half-space). Let us calculate the power of the test:

$$P(w|a) = \Pr[\sqrt{n}(\bar x-a_0) < u_1 \text{ or } \sqrt{n}(\bar x-a_0) > u_2\,|\,a]
= 1-\Pr[u_1 \le \sqrt{n}(\bar x-a_0) \le u_2\,|\,a].$$

Now if the true parameter value is a,

$$u = \sqrt{n}(\bar x-a)$$



*An elementary discussion of a simple case with several parameters may be found in a
paper by H. Scheffe, "On the ratio of the variances of two normal populations", Annals
of Mathematical Statistics, Vol. 13 (1942), No. 4.




has the distribution N(0,1). Write

$$\sqrt{n}(\bar x-a_0) = u+\sqrt{n}(a-a_0).$$

Then

$$P(w|a) = 1-\Pr[u_1-\sqrt{n}(a-a_0) \le u \le u_2-\sqrt{n}(a-a_0)\,|\,a],$$

that is,

(d) $\displaystyle P(w|a) = 1 - \int_{u_1-\sqrt{n}(a-a_0)}^{u_2-\sqrt{n}(a-a_0)} (2\pi)^{-\frac{1}{2}} e^{-\frac{1}{2}u^2}\,du.$

Each choice of the pair of limits $u_1$, $u_2$ satisfying (b) gives a test of $H_0$. Let us now
consider the class C of tests thus determined, and try to find which is the "best" test of
the class C.

We note first that for all tests of the class,

$$\Pr(\text{Type I error}) = P(w|a_0) = \alpha,$$

from (d) and (b). This is what we have previously called the significance level of the
test. To compare the tests we might consider the graphs of P(w|a) against a for the var-
ious tests; the graph for a given test is called the power curve of the test. We have
seen that for every test of the class C, the power curve passes through the point $(a_0,\alpha)$.
To find the shape of the power curve, we might plot points from (d), but by elementary
methods we reach the following conclusions: The slope of the power curve corresponding
to $(u_1, u_2)$ is zero if and only if a is equal to

(e) $a_m = a_0 + (u_1+u_2)/(2\sqrt{n}).$

As $a \to +\infty$, $P(w|a) \to 1$, unless $u_2 = +\infty$, in which case $P \to 0$. As $a \to -\infty$, again
$P \to 1$, unless $u_1 = -\infty$, in which case $P \to 0$. Also, $0 < P < 1$. Except for the cases
$u_1 = -\infty$ or $u_2 = +\infty$, the power curve must then behave as follows: falling to a minimum
at the value $a = a_m$ given by (e), it increases monotonically on either side, approaching
the asymptote $P = 1$ as $a \to \pm\infty$. It may be shown from (d) and (e) that its behavior is
symmetrical with respect to the line $a = a_m$. In the exceptional cases, for $u_1 = -\infty$, P
increases monotonically from 0 to 1 as a increases from $-\infty$ to $+\infty$; for $u_2 = +\infty$, P
decreases monotonically from 1 to 0. Some power curves are sketched in the figure:



Figure 7

(i) is for the test with $u_1 = -\infty$, (ii) for $u_2 = +\infty$, (iii) has its minimum at $a_0$, (iv)
to the left of $a_0$, (v) to the right. All the tests of class C have power curves lying in
the region between the curves (i) and (ii).
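The power function (d) is easily computed numerically. The Python sketch below (the choices of n, $\alpha$ and the grid of alternatives are arbitrary illustrations) evaluates $P(w|a)$ for the symmetric choice $u_1 = -u_2$ and for the one-sided case $u_1 = -\infty$, corresponding to curves (iii) and (i).

```python
import numpy as np
from scipy import stats

def power(a, a0, n, u1, u2):
    """P(w|a) = 1 - Pr[u1 <= sqrt(n)(xbar - a0) <= u2 | a], samples from N(a,1)."""
    d = np.sqrt(n) * (a - a0)
    return 1.0 - (stats.norm.cdf(u2 - d) - stats.norm.cdf(u1 - d))

alpha, n, a0 = 0.05, 25, 0.0
z = stats.norm.ppf(1 - alpha / 2)          # symmetric choice u1 = -u2 (curve iii)
a_grid = np.linspace(-1, 1, 9)
print(power(a_grid, a0, n, -z, z))         # minimum alpha at a = a0
print(power(a_grid, a0, n, -np.inf, stats.norm.ppf(1 - alpha)))  # curve (i)
```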

As far as the probability of avoiding errors of Type I is concerned, the tests
are equivalent, since the curves all pass through $(a_0,\alpha)$. For $a \ne a_0$, we recall that the
ordinate on the curve is the probability of avoiding a Type II error. For two tests of
$H_0$, say $T_1$ and $T_2$, with critical regions $w_1$ and $w_2$, we say that $T_1$ is more powerful than
$T_2$ for testing $H_0$: $a = a_0$ against an alternative $a = a_1 \ne a_0$ if $P(w_1|a_1) > P(w_2|a_1)$.
This means that if the true parameter value is $a_1$, the probability of avoiding a Type II
error is greater in using $T_1$ than $T_2$. Now for alternatives $a > a_0$, the power curve (i)
lies above all other power curves of tests of class C, that is, the test obtained by taking
$u_1 = -\infty$ is the most powerful of the class C for all alternatives $a > a_0$. Hence this
would be the best test of the class to use in a situation where we do not mind accepting
$H_0$ if the true $a < a_0$, but want the most sensitive test of the class for rejecting $H_0$
when the true $a > a_0$. On the other hand, we see that this test is the worst of the lot,
that is, the least powerful, for testing $H_0$ against alternatives $a < a_0$. For these alter-
natives the test with power curve (ii), obtained by taking $u_2 = +\infty$, is the most powerful.
There is thus no test which is uniformly most powerful of the class C for all alternatives
$-\infty < a < +\infty$!

The situation described in the last sentence is the common one. To deal with it
Neyman and Pearson defined an unbiased test as one for which P(w|a) is minimum for $a = a_0$.
The argument against biased tests in a situation where we are interested in testing a
hypothesis against all possible alternatives is that for a biased test Pr(accepting $H_0$)
is greater if a has certain values $\ne a_0$ than if $a = a_0$.



If we set $a_m = a_0$ in (e) we find
that the unbiased test of the class C is that for which $u_1 = -u_2$. This is the test of the
class C we should prefer, barring the "one-sided" situations where the tests with power
curves (i) and (ii) are appropriate.






This serves to illustrate the comparison of tests by use of their power func-
tions. Beyond this description of the underlying ideas of the Neyman-Pearson theory, it
is not feasible to go into it further except for a few remarks: If one considers, instead
of the class C, the more inclusive class of all tests with critical regions w for which
$P(w|a_0) = \alpha$, there is again no uniformly most powerful test. However, the unbiased test
obtained above is actually the uniformly most powerful unbiased test of this broader
class.

Leaving the one-parameter case now, we recall that the definition of the power
of a test and its meaning in terms of the probability of committing Type I and Type II
errors was given for the multiparameter case at the beginning of this section. Methods
of finding optimum critical regions in the light of these concepts have been given by
Neyman, Pearson, Wald and others, but there is still much work to be done. The problems
of defining and finding "best" confidence intervals are related to those of "best" tests;
the groundwork for such a theory has been laid by Neyman*. In conclusion, we recall the
assumption made at the beginning of Chapter VII: that the functional form of the distri-
bution is known for every possible parameter point. It is clear that in the application
of the theory the calculations for the gain in efficiency by using a "best" test in pref-
erence to some other test will be invalidated if those calculations have been made for a
distribution other than the true distribution. The whole theory introduced above pre-
sumes knowledge of the functional form of the distribution.

*J. Neyman, "Outline of a theory of statistical estimation based on the classical theory
of probability", Phil. Trans. Roy. Soc. London, Series A, Vol. 236 (1937), pp. 333-380.



CHAPTER VIII 

NORMAL REGRESSION THEORY 

In 2.9 certain ideas and definitions in regression theory were set forth and
discussed. In the present chapter we shall consider sampling problems and tests of sta-
tistical hypotheses which arise in an important special type of regression theory which
we shall refer to as normal regression theory. To be more specific, we shall assume that
y is a random variable distributed according to $N(\sum_{p=1}^{k} a_p x_p,\, \sigma^2)$, where $x_1,\ldots,x_k$ are fixed
variates, and consider samples of size n from such a distribution. $N(\sum a_p x_p, \sigma^2)$ is a
conditional probability law of the form $f(y|x_1,x_2,\ldots,x_k)$. A sample of size n will con-
sist of n sets of values $(y_\alpha|x_{1\alpha}, x_{2\alpha},\ldots,x_{k\alpha})$, $\alpha = 1,2,\ldots,n$, where $y_1,\ldots,y_n$ are n ran-
dom variables, but where the $x_{p\alpha}$, $p = 1,2,\ldots,k$, $\alpha = 1,\ldots,n$, are fixed variates and not
random variables. We shall consider such problems as estimating (by confidence intervals
and point estimation according to principles set forth in 6.1 and 6.2) values of the
a's and $\sigma^2$ from the sample, and of testing certain statistical hypotheses regarding the
a's. We shall also consider applications of normal regression theory to certain problems
in analysis of variance, including row-column and Latin square lay-outs.

8.1 Case of One Fixed Variate

In order to fix our ideas in the regression problem, we shall first consider in
detail the case in which y is distributed according to $N(a+bx, \sigma^2)$. Let $O_n$: $(y_\alpha|x_\alpha)$,
$\alpha = 1,2,\ldots,n > 1$, be a sample of size n from a population having this distribution. The
probability element for the sample is

(a) $dF(y_1,\ldots,y_n) = \left[\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum_\alpha (y_\alpha-a-bx_\alpha)^2}\right] dy_1\cdots dy_n.$






Maximizing the likelihood function (that enclosed in [ ]) with respect to $\sigma^2$, a, b, we
find in accordance with 6.24 that $\hat a$ and $\hat b$ are given by solving

(b) $\displaystyle\sum_\alpha y_\alpha - \hat a n - \hat b \sum_\alpha x_\alpha = 0, \qquad \sum_\alpha x_\alpha y_\alpha - \hat a \sum_\alpha x_\alpha - \hat b \sum_\alpha x_\alpha^2 = 0,$

and $\hat\sigma^2$ is given by

(c) $\hat\sigma^2 = \dfrac{1}{n}\displaystyle\sum_\alpha (y_\alpha-\hat a-\hat b x_\alpha)^2.$

Solving (b) we obtain

(d) $\hat a = \bar y - \hat b \bar x, \qquad \hat b = \dfrac{\sum_\alpha (x_\alpha-\bar x)(y_\alpha-\bar y)}{\sum_\alpha (x_\alpha-\bar x)^2}.$

In order to be able to solve (b), we must have $\sum_\alpha (x_\alpha-\bar x)^2 \ne 0$. Now $\hat a$ and $\hat b$ are linear func-
tions of $y_1,\ldots,y_n$, and it follows from Theorem (C) of 5.23 that $\hat a$ and $\hat b$ are jointly dis-
tributed according to a normal bivariate law with

$$E(\hat b) = b, \qquad E(\hat a) = a,$$

$$\sigma_{\hat b}^2 = \frac{\sigma^2}{\sum_\alpha (x_\alpha-\bar x)^2}, \qquad \sigma_{\hat a}^2 = \sigma^2\Big[\frac{1}{n}+\frac{\bar x^2}{\sum_\alpha (x_\alpha-\bar x)^2}\Big], \qquad \mathrm{cov}(\hat a,\hat b) = -\frac{\bar x\,\sigma^2}{\sum_\alpha (x_\alpha-\bar x)^2}.$$

The sum of squares in the exponent of (a) may be written as

$$\sum_\alpha (y_\alpha-a-bx_\alpha)^2 = q_1 + q_2,$$

where $q_1 = \sum_\alpha (y_\alpha-\hat a-\hat b x_\alpha)^2 = n\hat\sigma^2$ and $q_2$ is the quadratic form in $\hat a - a$ and $\hat b - b$
appearing in the exponent of their joint normal distribution.



It is evident from (b) that $\hat a - a$, $\hat b - b$ are homogeneous linear functions of $(y_\alpha-a-bx_\alpha)$
$(\alpha = 1,2,\ldots,n)$. Also $y_\alpha - \hat a - \hat b x_\alpha = y_\alpha-a-bx_\alpha - (\hat a-a) - (\hat b-b)x_\alpha$, which is a homogeneous
linear function of the $(y_\alpha-a-bx_\alpha)$ $(\alpha = 1,2,\ldots,n)$. We know that $\frac{1}{\sigma^2}\sum_\alpha(y_\alpha-a-bx_\alpha)^2$ is distri-
buted according to the $\chi^2$-law with n degrees of freedom, and that $q_2/\sigma^2$ (which is the expo-
nent in the joint normal distribution of $\hat a - a$ and $\hat b - b$) is distributed according to a
$\chi^2$-law with 2 degrees of freedom. Therefore, it follows from Cochran's theorem, 5.24,
that $q_1/\sigma^2$ is distributed according to the $\chi^2$-law with n - 2 degrees of freedom.

We may summarize in the following

Theorem (A): Let $O_n$: $(y_\alpha|x_\alpha)$, $\alpha = 1,2,\ldots,n$, where the $x_\alpha$ are not all equal,
be a sample of size n from a population with the distribution $N(a+bx, \sigma^2)$. Then

(1) The maximum likelihood estimates $\hat a$, $\hat b$ and $\hat\sigma^2$ of a, b, $\sigma^2$, respectively, are given by
(b) and (c).

(2) $\hat a - a$ and $\hat b - b$ are jointly normally distributed with zero means and variance-covar-
iance matrix given by

$$\sigma^2\begin{pmatrix} \dfrac{1}{n}+\dfrac{\bar x^2}{\sum(x_\alpha-\bar x)^2} & -\dfrac{\bar x}{\sum(x_\alpha-\bar x)^2} \\[2ex] -\dfrac{\bar x}{\sum(x_\alpha-\bar x)^2} & \dfrac{1}{\sum(x_\alpha-\bar x)^2} \end{pmatrix}.$$

(3) $n\hat\sigma^2/\sigma^2$ is distributed according to the $\chi^2$-law with n - 2 degrees of free-
dom, and $\hat\sigma^2$ is distributed independently of $\hat a$ and $\hat b$.

One may readily set up confidence limits for a or b on the basis of the "Student"
t-distribution. For example, from 5.3 it follows that

$$t = \frac{(\hat b-b)\sqrt{\sum_\alpha (x_\alpha-\bar x)^2}}{\sqrt{n\hat\sigma^2/(n-2)}}$$

is distributed according to $g_{n-2}(t)$, from which confidence limits can be set up for b, or
the statistical hypothesis can be tested that b has some specified value $b_0$ (e. g., 0,
which corresponds to the hypothesis that y is independent of x). A similar treatment holds
for a.
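A brief Python sketch (with simulated data; the parameter values are arbitrary illustrations, not from the text) carrying out the estimates (d) and (c) and the confidence limits for b just described:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, a_true, b_true, sigma = 30, 1.0, 2.0, 0.5
x = np.linspace(0, 1, n)                        # fixed variates
y = a_true + b_true * x + rng.normal(0, sigma, n)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)                   # must be nonzero
b_hat = np.sum((x - xbar) * (y - ybar)) / Sxx
a_hat = ybar - b_hat * xbar
sig2_hat = np.mean((y - a_hat - b_hat * x) ** 2)   # MLE (divisor n)

# t = (b_hat - b) sqrt(Sxx) / sqrt(n sig2_hat / (n-2)) has n-2 d.f.
se_b = np.sqrt(n * sig2_hat / (n - 2) / Sxx)
t0 = stats.t.ppf(0.975, df=n - 2)
print(b_hat - t0 * se_b, b_hat + t0 * se_b)     # 95% confidence limits for b
```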






8.2 The Case of k Fixed Variates

Suppose y is distributed according to $N(\sum_{p=1}^{k} a_p x_p,\, \sigma^2)$. Let $O_n$: $(y_\alpha|x_{1\alpha}, x_{2\alpha},\ldots,
x_{k\alpha})$, $\alpha = 1,2,\ldots,n > k$, be a sample of size n from this distribution. The probability ele-
ment for the sample is

(a) $dF(y_1,\ldots,y_n) = \left[\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum_\alpha (y_\alpha-\sum_p a_p x_{p\alpha})^2}\right] dy_1\cdots dy_n.$

There is no loss of generality in considering the mean of $y_\alpha$ as a homogeneous linear func-
tion of $x_{1\alpha},\ldots,x_{k\alpha}$, for by choosing one of the x's, say $x_{1\alpha} = 1$ for all $\alpha$, we can
reduce our results so as to cover the case in which the mean value of y is not homogeneous
in the fixed variates, i. e., of the form $(a_1+a_2x_2+\cdots+a_kx_k)$. The results for the homogeneous
case are simpler than for the non-homogeneous case from the point of view of notation be-
cause of greater symmetry.

The maximum likelihood estimates of $a_1,\ldots,a_k$ and $\sigma^2$, found by maximizing the
quantity in [ ] in (a), are given by the following equations:

(b) $\displaystyle\sum_\alpha x_{q\alpha}\Big(y_\alpha-\sum_{p=1}^{k}\hat a_p x_{p\alpha}\Big) = 0, \qquad (q = 1,2,\ldots,k),$

(c) $\hat\sigma^2 = \dfrac{1}{n}\displaystyle\sum_\alpha \Big(y_\alpha-\sum_{p=1}^{k}\hat a_p x_{p\alpha}\Big)^2.$

Setting $a_{pq} = \sum_\alpha x_{p\alpha}x_{q\alpha}$, $a_{0q} = \sum_\alpha y_\alpha x_{q\alpha}$, $a_{00} = \sum_\alpha y_\alpha^2$ (p, q = 1,2,\ldots,k), we may write the
equations (b) as

(d) $\displaystyle\sum_{p=1}^{k} a_{pq}\hat a_p = a_{0q}, \qquad (q = 1,2,\ldots,k).$

If the determinant $|a_{pq}| \ne 0$, then it follows from 2.94 that the solution of (d) is

(e) $\hat a_p = \displaystyle\sum_{q=1}^{k} a^{pq}a_{0q},$

where $||a^{pq}|| = ||a_{pq}||^{-1}$. It should be noted that $||a_{pq}||$ is positive definite, and hence $|a_{pq}| \ne 0$, if the $x_{p\alpha}$ are
linearly independent (i. e., if there exists no set of real numbers $C_p$ (p = 1,2,\ldots,k),
not all zero, for which $\sum_p C_p x_{p\alpha} = 0$ for all $\alpha$). For consider the quadratic form

$$\sum_{p,q=1}^{k} a_{pq}C_pC_q = \sum_{\alpha=1}^{n}\Big(\sum_{p=1}^{k} x_{p\alpha}C_p\Big)^2;$$

if the $x_{p\alpha}$ are linearly independent,
clearly $\sum_\alpha(\sum_p x_{p\alpha}C_p)^2$, and hence $\sum_{p,q} a_{pq}C_pC_q$, cannot vanish. Now the $a_{0p}$, and hence the
$\hat a_p$, are linear functions of the random variables $y_1, y_2,\ldots,y_n$. Therefore, the $\hat a_p$ are
distributed according to a normal k-variate distribution. The variance of $\hat a_q$ is
$\sigma^2 a^{qq}$. Similarly, the covariance of $\hat a_q$ and $\hat a_{q'}$ is $\sigma^2 a^{qq'}$.

It will be noted that (b) can be written as

$$\sum_\alpha x_{q\alpha}\Big[\Big(y_\alpha-\sum_p a_p x_{p\alpha}\Big) - \sum_p(\hat a_p-a_p)x_{p\alpha}\Big] = 0, \qquad (q = 1,2,\ldots,k),$$

which shows that the $(\hat a_p-a_p)$, (p = 1,2,\ldots,k), are homogeneous linear functions of
$y_\alpha-\sum_p a_p x_{p\alpha}$, $(\alpha = 1,2,\ldots,n)$. The $y_\alpha-\sum_p \hat a_p x_{p\alpha}$, $(\alpha = 1,2,\ldots,n)$, are also homogeneous linear
functions of $y_\alpha-\sum_p a_p x_{p\alpha}$. Now

$$\sum_\alpha\Big(y_\alpha-\sum_p a_p x_{p\alpha}\Big)^2 = q_1 + q_2,$$

where

(f) $q_1 = \displaystyle\sum_\alpha\Big(y_\alpha-\sum_p \hat a_p x_{p\alpha}\Big)^2 = n\hat\sigma^2, \qquad q_2 = \sum_{p,q=1}^{k} a_{pq}(\hat a_p-a_p)(\hat a_q-a_q).$

Hence, $q_1$ and $q_2$ are homogeneous quadratic forms in $(y_\alpha-\sum_p a_p x_{p\alpha})$. Since the $\hat a_p - a_p$ are
distributed according to a k-variate normal law with variance-covariance matrix
$\sigma^2||a_{pq}||^{-1}$, it follows from 5.22 that $q_2/\sigma^2$ is distributed according to the $\chi^2$-law with
k degrees of freedom. Similarly, we know that $\frac{1}{\sigma^2}\sum_\alpha(y_\alpha-\sum_p a_p x_{p\alpha})^2$
is distributed according to the $\chi^2$-law with n degrees of freedom. Therefore, by
Cochran's Theorem, 5.24, $q_1/\sigma^2$ is distributed according to the $\chi^2$-law with n - k degrees of
freedom and independently of $q_2$ (i. e., of the $\hat a_p - a_p$).

Consider the sum of squares in (c); we may write






$$\sum_\alpha\Big(y_\alpha-\sum_p \hat a_p x_{p\alpha}\Big)^2 = a_{00} - 2\sum_q a_{0q}\hat a_q + \sum_{p,q} a_{pq}\hat a_p\hat a_q = a_{00} - \sum_q a_{0q}\hat a_q.$$

But it follows from 2.94 that this expression reduces to

(g) $n\hat\sigma^2 = \begin{vmatrix} a_{00} & a_{01} & \cdots & a_{0k} \\ a_{10} & a_{11} & \cdots & a_{1k} \\ \vdots & & & \vdots \\ a_{k0} & a_{k1} & \cdots & a_{kk} \end{vmatrix} \Big/\; |a_{pq}|, \qquad p,q = 1,2,\ldots,k,$

where $a_{p0} = a_{0p}$.



We may summarize in

Theorem (A): Let $O_n$: $(y_\alpha|x_{1\alpha}, x_{2\alpha},\ldots,x_{k\alpha})$, $(\alpha = 1,2,\ldots,n)$, be a sample of
size n from a population with distribution $N(\sum_{p=1}^{k} a_p x_p,\, \sigma^2)$, where the $x_{p\alpha}$,
$(\alpha = 1,2,\ldots,n)$, are linearly independent. Let $a_{pq} = \sum_\alpha x_{p\alpha}x_{q\alpha}$, $a_{0q} = \sum_\alpha y_\alpha x_{q\alpha}$. Then:

(1) The maximum likelihood estimates of the $a_p$ and $\sigma^2$ are given by (e) and (c).

(2) The quantities $(\hat a_p - a_p)$, (p = 1,2,\ldots,k), are distributed according to a k-variate
normal law with zero means and variance-covariance matrix $\sigma^2||a_{pq}||^{-1}$.

(3) The quantity $\frac{1}{\sigma^2}\sum_\alpha(y_\alpha-\sum_p \hat a_p x_{p\alpha})^2$, which may be expressed as in (g) as the ratio of two
determinants, is distributed according to a $\chi^2$-law with n - k degrees of freedom, and
independently of the $(\hat a_p - a_p)$.

Making use of the results as stated in Theorem (A), one may set up confidence
limits (or a significance test) for any $a_p$ by setting up the appropriate Student ratio.
Or one may set up confidence limits for $\sigma^2$ by using $q_1$. Confidence regions may be set up
for all of the $a_p$ or any sub-set of them by setting up a Snedecor F ratio, in which the
numerator sum of squares is the exponent in the normal distribution of the corresponding
set of $\hat a_p$ and the denominator sum of squares is $q_1$.
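The normal equations (d), their solution (e), and the quadratic forms $q_1$, $q_2$ of (f) translate directly into matrix computations. A minimal Python sketch with simulated data (all values arbitrary, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 3
X = rng.normal(size=(n, k))          # columns are the fixed variates x_p
a_true = np.array([1.0, -2.0, 0.5])
y = X @ a_true + rng.normal(0, 1.0, n)

A = X.T @ X                          # a_pq = sum_alpha x_p x_q
a0 = X.T @ y                         # a_0q = sum_alpha y x_q
a_hat = np.linalg.solve(A, a0)       # normal equations (d)

resid = y - X @ a_hat
q1 = np.sum(resid ** 2)                          # n * sig2_hat
q2 = (a_hat - a_true) @ A @ (a_hat - a_true)     # exponent quadratic form
print(a_hat, q1, q2)   # q1/sigma^2 ~ chi2(n-k), q2/sigma^2 ~ chi2(k)
```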

An Alternative Proof of the Independence of $q_1$ and $q_2$.

The proof which has been given for establishing the independence of the two
above expressions in the probability sense depends upon Cochran's Theorem. The indepen-
dence can also be established by the use of moment generating functions. Let $\phi(\theta_1,\theta_2)$
be the moment generating function defined as

$$\phi(\theta_1,\theta_2) = E\Big(e^{\theta_1 q_1/\sigma^2 + \theta_2 q_2/\sigma^2}\Big),$$

where $q_1$ and $q_2$ are defined in (f). If we can show that

$$\phi(\theta_1,\theta_2) = (1-2\theta_1)^{-\frac{n-k}{2}}(1-2\theta_2)^{-\frac{k}{2}},$$

then it follows by Theorem (B) in 2.81 and (e) of 3.3 that $q_1$ and $q_2$ are independently
distributed according to $\chi^2$-laws with n - k and k degrees of freedom respectively.

The probability element associated with the sample is given by (a). Making the
transformation $z_\alpha = \frac{1}{\sigma}\big(y_\alpha-\sum_p a_p x_{p\alpha}\big)$, $\alpha = 1,2,\ldots,n$, we obtain as the probability element of
the $z_\alpha$ the expression

$$\Big(\frac{1}{\sqrt{2\pi}}\Big)^n e^{-\frac{1}{2}\sum_\alpha z_\alpha^2}\,dz_1\cdots dz_n.$$

In terms of the $z_\alpha$, equations (b) give $\hat a_p - a_p = \sigma\sum_q a^{pq}\sum_\alpha x_{q\alpha}z_\alpha$, so that

$$\frac{q_2}{\sigma^2} = \sum_{\alpha,\beta} A_{\alpha\beta}z_\alpha z_\beta, \qquad A_{\alpha\beta} = \sum_{p,q=1}^{k} a^{pq}x_{p\alpha}x_{q\beta}, \qquad
\frac{q_1}{\sigma^2} = \sum_\alpha z_\alpha^2 - \sum_{\alpha,\beta} A_{\alpha\beta}z_\alpha z_\beta.$$

For the m. g. f. we therefore have

$$\phi(\theta_1,\theta_2) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^n\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} e^{-\frac{1}{2}\sum_{\alpha,\beta}B_{\alpha\beta}z_\alpha z_\beta}\,dz_1\cdots dz_n = B^{-\frac{1}{2}},$$

where

$$B_{\alpha\beta} = (1-2\theta_1)\delta_{\alpha\beta} + 2(\theta_1-\theta_2)A_{\alpha\beta}$$

and B is the determinant $|B_{\alpha\beta}|$. To evaluate B, let $1-2\theta_1 = M$ and $2(\theta_1-\theta_2) = N$, so
that $B_{\alpha\beta} = M\delta_{\alpha\beta}+NA_{\alpha\beta}$. The determinant may be evaluated by augmenting it with the k
rows and columns formed from the $x_{p\alpha}$ and reducing by elementary column operations; more
directly, one may note that $||A_{\alpha\beta}||$ is idempotent, i. e., $\sum_\beta A_{\alpha\beta}A_{\beta\gamma} = A_{\alpha\gamma}$, with trace
$\sum_\alpha A_{\alpha\alpha} = \sum_{p,q} a^{pq}a_{pq} = k$, so that its characteristic roots are 1 (k of them) and 0
(n - k of them). The characteristic roots of $||B_{\alpha\beta}||$ are therefore M+N (k of them) and M
(n - k of them), whence

$$B = M^{n-k}(M+N)^k = (1-2\theta_1)^{n-k}(1-2\theta_2)^k.$$

Therefore we have

$$\phi(\theta_1,\theta_2) = (1-2\theta_1)^{-\frac{n-k}{2}}(1-2\theta_2)^{-\frac{k}{2}},$$

which concludes the argument that $q_1$ and $q_2$ are independently distributed according to
$\chi^2$-laws with n - k and k degrees of freedom respectively.
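The key determinantal step $B = M^{n-k}(M+N)^k$ may be verified numerically. The Python sketch below (an arbitrary random design matrix and arbitrary values of M and N, chosen only for illustration) builds $A_{\alpha\beta}$ and compares the two sides:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 12, 4
X = rng.normal(size=(n, k))
A = X @ np.linalg.inv(X.T @ X) @ X.T       # A_{ab} = sum_{p,q} a^{pq} x_pa x_qb
M, N = 0.7, 0.4                            # stand-ins for 1-2*th1 and 2*(th1-th2)

lhs = np.linalg.det(M * np.eye(n) + N * A)
rhs = M ** (n - k) * (M + N) ** k          # follows since A is idempotent of rank k
print(lhs, rhs)                            # agree up to rounding error
```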

Remarks on the Generality of the Linear Regression Function. The regression
function $\sum_{p=1}^{k} a_p x_p$ is much more general than it might appear at first. For example, if
$x_p = t^{p-1}$, the regression function would be the polynomial
$\sum_{p=1}^{k} a_p t^{p-1}$, in which case we would have a random variable y, having as its mean value the
function of t given by $\sum_p a_p t^{p-1}$. The estimates $\hat a_p$ would, of course, have the same form
as those given by (e), except that $x_{p\alpha}$ would be replaced by $t_\alpha^{p-1}$ in calculating the $a_{pq}$ and
$a_{0q}$.

Again, we might have for k = 2m+1, $x_1 = 1$, $x_2 = \sin t$, $x_3 = \cos t$, $x_4 = \sin 2t$,
\ldots, $x_{2m+1} = \cos mt$, in which case the mean value of y is a harmonic function of the form
$a_1 + a_2\sin t + a_3\cos t + \cdots + a_{2m+1}\cos mt$. The procedure for obtaining $\hat a_1, \hat a_2,\ldots,
\hat a_{2m+1}$ is as before given by (e).

Another example: Suppose k = 2 and $x_{1\alpha} = 1$, $x_{2\alpha} = 0$ for $\alpha = 1,2,\ldots,n_1$, and
$x_{1\alpha} = 0$, $x_{2\alpha} = 1$ for $\alpha = n_1+1,\ldots,n_1+n_2$. The sample $O_n$: $(y_\alpha|x_{1\alpha},x_{2\alpha})$, $\alpha = 1,2,\ldots,n_1+n_2$,
drawn from $N(\sum_{p=1}^{2} a_p x_p,\, \sigma^2)$ is equivalent to two independent samples $O_{n_1}$: $(y_1, y_2,\ldots,y_{n_1})$
and $O_{n_2}$: $(y_{n_1+1},\ldots,y_{n_1+n_2})$, of sizes $n_1$ and $n_2$ respectively, drawn from $N(a_1,\sigma^2)$ and $N(a_2,\sigma^2)$
respectively. This example extends readily to the case of several independent samples.

Curvilinear regression is also a special case. For example, for quadratic re-
gression in two variables, say u and v, we would let $x_1 = 1$, $x_2 = u$, $x_3 = v$, $x_4 = u^2$,
$x_5 = v^2$, $x_6 = uv$.
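The following Python sketch (hypothetical values throughout, for illustration only) constructs design variates for the three situations just described: polynomial, harmonic, and two-sample regression.

```python
import numpy as np

t = np.linspace(0.0, 2 * np.pi, 40)

# Polynomial regression of degree k-1: x_p = t^(p-1).
X_poly = np.vander(t, N=4, increasing=True)      # columns 1, t, t^2, t^3

# Harmonic regression (k = 2m+1): 1, sin t, cos t, ..., sin mt, cos mt.
m = 2
cols = [np.ones_like(t)]
for j in range(1, m + 1):
    cols += [np.sin(j * t), np.cos(j * t)]
X_harm = np.column_stack(cols)

# Two samples of sizes n1, n2 written as one regression with indicator variates.
n1, n2 = 15, 25
X_two = np.zeros((n1 + n2, 2))
X_two[:n1, 0] = 1.0      # x_1 = 1 on the first sample
X_two[n1:, 1] = 1.0      # x_2 = 1 on the second sample
print(X_poly.shape, X_harm.shape, X_two.shape)
```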

8.3 A General Normal Regression Significance Test

The following general significance test frequently arises in normal regression
theory: A sample $O_n$: $(y_\alpha|x_{1\alpha},x_{2\alpha},\ldots,x_{k\alpha})$, $\alpha = 1,2,\ldots,n$, is assumed to be drawn from a
population with distribution $N(\sum_{p=1}^{k} a_p x_p,\, \sigma^2)$, and it is desired to test the hypothesis
that $a_{r+1}, a_{r+2},\ldots,a_k$ (r < k) have specified values, say $a_{r+1,0}, a_{r+2,0},\ldots,a_{k,0}$ respec-
tively, no matter what values $a_1, a_2,\ldots,a_r$ and $\sigma^2$ may have. For example, all specified
values may be zero, in which case the problem is to test the hypothesis that y, which is
assumed to be distributed according to $N(\sum a_p x_p, \sigma^2)$, is actually independent of $x_{r+1},\ldots,x_k$.

In order to determine the test function (of the $y_\alpha$ and $x_{p\alpha}$) for testing the
hypothesis we shall make use of the method of likelihood ratios discussed in 7.2.
The probability element of the sample is

(a) $dF(y_1,\ldots,y_n) = P(O_n; a_1, a_2,\ldots,a_k,\sigma^2)\,dy_1\cdots dy_n,$

where

(b) $P(O_n; a_1, a_2,\ldots,a_k,\sigma^2) = (2\pi\sigma^2)^{-\frac{n}{2}}\, e^{-\frac{1}{2\sigma^2}\sum_\alpha (y_\alpha-\sum_p a_p x_{p\alpha})^2}$

is the likelihood function.

Let $\Omega$ be the (k+1)-dimensional parameter space for which $\sigma^2 > 0$, $-\infty < a_p < +\infty$,
p = 1,2,\ldots,k, and let $\omega$ be the (r+1)-dimensional subspace of $\Omega$ for which $a_{r+1} = a_{r+1,0}$,
$a_{r+2} = a_{r+2,0}$, \ldots, $a_k = a_{k,0}$. If $H_0$ denotes the hypothesis to be tested, then $H_0$ is the
hypothesis that the true parameter point lies in $\omega$, where the admissible points are those
in $\Omega$.

The likelihood ratio $\lambda$ for testing $H_0$ is given by

$$\lambda = \frac{P_\omega(O_n)}{P_\Omega(O_n)},$$

where the denominator is the maximum of $P(O_n;a_1,\ldots,a_k,\sigma^2)$ for variations of the parameters
over $\Omega$ and the numerator is the maximum for variations of the parameters over $\omega$. To find
the maximum of the likelihood function (b) over $\Omega$, we follow the ordinary procedure of
taking the first derivative of the likelihood function with respect to each parameter,




setting the derivatives equal to zero. We find that the maximizing values are

(d) $\hat\sigma_\Omega^2 = \dfrac{1}{n}\displaystyle\sum_\alpha\Big(y_\alpha-\sum_p \hat a_p x_{p\alpha}\Big)^2, \qquad \hat a_p = \sum_q a^{pq}a_{0q},$

p = 1,2,\ldots,k, as given in 8.2. Substituting these in the likelihood function we find

$$P_\Omega(O_n) = (2\pi\hat\sigma_\Omega^2)^{-\frac{n}{2}}\, e^{-\frac{n}{2}}.$$

Similarly, by maximizing the likelihood over $\omega$, we set $a_{r+1} = a_{r+1,0},\ldots,
a_k = a_{k,0}$ and differentiate with respect to $\sigma^2$, $a_1,\ldots,a_r$, obtaining maximizing values
$\hat\sigma_\omega^2$ and $\hat{\hat a}_1,\ldots,\hat{\hat a}_r$. Substituting in the likelihood function, we find

$$P_\omega(O_n) = (2\pi\hat\sigma_\omega^2)^{-\frac{n}{2}}\, e^{-\frac{n}{2}}.$$

Therefore

(f) $\lambda = \big(\hat\sigma_\Omega^2/\hat\sigma_\omega^2\big)^{\frac{n}{2}}.$

Now it is clear that $\hat\sigma_\omega^2 \ge \hat\sigma_\Omega^2$, since $n\hat\sigma_\Omega^2$ is the minimum of $\sum_\alpha(y_\alpha-\sum_p a_p x_{p\alpha})^2$ for variations
of $a_1,\ldots,a_k$, while $n\hat\sigma_\omega^2$ is the minimum for variations of $a_1,\ldots,a_r$, for fixed values
$a_{r+1,0},\ldots,a_{k,0}$ of $a_{r+1},\ldots,a_k$. Now let

$$q_1 = \frac{n\hat\sigma_\Omega^2}{\sigma^2}, \qquad q_2 = \frac{n(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)}{\sigma^2}.$$

The difference $n(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)$ is simply the further reduction in the sum of squares
$\sum_\alpha(y_\alpha-\sum_p a_p x_{p\alpha})^2$ obtainable by varying $a_{r+1},\ldots,a_k$ in addition to $a_1, a_2,\ldots,a_r$.
Expressed in terms of $q_1$ and $q_2$,

(g) $\lambda = \Big(1+\dfrac{q_2}{q_1}\Big)^{-\frac{n}{2}}.$

Thus $\lambda$ is a single-valued function of $q_2/q_1$, which means that $q_2/q_1$ is equivalent to $\lambda$ as
a test function. The nearer the value of $\lambda$ to unity, the smaller the value of $q_2$ as


compared with $q_1$. To complete the problem of setting up a test for testing $H_0$, we must
now obtain the distribution of $\lambda$ (or $q_2/q_1$) under the assumption that the hypothesis
$H_0$ is true, i. e., that $O_n$ has been drawn from $N(\sum a_p x_p, \sigma^2)$ for $a_{r+1} = a_{r+1,0},\ldots,
a_k = a_{k,0}$.

We shall now show that if $H_0$ is true then $q_1$ and $q_2$ are distributed indepen-
dently according to $\chi^2$-laws with n - k and k - r degrees of freedom respectively.

The probability element for the sample $O_n$ from the population having distribu-
tion $N(\sum a_p x_p, \sigma^2)$ is given by (a). Now as we have seen in 8.2, the sum of squares in
the exponent can be written as

(h) $\displaystyle\sum_\alpha\Big(y_\alpha-\sum_p a_p x_{p\alpha}\Big)^2 = n\hat\sigma_\Omega^2 + \sum_{p,q=1}^{k} a_{pq}(\hat a_p-a_p)(\hat a_q-a_q).$

The second expression in (h) may be written as

(i) $\displaystyle\sum_{u,v=1}^{r} a_{uv}(\hat a_u-a_u+L_u)(\hat a_v-a_v+L_v) + \sum_{g,h=r+1}^{k} b_{gh}(\hat a_g-a_g)(\hat a_h-a_h),$

where the $L_u$ (u = 1,2,\ldots,r) are linear functions of the $(\hat a_g-a_g)$, (g = r+1,\ldots,k), and where
$||b_{gh}|| = ||a^{gh}||^{-1}$, g,h = r+1,\ldots,k, $a^{gh}$ being the element in the g-th row and h-th
column in $||a_{pq}||^{-1}$, p,q = 1,2,\ldots,k. See 3.23.

To verify the statement that expression (i) is equal to the second expression in
(h), let us denote $\hat a_g - a_g$ by $d_g$ and let $L_u = \sum_{g=r+1}^{k} \lambda_{ug}d_g$. We must then determine the $\lambda_{ug}$
and the $b_{gh}$ so that

(j) $\displaystyle\sum_{p,q=1}^{k} a_{pq}d_pd_q = \sum_{u,v=1}^{r} a_{uv}(d_u+L_u)(d_v+L_v) + \sum_{g,h=r+1}^{k} b_{gh}d_gd_h,$

that is, an identity in the d's.

Taking $\frac{\partial^2}{\partial d_v\,\partial d_g}$ of both sides of this identity we get

(k) $a_{vg} = \displaystyle\sum_{u=1}^{r} a_{vu}\lambda_{ug}, \qquad (v = 1,2,\ldots,r;\ g = r+1,\ldots,k),$

and hence

(l) $\lambda_{ug} = \displaystyle\sum_{v=1}^{r} \bar a^{uv}a_{vg},$

where $||\bar a^{uv}|| = ||a_{uv}||^{-1}$, u,v = 1,2,\ldots,r.

Taking $\frac{\partial^2}{\partial d_g\,\partial d_h}$ of both sides of (j) we get






(m) $a_{gh} = \displaystyle\sum_{u,v=1}^{r} a_{uv}\lambda_{ug}\lambda_{vh} + b_{gh}.$

Using (k) and (l) we find that (m) reduces to

(n) $a_{gh} = \displaystyle\sum_{u,v=1}^{r} a_{gu}\bar a^{uv}a_{vh} + b_{gh},$

or

(o) $b_{gh} = a_{gh} - \displaystyle\sum_{u,v=1}^{r} a_{gu}\bar a^{uv}a_{vh}.$

Referring to 2.94 it will be seen that

(p) $b_{gh} = \begin{vmatrix} a_{11} & \cdots & a_{1r} & a_{1h} \\ \vdots & & \vdots & \vdots \\ a_{r1} & \cdots & a_{rr} & a_{rh} \\ a_{g1} & \cdots & a_{gr} & a_{gh} \end{vmatrix} \Big/\; |a_{uv}|, \qquad u,v = 1,2,\ldots,r,$

which is equal to the term in the g-th row and h-th column in the inverse of $||a^{gh}||$,
g,h = r+1,\ldots,k.

Making use of the fact that the sum of the squares in the exponent of the like-
lihood function in (a) is

(q) $n\hat\sigma_\Omega^2 + \text{expression (i)},$

it is now clear that by maximizing the likelihood function for variations of $a_1,\ldots,a_r$,
$\sigma^2$ in $\omega$, we find

(r) $n\hat\sigma_\omega^2 = n\hat\sigma_\Omega^2 + \displaystyle\sum_{g,h=r+1}^{k} b_{gh}(\hat a_g-a_{g,0})(\hat a_h-a_{h,0}),$

since the first expression in (i) vanishes when $a_1,\ldots,a_r$ are varied so as to maximize the
likelihood function.

Remembering that when the likelihood function is maximized with respect to $a_1,
\ldots,a_k$, $\sigma^2$ over $\Omega$ we obtain $\hat\sigma_\Omega^2$ as given in (d), we clearly have

(s) $q_2 = \dfrac{n(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)}{\sigma^2} = \dfrac{1}{\sigma^2}\displaystyle\sum_{g,h=r+1}^{k} b_{gh}(\hat a_g-a_{g,0})(\hat a_h-a_{h,0}),$

which is a function of the $\hat a_g$ (g = r+1,\ldots,k) which, as we have seen in 8.2, are distri-


buted independently of $\frac{n\hat\sigma_\Omega^2}{\sigma^2} = q_1$, $q_1$ being distributed according to the $\chi^2$-law with n - k
degrees of freedom. But $-\frac{1}{2}q_2$ is seen to be the exponent in the joint distribution of the $\hat a_g$
when $H_0$ is true. Hence by 5.23 $q_2$ is distributed according to the $\chi^2$-law with k - r
degrees of freedom.

Since $q_2$ and $q_1$ are distributed independently according to $\chi^2$-laws with k - r
and n - k degrees of freedom when $H_0$ is true, it follows that the quantity

$$F = \frac{q_2(n-k)}{q_1(k-r)}$$

is a Snedecor ratio distributed according to the law $h_{k-r,n-k}(F)dF$ (see 5.4) when $H_0$ is
true. Now

$$\lambda = \Big[1+\frac{(k-r)F}{n-k}\Big]^{-\frac{n}{2}},$$

which shows that $\lambda$ is a single-valued function of F, and hence F is equivalent to $\lambda$ for
testing the hypothesis $H_0$, the upper tail of the F-distribution being used for determining
critical values of F for various significance levels. It can be shown that this test is
unbiased (see 7.3) although we shall not demonstrate this fact here.*

We may summarize our results in the following fundamental Theorem in normal
regression theory:

Theorem (A): Let $O_n$: $(y_\alpha|x_{1\alpha}, x_{2\alpha}, \ldots, x_{k\alpha})$ be a sample from a population with
distribution $N(\sum_{p=1}^{k} a_p x_p,\, \sigma^2)$, where the $x_{p\alpha}$ are linearly independent. Let $H_0$ be the statistical
hypothesis that the true parameter point $(a_1, a_2,\ldots,a_k, \sigma^2)$ belongs to $\omega$: $-\infty < a_u < +\infty$
(u = 1,2,\ldots,r), $\sigma^2 > 0$, $a_{r+1} = a_{r+1,0}, \ldots, a_k = a_{k,0}$, which is a subset of the admissible set
$\Omega$: $-\infty < a_p < +\infty$, p = 1,2,\ldots,k, $\sigma^2 > 0$. Let $\hat\sigma_\Omega^2$ be the maximum likelihood estimate of $\sigma^2$
for variations of the parameters over $\Omega$, and $\hat\sigma_\omega^2$ the maximum likelihood estimate of $\sigma^2$ for
variations of the parameters over $\omega$. Then:

(1) $n\hat\sigma_\Omega^2 = \sum_\alpha(y_\alpha-\sum_p \hat a_p x_{p\alpha})^2$, where the $\hat a_p$ are given by (d).

(2) $n\hat\sigma_\omega^2 = n\hat\sigma_\Omega^2 + \sum_{g,h=r+1}^{k} b_{gh}(\hat a_g-a_{g,0})(\hat a_h-a_{h,0})$, where the $\hat a_p$ are given by (d), and $||b_{gh}||$ is the
inverse of the matrix obtained by deleting the first r rows and columns of $||a^{pq}||$,
p,q = 1,2,\ldots,k.

(3) The quantities

$$q_1 = \frac{n\hat\sigma_\Omega^2}{\sigma^2}, \qquad q_2 = \frac{n(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)}{\sigma^2}$$

are independently distributed according to $\chi^2$-laws with n - k and k - r degrees of
freedom, respectively, when $H_0$ is true.

(4) The likelihood ratio criterion $\lambda$ for testing $H_0$ is given by

$$\lambda = \Big[1+\frac{(k-r)F}{n-k}\Big]^{-\frac{n}{2}},$$

where $F = \dfrac{(n-k)q_2}{(k-r)q_1}$ is Snedecor's ratio, which is distributed according to $h_{k-r,n-k}(F)dF$
when $H_0$ is true.

*See J. F. Daly, "On the Unbiased Character of Likelihood Ratio Tests for Independence
in Normal Systems", Annals of Math. Stat., Vol. 11 (1940).
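Theorem (A) translates into a short computation. In the Python sketch below (simulated data; n, k, r and the coefficient values are arbitrary, with $a_{r+1,0} = \cdots = a_{k,0} = 0$), $n\hat\sigma_\Omega^2$ and $n\hat\sigma_\omega^2$ are obtained as residual sums of squares and combined into the F ratio of (4):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k, r = 60, 5, 2
X = rng.normal(size=(n, k))
a_true = np.array([1.0, -1.0, 0.0, 0.0, 0.0])   # H0 true: a_3 = a_4 = a_5 = 0
y = X @ a_true + rng.normal(0, 1.0, n)

def rss(Xm):
    """Residual sum of squares of the least-squares fit on design Xm."""
    a_hat = np.linalg.solve(Xm.T @ Xm, Xm.T @ y)
    return np.sum((y - Xm @ a_hat) ** 2)

rss_Omega = rss(X)           # n * sig2 over Omega (all k coefficients free)
rss_omega = rss(X[:, :r])    # over omega: a_{r+1} = ... = a_k = 0

F = (rss_omega - rss_Omega) / (k - r) / (rss_Omega / (n - k))
print(F, stats.f.sf(F, k - r, n - k))   # upper-tail significance probability
```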

8.4 Remarks on the Generality of Theorem (A), 8.3

In order to emphasize the generality of Theorem (A), 8.3, we shall discuss
briefly several cases of particular interest.

8.41 Case 1. It frequently happens that the following statistical hypothesis
is to be tested on the basis of a sample $O_n$: $(y_\alpha|x_{1\alpha},x_{2\alpha},\ldots,x_{k\alpha})$ assumed to have been drawn
from a population with distribution $N(\sum_{p=1}^{k} a_p x_p,\, \sigma^2)$:

$\Omega$: $-\infty < a_p < +\infty$, $\sigma^2 > 0$, p = 1,2,\ldots,k;

$\omega$: region in $\Omega$ for which $\sum_{p=1}^{k} c_{up}a_p = 0$, $\sigma^2 > 0$, u = r+1, r+2,\ldots,k,

where the $c_{up}$ are linearly independent constants. In other words the hypothesis to be
tested here is that there are k - r linear restrictions among the $a_p$, given that the
sample is from a population with distribution $N(\sum a_p x_p, \sigma^2)$. Denoting this statistical
hypothesis by $H'$, we may verify from Theorem (A) that the likelihood criterion for testing
$H'$ is of the same form as $\lambda$ for $H_0$, where $\sigma^2 q_1$ is the minimum of $S = \sum_\alpha(y_\alpha-\sum_p a_p x_{p\alpha})^2$ for
variations of $a_1,\ldots,a_k$ over $\Omega$, and $\sigma^2 q_2$ is the difference between the minimum of S over $\omega$
and the minimum of S over $\Omega$. As in the case of $H_0$, $q_1$ and $q_2$ for $H'$ are independently
distributed according to $\chi^2$-laws with n - k and k - r degrees of freedom respectively,
when $H'$ is true. To verify that this is true, we transform the $a_p$ as follows:

$$a'_u = a_u, \quad u = 1,2,\ldots,r; \qquad a'_g = \sum_{p=1}^{k} c_{gp}a_p, \quad g = r+1,\ldots,k.$$

We may write this transformation as

$$a'_g = \sum_{p=1}^{k} c_{gp}a_p, \qquad g = 1,2,\ldots,k,$$

where, for $g \le r$, $c_{gp} = 1$ if g = p and $c_{gp} = 0$ otherwise. Without loss of generality we may
assume the $c_{gp}$ to be such that $|c_{gp}| \ne 0$. Hence

$$a_p = \sum_{g=1}^{k} c^{pg}a'_g,$$

where $||c^{pg}|| = ||c_{gp}||^{-1}$, and $\sum_p a_p x_{p\alpha} = \sum_q a'_q x'_{q\alpha}$, where $x'_{q\alpha} = \sum_p c^{pq}x_{p\alpha}$. Therefore $H'$ may be
expressed as the statistical hypothesis:

$\Omega$: $-\infty < a'_p < \infty$, $\sigma^2 > 0$, p = 1,2,\ldots,k;

$\omega$: region in $\Omega$ for which $a'_g = 0$, $\sigma^2 > 0$, g = r+1,\ldots,k,

which is to be tested on the basis of the sample $O_n$: $(y_\alpha|x'_{1\alpha},x'_{2\alpha},\ldots,x'_{k\alpha})$, $\alpha = 1,2,\ldots,n$,
drawn from a population with distribution $N(\sum_q a'_q x'_q,\, \sigma^2)$. Theorem (A) is immediately
applicable to $H'$ as expressed in this form.

8.42 Case 2. The following statistical hypothesis, say $H''$, frequently arises,
to be tested on the basis of a sample $O_n$ from $N(\sum_p a_p x_p,\, \sigma^2)$:

$\Omega$: $-\infty < a_p < \infty$, $\sigma^2 > 0$, p = 1,2,\ldots,k;

$\omega$: region in $\Omega$ for which $a_p = \sum_{u=1}^{r} c_{pu}a'_u$, $\sigma^2 > 0$, p = 1,2,\ldots,k.

In other words the hypothesis $H''$ is that the $a_p$ can each be expressed linearly in terms
of r (< k) parameters $a'_1,\ldots,a'_r$, where the $c_{pu}$ are given. By introducing further given
numbers $c_{pq}$, (q = r+1,\ldots,k; p = 1,2,\ldots,k), such that $|c_{pq}| \ne 0$ (p,q = 1,2,\ldots,k), and using
the transformation $a_p = \sum_{q=1}^{k} c_{pq}a'_q$, we can express $H''$ as follows:

$\Omega$: $-\infty < a'_p < \infty$, $\sigma^2 > 0$, p = 1,2,\ldots,k;

$\omega$: $a'_g = 0$, $\sigma^2 > 0$, g = r+1,\ldots,k,

to be tested on the basis of the sample $O_n$: $(y_\alpha|x'_{1\alpha},\ldots,x'_{k\alpha})$, $\alpha = 1,2,\ldots,n$, where
$x'_{q\alpha} = \sum_p c_{pq}x_{p\alpha}$, from a population with distribution $N(\sum_q a'_q x'_q,\, \sigma^2)$. This case is clearly
covered by Theorem (A).

In this case $\sigma^2 q_1$ is the minimum of $S = \sum_\alpha(y_\alpha-\sum_p a_p x_{p\alpha})^2$ for variations of $a_1,
a_2,\ldots,a_k$ over $\Omega$, while $\sigma^2 q_2$ is the difference between the minimum of S for variations of
the $a_p$ over $\omega$ (i. e., for unrestricted variations of the $a'_u$, u = 1,2,\ldots,r, when the $a_p$
are replaced by $\sum_u c_{pu}a'_u$ in S and the $a'_g$ are set equal to 0, g = r+1,\ldots,k) and the
minimum of S over $\Omega$.
$q_1$ and $q_2$ are independently distributed according to $\chi^2$-laws with n - k and k - r degrees
of freedom, respectively, when $H''$ is true.

8.43 Case 3. The following variant of the hypothesis $H_0$ of 8.3 arises in
such problems as randomized blocks (see 9.2), Latin squares (see 9.4), etc., to be
tested on the basis of a sample $O_n$: $(y_\alpha|x_{1\alpha},\ldots,x_{k\alpha})$, $\alpha = 1,2,\ldots,n$. Denoting this hypothesis
by $H'''$, it is specified as follows:

$\Omega$: $-\infty < a_p < +\infty$, $\sigma^2 > 0$, p = 1,2,\ldots,k,
with the $a_p$ restricted by the $r_1$ linearly inde-
pendent conditions $\sum_{p=1}^{k} d_{\lambda p}a_p = 0$, $\lambda = 1,2,\ldots,r_1$;

$\omega$: the subspace in $\Omega$ for which
$\sum_{p=1}^{k} d_{\nu p}a_p = 0$, $\nu = r_1+1,\ldots,r_2$, where $r_1 < r_2 < k$.

$H'''$ is the hypothesis that the $a_p$ satisfy $r_2 - r_1$ further linear restrictions, assuming
that $r_1$ linear restrictions are fulfilled, linear independence being assumed throughout.
$H'''$ is to be tested on the basis of the sample $O_n$: $(y_\alpha|x_{1\alpha},\ldots,x_{k\alpha})$, $\alpha = 1,2,\ldots,n$. In this
case $\sigma^2 q_1$ is the minimum of $S = \sum_\alpha(y_\alpha-\sum_{p=1}^{k} a_p x_{p\alpha})^2$ for variations of the $a_p$ over $\Omega$ (i. e.,
for variations of the $a_p$ subject to the restrictions $\sum_p d_{\lambda p}a_p = 0$, $\lambda = 1,2,\ldots,r_1$), while $\sigma^2 q_2$
is the difference between this minimum and that for variations of the $a_p$ over $\omega$ (i. e., for
variations of the $a_p$ subject to the restrictions $\sum_p d_{\nu p}a_p = 0$, $\nu = 1,2,\ldots,r_2$). When $H'''$ is
true, $q_1$ and $q_2$ are independently distributed according to $\chi^2$-laws with $n - k + r_1$ and
$r_2 - r_1$ degrees of freedom respectively.

That this case is covered by Theorem (A) may be seen by considering the non-
singular transformation $\sum_{p=1}^{k} d_{qp}a_p = a'_q$, q = 1,2,\ldots,k, where the $d_{qp}$ for $q > r_2$ are further
given numbers chosen so that $|d_{qp}| \ne 0$. We have $a_p = \sum_q d^{pq}a'_q$, which transforms $\sum_p a_p x_{p\alpha}$ into

$$\sum_{q=1}^{k} a'_q x'_{q\alpha}, \qquad x'_{q\alpha} = \sum_{p=1}^{k} d^{pq}x_{p\alpha}.$$

Now under $\Omega$ the regression function is $\sum_{q=r_1+1}^{k} a'_q x'_{q\alpha}$, and we
therefore specify $H'''$ as

$\Omega$: $-\infty < a'_p < \infty$, $\sigma^2 > 0$, p = $r_1$+1,\ldots,k
($a'_1,\ldots,a'_{r_1}$ being assumed 0 from the outset);

$\omega$: subspace in $\Omega$ for which $a'_\nu = 0$, $\nu = r_1+1,\ldots,r_2$.

The applicability of Theorem (A) is now obvious.
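Case 3 may be illustrated numerically by fitting under nested sets of linear restrictions. In the Python sketch below (the restriction matrices and data are arbitrary examples, not from the text), each restricted minimum of S is obtained by reparameterizing through a basis of the null space of the restriction matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 40, 4
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 2.0, -3.0, 0.5]) + rng.normal(0, 1.0, n)

def rss_restricted(X, y, C):
    """Minimize sum (y - X a)^2 subject to C a = 0 by writing a = B a',
    where the columns of B span the null space of C."""
    _, s, Vt = np.linalg.svd(C)
    rank = int(np.sum(s > 1e-12))
    B = Vt[rank:].T                     # orthonormal basis of {a : C a = 0}
    Xb = X @ B
    a_prime = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
    return np.sum((y - Xb @ a_prime) ** 2)

C1 = np.array([[1.0, -1.0, 0.0, 0.0]])           # the r1 = 1 prior restriction
C2 = np.vstack([C1, [[0.0, 0.0, 1.0, -1.0]]])    # r2 = 2 restrictions under omega
print(rss_restricted(X, y, C1), rss_restricted(X, y, C2))
```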






8.5 The Minimum of a Sum of Squares of Deviations with Respect to Regression
Coefficients which are Subject to Linear Restrictions

It will be noted in 8.3 and 8.4 that frequently we have to find the minimum
of

(a) $S = \displaystyle\sum_\alpha\Big(y_\alpha-\sum_{p=1}^{k} a_p x_{p\alpha}\Big)^2$

for variations of the $a_p$, when the $a_p$ are subject to one or more linear restrictions.
The object of this section is to give an explicit expression for the minimum of the sum of
squares, under such conditions, as a ratio of two determinants.

Let us consider the problem of finding the minimum of the sum of squares (a),
when the $a_p$ are subject to the linear restrictions

(b) $\displaystyle\sum_{p=1}^{k} c_{up}a_p = 0 \qquad (u = 1,2,\ldots,r < k).$

We shall use the method of Lagrange, 4.7, and write

(c) $F(a_1,a_2,\ldots,a_k;\lambda_1,\lambda_2,\ldots,\lambda_r) = S + 2\displaystyle\sum_{u=1}^{r}\lambda_u\Big(\sum_{p=1}^{k} c_{up}a_p\Big).$

It is necessary that

(d) $\dfrac{\partial F}{\partial a_q} = 0, \qquad (q = 1,2,\ldots,k),$

in order for S to have an extremum (in this case a minimum). Performing the differentia-
tion, these equations may be written as

(e) $-a_{0q} + \displaystyle\sum_{p=1}^{k} a_{pq}a_p + \sum_{u=1}^{r}\lambda_u c_{uq} = 0, \qquad (q = 1,2,\ldots,k),$

where $a_{pq}$, $a_{0q}$ (and $a_{00}$) are defined in (d) of 8.2. Multiplying each of (e) by $a_q$, summing
from q = 1 to k, and using (b), we get

(f) $\displaystyle\sum_q a_{0q}a_q = \sum_{p,q} a_{pq}a_pa_q.$

Expanding S, we obtain

(g) $S = a_{00} - 2\displaystyle\sum_q a_{0q}a_q + \sum_{p,q} a_{pq}a_pa_q,$

and making use of (f),

(h) $S = a_{00} - \displaystyle\sum_q a_{0q}a_q$

at the minimum. Rewriting (h) and (e) with $a_0 = 1$, and using the conditions (b), we obtain the following
homogeneous linear equations in the 1+k+r quantities $a_0, a_1, a_2,\ldots,a_k, \lambda_1,\ldots,\lambda_r$:

(i) $\begin{cases} (S-a_{00})a_0 + \sum_q a_{0q}a_q = 0 & \\ -a_{0q}a_0 + \sum_p a_{pq}a_p + \sum_u \lambda_u c_{uq} = 0, & (q = 1,2,\ldots,k) \\ \sum_p c_{up}a_p = 0, & (u = 1,2,\ldots,r). \end{cases}$

In order for these equations to have a non-vanishing solution the determinant
of the 1+k+r equations must satisfy the well-known condition of being 0, i. e.,

$$\begin{vmatrix} S-a_{00} & a_{01} & \cdots & a_{0k} & 0 & \cdots & 0 \\ -a_{01} & a_{11} & \cdots & a_{1k} & c_{11} & \cdots & c_{r1} \\ \vdots & & & & & & \vdots \\ -a_{0k} & a_{1k} & \cdots & a_{kk} & c_{1k} & \cdots & c_{rk} \\ 0 & c_{11} & \cdots & c_{1k} & 0 & \cdots & 0 \\ \vdots & & & & & & \vdots \\ 0 & c_{r1} & \cdots & c_{rk} & 0 & \cdots & 0 \end{vmatrix} = 0.$$

Treating the first column as a sum of two columns and employing the
usual rule for expressing the determinant as the sum of two determinants, we find the min-
imum value of S to be given by

$$S_{\min} = \frac{\Delta}{\Delta_{00}},$$

where

(ii) $\Delta = \begin{vmatrix} a_{00} & a_{01} & \cdots & a_{0k} & 0 & \cdots & 0 \\ a_{01} & a_{11} & \cdots & a_{1k} & c_{11} & \cdots & c_{r1} \\ \vdots & & & & & & \vdots \\ a_{0k} & a_{1k} & \cdots & a_{kk} & c_{1k} & \cdots & c_{rk} \\ 0 & c_{11} & \cdots & c_{1k} & 0 & \cdots & 0 \\ \vdots & & & & & & \vdots \\ 0 & c_{r1} & \cdots & c_{rk} & 0 & \cdots & 0 \end{vmatrix}$

and $\Delta_{00}$ is the minor of $a_{00}$ in $\Delta$.

It should be noted that the values of the $a_p$ and $\lambda_u$ which yield the extremum of
F (or the values of the $a_p$ which yield the minimum value of S) are given by the last k+r
linear equations in (i) with $a_0 = 1$.
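The determinant formula $S_{\min} = \Delta/\Delta_{00}$ can be checked against a direct solution of the linear system (i). A Python sketch (arbitrary simulated data and an arbitrary restriction):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, r = 30, 3, 1
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 1.0, n)
C = np.array([[1.0, 1.0, 1.0]])          # restriction: a1 + a2 + a3 = 0

a_pq = X.T @ X
a_0q = X.T @ y
a_00 = y @ y

# Minimum of S as a ratio of determinants, Delta / Delta_00:
Z = np.zeros((r, r))
Delta = np.block([[np.atleast_2d(a_00), a_0q[None, :], np.zeros((1, r))],
                  [a_0q[:, None],       a_pq,          C.T],
                  [np.zeros((r, 1)),    C,             Z]])
S_min = np.linalg.det(Delta) / np.linalg.det(Delta[1:, 1:])

# Check against the last k+r equations of (i) solved directly.
K = np.block([[a_pq, C.T], [C, Z]])
sol = np.linalg.solve(K, np.concatenate([a_0q, np.zeros(r)]))
a_res = sol[:k]                           # restricted coefficients
print(S_min, np.sum((y - X @ a_res) ** 2))   # the two values agree
```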



CHAPTER IX 



APPLICATIONS OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS 

In this chapter we shall consider some applications of normal regression theory,
together with the general significance test embodied in Theorem (A), 8.3, to certain
problems in the field of statistical analysis known as analysis of variance. This field
of analysis is due primarily to R. A. Fisher.

9.1 Testing for the Equality of Means of Normal Populations with the Same
Variance

Suppose $(y_{p\alpha})$, $\alpha = 1,2,\ldots,n_p$, p = 1,2,\ldots,k, are samples from $N(a_1,\sigma^2)$,
$N(a_2,\sigma^2)$, \ldots, $N(a_k,\sigma^2)$ respectively, and that it is desired to test the statistical
hypothesis $H(a_1 = a_2 = \cdots = a_k)$ specified as follows:

$\Omega$: $-\infty < a_p < \infty$, $\sigma^2 > 0$, p = 1,2,\ldots,k;

$\omega$: $a_p = a$, $-\infty < a < \infty$, $\sigma^2 > 0$, p = 1,2,\ldots,k.

In other words H is the statistical hypothesis that all of the samples are drawn from
normal populations with identical means, given that the populations are normal and have
equal variances. The probability element for the k samples is

(a) $\left[\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{p,\alpha}(y_{p\alpha}-a_p)^2}\right]\displaystyle\prod_{p,\alpha} dy_{p\alpha}, \qquad n = \sum_{p=1}^{k} n_p.$

Maximizing the likelihood function (i. e., the expression in [ ]) for variations of the
parameters over $\Omega$, we obtain

(b) $\hat a_p = \bar y_p, \qquad \hat\sigma_\Omega^2 = \dfrac{1}{n}\displaystyle\sum_{p,\alpha}(y_{p\alpha}-\bar y_p)^2,$

where $\bar y_p = \frac{1}{n_p}\sum_\alpha y_{p\alpha}$, the mean of the y's in the p-th sample. Maximizing the likelihood
for variations of the parameters over $\omega$ we find






(c) $\hat a = \bar y, \qquad \hat\sigma_\omega^2 = \dfrac{1}{n}\displaystyle\sum_{p,\alpha}(y_{p\alpha}-\bar y)^2,$

where $\bar y = \frac{1}{n}\sum_{p,\alpha}y_{p\alpha}$.

(d) $q_1$ and $q_2$ of Theorem (A), 8.3, are as follows:

$$q_1 = \frac{1}{\sigma^2}\sum_{p,\alpha}(y_{p\alpha}-\bar y_p)^2, \qquad q_2 = \frac{1}{\sigma^2}\sum_{p=1}^{k} n_p(\bar y_p-\bar y)^2.$$

Assuming $H(a_1 = a_2 = \cdots = a_k)$ is true (i. e., that $a_1 = a_2 = \cdots = a_k$), it follows from Theorem
(A), 8.3, that $q_1$ and $q_2$ are independently distributed according to $\chi^2$-laws with n - k
and k - 1 degrees of freedom respectively. Hence

(e) $F = \dfrac{(n-k)q_2}{(k-1)q_1}$

is distributed according to $h_{k-1,n-k}(F)dF$.

To see exactly how this problem is an application of Theorem (A), the reader
should refer to 8.41, Case 1. It will be noted that the set of k samples $O_{n_1}, O_{n_2}, \ldots,
O_{n_k}$ can be regarded as a single sample of size n ($n = \sum_p n_p$) from a population with dis-
tribution $N(\sum_{p=1}^{k} a_p x_p,\, \sigma^2)$, where $x_1 = 1$, $x_2 = \cdots = x_k = 0$ for observations in $O_{n_1}$; $x_2 = 1$,
$x_1 = x_3 = \cdots = x_k = 0$ for observations in $O_{n_2}$; and so on. The hypothesis is that all the $a_p$ are
equal, i. e., that the $a_p$ satisfy the k - 1 linearly independent restrictions
$a_q - a_1 = 0$, q = 2,3,\ldots,k.

9.2 Randomized Blocks or Two-way Layouts

Suppose $y_{ij}$, (i = 1,2,\ldots,r; j = 1,2,\ldots,s), are random variables independently
distributed according to $N(m+R_i+C_j,\, \sigma^2)$, where $\sum_{i=1}^{r} R_i = \sum_{j=1}^{s} C_j = 0$, and that we wish to test
on the basis of the $y_{ij}$ the hypothesis $H[(C_j) = 0]$ specified as follows:

$\Omega$: $-\infty < m, R_i, C_j < \infty$, $\sigma^2 > 0$, i = 1,2,\ldots,r; j = 1,2,\ldots,s,
subject to $\sum_i R_i = \sum_j C_j = 0$;

$\omega$: the subspace in $\Omega$ obtained by setting each $C_j = 0$.

The $\omega$ space is simply the subspace in $\Omega$ for which the $C_j$ are all 0. The probability ele-
ment for the sample (i. e., the $y_{ij}$) is

(a) $\left[\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^{rs} e^{-\frac{1}{2\sigma^2}\sum_{i,j}(y_{ij}-m-R_i-C_j)^2}\right]\displaystyle\prod_{i,j} dy_{ij}.$

The sum of squares in the exponent of (a) may be written as

(b) $S = \displaystyle\sum_{i,j}(y_{ij}-m-R_i-C_j)^2 = \sum_{i,j}(y_{ij}-\bar y_{i.}-\bar y_{.j}+\bar y)^2 + s\sum_i(\bar y_{i.}-\bar y-R_i)^2 + r\sum_j(\bar y_{.j}-\bar y-C_j)^2 + rs(\bar y-m)^2,$

where $\bar y = \frac{1}{rs}\sum_{i,j}y_{ij}$, $\bar y_{i.} = \frac{1}{s}\sum_j y_{ij}$, $\bar y_{.j} = \frac{1}{r}\sum_i y_{ij}$. Maximizing the likelihood function in [ ]
(which is equivalent to minimizing S as far as m, the $R_i$ and $C_j$ are concerned) for varia-
tions of the parameters over $\Omega$, we find

$$\hat m = \bar y, \qquad \hat R_i = \bar y_{i.}-\bar y, \qquad \hat C_j = \bar y_{.j}-\bar y, \qquad \hat\sigma_\Omega^2 = \frac{1}{rs}\sum_{i,j}(y_{ij}-\bar y_{i.}-\bar y_{.j}+\bar y)^2.$$



Maximizing the likelihood function for variations of the parameters over $\omega$ we simply set
each of the $C_j$ equal to zero and maximize for variations of $\sigma^2$, m, $R_i$ (subject to $\sum_i R_i = 0$).
We find

$$\hat m = \bar y, \qquad \hat R_i = \bar y_{i.}-\bar y, \qquad \hat\sigma_\omega^2 = \frac{1}{rs}\Big[\sum_{i,j}(y_{ij}-\bar y_{i.}-\bar y_{.j}+\bar y)^2 + r\sum_j(\bar y_{.j}-\bar y)^2\Big].$$

It may be readily verified that $q_1$ and $q_2$ of Theorem (A), 8.3, are given by

$$q_1 = \frac{rs\hat\sigma_\Omega^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i,j}(y_{ij}-\bar y_{i.}-\bar y_{.j}+\bar y)^2, \qquad q_2 = \frac{rs(\hat\sigma_\omega^2-\hat\sigma_\Omega^2)}{\sigma^2} = \frac{r}{\sigma^2}\sum_j(\bar y_{.j}-\bar y)^2.$$

It follows from Theorem (A), 8.3 (see Case 3, 8.43), that $q_1$ and $q_2$ are inde-
pendently distributed according to $\chi^2$-laws with (r-1)(s-1) and (s-1) degrees of freedom
respectively, when $H[(C_j) = 0]$ is true. Hence, under the same conditions,

$$F = \frac{(r-1)q_2}{q_1}$$

is distributed according to $h_{(s-1),(r-1)(s-1)}(F)dF$, and is equivalent to the likelihood
ratio criterion for testing $H[(C_j) = 0]$, using the upper tail of the distribution for ob-
taining critical values of F for given significance levels.

In an entirely similar manner we may derive an F test for the hypothesis
$H[(R_i) = 0]$ defined as follows:

$\Omega$: same as $\Omega$ in the definition of $H[(C_j) = 0]$;

$\omega$: the subspace in $\Omega$ obtained by setting each $R_i = 0$.

Following steps similar to those followed for $H[(C_j) = 0]$, we find for $H[(R_i) = 0]$

(c) $F = \dfrac{(s-1)q_2'}{q_1}, \qquad q_2' = \dfrac{s}{\sigma^2}\displaystyle\sum_i(\bar y_{i.}-\bar y)^2,$

which will be distributed according to $h_{(r-1),(r-1)(s-1)}(F)dF$ when $H[(R_i) = 0]$ is true.

The applicability of Theorem (A), 8.3, in testing $H[(C_j) = 0]$ is evident when it
is noted that under $\Omega$ the $y_{ij}$ can be regarded as a sample of size rs from a population
having a distribution of the form $N(\sum_{p=1}^{r+s+1} a_p x_p,\, \sigma^2)$ in which there are two homogeneous lin-
ear conditions on the $a_p$ (the $a_p$ being written in place of the m, $R_i$, $C_j$, and each x having
the value 0 or 1), whereas under $\omega$ there would be s + 1 linear conditions on the $a_p$ (or
s - 1 linear conditions in addition to those already imposed under $\Omega$). Both $H[(C_j) = 0]$
and $H[(R_i) = 0]$ come under Case 3, 8.43.

If the $y_{ij}$, i = 1,2,\ldots,r; j = 1,2,\ldots,s, are considered in a rectangular array
with i referring to rows and j to columns, then it will be seen that we are assuming that
$y_{ij}$ is a normally distributed random variable with a mean which is the sum of three parts:
a general constant m, a specific constant $R_i$ associated with the i-th row and a specific
constant $C_j$ associated with the j-th column (where $\sum_{i=1}^{r} R_i = \sum_{j=1}^{s} C_j = 0$). The variance is
assumed to be independent of i and j. Statistically speaking, $R_i$ is often referred to as the
effect (or main effect) due to the i-th row, and $C_j$ the effect (or main effect) due to the
j-th column. $H[(R_i) = 0]$ is therefore the hypothesis that row effects are zero no matter
what the values of m and the column effects. The quantity $\sum_{i,j}(y_{ij}-\bar y_{i.}-\bar y_{.j}+\bar y)^2$ is often re-
ferred to as "error" or "residual" sum of squares after row and column effects are re-
moved, and when divided by (r-1)(s-1) the resulting expression provides an unbiased esti-
mate of $\sigma^2$ no matter what the values of m, the $R_i$ and $C_j$. $s\sum_i(\bar y_{i.}-\bar y)^2$ is usually referred
to as the sum of squares due to rows, and when divided by r - 1, the resulting quotient pro-
vides an unbiased estimate of $\sigma^2$ (and, as we have seen, independent of that obtained by
using "error" sum of squares) if the $R_i = 0$, no matter what values the $C_j$ and m may have.
A similar statement holds for $r\sum_j(\bar y_{.j}-\bar y)^2$. It can be shown by Cochran's Theorem and by
the use of moment generating functions, although we shall not do so here, that

$$\frac{1}{\sigma^2}\sum_{i,j}(y_{ij}-\bar y_{i.}-\bar y_{.j}+\bar y)^2, \qquad \frac{s}{\sigma^2}\sum_i(\bar y_{i.}-\bar y)^2, \qquad \frac{r}{\sigma^2}\sum_j(\bar y_{.j}-\bar y)^2$$

are independently distributed according
to $\chi^2$-laws with (r-1)(s-1), (r-1), (s-1) degrees of freedom respectively, if the $R_i$ and $C_j$






are all zero, and furthermore the sum of the three quantities is $\frac{1}{\sigma^2}\sum_{i,j}(y_{ij}-\bar y)^2$, which
is distributed according to the $\chi^2$-law with rs-1 degrees of freedom if each $R_i$ and each
$C_j$ is zero.



These various sums of squares together with their degrees of freedom are com-
monly set forth in an analysis of variance table as follows:

Variation Due to | Sum of Squares | Degrees of Freedom
Rows | $S_R = s\sum_i(\bar y_{i.}-\bar y)^2$ | r - 1
Columns | $S_C = r\sum_j(\bar y_{.j}-\bar y)^2$ | s - 1
Error | $S_E = \sum_{i,j}(y_{ij}-\bar y_{i.}-\bar y_{.j}+\bar y)^2$ | (r-1)(s-1)
Total | $S = \sum_{i,j}(y_{ij}-\bar y)^2$ | rs - 1


The main facts regarding the constituents of this Table may be summarized as follows:

(1) $S = S_R + S_C + S_E$.

(2) $S_R/\sigma^2$, $S_C/\sigma^2$, $S_E/\sigma^2$ are independently distributed according to $\chi^2$-laws with (r-1),
(s-1), (r-1)(s-1) degrees of freedom respectively if all $R_i$ and $C_j$ are zero.

(3) $F = \dfrac{(s-1)S_R}{S_E}$ is distributed according to $h_{(r-1),(r-1)(s-1)}(F)dF$ when $H[(R_i) = 0]$ is
true.

(4) $F = \dfrac{(r-1)S_C}{S_E}$ is distributed according to $h_{(s-1),(r-1)(s-1)}(F)dF$ when $H[(C_j) = 0]$ is
true.

(5) $S_E/\sigma^2$ is distributed according to the $\chi^2$-law with (r-1)(s-1) degrees of freedom for
any parameter point in $\Omega$ (i. e., no matter what values the $R_i$ and $C_j$ may have).

(6) $S/\sigma^2$ is distributed according to the $\chi^2$-law with rs-1 degrees of freedom if all $R_i$
and $C_j$ are zero.


The theory discussed in this section has been widely used in what are called
designed experiments, particularly in agricultural science. For example, rows in our
rectangular array may be associated with r different varieties of wheat, columns with s
different types of fertilizer, and $y_{ij}$ with the yield of wheat on the plot of soil associ-
ated with the i-th variety and j-th fertilizer, it being assumed that plots are of the
same size and the soil homogeneous for all plots. In such an application, we emphasize
that the fundamental assumptions are that the yield on the plot associated with the i-th
variety and the j-th fertilizer may be regarded as a normally distributed random variable
having mean value of the form $m + R_i + C_j$ (where $\sum_i R_i = \sum_j C_j = 0$), and a variance $\sigma^2$
which has the same value for all i and j. The question of whether the assumptions are
tenable in any given case is one for the individual applying the method to settle. In
this example $H[(C_j) = 0]$ would be the hypothesis that fertilizer effects on yield are all
equal (i. e., all zero, since $\sum_j C_j = 0$), no matter what the variety effects may be.
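The quantities of the two-way analysis of variance table reduce to a few array operations. A Python sketch (an arbitrary simulated layout, with the $C_j$ taken to be zero so that $H[(C_j)=0]$ is true):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
r, s = 4, 5
R = np.array([-0.5, 0.5, -0.2, 0.2])        # row effects, summing to 0
y = 1.0 + R[:, None] + rng.normal(0, 1.0, size=(r, s))   # C_j = 0 here

ybar = y.mean()
yi = y.mean(axis=1, keepdims=True)          # row means ybar_i.
yj = y.mean(axis=0, keepdims=True)          # column means ybar_.j

S_R = s * np.sum((yi - ybar) ** 2)
S_C = r * np.sum((yj - ybar) ** 2)
S_E = np.sum((y - yi - yj + ybar) ** 2)

F_rows = (s - 1) * S_R / S_E                # fact (3): h_{r-1,(r-1)(s-1)} under H[(R_i)=0]
F_cols = (r - 1) * S_C / S_E                # fact (4): h_{s-1,(r-1)(s-1)} under H[(C_j)=0]
print(F_rows, stats.f.sf(F_rows, r - 1, (r - 1) * (s - 1)))
print(F_cols, stats.f.sf(F_cols, s - 1, (r - 1) * (s - 1)))
```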

9.3 Three-way and Higher Order Layouts; Interaction

The analysis presented in 9.2 can be extended to three-way and higher order
layouts. In this section we shall consider in detail the three-way layout. Let $y_{ijk}$
(i = 1,2,\ldots,r; j = 1,2,\ldots,s; k = 1,2,\ldots,t) be random variables distributed indepen-
dently according to

(a) $N(m+T_{ijk},\, \sigma^2),$

where

(b) $T_{ijk} = I_{ij0} + I_{i0k} + I_{0jk} + I_{i00} + I_{0j0} + I_{00k},$

where each set of I's on the right hand side of (b) is such that when summed over each index
the sum is zero. Thus there are (r-1)(s-1) linearly independent constants in the set
$\{I_{ij0}\}$, (r-1) such constants in the set $\{I_{i00}\}$, with similar statements holding for the re-
maining sets. For convenience, we may consider $y_{ijk}$ as a random variable associated with
the cell in the i-th row, j-th column and k-th layer of a three-dimensional rectangular
array of cells. The mean value of $y_{ijk}$ is given in (a), in which the $I_{i00}$, the $I_{0j0}$, and
the $I_{00k}$ are row, column and layer main effects, respectively; the $I_{ij0}$ are row-column
interactions, the $I_{i0k}$ row-layer interactions, and the $I_{0jk}$ are column-layer interactions.*

The probability element of the $y_{ijk}$ is

(c) $\left[\left(\dfrac{1}{\sqrt{2\pi}\,\sigma}\right)^{rst} e^{-\frac{1}{2\sigma^2}\sum_{i,j,k}(y_{ijk}-m-T_{ijk})^2}\right]\displaystyle\prod_{i,j,k} dy_{ijk}.$

The sum of squares in the exponent of (c) is

(d) $S = \displaystyle\sum_{i,j,k}(y_{ijk}-m-T_{ijk})^2.$

Now let

$\bar y = \frac{1}{rst}\sum_{i,j,k}y_{ijk},$

$\bar y_{i..} = \frac{1}{st}\sum_{j,k}y_{ijk}$, with similar meanings for $\bar y_{.j.}$ and $\bar y_{..k}$,

(e) $\bar y_{ij.} = \frac{1}{t}\sum_{k}y_{ijk}$, with similar meanings for $\bar y_{i.k}$ and $\bar y_{.jk}$,

$Y_{i..} = \bar y_{i..}-\bar y$, with similar meanings for $Y_{.j.}$ and $Y_{..k}$,

*These are called first-order interactions.




$Y_{ij.} = \bar y_{ij.}-\bar y_{i..}-\bar y_{.j.}+\bar y$, with similar meanings for $Y_{i.k}$ and $Y_{.jk}$,

$S_{...} = \sum_{i,j,k}(y_{ijk}-\bar y_{ij.}-\bar y_{i.k}-\bar y_{.jk}+\bar y_{i..}+\bar y_{.j.}+\bar y_{..k}-\bar y)^2,$

$S_{..0} = t\sum_{i,j}(Y_{ij.}-I_{ij0})^2$, with similar meanings for $S_{.0.}$ and $S_{0..}$,

$S_{.00} = st\sum_{i}(Y_{i..}-I_{i00})^2$, with similar meanings for $S_{0.0}$ and $S_{00.}$,

$S_{000} = rst(\bar y-m)^2.$

Let $S^0_{..0}$ be the value of $S_{..0}$ with each $I_{ij0} = 0$, with similar meanings for
$S^0_{.0.}$, $S^0_{0..}$, $S^0_{.00}$, $S^0_{0.0}$, $S^0_{00.}$.

We may write S as the sum over i, j, k of the square of

$$[(y_{ijk}-\bar y_{ij.}-\bar y_{i.k}-\bar y_{.jk}+\bar y_{i..}+\bar y_{.j.}+\bar y_{..k}-\bar y) + (Y_{ij.}-I_{ij0}) + (Y_{i.k}-I_{i0k}) + (Y_{.jk}-I_{0jk}) + (Y_{i..}-I_{i00}) + (Y_{.j.}-I_{0j0}) + (Y_{..k}-I_{00k}) + (\bar y-m)].$$

Squaring the quantity in [ ], keeping the expressions within the parentheses intact, and
summing with respect to i, j, k, we obtain

(g) $S = S_{...} + S_{..0} + S_{.0.} + S_{0..} + S_{.00} + S_{0.0} + S_{00.} + S_{000}.$

It follows from Cochran's Theorem, 5.24 (and can also be shown by moment-generating
functions) that the eight sums of squares on the right side of (g), each divided by $\sigma^2$,
are independently distributed according to $\chi^2$-laws with (r-1)(s-1)(t-1), (r-1)(s-1), (r-1)
(t-1), (s-1)(t-1), (r-1), (s-1), (t-1), 1 degrees of freedom, respectively, if the $y_{ijk}$ are
distributed according to (a).
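The decomposition (g) may be computed by successive averaging over the indices. A Python sketch (an arbitrary simulated three-way layout with all effects zero) forms, for example, the error, row-column interaction and row sums of squares:

```python
import numpy as np

rng = np.random.default_rng(11)
r, s, t = 3, 4, 5
y = rng.normal(size=(r, s, t))               # all effects zero in this example

g = y.mean()
yi = y.mean(axis=(1, 2))[:, None, None]      # ybar_i..
yj = y.mean(axis=(0, 2))[None, :, None]      # ybar_.j.
yk = y.mean(axis=(0, 1))[None, None, :]      # ybar_..k
yij = y.mean(axis=2)[:, :, None]             # ybar_ij.
yik = y.mean(axis=1)[:, None, :]             # ybar_i.k
yjk = y.mean(axis=0)[None, :, :]             # ybar_.jk

S_err = np.sum((y - yij - yik - yjk + yi + yj + yk - g) ** 2)  # S..., (r-1)(s-1)(t-1) d.f.
S_rc  = np.sum((yij - yi - yj + g) ** 2)     # S0_..0, (r-1)(s-1) d.f.
S_r   = np.sum((yi - g) ** 2)                # S0_.00, (r-1) d.f. (sum over all i,j,k)
total = np.sum((y - g) ** 2)
print(S_err, S_rc, S_r, total)
```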

The sums of squares in (g) provide the basis for testing various hypotheses con-
cerning the interactions $I_{ij0}$, $I_{i0k}$, $I_{0jk}$ and the main effects $I_{i00}$, $I_{0j0}$, $I_{00k}$. For ex-
ample, suppose we wish to test the hypothesis that row-column interaction is zero (i. e.,
each $I_{ij0} = 0$) no matter what the row-layer and column-layer interactions and main effects
may be. This hypothesis, say $H[(I_{ij0}) = 0]$, may be specified as follows:

(h) $\Omega$: $-\infty < m, I_{ij0}, I_{i0k}, I_{0jk}, I_{i00}, I_{0j0}, I_{00k} < \infty$, $\sigma^2 > 0$,
for all i, j, k, the sum of the I's in each set over any
index being 0;

$\omega$: subspace of $\Omega$ obtained by setting each $I_{ij0} = 0$.

Maximizing the likelihood in (c) for variations of the parameters over $\Omega$, we find

$$rst\hat\sigma_\Omega^2 = S_{...},$$

and maximizing the likelihood for variations of the parameters over $\omega$, we find

$$rst\hat\sigma_\omega^2 = S_{...} + S^0_{..0}.$$

It should be noted that in maximizing the likelihood over $\Omega$ we obtain as maxi-
mum likelihood estimates of $I_{ij0}$, $I_{i0k}$, $I_{0jk}$, $I_{i00}$, $I_{0j0}$, $I_{00k}$ the quantities $Y_{ij.}$, $Y_{i.k}$,
$Y_{.jk}$, $Y_{i..}$, $Y_{.j.}$, $Y_{..k}$, and of m the quantity $\bar y$.

When the hypothesis $H[(I_{ij0}) = 0]$ is true it follows from Theorem (A), 8.3 (see
Case 3, 8.43), that

$$q_1 = \frac{S_{...}}{\sigma^2}, \qquad q_2 = \frac{S^0_{..0}}{\sigma^2}$$

are independently distributed according to $\chi^2$-laws with (r-1)(s-1)(t-1) and (r-1)(s-1)
degrees of freedom, respectively. Hence the F-ratio for testing this hypothesis is

$$F = \frac{(t-1)\,S^0_{..0}}{S_{...}},$$

which is distributed according to $h_{(r-1)(s-1),(r-1)(s-1)(t-1)}(F)dF$ when $H[(I_{ij0}) = 0]$ is
true. In a similar manner F-ratios can be set up for testing the hypothesis of zero row-
layer or zero column-layer interaction.

The constituents in (g) also provide a method of testing the hypothesis of no
interaction between rows and columns in a two-way layout from t (t $\ge$ 2) replications of the
layout. This hypothesis amounts to the hypothesis that effects due to rows and columns
are additive on the mean value of the $y_{ijk}$, in which case the mean value of $y_{ijk}$ is of the
form $m + I_{i00} + I_{0j0}$. In this problem we consider $y_{ijk}$ (i = 1,2,\ldots,r; j = 1,2,\ldots,s) as
the variables associated with the k-th replicate, and assume the mean value of $y_{ijk}$ to be
$m + I_{ij0} + I_{i00} + I_{0j0}$. The problem is to test the hypothesis that each $I_{ij0} = 0$. This
hypothesis, which will be called $H'[(I_{ij0}) = 0]$, is specified as follows:

$\Omega$: $-\infty < m, I_{ij0}, I_{i00}, I_{0j0} < \infty$, $\sigma^2 > 0$, for each i and
j, where the sum of the I's in each set over each index
is 0;

$\omega$: the subspace of $\Omega$ obtained by setting each $I_{ij0} = 0$.




Maximizing the likelihood function in (c) for variations of the parameters over $\Omega$, we
find

$$rst\hat\sigma_\Omega^2 = S_{...} + S^0_{.0.} + S^0_{0..} + S^0_{00.},$$

and similarly

$$rst\hat\sigma_\omega^2 = rst\hat\sigma_\Omega^2 + S^0_{..0}.$$

By Theorem (A), 8.3, it follows that

$$q_1 = \frac{S_{...}+S^0_{.0.}+S^0_{0..}+S^0_{00.}}{\sigma^2}, \qquad q_2 = \frac{S^0_{..0}}{\sigma^2}$$

are independently distributed according to $\chi^2$-laws with rs(t-1) and (r-1)(s-1) degrees of
freedom, respectively, when $H'[(I_{ij0}) = 0]$ is true, and hence under the same assumptions

(p) $F = \dfrac{rs(t-1)\,S^0_{..0}}{(r-1)(s-1)\big(S_{...}+S^0_{.0.}+S^0_{0..}+S^0_{00.}\big)}$

is distributed according to $h_{(r-1)(s-1),\,rs(t-1)}(F)dF$.

In a similar manner, the existence of second-order interaction in a three-way
layout may be tested on the basis of replications of the three-way layout. This problem,
however, leads us into four-way layouts and the details must be left to the reader.

Suppose we are interested in testing the hypothesis H[(I loo )=o] that the I i 
no matter what the interactions and main effects due to columns and layers may be. 
This hypothesis may be specified as follows: 

n: | Same as fl in (h). 

(q) 



I Subspace of il for which each I.i oo = 0. 



We have 



and 



and hence by Theorem (A), 8.3 , 



69. S IX> APPLICATION OF NORMAL REPRESSION THEORY TO ANALYSIS OP VARANCE PRnRT.EVS 185 



rat 



IV ^2 

are independently distributed according to ~^~ laws with (r-1 )(s-l )(t-l ) and (r-1) degrees 
of freedom, respectively, when H[(I loo )=o], is true, and the P-ratio for the hypothesis is 



which is distributed according to h, ,v . . Wa , w . , A (P)dF. Similar tests exist for 

\ r- 1 ; , ( r- 1 ; \ s- i )( L- i ; 

testing the hypothesis that the I . = o or that the I ook = 0. 

Suppose the interactions I.. _. , I^**., and !.,_,, are all zero and that it is de- 

XJO OJiv lOcC 

sired to test the hypothesis that the main effects due to rows are 0, i. e., I, 0. 

This hypothesis say H 1 f(I ioo ) a= 0] may be specified as follows: 

f 

|-oo < m, I 10Q , I ojo , I Qok < co, a 2 > 0, 



We find 



and hence 

<li - 



1 

v 

f 

j: I Subspace of H obtained by setting each 1^ = 0. 



.. 



s?. + S . + s.. 



rst(<r 2 -o^) & 

(jj LL *OO 

q 2 - 2" ' 

cr or 

which are distributed independently according to -x^-l&wa with rst -r-s-t + 2 and 
r - 1 degrees of freedom, respectively, when H! t(I ioo ) sss O] is true. The P-ratio is 

(rst-r-s-t + 2) S 



which has the distribution h (r-1 ^ ( r3 t- r -s-t+2) (F)dF ^ when H f [d loo )-o] is true. 

'The difference between the F-ratio for testing H[(I^ )-0] and that for testing 

VVflt" 

H'[(I loo )-o] should be noted. In the first hypothesis the interactions are /assumed to be 



THPIHTTY TO ANALYSTS Off VARTJ 



town zero, and in the second one the interactions are assumed to be zero. The 
sum of squares in the denominator of F for the first hypothesis is simply 3. . . while it 
is 3... + S. + S Qt + S. % in the F for the second hypothesis. The terms S? tQ , S Ot > 
SQ are conmonly known as interaction sums of squares, and the process of adding these to 
the error sum of squares S. m % in the case of testing H f I(Ii oo )"l *- a often referred to aa 
confounding first -order interactions with error. Of course, the hypothesis may be set 
up in such a way that only two ( or even only one) of the Interaction sum of squares will 
be confounded with error. The term confounding as it is commonly used is more general 
than it is in the sense used above. For example, if layer effects (IQQ^) are assumed to 
be zero throughout the hypothesis specified by (s) we would have found not only all first- 
order interaction sum of squares but also layer effect sum of squares 3^ Qv confounded witt 

s.... 

There are many hypotheses which can be tested on basis of the S's on the rigit 
hand side of (g), and we shall make no attempt to catalogue them here. It is perhaps 
sufficient to sumnarize the constituents of the various possible tests in the following 
analysis of variance table (the 21 extending over all values of 1, j, k in each case): 



Variation Due To 


Sum of Squares 


Degrees of 
Freedom 


Rows 


s V Z( yi--y )S 


r - 1 


Columns 


S, -Z(y, Jf -y) 2 


B - 1 


Layers 


s ,-Z(y f , k -y) 8 


t - 1 


Row- Column Interaction 


S , - Z(y j . -y , r -7, j , +y ) 2 


(r-O(s-V) 


Row-Layer Interaction 


3. o ."" ^-(yi-ic"^* ."y. k 4 ^^ 


(r-i)(t-0 


Column-Layer Interaction 


^o- 1 ^^f Jk"^f J?"^f fk" 1 ^ 


(3-1)(t-1) 


Error 


3 ...- z <yijk-yuryi f k-y.jk + yi. .*y.j,*y,,ic-y) 2 


(r-D(s^)(t-i) 


Total 


^ -*yijk-y> 2 


rst - 1 



9 A 4 latin Squares 

Suppose y t j (1,J - 1,2, ...,r) are random variables distributed according to 

*. 5 j r g 

N(m + R * Cj + I t ^cO,where ^I R i - ^Cj ^T t - o, such that each T t occurs in con- 
junction with each R^ once and only once, and with each C^ once and only onoe, each IL 
occurring once and only once in conjunction with each Cj. Such an arrangement of combina- 
tions of attributes is known as a Latin Square arrangement. For a given r there are namy 



IX. APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OP VARIANCE PROBLEMS 1fi7 



such arrangements, each of which can be represented in a square array in which the R 1 
would be row effects > the C* column effects and the T^. treatment effects. For example, 
when r - k, the following is a Latin Square arrangement of row, column and treatment 
effects : 



Fisher and Yates (Statistical Tables , Oliver and Boyd, Edinburgh, 1938) have tabulated 
Latin Squares up to size 12 by 12. 

Now consider the following hypothesis, say H[(T t )-o],to be tested on basis of 
the sample 



n: 



u>: iSubspace In SI obtained by setting each TV 0. 

In other words, we wish to test the hypothesis that the T fc are all zero, assuming that th 
jj are distributed according to Nfm+RC-fTo 2 ). The probability element of the y Is 




(a) 



The sum of squares S In the exponent nay be written as 



1,J ^ 



r 2 (y-m) 2 , 



where 



186 IX. APPLICATION OP NORMAL REGRESSION THEORY TO ANALYSIS OP VARIANCE PRnRT.BM* 



where y - l g ^Ty^ y A . - ^Zy^, y. j - ^Zy^ and y (t) - y^, denoting 
summation over'all cells (1 and j) In the Latin Square array in which T t occurs. Let S^ 
be the value of Sp when the R, 0, with similar meanings for SQ and s. 

... . . IHG, llkellhood function in (a) for variations of the parameters over 



Hwe find 



3j-y.j.-y. T t-y ( t)-y- 



Maximizing the likelihood for variations of the parameters over u>, we aet T^.= o 
(t - l,2,...,r) and maximize for variations of m, 1^, C,, a 2 (21^ = Z.Cj - 0). We find 

^- J 



J 

ined by maximizing over .Q, and 



J to be the same as those obtained by maximizing over .Q, and 



(c) 






It follows from Theorem (A), 8.3, (see Case 3, 8.1*3) that q 1 and q. are inde- 
distributed according to the x 2 "l awa 
dom respectively when H[(T t )-o] is true, where 



pendently distributed according to the X 2 "l awa with (r-1 )(r-2) and (r-1 ) degrees of free- 



and hence 

(x-2)q p (r-2)sg 
(d) F -- - - ^^ -- ^ 



is distributed according to h/ ^ Ur-i )(r-2)^^^ anc * ^ 3 e Q u:I - va l ent to ^ he livelihood ratio 
ratio criterion for testing H[(T t )=o], it being understood, of course, that' critical values 
of F for a given significance level are obtained by using the upper tail of the F distri- 
bution. 

In a similar manner, if HKR^J-O] denotes the hypothesis for which A is identi- 
cal with that for H[(T t )0] while co is the subspace in ilfor which R 1 R ? , . . R^ *o, 



IX. APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS 1 89 



then we obtain for P the following: 

(e) P = 



which is distributed according to h/ 1 ^ ( r -2)(r-i )^ F ^ dF when HKR^JaQ] is true. 

An entirely similar hypothesis, say H[(C*)-0] may be defined by considering c*> 

j 

as the subspace in JCL for which C 1 C 2 = ... C o, and an P similar to (e) is ob- 
tained with the same distribution as that of P defined by (e). 

We may summarize in the following analysis of variance table 



Variation Due to 


Sum of Squares 


Degrees of Freedom 


Rows 


ag-rZtJFi.-y) 8 


r - 1 


Columns 


s = r]T(y. r y) 2 


r - 1 


Treatments 


s = r]My (t) -y) 2 


r - i 


Error 


SE-W*i--^<t^ 8 


(r-1 )(!->) 


Total 


s = Zjy 1%j -y) 2 


r 2 - i 



The main properties relating to the constituents of this table are the 



following: 



2 2 2 



(2) s2/cr, s2/<j, s/or, S E /cr are independently distributed according to 



with 



r-1, r-1, r-i, (r-i)(r-?) degrees of freedom respectively, when all R, , C. and T+ 

J- J ^ 



are zero. 



(r-i)sg 

(3) p - - gJl is distributed according to h (r _ 1 j (r _ 1 ^ r . 2 j(P)dP when 



la true. 



(4) p - k ig distributed according to h/ r-1 j / p-1 )( r .o)( F ) dF when Ht(CJo] la trua 



g 



(5) 



_ 



distributed accordlng to h/ r-1 ^ / p-1 N/ r-? x(F)dP when H[(T t )0] la true, 



(6) S E /CT is distributed according to the x~ law with (r-i)(r-p) degrees of freedom for 
any parameter point in/1 (i. e. no matter what values m and the R^, C-, T t may have). 

^ P P 

(7) S/cr is distributed according to the x -law with r"-l degrees of freedom when all R^, 

C - and T t are zero. 



1 QQ "DC. APFLTCATTHN OP NORMAL RTMffreflflTnN THEORY TO ANALYSIS OP VAPTAWra PttflHTJUfl 



The reader will find it initructlve to wrtfy ttmt Sg la the lmliHijm of 

-Cj-T^) for variations of the m, R< , C* and T+ subject to the restrictions 

ij" J " JJ "*' x J t v 

R^ - j Cj - ^ T t - o,and can be obtained by applying formula (k) of 8.5, noting 

that all x are or 1 . 

As in the case of two- and three-way and higjier order layouts, Latin Square lay- 
outs have been widely used in agricultural experiments. For example in studying the ef- 
fects of r types of fertilizer on yields of a certain variety of wheat, it is common to 
lay out a square array of r 2 plots of equal area and to associate row and column effects 
with variations In fertility of soil and associate treatments with different fertilizers. 
The main assumption In such an application is that variation In fertility of soil from 
plot to plot is such that yield on the plot in the i-th row and j-th column may be re- 
garded as a normally distributed random variable y, , with mean value of the form m + R, 

+ C, + T., (where 21 H, - X CM - X!lV 0, T f being the effect of the t-th treatment) 
J t 1 j J t t t 

and variance <r which is the same for all plots. 

Latin Square lay-outs have also been tried out in other fields, for example in 
industrial research. 

9*5 Graeco-Iatin Squares 

Higher order Latin Squares, known as Graeco-Latln Squares may be treated in much 
the same manner as Latin Squares. A Graeco-Latln square involving, for example, a four- 
way classification may be defined as follows: Let fc^l, fp^, |7 1 I, M 1 1, 1 - 1 ,2, . . .,r, 
be four sets of mutually exclusive attributes. Let r 2 objects be arranged in such a way 
that r of the objects have attribute a., r have attribute p^, r have attribute "y.., and 
r have 1 attribute <5^, i i,2,...,r, and in such a way that exactly one object has the 
combination of attributes (a , pj, 1, j-1 ,2,. . .,r, exactly one has the combination 

(a^^and so on for each of the combinations (ct 1 ,d.), (p..,*^), (P****), 0^,61). We may 
* J ^-j^j^-j^-J 

conveniently allow the a^ to refer to rows, p* to columns, >. to treatments in an ordinary 
Latin Square and let 6^ refer to the fourth classification. Let y^ (i,j - l,2,...,r) be 
random variables distributed according to NCm+R.+Cj+T..-*.!! ,cr 2 ) where R<, C^, T+, II, are 

r JL* Y*^tJ"** 

effects due to a., p ., 7,., <J , and where 2~R< C< Z~T^ 5_ UL - o. As a matter 
ijtu 1 i i jYt^ r u 

of fact, we may consider the four-way classification Graeco-Latln square as a superposi- 
tion of the two Latin squares fc^l, fp.^1, (7 1 I and |a l, fp^J, \6 \, the a and p re- 
ferring to rows and columns in both cases, the i as treatments in the first Latin square, 
and 6^ as treatments on the second Latin square, such that when the two Latin squares are 
superimposed each 1^ will occur with each 6^ exactly once. Two Latin squares which have 
this property are said to be orthogonal. A set of r - 1 mutually orthogonal Latin squares 



$9.5 IX. APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PRC 

la gald to form a complete set of mutually orthogonal Latin squares and vtfien superimposed 
would form an (r+1 )-way classification Graeco-Latin square. Complete seta of orthogonal 
Latin squares exist when r is a prime integer and also for certain other values, e. g. 
r = **,8,9. The sum of squares S in the likelihood function is 



and may be written as 



where 



r 2 (y-m) 2 , 



where y^ , y ., y, y/ t x are as defined in 9.^ and y. , is the average of all y^, having 
mean values involving U u - Let sS "be the value of 3^ when the R^ with similar meanings 
for sg, 1% and sg. 

As before, we may define hypotheses HKR^-o], H[(Cj)-0], H[(T t )=o], and 
H[(U )=0] all with the same Jl. parameter space given by 



.fl: 



J-oo < m, R t , Cj, T t , U u < -i-oo, or > o, 




but wither parameter spaces obtained by setting each R^ o, each C. 0, each T^. 0, and 
each U u = o, respectively. The F ratios for these four hypotheses may be written down by 
the reader in terms of Sg, SR, S^, S^, and S^. 

The analysis of variance table for the four-way Graeco-Latin square turns out to 
be as follows : 



192 IX, APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OF VARIANCE PROBLEMS SQ.6 



Variation Due to 


Sum of Squares 


Degrees of Freedom 


The a 1 


s2 - rX(y<r y) 2 


r - i 




K 1 




The ft. 


- r5~ (v -y) 2 


r - 1 


J 


c T ' j 




TheTr t 


3?= r^I(y (t) -y) 2 


r - 1 


The d 


o _ r j- ( - _- } 2 


r - 1 


u 


^n ^ u ^ 




Error 


"E S <3 ir*i-' 5 -r ? (t)" 5 iu]*' 5)2 


(r-l) (r-3) 


Total 


s -^(y ir r. 


r 2 - 1 



p o 



^ ave ^ e 3ame meanings as for the Latin Square and yr u j is the 



where y, y^ , , y 

mean of all y^ . having attribute <$ u . 

The properties of the constituents of this table are very similar to those of 
the constituents of the table pertaining to the ordinary Latin Square and therefore we 

shall not write them down. The reader may verify that S-^ is the minimum of 
j 

" ' ~ ~ ~ " x2 ~^-'~^ to the restrictions 



> (74 ..-m~R,-C .-T t -U ), subject 

i73-i J J 

and is obtainable from formula (k), 8.5. 



1 



Extensions to higher order Graeco-Latin squares and complete seta of Latin 
squares are straightforward. 

9.6 Analysis of Variance in Incomplete Layouts 

The results which have been presented in 9.2-9.5 depend on complete or bal- 
anced layouts in the sense that there is exactly one random variable corresponding to each 
cell of the layout, or in the sense of orthogonality exemplified by Latin Squares, Graeco- 
Latin squares, and complete sets of Latin squares. Because of this element of balance the 
sums of squares arising in connection with the various hypotheses are relatively simple. 
The problem to be considered here is that of deriving sums of squares appropriate to tests 
of hypotheses in case there are arbitrary numbers of random variables associated with the 
various cells. 

First let us consider the case of a two-way layout. Let y 1 , y 2 , ..., y n be the 
random variables of the sample such that each y belongs to one row and one column in an 
r by s layout. If a y, say y a , belongs to the i-th row and j-th column, we assume it to 

o V V 

be distributed according to N(m+R, +C ,,cr- ) where /_ R, Z-C< 0. We may rewrite this 

V S~ i i j J 

distribution as N(m+ /L-RjX 1ioL + 4- c ^ x 2 ia >cr ~ ) where for a given a the x lla (i = 1,2,..., ft) 



S9.6 IX. APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OP VARIANCE PROBLEMS 195 

are all zero except for the value of 1 corresponding to the row within which y a occurs, a 
and similarly the x 2 , (j 1,2,..., a) are all zero except for the value of J correspond- 
ing to the column within which y a occurs. 

The likelihood function for the sample y 1 ,...., y n la 

(a) 

and the sum of squares In the exponent of this likelihood function Is 
(b) 



V~j 

Now suppose we consider the hypothesis that the w* are all 0, This hypothesis H[(C.)"0] 

may be specified as follows: 

-OD < m, R^, C 1 < CD, cr* > o,(all 1 and j) 
-fl: 

(c) 

GJ : [ Subspace In A obtained by setting each C* 0. 

Maximizing the likelihood function for variations of the parameters InHwe find from 
8.5 that the values of m, the R.^ and C^ which minimize S are given by the linear equa- 
tions 

V V V 

- o, 

- 0, 1 - 1,2,...,] 

(d) -Zj 

j 



j " ' 

where / y denotes summation of all y . J~y denotes summation of all y In the 1-th 
4-i- J a <* *T7 a a 



denotes summation of all y Q in the J-th column, n^j is the number of y a falling 



in the cell at the intersection of the i-th row and J-th column,n^, n ij and 

n * - 21 n. .. It follows from 8.5 that the minimum of S for variations of the m, R, anil 
J j[ J-J x 



C In H is given by 



oo 



where 



IX. APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OP VARIANCE PROBLEMS 



(e) 



A - 



2 ^r~~ _ ^r~ 

77 a 4r a n~. 



n 



r. 
n^ n 



.1 



y a n i. n i. n n 



n a o 

9 



n i8 ' 



n r. 



^s 



1 o 



n r1 n .j o o 1 



n n 



.3 "is- 



^3 



n _ o 1 

8 








and A^ is the minor of / y n in A. Hence 
oo ^a 



oo 



Maximizing the likelihood function for variations of the parameters overu> we find that 
the maximizing values of m and the R^ are given by the r + X- equations resulting by set- 
ting A^ and all Ci equal to zero in (d) and deleting the last equation. Similarly, 






where A 1 is obtained by deleting the last s + 2 rows and columns from A with exception of 

the next to the last row and column. A is the minor of V y^ in A . 

Hence 



oo 



cr 2 -! 
uj n 



l oo 



It follows from Theorem (A), 8:3, that 



q i m ^ 



and 



t96 IX, APPLICATION OF NORMAL RBC51E33ION THEORY TO ANALYSIS OP VARIANCE 



are distributed independently according to i-lsata with n - r - s + 1 and a - 1 degrees of 
freedom respectively when H[(C *)(>] is true* The F ratio is therefore 



(f ) _ A oo 



which has the distribution h/_ - % / ^ _ Al %(F)dF when H[(C*)-0] is true. The reader may 

vs-i j, vn-r-s+i ; j 

verify that if m, * - 1 (all i and J) then n - rs and we have the complete two-way layout 
discussed in 9-2, and In this case the F ratio reduces to that given in (b), $9*2. 

The extension of the foregoing treatment to higjher order layouts is stralght- 
forward and will not be considered in detail. It is perhaps sufficient to note that in 
the case of higher-order layouts we would have several sets, say q, classifications, the 
u-th classification consisting of p u mutually exclusive -categories, such that each y a in 
the sample would belong to exactly one category in each classification. If we denote the 
mean effect (on y a ) of the v-th category of the u-th classification by I uy where 

5_I, 1V - o, u - l,2,...,q (or more generally several linear restrictions may be applied to 

v=H uv q PU 

I uv for each u) then the mean value of y a may be expressed as m + 3> *> ^v^uva wnere 

ul vT 
for each value of u, x uya (a - i,2,...,n) is unity for only one value of v and zero other- 

wise; the value of v for which ^ uva is unity being that corresponding to the category (of 
the u-th classification) within which y a falls. The problem of testing the hypothesis 
that I uy for the u-th classification (u-th classification effects ) are all zero amounts to 
setting up a determinant corresponding to A in (e) based on q classifications instead of 
2, and performing operations similar to those performed on A to obtain A OO , A 1 , and A 00 . 
The reader will find it instructive to work through the details of setting up A, q 1 , q 2 , 
and F for the case of a three-way layout when the hypothesis to be tested is that the main 
effects due to one of the classifications are zero. He will also find it profitable to 
treat the ordinary latin square as a three-way layout by this* method and show that the F 
obtained for testing the hypothesis of no treatment effects is identical with that given 
by (d) in 89. ^. The generality of this procedure should be carefully noted by the reader 
because not only can all of the results previously discussed in this chapter be obtained 
by this procedure, but tests for the existence of interaction between two or more classi- 
fications in incomplete or unbalanced layouts may be deduced by applying the procedure. 
9.7 Analysis of Covariance 

Throughout all of the discussion in $$9.2-9.6 we have assumed the mean value of 
the random variable in each case to consist of the sum of a general constant (which is the 
same for all random variables) and constants referring to rows, columns, treatments, in- 



196 IX. APPLICATION OP NORMAL REGRESSION THEORY TO ANALYSIS QF VARIANCE PPORT.FMS $Q.7 

teraction, etc. It frequently happens that there are practical situations which suggest 
that the mean value of the random variable should include linear functions of one or more 
fixed variates (see 8.2) in addition to the sum of constants of the type mentioned above. 
For example, if y,, refers to yield of wheat in a plot in the i-th row and j-th column of 
a two-way layout, not only should the mean value of y^. Include a general constant and row 
and column effects, but also linear effect of number of plants on this plot, say x, -. The 
mean value would then be of the form m + ax. . + Rj + C., with the usual conditions on the 

R 1 and C., The object of this section is to examine what modifications of 9.2-9.6 
* J 

should be made in order to take one or more fixed variates into account in the mean value 
of the random variables involved. 

Let us return to the two-way layout discussed In 9.2 and assume that the mean 

value of y, . depends linearly not only on m, R. and C . but also on a fixed variable X, .. 
* J -^ J ! J 

In other words, assume that the y, are random variables Independently distributed ac- 
cording to N(m+ax, _.+R.-fC,,cr : ), where^I R, = ^LC- = o. The question arises as to what 
1J 1 J i 1 j J 

forms the P- ratios take for testing the hypothesis that the C- are all zero or the hypothe- 

J 

sis that the R^ are all zero, when the/1 parameter space is the (r + 3 + i)-dimensional 
space for which -oo < a, m, R,, C. < +00, cr^ > o. The probability element of the y . . is 
exactly that given in (a), 9. 2, with y^. replaced by y^. - ax^.. Making this substitution 
In (b), 9.2, we see that the sum of squares in the exponent of this probability element 
(for any point in A) may be broken down into the following components: 

(a) ; 

where Y 1 . y, 4-y..-y *+y, with similar meaning for X 1 . ,and Y 1m y, ,-y, with similar mean- 
*-jiji**j ij 1*1,* 

ing for Y j, X^., X .. The first sum of squares on the right In (a) may be written as 
(b) 

where a 

Making the substitution (b) in (a) we obtain 5 sums of squares which when divided by a 2 
are (by Cochran f s theorem) distributed independently according to -/f-Iawa with (r-1)(s-1) 
-1, 1, r-1 , s-1, 1 degrees of freedom, respectively. 

Now suppose we wish to test the hypothesis H 1 [(C 1 0=o] which is specified as 
follows : 



+ X^(Y 1 .-aX 1 .-R 1 ) 2 4- ZjY ^ .-C.) 2 + rs(y-ax-m) 2 , 
1>J ijj 




89.7 PPLTrA>rTQW OP NQRMAT. BHOREaaTQll THEORY TO ANALYSIS OP VARIANCE PRnm.EMfl 1 q 



) -OD< a, m, R t , C, < oo , r 2 > 0, (all 
n: 



u>: I The aubapace In IL obtained by setting each C* - o. 

Maximizing the likelihood function for variations of the parameters lnJ% which 
la equivalent to minimizing S aa far aa variations of a, m, R*, C^ are concerned, we 
obtain 

(d) R 1A -? 1 . - fe., Cj^-Y.., - ^X.j, ^-y- a^x, a^ 



The aum of squares In the exponent of the probability element for any point In GJ (1. e., 
all C - 0) may be expressed In terms of the following components: 



(e) S^- UYi+Y) - aCX+X.)] + Lf.-aX) 2 * rs(y-ax-m) 2 . 



^ . 

J-* J 1*9 J 

Maximizing the likelihood function for variations of the parameters In a* amounts to mini- 
mizing 3 as far as m, a, R^ are concerned. We find 



- Y - 



By Theorem (A), 8.3, It follows that 



are Independently distributed according to x 2 -lawa with (r-i)(s-i)-i and 3-1 degrees of 



freedom, respectively, when H^tC^-O] Is true. Hence the P-ratlo for this hypothesis Is 



which has the distribution h g-1 f / r . 1 )( 3 .i ).i(?)^ when the hypothesis Is true. 

It should be noted that rsc^ and rso^ can each be expressed In terras of deter- 
minants (see ( g ), (8.2) as follows: 



1 98 IX. APPLICATION OF NORMAL REGRESSION THEORY TO ANALYSIS OP VARIANCE PROBLEMS 9.7 



rs oa- 




In a similar manner we may define hypothesis H 1 [(R 1 )-o] by replacing GJ - by R.^ - in 

A p Ag 

the specification of o>. We find that o^ remains the same but oj^ for this hypothesis is 
identical with that for H 1 t(Cj)o] after replacing . and X by Y^ and ^, respectively, 

The F-ratio for H.KR, )=o] is distributed according to h_, , , w 1X JFJdF. 
I L| >vr~ijv3~i/""i 

The constituents which are used in making up the F-ratios for testing the two 
hypotheses considered above may be set forth in the following analysis of covarlance table: 



Variation Due To 


Sums of Squares 


Cross Products 
(x) (y) 


Degrees 


of Freedom 


y 


X 


Rows 


?' 


?. 


13 1 ' 1 ' 


r 


- 1 


Columns 


^i 


F !j 


? >j? ' J 


s 


- 1 


Error 


& 


^j 


^ViJ 


(r-1 


)(s-i) 


Total 


'*>* 


^c^) 


S^)U lf5 ) 


rs - 1 



The results obtained for the case of one fixed variate may be extended in a 
rather straight forward manner to the 'case of k fixed varlates where k < (r-1 )(s-l ). Thus, 
if k fixed variates x^i, p i,2,.,.,k, are taken into account linearly 'in our two-way 
layout, W e would begin by replacing y^ . in the probability element in (a), 9. 2, by 



(y^j-2_a x ^.) and follow a procedure similar to that for the case of one fixed variate. 
Thus, in place of QX^^ 0X4 . aX. o aX in (a) we would have^_a X ,., 
n^-v /_aX. respectively, where the meanings of X , ., 3L 1 . , Xp.^ 

r r \J - r^ x^ x^^-J f^- f J 



Th 



are obvious. 



e reader will find it instructive to carry out the details in arriving at F-ratios for 



S9.7 IX, APPLICATION OP NORMAL REGRESSION THEORY TO ANALYSIS OP VARIANCE PRC 



testing hypotheses H^[(Cj)o] and H^KR^H)] which are k-fixed-varlate analogues of 
HjKfijH)] and HjKR^-0], respectively. 

The procedure which we have outlined for introducing fixed variates linearly 
into the mean value of the random variables in a two-way layout extends in a straight 
forward manner to three-way layouts, Latin squares, Graeco-Latin squares, and to incom- 
plete or non- orthogonal layouts of the type discussed in 9.6. We shall have to leave 
the matter of carrying out details as exercises for the reader. Because of the generality 
of 9.6 it is perhaps worth while to remark, without going through the details of proof, 
that if one fixed variate is introduced linearly into the mean value of y a , which would 
amount to replacing m by m + ax a in (a), 9.6, the effect on the determinant A as defined 
in'(e), 9.6, would be to insert another row and column into A as second row and second col- 
ujnn, the r + s + 5 elements of this row and column being 



1 I c 3 

reading left to right in the row, and reading top to bottom in the column. This augmented 
determinant has its own A . A, A 1 (see 9.6) which are obtained by operations analo- 
gous to those used in obtaining A QQ , A 1 , A f Qo from A in 9-6. The extension of our pro- 
cedure to the problem of linearly taking into account k fixed variates in the mean value 
of y a in 9.6 is straightforward and will be left to the reader. 



CHAPTER X 
ON COMBINATORIAL STATISTICAL THEORY 

Many problems in distribution or sampling theory in statistics reduce to combin- 
atorial considerations. For example, the derivation of the binomial distribution (3.11) 
depends on the determination of the number of distinct orders in which x p ! s and n-x q ! s 
can be multiplied together, and similarly the derivation of the multinomial distribution 

(3.12) depends on the enumeration of the number of distinct orders in which n. p. f s, 

JL k ] ] 

n o Po*****^ PI^ |S can b multiplied together where >_ p, -i , y~n, n. A majority of 
d d * K ii K i^"i - 1 

the combinatorial problems of the drawing-balls-from-urns variety involve direct applica- 
tions of permutation and combination formulas, which in turn are often simply expressible 
in terms of binomial and multinomial coefficients. -The theory of sampling from a finite 
population (J*.3) is based on the use of binomial and multinomial coefficients and their 
use as weigjhts in various averaging operations. The sampling theory of order statistics 
($l*.5) is a direct application of the multinomial distribution law to probability functions 
of continuous random variables. 

The object of the present chapter is to discuss some of the more complicated dis- 
tribution problems in combinatorial statistical theory which are of particular interest in 
applied mathematical statistics. More specifically, we shall present some results on the 
theory of runs, the theory of matching and its application to testing Independence in con- 
tingency tables, Pearson's original ^-problem, and Inspection sampling, 

10.1 On the Theory of Runs 

Suppose we have an arbitrary sequence of n elements, each element being one of 
several mutually exclusive kinds. Bach sequence of elements of one kind is called a run. 
The simplest case is that in which there are two kinds of objects. We shall consider this 
case in detail, and also present briefly some results for the case of several kinds of 
elements. 

10,11 Case of Two Kinds of Elements 

Suppose we have n 1 a f s and n g b f s (n^n -n). Let r 1 , denote the number of runs 
of a ! s of length j and r, denote the number of runs of b f s of length j. For example, if 



X. OH CQMBIHATQRIAL STATISTICAL THBflff 



the arrangement Is 

aaabbaabaabbab , 

then r n - 1, r 12 - 2, r 15 -1 , r gl - 2, r 22 - 2, and the other r f a are zero. It should be 

observed that ^.Jr- 1 - n. , the number of a's, and also 2_ Jr j - n . Let r. - 
J j j J 



d that ^.Jr- 1 - n. , the number of a's, and also 2_ Jr j - n . Let r. - 2r;.4 

V" J j j J J 3 

- Z_ r o< denote the total number of runs of a's and b's, respectively. For a given 

j r 

set of numbers r--, r 10 , r-.,... there are ~ f- | ' - - r ways of arranging the r- 
11 12 ^ r n* r i2 ........ r mi rj ] 

runs of & ! s. And for a specified set, r ,, there ape n; i - - - - r **&* of arrang- 

2 



** p 



ing r g runs of b's. It is clear that r 1 cannot differ from r 2 by more than unity, for if 
it did two runs of one kind of element would have to be adjacent, but this is contrary to 
the definition of runs. If r 1 - r g , a given arrangement of runs of a f s can be fitted Into 
a given arrangement of runs of b f s in two ways, either with a run of a f s first or with a 
run of b f s first. We define the function F(rj,r 2 ) to be the number of waya of arranging 
r,j objects of one kind and r g objects of another so that no two adjacent objects are of 
the same kind. Clearly, 

(a) F(r lf P 2 ) - if Ir r r 2 l > 1 

- i if Ir r r 2 l -1 

- 2 if r, - r g . 

The total number of ways of getting the set r^ (i - 1,2; j - 1,2,...,^) is 

rj r 1 

2 -T P(r 1f r 2 ) . 



nl 

Since there are \ possible arrangements of a ! s and b f a, the joint distribution func- 
n r n 2 . 

tlon of the given set r.. (all possible arrangements given equal weight) Is 




Now let us determine the joint distribution of the n^. To do this we 

r J J 

with respect to the r .. We wish to sum i ^ - r over all partitions of n , V e.. for 
n^2J r 21 ....r . 2 

all r g i such that Z-3r 2 ^ - n g and 2l.r 2 J - r 2 . In order to do this, consider 



M-x) 2 



t 

" 



202 _ X. ON COMBINATORIAL STATISTICAL THEORY _ t10.11 

n 2 
It ia evident that the coefficient of x in the initial expression is the sum 

r^l n 

- - r that we desire. The coefficient of x in the final expression is the co- 



Z 



efficient of the term for which r +t n , i. e.,t - n -r~. Therefore the desired sum is 

(r -un p -r ){ (n p -i).' 22 22 

(rg-i(ng-r g )l " (r 2 -i)i(n 2 -r g )] Hence > the ^ oint Distribution function of the r^ and 

r z l3 

V (n 2 -l)J 

(c) P (r ij' r 2> ~ r n l...r 1ni l (r 2 -iJJ(n 2 -r 2 Ji ' P(r l ' 

Now we sum out r. By (a) we get 




}J F(r i 'V " (rajKn-r*! )I * 1 + (r-i jJdi-r )J ' 



r, .' 



This gives ua the joint distribution function of the r 1 ., 

__ r,! _ (n g +i)l 

(d) p(r 1 .)= T I - r . T / _ - ^ i 

'J r i1' r ! ---- 1n * ! 1 




with a similar expression holding for the joint distribution of the r^. 

Another important distribution is the joint distribution of r 1 and r g . We get 
this by summing out the r 1 . in (c), just as we summed (b) with respect to the r 2 ^ to 
tain (c). The result ia 



(e) P(r v r 2 ) = ( P -I JK-r )J ' (r-i )1 (n-r) ' p(r i ' 





Finally, we find the distribution function of r 1 by aumming (e) with respect to r 2 , ob- 
taining 

(n^i). 1 (n 2 +i)J I nl 
(f) p(r i )as (^-1)1(^-^)1 r 1 . l (n 2 +i 

The distribution of the total number of runs of a f s and b ! a is of considerable 
interest in applications of run theory. It ia uaed aa a teat for randomness of the ar- 
rangement of a 1 a and b'a; the amaller the total number of runa the more untenable the hy- 
potheala of randomneaa. Let u ^ + r-, the total number of runa. To find the distribu- 
tion of u we must sum (e) over all points in the r^ , r 2 plane for which u - rj + r 2 . We 
have two caaea, (1) u 2k (even) and (2) u - 2k- 1 (odd). To find the probability that 



S10.11 X. ON COMBINATORIAL STATISTICAL THEORY 205 

u - 2k, we note there is only one point In the r-,r 2 plane for which u r- + r~ 2k 
where F(r 1 ,r 2 ) J 0, and that point Is (k,k). When u - r 1 + r g 2k - 1 there are two 
points at which F(r^ 9 r 2 ) J o, namely (k, k-1) and (k-1, k). Hence from (e) we have at 
once (using the notation m C n O) : 

n.-i n -l / n.+n 9 
Pr(u-2k) = 2 ( k 1 ) ( ^ ) /( 1 2 } 
K i / 1 

(g) 

Pr(u=2k-l ) - ^^ 




n. +n- 



This distribution was derived by Stevens and also by Wald and Wolfowitz** and 

Q* 

the function Pr(uu ! ) >" p(u) haa been tabulated by Swed and Eisenhart*** for n 1 n g 
(n^m, n -n In their notation) from the case n 1 - 2, n g 20 to n 1 - 19, n g - 20 for 
various values of u ! . 

Another probability function of considerable interest in the application of the 
theory of runs is the probability of getting at least one run of a ! s of length s or 
greater or in other words the probability that at least one of the variables r. , r. ,., 
r is+2'"'' in the di3tribution (d) ia ^ 1 Mosteller has solved this problem for the 
case n 1 - n 2 n. To obtain this probability we put n 1 - n 2 - n in (d),thus obtaining 




and sum over all terms such that at least one of the variables r 1 , r 1 - , . . . ^ 1 . We can 
accomplish the same thing by summing over all terms such that all of these variables are 
zero, and subtracting the result from unity. To do this we must sum the multinomial co- 
efficient In (h) over all values of r^,...,^ such that r, - r^ .. ... r- 0, 



W. L. Stevens, "Distribution of Groups in a Sequence of Alternatives", Annals of 
Eugenics, Vol. IX (1939). 

A. Wald and J. Wolfowitz, "On a Test of Whether Two Samples are from the Same Popula- 
tion", Annals of Math. Stat . , Vol. XI (19^0). 
***Prieda S. Swed and C. Eisenhart, "Tables for Testing Randomness of Grouping in a 



Sequence of Alternatives", Annals of Math. Stat., Vol. XIV 
****Frederick Mosteller, "Note on an Application of Runs to Quality Control Charts", 
Annals of Math. Stat., Vol. XII 



2pl X. ON COMBINATORIAL STATISTICAL THEORY 

> jr 1 , - n, > r 1 , - r 1 , and then aura with reapect to r 1 . It will be noted that the aum 
of the multinomial coefficients under theae conditiona ia given by the coefficient o'f x 11 
in the formal expanaion of 

v s-i r l r l s-1 r ! V" r i~ ut t 

(X+X+...+X ) -X (1-X ) / ( , )X . 

' o r r 1 

which ia 

Jj 1 r, n-j(a-i)-l 



The deaired probability of at least one run of length a or greater ia therefore 



(i) Pr(at leaat one of r.^i, J^s) - 1 



J.1 1 r- n-l-j(s-i ) n+1 
: SZt-iHt^X r ., )( r 
r. Jo J P 1 ] r i 

! - 

2n 



the aummation on r 1 extending from ^/>t^ the largest integer rti^' Applying similar 
methoda to each of the multinomial coefficients in (b), Moateller has shown that the prob- 
ability of getting at least one run of a ! s or b f s of length s or greater is 



2 , 




(j) Pr(at leaat one of r, j or r 2j > 1 , j ^ a) = 1 - A/r) 

where 

r 1 n-i-j(s-l ) 
( j )( r -l 

the r 1 aummation being aimilar to that in (i). Moateller has tabulated the smallest value 
of a for which each of the probabilities (i) and ( j) is ^ .05 and- .01 for 2n - 10, 20, 30, 
JK>, 50. 

In order to indicate how to find moments of run variables let us consider the 
case of r 1 . We ahall firat find the factorial moments E(x^) where 

x< a >- 

for they are earaier to find then ordinary momenta in the preaent problem. Prom them the 
ordinary momenta may be found aince E(x^ ') ia a linear function of the firat i ordinary 
momenta. Letting i - 1,2,..., a, we obtain a ayatem of a linear equations which may be 
solved to obtain the ordinary moments as linear functions of the factorial moments. 



$10.12 X. ON COMBINATORIAL STATISTICAL THBQRY 303 

We have 



L 1 

In order to evaluate (k) we use the following Identity: 



m T" ,_ ...ft! , ... - Bl ______ (A+B)J 

1 ' - (Ofl)I(A-C-l)J 1J(B-1)J ,(C+B)J(A-C)! 

which follows at once by equating coeff Iclenta of x In the expansion of 
(m) 



Therefore we have upon substituting p(r 1 ) from (f ) Into (k), simplifying, and using (1) 



-- _ 




- *"2 T " (n r a)Jn 2 J 
Prom this result we find 




(n +1 )n. 
n -> 



A similar expression holds for 

If the.a's and b's are regarded as elements In a sample of size n from a bi- 
nomial population In which p and q represent the probabilities associated with a and b, 

respectively, then n- , the number of a f s, Is a random variable distributed according to 

n 1 n 
the binomial law ^C.. p q . The probability laws analagous to (b), (c), (d), (e), (f ) 

n ii 

when n 1 Is regarded as a random variable In this manner are simply obtained by multiplying 

n i n 2 
each of these probability laws by C n p q . 

10,12 Case of k Kinds of Elements 

The theory of runs has been extended to the case of several kinds of elements by 
Mooi*. If there are k kinds of elements, say a 1 , a 2 ,-. ..,8^, denote by r^j the number of 
runs of a^ of length j. Let r, be the* total number of runs of a^ Mood has shown that 

*A. M. Mood, "The Theory of Runs", Annals of Math. Stat., Vol. XI (19^0) . 



206 X. ON COMBINATORIAL STATISTICAL THEORY $10.2 

the joint distribution law of the r^ is given by 

r. , A V ,, r n , / nl 

IT,*) " I I r^ ^ r F(r. ,r ,...,ivj / ? i 



(a) p(r 1;) ) - FT rii ; r t,,, r ^ ' FdV^,...,^) / n< in ^| ; , , 



where P(r 1 ,r , . . .,r^) is the number of ways r 1 objects of one kind, r 2 objects of a second 

kind, and so on, can be arranged so that no two adjacent objects are of the same kind. 

r l r 2 ^ 
Is the coefficient of x 1 * 2 Xj c in the expansion of 



k r- -i r -1 n- 

(b) (xX+.- * 



The argument for establishing (a) is very similar to that for the case of k 2 and will 
not be repeated. Mood showed that the joint distribution function of r- ,r , . . .,r, is 

I c. 1C 

given by 

(c) p(r 1 ,r 2 ,...,r k ) 




i i i ._ .1 *\*i**o'**lr / / n '-n TV 

li '^-i^ 12 K / n^ng r^. 



which we state without proof. Various moment formulas and asymptotic distribution func- 

* 
tions have been derived by Mood in the paper cited. 

If instead of holding ^ ,n g , . . .,11^ fixed in the run problem for k kinds of 
elements, we allow the n f s to be random variables with probability function Tf(n 1 ,n g ,..,nj c ) 

(e. g., the multinomial distribution with 21 n i**n), the run distribution functions (a) and 

1 1 
(c) would simply be multiplied by r^n^n^...,!^). 

10.2 Application of Run Theory to Ordering Within Samples 
Suppose 2n+1 (x l > x 2 >-*-> x 2n+l ' ls a 3am P le from a population in which x is a 
continuous random variable. Let x be the median value of x In the sample. Let each 
sample value of x < x be called a and each sample value of x > x be called b. There are 
n a f s and n b f s in the sample, ignoring the median (which is neither). Now suppose we 
consider all possible orders In which the sample x's could have been drawn (ignoring the 
median in each case). It is clear that all of the run distribution functions (b), (c), 
(d), (e), (f) are applicable, for n 1 * n 2 n, to this aggregate of possible orders of the 
x*s (i. e. a ! s and b f a) in the sample. If there is an even number, say 2n, items in the 
sample, we can take any number between the two middle values of x in the sample as a num- 
ber for dividing the x ! s Into a ! s and b ! s, and our run theory is immediately applicable <to 
this case with n 1 = n^ n. In general if in a sample of size kn + k - 1 we choose the 
(n+1 )th, (2n--2)th, (3n+3)th, . . . .,(k-1 )(n+1 )th values of x in increasing order of magnitude 
as points of division, and let all x's less than the (n+1 )th x be denoted by a 1 , those 



110. g X. ON COMBINATORIAL STATISTICAL THEORY 207 

"W 
between the (2n+2)th and (tesff )th by a 2 , and 30 on, we then reduce our sample to n a. 'a, 

n a 2 ! s, ,n a^'s- Ignoring the k - 1 x ! s used for division points, it is clear that 

run theory for k kinds of objects is applicable to the aggregate of all possible orders in 
which sample x ! s could occur (ignoring the x f s used for division points). The points of 
division can, of course, be taken so as to yield an arbitrary number of a 1 ! s, a 2 'a, etc. 
By classifying the values of x in a sample into a f s and b f s (or more generally 
into a 1 'a, a 2 ! s, . . . ,8^*3) and using the theory of runs we have a basis for testing the 
hypothesis of randomness in the sample as far as order is concerned. The more commonly 
used tests of the hypothesis of randomness based on run theory are: 

(1) Number of runs of a's,for which the distribution is (f), 10. n. For given 
values of n^ and n 2 , the test consists in finding the largest value of r 1 (the 
number of runs of a f s), say r, for which Pr(r 1 ^r^ > ) ^ , e. g., for - .05. A 
similar statement may be made concerning runs of b f s. 

(2) Total number of runs of a f s and of b f s having distribution (g), 10.11. Again, 
the test consists in finding the largest value of u, say u, for which 

Pr(u u) j^, for given values of n 1 and n 2 . 

(3) At least one run of a f s (or b ! s) of at least length s, for n 1 n 2 = n, based 
on the distribution (1), 10.11. The test consists of finding the smallest 
value of s for which probability (1) is < . 

(k) At least one run of either a f s or b f s of at least length s, for n 1 - n ? - n, 

based on the distribution (j), 10.11. The test consists of finding the small- 
est s for which probability ( j) is < . 

The distribution theory of each of these tests has been determined under the 
assumption that the hypothesis of randomness is true, with a view to controlling only 
Type I (see 7-3 ) errors. Type II errors for these tests have never been investigated, 
i. e., probability theory of the testa when some alternative weighting scheme (other than 
equal weights) is used for the different possible arrangements of a ! s and b f s. 

It should be noted by the reader that the theory of runs developed in 10.11 is 
not applicable to the following type of problem of reducing a sample to two kinds of ele- 
ments: Suppose x-,x 2 ,...,x are elements of a sample from a population with a continuous 
distribution function. Consider an arbitrary order of these n x ! s, and between each suc- 
cessive pair of elements write a if the left number of the pair ia smaller than the right 
and b if it is larger. We then have reduced the sample to n - 1 a'a and b f s. We may de- 
fine runa of a'a and b's as before, but the theory of arrangements of the a 1 a and b ! a as 
defined from the corresponding arrangements, and hence the distribution theory of runa of 
this type, is an unsolved problem in combinatorial atatiatica. 



gpfl X. ON COMBINATORIAL STATISTICAL THEORY M10.5. 10.^1 

10.3 Matching Theory 

A problem which frequently arises In combinatorial statistics is one which may 
be conveniently described by an example of card matching. Suppose each of two decks of 
ordinary playing cards is shuffled and let a card be dealt from each deck. If the two 
cards are of the same suit let us call the result a match. Let this procedure be contin- 
ued until the entire 52 pairs of cards are dealt. There will be a total Dumber of 
matches, say h. Each possible permutation of one deck compared with each possible per- 
mutation of the second deck will yield a value of h between o and 52, inclusive.. There- 
fore If we consider all of these possible permutations with equal weight, we inquire as 
to what will be the distribution function of h in this set of permutations. Similarly if 
we consider three decks D I , D 2 , and D, of cards to be shuffled and matched we would have 
triple matches and three varieties of double matches. A triple match would occur if the 
three cards in a single dealing from the three decks were of the same suit. As for 
double matches, they would occur between decks D I , D 2 , between D 1 , D, and between D g , D,. 
The problem arises as to what will be the distribution of triple matches and of the three 
varieties of double matches. 

Extensions of the problem to more than three decks, to decks with arbitrary 
numbers of cards In each suit and an arbitrary number of suits suggest themselves at once. 
In this section we shall present some techniques for dealing with this problem without 
attempting to be exhaustive. It will be convenient to continue our discussion in card 
terminology, for no particular advantage is gained Hi introducing more general terminology. 
The generality of the results for objects or elements other than cards Is obvious. 

10.3V Case of Two Decks of Cards 

Suppose we have a deck D T of n cards, each card belonging to one and only one 
of the k suits C^, Cg,...,^. Let n^, n l2 ,...,n 1k (^n^-n) be the number of cards be- 
longing to C 1f Cg,...,^, respectively. Let D 2 be another deck of n cards, each card be- 
longing to one and only one of the classes C 1 , Cg,...,^. Let n gl , n 22 ,...,n 2k 
(2l n 2i" n ) k the number of cards In D 2 belonging to C t , C 2 , .. .,0^, respectively. 

The problem is to determine the probability of obtaining h matches under the 
assumption of random pairing of the cards. In other words, we wish to find the number of 
ways the two decks of cards can be arranged so as to obtain exactly h matches. Dividing 
this number by N, the total number of ways the two decks can be arranged, we obtain the 
probability of obtaining h matches under random pairing. The value of N. is simply the * 
total number of ways the two suits can be permuted, and is given by the product of two 
multinomial coefficients: 



{10.31 X. ON COMBINATORIAL STATISTICAL THEORY 209 



(a) 



To determine N(h), consider the enumerating function 

n 
(b) 



where <5 1 . - 1 , If 1 - j, and o,.if 1 / j. We associate the auxiliary variables x v ,x 2 ,..., 
x k with the suits C^, C^...,^ respectively of the first deck, and the auxiliary vari- 
ables y 1 ,y 2 ,.. . ,y k with the corresponding suits of the second deck. 4> la the product of 

n Identical expressions, each expression consisting of the sum of k 2 terms, each term 

d i1 6 
being a product of an x and a y. The term xye J In any one of the n factors corres- 



ponds to the event of a card In suit (L of the first deck being paired agrinst a card In 
suit C. of the second deck. If 1 j we have a match, and e 6 occurs as a factor. Now 
suppose we pick a typical term In the product given In (b). Such a term would be of the 
form 

"Ml 6 S^ J 6 

(c) (X, y 1 e 1 1 )( Xl y. e 2 2 ).... 

1 1 J 1 X 2 J 2 

This general term corresponds to the event of n pairings as follows: a pairing between 

(L of D- and C. of D_; a pairing between C, of D. and C^ of D ; ....; and a pairing be- 

IT ' JT ^ X 2 J 2 

between C i of D I and C, of D 2 . Now If the compositions of D I and D 2 are specified as 

n *^n 
n n , n 12 ,...,n lk and n gl >n g2 , .. ->n 2k , respectively, then It follows that the only terms In 

the expansion of (b) which have any meaning for pairings of these two decks of cards are 
those of the form 

hA n. . n 19 n. v n 91 n^^ n . 

n ^ ' ^ K 2 2 dK 



where h Is an Integer such that h n. It should be noted that such terras may not 



Various authors have considered various enumerating functions, but the one which we shall 
use was devised by I. L. Battln, "On the Problem of Multiple Matching", Annals of Math. 
Stat . 9 Vol. XIII (19^2). Battin's function is relatively easy to handle and has the ad- 
vantage of representing the two decks of cards symmetrically in the notation and opera- 
tions. It extends readily to the case of several decks of cards. The reader should re- 
fer to Battin f s paper for a fairly extensive bibliography on the matching problem. 



210 X. ON COMBINATORIAL STATISTICAL THEORY {m *1 

exist for some values of h between o andU, which means that it is not always possible to 
have any arbitrary number of matches for given deck compositions. The tenn given in (d) 
corresponds to some arrangement of the two decks of cards such that there are exactly h 
matches. In general there are many such terms. Therefore, if we expand to and determine 
the coefficient of the expression given by (d) we obtain the value of N(h), the number of 
ways in which h matches can occur. To simplify our notation let 1^(4)) denote the opera- 
tion' of taking the coefficient of expression (d) In the expansion of 4>. We may rewrite <t> 
as 

(e) to 

Expanding we have 

n-h 




Expanding the expression in [ ], we have 



Inserting this expression Into (f), and expanding (XI x, ) 8 (51 y*) 8 (Zl^y*) 11 "? we find 

x 1 x 1 




(h) N(h) Kfr(to) - s VftM g Mg 9 

where 

V- (gl) 2 (n-g)l 

M \ T y . 

8 Y-lfc 

8 1 TT [<n 11 -8 1 )I(n 2l -B 1 )Ja 1 ll 

the sunnation extending over all positive Integral (or zero) values of the s, such that 

\r 

^9j n-g and H II -S I ^ 0, n^-Sj, ^ 0, i - 1 ,2, .. .,k. The probability P(h) of obtaining 
h matches is therefore N(h)/N, where N is given by (a). 

For the case k - 2, the probability of h matches reduces to the following ex- 
pression 



n n 



where 1 - ^(n 1 ^ngg-t-h), J - -^(n^+ngg-h). Dhless h la such that for given valuea of 
and n 22 , n n + (n 22 ~h) are positive even Integers or o, then P(h) - o. 



$10.31 X. ON COMBINATORIAL STATISTICAL THEORY 21 1 

Greville* has given the distribution of h in a slightly different form and by 
another method. 

Moments of the random variable h can be found directly from the enumerating 
function <(>. We have 



(k) - ] 

JL1 i 1 A1 i P A1 i IT 1Ji p 1 iA pp 1Ar 
coefficient of x 1 x 2 ...x k ^r 1 y 2 y k 

in the expansion of wi 



The reader will find it instructive to carry out this operation for p 1,2, and 
find that 

f n i1 n pl 
, "IT 21 ' 

(1) a 2 - E(h 2 ) - [E(h)] 2 



It should be noted that our results can be readily extended to the case of two 
decks of cards in which the total numbers of cards are different or where one or more of 
the suits may have no cards at all. To consider, the case of unequal total numbers of 
cards, say n 1 in deck D 1 and n g in deck E^ where, without loss of generality, we can let 
n 1 > n 2 , we simply add to D 2 n 1 - n 2 dummy cards, and consider them as a new suit. We 
would thus have k + 1 suits of cards, where the (k+1 )-thsuit is empty in DJ, i. e. 
n 1lc+1 - o, n ^ j - n 1 - n 2 . The procedure from here on is just as before. The case in 
which some of the suits are empty in one or both decks is taken into account by specifying 
the values of the corresponding n^ or n gi as in expanding <t> and collecting terms. 

The reader should note that if a score s^ . is assigned to a pairing in which the 
D. card belongs to the 1-th suit and the D 2 card belongs to the j-th suit, then one can 
find the distribution of the total score T in n pairings (i. e.,when the two decks are 
paired against each other) under the assumption of random matching, by replacing 6^* by 



T. N. E. Greville, "The Frequency Distribution of a General Matching Problem", Annals 



of Math. Stat., Vol.- XII 



212 X. ON COMBINATORIAL STATISTICAL THEORY 



8,1 



In (b) and finding the coefficient of 



^22 J*2k 



In the expansion of the resulting expression. The procedure for finding E(T) and a| and 
higher moments Is the same as that' for dealing with the moments of h with s^, substituted 
for 4 ir 

10*32 Case of Three or More Decks of Cards 

Suppose we have a third deck of cards, say D,. Let the numbers of cards be- 
longing to suits C^ Cg,...,^ be n , n- 2 ,..,n* k . A triple match has been defined as one 
In which the triplet of cards (one from each deck) are of the same ault. A double match 
between D 1 and D g will occur when the cards from D 1 and D 2 In a triplet are of the same 
suit but different f!om the suit of the card from D, In the triplet. Double matches be- 
tween Dj, D. and D 2 , D. are similarly defined. If In the complete set of n triplets from 
the three decks we let h 125 be the number of triple matches, h 12 the number of double 
matches between D^ D g , with similar meanings for h 15 and h 23> we may obtain the distribu- 
tions and moments of the h f s from the following enumerating function: 

*ljk 6 123 * 'VlS * 4 ik 6 13 + V25 - 
(a) 




where 6* ^ -1, if 1 - j - k, and otherwise. The remaining 4 'a are defined as for the 
2-deck problem. By following an argument similar to that for the 2-deck problem, It will 
be noted that the number of ways in which the three decks of cards can be permuted so as 
to obtain h 12 - triple matches, and h 12 , h 15 , h g5 double matches between D^ D 2 ; D 1 , D,; 
and Dg, D^, respectively, is given by the coefficient of 

where 

Q x^ Xg . tX^ . y^ y 2 7^ ^\ z % *** z k 



in the expansion of t. 

This coefficient and hence the joint probability law of the h's is rather cum- 
bersome and will not be given here. As in the case of the 2-deck problem we may find 
moments and joint moments of the h ! s by performing differentiations on 4> with respect to 
the 6's, that Is, 



S510.4. 



COMBINATORIAL STATISTICAL THEORY 



r. r 2 r r, - , 
(c) ^23*12*1 3*23 * - J ^Coeff . of Q in 

where 



N - 



(nJ) 



The mean values of the h f s are the following: 



6 f a 



0\ 



n 



B(h 18 ) - 



with similar expressions for ECh^) and E(h 2 ,). The reader may refer to Battin'a paper 
for second moments. 

The extension of our technique to the problem of determining the distribution 
and moments of the numbers of hits for various orders of multiple matching when more than 
three decks of cards are involved is immediate. The extension of the results to the case 
of decks of unequal numbers of cards, empty suits, etc., when three or more decks are con- 
sidered, is straightforward. 

10.4 Independence in Contingency Tables. 

In this section we shall consider the problem of testing the independence of a 
two-way classification on basis of a sample of n elements, each element belonging to one 
and only one the classes A 1 , A 2 ,...,A r and to one and only one of the classes B^ , B 2 , 

...,B a . In the sample, let n, * be the number of elements belonging to A,, and B*. Let 
s r X J J- J 

n . n .,, 3>~n 4 4 - n. The number of elements belonging to A< is ix. and 
J 11. 



the number to B 4 is n 4 . The problem is to test. the hypothesis of the independence of the 
j J 

A and B classification. We shall consider two approaches to this problem. The first 
(10.U1 ) is a pure combinatorial approach based on partition theory in which the set of 
all possible partitions of n into rs components n^^ satisfying the marginal conditions 
listed above are investigated. The second approach (10.1*2), which is Karl Pearson 1 s orig- 
inal treatment of the problem, is an application of the theory of sampling from a multi- 
nomial population consisting of the rs classes (A^B^) i - i,2,.. f r; j - 1,2,.., a. 
10.41 The Partitional Approach 



In this section we shall consider the problem of determining the number of ways 
of partitioning the integer n Into rs integers (or zero) n (i-1,2,...,r; J-1 ,2, . . .,a) 



ON GQMBIMATQRIAL STATISTICAL THEORY 



41Q.JH 



such that 5.1^. - n^ and 2L_njj n . are fixed. The technique discussed In 810.3 can 
be extended so as to accomplish this enumeration. We shall then find the mean values of 
certain functions of the n 1 , over this set of partitions. 



We may represent the n, n , n 



, 



n in the following contingency table : 
Total 



(a) 



n n n 12 . . . . n 1a 


n i. 


n 21 n 22 .... n 2a 


n 2. 








n r , n r2 . . . . Hpg 


n r. 


Total n , n . . . . n _ 

1 . c. .9 


n 



Consider the enumerating function

(b)    \phi = \prod_{i=1}^{r}\Big(\sum_{j=1}^{s} x_j e^{\theta_{ij}}\Big)^{n_{i.}},

which is the product of n factors, n_{1.} of which are \sum_j x_j e^{\theta_{1j}}, n_{2.} of which are \sum_j x_j e^{\theta_{2j}}, and so on. A typical term in the expansion of this product of n factors is of the form

(c)    \prod_{i,j}\big(x_j e^{\theta_{ij}}\big)^{n_{ij}},

where n_{ij} is the number of times x_j e^{\theta_{ij}} is taken from the n_{i.} factors \sum_j x_j e^{\theta_{ij}}. Each such term corresponds to one way of partitioning n into the set n_{ij} so that \sum_j n_{ij} = n_{i.}.

To find the total number of ways of partitioning n into the given set n_{ij} we must determine how many individual terms in the expansion of (b) are identical with (c). In other words we are to find the coefficient of

(d)    \prod_{j=1}^{s} x_j^{n_{.j}}\cdot e^{\sum_{i,j}\theta_{ij}n_{ij}}

in the expansion of (b).

Expanding each of the factors \big(\sum_j x_j e^{\theta_{ij}}\big)^{n_{i.}}, i = 1,2,...,r, by the multinomial law, multiplying the results and taking the coefficient of the expression (d), we find at once that the number of partitions of n into the sets of values n_{ij}, subject to the marginal conditions \sum_j n_{ij} = n_{i.}, \sum_i n_{ij} = n_{.j}, is

(e)    \prod_{i=1}^{r}\frac{n_{i.}!}{\prod_{j=1}^{s} n_{ij}!}.

The total number of ways of partitioning n, subject to the marginal conditions mentioned above, is

(f)    \frac{n!}{\prod_{j=1}^{s} n_{.j}!}.

Therefore the probability of partitioning n into the particular set of values n_{ij}, assuming all ways of making partitions (subject to the marginal conditions) equally likely, is given by the ratio of expression (e) to expression (f), i. e. by

\frac{\prod_i n_{i.}!\,\prod_j n_{.j}!}{n!\,\prod_{i,j} n_{ij}!}.
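In modern notation the ratio of (e) to (f) may be checked numerically; the following is a minimal sketch in Python, the counts in the illustrative table being hypothetical:

    from math import factorial

    def table_probability(table):
        # ratio of (e) to (f): prod_i n_i.! prod_j n_.j! / (n! prod_ij n_ij!)
        row = [sum(r) for r in table]
        col = [sum(c) for c in zip(*table)]
        n = sum(row)
        num = 1
        for m in row + col:
            num *= factorial(m)
        den = factorial(n)
        for r in table:
            for x in r:
                den *= factorial(x)
        return num / den

    # illustrative 2 x 3 table (hypothetical counts)
    print(table_probability([[3, 2, 5], [1, 4, 5]]))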

The moments of the n_{ij} may be found directly from the probability law of the n_{ij}. Consider first the problem of determining the h-th factorial moment of a particular n_{ij}, say n_{\alpha\beta}. We have

(g)    E\big(n_{\alpha\beta}^{(h)}\big) = \sum{}' \, n_{\alpha\beta}^{(h)}\,\frac{\prod_i n_{i.}!\,\prod_j n_{.j}!}{n!\,\prod_{i,j} n_{ij}!},

where n^{(h)} = n(n-1)\cdots(n-h+1) and \sum' denotes summation over all values of the n_{ij} subject to the usual marginal conditions. Now when h = 0, we know that the right-hand side of (g) is simply the sum of the probability function of the n_{ij} over all possible values of the n_{ij} and is therefore unity, which amounts to the statement that

(h)    \sum{}' \,\frac{\prod_i n_{i.}!\,\prod_j n_{.j}!}{n!\,\prod_{i,j} n_{ij}!} = 1.

Now the numerator on the right-hand side of (g) may be written as

(i)    \frac{n_{\alpha .}^{(h)}\,n_{.\beta}^{(h)}}{n^{(h)}}\,\sum{}' \,\frac{\prod_i n'_{i.}!\,\prod_j n'_{.j}!}{n'!\,\prod_{i,j} n'_{ij}!},

where n'_{i.} = n_{i.} for all i except i = \alpha, where n'_{\alpha .} = n_{\alpha .}-h; n'_{ij} = n_{ij} for all i, j except i = \alpha, j = \beta, where n'_{\alpha\beta} = n_{\alpha\beta}-h; and n' = n-h. Now perform the summation indicated in (i) over all values of the n'_{ij} subject to the conditions \sum_j n'_{ij} = n'_{i.} and \sum_i n'_{ij} = n'_{.j}, where n'_{.j} = n_{.j} except when j = \beta, where n'_{.\beta} = n_{.\beta}-h. It follows from (h) that the value of this sum is unity. Therefore we have

(j)    E\big(n_{\alpha\beta}^{(h)}\big) = \frac{n_{\alpha .}^{(h)}\,n_{.\beta}^{(h)}}{n^{(h)}}.

It is clear that h must not exceed either of the numbers n_{\alpha .} and n_{.\beta}. For h = 1 and 2 we have

(k)    E(n_{\alpha\beta}) = \frac{n_{\alpha .}\,n_{.\beta}}{n}, \qquad E\big(n_{\alpha\beta}^{(2)}\big) = \frac{n_{\alpha .}^{(2)}\,n_{.\beta}^{(2)}}{n^{(2)}}.

Hence

(l)    \sigma^2(n_{\alpha\beta}) = \frac{n_{\alpha .}^{(2)}\,n_{.\beta}^{(2)}}{n^{(2)}} + \frac{n_{\alpha .}\,n_{.\beta}}{n} - \Big(\frac{n_{\alpha .}\,n_{.\beta}}{n}\Big)^2 = \frac{n_{\alpha .}\,n_{.\beta}\,(n-n_{\alpha .})(n-n_{.\beta})}{n^2(n-1)}.

By a similar argument one can find joint factorial moments of two or more of the n_{ij}. For example,

(m)    E\big(n_{\alpha\beta}^{(h)}\,n_{\gamma\delta}^{(g)}\big) = \frac{n_{\alpha .}^{(h)}\,n_{\gamma .}^{(g)}\,n_{.\beta}^{(h)}\,n_{.\delta}^{(g)}}{n^{(h+g)}}, \qquad \alpha \ne \gamma,\ \beta \ne \delta,

a similar expression holding for \alpha \ne \gamma, \beta = \delta. The restrictions on the size of g and h are obvious. These moments can also be found directly from the enumerating function \phi by carrying out appropriate differentiations with respect to the \theta_{ij}, then setting the \theta's = 0 and collecting appropriate coefficients.
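For a 2 x 2 table the full set of partitions is small enough to enumerate, and (k) may be verified directly; a minimal sketch in Python, with hypothetical margins:

    from math import factorial

    def prob(n11, r1, c1, n):
        # probability of a 2 x 2 table with margins r1, n-r1, c1, n-c1
        r2, c2 = n - r1, n - c1
        n12, n21 = r1 - n11, c1 - n11
        n22 = r2 - n21
        num = factorial(r1)*factorial(r2)*factorial(c1)*factorial(c2)
        den = (factorial(n)*factorial(n11)*factorial(n12)
               *factorial(n21)*factorial(n22))
        return num/den

    r1, c1, n = 6, 5, 12
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    mean = sum(m * prob(m, r1, c1, n) for m in range(lo, hi + 1))
    print(mean, r1 * c1 / n)   # both give E(n_11) = n_1. n_.1 / n = 2.5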

The criterion which Karl Pearson defined for testing the hypothesis of row-column independence in r by s contingency tables is the following:

(n)    \chi^2 = \sum_{i,j}\frac{\big(n_{ij}-\frac{n_{i.}n_{.j}}{n}\big)^2}{\frac{n_{i.}n_{.j}}{n}},

which is a quadratic form in the n_{ij}. It should be noted that \chi^2 is simply the sum of the squared differences between each n_{ij} and its mean value (under the assumption of independence or "randomness"), each squared difference weighted inversely by the mean value of the corresponding n_{ij}. This inverse weighting scheme suggests itself fairly readily in the Pearson approach to be considered in 10.42. The mean value of \chi^2 may easily be found by making use of formulas (k) and (l), and is

(o)    E(\chi^2) = \frac{n}{n-1}\,(r-1)(s-1).

By using formulas (j) and (m) for the appropriate values of g and h, the variance and higher moments of \chi^2 may be found.
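The mean value (o) may be checked by a sampling experiment: pair fixed A-labels with randomly permuted B-labels, so that all partitions with the given margins are equally likely. A minimal sketch in Python, the margins being hypothetical:

    import random

    def chi2(table, n):
        row = [sum(r) for r in table]
        col = [sum(c) for c in zip(*table)]
        total = 0.0
        for i, r in enumerate(table):
            for j, x in enumerate(r):
                e = row[i]*col[j]/n
                total += (x - e)**2 / e
        return total

    random.seed(1)
    a = [0]*5 + [1]*7          # n_1. = 5, n_2. = 7
    b = [0]*4 + [1]*8          # n_.1 = 4, n_.2 = 8
    n, trials, s = 12, 20000, 0.0
    for _ in range(trials):
        random.shuffle(b)
        t = [[0, 0], [0, 0]]
        for ai, bi in zip(a, b):
            t[ai][bi] += 1
        s += chi2(t, n)
    print(s/trials, n/(n-1))   # (o) gives 12/11 for r = s = 2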

10.42 Karl Pearson's Original Chi-Square Problem and Its Application to Contingency Tables.

Suppose \Pi is a multinomial population in which each element belongs to one and only one of the classes C_1, C_2, ..., C_k. Let p_1, p_2, ..., p_k (\sum_i p_i = 1) be the probabilities associated with C_1, C_2, ..., C_k respectively. In a sample of size n let n_1, n_2, ..., n_k be the numbers of elements falling into C_1, C_2, ..., C_k respectively. We have seen (§3.12) that the probability law of the n_i is

(a)    \frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}.

It was shown in 3.12 that E(n_i) = np_i. In view of the Central Limit Theorem (4.21) it is clear that the limiting distribution as n \to \infty of each of the quantities

\frac{n_i-np_i}{\sqrt{np_i(1-p_i)}}

is N(0,1). Now let us investigate the limiting joint distribution of the set

x_i = \frac{n_i-np_i}{\sqrt{n}}, \qquad i = 1,2,...,k.

Since \sum_i x_i = 0, only k-1 of the x_i are functionally independent. It is sufficient to consider the limiting joint distribution of the first k-1 of the x_i. The m. g. f. of x_1, x_2, ..., x_{k-1} is

(b)    \phi = E\big(e^{\sum_i\theta_ix_i}\big) = \sum \frac{n!}{\prod_i n_i!}\prod_i p_i^{n_i}\,e^{\sum_i\theta_i(n_i-np_i)/\sqrt{n}} = e^{-\sqrt{n}\sum_i\theta_ip_i}\Big(p_k+\sum_{i=1}^{k-1}p_ie^{\theta_i/\sqrt{n}}\Big)^n,

the sums on i running from 1 to k-1. Expanding each of the exponentials in (b) and taking logarithms, we have

\log\phi = -\sqrt{n}\sum_i\theta_ip_i + n\log\Big(1+\sum_i\frac{p_i\theta_i}{\sqrt{n}}+\sum_i\frac{p_i\theta_i^2}{2n}+\cdots\Big).

Therefore we have

(d)    \lim_{n\to\infty}\phi = e^{\frac{1}{2}\sum_{i,j=1}^{k-1}\lambda_{ij}\theta_i\theta_j},

where \lambda_{ij} = p_i\delta_{ij}-p_ip_j, i,j = 1,2,...,k-1, and \delta_{ij} = 1, i = j, and 0, i \ne j. Making use of the multivariate analogue of Theorem (C), 2.81, it follows that the limiting probability element for the distribution of the x_i is

(e)    \frac{1}{(2\pi)^{(k-1)/2}\sqrt{\Lambda}}\,e^{-\frac{1}{2}\sum_{i,j=1}^{k-1}\lambda^{ij}x_ix_j}\,dx_1\cdots dx_{k-1},

where \Lambda = |\lambda_{ij}| and ||\lambda^{ij}|| = ||\lambda_{ij}||^{-1}. It may be readily verified by the reader that

(f)    \lambda^{ij} = \frac{\delta_{ij}}{p_i}+\frac{1}{p_k},

and hence

(g)    \sum_{i,j=1}^{k-1}\lambda^{ij}x_ix_j = \sum_{i=1}^{k}\frac{x_i^2}{p_i}, \qquad x_k = -(x_1+\cdots+x_{k-1}).

We have seen (5.22) that if x_1, x_2, ..., x_{k-1} are random variables having distribution (e), then \sum_{i,j=1}^{k-1}\lambda^{ij}x_ix_j is distributed according to a \chi^2-law with k-1 degrees of freedom.

Now if we replace x_i by (n_i-np_i)/\sqrt{n} in (g), denoting the result by \chi^2, we obtain

(h)    \chi^2 = \sum_{i=1}^{k}\frac{(n_i-np_i)^2}{np_i}.

We conclude that the limiting distribution of \chi^2 is identical with the distribution of \sum\lambda^{ij}x_ix_j where the x_i are distributed according to (e); that is to say, the limiting distribution of the expression in (h) is the \chi^2-law with k-1 degrees of freedom. A rigorous proof of this statement is beyond the scope of this course, but it is a consequence of the following theorem, which will be stated without proof:

Theorem (A): Let x_1^{(n)}, x_2^{(n)}, ..., x_r^{(n)} be random variables having a joint c. d. f. F_n(x_1,x_2,...,x_r) for each n greater than some n_0. Let the limiting joint c. d. f. as n \to \infty be F(x_1,x_2,...,x_r). Let g(x_1,x_2,...,x_r) be a (Borel measurable) function of x_1, x_2, ..., x_r. Then the limiting c. d. f., say P(g), of g(x_1^{(n)},x_2^{(n)},...,x_r^{(n)}) as n \to \infty is given by

P(g) = \int_R dF(x_1,x_2,...,x_r),

where R is the region in the x-space for which g(x_1,x_2,...,x_r) < g.

We may summarize our results therefore in the following theorem, which is, in fact, a corollary of Theorem (A):

Theorem (B): Let O_n be a sample of size n from the multinomial distribution (a). Then the limiting distribution of \sum_{i=1}^{k}\frac{(n_i-np_i)^2}{np_i} as n \to \infty is the \chi^2-distribution with k-1 degrees of freedom.
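Theorem (B) may be illustrated by simulation; the sketch below (Python, with hypothetical class probabilities) compares the empirical mean and variance of (h) with the mean k-1 and variance 2(k-1) of the limiting \chi^2-law:

    import random

    def pearson_chi2(counts, probs, n):
        return sum((c - n*p)**2/(n*p) for c, p in zip(counts, probs))

    random.seed(2)
    probs = [0.5, 0.3, 0.2]        # hypothetical p_i
    n, trials = 500, 2000
    vals = []
    for _ in range(trials):
        counts = [0]*len(probs)
        for _ in range(n):
            u, c = random.random(), 0.0
            for i, p in enumerate(probs):
                c += p
                if u < c:
                    counts[i] += 1
                    break
        vals.append(pearson_chi2(counts, probs, n))
    m = sum(vals)/trials
    v = sum((x - m)**2 for x in vals)/trials
    print(m, v)   # approximately 2 and 4 for k - 1 = 2 degrees of freedom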

Now let us consider the contingency problem described in the introduction of §10.4. In this case the multinomial population consists of rs classes (A_i, B_j), i = 1,2,...,r; j = 1,2,...,s. Let the probability associated with (A_i, B_j) be p_{ij} (\sum_{i,j}p_{ij} = 1). It follows at once by Theorem (B) that

(i)    \sum_{i,j}\frac{(n_{ij}-np_{ij})^2}{np_{ij}}

has as its limiting distribution, for n \to \infty, the \chi^2-law with rs-1 degrees of freedom. If the p_{ij} were known a priori, then the test given in (i) could be used for testing the hypothesis that the sample originated from a multinomial distribution having these values of p_{ij}. If the A and B classifications are independent in the probability sense, then p_{ij} = p_iq_j (\sum_i p_i = 1, \sum_j q_j = 1). If the p_i and q_j were known a priori, then (i) with p_{ij} = p_iq_j can, of course, be used to test the hypothesis that the sample came from a multinomial population with probabilities p_iq_j.

But suppose neither the p_i nor q_j are known a priori, and that we wish merely to test the hypothesis of independence of the A and B classifications. Karl Pearson proposed the following test for this hypothesis:

(j)    \chi_1^2 = \sum_{i,j}\frac{\big(n_{ij}-\frac{n_{i.}n_{.j}}{n}\big)^2}{\frac{n_{i.}n_{.j}}{n}},

where the n_{i.} and n_{.j} are defined in 10.41.

If we let

x_{ij} = \frac{n_{ij}-np_iq_j}{\sqrt{n}}

and express the n_{ij} in (j) in terms of the x_{ij}, we obtain

(k)    \chi_1^2 = \sum_{i,j}\frac{\big(x_{ij}-x_{i.}q_j-x_{.j}p_i-\frac{x_{i.}x_{.j}}{\sqrt{n}}\big)^2}{\frac{n_{i.}}{n}\cdot\frac{n_{.j}}{n}},

where x_{i.} = \sum_j x_{ij} and x_{.j} = \sum_i x_{ij}.

By following an argument similar to that used in determining the limiting distribution (e) of the x_i, i = 1,2,...,k-1, we may find the limiting distribution of the x_{ij} (all i,j except i = r, j = s) to be normal multivariate. From this limiting distribution one finds that the limiting distribution of \sum_{i,j}(x_{ij}-x_{i.}q_j-x_{.j}p_i)^2/(p_iq_j) is the \chi^2-distribution with (r-1)(s-1) degrees of freedom. By an argument similar to that embodied in Theorem (A) we may make the following statement:

Theorem (C): Let O_n be a sample from a multinomial population with the mutually exclusive classes (A_i, B_j), i = 1,2,...,r; j = 1,2,...,s, in which the probability associated with (A_i, B_j) is p_iq_j. Let \chi_1^2 be defined as in (j). Then the limiting distribution of \chi_1^2 as n \to \infty is the \chi^2-distribution with (r-1)(s-1) degrees of freedom.

The reader may verify that the likelihood ratio criterion for testing the hypothesis specified by

\Omega: p_{ij} > 0, \ \sum_{i,j}p_{ij} = 1; \qquad \omega: p_{ij} = p_iq_j,

that is, the hypothesis that the A and B classifications are independent, is given by

\lambda = \prod_{i,j}\Big(\frac{n_{i.}n_{.j}}{n\,n_{ij}}\Big)^{n_{ij}}.

It follows from Theorem (A), 7.2, that when the hypothesis of independence is true, the limiting distribution of -2\log\lambda is the \chi^2-distribution with (r-1)(s-1) degrees of freedom.
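The verification of the form of \lambda may be sketched as follows. Under \Omega the maximum likelihood estimates are \hat{p}_{ij} = n_{ij}/n; under \omega they are \hat{p}_i = n_{i.}/n and \hat{q}_j = n_{.j}/n. The multinomial coefficient is common to both maxima and cancels, so

\lambda = \frac{\prod_{i,j}(\hat{p}_i\hat{q}_j)^{n_{ij}}}{\prod_{i,j}\hat{p}_{ij}^{\,n_{ij}}} = \prod_{i,j}\Big(\frac{n_{i.}n_{.j}}{n\,n_{ij}}\Big)^{n_{ij}}.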

10.5 Sampling Inspection

In a mass production process, suppose articles are produced in lots of N articles each, and suppose each article, upon inspection, can be classified as defective or non-defective. It is often uneconomical to carry out a program of 100% inspection. As an alternative, sampling methods of inspection applicable to each lot have been developed which have the property of guaranteeing that the percentage of defectives remaining after applying the sampling inspection procedure in the long run (i. e. to a large number of lots) is not more than some preassigned value. Such sampling methods have been developed and put into operation by Dodge and Romig* of the Bell Telephone Laboratories. It should be pointed out that these sampling methods are essentially screening devices for reducing defectives after production, and are not devices for removing the causes of defectives. Methods for detecting the existence of causes of such defectives must be introduced further back into the production operations. In particular, statistical quality control methods**, originally introduced by Shewhart, have been found useful in connection with this problem.

The mathematical problem involved in sampling inspection is one in combinatorial statistics. Dodge and Romig have developed two types of inspection sampling, single sampling and double sampling, which will be considered in turn. From a mathematical point of view, many sampling inspection schemes can be devised which guarantee quality of outgoing products in the sense mentioned above.

10.51 Single Sampling Inspection

Let p be the fraction of defectives in a lot L_N of size N. The number of defectives will be pN. Now let a sample O_n of size n be drawn from L_N. Giving all possible samples of size n equal weight, the probability of obtaining m defectives (and n-m non-defectives or conforming articles) in O_n is

(a)    P_{m,n;pN,N} = \frac{\binom{pN}{m}\binom{N-pN}{n-m}}{\binom{N}{n}}, \qquad m = 0,1,2,...,r,

where r is the smaller of n and Np. Let

(b)    F(c;p,N,n) = \sum_{m=0}^{c} P_{m,n;pN,N}.

It is easy to verify that if any two values p and p' (pN and p'N being integers) are such that p < p', then

(c)    F(c;p,N,n) > F(c;p',N,n).

*H. F. Dodge and H. G. Romig, "A Method of Sampling Inspection", Bell System Technical Journal, Vol. VIII (1929), and "Single Sampling and Double Sampling Inspection Tables", Bell System Technical Journal, Vol. XX (1941).

**See "Guide for Quality Control and Control Chart Method of Analyzing Data" (1941) and "Control Chart Method of Controlling Quality During Production" (1942), American Standards Association, New York.

Let p_t be the lot tolerance fraction defective, i. e. the maximum allowable fraction defective in a lot, which is arbitrarily chosen in advance (e. g., .01 or .05). Let

P_C = F(c;p_t,N,n).

P_C is known as the consumer's risk; it is (approximately) the probability that a lot with lot tolerance fraction defective p_t will be accepted without 100% inspection. It follows from (c) that if the lot fraction defective p exceeds p_t, then the probability of accepting such a lot on the basis of the sample is less than the consumer's risk. The probability of subjecting a lot with fraction defective actually equal to p (process average) to 100% inspection is

(d)    P_p = 1 - F(c;p,N,n),

which is called the producer's risk. It will be noted from (c) that the smaller the value of p, the smaller will be the producer's risk.

The reader should observe that producer risk and consumer risk are highly analogous to Type I and Type II errors, respectively (see §7.3), in the theory of testing statistical hypotheses as developed by Neyman and Pearson. In fact, historically speaking, the concept of producer and consumer risks in sampling inspection may be considered as the forerunner of the concept of Type I and Type II errors in the theory of testing statistical hypotheses.

Now suppose we make the following rules of action with reference to a sampled lot, where c is chosen for given values of P_C, p_t, N, n:

(1) Inspect a sample of n articles.

(2) If the number of defectives in the sample does not exceed c, accept the lot.

(3) If the number of defectives in the sample exceeds c, inspect the remainder of the lot.

(4) Replace all defectives found by conforming articles.

Now let us consider the problem of determining the mean value of the fraction of defectives remaining in a lot having fraction defective p, after applying rules of action (1) to (4).

The probability of obtaining m defectives in a sample of size n is given by (a). If these m defectives are replaced by conforming articles and the sample is returned to the lot, the lot will contain pN-m defectives. Hence the probability of accepting a lot with pN-m defectives is given by (a), m = 0,1,2,...,c. The probability of inspecting the lot 100% is 1-F(c;p,N,n), which, of course, is the probability of accepting a lot with no defectives. Therefore the mean value of the fraction of defectives remaining after applying rules (1) to (4) is

(e)    \bar{p} = \sum_{m=0}^{c}\frac{pN-m}{N}\,P_{m,n;pN,N}.
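Formulas (a), (b) and (e) are easily mechanized; the following minimal sketch in Python uses a hypothetical plan (N = 1000, n = 100, c = 2) and scans \bar{p} over the lot contents to locate its maximum:

    from math import comb

    def P(m, n, d, N):
        # probability (a) of m defectives in a sample of n from a lot of N
        # containing d = pN defectives (comb returns 0 for impossible cases)
        return comb(d, m) * comb(N - d, n - m) / comb(N, n)

    def F(c, d, N, n):
        # acceptance probability (b)
        return sum(P(m, n, d, N) for m in range(0, min(c, d, n) + 1))

    def p_bar(c, d, N, n):
        # average outgoing fraction defective (e)
        return sum((d - m) / N * P(m, n, d, N)
                   for m in range(0, min(c, d, n) + 1))

    N, n, c = 1000, 100, 2                    # hypothetical plan
    print(F(c, 50, N, n))                     # consumer's risk if p_t = .05
    print(max(p_bar(c, d, N, n) for d in range(N + 1)))   # the limit p_L
    print(n + (N - n)*(1 - F(c, 20, N, n)))   # mean inspection (f), p = .02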

The statistical interpretation of (e) is as follows: if a large number of lots, each with fraction defective p, are inspected according to rules (1) to (4), then the average fraction defective in all of these lots after inspection is \bar{p}. For given values of c, n, and N, \bar{p} is a function of p, defined for those values of p for which Np is an integer, which has a maximum with respect to p. This maximum, denoted by p_L, is called the average outgoing quality limit. It can be shown that the larger the value of p beyond the value maximizing \bar{p}, the smaller will be the value of \bar{p}. The reason for this, of course, is that the greater the value of p, the greater the probability that each lot will have to be inspected 100%. If the consumer risk, n, and N are chosen in advance, then, of course, c and hence p_L is determined. Thus, we are able to make the following statistical interpretation of these results:

If rules (1), (2), (3) and (4) are followed for lot after lot and for given values of c, n, N, the average fraction defective per lot after inspection never exceeds p_L, no matter what fractions defective exist in the lots before the inspection.

It is clear that there are various combinations of values of c and n, each having a \bar{p} with maximum p_L (approximately) with respect to p.

The mean value of the number of articles inspected per lot, for lots having fraction defective p, is given by

(f)    I = n + (N-n)\big(1-F(c;p,N,n)\big),

since n (the number in the sample) will be inspected in every lot and N-n (the remainder in the lot) will be inspected if the number of defectives in the sample exceeds c.

Thus, we have two methods of specifying consumer protection: (i) lot quality protection, obtained by specifying the lot tolerance fraction defective p_t and the consumer's risk P_C; (ii) average quality protection, in which the average outgoing quality limit p_L is specified.

By considering the various combinations of values of c and n corresponding to a given consumer's risk (or to a given average outgoing quality limit) there is, in general, a unique combination, for a given p and N, for which I is smaller than for any other. Such a combination of values of n and c, together with a value of the process average p as near to its actual value in the incoming lots as one can obtain, is, from a practical point of view, the combination to use, since the amount of inspection is reduced to a minimum.

Extensive tabulations of pairs of values of c and n, for consumer's risk 0.10, for values of N from 1 to 100,000, for lot tolerance fraction defective from .005 to .10, and for process average from .00005 to .05, all of the variables broken down into suitable groupings, have been prepared by Dodge and Romig. They have also made tabulations of pairs of values of c and n for given values of average outgoing quality limit p_L from .001 to .10, for values of N from 1 to 100,000, and for values of process average from .00002 to .10. Numerous approximations have been made to formulas (a), (b), (d), (e) and (f) for computation purposes, which the reader may refer to in the papers cited. For example, it is easy to verify that the Poisson law e^{-pn}(pn)^m/m! is a good approximation to (a) if p and n/N are both small, say < 0.10.

10.52 Double Sampling Inspection

In double sampling inspection from a given lot of size N, the procedure for taking action regarding a given lot is as follows:

(1) A first sample of size n_1 is drawn from the lot.

(2) If the number of defectives in the first sample does not exceed c_1, the lot is accepted without further sampling.

(3) If the number of defectives in the first sample exceeds c_2, inspect the remainder of the lot.

(4) If the number of defectives in the first sample exceeds c_1 but not c_2, inspect a second sample of n_2 pieces.

(5) If the total number of defectives in both samples does not exceed c_2, accept the lot.

(6) If the total number of defectives in both samples exceeds c_2, inspect the remainder of the lot.

(7) Replace all defectives found by conforming articles.

As in the case of single sampling, we have two kinds of consumer protection: (i) lot quality protection, and (ii) average quality protection.

Consumer risk, the probability of accepting a lot with fraction defective p_t without 100% inspection, is given by

(a)    P_C = \sum_{m=0}^{c_1}P_{m,n_1;p_tN,N} + \sum_{m=c_1+1}^{c_2}\sum_{j=0}^{c_2-m}P_{m,n_1;p_tN,N}\,P_{j,n_2;p_tN-m,N-n_1}.

The single sum in this formula is simply the probability of accepting the lot on the basis of the first sample (i. e. Step (2)) and the double sum is the probability of accepting the lot on the basis of the first and second samples combined (i. e. Step (5)), after having failed to accept on the basis of the first sample alone.

The mean value of the fraction of defectives per lot remaining after the defectives have been removed by the double sampling procedure, for lots having fraction defective p originally, is given by

(b)    \bar{p} = \frac{1}{N}\bigg[\sum_{m=0}^{c_1}(pN-m)\,P_{m,n_1;pN,N} + \sum_{m=c_1+1}^{c_2}\sum_{j=0}^{c_2-m}(pN-m-j)\,P_{m,n_1;pN,N}\,P_{j,n_2;pN-m,N-n_1}\bigg].

The mean value of the number of articles inspected per lot, for lots having fraction defective p, is

(c)    I = n_1 + n_2\Big(1-\sum_{m=0}^{c_1}P_{m,n_1;pN,N}\Big) + (N-n_1-n_2)(1-P_a),

where P_a is the value of the probability given in (a) with p_t replaced by p.
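The double-sampling analogues of the single-sampling computations may be sketched as follows (Python, the plan again hypothetical; the sketch assumes pN exceeds c_2 so that the second-sample arguments stay non-negative):

    from math import comb

    def P(m, n, d, N):
        return comb(d, m) * comb(N - d, n - m) / comb(N, n)

    def accept_prob(n1, n2, c1, c2, d, N):
        # probability (a) of accepting a lot with d = pN defectives
        single = sum(P(m, n1, d, N) for m in range(0, c1 + 1))
        double = sum(P(m, n1, d, N) * P(j, n2, d - m, N - n1)
                     for m in range(c1 + 1, c2 + 1)
                     for j in range(0, c2 - m + 1) if m <= d)
        return single + double

    def avg_inspected(n1, n2, c1, c2, d, N):
        # mean number of articles inspected per lot, formula (c)
        first = sum(P(m, n1, d, N) for m in range(0, c1 + 1))
        pa = accept_prob(n1, n2, c1, c2, d, N)
        return n1 + n2 * (1 - first) + (N - n1 - n2) * (1 - pa)

    N, n1, n2, c1, c2 = 1000, 60, 120, 1, 4   # hypothetical plan
    print(accept_prob(n1, n2, c1, c2, 50, N))
    print(avg_inspected(n1, n2, c1, c2, 20, N))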

For given values of N, n_1, n_2, c_1, c_2, it is clear that \bar{p} is a function of p, defined for those values of p for which Np is an integer, and has a maximum value p_L, the average outgoing quality limit. For a given value of N there are many values of n_1, n_2, c_1, c_2 which will yield the same value of p_L (approximately), or will yield the same consumer risk (approximately) for a given lot tolerance fraction defective. Dodge and Romig have arbitrarily chosen as the basis for the relationship between the n's and the c's the following rule: to determine n_1 and n_2 such that, for given values of c_1 and c_2, n_1 and c_1 (as sample size and allowable defect number) provide the same consumer risk (approximately) as n_1+n_2 and c_2 (as sample size and allowable defect number). The sense in which "approximately" is used is due to nearest-integer restrictions. Even after this restriction there is enough choice left for combinations of n_1, n_2, c_1, c_2 to minimize I as given by (c). To determine the n's and c's under these conditions for given N, for given consumer risk (or average outgoing quality limit), involves a considerable amount of computation. Dodge and Romig have prepared tables for double sampling analogous to those described at the end of 10.51 for single sampling.

For a given amount of consumer protection, a smaller average amount of inspection is required under double sampling than under single sampling, particularly for large lots and low process average fraction defective p.



CHAPTER XI

AN INTRODUCTION TO MULTIVARIATE STATISTICAL ANALYSIS

A considerable amount of work has been done in recent years in the theory of sampling from normal multivariate populations and in the theory of testing statistical hypotheses relating to normal multivariate distributions. The two basic distribution functions underlying all of this work are the sample mean distribution, (e) in §5.12, and the Wishart distribution, (k) in §5.6, of the second order sample moments. We have given a derivation of the distribution of means (§5.12) and a derivation of the Wishart distribution for the case of samples from a bivariate normal population (§5.5). The general Wishart distribution was given in §5.6, without proof.

In the present chapter we shall present a geometric derivation of the Wishart distribution, and consider applications of this distribution in deriving sampling distributions of several multivariate statistical functions and test criteria. The few sections which follow must be considered merely as an introduction to normal multivariate statistical theory. The reader interested in further material in this field is referred to the Bibliography for supplementary reading.

11.1 The Wishart Distribution*

In §5.5, we presented a derivation of the joint distribution of the second order moments in samples from a bivariate distribution. The general Wishart distribution was stated in (k) of §5.6. We shall now present a derivation of this distribution.

Let O_n: (x_{i\alpha}, i = 1,2,...,k; \alpha = 1,2,...,n) be a sample of n observations from the k-variate population having p. d. f.

(a)    f(x_1,x_2,...,x_k) = \frac{\Lambda^{1/2}}{(2\pi)^{k/2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}x_ix_j},

*John Wishart, "The Generalized Product Moment Distribution in Samples from a Normal Multivariate Population", Biometrika, Vol. 20A (1928), pp. 32-52. A proof based on the method of characteristic functions has also been given by J. Wishart and M. S. Bartlett, "The Generalized Product Moment Distribution", Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 260-270.

where \Lambda is the determinant of the positive definite matrix ||\Lambda_{ij}||. Let

(b)    b_{ij} = \sum_{\alpha=1}^{n} x_{i\alpha}x_{j\alpha}.

Clearly b_{ij} = b_{ji}, so that there are only k(k+1)/2 distinct b_{ij}. The b_{ij}/n may be referred to as second order sample moments. Our problem is to obtain the joint p. d. f. of the b_{ij}. The joint p. d. f. of the x_{i\alpha} (i = 1,2,...,k; \alpha = 1,2,...,n) is given by

(c)    f(x_{i\alpha}) = \frac{\Lambda^{n/2}}{(2\pi)^{(nk)/2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}b_{ij}}.

Now, the probability element of the b_{ij} is given by

(d)    w(b_{ij})\prod_{i\le j}db_{ij} = \frac{\Lambda^{n/2}}{(2\pi)^{(nk)/2}}\int_R e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}b_{ij}}\prod_{i,\alpha}dx_{i\alpha},

where R is the region in the kn-dimensional space of the x_{i\alpha} for which

(e)    b_{ij} < \sum_{\alpha}x_{i\alpha}x_{j\alpha} < b_{ij}+db_{ij}.

Within terms of order \prod_{i\le j}db_{ij}, the probability given by (d) may be written as

(f)    \frac{\Lambda^{n/2}}{(2\pi)^{(nk)/2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}b_{ij}}\int_R\prod_{i,\alpha}dx_{i\alpha}.

Our problem now reduces to the integration of \prod dx_{i\alpha} over the region R. Let f_1(b_{11})db_{11} be the volume element for which b_{11} < \sum_{\alpha}x_{1\alpha}^2 < b_{11}+db_{11}; f_2(b_{21},b_{22}|b_{11})db_{21}db_{22} the volume element for which b_{2i} < \sum_{\alpha}x_{2\alpha}x_{i\alpha} < b_{2i}+db_{2i}, i = 1,2, for a fixed value of b_{11}; with a similar meaning for f_3(b_{31},b_{32},b_{33}|b_{11},b_{21},b_{22})db_{31}db_{32}db_{33}, and so on. Then the volume element for which b_{ij} < \sum_{\alpha}x_{i\alpha}x_{j\alpha} < b_{ij}+db_{ij}, that is, the integral in (f) (to terms of order \prod db_{ij}), is given by the product

(g)    f_1(b_{11})db_{11}\,f_2(b_{21},b_{22}|b_{11})db_{21}db_{22}\cdots f_k(b_{k1},...,b_{kk}|b_{11},...,b_{k-1,k-1})db_{k1}\cdots db_{kk}.

Now, consider the problem of determining the expression for

(h)    f_m(b_{m1},b_{m2},...,b_{mm}|b_{11},...,b_{m-1,m-1})\,db_{m1}db_{m2}\cdots db_{mm}.

We note that the b_{ij} = \sum_{\alpha}x_{i\alpha}x_{j\alpha}, i,j = 1,2,...,m-1, are fixed. Geometrically, we may represent P_i: (x_{i1},x_{i2},...,x_{in}), i = 1,2,...,k, as k points in an n-dimensional space. \sqrt{b_{ii}} is the distance between the i-th point P_i and the origin O, while b_{ij}/\sqrt{b_{ii}b_{jj}} is the cosine of the angle between the vectors OP_i and OP_j. Fixing the b_{ij} (i,j = 1,2,...,m-1) means fixing the relative position of the vectors OP_1, OP_2, ..., OP_{m-1}. The vector OP_m is free to vary in such a way that

(i)    b_{mi} < \sum_{\alpha}x_{m\alpha}x_{i\alpha} < b_{mi}+db_{mi}, \qquad i = 1,2,...,m,

and we wish to find the volume of the region over which P_m is free to vary. If n = m, we have as many vectors as dimensions and we can find our volume element by making the transformation

b_{mi} = \sum_{\alpha=1}^{m}x_{m\alpha}x_{i\alpha} \qquad (i = 1,2,...,m).

The Jacobian is

(j)    \frac{\partial(b_{m1},b_{m2},...,b_{mm})}{\partial(x_{m1},x_{m2},...,x_{mm})} = \begin{vmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & & & \vdots \\ 2x_{m1} & 2x_{m2} & \cdots & 2x_{mm} \end{vmatrix} = 2D,

where D = |x_{i\alpha}|, i,\alpha = 1,2,...,m. The absolute value of the determinant* |x_{i\alpha}| is the volume of the parallelotope based on the edges OP_1, OP_2, ..., OP_m. By taking the positive square root of the square of this determinant, we may overcome the difficulty of sign. Thus

D^2 = \Big|\sum_{\alpha}x_{i\alpha}x_{j\alpha}\Big| = |b_{ij}|_m,

where |b_{ij}|_m denotes the m-rowed determinant of the b_{ij}, i,j = 1,2,...,m, and hence D = \sqrt{|b_{ij}|_m}. Therefore, we have

(k)    \prod_{i=1}^{m}db_{mi} = 2\sqrt{|b_{ij}|_m}\,\prod_{\alpha=1}^{m}dx_{m\alpha}.

Hence the differential element on the right in (k), obtained by taking all values of x_{m\alpha} for which b_{mi} < \sum_{\alpha}x_{m\alpha}x_{i\alpha} < b_{mi}+db_{mi}, is a function of the volume of the parallelotope and the differentials db_{mi} in the values of the b_{mi}.

It can be shown that \sqrt{|b_{ij}|_m} is the volume of the parallelotope T_m, based on the edges OP_1, OP_2, ..., OP_m, for any number of dimensions n \ge m. If n exceeds m, then P_m is free to vary within an (n-m+1)-dimensional spherical shell, as will be noted by examining the inequalities in (i). One of these inequalities (i = m) represents an n-dimensional spherical shell of thickness db_{mm}, the remaining inequalities representing pairs of parallel (n-1)-dimensional planes, where in general no two pairs are parallel to each other.

The volume included between any arbitrary pair of planes, e. g., b_{mi} < \sum_{\alpha}x_{m\alpha}x_{i\alpha} < b_{mi}+db_{mi} (i < m), is an n-dimensional slab of thickness db_{mi}/\sqrt{b_{ii}}. The intersection of the (m-1) pairs of (n-1)-dimensional planes and the n-dimensional spherical shell yields an (n-m+1)-dimensional spherical shell. Now the inner surface of this shell (or any spherical surface concentric with the inner surface) is perpendicular to the differentials db_{m1}, db_{m2}, ..., db_{m,m-1}. This is evident upon examining the manner in which the (n-m+1)-dimensional spherical shell mentioned above is obtained as the common intersection of the m-1 parallel pairs of (n-1)-dimensional planes and the n-dimensional

*For example, see D. M. Y. Sommerville, An Introduction to the Geometry of n Dimensions, Methuen, London (1929), Chapter 8.

There is also another geometrical interpretation of |b_{ij}|_m for any n \ge m, which is of considerable interest. The x_{i\alpha} (i = 1,2,...,m; \alpha = 1,2,...,n) may be regarded as n points P_\alpha (\alpha = 1,2,...,n) in m dimensions. If we take any m of these n points, say P_{\alpha_r}: (x_{i\alpha_r}, i = 1,2,...,m), r = 1,2,...,m, together with the origin as the (m+1)-st point, then the square of the volume of the parallelotope based on OP_{\alpha_1}, OP_{\alpha_2}, ..., OP_{\alpha_m} is given by |\sum_{r=1}^{m}x_{i\alpha_r}x_{j\alpha_r}|. This follows from the discussion between (i) and (j). Now there are {}_nC_m ways of choosing m points from the n points P_\alpha, and hence there are {}_nC_m parallelotopes which can be formed in a manner similar to that discussed above. It can be shown that |b_{ij}|_m = \sum'|\sum_r x_{i\alpha_r}x_{j\alpha_r}|, where \sum' denotes summation over all {}_nC_m parallelotopes thus formed. The proof of this follows by mathematical induction, by increasing the number of points from m to n successively by unity. In the case i = j = 1, we have n points in one dimension and |b_{ij}|_1 = b_{11} = \sum_{\alpha}x_{1\alpha}^2, the sum of squares of the distances of each point from the origin. In the case i,j = 1,2,...,k, |b_{ij}| is the sum of squares of the volumes of all {}_nC_k k-dimensional parallelotopes which can be constructed from the n given points, using the origin as one vertex in each parallelotope. |b_{ij}| may be referred to as the generalized sum of squares.

spherical shell. Therefore, the rectangular volume element db_{m1}db_{m2}\cdots db_{mm} is perpendicular to the inner surface of the (n-m+1)-dimensional spherical shell. The thickness of this shell is given by the differential element \frac{1}{2\sqrt{|b_{ij}|_m}}\prod_{i=1}^{m}db_{mi} in (k). Therefore, by multiplying this thickness by the inner surface content of our (n-m+1)-dimensional shell, we obtain the volume of the shell to terms of order \prod db_{mi}. The radius of this inner surface is equal to the distance h from P_m to the (m-1)-dimensional space formed by OP_1, OP_2, ..., OP_{m-1}. This is, perhaps, seen most readily by noting that the inner surface of our shell is obtained by taking the intersections of \sum_{\alpha}x_{m\alpha}x_{i\alpha} = b_{mi}, i = 1,2,...,m (we are assuming, of course, that all db_{mi} are > 0). The center of the sphere having this surface must clearly lie in the intersection of the (m-1) (n-1)-dimensional planes \sum_{\alpha}x_{m\alpha}x_{i\alpha} = b_{mi}, i = 1,2,...,m-1. This intersection point lies on each of the vectors OP_1, OP_2, ..., OP_{m-1}, and the line between this point and P_m is perpendicular to each of the first m-1 vectors, which is equivalent to the statement that the center of the (n-m+1)-dimensional shell is at the point where the perpendicular from P_m intersects the (m-1)-dimensional space formed by the remaining m-1 vectors, OP_1, OP_2, ..., OP_{m-1}. The volume of T_m, the parallelotope formed from OP_1, OP_2, ..., OP_m, is \sqrt{|b_{ij}|_m} = V_m, say, and that of T_{m-1}, the parallelotope formed from OP_1, OP_2, ..., OP_{m-1}, is \sqrt{|b_{ij}|_{m-1}} = V_{m-1}, say. Using T_{m-1} as the base of T_m and h as the height, we must have V_m = hV_{m-1}, or h = V_m/V_{m-1}.

Now the volume of an n-dimensional sphere of radius r is

(l)    \frac{\pi^{n/2}r^n}{\Gamma(\frac{n}{2}+1)},

and the surface content of the sphere is obtained by taking the derivative of this expression with respect to r, which is found to be

(m)    \frac{2\pi^{n/2}r^{n-1}}{\Gamma(\frac{n}{2})}.

The integral giving (l) may be readily evaluated by integrating immediately with respect to x_n, then introducing a trigonometric substitution for each of the remaining variables, i = 1,2,...,n-1, and integrating with respect to the appropriate angle at each stage.

The surface content of the inner surface of our spherical shell is therefore


(n)    \frac{2\pi^{\frac{n-m+1}{2}}h^{n-m}}{\Gamma(\frac{n-m+1}{2})},

and the content of the spherical shell is obtained by multiplying expression (n) by the thickness \frac{1}{2\sqrt{|b_{ij}|_m}}\prod_{i=1}^{m}db_{mi}. Therefore, since h^{n-m} = (|b_{ij}|_m/|b_{ij}|_{m-1})^{(n-m)/2}, we finally obtain as the expression for the function in (h),

(o)    f_m(b_{m1},...,b_{mm}|b_{11},...,b_{m-1,m-1})\prod_{i=1}^{m}db_{mi} = \frac{\pi^{\frac{n-m+1}{2}}}{\Gamma(\frac{n-m+1}{2})}\cdot\frac{|b_{ij}|_m^{\frac{n-m-1}{2}}}{|b_{ij}|_{m-1}^{\frac{n-m}{2}}}\prod_{i=1}^{m}db_{mi}.

Letting m take on the values 1,2,...,k in (o) and multiplying the results, we obtain the following expression for (g):

(p)    \frac{\pi^{\frac{kn}{2}-\frac{k(k-1)}{4}}}{\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})}\,|b_{ij}|^{\frac{n-k-1}{2}}\prod_{i\le j}db_{ij},

which is the value of \int_R\prod_{i,\alpha}dx_{i\alpha} in (f) to terms of order \prod_{i\le j}db_{ij}. We therefore finally obtain the Wishart distribution:

(q)    w(b_{ij};\Lambda_{ij})\prod_{i\le j}db_{ij} = \frac{\Lambda^{\frac{n}{2}}\,|b_{ij}|^{\frac{n-k-1}{2}}}{2^{\frac{nk}{2}}\,\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}b_{ij}}\prod_{i\le j}db_{ij},

which is defined over the region in the b_{ij} space for which ||b_{ij}|| is positive semi-definite, that is, over all values of the b_{ij} for which |b_{ij}| and all principal minors of all orders are \ge 0. In order for the distribution to exist it is clear that we must have n+1-k > 0.

Since \int w(b_{ij};\Lambda_{ij})\prod db_{ij} = 1, where the integration is taken over the space of the b_{ij}, it is clear that

(r)    \int |b_{ij}|^{\frac{n-k-1}{2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}b_{ij}}\prod_{i\le j}db_{ij} = \frac{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})}{(\Lambda/2^k)^{\frac{n}{2}}}.

Replacing \Lambda_{ij} by \Lambda_{ij}-2\theta_{ij} (\theta_{ij} = \theta_{ji}) in (r), then multiplying the result by

\frac{\Lambda^{\frac{n}{2}}}{2^{\frac{nk}{2}}\,\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})},

we obtain the m. g. f. of the b_{ii} and 2b_{ij} (i<j), which has the value

(s)    \phi(\theta_{ij}) = \frac{\Lambda^{\frac{n}{2}}}{|\Lambda_{ij}-2\theta_{ij}|^{\frac{n}{2}}}.

The m. g. f. of the \sum_{\alpha}x_{i\alpha}^2 and 2\sum_{\alpha}x_{i\alpha}x_{j\alpha} (i<j), as determined from (c) by multiplying (c) by e^{\sum_{i\le j}\theta_{ij}b_{ij}} and integrating over the entire kn-dimensional space of the x's, is also given by (s). Therefore, if one were given the function (q) in advance, one could argue by the multivariate analogue of Theorem (B), 2.81, that it is the distribution function of the b_{ij} = \sum_{\alpha}x_{i\alpha}x_{j\alpha}, where the p. d. f. of the x_{i\alpha} is given by (c).

The Wishart distribution (q) may be regarded as a generalization of the \chi^2-distribution to the case of vectors with k components. In fact, for k = 1, the quantity \Lambda_{11}b_{11} is distributed according to the \chi^2-distribution with n degrees of freedom. In this case b_{11} is the sum of squares of the n sample values of x_1, while in the k-variate case b_{ii} is the sum of squares of the n sample values of x_i (the i-th component of the vector x_1,x_2,...,x_k) and b_{ij} (i \ne j) is the inner product or bilinear form between the n sample values of x_i and x_j. As in the case of the \chi^2-distribution, the Wishart distribution has a reproductive property, to be considered in the next section.
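Before passing to that, the k = 1 statement may be written out. For k = 1, (q) reads

w(b_{11};\Lambda_{11})\,db_{11} = \frac{\Lambda_{11}^{\frac{n}{2}}\,b_{11}^{\frac{n-2}{2}}\,e^{-\frac{1}{2}\Lambda_{11}b_{11}}}{2^{\frac{n}{2}}\,\Gamma(\frac{n}{2})}\,db_{11},

and the substitution \chi^2 = \Lambda_{11}b_{11} carries this into \frac{(\chi^2)^{\frac{n}{2}-1}e^{-\chi^2/2}}{2^{n/2}\Gamma(\frac{n}{2})}\,d\chi^2, the \chi^2-law with n degrees of freedom.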

11.2 Reproductive Property of the Wishart Distribution.

The reproductive property of Wishart distributions is very useful in multivariate statistical theory, and it may be stated in the following theorem:

Theorem (A): Let b_{ij}^{(1)}, b_{ij}^{(2)}, ..., b_{ij}^{(p)} (i,j = 1,2,...,k) be p systems of random variables distributed independently according to Wishart distributions (p. d. f.'s)

w_{n_t,k}(b_{ij}^{(t)};\Lambda_{ij}), \qquad (t = 1,2,...,p),

respectively. Let \bar{b}_{ij} = \sum_t b_{ij}^{(t)}, n = \sum_t n_t. Then the \bar{b}_{ij} are distributed according to the Wishart p. d. f. w_{n,k}(\bar{b}_{ij};\Lambda_{ij}).

To prove this theorem, we determine the m. g. f. \phi(\theta_{ij}), (\theta_{ij} = \theta_{ji}), of the \bar{b}_{ii} and 2\bar{b}_{ij} (i<j). We have

(a)    \phi = E\big(e^{\sum_{i\le j}\theta_{ij}\bar{b}_{ij}}\big) = \prod_{t=1}^{p}E\big(e^{\sum_{i\le j}\theta_{ij}b_{ij}^{(t)}}\big).

But

(b)    E\big(e^{\sum_{i\le j}\theta_{ij}b_{ij}^{(t)}}\big) = \frac{\Lambda^{\frac{n_t}{2}}}{|\Lambda_{ij}-2\theta_{ij}|^{\frac{n_t}{2}}},

and therefore

\phi = \frac{\Lambda^{\frac{n}{2}}}{|\Lambda_{ij}-2\theta_{ij}|^{\frac{n}{2}}},

which is the m. g. f. for the Wishart p. d. f. w_{n,k}(\bar{b}_{ij};\Lambda_{ij}), which we conclude, by the multivariate analogue of Theorem (B), 2.81, to be the distribution of the \bar{b}_{ij} = \sum_t b_{ij}^{(t)}.
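Theorem (A) may be illustrated by a Monte Carlo check of the first moments, E(\bar{b}_{ij}) = n\sigma_{ij}; the following minimal sketch in Python (the covariance matrix is hypothetical) pools two independent systems and compares with a single system of n_1+n_2 observations:

    import numpy as np

    rng = np.random.default_rng(0)
    cov = np.array([[2.0, 0.6], [0.6, 1.0]])
    n1, n2, trials = 5, 7, 4000
    pooled, direct = [], []
    for _ in range(trials):
        x = rng.multivariate_normal([0, 0], cov, size=n1)
        y = rng.multivariate_normal([0, 0], cov, size=n2)
        z = rng.multivariate_normal([0, 0], cov, size=n1 + n2)
        pooled.append(x.T @ x + y.T @ y)
        direct.append(z.T @ z)
    print(np.mean(pooled, axis=0))   # both approximate (n1 + n2) * cov
    print(np.mean(direct, axis=0))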

11.3 The Independence of Means and Second Order Moments in Samples from a Normal Multivariate Population

Suppose O_n: (x_{i\alpha}, i = 1,2,...,k; \alpha = 1,2,...,n) is a sample from the normal multivariate population having p. d. f.

(a)    f(x_1,...,x_k) = \frac{\Lambda^{1/2}}{(2\pi)^{k/2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}(x_i-a_i)(x_j-a_j)}.

The p. d. f. of the sample is

(b)    \frac{\Lambda^{\frac{n}{2}}}{(2\pi)^{\frac{kn}{2}}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}c_{ij}},

where c_{ij} = \sum_{\alpha}(x_{i\alpha}-a_i)(x_{j\alpha}-a_j). We may write

(c)    c_{ij} = a_{ij}+n(\bar{x}_i-a_i)(\bar{x}_j-a_j),

where

a_{ij} = \sum_{\alpha}(x_{i\alpha}-\bar{x}_i)(x_{j\alpha}-\bar{x}_j), \qquad \bar{x}_i = \frac{1}{n}\sum_{\alpha}x_{i\alpha}.

The a_{ij} are distributed according to the Wishart distribution (q), §11.1, with n replaced by n-1. It was shown in §5.12 for the case k = 2 that the \bar{x}_i are distributed according to the normal bivariate law (d), 5.12, and it was remarked that in the general case, the distribution of the \bar{x}_i is given by (e), §5.12. The proof of (e), §5.12, may be carried out by evaluating the m. g. f. of the (\bar{x}_i-a_i), i. e.

(d)    E\big(e^{\sum_i\theta_i(\bar{x}_i-a_i)}\big) = \int e^{\sum_i\theta_i(\bar{x}_i-a_i)}\,f(x_{i\alpha})\prod_{i,\alpha}dx_{i\alpha},

where f(x_{i\alpha}) is given by (b), the integration being over the entire kn-dimensional space of the x_{i\alpha}. The evaluation of this integral may be carried out as an extension of the case of k = 2, §5.12. The details are left to the reader. In order to show that the a_{ij} have the Wishart distribution with n replaced by n-1, it is sufficient to show that the m. g. f. of the a_{ii} and 2a_{ij} (i<j) is \Lambda^{(n-1)/2}|\Lambda_{ij}-2\theta_{ij}|^{-(n-1)/2}. The problem of doing this is a direct extension to the k-variate case of the procedure followed for k = 2 in §5.5. We shall have to leave the details to the reader.

Just as in the 1 and 2 variable cases discussed in §5.6, the a_{ij} and \bar{x}_i are independently distributed systems. A fairly direct verification of this, although tedious, is to evaluate the joint m. g. f. of the a_{ij} and \bar{x}_i and note that it factors.
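The algebra behind the decomposition (c) is the usual one:

c_{ij} = \sum_{\alpha}\big[(x_{i\alpha}-\bar{x}_i)+(\bar{x}_i-a_i)\big]\big[(x_{j\alpha}-\bar{x}_j)+(\bar{x}_j-a_j)\big] = a_{ij}+n(\bar{x}_i-a_i)(\bar{x}_j-a_j),

the cross terms vanishing because \sum_{\alpha}(x_{i\alpha}-\bar{x}_i) = 0.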
11.4 Hotelling's Generalized "Student" Test*

Suppose a sample O_n is drawn from a normal multivariate population with distribution (a) in 11.3, and that it is desired to test the hypothesis H(a_i = a_{i0}) that the a_i have specified values a_{i0} (i = 1,2,...,k), no matter what values the \Lambda_{ij} may have. This hypothesis may be specified as follows:

(a)    \Omega: the \Lambda_{ij} such that ||\Lambda_{ij}|| is positive definite, and -\infty < a_i < +\infty, i = 1,2,...,k.
       \omega: the subspace of \Omega for which a_i = a_{i0}, i = 1,2,...,k.

It will be noted that this is the k-variate analogue of the "Student" statistical hypothesis discussed in §7.2 for one variable, which is simply the hypothesis that a sample from a normal population comes from one having a specified mean, no matter what the variance may be.

The likelihood function for testing the hypothesis H(a_i = a_{i0}) is given by (b) in §11.3.

Maximizing the likelihood function for variations of the \Lambda_{ij} and a_i over \Omega, we find

(b)    \hat{a}_i = \bar{x}_i, \qquad ||\hat{\Lambda}_{ij}|| = n\,||a_{ij}||^{-1},

and hence the maximum of the likelihood for variations of the parameters over \Omega is

(c)    \frac{n^{\frac{nk}{2}}\,e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\,|a_{ij}|^{\frac{n}{2}}}.

Similarly, the maximum of the likelihood for variations of the parameters over \omega (i. e. for variations of the \Lambda_{ij} with a_i = a_{i0}) is found to be

(d)    \frac{n^{\frac{nk}{2}}\,e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\,|c_{0ij}|^{\frac{n}{2}}},

where c_{0ij} = c_{ij} in (b), §11.3, with a_i = a_{i0}.

The likelihood ratio for testing H(a_i = a_{i0}) is the ratio of expression (d) to expression (c), i. e.

(e)    \lambda = \bigg(\frac{|a_{ij}|}{|c_{0ij}|}\bigg)^{\frac{n}{2}}.

Clearly, we may use \lambda^{2/n} = Y, say, as a test criterion for H(a_i = a_{i0}), since it is a single-valued function of \lambda. To complete the derivation of our test, we must determine the distribution of Y when H(a_i = a_{i0}) is true. We shall obtain this distribution by first finding its moments. Now, we know from 11.1 that the joint distribution of the c_{0ij} is the Wishart distribution

(f)    w_{n,k}(c_{0ij};\Lambda_{ij}) = \frac{\Lambda^{\frac{n}{2}}\,|c_{0ij}|^{\frac{n-k-1}{2}}}{2^{\frac{nk}{2}}\,\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}c_{0ij}}.

The g-th moment of |c_{0ij}| is obtained in the following way. Since the integral of the function (f) over the space S_c of the c_{0ij} is unity, we have
(g)    \int_{S_c}|c_{0ij}|^{\frac{n-k-1}{2}}\,e^{-\frac{1}{2}\sum\Lambda_{ij}c_{0ij}}\prod_{i\le j}dc_{0ij} = \frac{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})}{(\Lambda/2^k)^{\frac{n}{2}}}.

Replacing n by n+2g in (g), then multiplying by

\frac{(\Lambda/2^k)^{\frac{n}{2}}}{\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})},

we obtain an expression on the left which defines E(|c_{0ij}|^g), and its value is given on the right. That is,

(h)    E(|c_{0ij}|^g) = \frac{\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2}+g)}{(\Lambda/2^k)^{g}\,\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})}.

But the c_{0ij} are functions of the a_{ij} and \bar{x}_i, since

(i)    c_{0ij} = a_{ij}+n(\bar{x}_i-a_{i0})(\bar{x}_j-a_{j0}).

Therefore,

(j)    \int |c_{0ij}|^g\,\Big[w_{n-1,k}(a_{ij};\Lambda_{ij})\prod_{i\le j}da_{ij}\Big]\Big[\frac{n^{\frac{k}{2}}\Lambda^{\frac{1}{2}}}{(2\pi)^{\frac{k}{2}}}\,e^{-\frac{n}{2}\sum\Lambda_{ij}(\bar{x}_i-a_{i0})(\bar{x}_j-a_{j0})}\prod_i d\bar{x}_i\Big] = \frac{\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2}+g)}{(\Lambda/2^k)^{g}\,\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2})}.

Dividing both members of (j) by the constant factor of the Wishart density in [ ], then replacing n by n+2h, except in the distribution of the \bar{x}_i (the n's there being easily removable by changing variables to y_i = \sqrt{n}(\bar{x}_i-a_{i0})), then multiplying the resulting equation by the same constant, we obtain, as the first member, an expression defining E(|c_{0ij}|^g\,|a_{ij}|^h), and its value is given by the second member; thus

(k)    E(|c_{0ij}|^g\,|a_{ij}|^h) = \frac{\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2}+g+h)}{(\Lambda/2^k)^{g+h}\,\prod_{m=1}^{k}\Gamma(\frac{n-m+1}{2}+h)}\cdot\frac{\prod_{m=1}^{k}\Gamma(\frac{n-m}{2}+h)}{\prod_{m=1}^{k}\Gamma(\frac{n-m}{2})}.

Clearly this moment will exist for all integers g and h for which all arguments of the





gamma functions are > 0. Setting g = -h, we obtain as the h-th moment* of Y

(l)    E(Y^h) = \prod_{m=1}^{k}\frac{\Gamma(\frac{n-m}{2}+h)\,\Gamma(\frac{n-m+1}{2})}{\Gamma(\frac{n-m}{2})\,\Gamma(\frac{n-m+1}{2}+h)} = \frac{\Gamma(\frac{n-k}{2}+h)\,\Gamma(\frac{n}{2})}{\Gamma(\frac{n-k}{2})\,\Gamma(\frac{n}{2}+h)},

the intermediate factors cancelling. This moment may be written as

(m)    E(Y^h) = \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-k}{2})\,\Gamma(\frac{k}{2})}\int_0^1 x^{\frac{n-k}{2}+h-1}(1-x)^{\frac{k}{2}-1}\,dx.

Therefore the h-th moment of Y (h = 0,1,2,...) is identical with the h-th moment of a variable x having probability element

(n)    \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-k}{2})\,\Gamma(\frac{k}{2})}\,x^{\frac{n-k}{2}-1}(1-x)^{\frac{k}{2}-1}\,dx.

It follows from Theorem (A), §2.76, on the uniqueness of distributions from moments, that Y is distributed according to the probability law (n).

Making use of the fact that c_{0ij} = a_{ij}+n(\bar{x}_i-a_{i0})(\bar{x}_j-a_{j0}), and letting y_i = \sqrt{n}(\bar{x}_i-a_{i0}), we may write

(o)    |c_{0ij}| = -\begin{vmatrix} -1 & y_1 & y_2 & \cdots & y_k \\ 0 & a_{11}+y_1y_1 & a_{12}+y_1y_2 & \cdots & a_{1k}+y_1y_k \\ \vdots & & & & \vdots \\ 0 & a_{k1}+y_ky_1 & a_{k2}+y_ky_2 & \cdots & a_{kk}+y_ky_k \end{vmatrix}.

*For more applications of the foregoing technique of finding moments of ratios of determinants, see S. S. Wilks, "Certain Generalizations in the Analysis of Variance", Biometrika, Vol. 24 (1932), pp. 471-494.

Multiplying the first row by -y_1 and adding to the second; multiplying the first row by -y_2 and adding to the third; and so on, we may write the determinant as

(p)    -\begin{vmatrix} -1 & y_1 & y_2 & \cdots & y_k \\ y_1 & a_{11} & a_{12} & \cdots & a_{1k} \\ y_2 & a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & & & & \vdots \\ y_k & a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix}.

It follows from the argument leading to expression (k), 3.23, that the expression (p) may be written as |a_{ij}|\,[1+\sum_{i,j}a^{ij}y_iy_j], where ||a^{ij}|| = ||a_{ij}||^{-1}; and substituting the value of y_i we are finally able to write

(q)    |c_{0ij}| = |a_{ij}|\Big(1+\frac{T^2}{n-1}\Big),

where T^2 = n(n-1)\sum_{i,j}a^{ij}(\bar{x}_i-a_{i0})(\bar{x}_j-a_{j0}) is Hotelling's* Generalized Student Ratio, which can be written down explicitly in terms of the a^{ij} and the (\bar{x}_i-a_{i0}) in an obvious way. Hence

(r)    Y = \frac{1}{1+\frac{T^2}{n-1}} = \frac{n-1}{n-1+T^2},

and the distribution of T^2 can be found at once by applying the transformation (r) to the probability element (n) (with x replaced by Y). The result is

(s)    \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n-k}{2})\,\Gamma(\frac{k}{2})\,(n-1)^{\frac{k}{2}}}\,(T^2)^{\frac{k}{2}-1}\Big(1+\frac{T^2}{n-1}\Big)^{-\frac{n}{2}}\,dT^2.
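The computation of T^2 and Y from a sample is mechanical; a minimal sketch in Python, with a hypothetical trivariate sample and hypothesized means:

    import numpy as np

    def hotelling_T2(x, a0):
        # T^2 = n(n-1) sum_ij a^{ij} (xbar_i - a0_i)(xbar_j - a0_j),
        # a_ij being the sums of products about the sample means
        n = x.shape[0]
        xbar = x.mean(axis=0)
        d = x - xbar
        a = d.T @ d
        y = xbar - a0
        return n * (n - 1) * y @ np.linalg.inv(a) @ y

    rng = np.random.default_rng(3)
    x = rng.multivariate_normal([1.0, 0.0, -1.0], np.eye(3), size=20)
    T2 = hotelling_T2(x, np.array([1.0, 0.0, -1.0]))
    Y = (20 - 1) / (20 - 1 + T2)    # the criterion (r)
    print(T2, Y)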



11.5 The Hypothesis of Equality of Means in Multivariate Normal Populations

Suppose O_{n_t}: (x_{i\alpha}^t, i = 1,2,...,k; \alpha = 1,2,...,n_t; t = 1,2,...,p) are p samples from the normal k-variate populations

(a)    \frac{\Lambda^{1/2}}{(2\pi)^{k/2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}(x_i-a_i^t)(x_j-a_j^t)}, \qquad t = 1,2,...,p,

and that it is desired to test the following hypothesis:

*H. Hotelling, "The Generalization of Student's Ratio", Annals of Math. Stat., Vol. 2 (1931), pp. 359-378.



(b)    \Omega: the space of the \Lambda_{ij} and a_i^t such that ||\Lambda_{ij}|| is positive definite and -\infty < a_i^t < \infty, i = 1,2,...,k; t = 1,2,...,p.
       \omega: the subspace of \Omega for which a_i^1 = a_i^2 = \cdots = a_i^p = a_i, where -\infty < a_i < \infty, i = 1,2,...,k.

Denoting this hypothesis by H(a_i^1 = a_i^2 = \cdots = a_i^p), it is simply the hypothesis that the samples come from k-variate normal populations having identical sets of means, given that they come from k-variate normal populations with the same variance-covariance matrix. It should be noted that this hypothesis is the multivariate analogue of that treated in 9.1. Let

(c)    a_{ij}^t = \sum_{\alpha=1}^{n_t}(x_{i\alpha}^t-\bar{x}_i^t)(x_{j\alpha}^t-\bar{x}_j^t), \qquad \bar{x}_i^t = \frac{1}{n_t}\sum_{\alpha}x_{i\alpha}^t,

and

(d)    \bar{a}_{ij} = \sum_{t=1}^{p}a_{ij}^t,

where also

(e)    a_{ij} = \sum_{t=1}^{p}\sum_{\alpha=1}^{n_t}(x_{i\alpha}^t-\bar{x}_i)(x_{j\alpha}^t-\bar{x}_j),

and

(f)    \bar{x}_i = \frac{1}{n}\sum_{t}\sum_{\alpha}x_{i\alpha}^t, \qquad n = \sum_{t=1}^{p}n_t.

The a_{ij}/n are the second-order product moments in the pool of all samples, and similarly \bar{x}_i is the mean of the i-th variate in the pool of all samples.

The likelihood function for all samples is

(g)    \frac{\Lambda^{\frac{n}{2}}}{(2\pi)^{\frac{kn}{2}}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}\sum_{t}\sum_{\alpha}(x_{i\alpha}^t-a_i^t)(x_{j\alpha}^t-a_j^t)}.

Maximizing the likelihood function for variations of all parameters over \Omega, we obtain

\hat{a}_i^t = \bar{x}_i^t, \qquad ||\hat{\Lambda}_{ij}|| = n\,||\bar{a}_{ij}||^{-1},

and the maximum of the likelihood turns out to be

(h)    \frac{n^{\frac{nk}{2}}\,e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\,|\bar{a}_{ij}|^{\frac{n}{2}}}.

Similarly, maximizing the likelihood function for variations of the parameters over \omega, we obtain

(i)    \hat{a}_i = \bar{x}_i, \qquad ||\hat{\Lambda}_{ij}|| = n\,||a_{ij}||^{-1},

and the maximum of the function turns out to be

(j)    \frac{n^{\frac{nk}{2}}\,e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\,|a_{ij}|^{\frac{n}{2}}}.

Hence the likelihood ratio for testing H(a_i^1 = a_i^2 = \cdots = a_i^p) is the ratio of (j) to (h), i. e.

(k)    \lambda = \bigg(\frac{|\bar{a}_{ij}|}{|a_{ij}|}\bigg)^{\frac{n}{2}}.

Again we may use \lambda^{2/n} = Z, say, as our test criterion. To find the distribution of Z, we proceed as in 11.4 by the method of moments. Noting that the \bar{a}_{ij} are distributed according to the Wishart distribution w_{n-p,k}(\bar{a}_{ij};\Lambda_{ij}), we have, similar to (h), §11.4,

(l)    E(|\bar{a}_{ij}|^g) = \frac{\prod_{m=1}^{k}\Gamma(\frac{n-p-m+1}{2}+g)}{(\Lambda/2^k)^{g}\,\prod_{m=1}^{k}\Gamma(\frac{n-p-m+1}{2})}.

Now it may be verified that

(m)    a_{ij} = \bar{a}_{ij}+m_{ij}, \qquad m_{ij} = \sum_{t=1}^{p}n_t(\bar{x}_i^t-\bar{x}_i)(\bar{x}_j^t-\bar{x}_j).

The a_{ij}^t and \bar{x}_i^t being independently distributed systems, it follows that the \bar{a}_{ij} and the m_{ij} are independently distributed systems. The a_{ij}^t are distributed according to Wishart distributions w_{n_t-1,k}(a_{ij}^t;\Lambda_{ij}), t = 1,2,...,p, and since the \bar{a}_{ij} are functions only of the a_{ij}^t, it follows from the reproductive property of the Wishart distribution that the \bar{a}_{ij} are distributed according to w_{n-p,k}(\bar{a}_{ij};\Lambda_{ij}). Therefore, by using the joint distribution of the \bar{a}_{ij} and the \bar{x}_i^t and following steps similar to those yielding (k) in 11.4, we find



(n)    E(|a_{ij}|^g\,|\bar{a}_{ij}|^h) = \frac{\prod_{m=1}^{k}\Gamma(\frac{n-m}{2}+g+h)}{(\Lambda/2^k)^{g+h}\,\prod_{m=1}^{k}\Gamma(\frac{n-m}{2}+h)}\cdot\frac{\prod_{m=1}^{k}\Gamma(\frac{n-p-m+1}{2}+h)}{\prod_{m=1}^{k}\Gamma(\frac{n-p-m+1}{2})}.

The h-th moment of Z is given by setting g = -h. We find

(o)    E(Z^h) = \prod_{m=1}^{k}\frac{\Gamma(\frac{n-p-m+1}{2}+h)\,\Gamma(\frac{n-m}{2})}{\Gamma(\frac{n-p-m+1}{2})\,\Gamma(\frac{n-m}{2}+h)}.

It should be noted that for the case of two samples (i. e. p = 2) the h-th moment of Z reduces to

E(Z^h) = \frac{\Gamma(\frac{n-k-1}{2}+h)\,\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-k-1}{2})\,\Gamma(\frac{n-1}{2}+h)},

and hence the distribution of Z in this case is the same as that of Y with n replaced by n-1. In the two-sample case, it should be remembered that n = n_1+n_2, the sum of the two sample numbers.

For the case of p = 3, the h-th moment of Z is

(q)    E(Z^h) = \prod_{m=1}^{k}\frac{\Gamma(\frac{n-m-2}{2}+h)\,\Gamma(\frac{n-m}{2})}{\Gamma(\frac{n-m-2}{2})\,\Gamma(\frac{n-m}{2}+h)}.

Making use of the formula

\Gamma(z)\,\Gamma(z+\tfrac{1}{2}) = \frac{\sqrt{\pi}}{2^{2z-1}}\,\Gamma(2z),

(q) reduces to

E(Z^h) = \frac{\Gamma(n-k-2+2h)\,\Gamma(n-2)}{\Gamma(n-2+2h)\,\Gamma(n-k-2)},

from which we infer the distribution of Z to be identical with that of x^2, where x is distributed according to

\frac{\Gamma(n-2)}{\Gamma(n-k-2)\,\Gamma(k)}\,x^{n-k-3}(1-x)^{k-1}\,dx.

Setting Z = x^2, we find dx = \tfrac{1}{2}Z^{-1/2}\,dZ, and hence the distribution of Z for the case of three samples is

(r)    \frac{\Gamma(n-2)}{2\,\Gamma(n-k-2)\,\Gamma(k)}\,Z^{\frac{n-k-4}{2}}\big(1-Z^{\frac{1}{2}}\big)^{k-1}\,dZ.

The distributions for p = 4 and 5 turn out to be relatively simple also.
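As a check on the p = 2 reduction of (o), the telescoping of the gamma factors may be written out:

\prod_{m=1}^{k}\frac{\Gamma(\frac{n-m-1}{2}+h)}{\Gamma(\frac{n-m}{2}+h)} = \frac{\Gamma(\frac{n-k-1}{2}+h)}{\Gamma(\frac{n-1}{2}+h)},

each numerator cancelling the denominator of the following factor, so that

E(Z^h) = \frac{\Gamma(\frac{n-k-1}{2}+h)\,\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-k-1}{2})\,\Gamma(\frac{n-1}{2}+h)},

which is (l), §11.4, with n replaced by n-1.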

11.6 The Hypothesis of Independence of Sets of Variables in a Normal Multivariate Population

Suppose O_n: (x_{i\alpha}, i = 1,2,...,k; \alpha = 1,2,...,n) is a sample from a normal multivariate population with distribution (a) in 11.3. Let the variates x_i be grouped into r groups as follows: G_1: (x_1,x_2,...,x_{k_1}); G_2: (x_{k_1+1},x_{k_1+2},...,x_{k_1+k_2}); ...; G_r: (x_{k_1+\cdots+k_{r-1}+1},...,x_k), where k = k_1+k_2+\cdots+k_r. The problem we wish to consider is that of deriving a test for the hypothesis that these groups of variates are mutually independent, i. e. that \Lambda_{ij} = 0 for all i,j not belonging to the same group of variates. Let ||\Lambda_{ij}^{(0)}|| denote the value of ||\Lambda_{ij}|| when all \Lambda_{ij} are 0 for i,j belonging to different groups of variates. The hypothesis to be tested may then be specified as follows:

(a)    \Omega: the space of the \Lambda_{ij} such that ||\Lambda_{ij}|| is positive definite, and -\infty < a_i < +\infty.
       \omega: the subspace of \Omega for which ||\Lambda_{ij}|| = ||\Lambda_{ij}^{(0)}||.

We denote this hypothesis by H(||\Lambda_{ij}|| = ||\Lambda_{ij}^{(0)}||). Maximizing the likelihood function (b) in 11.3 for variations of the parameters over \Omega, we find the maximum to be

(b)    \frac{n^{\frac{nk}{2}}\,e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\,|a_{ij}|^{\frac{n}{2}}}.

The maximum of the likelihood function for variations of the parameters under \omega is

(c)    \frac{n^{\frac{nk}{2}}\,e^{-\frac{nk}{2}}}{(2\pi)^{\frac{nk}{2}}\,|a_{ij}^{(0)}|^{\frac{n}{2}}},

where a_{ij}^{(0)} = a_{ij} if i,j belong to the same group of variates and a_{ij}^{(0)} = 0 if i,j belong to different groups of variates. Clearly |a_{ij}^{(0)}| is equal to the product of r mutually exclusive principal minors \prod_{u=1}^{r}|a_{i_uj_u}|, the u-th minor being the determinant of all a_{ij} associated with the u-th group of variates. Similarly |\Lambda_{ij}^{(0)}| = \prod_{u=1}^{r}|\Lambda_{i_uj_u}|. The likelihood ratio for testing H(||\Lambda_{ij}|| = ||\Lambda_{ij}^{(0)}||) is, therefore,

(d)    \lambda = \Bigg(\frac{|a_{ij}|}{|a_{ij}^{(0)}|}\Bigg)^{\frac{n}{2}}.

Denoting \lambda^{2/n} by W, which may clearly be used as the test criterion in place of \lambda, we determine the distribution of W by the method of moments.

It should be noted that if we factor \sqrt{a_{ii}} out of the i-th row and i-th column (i = 1,2,...,k) of each of the two determinants |a_{ij}| and |a_{ij}^{(0)}|, and use the fact that r_{ij} = a_{ij}/\sqrt{a_{ii}a_{jj}} is the sample correlation coefficient between the i-th and j-th variates, we may write

W = \frac{|r_{ij}|}{|r_{ij}^{(0)}|},

where r_{ii} = 1, and r_{ij}^{(0)} = r_{ij} if i and j both belong to the same group of variates, and r_{ij}^{(0)} = 0 if i and j belong to different groups of variates.

To find the moments of W, let us divide the a_{ij} into two classes: (A) those for which i and j correspond to different groups of variates, and (B) all others. Let the product of differentials of the a_{ij} in (A) be dV_A, with a similar meaning for dV_B. Now it is evident that if we integrate the Wishart distribution w_{n-1,k}(a_{ij};\Lambda_{ij}^{(0)}) with respect to all a_{ij} in Class (A), we will obtain the product of Wishart distributions

\prod_{u=1}^{r}w_{n-1,k_u}(a_{i_uj_u};\Lambda_{i_uj_u}),

since this integration simply yields the joint distribution of the a_{ij} in Class (B), which we know to be independently distributed in sets a_{i_uj_u} (u = 1,2,...,r) when ||\Lambda_{ij}|| = ||\Lambda_{ij}^{(0)}||, each set being distributed according to a Wishart law. Hence we must have

(e)    \int\frac{\Lambda^{(0)\frac{n-1}{2}}\,|a_{ij}|^{\frac{n-k-2}{2}}}{2^{\frac{(n-1)k}{2}}\,\pi^{\frac{k(k-1)}{4}}\prod_{m=1}^{k}\Gamma(\frac{n-m}{2})}\,e^{-\frac{1}{2}\sum\Lambda_{ij}^{(0)}a_{ij}}\,dV_A = \prod_{u=1}^{r}\frac{\Lambda_u^{\frac{n-1}{2}}\,|a_{i_uj_u}|^{\frac{n-k_u-2}{2}}}{2^{\frac{(n-1)k_u}{2}}\,\pi^{\frac{k_u(k_u-1)}{4}}\prod_{m=1}^{k_u}\Gamma(\frac{n-m}{2})}\,e^{-\frac{1}{2}\sum\Lambda_{i_uj_u}a_{i_uj_u}},

where \Lambda_u = |\Lambda_{i_uj_u}|, so that \Lambda^{(0)} = \prod_u\Lambda_u.

Let both members of (e) be multiplied by \prod_{u=1}^{r}|a_{i_uj_u}|^{-h} (which is constant as far as the a_{ij} in Class (A) are concerned), then replace n by n+2h throughout (e), then multiply throughout by

(f)    \Big(\frac{2^k}{\Lambda^{(0)}}\Big)^{h}\,\frac{\prod_{m=1}^{k}\Gamma(\frac{n-m}{2}+h)}{\prod_{m=1}^{k}\Gamma(\frac{n-m}{2})},



then integrate with respect to the a_{ij} in (B). It will be seen that the first member in (e), after these operations, will be the integral expression defining E(W^h), and the second member will be the value of E(W^h). We find

(g)    E(W^h) = \frac{\prod_{m=1}^{k}\Gamma(\frac{n-m}{2}+h)}{\prod_{m=1}^{k}\Gamma(\frac{n-m}{2})}\;\prod_{u=1}^{r}\frac{\prod_{m=1}^{k_u}\Gamma(\frac{n-m}{2})}{\prod_{m=1}^{k_u}\Gamma(\frac{n-m}{2}+h)}.

As a special case, suppose we wish to test the hypothesis that x_1 is independent of the set x_2,x_3,...,x_k. In this case r = 2, k_1 = 1, k_2 = k-1. The W criterion is

(h)    W = \frac{|r_{ij}|}{r^{11}} = 1-R^2,

where r^{11} is the minor of the element in the first row and column of |r_{ij}|, and R is the sample multiple correlation coefficient between x_1 and x_2,x_3,...,x_k. The h-th moment of W for this case is found from (g) to be

(i)    E(W^h) = \frac{\Gamma(\frac{n-k}{2}+h)\,\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-k}{2})\,\Gamma(\frac{n-1}{2}+h)}.

Following the procedure used in inferring the distribution of Y in 11.4 from its h-th moments, we find the probability element of W to be

(j)    \frac{\Gamma(\frac{n-1}{2})}{\Gamma(\frac{n-k}{2})\,\Gamma(\frac{k-1}{2})}\,W^{\frac{n-k-2}{2}}(1-W)^{\frac{k-3}{2}}\,dW.

Setting W = 1-R^2, we easily find the distribution law of R^2, the square of the sample multiple correlation coefficient between x_1 and x_2,x_3,...,x_k, to be

(k)    \frac{\Gamma(\frac{n-1}{2})}{\Gamma(\frac{k-1}{2})\,\Gamma(\frac{n-k}{2})}\,(R^2)^{\frac{k-3}{2}}(1-R^2)^{\frac{n-k-2}{2}}\,dR^2,

when the hypothesis of independence of x_1 and x_2,x_3,...,x_k is true, i. e. when the \Lambda_{1j} = 0 (j = 2,3,...,k), which is equivalent to having the multiple correlation coefficient equal to zero in the population. This result was first obtained by R. A. Fisher, who also later* derived the distribution of R^2 in samples from a normal multivariate population having an arbitrary multiple correlation coefficient.

Distributions of W for various special cases involving two and three groups of variates have been given by Wilks**.
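The null law (k) is that of a Beta variable with parameters (k-1)/2 and (n-k)/2, whose mean is (k-1)/(n-1); this may be checked by simulation, as in the following minimal sketch in Python (sample size and dimension hypothetical):

    import numpy as np

    rng = np.random.default_rng(4)
    n, k, trials = 15, 4, 4000
    vals = []
    for _ in range(trials):
        x = rng.standard_normal((n, k))
        r = np.corrcoef(x, rowvar=False)
        w = np.linalg.det(r) / np.linalg.det(r[1:, 1:])   # W = 1 - R^2, as in (h)
        vals.append(1.0 - w)
    print(np.mean(vals), (k - 1) / (n - 1))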

11.7 Linear Regression Theory in Normal Multivariate Populations

The theorems and other results presented in Chapters VIII and IX can be extended to the case in which the dependent variable y is a vector with an arbitrary number of components (say y_1,y_2,...,y_s), each component being distributed normally about a linear function of the fixed variates x_1,x_2,...,x_k. In this section we shall state without proof*** the multivariate analogues of the important theorems in Chapter VIII. The details of the proofs of these theorems are rather tedious and can be carried out as extensions of the proofs for the case of one variable.

Suppose y_1,y_2,...,y_s are distributed according to the normal multivariate distribution

(a)    \frac{\Lambda^{1/2}}{(2\pi)^{s/2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}(y_i-\bar{b}_i)(y_j-\bar{b}_j)},

where

(b)    \bar{b}_i = \sum_{p=1}^{k}a_{ip}x_p,

the x_p being fixed variates. Let O_n: (y_{i\alpha}|x_{p\alpha}; i = 1,2,...,s; p = 1,2,...,k; \alpha = 1,2,...,n) be

*R. A. Fisher, "The General Sampling Distribution of the Multiple Correlation Coefficient", Proc. Roy. Soc. London, Vol. 121 (1928), pp. 654-673.

An alternative derivation has also been given by S. S. Wilks, "On the Sampling Distribution of the Multiple Correlation Coefficient", Annals of Math. Stat., Vol. 3 (1932), pp. 196-203.

**S. S. Wilks, "On the Independence of k Sets of Normally Distributed Statistical Variables", Econometrica, Vol. 3 (1935), pp. 309-326.

***Proofs and extensions of many of the results may be found in one or more of the following papers:

M. S. Bartlett, "On the Theory of Statistical Regression", Proc. Royal Soc. Edinburgh, Vol. 53 (1933), pp. 260-283.
P. L. Hsu, "On Generalized Analysis of Variance", Biometrika, Vol. 31 (1940), pp. 221-237.
D. N. Lawley, "A Generalization of Fisher's z", Biometrika, Vol. 30 (1938), pp. 180-187.
W. G. Madow, "Contributions to the Theory of Multivariate Statistical Analysis", Trans. Amer. Math. Soc., Vol. 44 (1938), pp. 454-495.
S. S. Wilks, "Moment-Generating Operators for Determinants of Product Moments in Samples from a Normal System", Annals of Math., Vol. 35 (1934), pp. 312-340.



a sample from a population having distribution (a). The likelihood function associated with this sample is

(c)    \frac{\Lambda^{\frac{n}{2}}}{(2\pi)^{\frac{ns}{2}}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}\sum_{\alpha}(y_{i\alpha}-b_{i\alpha})(y_{j\alpha}-b_{j\alpha})},

where

(d)    b_{i\alpha} = \sum_{p=1}^{k}a_{ip}x_{p\alpha}.

Let

(e)    c_{ij} = \sum_{\alpha}y_{i\alpha}y_{j\alpha}, \qquad c'_{ip} = \sum_{\alpha}y_{i\alpha}x_{p\alpha}, \qquad c''_{pq} = \sum_{\alpha}x_{p\alpha}x_{q\alpha}.

Clearly c_{ij} = c_{ji}, c''_{pq} = c''_{qp}. For a given value of i, let \hat{a}_{ip} be the solutions of the equations

(f)    \sum_{p=1}^{k}\hat{a}_{ip}c''_{pq} = c'_{iq}, \qquad (q = 1,2,...,k),

that is,

(g)    \hat{a}_{ip} = \sum_{q=1}^{k}c'_{iq}c''^{qp}, \qquad ||c''^{pq}|| = ||c''_{pq}||^{-1},

and let

(h)    a_{ij} = c_{ij}-\sum_{p=1}^{k}\hat{a}_{ip}c'_{jp}.

Furthermore, let

(i)    s_{ij} = a_{ij} = \sum_{\alpha}\Big(y_{i\alpha}-\sum_p\hat{a}_{ip}x_{p\alpha}\Big)\Big(y_{j\alpha}-\sum_q\hat{a}_{jq}x_{q\alpha}\Big),

the minimized residual product sums.

The essential functional and probability properties of the quantities defined in (d), (e), (f), (g), (h) and (i) may be stated in the following theorems:

Theorem (A): The following identity holds:

(j)    \sum_{\alpha}\Big(y_{i\alpha}-\sum_p a_{ip}x_{p\alpha}\Big)\Big(y_{j\alpha}-\sum_q a_{jq}x_{q\alpha}\Big) = s_{ij}+\sum_{p,q}(a_{ip}-\hat{a}_{ip})(a_{jq}-\hat{a}_{jq})\,c''_{pq}.

Theorem (B): The determinant |s_{ij}| may be expressed as the ratio of two determinants,

|s_{ij}| = \frac{\begin{vmatrix} c_{ij} & c'_{iq} \\ c'_{jp} & c''_{pq} \end{vmatrix}}{|c''_{pq}|},

the numerator being the (s+k)-rowed determinant formed from the c_{ij}, c'_{ip} and c''_{pq}.

Theorem (C): If O_n: (y_{i\alpha}|x_{p\alpha}) is a sample from a population having distribution (a), then if the x_{p\alpha} are such that ||c''_{pq}|| is positive definite, the a_{ij} are distributed according to the Wishart distribution

w_{n-k,s}(a_{ij};\Lambda_{ij}),

and independently of the \hat{a}_{ip} (i = 1,2,...,s; p = 1,2,...,k), which are distributed according to the normal ks-variate distribution law

\frac{\sqrt{D}}{(2\pi)^{\frac{ks}{2}}}\,e^{-\frac{1}{2}\sum_{i,j}\sum_{p,q}\Lambda_{ij}c''_{pq}(\hat{a}_{ip}-a_{ip})(\hat{a}_{jq}-a_{jq})},

where D is the ks-order determinant |\Lambda_{ij}c''_{pq}| and has the value \Lambda^{k}\,|c''_{pq}|^{s}.
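The computations in (f)-(i) are, of course, just those of multivariate least squares; a minimal sketch in Python, with hypothetical data:

    import numpy as np

    def regression_quantities(y, x):
        # hat a_ip from the normal equations (f)-(g), and the residual
        # product sums s_ij of (h)-(i); y is n x s, x is n x k
        cpp = x.T @ x                     # c''_pq
        cyx = y.T @ x                     # c'_ip
        a_hat = cyx @ np.linalg.inv(cpp)  # (g)
        s = y.T @ y - a_hat @ cyx.T       # (h)
        return a_hat, s

    rng = np.random.default_rng(5)
    x = rng.standard_normal((30, 3))                     # n = 30, k = 3
    y = x @ rng.standard_normal((3, 2)) + 0.5 * rng.standard_normal((30, 2))
    a_hat, s = regression_quantities(y, x)
    print(a_hat)
    print(s)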

The multivariate analogue of the general linear regression hypothesis stated in §8.3 may be specified as follows:

(m)    \Omega: the space for which ||\Lambda_{ij}|| is positive definite and -\infty < a_{ip} < \infty, i = 1,2,...,s; p = 1,2,...,k.
       \omega: the subspace of \Omega for which a_{ip} = a_{ip}^0, i = 1,2,...,s; p = r+1,...,k.

Let us denote this hypothesis by H(a_{ip} = a_{ip}^0). It is the hypothesis that the last k-r regression coefficients corresponding to y_i (i = 1,2,...,s) have specified values a_{ip}^0. If the a_{ip}^0 = 0, our hypothesis is that each y_i is independent of x_{r+1}, x_{r+2}, ..., x_k.

The likelihood ratio \lambda for testing this hypothesis (as obtained by maximizing the likelihood (c) for variations of the parameters over \Omega and by maximizing for variations of the parameters over \omega and taking the ratio of the two maxima) turns out to be given by \lambda = U^{n/2}, where

(n)    U = \frac{|s_{ij}|}{|s_{ij}^0|}.

The form of s_{ij}^0 may be seen from the following considerations:

In view of Theorem (A), when the likelihood function (c) is maximized for variations of the parameters over \omega, we may consider the maximizing process in two steps: first, with respect to the a_{ip} parameters over \omega (holding the \Lambda_{ij} fixed). Here we fix a_{ip} = a_{ip}^0 (i = 1,2,...,s; p = r+1,...,k) in (j) and minimize the second term on the right side of (j) with respect to the a_{ip} (i = 1,2,...,s; p = 1,2,...,r). The coefficient of \Lambda_{ij} in the

right hand side of (j) after this minimizing step is s_{ij}^0, where

(o)    s_{ij}^0 = s_{ij}+m_{ij},

the m_{ij} resulting from the second term of the right hand side of (j). We next maximize for variations of the \Lambda_{ij}, after maximizing with respect to the a_{ip} (over \omega). It will be seen that the maximizing values \hat{\Lambda}_{ij} of the \Lambda_{ij} are obtainable after the first maximizing step (i. e. with respect to the a_{ip} over \omega), and are given by

(p)    ||\hat{\Lambda}_{ij}|| = n\,||s_{ij}^0||^{-1}.

It will be noted that the form of s_{ij}^0 is similar to that of s_{ij}, and is given by

(q)    s_{ij}^0 = c_{ij}^0-\sum_{p=1}^{r}\hat{a}_{ip}^0\,c_{jp}'^{\,0},

where

c_{ij}^0 = \sum_{\alpha}y_{i\alpha}^0y_{j\alpha}^0, \qquad c_{ip}'^{\,0} = \sum_{\alpha}y_{i\alpha}^0x_{p\alpha}, \qquad y_{i\alpha}^0 = y_{i\alpha}-\sum_{p=r+1}^{k}a_{ip}^0x_{p\alpha},

and where the \hat{a}_{ip}^0 are given by solving the equations

(r)    \sum_{p=1}^{r}\hat{a}_{ip}^0\,c''_{pq} = c_{iq}'^{\,0}, \qquad (q = 1,2,...,r).

The m_{ij} in (o) are functions of the \hat{a}_{ip} which are distributed independently of the s_{ij}. In fact, it can be shown that the m_{ij} are of the form

(s)    m_{ij} = \sum_{u=1}^{k-r}\xi_{iu}\xi_{ju},

where the \xi_{iu} (i = 1,2,...,s) are linear functions of the \hat{a}_{ip} distributed according to

\frac{\Lambda^{1/2}}{(2\pi)^{s/2}}\,e^{-\frac{1}{2}\sum_{i,j}\Lambda_{ij}\xi_{iu}\xi_{ju}},

and furthermore the sets \xi_{iu} (u = 1,2,...,k-r) are independently distributed, and are distributed independently of the s_{ij}, when H(a_{ip} = a_{ip}^0) is true. If the a_{ip}^0 = 0 (i = 1,2,...,s; p = r+1,...,k), then it follows from Theorem (B) that |s_{ij}^0| may be expressed as the ratio of two determinants as follows:

(t)

Now the problem of determining the distribution of U when H(a_ip = a°_ip) is true is, therefore, reduced to that of determining the distribution of the ratio of determinants

(u)  U = |s_ij| / |s_ij + m_ij| ,

where the s_ij are distributed according to the Wishart distribution

(v)

and the g_iu are distributed according to

(w)

the s_ij and g_iu being independently distributed systems.

The simplest procedure for finding the distribution of U is perhaps by the method of moments. The method of finding the moments of U is entirely similar to that of finding the moments of Y and Z in 11.4 and 11.5, respectively. The h-th moment is given by

(x)

from which one may infer the distribution of U in any given case by methods illustrated in 11.5. We may summarize our remarks in the following theorem, which is the multivariate analogue of Theorem (A), 8.3.

Theorem (D): Let O_n:(y_ia, x_pa) be a sample of size n from the population having distribution (c). Let H(a_ip = a°_ip) be the statistical hypothesis specified by (m), and let U = L^(2/n), where L is the likelihood ratio for testing the hypothesis. Then

(y)  U = |s_ij| / |s_ij + m_ij| ,  (i,j = 1,2,...,s),

where the s_ij are defined as above and the m_ij by (o); and if H(a_ip = a°_ip) is true, the h-th moment of U is given by (x).

It should be observed that U is a generalized form of the ratio in Theorem (A), 8.3; in fact, when s = 1, s_11 and m_11 reduce to the corresponding sums of squares of the single-variable case.

It may be verified that Theorem (D) is general enough to cover multivariate analogues of Case 1 (8.41), Case 2 (8.42) and Case 3 (8.43). The essential point to be noted in all of these cases is that k represents the number of functionally independent a_ip (for each i) involved in specifying Omega, and r (< k) represents the number of functionally independent a_ip (for each i) involved in specifying omega.
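The computation behind (y) can be made concrete with a short sketch in modern Python with numpy (an anachronism relative to this book, offered only as an illustration; the function names and the choice a°_ip = 0 are assumptions of the example):

    import numpy as np

    def residual_sscp(Y, X):
        """Residual sum-of-squares-and-cross-products matrix of Y
        after least-squares regression on the columns of X."""
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # fitted coefficients
        R = Y - X @ B                               # residuals
        return R.T @ R

    def wilks_U(Y, X, r):
        """U = |s_ij| / |s_ij + m_ij| for H(a_ip = 0, p = r+1,...,k);
        Y is n-by-s, X is n-by-k, and the first r columns of X span
        the reduced (omega) model."""
        S_full = residual_sscp(Y, X)          # the s_ij (over Omega)
        S_red = residual_sscp(Y, X[:, :r])    # the s_ij + m_ij (over omega)
        return np.linalg.det(S_full) / np.linalg.det(S_red)

The sketch exploits the fact, noted above, that when the a°_ip are zero, s°_ij = s_ij + m_ij is simply the residual matrix of the reduced regression.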

11.8 Remarks on Multivariate Analysis of Variance Theory

The application of normal linear regression theory to the various analysis of variance problems discussed in Chapter IX can be extended in every instance to the case in which the dependent variable y is a vector of several components. In all such multivariate extensions, U in 11.7 plays the role in making significance tests analogous to that of Snedecor's F (or 1/(1+cF), to be more precise) in the single variable case (Theorem (A), 8.3).

The reader will note that the problem treated in 11.5 is an example of multivariate analysis of variance, and is the multivariate analogue of the problem treated in 9.1.

To illustrate how the extension would be made in a randomized block layout with r' rows and s' columns, let us consider the case in which y has two components y_1 and y_2. Let y_1ij and y_2ij be the values of y_1 and y_2 corresponding to the i-th row and j-th column of our layout, i=1,2,...,r'; j=1,2,...,s'.

The distribution assumption for the y_1ij and y_2ij is that (y_1ij - m_1 - R_1i - C_1j) and (y_2ij - m_2 - R_2i - C_2j) are jointly distributed according to a normal bivariate law with zero means and variance-covariance matrix

    || A_11  A_12 ||
    || A_21  A_22 || .

Now suppose we wish to test the hypothesis that the "column effects" are zero for both y_1 and y_2. This hypothesis may be specified as follows:

(a)  Omega: -oo < m_1, m_2, R_1i, R_2i, C_1j, C_2j < oo, with ||A_pq|| (p,q = 1,2) positive definite;
     omega: the subspace in Omega obtained by setting each C_1j = 0 and each C_2j = 0.

This hypothesis, which may be denoted by H[(C_1j, C_2j) = 0], is clearly the 2-variable analogue of H[(C_j) = 0] in §9.2.

Let y-bar_1i., y-bar_1.j, y-bar_1, S_11R, S_11C, S_11E have meanings as functions of y_111, y_112, ..., y_1r's' similar to those of y-bar_i., y-bar_.j, y-bar, S_R, S_C, S_E as functions of y_11, y_12, ..., y_r's'. Let y-bar_2i., y-bar_2.j, y-bar_2, S_22R, S_22C, S_22E have similar meanings as functions of the y_2ij. Let

(b)  S_12R = s' Sum_i (y-bar_1i. - y-bar_1)(y-bar_2i. - y-bar_2),  S_12C = r' Sum_j (y-bar_1.j - y-bar_1)(y-bar_2.j - y-bar_2),
     S_12E = Sum_(i,j) (y_1ij - y-bar_1i. - y-bar_1.j + y-bar_1)(y_2ij - y-bar_2i. - y-bar_2.j + y-bar_2).



It may be verified that the likelihood ratio L for testing the hypothesis H[(C_1j, C_2j) = 0] is given by U_0^(n/2), where

(c)          | S_11E        S_12E       |
     U_0  =  | S_12E        S_22E       |
             -----------------------------
             | S_11E+S_11C  S_12E+S_12C |
             | S_21E+S_21C  S_22E+S_22C | .



It follows from Theorem (D), 11.7, that the h-th moment of U_0 when H[(C_1j, C_2j) = 0] is true is the special case of (x), 11.7, obtained by setting s = 2, k - r = s' - 1, r = r', n = r's', i. e.

(d)

Using formula (r) in §11.5, this reduces to

(e)






from which we can easily obtain the distribution of U_0 by the method used in deriving the distribution of Z in (u), §11.5.

The extension of the hypothesis specified in (a) and the corresponding U_0 to the case in which y has several components, say y_1, y_2, ..., y_t, is immediate. Similar results hold for testing the hypothesis that "row effects" are zero.
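For the bivariate randomized-block layout just described, a minimal numpy sketch (again a modern illustration, not part of the original text; the 3-dimensional array layout is an assumption of the example) forms the matrices entering (c) and evaluates U_0:

    import numpy as np

    def manova_column_test(Y):
        """Y has shape (r', s', 2): r' rows, s' columns, 2 response
        components. Returns U0 = |S_E| / |S_E + S_C| for the hypothesis
        that all column effects vanish for both components."""
        rp, sp, _ = Y.shape
        grand = Y.mean(axis=(0, 1))               # overall means, one per component
        row_m = Y.mean(axis=1)                    # (r', 2) row means
        col_m = Y.mean(axis=0)                    # (s', 2) column means
        # "error" residuals: y_ij - row_i - col_j + grand, per component
        E = Y - row_m[:, None, :] - col_m[None, :, :] + grand
        S_E = np.einsum('ijp,ijq->pq', E, E)      # the S_ppE and S_pqE matrix
        C = col_m - grand                         # column deviations
        S_C = rp * np.einsum('jp,jq->pq', C, C)   # the S_ppC and S_pqC matrix
        return np.linalg.det(S_E) / np.linalg.det(S_E + S_C)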

We cannot go into further details. The illustration given above will perhaps indicate how Theorem (D) can be used as a basis of significance tests for multivariate analysis of variance problems arising in three-way layouts, Latin squares, Graeco-Latin squares, etc.

11.9 Principal Components of a Total Variance

Suppose x_1, x_2, ..., x_k are distributed according to the normal multivariate law (a) in 11.3.* The probability density is constant on each member of the family or nest of k-dimensional ellipsoids

(a)  Sum_(i,j=1..k) A_ij (x_i - a_i)(x_j - a_j) = C,

where 0 < C < oo. The ellipsoids in this family all have the same center (a_1, a_2, ..., a_k) and are similarly situated with respect to their principal axes; that is, their longest axes lie on the same line, their second longest axes lie on the same line, etc. (assuming each has a longest, second longest, ..., axis).

Our problem here is to determine the directions of the various principal axes, and the relative lengths of the principal axes for any given ellipsoid in the family (the ratios of lengths are, in fact, the same for each member of the family). We must first define analytically what is meant by principal axes. For convenience, we make the following translation of coordinates

(b)  y_i = x_i - a_i,  (i=1,2,...,k).

The equation (a) now becomes

(c)  Sum_(i,j=1..k) A_ij y_i y_j = C.



*The theory of principal axes and principal components as discussed in this section (including no sampling theory) can be carried through formally without assuming that the random variables x_1, x_2, ..., x_k are distributed according to a normal multivariate law. However, this law is of sufficient interest to justify our use of it throughout the section. Some sampling theory of principal components under the assumption of normality will be presented in 11.11.






If P:(y_1, y_2, ..., y_k) represents any point on this ellipsoid, then the squared distance D² between P and the center is Sum_i y_i². Now if we allow P to move continuously over the ellipsoid, there will, in general, be 2k points at which the rate of change of D² with respect to the coordinates of P will be zero, i. e. there are 2k extrema for D² under these conditions. These points occur in pairs, the points in each pair being symmetrically located with respect to the center. The k line segments connecting the points in each pair are called principal axes. In the case of two variables, i. e. k = 2, our ellipsoids are simply ellipses, and the principal axes are the major and minor axes. We shall determine the points in the k-variate case and show that the principal axes are mutually perpendicular.

It follows from 4.7 that the problem of finding the extrema of D² for variations of P over (c) is equivalent to finding unrestricted extrema of the function

(d)  phi = Sum_(i=1..k) y_i² - lambda ( Sum_(i,j=1..k) A_ij y_i y_j - C )

for variations of the y_i and lambda. Following the Lagrange method, we must have

(e)  d(phi)/d(y_i) = 0,  (i=1,2,...,k),

and also equation (c) satisfied. Performing the differentiations in (e), we obtain the following equations

(f)  y_i - lambda Sum_(j=1..k) A_ij y_j = 0,  (i=1,2,...,k).



Suppose we multiply the i-th equation by A^ih, i=1,2,...,k, and sum with respect to i. We have

(g)  Sum_i A^ih y_i - lambda Sum_j ( Sum_i A^ih A_ij ) y_j = 0.

Since Sum_i A_ij A^ih = 1 if j = h, and 0 if j ≠ h, (g) reduces so that it may be written as

(h)  Sum_i A^ij y_i = lambda y_j .

Allowing j to take values 1,2,...,k, it is now clear that equations (h) are equivalent to (f) for finding the extrema. In order that (h) have solutions other than zero, it is necessary for



(i)  | A^11 - lambda   A^12            ...   A^1k           |
     | A^21            A^22 - lambda   ...   A^2k           |  =  0,
     | ..................................................... |
     | A^k1            A^k2            ...   A^kk - lambda  |

that is, |A^ij - lambda d_ij| = 0.




This equation is a polynomial of degree k, usually called the characteristic equation of the matrix ||A^ij||. It can be shown that the roots of (i) are all real*. If the roots are all distinct, let them be lambda_1 > lambda_2 > ... > lambda_k. The direction of the principal axis corresponding to lambda_g is given by substituting lambda_g in (h) and solving** for the y_j, j=1,2,...,k. Let the values of the y_i (i=1,2,...,k) corresponding to lambda_g be y_gi (i=1,2,...,k), and let the direction cosines of the g-th principal axis be c_gi (defined by c_gi = y_gi / sqrt(Sum_i y_gi²)). Hence, we have from (f)

(j)  y_gi - lambda_g Sum_j A_ij y_gj = 0,  (i=1,2,...,k).

It is now clear that if y_gi are solutions of (j), then -y_gi are solutions also. Multiplying the i-th equation by y_gi, and summing with respect to i, we find

(k)  Sum_i y_gi² - lambda_g Sum_(i,j) A_ij y_gi y_gj = 0,

or

(l)  Sum_i y_gi² = lambda_g C.

Therefore, the squared length of half the g-th principal axis is lambda_g C. If we consider the h-th principal axis, (g ≠ h), we have

(m)  y_hi - lambda_h Sum_j A_ij y_hj = 0,  (i=1,2,...,k).

If the i-th equation in (j) be multiplied by y_hi/lambda_g and summed with respect to i, and if the i-th equation in (m) be multiplied by y_gi/lambda_h and summed with respect to i, we obtain, upon combining the two resulting equations,

(n)  (1/lambda_g - 1/lambda_h) Sum_i y_gi y_hi = 0.

Since lambda_g ≠ lambda_h, this equation implies that Sum_i y_gi y_hi = 0, which means that the g-th and h-th (g ≠ h) principal axes are perpendicular, i. e. all principal axes are mutually perpendicular.

Suppose we change to a new set of variables defined as follows 



*See M. Bôcher, Introduction to Higher Algebra, MacMillan, New York (1907), p. 170.
**For an iterative method of solving the equations, together with a more detailed treatment of principal components than we can consider here, see H. Hotelling, "Analysis of a Complex of Statistical Variables into Principal Components", Jour. of Educ. Psych., Vol. 24 (1933), pp. 417 - 441, 498 - 520.






(o)  z_g = Sum_(i=1..k) c_gi y_i ,  (g=1,2,...,k).

Multiplying the g-th equation by c_gj, using the fact that Sum_g c_gi c_gj = d_ij (for a set of mutually orthogonal unit vectors), and summing with respect to g, we find

(p)  y_j = Sum_(g=1..k) c_gj z_g .

Substituting in the equation of the ellipsoid (c), we have

(q)  Sum_(g,h=1..k) ( Sum_(i,j=1..k) A_ij c_gi c_hj ) z_g z_h = C.

Now it follows from the argument leading to (n) that Sum_(i,j) A_ij c_gi c_hj = 1/lambda_g (if g = h) and = 0 (if g ≠ h). Hence the equation of the ellipsoid in the new coordinates is

(r)  Sum_(g=1..k) z_g² / lambda_g = C.

The Jacobian of the transformation (p) is |c_gj|, which has the value 1, as one can see by squaring the determinant. Hence, if the (x_i - a_i) are distributed according to the normal multivariate law (a), §11.3, and since (p) transforms the quadratic form in (a) into that in (r), the z_g are independently distributed with variances lambda_g. But from (o) we also have, by taking variances of both sides,

     lambda_g = Sum_(i,j=1..k) c_gi c_gj A^ij .

Summing with respect to g, and using the fact that Sum_g c_gi c_gj = d_ij, we have

     Sum_(g=1..k) lambda_g = Sum_(i=1..k) A^ii .



In other words, the sum of the variances of the y_i (i=1,2,...,k) is equal to the sum of the variances of the z_g (g=1,2,...,k). lambda_1, lambda_2, ..., lambda_k are called principal components of the total variance. It will be observed that z_g is constant on (k-1)-dimensional planes perpendicular to the g-th principal axis, g=1,2,...,k.
We may summarize in the following

Theorem (A): Let y_1, y_2, ..., y_k be random variables distributed according to the normal multivariate distribution

(a)  ( |A_ij|^(1/2) / (2 pi)^(k/2) ) e^( -(1/2) Sum_(i,j) A_ij y_i y_j ) dy_1 ... dy_k .

Let the roots of the characteristic equation |A^ij - lambda d_ij| = 0 be lambda_1 ≥ lambda_2 ≥ ... ≥ lambda_k. Let c_gi (i=1,2,...,k) be the direction cosines of the g-th principal axis of Sum_(i,j) A_ij y_i y_j = C, and let

(u)  Sum_(i=1..k) c_gi y_i = z_g ,  (g=1,2,...,k).

Then

(1) The direction cosines are given by c_gi = y_gi / sqrt(Sum_i y_gi²), where the y_gi satisfy the equations (j).

(2) The length of half the g-th principal axis is sqrt(lambda_g C).

(3) The principal axes are mutually perpendicular.

(4) The transformation (u) transforms the probability element (a) into

(v)  ( 1 / ( (2 pi)^(k/2) sqrt(lambda_1 lambda_2 ... lambda_k) ) ) e^( -(1/2) Sum_g z_g²/lambda_g ) dz_1 ... dz_k ,

the z_g being independently distributed.

(5) Sum_g lambda_g = Sum_i A^ii, i. e. the sum of the variances of the y_i is equal to the sum of the variances of the z_g.
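Stated computationally, Theorem (A) is an eigenvalue decomposition of the covariance matrix ||A^ij||. The following minimal sketch (modern Python with numpy, offered only as an illustration; the covariance values are assumptions of the example) checks conclusions (3) and (5):

    import numpy as np

    # ||A^ij|| of the theorem: an assumed example covariance matrix
    cov = np.array([[4.0, 2.0, 0.5],
                    [2.0, 3.0, 1.0],
                    [0.5, 1.0, 2.0]])

    lam, c = np.linalg.eigh(cov)        # roots of |A^ij - lambda d_ij| = 0
    order = np.argsort(lam)[::-1]       # lambda_1 >= lambda_2 >= ... as in the text
    lam, c = lam[order], c[:, order]    # columns of c are the direction cosines

    # (3) the axes are mutually perpendicular; (5) total variance is preserved
    assert np.allclose(c.T @ c, np.eye(3))
    assert np.isclose(lam.sum(), np.trace(cov))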

If two of the roots of (i) are equal, we would have an indeterminate situation with reference to two of the principal axes. In this case, there will be a two-dimensional space, i. e. plane, perpendicular to each of the remaining principal axes, such that the intersection of this plane with (c) is a circle. Similar remarks can be made about higher multiplicities of roots.

As a simple example in multiplicity of roots, the reader will find it instructive to consider the case in which the variance of y_i (i=1,2,...,k) is sigma² and the covariance between y_i and y_j is sigma² rho. Equation (i) becomes

     [sigma²(1 - rho) - lambda]^(k-1) [sigma²(1 + (k-1) rho) - lambda] = 0.
There are roots of two magnitudes, one being sigma²(1 - rho) with multiplicity k-1; the other being sigma²(1 + (k-1) rho) with multiplicity 1. It is convenient in this case to think of one long principal axis (if rho > 0) and k-1 short ones all equal (although indeterminate in direction). If rho > 0, then it is clear that the long axis increases as k increases, while the short axes remain the same. Thus the variance of the z_g (which is a linear function of the y_i by transformation (u)) corresponding to the longest axis increases with k. This property of increasing variance of the linear function of several positively intercorrelated variables associated with the longest axis is fundamental in the scientific construction of examinations, certain kinds of indices, etc. By continuity considerations one can verify that the property holds, roughly speaking, even when the variances (as well as the covariances) of the variables depart slightly from each other.
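A quick numerical check of these two magnitudes, in the same modern Python style (k, sigma² and rho are assumed illustrative values):

    import numpy as np

    k, sigma2, rho = 5, 2.0, 0.6
    cov = sigma2 * ((1 - rho) * np.eye(k) + rho * np.ones((k, k)))
    roots = np.sort(np.linalg.eigvalsh(cov))[::-1]
    # one root sigma^2 (1+(k-1) rho); sigma^2 (1-rho) with multiplicity k-1
    assert np.isclose(roots[0], sigma2 * (1 + (k - 1) * rho))
    assert np.allclose(roots[1:], sigma2 * (1 - rho))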

11.10 Canonical Correlation Theory*

Let x_1, x_2, ..., x_k be random variables divided into two sets S_1:(x_1, x_2, ..., x_k1) and S_2:(x_(k1+1), ..., x_(k1+k2)) (k_1 + k_2 = k). We shall assume that k_1 ≤ k_2. Let L_1 and L_2 be arbitrary linear functions of the two groups of variates, respectively, i. e.

(a)  L_1 = Sum_(i=1..k1) l_1i x_i ,  L_2 = Sum_(p=k1+1..k1+k2) l_2p x_p .

The correlation coefficient between L_1 and L_2 (see §2.75) is given by

(b)  R_12 = Sum A_ip l_1i l_2p / [ (Sum A_ij l_1i l_1j)(Sum A_pq l_2p l_2q) ]^(1/2),

where i and j in the summations range over the values 1,2,...,k_1, while p and q range over the values k_1+1, k_1+2, ..., k_1+k_2. ||A_ip|| is the covariance matrix between the variables in S_1 and those in S_2; ||A_ij|| is the variance-covariance matrix for variables in S_1; a similar meaning holding for ||A_pq||.

Now suppose we consider the problem of varying the l_1i and l_2p so as to maximize the correlation coefficient R_12 (actually to find extrema of R_12, among which there will be a maximum). Corresponding to any given solution of this problem, say l°_1i, l°_2p, (i=1,2,...,k_1; p=k_1+1,...,k_1+k_2), there are infinitely many solutions of the form a l°_1i, b l°_2p, where a and b are any two constants of the same sign. To overcome this difficulty, it is sufficient to seek a solution for fixed values of the variances of L_1 and L_2, which, for convenience, we may take as 1.

*This problem was first considered by H. Hotelling, "Relations Between Two Sets of Variates", Biometrika, Vol. 28 (1936), pp. 321 - 377.



This is equivalent to the determination of the extrema of R_12 for variations of the l_1i and l_2p, subject to the conditions

(c)  Sum A_ij l_1i l_1j = 1 ,  Sum A_pq l_2p l_2q = 1 .

By Lagrange's method this amounts to finding the extrema of the function

(d)  psi = Sum A_ip l_1i l_2p - (lambda/2)(Sum A_ij l_1i l_1j - 1) - (mu/2)(Sum A_pq l_2p l_2q - 1),

where lambda and mu are divided by 2 for convenience. The l_1i and l_2p must satisfy the equations

(e)  d(psi)/d(l_1i) = 0 ,  (i=1,2,...,k_1),

(f)  d(psi)/d(l_2p) = 0 ,  (p=k_1+1,...,k_1+k_2),

which are

(g)  Sum_p A_ip l_2p - lambda Sum_j A_ij l_1j = 0 ,

(h)  Sum_i A_ip l_1i - mu Sum_q A_pq l_2q = 0 .

Multiplying (g) by l_1i and summing with respect to i, then multiplying (h) by l_2p, summing with respect to p, and using (c), we obtain

(i)  lambda = mu = R_12 .

Therefore, putting mu = lambda in (h), we obtain a system of k linear homogeneous equations in the l_1i and l_2p. In order to have a solution not identically zero, the k-th order determinant of the equations (g) and (h) must vanish. That is



(j)  | -lambda A_ij   A_ip        |
     | A_pj           -lambda A_pq |  =  0 .

If we factor sqrt(A_ii) out of the i-th row and the i-th column (i=1,2,...,k_1), and sqrt(A_pp) out of the p-th row and the p-th column (p=k_1+1,...,k_1+k_2), we find that (j) is equivalent to

(k)  | -lambda rho_ij   rho_ip         |
     | rho_pj           -lambda rho_pq |  =  0 ,



where the rho's are correlation coefficients, and rho_ii = rho_pp = 1. It can be shown* that the roots of (k) are all real, since the determinant in (k) is the discriminant of Q_1 - lambda Q_2, where Q_2 is the sum of the two quadratic forms in (c), and hence is positive definite. If the determinant in (k) is expanded by Laplace's method by the first k_1 columns (or rows), it is clear from the resulting expansion that (k) is a polynomial of degree k_1 + k_2 in which the lowest power of lambda is k_2 - k_1. Hence by factoring out lambda^(k_2-k_1) we are left with a polynomial f(lambda) in lambda of degree 2k_1. Now any term in the Laplace expansion of (k) (by the first k_1 columns) is the product of a determinant of order k_1 and one of order k_2. If the first determinant has r rows chosen from the upper left hand block of (k), then the second determinant will contain k_2 - (k_1 - r) rows from the lower left hand block of (k). The product of these two determinants will therefore have lambda^(k_2-k_1+2r) as a factor. Therefore, by factoring lambda^(k_2-k_1) from each term in the Laplace expansion of (k), it is clear that the resulting polynomial, that is f(lambda), will contain only even powers of lambda. Therefore, the 2k_1 roots of f(lambda) = 0 are real and of the form +rho_1, -rho_1, +rho_2, -rho_2, ..., +rho_k1, -rho_k1, where each rho is ≥ 0. Let the 2k_1 roots, so paired, be denoted by rho_u (u=1,2,...,2k_1). Let l_u1i, l_u2p be the solutions of the equations (g) and (h) corresponding to the root rho_u (u=1,2,...,2k_1), and let L_u1, L_u2 be the values of L_1 and L_2 in (a) corresponding to the solutions l_u1i, l_u2p. Remembering that mu = lambda, and inserting the u-th root rho_u in (g) and (h), we must have

(l)  Sum_p A_ip l_u2p - rho_u Sum_j A_ij l_u1j = 0 ,

(m)  Sum_i A_ip l_u1i - rho_u Sum_q A_pq l_u2q = 0 .

Multiplying (l) by l_u1i and summing with respect to i, and making use of the fact that Sum A_ij l_u1i l_u1j = 1, we find

(n)  Sum A_ip l_u1i l_u2p - rho_u = 0 .

The first term in (n) is simply the correlation coefficient between L_u1 and L_u2, and its value is rho_u. If u is even, the correlation between L_u1 and L_u2 is equal to that between L_(u+1)1 and -L_(u+1)2 (or -L_(u+1)1 and L_(u+1)2). It can be easily verified that the correlation between L_(2i)1 and L_(2j)2 (i ≠ j) is zero. Hotelling has called L_u1 and L_u2 the u-th canonical variates, and rho_u the canonical correlation coefficient between the canonical variates L_u1 and L_u2. Hence, the canonical correlations, and therefore the roots of the equation (k), lie on the interval (-1, +1).

*M. Bôcher, loc. cit., p. 170.



If there exists a single largest root, it is the one such that when it is substituted in (g) and (h) we obtain solutions (i. e. values of the l_1i and l_2p) which, used in (a), will give the linear functions having maximum correlation. For further details on canonical correlation theory, the reader is referred to Hotelling's paper.

We may summarize our results in the following

Theorem (A): Let S_1:(x_1, x_2, ..., x_k1) and S_2:(x_(k1+1), ..., x_k) be two sets of random variables, where k = k_1 + k_2 (k_1 ≤ k_2). Let L_1 and L_2, as defined in (a), be linear functions of the variables in S_1 and S_2, respectively, such that the variances of L_1 and L_2 are unity. Let R_12 be the correlation coefficient between L_1 and L_2. Then

(1) There are at most 2k_1 distinct extrema of R_12 for variations of the l_1i and l_2p in L_1 and L_2;

(2) These extrema correspond to the 2k_1 roots of equation (k), which lie on the interval (-1, +1) and are symmetrically spaced with respect to the origin;

(3) The value of R_12 corresponding to the u-th root rho_u of (k) is equal in value to rho_u itself (the u-th canonical correlation coefficient);

(4) The canonical correlation coefficient between the two canonical variates corresponding to any two numerically different values of rho_u is zero.
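Eliminating the l_2p between (g) and (h) shows that the squared canonical correlations are the roots of |Sum_(p,q) A_ip A^pq A_qj - rho² A_ij| = 0, where ||A^pq|| is the inverse of ||A_pq||; that is, a generalized eigenproblem. A minimal sketch (modern Python with scipy, an illustration only; the partitioned covariance matrix A and the split k1 are assumed inputs):

    import numpy as np
    from scipy.linalg import eigh

    def canonical_correlations(A, k1):
        """Positive canonical correlations rho_1 >= ... >= rho_{k1} of a
        covariance matrix A partitioned after the first k1 variables.
        Solves A12 A22^{-1} A21 l1 = rho^2 A11 l1, the reduction of (k)."""
        A11, A12 = A[:k1, :k1], A[:k1, k1:]
        A21, A22 = A[k1:, :k1], A[k1:, k1:]
        rho2 = eigh(A12 @ np.linalg.inv(A22) @ A21, A11, eigvals_only=True)
        return np.sqrt(np.clip(np.sort(rho2)[::-1], 0.0, 1.0))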
The reader should note that no assumptions have been made about the distribution function of the two sets of random variables, S_1 and S_2. We are able to maintain this degree of generality as long as we are considering canonical correlation theory of populations. However, the statistical value of this theory may be questionable if the distributions of the x's in S_1 and S_2 depart radically from the normal multivariate law. Again, in studying sampling theory of canonical correlations, progress has been made only for the case of sampling from normal multivariate populations. Some of the sampling results are given in 11.11.

11.11 The Sampling Theory of the Roots of Certain Determinantal Equations

In the treatment of the theory of principal components (11.9) and of the canonical correlation theory (11.10), it was found that the roots of certain determinantal equations, in which the matrices are variance-covariance matrices, played fundamental roles. In testing hypotheses concerning principal components, canonical correlations and allied topics, we are interested in the roots of the analogous equations in which the matrices are sample variance-covariance matrices. In the following sections, we shall derive the






distributions of the roots of several sample determinantal equations when the samples are drawn from certain special multivariate normal populations. The distribution theory of the roots for more general assumptions has not yet been developed.

11.111 Characteristic Roots of One Sample Variance-covariance Matrix.*

Let us consider a sample O_n:(x_ia; i=1,2,...,k; a=1,2,...,n) from a normal multivariate population whose variance-covariance matrix has the single root lambda, of multiplicity k. The variance-covariance matrix of the population is then of the form

    || lambda   0      ...  0      ||
    || 0        lambda ...  0      ||
    || .        .      ...  .      ||
    || 0        0      ...  lambda ||

and its inverse is

    || 1/lambda  0        ...  0        ||
    || 0         1/lambda ...  0        ||
    || .         .        ...  .        ||
    || 0         0        ...  1/lambda || .

The p. d. f. of this population is

(a)  ( 1 / (2 pi lambda)^(k/2) ) e^( -(1/(2 lambda)) Sum_(i=1..k) (x_i - a_i)² ) .

Let a_ij = Sum_(a=1..n) (x_ia - x-bar_i)(x_ja - x-bar_j); the a_ij are distributed according to w_(n-1,k)(a_ij; lambda d_ij), where d_ij = 1 if i = j, and 0 if i ≠ j. We are interested in finding the distribution of the roots l of

(b)  |a_ij - l d_ij| = 0 ,



*These distributions and their derivations were first published in the papers by R. A. Fisher, "The Sampling Distribution of Some Statistics Obtained from Non-linear Equations", Annals of Eugenics, Vol. 9 (1939), pp. 238 - 249, and by P. L. Hsu, "On the Distribution of Roots of Certain Determinantal Equations", Annals of Eugenics, Vol. 9 (1939), pp. 250 - 258. The derivations used in this section were developed by A. M. Mood (unpublished).






which is analogous to (i), §11.9. For a geometrical interpretation of these roots, the reader is referred to 11.9.

In 11.9, it was shown that for a matrix ||A_ij|| there is a set of numbers c_gi (g,i=1,2,...,k) (direction cosines of the principal axes of the family of ellipsoids Sum A_ij y_i y_j = C) such that the transformation

     z_g = Sum_i c_gi y_i

will yield

(c)  Sum_(i,j=1..k) A_ij y_i y_j = Sum_(g=1..k) z_g² / lambda_g ,

where the lambda_g are roots of |A^ij - lambda d_ij| = 0. Expressing the z_g in terms of the y_i in the middle member of (c), we get

     Sum_g z_g² / lambda_g = Sum_(i,j) ( Sum_g c_gi c_gj / lambda_g ) y_i y_j .

Hence,

     A_ij = Sum_g c_gi c_gj / lambda_g .

In a similar manner we can find numbers gamma_hi (i,h=1,2,...,k) to express the a_ij as

(d)  a_ij = Sum_h l_h gamma_hi gamma_hj ,

where the l_h are the roots of (b) and the gamma_ih are elements of an orthogonal matrix ||gamma_ih||; that is, Sum_i gamma_hi gamma_h'i = d_hh' and Sum_h gamma_hi gamma_hj = d_ij. The l_h and the gamma_ih depend only on the a_ij. We can get the simultaneous distribution of the l_h and gamma_ih by substituting (d) in w_(n-1,k)(a_ij; lambda d_ij) and multiplying by the Jacobian of the transformation (d). Ordering the l_h so that l_1 ≥ l_2 ≥ ... ≥ l_k ≥ 0, the Jacobian is

(e)  (l_1 - l_2)(l_1 - l_3) ... (l_1 - l_k)(l_2 - l_3) ... (l_(k-1) - l_k) psi(gamma_ih),



where psi(gamma_ih) is a function of the gamma_ih only, not involving the l_h. This can be verified in the following way. It is clear from (d) that the Jacobian will be a polynomial in the l_h; in fact, it will be a polynomial of degree k(k-1)/2, for there are k(k-1)/2 independent elements in ||gamma_ih||. If l_i = l_j (i ≠ j), the transformation (d) will not be uniquely determined, and hence the Jacobian will be zero, since when a transformation is not (locally) unique, the Jacobian is zero. This fact implies that we can factor out terms of the form (l_i - l_j). There are k(k-1)/2 such terms, and when they have been factored out, what remains is independent of the l_h, since the Jacobian is a polynomial of degree k(k-1)/2.
Noting that Sum_(i=1..k) a_ii = Sum_(i=1..k) l_i and that |a_ij| = Prod_(i=1..k) l_i, we can write the distribution of the l_h and the gamma_ih as

(f)  [ Prod_(i=1..k) l_i^((n-k-2)/2) e^( -(1/(2 lambda)) Sum_i l_i ) Prod_(i<j) (l_i - l_j) psi(gamma_ih) ] / [ (2 lambda)^(k(n-1)/2) pi^(k(k-1)/4) Prod_(i=1..k) Gamma((n-i)/2) ] ,

times the differentials of the l_h and of the independent gamma_ih.



To derive the distribution of the l_h alone, we integrate (f) with respect to the gamma_ih over the space of the gamma_ih for which ||gamma_ih|| is orthogonal, obtaining

(g)  K Prod_(i=1..k) l_i^((n-k-2)/2) e^( -(1/(2 lambda)) Sum_i l_i ) Prod_(i<j) (l_i - l_j) Prod_i dl_i .

The constant K is determined by the condition that the integral of (g) over the space R of the l_h is unity. To find K, let us first define

(h)  phi(r) = Integral_R ( Prod_(i=1..k) l_i )^r e^( -(1/(2 lambda)) Sum_i l_i ) Prod_(i<j) (l_i - l_j) Prod_i dl_i .



We note that

(i)  K phi((n-k-2)/2) = 1 .

Since |a_ij| = Prod_i l_i, we have

(j)  E(|a_ij|^h) = phi((n-k-2)/2 + h) / phi((n-k-2)/2) ;



but from the Wishart distribution (see (h) in 11.1), we find that

(k)

Equating (j) to (k) and setting n = k+2, we get

(l)

It remains to evaluate phi(0). It can be verified that

     phi(0) = Integral_0^oo Integral_0^oo ... Integral_0^oo Prod_(i<j) (l_i - l_j) e^( -(1/(2 lambda)) Sum_(i=1..k) l_i ) Prod_i dl_i ;

making use of (r) in §11.5, the right hand side may be evaluated explicitly, the result containing the factor (2 lambda)^(k(k+1)/2). Using this result and equations (i) and (l), we find that

     K =



Substituting in (g), we finally obtain as the distribution element of the characteristic roots of (b)

(m)  [ pi^(k/2) / ( (2 lambda)^(k(n-1)/2) Prod_(i=1..k) Gamma(i/2) Gamma((n-i)/2) ) ] Prod_(i=1..k) l_i^((n-k-2)/2) e^( -(1/(2 lambda)) Sum_(i=1..k) l_i ) Prod_(i<j) (l_i - l_j) Prod_(i=1..k) dl_i .



It can be shown fairly readily, by making appropriate orthogonal transformations, that if the sample O_n:(x_ia; i=1,2,...,k; a=1,2,...,n) is from the normal multivariate population (a) in 11.3 for the case in which the characteristic roots of the matrix equation |A^ij - lambda d_ij| = 0 are all equal to lambda, say, then the characteristic roots of (b) are also distributed according to (m).

We may summarize in the following

Theorem (A): Let O_n:(x_ia; i=1,2,...,k; a=1,2,...,n) be a sample from a normal multivariate population for which the characteristic roots of the variance-covariance matrix are all equal to lambda. Let a_ij (i,j=1,2,...,k) be the second order sample product sums as defined below (a). Let l_1, l_2, ..., l_k be the roots (in descending order of magnitude) of |a_ij - l d_ij| = 0. The joint probability element of the l_i (i=1,2,...,k) is given by (m).
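The setting of Theorem (A) is easily simulated. The following modern Python sketch (numpy; n, k, lambda and the replication count are assumed illustrative values, and the population means are taken as zero, which costs nothing since the a_ij are mean-corrected) draws samples, extracts the ordered roots, and checks that the mean of Sum l_i agrees with E(Sum l_i) = k(n-1) lambda:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, lam = 20, 3, 2.0

    def ordered_roots():
        x = rng.normal(0.0, np.sqrt(lam), size=(n, k))   # one sample O_n
        d = x - x.mean(axis=0)
        a = d.T @ d                                      # product sums a_ij
        return np.sort(np.linalg.eigvalsh(a))[::-1]      # l_1 >= ... >= l_k

    roots = np.array([ordered_roots() for _ in range(20000)])
    print(roots.sum(axis=1).mean(), k * (n - 1) * lam)   # should nearly agree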

11.112 Characteristic Roots of the Difference of Two Sample Variance-covariance Matrices.

Let us consider two samples O_n1:(x¹_ia; i=1,2,...,k; a=1,2,...,n_1) and O_n2:(x²_ia; i=1,2,...,k; a=1,2,...,n_2) (n_1 > k, n_2 > k) drawn from the same normal multivariate population

(a)  ( sqrt(|A|) / (2 pi)^(k/2) ) e^( -(1/2) Sum A_ij (x_i - a_i)(x_j - a_j) ) .

Let a^t_ij = Sum_a (x^t_ia - x-bar^t_i)(x^t_ja - x-bar^t_j), (t=1,2). In this section, we shall derive the distribution of the roots of

(b)  |a¹_ij - e(a¹_ij + a²_ij)| = 0 .

In 11.9, we have seen that there is a linear transformation

(c)  x_i - a_i = Sum_g c_gi z_g ,  (i=1,2,...,k),

such that

     Sum A_ij (x_i - a_i)(x_j - a_j) = Sum_g z_g² / lambda_g .

Now let

(d)  w_g = z_g / sqrt(lambda_g) ,

i. e.

(e)  x_i - a_i = Sum_g c_gi sqrt(lambda_g) w_g .

Then Sum A_ij (x_i - a_i)(x_j - a_j) = Sum_g w_g² .



The transformation (e), when performed on the sample values, gives us

     x^t_ia - x-bar^t_i = Sum_g c_gi sqrt(lambda_g) (w^t_ga - w-bar^t_g) ,

where the w^t_ga are the transformed sample values. Now equation (b) becomes

     |b¹_gh - e(b¹_gh + b²_gh)| = 0 ,

where b^t_gh = Sum_a (w^t_ga - w-bar^t_g)(w^t_ha - w-bar^t_h). Clearly the roots of this equation are the same as those of equation (b). Note that the b^t_gh are functions of the w^t_ga, such that for each value of g, t, and a, w^t_ga is distributed according to a normal law with zero mean and unit variance, the w^t_ga being independently distributed. This shows that we lose no generality by assuming that A_ij = 1 if i = j, and 0 if i ≠ j.
Under this assumption, the a^t_ij have the distribution

(g)  w_(n1-1,k)(a¹_ij; d_ij) · w_(n2-1,k)(a²_ij; d_ij) .



From a theorem in algebra* we know that there is a transformation of the a^t_ij such that

(h)  a¹_ij = Sum_h e_h u_hi u_hj ,  a¹_ij + a²_ij = Sum_h u_hi u_hj ,

where the e_h are the roots of (b) (arranged, say, in descending order of magnitude).

*See M. Bôcher, loc. cit., p. 171.






The u_hi and the e_h are functions of the a¹_ij and a²_ij; hence, their distribution may be found by substituting (h) in

(i)  w_(n1-1,k)(a¹_ij; d_ij) · w_(n2-1,k)(a²_ij; d_ij)

and multiplying by the Jacobian of the transformation (h). By following a procedure similar to that of 11.111, we can show that the Jacobian of (h) is

     Prod_(i<j) (e_i - e_j) psi(u_ij) ,

where psi(u_ij) is a function of the u_ij, independent of the e_h. Hence, the simultaneous distribution of the e_h and u_ij is



(j)  C' Prod_(i=1..k) e_i^((n_1-k-2)/2) (1 - e_i)^((n_2-k-2)/2) Prod_(i<j) (e_i - e_j) chi(u_ij) ,

where chi(u_ij) collects all the factors involving the u_ij alone.



Noting that |u_ij| = |u_ji|, we see that (j) factors into a function of the e_i and a function of the u_ij, the former being

     C Prod_(i=1..k) e_i^((n_1-k-2)/2) (1 - e_i)^((n_2-k-2)/2) Prod_(i<j) (e_i - e_j) ,

where C is a constant. On integrating with respect to the u_ij, we get the marginal distribution of the e_i,

(l)  K Prod_(i=1..k) e_i^((n_1-k-2)/2) (1 - e_i)^((n_2-k-2)/2) Prod_(i<j) (e_i - e_j) Prod_i de_i .

K is a constant to be determined so that the integral of (l) over the range of the e_i is unity. Following a procedure similar to that used in determining K in 11.111, we evaluate K in (l) and obtain as the distribution element of the e_i







(m)  [ pi^(k/2) Prod_(i=1..k) Gamma((n_1+n_2-1-i)/2) / ( Gamma(i/2) Gamma((n_1-i)/2) Gamma((n_2-i)/2) ) ] Prod_(i=1..k) e_i^((n_1-k-2)/2) (1 - e_i)^((n_2-k-2)/2) Prod_(i<j) (e_i - e_j) Prod_(i=1..k) de_i .

It should be emphasized that distribution (m) holds for the roots of (b) where the a¹_ij and the a²_ij are any two sets of random variables distributed independently according to the Wishart distributions

(n)  w_(n1-1,k)(a¹_ij; A_ij) ,  w_(n2-1,k)(a²_ij; A_ij) ,

where n_1 and n_2 are both > k. In fact, we may summarize our results in

Theorem (A): Let the a¹_ij and a²_ij be independently distributed according to the Wishart distributions (n). Let e_1, e_2, ..., e_k (in descending order) be the roots of the equation |e(a¹_ij + a²_ij) - a¹_ij| = 0. Then the joint probability element of the e_i (i=1,2,...,k) is given by (m), where the range of the e's is 1 ≥ e_1 ≥ e_2 ≥ ... ≥ e_k > 0.
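Numerically, the e_i of Theorem (A) solve the generalized eigenproblem a¹ u = e (a¹ + a²) u. A minimal sketch (modern Python with scipy, an illustration only; the inputs are assumed to be positive definite matrices of product sums):

    import numpy as np
    from scipy.linalg import eigh

    def roots_e(a1, a2):
        """Descending roots e_1 >= ... >= e_k of |a1 - e (a1 + a2)| = 0;
        they lie in (0, 1) when both matrices are positive definite."""
        e = eigh(a1, a1 + a2, eigvals_only=True)
        return np.sort(e)[::-1]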

11.113 Distribution of the Sample Canonical Correlations.

Corresponding to the population canonical correlations discussed in 11.10, there are canonical correlations of a sample. In this section, we shall determine the distribution of the sample canonical correlations when the smaller set of variates has a normal multivariate distribution independent of the other set.

Consider a sample O_n:(x_ua; u=1,2,...,k_1+k_2; a=1,2,...,n) from a population where the first k_1 variates have a normal distribution and the remaining k_2 variates are distributed independently of the first k_1 (k_1 ≤ k_2). Let

     a_uv = Sum_(a=1..n) (x_ua - x-bar_u)(x_va - x-bar_v) ,  (u,v = 1,2,...,k_1+k_2).

The canonical correlations of the sample are defined as the roots of

(a)  | -l a_ij   a_ip      |
     | a_pj      -l a_pq   |  =  0 ,  (i,j = 1,2,...,k_1; p,q = k_1+1,...,k_1+k_2).

Multiplying each of the first k_1 columns by l and then factoring l out of each of the last k_2 rows, we see that (a) is equivalent to

(b)  | -l² a_ij   a_ip   |
     | a_pj       -a_pq  |  =  0






except for a factor of l^(k_2-k_1). Since we are not interested in the roots which are identically zero, we shall confine our attention to the roots of (b).

Let a^pq be the element corresponding to a_pq in the inverse of ||a_pq|| (p,q = k_1+1, ..., k_1+k_2). After multiplication on the left by

(c)  || d_ij   0      ||
     || 0      -a^pq  || ,

equation (b) becomes

(d)  | -l² a_ij              a_ip  |
     | -Sum_r a^pr a_rj      d_pq  |  =  0 .

Since each element in the upper right hand block is a_ip, and since each element in the lower right hand block is d_pq, (d) can be reduced to

(e)  | l² a_ij - Sum_(p,r) a_ip a^pr a_rj |  =  0 .



The roots of (e) (which are also roots of (b)) are the sample canonical correlation coefficients. Let the squared roots (in descending order) be l_1² ≥ l_2² ≥ ... ≥ l_k1². We observe that Sum_(p,r) a_hp a^pr a_rj may be written as a ratio of two determinants,

(f)

where, in the determinants on the right, h and j are fixed, while p and r run over k_1+1, ..., k_1+k_2. Let this value be b_hj. If we consider the x_pa (a = 1,2,...,n; p = k_1+1,...,k_1+k_2) as fixed, with ||a_pq|| positive definite, then the a_ij and b_ij are bilinear forms in the x_ia and x_jb (a,b = 1,2,...,n; i,j = 1,2,...,k_1): a_ij = Sum_(a,b) G_ab x_ia x_jb, where ||G_ab|| is of rank n-1, and b_ij may be written as Sum_(a,b) H_ab x_ia x_jb, where ||H_ab|| is of rank k_2, H_ab being a function of the fixed x_pa. The a_ij and b_ij are, therefore, bilinear forms in the x_ia and x_jb having matrices which do not depend on i and j. By Cochran's Theorem, we know that there is a transformation which, applied to the x_ia, would make

     a_ij = Sum_(a=1..n-1) y_ia y_ja ,  b_ij = Sum_(a=1..k_2) y_ia y_ja .

Applying this same transformation to each set, we get these representations simultaneously for all i and j. The y's are normally and independently distributed with zero means and equal variances. Thus (e) may be written in the form

     | c_ij - (1 - l²)(c_ij + b_ij) | = 0 ,

where c_ij = a_ij - b_ij, the c_ij and b_ij being independently distributed according to the Wishart distributions w_(n-k2-1,k1)(c_ij; B_ij) and w_(k2,k1)(b_ij; B_ij), where ||B_ij|| is some positive definite matrix.

Therefore, it follows from the results of 11.112 that the squares of the roots of (e) (i. e. the squares of the canonical correlation coefficients) have the distribution (m), where n_1 = n - k_2, k = k_1 and n_2 = k_2 + 1. That is, the distribution is



(g)  C Prod_(i=1..k_1) (l_i²)^((k_2-k_1-1)/2) (1 - l_i²)^((n-k_2-k_1-2)/2) Prod_(i<j) (l_i² - l_j²) Prod_i d(l_i²) ,

where C = pi^(k_1/2) Prod_(i=1..k_1) Gamma((n-i)/2) / [ Gamma(i/2) Gamma((n-k_2-i)/2) Gamma((k_2+1-i)/2) ]; this is (m) of 11.112 written in terms of theta_i = 1 - l_i², i=1,2,...,k_1.

We may summarize our results in

Theorem (A): Let O_n:(x_ua; u=1,2,...,k_1+k_2; k_1 ≤ k_2; a=1,2,...,n) be a sample from a population in which the first k_1 variates are distributed according to a normal multivariate distribution, but independently of the remaining k_2 variates (which may have any arbitrary distribution or may be "fixed" variates). Let a_uv (u,v = 1,2,...,k_1+k_2) be the second order sample product sums as defined above (a), and let l_1², l_2², ..., l_k1² be the squared roots (squared canonical correlation coefficients) of equation (b). Then the distribution element of the l_i² (i=1,2,...,k_1) is given by (g), where theta_i = 1 - l_i², and where the range of the l²'s is such that 1 ≥ l_1² ≥ l_2² ≥ ... ≥ l_k1² ≥ 0.
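Computationally, the squared sample canonical correlations of (e) are the eigenvalues of Sum_(p,r) a_ip a^pr a_rj relative to ||a_ij||. A minimal sketch (modern Python with scipy, offered as an illustration; the data matrix and its partition are assumed inputs):

    import numpy as np
    from scipy.linalg import eigh

    def sample_canonical_correlations(X, k1):
        """Sample canonical correlations from a data matrix X of shape
        (n, k1+k2), the first k1 columns forming the smaller set."""
        d = X - X.mean(axis=0)
        a = d.T @ d                      # second order product sums a_uv
        a11, a12 = a[:k1, :k1], a[:k1, k1:]
        a21, a22 = a[k1:, :k1], a[k1:, k1:]
        # roots l^2 of | a12 a22^{-1} a21 - l^2 a11 | = 0, i.e. equation (e)
        l2 = eigh(a12 @ np.linalg.inv(a22) @ a21, a11, eigvals_only=True)
        return np.sqrt(np.clip(np.sort(l2)[::-1], 0.0, 1.0))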



LITERATURE FOR SUPPLEMENTARY READING

1. American Standards Association: "Guide for Quality Control and Control Chart Method of Analyzing Data" (1941) and "Control Chart Method of Controlling Quality During Production" (1942), American Standards Association, New York.
2. Anderson, R. L.: "Distribution of the Serial Correlation Coefficient", Annals of Math. Stat., Vol. 13 (1942), pp. 1 - 13.
3. Bartlett, M. S.: "On the Theory of Statistical Regression", Proc. Royal Soc. of Edinburgh, Vol. 53 (1933), pp. 260 - 283.
4. Bartlett, M. S.: "The Effect of Non-Normality on the t Distribution", Proc. Camb. Phil. Soc., Vol. 31 (1935), pp. 223 - 231.
5. Battin, I. L.: "On the Problem of Multiple Matching", Annals of Math. Stat., Vol. 13 (1942), pp. 294 - 305.
6. Bôcher, M.: Introduction to Higher Algebra. MacMillan, New York (1907).
7. Bortkiewicz, L. von: Die Iterationen. Berlin, Springer (1917).
8. Brown, George W.: "Reduction of a Certain Class of Statistical Hypotheses", Annals of Math. Stat., Vol. 11 (1940), pp. 254 - 270.
9. Camp, B. H.: "A New Generalization of Tchebycheff's Inequality", Bull. Amer. Math. Soc., Vol. 28 (1922), pp. 427 - .
10. Cochran, W. G.: "The Distribution of Quadratic Forms in a Normal System, with Applications to the Analysis of Covariance", Proc. Camb. Phil. Soc., Vol. 30 (1934), pp. 178 - 191.
11. Copeland, A. H.: "Point Set Theory Applied to the Random Selection of the Digits of an Admissible Number", Amer. Jour. Math., Vol. 58 (1936), pp. 181 - 192.
12. Craig, C. C.: "On the Composition of Dependent Elementary Errors", Annals of Math., Vol. 33 (1932), pp. 184 - 206.
13. Craig, A. T.: "On the Distribution of Certain Statistics", Amer. Jour. Math., Vol. 54 (1932), pp. 353 - 366.
14. Cramér, H. and Wold, H.: "Some Theorems on Distribution Functions", Jour. London Math. Soc., Vol. 11 (1936), pp. 290 - 294.
15. Curtiss, J. H.: "On the Theory of Moment Generating Functions", Annals of Math. Stat., Vol. 13 (1942), pp. 430 - 433.
16. Daly, J. F.: "On the Unbiased Character of Likelihood Ratio Tests for Independence in Normal Systems", Annals of Math. Stat., Vol. 11 (1940), pp. 1 - 32.
17. Darmois, G.: Statistique Mathématique. Paris, Doin, 1928.
18. Deming, W. E., and Birge, R. T.: "On the Statistical Theory of Errors", Rev. Modern Phys., Vol. 6 (1934), pp. 122 - 161.
19. Dodd, E. L.: "Probability as Expressed by Asymptotic Limits of Pencils of Sequences", Bull. Amer. Math. Soc., Vol. 36 (1930), pp. 299 - 305.
20. Dodd, E. L.: "The Length of Cycles Which Result from the Graduation of Chance Elements", Annals of Math. Stat., Vol. 10 (1939), pp. 254 - 264.
21. Dodge, H. F., and Romig, H. G.: "A Method of Sampling Inspection", Bell System Tech. Jour., Vol. VIII (1929).
22. Dodge, H. F., and Romig, H. G.: "Single Sampling and Double Sampling Inspection Tables", Bell System Tech. Jour., Vol. XX (1941).
23. Doob, J. L.: "Probability and Statistics", Trans. Amer. Math. Soc., Vol. 36 (1934), pp. 759 - 775.
24. Feller, Willy: "On the Integral Equation of Renewal Theory", Annals of Math. Stat., Vol. 11 (1941), pp. 243 - 267.
25. Fertig, J. W.: "On a Method of Testing the Hypothesis that an Observed Sample of n Variables and of Size N has been drawn from a Specified Population of the Same Number of Variables", Annals of Math. Stat., Vol. 7 (1936), pp. 113 - 163.
26. Fertig, J. W.: "The Testing of Certain Hypotheses by means of Lambda Criteria with Particular Reference to Physiological Research", Biometric Bulletin, Vol. 1 (1936), pp. 45 - 82.
27. Fisher, R. A. and Yates, F.: Statistical Tables for Biological, Agricultural and Medical Research. London, Oliver and Boyd, 1938.
28. Fisher, R. A.: "On the Interpretation of Chi-square from Contingency Tables, and the Calculation of P", Jour. Roy. Stat. Soc., Vol. 85 (1922), pp. 87 - 94.
29. Fisher, R. A.: "Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population", Biometrika, Vol. 10 (1915), pp. 507 - 521.
30. Fisher, R. A.: "On the Mathematical Foundations of Theoretical Statistics", Phil. Trans. Roy. Soc. London, Series A, Vol. 222 (1921), pp. 309 - 368.
31. Fisher, R. A.: "On a Distribution Yielding Error Functions of Several Well Known Statistics", Proc. Internat. Cong. of Math., Toronto (1924), pp. 805 - 813.
32. Fisher, R. A.: "The Theory of Statistical Estimation", Proc. Camb. Phil. Soc., Vol. 22 (1925), pp. 700 - 725.
33. Fisher, R. A.: "Applications of 'Student's' Distribution", Metron, Vol. 5 (1926), pp. 90 - 104.
34. Fisher, R. A.: "The General Sampling Distribution of the Multiple Correlation Coefficient", Proc. Roy. Soc. London, Series A, Vol. 121 (1928), pp. 654 - 673.
35. Fisher, R. A.: "Inverse Probability", Proc. Camb. Phil. Soc., Vol. 26 (1930), pp. 528 - 535.
36. Fisher, R. A.: "The Concepts of Inverse Probability and Fiducial Probability Referring to Unknown Parameters", Proc. Roy. Soc. London, Series A, Vol. 139 (1933), pp. 343 - 348.
37. Fisher, R. A.: "The Fiducial Argument in Statistical Inference", Annals of Eugenics, Vol. 6 (1935), pp. 391 - 398.
38. Fisher, R. A.: The Design of Experiments. London, Oliver and Boyd, 1935.
39. Fisher, R. A.: "The Sampling Distribution of Some Statistics Obtained from Non-Linear Equations", Annals of Eugenics, Vol. 9 (1939), pp. 238 - 249.
40. Fisher, R. A.: Statistical Methods for Research Workers. 8th Ed., London, Oliver and Boyd, 1941.
41. Fry, T. C.: Probability and its Engineering Uses. Van Nostrand Co., 1928.
42. Girshick, M. A.: "On the Sampling Theory of the Roots of Determinantal Equations", Annals of Math. Stat., Vol. 10 (1939), pp. 203 - 224.
43. Greville, T. N. E.: "The Frequency Distribution of a General Matching Problem", Annals of Math. Stat., Vol. 12 (1941), pp. 350 - 354.
44. Gumbel, E. J.: "Les Valeurs Extrêmes des Distributions Statistiques", Annales de l'Institut H. Poincaré (1935).
45. Hamburger, H.: "Über eine Erweiterung des Stieltjesschen Momentenproblems", Math. Annalen, Vol. 81 (1920), pp. 235 - 319, and Vol. 82 (1921), pp. 120 - 165, 168 - 187.
46. Hotelling, H.: "The Generalization of Student's Ratio", Annals of Math. Stat., Vol. 2 (1931), pp. 359 - 378.
47. Hotelling, H.: "Analysis of a Complex of Statistical Variables into Principal Components", Jour. Ed. Psych., Vol. 24 (1933), pp. 417 - 441, pp. 498 - 520.
48. Hotelling, H.: "Relations between Two Sets of Variates", Biometrika, Vol. 28 (1936), pp. 321 - 377.
49. Hotelling, H.: "Experimental Determination of the Maximum of a Function", Annals of Math. Stat., Vol. 12 (1941), pp. 20 - 45.
50. Hsu, P. L.: "On the Distribution of Roots of Certain Determinantal Equations", Annals of Eugenics, Vol. 9 (1939), pp. 250 - 258.
51. Hsu, P. L.: "On Generalized Analysis of Variance", Biometrika, Vol. 31 (1940), pp. 221 - 237.
52. Ingham, A. E.: "An Integral which Occurs in Statistics", Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 270 - 276.
53. Irwin, J. O.: "Mathematical Theorems Involved in the Analysis of Variance", Jour. Roy. Stat. Soc., Vol. 94 (1931), pp. 284 - 300.
54. Irwin, J. O. and others: "Recent Advances in Mathematical Statistics", Jour. Roy. Stat. Soc., Vol. 95 (1932), Vol. 97 (1934), Vol. 99 (1936).
55. Kamke, E.: Einführung in die Wahrscheinlichkeitstheorie. Leipzig, Hirzel, 1932.
56. Kendall, M. G. and Smith, B. B.: "The Problem of m Rankings", Annals of Math. Stat., Vol. 10 (1939), pp. 275 - 287.
57. Keynes, J. M.: A Treatise on Probability. MacMillan, London, 1921.
58. Kolodziejczyk, S.: "On an Important Class of Statistical Hypotheses", Biometrika, Vol. 27 (1935), pp. 161 - 190.
59. Koopman, B. O.: "On Distributions Admitting a Sufficient Statistic", Trans. Amer. Math. Soc., Vol. 39 (1936), pp. 399 - 409.
60. Koopmans, T.: On Modern Sampling Theory. Lectures delivered at Oslo, 1935 (unpublished).
61. Koopmans, T.: "Serial Correlation and Quadratic Forms in Normal Variables", Annals of Math. Stat., Vol. 13 (1942), pp. 14 - 33.
62. Kullback, S.: "An Application of Characteristic Functions to the Distribution Problem in Statistics", Annals of Math. Stat., Vol. 5 (1934), pp. 264 - 307.
63. Lawley, D. N.: "A Generalization of Fisher's z", Biometrika, Vol. 30 (1938), pp. 180 - 187.
64. Lévy, P.: Théorie de l'Addition des Variables Aléatoires. (Monographies des probabilités, fasc. 1) Gauthier, 1937.
65. Lotka, Alfred J.: "A Contribution to the Theory of Self-Renewing Aggregates, with Special Reference to Industrial Replacement", Annals of Math. Stat., Vol. 10 (1939), pp. 1 - 25.
66. Madow, W. G.: "Contributions to the Theory of Multivariate Statistical Analysis", Trans. Amer. Math. Soc., Vol. 44 (1938), pp. 454 - 495.
67. Mises, R. von: Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und Theoretischen Physik. Leipzig, Deuticke, 1931.
68. Mood, A. M.: "The Distribution Theory of Runs", Annals of Math. Stat., Vol. 11 (1940), pp. 367 - 392.
69. Mosteller, Frederick: "Note on an Application of Runs to Quality Control Charts", Annals of Math. Stat., Vol. 12 (1941), pp. 228 - 232.
70. Neumann, J. von: "Distribution of the Ratio of the Mean Square Successive Difference to the Variance", Annals of Math. Stat., Vol. 12 (1941), pp. 367 - 395.
71. Neyman, J. and Pearson, E. S.: "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference", Biometrika, Vol. 20A (1928), pp. 175 - 240, pp. 263 - 294.
72. Neyman, J. and Pearson, E. S.: "On the Problem of the Most Efficient Tests of Statistical Hypotheses", Phil. Trans. Roy. Soc., London, Ser. A, Vol. 231 (1933), p. 289.
73. Neyman, J. and Pearson, E. S.: "The Testing of Statistical Hypotheses in Relation to Probabilities a priori", Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 492 - 510.
74. Neyman, J. and Pearson, E. S.: Statistical Research Memoirs. University College, London, Vol. 1 (1936), Vol. 2 (1937).
75. Neyman, J.: "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection", Jour. Roy. Stat. Soc., Vol. 97 (1934), pp. 558 - 625.
76. Neyman, J.: "Su un Teorema Concernente le Cosiddette Statistiche Sufficienti", Giornale dell'Istituto Italiano degli Attuari, Vol. 6 (1934), pp. 320 - 334.
77. Neyman, J.: "Outline of a Theory of Statistical Estimation based on the Classical Theory of Probability", Phil. Trans. Roy. Soc., London, Ser. A, Vol. 236 (1937), pp. 333 - 380.
78. Perron, O.: Die Lehre von den Kettenbrüchen. Leipzig, Teubner, 1929.
79. Pearson, K.: "On the Criterion that a Given Set of Deviations from the Probable in the Case of Correlated Variables is Such that It Can Reasonably be Supposed to have Arisen from Random Sampling", Phil. Mag., 5th Ser., Vol. 50 (1900), pp. 157 - 175.
80. Pearson, K.: Tables for Statisticians and Biometricians. Cambridge University Press, 1914.
81. Pearson, K.: Tables of the Incomplete Gamma Function. Cambridge University Press, 1922.
82. Pearson, K.: Tables of the Incomplete Beta Function. Cambridge University Press, 1932.
83. Pomey, J. B.: Calcul des Probabilités. Paris, Gauthier-Villars, 1936.
84. Reichenbach, H.: Wahrscheinlichkeitslehre. Leiden, Sijthoff, 1935.
85. Rider, P. R.: "A Survey of the Theory of Small Samples", Annals of Math., Vol. 31 (1930), pp. 577 - 628.
86. Rietz, H. L.: Mathematical Statistics. Open Court Publishing Co., Chicago, 1927.
87. Sasuly, M.: Trend Analysis in Statistics. The Brookings Institution, Washington.
88. Scheffé, H.: "On the Ratio of the Variances of Two Normal Populations", Annals of Math. Stat., Vol. 13 (1942), pp. 371 - 388.
89. Shewhart, W. A.: Economic Control of Quality of Manufactured Product. Van Nostrand, 1931.
90. Shewhart, W. A.: Statistical Method from the Viewpoint of Quality Control. U. S. Department of Agriculture, Washington, 1939.
91. Smirnoff, V. I.: "Sur les Écarts de la Courbe de Distribution Empirique", Recueil Mathématique, Moscow, Vol. 6 (1939), pp. 25 - 26.
92. Snedecor, G. W.: Calculation and Interpretation of Analysis of Variance and Covariance. Collegiate Press, Ames, Iowa, 1934.
93. Sommerville, D. M. Y.: An Introduction to the Geometry of N Dimensions. London, Methuen (1929).
94. Stevens, W. L.: "Distribution of Groups in a Sequence of Alternatives", Annals of Eugenics, Vol. IX (1939).
95. "Student": "The Probable Error of a Mean", Biometrika, Vol. 6 (1908), pp. 1 - 25.
96. Swed, Frieda S., and Eisenhart, C.: "Tables for Testing Randomness of Grouping in a Sequence of Alternatives", Annals of Math. Stat., Vol. 14 (1943).
97. Wald, A. and Wolfowitz, J.: "On a Test of Whether Two Samples are from the Same Population", Annals of Math. Stat., Vol. 11 (1940), pp. 147 - 162.
98. Wald, A.: "Contributions to the Theory of Statistical Estimation and Testing Hypotheses", Annals of Math. Stat., Vol. 10 (1939), pp. 299 - 326.
99. Wald, A.: Lectures on the Analysis of Variance and Covariance. Columbia University.
100. Wald, A.: Notes on the Theory of Statistical Estimation and of Testing Hypotheses. Columbia University (1941).
101. Wald, A.: "Asymptotically Shortest Confidence Intervals", Annals of Math. Stat., Vol. 13 (1942), pp. 127 - 137.
102. Wald, A.: "Setting of Tolerance Limits when the Sample is Large", Annals of Math. Stat., Vol. 13 (1942), pp. 389 - 399.
103. Welch, B. L.: "Some Problems in the Analysis of Regression among k Samples of Two Variables", Biometrika, Vol. 27 (1935), pp. 145 - 160.
104. Whittaker, E. T. and Watson, G. N.: A Course in Modern Analysis. 4th ed., Cambridge University Press, 1927.
105. Whittaker, E. T. and Robinson, G.: The Calculus of Observations. London, Blackie and Son, 1932.
106. Widder, D. V.: The Laplace Transform. Princeton University Press, 1941.
107. Wiener, N.: The Fourier Integral. Cambridge University Press, 1933.
108. Wilks, S. S.: "Certain Generalizations in the Analysis of Variance", Biometrika, Vol. 24 (1932), pp. 471 - 494.
109. Wilks, S. S.: "On the Sampling Distribution of the Multiple Correlation Coefficient", Annals of Math. Stat., Vol. 3 (1932), pp. 196 - 203.
110. Wilks, S. S.: "Moment-generating Operators for Determinants of Product Moments in Samples from a Normal System", Annals of Math., Vol. 35 (1934), pp. 312 - 340.
111. Wilks, S. S.: "On the Independence of k Sets of Normally Distributed Statistical Variables", Econometrica, Vol. 3 (1935), pp. 309 - 326.
112. Wilks, S. S.: "The Likelihood Test of Independence in Contingency Tables", Annals of Math. Stat., Vol. 6 (1935), pp. 190 - 195.
113. Wilks, S. S.: "Shortest Average Confidence Intervals from Large Samples", Annals of Math. Stat., Vol. 9 (1938), pp. 166 - 175.
114. Wilks, S. S.: "Analysis of Variance and Covariance of Non-Orthogonal Data", Metron, Vol. XIII (1938), pp. 141 - 154.
115. Wilks, S. S.: "Determination of Sample Size for Setting Tolerance Limits", Annals of Math. Stat., Vol. 12 (1941), pp. 91 - 96.
116. Wilks, S. S.: "Statistical Prediction with Special Reference to the Problem of Tolerance Limits", Annals of Math. Stat., Vol. 13 (1942), pp. 400 - 409.
117. Wishart, J.: "The Generalized Product Moment Distribution in Samples from a Normal Multivariate Population", Biometrika, Vol. 20A (1928), pp. 32 - 52.
118. Wishart, J. and Bartlett, M. S.: "The Generalized Product Moment Distribution in a Normal System", Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 260 - 270.
119. Wishart, J. and Fisher, R. A.: "The Arrangement of Field Experiments and the Statistical Reduction of the Results". Imp. Bur. Soil Sci. (Tech. Comm. 10), 1930.
120. Wolfowitz, J.: "Additive Partition Functions and a Class of Statistical Hypotheses", Annals of Math. Stat., Vol. 13 (1942), pp. 247 - 279.
121. Yates, F.: "The Principles of Orthogonality and Confounding in Replicated Experiments", Jour. Agric. Science, Vol. 23 (1933), pp. 108 - 145.
122. Yates, F.: "Complex Experiments", Jour. Roy. Stat. Soc., Supplement, Vol. 2 (1935), pp. 181 - 247.
123. Yule, G. U.: An Introduction to the Theory of Statistics. 10th Ed., London, Griffin, 1936.



INDEX 



Analysis of covarlance, 195 

extension to several fixed 
variatea, 199 

Analysis of covariance table, 198 
Analysis of variance, 176 

for incomplete lay-outs, 195 

for Graeco-Latin square, 192 

for Latin square, 189 

for randomized blocks, 180 

for two-way layout, 1 80 

for three-way layout , 1 86 

multivariate extension of, 250 
Average outgoing quality limit, 225 
Average quality protection, 223 
Beta function, 75 
Binomial distribution, 47 

Bernoulli case, ^9 

moment generating function of, U8 

negative, 56 

Poisson case, U9 

Binomial population, confidence limits 
of p in large samples from, 129 

Borel-measurable point set, 10 
Canonical correlation coefficient, 259 

Canonical correlation coefficients, 
distribution of, in samples, 270 

Canonical variate, 259 

C. d. f. (cumulative distribution 
function), 5 

Central limit theorem, 81 

Characteristic equation of a variance- 
co variance matrix, 



Characteristic function, 82 
Characteristic roots 

of difference of two sample variance- 
covariance matrices, distribution 
of, 268 

of sample variance- covariance matrix, 
distribution of, ?64 

Chi square distribution, i 02 

moment generating function of, 7^ 

momenta of, 103 

reproductive property of, 105 



Chi-square problem, Pearson's original, 217 
Cochran ! a Theorem, 107 
Complete additivity, law of, 6 

Component quadratic forms, resolving 
quadratic Into, 168 

Conditional probability, 15 

Conditional probability density function, 17 
for normal bivarlate distribution, 62 
for normal multivariate distribution, 71 

Confidence coefficient, 12^ 

Confidence interval, *\2k 

Confldefnce limits, 12U 
from large samples , 127 
graphical representation of, 126 

of difference between means of two normal 
populations with same variance, 130 

of mean of normal population, 130 

of p in large samples from binomial 
population, 1 29 

of range of rectangular population, 123 

of ratio of variances of two normal 
population, 131 

of regression coefficients, 159 

of variance of normal population, 131 

Confidence region, 132 

Confounding, 1 86 

Contagious distribution function, 55 

Consistency of estimate, 133 

Consumer 1 s risk, 222 

Contingency table, 21 k 

Chi-square test of independence in, 216 

likelihood ratio test for independence 
in, 220 

Continuous distribution function, 
bivariate case, 10 

univariate case, 8 
Convergence, stochastic, 81 
Correlation coefficient, 32 

between two linear functions of random 
variables, Jk 

canonical, 260 

canonical distribution of, 270 

distribution of, 120 



INDEX 



P79 



Correlation coefficient (con f t) 
multiple, ^ 5 

multiple, distribution of, In samples 
from normal multlvarlate popula- 
tion, 2kk 

partial, U2 
Covarlance, 32 
analysis of, 195 

between two linear functions of 
random variables, 34 

Critical region of a statistical test, 1 
Cumulative distribution function, 

bivariate case, 8 

k-variate case, 11 

continuous case, 10 

continuous, univarlate case, 8 

discrete, bivariate case, 10 

empirical, 2 

mixed case, 1 1 

postulates for, bivariate case, 9 

postulates for, k-variate case, 12 

postulates for, univariate case, 5 

univariate case, 5 
Curve fitting, 

by maximum likelihood, 1^5 

by moments, 1 4 5 
Curvilinear regression, 1 66 

Difference between two sample means, 
distribution of, 100 

Difference of point sets, 5 
Discrete distribution function, 

bivariate case, 10 

univariate case, 7 
Disjoint point sets, 5 
Distribution function, 

binomial, ki 

contagious, 55 

cumulative, bivariate case, 8 

cumulative, univariate case, 5 

discrete, univariate case, 7 

limiting, of maximum likelihood esti- 
mates in large samples, 138 

marginal, 12 
multinomial, 51 
negative binomial, 5 1 * 
normal bivariate, 59 
normal multivariate, 63 



Distribution function (con f t) 
normal or Gaussian, 56 
of canonical correlation coefficients, 270 

of characteristic roots of difference of 
sample variance -co variance matrices, 268 

of characteristic roots of sample 
variance- covariance matrix, 26** 

of correlation coefficient, 120 ^ 

of difference between means of two samples 
from a normal population, 100 * 

of exponent in normal multivariate popula- 
tion, 101* 

of Fisher's z, 1 1 5 \^ 

of Retelling 1 s generalized "Student" ^ 
ratio, 238 

of largest variate in sample, 91 

of likelihood ratio for generalized "Student" 
statistical hypothesis, r38 

of linear function of normally distributed 
variables, 99 \S 

of means in samples from a normal bivariate 
population, 100, ioi\x 

of means in samples from a normal multi- 
variate population, 101 \s 

of median of sample, 91 \S~ 

of multiple correlation coefficient in 
samples from normal multivariate popula- 
tion, 2U 

of number of correct matchings in random 
matching, 210 

of number of trials required to obtain a 
given number of "successes", 55 

of order statistics, 90 
of range of sample, 92 ^ 

of regression coefficients, k fixed variates, 
16? 

of regression coefficients, one fixed ^ 
variate, 159 

of runs, 201 

of sample mean, limiting, in large samples, 

81 

of second order sample moments in samples 
from normal bivariate population, 1 1 6 

of smallest variate in sample, 91 
of Snedecor f a F ratio, 1U \S 
of "Student's" ratio, 110 ^ 

of sums of squares in samples from normal 
population, 102

of total number of runs, 203 
Poisson, 52 
Polya-Eggenberger, 56 
Type A, 76






Distribution function (con't) 
Type III, 72 
Wishart, 120 
Wishart, geometric derivation of, 227 

Distribution functions, Pearson system 
of, 72 

Efficiency of estimates, 134
Equality of means, 

of normal populations, test of, 176 

test for, in normal multivariate popu-
lation, 238 

Estimation, 

by intervals, 122

by points, 135 
Estimates, 

consistency of, 133 

efficiency of, 134

maximum likelihood, 136 

optimum, 133 

sufficiency of, 135 

unbiased, 133 
Expected value, 28 
Factorial moments, 204
F distribution, Snedecor's, 114
Fiducial limits, 126 
Finite population, sampling from, 83 
Fisher's z distribution, 115 
Fixed variate, 16
Gamma function, 73 
Gaussian distribution function, 56 
Generalized sum of squares, 229 
Graeco-Latin square, 190 

analysis of variance for, 192 
Gram-Charlier series, 76 
Grouping, corrections for, 94
Harmonic analysis, 166 
Hermite polynomials, 77 

Hotelling's generalized "Student 11 
ratio, 238 

Incomplete lay-outs, 192 
Independence, 

linear, 160

in probability sense, 13

of mean and sum of squared deviations 
in samples from normal population, 108 

of means and second order sample moments 
in samples from normal bivariate
population, 120 



Independence (con't)

of means and second order moments in 
samples from normal multivariate 
population, 120, 233 

of sets of variates, test for, in normal
multivariate population, 242

mutual, 14 

statistical, 13 
Inspection, sampling, 220 
Interaction, 

first order, 181 

second order, 184
Jacobian of a transformation, 

for k variables, 28 

for two variables, 25 

Joint moments of several random variables, 31 
Lagrange multipliers, 97 
Laplace transform, 38 
Large numbers, law of, 50 
Large samples, confidence limits from, 127 
Largest variate in sample, distribution of, 91 
Latin square, 186 

analysis of variance for, 189 

complete set of, 191 

orthogonal, 190 
Law of complete additivity, 6
Law of large numbers, 50 
Least square regression function, 44

variance about, 44
Least squares, 43
Likelihood, 136 
Likelihood ratio, 150 
Likelihood ratio test, 150

in large samples, 151 

for equality of means in normal multivariate
populations, 238 

for general linear regression statistical 
hypothesis for normal multivariate 
population, 247

for general normal regression statistical
hypothesis, 170 

for independence in contingency tables, 220 

for independence of sets of variates in 
normal multivariate population, 242

for "Student" hypothesis, 150 

for the statistical hypothesis that means 
in a normal multivariate population have 
specified values, 235 






Limiting form of cumulative distribu- 
tion function as determined by 
limiting form of moment generating 
function, 38 

Linear combination of random variables, 
mean and variance of, 33 

Linear combinations of random variables, 
covariance and correlation coef- 
ficient between, 34

Linear functions of normally distributed 
variables, distribution of, 99 

Linear independence, 160 
Linear regression, 40
generality of, 165 

Linear regression statistical hypothesis, 
likelihood ratio test for, in normal 
multivariate populations, 247

Lot quality protection, 223 
Marginal distribution function, 12 
Matching theory, 

for three or more decks of cards, 212 

for two decks of cards, 208 
Matrix, 63 

Maximum likelihood, curve fitting by, 145
Maximum likelihood estimate, 136 
Maximum likelihood estimates, 

distribution of, in large samples, 138 

of transformed parameters, 139 

Mean of independent random variables,
moment generating function of, 82 

Mean value, 29 

of linear function of random 
variables, 33 

of sample mean, 80 
of sample variance, 83 
Means, 

distribution of difference between, in 
samples from normal population, 100

distribution of, in samples from a normal 
bivariate population, 100

distribution of, in samples from a normal 
multivariate population, 101 

Median of sample, distribution of, 91 

M. g. f. (moment generating function), 36

Moment generating function, 36 

of binomial distribution, 48

of Chi-square distribution, 74

of mean of independent random 
variables, 82 

of multinomial distribution, 51 

of negative binomial distribution, 54



Moment generating function (con't) 
of normal bivariate distribution, 60 
of normal distribution, 57 
of normal multivariate distribution, 70 
of Poisson distribution, 53 

of second order moments in samples from a 
normal bivariate population, 118 

Moment problem, 35
Moments, 

curve-fitting by, 145

factorial, 204 

joint, of several random variables, 31 

of a random variable, 30 
Multinomial distribution, 51 

moment generating function of, 51 
Multiple correlation, 42

Multiple correlation coefficient, distribution 
of, in samples from normal multivariate
population, 244

Negative binomial distribution, 54
moment generating function of, 54

Neyman-Pearson theory of statistical tests,
152 

Normal bivariate distribution, 59

conditional probability density function 
for, 62 

moment generating function of, 60 

regression function for, 62 

distribution of means in samples from, 101 

distribution of second order moments in 
samples from, 1 1 6 

independence of means and second order 
moments in samples from, 120 

Normal distribution, 56 

moment generating function of, 58 
reproductive property of, 98

Normally distributed variables, distribution
of linear function of, 99 

Normal multivariate distribution, 63 

conditional probability density function 
for, 71 

distribution of exponent in, 104
distribution of subset of variables in, 68 
moment generating function of, 70 
regression function for, 71 
variance-covariance matrix of, 68
Normal multivariate population, 

distribution of means in samples from, 101

distribution of multiple correlation 
coefficient in samples from, 244






Normal multivariate population (con't)

distribution of second order moments
in samples from a, 232

general linear regression statistical 
hypothesis for, 247

generalized "Student" test for, 25 4 

independence of means and second
order moments in samples from, 120, 233

test for independence of sets of
variables in, 242

Normal multivariate populations, test
for equality of means in, 238

Normal population, 

distribution of means in samples
from, 100

distribution of sums of squares in 
samples from, 102 

independence of mean and sum of squared
deviations in samples from, 108

Normal populations, 

distribution of difference between means
in samples from, 100 

test of equality of means of several, 176 
Normal regression, 156

fundamental theorem on testing hypothesis 
in, 170 

k fixed variates, 160

one fixed variate, 157
Nuisance parameters, 150
Null hypothesis, 147
Optimum estimate, 133
Order statistics,

distribution theory of, 89

Ordering within samples, test for randomness
of, 207

Parallelogram, area of, 227
Parallelotope, volume of, 228
Partial correlation, 44

coefficient, V< 

P. d. f. (probability density function), 8 
Pearson system of distribution functions, 72
Pearson's original Chi-square problem, 217
Point set, 

difference, 5

product, 5 

sum, 5 
Poisson distribution, 52

moment generating function of, 53



Polya-Eggenberger distribution, 55 

Population parameter, admissible set of 
values of, 147

Population parameters, 

interval estimation of, 122

point estimation of, 135
Positive definite matrix, 63 
Positive definite quadratic form, 

k variables, 63 

two variables, 59 

Power curve of a statistical test, 154 
Power of a statistical test, 152
Principal axes, 252

direction cosines of, 256 

relative lengths of, 256
Principal components of a variance, 255 
Probability, conditional, 15 
Probability density function, 8 

bivariate case, 11 

conditional, 17
Probability element, 8 
Probable error, 58 
Producer's risk, 222 
Product of point sets, 5 
Quadratic form, 

positive definite, 59 

resolving, into component quadratic 
forms, 168 

Quality control, statistical, 221
Quality limit, average outgoing, 223
Quality protection, 

average, 223 

lot, 223

Randomized blocks, 177 
Randomness, 2 

Randomness of ordering within samples, 
test for, 207 

Random sample, definition of, 79 

Random variable, definition of, 6 

Range of sample, distribution of, 92 

Rectangular population,

confidence limits of range of, 123 
distribution of range in samples from a, 92

Regression, 40

Regression coefficient, 40