// THE MULTIVARIATE ONE-WAY CLASSIFICATION MODEL WITH RANDOM EFFECTS BY JAMES ROBERT SCHOTT A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1981 To Susan and My Parents ACKNOWLEDGMENTS I would like to express my deepest appreciation to Dr. John Saw for suggesting this topic to me and for constantly providing guidance and assistance. He has made this project not only a very rewarding educational expe- rience but also an enjoyable one. I also wish to thank Dr. Andre Khuri, Dr. Mark Yang, and Dr. Dennis Wackerly for their willingness to provide help when called upon. Finally, a special thanks goes to Mrs. Edna Larrick who has turned a somewhat unreadable rough draft into a nicely typed manuscript. TABLE OF CONTENTS Page ACKNOWLEDGMENTS iii ABSTRACT vi CHAPTER 1 INTRODUCTION 1 1.1 The Random Effects Model, Scalar Case . . 1 1.2 The Multivariate Random Effects Model . 11 1.3 Notation 14 2 MAXIMIZATION OF THE LIKELIHOOD FUNCTION FOR GENERAL I 17 2.1 The Likelihood Function 17 2.2 Some Lemmas 27 2.3 The Maximum Likelihood Estimates .... 40 2.4 The Likelihood Ratio Test 46 3 PROPERTIES OF THE st LARGEST ROOT TEST ... 51 3.1 Introduction 51 3.2 The Uniformly Most Pov/erful Test for m = 1 51 3.3 An Invariance Property 55 3.4 The Union-Intersection Principle .... 56 3.5 A Monotonicity Property of the Power Function 6 0 3.6 The Limiting Distribution of (J) .... 66 4 MAXIMIZATION OF THE LIKELIHOOD FUNCTION WHEN I = a2I 75 4.1 The Likelihood Function 75 4.2 The Maximum Likelihood Estimates .... 79 4.3 The Likelihood Ratio Test 88 5 AN ALTERNATIVE TEST WHEN £ = a I AND ITS PROPERTIES 94 5.1 Introduction 94 5.2 An Invariance Property 95 TABLE OF CONTENTS (Continued) CHAPTER Page 5 (Continued) 5.3 A Monotonicity Property of the Power Function 96 m 5.4 The Limiting Distribution of I ij). . . 101 i=s BIBLIOGRAPHY 108 BIOGRAPHICAL SKETCH 110 Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy THE MULTIVARIATE ONE-WAY CLASSIFICATION MODEL WITH RANDOM EFFECTS By James Robert Schott August 1981 Chairman: Dr. John G. Saw Major Department: Statistics A well-known model in univariate statistical analysis is the one-way random effects model. In this paper we investigate the multivariate generalization of this model, that is, the multivariate one-way random effects model. Two specific situations, regarding the structure of the variance-covariance matrix of the random error vectors, are considered. In the first and most general case, it is only assumed that this variance-covariance matrix is sym- metric and positive definite. In the second case, it is assumed, in addition, that the variance-covariance matrix is a scalar multiple of the identity matrix. Maximum likelihood estimates are obtained and the likelihood ratio test for a hypothesis test on the rank of the variance-covariance matrix of the random effect vectors is derived. Properties of the likelihood ratio test are investigated for the general case, while for the second case an alternative test is developed and its prop- erties are investigated. In each case a sequential proce- dure for determining the rank of the variance-covariance matrix of the random effect vectors is presented. CHAPTER 1 INTRODUCTION 1. 1 The Random Effects Model, Scalar Case Suppose a physician is considering administering some particular blood test to his patients as a part of their physical examination. He suspects that the test results vary with the presence and severity of a particular path- ological condition. In order to examine variability in the results of the blood test, the physician chooses to administer the blood test n times to each of g patients. This results in the observations x. .: i = l,2,...,g; j = 1,2, . . . ,n. A suitable model to explain the different values of x..: i = 1,2,. . . ,g; j = 1,2, ... ,n would be xij = y + ai + 2ij- d.1.1) Here u is an overall mean, a. is an effect due to the i patient, and z. . represents a random error due to the measuring process. We assume that z..: i = l,2,...,g; j = l,2,...,n are independent and have a normal distribu- 2 tion with mean zero and variance a . z If the physician is interested in using the blood test as a diagnostic tool, he will certainly be interested to know whether a major source of variation in the results of the blood test is due to variation between the patients. Since the physician will administer the test to an unlimited num- ber of patients in the future, we should properly regard the g patients involved as a sample from the entire population of patients. The patient effects, a.: i = 1,2,. . . ,g, now have the role of random variables, and (1.1.1) is a random effects model. Again we assume that a.: i = 1,2,. ..,g are independent and have a normal distribution with mean zero 2 and variance a . Thus, from our model (1.1.1) we deduce that a x. . has a normal distribution with mean \i and variance 2 L 2 a + a . a z The variation in the results of the blood test is 2 2 governed by a + a . The portion of this attributable to 2 2 2 the patients is, of course, a / (°a+0z) > and the physician would like to know whether this or, correspondingly, 9 ? 2 2 a /a is sizeable. If a /a is sufficiently large, he would a z a z choose to investigate the possible use of this test as a means of detecting the pathological condition; otherwise he would find the blood test essentially useless as a diag- nostic tool. Hence, the physician might be interested in 2 testing the hypothesis HQ : a =0 against the hypothesis H, : a2 > 0. 1 a In order to derive the likelihood ratio test for testing the hypothesis H against H-, , we first need to obtain the 2 2 likelihood function of (u,a , a ). This is most easily done z a by making a transformation. Let C be an orthogonal matrix, with the element in the i row and the j column denoted by c • , such that c, . = l//n : j = 1,2, ... ,n. Since C is orthogonal , n __ n I c, .=/n E c, . cv. = 0 for k = 2 ,3 , . . . ,n. ■i=l ^ -i=l ^ 3 D x D -1 (1.1.2) Consider the orthogonal transformation ^il^i2 W =C(xil'Xi2 xin)'- (1-1'3) Upon replacing x . . by the right side of (1.1.1) and using (1.1.2), we observe that n _ __ _ il k=1 IK i. n Yij = kl± CJk Zik for j = 2'3'""n' n where x. = Z x. /n. Thus, i. k=1 IK Cov(yi.,yik) = 0 for j # k, 2 v(Y,-i) = a f°r 3 = 2,3,. ..,n, and {x. ,y. - ,y .-,,... ,y . }. , is a set of gn mutually l. Ji2 Ji3 ,Jm i=l 3 J independent random variables, where x. has a normal dis- 2 2 tribution with mean u and variance a /n + a , and y. . z a J l j has a normal distribution with mean zero and variance a z n n 2 " 2 Note also from (1.1.3) that Z (x..-x. ) = I y . j=l 1D X- j = 2 ^ and denote this quantity by u . . We can now write the joint density function of y . „ ,y .-,.... ,y . as J J i2 J i3 ' ' 1 in n f^i2^i3 ^in;az2) - . H (2^) "^ expf-y2. /2a2] = (2.0*)-"*"-1' exp[- ly2.,2o2] z . „n z 3=2 - i ? 2 2. , 2, = g(.^/ij;az) = 9 in o\ (1.1,4) o„2 o„2 2 z 2a 2a z z 2 Differentiating (1.1.4) with respect to y and a , we obtain, respectively, the equations (x..-y)gn (x -y) gn+u+v -+h+1 2 U' _. 2.2 , 2 a2 2(az) 2az which yield the maximum likelihood solutions y = x , a2 = (u+v) / (e+h+1) . zoo In Q the logarithm of the likelihood function, omitting a function of the observations, is (^.._y)2gn u e 0 2 v (h+1) „ , 2M 2, " - -~ £n a ~ ^— - 0 £n (a +na ) . 0/ 2 2. -2 2 z 0/ 2^ 2, 2 l z a 2 (a +na 2a 2 (a +no ) z a z z a (1.1.5) 2 2 Differentiation of (1.1.5) with respect to y , a , and a yields, respectively, (x -y)gn / 2^ 2, (o +na ) z a = 0, n((x_-u)2gn+v) n(h+1) _ 2 ? ? ~ 7 2 ' 2 (a +na ) 2(a +na ) z a z a (x_-y)2gn+v u e (h+1) = q 2(a2+no2)2 2 (a2)2 2a2 2 (a2+na2) z a z z z a 2 2 Solving these equations for \i , o , and a , we obtain the ~ ~2 maximal solution of the likelihood function in fl, (y/O , aa) , given by y = x , .2 , a = u/e = u+ , z * a2 = (v/(h+l)-u/e). /n = (vruj/n, where u* = u/e and v* = v/ (h+1) . Since we insist that a „ be greater than or equal to zero, the solution above is the maximum likelihood solution only if v*-u* > 0. Suppose, however, that v* < u*. Clearly (1.1.5) is still maximized when y = x , so that we need to minimize ^r + e In a2 + — , V , + (h+1) k(aW) a2 z (a2+na2) Z z z a 2 2 subject to the contraints a > 0 and a > 0. Equivalently , we consider the problem of minimizing ip(x,t) = u/x + e £n x + v/t + (h+1) £n t subject to the constraint t>x>0 . For fixed x i|j(x,t) is concave upward in t with its absolute minimum at t = v^. For each x i>(x,t) is, therefore, minimized with respect to t > x when v* if v* > x, ' ,x if Vj, < x, Thus, ijj(x,t) is minimized over {(t,x): t > x > 0} by setting t = v* and x = u^ if v* > u^ , t = x = (u+v) / (e+h+1) if v* < uA. Hence, for the maximum likelihood estimators when the parameters are restricted to be within Q, we obtain 2 9. .« = u* , and a2n = (u+v) /(e+h+1) , a2n = 0, if v* < u*. Substituting the maximum likelihood estimates into the likelihood function, we see that in oo u^e-l vHh-l exp[_ (e+h+1) /2] »8* f(«..^v) - ———^- ,%(e+h+l, % 2^ (e+h+1)' 'e+h+1' and in fi u^e-lv%h-l exp[_(e+h+1)/2] max f (x , u,v)= nwrwuH,^?^11,^*^^! max f (x ,u,v) if v* > u*, if v+ < u+. The likelihood ratio, X, is ^(u/e)^(v/(h+l))^h+1) A = f (x ,u,v) / [ (u+v) /(e+h+1)] h (e+h+1) max f (x ,u,v) if v+ > ui( if vt < u; Now putting w = u/ (u+v) and noting that v* i u* , if and only if (iff) h+1 " e ' u+v > ,1 , 1 . - u(- + rr-rr) , h+1 'e h+1' iff iff e > JSL- e+h+1 u+v = w, 10 we can rewrite the likelihood ratio r(e+h+l)*(e+h+1) hen ^(h+1) .- , A— Wv^n v W (1-W) if W < A = ■ ete(h+1)%(h+l) - «— " - * e+h+l 1 if w > v_ e+h+l Since A is an increasing function of w, and H is rejected for small values of A, it follows that H is rejected for small values of w or large values of 1/w. Now l=H±Z=1 + v = 1 + h(Wh w u u e u/e so the likelihood ratio test rejects H for ev/hu large. 2 Recall that u/o has a chi-square distribution with 2 2 e degrees of freedom, and v/(a +na ) has a chi-square dis- tribution with h degrees of freedom, independent of u. 2 2 2 Hence, the quantity a ev/ (a +na ) hu has an F distribution z z a with h and e degrees of freedom. If we let F(h,e,a) denote the constant for which P(F(h,e) > F(h,e,a)) = a where F(h,e) has an F distribution with h and e degrees of freedom, then we will reject H if ev/hu > F(h,e,a). The power function 2 2 of this test is a function of 9 = a /a and is given by 6(6) = P(F(h,e) > F(h,e,a) /(l+n6) ) . Although the analysis which we have just outlined is, by now, quite standard to any graduate level course in design and analysis, we have reproduced it since it motivates the more general problem to be described in the next section. 11 Indeed the situation we wish to consider contains the one- way random effects model as a special case to which we can return on occasion to check our work. 1. 2 The Multivariate Random Effects Model Suppose a physician is considering administering a bat- tery of m distinct types of blood tests to his patients as a part of their physical examination. He believes that, based on the results of these tests, he may be able to detect any one of several particular pathological conditions. In order to examine variability in the results of the blood tests, the physician chooses to administer the battery of blood tests n times to each of g patients. This results in the observations x . . (mxl) : i = 1,2,. . . ,g; j = 1,2,. . . ,n. A suitable model to explain the different values of x..: i = 1,2,. ..,g; j = 1,2,. ..,n would be 2£ij = H + «i + lij- (1.2.1) Here jj(mxl) is an overall mean, a. (mxl) is an effect due to the i patient, and z. . (mxl) represents a vector of random errors due to the measuring process. We assume that z...: i = l,2,...,g; j = 1,2,.. . ,n are independent and have an m-variate normal distribution with mean _0 and variance-covar- iance matrix E . Since the physician will administer the tests to an unlimited number of patients in the future, we should prop- erly regard the g patients involved as a sample from the 12 entire population of patients. The patient effects, a.: i = 1,2,. ..,g, now have the role of random vectors, and (1.2.1) is a multivariate random effects model. We will assume that a-: i = 1,2, ... ,q are independent and have an m-variate normal distribution with mean 0^ and variance- covariance matrix A. Hence, from our model (1.2.1) we see that x. . has an m-variate normal distribution with mean u -ID and variance-covariance matrix A + I . While there are m different blood tests, it is believed that there are some groups of tests for which the tests within a group vary quite strongly together. In other words, the data from some of the tests are highly correlated. For this reason the number of sources of variation between the patients, which we will denote by p, may be less than the number of tests, m. That is, the rank of the variance-covariance matrix A is p where p < m. Since A is symmetric, nonnegative definite, and of rank p, there exists a matrix L (mxp) such that A = LL ' . Clearly L is not unique since if A = LL ' and P(pxP) is such that PP ' = I, then A = L*L* where L* = LP. This enables us to rewrite (1.2.1) as x. . = y + Lf . + z. . , (1.2.2) -in - -x -i] where f. (pxl) : i = 1,2,..., g are independently distributed, having a p-variate normal distribution with mean _0 and variance-covariance matrix equal to the identity matrix. 13 If the physician is interested in using the blood tests as a diagnostic tool, he will certainly be interested in deter- mining the value p, since the p sources of variation may correspond to p different pathological disorders. So of particular interest to the physician is a test of the hypoth- (s) esis H : the rank of the matrix LL ' < s-1 against the o r (s) hypothesis H, : the rank of the matrix LL ' = s. With such a test procedure he could develop a sequential test procedure for determining the rank of LL ' . He would first test H o against H, , and if he rejects H , he would stop and take the rank of LL ' to be m; otherwise, he would proceed to test H against H, . The procedure continues until either (s) some hypothesis H is rejected, in which case he then takes the rank of LL ' to be s, or the hypothesis H is accepted, in which case he would conclude that there is no significant variation between patients. In this paper we investigate the multivariate one-way classification model with random effects, given by (1.2.2). Two specific cases, regarding the structure of the variance- covariance matrix Z, will be considered. In the first and most general case we will assume no more than that I is sym- metric and positive definite. In the second case we will assume that the vector of random errors, z. ., is such that its components are independent and have the same variance. That is, we assume that £ is equal to some constant multiple of the identity matrix. In each case we develop a test 14 (s) procedure for testing the hypothesis H : the rank of (s) LL ' < s-1 against the hypothesis H, : the rank of LL ' = s. In addition, we investigate some of the properties of these test procedures and present a numerical example to illustrate the use of these procedures. 1. 3 Notation The following notation will be used whenever convenient: Notation Interpretation (A) . row i of the matrix A (A) column j of the matrix A (A) . . the element in row i and column j of the matrix A a. . the element in row i and column j of the matrix A A the inverse of the matrix A A ' the transpose of the matrix A |a| the determinant of the matrix A tr A the trace of the matrix A dg (A) the diagonal matrix which has as its diagonal elements the diagonal elements of A diag (a, ,a2 , . . . ,a ) the diagonal matrix which has a,, ch. (A) the i largest latent root of the matrix A 15 Notation Interpretation rank (A) the rank of the matrix A I the m x m identity matrix m J I the identity matrix (used when the order of the matrix is obvious) (0) the matrix which has all of its elements equal to zero x a vector x. the i element of the vector x _0 the vector which has all of its elements equal to zero E (x) the expected value of x V(x) the variance of x Cov(x,y) the covariance of x and y P (A) the probability of event A P(A|B) the probability of event A given event B r (x) the gamma function x > x x converges to x in distribution n n ^ a > a convergence of a sequence of constants exp(x) Euler's constant, "e," raised to the x power £ is contained in is distributed as 2 N(y,o ) the normal distribution with mean u and variance a 16 Notation Interpretation N (y,E) the m-variate normal distribution with m — mean u_ and variance-covariance matrix Z 2 X the central chi-square distribution with v degrees of freedom Vl F the central F distribution with v. numer- V2 1 ator degrees of freedom and v_ denomina- tor degrees of freedom W (Z.v.O) the central Wishart distribution with m variance-covariance matrix E and degrees of freedom v Jones [1973] the reference authored by Jones and published in 1973 Jones [1973:1] page 1 of the reference authored by Jones and published in 1973 CHAPTER 2 MAXIMIZATION OF THE LIKELIHOOD FUNCTION FOR GENERAL E 2. 1 The Likelihood Function Suppose the vectors x. . (mxl) : i = 1,2,... ,g; j = 1,2, . . . ,n can be modeled by 2ij = )L + Lli + Zj.j' (2.1.1) wherein y(mxl) is a fixed but unknown vector, L (mxp) is a fixed but unknown matrix, f. - N (0,1): i = 1, 2,...,g, and z. . ~ N (0,£) : i = 1,2, . . . ,g; j = 1,2, ... ,n. We assume that the set of random vectors { f_, , f _ , . . . , f , _z^ , . . . , _z } are mutually independent. Thus, x. . ~ N (y,V) with V = LL ' + Z. However, for any orthogonal matrix P (pxp) , V = LL ' + Z = LP (LP) ' + Z so that L is not unique whereas LL ' is unique. The purpose of this section is to derive the like- lihood function for y_, LL ' , and E. Although x. . and x^ are independent for all (j,£) when i # k, x. . and x.„ are not independent even when j ^ £ , since Cov (x . . ,x. . ) = LL'(Jt^). Thus, the likelihood function is not simply the product of the density functions of the x. .'s. A transformation of the x. .'s will expedite the derivation of the likelihood function. -ID 17 18 Consider the Helmert transformation (see, for example, Kendall and Stuart [1963:250]) given below: £il = ^i. + (2*1)_^il + (3*2)"^i2 + *" + ' = H(Hi.'*ii ^iv} '' and we note that, while not an orthogonal matrix, the columns of H are orthogonal. The matrix H fails to be orthogonal since H'H = diag (n, 1 , 1 , . . . , 1) . Observe that, upon replacing x . . by the right side of (2.1.1), we have -ID 19 X. = y + Lf . + z . , -l. - -1 — l. ' *il = (2)"i5(zil-li2), Yi2 = (3.2)^(2.^2.2-22.3), v. = {n(n-l)} 2(z. ,+2. ,+. • .+z. ,-(n-l)z. ). iv — il — i2 — i,n-l —in Thus E^i.Zij) = (0) , E(Xi;j2iq) = (0) if j # q' E (y. .y.' .) = Z. Hence, it follows that {xi ,y_i;L ' • • • 'Xiv^=l are a set of gn mutually independent vectors with x. ~ N (y , (1/n) Z+LL ' ) : i = 1,2,. ..,g and £„ - Nm(0,Z): i = 1,2,. ..,g; j = 1,2,. ..,n. n _ __ v Note also that Z (x. .-x. ) (x. .-x. )'= Z ¥_■■%!■ and denote j=l J 1J x" j=l XJ 1-) this matrix by E.. We can now write the joint density function of y .,,... ,y. as f&il'..-rZiv'-Z) = ,n |2ttZ| Sexp[-^ij2 Zij] -%v _r ,. : ,_ , .-1 = |2ttZ| expf-ij Z (vf. Z y,,)] j=l i: 1J 1, v , exp[-3s Z tr(W.Z v..)] j=l XJ x:i 20 = |2ttZ|~2V exp[-h Z tr(E v.. y.'.)] = |2ttZ|~2V exp[-Jj tr Z-1 E.] = g(E.;Z) so that from the set {x. ,y . , ,y .„,... ,y . }, (E.,x. ) is sufficient. Thus, we may assume that we have, independently, E. ~ W (Z, V, 0) l m x. - N (y, " Z + LL') — i . m — n 1 < i * g. Note that g_ _ g _ _ _ _ E '+g(* -Ji) (2£ -£) '/ i=l " x * i=l ' ... g _ where x Z x. /g. Then putting W = (l/n)Z + LL ' , i=l :Lm we can write the joint density function of xn ,x„ , . . . ,x as J — 1. —2 . g. f(xx ,x2 ,...,1 ?JJ,W) = JI \2-nW\~iiexp[-h(xi -yJ'W-1^ -y ) ] g. ^_^ ,_j,0 g _-, _ = | 2ttW| 2yexp[-^ Z tr ( (x. -y)' W (x. - y) ) ] i=l 1- 1' = | 2ttW I 2yexp[-^ Z tr(W (x. -y) (x. -y)'M i=l 1* x- ,-^a -1 ® — — = |2ttW| 2yexp[-Js tr(W Z (x. y)(x. -y)' ) ] i=l x* 1- 21 = |2ttW| 2yexp[-lsg tr W (x -y) (x -y)' -h tr(W X E (x. -x ) (x. -x )' )] i=1 l. .. 1. .. 2ttW| 2yexp[-^g tr W (x -y) (x -y)' ] _1 g _ _ _ _ x exp[-h tr W E (x. ~x ) (x . -x }' ] i=l x* • * 1* = g(x ,H;jj ,W) , g _ _ _ _ where H = n E (x . -x ) (x . -x )' . Hence, from the set { x, , . . . ,x } , (x ,H) is sufficient for (y , (1/n) E + LI/ ) . Also if we let c denote a constant, we can write the joint density function of E,,...,E as * 1' g f(E1 >Eq;Z) = c n |Ei|^(v"m"1)exp[-32 trfE^E..)] g i=l - c expf-J^ tr(E_1 E E.)] II | E . | ^ (V"m_1) i=l 1 i=l x = g1(E;Z)g2(E1,E2, . . . ,E ) , g where E = E E.. Thus, from the set {E, , . . . ,E } , E is i = l ^ sufficient for E. Then we may assume that we have, independently, ^.. ~ Nm(^'gl{(E+nLL' M' E - W (Z,e,0) , m H~ W (Z+nLL' ,h,0) , 22 where e = g(n-l) and h = g-1. The problem is to estimate y, £, and LL ' or, equivalently , to estimate jj, Z , and M where M = nLL ' . Recall that L is not uniquely defined so that if LL ' is an estimate of LL ' , then any L, such that LL ' = LL ' , is an estimate of L. The likelihood function of (jj,Z,M) can be expressed as K (I,e)K (I,h) hlh ,. , , ,. f(x ,E,H) =-2—^ r m ^h rr IhI^^-^IeI^6-^1) — (E+M) r|l+MHn|z|^e gn ' ' iii x exp[-^(x -y) ' (— (E+M) )_1(x -y)-^tr (E^EJ-Jjtr (E+M)_1H] , where if1 (I, v) » 2lsmv *tal(m"1) S r(%(v-j+l)). The logarithm of the likelihood function, omitting a function of the observations, is -^trE~ E-2$e£n|Z| -%tr (E+M) _1K -*sh in | E+M' - Jj £n|Z+M| - ^(x -y) ' ( (1/gn) (E+M) )-1(x -y ) . We seek the solution, (y ,E ,M) , which maximizes the equation above, or equivalently, the solution which minimizes tr E~ E-;-e£n|E| + tr (E+M) _1H+ (h+1) £n | E+M| + (x^-y) '( (1/gn) (E+M) )_1(x -y ) . (2.1.2) Before we can minimize the above equation, we need some results on differentiation. Let W (mxm) , X(raxm) , and Y (mxm) be sym- metric matrices, and let z(mxl) and a(mxl) be vectors. The proof of the first result can be found in Graybill [1969:267]. 23 Lemma 2.1.1 9£n|x| = 2X"1 - dgtx"1) 3X Lemma 2.1.2: ^nljj+Y| = 2(X+Y)_1 - dgUX+Y)-1). Proof: Let V = X + Y. Then 3£n|x+YJ = 3£n|v[ = z z 3£n]v] 9vpq = 3 An 1 V | 3x. . 3x. . l(A,B;D,e,h) = e[tr A_1 + £n|A|] +h[tr B_1D +£n|B|], where A, B, and D are m x m matrices. We assume that D is diagonal with distinct, descending, positive diagonal elements; that is, D = diag (d, ,d~, . . . ,d ) with d, > d~ > . . . > d > 0. i z m i z m We are interested in minimizing cf> (A,B;D , e ,h) subject to s (A,B) e C = {(A,B): A e P , B£P , B -A £ U P.}, where P. m m -i=o 3 3 is the set of all symmetric, nonnegative definite matrices of rank j. In this section it will be shown that the required absolute minimum occurs when both A and B are diagonal. 28 The proof of this result relies mainly on a lemma regard- ing the stationary points of the function g(P) = tr PB P'D where P (mxm) is orthogonal. Lemma 2.2.1: Consider g(P) = tr PXP 'D where P (mxm) is such that PP ' = I , and X (mxm) and D (mxm) are both symmetric and positive definite. It is assumed that D is diagonal with distinct, descending, positive diagonal elements. Then the stationary points of g(P) occur when PXP' is diagonal. Further, the absolute maximum of g(P) is m max g(P) = E d.ch. (X) , P:PP'=I i=l 1 and the absolute minimum of g(P) is m min g(P) _ E dm+1_ichi ch. (X) for i = l,2,...,m. If Y is positive definite, then ch. (X+Y) > ch. (X) for i = l,2,...,m. Lemma 2.2.3: The function (A,B; D , e ,h) has an absolute minimum over the set of solutions C = { (A,B) : A e P , B e P , s m m B - A e U P . } . j=0 3 Proof: Since B is positive definite, it follows that B is also positive definite, so that the diagonal elements of B~ are positive. Then we find that -1 m -1 m -1 -1 tr B D = I (B ) . . d. > d E (B ) . . = d tr B . , n l m . , n m i=l i=l m _ , m _ , = d I ch (B L) = d £ (ch (B)) m i=1 i m .=1 i since ch . (B ) = (ch ,, .(B))-. Hence, using the fact that l m+l-i m for any matrix X (mxm) , tr X = £ ch. (X) and m |X|= II ch.(X), we see that i=l x 31 ™ -1 (J)(A,B;D,e,h) > e Z ((ch. (A)) + £n(ch. (A))) i=l m -1 + h I (d (ch. (B)) + ln(ch. (B) ) ) . (2.2.3) . , m i i i=l From Lemma 2.2.2 we know that ch . (B-A) < ch. (B) , since A is positive definite. Then C can be written C = {(A,B): ch. (A) > 0: i = 1,2,. ..,m; cfcu (B) > 0: i = l,2,...,m; 0 < ch^ (B-A) < chi (B) : i = l,2,...,s; ch . (B-A) = 0 : i = s+1 , . . . ,m; A = A ' , B = B ' } . The closure , ~C , Of C is { (A,B) : ch . (A) > 0: i = l,2,...,m; ch. (B) > 0: s s i i i = 1,2,..., m; 0 £ ch. (B-A) £ ch.(B): i = 1 , 2 , . . ., g; chi (B-A) = 0: i = s + l,...,m; A = A',B = B'}. Since 0, it has an absolute minimum over C , since C is closed. Note that from Lemma 2.2.2 if ch.(B-A) = ctu (B) for some i, then it must be true that ch (A) =0, since A must then be m positive semidef inite. Thus, for every (A,B) £ cs ~ cs it: must be true that ch (A) = 0 or ch (B) = 0 or both. It then m m follows from (2.2.3) that cf> (A,B; D ,e ,h) = °° whenever (A,B) £ ~C - C . Hence, (A,B; D,e,h) has an absolute minimum over C . s Lemma 2.2.4: Suppose the function f (x) , minimized over x e S, achieves a minimum at x = a. Let the set S, be such that for any x £ S-S-. , there exists an x, £ S, such that f(x ) < f (x) . Similarly, let the set s2 be sucn that for any x es~S2' there exists an x„ e S- such that f(x„) < f(x). Then it follows that a £ S1 n S 2. 32 Proof: Suppose a f S^ Q S 2. Then either a t S, or a t S ? or both. However, if a t 5^, then a e 5-5-,, and there exists no x1 £ 51 such that f(x;[) < f (a) , since f is minimized at a. This then is a contradiction, so it must be true that a e S, . Similarly, if a £ S2 , then a e S-S~, and there exists no x2 e 52 such that f(x2) < f (a) . This also is a contradiction, so it must be true that a e S2 . Hence, it follows that a £ s1 n S2. In Lemma 2.2.3 it was seen that the function $ (A,B;D ,e,h) has an absolute minimum over the set C . We will now show s that this absolute minimum will occur only when both A and B are diagonal. Lemma 2.2.5: The absolute minimum of <}> (A,B; D, e ,h) subject to (A,B) e C occurs when both A and B are diagonal. We offer two proofs. Proof 1: Define the sets 5, and S ? as follows: 51 = { (A,B) £ Cs: A is diagonal}, 52 = {(A, 3) e Cs: B is diagonal}. We want to show that if (A,B;D ,e,h) achieves a minimum at (A,B) = (A*,BA), then (A*,BJ e s± AS.. Now with A = D 2AD 2 and B = D 2BD 2, 0 (A,B;D,e,h) - e[trA_1+£n|A[ ] + h [ trB_1D+£n | B | ] = e[trA~ D~ +£n|A|] + h [trB-1+£n | B | ] +(e+h)£n|D| = (B,A;D~ ,h,e) + (e+h)£n|D|. 33 _3^ Note that since D 2 is positive definite, (A,B) £ Cg if and only if (D~ 2AD~ 2,D~ 2BD~ 2) = (A,B) e C . Thus, minimizing 4> (A,B;D,e,h) subject to (A,B) e C is equivalent to minimiz- ing (A,B;D,e,h) . Now arbitrarily fix (A,B) e Cg and consider (PBP ' ,PAP ' ;D~ ,h,e) for all orthogonal P. Clearly the terms £n|PAP'|, tr PB~ P', and £n | PBP ' | are constant for all orthogonal P, so that 4> (PBP ' ,PAP ' ;D~ ,h,e) is minimized with respect to P when tr PA~ P 'D is minimized. It follows from Lemma 2.2.1 that all the stationary points, and thus the absolute minimum, occur when PAP' is diagonal. Hence, for any (A,B) e C - S, there exists an (A-^B^) e S1 such that (J) (B1,A1;D_1,h,e) < cf>(B,A;D~ ,h,e). — k — \l But since A = D AD 2, we know that A is diagonal if and only if A is diagonal. So we find that for any (A,B) e C - S^, there exists an (A1,B-L) e S1 such that tHAwB-]/ D, e,h) < <$> (A,B;D,e,h) . In a similar manner now arbitrarily fix (A,B) £ Cg and con- sider (J> (PAP ' ,PBP ' ;D,e,h) for all orthogonal P. Clearly this is minimized with respect to P when tr PB~ P 'D is minimized, since the terms tr PA-1P', £n | PAP ' | , and In | PBP ' | are con- stant for all orthogonal P. So from Lemma 2.2.1 it follows that all the stationary points, and therefore the absolute minimum, of (PAP ' ,PBP ' ;D ,e ,h) occur when PBP' is diagonal. 34 This implies that for any (A,B) e C - S~ ' there exists an (A2,B2) e S2 such that (J»(A2/B2;D,e/h) <

(A,B;D ,e ,h) , then the diagonal elements of D 2A^D 2 are increasing and the diagonal elements of B* are decreasing. The second proof of Lemma 2.2.5 utilizes the concept of "majorization" (see Marshall and Olkin [1974] ) . Definition 2.2.6: Let x and y_ be real mxl vectors with i element x. and y., respectively, and i largest element x... and y,-w respectively. We say that x major- izes y_ and write x > y_, if s s Z x,.. 2 I yM> for s = l,2,...,m, i=l [±) i=l y±> with equality when s = m. We will need some results which, while well known to workers in the area of majorization, may not be readily accessible to others. We prove the results here for the benefit of the uninitiated reader. Lemma 2.2.7: If S (mxm) is doubly stochastic, then x > sx = v.. Proof : Since S is doubly stochastic, it follows that s. . > 0 for all (i, j) , and lj J 35 * sij = 1 for i = 1,2,. ..,m, j = l m z sii = 1 for j = 1,2,. . .,m. 1=1 J Thus, for 1 < t ^ m there exists k1,k2,...,kt such that t m ■ Z Y(i) = Z (Sk i+Sk i + *-'+Slc i)xi' 1=1 u' j=i KlJ K23 t3 J Clearly when t < m, m Bk1j+Sk2j + ---+Bktj * .^Sij = 1 for J = 1'2 m, and Z (sk i+sk i + -"+sk -i> = fc- j=l Kl^ V kt: Then when t < m, t i 1 y(i) = Z (sk i+sk i + -"+sk -i>xi i=l u' j=i Ki^ k2j ktD D t e i=l If t = m, then * Ex.... -i (i) sv -;+su ^ + *..+s, . = E s. . = 1, kl: k2^ V i=l ^ so that t t E y... = E (s .+s, .+...+S, .)x. i=l (1) j = l kl=> k2^ V ^ t t E x . = Ex.... j=l ^ i=i (i) 36 ILl Lemma 2.2.8: If x > y_ and a . > a (2. > ... > a . . > 0, then m m .Z=1x(i)a(i) * .Z=/ia(i)- Proof: Put d. = x..> ~ Yj_« Then .^(D^i^i) = ^Vd) = dl(a(l)_a(2)) + (d1+d2) (a(2)-a(3)) + (d1+d2+d3) (a(3)-a(4) ) + (d1+d2+...+dm_1)(a(m_1)-a(m)) + (d1+d2+...+dm)a(ra). The last term is zero, since mm m = 0 I d = E (x(i)-y.) = I x - E y .} i=l 1 i=l {1) x i=l llj i=l u' The partial sums are nonnegative, since t t t t t E d. = E x... - E y. * E x... - E y,,M > 0. i=l 1 i=l (1) i=l X i=l (1) i=l (1) Further, the differences a(1)-a(2), a (2) -a (3) , . . . , a (m_1) -a (m) are nonnegative. Hence, the result follows. Lemma 2.2.9, Corollary: If x is an ordered vector, that is, x, ^ x- 2. ... > x , S is doubly stochastic, and 12 m a is also an ordered vector, then x'a > (Sx) 'a. then 37 Lemma 2.2.10: If x > y and a.,, > a.„. >...> a. , > 0, - *■ (1) - (2) - - (m) - ' m Z x. . . °. , , . . £ Z y . a i=1 d) (m+l-i) i=i x (m+l-i) Proof: The proof is similar to that of Lemma 2.2.8. Letting d. = x . . . -y . , we have 3 l (l) J l m m .^(i)"^ (m+l-i) = ^V (m+l-i) = dl(a(m)-a(m-l)) + (dl+d2)(a(m-l)-a(m-2)) + (dl+d2+d3)(a(m-2)-a(m-3)) + (d1+d2+...+dm_1)(a(2)-a(1)) + (d1+d2+...+dm)a(1). t We have seen that the partial sums , E d . : t=l , . . . ,m-l, are i=l 1 m nonnegative and Z d. is zero, so that the last term is zero 4-1 1 (m)~a(m-l) ,a(m-l)"a(m-2) '** * ' i=l Further, the differences a, ,-a, ,,,a,_ , , -c 0,-,-a,,, are negative or zero. Hence, the result follows Lemma 2.2.11, Corollary: If x is an ordered vector, and y_ = Sx with S doubly stochastic, then m m •j^i0 (m+l-i) * J^ia (m+l-i)' Furthermore, if aM, > a,-,. > ...> a, w then there is (1) (2) (m) equality only if y_ = x. 38 We are now ready for the second proof of Lemma 2.2.5. Recall that we need to show that the absolute minimum of cf> (A,B;D,e,h) subject to (A,B) e C occurs when both A and B are diagonal. Proof 2 (Lemma 2.2.5): Let S-. and S 7 be defined as before; that is, S, = { (A,B) e C : A is diagonal}, S2 = { (A,B) e C : B is diagonal}, and recall that we need to show that if $ (A,B: D,e,h) is min- imized at (A*,B*), then (AA,B*) e S, fl S~ . Let 3i - 3o - • • • - 8 > 0 be the latent roots of B , and PB_1P' = diag(3m,6m_1,...,61) . Then { (PAP' ,PBP';D,e,h) - $ (A,B;D,e ,h) }/h = tr PB-1P'D - tr B-1D m m / m 2 = Z 6 J.1 -d. - E Z 3 _,, .P^ Id. j=1 m+1-3 ] j=1 ^.=1 m+l-i 11) 3 m m = I B -d .. . - I y .d Al . , (2.2.4) j=l 3 m+1-3 j=l 3 m+1_3 where y_ = P^Ji'Ji ' = ( B-, / 62 / • • • , 3 ) , and P2 is the matrix with (i,j) element p ,, . ,, .. Since PP ' = P 'P = I, we see that P~ is doubly stochastic. Also d = (d, ,d~,...,d ) 2 J — 1 ' 2 ' m and _B_ are ordered vectors, so by Lemma 2.2.11, equation (2.2.4) is not positive. Furthermore, d, > d„ > . . . > d so that r 12 m 6(PAP' ,PBP ';D,e,h) < $ (A,B;D , e ,h) , 39 with equality holding only when B = diag ( B , B ,,..., 8-, ) . Therefore, for any (A,B) e C - S- there exists an (A„,B_) e S~ such that (A,B;D,e,h) = e[trA~ d" +£n|A|] + h[trB~ +An|5|] + (e+h)£n|D| (2.2.5) = (B,A;D~ /h,e) + (e+h)An|D|. Let a, > a_ > . . . > a > n be the latent roots of A 1 2 m u and QA Q' = diag (a,,a2,...,a ). Then by an argument iden- tical to the previous one we find that <$> (QBQ' ,QAQ ';D_1,h,e) < c(> (B , A; D_1 ,h , e) , with equality holdinq only when A = diag (a, ,a_ , . . . ,a ). ^ 12m From (2.2.5) it follows that $ (D2QAQ'D2,D 2QBQ'D2;D,e,h) < tj) (D 2AD 2 ,D 2 B D 2; D ,e ,h) , with equality holding only when A = diag (a, ,a„ , . . . , a ). h h - 1 -1 Note that D2QAQ'D2 = diag (d, a. , . . . ,d a ). Thus, for any ' 1 1 m m (A,B) e C - 5, there exists an (A, ,B, ) e 5, such that si ill (A.,,B, ;D,e,h) < (A,B; D ,e ,h) . The result now follows from Lemma 2.2.4. Lemma 2.2.12, Corollary: Let R be some restriction on the latent roots of A or B or both, and let C be the subset s of C such that (A,B) e C implies that R is satisfied. Since (A,B) e C*R if and only if (PAP',PBP') e CR for any orthogonal P, it follows that the minimal value of 40 T3 (A,B;D,e,h) over (A,B) e C occurs when A and B are diagonal. For example, if the latent roots of A were known to be pro- portional to a given set, then the minimal value of (j> (A,B;D,e,h) over (A,B) e C occurs when A is a diagonal matrix with diagonal elements proportional to this set. 2 . 3 The Maximum Likelihood Estimates In this section we seek the maximum likelihood estimates s of Z and M subject to the constraints Z e P and Me UP.. m j=0 3 Recall that the likelihood function of (Z,M) is K (I,e)K (I,h) , ,, . . , . f(EfH) = m |H|%(h-m-l)|E|^(e-m-l) Z+VL\hii\z\h& x exp[-JjtrZ 1E-%tr (Z+M)_1H] . The logarithm of the likelihood function, omitting a function of the observations, is - ^trZ~1E-J3e£n| Z| - %tr (Z+M) _1H-Jshiln | Z+M | . We seek the solution, (Z,M), which maximizes the above equa- tion, or equivalently , the solution which minimizes trZ_1E+e£n| Z | + tr (Z+M) -1H+h£n | Z+M | (2.3.1) s subject to Z e P and Me UP.. j = 0 ^ Let E* = (l/e)E and H* = (l/h)H. Note that since E* and m H* are both symmetric matrices, and E^e P and H* e U P., there exists a nonsingular matrix K(raxm) such that KE*K' = I and KH*K ' = D, where D = diag (d1 ,d2 , . . . ,d ), and dl > d2 > "'' > dm > ° are the latent roots of H*E* . 41 Then with E = KZK ' and M = KMK ' , (2.3.1) can be rewritten etrK'-1E~ K-1I+e£n| E |+htrK'-1 (E+M) ~1K~1D+h£n | E+M| = e[trE-1+iin| E | ] + h [tr (E+M) -1D+£n | Z+M| ] - (e+h)£n|K[2 =

(E,Z+M;D,e,h) subject to E e P and Me UP. or, J m . _ i ' ... - , D=0 J equivalently , (E,E+M) e C . But from Lemma 2.2.5 it is known that the minimal solution to $ (E ,E+M;D ,e,h) is such that E and E+M are diagonal, and in addition, it is known that the diagonal elements of D ^ED are increasing while the diagonal elements of E+M are decreasing. Consider the function g(x,y) = e(^+2nx) + h^+£ny) , (2.3.2) where d > 0. Differentiating (2.3.2) with respect to x and y, we get the equations -i + i-o. 2 x ' X y2 y which yield the minimal solution xn = 1 and yn = d. If instead we wanted to minimize (2.3.2) subject to x = y, (2.3.2) would reduce to g(x) = e (- + Hnx) + h (- + Jinx). (2.3.3) 42 Then dg (x) . 1,1.,,, d , 1. d_- = e(_ _ + , + h(_ + } = 0/ x x so that x, = y. = — -r- minimizes (2.3.3). 1 1 e+n Now let f(d) = Q(xl,y1) - g(xQ,y0) - ~ / e+h , p„ ,e+dh, . , ,d(e+h) ,s+dhn " e (i+dh + £n (^F) > + h ( e+dh + £n (-eTF) ) - (e+h+h£nd) _n (d-1) hx . .. , (d-l)e. , . ,.. . ,e+dh. e(1 " -^dh-] + h(1 + TRBr) + (e+h) in (-^j^-) - (e+h+hilnd) = (e+h)Hn(^) - hind, e+h Differentiating f (d) with respect to d and noting that e > 1, h > 1, we find that when d > 1 df .(d) _ h(e+h) h dh (e+h) -h (e+dh) eh (d-1) dd e+dh d " (e+dh)d (e+dh)d ' In other words, the difference g(x,,y,) - g(xn,y_) is an increasing function of d when d > 1. Now with X = diag (x, ,x„ , . . . ,x ) and Y = diag (y, ,y_ , . . . ,y ) consider minimizing m , m d . <|>(X,Y;D,e,h) = e I (7-+ tax.) + h T. (— + In y.) (2.3.4) i=l Xi x i=l Yi x subject to (X,Y) e C , which in this case implies that y. > x. > 0 for all i, and x. = y. for at least m - s of the 1 1 1 d 1 i's. Suppose that dn > d„ > ... > d > 1 > d ,>...>d > 0. 12 r r+1 m Using the fact that f (d) is increasing in d for d > 1, 43 it then follows that the minimal solution to (2.3.4) is (X ,Y ) , where if r > s, 'x . = y . = (e+d.h) / (e+h) for s+1 < i < m, si J si 1 ' x . = 1, y . = d. for 1 < i < s, «. si 2 si 1 ' and if r < s , fx . = y . = (e+d.h) / (e+h) for r+1 < i < m, Ix . =1, y . = d. for 1 £ i < r. v. si Jsi 1 Thus, cj)(E ,E+M;D,e,h) is minimized subject to (E,Z+M) £ C at Z = X , s M = Y -X , s s' so that the maximum likelihood estimates of I and M are E and M, where E = K-1X K'-1, s ' M = K_1 (Y -X )K'_1. s s We now present an example to illustrate the computation involved in deriving the maximum likelihood estimates. Consider model (2.1.1) in which we take m = 4, g = 21, n = 6, Z = I, and M = diag (99 , 24 , 0 , 0) . Hence, e = g(n-l) = 105 and h = g-1 = 20. Generating a matrix E from the dis- tribution W. (1,105,0) and a matrix H from the distribution W4 (I+M,20,0) , we obtain 44 E = 69.1329 H = 1845.85 4.07476 127.055 63.5986 688.962 -5.12762 -3.77638 116.342 -16.5227 1.14908 20.1453 -9.94924 20.4629 8.12511 100.186 -1.43363 -8.61601 -,0100181 12.2617 With E* = (1/105) E and H* = (1/20) H we need to find a non- singular matrix K such that KE^K' = I and KH*K' = D, where D is a diagonal matrix. Let D, = diag (ch, (E*) , . . . ,ch4 (E*) ) , and let P be the orthogonal matrix for which the i column is the characteristic vector of E* corresponding to ch^ (E*) , then, since E* is symmetric, P'E*P = D. Similarly, let D = diag(ch1(D1i5P'H^PD^ 2) ,. . . ,ch4 (D12P'Hi,PD135) ) , and let Q be the orthogonal matrix for which the i column is the — h —h characteristic vector of D, P'H^PD, 2 corresponding to — \ —it — 1? — h ch. (D,^P fH*PD,^) , then, since D1 2P 'H^PD-j^ 2 is symmetric, Q 'D, 2P 'H*PD, 2Q = D. Thus, we may take K = Q 'D1 2P ' . Using the above decomposition for K, we find that, for our example, 1.24522 .0181884 .00831611 -.000914637 -.0464049 -.925978 -.00767896 -.0158712 .0380069 -.0477042 .940375 -.153048 .130133 .213476 -.237376 -.994866 and D = diag (142 . 729 , 29 . 6669 ,. 91847 ,. 625404 ) . Note that dy > 1 and d_ < 1, so that r = 2 Simple calculation yields XQ = YQ = diag(23. 6766, 5. 5867, .986955, .940065) , Xx = diag(l, 5. 5867, .986955, .940065) , 45 Y1 = diag(142.729, 5.5867, .986955, .940065), X2=X3=X4= diag(l, 1, .986955, .940065), Y2=Y3=Y4= diag(142.729, 29.6669, .986955, .940065). Hence, if we let E. and M. be the -n^ximum likelihood estimates of I and M, respectively, subject to the constraints Z e P i and Me U P . , we find that j=0 3 h = 15.3199 .541388 6.52813 -.173203 -.0210184 1.0919 -.0910632 .0947756 .064921 .899582 J MA = (0) , .665836 .246703 -.046356 -.0924036 h = 6.5222 -.0184676 .0947486 1.0908 .0649326 n— .899582 „ 91.5881 1.84178 -.792796 — i .00837764 .0370371 -.0159426 .000168469 kl - .0068625.2 -.000072518 s~ .000000766- .6578 .040037 -.0471092 -.0889834'"1 1.20736 -.0378372 .182707 Z2=E3=E4= 1.09073 .0652532 .898126 M =M =M = 2 3 4 46 91.6383 3.13344 -.788088 -.0129987 33.2548 .105117 -.549569 .00730371 -.002076 .00909865 Further commentary on these data will be made in Sections 3.6, 4.2, and 5.4. 2. 4 The Likelihood Ratio Test s Recall that C = { (A,B) :AeP ,LeP ,B-Ae UP.}, s m m j=(J ] and suppose we know that (Z,E+M) e Q = C . We wish to test, s say, the null hypothesis that (E,Z+M)e co = C , O C . The s-l s alternative hypothesis is then that (E,E+M)e n - to = C - C i- Thus, we are testing the hypothesis H^s) : rank (M) < s-l against the hypothesis h|s) : rank (M) = s. We adopt the likelihood approach and compare max f (E,H) with max f(E,H). Specifically, we look at max f(E,H)/»a*" f(E,H) = A e (0,1]. With the matrices X = diag (x -, ,x „,..., x ) and s r si s2 sm Ys = diag(ygl,ys2,...,ysm) given by x s X s i = y^-: = (e+d.h) / (e+h) for s+1 <, i <. m, i = 1'ysi = di for 1 * i £ s, 47 if r > s, and . = y . = (e+d.h) / (e+h) for r+1 < i < m, ;i -*si 1 x . = 1, y . = d. for 1 < i < r, si Jsi 1 if r < s, the maximum likelihood estimators, Z0, of Z and, MQ, of M when the parameters are restricted to lie within Q , are given by h = k"1xsk'_1' where K is a nonsingular matrix. Similarly, the maximum likelihood estimators, Z , of Z and, ft , of M where the CO ' CO parameters are restricted to lie within lo , are given by K = K'^s-lK'"1' It should be noted that if r < s, then X = X , and s s-1 Y = Y , , and if r - s, x . = x , . and y . = y , . s s-1 si s-l,i Jsi is-l,i only for i / s. The likelihood ratio, A, is A = max f(E.H) co max f(E,H) n expt^trZ-V^rtZ^)-^] \t^J^\lj^ 48 exp[-JsetrXa -JshtrYg D] |Ys_1| sn|Xg_1r |Y |^|X \hG 1 s1 ' s ' s-11 ' s-11 since, if r < s, etrfx'^-X-1) + htr(Y^1-Y~1)D (2.4.1) = etrtX^-x"1) + htr(Y~1-Y~1)D = 0, and, if r > s, (2.4.1) becomes e (x , -x ) + h (y ■ , -y d. s-l,s ss -^s-^s ^ss s / ,i-\ d h(e+h) = eje+hi _ _s h e+d h e+d h s s (e+d h) (e+h) s -- (e+h) = 0. e+d h s So we have -%(e+h) ,. ,e+d h dhh{ s s [ e+h / if r > s, if r < s. Since d-, > d0 > . . . > d >1> d_,, > ...> d_ > 0, 1 I r r+i m clearly, r > s if and only if d > 1. Hence, we can write if ds > 1, if 0 < d £ 1. s 49 Now upon taking the derivative of A with respect to d over s the range d > 1 , we get dcT = *h ds s M ,,e+d h^(e+h)-l *sh-lf s e+h e+d h e+h -d = hh d' Ve+d8hV,s(e+h)-1 \ e+h e-d e s e+h _ which is negative for d > 1. Thus, A is a decreasing function of d over the range d > 1. In addition, , , fe+d h M 2- s y e+h rh(e+h) < 1 for d > 1, s ' with equality when d = 1, so that A is a decreasing func- tion of d . (s) The likelihood ratio test rejects H* for small values of X . Since A is a decreasing function of d , the likeli- (s) hood ratio test rejects H^ for large values of d . Now recall that with H* = (l/h)H and E* = (l/e)E, there exists a nonsingular matrix K such that KH*K' = D and KE^K' = I. It follows then that d.: i = l,2,...,m are the solutions to H+-dEj = iKH.K'-dKE^K' D-dl = 0, th so we observe that d is the s largest solution to H*-dE. = 0 (2.4.2) With cjk = d^/e : i = l,2,...,m, (2.4.2) can be written H " f E| = 0, or H - = 0. (2.4.3) 50 (s) Hence, we would reject H' for large values of 0 = d h/e, where <$> is the s largest solution to (2.4.3). It is of particular importance to recall that H ~ W (E+M,h,0) and E ~ W (Z,e,0), independently. (s) We have seen that the likelihood ratio test rejects H* when (J) >c for some constant c. Now we want to choose for s the constant c some number, which we will denote by c(a,m,s) to indicate its dependence upon a, m, and s, such that P ((f) > c(a,m,s) | (E,M) ) < a for all (£,Z+M) e C . For c(a,m,s) we propose the a level critical value for the largest root, 9,, from amongst the m-s+1 roots of |w,-9W-| = 0, where W, - W , , (I ,h-s+l , 0) and W0 ~ W ,,(I,e,0), inde- 1 m-s+1 2 m-s+1 pendently. That is, we take c(a,m,s) such that P(91 > c(a,m,s)) = a. Justification for this choice of c(a,m,s) will be given in the next chapter. CHAPTER 3 PROPERTIES OF THE sth LARGEST ROOT TEST 3 . 1 Introduction In this chapter we investigate some properties of the s largest root test presented in the previous chapter. It would be desirable to show that this test is the uni- formly most powerful test, but we were unable to do so for general m. However, in Section 3.2 we show that for m = 1 the test is uniformly most powerful. Also, in Sections 3.3 and 3.4 it is shown that the s largest root test is an (s) (s) invariant test of H' against H, and is the test obtained by the union- intersection principle (see Roy [1953]). Finally, in the last two sections we discuss an important monotonicity property of the roots . : i = l,2,...,m, and then use this property in deriving the asymptotic distribution of cf> . 3. 2 The Uniformly Most Powerful Test for m = 1 For m = 1 the problem reduces to that of the univariate random effects model discussed in Section 1.1. Recall that — 2 2 2 n we have x «. N(u,(a +na ) /gn) , u ,. a y , and z a z "e ,222 22 v .. (a +no ) Xh, independently, where u,a , and a are all 2 unknown, and we wish to test the hypothesis H„ : a = 0 aqainst 0 a H, : a2 > 0. 1 a 51 52 Suppose that for some set of points, y', in the space of (x ,u,v) , we reject HQ whenever the experimental (x ,u,v) belongs to y'. Let 6 (y ' ; y ,a2 ,a2) = P [ (x ,u,v) ey ' I V ,a2 ,a2] and require that y' be such that 8(y';y,a2, 0 ) = « Let x = u + v and y = v/u, so that u = x/ (y+1) and v = xy/ (y+1) . Then the Jacobian, ||jj| , of x ,u, and v with respect to x , x, and y is ||j]| = || 3 (x ,u,v) /3 (x ,x,y)|| = x/(y+l)2. Then 3(y';y,a ,a ) = /// g (u; a2) g (v; a2 a2) g_ (x ;y,a2,a2)du dv dx, = /// f(x,y;a ,a )f (x ;y,a ,a )dx dy dx , -. ^ Ut U • • z ex » • where y = { (x^ ,x,y) : (x ,u,v) e y'} and where, independently, 2 2 u - a v , zAe 2 2 2 v ~ (a +na ) y, / z a Ah x ~ N(y, (a2+na2) /gn) , so that ^%(e+h)-l „„„r „/n(.2,„2 f (x,y;o^,a^) = f (x,y) = o 9 x2ieT"riexp[-x/2(o>naz)] z a (2(a2+na2))3s(e+h)r(J5(e+h)) (l+na2/aVe y^"1 (y+l) ^(e+h) exp [-xna2/2a2 (a2+na2) (y+1, ] a z =_ a z z a J ' Btte^h) and (x,y) is independent of x . We note that when a2 = 0, a x and y are independent; that is, f(x,y;a2,0) = f ± (x; a2) f 2 (y) = f^xjf^y, 53 where 2 2 x ~ azxe+h ' y . (h/e)Fg. Letting y(x,x ) = {y: (x ,x,y) e y} , we can write 00 00 3(Y';u,a2,0) * / / / f(x,y)fQ(x )dy dx dx ^ z -oo o y (x,x ) = / / f.(x)fn(x ) /_ f,(y)dy dx dx — 0 L U •• Y(x,x ) Putting h(x,x ;a2) = h(x,x ) = /_ f2(y)dy, z ' * y (x,x ) we see that 00 00 &(y';\i,a2 ,0) =11 f1(x)fQ(x )h(x,x^)dx dx . x- -oo 0 2 — 2 When o = 0, (x,x ) is sufficient for (oz,\i). Further, {f, (x) f n (x ) : -co < y < oo, a > 0} is a complete family (see, 2 for example, Lehmann [1959:130]). Thus, since & (y ' ;\i ,o z>0) = a„, we must have h(x,x ) = a_. Now let vi = { (x ,u,v): y = v/u > c} where c is some 2 2 2 constant. Then with 3 = (q1 ,q2)' where qx ~ (az+naa) xe+h 2 2 and q„ ~ N(y,(a +na ) /gn) , independently, [3(Y;;y,a2,a2) - 6(Y';y,a2,a2)] (l+no2/a2)_3ie 2 2 = E (d (q; a , a ) ) , -1 z a where the expectation is with respect to the distribution of 3. 54 Here d&al>al) = y {}f2(y)Q(y'ql)dy ' y{)f2(y)Q(y'qi)dy' where y*0 = Y* ^ ,^2) ' Y () = Y (ql'q2) ' and Q(y,qi) = exp[-q1noa/2az(az+naa) (y+1) ] . Therefore , d^°z'aa) =y,()^()f2(y)Q(y'ql)dy " Y()-Y*()f2{y)Q(y'ql)dy' Since Q(Y,q1) > Q(c,q1) when y e y*()-Y()» Q(y,q1) < Q(c,q1) when y e y()-Y*0# we find that d(a;a',a2) > Q(c,qi)[^()/Y()f2(y)dy - y () /^ () f 2 (y)dy] = Q(c,qx) ty/()f2(y)dy - Y{)f2(y)dy] = Q (c^) [a0-aQ] = 0. Thus, so that E(d(3;az,a^)) > 0, (Y*;u,°z,aa) > B(Y';U/az,aa) Therefore, amongst all critical regions of size aQ the crit- ical region which rejects HQ when v/u > c is uniformly most 2 2 powerful in a test of HQ: o^ = 0 against H-^ oa > 0. That is, the critical region $ > c, where is the only root of (v-, > d>„ > ... > d> , of rl 2 Ym I H— E I = 0 and the roots, 8 . > 9n > . . . > 6 , of ± Z m |KHK'-9KEK'| = 0, where K is nonsingular. Clearly |KHK'-8KEK' | = 0 implies |k| |h-8E| |k'| =0 so that I H-6E I =0, 56 and hence, 9. = . : i = 1,2, . . . ,m. Suppose now that 9. = . : i = 1,2/... /in are the roots of |H,-9E, I = 0 and i Ti ' ' 1 1 ' | H_— 4> E _ [ = 0, respectively, where E,, E„, H-, , and H2 are all positive definite, symmetric matrices. Then there exist nonsingular matrices K, and K~ such that Ei = KiKi' Hi = Ki$Ki' E2 = K2K2, H2 = K20K2, where $ = diag (A. , 4>_ , . . . , ). It then follows that 9 -l^l'V = (K2KIlElKi"lK2'K2KllHlKi"lK2) 21 = (K2K~1K1K;[K:[~1K^ ,K2K^1K1K-[K-[~1K^) = (K2K^,K2$K^) = (E~ , H_ ) , where g„ „_, e G since, clearly, K„K, is nonsinqular. So by K2K1 Z l Definition 3.3.1 {<$>: |H-AE| = 0} is the maximal invariant with respect to G. The s largest root, A , is clearly a function of (A, , A._ , . . . , ), and hence, by Lemma 3.3.2 the test statistic is an invariant test statistic for testing (s) (s) the hypothesis H' against the hypothesis H, . 3 . 4 The Union-Intersection Principle Suppose that in testing H-! : rank (M) < m-1 against f m \ H, : rank (M) = m, we adopt the rule R(m:m): reiect H„ if d> > c(a,m,m). U m Here A0 > . . . > > 0 are the roots of Ih-AEI = 0, yl 2 rm i r i 57 E ~ W (E,e,0) and H ~ W (Z+M,h,0), independently, and c(ct,m,m) is chosen such that P( > c(a,m,m) | H» ) < a. (s) (s) Consider now testing H* : rank (M) < s-1 against H, : (s) rank (M) = s. The hypothesis H' is true if and only if (s) the hypothesis „H,: : rank (FMF ' ) < s-1 is true for all F e S(m,s), where S(m,s) is the class of all (sxm) matrices (s) of rank s. Similarly, the hypothesis HQ is false if and Is) only if the hypothesis FHQ is false, and the hypothesis „H, : rank (FMF') = s is true, for at least one, and in fact F 1 (s) all F e S(m,s). Hence, we could think of HQ as Fesft.,.) FH0S' ^ "I'' " Ftift.,.) FH1S) and re^Ct H0S) if (E,H) e y = _ e*L * Y(F), where y (F) is the rejection r Eo ^m , S i (s) region appropriate to a test of the hypothesis HQ . The sizes of Y (F) : F e S(m,s) should be such as to produce a desired overall error of the first kind of the desired size. This procedure is known as the union- intersection procedure. (s) Note that we will reject Hq : rank (M) < s-1 if for some F e S(m,s), we reject fHqS) : rank (FMF') < s-1. Let A, >d)^ >...>*> 0 be the roots of I FHF ' - 4>FEF ' | = 0, IF y2F sF ' where, clearly, FEF ' ~ We (FZF ' ,e , 0) and FHF' ~ W (FZF ' S s + FMF',h,0), independently. Then by the rule R(s:s) we reject „H'S' if c (a ' , s , s), where a' is chosen to give J F 0 SF the desired overall error of the first kind of the desired (s) size. Hence, we will reject Hq if for some F e S(m,s), cf) _ > c(a',s,s), or equivalently , if max c(a',s,s) sF FeS(m,s) 58 We need the following results, the first two of which can be found in Bellman [1970:115]. Lemma 3.4.1: Let A(raxm) be a symmetric matrix. Then the smallest latent root of A may be defined as follows: ch (A) = min u'Au, where u is a (mxl) vector. The next result is well known as the Poincare separation theorem. Lemma 3.4.2: Let A (mxm) be a symmetric matrix. Then for any matrix F(sxm) such that FF ' = I ch.(A) > ch.(FAF') > ch , . (A) j j m-s+3 for j = 1, 2, . . . ,s. We need Lemma 3.4.2 to prove the following lemma. Lemma 3.4.3: Let A (mxm) be a symmetric matrix. Then max min u'FAF'u = ch (A), (3.4.1) F:FF'=I u'u=l S where F is a s x m matrix, and u is a m x 1 vector. Proof: Since A is symmetric, there exists an orthogonal matrix P (mxm) such that P'AP = A = diag(ch1(A), ch (A),...,ch (A)), and hence, for any F such that FF ' = I min u'FAF'u = min u'FAF'u, u'u=l ii'u=l where F = FP and FF' = FPP'F' = FF' = I. Then we can rewrite (3.4.1) as max min u'FAF'u. F:FF'=I u'u=l 59 Let F . (sxm) be the matrix with (F + ) . . = 1 for all i, and (F.) . . = 0 for all i 4 j. Then * ij J max min u'FAF'u > min u'F*AF£u = ch (A). F:FF'=I u'u=l u'u=1 ' *" Now by Lemma 3.4.2, for any F such that FF' = I , we know that min u'FAF'u < ch (A), u'u=l S so that max min u'FAF'u < ch (A) . F:FF'=I u'u=l S Therefore, it follows that max min u'FAF'u = ch (A). F:FF'=I u'u=l ta We have seen that the union-intersection principle leads (s) to the rule which rejects H^ : rank (M) < s - 1 in favor of H-fs) : rank (M) = s if max c(a;s,s). Note that 1 F£S(m,s) with T (mxm) andF(sxm) such that TT ' = E and F = FT , then for fixed F e S (m,s) | FHF ' - 4>FEF '|=0 implies |FTT HT' T'F' - cj>FTT~ ET'~ T'F'| =0, or |FT_1HT'-1F' - 4>FF'| = 0. (3.4.2) Since F is of rank s, so also is FF' (sxs), and thus, there exists a nonsingular matrix S(sxs) such that SFF's' = I. 60 So with F = SF we find that (3.4.2) implies |ft"1ht,~1f' - <$>l\ = 0, and clearly, FF ' = SFF'S' = I. Hence, it follows that max F = max min{cf):|FT_ HT'~ F'-l| ^=0} FeS(m,s) S F:FF'=I max min u'FT- HT'~ F'u, F:FF'=I u'u=l with the final equality due to Lemma 3.4.1. Now using Lemma 3.4.3 and the fact that the latent roots of T~ HT' ~ are the roots of |h-e| = 0, we observe that max tj) =

> c (a ,s,s). 3 . 5 A Monotonicity Property of the Power Function The test procedure developed in the previous sections depends on the latent roots, , , 2, . . . , cj> , of the random matrix HE . The distribution of these roots (see James [1964]), and hence the power function of our test procedure, depends upon the latent roots of the corresponding population matrix (E+M)Z as parameters. Let 6, >6^ ^ . . . >6 > 1 v ' r 12 m be the latent roots of (E+M)E , and note that with T defined such that Z = TT' (Z+M)Z 1 - 61 I = 0 implies |M - (6-1)1 I = 0, 61 so that |T~1MT" - (5-1) I | = 0. Since I is nonsingular, T is also nonsingular, and so the rank of T MT ' is the same as the rank of M. Hence, M has rank of at most s-1 if and only if 6 =1, and testing la) (a) the hypothesis H^ : rank (M) < s - 1 against H' : rank (M) = s (s) is equivalent to testing the hypothesis 111 ' : 6 =1 against (s) H, : 6 > 1. A desirable property of the test statistic 4> would be that it stochastically increases in 6 , and thus, that the power function increases monotonically in 5 - In this section we not only show that stochastically increases in 6 , but also that it stochastically increases in each 5.: i = l,2,...,m. This more general result will be utilized in the following section. We will first prove the result for the largest latent root, 4>, . That is, we will show that cf>, stochastically increases in 6 . : i = l,2,...,m. l Lemma 3.5.1: The test with the acceptance region , = ch, (HE-1) < c has power function which is monotonically increasing in each population root 6 . . The proof of Lemma 3.5.1 involves the followina three results, the first of which is due to Anderson [1955]. Lemma 3.5.2: Let y ~ N (0,Z,) and u „ N (0,I„), — m — ' 1 — m — ' 2 ' where Z~ - Z, is nonnegative definite. If to is a convex set, symmetric about the origin, then P (Yew) > ? (ueco) . 62 Lemma 3.5.3: Let the random vectors y. ,yn ,y and the matrix U be mutually independent, the distribution of y. being N (0,Z): i = l,2,...,n. Let the set to , in the space of m — {y_, ,y~ , . . . ,y_ ,U}, be convex and symmetric in each y_. given the other y_-'s and U. Denote by P^ , (to) the probability of the set co when £ = Z . . Then whenever Zn - Z, is nonnegative 1 2 1 r definite, Pv (to) > Pv (to). ll z2 Proof: Since Z, and Z ,. are symmetric and X, e P and 1 2 J 1 m m l0 £ UP. it follows that there exists a nonsingular matrix K such that KZ , K ' = I and KZ-K' = A = diag ( 6, , 60 , . . . , 6 ). 1 2 ' 1 2 m m Since it is assumed that Z0 - Z, e U P . , we know that 2 X j=0 3 6j_ > 1: i = l,2,...,m. Then y_. = K;^ - Nm(0,i) if j = j and y_* - Ky^ ~ Nm(0,A) if Z = Z2> Let oj* = {y_* ,£* , . . - ,£* ,U: * (y-, ,y9, . . . ,y ,U) e to}, then Pv (to) = P_ (to ) and P_ (to) = * P. (to ). So without loss of generality we can take £, = i and Z2 = A. Let A. = diag(61,92 6.^,1,9.^ 9m) , Ax = diag(ere2 ei-l^i'9i + i"-"em)' Ri = {^i: (^1'^2' - * ' '^n'U) £ Ui; Xn : J^i and u fixed}, where 8. e {1,6.}: j / i. Then from Lemma 3.5.2 it follows that PA.(Ri'Zj: ^i,U) > PA*(Ri|Zj: j*i,U). (3.5.1) Multiplying both sides of the inequality (3.5.1) by the joint 63 density of the temporarily fixed variables and integrating with respect to them, we obtain PA (co) ^ PA* (co) . i i Then by induction we have or equivalently , P.,- (co) > PA (co) , P (co) > Py (to) ^1 L2 Finally, the third result we need is due to Das Gupta, Anderson, and Mudholkar [1964]. Lemma 3.5.4: For any symmetric matrix B (m> X_ > . . . > A be the latent roots ^ 2 12 m of UV , and let to be a set in the space of A,,A~,...,A such ^ 1 2 ' m that when a point ( A, , A_ , . . .,A ) is in as, so is every point * * * * (A,, A _.,..., A ) for which A. < A.: i = l,2,...,m. Then the l z m 11 probability of the set co depends on £, and Z_ only through the latent roots of £_Z," and is a monotonically decreasing func- tion of each of the latent roots of 2~Z, . Clearly, the set co = { (cf>, , 2 , . . . , cf) ) : < c} satisfies the conditions of Lemma 3.5.5, so it follows that the proba- bility of the set co is monotonically decreasing in each of the latent roots <5,,6~,...,5 of (Z+M)I . In other words, 1 2 m ' the power function of the s largest root test is a monotonically increasing function of 6.: i = l,2,...,m. We now know that as 6 -*■ °° , P(* >c) increases mono- s s tonically. We will show that actually, as 5 + °° , P (<$> >c) -*■ 1, and hence, for sufficiently large values of (s) 6 the probability of rejecting H^ : 6 =1 will be arbi- trarily close to one. Recall that there exists a nonsingu- lar matrix K such that KEK ' ~ W (l,e,0) and KHK'~W (A,h,0). Let K, (mxm) be such that K, (mxm) = diag (ak, , ak„ , . . . , ak ,1,...,1). Note that I^KHK'K' ~ Wm (K-^K^ ,h, 0) and K AK{ = diag(a2k2S a2k26 .. ,a2k26 ,6 ,, 6 ) x -1 -L-L £ I s s ' s + 1' ' m ' 2 2 so that as a + °°, a ^^i + °°, and hence ch . (K,AKj) -* °° , for 65 i = l,2,...,s. Thus, we need to show that P (ch (K,KKK'K' (KEK')~ ) > c) ■> 1 as a + °° . The following si 1 lemma provides the necessary result. Lemma 3.5.6: Let V ~ wm(si'v'°> and u ~ Wm^Z2/U'0)/ independently, and let K , (mxm) = diag (ak, ,ak- , . . . , ak , 1, . . . ,1) Then P (ch (K-^K-Jv ) > c) -*• 1 as a + Proof: Let U12A fVll V12 U22V VV21 V22 where U, , is s x s, U_, is (m-s)xs, U, ~ is s x (m-s) , and U22 is (m-s) x (m-s) . Similarly, define V^, V2l' V12' and V22 Let F^ be the s x m matrix with (FA) . . = 1: i = 1,2, ... ,s and (F*)^ =0: i # j, and let K2 (sxs) = diag (k, ,k2, . . . ,k ). Recall from Section 3.4 that ch (K,UKjV-1) = max min{ X : | FK UK'F '- AFVF ' [ = 0} FeS(m,s) X L l > min{X: | F*K, UK'FJ-AF*VF£ | = 0} X ~ L . o = min{X:|a K2uiiK2~ Avil I = °^ X = oc2chs(K2UllK^V-l). Thus, P(chs(K1UK-[V~1) > c) > P(a2chs(K2U11K2'V^) > c) = p(chs(K2U11K^V^) > c/a2), 66 and K-U-, , K 'V, , is positive definite with probability one, so that lim P(ch (K,UK/V ) > c) _ s 1 1 a->oo = lim P(ch c/a2) a •*■ °° = P(chs(K2Ui;LK^V~^) > 0) - 1. 3.6 The Limiting Distribution of $ We have seen that the likelihood ratio test for testing the hypothesis HqS) : rank (M) < s - 1 against k|s) : rank (:M) = s is based on the s largest root, $ . However, if is to be used as a test statistic, it is necessary to compute the significance level, a, where a = sud P (c|> > c|H(s) ) . „(s) s ° HQ With 6, > 60 > ... > 5 as the latent roots of (Z+M) Z~ I z m (s) the null hypothesis can be written H' : 6 = 1 , or more precisely, H*s) : 6, > 6« > ... > 6 , > 1, 6 = 5 _,__ = r J ' 0 1 2 - s-1 s s + 1 ... = 6 = 1. We will write .m (61 , 62 ,. • . , 6m) to indicate that <$> is the s largest root of m roots and depends on s the population roots &^ , 82 , . . . , 6m. Then we may write a, the significance level, as sup P(* (6T ,69,. . . ,6 , ,1,.. . ,1) > c) :...*6 ,>1 s-1 But we saw in the previous section that e is stochastically 1*62*' --^s-l-1 s increasing in each 5 . : i = 1 , 2 , . . . ,m. It then follows that 67 a = P (*s-m(o°,C°' " ' * '°°'1' ' • ' ,:L) > c) ' where <{> (°° ,°° , . . . ,°° , 1 , . . . , 1) denotes the random variable which has the limiting distribution of d> (<$-,, <5_ .... ,<5 , , 3 rs:m 1 2' ' s-1 1,...,1) as 5. + <»: i = 1,2,..., s-1. So the problem at hand is to determine the distribution of d> (°° ,°°, • . . ,°°/ Ys:m ' ' ' ' 1,...,D. Recall that E , W (Z,e,0), H ~ W (Z+M,h,0), and there exists a matrix K such that KEK ' = I and K(E+M)K' = A = diag (6, , 6-, , . . . , 6 ). If we define E and H as l z m E = A_2KEK'A~2 ~ W (A_1,e,0), m H = A~2KHK'A"'2 ~ W (l,h,0), m ' ' -k -3< -3- _J< where A - 2 = diag(<5, ,6_ 2, . . . , 6 2) , then clearly 6 (6, , 12 m J Ts:m 1 6n ,...,& ) = ch (HE~ ) = ch (HE~ ). Hence, if we let ^ m s s E ~ W (A ,e,0) , where A = diag (n<5 , ,n6 _ , . . . ,nS „ , ,1, . . . ,1) , niun n. xz ™ x. then we need to find the limiting distribution of ch (HE- ) 3 s n as n •> oo. Since we can write E = Y Y', where Y = (y,v n n' n xl (n) — x x win: x trj i —iv . n (n) (n) % (n) -1 I2 ' -•• 'le ) and ^i ~ Nm(-^' n ): i = l»2,...,e, inde- pendently, we can restate the problem as that of determining the limiting distribution of ch (H (Y Y' ) ). Consider the s n n following elementary result. Lemma 3.6.1: Ifu ~ N(0,l/n), then u — — > u, where u is a degenerate random variable with all of its probability at zero. We also need the following results, the first of which is well known as the continuity theorem (see, for example, Breiman [1968:236] ) . 68 Lemma 3.6.2: Let x, ,x» ,x_ , . . . be a sequence of random vectors. Then x > x if and only if — n — J lim E[exp(ix't)] = E[exp(ix't)] n-><» for all t where i = /-T. Lemma 3.6.3: Suppose that as n ■* °° , x. >x.: i -, j r (n) (n) (n) , , , j = 1 , 2 , . . . ,m, and suppose {x, ,X-> > • • • t i£ •> are mutually independent for all n. Then (n) (n) 2l (n) x2 (n) x L_ — m ■ ^2 x J v- — m Proof: Note that it follows from Lemma 3.6.2 that lim E[exp (ix.(n) t . ) ] = E[exp (ix!t .) ] . Also, because of independence, E[exp(ix(n) 't)] = E[exp(i E xfn)'t.)] j = 1 1 D = n E[exp(ix!n' t.)], j=l 3 3 lim E[exp(ix(n) 't) ] = n lim E [exp (ix fn) 't . ) ] n->«> j=l n^-°° ■* -* = n E[exp(ixft .) ] j=l 3-3 69 = E[exp(i I x'.t .) ] = E[exp(ix't) ] . The result now follows from Lemma 3.6.2. From Lemma 3.6.1 and Lemma 3.6.3 we observe that Y — > Y with n 2 where the elements of Y, ((s-l)xe) are all equal to zero with probability one, and Y„ = (y2-i 'Y.70 ' ' ' ' '%-2 ^ witn y~ . ~N , , (0 ,1) : i=l,2,...,e, independently. ■*-Zi m-s + I — Consider the following result, the proof of which can be found in Ostrowski [1973:334]. Lemma 3.6.4: Let A(nxn) and B(nxn) be two matrices, and suppose the latent roots of A and B are A. and A '. : i = 1,2, ...,n, respectively. Put N = max ( I a . . | , | b . • | ) , l^i^n,l^jin 1D 1D and 1 n n 6=4t Z E la. .-b. . I . nN i=l j=l ^ ^ Then to every root A.' of B there belongs a certain root A . of A such that we have 1 |A^-Ai| < (n+2)N61/n. Further, for a suitable ordering of A. and A.' we have I A. -A.' I £ 2 (n+l)2N61/n. 1 1 1 ' 70 Lemma 3.6.5, Corollary: If A is an n x n matrix, then for each i ch. (A) is a continuous function of the elements of A. Lemma 3.6.6, Corollary: Let A be an n x n matrix and B, an n x p matrix. Then the roots of the equation | A- ABB' |=0 (3.6.1) are continuous functions of the elements of A and B except at B such that | BB ' | =0. Proof: Let X.: i = l,2,...,n be the roots of (3.6.1). Then when | BB ' ] 4- 0, it follows that these roots are also the latent roots of A(BB') . So, from Lemma 3.6.5, for each i A. is a continuous function of the elements of A (BB ' ) But clearly, when [ BB ' | #0, the elements of A(BB') are continuous functions of the elements of A and B. Hence, for each i X. is a continuous function of A and B except when i | BB ' j = 0 . We need one final result involving the limiting distri- bution of a function of random vectors (see Mann and Wald [1943] ) . Lemma 3.6.7: Let x >x, and let g (x) be a Borel — n — 3 — measurable function such that the set R of discontinuity points of g (x) is closed and P (x£R) = 0. Then g(xn) — "> 9(x) ■ Now recall that we seek the limiting distribution of ch (H(Y Y') ). In order to use Lemma 3.6.7 it is neces- s n n sary to show that ch (H(YY') ) is continuous with 71 probability one under the distribution of (H,Y) . Now with ■11 l12 H = VH21 n22/ where H,, is (s-1) x (s-1) , H12 is (s-1) x (m-s+1) , H21 is (m-s+1) x (s-1) , and H22 is (m-s+1) * (m-s+1) , the roots under the distribution of (h,Y) are the solutions to H 11 H 12 (0) (0) - * H, H. (0) Y Y ' x2 2* = 0 (3.6.2) l21 22 Since H is nonsingular with probability one, we may put /'G- H"1 = G - 11 G12> V*21 22 so (3.6.2) can be written ( (0) I - m G12Y2Y2 (0) C V V ' b22 2 2 = 0, s-1 -*G12Y2Y2 (0) Vs+r*G22Y2Y2 Thus, it must be true that I .. - 4)G„9Y„Y' = 0, m-s+1 22 2 2 = 0 G22 " *Y2Y2| = ° (3.6.3) -1, Then we see that with probability one ch1(H(YYf) ), ch„ (H(YY')"1) , . . . ,ch _1(H(YY')~1) are undefined, and 72 ch (H(YY') ) is now the largest solution to (3.6.3); that is, since YY ' is of rank m-s+1 with probability one, there are only m-s + 1 solutions to |H-YY'| =0. It can be shown (see, for example, Graybill [1969:165]) that G22 = (H22-H2,h7 }^12) _1 > since H -H--, H, , H, ~ is nonsingular with probability one, so that (3.6.3) can be written |H22-H21H~lfi12 - 4>Y2Y^| = 0. Clearly, Y~Y^ is also nonsingular with probability one, and hence by Lemma 3.6.6 ch (H(YY') ) is continuous with probability one under the distribution of (H,Y) . The set of discontinuity points, R, is closed, since R = {(H, Y) : | Y-Y- | =0} Note also that as is well known (see, for example, Anderson [1958:85]) H22-H21H~^H12 ~ W +1 (I ,h-s+l , 0) . Therefore, from Lemma 3.6.7 since (H,Y ) > (H,Y), it follows that <$> (°°,°°, . . . ,°°,1, . . . ,1) - $, , , (1,1, . . . ,1) , rs:m tii ii yl:m-s+l iii where (j>, _ , (1 , 1 , . . . , 1) denotes the distribution of the largest root of |w,-d>W»| = 0, where W, ~ W , , (I ,h-s+l , 0) , 3 ' 1 2 ' 1 m-s+1 and W_ , W _ , (l,e,0) , independently. So in testing H^ ' : rank (M) < s-1 against HJ ' : rank (M) = s we choose as our critical value c(a,m,s) , where P (ch, (W-,W« ) > c(a,m,s)) = a. By so doing we will guarantee that fs) sup P (*s.m( c(ct 'm's) I hq ) = a- H(s) Charts and tables of the distribution of the largest root, 0, , of | W, -9 (W, +W_) | = 0 are available (see, for example, 73 Morrison [1976:379], Pillai [1965,1967]). These may be used to calculate c(a,m,s), since 8, = , / (l+, ) , where , is the largest root of | W , — cJ>W _ | = 0. In order to determine the rank of M, a sequential test procedure is used. To illustrate this procedure, we will return to the example presented in Section 2.3. Recall that D = diag(142.729, 29.6669, .91847, .625404), h = 20, e = 105, so that <}>, - 27.1865, cf>2 = 5.65084, 3 = .174947, and 4 = (4) .119125. First we consider testing the hypothesis HQ : (4) rank (M) < 3 against H^ - rank (M) =4. The null hypothesis, H^4) , is rejected if A > c(.05,4,4), where c(.05,4,4) = 17 F(17,105, .05) /105, and F (17 , 105 , . 05) is the constant for which P(F(17,105) > F (17 , 105 , . 05) ) = .05 if F(17,105) ~ 17 F,ot-(0). Thus, c (.05,4,4) is approximately equal to .28, and clearly, $. = .119125 < .28, so that we do not reject (4) (4) H^ . Since H* is not rejected, we now consider testing the hypothesis H*3) : rank (M) < 2 against H-[3) : rank (M) = 3. (3) The null hypothesis, H* ', is rejected if _ > c(.05,4,3). Using the charts mentioned earlier we find that c (.05,4,3) is approximately equal to .36. Since $ -. - .174947 < .36, (3) we do not reject H* and so next consider testing the hypothesis Hg2) : rank (M) < 1 against H-J ': rank (M) =2. We find that c (.05,4,2) is approximately equal to .42, and (2) therefore, since <}>- = 5.65084 > .42, we reject K' and conclude that the rank of M could very reasonably be taken as two. 74 The procedure above is open to objections on the grounds that the significance level for the test criterion has not been adjusted to take into account the fact that a sequence of hypotheses is being tested, with each one dependent on the previous ones not being rejected. The mathematical com- plications involved in controlling the overall error make such an adjustment virtually impossible to carry out. CHAPTER 4 MAXIMIZATION OF THE LIKELIHOOD FUNCTION WHEN £ = a I 4 . 1 The Likelihood Function Suppose the vectors x . . (mxl) : i = 1 , 2 , . . . , g; j=l , 2 , . . . ,n can be modeled by x. . = u + Lf . + z. . , (4.1.1) wherein jj(mxl) is a fixed but unknown vector, L (mxp) is a fixed but unknown matrix, f. ~ N (0,1): i = 1,2,. ..,g, 2 and z . . ~ N (0,o I): i = l,2,...,g;i=l,2...,n. We assume — xj m — 3 J that the set of random vectors {f, ,f „,..., f , z -,,,..., z } — 1'— 2 — g — 11 — gn are mutually independent. Thus, x. .~ N (jj,V) with 2 V = LL ' + a I . For any orthogonal matrix P (pxp) it follows that V = LL' + a2I = LP (LP)' + a I, so that, while LL' is unique, L is not unique. In this section we will derive the 2 likelihood function for \i, LL ' , and o . By methods identical to those presented in Section 2.1 _ 2 it can be shown that (x ,E,H) is sufficient for (_p,a ,nLL') where g n x = £ E x../gn ~ Nffl (j±, (1/gn) (cTI+nLL ') ) , i=l j=l g n 2. E= I I (x,^-x. ) (x^-x. ) '-. W_(a l,e,0) , 75 i=1 j = 1 "ID -i. -ID -x. 76 a _ _ _ _ 2 H = n Z (x. -x ) (x. -x ) '„ W (a I+nLL',h,0) , , — 1. — . . — l. — . . m i=l and e = g(n-l); h = g-1. In addition, if we let c denote a constant, we find that the density of E can be written f(E) = c|E|Js(e"m~1) expi-htrio2!)'^] i ,-, i h (e-m-1) r , _, v /-,^21 = c|E| exp[-( I e..)/2a ] i=l 1X m 2 = g ( S e..;a )g2(E). 1=1 m Hence, from the set {e, , ,e, _ , . . . ,e , ,e } b=trE= £ e . . 11' 12' ' m,m-l' mm . t n 2 1=1 is sufficient for a . We may assume, then, that we have, independently, x , b , and H where *.. - Nm(iMl/gn)(a2I+nLL')), b/a ~ xBf H - W (a2I+nLL' ,h,0) , m 2 and B = mg(n-l). The problem is to estimate jj, a , and LL ' , 2 or equivalently , to estimate jj, a , and M where M = nLL ' . We have seen that L is not uniquely defined and so if LL ' is an estimate of LL ' , then any L, such that LL ' = LL ' , is an 2 estimate of L. The likelihood function of (jj,a ,M) can be expressed as *(Z b h) - Km(I'h) %3-l,H|Mh-m-l) | (2tr/gn) (a I+M) | 2\o I+M| 2 (2a ) 2pr (%0) x exp[-Jsgn(x -u ) ' (a 2I+M) _1 (x -y ) -hb/a 2-^tr (a2I+M) -1H] , 77 where, as before, K_1(l,h) = 2hmh ^m(m-D n T(h(h-j + l)). m j=l The logarithm of the likelihood function, omitting a function of the observations, is -b/2o -h$lno2-htr (a2I+M) -1H-^h£n | a2I+M[ -%£n| a2I+M|-^gn(x -y) ' (a2I+M) _1 (x -y) . We seek the solution which maximizes the equation above, or equivalent ly, the solution which minimizes b/a2+6£na2+tr (a2I+M)-1H+(h+l) In | a2I+M| (4.1.2) +gn(x -y ) ' (a2I+M)_1 (x -u_) . 2 If we ignore the constraints that a is positive and M is nonnegative definite and seek the stationary values of (4.1.2) 2 over all possible (v,o ,M) , we find, upon taking the partial 2 derivatives of (4.1.2) with respect to a , M, and y and setting them equal to zero, that -b/ (a2)2+B/a2-tr (a2I+M) -1H (a2I+M) -1+ (h+1) tr (a2I+M)_1 -tr (gn(a2I+M)_1(x -y ) (x -y ) ' (c2I+M) _1) = 0, - (a2I+M)_1H(a2I+M)_1+(h+l) (a2I+M) _1 -gn(o2I+M)_1 (x -y) (x -y ) ' (o2I+M) _1 = (0), gn(a2I+M)_1(x -y) = 0, for which the solutions are u, = x , o2 = b/6, M = (h+l)_1H - (b/6) I. 78 Since M is a nonnegative definite matrix, its maximum likeli- hood estimate must also be nonnegative definite, so the solu- tions above are the maximum likelihood estimates only if (h+1) H - (b/B)I is nonnegative definite. Although the 2 solutions for _y and a are the natural unbiased estimates, the solution for M is not since E (M) = (h+1) (hH-o I). In addition, we observe that E (M) is also not necessarily nonnegative definite. Using the principle of marginal sufficiency referred to in Chapter 2, we see that (b,H) is marginally sufficient 2 for (a ,M) . Hence, we choose to use the marginal likelihood 2 function of (a ,M) instead of the likelihood function of 2 2 (y,a ,M) . The marginal likelihood function of (a ,M) can be expressed as f(b,H> ._ llU'h2\s b%6-l,H|Mh-m-l) x exp[-b/2a2-J5tr (a2I+M) -1H] . The logarithm of the likelihood, omitting a function of the observations, is -b/2a2-^B£na2-^tr(a2I+M)-1H-%h£n| a2I+M| , and we seek the solution which maximizes this equation, or equivalently , the solution which minimizes b/a2+6£no2+tr (a2I+M)_1H+h£n| a2I+M| . (4.1.3) 2 Again, if we ignore the constraints that a is positive and M is nonnegative definite and seek the stationary value of 2 (4.1.3) over all possible (a ,M) , we find, upon taking 79 2 the partial derivatives of (4.1.3) with respect to 0 and M and setting them equal to zero, that -b/ (a2)2+8/a2-tr (a2I+M) _1H (a2I+M) -1+htr (a2I+M)_1 = 0, and - (a2I+M)-1H(a2I+M)-1+h(a2I+M) 1= (0) , for which the solutions are a2 = b/B, M* = (l/h)H - (b/3)l. Note that these solutions are the natural unbiased estimates 2 of a and M and, clearly, E(M^) = M is nonnegative definite. Hence, we choose to continue our work with the marginal like- 2 lihood function of (a ,M) . Since M is nonnegative definite, the solutions above are the maximum likelihood estimates only if (l/h)H - (b/8)I is also nonnegative definite. In the next section we will derive maximum likelihood estimates for 2 a and M which are valid for all possible (b,H). 4 . 2 The Maximum Likelihood Estimates In this section we seek the maximum likelihood estimates 2 2 s of a and M subject to the constraints a > 0 and Me UP.. j-0 3 Recall that, aside from a constant, the logarithm of the 2 likelihood function of (a ,M) is -h/2a2-h&lno2-htr(o2I+M)~1E-hhln\ g2I+M| . We seek the solution which maximizes the equation above, or equivalently , the solution which minimizes b/a2+0£na2+tr (o2I+M) -1H+h£n I a2I+M| 80 2 S subiect to a > 0 and Me U P.. Note that this can be j-0 3 rewritten as tr (a2I)-1(-I)+ -£n|a2l| +tr (a 2I+M) _1H+Mn| a2I+M| . (4.2.1) v m ' m ' ' ' ' Since (b/6)I and H* = (l/h)H are both symmetric matrices, m and (b/3)I eP and H^e UP., there exists a nonsingular m * j=0 J matrix K(mxm) such that K ( (b/3) I) K ' = I and KH*K ' = D where D = diag (d, ,d„ , . . . ,d ) and d,>d2> . . . >d > 0 are the latent roots of H*((b/3)I)-1 = (@/b)H*. Then with a2 = Ba2/b and M = KMK', (4.2.1) can be rewritten £ tr K'~1(a2I)"1K-1+ -In I a2I I +htrK '_1 (a2I+M) -1K-1D m m i 2 i + h£n[ a I+M| = -^[tr (a2I)_1+S,n]a2l| ] +h [tr (a2I+M) _1D+£n I a2I+M| ] m S i i 2 - (£+h) in K m = (a2I,a2I+M;D,-,h)- (- +h)£n|K|2, where $ is the function discussed in Section 2.2. Thus, the ~2 ~2 problem has been reduced to that of minimizing 0 and MeU P., or equivalently , m j=0 3 (a2I,a2I+M) £ C since C = { (A,B) : AeP , BeP , 3-A£ UP.}. s s mm -;_qJ 2 2- 2 Now for fixed (a I, a I+M) e C consider <|>(P(a I)P'r P (o2I+M)Pr ;D,-,h) = A (a2I,P(a2I+M)P' ;D,-,h) for all orthog- 'm m onal P. Note that this is minimized with respect to P when 2 ~ -1 tr P(o I+.M) P'D is minimized. So from Lemma 2.2.1 it fol- lows that all stationary points, and therefore the absolute 7 1 ~ B 2 minimum, of $ (a I ,P (a I+M ) P' ; D ,-,h) occur when P"(0 I+M)P' 81 is diagonal. Hence, in searching for the absolute minimum 9 0 ft "}0~" of <$> (o 1,5 I+M;D, — ,h) over all (a I, a I+M) e C we may assume 2 that a I+M is diagonal. This result also follows immediately from Lemma 2.2.12. Now with V = diag (vn , v„ , . . . ,v ) and f . (v . ) = d . /v . 3 * 1' 2 m 11 11 + £n v . , consider minimizing l v R 1 m d . 4>(uI,V;D,-,h) = 3(- +£nu)+h E (— + £nv.) (4.2.2) m u . , v. i i=l l 1 m = B(i +£nu)+h Z f.(v.) , i=l subject to (uI,V) eC . The constraint (ul ,V ) zC can be equivalently written as v. > u > 0 for i = 1,2,.. .,m, (4.2.3) and v. = u for i e J , (4.2.4) where J c {l,2,...,m} is a set which has at least m - s elements. Now df (v ) —J — = (l-d./v )/v., dv- 11 i so that the function f. decreases monotonically in v. for l 2 l v. e"(0,d.l, increases monotonically in v. for v. e [d.,00), li J i 11 and is minimized over all v. e (0,°°) when v. = d. . Thus, i 11 the unrestricted minimum of (4.2.2) occurs when u = 1 and V = D. It is evident from the structure of f . that if the l unrestricted minimum does not satisfy the constraints (4.2.3) and (4.2.4), then the restricted minimum will 82 occur when u = v. = v. = ... = v. for some set of integers {i. ,i„ / . . . ,i, } <3 {1,2 , . . . ,m} . We need to determine k, the number of integers, and also we need to know exactly which k integers from amongst the integers l,2,...,m comprise the set {i,,i2 , . . . ,ik> . First, we will consider the constraint given by (4.2.3). Let the variable r be defined in the following manner. If l (B+h l d.)/(3+rh). m_r j=m-r+l 3 Finally, if d 0, the minimum of (4.2.2) subject to (4.2.3) is just the minimum of (4.2.2) subject tou=v = v ,=...= v . n which occurs at J m m-l m-r+i m fu = vm = ... = vm _, = (B+h I d.)/(B+rh), I m m_r+1 j=m-r+l 3 I v. = d. for i = l,2,...,m-r. v. 1 1 Now consider the constraint given by (4.2.4). If r > m - s, then the minimum of (4.2.2) subject to (4.2.3) 83 and (4.2.4) is simply the minimum of (4.2.2) subject to (4.2.3). If r < m - s, the minimum of (4.2.2) subject to (4.2.3) and (4.2.4) is obtained by minimizing (4.2.2) sub- ject to rn = v . = v . = . . . = v . if r = 0, ) Jl '2 Dra~s (4.2.5) I u = v = = v ,i - v. = ... = v . if r > 0, m"r+1 3l ^m-s-r where { j- , j2, . . . Om_s_r^ G {1,2 , . . . ,m-r-l,m--r} . We will now show that, in fact, j, = m-r, j„ = m-r-1, . . . , Jm_s_r = s+1. Note that for q = l,2,...,m-l (4.2.2) is minimized subject to u = v = . . .=v , , when J m m-q+1 m ru - »„ - — - %-q+i - ,3+hj=m£q+1dj)/,6+qh)' l^v . = d. for j = l,2,...,m-q, and has a minimal value equal to m •B+h I d •_ _ -, j \ m-q 3B+qh+ — J + (3+qh)+h(m-q)+h £ Zndy (4.2.6) Similarly, (4.2.2) is minimized subject to u = v = ... = v , - v. , where i e {1 , 2 , . . . ,m-q-l ,m-q} , when m-q+1 l m-q+ m ru-v = . . . = v _,, = v. = (B+h I d.+hd. ) / (B+(q+l)h) , I m m-q+1 i j=m-q+i 3 ! L.V. = d. for j = 1, . . . , i- ] , i+1, . . . ,m-q, and has a minimal value equal to m /B+h £ d .+hdA _ _ ( B+ (q+l)h)£nl ^-(f+TTh r( 3+ (q+1) h) +h (m-^_1) +h . Z £ndj * (4.2.7) 84 Now subtracting (4.2.6) from (4.2.7), we obtain m m /6+h E d.+hd\ S+h E d\ (3+(q+l)h)&nf i^'^l j-(B+qh)£nf ^M*^, ^ (4.2.8) which is the increase in the minimal value of (4.2.2) due to the additional constraint, u = v.. Differentiation of (4.2.8) with respect to d. yields J msin^ . i , \3+h E d.+hd./ x j=m-q+l : 1 m which is negative when d. < (B+h E d.)/(g+qh) and positive 1 j=m-q+l J m when d. > (8+h E d.)/(6+qh). Hence, (4.2.8) is an increas- j=m-q+l -* m ing function of d. when d. > ( 3+h E d.)/(8+qh), so that if 1 X j=m-q+l D m d > (B+h E d.)/(B+qh), choosing i = m-q will yield m~* j=m-q+l 3 a smaller minimum value than any other choice of i < m-q. In a similar manner subtracting the unrestricted minimal value of (4.2.2) from the minimal value of (4.2.2) subject to u = v. where i e { 1 , 2 , . . . ,m} , we obtain rtj+hdA (6+hnnl-g^l-h £ndi, which is an increasing function of d. for d. > 1. Thus, if d > 1, choosing i = m will yield a smaller minimum value than m any other choice of i < m. Recall that we are investigating the minimum of (4.2.2) subject to (4.2.3) and (4.2.4) when r 1, so that m-r = m is the 85 optimal choice for j,. Further, since d. > 1 for i = 1,2, where r = 0 , we have m d, (B+crh) ( Z d./a) m m 6+h Z d^ (6+qh) ( Z d . /q) j=nt-q+l " < j=m-q+l 3 __ _ v 6+qh (6+qh) " j=m-q+l j m"q' for q = 1,2, . . . ,m-l, and hence, when r = 0 choosing j, = m, j2 = m-l,...,j = s+1 in (4.2.5) will yield a smaller minimum than any other choice of {j-,,J-,/...,i }a{l,2, J 1 J 2 ' Jm-s ...,m-l,m}. Now from the definition of r we see that m dm-r > (6+h Z d.)/(B+rh) if 1 < r < m-1. In addition, j=m-r+l J m for q = l,2,...,m-2 if d >(6+h l d.)/(6+qh), then q j=m-q+l J m m 6+h Z d. (6+qh) ((6+h Z d.)/(6+qh)) + hd j=m-q 3 = j=m-q+l J m_q ,m 6+(q+l)h 6+(q+l)h < ((6+qh)dm_q+hdm_g)/(6+(q+l)h) = d < d m-q m-q-1 m Thus, d > (6+h Z d )/(6+qh) for q = r ,r+l , . . . ,m-l . j=m-q+l -1 It then follows that, when 1 < r < m-s, choosing j, = m-r, j2 = m-r-1, . . . , jm_s_r = s+1 in (4.2.5) will yield a smaller minimum than any other choice of {i-,,i«,...,i }c 1 ~ 2. m-s-r {1,2,. . . , m-r-1, m-r } . We can now obtain the minimal solution to (4.2.2) subject to (4.2.3) and (4.2.4). Denoting the minimal solution by (u ,V ) , we find that if r > m-s, 86 Uo=VcTT1=Vo «. i = ---=vc: ™ ^i=(B+h z d )/((3+rh), s sm s,m-l s,m-r+l j=m-r+l -1 v . = d. for j = 1,2, . . . ,m-r , and if r < m - s, 'u.=v =vo m i = ---=v<= c+r 0 and Me U P . , we see that j=0 3 „2 aQ = 5.95987, MQ = (0) , aj = 2.3551, 88 89.8421 4.92338 -.808366 -.0931905 kl = .269803 -.0442987 -.00510687 .00727337 .000838493 .0000966637 o2 = .967085, 91.3255 3.17993 -.826433 -.0 715582 33.4811 .0575388 -.426237 M2 - .00770191 -.000448497 .0054369 M3=M4= 2 -2 3 = a4 = .965608, 91.327 3.17993 -.826136 -.0715587 33.4825 .0574547 -.426256 .0416617 -.000452157 .00543713 4. 3 The Likelihood Ratio Test s Recall that C = { (A,B) : Ae P , Be P , B-A e U p.}, and . s mm .q 3 2 ° suppose we know that (a I,a"I+M) eQ = C . We wish to test, 2 2 say, the null hypothesis that (a I, a I+M) e w = C _,c C . 2 2 The alternative hypothesis, then, is that (c I, a I+M) e n-oj - C - C _, . Thus, we are testing the hypothesis ,(s) l0 rank (M) < s - 1 against the hypothesis (s) H' ' : rank(M) = s. 89 We adopt the likelihood approach and look at max f(b,H)/max f(b,H) = X e (0,1]. With u and the matrix V = diag (v , ,v ~,...,v ) s s ' si s2 sm given by m 'u = v =...=v _,, = (6+h E d.)/(0+rh), s sm s,m-r+l .__ ,, 1 ' j=m-r+l J v . = d. for j = l,2,...,m-r, if r > m - s, and u = v =...=v .. = (3+h I d.)/(B+(m-s)h) , s sm s,s+l j=s+l 3 v . = d. for j = l,2,...,s, S] j -2 2 if r < m - s, the maximum likelihood estimators a , of a , and MQ, of M, when the parameters are restricted to lie within Q, , are given by °l = bus/3' MQ = K"1(VUsI)K''_1' where K is a nonsingular matrix. Similarly, the maximum ~2 2 likelihood estimators a , of a , and M , of M, when the parameters are restricted to lie within co , are given by °l = bus-l/B' .K = K-1 m - s +1, then V = V , and s s-1 u = u , . s s-1 90 The likelihood ratio, A, is A = max f(b,H)/max f(b,H) exo[-b/2a/2-35tr(S2I+M )_1H] (52) h® I S2I+Mn |%h exp[-b/2a2-^tr(a2I+ftn)-1H] (a2)h^\o2l+9i \hh U Q, U CO ' CO CO ' exp[-6/2u ,-JshtrV 11D] Iv I hh uhi S-I s-i ' s1 s exp[-6/2u -^htrV XD] |v . I hh u^6, S S S~ X 3— X IV \hh u* s s s-11 s-1 since, if r > m - s + 1 6(us-l"us1) + h tr(Vs-l"Vs1)D (4.3.1) = 6(u^1-u^1) + h tr(V~s1-V~sl)D = 0, and if r < m - s + 1, (4.3.1) becomes g|V(m-s+l)h - B+(m-s)h \hL) + /Vfrn-s+DhA ™ d_ _ \m m/\ I m j ■ - 1 ^B+h Id. 8+h Z d/ V \B+h Id. V3-53 j=s -1 j=s+l -1 j=s -1 g+(m-s)h "\ ™ 1 m / •_ , -i j / 3+h E d.y 3-s+l y j=s+l D B+(rS+1)hl (B+h ? d.) -fj±fet£lhJ\ (6+h Z d.) - h .B+h z d 7 3=s - v6+h z d.y ^*+1 D j=s D j=s+l 3 ; + (m-s+l)h - (B+(m-s)h)-h = 0. 91 So we have %(6+h(m-s) ) X = + (m-s+l)h>^+h(m-s+l))p.^ + 1djA .f < +1 « 0 , J. — rr— if r < ra-s+1, mi I 3+ (m-s) h I ' B+h I d. / V / j=s : if r > m-s+1. Putting t = hd /(6+h £ d.), we can rewrite A as j=s D ^fB+(m-s+l)hfh f6+(m-s+l)h>(6+h(m-s)) h 6+(m-s)h .Jsh 's t.-(l-y (6+h(m-s)) if r < m-s+1, v if r > m-s+1 We will now show that r < m-s+1 if and only if t > h/ (6+ (m-s+1) h) . First consider the case in which s = m. Then r < m-s+1 = 1 if and only if d > 1, and m t = hd /(6+hd ) = h/(6/d +h) > h/(6+h) if d > 1, and m m m m m t ■= h/ (B/d +h) < h/ (B+h) m m if d < 1. Consider now the case in which 1 < s < m-1. m Again we want to show that r < m-s+1 if and only if t > h/ (6+(m-s+l)h) . If r = 0, clearly d . > (B+h E d.)/(6+ih), j=m-i+l 3 for i = 1,2, ... ,111-1. Also, if 0 (3+h Z d.)/(B+rh), m-r . , , 1 j=m-r+l J and we have seen that this implies that m d > (3+h E d.) / (3+qh) , m-q • , i 1 ^ 3=m-q+l J for q = r ,r+l, . . . ,m-l and, more specifically, for q = m-s. Hence, if r (3+h Z dJ/(3+(m-s)h) , which implies j-8+1 3 + h E d. < d (3+(m-s)h)+hd , i s s D=s J so that + h Z d. < hd (£ + m-s+1 , 1 s h D=s J and thus hd t = s m 3/h + m-s+1 3+ (m-s+1) h* 3+h E d. j=s : Also, if r > m-s+1, then it must be true that dm-(m-s) = ds * (3+h_=Z+id.)/(3+(m-s)h) which implies that t = -S- * s m 3+ (m-s+1) h 3+h E d. j=s : 93 It follows that the likelihood ratio, A, can be written as fe+ (m-s+1) hY2*1 (&+ (m-s+1) h\h ( 3+h (m"s) } . hh .Js(3+h(m-s)) ^ h J ^3+(m-s)h J s [L sj A=\ if fcs > 3+ (m-s+1) h ' h V 1 lf ts " 3+ (m-s+1) h * ^ ,_ • /^ \ u-^h., , % (3+h (m-s) ) Consider the function g t = t2 1-t )2X . s s s The derivative of g (t ) with respect to t is ^ s s t%h"1(l-t )H^+h(m-s))-l[hh{1_t )_^{3+h(m_s))t ], s s s s which is negative for t e (h/ (3+ (m-s+1) h) , 1) . Thus, A is a s decreasing function of t when t e (h/ (3+ (m-s+1) h) , 1) . In s s addition, i+ (m-s+1) h\i8hrB+(m-s+l)h^(g+h(m"s))^hf., . ,h (3+h (m-s) ) < . h J ^ 3+(m-s)h J s U~V " L' for t e (h/ (3+ (m-s+1) h) ,1) , with equality when t = h/ (3+ (m-s + 1) h) , so that A is a decreasing function of t . (s ) Since the likelihood ratio test rejects H~ for small (s) values of A, it equivalently rejects Hq for large values of t . However, the distribution of t is intractable, and s s (s) (<=) so use of t in a test of H„ versus H, is not practical, s 0 1 In the following chapter we present an alternative test (s) (s) statistic for testing H~ versus H' CHAPTER 5 AN ALTERNATIVE TEST WHEN Z = a2I AND ITS PROPERTIES 5 . 1 Introduction We have seen that the likelihood ratio test rejects (s) m H_ for sufficiently large values of hd /(8+h E d.), i=s where d,>d„>...>d are the solutions to I H+-d (b/B) 1 1 = 0. l /. m i -k i Let i^->^2> . . . >\]j be the solutions to | H— ipbl | = 0, that is, ^i =hd./6 for i = 1,2, . . . ,m. Then the likelihood ratio test (s) m rejects H' for sufficiently large values of ip / (1+ £ \\i . ) . i=s This quantity is an increasing function of ^ , so that it (s) would be reasonable to reject H~ for sufficiently large values of tj; . However, the complexity of the null distribu- (s) (s) tion of \\j makes the use of 4> in a test of H^ versus H, impractical. Therefore, in this chapter we present an (s) (s) alternative test statistic for testing H^ against H, (s) m and consider the test which rejects H* when E iLi . is 0 . ri i=s sufficiently large. In the remainder of this chapter we investigate some properties of this new test. In Section 5.2 m it is shown that the test based on I \b . is an invariant test i=s x (s) (s) of HQ against H . In the last two sections we discuss 94 95 an important monotonicity property of the roots ty . i i = 1,2,... ,m and use this property in deriving the asymptotic m distribution of Z ty • • 5 .2 An Invariance Property Consider the group of transformations G = {g : a> 0, a , P P (mxra) is such that PP ' = al}, where g _(b,H) = (ab,PHPf). a , ir If b ~ a2xl and H~W (a2I+M,h,0), then ab~aa2x?, PHP ' ~ 2 W (ao I+PMP ' ,h, 0) and rank (PMP ' ) = rank(M). Hence, the m (s) problem of testing the hypothesis H' : rank (M) < s-1 against (s) H, : rank(M) = s is invariant under the group G. Now consider the roots tL> -. >\b~> . . . >^ of |H-^bl| = 0 12 m i t i and the roots 6,>9^>...>e of [PHP'-9abI = 0, where a>0 and I 2. m ' ' P is such that PP ' = al . Clearly, | PHP' - 6abl | = 0 implies | PHP' - 9bPP' |=0, so that | h - ebi |=o, and thus, 9. = ^ . : i = l,2,...,m. Suppose now that 9. = \b . : i = 1,2,. . . ,m are the roots of |H,-9b,l| = 0 11 '11' and | H_— ip fc> „ I | = 0, respectively, where b, > 0, b^ > 0 , and H, and" H- are positive definite, symmetric matrices. There exist orthogonal matrices Q, and Q? such that 96 Q1(H1/b1)Q;[ = V, Q2(H2/b2)Q^ = Y, where f = diag (i|j,, i|>2 , . . . ,ty ). Take a = b_/b, and P = /a QnQi • It now follows that (a) 9a,P(bl'Hl) = (ab1,PH1Pr) = (b2,ab1Q^Q1(H1/b1)Q^Q2) - 0 , so that g D e G. Therefore, by Definition 3.3.1 a ,^ {i|i:|H-^bl| = 0} is the maximal invariant with respect to m G. The test statistic £ ty . is clearly a function of i=s (ijj, ,ijj_ , . . . , xp ), and so, by Lemma 3.3.2, the test statistic m £ iJj . is an invariant test statistic for testing the i=s (s) (s) hypothesis H~ against the hypothesis H-/ 5 . 3 A Monotonicity Property of the Power Function The test procedure which we have been investigating depends on the latent roots ip, , i\> ,.,..., \\i of the random matrix H(bl)"1 = (a"2H) ( (b/a2 ) I ) _1 . If we let 91>02>... -2 2 >0 be the latent roots of a H, then i/j . = a 8./b: m li 97 i = l,2,...,m. The distribution of the roots 6,,9~/.../0 (see, for example, James [1964]) depends upon the latent -2 roots of the corresponding population matrix I+a M as param- -2 eters. Let 5,>6.->...>6 >1 be the latent roots of I+a M, 1 A m and note that M has rank of at most s-1 if and only if 6 = 1. s (s) (s) Thus, testing the hypothesis H~ : rank (M) £ s-1 against H-/ : (s) rank(M) = s is equivalent to testing the hypothesis H' : (s) m 6 =1 against H, : 6 > 1. Since we are using Z \b . as s ^ 1 s r . ri ( s ) a test statistic in testing the hypothesis Hn (s) against H, , a desirable property would be that it stochasti- cally increases in 5 , and hence, the power function increases monotonically in 6 . In this section we not only show that m Z ill. stochastically increases in 6 , but also that it l s i=s stochastically increases in each 6.: i = l,2,...,m. This more general result will be utilized in the following section.*. We will need the following results from Anderson and Das Gupta [1964] . Lemma 5.3.1: Let X (m*h) (h>m) be a random matrix having density f(X;Z,h) = (2tt) -ishm| E | _i2h exp [-3strZ_1XX ' ] , where I is positive definite. Let X, >A0> . . . >A be the latent c 1 A - m roots of XX' and to be a set in the space of A, , A~ , . . . , A LA m such that when a point (A,,A~,...,A ) is in lo so is every t- 1 ' 2 m -L point (A ' , A ' , . . . , A ') for which A.' ^ A- : i = 1.2,...,m. 1 A m ii Then the probability of the set co depends on Z only through 98 the latent roots of I and is a monotonically decreasing func- tion of each of the latent roots of Z. Lemma 5.3.2: Let A be a positive definite matrix of order m, and D and D^ be two diagonal matrices of order m such that D4 - D is positive semidef inite , and D is positive definite. Then ch. (DAD) £ ch. (D^AD*) for i = l,2,...,m. Using these two results, we can now prove the main result of this section. Lemma 5.3.3: Let X (mxh) be a random matrix having density f(X;D,h) = (2TT)-J5hm|Dr^h exp[-%trD_1XX'] , where D = diag (d ,d2 , . . . ,d ). Let V (mxm) be a random, posi- tive definite matrix independent of X. Let u be a set in the space of the latent roots of XX 'V satisfying the condi- tion stated in Lemma 5.3.1. Then the probability of the set co is a monotonically decreasing function of each of the elements of D. Proof: Consider V as fixed, and let V = T'T where T is nonsingular. Then the density of W = TX is f(W;TDT',h), and ch. (XX'V-1) =ch.(TXX'T') =ch.(WW') l l l for i = l,2,...,m. Thus, for any fixed V, we have _/.. f(X;D,h)dX= / f (W;TDT ' ,h) dW (5.3.1) R(X) R(W) 99 where R(X) denotes the region {X: (ch (XX 'V_1) , . . . ,ch (XX 'V-1) ) £ to}, and R(W) denotes the region {W: (ch, (WW') , . . . ,ch (WW')) e w}. Let D* be a diagonal matrix for which D^-D is positive semidefinite. It follows from Lemma 5.3.2 that chi(TDvtT') = chi(D^(T'T)D^) ;> chi (D*5 (T 'T) D*) = ch.(TDT') for i = l,2,...,m. Then from Lemma 5.3.1 and (5. 3.1) we have / f(X;D,h)dX > / f (X;D*,h)dX R(X) R(X) for any fixed V. Taking expectations with respect to V, we find that PD(u» > PD#(«). Now recall that we are investigating the test statistic m I iK. Let P be the orthogonal matrix such that P(I+a M)P' i=s = A = diag(61,52, . . . ,6m) . Then since a~2E ~ W (I-ra~2M,h, 0) , it follows that P(a"2H)P' ~W (A,h,0), and we can write -2 P(a H)P' = XX', where X (m*h) has density f(X;A,h) given in Lemma 5.3.1. The latent roots of a~2H ( (b/a2) I) -1 are the latent roots of P (a"2H) P ' ( (b/a2) I ) ~\ or equivalently , 2—1 9 XX '((b/a )I) . Hence, with V = (b/a )I, clearly V is independent of X, and 1^,1^,...,^ are the latent roots of XX 'V . in addition, if E ii> . < c and \b.'<\b. : i = l 2 m X 1~" 1 ■*-?**? mm* fill f m i=s ~ then I ty' < c, so that the set to = { (ty , ^ , . . . ,ty ) : 2 iJj.^c} i = q x 1 ^ m . ri ■ ■ _ 1=s satisfies the condition of Lemma 5.3.3. So it follows from Lemma 5.3.3 that the probability of the set co is a monoton- ically decreasing function of each of the latent roots 100 _2 6,,62,...,<5 of I+c M; in other words, the power function m of the test based on I i> . is a monotonically increasing i=s function of 6 . : i = l,2,...,m. m We now know that as 6 +00 P( I i|j. >c) increases i=s monotonically. We will show that, in fact, as 5 -*• « s m P( I i(j. > c) -*■ 1, and thus, for sufficiently large values i=s . of 6 the probability of rejecting H* : 6 =1 will be arbitrarily close to unity. Let K, (mxm) be such that K1 = diag(ak1,ak2, . . . , ak , 1 , . . . ,1) . Note that K, P (a~2H) P 'K ' W (K, AK,' , h,0), and 1 1 rail K,AK' = diag(a2k26, ,a2k26~,. . . ,a2k26 ,1,...,1), li 3 11 22 s s iii 2 2 so that as a •*■ °° , ch . (K,AK') = a k 6 . -> °° for i = 1,2, . . . ,s. Thus, we need to show that m -2 2-1 P( Z ch^^CK^io H)P'K-[( (b/a^)I) x) > c) + 1 i=s as a -> oo. However, clearly, m -2 2-1 P( E chi(K1P(a ^HJP'K^ ( (b/a )I) ) >c) i=s 2 P(chs(K1P(a"2H)P'K^((b/a2)I)"1) > c) . The result now follows from the following lemma. Lemma 5.3.4; Let V (mxm) and U (mxm) be random matrices independently distributed such that both V and U are positive definite with probability one. Let K, (mxm) = diag (ak, ,ak2, . . . ,ak ,1,...,1). Then P (chg (K-^K 'V_1) > c) + 1 as a ■*■ ». Proof: The proof is identical to that of Lemma 3.5.6 101 5 . 4 The Limiting Distribution of Z \p . m If I ij). is to be used as a test statistic in the test i=s of the hypothesis fAs' : rank (M) < s-1 against h|s) : rank (M) = s, it is necessary to compute the significance level, a, where m . . a = sup P ( I ty. > c|Hi ' ) . H(s) i=s x ' u H0 _2 Let 6, ><$_>... >6 be the latent roots of I + a M, and recall i- z~ - m (s) that the null hypothesis can be written H* : 6 = 1 , or more precisely, H^s) : 6,>6~>...>& ,>1,<5 =6 ,,=...=6 =1. 0 12 s-1 s s+1 m We will write ty. 16 , , &2 • • • • ' 6 ) to indicate that ty . is the i largest root of m roots and depends on the popu- lation roots 5, , <$2 / . . • * <5 • Then we may write a, the signif- icance level, as m <* = sup P( E c) . 61>62> . . .>6g_1>l i=s 1,m L * s"i However, we have seen in the previous section that i> . is ri:m stochastically increasing in each 6.: j = l,2,...,m. It then follows that m a = P( Z ^i.m(00,00, . . . ,°°,1, . . . ,D > c) , i = s where \p . (°° ,°° , . . . ,°° , 1 , . . . , 1) denotes the random variable which has the limiting distribution of i> . (6,, 6 _,..., 6 n, i:m 1 2 ' s-1 1,...,1) as 6. -> °°: j = 1,2 , . . . ,s-l . Hence, we need to deter- m mine the distribution of I \b . f00,00 , . . . ,°° , 1 , . . . , 1) . i:m iii ii i=s 102 Recall that b ~ o y a an<3 H ~ W (a I+M,h,0), and ap m -2 there exists an orthogonal matrix P such that P (I+a M)Pf = A = diag (6, ,6-, . . . , 6 ). If we define B and H as B = (b/a2)A~1, H = A-J5P(a~2H)P'A~35 -h -k -k where A 2 = diag ( " -k. 1 ,u2 '••* ' m i:m(6l'62 !) , then, clearly, H~W (l,h,0) and f , (6, ,6v...,fi ) = ch . ( (a"2H) ( (b/a2) I) X) = ch. (HB 1) . Then if we let B = (b/a2)A , where l n n A = diag (n6, ,n6~ , . . . ,n6 , ,1,...,1), we need to find the n r 1 2 s-1 m __i limiting distribution of Z ch. (HB ) as n -> °° . r .in i=s We will need the following result. Lemma 5.4.1: Suppose v ~ x • Then x (mxl) = — n "c,v/n c-v/n cs-lv/n *1 where c c- , . . . ,c , are constants and x, ( (s-1) xl) is a s-1 — 1 degenerate random vector with all of its probability at 0^. Proof: Clearly x, and v are independent, so the character- istic function of x is 103 E [exp (i x ' t) ] = E [exp ( i x-f t, ) exp (i v E t . ) ] j=s D E[exp(i x{t1) ] E[exp(ivE t.)] j=s : = E[exp(i v E t .)] j=s 3 m _j, = (1 - i 2 E t.) j=s J 3CC Now the characteristic function of x is — n s-1 m E[exp(i x't)] = E[exp(i( E c.t./n + E t.)v] j=l 3 3 j=s 3 s-1 m _, = (1 - i 2( E c.t./n + E t.) ) 2C\ j-i D : j=s y so lim E[exp(i x't)] = (1 - i 2 E t.) 2a = E[exp(i x't)]. n+» " j = s -* The result now follows from the continuity theorem (Lemma 3.6.2) . From Lemma 5.4.1 we observe that B > B with n - 8=fBi ,o,y V(o) B2y where §1 ( (s-l)x (s-1) ) = (0) with probability one, B„ ( (m-s+1) 2 x (m-s+1) ) = bl , and b -» xR- We now need to show that m __! E ch . (HB ) is continuous with probability one under the i=s distribution of (H,B) . Put fEll H12A H = H21 H22' 104 where E^ is (s-l)x(s-l), H12 is (s-1) x (m-s+1) , H21 is (m-s+1) x (s-1) , and H„2 is (m-s+1) x (m-s+1) . Then the roots of interest are the solutions to H 11 H 12 (0) (0) - iC H, H, JO) B. = 0. (5.4.1) '21 "22' vv~' 2 Since H is nonsingular with probability one, we may put H"1 - G . '11 '21 '12 '2 2- so that (5.4.1) can be written f(0) i - i> m r d0) G12 B2 G22 B2 = 0, s-1 (0) " ^G12B2 m-s+1 r 22 2 Hence, it must be true that = 0 Vs+1 " ^G22B2I = °' G22 " ^2| = 0. (5.4.2) Thus, with probability one ch, (HB ),ch2(HB ),..., ch , (HB~ ) are undefined and ch (HB ),ch L, (HB- ),..., s-1 s s+1 ch (HB ) are the solutions to (5.4.2); that is, since B is of rank m-s+1 with probability one, there are only m-s+1 105 solutions to | H— ipB | = 0. Now since H22-H2,H, , H, ~ is non- singular with probability one, it follows that G„~ = (H22-H2,H, , H.. _) , and so (5.4.2) can be rewritten |H22 - H21H-^H12 - *B2| = 0. Clearly B2 is also nonsingular with probability one, and thus, by Lemma 3.6.6, ch . (HB ) is continuous with proba- bility one under the distribution of (H,B) for i = s,s+l, m •A«-l, . . . ,m. This implies that Z ch . (HB ) is also continuous i=s with probability one under the distribution of (H,B) . Note that the set of discontinuity points, R, is closed, since R = {(H,B): |§2| = 0} , and also recall that H22~**21**ll**12 ~ Wm-s+l ^ ,h~s+1'°* ' Therefore, from Lemma 3.6.7, since (H,B ) > (H,B), it follows that for i = s ,s+l , . . . ,m, *i:m(ca'M "'1 !) ^i-s+lzm-s+l*1'1 1} ' where ^i_s+1 .m_s+i t1 /!/••• > 1) denotes the distribution of the i-s+1 largest root of |w-ijjvl| = 0, with 2 W - W . (I,h-s+l,0) and v~Xo/ independently. Now if we let e,>e0>...>9TY, „,, be the solutions to I W- 6 1 1 = 0, then we 1 I m-s+l ' ' can put i>i.m (°°,°°, . . . ,°°,1, . . . ,1) = 6._ +1/v, so that m m-s+l ^ ^i.m(00,00, • . . ,°°,1, . . . ,D = Z 6./v = (trW)/v. i=s " j=l -1 But tr W ~ x i where v = (m-s+l) (h-s+1) , so m 1 *ism{-'- "'1 1} ^|F6- i=s M M 106 (s) (s) Hence, in testing H* : rank (M) < s-1 against H, : rank(M) = s , we choose -r F{v,3,a) as our critical value, where F(v,8,a) is the constant for which P(F(v,B) > F(v,8,a)) = a when F(v,8) ~FR. By so doing we will guarantee m , . sup P( E | F(v,B,a) |H*S') = a. „(s) i=s 1-m X 2 m 6 ° H0 In order to determine the rank of M, we will again use a sequential procedure. To illustrate this procedure, we will return to the example presented in Section 4.2. Recall that D = diag ( 94 . 1065 , 34.8845, 1.01721, .618312), h = 20, and 8 = 420, so that since i> . = hd . / 6 : i = 1,2,3,4, ty1 = 4.4813, it = 1.6612, ip = .04844, and ^^ = .029443. (4) We will first consider testing the hypothesis H^ : (4) rank (M) < 3 against H, : rank (M) =4. We reject the null hypothesis, Hq4), if i> > 17 F(17, 420, .05J/420. Now 17 F(17, 420, .05)/420 is approximately equal to .066 and iK = .029443 <.066, so that we do not reject H(j4) and, 13) instead, consider testing the hypothesis H^ : rank (M) < 2 against h|3): rank (M) = 3. The quantity 36 F(36, 420, .05)/420 is approximately equal to .122, and clearly \ii3 + ip4 = .07788 < .122, so that the null hypothesis, H^3* , is not rejected. Since H.! is not rejected, we next con- (2) sider testing the hypothesis H ' : rank (M) < 1 against h|2) : rank(M) = 2. We find that 57 F(57, 420, .05)/420 is approximately equal to .181, and therefore, since 107 (2) \p + ty- + \i>. = 1.7391 > .181, we reject H* ' and conclude that the rank of M could very reasonably be taken as being two . Note that this sequential procedure is open to the same objections, regarding the use of the significance level, a, at each step, mentioned earlier in Section 3.6. Again, however, it seems unlikely to cause serious error in practice. If the true rank of M is p, then there is a small probability, usually less than a, that the rank, s, determined by the sequential procedure will be greater than p. Also, if 6 is sufficiently large, then the probability ir of s being less than p is also small. BIBLIOGRAPHY Anderson, T. W. (1955) . The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities. Proceedings of the American Mathematical Society 6, 170-176. Anderson, T. W. (1958) . An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, Inc., New York. Anderson, T. W. , and Das Gupta, S. (1964). A monotonicity property of the power functions of some tests of the equality of two covariance matrices. Annals of Mathematical Statistics 35, 1059-1063. Barnard, G. A. (1963) . Some logical aspects of the fiducial argument. Journal of the Royal Statistical Society B 25, 111-114. Bellman, R. (1970) . Introduction to Matrix Analysis. McGraw-Hill Book Company, Inc. , New York. Breiman, L. (1968) . Probability. Addison-Wesley Publish- ing Company, Reading, Massachusetts. Das Gupta, S., Anderson, T. W. , and Mudholkar, G. S. (1964) . Monotonicity of the power functions of some tests of the multivariate linear hypothesis. Annals of Mathematical Statistics 35, 200-205. Graybill, F. A. (1969) . Introduction to Matrices with Applications in Statistics. Wadsworth Publishing Company, Inc., Belmont, California. James, A. T. (1964). Distributions of matrix variates and latent roots derived from normal samples. Annals of Mathematical Statistics 35, 475-501. Kendall, M. G. , and Stuart, A. (1963). The Advanced Theory of Statistics Vol. 1. Hafner Publishing Company, New York. Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley & Sons, Inc., New York. 108 109 Mann, H. B . , and Wald, A. (1943). On stochastic limit and order relationships. Annals of Mathematical Statistics 14, 217-226. Marshall, A. W. , and Olkin I. (1974). Majorization in multivariate distributions. Annals of Statistics 2, 1189-1200. Morrison, D. F. (1976) . Multivariate Statistical Methods. McGraw-Hill Book Company, Inc., New York. Ostrowski, A. M. (1973). Solution of Equations in Euclidean and Banach Spaces. Academic Press, Inc., New York. Pillai, K. C. S. (1965). On the distribution of the largest characteristic root of a matrix in multivariate analysis. Biometrika 52, 405-414. Pillai, K. C. S. (1967). Upper percentage points of the largest root of a matrix in multivariate analysis. Biometrika 54, 189-194. Roy, S. N. (1953). On a heuristic method of test construc- tion and its use in multivariate analysis. Annals of Mathematical Statistics 24, 220-238. BIOGRAPHICAL SKETCH James Robert Schott was born on January 9, 1955, in Cincinnati, Ohio, where he spent the first twenty- two years of his life. Upon graduating from La Salle High School in June, 1973, he attended Xavier University, which is located in Cincinnati, and received the degree of Bachelor of Science with a major in mathematics in June, 1977. In September, 1977, Jim enrolled in the graduate school at the University of Florida and was awarded the degree of Master of Statistics in March, 1979. Since that time he has been working toward the degree of Doctor of Philosophy. While at the University of Florida, Jim has been a recipient of a graduate fellowship and, in addition, he has been employed by the Department of Statistics as a graduate assistant for both teaching and consulting duties. 110 I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. ■ Q -^ V >A,' John\Gl Saw, Chairman Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. Ct^r .A^n/V^K-^ Alan G. Agresti Associate Professor of Statistics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of/^Doctor of Philosophy. £ia Grams r of Pathology This dissertation was submitted to the Graduate Faculty of the Department of Statistics in the College of Liberal Arts and Sciences and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. August 1981 Dean for Graduate Studies and Research UNIVERSITY OF FLORIDA ■■■111 If 3 1262 08553 1803