// 


THE  MULTIVARIATE  ONE-WAY  CLASSIFICATION 
MODEL  WITH  RANDOM  EFFECTS 


BY 
JAMES  ROBERT  SCHOTT 


A  DISSERTATION  PRESENTED  TO  THE  GRADUATE  COUNCIL 

OF  THE  UNIVERSITY  OF  FLORIDA  IN 

PARTIAL  FULFILLMENT  OF  THE  REQUIREMENTS 

FOR  THE  DEGREE  OF  DOCTOR  OF  PHILOSOPHY 


UNIVERSITY  OF  FLORIDA 


1981 


To    Susan 
and 

My    Parents 


ACKNOWLEDGMENTS 

I  would  like  to  express  my  deepest  appreciation  to 
Dr.  John  Saw  for  suggesting  this  topic  to  me  and  for 
constantly  providing  guidance  and  assistance.   He  has  made 
this  project  not  only  a  very  rewarding  educational  expe- 
rience but  also  an  enjoyable  one.   I  also  wish  to  thank 
Dr.  Andre  Khuri,  Dr.  Mark  Yang,  and  Dr.  Dennis  Wackerly  for 
their  willingness  to  provide  help  when  called  upon. 

Finally,  a  special  thanks  goes  to  Mrs.  Edna  Larrick 
who  has  turned  a  somewhat  unreadable  rough  draft  into 
a  nicely  typed  manuscript. 


TABLE  OF  CONTENTS 

Page 

ACKNOWLEDGMENTS  iii 

ABSTRACT vi 

CHAPTER 

1  INTRODUCTION   1 

1.1  The  Random  Effects  Model,  Scalar  Case  .  .  1 

1.2  The  Multivariate  Random  Effects  Model   .  11 

1.3  Notation 14 

2  MAXIMIZATION  OF  THE  LIKELIHOOD  FUNCTION 

FOR  GENERAL  I 17 

2.1  The  Likelihood  Function 17 

2.2  Some  Lemmas 27 

2.3  The  Maximum  Likelihood  Estimates  ....  40 

2.4  The  Likelihood  Ratio  Test 46 

3  PROPERTIES  OF  THE  st      LARGEST  ROOT  TEST  ...  51 

3.1  Introduction 51 

3.2  The  Uniformly  Most  Pov/erful  Test 

for  m  =  1 51 

3.3  An  Invariance  Property 55 

3.4  The  Union-Intersection  Principle  ....  56 

3.5  A  Monotonicity  Property  of  the 

Power  Function 6  0 

3.6  The  Limiting  Distribution  of  (J)    ....  66 

4  MAXIMIZATION  OF  THE  LIKELIHOOD  FUNCTION 

WHEN  I  =  a2I 75 

4.1  The  Likelihood  Function 75 

4.2  The  Maximum  Likelihood  Estimates  ....  79 

4.3  The  Likelihood  Ratio  Test 88 

5  AN  ALTERNATIVE  TEST  WHEN  £  =  a    I 

AND  ITS  PROPERTIES 94 

5.1  Introduction 94 

5.2  An  Invariance  Property 95 


TABLE  OF  CONTENTS  (Continued) 
CHAPTER  Page 

5       (Continued) 

5.3  A  Monotonicity  Property  of  the 

Power  Function 96 

m 

5.4  The  Limiting  Distribution  of   I  ij).   .  .   101 

i=s 

BIBLIOGRAPHY   108 

BIOGRAPHICAL  SKETCH  110 


Abstract  of  Dissertation  Presented  to  the  Graduate  Council 

of  the  University  of  Florida  in  Partial  Fulfillment  of  the 

Requirements  for  the  Degree  of  Doctor  of  Philosophy 


THE  MULTIVARIATE  ONE-WAY  CLASSIFICATION 
MODEL  WITH  RANDOM  EFFECTS 

By 

James  Robert  Schott 

August  1981 

Chairman:   Dr.  John  G.  Saw 
Major  Department:   Statistics 

A  well-known  model  in  univariate  statistical  analysis 
is  the  one-way  random  effects  model.   In  this  paper  we 
investigate  the  multivariate  generalization  of  this  model, 
that  is,  the  multivariate  one-way  random  effects  model. 

Two  specific  situations,  regarding  the  structure  of 
the  variance-covariance  matrix  of  the  random  error  vectors, 
are  considered.   In  the  first  and  most  general  case,  it  is 
only  assumed  that  this  variance-covariance  matrix  is  sym- 
metric and  positive  definite.   In  the  second  case,  it  is 
assumed,  in  addition,  that  the  variance-covariance  matrix 
is  a  scalar  multiple  of  the  identity  matrix. 

Maximum  likelihood  estimates  are  obtained  and  the 
likelihood  ratio  test  for  a  hypothesis  test  on  the  rank 
of  the  variance-covariance  matrix  of  the  random  effect 
vectors  is  derived.   Properties  of  the  likelihood  ratio 


test  are  investigated  for  the  general  case,  while  for  the 
second  case  an  alternative  test  is  developed  and  its  prop- 
erties are  investigated.  In  each  case  a  sequential  proce- 
dure for  determining  the  rank  of  the  variance-covariance 
matrix  of  the  random  effect  vectors  is  presented. 


CHAPTER  1 
INTRODUCTION 

1. 1   The  Random  Effects  Model,  Scalar  Case 

Suppose  a  physician  is  considering  administering  some 
particular  blood  test  to  his  patients  as  a  part  of  their 
physical  examination.   He  suspects  that  the  test  results 
vary  with  the  presence  and  severity  of  a  particular  path- 
ological condition.   In  order  to  examine  variability  in 
the  results  of  the  blood  test,  the  physician  chooses  to 
administer  the  blood  test  n  times  to  each  of  g  patients. 
This  results  in  the  observations  x. .:  i  =  l,2,...,g; 
j  =  1,2, . . . ,n. 

A  suitable  model  to  explain  the  different  values  of 
x..:  i  =  1,2,. . . ,g;  j  =  1,2, ... ,n  would  be 

xij  =  y  +  ai  +  2ij-  d.1.1) 

Here  u  is  an  overall  mean,  a.  is  an  effect  due  to  the  i 
patient,  and  z. .  represents  a  random  error  due  to  the 
measuring  process.   We  assume  that  z..:  i  =  l,2,...,g; 

j  =  l,2,...,n  are  independent  and  have  a  normal  distribu- 

2 
tion  with  mean  zero  and  variance  a    . 

z 


If  the  physician  is  interested  in  using  the  blood  test 
as  a  diagnostic  tool,  he  will  certainly  be  interested  to  know 
whether  a  major  source  of  variation  in  the  results  of  the 
blood  test  is  due  to  variation  between  the  patients.   Since 
the  physician  will  administer  the  test  to  an  unlimited  num- 
ber of  patients  in  the  future,  we  should  properly  regard  the 
g  patients  involved  as  a  sample  from  the  entire  population 
of  patients.   The  patient  effects,  a.:  i  =  1,2,. . . ,g,  now 
have  the  role  of  random  variables,  and  (1.1.1)  is  a  random 
effects  model.   Again  we  assume  that  a.:  i  =  1,2,. ..,g  are 

independent  and  have  a  normal  distribution  with  mean  zero 

2 
and  variance  a  .   Thus,  from  our  model  (1.1.1)  we  deduce  that 

a 

x.  .  has  a  normal  distribution  with  mean  \i    and  variance 

2   L   2 

a   +  a  . 

a     z 

The  variation  in  the  results  of  the  blood  test  is 

2      2 
governed  by  a   +  a  .   The  portion  of  this  attributable  to 

2    2   2 
the  patients  is,  of  course,  a    / (°a+0z)  >    and  the  physician 

would  like  to  know  whether  this  or,  correspondingly, 

9   ?  2   2 

a    /a      is  sizeable.   If  a  /a   is  sufficiently  large,  he  would 

a   z  a   z 

choose  to  investigate  the  possible  use  of  this  test  as  a 
means  of  detecting  the  pathological  condition;  otherwise 
he  would  find  the  blood  test  essentially  useless  as  a  diag- 
nostic tool.   Hence,  the  physician  might  be  interested  in 

2 
testing  the  hypothesis  HQ :  a   =0   against  the  hypothesis 

H,  :  a2  >  0. 

1    a 


In  order  to  derive  the  likelihood  ratio  test  for  testing 

the  hypothesis  H   against  H-,  ,  we  first  need  to  obtain  the 

2   2 

likelihood  function  of  (u,a  , a   ).   This  is  most  easily  done 

z   a 

by  making  a  transformation.   Let  C  be  an  orthogonal  matrix, 
with  the  element  in  the  i    row  and  the  j    column  denoted  by 
c  •  ,  such  that  c,  .  =  l//n  :  j  =  1,2, ...  ,n.   Since  C  is 

orthogonal , 

n        __   n 

I   c,  .=/n   E   c,  .  cv.  =  0     for  k  =  2 ,3 , .  .  .  ,n. 

■i=l    ^      -i=l    ^    3 

D  x         D  -1  (1.1.2) 

Consider  the  orthogonal  transformation 

^il^i2 W  =C(xil'Xi2 xin)'-      (1-1'3) 

Upon  replacing  x . .  by  the  right  side  of  (1.1.1)  and  using 

(1.1.2),  we  observe  that 

n       _     __  _ 
il    k=1   IK  i. 

n 
Yij  =  kl±   CJk  Zik       for  j  =  2'3'""n' 

n 

where  x.   =  Z      x.  /n.   Thus, 
i.    k=1   IK 

Cov(yi.,yik)  =  0       for  j  #  k, 

2 
v(Y,-i)  =  a  f°r  3  =  2,3,. ..,n, 

and  {x.  ,y.  -  ,y .-,,...  ,y .     }.     ,   is  a  set  of  gn  mutually 
l.  Ji2  Ji3     ,Jm  i=l  3         J 

independent  random  variables,  where  x.   has  a  normal  dis- 

2       2 

tribution  with  mean  u  and  variance  a  /n  +  a  ,  and  y. . 

z       a       J  l  j 


has  a  normal  distribution  with  mean  zero  and  variance  a 

z 

n  n 

2     "    2 


Note  also  from  (1.1.3)  that   Z  (x..-x.  )   =  I      y  . 

j=l   1D   X-      j  =  2   ^ 


and 


denote  this  quantity  by  u . .   We  can  now  write  the  joint 

density  function  of  y  .  „  ,y  .-,....  ,y  .   as 
J  J i2  J i3 '    ' 1 in 

n 

f^i2^i3 ^in;az2)  -  .  H  (2^)  "^  expf-y2. /2a2] 

=  (2.0*)-"*"-1'  exp[-  ly2.,2o2] 
z  .  „n    z 

3=2   - 

i  ?   2    2.      ,     2, 

=  g(.^/ij;az)  =  9<W' 

so  that  from  the  set  {x.  ,y . „ ,y ...... ,y .  },  (x.  ,u.) 

l.   i2  Ji3      ■'in      i.   i 

2 
is  sufficient  for  (u,a  ).   Thus,  we  may  assume  that  we  have, 

—  2 

independently,  u.  and  x.   for  i  =  l,2,...,n,  where  u./a 
11.  l   z 

has  a  chi-square  distribution  with  v  =  n-1  degrees  of 

freedom,  and  x.   has  a  normal  distribution  with  mean  y  and 

2       2 

variance  a    /n  +  a  .   Note  that  with 
z       a 

g  _ 

x    =   Z  x.  /g, 
i=l  1* 

g   -      2     g   -    -    2      -      2 
I     (x   -y)   =   Z  (x   -x   )z  +  g(x   -y)  . 

i=l    "        i=l    ' 

2     2     2 

Then  putting  a   =  a  /n+a  ,  we  can  write  the  joint  density 

function  of  xn  ,x„  , . . . ,x    as 

1 .  Z .       g. 


—        —  —  2  2    —3?  —  2         2 

f(x,     ,x      ,  ...,x      ;y,a    )    =      II    (2iTa    )     2exp[-(x      -y)    /2a    ] 

i.     ^.  g.  i=1  i- 

=    (2Tra2)_35g   exp[-    I     (x .    -y)2/2a2] 
i=l      1- 

=    (2vo2)~hg  exp[-(    E    (x      -x      )2   +g (x      -y)2)/2a2] 
i=1      x.       .. 

=   g(x       ,v;y,a    )  , 

g     -        -        2 
where        v   =  n      E     (x .    -x      )     .      Hence,    from  the   set 

i=l   X*   '• 

—   —        —      —  2       2 

{x   ,x2  , . . .  ,x   },  (x   ,v)  is  sufficient  for  (y,a  /n  +  a  ). 

Also,  if  we  let  c  denote  a  constant,  we  can  write  the  joint 


density  function  of  u, ,u-,. .. ,u  as 

2      g 
f  (un  ,u0, .  .  .  ,u^;a  )  =   II   c  exp  (-u./2ai)u.2V  ± /  (a^) 

'        i=l 


2,  %v-l  .  ,  2,33V 


=  (a2)^gv  exp(-   E   u./2a2)  E  c  u^"1 
i=l   1    z  i-1    L 

2 

=  g(u;az)h(u1,u2, . . . ,u  ) , 

g 

where  u  =   E  u..   Thus,  from  the  set  {u, ,u9,...,u  } ,  u  is 
i=1  1  L      z  q 

2 
sufficient  for  a  . 
z 

We  may  now  assume  that  we  have,  independently,  x   , 

u,  and  v,  where  x   has  a  normal  distribution  with  mean 

2    2  2 

y  and  variance  (a  +no  )  /gn;  u/a   has  a  chi-sguare  distri- 

2    2 

bution  with  e  =  g(n-l)  degrees  of  freedom,  and  v/(a  +naa) 

has  a  chi-square  distribution  with  h  =  g-1  degrees  of 

2   2 
freedom.   The  likelihood  function  of  lu.a    ,0    )    can  be 

z   a 

expressed  as 


_                        exp[-(x      -y)    gn/2  (az+naa)  ]    u2        exp[-u/2oz] 
f(x      ,u,v)    = "-J- 2  pj £- 


(2tt  (o^+nap  /gn)  2  (2a^)  2=r  fte) 


^h-1     r   ,_,  .  2  ,   2.  , 
v     exp[-v/2(a  +na  )] 


(2(o2+no*))hhT(hh) 

z   a 


2   2     2 

Let  the  set  u>   =    {  (o    ,o    )  :    o      =  0}  and  the  set 
a   z     a 

2      2  2 

fi   =    {(a    ,a    )   :  a      >    0}.      We   seek   the  maximum   likelihood 
or     z  a 

2  2 

estimators,  u   and  d      ,    of  y  and  a   when  the  parameters 

'   M(x)         ZO)        P         Z  r 

are  restricted  to  lie  within  to  ,  and  the  maximum  likelihood 

a  2         -  2  2        2 

estimators,  y„  ,  o^,       and  a^,   of  y,  a  ,  and  a& 

when  the  parameters  are  restricted  to  lie  within  £2. 

In  to  a   =0  and 

a 

_  exp[-(x   -y)  gn/2a  ]  u     exp[-u/2a  ] 

fz                        \  •  •  z  z 

(x       ,u,v)    =  


(2vo2Jqn)h  (2aJ)W(%e) 


Jjh-1  r        /o~2i 

v  exp[-v/2a    ] 


(2a2z)HhT  (hh) 


so  that  the  logarithm  of  the  likelihood  function,  omitting 
a  function  of  the  observations,  is 

(X-'"U)2gn    u+v    <e+h+1>  in    o\  (1.1,4) 


o„2        o„2       2         z 

2a         2a 

z  z 

2 

Differentiating  (1.1.4)  with  respect  to  y  and  a    ,  we 

obtain,  respectively,  the  equations 

(x..-y)gn  (x   -y)  gn+u+v    -+h+1 

2        U'  _.  2.2         ,  2 

a2  2(az)         2az 


which  yield  the  maximum  likelihood  solutions 

y   =  x   , 

a2   =    (u+v) / (e+h+1) . 

zoo 

In  Q   the  logarithm  of  the  likelihood  function,  omitting 
a  function  of  the  observations,  is 

(^.._y)2gn     u     e  0         2  v        (h+1)  „  ,    2M      2, 

"   -  -~    £n  a  ~ ^—  -    0   £n  (a  +na  )  . 


0/  2    2.     -2    2      z    0/  2^   2,      2     l  z    a 

2  (a  +na      2a  2 (a  +no  ) 

z    a       z  z    a 

(1.1.5) 

2        2 

Differentiation  of  (1.1.5)  with  respect  to  y ,  a  ,  and  a 

yields,  respectively, 
(x   -y)gn 


/  2^   2, 

(o  +na  ) 
z    a 


=  0, 


n((x_-u)2gn+v)  n(h+1)    _ 

2    ?  ?    ~     7         2  ' 

2  (a  +na  )  2(a  +na  ) 

z    a  z    a 

(x_-y)2gn+v  u       e      (h+1)    =  q 


2(a2+no2)2     2  (a2)2    2a2    2  (a2+na2) 
z   a  z        z       z   a 

2        2 

Solving  these  equations  for  \i ,    o    ,    and  a    ,    we  obtain  the 

~  ~2 
maximal  solution  of  the  likelihood  function  in  fl,  (y/O  , 

aa) ,  given  by 

y  =  x   , 

.2      , 

a  =  u/e  =  u+ , 
z  * 

a2  =  (v/(h+l)-u/e). /n  =  (vruj/n, 


where  u*  =  u/e  and  v*  =  v/ (h+1) .   Since  we  insist  that 
a  „  be  greater  than  or  equal  to  zero,  the  solution  above 
is  the  maximum  likelihood  solution  only  if  v*-u*  >  0. 
Suppose,  however,  that  v*  <  u*.   Clearly  (1.1.5)  is  still 
maximized  when  y  =  x   ,  so  that  we  need  to  minimize 

^r  +  e  In   a2    +  — ,  V  ,   +  (h+1)  k(aW) 

a2         z    (a2+na2)  Z 

z  z   a 

2  2 

subject  to  the  contraints  a      >  0  and  a   >  0.   Equivalently , 

we  consider  the  problem  of  minimizing 

ip(x,t)  =  u/x  +  e  £n  x  +  v/t  +  (h+1)  £n  t 
subject  to  the  constraint  t>x>0  .   For  fixed  x  i|j(x,t)  is 
concave  upward  in  t  with  its  absolute  minimum  at  t  =  v^. 
For  each  x   i>(x,t)    is,  therefore,  minimized  with  respect  to 
t  >  x  when 

v*     if  v*  >  x, 


'  ,x     if  Vj,  <  x, 

Thus,  ijj(x,t)  is  minimized  over  {(t,x):  t  >  x  >  0}  by 

setting 

t  =  v*  and  x  =  u^         if   v*  >  u^ , 
t  =  x  =  (u+v) / (e+h+1)      if   v*  <  uA. 

Hence,  for  the  maximum  likelihood  estimators  when  the 

parameters  are  restricted  to  be  within  Q,    we  obtain 


2 


9. .«  =  <v*-u*)  /n' 


if  v*  >  u* ,  and 

a2n   =    (u+v) /(e+h+1) , 

a2n  =  0, 
if  v*  <  u*. 

Substituting  the  maximum  likelihood  estimates  into 

the  likelihood  function,  we  see  that  in  oo 

u^e-l  vHh-l   exp[_ (e+h+1) /2] 

»8*  f(«..^v)  -  ———^-  ,%(e+h+l,       %  2^  (e+h+1)' 


'e+h+1' 


and  in  fi 


u^e-lv%h-l  exp[_(e+h+1)/2] 


max  f  (x   ,  u,v)= 


nwrwuH,^?^11,^*^^! 


max  f (x   ,u,v) 


if  v*  >  u*, 
if  v+  <  u+. 


The  likelihood  ratio,  X,    is 


^(u/e)^(v/(h+l))^h+1) 


A  = 


f (x   ,u,v)    / 


[ (u+v) /(e+h+1)] 


h (e+h+1) 


max  f (x   ,u,v) 


if  v+  >  ui( 


if  vt  <  u; 


Now  putting  w  =  u/ (u+v)  and  noting  that 

v*  i  u*  ,  if  and  only  if  (iff) 


h+1  "  e  ' 


u+v  >   ,1  ,   1  . 
-  u(-  +  rr-rr)  , 


h+1 


'e    h+1' 


iff 
iff 


e    >  JSL- 
e+h+1    u+v 


=  w, 


10 


we  can  rewrite  the  likelihood  ratio 


r(e+h+l)*(e+h+1)  hen       ^(h+1)     .-    , 

A— Wv^n  v    W    (1-W)  if  W  < 


A  = 


■ 


ete(h+1)%(h+l)   -   «—  "  -  *  e+h+l 


1  if  w  > 


v_ 


e+h+l 


Since  A  is  an  increasing  function  of  w,  and  H   is  rejected 

for  small  values  of  A,  it  follows  that  H   is  rejected  for 

small  values  of  w  or  large  values  of  1/w.   Now 

l=H±Z=1  +  v  =  1  +  h(Wh 
w    u        u       e  u/e 

so  the  likelihood  ratio  test  rejects  H   for  ev/hu  large. 

2 
Recall  that  u/o   has  a  chi-square  distribution  with 

2    2 

e  degrees  of  freedom,  and  v/(a  +na  )  has  a  chi-square  dis- 
tribution with  h  degrees  of  freedom,  independent  of  u. 

2       2    2 

Hence,  the  quantity  a   ev/ (a  +na  ) hu  has  an  F  distribution 

z       z    a 

with  h  and  e  degrees  of  freedom.   If  we  let  F(h,e,a)  denote 
the  constant  for  which  P(F(h,e)  >  F(h,e,a))  =  a  where  F(h,e) 
has  an  F  distribution  with  h  and  e  degrees  of  freedom,  then 

we  will  reject  H   if  ev/hu  >  F(h,e,a).   The  power  function 

2   2 
of  this  test  is  a  function  of  9  =  a  /a   and  is  given  by 

6(6)  =  P(F(h,e)  >  F(h,e,a) /(l+n6)  )  . 

Although  the  analysis  which  we  have  just  outlined  is, 
by  now,  quite  standard  to  any  graduate  level  course  in  design 
and  analysis,  we  have  reproduced  it  since  it  motivates  the 
more  general  problem  to  be  described  in  the  next  section. 


11 


Indeed  the  situation  we  wish  to  consider  contains  the  one- 
way random  effects  model  as  a  special  case  to  which  we  can 
return  on  occasion  to  check  our  work. 

1. 2   The  Multivariate  Random  Effects  Model 

Suppose  a  physician  is  considering  administering  a  bat- 
tery of  m  distinct  types  of  blood  tests  to  his  patients  as 
a  part  of  their  physical  examination.   He  believes  that, 
based  on  the  results  of  these  tests,  he  may  be  able  to  detect 
any  one  of  several  particular  pathological  conditions.   In 
order  to  examine  variability  in  the  results  of  the  blood 
tests,  the  physician  chooses  to  administer  the  battery  of 
blood  tests  n  times  to  each  of  g  patients.   This   results  in 
the  observations  x . . (mxl) :  i  =  1,2,. . . ,g;  j  =  1,2,. . . ,n. 

A  suitable  model  to  explain  the  different  values  of 
x..:  i  =  1,2,. ..,g;  j  =  1,2,. ..,n  would  be 

2£ij  =  H  +  «i  +  lij-  (1.2.1) 

Here  jj(mxl)  is  an  overall  mean,  a.  (mxl)  is  an  effect  due  to 
the  i    patient,  and  z.  . (mxl)  represents  a  vector  of  random 
errors  due  to  the  measuring  process.   We  assume  that 
z...:  i  =  l,2,...,g;  j  =  1,2,..  .  ,n  are  independent  and  have  an 
m-variate  normal  distribution  with  mean  _0  and  variance-covar- 
iance  matrix  E . 

Since  the  physician  will  administer  the  tests  to  an 
unlimited  number  of  patients  in  the  future,  we  should  prop- 
erly regard  the  g  patients  involved  as  a  sample  from  the 


12 


entire  population  of  patients.   The  patient  effects, 

a.:  i  =  1,2,. ..,g,  now  have  the  role  of  random  vectors,  and 

(1.2.1)  is  a  multivariate  random  effects  model.   We  will 

assume  that  a-:    i  =  1,2, ... ,q   are  independent  and  have  an 

m-variate  normal  distribution  with  mean  0^  and  variance- 

covariance  matrix  A.   Hence,  from  our  model  (1.2.1)  we  see 

that  x. .  has  an  m-variate  normal  distribution  with  mean  u 
-ID 

and  variance-covariance  matrix  A  +  I . 

While  there  are  m  different  blood  tests,  it  is  believed 
that  there  are  some  groups  of  tests  for  which  the  tests  within 
a  group  vary  quite  strongly  together.   In  other  words,  the 
data  from  some  of  the  tests  are  highly  correlated.   For  this 
reason  the  number  of  sources  of  variation  between  the  patients, 
which  we  will  denote  by  p,  may  be  less  than  the  number  of  tests, 
m.   That  is,  the  rank  of  the  variance-covariance  matrix  A  is 
p  where  p  <  m.   Since  A  is  symmetric,  nonnegative  definite, 
and  of  rank  p,  there  exists  a  matrix  L (mxp)  such  that  A  =  LL ' . 
Clearly  L  is  not  unique  since  if  A  =  LL '  and  P(pxP)  is  such 
that  PP  '  =  I,  then  A  =  L*L*  where  L*  =  LP.   This  enables  us  to 
rewrite  (1.2.1)  as 

x. .  =  y  +  Lf .  +  z. . ,  (1.2.2) 

-in    -    -x   -i] 

where  f. (pxl) :  i  =  1,2,..., g  are  independently  distributed, 
having  a  p-variate  normal  distribution  with  mean  _0  and 
variance-covariance  matrix  equal  to  the  identity  matrix. 


13 


If  the  physician  is  interested  in  using  the  blood  tests 
as  a  diagnostic  tool,  he  will  certainly  be  interested  in  deter- 
mining the  value  p,  since  the  p  sources  of  variation  may 
correspond  to  p  different  pathological   disorders.   So  of 

particular  interest  to  the  physician  is  a  test  of  the  hypoth- 

(s) 

esis  H    :  the  rank  of  the  matrix  LL '  <  s-1  against  the 
o  r 

(s) 

hypothesis  H,   :  the  rank  of  the  matrix  LL '  =  s.   With  such  a 

test  procedure   he  could  develop  a  sequential  test  procedure 

for  determining  the  rank  of  LL ' .   He  would  first  test  H 

o 

against  H,  ,  and  if  he  rejects  H  ,  he  would  stop  and  take 
the  rank  of  LL '  to  be  m;  otherwise,  he  would  proceed  to  test 

H      against  H,     .   The  procedure  continues  until  either 

(s) 

some  hypothesis  H     is  rejected,  in  which  case  he  then  takes 

the  rank  of  LL '  to  be  s,  or  the  hypothesis  H  is  accepted, 
in  which  case  he  would  conclude  that  there  is  no  significant 
variation  between  patients. 

In  this  paper  we  investigate  the  multivariate  one-way 
classification  model  with  random  effects,  given  by  (1.2.2). 
Two  specific  cases,  regarding  the  structure  of  the  variance- 
covariance  matrix  Z,  will  be  considered.   In  the  first  and 
most  general  case  we  will  assume  no  more  than  that  I  is  sym- 
metric and  positive  definite.   In  the  second  case  we  will 
assume  that  the  vector  of  random  errors,  z. .,  is  such  that 
its  components  are  independent  and  have  the  same  variance. 
That  is,  we  assume  that  £  is  equal  to  some  constant  multiple 
of  the  identity  matrix.   In  each  case  we  develop  a  test 


14 


(s) 

procedure  for  testing  the  hypothesis  H    :  the  rank  of 

(s) 

LL '  <  s-1  against  the  hypothesis  H,   :  the  rank  of  LL '  =  s. 

In  addition,  we  investigate  some  of  the  properties  of  these 
test  procedures  and  present  a  numerical  example  to  illustrate 
the  use  of  these  procedures. 

1. 3   Notation 

The  following  notation  will  be  used  whenever  convenient: 
Notation  Interpretation 

(A) .  row  i  of  the  matrix  A 

(A)  column  j  of  the  matrix  A 

(A) . .  the  element  in  row  i  and  column  j 

of  the  matrix  A 
a.  .  the  element  in  row  i  and  column  j 

of  the  matrix  A 
A  the  inverse  of  the  matrix  A 

A '  the  transpose  of  the  matrix  A 

|a|  the  determinant  of  the  matrix  A 

tr  A  the  trace  of  the  matrix  A 

dg  (A)  the  diagonal  matrix  which  has  as  its 

diagonal  elements  the  diagonal 
elements  of  A 
diag (a, ,a2 , . . . ,a  )   the  diagonal  matrix  which  has  a,, 

ch. (A)  the  i    largest  latent  root  of  the 

matrix  A 


15 


Notation  Interpretation 

rank  (A)  the  rank  of  the  matrix  A 

I  the  m  x  m  identity  matrix 

m  J 

I  the  identity  matrix  (used  when  the 

order  of  the  matrix  is  obvious) 
(0)  the  matrix  which  has  all  of  its 

elements  equal  to  zero 
x  a  vector 

x.  the  i    element  of  the  vector  x 

_0  the  vector  which  has  all  of  its 

elements  equal  to  zero 
E  (x)  the  expected  value  of  x 

V(x)  the  variance  of  x 

Cov(x,y)  the  covariance  of  x  and  y 

P  (A)  the  probability  of  event  A 

P(A|B)  the  probability  of  event  A  given 

event  B 

r  (x)  the  gamma  function 

x  >  x  x   converges  to  x  in  distribution 

n  n        ^ 

a  >  a  convergence  of  a  sequence  of  constants 

exp(x)  Euler's  constant,  "e,"  raised  to  the 

x  power 
£  is  contained  in 

is  distributed  as 

2 
N(y,o  )  the  normal  distribution  with  mean 

u  and  variance  a 


16 


Notation  Interpretation 

N  (y,E)  the  m-variate  normal  distribution  with 

m  — 

mean  u_  and  variance-covariance  matrix  Z 

2 
X  the  central  chi-square  distribution  with 

v  degrees  of  freedom 

Vl 
F  the  central  F  distribution  with  v.  numer- 


V2 


1 

ator  degrees  of  freedom  and  v_  denomina- 
tor degrees  of  freedom 

W  (Z.v.O)  the  central  Wishart  distribution  with 

m 

variance-covariance  matrix  E  and  degrees 

of  freedom  v 
Jones  [1973]         the  reference  authored  by  Jones  and 

published  in  1973 
Jones  [1973:1]       page  1  of  the  reference  authored  by  Jones 

and  published  in  1973 


CHAPTER  2 


MAXIMIZATION  OF  THE  LIKELIHOOD  FUNCTION 
FOR  GENERAL  E 


2. 1   The  Likelihood  Function 

Suppose  the  vectors  x. .  (mxl)  :  i  =  1,2,... ,g;  j  =  1,2, 
. . . ,n  can  be  modeled  by 

2ij  =  )L   +  Lli  +  Zj.j'  (2.1.1) 

wherein  y(mxl)  is  a  fixed  but  unknown  vector,  L  (mxp)  is 
a  fixed  but  unknown  matrix,  f.  -  N  (0,1):  i  =  1,  2,...,g, 
and  z. .  ~  N  (0,£)  :  i  =  1,2, . . . ,g;  j  =  1,2, ... ,n.   We  assume 
that  the  set  of  random  vectors  {  f_,  ,  f  _  ,  .  .  .  ,  f  , _z^  ,  .  .  .  , _z   } 
are  mutually  independent.   Thus,  x. .  ~  N  (y,V)  with  V  = 
LL '  +  Z.   However,  for  any  orthogonal  matrix  P (pxp) ,  V  = 
LL '  +  Z  =  LP  (LP)  '  +  Z    so  that  L  is  not  unique  whereas  LL ' 
is  unique.   The  purpose  of  this  section  is  to  derive  the  like- 
lihood function  for  y_,  LL '  ,  and  E.   Although  x.  .  and  x^  are 
independent  for  all  (j,£)  when  i  #  k,  x. .  and  x.„  are  not 
independent  even  when  j  ^  £ ,  since  Cov (x . . ,x. . )  =  LL'(Jt^). 
Thus,  the  likelihood  function  is  not  simply  the  product  of 
the  density  functions  of  the  x. .'s.   A  transformation  of  the 

x.  .'s  will  expedite  the  derivation  of  the  likelihood  function. 
-ID 

17 


18 


Consider  the  Helmert  transformation  (see,  for  example, 
Kendall  and  Stuart  [1963:250])  given  below: 

£il  =  ^i.  +  (2*1)_^il  +  (3*2)"^i2  +  *"  +  <n<n-1))_^iv' 
^i2  =  Hi.  "  (2,1)~^il  +  (3*2)"^i2  +  •**  +  (n(n-l))"^iv, 
xi3  =  x±  -  2(3-2)~^i2  +  •'•  +  (n(n-l))"2Yiv, 


x.   =  x.  -  (n-1)  (n(n-l))  V  , 

—in   — l.  1V 

where  v  =  n-1.   It  will  be  helpful  to  note  that  the  above 
equations  imply  the  following: 

*im    =   n~\i    +   n~\2    +  '••  +   n"  -in' 

*il    =    2~%1    ~    2~%2' 

Yi2    =    (3-2)_Jsxi]L    +    (3.2)_isxi2    -    2(3-2)_J5xi3, 


y.   =  (n(n-l))  \ .  .    +    •••  +  (n(n-l))  \.    n  ,  -  (n-1)  (n  (n-1)  ) "  2x .  , 
In  matrix  formulation  we  have 

<*ii *in>'  =  H(Hi.'*ii ^iv} '' 

and  we  note  that,  while  not  an  orthogonal  matrix,  the  columns 

of  H  are  orthogonal.   The  matrix  H  fails  to  be  orthogonal 

since  H'H  =  diag (n, 1 , 1 , . . . , 1) .   Observe  that,  upon  replacing 

x .  .  by  the  right  side  of  (2.1.1),  we  have 
-ID 


19 


X.  =  y  +  Lf .  +  z .   , 
-l.    -     -1    — l. ' 

*il  =  (2)"i5(zil-li2), 

Yi2  =  (3.2)^(2.^2.2-22.3), 


v.  =    {n(n-l)}  2(z. ,+2. ,+. • .+z.    ,-(n-l)z.  ). 
iv  — il  — i2     — i,n-l      —in 

Thus 

E^i.Zij)  =  (0)  , 

E(Xi;j2iq)  =  (0)   if    j  #  q' 
E  (y.  .y.'  .)  =  Z. 

Hence,  it  follows  that  {xi  ,y_i;L '  •  •  •  'Xiv^=l  are  a  set  of 

gn  mutually  independent  vectors  with  x.   ~  N  (y ,  (1/n) Z+LL  ' ) : 

i  =  1,2,. ..,g  and  £„  -  Nm(0,Z):  i  =  1,2,. ..,g;  j  =  1,2,. ..,n. 

n      _        __       v 
Note  also  that   Z  (x.  .-x.  )  (x.  .-x.  )'=  Z      ¥_■■%!■    and  denote 
j=l    J        1J   x"     j=l   XJ  1-) 

this  matrix  by  E..   We  can  now  write  the  joint  density  function 
of  y .,,... ,y.   as 

f&il'..-rZiv'-Z)  =  ,n  |2ttZ|  Sexp[-^ij2   Zij] 


-%v   _r  ,.  :  ,_  ,  .-1 


=    |2ttZ|         expf-ij    Z     (vf.  Z      y,,)] 
j=l      i:  1J 

1,  v  , 


exp[-3s    Z    tr(W.Z      v..)] 
j=l  XJ  x:i 


20 


=    |2ttZ|~2V   exp[-h      Z      tr(E        v..    y.'.)] 

=    |2ttZ|~2V   exp[-Jj    tr    Z-1    E.] 
=    g(E.;Z) 

so   that   from  the    set    {x.     ,y . , ,y .„,... ,y .    },     (E.,x.    )    is 
sufficient. 

Thus,    we  may  assume   that  we   have,    independently, 


E.  ~  W  (Z,  V,  0) 
l    m 


x.  -    N  (y,  "  Z  +  LL') 
— i  .   m  —  n 


1  <  i  *  g. 


Note  that 

g_  _  g    _     _       _     _ 

E  <Xi  "ii)  (2£i  -u)'  =   z  (2£i  -2   )  (^i  -x   >  '+g(*  -Ji)  (2£  -£)  '/ 

i=l   "     x  *       i=l    '        ... 

g  _ 

where  x       Z   x.  /g.   Then  putting  W  =  (l/n)Z  +  LL '  , 
i=l  :Lm 

we  can  write  the  joint  density  function  of  xn  ,x„  , . . . ,x    as 
J  — 1.  —2 .      g. 

f(xx  ,x2  ,...,1   ?JJ,W)  =   JI  \2-nW\~iiexp[-h(xi    -yJ'W-1^  -y )  ] 
g.         ^_^ 

,_j,0  g  _-,  _ 

=    |  2ttW|      2yexp[-^    Z    tr  (  (x.    -y)'  W       (x.  -  y)  )  ] 
i=l  1-  1' 


=    |  2ttW  I     2yexp[-^   Z    tr(W      (x.    -y)  (x.    -y)'M 
i=l  1*  x- 


,-^a  -1     ®       —  — 

=    |2ttW|     2yexp[-Js   tr(W        Z    (x.       y)(x.    -y)'  )  ] 

i=l     x*  1- 


21 


=  |2ttW|   2yexp[-lsg  tr  W   (x   -y)  (x   -y)' 


-h   tr(W  X   E  (x.  -x   )  (x.  -x   )'  )] 
i=1   l.   ..    1.   .. 


2ttW|  2yexp[-^g  tr  W   (x   -y)  (x   -y)'  ] 


_1  g     _   _    _   _ 
x  exp[-h   tr  W     E  (x.  ~x   )  (x .  -x   }'  ] 
i=l   x*   •  *    1* 

=  g(x   ,H;jj  ,W)  , 

g    _     _       _     _ 

where   H  =  n   E  (x .  -x   ) (x .  -x   )'  .   Hence,  from  the  set 


{  x,  ,  .  .  .  ,x   }  ,  (x   ,H)  is  sufficient  for  (y ,  (1/n)  E  +  LI/  )  . 

Also  if  we  let  c  denote  a  constant,  we  can  write  the  joint 

density  function  of  E,,...,E   as 

*  1'      g 

f(E1 >Eq;Z)    =   c  n  |Ei|^(v"m"1)exp[-32  trfE^E..)] 

g        i=l 

-  c  expf-J^  tr(E_1  E  E.)]  II  |  E .  |  ^  (V"m_1) 
i=l  1   i=l   x 

=  g1(E;Z)g2(E1,E2, . . . ,E  ) , 

g 

where  E  =   E   E..   Thus,  from  the  set  {E, , . . . ,E  } ,  E  is 
i  =  l  ^ 

sufficient  for  E. 

Then  we  may  assume  that  we  have,  independently, 

^..  ~  Nm(^'gl{(E+nLL'  M' 

E  - W  (Z,e,0)  , 
m 

H~  W  (Z+nLL'  ,h,0)  , 


22 


where   e  =  g(n-l)  and  h  =  g-1.   The  problem  is  to  estimate 

y,  £,  and  LL '  or,  equivalently ,  to  estimate  jj,  Z ,    and  M  where 

M  =  nLL ' .   Recall  that  L  is  not  uniquely  defined  so  that  if 

LL '  is  an  estimate  of  LL ' ,  then  any  L,  such  that  LL '  =  LL ' , 

is  an  estimate  of  L.   The  likelihood  function  of  (jj,Z,M)  can 

be  expressed  as 

K     (I,e)K     (I,h)  hlh         ,.  ,    ,  ,. 

f(x       ,E,H)     =-2—^ r      m      ^h        rr    IhI^^-^IeI^6-^1) 

—  (E+M)  r|l+MHn|z|^e 
gn  '      '  iii 

x    exp[-^(x      -y)  '  (—  (E+M)  )_1(x      -y)-^tr  (E^EJ-Jjtr  (E+M)_1H]  , 

where   if1  (I,  v)  »  2lsmv  *tal(m"1)   S   r(%(v-j+l)). 

The  logarithm  of  the  likelihood  function,  omitting  a  function 
of  the  observations,  is 

-^trE~  E-2$e£n|Z|  -%tr  (E+M)  _1K  -*sh  in  |  E+M' 

-  Jj  £n|Z+M|  -  ^(x   -y)  '  (  (1/gn)  (E+M)  )-1(x   -y )  . 

We  seek  the  solution,  (y ,E ,M) ,  which  maximizes  the  equation 
above,  or  equivalently,  the  solution  which  minimizes 

tr  E~  E-;-e£n|E|  +  tr (E+M) _1H+ (h+1) £n | E+M| 

+  (x^-y)  '(  (1/gn)  (E+M)  )_1(x   -y )  .    (2.1.2) 

Before  we  can  minimize  the  above  equation,  we  need  some  results 
on  differentiation.   Let  W (mxm) ,  X(raxm) ,  and  Y (mxm)  be  sym- 
metric matrices,  and  let  z(mxl)  and  a(mxl)  be  vectors.   The 
proof  of  the  first  result  can  be  found  in  Graybill  [1969:267]. 


23 


Lemma  2.1.1 


9£n|x|  =  2X"1  -  dgtx"1) 


3X 


Lemma  2.1.2: 

^nljj+Y|  =  2(X+Y)_1  -  dgUX+Y)-1). 

Proof:   Let  V  =  X  +  Y.   Then 

3£n|x+YJ  =    3£n|v[  =      z  z   3£n]v]  9vpq  =  3  An  1 V | 

3x.  .      3x.  .       l<p<q<m  3v     3X  .  "  "       3v  .    ' 
13        i]         r        pq     13       13 

so   iMXiYi  .  MnM  .  2v-l.dg(v-l,  .  2(M)-l.ag((M)-l| 
Lemma  2.1.3: 

atrax+Y) — "  =  -2(x+Y)~lw(x+Y)~1+dg((x+Y)"1w(x+Y)"1) . 


Proof:       Let   V   =   X   +   Y.      Note    that 


3v-xv  =  (0)  =  (av^u  +v-i/  3V 


3x. .  vw'  \3x. . Jv  3x. . 

iD  V     137  V     13. 

^u    *.       9V"1  TT-lf    3V  V.-l 

so  that   ^r- =  "  v    terr  v    • 
id  V    i:7 

Then  atrtx+Y)-^  =  atrv^w  =  tr/W^w 


3x.  .  3x.  .  3x. 

ID  13  V     l 


=    -    tr    V-M^-JV-^ 


mm  ,       _  3v 

E         Z     (V    XWV   i)         ^-£3 
p=l    q=l  ^         13 


24 


-  (v  1wv  X)  .  .  -  (v  1wv  l)  .  .  if  i  4   j , 

-  (V-1WV-1)  .  .  if  i  =  j  . 

11  J 


-  2  (V  1WV  1)  .   .  if  i  i-   j, 

lj 

-  (V_1WV-1)  .  .  if  i  =  j. 

li 


Hence,  StrtX.Y)-^  =  _2V-1WV"1  +  dg(v-1WV-1) 


9X 


Lemma  2.1.4 


=  -2(X+Y)  ^(X+Y)  1    +   dg((X+Y)  ^(X+Y)"1) 


3  (z-a)'  W(z-a) 

—     - — =^^  =  2W(z-a) 

dZ  —  — 


9  (z-a)'  W  (z-a)     „   /"  m   m 


Proof:  r =  ^—        Z    I     (z  -a  )  (z  -a  )w 

3zi        9zi  Vp=l  q=l   P   P    q   q   P<3 


=  I     (z  -a  )w.   +   E  (z  -a  )w  . 


q=l   q   q   iq    p=l   P   q   P1 


=  2   E  (z  -a  )w.   =2  (W)  .  (z-a) 

i    q  q  iq     i-  — 

3  (z-a)'  W(z-a) 
so  that  s =  2W(z-a)  . 

dZ  —    — 

If  we  ignore  the  constraints  that  E  is  positive  definite 
and  M  is  nonnegative  definite  and  seek  the  stationary  values 
of  (2.1.2)  over  all  possible  (_u,E,M),  we  find,  upon  taking  the 
partial  derivatives  of  (2.1.2)  with  respect  to  E,  M,  and  jj 
and  setting  them  equal  to  zero,  that 


25 


eE_1+(h+l)  (E+M)"1  -  E~1EE-1-(Z+M)"1H(Z+M)"1 

-  gn(E+M)-1(x   -y)  (x   -y)'  (E+M)""1  =  (0), 
(h+1)  (E+M)"1- (E+M)~1H(E+M)~1-gn(E+M)~1  (x   -y )  (X   -y)'(E+M)~ 


=  (0)  , 

gn(E+M)-1  (x   -y)  =  0, 
for  which  the  solutions  are 

^  =  -. . ' 
E  =  d/e)E, 

M  =  (1/  (h+1)  )H-  (l/e)E. 

Since  M  is  a  nonnegative  definite  matrix,  its  maximum  likeli- 
hood estimate  must  also  be  nonnegative  definite,  so  the  solu- 
tions above  are  the  maximum  likelihood  estimates  only  if 
(1/ (h+1) ) H- (l/e)E  is  nonnegative  definite.   We  find  that, 
while  the  solutions  for  _y_  and  E  are  the  natural  unbiased  esti- 
mates, the  solution  for  M  is  not.   That  is, 

E(M)  =  (1/  (h+1)  )  (hM-E)  . 
Hence,  we  see  that  E (M)  is  also  not  necessarily  nonnegative 
definite. 

Suppose  that  instead  of  using  the  likelihood  function 
of  (jj,E,M)  we  use  the  marginal  likelihood  function  of  (E,M). 
Justification  for  this  follows  from  the  fact  that  (E,H)  is 
"marginally  sufficient"  for  (E,M)  or,  in  other  words,  (E,H) 
is  "sufficient  for  (E,M)  in  the  absence  of  knowledge  of  jj." 


26 


For  a  detailed  description  of  the  principle  of  marginal  suffi- 
ciency see  Barnard  [1963].   There  is  ample  precedent  for  the 
use  of  this  principle  in  multivariate  theory.   For  example, 
Bartlett's  test  has  two  forms,  one  involving  the  sample  size 
and  the  other  involving  the  degrees  of  freedom.   The  marginal 
likelihood  function  of  (E ,M)  can  be  written 

K  (I,e)Km(l,h)     .(h.^D     »,(e-m-l) 

f(E,H)  =  ■ ,h         h       |H|  |E| 

|E+M|  2    \Z H 

x  exp[-h   tr  E_1E  -  h   tr(E+M)_1H], 


where 


k_-(i,v)  =  2^  ^m^-^   n  r(Js(v-j+D) 


,-i 

The  logarithm  of  the  likelihood,  omitting  a  function  of  the 
observations,  is 

-h   tr  Z-1E  -he    ln\l\    -    h    tr(Z+M)_1H  -  hh    £n|Z+M|. 
We  seek  the  solution,  (E*,M*),  which  maximizes  the  above 
equation,  or  equivalently ,  the  solution  which  minimizes 

tr  E-1E  +  e  £n  |  E  |  +  tr(E+M)-1H  +  h  &n  |  Z+M  |  .     (2.1.3) 
Again  if  we  ignore  the  constraints  that  E  is  positive  defin- 
ite and  M  is  nonnegative  definite  and  seek  the  stationary 
values  of  (2.1.3)  over  all  possible  (E,M),  we  find, 
upon  taking  the  partial  derivatives  of  (2.1.3)  with  respect  to 
E  and  M  and  setting  them  equal  to  zero,  that 


e 


E  x+h(E+M)   -E   EE   -(E+M)   H(E+M)    =  (0), 


h(E+M)  X-(E+M)  1H(E+M)  X  =  (0)  , 


27 


for  which  the  solutions  are 

Z*  =  (l/e)E, 

M*  =  (l/h)H-(l/e)E. 

We  see  that  these  solutions  are  the  natural  unbiased  estimates 
of  1    and  M,  and  thus  E(MA)  =  M  is  clearly  nonnegative  definite, 
For  this  reason,  we  choose  to  continue  our  work  with  the  mar- 
ginal likelihood  function  of  (£,M).   Note  that  since  M  is  non- 
negative  definite,  the  solutions  above  are  the  maximum  likeli- 
hood estimates  only  if  (1/h) H-  (1/e) E  is  also  nonnegative 
definite.   In  the  next  two  sections  we  will  derive  maximum 
likelihood  estimates  for  £  and  M  which  are  valid  for  all 
possible  (E,H) . 

2 . 2   Some  Lemmas 

Consider  the  function 

<J>(A,B;D,e,h)  =  e[tr  A_1  +  £n|A|]  +h[tr  B_1D  +£n|B|], 

where  A,  B,  and  D  are  m  x  m  matrices.   We  assume  that  D  is 
diagonal  with  distinct,  descending,  positive  diagonal  elements; 

that  is,  D  =  diag (d, ,d~, . . . ,d  )  with  d,  >  d~  >  . . .  >  d   >  0. 

i   z      m        i     z  m 

We  are  interested  in  minimizing  cf>  (A,B;D ,  e  ,h)  subject  to 

s 
(A,B)  e  C      =  {(A,B):  A  e  P    ,    B£P  ,  B  -A  £  U   P.},  where  P. 

m       m  -i=o   3  3 

is  the  set  of  all  symmetric,  nonnegative  definite  matrices 

of  rank  j.   In  this  section  it  will  be  shown  that  the  required 

absolute  minimum  occurs  when  both  A  and  B  are  diagonal. 


28 


The  proof  of  this  result  relies  mainly  on  a  lemma  regard- 
ing the  stationary  points  of  the  function  g(P)  =  tr  PB   P'D 
where  P (mxm)  is  orthogonal. 

Lemma  2.2.1:   Consider  g(P)  =  tr  PXP 'D  where  P (mxm)  is 

such  that  PP '  =  I ,  and  X (mxm)  and  D (mxm)  are  both  symmetric 

and  positive  definite.   It  is  assumed  that  D  is  diagonal  with 

distinct,  descending,  positive  diagonal  elements.   Then  the 

stationary  points  of  g(P)  occur  when  PXP'  is  diagonal. 

Further,  the  absolute  maximum  of  g(P)  is 

m 
max   g(P)  =   E  d.ch.  (X)  , 
P:PP'=I         i=l  1 

and  the  absolute  minimum  of  g(P)  is 

m 
min    g(P)  _   E  dm+1_ichi <x) • 
P:PP'=I       "  i=l 

Proof:   Using  the  method  of  Lagrange  multipliers,  we  look  at 

L(P,A)  =  tr  PXP'D  +  tr  A(PP'-I), 

where  A  =  A'.   Let  A    be  the  matrix  that  has  1  in  row  i, 

ij 
column  j  ,  and  O's  elsewhere.   Then 

aL   =  tr  (A .  .XP'D+PXA  .  .D)  +  trA  (A .  .P  '+PA  .  . ) 


3pij        ID         Di      ""    ID    '  Di 

=  tr  (DPXA  .  .+PXA  .  .D)  +  tr  (PA  .  . A+APA  .  . ) 
Di     Di  Di      Di 

=  2tr(A..DPX)  +  2tr(A..AP) 
Di  Dl 

=  2(DPX)ii  +  2(AP)ij, 


29 


jp-  =  tr(A   +A   )  (PP'-I)  =  2  (PP'-I)..     if  i  *   j, 

ij         J   J  J 

•5-Y^-  =  tr  A.  .  (PP'-I)  =  (PP'-I)... 
3  A .  .        11  ii 

n 

Thus,  the  stationary  values  of  g(P)  occur  at  the  solutions  to 
2DPX  +  2AP  =  0, 


(2.2.1) 

PP  '  =  I. 

From  (2.2.1)  it  follows  that 

A  =  -DPXP  '  , 
so  that  A  =  A'  implies  that 

DPXP'  =  PXP  'D, 

or  DY  =  YD,  (2.2.2) 

where   Y  =  PXP ' . 

Examining  the  (i,j)    term  on  each  side  of  (2.2.2),  we  see  that 

we  must  have  d.y. .  =  y. .d . .   Since  d.  ^  d . :  i  ^  j,  it  follows 

that  y..  =0:  i  4-   j.   Thus,  Y  =  PXP'  is  diagonal.   It  is  clear 

then  that  the  stationary  values  of  tr  PXP'D  are  given  by  the 

set  of  values 

m 
_Zi   dt(i)  ch.(X), 

where  {  t  (1)  ,  t  (2)  ,  .  .  .  ,  t  (m)}  is  a  permutation  of  {  1,2  ,  .  .  .  ,m  },  the 

set  being  formed  over  all  such  permutations.   Further,  the 

absolute  maximum  of  tr  PXP'D  is,  clearly, 

m 
max   tr  PXP'D  =   E   d.  ch.(X), 


30 

and  the  absolute  minimum  is,  clearly, 

■m+l-i  chi(X) 


m 
min    tr  PXP'D  =   E   d 
P:PP'=I  i=l 


We  will  also  need  the  following  results,  the  first  of 
which  can  be  found  in  Bellman  [1970:117]. 

Lemma  2.2.2:   Let  X(mxm)  and  Y (mxm)  be  symmetric  matrices 
with  Y  nonnegative  definite.   Then 

ch.  (X+Y)  >  ch. (X)       for  i  =  l,2,...,m. 

If  Y  is  positive  definite,  then 

ch.  (X+Y)  >  ch.  (X)       for  i  =  l,2,...,m. 

Lemma  2.2.3:   The  function  <j>  (A,B;  D ,  e  ,h)  has  an  absolute 

minimum  over  the  set  of  solutions  C    =  {  (A,B)  :  A  e  P    ,  B  e  P    , 

s  m       m 

B  -  A  e   U  P  . }  . 
j=0   3 

Proof:   Since  B  is  positive  definite,  it  follows  that  B    is 

also  positive  definite,  so  that  the  diagonal  elements  of  B~ 

are  positive.   Then  we  find  that 

-1     m    -1  m    -1  -1 

tr  B   D  =   I  (B   )  .  .  d.  >  d    E  (B   )  .  .  =  d   tr  B 

.  ,      n  l    m  .  ,      n     m 
i=l  i=l 

m       _ ,         m         _ , 

=  d    I      ch  (B  L)    =  d    £  (ch  (B)) 
m  i=1   i         m  .=1    i 

since  ch . (B   )  =  (ch  ,,  .(B))-.   Hence,  using  the  fact  that 
l  m+l-i 

m 
for  any  matrix  X (mxm) ,  tr  X  =   £   ch. (X)  and 

m 
|X|=   II   ch.(X),  we  see  that 
i=l    x 


31 


™  -1 

(J)(A,B;D,e,h)  >  e   Z  ((ch.  (A))    +  £n(ch.  (A))) 

i=l 

m  -1 

+  h   I  (d  (ch.  (B))    +  ln(ch.   (B)  )  )  .     (2.2.3) 

.  ,   m    i  i 

i=l 

From  Lemma  2.2.2  we  know  that  ch . (B-A)  <  ch.  (B)  ,  since  A  is 

positive  definite.   Then  C      can  be  written 

C      =    {(A,B):  ch.  (A)  >  0:  i  =  1,2,. ..,m;  cfcu  (B)  >  0: 

i  =  l,2,...,m;  0  <  ch^  (B-A)  <  chi (B) :  i  =  l,2,...,s; 

ch .  (B-A)  =  0 :  i  =  s+1 , . . . ,m;  A  =  A ' ,  B  =  B  '  } .   The  closure , 

~C    ,    Of  C      is  {  (A,B)  :  ch .  (A)  >  0:  i  =  l,2,...,m;  ch.  (B)  >  0: 
s       s  i  i 

i  =  1,2,..., m;  0  £  ch.  (B-A)  £  ch.(B):  i  =  1 , 2 ,  .  .  .,  g;  chi  (B-A) 

=  0:  i  =  s  +  l,...,m;  A  =  A',B  =  B'}.    Since  <J)  (A,B;D  ,e,h)  >  0, 

it  has  an  absolute  minimum  over  C    ,  since  C      is  closed. 

Note  that  from  Lemma  2.2.2  if  ch.(B-A)  =  ctu  (B)  for  some  i, 

then  it  must  be  true  that  ch  (A)  =0,  since  A  must  then  be 

m 

positive  semidef inite.   Thus,  for  every  (A,B)  £    cs    ~    cs    it: 

must  be  true  that  ch  (A)  =  0  or  ch  (B)  =  0  or  both.   It  then 

m  m 

follows  from  (2.2.3)  that  cf>  (A,B;  D  ,e  ,h)  =  °°  whenever  (A,B) 

£  ~C      -    C    .   Hence,  <j>(A,B;  D,e,h)  has  an  absolute  minimum 

over  C    . 
s 

Lemma  2.2.4:   Suppose  the  function  f (x) ,  minimized 

over  x  e  S,    achieves  a  minimum  at  x  =  a.   Let  the  set  S, 

be  such  that  for  any  x  £  S-S-.  ,  there  exists  an  x,  £  S, 

such  that  f(x  )  <  f (x) .   Similarly,  let  the  set  s2   be  sucn 

that  for  any  x  es~S2'    there  exists  an  x„  e  S-  such  that 

f(x„)  <  f(x).   Then  it  follows  that   a  £  S1   n  S 2. 


32 


Proof:   Suppose   a  f    S^    Q  S 2.      Then  either   a  t   S,  or  a  t   S ? 

or  both.   However,  if   a  t   5^,  then   a  e  5-5-,,  and  there  exists 

no  x1  £  51  such  that  f(x;[)  <  f  (a)  ,  since  f  is  minimized  at  a. 

This  then  is  a  contradiction,  so  it  must  be  true  that   a  e  S,  . 

Similarly,  if   a  £  S2 ,  then  a  e  S-S~,    and  there  exists  no 

x2  e  52  such  that  f(x2)  <  f (a) .   This  also  is  a  contradiction, 

so  it  must  be  true  that  a  e  S2 .   Hence,  it  follows  that 

a  £  s1    n  S2. 

In  Lemma  2.2.3  it  was  seen  that  the  function  $ (A,B;D ,e,h) 

has  an  absolute  minimum  over  the  set  C    .      We  will  now  show 

s 

that  this  absolute  minimum  will  occur  only  when  both  A  and  B 
are  diagonal. 

Lemma  2.2.5:   The  absolute  minimum  of  <}>  (A,B;  D,  e  ,h) 
subject  to  (A,B)  e  C      occurs  when  both  A  and  B  are  diagonal. 
We  offer  two  proofs. 
Proof  1:   Define  the  sets  5,  and  S ?    as  follows: 

51  =    { (A,B)  £  Cs:       A  is  diagonal}, 

52  =    {(A, 3)  e  Cs:      B  is  diagonal}. 

We  want  to  show  that  if  <j>  (A,B;D ,e,h)  achieves  a  minimum  at 
(A,B)  =  (A*,BA),  then  (A*,BJ  e  s±    AS..   Now  with 

A  =  D  2AD  2    and  B  =  D  2BD  2, 

0  (A,B;D,e,h)  -  e[trA_1+£n|A[ ]  +  h [ trB_1D+£n | B | ] 

=  e[trA~  D~  +£n|A|]  +  h [trB-1+£n | B | ]  +(e+h)£n|D| 
=  <t>(B,A;D~  ,h,e)  +  (e+h)£n|D|. 


33 


_3^ 

Note  that  since  D  2  is  positive  definite,  (A,B)  £  Cg  if  and 
only  if  (D~  2AD~  2,D~  2BD~  2)  =  (A,B)  e  C    .   Thus,  minimizing 
4>  (A,B;D,e,h)  subject  to  (A,B)  e  C   is  equivalent  to  minimiz- 
ing <MB,A;D   ,h,e)  subject  to  (A,B)  e  C  .   Moreover,  if 
(A*,BJ  minimizes  <MB,A;D   ,h,e),then   (D  2A*D  2  ,D  2  B*  D  2)  min- 
imizes 4>  (A,B;D,e,h)  .   Now  arbitrarily  fix  (A,B)  e    Cg  and 
consider  <j>  (PBP  '  ,PAP  '  ;D~  ,h,e)  for  all  orthogonal  P.   Clearly 
the  terms  £n|PAP'|,  tr  PB~  P',  and  £n | PBP  '  |  are  constant  for 
all  orthogonal  P,  so  that  4>  (PBP  '  ,PAP  '  ;D~  ,h,e)  is  minimized 
with  respect  to  P  when  tr  PA~  P  'D    is  minimized.   It  follows 
from  Lemma  2.2.1  that  all  the  stationary  points,  and  thus  the 
absolute  minimum,  occur  when  PAP'  is  diagonal.   Hence,  for 
any  (A,B)  e  C     -   S,  there  exists  an  (A-^B^)  e  S1    such  that 

(J)  (B1,A1;D_1,h,e)  <  cf>(B,A;D~  ,h,e). 

—  k    —  \l 

But  since  A  =  D   AD  2,  we  know  that  A  is  diagonal  if  and  only 
if  A  is  diagonal.   So  we  find  that  for  any  (A,B)  e    C      -   S^, 
there  exists  an  (A1,B-L)  e   S1    such  that 

tHAwB-]/  D,  e,h)  <  <$>  (A,B;D,e,h)  . 

In  a  similar  manner  now  arbitrarily  fix  (A,B)  £  Cg  and  con- 
sider (J>  (PAP  '  ,PBP  '  ;D,e,h)  for  all  orthogonal  P.   Clearly  this 
is  minimized  with  respect  to  P  when  tr  PB~  P 'D  is  minimized, 
since  the  terms  tr  PA-1P',  £n | PAP  '  |  ,  and  In | PBP ' |  are  con- 
stant for  all  orthogonal  P.   So  from  Lemma  2.2.1  it  follows 
that  all  the  stationary  points,  and  therefore  the  absolute 
minimum,  of  <t>  (PAP  '  ,PBP  '  ;D  ,e  ,h)  occur  when  PBP'  is  diagonal. 


34 


This  implies  that  for  any  (A,B)  e    C      -  S~ '  there  exists  an 
(A2,B2)  e  S2    such  that 

(J»(A2/B2;D,e/h)  <  <p  (A,B;D,e  ,h)  . 

The  result  now  follows  from  Lemma  2.2.4.   Furthermore,  from 
Lemma  2.2.1  we  see  that  if  (A*,B*)  minimizes  <f>  (A,B;D  ,e  ,h)  , 
then  the  diagonal  elements  of  D  2A^D  2  are  increasing  and 
the  diagonal  elements  of  B*  are  decreasing. 

The  second  proof  of  Lemma  2.2.5  utilizes  the  concept  of 
"majorization"  (see  Marshall  and  Olkin  [1974] ) . 

Definition  2.2.6:   Let  x  and  y_  be  real  mxl  vectors 
with  i   element  x.  and  y.,  respectively,  and  i    largest 
element  x...  and  y,-w  respectively.   We  say  that  x  major- 
izes y_  and  write  x  >  y_,  if 

s         s 

Z  x,..  2      I    yM>  for   s  =  l,2,...,m, 

i=l  [±)         i=l  y±> 

with  equality  when  s  =  m. 

We  will  need  some  results  which,  while  well  known  to 
workers  in  the  area  of  majorization,  may  not  be  readily 
accessible  to  others.   We  prove  the  results  here  for  the 
benefit  of  the  uninitiated  reader. 

Lemma  2.2.7:   If  S  (mxm)  is  doubly  stochastic,  then 

x  >  sx  =  v.. 

Proof :   Since  S  is  doubly  stochastic,  it  follows  that 

s.  .  >  0  for  all  (i, j)  ,  and 
lj  J 


35 


*  sij  =  1  for   i  =  1,2,. ..,m, 


j  =  l 


m 


z    sii  =  1  for   j  =  1,2,. . .,m. 

1=1   J 

Thus,  for  1  <  t  ^  m  there  exists  k1,k2,...,kt  such  that 
t         m 

■  Z  Y(i)  =   Z   (Sk  i+Sk  i  +  *-'+Slc  i)xi' 
1=1  u'    j=i   KlJ  K23  t3      J 

Clearly  when  t  <  m, 

m 
Bk1j+Sk2j  +  ---+Bktj  *  .^Sij  =  1  for  J  =  1'2 m, 

and 


Z  (sk  i+sk  i  +  -"+sk  -i>  =  fc- 
j=l   Kl^   V       kt: 


Then  when  t  <  m, 
t         i 


1  y(i)  =  Z  (sk  i+sk  i  +  -"+sk  -i>xi 
i=l  u'    j=i  Ki^  k2j  ktD   D 


t 
e 

i=l 
If  t  =  m,  then 


*   Ex.... 
-i  (i) 


sv  -;+su  ^  +  *..+s,   .  =   E  s.  .  =  1, 

kl:   k2^      V    i=l  ^ 

so  that 

t         t 

E  y...  =   E  (s   .+s,   .+...+S,   .)x. 
i=l  (1)    j  =  l   kl=>   k2^       V   ^ 

t        t 
E  x  .  =   Ex.... 
j=l  ^    i=i  (i) 


36 


ILl 

Lemma  2.2.8:   If  x  >  y_  and  a   .  >  a  (2.  >  ...  >  a  .  .  >  0, 

then 

m  m 

.Z=1x(i)a(i)  *  .Z=/ia(i)- 
Proof:   Put  d.  =  x..>  ~  Yj_«   Then 


.^(D^i^i)  =  ^Vd) 


=  dl(a(l)_a(2))  + 

(d1+d2)  (a(2)-a(3))  + 

(d1+d2+d3)  (a(3)-a(4) 


)  + 


(d1+d2+...+dm_1)(a(m_1)-a(m))  + 
(d1+d2+...+dm)a(ra). 


The  last  term  is  zero,  since 

mm  m 


=  0 


I    d   =   E  (x(i)-y.)  =  I    x     -   E  y  .} 
i=l  1   i=l  {1)      x  i=l  llj    i=l  u' 

The  partial  sums  are  nonnegative,  since 

t       t         t        t         t 
E  d.  =   E  x...  -   E  y.  *   E  x...  -   E  y,,M  >  0. 
i=l  1    i=l  (1)    i=l  X         i=l  (1)    i=l  (1) 

Further,  the  differences  a(1)-a(2),  a  (2)  -a  (3)  ,  .  .  .  ,  a  (m_1)  -a  (m) 

are  nonnegative.   Hence,  the  result  follows. 

Lemma  2.2.9,  Corollary:   If  x  is  an  ordered  vector, 

that  is,  x,  ^  x-   2.  ...  >  x  ,  S  is  doubly  stochastic,  and 
12  m 

a  is  also  an  ordered  vector,  then  x'a  >  (Sx)  'a. 


then 


37 


Lemma  2.2.10:   If  x  >  y  and  a.,,  >  a.„.  >...>  a.  ,  >  0, 
-   *■  (1)  -   (2)  -    -   (m)  -   ' 


m 


Z  x.  .  .  °.  ,  ,  .  .  £   Z  y  .  a 
i=1  d)  (m+l-i)    i=i  x  (m+l-i) 

Proof:   The  proof  is  similar  to  that  of  Lemma  2.2.8. 

Letting  d.  =  x  .  . . -y .  ,  we  have 
3   l     (l)  J  l 

m  m 

.^(i)"^  (m+l-i)  =  ^V  (m+l-i) 

=  dl(a(m)-a(m-l))  + 

(dl+d2)(a(m-l)-a(m-2))  + 
(dl+d2+d3)(a(m-2)-a(m-3))  + 


(d1+d2+...+dm_1)(a(2)-a(1))  + 

(d1+d2+...+dm)a(1). 

t 
We  have  seen  that  the  partial  sums  ,   E  d  .  :  t=l ,  .  .  .  ,m-l,  are 

i=l  1 
m 
nonnegative  and   Z  d.  is  zero,  so  that  the  last  term  is  zero 

4-1   1 

(m)~a(m-l)  ,a(m-l)"a(m-2)  '**  *  ' 


i=l 
Further,  the  differences  a,  ,-a,   ,,,a,_  ,  , -c 


0,-,-a,,,  are  negative  or  zero.   Hence,  the  result  follows 

Lemma  2.2.11,  Corollary:   If  x  is  an  ordered  vector, 

and  y_  =  Sx  with  S  doubly  stochastic,  then 

m  m 

•j^i0  (m+l-i)  *  J^ia  (m+l-i)' 

Furthermore,  if  aM,  >  a,-,.  >  ...>  a,  w  then  there  is 
(1)     (2)         (m) 

equality  only  if  y_  =  x. 


38 


We  are  now  ready  for  the  second  proof  of  Lemma  2.2.5. 
Recall  that  we  need  to  show  that  the  absolute  minimum  of 
cf>  (A,B;D,e,h)  subject  to  (A,B)  e  C      occurs  when  both  A  and  B 
are  diagonal. 

Proof  2  (Lemma  2.2.5):   Let  S-.    and  S 7   be  defined  as  before; 
that  is, 

S,  =  { (A,B)  e    C      :  A  is  diagonal}, 

S2   =    { (A,B)  e  C      :  B  is  diagonal}, 

and  recall  that  we  need  to  show  that  if  $ (A,B:  D,e,h)  is  min- 
imized at  (A*,B*),  then  (AA,B*)  e  S,  fl  S~  .   Let 

3i  -  3o  -  • • •  -  8   >  0  be  the  latent  roots  of  B   ,  and 
PB_1P'  =  diag(3m,6m_1,...,61) .   Then 

{<j>  (PAP'  ,PBP';D,e,h)  -  $  (A,B;D,e  ,h)  }/h 
=  tr  PB-1P'D  -  tr  B-1D 

m  m  /  m        2 

=  Z    6  J.1  -d.  -   E    Z    3  _,,  .P^  Id. 
j=1  m+1-3  ]    j=1  ^.=1  m+l-i  11)    3 


m  m 

=   I  B -d  ..  .  -  I    y .d  Al   . ,  (2.2.4) 

j=l  3  m+1-3    j=l  3  m+1_3 

where  y_  =  P^Ji'Ji '  =  ( B-,  /  62  /  •  •  •  ,  3  )  ,  and  P2  is  the  matrix 

with  (i,j)    element  p  ,,  .   ,,  ..   Since  PP  '  =  P 'P  =  I, 

we  see  that  P~  is  doubly  stochastic.   Also  d  =  (d, ,d~,...,d  ) 
2         J  —     1 '  2     '  m 

and  _B_  are  ordered  vectors,  so  by  Lemma  2.2.11,  equation  (2.2.4) 

is  not  positive.   Furthermore,  d,  >  d„  >  . . .  >  d   so  that 
r  12  m 

6(PAP' ,PBP ';D,e,h)  <  $  (A,B;D , e ,h)  , 


39 


with  equality  holding  only  when  B    =  diag  ( B  ,  B   ,,...,  8-,  )  . 
Therefore,  for  any  (A,B)  e  C      -   S-  there  exists  an  (A„,B_)  e S~ 
such  that 

<MA2,B2;D,e,h)  <  * (A,B; D ,e ,h) . 

Now  with  A  =  D  ^AD  2  and  B  =  D  2BD  2 

<J>(A,B;D,e,h)  =  e[trA~  d"  +£n|A|]  +  h[trB~  +An|5|] 

+  (e+h)£n|D|  (2.2.5) 

=  <j>(B,A;D~  /h,e)  +  (e+h)An|D|. 

Let  a,  >  a_  >  . . .  >  a   >  n  be  the  latent  roots  of  A 
1     2  m   u 

and  QA  Q'  =  diag  (a,,a2,...,a  ).   Then  by  an  argument  iden- 
tical to  the  previous  one  we  find  that 

<$>  (QBQ'  ,QAQ  ';D_1,h,e)  <  c(>  (B  ,  A;  D_1  ,h ,  e)  , 

with  equality  holdinq  only  when  A   =  diag (a, ,a_ , . . . ,a  ). 
^  12m 

From  (2.2.5)  it  follows  that 

$  (D2QAQ'D2,D  2QBQ'D2;D,e,h)  <  tj)  (D  2AD  2  ,D  2  B  D  2;  D  ,e  ,h)  , 

with  equality  holding  only  when  A   =  diag (a, ,a„ , . . . , a  ). 

h      h  - 1         -1 

Note  that  D2QAQ'D2  =  diag  (d, a.  , . . . ,d  a   ).   Thus,  for  any 

'      1   1  m  m 

(A,B)     e    C      -    5,    there   exists    an    (A, ,B, )     e    5,    such   that 
si  ill 

<j>  (A.,,B,  ;D,e,h)     <    <J>  (A,B;  D  ,e  ,h)  . 

The  result  now  follows  from  Lemma  2.2.4. 

Lemma  2.2.12,  Corollary:   Let  R  be  some  restriction  on 

the  latent  roots  of  A  or  B  or  both,  and  let  C      be  the  subset 

s 

of  C      such  that  (A,B)  e  C      implies  that  R  is  satisfied. 
Since  (A,B)  e  C*R  if  and  only  if  (PAP',PBP')  e  CR  for  any 
orthogonal  P,  it  follows  that  the  minimal  value  of 


40 


T3 

<j>  (A,B;D,e,h)  over  (A,B)  e  C      occurs  when  A  and  B  are  diagonal. 
For  example,  if  the  latent  roots  of  A  were  known  to  be  pro- 
portional to  a  given  set,  then  the  minimal  value  of 
(j>  (A,B;D,e,h)  over  (A,B)  e  C     occurs  when  A  is  a  diagonal 
matrix  with  diagonal  elements  proportional  to  this  set. 

2 . 3   The  Maximum  Likelihood  Estimates 

In  this  section  we  seek  the  maximum  likelihood  estimates 

s 
of  Z  and  M  subject  to  the  constraints  Z  e  P  and  Me   UP.. 

m         j=0  3 
Recall  that  the  likelihood  function  of  (Z,M)  is 

K  (I,e)K  (I,h)     ,  ,,    .  .    ,  . 
f(EfH)  =  m  |H|%(h-m-l)|E|^(e-m-l) 


Z+VL\hii\z\h& 


x  exp[-JjtrZ  1E-%tr  (Z+M)_1H]  . 
The  logarithm  of  the  likelihood  function,  omitting  a  function 
of  the  observations,  is 

-  ^trZ~1E-J3e£n|  Z|  -  %tr  (Z+M)  _1H-Jshiln  |  Z+M  |  . 
We  seek  the  solution,  (Z,M),  which  maximizes  the  above  equa- 
tion, or  equivalently ,  the  solution  which  minimizes 

trZ_1E+e£n| Z |  +  tr (Z+M) -1H+h£n | Z+M |  (2.3.1) 

s 
subject  to  Z  e  P      and  Me   UP.. 

j  =  0  ^ 

Let  E*  =  (l/e)E  and  H*  =  (l/h)H.   Note  that  since  E*  and 

m 
H*  are  both  symmetric  matrices,  and  E^e  P   and  H*  e  U  P., 

there  exists  a  nonsingular  matrix  K(raxm)  such  that 


KE*K'  =  I  and  KH*K '  =  D,  where  D  =  diag (d1 ,d2 , . . . ,d  ),  and 
dl  >  d2  >  "''  >  dm  >  °  are  the  latent  roots  of  H*E*  . 


41 


Then  with  E  =  KZK '  and  M  =  KMK '  ,  (2.3.1)  can  be  rewritten 

etrK'-1E~  K-1I+e£n| E |+htrK'-1 (E+M) ~1K~1D+h£n | E+M| 

=  e[trE-1+iin|  E  |  ]  +  h  [tr  (E+M)  -1D+£n  |  Z+M|  ] 

-  (e+h)£n|K[2 
=  <p  (E,E+M;D,e,h)  -  (e+h)£n|K|2. 

Thus,  the  problem  has  been  reduced  to  that  of  minimizing 

~  ,.  _  ^  s 

<j>  (E,Z+M;D,e,h)  subject  to  E  e  P      and  Me   UP.   or, 
J  m  .  _  i  ' 

...  -  ,  D=0  J 

equivalently ,  (E,E+M)  e    C    .      But  from  Lemma  2.2.5  it  is 

known  that  the  minimal  solution  to  $ (E ,E+M;D ,e,h)  is  such 

that  E  and  E+M  are  diagonal,  and  in  addition,  it  is  known 

that  the  diagonal  elements  of  D  ^ED    are  increasing  while 

the  diagonal  elements  of  E+M  are  decreasing. 

Consider  the  function 

g(x,y)  =  e(^+2nx)  +  h^+£ny)  ,  (2.3.2) 

where  d  >  0.   Differentiating  (2.3.2)  with  respect  to  x  and 
y,  we  get  the  equations 

-i  +  i-o. 

2    x     ' 

X 

y2   y 

which  yield  the  minimal  solution  xn  =  1  and  yn  =  d.   If 

instead  we  wanted  to  minimize  (2.3.2)  subject  to  x  =  y, 
(2.3.2)  would  reduce  to 

g(x)  =  e  (-  +  Hnx)  +  h  (-  +  Jinx).  (2.3.3) 


42 


Then 

dg  (x)     .   1,1.,,,   d  ,  1. 
d_-  =  e(_  _  +   ,  +  h(_     +   }  =  0/ 

x  x 

so  that  x,  =  y.  =  — -r-  minimizes  (2.3.3). 
1     1    e+n 

Now  let 

f(d)  =  Q(xl,y1)    -  g(xQ,y0) 

-  ~  /  e+h  ,  p„  ,e+dh,  .    ,  ,d(e+h)      ,s+dhn 
"  e  (i+dh  +  £n  (^F)  >  +  h  (  e+dh   +  £n  (-eTF)  ) 

-  (e+h+h£nd) 


_n    (d-1)  hx    .  ..  ,  (d-l)e.  ,  .  ,..  .  ,e+dh. 
e(1  "  -^dh-]    +   h(1  +  TRBr)  +  (e+h)  in  (-^j^-) 

-  (e+h+hilnd) 


=  (e+h)Hn(^)  -  hind, 
e+h 

Differentiating  f (d)  with  respect  to  d  and  noting  that  e  >  1, 

h  >  1,  we  find  that  when  d  >  1 

df .(d)  _  h(e+h)    h   dh  (e+h) -h  (e+dh)    eh  (d-1) 

dd     e+dh    d    "   (e+dh)d        (e+dh)d     ' 

In  other  words,  the  difference  g(x,,y,)  -  g(xn,y_)  is  an 

increasing  function  of  d  when  d  >  1. 

Now  with  X  =  diag (x, ,x„ , . . . ,x  )  and  Y  =  diag (y, ,y_ , . . . ,y  ) 

consider  minimizing 

m   ,  m   d . 

<|>(X,Y;D,e,h)  =  e  I  (7-+  tax.)  +  h  T.     (—  +  In   y.)    (2.3.4) 
i=l  Xi       x     i=l  Yi       x 

subject  to  (X,Y)  e    C    ,    which  in  this  case  implies  that 

y.  >  x.  >  0  for  all  i,  and  x.  =  y.  for  at  least  m  -  s  of  the 
1     1  1    d  1 

i's.   Suppose  that  dn  >  d„   >  ...  >  d   >  1  >  d   ,>...>d   >  0. 

12  r         r+1      m 

Using  the  fact  that  f (d)  is  increasing  in  d  for  d  >  1, 


43 


it  then  follows  that  the  minimal  solution  to  (2.3.4)  is 

(X  ,Y  ) ,  where  if  r  >  s, 

'x  .  =  y  .  =  (e+d.h)  /  (e+h)      for   s+1  <  i  <  m, 
si   J  si       1  ' 

x  .  =  1,  y  .  =  d.  for     1  <  i  <  s, 

«.  si       2  si     1  ' 

and  if  r  <  s , 

fx    .    =  y    .    =    (e+d.h)  /  (e+h)      for   r+1  <  i  <  m, 

Ix  .  =1,  y  .  =  d.  for     1  £  i  <  r. 

v.  si       Jsi     1 

Thus,  cj)(E  ,E+M;D,e,h)  is  minimized  subject  to 

(E,Z+M)  £  C     at 

Z    =  X  , 
s 

M  =  Y  -X  , 
s   s' 

so  that  the  maximum  likelihood  estimates  of  I    and  M  are 

E  and  M,  where 

E  =  K-1X  K'-1, 
s     ' 

M  =  K_1 (Y  -X  )K'_1. 
s   s 

We  now  present  an  example  to  illustrate  the  computation 
involved  in  deriving  the  maximum  likelihood  estimates. 
Consider  model  (2.1.1)  in  which  we  take  m  =  4,  g  =  21, 
n  =  6,  Z    =  I,  and  M  =  diag  (99  ,  24  ,  0  , 0)  .   Hence,  e  =  g(n-l)  = 
105  and  h  =  g-1  =  20.   Generating  a  matrix  E  from  the  dis- 
tribution W.  (1,105,0)  and  a  matrix  H  from  the  distribution 
W4  (I+M,20,0)  ,  we  obtain 


44 


E  = 


69.1329 


H  = 


1845.85 


4.07476 
127.055 


63.5986 
688.962 


-5.12762 
-3.77638 
116.342 

-16.5227 

1.14908 
20.1453 


-9.94924 
20.4629 
8.12511 
100.186 

-1.43363 

-8.61601 

-,0100181 

12.2617 


With  E*  =  (1/105) E  and  H*  =  (1/20) H  we  need  to  find  a  non- 
singular  matrix  K  such  that  KE^K'  =  I  and  KH*K'  =  D,  where 
D  is  a  diagonal  matrix.   Let  D,  =  diag (ch,  (E*)  , . . . ,ch4 (E*) )  , 
and  let  P  be  the  orthogonal  matrix  for  which  the  i   column 
is  the  characteristic  vector  of  E*  corresponding  to  ch^ (E*) , 


then,  since  E*  is  symmetric,  P'E*P  =  D. 


Similarly,  let 


D  =  diag(ch1(D1i5P'H^PD^  2)  ,.  .  .  ,ch4  (D12P'Hi,PD135)  )  ,  and  let  Q 

be  the  orthogonal  matrix  for  which  the  i    column  is  the 

—  h  —h 

characteristic  vector  of  D,  P'H^PD,  2  corresponding  to 

—  \  —it  —  1?  —  h 

ch.  (D,^P  fH*PD,^)  ,  then,  since  D1  2P  'H^PD-j^  2  is  symmetric, 

Q  'D,  2P  'H*PD,  2Q  =  D.   Thus,  we  may  take  K  =  Q 'D1 2P  ' .   Using 

the  above  decomposition  for  K,  we  find  that,  for  our  example, 


1.24522 
.0181884 
.00831611 
-.000914637 


-.0464049 
-.925978 
-.00767896 
-.0158712 


.0380069 
-.0477042 

.940375 
-.153048 


.130133 

.213476 

-.237376 

-.994866 


and  D  =  diag (142 . 729 , 29 . 6669 ,. 91847 ,. 625404 ) .   Note  that 


dy    >  1  and  d_  <  1,  so  that  r  =  2 


Simple  calculation  yields 


XQ  =  YQ  =  diag(23. 6766, 5. 5867, .986955,  .940065)  , 
Xx  =  diag(l, 5. 5867, .986955, .940065)  , 


45 


Y1   =   diag(142.729,  5.5867,  .986955,  .940065), 

X2=X3=X4=  diag(l,  1,  .986955,  .940065), 

Y2=Y3=Y4=  diag(142.729,  29.6669,  .986955,  .940065). 

Hence,  if  we  let  E.  and  M.  be  the  -n^ximum  likelihood  estimates 


of  I    and  M,  respectively,  subject  to  the  constraints  Z    e   P 

i 
and  Me   U  P . ,    we  find  that 


j=0  3 


h    = 


15.3199 


.541388 
6.52813 


-.173203 

-.0210184 

1.0919 


-.0910632 
.0947756 
.064921 

.899582  J 


MA  =  (0)  , 


.665836 

.246703 

-.046356 

-.0924036 

h  = 

6.5222 

-.0184676 

.0947486 

1.0908 

.0649326 

n— 

.899582  „ 

91.5881 

1.84178 

-.792796 

— i 
.00837764 

.0370371 

-.0159426 

.000168469 

kl  - 

.0068625.2 

-.000072518 

s~ 

.000000766- 

.6578 

.040037 

-.0471092 

-.0889834'"1 

1.20736 

-.0378372 

.182707 

Z2=E3=E4= 

1.09073 

.0652532 
.898126 

M  =M  =M  = 

2   3   4 


46 


91.6383      3.13344      -.788088       -.0129987 
33.2548       .105117       -.549569 
.00730371     -.002076 

.00909865 


Further  commentary  on  these  data  will  be  made  in 
Sections  3.6,  4.2,  and  5.4. 


2. 4   The  Likelihood  Ratio  Test 

s 

Recall  that  C      =    { (A,B) :AeP    ,LeP    ,B-Ae  UP.}, 
s  m    m      j=(J  ] 

and  suppose  we  know  that  (Z,E+M)  e  Q   =   C    .      We  wish  to  test, 

s 

say,  the  null  hypothesis  that  (E,Z+M)e  co  =  C      ,    O  C    .      The 

s-l     s 

alternative  hypothesis  is  then  that  (E,E+M)e  n  -  to   = 
C     -    C      i-   Thus,  we  are  testing  the  hypothesis 

H^s) :  rank  (M)  <  s-l 
against  the  hypothesis 

h|s) :  rank  (M)  =  s. 

We  adopt  the  likelihood  approach  and  compare  max  f (E,H) 

with  max  f(E,H).   Specifically,  we  look  at 

max  f(E,H)/»a*"  f(E,H)  =  A  e  (0,1]. 

With  the  matrices  X   =  diag  (x  -,  ,x  „,..., x   )  and 
s      r   si   s2      sm 

Ys  =  diag(ygl,ys2,...,ysm)  given  by 


x 

s 


X 

s 


i    =   y^-:  =  (e+d.h)  /  (e+h)     for   s+1  <,    i  <.   m, 
i  =  1'ysi  =  di  for    1  *  i  £  s, 


47 


if  r  >  s,  and 

.  =  y  .  =  (e+d.h) / (e+h)         for   r+1  <  i  <  m, 
;i   -*si       1 

x  .  =  1,  y  .  =  d.  for     1  <  i  <  r, 

si      Jsi     1 

if  r  <  s,  the  maximum  likelihood  estimators,  Z0,  of  Z  and, 
MQ,  of  M  when  the  parameters  are  restricted  to  lie  within 
Q ,    are  given  by 

h  =  k"1xsk'_1' 

where  K  is  a  nonsingular  matrix.   Similarly,  the  maximum 
likelihood  estimators,  Z  ,  of  Z and,  ft  ,  of  M  where  the 

CO  '   CO 

parameters  are  restricted  to  lie  within  lo  ,  are  given  by 
K    =   K'^s-lK'"1' 

It  should  be  noted  that  if  r  <  s,  then  X   =  X   ,  and 

s     s-1 

Y   =  Y   ,  ,  and  if  r  -  s,  x  .  =  x   ,  .  and  y  .  =  y   ,  . 
s    s-1  si    s-l,i     Jsi   is-l,i 

only  for  i  /  s. 

The  likelihood  ratio,  A,  is 


A  = 


max  f(E.H) 

co 

max  f(E,H) 

n 

expt^trZ-V^rtZ^)-^]  \t^J^\lj^ 


48 


exp[-JsetrXa  -JshtrYg  D]     |Ys_1|  sn|Xg_1r 

|Y  |^|X  \hG 
1  s1   '  s ' 


s-11   '  s-11 


since,  if  r  <  s, 

etrfx'^-X-1)  +  htr(Y^1-Y~1)D  (2.4.1) 

=  etrtX^-x"1)  +  htr(Y~1-Y~1)D 

=  0, 
and,  if  r  >  s,  (2.4.1)  becomes 

e  (x   ,   -x   )  +  h  (y  ■  ,   -y    d. 
s-l,s   ss      -^s-^s  ^ss   s 

/  ,i-\        d  h(e+h) 
=  eje+hi  _     _s h 

e+d  h         e+d  h 
s  s 

(e+d  h) (e+h) 

s       --  (e+h)  =  0. 


e+d  h 
s 


So  we  have 

-%(e+h) 


,.  ,e+d  h 
dhh{         s 


s  [  e+h  /  if   r  >  s, 

if   r  <  s. 

Since     d-,  >  d0  >  .  .  .  >  d   >1>   d_,,  >  ...>  d_  >  0, 
1     I  r        r+i         m 

clearly,  r  >  s  if  and  only  if  d   >  1.   Hence,  we  can  write 


if   ds  >  1, 


if   0  <  d   £  1. 
s 


49 


Now  upon  taking  the  derivative  of  A  with  respect  to  d  over 

s 

the  range  d   >  1 ,  we  get 


dcT  =  *h  ds 
s 


M  ,,e+d  h^(e+h)-l 
*sh-lf    s 


e+h 


e+d  h 
e+h 


-d 


=  hh   d' 


Ve+d8hV,s(e+h)-1 

\  e+h 


e-d  e 
s 

e+h  _ 


which  is  negative  for  d   >  1.   Thus,  A  is  a  decreasing 
function  of  d   over  the  range  d   >  1.   In  addition, 


,  ,  fe+d   h 

M 2- 

s  y  e+h 


rh(e+h) 


<  1 


for  d   >  1, 
s     ' 


with  equality  when  d   =  1,  so  that  A  is  a  decreasing  func- 
tion of  d  . 


(s) 

The  likelihood  ratio  test  rejects  H*    for  small  values 

of  X .   Since  A  is  a  decreasing  function  of  d  ,  the  likeli- 

(s) 

hood  ratio  test  rejects  H^    for  large  values  of  d  .   Now 

recall  that  with  H*  =  (l/h)H  and  E*  =  (l/e)E,  there  exists 
a  nonsingular  matrix  K  such  that  KH*K'  =  D  and  KE^K'  =  I. 


It  follows  then  that  d.:  i  =  l,2,...,m  are  the  solutions  to 


H+-dEj  =  iKH.K'-dKE^K' 


D-dl   =  0, 


th 


so  we  observe  that  d   is  the  s    largest  solution  to 


H*-dE. 


=  0 


(2.4.2) 


With   cjk  =  d^/e  :  i  =  l,2,...,m,  (2.4.2)  can  be  written 


H  "  f E|  =  0, 


or 


H  - 


=  0. 


(2.4.3) 


50 


(s) 

Hence,  we  would  reject  H'    for  large  values  of  0   =  d  h/e, 

where  <$>      is  the  s    largest  solution  to  (2.4.3).   It  is  of 
particular  importance  to  recall  that  H  ~  W  (E+M,h,0)  and 

E  ~  W  (Z,e,0),  independently. 

(s) 

We  have  seen  that  the  likelihood  ratio  test  rejects  H* 

when  (J)    >c  for  some  constant  c.   Now  we  want  to  choose  for 

s 

the  constant  c  some  number,  which  we  will  denote  by  c(a,m,s) 

to  indicate  its  dependence  upon  a,  m,  and  s,  such  that 

P  ((f)   >  c(a,m,s)  |  (E,M)  )  <  a  for  all  (£,Z+M)  e    C         .   For 

c(a,m,s)  we  propose  the  a  level  critical  value  for  the 

largest  root,  9,,  from  amongst  the  m-s+1  roots  of  |w,-9W-|  =  0, 

where  W,  -  W    , ,  (I  ,h-s+l ,  0)  and  W0  ~  W    ,,(I,e,0),  inde- 
1    m-s+1  2     m-s+1 

pendently.   That  is,  we  take  c(a,m,s)  such  that 

P(91  >  c(a,m,s))  =  a.   Justification  for  this  choice  of 

c(a,m,s)  will  be  given  in  the  next  chapter. 


CHAPTER  3 


PROPERTIES  OF  THE  sth  LARGEST  ROOT  TEST 


3 . 1   Introduction 

In  this  chapter  we  investigate  some  properties  of  the 
s    largest  root  test  presented  in  the  previous  chapter. 
It  would  be  desirable  to  show  that  this  test  is  the  uni- 
formly most  powerful  test,  but  we  were  unable  to  do  so  for 
general  m.   However,  in  Section  3.2  we  show  that  for  m  =  1 
the  test  is  uniformly  most  powerful.   Also,  in  Sections  3.3 

and  3.4  it  is  shown  that  the  s    largest  root  test  is  an 

(s)  (s) 

invariant  test  of  H'    against  H,    and  is  the  test  obtained 

by  the  union- intersection  principle  (see  Roy  [1953]).   Finally, 
in  the  last  two  sections  we  discuss  an  important  monotonicity 
property  of  the  roots  <J> .  :  i  =  l,2,...,m,  and  then  use  this 
property  in  deriving  the  asymptotic   distribution  of  cf>  . 


3. 2   The  Uniformly  Most  Powerful 
Test  for  m  =  1 

For  m  =  1  the  problem  reduces  to  that  of  the  univariate 

random  effects  model  discussed  in  Section  1.1.   Recall  that 

—  2    2  2    n 

we  have  x    «.  N(u,(a  +na  ) /gn)  ,  u  ,.  a  y  ,  and 

z    a  z  "e 

,222  22 

v  ..  (a  +no  )  Xh,     independently,  where  u,a  ,  and  a      are  all 

2 

unknown,  and  we  wish  to  test  the  hypothesis  H„ : a  =  0  aqainst 

0   a 

H,  :  a2  >  0. 

1    a 

51 


52 


Suppose  that  for  some  set  of  points,  y',  in  the  space 
of  (x   ,u,v) ,  we  reject  HQ  whenever  the  experimental  (x   ,u,v) 
belongs  to  y'.   Let  6  (y  ' ; y ,a2  ,a2)  =  P  [  (x   ,u,v) ey  '  I V ,a2 ,a2] 
and  require  that  y'  be  such  that  8(y';y,a2,  0  )  =  «    Let 
x  =  u  +  v  and  y  =  v/u,  so  that  u  =  x/ (y+1)  and  v  =  xy/ (y+1) . 
Then  the  Jacobian,   ||jj|  ,  of  x   ,u,  and  v  with  respect  to 
x   ,  x,  and  y  is  ||j]|  =  ||  3  (x   ,u,v)  /3  (x   ,x,y)||   =  x/(y+l)2. 
Then 

3(y';y,a  ,a  )  =  ///  g  (u; a2) g  (v; a2  a2) g_ (x   ;y,a2,a2)du  dv  dx, 

=  ///  f(x,y;a  ,a  )f  (x   ;y,a  ,a  )dx  dy  dx   , 

-.  ^    Ut    U    •  •       z    ex  »  • 

where  y  =  {  (x^  ,x,y) :  (x   ,u,v)  e  y'}  and  where,  independently, 

2  2 

u  -  a  v  , 
zAe 

2    2   2 

v  ~  (a  +na  )  y,  / 
z    a  Ah 

x    ~   N(y,  (a2+na2)  /gn)  , 


so  that 


^%(e+h)-l  „„„r  „/n(.2,„2 
f  (x,y;o^,a^)  =  f  (x,y)  = 


o      9  x2ieT"riexp[-x/2(o>naz)] 


z      a 


(2(a2+na2))3s(e+h)r(J5(e+h)) 


(l+na2/aVe  y^"1  (y+l)  ^(e+h)  exp  [-xna2/2a2  (a2+na2)  (y+1,  ] 
a   z =_     a    z   z    a  J       ' 


Btte^h) 

and  (x,y)  is  independent  of  x   .   We  note  that  when  a2   =  0, 

a 

x  and  y  are  independent;  that  is, 

f(x,y;a2,0)  =  f  ±  (x;  a2)  f  2  (y)  =  f^xjf^y, 


53 


where 


2  2 

x  ~  azxe+h  ' 


y  .  (h/e)Fg. 


Letting  y(x,x   )  =  {y:  (x   ,x,y)  e  y} ,    we  can  write 

00   00 

3(Y';u,a2,0)  *  /  /  /       f(x,y)fQ(x   )dy  dx  dx  ^ 
z      -oo  o  y  (x,x   ) 


=  /  /  f.(x)fn(x   )    /_    f,(y)dy  dx  dx 
—  0  L  U   ••  Y(x,x   ) 


Putting 


h(x,x   ;a2)  =  h(x,x   )  =    /_   f2(y)dy, 
z  ' *    y (x,x   ) 

we  see  that 

00   00 

&(y';\i,a2  ,0)    =11      f1(x)fQ(x   )h(x,x^)dx  dx   . 

x-  -oo  0 

2  —  2 

When  o        =  0,  (x,x   )  is  sufficient  for  (oz,\i).      Further, 

{f,  (x)  f  n  (x   )  :  -co  <  y  <  oo,  a   >  0}  is  a  complete  family  (see, 

2 
for  example,  Lehmann  [1959:130]).   Thus,  since  &  (y  '  ;\i  ,o z>0) 

=  a„,  we  must  have  h(x,x   )  =  a_. 

Now  let  vi  =  { (x   ,u,v):  y  =  v/u  >  c}  where  c  is  some 

2    2   2 
constant.   Then  with  3  =  (q1  ,q2)'  where  qx  ~  (az+naa)  xe+h 

2    2 
and  q„  ~  N(y,(a  +na  ) /gn) ,  independently, 

[3(Y;;y,a2,a2)  -  6(Y';y,a2,a2)]  (l+no2/a2)_3ie 

2   2 
=  E  (d  (q;  a  ,  a  )  )  , 
-1  z  a 

where  the  expectation  is  with  respect  to  the  distribution 
of  3. 


54 


Here 

d&al>al)     =    y     {}f2(y)Q(y'ql)dy  '  y{)f2(y)Q(y'qi)dy' 

where   y*0  =  Y*  ^  ,^2)  ' Y  ()  =  Y  (ql'q2)  '  and 

Q(y,qi)  =  exp[-q1noa/2az(az+naa)  (y+1) ]  . 

Therefore , 

d^°z'aa)  =y,()^()f2(y)Q(y'ql)dy  "  Y()-Y*()f2{y)Q(y'ql)dy' 
Since 

Q(Y,q1)  >  Q(c,q1)      when   y  e  y*()-Y()» 

Q(y,q1)  <  Q(c,q1)      when   y  e  y()-Y*0# 
we  find  that 

d(a;a',a2)  >  Q(c,qi)[^()/Y()f2(y)dy  -  y  ()  /^  ()  f  2  (y)dy] 
=  Q(c,qx)  ty/()f2(y)dy  -  Y{)f2(y)dy] 
=  Q  (c^)  [a0-aQ]  =  0. 


Thus, 


so  that 


E(d(3;az,a^))  >  0, 


(Y*;u,°z,aa)  >  B(Y';U/az,aa) 


Therefore,  amongst  all  critical  regions  of  size  aQ  the  crit- 
ical region  which  rejects  HQ  when  v/u  >  c  is  uniformly  most 

2  2 

powerful  in  a  test  of  HQ:  o^   =   0  against  H-^  oa   >  0.   That 

is,  the  critical  region  $    >    c,  where  <j>  is  the  only  root  of 
(v-<jm)  =  0,  is  uniformly  most  powerful. 


55 


3. 3   An  Invariance  Property 

Consider  the  group  of  transformations  G  =  {g  :  K  (m*m) 

is  nonsingular} ,  where  g  (E,H)  =  (KEK',KHKr).   Since 

K 

E  „  Wm(Z,e,0)  and  H  ~  W  (I+M,h,0),  it  follows  that 
m  m 

KEK'~  Wm(KZK' ,e,0) ,KHK'~   W  (KEK ' +KMK ' ,h , 0 ) ,  and  rank 
(KMK')  =  rank  (M) .   Thus,  the  problem  of  testing  the  hypoth- 
esis Hq   :  rank  (M)  <  s-1  against  h|s' :  rank  (M)  =  s  is 
invariant  under  the  group  G. 

We  will  need  the  following  definition. 

Definition  3.3.1:   Let  X  be  a  space  and  G,  a  group  of 
transformations  on  X.   A  function  T (x)  on  X  is  said  to  be 
a  maximal  invariant  with  respect  to  G  if 

a)  T(g(x))  =  T(x)      for  all  xeX  and  g  e    G; 

b)  T  (x^  =  T(x2)  implies  x1  =  g(x»)   for  some  g  e    G. 

We  will  also  need  the  following  well-known  result  (see, 
for  example,  Lehmann  [1959:216]). 

Lemma  3.3.2:   Let  X  be  a  space,  let  G  be  a  group  of 
transformations  on  X,  and  let  T(x)  be  a  maximal  invariant 
with  respect  to  G.   A  function  f  (x)  is  invariant  with  respect 
to  G  if  and  only  if  f (x)  is  a  function  of  T  (x)  . 

Now  consider  the  roots,  <J>,  >  d>„  >  ...  >  d>  ,  of 

rl  2  Ym 

I  H— <J>E  I     =    0    and    the    roots,     8 .     >    9n    >     .  .  .     >    6     ,    of 

±  Z  m 

|KHK'-9KEK'|  =  0,  where  K  is  nonsingular.   Clearly 

|KHK'-8KEK'  |  =  0 
implies  |k|  |h-8E|  |k'|  =0 

so  that  I H-6E I  =0, 


56 


and  hence,   9.  =  <j> .  :  i  =  1,2,  .  .  .  ,m.   Suppose  now  that 

9.  =  <t> .  :    i  =   1,2/... /in  are  the  roots  of  |H,-9E,  I  =  0  and 
i    Ti         '  '  1    1 ' 

|  H_— 4> E _  [  =  0,  respectively,  where  E,,  E„,  H-,  ,  and  H2  are  all 
positive  definite,  symmetric  matrices.   Then  there  exist 
nonsingular  matrices  K,  and  K~  such  that 

Ei  =  KiKi'        Hi  =  Ki$Ki' 

E2  =  K2K2,  H2  =  K20K2, 

where  $  =  diag  (A.  ,  4>_  ,  .  .  .  ,<J>  ).   It  then  follows  that 

9    -l^l'V  =  (K2KIlElKi"lK2'K2KllHlKi"lK2) 

21 

=  (K2K~1K1K;[K:[~1K^  ,K2K^1K1<I>K-[K-[~1K^) 

=  (K2K^,K2$K^) 

=     (E~  ,  H_  )   , 

where  g„  „_,  e    G  since,  clearly,  K„K,   is  nonsinqular.   So  by 

K2K1  Z    l 

Definition  3.3.1  {<$>:     |H-AE|  =  0}  is  the  maximal  invariant 

with  respect  to  G.   The  s    largest  root,  A  ,  is  clearly  a 

function  of  (A,  ,  A._  ,  .  .  .  ,  <f>  ),  and  hence,  by  Lemma  3.3.2  the 

test  statistic  <j>   is  an  invariant  test  statistic  for  testing 

(s)  (s) 

the  hypothesis  H'   against  the  hypothesis  H,   . 

3 . 4   The  Union-Intersection  Principle 

Suppose  that  in  testing  H-!   :  rank  (M)  <  m-1  against 

f  m  \ 

H,   :  rank  (M)  =  m,  we  adopt  the  rule 

R(m:m):  reiect  H„        if   d>   >  c(a,m,m). 
U  m 

Here   <b,  >  A0  >  .  .  .  >  <t>      >  0   are  the  roots  of  Ih-AEI  =  0, 
yl     2  rm  i   r  i 


57 


E  ~  W  (E,e,0)  and  H  ~  W  (Z+M,h,0),  independently,  and 

c(ct,m,m)  is  chosen  such  that  P(<j>   >  c(a,m,m)  |  H»   )  <  a. 

(s)  (s) 

Consider  now  testing  H*   :  rank  (M)  <  s-1  against  H,   : 

(s) 

rank  (M)  =  s.   The  hypothesis  H'    is  true  if  and  only  if 

(s) 

the  hypothesis  „H,:   :  rank  (FMF ' )  <  s-1  is  true  for  all 

F  e  S(m,s),  where  S(m,s)  is  the  class  of  all  (sxm)  matrices 

(s) 

of  rank  s.   Similarly,  the  hypothesis  HQ    is  false  if  and 

Is) 

only  if  the  hypothesis  FHQ    is  false, and  the  hypothesis 

„H,   :  rank  (FMF')  =  s  is  true, for  at  least  one,  and  in  fact 

F  1 

(s) 
all  F  e  S(m,s).   Hence,  we  could  think  of  HQ    as 

Fesft.,.)  FH0S'  ^  "I''  "  Ftift.,.)  FH1S)  and  re^Ct  H0S) 

if  (E,H)  e  y  =  _  e*L   *  Y(F),  where  y  (F)  is  the  rejection 
r  Eo  ^m ,  S  i 

(s) 

region  appropriate  to  a  test  of  the  hypothesis   HQ   .   The 

sizes  of  Y  (F) :  F  e  S(m,s)  should  be  such  as  to  produce  a 
desired  overall  error  of  the  first  kind  of  the  desired  size. 

This  procedure  is  known  as  the  union- intersection  procedure. 

(s) 

Note  that  we  will  reject  Hq   :  rank  (M)  <  s-1  if  for 

some  F  e    S(m,s),  we  reject  fHqS) :  rank  (FMF')  <  s-1.   Let 

A,   >d)^   >...>*>  0  be  the  roots  of  I FHF  '  -  4>FEF  '  |  =  0, 
IF    y2F  sF  ' 

where,  clearly,  FEF  '  ~  We (FZF ' ,e , 0)  and  FHF'  ~  W  (FZF ' 

S  s 

+  FMF',h,0),  independently.   Then  by  the  rule  R(s:s)  we 

reject  „H'S'  if  <b  „  >  c  (a '  ,  s ,  s),  where  a'  is  chosen  to  give 
J  F  0        SF 

the  desired  overall  error  of  the  first  kind  of  the  desired 

(s) 

size.   Hence,  we  will  reject  Hq    if  for  some  F  e  S(m,s), 

cf)  _  >  c(a',s,s),  or  equivalently ,  if    max     <b  _  >  c(a',s,s) 
sF  FeS(m,s) 


58 


We  need  the  following  results,  the  first  two  of  which 
can  be  found  in  Bellman  [1970:115]. 

Lemma  3.4.1:  Let  A(raxm)  be  a  symmetric  matrix.  Then 
the  smallest  latent  root  of  A  may  be  defined  as  follows: 

ch  (A)  =  min  u'Au, 

where  u  is  a  (mxl)  vector. 

The  next  result  is  well  known  as  the  Poincare  separation 
theorem. 

Lemma  3.4.2:   Let  A (mxm)  be  a  symmetric  matrix.   Then 

for  any  matrix  F(sxm)  such  that  FF '  =  I 

ch.(A)  >  ch.(FAF')  >  ch    ,  .  (A) 
j         j  m-s+3 

for  j  =  1, 2, . . . ,s. 

We  need  Lemma  3.4.2  to  prove  the  following  lemma. 

Lemma  3.4.3:   Let  A (mxm)  be  a  symmetric  matrix.   Then 

max    min  u'FAF'u  =  ch  (A),  (3.4.1) 

F:FF'=I  u'u=l  S 

where  F  is  a  s  x  m  matrix,  and  u  is  a  m  x  1  vector. 

Proof:   Since  A  is  symmetric,  there  exists  an  orthogonal 

matrix  P (mxm)  such  that  P'AP  =  A  =  diag(ch1(A), 

ch  (A),...,ch  (A)),  and  hence,  for  any  F  such  that  FF '  =  I 

min   u'FAF'u  =   min   u'FAF'u, 
u'u=l  ii'u=l 

where  F  =  FP  and  FF'  =  FPP'F'  =  FF'  =  I.   Then  we  can 

rewrite  (3.4.1)  as 

max    min   u'FAF'u. 
F:FF'=I  u'u=l 


59 


Let  F . (sxm)  be  the  matrix  with  (F  +  ) . .  =  1  for  all  i,  and 

(F.)  .  .  =  0  for  all  i  4   j.   Then 
*  ij  J 

max    min   u'FAF'u  >   min   u'F*AF£u  =  ch  (A). 
F:FF'=I  u'u=l  u'u=1        '       *" 

Now  by  Lemma  3.4.2,  for  any  F  such  that  FF'  =  I ,  we  know 

that 

min   u'FAF'u  <  ch  (A), 
u'u=l  S 


so  that 

max    min  u'FAF'u  <  ch  (A)  . 
F:FF'=I  u'u=l  S 

Therefore,  it  follows  that 

max    min   u'FAF'u  =  ch  (A). 
F:FF'=I  u'u=l  ta 

We  have  seen  that  the  union-intersection  principle  leads 

(s) 

to    the    rule   which   rejects    H^       :    rank    (M)     <    s    -    1    in    favor   of 

H-fs)  :    rank    (M)    =    s    if        max  <b         >    c(a;s,s).      Note   that 

1  F£S(m,s) 

with   T  (mxm)  andF(sxm)     such   that   TT '    =   E    and   F    =   FT ,    then    for 

fixed   F    e    S (m,s) 

|  FHF  '    -    4>FEF  '|=0 

implies 

|FTT      HT'       T'F'    -    cj>FTT~    ET'~    T'F'|     =0, 

or 

|FT_1HT'-1F'    -    4>FF'|     =    0.  (3.4.2) 

Since  F  is  of  rank  s,  so  also  is  FF' (sxs),  and  thus,  there 

exists  a  nonsingular  matrix  S(sxs)  such  that  SFF's'  =  I. 


60 


So   with   F   =   SF   we    find    that    (3.4.2)     implies 

|ft"1ht,~1f'   -   <$>l\   =  0, 

and  clearly,  FF  '  =  SFF'S'  =  I.   Hence,  it  follows  that 

max     <J>  F  =   max    min{cf):|FT_  HT'~  F'-<t>l|  ^=0} 
FeS(m,s)   S     F:FF'=I   <J> 

max    min   u'FT-  HT'~  F'u, 
F:FF'=I  u'u=l 

with  the  final  equality  due  to  Lemma  3.4.1.   Now  using 

Lemma  3.4.3  and  the  fact  that  the  latent  roots  of  T~  HT' ~ 

are  the  roots  of  |h-<J>e|  =  0,  we  observe  that   max    tj)    =  <p 

FeS(m,s)  S      £ 

and  thus,  the  union-intersection  principle  leads  to  the  rule 

(s) 

which  rejects  H^    if  <\>      >    c  (a  ,s,s). 


3 . 5   A  Monotonicity  Property 
of  the  Power  Function 

The  test  procedure  developed  in  the  previous  sections 

depends  on  the  latent  roots,  <j>,  ,  <J>2,  .  .  .  ,  cj>  ,  of  the  random 

matrix  HE   .   The  distribution  of  these  roots  (see  James 

[1964]),  and  hence  the  power  function  of  our  test  procedure, 

depends  upon  the  latent  roots  of  the  corresponding  population 

matrix  (E+M)Z    as  parameters.   Let  6,  >6^  ^  . . .  >6   >  1 
v    '        r  12         m 

be  the  latent  roots  of  (E+M)E   ,  and  note  that  with  T  defined 


such  that  Z  =  TT' 


(Z+M)Z  1    -    61  I  =  0 


implies 

|M  -  (6-1)1  I  =  0, 


61 


so  that  |T~1MT"    -  (5-1) I |  =  0. 

Since  I    is  nonsingular,  T  is  also  nonsingular,  and  so  the 
rank  of  T   MT '    is  the  same  as  the  rank  of  M.   Hence,  M 

has  rank  of  at  most  s-1  if  and  only  if  6   =1,  and  testing 

la)  (a) 

the  hypothesis  H^   :  rank  (M)  <  s  -  1  against  H'   :  rank  (M)  =  s 

(s) 

is  equivalent  to  testing  the  hypothesis  111    '  :    6      =1  against 

(s) 

H,   :  6   >  1.   A  desirable  property  of  the  test  statistic  4> 

would  be  that  it  stochastically  increases  in  6  ,  and  thus, 

that  the  power  function  increases  monotonically  in  5  -   In 

this  section  we  not  only  show  that  <f>   stochastically  increases 

in  6  ,  but  also  that  it  stochastically  increases  in  each 

5.:  i  =  l,2,...,m.   This  more  general  result  will  be  utilized 

in  the  following  section. 

We  will  first  prove  the  result  for  the  largest  latent 

root,  4>,  .   That  is,  we  will  show  that  cf>,  stochastically 

increases  in  6 . :  i  =  l,2,...,m. 
l 

Lemma  3.5.1:   The  test  with  the  acceptance  region 
<j>,  =  ch,  (HE-1)  <  c 
has  power  function  which  is  monotonically  increasing  in 
each  population  root  6 .  . 

The  proof  of  Lemma  3.5.1  involves  the  followina  three 

results,  the  first  of  which  is  due  to  Anderson  [1955]. 

Lemma  3.5.2:   Let  y  ~  N  (0,Z,)  and  u  „  N  (0,I„), 
—    m  — '  1      —    m  — '  2  ' 

where  Z~  -  Z,  is  nonnegative  definite.   If  to  is  a  convex 
set,  symmetric  about  the  origin,  then  P  (Yew)  >  ?  (ueco)  . 


62 


Lemma  3.5.3:   Let  the  random  vectors  y. ,yn ,y   and  the 

matrix  U  be  mutually  independent,  the  distribution  of  y.  being 

N  (0,Z):  i  =  l,2,...,n.   Let  the  set  to ,  in  the  space  of 
m  — 

{y_,  ,y~ ,  .  .  .  ,y_  ,U},  be  convex  and  symmetric  in  each  y_.  given 

the  other  y_-'s   and  U.   Denote  by  P^  ,  (to)  the  probability  of 

the  set  co  when  £  =  Z  .  .   Then  whenever  Zn  -  Z,  is  nonnegative 

1  2     1  r 

definite,  Pv  (to)  >  Pv  (to). 
ll  z2 

Proof:   Since  Z,  and  Z  ,.  are  symmetric  and  X,  e  P   and 
1      2      J  1    m 

m 
l0    £   UP.   it  follows  that  there  exists  a  nonsingular  matrix 

K  such  that  KZ , K  '  =  I  and  KZ-K'  =  A  =  diag  ( 6,  ,  60  ,  .  .  .  ,  6  ). 
1  2  '   1   2      m 

m 

Since  it  is  assumed  that  Z0  -  Z,  e   U  P . ,  we  know  that 

2     X    j=0  3 

6j_  >  1:  i  =  l,2,...,m.   Then  y_.  =  K;^  -  Nm(0,i)  if  j  =  j 

and  y_*  -  Ky^  ~  Nm(0,A)  if  Z  =  Z2>   Let  oj*  =  {y_*  ,£* ,  .  .  -  ,£*  ,U: 

* 

(y-,  ,y9, .  .  .  ,y  ,U)  e  to},  then  Pv  (to)  =  P_  (to  )  and  P_  (to)  = 

* 

P.  (to  ).   So  without  loss  of  generality  we  can  take  £,  =  i  and 

Z2  =  A.   Let 

A.  =  diag(61,92 6.^,1,9.^ 9m)  , 

Ax  =  diag(ere2 ei-l^i'9i  +  i"-"em)' 

Ri  =  {^i:  (^1'^2'  -  *  '  '^n'U)  £  Ui;    Xn :  J^i  and  u  fixed}, 
where  8.  e  {1,6.}:  j  /  i.   Then  from  Lemma  3.5.2  it  follows 
that 

PA.(Ri'Zj:  ^i,U)  >  PA*(Ri|Zj:  j*i,U).  (3.5.1) 

Multiplying  both  sides  of  the  inequality  (3.5.1)  by  the  joint 


63 


density  of  the  temporarily  fixed  variables  and  integrating 
with  respect  to  them,  we  obtain 

PA  (co)  ^  PA*  (co)  . 

i        i 


Then  by  induction  we  have 


or  equivalently , 


P.,-  (co)  >  PA  (co)  , 


P   (co)  >  Py  (to) 

^1        L2 


Finally,  the  third  result  we  need  is  due  to  Das  Gupta, 
Anderson,  and  Mudholkar  [1964]. 

Lemma  3.5.4:   For  any  symmetric  matrix  B  (m><m)  the  region 
to  =  {A(mxn):  ch,  (AA'B)  <  c}  is  convex  in  A. 

Proof  (Lemma  3.5.1) :  Recall  that  H  ~  W  (Z+M,h,0)  and 
E  -  W  (Z,e,0).   Since  the  problem  is  invariant  under  trans- 
formations g.j.(E,H)  =  (KEK',KHK'),  we  may  assume,  without  loss 
of  generality,  that  I    +   M  =  A  =  diag (61 , &2 , . . . , 6  )  and  I  =  I. 
Then  we  can  write  H  =  YY  '  ,  where  Y  =  (y_,  ,y_2 ,  .  .  .  ,Y.y)    and 
y_.    ~  N  (0_,A)  :    i  =  1,2,.  ..,h,  independently.   So  the  accep- 
tance region  can  be  written  as  {Y:  ch,(YY'E   )  <  c}.   From 
Lemma  3.5.4  it  follows  that  the  acceptance  region  is  convex 
in  Y,  and  clearly  vie    see  that  the  acceptance  region  is  also 
symmetric  in  each  of  the  column  vectors  of  Y.   Note  that  the 
vectors  Xi  'X? '  •  •  • 'Yy,  an<^  E  are  mutually  independent,  and  the 

distribution  of  y.  is  N  (0,A).   The  result  now  follows  from 
■^•l     m  — 

Lemma  3.5.3. 

The  main  result  of  this  section  follows  from  a  result 
due  to  Anderson  and  Das  Gupta  [1964] . 


64 


Lemma  3.5.5:   Suppose  V  ~  W  (Z,,v,0)  and  U  ~  W  (Z2,u,0), 

independently.   Let  A,  >  X_  >  . . .  >  A   be  the  latent  roots 
^       2  12  m 

of  UV   ,  and  let  to  be  a  set  in  the  space  of  A,,A~,...,A   such 

^         1   2     '  m 

that  when  a  point  ( A,  ,  A_  ,  .  .  .,A  )  is  in  as,  so  is  every  point 

*   *       *  * 

(A,,  A _.,...,  A  )  for  which  A.  <  A.:  i  =  l,2,...,m.   Then  the 
l   z      m  11 

probability  of  the  set  co  depends  on  £,  and  Z_  only  through  the 
latent  roots  of  £_Z,"  and  is  a  monotonically  decreasing  func- 
tion of  each  of  the  latent  roots  of  2~Z,  . 

Clearly,  the  set  co  =  {  (cf>,  ,  <J>2  ,  .  .  .  ,  cf)  )  :  <j>  <  c}  satisfies 
the  conditions  of  Lemma  3.5.5,  so  it  follows  that  the  proba- 
bility of  the  set  co  is  monotonically  decreasing  in  each  of 

the  latent  roots  <5,,6~,...,5   of  (Z+M)I   .   In  other  words, 
1   2       m  ' 

the  power   function   of  the  s    largest  root  test  is  a 

monotonically  increasing  function  of  6.:  i  =  l,2,...,m. 

We  now  know  that  as  6   -*■  °° ,  P(*  >c)  increases  mono- 

s         s 

tonically.   We  will  show  that  actually,  as  5  +   °° , 

P  (<$>    >c)    -*■    1,  and  hence,  for  sufficiently  large  values  of 

(s) 

6   the  probability  of  rejecting  H^   :  6   =1  will  be  arbi- 
trarily close  to  one.   Recall  that  there  exists  a  nonsingu- 
lar  matrix  K  such  that  KEK '  ~  W  (l,e,0)  and  KHK'~W  (A,h,0). 
Let  K, (mxm)  be  such  that 

K, (mxm)  =  diag (ak, , ak„ , . . . , ak  ,1,...,1). 
Note  that  I^KHK'K'  ~  Wm  (K-^K^  ,h,  0)  and 

K  AK{  =  diag(a2k2S   a2k26    .. ,a2k26  ,6   ,, 6  ) 

x   -1  -L-L     £    I  s  s '  s  +  1'     '  m  ' 

2  2 

so  that  as  a  +  °°,  a  ^^i   +   °°,  and  hence  ch .  (K,AKj)  -*  °° ,  for 


65 


i  =  l,2,...,s.   Thus,  we  need  to  show  that 

P  (ch  (K,KKK'K'  (KEK')~  )  >  c)  ■>  1  as  a  +   °° .   The  following 

si     1 

lemma  provides  the  necessary  result. 

Lemma  3.5.6:   Let  V  ~  wm(si'v'°>  and  u  ~  Wm^Z2/U'0)/ 
independently,  and  let 


K 


, (mxm)  =  diag (ak, ,ak- , . . . , ak  , 1, . . . ,1) 


Then   P  (ch  (K-^K-Jv   )  >  c)  -*•  1   as  a  + 


Proof:   Let 


U12A  fVll  V12 


U22V  VV21     V22 


where  U,  ,  is  s  x  s,  U_,  is  (m-s)xs,  U, ~  is  s  x    (m-s) ,  and 
U22  is  (m-s)  x  (m-s)  .   Similarly,  define  V^,  V2l'    V12'  and  V22 
Let  F^  be  the  s  x  m  matrix  with  (FA)  .  .  =  1:  i  =  1,2, ...  ,s 
and  (F*)^  =0:  i  #   j,  and  let 

K2  (sxs)  =  diag (k,  ,k2, . . . ,k  ). 
Recall  from  Section  3.4  that 

ch  (K,UKjV-1)  =    max    min{ X : | FK  UK'F '- AFVF ' [  =  0} 
FeS(m,s)  X        L      l 

>   min{X:  | F*K, UK'FJ-AF*VF£ |  =  0} 
X         ~       L 
.  o 
=  min{X:|a  K2uiiK2~ Avil I  =  °^ 

X 

=  oc2chs(K2UllK^V-l). 

Thus, 

P(chs(K1UK-[V~1)  >  c)  >  P(a2chs(K2U11K2'V^)  >  c) 

=  p(chs(K2U11K^V^)  >  c/a2), 


66 


and  K-U-,  ,  K  'V,  ,  is  positive  definite  with  probability  one, 


so  that 


lim   P(ch  (K,UK/V   )  >  c) 
_        s   1   1 

a->oo 

=   lim   P(ch  <K2U11K2V11)  >  c/a2) 

a   •*■   °° 

=   P(chs(K2Ui;LK^V~^)  >  0)  -  1. 

3.6   The  Limiting  Distribution  of  $ 

We  have  seen  that  the  likelihood  ratio  test  for  testing 
the  hypothesis  HqS)  :  rank  (M)  <  s  -  1  against  k|s)  :  rank  (:M) 
=  s  is  based  on  the  s    largest  root,  $    .       However,  if  <J>   is 
to  be  used  as  a  test  statistic,  it  is  necessary  to  compute 
the  significance  level,  a,  where 

a  =  sud   P  (c|>   >  c|H(s)  )  . 

„(s)     s       ° 
HQ 

With  6,  >  60  >  ...  >  5   as  the  latent  roots  of  (Z+M) Z~ 
I     z  m 

(s) 

the  null  hypothesis  can  be  written  H'   :  6   =  1 ,  or  more 

precisely,  H*s)  :  6,  >  6«  >  ...  >  6   ,  >  1,  6   =  5  _,__  = 
r       J '       0      1     2        -   s-1       s     s  +  1 

...  =  6   =  1.   We  will  write  <J>  .m  (61 ,  62  ,.  •  .  ,  6m)  to  indicate 

that  <$>      is  the  s    largest  root  of  m  roots  and  depends  on 
s 

the  population  roots  &^ , 82 , . . . , 6m.   Then  we  may  write  a, 
the  significance  level,  as 


sup         P(*    (6T ,69,. . . ,6   , ,1,.. . ,1)  >  c) 

:...*6    ,>1 

s-1 

But  we  saw  in  the  previous  section  that  <t>e  is  stochastically 


1*62*'  --^s-l-1 


s 


increasing  in  each  5 . :  i  =  1 , 2 , . . . ,m.   It  then  follows  that 


67 


a  =  P  (*s-m(o°,C°'  "  '  *  '°°'1'  '  •  '  ,:L)  >  c)  ' 
where  <{>    (°°  ,°°  ,  .  .  .  ,°°  ,  1 ,  .  .  .  ,  1)  denotes  the  random  variable 

which  has  the  limiting  distribution  of  d>    (<$-,,  <5_  ....  ,<5   ,  , 

3  rs:m   1   2'    '  s-1 

1,...,1)  as  5.  +  <»:  i  =  1,2,..., s-1.   So  the  problem  at 

hand  is  to  determine  the  distribution  of   d>    (°° ,°°,  •  .  .  ,°°/ 

Ys:m   '  '    '  ' 

1,...,D. 

Recall  that  E  ,  W  (Z,e,0),  H  ~  W  (Z+M,h,0),  and  there 

exists  a  matrix  K  such  that  KEK '  =  I  and  K(E+M)K'  =  A  = 

diag  (6,  ,  6-, ,  .  .  .  ,  6  ).   If  we  define  E  and  H  as 
l   z      m 

E  =  A_2KEK'A~2  ~  W  (A_1,e,0), 
m 

H  =  A~2KHK'A"'2  ~  W  (l,h,0), 
m     '   ' 

-k  -3<      -3-  _J< 

where   A  - 2   =  diag(<5,     ,6_  2,  .  .  .  ,  6     2)  ,    then   clearly   6         (6,  , 
12  m  J    Ts:m      1 

6n ,...,&    )    =   ch    (HE~    )    =   ch    (HE~    ).       Hence,  if   we    let 
^  m  s  s 

E      ~    W    (A     ,e,0)  ,    where  A       =   diag  (n<5  ,  ,n6  _  ,  .  .  .  ,nS  „    ,  ,1,  .  .  .  ,1)  , 
niun  n.  xz  <z>™  x. 

then   we   need   to    find   the    limiting   distribution   of    ch    (HE-    ) 

3  s    n 


as  n  •>  oo.   Since  we  can  write  E   =  Y  Y',  where  Y   =  (y,v 

n  n'         n    xl 


(n) 

—   x   x      win:  x  trj   i    —iv  . 

n 

(n)        (n)  %       (n)         -1 
I2       '   -••  'le      )     and  ^i   ~  Nm(-^'  n  ):  i  =  l»2,...,e,  inde- 


pendently, we  can  restate  the  problem  as  that  of  determining 


the  limiting  distribution  of  ch  (H (Y  Y' )   ).   Consider  the 

s    n  n 


following  elementary  result. 

Lemma  3.6.1:   Ifu  ~  N(0,l/n),  then  u   — — >  u, 
where  u  is  a  degenerate  random  variable  with  all  of  its 
probability  at  zero. 

We  also  need  the  following  results,  the  first  of  which 
is  well  known  as  the  continuity  theorem  (see,  for  example, 
Breiman  [1968:236] ) . 


68 


Lemma  3.6.2:   Let  x, ,x» ,x_ , . . .   be  a  sequence  of  random 


vectors.   Then  x  >  x  if  and  only  if 

— n     —  J 


lim  E[exp(ix't)]  =  E[exp(ix't)] 
n-><» 


for  all  t  where  i  =  /-T. 

Lemma  3.6.3:   Suppose  that  as  n  ■*  °° ,  x.    >x.: 

i  -,  j  r  (n)   (n)       (n)  ,  ,  , 

j  =  1 ,  2  ,  .  .  .  ,m,  and  suppose  {x,   ,X->  >  •  •  •  t i£    •>  are  mutually 

independent  for  all  n.   Then 

(n) 


(n) 


2l 


(n) 


x2 


(n) 
x 

L_  — m  ■ 


^2 


x  J 

v-  — m 


Proof:   Note  that  it  follows  from  Lemma  3.6.2  that 

lim  E[exp  (ix.(n)  t . )  ]  =  E[exp  (ix!t .)  ]  . 
Also,  because  of  independence, 


E[exp(ix(n)  't)]  =  E[exp(i  E  xfn)'t.)] 

j  =  1  1     D 


=   n  E[exp(ix!n'  t.)], 
j=l         3  3 


lim  E[exp(ix(n)  't)  ]  =   n   lim  E  [exp  (ix  fn)  't  .  )  ] 
n->«>  j=l  n^-°°  ■*     -* 


=   n  E[exp(ixft .)  ] 


j=l 


3-3 


69 


=  E[exp(i  I    x'.t  .)  ] 

=  E[exp(ix't) ] . 

The  result  now  follows  from  Lemma  3.6.2. 

From  Lemma  3.6.1  and  Lemma  3.6.3  we  observe  that 

Y  — >  Y  with 
n 


2 

where  the  elements  of  Y,  ((s-l)xe)  are  all  equal  to  zero  with 

probability  one,  and  Y„  =  (y2-i  'Y.70 ' '  '  '  '%-2    ^    witn 

y~  .  ~N    ,  ,  (0  ,1)  :  i=l,2,...,e,  independently. 
■*-Zi  m-s  +  I  — 

Consider  the  following  result,  the  proof  of  which  can  be 
found  in  Ostrowski  [1973:334]. 

Lemma  3.6.4:   Let  A(nxn)  and  B(nxn)  be  two  matrices,  and 

suppose  the  latent  roots  of  A  and  B  are  A.  and  A  '.  :  i  =  1,2, 

...,n,  respectively.   Put 

N  =     max     ( I  a .  . |  , | b .  •  | )  , 
l^i^n,l^jin    1D     1D 

and 

1   n  n 

6=4t   Z  E  la. .-b. . I . 

nN  i=l  j=l   ^   ^ 

Then  to  every  root  A.'  of  B  there  belongs  a  certain  root 

A  .  of  A  such  that  we  have 

1 

|A^-Ai|  <  (n+2)N61/n. 
Further,  for  a  suitable  ordering  of  A.  and  A.'  we  have 

I  A. -A.'  I  £  2  (n+l)2N61/n. 
1  1   1  ' 


70 


Lemma  3.6.5,  Corollary:   If  A  is  an  n  x  n  matrix,  then 
for  each  i  ch. (A)  is  a  continuous  function  of  the  elements 
of  A. 

Lemma  3.6.6,  Corollary:   Let  A  be  an  n  x  n  matrix  and  B, 
an  n  x  p  matrix.   Then  the  roots  of  the  equation 

| A- ABB' |=0  (3.6.1) 

are  continuous  functions  of  the  elements  of  A  and  B  except 
at  B  such  that  | BB  '  |  =0. 

Proof:   Let  X.:    i  =  l,2,...,n  be  the  roots  of  (3.6.1).   Then 
when  |  BB '  ]  4-   0,  it  follows  that  these  roots  are  also  the 
latent  roots  of  A(BB')   .   So,  from  Lemma  3.6.5,  for  each  i 
A.  is  a  continuous  function  of  the  elements  of  A (BB ' ) 
But  clearly,  when  [ BB ' |  #0,  the  elements  of  A(BB')    are 
continuous  functions  of  the  elements  of  A  and  B.   Hence,  for 

each  i  X.    is  a  continuous  function  of  A  and B  except  when 

i 

| BB  '  j  =  0 . 

We  need  one  final  result  involving  the  limiting  distri- 
bution of  a  function  of  random  vectors  (see  Mann  and  Wald 
[1943]  )  . 

Lemma  3.6.7:   Let  x  >x,  and  let  g (x)  be  a  Borel 

— n     —  3  — 

measurable  function  such  that  the  set  R  of  discontinuity 
points  of  g  (x)  is  closed  and  P  (x£R)  =  0.   Then 

g(xn)  — ">  9(x)  ■ 
Now  recall  that  we  seek  the  limiting  distribution  of 

ch  (H(Y  Y')   ).   In  order  to  use  Lemma  3.6.7  it  is  neces- 
s     n  n 

sary  to  show  that  ch  (H(YY')   )  is  continuous  with 


71 


probability  one  under  the  distribution  of  (H,Y) .   Now  with 


■11 


l12 


H  = 


VH21     n22/ 
where  H,,  is  (s-1) x (s-1) ,  H12  is  (s-1) x (m-s+1) ,  H21  is 
(m-s+1) x (s-1) ,  and  H22  is  (m-s+1) * (m-s+1) ,  the  roots  under 
the  distribution  of  (h,Y)  are  the  solutions  to 


H 


11 


H 


12 


(0)     (0) 


-  * 


H, 


H. 


(0) 


Y  Y  ' 
x2  2* 


=  0 


(3.6.2) 


l21      22 

Since  H  is  nonsingular  with  probability  one,  we  may  put 
/'G- 


H"1  =  G  - 


11 


G12> 


V*21      22 

so  (3.6.2)  can  be  written 

(   (0) 


I   - 

m 


G12Y2Y2 


(0) 


C        V  V  ' 

b22  2  2 


=  0, 


s-1    -*G12Y2Y2 


(0)     Vs+r*G22Y2Y2 


Thus,  it  must  be  true  that 


I    ..  -  4)G„9Y„Y'   =  0, 
m-s+1      22  2  2 


=  0 


G22  "  *Y2Y2|  =  ° 


(3.6.3) 


-1, 


Then  we  see  that  with  probability  one  ch1(H(YYf)   ), 
ch„  (H(YY')"1)  ,  .  .  .  ,ch  _1(H(YY')~1)  are  undefined,  and 


72 


ch  (H(YY')   )  is  now  the  largest  solution  to  (3.6.3);  that  is, 
since  YY '  is  of  rank  m-s+1  with  probability  one,  there  are 
only  m-s  +  1  solutions  to  |H-<J>YY'|  =0.   It  can  be  shown  (see, 
for  example,  Graybill  [1969:165])  that  G22  =  (H22-H2,h7 }^12) _1 > 
since  H   -H--,  H,  ,  H,  ~  is  nonsingular  with  probability  one, 
so  that  (3.6.3)  can  be  written 

|H22-H21H~lfi12  -  4>Y2Y^|  =  0. 

Clearly,  Y~Y^  is  also  nonsingular  with  probability  one, 

and  hence  by  Lemma  3.6.6  ch  (H(YY')   )  is  continuous  with 

probability  one  under  the  distribution  of  (H,Y) .   The  set  of 

discontinuity  points,  R,  is  closed,  since   R  =  {(H,  Y)  :  |  Y-Y- |  =0} 

Note  also  that  as  is  well  known  (see,  for  example,  Anderson 

[1958:85])  H22-H21H~^H12  ~  W    +1  (I ,h-s+l , 0)  .   Therefore, 

from  Lemma  3.6.7  since  (H,Y  )  >   (H,Y),  it  follows  that 

<$>         (°°,°°,  .  .  .  ,°°,1,  .  .  .  ,1)  -  $,     ,  ,  (1,1, . . . ,1)  , 
rs:m        tii         ii         yl:m-s+l        iii 

where  (j>,   _   ,  (1 , 1 ,  .  .  .  ,  1)  denotes  the  distribution  of  the 

largest  root  of  |w,-d>W»|  =  0,  where  W,  ~  W    ,  ,  (I  ,h-s+l ,  0)  , 
3  '  1    2 '  1     m-s+1 

and  W_  ,  W  _   , (l,e,0) ,  independently. 

So  in  testing  H^  '  :  rank  (M)  <  s-1  against  HJ  '  : 
rank  (M)  =  s  we  choose  as  our  critical  value  c(a,m,s) ,  where 
P  (ch,  (W-,W«  )  >  c(a,m,s))  =  a.   By  so  doing  we  will  guarantee 

that 

fs) 
sup  P  (*s.m(<S1'52'  *  '  *  "5m)  >  c(ct 'm's)  I  hq   )  =  a- 
H(s) 

Charts  and  tables  of  the  distribution  of  the  largest  root, 
0, ,  of  | W, -9 (W, +W_) |  =  0  are  available  (see,  for  example, 


73 


Morrison  [1976:379],  Pillai  [1965,1967]).  These  may  be  used 
to  calculate  c(a,m,s),  since  8,  =  <J>,  /  (l+<j>,  )  ,  where  <j>,  is  the 
largest  root  of  |  W ,  —  cJ>W _  |  =  0. 

In  order  to  determine  the  rank  of  M,  a  sequential  test 
procedure  is  used.   To  illustrate  this  procedure,  we  will 
return  to  the  example  presented  in  Section  2.3.   Recall  that 
D  =  diag(142.729,  29.6669,  .91847,  .625404),  h  =  20,  e  =  105, 

so    that    <}>,    -    27.1865,    cf>2    =    5.65084,    <J>3    =    .174947,    and   <J>4    = 

(4) 

.119125.   First  we  consider  testing  the  hypothesis  HQ   : 

(4) 

rank  (M)  <  3  against  H^   -  rank  (M)  =4.   The  null  hypothesis, 

H^4)  ,  is  rejected  if  <t> A    >    c(.05,4,4),  where  c(.05,4,4)  = 
17  F(17,105, .05) /105,  and  F (17 , 105 , . 05)  is  the  constant  for 

which  P(F(17,105)  >  F  (17  ,  105  ,  . 05) )  =  .05  if  F(17,105)  ~ 

17 
F,ot-(0).   Thus,  c  (.05,4,4)  is  approximately  equal  to  .28, 

and  clearly,  $.    =    .119125  <  .28,  so  that  we  do  not  reject 

(4)  (4) 

H^   .   Since  H*    is  not  rejected,  we  now  consider  testing 

the  hypothesis  H*3)  :  rank  (M)  <  2  against  H-[3)  :  rank  (M)  =  3. 

(3) 

The  null  hypothesis,  H*  ',  is  rejected  if  <f>_  >  c(.05,4,3). 

Using  the  charts  mentioned  earlier  we  find  that  c (.05,4,3) 

is  approximately  equal  to  .36.   Since  $ -.    -    .174947  <  .36, 

(3) 

we  do  not  reject  H*    and  so  next  consider  testing  the 

hypothesis  Hg2)  :  rank  (M)  <  1  against  H-J  ':  rank  (M)  =2. 

We  find  that  c (.05,4,2)  is  approximately  equal  to  .42,  and 

(2) 

therefore,  since  <}>-  =  5.65084  >  .42,  we  reject  K'    and 

conclude  that  the  rank  of  M  could  very  reasonably  be  taken 
as  two. 


74 


The  procedure  above  is  open  to  objections  on  the  grounds 
that  the  significance  level  for  the  test  criterion  has  not 
been  adjusted  to  take  into  account  the  fact  that  a  sequence 
of  hypotheses  is  being  tested,  with  each  one  dependent  on 
the  previous  ones  not  being  rejected.   The  mathematical  com- 
plications involved  in  controlling  the  overall  error  make 
such  an  adjustment  virtually  impossible  to  carry  out. 


CHAPTER  4 


MAXIMIZATION  OF  THE  LIKELIHOOD  FUNCTION 
WHEN  £  =  a    I 


4 . 1   The  Likelihood  Function 

Suppose  the  vectors  x .  .  (mxl)  :  i  =  1 , 2 , . . . , g; j=l , 2 , . . . ,n 
can  be  modeled  by 

x.  .  =  u  +  Lf .  +  z.  .  ,  (4.1.1) 

wherein  jj(mxl)  is  a  fixed  but  unknown  vector,  L  (mxp) 

is  a  fixed  but  unknown  matrix,  f.  ~  N  (0,1):  i  =  1,2,. ..,g, 

2 
and  z  .  .  ~  N  (0,o  I):  i  =  l,2,...,g;i=l,2...,n.   We  assume 
— xj  m  —  3  J 

that  the  set  of  random  vectors  {f,  ,f  „,...,  f  ,  z  -,,,...,  z   } 

— 1'— 2     — g  — 11     —  gn 

are  mutually  independent.   Thus,  x.  .~  N  (jj,V)  with 

2 
V  =  LL '  +  a  I .   For  any  orthogonal  matrix  P (pxp)  it  follows 

that  V  =  LL'  +  a2I  =  LP  (LP)'  +  a  I,  so  that,  while  LL'  is 

unique,  L  is  not  unique.   In  this  section  we  will  derive  the 

2 

likelihood  function  for  \i,    LL '  ,  and  o  . 

By  methods  identical  to  those  presented  in  Section  2.1 

_  2 

it  can  be  shown  that  (x   ,E,H)  is  sufficient  for  (_p,a  ,nLL') 

where 

g       n 
x        =      £         E    x../gn    ~    Nffl  (j±,  (1/gn)  (cTI+nLL ')  )  , 

i=l    j=l 


g         n 


2. 


E=      I        I    (x,^-x.    )  (x^-x.    )  '-.   W_(a   l,e,0)  , 

75 


i=1    j  =  1     "ID    -i.      -ID    -x. 


76 


a   _    _     _    _  2 

H  =  n  Z  (x.  -x   ) (x.  -x   ) '„  W  (a  I+nLL',h,0) , 
,  —  1.  — .  .   — l.  — . .     m 
i=l 

and  e  =  g(n-l);  h  =  g-1.   In  addition,  if  we  let  c  denote 

a  constant,  we  find  that  the  density  of  E  can  be  written 

f(E)  =  c|E|Js(e"m~1)  expi-htrio2!)'^] 

i ,-,  i  h  (e-m-1)     r  ,  _,     v  /-,^21 
=  c|E|  exp[-(  I    e..)/2a  ] 

i=l  1X 

m       2 
=  g  (  S  e..;a  )g2(E). 

1=1 

m 

Hence,  from  the  set  {e, , ,e, _ , . . . ,e     , ,e  }  b=trE=  £  e  .  . 

11'  12'    '  m,m-l'  mm       .  t  n 

2  1=1 

is  sufficient  for  a  . 

We  may  assume,  then,  that  we  have,  independently, 

x   ,  b ,  and  H  where 

*..  -  Nm(iMl/gn)(a2I+nLL')), 

b/a   ~  xBf 

H  -  W  (a2I+nLL' ,h,0)  , 
m 

2 

and  B  =  mg(n-l).   The  problem  is  to  estimate  jj,  a  ,  and  LL '  , 

2 

or  equivalently ,  to  estimate  jj,  a  ,  and  M  where  M  =  nLL '  . 

We  have  seen  that  L  is  not  uniquely  defined  and  so  if  LL '  is 

an  estimate  of  LL ' ,  then  any  L,  such  that  LL '  =  LL '  ,  is  an 

2 

estimate  of  L.   The  likelihood  function  of  (jj,a  ,M)  can  be 

expressed  as 

*(Z       b  h)    -  Km(I'h) %3-l,H|Mh-m-l) 

|  (2tr/gn)  (a    I+M)  |  2\o   I+M|  2    (2a    )  2pr  (%0) 

x    exp[-Jsgn(x      -u  )  '  (a  2I+M)  _1  (x      -y  )  -hb/a  2-^tr  (a2I+M)  -1H]  , 


77 


where,  as  before, 

K_1(l,h)  =  2hmh   ^m(m-D   n  T(h(h-j  +  l)). 
m  j=l 

The  logarithm  of  the  likelihood  function,  omitting  a  function 

of  the  observations,  is 

-b/2o  -h$lno2-htr (a2I+M) -1H-^h£n | a2I+M[ 

-%£n|  a2I+M|-^gn(x   -y)  '  (a2I+M) _1 (x   -y)  . 

We  seek  the  solution  which  maximizes  the  equation  above,  or 

equivalent ly,  the  solution  which  minimizes 

b/a2+6£na2+tr (a2I+M)-1H+(h+l) In | a2I+M|  (4.1.2) 

+gn(x   -y  )  '  (a2I+M)_1  (x   -u_)  . 

2 
If  we  ignore  the  constraints  that  a      is  positive  and  M  is 

nonnegative  definite  and  seek  the  stationary  values  of  (4.1.2) 

2 
over  all  possible  (v,o    ,M)  ,  we  find,  upon  taking  the  partial 

2 

derivatives  of  (4.1.2)  with  respect  to  a  ,  M,  and  y  and 

setting  them  equal  to  zero,  that 

-b/ (a2)2+B/a2-tr (a2I+M) -1H (a2I+M) -1+ (h+1) tr (a2I+M)_1 
-tr  (gn(a2I+M)_1(x   -y )  (x   -y )  '  (c2I+M)  _1)  =  0, 

-  (a2I+M)_1H(a2I+M)_1+(h+l)  (a2I+M) _1 

-gn(o2I+M)_1  (x   -y)  (x   -y )  '  (o2I+M)  _1  =  (0), 

gn(a2I+M)_1(x   -y)  =  0, 
for  which  the  solutions  are 
u,  =  x   , 
o2    =  b/6, 
M  =  (h+l)_1H  -  (b/6)  I. 


78 


Since  M  is  a  nonnegative  definite  matrix,  its  maximum  likeli- 
hood estimate  must  also  be  nonnegative  definite,  so  the  solu- 
tions above  are  the  maximum  likelihood  estimates  only  if 

(h+1)   H  -  (b/B)I  is  nonnegative  definite.   Although  the 

2 

solutions  for  _y  and  a   are  the  natural  unbiased  estimates, 

the  solution  for  M  is  not  since  E  (M)  =  (h+1)   (hH-o  I). 
In  addition,  we  observe  that  E (M)  is  also  not  necessarily 
nonnegative  definite. 

Using  the  principle  of  marginal  sufficiency  referred 

to  in  Chapter  2,  we  see  that  (b,H)  is  marginally  sufficient 

2 
for  (a  ,M) .   Hence,  we  choose  to  use  the  marginal  likelihood 

2 
function  of  (a  ,M)  instead  of  the  likelihood  function  of 

2  2 

(y,a  ,M) .   The  marginal  likelihood  function  of  (a  ,M)  can  be 

expressed  as 

f(b,H>  ._ llU'h2\s b%6-l,H|Mh-m-l) 

x  exp[-b/2a2-J5tr  (a2I+M) -1H]  . 
The  logarithm  of  the  likelihood,  omitting  a  function  of 
the  observations,  is 

-b/2a2-^B£na2-^tr(a2I+M)-1H-%h£n| a2I+M| , 
and  we  seek  the  solution  which  maximizes  this  equation,  or 
equivalently ,  the  solution  which  minimizes 

b/a2+6£no2+tr (a2I+M)_1H+h£n| a2I+M| .  (4.1.3) 

2 
Again,  if  we  ignore  the  constraints  that  a   is  positive  and 

M  is  nonnegative  definite  and  seek  the  stationary  value  of 

2 
(4.1.3)  over  all  possible  (a  ,M) ,  we  find,  upon  taking 


79 


2 
the  partial  derivatives  of  (4.1.3)  with  respect  to  0      and  M 

and  setting  them  equal  to  zero,  that 

-b/ (a2)2+8/a2-tr (a2I+M) _1H (a2I+M) -1+htr (a2I+M)_1  =  0, 


and 


-  (a2I+M)-1H(a2I+M)-1+h(a2I+M)  1=  (0)  , 


for  which  the  solutions  are 
a2  =  b/B, 
M*  =  (l/h)H  -  (b/3)l. 

Note  that  these  solutions  are  the  natural  unbiased  estimates 

2 
of  a   and  M  and,  clearly,  E(M^)  =  M  is  nonnegative  definite. 

Hence,  we  choose  to  continue  our  work  with  the  marginal  like- 

2 
lihood  function  of  (a    ,M) .   Since  M  is  nonnegative  definite, 

the  solutions  above  are  the  maximum  likelihood  estimates 

only  if  (l/h)H  -  (b/8)I  is  also  nonnegative  definite.   In  the 

next  section  we  will  derive  maximum  likelihood  estimates  for 

2 
a      and  M  which  are  valid  for  all  possible  (b,H). 


4 . 2   The  Maximum  Likelihood  Estimates 

In  this  section  we  seek  the  maximum  likelihood  estimates 

2  2  s 

of  a   and  M  subject  to  the  constraints  a   >  0  and  Me     UP.. 

j-0  3 

Recall  that,  aside  from  a  constant,  the  logarithm  of  the 

2 

likelihood  function  of  (a  ,M)  is 

-h/2a2-h&lno2-htr(o2I+M)~1E-hhln\ g2I+M| . 
We  seek  the  solution  which  maximizes  the  equation  above,  or 
equivalently ,  the  solution  which  minimizes 

b/a2+0£na2+tr (o2I+M) -1H+h£n I a2I+M| 


80 


2  S 

subiect  to  a   >  0  and  Me  U  P..   Note  that  this  can  be 

j-0  3 

rewritten  as 

tr  (a2I)-1(-I)+  -£n|a2l|  +tr  (a  2I+M)  _1H+Mn|  a2I+M|  .      (4.2.1) 
v       m  '   m   '    '  '      ' 

Since  (b/6)I  and  H*  =  (l/h)H  are  both  symmetric  matrices, 

m 
and  (b/3)I  eP   and  H^e  UP.,  there  exists  a  nonsingular 
m       *  j=0  J 

matrix  K(mxm)  such  that  K (  (b/3) I) K '  = I  and  KH*K  '  =  D  where 
D  =  diag (d, ,d„ , . . . ,d  )  and  d,>d2>  . . .  >d   >  0  are  the  latent 
roots  of  H*((b/3)I)-1  =  (@/b)H*.   Then  with  a2  =  Ba2/b  and 
M  =  KMK',  (4.2.1)  can  be  rewritten 

£  tr  K'~1(a2I)"1K-1+  -In  I a2I I +htrK '_1 (a2I+M) -1K-1D 
m  m 

i  2    i 
+  h£n[ a  I+M| 

=  -^[tr  (a2I)_1+S,n]a2l|  ]  +h  [tr  (a2I+M)  _1D+£n  I  a2I+M|  ] 
m 

S      i  i  2 
- (£+h) in  K 
m 

=  <J>  (a2I,a2I+M;D,-,h)-  (-  +h)£n|K|2, 

where  $  is  the  function  discussed  in  Section  2.2.   Thus,  the 

~2   ~2 
problem  has  been  reduced  to  that  of  minimizing  <j)  (a  I, a  I+M; 

B  2  ~   s 

D,  — ,h)  subject  to  a   >  0  and  MeU  P.,  or  equivalently , 

m  j=0  3 

(a2I,a2I+M)  £  C      since  C   =  { (A,B) :  AeP  ,  BeP  ,  3-A£  UP.}. 
s        s  mm      -;_qJ 

2    2-  2 

Now  for  fixed  (a  I, a  I+M)  e  C  consider  <|>(P(a  I)P'r 

P  (o2I+M)Pr  ;D,-,h)  =  A  (a2I,P(a2I+M)P' ;D,-,h)  for  all  orthog- 
'm  m 

onal  P.   Note  that  this  is  minimized  with  respect  to  P  when 

2   ~  -1 
tr  P(o  I+.M)   P'D  is  minimized.   So  from  Lemma  2.2.1  it  fol- 
lows that  all  stationary  points,  and  therefore  the  absolute 

7  1       ~  B  2 

minimum,  of  $  (a  I  ,P  (a  I+M  )  P'  ;  D  ,-,h)  occur  when  P"(0  I+M)P' 


81 


is  diagonal.   Hence,  in  searching  for  the  absolute  minimum 

9     0  ft  "}0~" 

of  <$>  (o   1,5  I+M;D,  — ,h)  over  all  (a  I,  a  I+M)  e  C     we  may  assume 

2 
that  a  I+M  is  diagonal.   This  result  also  follows  immediately 

from  Lemma  2.2.12. 

Now  with  V   =   diag (vn , v„ , . . . ,v    )    and    f . (v . )    =   d . /v . 
3  *    1'    2  m  11  11 

+  £n  v  .  ,    consider  minimizing 
l  v 

R  1  m     d . 

4>(uI,V;D,-,h)    =    3(-   +£nu)+h    E     (—   +    £nv.)  (4.2.2) 

m  u  .    ,    v.  i 

i=l      l 

1  m 

=    B(i  +£nu)+h   Z    f.(v.)  , 

i=l 
subject   to    (uI,V)   eC    .      The   constraint    (ul ,V )  zC      can   be 
equivalently      written   as 

v.    >    u    >    0  for      i    =    1,2,.. .,m,       (4.2.3) 


and 

v.  =  u         for   i  e  J ,  (4.2.4) 

where  J  c  {l,2,...,m}  is  a  set  which  has  at  least  m  -  s 
elements.   Now 

df  (v  ) 

—J —  =  (l-d./v  )/v., 

dv-         11    i 

so  that  the  function  f.  decreases  monotonically  in  v.  for 

l  2  l 

v.  e"(0,d.l,  increases  monotonically  in  v.  for  v.  e   [d.,00), 
li  J  i       11 

and  is  minimized  over  all  v.  e  (0,°°)  when  v.  =  d.  .   Thus, 

i  11 

the  unrestricted  minimum  of  (4.2.2)  occurs  when  u  =  1  and 

V  =  D.   It  is  evident  from  the  structure  of  f .  that  if  the 

l 

unrestricted  minimum  does  not  satisfy  the  constraints 
(4.2.3)  and  (4.2.4),  then  the  restricted  minimum  will 


82 


occur  when  u  =  v.   =  v.   =  ...  =  v.   for  some  set  of  integers 

{i. ,i„ / . . . ,i,  } <3 {1,2 , . . . ,m} .   We  need  to  determine  k,  the 
number  of  integers,  and  also  we  need  to  know  exactly  which  k 
integers  from  amongst  the  integers  l,2,...,m  comprise  the 
set  {i,,i2 , . . . ,ik> . 

First,  we  will  consider  the  constraint  given  by  (4.2.3). 
Let  the  variable  r  be  defined  in  the  following  manner.   If 

l<d<d   .  < .  .  .  <  d,  ,  then  let  r  =  0 .   If  d   <  1  <dm  .  <  .  .  .  <  d,  , 

m   m- 1        1  m      m- I        1 

then  let  r  =  1.   If  d  <  . . .<  dL  ..,<  1<  d   .   <  ...<  d-.,  then 

m        m-t+l      m-t  i 

let  r,  1  <  r  <  t,  be  the  smallest  value  for  which 

m 
d    >  (B+h    l        d.)/(3+rh). 
m_r        j=m-r+l  3 

Finally,  if  d   <d   ,<...<  d,  <  1,  then  let  r,  1  <  r  <  m  -  I, 
m    m- 1         l 

be  the  smallest  value  for  which  the  inequality  above  is 
satisfied.   If  the  inequality  is  not  satisfied  for  any 
choice  of  r,  H  r  i  i  -  1,  then  let  r  =  m.   Now  if  r  =  0 , 
the  minimum  of  (4.2.2)  subject  to  (4.2.3)  is  simply  the 
unrestricted  minimum  of  (4.2.2),  and  if  r  >  0,  the  minimum 
of  (4.2.2)  subject  to  (4.2.3)  is  just  the  minimum  of  (4.2.2) 

subject  tou=v   =  v   ,=...=  v    . n  which  occurs  at 

J  m    m-l  m-r+i 

m 
fu   =  vm  =  ...  =  vm  _,  =  (B+h    I        d.)/(B+rh), 
I  m  m_r+1        j=m-r+l  3 

I  v.  =  d.  for   i  =  l,2,...,m-r. 

v.  1     1 

Now  consider  the  constraint  given  by  (4.2.4).   If 

r  >  m  -  s,  then  the  minimum  of  (4.2.2)  subject  to  (4.2.3) 


83 


and  (4.2.4)  is  simply  the  minimum  of  (4.2.2)  subject  to 
(4.2.3).   If  r  <  m  -  s,  the  minimum  of  (4.2.2)  subject  to 
(4.2.3)  and  (4.2.4)  is  obtained  by  minimizing  (4.2.2)  sub- 
ject to 

rn   =  v  .   =  v .   =  . . .  =  v .  if  r  =  0, 

)  Jl     '2  Dra~s  (4.2.5) 

I  u  =  v   =  =  v    ,i  -  v.   =  ...  =  v .       if  r  >  0, 

m"r+1     3l  ^m-s-r 

where  { j- , j2, . . . Om_s_r^  G  {1,2 , . .  .  ,m-r-l,m--r}  .   We  will 

now  show  that,  in  fact,  j,  =  m-r,  j„  =  m-r-1, . . . , Jm_s_r  = 

s+1.   Note  that  for  q  =  l,2,...,m-l  (4.2.2)  is  minimized 

subject  to  u  =  v   =  . . .=v    , ,  when 
J  m        m-q+1 

m 

ru  -  »„  -  —  -  %-q+i  -  ,3+hj=m£q+1dj)/,6+qh)' 

l^v .  =  d.  for  j  =  l,2,...,m-q, 


and  has  a  minimal  value  equal  to 

m 
•B+h    I        d 


•_  _   -,  j  \  m-q 
3B+qh+  — J  +  (3+qh)+h(m-q)+h  £  Zndy  (4.2.6) 

Similarly,  (4.2.2)  is  minimized  subject  to  u  =  v   =  ... 

=  v     ,  -   v. ,  where  i  e  {1 , 2 , . . . ,m-q-l ,m-q} ,  when 
m-q+1     l 


m-q+ 

m 
ru-v   =  .  .  .  =  v   _,,  =  v.  =  (B+h    I    d.+hd.  )  /  (B+(q+l)h)  , 
I      m  m-q+1     i       j=m-q+i  3    ! 

L.V.  =  d.  for  j  =  1,  .  .  .  ,  i-  ]  ,  i+1,  .  .  .  ,m-q, 

and  has  a  minimal  value  equal  to 

m 
/B+h    £    d  .+hdA  _  _ 

(  B+  (q+l)h)£nl ^-(f+TTh r(  3+  (q+1)  h)  +h  (m-^_1)  +h  .  Z  £ndj  * 

(4.2.7) 


84 


Now   subtracting    (4.2.6)    from    (4.2.7),    we   obtain 

m  m 

/6+h        E        d.+hd\  S+h        E        d\ 

(3+(q+l)h)&nf    i^'^l  j-(B+qh)£nf    ^M*^, 

^  (4.2.8) 

which  is  the  increase  in  the  minimal  value  of  (4.2.2)  due  to 

the  additional  constraint,  u  =  v..   Differentiation  of  (4.2.8) 

with  respect  to  d.  yields 

J msin^  .  i  , 

\3+h   E   d.+hd./    x 
j=m-q+l  :    1 

m 
which  is  negative  when  d.  <  (B+h   E    d.)/(g+qh)  and  positive 

1        j=m-q+l  J 
m 
when  d.  >  (8+h    E    d.)/(6+qh).   Hence,  (4.2.8)  is  an  increas- 
j=m-q+l  -* 

m 
ing  function  of  d.  when  d.  >  ( 3+h    E    d.)/(8+qh),  so  that  if 
1        X        j=m-q+l  D 
m 
d     >  (B+h    E    d.)/(B+qh),  choosing  i  =  m-q  will  yield 
m~*  j=m-q+l  3 

a  smaller  minimum  value  than  any  other  choice  of  i  <  m-q. 

In  a  similar  manner  subtracting  the  unrestricted  minimal  value 

of  (4.2.2)  from  the  minimal  value  of  (4.2.2)  subject  to  u  =  v. 

where  i  e  { 1 , 2 , . . . ,m} ,  we  obtain 

rtj+hdA 

(6+hnnl-g^l-h  £ndi, 

which  is  an  increasing  function  of  d.  for  d.  >  1.   Thus,  if 

d  >  1,  choosing  i  =  m  will  yield  a  smaller  minimum  value  than 
m 

any  other  choice  of  i < m.   Recall  that  we  are  investigating 
the  minimum  of  (4.2.2)  subject  to  (4.2.3)  and  (4.2.4)  when 
r<m-s.   If  r  =  0,  then  d  >  1,  so  that  m-r  =  m  is  the 


85 


optimal  choice  for  j,.   Further,  since  d.  >  1  for  i  =  1,2, 
where  r  =  0 ,  we  have 


m 

d,    (B+crh)  (    Z        d./a) 

m 


m 
6+h    Z         d^    (6+qh) (    Z         d . /q) 

j=nt-q+l  "  <  j=m-q+l  3  __  _     v 

6+qh  (6+qh)  "  j=m-q+l  j      m"q' 

for  q  =  1,2, . . . ,m-l,  and  hence,  when  r  =  0  choosing  j,  =  m, 

j2  =  m-l,...,j     =  s+1  in  (4.2.5)  will  yield  a  smaller 

minimum  than  any  other  choice  of  {j-,,J-,/...,i    }a{l,2, 

J 1  J 2  ' Jm-s 

...,m-l,m}.   Now  from  the  definition  of  r  we  see  that 

m 
dm-r  >  (6+h   Z    d.)/(B+rh)  if  1  <  r  <  m-1.   In  addition, 
j=m-r+l  J 

m 
for  q  =  l,2,...,m-2  if  d    >(6+h    l        d.)/(6+qh),  then 

q      j=m-q+l  J 

m  m 

6+h  Z      d.    (6+qh) ((6+h   Z        d.)/(6+qh))  +  hd 

j=m-q  3  =  j=m-q+l  J  m_q 


,m 


6+(q+l)h  6+(q+l)h 

<  ((6+qh)dm_q+hdm_g)/(6+(q+l)h) 

=  d     <  d 

m-q    m-q-1 

m 
Thus,  d     >  (6+h    Z    d  )/(6+qh)  for  q  =  r ,r+l , . . . ,m-l . 
j=m-q+l  -1 

It  then  follows  that,  when  1  <  r  <  m-s,  choosing  j,  =  m-r, 

j2  =  m-r-1, . . . , jm_s_r  =  s+1  in  (4.2.5)  will  yield  a  smaller 

minimum  than  any  other  choice  of  {i-,,i«,...,i      }c 

1  ~  2.  m-s-r 

{1,2,. . . , m-r-1, m-r } . 

We  can  now  obtain  the  minimal  solution  to  (4.2.2) 
subject  to  (4.2.3)  and  (4.2.4).   Denoting  the  minimal  solution 
by  (u  ,V  ) ,  we  find  that  if  r  >  m-s, 


86 


Uo=VcTT1=Vo  «.    i  =  ---=vc:   ™    ^i=(B+h        z        d    )/((3+rh), 
s      sm     s,m-l  s,m-r+l  j=m-r+l    -1 

v    .    =  d.  for   j    =   1,2, . . . ,m-r  , 

and   if   r    <  m  -    s, 


'u.=v      =vo  m    i  =  ---=v<=    c+r<e+h      z      d.)/(3+(m-s)h)  , 
s      sm      s,m-l  s,s+l  t=s+1    3 

v    .    =  d .  for    j    =   1,2, ... ,s. 

2    2-8 
Thus,  cj)(a  I/O  I+M;D,  — ,h)  is  minimized  subject  to 

(a~I,a2I+M)e  C   at 
s 

~2 

a      =  u  , 
s 

M  =  V   -  u  I. 
s     s 

2 

The  maximum  likelihood  estimates  of  a   and  M  are,  therefore, 

-2 

a   and  M  given  by 

a2  =  bus/B, 

M  =  K-1 (V  -u  I)K'_1. 
s   s 

To  illustrate  the  computation  involved  in  deriving  the 

maximum  likelihood  estimates,  we  will  again  consider  the 

example  presented  in  Section  2.3.   Recall  that  with  m  =  4, 

g  =  21,  n  =  6,  1   =  1,    and  M  =  diag(99,  24,  0,  0)  a  matrix  E 

from  the  distribution  W.(I,  105,  0)  and  a  matrix  H  from  the 

distribution  W.  (I+M,  20,  0)  were  generated.   With 

B  =  mg(n-l)  =  420,  b  =  trE,  and  H*  =  (1/20)H,  we  need  to  find 

a  nonsingular  matrix  K  such  that  K((b/B)I)K'  =  I 


87 


and 


KH*K '  =  D  where  D  is  a  diagonal  matrix.   Let  D,  = 


diag (ch, (H*) ,ch_ (H*) , . . . ,ch  (HA) )  ,  and  let  Q  be  the  orthog- 
onal matrix  for  which  the  i    column  is  the  characteristic 
vector  of  HA  corresponding  to  ch . (H*) ,  then  since  H^  is  sym- 
metric ,  P  'H*P  =  D,  .   Clearly,  (  (  B^/b^)  P)  '  (  (b/B)  I)  (  (B^/b*2)  P) 
=  P'P  =  I  and  (  (B^/b^P)  'H*  (  (BVb^P)  =  (B/b)D1.   Hence, 
we  find  that,  for  our  example, 


K  = 


1.00723 
.0551796 
.00921922 
-.000345628 


.0551967 
-1.00719 
-.00261055 
-.0128084 


-.00906271 
-.00310948 
1.00874 
-.000137374 


-.00104477 

.0127707 
-.00010739 
-1.0087 


and  D  =  diag ( 94 . 1065 ,  34.8845,  1.01721,  .618312).   Note  that 
d .  <  1  <  d  <  d„  <  d,  ,  so  that  r  =  1.   Then  simple  calculation 
yields  uQ  =  6.06506,  VQ  =  6.065061,  u1  =  2.39667,  V±   = 
diag(94.1065,  2.39667,  2.39667,  2.39667),  u2  =  .984153, 
V2  =  diag(94.1065,  34.8845,  .984153,  .984153),  u3  -  u4  = 

.982651,  V   =  V4  -  diag(94.1065,  34.8845,  1.01721,  .982651). 

2 
Thus,  if  we  let  a.  and  M.  denote  the  maximum  likelihood 
11 

2 

estimates  of  a   and  M,  respectively,  subject  to  the  con- 

2  i 

straints  a  >  0  and  Me  U  P .  ,  we  see  that 

j=0  3 

„2 

aQ  =  5.95987, 

MQ  =  (0)  , 


aj  =  2.3551, 


88 


89.8421 

4.92338 

-.808366 

-.0931905 

kl  = 

.269803 

-.0442987 

-.00510687 

.00727337 

.000838493 

.0000966637 

o2    =    .967085, 


91.3255 

3.17993 

-.826433 

-.0  715582 

33.4811 

.0575388 

-.426237 

M2  - 

.00770191 

-.000448497 
.0054369 

M3=M4= 


2  -2 

3  =  a4  = 

.965608, 

91.327 

3.17993 

-.826136 

-.0715587 

33.4825 

.0574547 

-.426256 

.0416617 

-.000452157 
.00543713 

4. 3   The  Likelihood  Ratio  Test 

s 

Recall    that  C     =  {  (A,B)  :  Ae  P    ,  Be  P   ,  B-A  e  U  p.},   and    . 
s  mm  .q    3 

2    ° 

suppose  we  know  that  (a  I,a"I+M)  eQ  =  C    .      We  wish  to  test, 

2    2 
say,  the  null  hypothesis  that  (a  I, a  I+M)  e w  =  C  _,c  C  . 

2    2 

The  alternative  hypothesis,  then,  is  that  (c  I, a   I+M)  e 


n-oj  -   C      -    C   _,  .   Thus,  we  are  testing  the  hypothesis 


,(s) 
l0 


rank (M)  <  s  -  1 


against  the  hypothesis 
(s) 


H'  ' :  rank(M)  =  s. 


89 


We  adopt  the  likelihood  approach  and  look  at 
max  f(b,H)/max  f(b,H)  =  X    e  (0,1]. 

With  u   and  the  matrix  V  =  diag (v  ,  ,v  ~,...,v   ) 
s  s       '   si   s2       sm 

given  by 

m 

'u   =  v   =...=v     _,,  =  (6+h    E    d.)/(0+rh), 
s     sm      s,m-r+l        .__   ,,  1 

'  j=m-r+l  J 

v  .  =  d.  for  j  =  l,2,...,m-r, 

if  r  >  m  -  s,  and 


u   =  v   =...=v    ..  =  (3+h  I      d.)/(B+(m-s)h)  , 
s     sm      s,s+l       j=s+l  3 

v  .  =  d.  for  j  =  l,2,...,s, 

S]     j 

-2       2 

if  r  <  m  -  s,  the  maximum  likelihood  estimators  a    ,    of  a    ,  and 

MQ,  of  M,  when  the  parameters  are  restricted  to  lie  within 
Q, ,    are  given  by 

°l    =   bus/3' 

MQ  =  K"1(VUsI)K''_1' 

where  K  is  a  nonsingular  matrix.   Similarly,  the  maximum 

~2       2 
likelihood  estimators  a  ,  of  a  ,  and  M  ,  of  M,  when  the 

parameters  are  restricted  to  lie  within  co ,  are  given  by 
°l    =   bus-l/B' 

.K  =  K-1<Vrus-iI,K'~1- 

It  should  be  noted  that  if  r  >  m  -  s  +1,  then  V  =  V   ,  and 

s     s-1 

u   =  u   ,  . 
s     s-1 


90 


The  likelihood  ratio,  A,  is 


A  =  max  f(b,H)/max  f(b,H) 

exo[-b/2a/2-35tr(S2I+M    )_1H]  (52)  h®  I  S2I+Mn  |%h 

exp[-b/2a2-^tr(a2I+ftn)-1H]  (a2)h^\o2l+9i    \hh 

U                     Q,           U  CO             '      CO           CO  ' 


exp[-6/2u   ,-JshtrV  11D]   Iv  I hh   uhi 
S-I       s-i     '  s1     s 


exp[-6/2u  -^htrV  XD]      |v   .  I hh   u^6, 

S         S  S~  X       3—  X 


IV  \hh   u* 
s     s 


s-11     s-1 


since,  if  r  >  m  -  s  +  1 

6(us-l"us1)  +  h  tr(Vs-l"Vs1)D  (4.3.1) 

=  6(u^1-u^1)  +  h  tr(V~s1-V~sl)D 

=    0, 
and  if  r  <  m  -  s  +  1,  (4.3.1)  becomes 

g|V(m-s+l)h  -    B+(m-s)h    \hL)  +  /Vfrn-s+DhA    ™  d_    _ 

\m  m/\  I            m               j  ■  -      1 

^B+h   Id.  8+h      Z      d/       V  \B+h   Id.      V3-53 

j=s    -1  j=s+l    -1  j=s    -1 

g+(m-s)h      "\        ™  1 


m  /      •_      ,  -i      j  / 

3+h      E      d.y    3-s+l     y 
j=s+l    D 

B+(rS+1)hl    (B+h  ?  d.)    -fj±fet£lhJ\    (6+h     Z      d.)    -   h 

.B+h  z  d     7      3=s  -    v6+h  z  d.y      ^*+1  D 

j=s    D  j=s+l    3 

;   +    (m-s+l)h   -    (B+(m-s)h)-h   =   0. 


91 


So  we  have 


%(6+h(m-s)  ) 


X   = 


+  (m-s+l)h>^+h(m-s+l))p.^  +  1djA   .f    <     +1 

« 0  ,  J.  — rr—   if  r  <  ra-s+1, 

mi  I  3+ (m-s) h  I  ' 

B+h  I  d.  /  V  / 

j=s  : 


if  r  >  m-s+1. 


Putting  t   =  hd  /(6+h  £  d.),  we  can  rewrite  A  as 

j=s  D 


^fB+(m-s+l)hfh  f6+(m-s+l)h>(6+h(m-s)) 
h  6+(m-s)h 


.Jsh 

's 


t.-(l-y 


(6+h(m-s)) 
if  r  <  m-s+1, 


v 


if  r  >  m-s+1 


We  will  now  show  that  r  <  m-s+1  if  and  only  if 

t   >  h/ (6+ (m-s+1) h)  .   First  consider  the  case  in  which 

s  =  m.   Then  r  <  m-s+1  =  1  if  and  only  if  d   >  1,  and 

m 

t   =  hd  /(6+hd  )  =  h/(6/d  +h)  >  h/(6+h)  if  d   >  1,  and 
m     m      m  m  m 

t  ■=  h/  (B/d  +h)  <  h/  (B+h) 
m         m 

if  d   <  1.   Consider  now  the  case  in  which  1  <  s  <  m-1. 
m 

Again  we  want  to  show  that  r  <  m-s+1  if  and  only  if 
t   >  h/ (6+(m-s+l)h) .   If  r  =  0,  clearly 


d   .  >  (B+h   E    d.)/(6+ih), 
j=m-i+l  3 


for  i  =  1,2,  ...  ,111-1.   Also,  if  0  <r  <  m-s  +  1,  then 


92 


d    >  (3+h    Z        d.)/(B+rh), 

m-r        .     , ,  1 
j=m-r+l  J 

and  we  have  seen  that  this  implies  that 

m 

d    >  (3+h    E    d.) / (3+qh) , 
m-q        •     , i  1 
^        3=m-q+l  J 

for  q  =  r ,r+l, . . . ,m-l  and,  more  specifically,  for  q  =  m-s. 

Hence,  if  r<m-s+l, 


ds  >  (3+h  Z      dJ/(3+(m-s)h)  , 
which  implies 


j-8+1  3 


+  h  E  d.  <  d  (3+(m-s)h)+hd  , 
i  s  s 

D=s  J 


so  that 


+  h  Z   d.  <  hd  (£  +  m-s+1  , 
1      s  h 
D=s  J 


and  thus 

hd 

t   = 


s       m      3/h  + m-s+1    3+  (m-s+1)  h* 
3+h  E  d. 
j=s  : 

Also,  if  r  >  m-s+1,  then  it  must  be  true  that 


dm-(m-s)  =  ds  *  (3+h_=Z+id.)/(3+(m-s)h) 
which  implies  that 

t  =  -S-  * 


s       m      3+ (m-s+1) h 
3+h  E  d. 
j=s  : 


93 


It  follows  that  the  likelihood  ratio,  A,  can  be  written  as 

fe+  (m-s+1)  hY2*1  (&+  (m-s+1)  h\h  ( 3+h  (m"s)  }  .  hh  .Js(3+h(m-s)) 

^    h     J      ^3+(m-s)h  J  s  [L      sj 

A=\  if  fcs  >  3+ (m-s+1)  h  ' 

h 


V 


1  lf  ts  "  3+ (m-s+1)  h  * 

^   ,_  •     /^  \    u-^h.,    ,  %  (3+h  (m-s)  ) 
Consider  the  function  g  t    =  t2   1-t  )2X  . 

s     s     s 

The  derivative  of  g (t  )  with  respect  to  t   is 
^   s  s 

t%h"1(l-t  )H^+h(m-s))-l[hh{1_t   )_^{3+h(m_s))t  ], 

s       s  s  s 

which  is  negative  for  t  e  (h/ (3+ (m-s+1) h) , 1) .   Thus,  A  is  a 

s 

decreasing  function  of  t   when  t  e  (h/ (3+ (m-s+1) h) , 1) .   In 

s       s 

addition, 

i+  (m-s+1)  h\i8hrB+(m-s+l)h^(g+h(m"s))^hf.,  .  ,h  (3+h  (m-s) )  <  . 
h     J   ^  3+(m-s)h  J  s  U~V  "  L' 

for   t  e  (h/ (3+ (m-s+1) h) ,1) ,  with  equality  when  t   = 

h/ (3+ (m-s  +  1) h)  ,  so  that  A  is  a  decreasing  function  of  t  . 

(s ) 

Since  the  likelihood  ratio  test  rejects  H~    for  small 

(s) 

values  of  A,  it  equivalently  rejects  Hq    for  large  values 

of  t  .   However,  the  distribution  of  t   is  intractable,  and 
s  s 

(s)  (<=) 

so  use  of  t   in  a  test  of  H„    versus  H,    is  not  practical, 
s  0  1 

In  the  following  chapter  we  present  an  alternative  test 

(s)  (s) 

statistic  for  testing  H~    versus  H' 


CHAPTER  5 


AN  ALTERNATIVE  TEST  WHEN  Z  = a2I 
AND  ITS  PROPERTIES 


5 . 1   Introduction 

We  have  seen  that  the  likelihood  ratio  test  rejects 

(s)  m 

H_    for  sufficiently  large  values  of  hd  /(8+h  E  d.), 

i=s 

where  d,>d„>...>d   are  the  solutions  to  I H+-d (b/B) 1 1  =  0. 

l  /.  m  i    -k  i 

Let  i^->^2>  .  .  .  >\]j      be  the  solutions  to  |  H— ipbl  |  =  0,  that  is, 

^i =hd./6  for  i  =  1,2, . . . ,m.   Then  the  likelihood  ratio  test 

(s)  m 

rejects  H'    for  sufficiently  large  values  of  ip  /  (1+  £  \\i .  )  . 

i=s 

This  quantity  is  an  increasing  function  of  ^  ,  so  that  it 

(s) 

would  be  reasonable  to  reject  H~    for  sufficiently  large 

values  of  tj;  .   However,  the  complexity  of  the  null  distribu- 

(s)  (s) 

tion  of  \\j      makes  the  use  of  4>      in  a  test  of  H^    versus  H, 

impractical.   Therefore,  in  this  chapter  we  present  an 

(s)  (s) 

alternative  test  statistic  for  testing  H^    against  H, 

(s)        m 

and  consider  the  test  which  rejects  H*    when   E  iLi .  is 

0         .   ri 
i=s 

sufficiently  large.   In  the  remainder  of  this  chapter  we 

investigate  some  properties  of  this  new  test.   In  Section  5.2 

m 
it  is  shown  that  the  test  based  on  I    \b .    is  an  invariant  test 

i=s  x 

(s)  (s) 

of  HQ    against  H    .   In  the  last  two  sections  we  discuss 


94 


95 


an  important  monotonicity  property  of  the  roots  ty .  i    i  = 

1,2,...  ,m  and  use  this  property  in  deriving  the  asymptotic 

m 
distribution  of  Z    ty • • 


5 .2   An  Invariance  Property 

Consider  the  group  of  transformations  G  =  {g    :  a>  0, 

a ,  P 

P  (mxra)  is  such  that  PP '  =  al},  where  g   _(b,H)  =  (ab,PHPf). 

a  ,  ir 

If   b  ~  a2xl    and   H~W    (a2I+M,h,0),    then   ab~aa2x?,    PHP '    ~ 

2 

W  (ao  I+PMP ' ,h, 0)  and  rank (PMP  ' )  =  rank(M).   Hence,  the 
m 

(s) 

problem  of  testing  the  hypothesis  H'   :  rank (M)  <  s-1  against 

(s) 

H,   :  rank(M)  =  s  is  invariant  under  the  group  G. 


Now 


consider   the   roots   tL> -.  >\b~>  .  .  .  >^      of    |H-^bl|    =   0 

12  m     i   t   i 


and  the  roots  6,>9^>...>e  of  [PHP'-9abI   =  0,  where  a>0  and 
I  2.  m    '  ' 

P  is  such  that  PP  '  =  al .   Clearly, 

| PHP'  -  6abl |  =  0 
implies 

| PHP'  -  9bPP' |=0, 
so  that 

| h  -  ebi |=o, 

and  thus,  9.  =  ^ . :  i  =  l,2,...,m.   Suppose  now  that 

9.  =  \b .  :  i  =  1,2,.  .  .  ,m  are  the  roots  of  |H,-9b,l|  =  0 
11  '11' 

and  |  H_— ip  fc>  „  I  |  =  0,  respectively,  where  b,  >  0,  b^  >  0  ,  and 

H,  and"  H-   are  positive  definite,  symmetric  matrices. 
There  exist  orthogonal  matrices  Q,  and  Q?  such  that 


96 


Q1(H1/b1)Q;[    =    V, 


Q2(H2/b2)Q^    =    Y, 


where   f   =  diag  (i|j,,  i|>2  , .  .  .  ,ty    ).      Take   a   =  b_/b,    and 
P   =   /a  QnQi •      It   now   follows   that 

(a)  9a,P(bl'Hl)    =    (ab1,PH1Pr) 

=    (b2,ab1Q^Q1(H1/b1)Q^Q2) 

-    <b2,b2Q£YQ2) 

=  (b2,b2(H2/b2)  )  =  (b2,H2), 

(b)  PP'  =  (/lQ^Q1)  (/aQ^Q1)  ' 

=  aQ^Q1Q{Q2  =  ai, 

and  (c)  a  >  0 , 

so  that  g   D  e  G.   Therefore,  by  Definition  3.3.1 
a  ,^ 

{i|i:|H-^bl|  =  0}  is  the  maximal  invariant  with  respect  to 

m 
G.   The  test  statistic    £  ty .  is  clearly  a  function  of 

i=s 

(ijj,  ,ijj_ , .  .  .  , xp  ),  and  so,  by  Lemma  3.3.2,  the  test  statistic 

m 

£   iJj  .  is  an  invariant  test  statistic  for  testing  the 
i=s 

(s)  (s) 

hypothesis  H~    against  the  hypothesis  H-/ 


5 . 3   A  Monotonicity  Property  of 
the  Power  Function 

The  test  procedure  which  we  have  been  investigating 

depends  on  the  latent  roots  ip,  ,  i\> ,.,...,  \\i      of  the  random 

matrix  H(bl)"1  =  (a"2H)  (  (b/a2 ) I ) _1 .   If  we  let  91>02>... 

-2  2 

>0      be    the    latent    roots   of    a      H,    then    i/j .    =    a    8./b: 
m  li 


97 


i  =  l,2,...,m.   The  distribution  of  the  roots  6,,9~/.../0 

(see,  for  example,  James  [1964])  depends  upon  the  latent 

-2 
roots  of  the  corresponding  population  matrix  I+a  M  as  param- 

-2 

eters.   Let  5,>6.->...>6  >1  be  the  latent  roots  of  I+a   M, 
1  A  m 

and  note  that  M  has  rank  of  at  most  s-1  if  and  only  if  6   =  1. 

s 

(s)  (s) 

Thus,  testing  the  hypothesis  H~   :  rank  (M)  £  s-1  against  H-/   : 

(s) 

rank(M)  =  s  is  equivalent  to  testing  the  hypothesis  H'   : 

(s)  m 

6   =1  against  H,   :  6   >  1.   Since  we  are  using   Z  \b .    as 
s       ^        1      s  r  .   ri 

( s ) 

a  test  statistic  in  testing  the  hypothesis  Hn 

(s) 

against  H,   ,  a  desirable  property  would  be  that  it  stochasti- 
cally increases  in  5  ,  and  hence,  the  power  function  increases 

monotonically  in  6  .   In  this  section  we  not  only  show  that 

m 

Z    ill.  stochastically  increases  in  6  ,  but  also  that  it 
l  s 

i=s 

stochastically  increases  in  each  6.:  i  =  l,2,...,m.   This 
more  general  result  will  be  utilized  in  the  following  section.*. 

We  will  need  the  following  results  from  Anderson  and 
Das  Gupta  [1964] . 

Lemma  5.3.1:   Let  X (m*h) (h>m)  be  a  random  matrix  having 
density 

f(X;Z,h)  =  (2tt)  -ishm|  E  |  _i2h  exp  [-3strZ_1XX  '  ]  , 

where  I    is  positive  definite.   Let  X,  >A0> .  .  .  >A  be  the  latent 

c  1  A    -  m 

roots  of  XX'  and  to  be  a  set  in  the  space  of  A,  ,  A~  ,  .  .  .  ,  A 

LA  m 

such  that  when  a  point  (A,,A~,...,A  )  is  in  lo  so  is  every 
t-  1 '  2       m  -L 

point  (A  '  ,  A  ' ,  .  .  .  ,  A  ')  for  which  A.'  ^  A-  :  i  =  1.2,...,m. 
1  A  m  ii 

Then  the  probability  of  the  set  co  depends  on  Z  only  through 


98 


the  latent  roots  of  I  and  is  a  monotonically  decreasing  func- 
tion of  each  of  the  latent  roots  of  Z. 

Lemma  5.3.2:   Let  A  be  a  positive  definite  matrix  of 
order  m,  and  D  and  D^  be  two  diagonal  matrices  of  order m such 
that  D4  -  D  is  positive  semidef inite ,  and  D  is  positive 
definite.   Then 

ch. (DAD)  £    ch. (D^AD*)        for  i  =  l,2,...,m. 

Using  these  two  results,  we  can  now  prove  the  main 
result  of  this  section. 

Lemma  5.3.3:  Let  X (mxh)  be  a  random  matrix  having 
density 

f(X;D,h)  =  (2TT)-J5hm|Dr^h  exp[-%trD_1XX']  , 

where  D  =  diag  (d  ,d2 , . . . ,d  ).   Let  V (mxm)  be  a  random,  posi- 
tive definite  matrix  independent  of  X.   Let  u  be  a  set  in 
the  space  of  the  latent  roots  of  XX  'V    satisfying  the  condi- 
tion stated  in  Lemma  5.3.1.   Then  the  probability  of  the 
set  co  is  a  monotonically  decreasing  function  of  each  of  the 
elements  of  D. 

Proof:   Consider  V  as  fixed,  and  let  V    =  T'T  where  T  is 
nonsingular.   Then  the  density  of  W  =  TX  is  f(W;TDT',h), 
and 

ch.  (XX'V-1)  =ch.(TXX'T')  =ch.(WW') 

l  l  l 

for  i  =  l,2,...,m.   Thus,  for  any  fixed  V,  we  have 

_/..  f(X;D,h)dX=   /   f  (W;TDT  '  ,h)  dW  (5.3.1) 

R(X)  R(W) 


99 


where  R(X)  denotes  the  region  {X:  (ch  (XX 'V_1)  , . . . ,ch  (XX  'V-1) ) 
£  to},  and  R(W)  denotes  the  region  {W:  (ch,  (WW')  , .  .  .  ,ch  (WW')) 
e  w}.   Let  D*  be  a  diagonal  matrix  for  which  D^-D  is  positive 
semidefinite.   It  follows  from  Lemma  5.3.2  that 

chi(TDvtT')  =  chi(D^(T'T)D^)  ;>  chi  (D*5  (T  'T)  D*)    =   ch.(TDT') 

for  i  =  l,2,...,m.   Then  from  Lemma  5.3.1  and (5. 3.1)  we  have 

/   f(X;D,h)dX  >    /   f (X;D*,h)dX 
R(X)  R(X) 

for  any  fixed  V.   Taking  expectations  with  respect  to  V,  we 
find  that 

PD(u»  >  PD#(«). 

Now  recall  that  we  are  investigating  the  test  statistic 
m 

I    iK.   Let  P  be  the  orthogonal  matrix  such  that  P(I+a  M)P' 
i=s 

=  A  =  diag(61,52,  .  .  .  ,6m)  .   Then  since  a~2E  ~  W  (I-ra~2M,h,  0)  , 

it  follows  that  P(a"2H)P'  ~W  (A,h,0),  and  we  can  write 

-2 
P(a   H)P'  =  XX',  where  X (m*h)  has  density  f(X;A,h)  given  in 

Lemma  5.3.1.   The  latent  roots  of  a~2H ( (b/a2) I) -1  are  the 

latent  roots  of  P  (a"2H) P  '  ( (b/a2) I ) ~\   or  equivalently , 

2—1  9 

XX '((b/a  )I)   .   Hence,  with  V  =  (b/a  )I,  clearly  V  is 

independent  of  X,  and  1^,1^,...,^   are  the  latent  roots  of 

XX 'V   .   in  addition,  if    E  ii> .  <  c  and  \b.'<\b.  :   i  =  l  2      m 

X  1~"    1  ■*-?**?    mm*     fill  f 

m  i=s             ~ 

then   I  ty'    <    c,  so  that  the  set  to  =  {  (ty    ,  ^  , .  .  .  ,ty    )  :   2  iJj.^c} 

i  =  q  x  1   ^       m    .   ri 

■   ■  _                       1=s 

satisfies  the  condition  of  Lemma  5.3.3.   So  it  follows  from 

Lemma  5.3.3  that  the  probability  of  the  set  co  is  a  monoton- 

ically  decreasing  function  of  each  of  the  latent  roots 


100 


_2 

6,,62,...,<5   of  I+c   M;  in  other  words,  the  power  function 

m 
of  the  test  based  on   I  i> .    is  a  monotonically  increasing 

i=s 

function  of  6 . :  i  =  l,2,...,m. 

m 
We  now  know  that  as  6  +00  P(  I  i|j.  >c)  increases 

i=s 

monotonically.   We  will  show  that,  in  fact,  as  5   -*•  « 

s 

m 

P(  I  i(j.  >  c)  -*■    1,  and  thus,  for  sufficiently  large  values 

i=s  . 

of  6   the  probability  of  rejecting  H*   :  6   =1  will  be 

arbitrarily  close  to  unity.   Let  K,  (mxm)  be  such  that 

K1  =  diag(ak1,ak2, . . . , ak  , 1 , . . . ,1) . 

Note  that      K, P (a~2H) P 'K  '   W  (K,  AK,' , h,0),  and 
1         1   rail 

K,AK'    =   diag(a2k26,  ,a2k26~,.  .  .  ,a2k26    ,1,...,1), 
li  3  11  22  s    s  iii 

2    2 

so  that  as  a  •*■   °° ,  ch .  (K,AK')  =  a  k  6  .  ->  °°    for  i  =  1,2,  .  .  .  ,s. 

Thus,  we  need  to  show  that 

m  -2  2-1 

P(  Z    ch^^CK^io      H)P'K-[(  (b/a^)I)  x)  >  c)  +    1 

i=s 
as  a  ->  oo.   However,  clearly, 

m  -2  2-1 

P(  E  chi(K1P(a  ^HJP'K^ ( (b/a  )I)   ) >c) 

i=s 

2  P(chs(K1P(a"2H)P'K^((b/a2)I)"1)  >  c)  . 

The  result  now  follows  from  the  following  lemma. 

Lemma  5.3.4;   Let  V (mxm)  and  U (mxm)  be  random  matrices 
independently  distributed  such  that  both  V  and  U  are  positive 
definite  with  probability  one.   Let 


K,  (mxm)  =  diag (ak, ,ak2, . . . ,ak  ,1,...,1). 
Then  P  (chg  (K-^K  'V_1)  >  c)  +   1  as  a  ■*■   ». 

Proof:   The  proof  is  identical  to  that  of  Lemma  3.5.6 


101 


5  .  4   The  Limiting  Distribution  of  Z    \p  . 


m 
If   I  ij).  is  to  be  used  as  a  test  statistic  in  the  test 
i=s 

of  the  hypothesis  fAs'  :  rank  (M)  <  s-1  against  h|s)  :  rank  (M)  =  s, 

it  is  necessary  to  compute  the  significance  level,  a,  where 

m  .  . 

a  =  sup  P (  I  ty.    >  c|Hi  ' ) . 

H(s)    i=s  x  '    u 
H0 

_2 

Let  6,  ><$_>...  >6   be  the  latent  roots  of  I  +  a   M,  and  recall 
i-  z~        -   m 

(s) 

that  the  null  hypothesis  can  be  written  H*   :  6   =  1 ,  or 

more  precisely,  H^s)  :  6,>6~>...>&      ,>1,<5  =6  ,,=...=6  =1. 
0      12       s-1     s   s+1      m 

We  will  write  ty.       16  ,  ,  &2  •  •  •  •  '  6  )  to  indicate  that  ty  .    is 
the  i    largest  root  of  m  roots  and  depends  on  the  popu- 
lation roots  5,  ,  <$2  / .  .  •  *  <5  •   Then  we  may  write  a,  the  signif- 
icance level,  as 

m 
<*  =       sup       P(  E  <K   (57,6?,...,<5  ,,!,...,!)>    c)  . 
61>62> .  .  .>6g_1>l  i=s  1,m  L      *  s"i 

However,  we  have  seen  in  the  previous  section  that  i> .    is 

ri:m 

stochastically  increasing  in  each  6.:  j  =  l,2,...,m.   It 

then  follows  that 

m 
a  =  P(  Z  ^i.m(00,00,  .  .  .  ,°°,1,  .  .  .  ,D  >  c)  , 
i  =  s 

where  \p  .   (°°  ,°° ,  .  .  .  ,°°  ,  1 ,  .  .  .  ,  1)  denotes  the  random  variable 

which  has  the  limiting  distribution  of  i>  .        (6,, 6  _,..., 6   n, 

i:m   1   2     '  s-1 

1,...,1)  as  6.  ->  °°:  j  =  1,2 ,  .  .  .  ,s-l .   Hence,  we  need  to  deter- 

m 

mine    the   distribution    of      I    \b  .        f00,00 ,  .  .  .  ,°°  ,  1 ,  .  .  .  ,  1)  . 

i:m  iii  ii 

i=s 


102 


Recall   that  b  ~  o   y a      an<3  H ~ W    (a   I+M,h,0),    and 

ap  m 

-2 
there  exists  an  orthogonal  matrix  P  such  that  P  (I+a   M)Pf 

=  A  =  diag (6, ,6-, . . . , 6  ).   If  we  define  B  and  H  as 
B  =  (b/a2)A~1, 
H  =  A-J5P(a~2H)P'A~35 


-h  -k      -k 

where  A  2  =  diag  (  " 


-k. 


1  ,u2  '••*  '  m 
i:m(6l'62 


!)  ,  then,  clearly, 


H~W  (l,h,0)  and  f  ,   (6,  ,6v...,fi  )  =  ch .  (  (a"2H)  (  (b/a2)  I)  X) 


=  ch. (HB  1) .   Then  if  we  let  B   =  (b/a2)A   ,  where 
l  n  n 

A   =  diag  (n6,  ,n6~ , . . . ,n6   ,  ,1,...,1),  we  need  to  find  the 
n       r    1    2        s-1 

m     __i 
limiting  distribution  of   Z  ch.  (HB   )  as  n  ->  °° . 
r  .in 

i=s 

We  will  need  the  following  result. 


Lemma  5.4.1:   Suppose  v  ~  x  •   Then 


x  (mxl)  = 
— n 


"c,v/n 

c-v/n 


cs-lv/n 


*1 


where  c   c-  ,  . . . ,c 


,  are  constants  and  x,  (  (s-1) xl)  is  a 
s-1  — 1 


degenerate  random  vector  with  all  of  its  probability  at  0^. 
Proof:   Clearly   x,  and  v  are  independent,  so  the  character- 


istic function  of  x  is 


103 


E  [exp  (i  x  '  t)  ]  =  E  [exp  (  i  x-f  t,  )  exp  (i  v  E  t . )  ] 

j=s  D 


E[exp(i  x{t1) ]   E[exp(ivE  t.)] 

j=s    : 


=  E[exp(i   v  E     t .)] 
j=s    3 


m        _j, 
=    (1   -   i   2    E    t.) 
j=s    J 


3CC 


Now  the   characteristic    function   of   x      is 

— n 

s-1  m 

E[exp(i   x't)]    =   E[exp(i(    E    c.t./n    +      E    t.)v] 

j=l    3    3  j=s    3 

s-1  m  _, 

=    (1    -    i    2(    E    c.t./n   +      E    t.)  )  2C\ 

j-i  D  :       j=s  y 

so   lim  E[exp(i   x't)]    =    (1   -    i    2    E    t.)  2a   =  E[exp(i   x't)]. 

n+»  "  j  =  s  -* 

The  result  now  follows  from  the  continuity  theorem 

(Lemma  3.6.2) . 

From  Lemma  5.4.1  we  observe  that  B   >  B  with 

n 

-    8=fBi  ,o,y 

V(o)        B2y 

where   §1  (  (s-l)x  (s-1) )  =  (0)  with  probability  one,  B„  (  (m-s+1) 

2 
x  (m-s+1)  )  =  bl ,  and  b  -»  xR-   We  now  need  to  show  that 

m      __! 

E  ch .  (HB   )  is  continuous  with  probability  one  under  the 
i=s 

distribution  of  (H,B) .   Put 

fEll  H12A 


H  = 


H21     H22' 


104 


where  E^    is  (s-l)x(s-l),  H12  is  (s-1)  x  (m-s+1)  ,  H21  is 
(m-s+1) x (s-1) ,  and  H„2  is  (m-s+1) x (m-s+1) .   Then  the  roots 
of  interest  are  the  solutions  to 


H 


11 


H 


12 


(0)      (0) 


-  iC 


H, 


H, 


JO) 


B. 


=  0. 


(5.4.1) 


'21     "22'       vv~'       2 
Since  H  is  nonsingular  with  probability  one,  we  may  put 


H"1  -  G  . 


'11 


'21 


'12 


'2  2- 


so  that  (5.4.1)  can  be  written 

f(0) 


i    -  i> 

m    r 


d0) 


G12  B2 
G22  B2 


=  0, 


s-1 
(0) 


"  ^G12B2 


m-s+1  r  22  2 
Hence,  it  must  be  true  that 


=  0 


Vs+1  "  ^G22B2I  =  °' 
G22  "  ^2|  =  0. 


(5.4.2) 


Thus,  with  probability  one  ch, (HB   ),ch2(HB   ),..., 

ch   ,  (HB~  )  are  undefined  and  ch  (HB   ),ch  L,  (HB-  ),..., 
s-1  s         s+1 

ch  (HB   )  are  the  solutions  to  (5.4.2);  that  is,  since  B 


is  of  rank  m-s+1  with  probability  one,    there  are  only  m-s+1 


105 


solutions  to  |  H— ipB  |  =  0.   Now  since  H22-H2,H, ,  H,  ~  is  non- 
singular  with  probability  one,  it  follows  that  G„~  = 
(H22-H2,H, , H.. _)   ,  and  so  (5.4.2)  can  be  rewritten 

|H22  -  H21H-^H12  -  *B2|  =  0. 

Clearly  B2  is  also  nonsingular  with  probability  one,  and 
thus,  by  Lemma  3.6.6,  ch . (HB   )  is  continuous  with  proba- 
bility one  under  the  distribution  of  (H,B)  for  i  =  s,s+l, 


m 


•A«-l, 


. . . ,m.   This  implies  that   Z  ch . (HB   )  is  also  continuous 

i=s 

with  probability  one  under  the  distribution  of  (H,B) . 
Note  that  the  set  of  discontinuity  points,  R,  is  closed, 
since  R  =  {(H,B):  |§2|  =  0} ,  and  also  recall  that 

H22~**21**ll**12  ~  Wm-s+l ^ ,h~s+1'°* '   Therefore,  from 

Lemma  3.6.7,  since  (H,B  )  >  (H,B),  it  follows  that  for 

i  =  s ,s+l , . . . ,m, 

*i:m(ca'M "'1 !)  ^i-s+lzm-s+l*1'1 1}  ' 

where  ^i_s+1 .m_s+i t1 /!/••• > 1)  denotes  the  distribution  of 

the  i-s+1    largest  root  of  |w-ijjvl|  =  0,  with 

2 
W  -  W     .  (I,h-s+l,0)  and  v~Xo/  independently.   Now  if  we 

let  e,>e0>...>9TY,  „,,  be  the  solutions  to  I W-  6 1 1  =  0,  then  we 
1  I  m-s+l  '     ' 

can  put  i>i.m  (°°,°°,  .  .  .  ,°°,1,  .  .  .  ,1)  =  6._  +1/v,  so  that 

m  m-s+l 

^  ^i.m(00,00,  •  .  .  ,°°,1,  .  .  .  ,D  =    Z      6./v  =  (trW)/v. 
i=s   "  j=l   -1 

But  tr  W  ~  x  i    where  v  =  (m-s+l)  (h-s+1)  ,  so 

m 

1    *ism{-'- "'1 1}  ^|F6- 

i=s  M   M 


106 


(s)  (s) 

Hence,  in  testing  H*   :  rank (M)  <  s-1  against  H,   : 

rank(M)  =  s ,  we  choose  -r  F{v,3,a)  as  our  critical  value, 

where  F(v,8,a)  is  the  constant  for  which  P(F(v,B)  > 

F(v,8,a))  =  a  when  F(v,8)  ~FR.  By  so  doing  we  will  guarantee 

m  ,  . 

sup  P(  E  <K.  m(S,  (L,...,U  >  |  F(v,B,a)  |H*S')  =  a. 

„(s)    i=s  1-m   X   2      m  6  ° 

H0 

In  order  to  determine  the  rank  of  M,  we  will  again  use 

a  sequential  procedure.   To  illustrate  this  procedure,  we 

will  return  to  the  example  presented  in  Section  4.2. 

Recall  that  D  =  diag ( 94 . 1065 ,  34.8845,  1.01721,  .618312), 

h  =  20,  and  8  =  420,  so  that  since  i> .    =  hd .  /  6 :  i  =  1,2,3,4, 

ty1   =   4.4813,  it      =    1.6612,  ip   =  .04844,  and  ^^    =    .029443. 

(4) 

We  will  first  consider  testing  the  hypothesis  H^   : 

(4) 

rank (M)  <  3  against  H,   :  rank  (M)  =4.   We  reject  the  null 

hypothesis,  Hq4),  if  i>     >   17  F(17,  420,  .05J/420.   Now 

17  F(17,  420,  .05)/420  is  approximately  equal  to  .066  and 

iK  =  .029443  <.066,  so  that  we  do  not  reject  H(j4)  and, 

13) 

instead,  consider  testing  the  hypothesis  H^   :  rank (M) 

<  2  against  h|3):  rank  (M)  =  3.   The  quantity   36  F(36,  420, 

.05)/420  is  approximately  equal  to  .122,  and  clearly 

\ii3   +    ip4  =  .07788  <  .122,  so  that  the  null  hypothesis,  H^3*  , 

is  not  rejected.   Since  H.!    is  not  rejected,  we  next  con- 

(2) 

sider  testing  the  hypothesis  H '   :  rank  (M)  <  1  against 

h|2) :  rank(M)  =  2.   We  find  that  57  F(57,  420,  .05)/420  is 
approximately  equal  to  .181,  and  therefore,  since 


107 


(2) 

\p      +   ty-   +    \i>.    =   1.7391  >  .181,  we  reject  H*  '  and  conclude 

that  the  rank  of  M  could  very  reasonably  be  taken  as  being 
two . 

Note  that  this  sequential  procedure  is  open  to  the 
same  objections,  regarding  the  use  of  the  significance 
level,  a,  at  each  step,  mentioned  earlier  in  Section  3.6. 
Again,  however,  it  seems  unlikely  to  cause  serious  error 
in  practice.   If  the  true  rank  of  M  is  p,  then  there  is 
a  small  probability,  usually  less  than  a,  that  the  rank,  s, 
determined  by  the  sequential  procedure  will  be  greater  than 
p.   Also,  if  6   is  sufficiently  large,  then  the  probability 

ir 

of  s  being  less  than  p  is  also  small. 


BIBLIOGRAPHY 


Anderson,  T.  W.   (1955) .   The  integral  of  a  symmetric 
unimodal  function  over  a  symmetric  convex  set  and 
some  probability  inequalities.   Proceedings  of  the 
American  Mathematical  Society  6,  170-176. 

Anderson,  T.  W.   (1958) .   An  Introduction  to  Multivariate 
Statistical  Analysis.   John  Wiley  &  Sons,  Inc., 
New  York. 

Anderson,  T.  W.  ,  and  Das  Gupta,  S.   (1964).   A  monotonicity 
property  of  the  power  functions  of  some  tests  of  the 
equality  of  two  covariance  matrices.   Annals  of 
Mathematical  Statistics  35,  1059-1063. 

Barnard,  G.  A.   (1963)  .   Some  logical  aspects  of  the  fiducial 
argument.   Journal  of  the  Royal  Statistical  Society 
B  25,  111-114. 

Bellman,  R.   (1970) .   Introduction  to  Matrix  Analysis. 
McGraw-Hill  Book  Company,  Inc. ,  New  York. 

Breiman,  L.   (1968) .   Probability.   Addison-Wesley  Publish- 
ing Company,  Reading,  Massachusetts. 

Das  Gupta,  S.,  Anderson,  T.  W. ,  and  Mudholkar,  G.  S. 

(1964) .  Monotonicity  of  the  power  functions  of  some 
tests  of  the  multivariate  linear  hypothesis.  Annals 
of  Mathematical  Statistics  35,  200-205. 

Graybill,  F.  A.   (1969) .   Introduction  to  Matrices  with 
Applications  in  Statistics.   Wadsworth  Publishing 
Company,  Inc.,  Belmont,  California. 

James,  A.  T.   (1964).   Distributions  of  matrix  variates 

and  latent  roots  derived  from  normal  samples.   Annals 
of  Mathematical  Statistics  35,  475-501. 

Kendall,  M.  G. ,  and  Stuart,  A.   (1963).   The  Advanced  Theory 
of  Statistics  Vol.  1.   Hafner  Publishing  Company, 
New  York. 

Lehmann,  E.  L.   (1959).   Testing  Statistical  Hypotheses. 
John  Wiley  &  Sons,  Inc.,  New  York. 

108 


109 


Mann,  H.  B . , and  Wald,  A.   (1943).   On  stochastic  limit  and 

order  relationships.   Annals  of  Mathematical  Statistics 
14,  217-226. 

Marshall,  A.  W. ,  and  Olkin  I.   (1974).   Majorization  in 

multivariate  distributions.   Annals  of  Statistics  2, 
1189-1200. 

Morrison,  D.  F.   (1976) .   Multivariate  Statistical  Methods. 
McGraw-Hill  Book  Company,  Inc.,  New  York. 

Ostrowski,  A.  M.   (1973).   Solution  of  Equations  in  Euclidean 
and  Banach  Spaces.   Academic  Press,  Inc.,  New  York. 

Pillai,  K.  C.  S.  (1965).   On  the  distribution  of  the  largest 

characteristic  root  of  a  matrix  in  multivariate  analysis. 
Biometrika  52,  405-414. 

Pillai,  K.  C.  S.   (1967).   Upper  percentage  points  of  the 
largest  root  of  a  matrix  in  multivariate  analysis. 
Biometrika  54,  189-194. 

Roy,  S.  N.   (1953).  On  a  heuristic  method  of  test  construc- 
tion and  its  use  in  multivariate  analysis.   Annals  of 
Mathematical  Statistics  24,  220-238. 


BIOGRAPHICAL  SKETCH 

James  Robert  Schott  was  born  on  January  9,  1955,  in 
Cincinnati,  Ohio,  where  he  spent  the  first  twenty- two  years 
of  his  life.   Upon  graduating  from  La  Salle  High  School  in 
June,  1973,  he  attended  Xavier  University,  which  is  located 
in  Cincinnati,  and  received  the  degree  of  Bachelor  of  Science 
with  a  major  in  mathematics  in  June,  1977. 

In  September,  1977,  Jim  enrolled  in  the  graduate  school 
at  the  University  of  Florida  and  was  awarded  the  degree  of 
Master  of  Statistics  in  March,  1979.   Since  that  time  he  has 
been  working  toward  the  degree  of  Doctor  of  Philosophy. 
While  at  the  University  of  Florida,  Jim  has  been  a  recipient 
of  a  graduate  fellowship  and,  in  addition,  he  has  been 
employed  by  the  Department  of  Statistics  as  a  graduate 
assistant  for  both  teaching  and  consulting  duties. 


110 


I  certify  that  I  have  read  this  study  and  that  in  my 
opinion  it  conforms  to  acceptable  standards  of  scholarly 
presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


■  Q  -^ 


V  >A,' 


John\Gl  Saw,  Chairman 
Professor  of  Statistics 


I  certify  that  I  have  read  this  study  and  that  in  my 
opinion  it  conforms  to  acceptable  standards  of  scholarly 
presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Ct^r 


.A^n/V^K-^ 


Alan  G.  Agresti 
Associate  Professor  of 
Statistics 


I  certify  that  I  have  read  this  study  and  that  in  my 
opinion  it  conforms  to  acceptable  standards  of  scholarly 
presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of/^Doctor  of  Philosophy. 


£ia 


Grams 
r  of  Pathology 


This  dissertation  was  submitted  to  the  Graduate  Faculty  of 
the  Department  of  Statistics  in  the  College  of  Liberal  Arts 
and  Sciences  and  to  the  Graduate  Council,  and  was  accepted 
as  partial  fulfillment  of  the  requirements  for  the  degree  of 
Doctor  of  Philosophy. 

August  1981 


Dean  for  Graduate  Studies 
and  Research 


UNIVERSITY  OF  FLORIDA 

■■■111 If 

3  1262  08553  1803