Faculty  Working  Paper  93-0171 




A  Large  Sample  Normality  Test 




Anil K. Bera, Department of Economics, University of Illinois

Pin T. Ng, Department of Economics, University of Houston, TX


Bureau  of  Economic  and  Business  Research 

College  of  Commerce  and  Business  Administration 

University  of  Illinois  at  Urbana-Champaign 






A  LARGE  SAMPLE  NORMALITY  TEST 

ANIL  K.  BERA  and  PIN  T.  NG 

Department  of  Economics,  University  of  Illinois,  Champaign,  IL  61820 

Department  of  Economics,  University  of  Houston,  TX  77204-5882 

November  22,  1993 

Abstract 

The score function, defined as the negative logarithmic derivative of the probability
density function, plays a ubiquitous role in statistics. Since the score function of the
normal distribution is linear, testing normality amounts to checking the linearity of the
empirical score function. Using the score function, we present a graphical alternative
to the Q-Q plot for detecting departures from normality. Even though graphical
approaches are informative, they lack the objectivity of formal testing procedures. We
therefore supplement our graphical approach with a formal large sample chi-square
test. Our graphical approach is then applied to a wide range of alternative data
generating processes. The finite sample size and power performances of the chi-square
test are investigated through a small scale Monte Carlo study.
KEY  WORDS:  Normality  test;  Score  function;  Graphical  approach 

1      Introduction 

Since  Geary's  (1947)  suggestion  of  putting  the  statement  "Normality  is  a  myth.  There 
never  was  and  will  never  be,  a  normal  distribution"  in  front  of  all  statistical  texts, 
the  need  to  test  for  the  normality  assumption  in  many  statistical  models  has  been 
widely  acknowledged.  As  a  result,  a  wide  range  of  tests  for  normality  are  currently 
available. Most of these tests fall into the following categories: (1) tests
based  on  probability  or  Q-Q  plots,  (2)  moments  tests,  (3)  distance  tests  based  on  the 
empirical  distribution  function,  (4)  goodness  of  fit  tests,  and  (5)  tests  based  on  the 
empirical  characteristic  function. 

No single test statistic can reveal as much information as a graphical display. In
Section 2, we present a graphical alternative to the Q-Q plot using the score function,
defined as the negative logarithmic derivative of the probability density function. Even
though graphical approaches are informative, they lack the objectivity of formal
testing procedures. We therefore supplement our graphical approach with a formal large
sample $\chi^2$ test based on the score function in Section 3. The performances of our
graphical approach and score function based $\chi^2$ test depend on our ability to estimate
the score function accurately. We review some score function estimators in Section 4.


In Jarque and Bera (1987), a moment test was shown to possess superior power
compared to most other normality tests. Their moment test exploits the fact that the
normal distribution has skewness measure $\sqrt{b_1} = 0$ and kurtosis measure $b_2 = 3.0$.
As a result, under certain non-normal distributions with skewness and kurtosis measures
identical to those of the normal distribution, moment tests based on $\sqrt{b_1}$ and $b_2$
will have no power. Examples of such distributions are Tukey's $\lambda$ distributions with
$\lambda = 0.135$ and $\lambda = 5.2$ [see Joiner and Rosenblatt (1971)]. Moment based tests
thus have power against only certain alternatives. Our score function based $\chi^2$ test,
on the other hand, does not have this disadvantage. The superior power of our score
function based $\chi^2$ test is demonstrated in a small scale Monte Carlo study in Section 5.

2      A  Graphical  Approach 

The score function, defined as $\psi(x) = -(\log f(x))' = -f'(x)/f(x)$ for a random variable
having probability density function $f(x)$, plays a ubiquitous role in statistics. It is related
to the construction of L-, M- and R-estimators for location and scale models as well
as regression models in the robustness literature. [See Joiner and Hall (1983) for
an excellent overview]. It is also used in constructing various adaptive L-, M- and
R-estimators which achieve the Cramer-Rao efficiency bounds asymptotically. [See
Koenker (1982)]. It can also be used to estimate the Fisher information. In hypothesis
testing, the score function plays a crucial role in robustifying conventional testing
procedures. [See Bickel (1978) and Bera and Ng (1992)]. Its fundamental contribution
to statistics, however, can best be seen in the realm of exploratory data analysis.

The  plots  of  the  density  and  score  functions  of  some  common  distributions  are 
presented  in  Figure  1  and  Figure  2  respectively.  While  it  is  difficult  to  differentiate 
the  tails  of  a  Gaussian  distribution  from  those  of  a  Cauchy  distribution  through  the 
density  functions,  the  tails  of  their  score  functions  are  very  distinct.  In  fact,  we  can 
easily  distinguish  among  various  distributions  by  investigating  the  score  functions. 

It  is  clear  from  Figure  1  and  Figure  2  that  the  mode  of  a  distribution  is  characterized 
by  an  upward  crossing  of  the  score  function  at  the  horizontal  axis  while  an  anti-mode  is 
located  at  the  point  of  downward  crossing.  An  exponential  distribution  has  a  horizontal 
score  function.  A  tail  thicker  than  the  exponential  has  a  negatively  sloped  score  while 
a  tail  thinner  than  the  exponential  corresponds  to  an  upward  sloping  score. 

A  Gaussian  distribution  has  a  linear  score  function  passing  through  the  horizontal 
axis  at  its  location  parameter  with  a  slope  equal  to  the  reciprocal  of  its  variance. 
This  suggests  an  alternative  to  the  familiar  and  popular  probability  or  Q-Q  plot.  An 
estimated  score  function  with  a  redescending  tail  towards  the  horizontal  axis  indicates 
departure  towards  distributions  with  thicker  tails  than  the  normal  distribution  while 
a  diverging  tail  suggests  departure  in  the  direction  of  thinner  tailed  distributions. 
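The shapes described above can be checked numerically from the closed-form scores. The following is a minimal sketch, not taken from the paper: the function names are illustrative, and the formulas follow directly from $\psi(x) = -f'(x)/f(x)$ applied to the Gaussian, standard Cauchy and exponential densities.

```python
import numpy as np

def score_normal(x, mu=0.0, sigma=1.0):
    # psi(x) = -f'(x)/f(x) = (x - mu) / sigma^2: linear, crossing zero at mu
    return (x - mu) / sigma**2

def score_cauchy(x):
    # f(x) proportional to 1/(1 + x^2)  =>  psi(x) = 2x / (1 + x^2),
    # which redescends towards the horizontal axis in the tails
    return 2.0 * x / (1.0 + x**2)

def score_exponential(x, rate=1.0):
    # f(x) = rate * exp(-rate * x) for x > 0  =>  psi(x) = rate (horizontal)
    return np.full_like(x, rate, dtype=float)

x = np.linspace(0.5, 10.0, 20)

# The fitted slope of the Gaussian score equals 1 / sigma^2.
slope, intercept = np.polyfit(x, score_normal(x, mu=1.0, sigma=2.0), 1)
zero_crossing = -intercept / slope

print("Gaussian slope:", slope, "zero crossing:", zero_crossing)
print("Cauchy score at x=1 and x=10:", score_cauchy(np.array([1.0, 10.0])))
print("Exponential score:", score_exponential(x)[:3])
```

The Gaussian score is the only one of the three that is a non-constant straight line, which is the property the graphical approach and the formal test both exploit.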

We  can  even  recover  the  estimate  of  the  density  function  through  exponentiating 
the  negative  integral  of  the  estimated  score  function  although  this  may  seem  to  be  a 
roundabout  approach. 
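As an illustration of this recovery step, the sketch below integrates a score function on a grid, exponentiates the negative integral and normalizes the result. It is only a sketch under stated assumptions: the true N(0,1) score stands in for an estimated score, and the grid limits and the trapezoid rule are arbitrary choices.

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]

# Stand-in for an estimated score: the true score of N(0,1) is psi(x) = x.
psi = x

# Cumulative trapezoid integral of psi from the left end of the grid.
integral = np.concatenate(([0.0], np.cumsum(0.5 * (psi[1:] + psi[:-1]) * dx)))

# f(x) is proportional to exp(-integral of psi); normalize to integrate to one.
unnormalized = np.exp(-integral)
mass = np.sum(0.5 * (unnormalized[1:] + unnormalized[:-1])) * dx
density = unnormalized / mass

true_pdf = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
max_err = np.max(np.abs(density - true_pdf))
print("largest absolute error:", max_err)
```

The constant of integration is absorbed by the normalization, so only the shape of the score matters for the recovered density.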


Figure 1: Probability Density Functions

[Figure omitted. Panels: N(0,1), t(5), Cauchy(0,1), DouExp(0,1), Logis(0,.5), Extreme(0,1), Unif(0,1), Exp(1), Lnorm(0,1), Beta(3,2), Gamma(2,1), Weibull(5,2), Pareto(1,1), F(5,5), ChiSq(3).]


Figure 2: Score Functions

[Figure omitted. Panels: N(0,1), t(5), Cauchy(0,1), DouExp(0,1), Logis(0,.5), Extreme(0,1), Unif(0,1), Exp(1), Lnorm(0,1), Beta(3,2), Gamma(2,1), Weibull(5,2), Pareto(1,1), F(5,5), ChiSq(3).]


3  A  Formal  Test 

A formal "objective" test of the null hypothesis of a Gaussian distribution is equivalent
to testing the linearity of the score function. Since a straight line can be viewed as a
first order approximation to any polynomial, the normality test can easily be carried
out through the asymptotic $\chi^2$ test of regressing the estimated score function
$\hat\psi(x_i)$ on some polynomial of $x_i$. The null hypothesis of a Gaussian distribution will
correspond to a linear relationship between $\hat\psi(x_i)$ and $x_i$.

When the null hypothesis of a Gaussian distribution cannot be rejected, we can
estimate the location parameter by the point at which the ordinary least squares
regression line intersects the horizontal axis, and the estimate of the scale parameter will
be the square root of the reciprocal of the regression slope.
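The recovery of location and scale from the fitted line can be sketched as follows. This is an illustration under assumptions, not the paper's code: the "estimated" score is simulated as the true Gaussian score plus a small amount of noise, and the values of `mu`, `sigma` and the noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
x = rng.normal(mu, sigma, size=200)

# Stand-in for an estimated score: true Gaussian score plus small noise.
psi_hat = (x - mu) / sigma**2 + rng.normal(0.0, 0.01, size=x.size)

# OLS fit psi_hat = intercept + slope * x (polyfit returns highest degree first).
slope, intercept = np.polyfit(x, psi_hat, 1)

mu_hat = -intercept / slope        # where the fitted line crosses the horizontal axis
sigma_hat = np.sqrt(1.0 / slope)   # square root of the reciprocal of the slope

print("location estimate:", mu_hat)
print("scale estimate:", sigma_hat)
```

Both formulas follow from writing the Gaussian score as $\psi(x) = -\mu/\sigma^2 + x/\sigma^2$.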

4  Estimating  the  Score  Function 

Performances of the above graphical approach and formal $\chi^2$ test rely on accurate
estimates of the score functions. Numerous score function estimators are available,
most  of  which  are  constructed  from  some  kernel  density  estimators.  [See  Stone  (1975), 
Manski  (1984)  and  Cox  and  Martin  (1988)].  Csorgo  and  Revesz  (1983)  used  a  nearest- 
neighbor  approach.  Cox  (1985)  proposed  a  smoothing  spline  version,  which  is  further 
refined  and  implemented  in  Ng  (1994). 

It has often been argued that the choice of kernel is not crucial in kernel density
estimation. The correct choice of kernel, however, becomes important in the tails where
density is low and few observations are available to help smooth things out. This sensitivity
to kernel choice is further amplified in score function estimation, where higher derivatives
of the density function are involved [see Portnoy and Koenker (1989), and Ng (1994)].
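For concreteness, a kernel-based score estimator of the kind discussed above can be sketched as a plug-in ratio $-\hat f'(x)/\hat f(x)$ built from a Gaussian kernel density estimate. This is not the smoothing spline estimator used in this paper; the function name `kernel_score`, the Gaussian kernel and the rule-of-thumb bandwidth are all assumptions made for illustration.

```python
import numpy as np

def kernel_score(x_eval, data, h):
    """Plug-in score estimate -f_hat'(x) / f_hat(x) with a Gaussian kernel."""
    u = (x_eval[:, None] - data[None, :]) / h
    w = np.exp(-0.5 * u**2)  # Gaussian kernel weights (constants cancel in the ratio)
    # f_hat'(x) is proportional to -sum(u * w) / h, so the ratio simplifies to:
    return (w * u).sum(axis=1) / (h * w.sum(axis=1))

rng = np.random.default_rng(1)
data = rng.normal(size=2000)

# Rule-of-thumb bandwidth; an assumption, not a recommendation from the paper.
h = 1.06 * data.std() * data.size ** (-1.0 / 5.0)

grid = np.linspace(-1.5, 1.5, 31)   # stay in the high density region
psi_hat = kernel_score(grid, data, h)
slope = np.polyfit(grid, psi_hat, 1)[0]
print("fitted slope of the estimated score:", slope)
```

For N(0,1) data the fitted slope should be near, but somewhat below, one, because smoothing with bandwidth `h` inflates the apparent variance to roughly $1 + h^2$. Evaluating the same estimator far in the tails, where few observations fall, is where the sensitivity described above shows up.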

Ng (1994) found that the smoothing spline score estimator, which finds its
theoretical justification in an explicit mean squared error minimization criterion, is
more robust than the kernel estimators to distributional variations. We use this score
estimator in the paper.

The smoothing spline score estimator is the solution to

$$\min_{\psi \in H_2[a,b]} \int (\psi^2 - 2\psi')\, dF_n + \lambda \int (\psi''(x))^2\, dx \qquad (1)$$

where $H_2[a,b] = \{\psi : \psi, \psi' \text{ are absolutely continuous, and } \int_a^b [\psi''(x)]^2\, dx < \infty\}$. The
objective function (1) is the (penalized) empirical analogue of minimizing the following
mean-squared error:

$$\int (\psi - \psi_0)^2\, dF_0 = \int (\psi^2 - 2\psi')\, dF_0 + \int \psi_0^2\, dF_0 \qquad (2)$$

in which $\psi_0$ is the unknown true score function and the equality is due to the fact that
under some mild regularity conditions [see Cox (1985)]

$$\int \psi_0 \psi\, dF_0 = -\int f_0'(x)\, \psi(x)\, dx = \int \psi'\, dF_0.$$
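The identity is an integration by parts. A short sketch of the steps, assuming that $F_0$ has density $f_0$ on $[a,b]$ and that the boundary term $f_0(x)\psi(x)$ vanishes at both endpoints (one of the regularity conditions):

```latex
\begin{aligned}
\int \psi_0 \psi \, dF_0
  &= \int_a^b \left( -\frac{f_0'(x)}{f_0(x)} \right) \psi(x)\, f_0(x)\, dx
   = -\int_a^b f_0'(x)\, \psi(x)\, dx \\
  &= -\Big[ f_0(x)\, \psi(x) \Big]_a^b + \int_a^b \psi'(x)\, f_0(x)\, dx
   = \int \psi' \, dF_0 .
\end{aligned}
```

Expanding $(\psi - \psi_0)^2 = \psi^2 - 2\psi\psi_0 + \psi_0^2$ and substituting this identity for the cross term gives (2).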

Since the second term on the right hand side of (2) is independent of $\psi$, minimizing the
mean-squared error may focus exclusively on the first term. Minimizing (1) yields a
balance between "fidelity-to-data", measured by the mean-squared error term, and the
smoothness represented by the second term. As with any nonparametric score function
estimator, the smoothing spline score estimator has a smoothing parameter to choose,
here the penalty parameter $\lambda$. The penalty parameter controls the trade-off between
"fidelity-to-data" and smoothness of the estimated score function. We use the automated
penalty parameter choice mechanism, the adaptive information criteria, suggested and
implemented in Ng (1994) [see Ng (1991) for the FORTRAN source code].

Figure 3: Estimated Score Functions

[Figure omitted. Panels: N(0,1), t(5), DouExp(0,1), Beta(3,2), Gamma(2,1), Lnorm(0,1). Legend: True Score, Estimated Score.]


5      Some  Examples  and  Simulation  Results 

In Figure 3, we present the smoothing spline estimated score functions, each based on
100 random observations drawn from one of several distributions shown in Figure 2. The
random number generators were Marsaglia's Super-Duper random number generators
available in "S" [Becker, Chambers and Wilks (1988)] installed on a Sun SPARCstation
10. The smoothing spline score estimator was Ng's (1991) Fortran version adapted for
"S". It is obvious from Figure 3 that any departure from the Gaussian distribution can
be easily detected from the plots.

To study the finite sample properties of our score function based $\chi^2$ test and the
moment based LM test of Jarque and Bera (1987), we perform a small scale Monte
Carlo study. The LM test was shown in Jarque and Bera (1987) to possess very
good power as compared to the skewness measure test $\sqrt{b_1}$, the kurtosis measure
test $b_2$, D'Agostino's (1971) D* test, Pearson, D'Agostino and Bowman's (1977) R
test, Shapiro and Wilk's (1965) W test, and Shapiro and Francia's (1972) W' test
against the various alternative distributions investigated. As a result, we use it as
our benchmark to evaluate the performance of our $\chi^2$ test. The null distribution
here is the standard normal distribution and the alternatives are Gamma (2,1), Beta
(3,2), Student's t (5) and Tukey's $\lambda$ distribution with $\lambda = 5.2$. All distributions are
standardized to have zero mean and variance twenty-five. Our $\chi^2$ test is obtained by
running the following regression

$$\hat\psi(x_i) = \gamma_0 + \gamma_1 x_i + \gamma_2 x_i^2 + \gamma_3 x_i^3 + \gamma_4 x_i^4 + \gamma_5 x_i^5 + \gamma_6 x_i^6 + \epsilon_i$$

and testing $H_0 : \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5 = \gamma_6 = 0$. The $\chi^2$ test statistic is then given by

$$\frac{RSS_r - RSS}{RSS/(N-7)} \xrightarrow{D} \chi^2_5 \quad \text{under } H_0,$$

where $RSS_r$ is the restricted residual sum of squares obtained from regressing $\hat\psi(x_i)$ on
the intercept and $x_i$ alone, $RSS$ is the residual sum of squares of the whole regression
and $N$ is the sample size. The LM test is given by

$$LM = N \left[ \frac{(\sqrt{b_1})^2}{6} + \frac{(b_2 - 3)^2}{24} \right].$$

Under the null hypothesis of normality, LM is asymptotically distributed as a $\chi^2_2$.
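Both statistics are simple to compute once an estimated score is available. The sketch below is an illustration under assumptions rather than the authors' implementation: the function names are hypothetical, and the "estimated" scores are simulated as a linear (Gaussian) score and a cubic score plus noise instead of coming from the smoothing spline estimator.

```python
import numpy as np

def score_chi2_stat(x, psi_hat, degree=6):
    """(RSS_r - RSS) / (RSS / (N - degree - 1)) for the polynomial score regression."""
    n = x.size
    X_full = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    X_restricted = X_full[:, :2]                        # intercept and x only

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, psi_hat, rcond=None)
        resid = psi_hat - X @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(X_restricted), rss(X_full)
    return (rss_r - rss_u) / (rss_u / (n - degree - 1))

def lm_stat(x):
    """LM statistic N * [ (sqrt(b1))^2 / 6 + (b2 - 3)^2 / 24 ]."""
    n = x.size
    d = x - x.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    sqrt_b1 = m3 / m2**1.5   # sample skewness
    b2 = m4 / m2**2          # sample kurtosis
    return n * (sqrt_b1**2 / 6.0 + (b2 - 3.0)**2 / 24.0)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
noise = rng.normal(0.0, 0.1, size=x.size)

stat_linear = score_chi2_stat(x, x + noise)               # linear score: H0 holds
stat_cubic = score_chi2_stat(x, x + 0.5 * x**3 + noise)   # nonlinear score
print("chi-square statistic, linear score:", stat_linear)
print("chi-square statistic, cubic score:", stat_cubic)
print("LM statistic:", lm_stat(x))
```

With `degree=6` the denominator uses $N - 7$, matching the seven regression coefficients, and the statistic is compared with $\chi^2_5$ critical values (9.236 at the 10% level).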

The estimated sizes and powers of the LM and $\chi^2$ tests in 1000 Monte Carlo
replications are reported in Table 1. The standard errors of the estimated probabilities
are no greater than $\sqrt{.25/1000} = .016$. The sample sizes considered are 25, 50, 100
and 250. The performances of the $\chi^2$ test from regressing $\hat\psi(x_i)$ on some higher
order polynomials of $x_i$ were also investigated. The results are similar, so we do
not report them here. Under the Gaussian distribution, the estimated probabilities
of Type I error are computed from the true $\chi^2$ critical points.
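The standard error bound quoted above comes from the variance $p(1-p)/R$ of a proportion estimated from $R$ independent replications, which is largest at $p = 0.5$. A small check of the arithmetic, with variable names chosen for illustration:

```python
import math

reps = 1000

# p * (1 - p) is at most 0.25, attained at p = 0.5.
max_se = math.sqrt(0.25 / reps)

# Standard error when the true rejection probability equals the nominal .10.
se_at_nominal = math.sqrt(0.10 * 0.90 / reps)

print(round(max_se, 3), round(se_at_nominal, 4))
```

So an estimated size within about two standard errors (roughly .02) of .10 is compatible with the nominal level.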

From Table 1, we can see that the estimated Type I errors of the $\chi^2$ test are much
closer to the nominal value of .10 than those of the LM test for all sample sizes. The LM
test is undersized at every sample size we investigated.

To make a valid power comparison, we size adjust the power under all alternative
distributions. The empirical significance level we use is 10%. At the smaller sample
sizes of N=25 and 50, the LM test has higher power than the $\chi^2$ test under Gamma
(2,1), Log (0,1) and t(5). The discrepancies, however, become less prominent as the
sample size increases. This is due to the fact that the score functions of both Log (0,1)
and Gamma (2,1) are approximately linear in the high density regions, as can be seen
from Figure 2. More observations in the tails will be needed to facilitate estimation
of score functions that are distinguishable from the linear Gaussian score. The
situation is similar for the Student's t(5). However, an even bigger sample size will
probably be needed for enough realizations to fall in the tails to distinguish the estimated
score function of the Student's t from that of the Gaussian. As expected, the $\chi^2$ test has
some power for $\lambda(5.2)$ and this increases rapidly with the sample size. The $\chi^2$ test also
performs better for the Beta (3,2) alternative. The LM test has power even lower than
its size for the Tukey's $\lambda$ alternative at all sample sizes.
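Size adjustment can be sketched as follows: simulate the statistic under the null, take its empirical upper 10% quantile as the critical value, and count rejections under the alternative against that value. This is a minimal illustration for the LM test only, with hypothetical function names, an arbitrary seed, and NumPy's generators in place of those used in the study. No standardization is applied because the LM statistic is location and scale invariant.

```python
import numpy as np

def lm_stat(x):
    # LM = N * [ (sqrt(b1))^2 / 6 + (b2 - 3)^2 / 24 ]
    d = x - x.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    return x.size * ((m3 / m2**1.5) ** 2 / 6.0 + (m4 / m2**2 - 3.0) ** 2 / 24.0)

def draw_tukey_lambda(rng, n, lam=5.2):
    # Tukey's symmetric lambda via its quantile function Q(u) = (u^lam - (1-u)^lam) / lam
    u = rng.uniform(size=n)
    return (u**lam - (1.0 - u) ** lam) / lam

def size_adjusted_power(stat, draw_null, draw_alt, n, reps=1000, level=0.10, seed=0):
    rng = np.random.default_rng(seed)
    null_stats = np.array([stat(draw_null(rng, n)) for _ in range(reps)])
    crit = np.quantile(null_stats, 1.0 - level)   # empirical critical value
    alt_stats = np.array([stat(draw_alt(rng, n)) for _ in range(reps)])
    return float(crit), float((alt_stats > crit).mean())

def draw_normal(rng, n):
    return rng.standard_normal(n)

crit_t, power_t = size_adjusted_power(
    lm_stat, draw_normal, lambda rng, n: rng.standard_t(5, size=n), n=50)
crit_l, power_l = size_adjusted_power(lm_stat, draw_normal, draw_tukey_lambda, n=50)

print("t(5):        critical value", crit_t, "power", power_t)
print("lambda(5.2): critical value", crit_l, "power", power_l)
```

The same wrapper could be applied to the score function based statistic by passing a function that first estimates the score and then runs the polynomial regression.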


Table 1: Estimated Powers for 1000 Replications (Empirical size = .10)

Sample Size   Distribution      LM      $\chi^2$

N = 25        Gaussian          .044    .126
              Beta (3,2)        .063    .213
              Gamma (2,1)       .700    .625
              t (5)             .352    .163
              Log (0,1)         .962    .795
              $\lambda$(5.2)    .09     .152

N = 50        Gaussian          .059    .111
              Beta (3,2)        .154    .370
              Gamma (2,1)       .944    .872
              t (5)             .517    .260
              Log (0,1)         1.00    .927
              $\lambda$(5.2)    .061    .334

N = 100       Gaussian          .067    .099
              Beta (3,2)        .433    .548
              Gamma (2,1)       .998    .990
              t (5)             .699    .347
              Log (0,1)         1.00    1.00
              $\lambda$(5.2)    .019    .641

N = 250       Gaussian          .085    .091
              Beta (3,2)        .980    .901
              Gamma (2,1)       1.00    1.00
              t (5)             .933    .701
              Log (0,1)         1.00    1.00
              $\lambda$(5.2)    .011    .980

Based  on  our  examples  and  simulation  results,  we  conclude  that  the  estimated  score 
function  is  informative  in  performing  exploratory  data  analysis.  It  also  allows  us  to 
formulate  a  formal  large  sample  test  for  normality  that  possesses  reasonable  size  and 
good  power  properties  under  finite  sample  situations. 

Acknowledgement 

The authors would like to thank Roger Koenker and Robin Sickles for their helpful
suggestions and incisive comments. The computations were performed on computing
facilities supported by the National Science Foundation Grant SES 89-22472.

References 

[1] Bera, A.K. and Ng, P.T. (1992), "Robust Tests for Heteroskedasticity and
Autocorrelation Using Score Function," Tilburg University, CentER for Economic
Research, Discussion Paper No. 9245.

[2] Bickel, P.J. (1978), "Using Residuals Robustly I: Tests for Heteroscedasticity,
Nonlinearity," The Annals of Statistics, 6, 266-291.

[3] Cox, D.D. (1985), "A Penalty Method for Nonparametric Estimation of the
Logarithmic Derivative of a Density Function," Annals of the Institute of Statistical
Mathematics, 37, 271-288.

[4] Cox, D.D. and Martin, D.R. (1988), "Estimation of Score Functions," Technical
Report, University of Washington.

[5] Csorgo, M. and Revesz, P. (1983), "An N.N-estimator for the Score Function,"
Seminarbericht Nr. 49, Proceedings of the First Easter Conference on Model
Theory, Sektion Mathematik.

[6] D'Agostino, R.B. (1971), "An Omnibus Test for Normality for Moderate and
Large Size Samples," Biometrika, 58, 341-348.

[7] Geary, R.C. (1947), "Testing for Normality," Biometrika, 34, 209-242.

[8] Jarque, C.M. and Bera, A.K. (1987), "A Test for Normality of Observations and
Regression Residuals," International Statistical Review, 55, 163-172.

[9] Joiner, B.L. and Hall, D.L. (1983), "The Ubiquitous Role of f'/f in Efficient
Estimation of Location," The American Statistician, 37, 128-133.

[10] Joiner, B.L. and Rosenblatt, J.R. (1971), "Some Properties of the Range in
Samples from Tukey's Symmetric Lambda Distributions," Journal of the American
Statistical Association, 66, 394-399.

[11] Koenker, R. (1982), "Robust Methods in Econometrics," Econometric Reviews, 1,
214-255.

[12] Manski, C.F. (1984), "Adaptive Estimation of Non-linear Regression Models,"
Econometric Reviews, 3, 145-194.

[13] Ng, Pin T. (1991), "Computing Smoothing Spline Score Estimator," working
paper.

[14] Ng, Pin T. (1994), "Smoothing Spline Score Estimation," SIAM Journal on
Scientific Computing, forthcoming.

[15] Pearson, E.S., D'Agostino, R.B. and Bowman, K.O. (1977), "Tests for Departure
from Normality: Comparison of Powers," Biometrika, 64, 231-246.

[16] Portnoy, S. and Koenker, R. (1989), "Adaptive L-Estimation of Linear Models,"
The Annals of Statistics, 17, 362-381.

[17] Shapiro, S.S. and Wilk, M.B. (1965), "An Analysis of Variance Test for Normality
(Complete Samples)," Biometrika, 52, 591-611.

[18] Shapiro, S.S. and Francia, R.S. (1972), "Approximate Analysis of Variance Test
for Normality," Journal of the American Statistical Association, 67, 215-216.

[19] Stone, C.J. (1975), "Adaptive Maximum Likelihood Estimators of a Location
Parameter," The Annals of Statistics, 3, 267-284.