Faculty Working Paper 93-0171


A Large Sample Normality Test


Anil K. Bera Pin T. Ng

Department of Economics Department of Economics

University of Illinois University of Houston, TX

Bureau of Economic and Business Research

College of Commerce and Business Administration

University of Illinois at Urbana-Champaign



A LARGE SAMPLE NORMALITY TEST

ANIL K. BERA and PIN T. NG

Department of Economics, University of Illinois, Champaign, IL 61820

Department of Economics, University of Houston, TX 77204-5882

November 22, 1993

Abstract

The score function, defined as the negative logarithmic derivative of the probability density function, plays a ubiquitous role in statistics. Since the score function of the normal distribution is linear, testing normality amounts to checking the linearity of the empirical score function. Using the score function, we present a graphical alternative to the Q-Q plot for detecting departures from normality. Even though graphical approaches are informative, they lack the objectivity of formal testing procedures. We therefore supplement our graphical approach with a formal large sample chi-square test. Our graphical approach is then applied to a wide range of alternative data generating processes. The finite sample size and power performances of the chi-square test are investigated through a small scale Monte Carlo study.

KEY WORDS: Normality test; Score function; Graphical approach

1 Introduction

Since Geary's (1947) suggestion of putting the statement "Normality is a myth. There never was and will never be, a normal distribution" in front of all statistical texts, the need to test the normality assumption in many statistical models has been widely acknowledged. As a result, a wide range of tests for normality is currently available. Most of these tests fall into the following categories: (1) tests based on probability or Q-Q plots, (2) moments tests, (3) distance tests based on the empirical distribution function, (4) goodness of fit tests, and (5) tests based on the empirical characteristic function.

No single test statistic can reveal as much information as a graphical display. In Section 2, we present a graphical alternative to the Q-Q plot using the score function, defined as the negative logarithmic derivative of the probability density function. Even though graphical approaches are informative, they lack the objectivity of formal testing procedures. We therefore supplement our graphical approach with a formal large sample χ² test based on the score function in Section 3. The performances of our graphical approach and our score function based χ² test depend on our ability to estimate the score function accurately. We review some score function estimators in Section 4.

In Jarque and Bera (1987), a moment test was shown to possess superior power compared to most other normality tests. Their moment test utilizes the normal distribution's skewness measure √b₁ = 0 and kurtosis measure b₂ = 3. As a result, under certain non-normal distributions with skewness and kurtosis measures identical to those of the normal distribution, moment tests based on √b₁ and b₂ will have no power. Examples are Tukey's λ distributions with λ = 0.135 and λ = 5.2 [see Joiner and Rosenblatt (1971)]. Moment based tests thus have power against only certain alternatives. Our score function based χ² test, on the other hand, does not have this disadvantage. The superior power of our score function based χ² test is demonstrated in a small scale Monte Carlo study in Section 5.

2 A Graphical Approach

The score function, defined as ip{x) = —log'J[x) — —yuji *^^^ random variable having probability density function f{x) plays an ubiquitous role in statistics. It is related to the constructions of L-, M- and R-estimators for location and scale model as well as regression models in the robustness literatures. [See Joiner and Hall (1983) for an excellent overview]. It is also used in constructing various adaptive L-, M- and R-estimators which achieve the Cramer-Rao efficiency bounds asymptotically. [See Koenker (1982)]. It can also be used to estimate the Fisher information. In hypothesis testing, the score function plays a crucial role in robustifying conventional testing procedures. [See Bickel (1978) and Bera and Ng (1992)]. Its fundamental contribution to statistics, however, can best be seen in the realm of exploratory data analysis.

The plots of the density and score functions of some common distributions are presented in Figure 1 and Figure 2 respectively. While it is difficult to differentiate the tails of a Gaussian distribution from those of a Cauchy distribution through the density functions, the tails of their score functions are very distinct. In fact, we can easily distinguish among various distributions by investigating the score functions.

It is clear from Figure 1 and Figure 2 that the mode of a distribution is characterized by an upward crossing of the score function at the horizontal axis while an anti-mode is located at the point of downward crossing. An exponential distribution has a horizontal score function. A tail thicker than the exponential has a negatively sloped score while a tail thinner than the exponential corresponds to an upward sloping score.

A Gaussian distribution has a linear score function passing through the horizontal axis at its location parameter with a slope equal to the reciprocal of its variance. This suggests an alternative to the familiar and popular probability or Q-Q plot. An estimated score function with a redescending tail towards the horizontal axis indicates departure towards distributions with thicker tails than the normal distribution while a diverging tail suggests departure in the direction of thinner tailed distributions.
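The linearity property is easy to verify directly. As a minimal sketch (assuming only the N(μ, σ²) density; the function name is illustrative), differentiating −log f gives ψ(x) = (x − μ)/σ²:

```python
import numpy as np

def gaussian_score(x, mu=0.0, sigma=1.0):
    """Score psi(x) = -d log f(x)/dx for the N(mu, sigma^2) density."""
    return (x - mu) / sigma**2

# The score crosses the horizontal axis at the location parameter mu,
# and its slope equals the reciprocal of the variance sigma^2.
x = np.linspace(-3.0, 3.0, 7)
psi = gaussian_score(x, mu=1.0, sigma=2.0)
slope = (psi[1] - psi[0]) / (x[1] - x[0])   # 1/sigma^2 = 0.25
```

Any curvature in an estimated score, then, is direct visual evidence against normality, which is what the plots below exploit.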

We can even recover the estimate of the density function through exponentiating the negative integral of the estimated score function although this may seem to be a roundabout approach.
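A numerical sketch of this recovery (assumptions: trapezoidal-rule integration on a finite grid wide enough that the density is negligible outside it; names are illustrative):

```python
import numpy as np

def density_from_score(score, grid):
    """Recover a density from a score function psi by computing
    f(x) proportional to exp(-integral of psi), then normalizing."""
    psi = score(grid)
    steps = np.diff(grid)
    # cumulative trapezoidal integral of psi along the grid
    log_f = -np.concatenate(([0.0],
                             np.cumsum(0.5 * (psi[1:] + psi[:-1]) * steps)))
    f = np.exp(log_f - log_f.max())               # guard against overflow
    f /= np.sum(0.5 * (f[1:] + f[:-1]) * steps)   # normalize to area one
    return f

# a linear score with unit slope recovers the standard normal density
grid = np.linspace(-6.0, 6.0, 2001)
f = density_from_score(lambda x: x, grid)
```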

Figure 1: Probability Density Functions [panels: N(0,1), t(5), Cauchy(0,1), DouExp(0,1), Logis(0,.5), Extreme(0,1), Unif(0,1), Gamma(2,1), Weibull(5,2), Exp(1), Pareto(1,1), Lnorm(0,1), F(5,5), Beta(3,2) and ChiSq(3)]

Figure 2: Score Functions [panels: N(0,1), t(5), Cauchy(0,1), DouExp(0,1), Logis(0,.5), Extreme(0,1), Unif(0,1), Exp(1), Lnorm(0,1), Beta(3,2), Gamma(2,1), Weibull(5,2), Pareto(1,1), F(5,5) and ChiSq(3)]

3 A Formal Test

A formal "objective" test of the null hypothesis of a Gaussian distribution is equivalent to testing the linearity of the score function. Since a straight line can be viewed as a first order approximation to any polynomial, the normality test can easily be carried out through an asymptotic χ² test of regressing the estimated score function ψ̂(x_i) on a polynomial of x_i. The null hypothesis of a Gaussian distribution corresponds to a linear relationship between ψ̂(x_i) and x_i.

When the null hypothesis of a Gaussian distribution cannot be rejected, we can estimate the location parameter by the point at which the ordinary least squares regression line intersects the horizontal axis, and the scale parameter by the square root of the reciprocal of the regression slope.
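This recovery step can be sketched as follows (assuming an estimated score ψ̂ already evaluated at the sample points; names are illustrative):

```python
import numpy as np

def location_scale_from_score(x, psi_hat):
    """Fit psi_hat ~ a + b*x by OLS; under normality the fitted line
    crosses zero at the location and has slope 1/variance."""
    b, a = np.polyfit(x, psi_hat, 1)   # slope first, then intercept
    location = -a / b
    scale = np.sqrt(1.0 / b)
    return location, scale

# with the exact N(1, 4) score psi(x) = (x - 1)/4, the recovery is exact
x = np.linspace(-5.0, 7.0, 100)
loc, scale = location_scale_from_score(x, (x - 1.0) / 4.0)
```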

4 Estimating the Score Function

Performances of the above graphical approach and formal χ² test rely on accurate estimates of the score function. Numerous score function estimators are available, most of which are constructed from kernel density estimators [see Stone (1975), Manski (1984) and Cox and Martin (1988)]. Csorgo and Revesz (1983) used a nearest-neighbor approach. Cox (1985) proposed a smoothing spline version, which is further refined and implemented in Ng (1994).

It has often been argued that the choice of kernel is not crucial in kernel density estimation. The choice of kernel, however, becomes important in the tails, where the density is low and there are few observations to help smooth things out. This sensitivity to kernel choice is further amplified in score function estimation, where higher derivatives of the density function are involved [see Portnoy and Koenker (1989), and Ng (1994)].
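For contrast with the spline approach adopted below, one simple plug-in kernel construction is ψ̂ = −f̂′/f̂ with a Gaussian kernel (a sketch only; this is not the estimator used in the paper, and its tail behavior is exactly where the kernel-sensitivity problem bites):

```python
import numpy as np

def kernel_score(x_eval, data, h):
    """Plug-in score estimate psi_hat = -f_hat'/f_hat using a Gaussian
    kernel density estimate with bandwidth h."""
    u = (x_eval[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u**2)             # kernel weights (constants cancel)
    f = k.sum(axis=1)                   # proportional to f_hat
    f_prime = (-u / h * k).sum(axis=1)  # proportional to f_hat'
    return -f_prime / f

rng = np.random.default_rng(0)
data = rng.standard_normal(4000)
# for Gaussian data the estimate is roughly linear near the center
psi = kernel_score(np.array([-1.0, 0.0, 1.0]), data, h=0.5)
```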

Ng (1994) found that the smoothing spline score estimator, which finds its theoretical justification in an explicit mean squared error minimization criterion, is more robust than the kernel estimators to distributional variations. We use this score estimator in the paper.

The smoothing spline score estimator is the solution to

min_{ψ ∈ H²[a,b]} ∫ (ψ² − 2ψ′) dF_n + λ ∫ [ψ″(x)]² dx    (1)

where H²[a,b] = {ψ : ψ and ψ′ are absolutely continuous, and ∫_a^b [ψ″(x)]² dx < ∞}. The objective function (1) is the (penalized) empirical analogue of minimizing the following mean-squared error:

∫ (ψ − ψ₀)² dF₀ = ∫ (ψ² − 2ψ′) dF₀ + ∫ ψ₀² dF₀    (2)

in which i/^o is the unknown true score function and the equality is due to the fact that under some mild regularity conditions [see Cox (1985)]

∫ ψ₀ ψ dF₀ = − ∫ f₀′(x) ψ(x) dx = ∫ ψ′ dF₀.

Since the second term on the right hand side of (2) is independent of ψ, minimizing the mean-squared error may focus exclusively on the first term. Minimizing (1) yields a

Figure 3: Estimated Score Functions [true and estimated scores for samples from N(0,1), t(5), DouExp(0,1), Beta(3,2), Gamma(2,1) and Lnorm(0,1)]

balance between "fidelity-to-data" measured by the mean-squared error term and the smoothness represented by the second term. As in any nonparametric score function estimator, the smoothing spline score estimator has a penalty parameter λ to choose. The penalty parameter controls the trade-off between "fidelity-to-data" and smoothness of the estimated score function. We use the automated penalty parameter choice mechanism, the adaptive information criterion, suggested and implemented in Ng (1994) [see Ng (1991) for the FORTRAN source code].

5 Some Examples and Simulation Results

In Figure 3, we present the smoothing spline estimated score functions, each computed from 100 random observations drawn from some of the distributions in Figure 2. The random numbers were generated with Marsaglia's Super-Duper random number generator available in "S" [Becker, Chambers and Wilks (1988)] installed on a Sun SPARCstation 10. The smoothing spline score estimator was Ng's (1991) FORTRAN version adapted for "S". It is obvious from Figure 3 that departures from the Gaussian distribution can be easily detected from the plots.

To study the finite sample properties of our score function based χ² test and the moment based LM test of Jarque and Bera (1987), we perform a small scale Monte Carlo study. The LM test was shown in Jarque and Bera (1987) to possess very good power compared to the skewness measure test √b₁, the kurtosis measure test b₂, D'Agostino's (1971) D* test, Pearson, D'Agostino and Bowman's (1977) R test, Shapiro and Wilk's (1965) W test, and Shapiro and Francia's (1972) W′ test against the various alternative distributions investigated. As a result, we use it as our benchmark to evaluate the performance of our χ² test. The null distribution here is the standard normal distribution and the alternatives are Gamma (2,1), Beta (3,2), Student's t (5) and Tukey's λ distribution with λ = 5.2. All distributions are standardized to have zero mean and variance twenty-five. Our χ² test is obtained by running the following regression

ψ̂(x_i) = γ₀ + γ₁x_i + γ₂x_i² + γ₃x_i³ + γ₄x_i⁴ + γ₅x_i⁵ + γ₆x_i⁶ + ε_i

and testing H₀ : γ₂ = γ₃ = γ₄ = γ₅ = γ₆ = 0. The χ² test statistic is then given by

(RSS_r − RSS) / [RSS / (N − 7)] →_D χ²₅ under H₀,

where RSS_r is the restricted residual sum of squares obtained from regressing ψ̂(x_i) on the intercept and x_i alone, RSS is the residual sum of squares of the full regression, and N is the sample size. The LM test is given by
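The statistic is a standard restricted-versus-full regression comparison and can be sketched as follows (names are illustrative; the ψ̂ values are taken as given):

```python
import numpy as np

def score_chi2_stat(x, psi_hat, degree=6):
    """(RSSr - RSS) / (RSS/(N - degree - 1)): tests that the
    coefficients on x^2, ..., x^degree are jointly zero."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, psi_hat, rcond=None)
        resid = psi_hat - X @ beta
        return resid @ resid
    X_full = np.vander(x, degree + 1, increasing=True)  # 1, x, ..., x^6
    RSS = rss(X_full)
    RSSr = rss(X_full[:, :2])                           # intercept and x only
    return (RSSr - RSS) / (RSS / (len(x) - degree - 1))

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
# a linear (Gaussian-like) score gives a small statistic, a cubic a huge one
stat_linear = score_chi2_stat(x, x + 0.1 * rng.standard_normal(200))
stat_cubic = score_chi2_stat(x, x**3 + 0.1 * rng.standard_normal(200))
```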

LM = N [ b₁/6 + (b₂ − 3)²/24 ],

where b₁ is the squared sample skewness and b₂ the sample kurtosis. Under the null hypothesis of normality, LM is asymptotically distributed as a χ²₂.
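A direct implementation of this moment statistic (a sketch using the standard sample moment definitions):

```python
import numpy as np

def jarque_bera_lm(x):
    """LM = N * (b1/6 + (b2 - 3)^2 / 24), where b1 is the squared
    sample skewness and b2 the sample kurtosis; asymptotically
    chi-square with 2 degrees of freedom under normality."""
    n = len(x)
    z = x - x.mean()
    m2 = (z**2).mean()
    b1 = (z**3).mean()**2 / m2**3      # squared skewness
    b2 = (z**4).mean() / m2**2         # kurtosis
    return n * (b1 / 6.0 + (b2 - 3.0)**2 / 24.0)

rng = np.random.default_rng(2)
lm_normal = jarque_bera_lm(rng.standard_normal(1000))  # small under the null
lm_skewed = jarque_bera_lm(rng.exponential(size=1000)) # large under skewness
```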

The estimated sizes and powers of the LM and χ² tests in 1000 Monte Carlo replications are reported in Table 1. The standard errors of the estimated probabilities are no greater than √(.25/1000) = .016. The sample sizes considered are 25, 50, 100 and 250. The performances of the χ² test from regressing ψ̂(x_i) on higher order polynomials of x_i were also investigated; the results are similar, so we do not report them here. Under the Gaussian distribution, the estimated probabilities of Type I error are computed from the true χ² critical points.

From Table 1, we can see that the estimated Type I errors of the χ² test are much closer than those of the LM test to the nominal value of .10 for all sample sizes. The LM test underestimated the size of the test at all the sample sizes we investigated.

To make a valid power comparison, we size-adjust the power under all alternative distributions. The empirical significance level we use is 10%. At the smaller sample sizes of N = 25 and 50, the LM test has higher power than the χ² test under Gamma (2,1), Log (0,1) and t(5). The discrepancies, however, become less prominent as the sample size increases. This is because the score functions of both Log (0,1) and Gamma (2,1) are approximately linear in the high density regions, as can be seen from Figure 2. More observations in the tails are needed for the estimated score functions to be distinguishable from the linear Gaussian score. The situation is similar for the Student's t(5); however, an even bigger sample size will probably be needed for some realizations in the tails to discern the estimated score function of the Student's t from that of the Gaussian. As expected, the χ² test has some power for λ(5.2), and this power increases rapidly with the sample size. The χ² test also performs better for the Beta (3,2) alternative. The LM test has power even lower than its size for the Tukey's λ alternative at all sample sizes.

Table 1: Estimated Powers for 1000 Replications (Empirical size = .10)

Sample Size   Distribution     LM      χ²
N = 25        Gaussian        .044    .126
              Beta (3,2)      .063    .213
              Gamma (2,1)     .700    .625
              t (5)           .352    .163
              Log (0,1)       .962    .795
              λ (5.2)         .090    .152
N = 50        Gaussian        .059    .111
              Beta (3,2)      .154    .370
              Gamma (2,1)     .944    .872
              t (5)           .517    .260
              Log (0,1)      1.00     .927
              λ (5.2)         .061    .334
N = 100       Gaussian        .067    .099
              Beta (3,2)      .433    .548
              Gamma (2,1)     .998    .990
              t (5)           .699    .347
              Log (0,1)      1.00    1.00
              λ (5.2)         .019    .641
N = 250       Gaussian        .085    .091
              Beta (3,2)      .980    .901
              Gamma (2,1)    1.00    1.00
              t (5)           .933    .701
              Log (0,1)      1.00    1.00
              λ (5.2)         .011    .980

Based on our examples and simulation results, we conclude that the estimated score function is informative for exploratory data analysis. It also allows us to formulate a formal large sample test for normality that possesses reasonable size and good power properties in finite samples.

Acknowledgement

The authors would like to thank Roger Koenker and Robin Sickles for their helpful suggestions and incisive comments. The computations were performed on computing facilities supported by National Science Foundation Grant SES 89-22472.

References

[1] Bera, A.K. and Ng, P.T. (1992), "Robust Tests for Heteroskedasticity and Autocorrelation Using Score Function," Tilburg University, CentER for Economic Research, Discussion Paper No. 9245.

[2] Bickel, P.J. (1978), "Using Residuals Robustly I: Tests for Heteroscedasticity, Nonlinearity," The Annals of Statistics, 6, 266-291.

[3] Cox, D.D. (1985), "A Penalty Method for Nonparametric Estimation of the Logarithmic Derivative of a Density Function," Annals of the Institute of Statistical Mathematics, 37, 271-288.

[4] Cox, D.D. and Martin, D.R. (1988), "Estimation of Score Functions," Technical Report, University of Washington.

[5] Csorgo, M. and Revesz, P. (1983), "An NN-estimator for the Score Function," Seminarbericht, Proceedings of the First Easter Conference on Model Theory, Sektion Mathematik.

[6] D'Agostino, R.B. (1971), "An Omnibus Test for Normality for Moderate and Large Size Samples," Biometrika, 58, 341-348.

[7] Geary, R.C. (1947), "Testing for Normality," Biometrika, 34, 209-242.

[8] Jarque, C.M. and Bera, A.K. (1987), "A Test for Normality of Observations and Regression Residuals," International Statistical Review, 55, 163-172.

[9] Joiner, B.L. and Hall, D.L. (1983), "The Ubiquitous Role of f'/f in Efficient Estimation of Location," The American Statistician, 37, 128-133.

[10] Joiner, B.L. and Rosenblatt, J.R. (1971), "Some Properties of the Range in Samples from Tukey's Symmetric Lambda Distributions," Journal of the American Statistical Association, 66, 394-399.

[11] Koenker, R. (1982), "Robust Methods in Econometrics," Econometric Reviews, 1, 214-255.

[12] Manski, C.F. (1984), "Adaptive Estimation of Non-linear Regression Models," Econometric Reviews, 3, 145-194.

[13] Ng, Pin T. (1991), "Computing Smoothing Spline Score Estimator," working paper.

[14] Ng, Pin T. (1994), "Smoothing Spline Score Estimation," SIAM Journal of Scientific Computing, forthcoming.

[15] Pearson, E.S., D'Agostino, R.B. and Bowman, K.O. (1977), "Tests for Departure from Normality: Comparison of Powers," Biometrika, 64, 231-246.

[16] Portnoy, S. and Koenker, R. (1989), "Adaptive L-Estimation of Linear Models," The Annals of Statistics, 17, 362-381.

[17] Shapiro, S.S. and Wilk, M.B. (1965), "An Analysis of Variance Test for Normality (Complete Samples)," Biometrika, 52, 591-611.

[18] Shapiro, S.S. and Francia, R.S. (1972), "Approximate Analysis of Variance Test for Normality," Journal of the American Statistical Association, 67, 215-216.

[19] Stone, C.J. (1975), "Adaptive Maximum Likelihood Estimators of a Location Parameter," The Annals of Statistics, 3, 267-284.