EXACT UNCONDITIONAL TESTS FOR 2x2 CONTINGENCY TABLES

BY

SAMY SALOMON SUISSA

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1982

To Nicole, Daniel and my parents

ACKNOWLEDGEMENTS

I am deeply grateful to Professor Jonathan Shuster for being an instrumental part of my education. His understanding and confidence were important in my entry to this graduate program. Throughout my studies, his patience, help and guidance were invaluable to my work. His contribution to this dissertation was crucial.

I am thankful to my family for their constant support: my wife Nicole, for her patience, love and understanding; my son Daniel, for the joys and the change of pace; and my parents, brother and sisters, who were always there with love and encouragement.

My appreciation extends to Professors Ronald Randles, Mark Hale, Ramon Littell and Ken Portier for their full cooperation on such short notice.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS  iii
ABSTRACT  vi

CHAPTER
1  INTRODUCTION  1
   1.1  The Problem  1
   1.2  Some Methods of Eliminating Nuisance Parameters  3
   1.3  Numerical Example  5
   1.4  Proposed Approach and Preview  12
2  METHODOLOGY FOR COMPUTING THE SIZE OF A TEST  14
   2.1  Introduction  14
   2.2  Local Bound for π'(p)  15
   2.3  Least Upper Bound for π(p)  16
   2.4  Stability of the Null Power Function  19
   2.5  Choice of the Test Statistic  20
3  THE 2x2 TABLE FOR INDEPENDENT PROPORTIONS  22
   3.1  Introduction  22
   3.2  Asymptotic Tests and Sample Size Formulae  25
   3.3  Fisher's Exact Test  28
   3.4  An Exact Unconditional Test  31
   3.5  Relation to the Chi-square Goodness-of-fit Test  38
   3.6  Power and Sample Sizes  40
4  THE 2x2 TABLE FOR CORRELATED PROPORTIONS  42
   4.1  Introduction  42
   4.2  McNemar's Test, Other Asymptotic Tests and Sample Size Formulae  46
   4.3  The Exact Conditional Test  48
   4.4  An Exact Unconditional Test  51
   4.5  Power and Sample Sizes  57

APPENDICES
A  TABLES  59
B  PLOTS OF THE NULL POWER FUNCTION  79
C  COMPUTER PROGRAMS  91

REFERENCES  98
BIOGRAPHICAL SKETCH  101

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EXACT UNCONDITIONAL TESTS FOR 2x2 CONTINGENCY TABLES

by Samy Salomon Suissa

August 1982

Chairman: Jonathan J. Shuster
Major Department: Statistics

The so-called "exact" conditional tests are very popular for testing hypotheses in the presence of nuisance parameters. However, in the context of discrete distributions, they must be supplemented with randomization to become exactly of size α, the nominal significance level. This practice is undesirable since irrelevant events should not affect one's decision. Consequently, the conditional test without randomization, while still called "exact," becomes conservative.

As an unconditional alternative, a methodology is developed to compute the exact size of any test when the null power function is of a given form. This approach is a way of catering to the worst possible configuration of the nuisance parameter by maximizing the null power function over the domain of the nuisance parameter. As special cases, the 2x2 contingency table to compare two independent proportions and the 2x2 contingency table to compare two correlated proportions are considered.
For the equal sample size case, exact critical values of the Z-test for comparing two independent proportions are computed and tabulated for n=10(1)150, α=.025 and α=.05. Sample size requirements based on the exact unconditional one-sided Z-test, with α=.05 and 80% power, were never larger than the corresponding sample size requirements based on the "exact" conditional test, namely Fisher's exact test. In fact, the proposed Z-test is uniformly more powerful than Fisher's exact test for n=10(1)150, α=.025 and α=.05.

For comparing two correlated proportions, exact critical values of the Z-test, which is the appropriate square root of McNemar's chi-square test, are computed and tabulated for N=10(1)200, α=.025 and α=.05, where N is the total number of matched pairs. Here again, sample size determinations based on the exact unconditional one-sided Z-test, with [...]

[...] .05. The statistician is now faced with a dilemma. He would conclude that θ1 < θ2 by the first method, but could not by the second method at α=.05. This discrepancy raises several questions: How accurate is the normal approximation to the Z-test? How conservative is Fisher's exact test without randomization? If the conditional method is preferred, how does one explain to the layman that the sample space is restricted only to the tables with 12 total successes?

The answers to these questions lie in the null power function, a function of the nuisance parameter θ. For each of the two tests, the exact null power function is plotted on the basis of the attained significance levels as the new nominal significance levels. For the Z-test, the plot of the exact null power function, based on the Z-value of 1.83 (nominal significance .0336), is given in Figure 1.1. It is seen that the exact significance would be greater than the nominal significance level of .0336 when .13 < θ < .87. In that range, the Z-test is liberal in the sense that it would reject the null hypothesis when it should not at level .0336. However, notice that the exact null power function never exceeds the original significance level of α=.05, the maximum being .047. For the conditional approach, the plot of the exact null power function based on the attained significance level of Fisher's exact test (.0849) is given in Figure 1.2. The conservativeness of this conditional test is obvious. In fact, its null power function is never larger than the original nominal significance level of α=.05, the maximum being .045.

Further insight into the reasons for such largely different results can be obtained by inspecting the critical regions of each test. Once more, the Z-test will be based on the nominal significance level .0336 and Fisher's exact test on the level .0849.

Figure 1.1  Exact null power function of the Z-test (nominal significance level .0336).

Figure 1.2  Exact null power function of Fisher's exact test (nominal significance level .0849).

By representing all the possible results of the experiment in the form of points of a lattice diagram, given in Figure 1.3, the critical regions are simply given by marked subsets of these points.

Figure 1.3  Critical regions of Z-test and Fisher's exact test. In Figure 1.3, S1 = number of successes from Π1 and S2 = number of successes from Π2.

The sample points in the lattice diagram marked by an "0" belong to the critical region defined by the Z-test (nominal significance level .0336) and those marked by an "x" belong to the critical region defined by Fisher's exact test (nominal significance level .0849). Although the nominal significance level of the Z-test (.0336) is much smaller than that of Fisher's exact test (.0849), notice that the critical region of the Z-test contains the critical region of Fisher's exact test. This is a flagrant example of the conservativeness of Fisher's exact test and of the liberalness of the Z-test.
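The two competing analyses contrasted in this example can be reproduced for any 2x2 comparative trial. The following sketch is written in Python rather than the FORTRAN of Appendix C, and its inputs are invented rather than the (partially illegible) data of the example; the function names are purely illustrative. It computes the Z statistic with unpooled variance and the one-sided p-value of Fisher's exact test, i.e. the hypergeometric tail probability given the observed total number of successes.

    from math import comb, sqrt

    def z_unpooled(x, y, n):
        """Z-test with an unpooled variance estimate for H0: p1 = p2 vs Ha: p1 < p2."""
        p1, p2 = x / n, y / n
        var = p1 * (1 - p1) + p2 * (1 - p2)
        if var == 0:
            return float("inf") if p2 > p1 else 0.0
        return sqrt(n) * (p2 - p1) / sqrt(var)

    def fisher_one_sided(x, y, n):
        """One-sided Fisher p-value: P(Y >= y | X + Y = x + y) under H0: p1 = p2."""
        t = x + y
        return sum(comb(n, k) * comb(n, t - k)
                   for k in range(y, min(t, n) + 1)) / comb(2 * n, t)

    # purely illustrative table: n = 10 per group, x = 3 and y = 9 successes
    print(z_unpooled(3, 9, 10), fisher_one_sided(3, 9, 10))

Comparing the two outputs for a given table reproduces the kind of disagreement described above: the unconditional statistic refers the table to the full lattice of outcomes, while Fisher's test conditions on the observed total number of successes.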
These discrepancies suggest that the maximization method might be more appropriate, if not more exact, than the first two methods of eliminating the nuisance parameter. Therefore, by adjusting either the Z-test or the unconditional Fisher's exact test, the maximization method could lead to significance levels which are closer to the nominal levels.

1.4 Proposed Approach and Preview

In this dissertation, the maximization method will be used in conjunction with tests derived from the estimation method to develop an exact unconditional testing procedure as an alternative to the popular, but conservative, conditional method. This procedure will be applied to two problems of the 2x2 contingency table, namely that of comparing two independent proportions and that of comparing two correlated proportions. For this purpose, a methodology for computing the supremum of a null power function of a certain general form is developed in Chapter 2. The form of this null power function includes, as particular cases, the null power functions of the two problems considered in this dissertation.

In Chapter 3, the supremum of the null power function of the Z-test (the size of the Z-test) for comparing two independent proportions will be computed for the equal sample size case. Consequently, exact critical values and minimum required sample sizes will be obtained and tabulated for the design and analysis of such a trial. Comparisons with the conditional method (Fisher's exact test) will show that the exact unconditional method leads to smaller (or equal) sample sizes for α=.05, 80% power and common sample size n=10(1)150.

In Chapter 4, the size of the Z-test for comparing two correlated proportions will be computed via the methodology developed in Chapter 2. As a result, exact critical values and sample size determinations will be obtained and tabulated. This exact unconditional test will produce smaller (or equal) sample sizes than the conditional test (the sign test) for α=.05, 80% power and the number of paired observations N=10(1)200. Furthermore, a comparison of the critical regions will show that the exact unconditional Z-tests are uniformly more powerful than their conditional counterparts in the range considered, namely α=.025, α=.05 and n=10(1)150 for the independent case, and N=10(1)200 for the correlated case.

CHAPTER 2
METHODOLOGY FOR COMPUTING THE SIZE OF A TEST

2.1 Introduction

In this chapter, a methodology is developed to compute the unconditional size of any test of hypothesis T when:

a. there exists a nuisance parameter p such that 0 < p < 1, and

b. the null power function of T is of the form

    \pi(p) = \sum_{i \in C} a_i \, p^{b_i} (1-p)^{c_i} ,                    (2.1.1)

where a_i > 0 and the b_i and c_i are nonnegative, i is an indexing subscript over the whole sample space S defined by the sampling scheme, C is the set of subscripts for which the related sample points belong to the critical region defined by the testing procedure T, and p is a nuisance parameter on the unit interval.
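As a concrete illustration of the form (2.1.1), and not one of the two problems treated in this dissertation, consider a single Binomial(5, p) observation with critical region {x ≥ 4}: its null power function is a sum of terms a_i p^{b_i} (1-p)^{c_i} with a_i the binomial coefficients. A minimal Python sketch (the names and numbers are purely illustrative; the dissertation's own programs are in FORTRAN):

    from math import comb

    # toy instance of form (2.1.1): Binomial(5, p), critical region C = {x >= 4};
    # each point x in C contributes a term with a = C(5, x), b = x, c = 5 - x
    terms = [(comb(5, x), x, 5 - x) for x in (4, 5)]

    def null_power(p, terms):
        """Evaluate pi(p) = sum over i in C of a_i * p**b_i * (1-p)**c_i."""
        return sum(a * p ** b * (1 - p) ** c for a, b, c in terms)

    print([round(null_power(p, terms), 4) for p in (0.1, 0.3, 0.5)])

The null power functions derived in Chapters 3 and 4 for the two 2x2 problems are of exactly this form, with C determined by the critical region of the chosen test statistic.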
The size of the test T is given by sup_p π(p) and, because only discrete null distributions will be considered, is restricted to a finite collection of possible values, namely the natural levels of test T, as referred to in Randles and Wolfe (1979). The remainder of this chapter deals with the technique of computing sup_p π(p). The method used is based on the mean value theorem of differential calculus applied to successive subintervals of the unit interval. The use of this methodology will be illustrated in two important cases. The case of comparing two independent proportions is given in Chapter 3 and that of comparing two correlated proportions in Chapter 4.

2.2 Local Bound for π'(p)

The applicability of the mean value theorem of differential calculus, as stated in Courant and John (1965), requires primarily a bound on the derivative of π(p) for each subinterval. The task of finding such a bound is facilitated by the form of π(p), namely that it is a linear combination of binomial terms. In this section, a method of computing this bound for any subinterval is given.

First, note that the derivative of π(p),

    \pi'(p) = \sum_{i \in C} a_i \{ b_i p^{b_i - 1} (1-p)^{c_i} - c_i p^{b_i} (1-p)^{c_i - 1} \} ,      (2.2.1)

is also a linear combination of binomial terms of the form

    h(p) = p^{r} (1-p)^{s-r}

so that, for any given subinterval I = (a, b) with 0 ≤ a < b ≤ 1,

    \sup_{p \in I} h(p) = h(b)        if \hat{p} > b
                        = h(a)        if \hat{p} < a
                        = h(\hat{p})  if \hat{p} \in I                    (2.2.2)

and

    \inf_{p \in I} h(p) = \min( h(a), h(b) )                              (2.2.3)

where \hat{p} = r/s. An upper bound for π'(p) can be obtained on (a, b) by substituting the right hand side of (2.2.2) for each positive term of (2.2.1) and the right hand side of (2.2.3) for each negative term of (2.2.1). Similarly, a lower bound for π'(p) can be obtained on (a, b) by reversing these substitutions. Finally, a bound M for |π'(p)| on (a, b) is taken as the larger of the two bounds, in absolute value.

2.3 A Least Upper Bound for π(p)

Since local bounds for |π'(p)| can now be computed for any subinterval (a, b), the mean value theorem can be applied to the successive subintervals I_1 = (0, .01), I_2 = (.01, .02), ..., I_100 = (.99, 1) of the unit interval to obtain an upper bound for π(p) in each I_j. An upper bound for π(p), 0 < p < 1, is then

    \max \{ \pi(p_i) + .005 \, M_i ;\; i = 1(1)100 \} ,                   (2.3.1)

where p_i is the midpoint of I_i and M_i is the local bound for |π'(p)| in I_i. Any interval I_j for which

    \pi(p_j) + .005 \, M_j > \max \{ \pi(p_i) ;\; i = 1(1)100 \} + \delta   (2.3.2)

must be iteratively subdivided into 2^{m_j} subintervals { I_{j k_j} ; k_j = 1, 2, ..., 2^{m_j}, m_j ≥ 1 }, where m_j is the smallest integer such that

    \pi(p_{j k_j}) + .005 \, M_{j k_j} / 2^{m_j} < \max \{ \max_i \{ \pi(p_i) \}, \max_{k_j} \{ \pi(p_{j k_j}) \} \} + \delta   (2.3.3)

for all k_j, and where

    p_{j k_j} = (j-1)/100 + (2 k_j - 1)/(100 \times 2^{m_j + 1})

is the midpoint of I_{j k_j} and M_{j k_j} is the local bound for |π'(p)| in I_{j k_j}. The inequality (2.3.3) is clearly attainable since, from (2.2.1),

    M_{j k_j} \le \sum_{i \in C} a_i \max(b_i, c_i) .                      (2.3.4)

Moreover, using the right hand side of (2.3.4) rather than M_{j k_j} is inefficient and would make computing costs prohibitively large.

By the methodology developed in this section, it is now possible to compute the size (with precision δ) of any test for which the null power function is of the form (2.1.1). These computations will then shed light on the extent of conservativeness or liberalness of the tests that are used in the elimination of a nuisance parameter in the present context.
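The bounding and subdivision scheme of Sections 2.2 and 2.3 can be sketched in a few lines. The Python fragment below is a simplified rendering, not the dissertation's implementation (the actual FORTRAN programs appear in Appendix C): it applies the term-by-term substitutions (2.2.2) and (2.2.3) to bound |π'(p)| on a subinterval and, rather than selecting the smallest m_j for each flagged interval, simply keeps halving any subinterval whose mean-value-theorem bound still exceeds the running maximum by more than δ. The toy critical region at the end is invented for illustration.

    from math import comb

    def term(p, r, s):
        """h(p) = p**r * (1-p)**(s-r), the building block of pi(p) and pi'(p)."""
        return p ** r * (1 - p) ** (s - r)

    def sup_term(a, b, r, s):
        """Supremum of h over (a, b): h is unimodal with mode r/s (eq. 2.2.2)."""
        if s == 0:
            return 1.0
        return term(min(max(r / s, a), b), r, s)

    def inf_term(a, b, r, s):
        """Infimum of h over (a, b) (eq. 2.2.3)."""
        return min(term(a, r, s), term(b, r, s))

    def deriv_bound(terms, a, b):
        """Local bound M for |pi'(p)| on (a, b): substitute sup for the positive
        parts and inf for the negative parts of (2.2.1), then reverse, and take
        the larger of the two bounds in absolute value."""
        up, lo = 0.0, 0.0
        for ai, bi, ci in terms:
            s = bi + ci - 1
            if bi:
                up += ai * bi * sup_term(a, b, bi - 1, s)
                lo += ai * bi * inf_term(a, b, bi - 1, s)
            if ci:
                up -= ai * ci * inf_term(a, b, bi, s)
                lo -= ai * ci * sup_term(a, b, bi, s)
        return max(abs(up), abs(lo))

    def size_upper_bound(terms, delta=1e-3):
        """Upper bound, within delta, for sup pi(p) with pi of the form (2.1.1)."""
        def pi_of(p):
            return sum(a * p ** b * (1 - p) ** c for a, b, c in terms)
        intervals = [(k / 100.0, (k + 1) / 100.0) for k in range(100)]
        best = 0.0
        while intervals:
            a, b = intervals.pop()
            mid, half = (a + b) / 2.0, (b - a) / 2.0
            val = pi_of(mid)
            best = max(best, val)
            # mean value theorem: pi(p) <= pi(mid) + half * M on (a, b)
            if val + half * deriv_bound(terms, a, b) > best + delta:
                intervals += [(a, mid), (mid, b)]   # may still hide a larger value
        return best + delta

    # toy critical region {y - x >= 4} for two independent Binomial(5, p) counts
    toy = [(comb(5, x) * comb(5, y), x + y, 10 - x - y)
           for x in range(6) for y in range(6) if y - x >= 4]
    print(size_upper_bound(toy))   # within delta of sup pi(p), here about 0.0107

The value returned is an upper bound that is at most δ above sup_p π(p), which is the precision-δ guarantee quoted for the sizes tabulated in Appendix A.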
2.4 Stability of the Null Power Function

In the ideal case of the absence of a nuisance parameter, the null power function is constant over the null parameter space. However, when a nuisance parameter is present, the magnitude of its effect on the null power function π(p) can be of interest. A relevant feature is the stability or flatness of π(p). An indication of this stability can be created using

    \inf \{ \pi(p) ;\; p \in (p_a, p_b) \}                                 (2.4.1)

where p_a and p_b are chosen appropriately for each problem. The difference between sup_p π(p) and (2.4.1) is then an indicator of this stability. The whole unit interval is not used in (2.4.1) because π(0) = π(1) = 0. A lower bound for the set in (2.4.1) can be computed from (2.3.1) by

    \min \{ \pi(p_j) - .005 \, M_j ;\; j = j_a(1) j_b \}

where j_a = int(100 p_a) + 1, j_b = int(100 p_b) + 1 and int(.) is the integer function, p_j being as in (2.3.1).

2.5 Choice of the Test Statistic

The choice of the testing procedure is quite arbitrary since the goal here is to compare unconditional to conditional tests. The test statistic that is derived from a testing procedure will simply be a means of dividing the sample space S into a critical region C and an acceptance region S−C. The choice should then be based on procedures that produce test statistics that are powerful, simple to compute, and intuitively appealing. Three such procedures are the likelihood ratio test criterion, the chi-square goodness-of-fit test and a Z-test based on the asymptotically standardized maximum likelihood estimator, often a function of the sufficient statistic, of the parameter being tested.

The likelihood ratio criterion is given by

    R = \frac{\sup_{H_0} L(\theta)}{\sup L(\theta)}

where L(θ) is the likelihood function of the sample. If large sample theory applies, the more convenient equivalent statistic

    X^2 = -2 \log(R)                                                       (2.5.1)

can be used because of its limiting chi-square null distribution. The chi-square goodness-of-fit test, based on the statistic

    X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} ,                               (2.5.2)

is especially apropos because of its applicability to multinomial data, which lead to null power functions of the form (2.1.1). The asymptotic null distribution of (2.5.2) is also chi-square. The third test is based on the asymptotic normality of \hat{\theta}_n, the maximum likelihood estimator, if one exists, of the parameter, call it θ, being tested. The statistic

    Z = \frac{\hat{\theta}_n - \theta}{s(\hat{\theta}_n)} ,                (2.5.3)

where s^2(\hat{\theta}_n) is the asymptotic variance of \hat{\theta}_n, or a consistent estimator thereof, has a standard normal asymptotic null distribution under the regularity conditions of likelihood theory.

For the two problems considered in Chapters 3 and 4, the maximum likelihood estimator is a function of the sufficient statistic. This statistic and the chi-square goodness-of-fit statistic are the most appealing since (2.5.1) could be computationally laborious. A further advantage of using the test statistics (2.5.1), (2.5.2) and (2.5.3) is their well-known and well-tabulated asymptotic distributions. These tables can be used to find reasonable starting points for the critical values. Moreover, these values can be compared to the percentage points of the asymptotic distributions and thus provide a study of the accuracy of the large sample approximations.

CHAPTER 3
THE 2x2 TABLE FOR INDEPENDENT PROPORTIONS

3.1 Introduction

A classical problem is that of comparing two proportions from independent samples. This seemingly simple problem, which involves only four numbers, has generated a large amount of literature and has been the subject of much controversy about the use of conditional tests.
Since Fisher (1935) proposed the "exact" test, Barnard (1947) and Pearson (1947) started a conflict that has not yet been resolved, as can be seen in the recent articles by Berkson (1978), Barnard (1979), Basu (1979), Corsten and de Kroon (1979) and Kempthorne (1979). Because of the complexity of the power function, only partial attempts have been made in order to resolve the argument. The statement of the problem follows.

Let X and Y be independent binomial random variables with parameters (n, p1) and (n, p2) respectively. An experiment that compares p1 and p2 is called a 2x2 comparative trial by Barnard (1947), the outcome of which is represented in the form of Table 3.1.

Table 3.1

          S        F        totals
    T1    x        n-x      n
    T2    y        n-y      n

where p_i = P(S|T_i) = 1 − P(F|T_i), i = 1, 2. The labels S and F represent the binary outcomes (S = success, F = failure) and T1 and T2 represent the two populations being compared. The problem is to test, at level α, the null hypothesis H0: p1 = p2 against the alternative hypothesis Ha: p1 < p2. Since the alternative Ha: p1 > p2 and the two-sided alternative Ha: p1 ≠ p2 are treated in a similar manner, only this one-sided case will be considered here. Furthermore, only the case of equal sample sizes will be considered because of its optimality under equal sampling costs (Lehmann, 1959:146).

The probability of observing the outcome in Table 3.1 is

    P(X=x, Y=y) = \binom{n}{x} p_1^{x} (1-p_1)^{n-x} \binom{n}{y} p_2^{y} (1-p_2)^{n-y}

and is, under the null hypothesis H0: p1 = p2 (= p, say),

    P(X=x, Y=y) = \binom{n}{x} \binom{n}{y} p^{x+y} (1-p)^{2n-x-y} ,

a function of the nuisance parameter p, the unspecified common value of p1 and p2 under H0.

Because of this dependence on a nuisance parameter, either approximate tests based on asymptotic results or exact conditional tests are used. Few attempts, however, have been made to compute the exact unconditional size of any of these tests. Barnard (1947) proposed an unconditional test based on sup π(p), the size. The criterion that he suggested was intricate and no methodology for computing the size was given. McDonald, Davis and Milliken (1977) tabulated critical regions based on the unconditional size of Fisher's exact test for n ≤ 15 and α=.01 and .05. Again, no formal methodology for computing the size was given. Furthermore, no sample size tables or power calculations based on an exact unconditional test exist.

In this chapter, the most common tests are presented. For the asymptotic case, two normal tests and some sample size formulae are given. For the general case, and in particular for small sample sizes, two derivations that both lead to Fisher's exact test are presented. As an alternative to these tests, the results of Chapter 2 are used to compute and tabulate the exact unconditional size of two simple statistics as well as the required sample sizes for a significance level of α=.05 and a power of 1−β=.80. It is also shown that these tests are uniformly more powerful than Fisher's exact test in the range considered, namely α=.025, α=.05 and n=10(1)150.

3.2 Asymptotic Tests and Sample Size Formulae

A way of circumventing the effect of the nuisance parameter is through the use of asymptotic tests. These approximate tests are appealing because they are usually based on simple test statistics for which the limiting distributions are well tabulated. They are, however, approximations and should not be used when the sample sizes are small. When n is relatively large, the most widely used tests for the hypothesis of interest are the normal tests.
The first one, based on the inversion of the asymptotic confidence interval for P2-Pi/ is tne Z-test with an unpooled estimator of the variance and is given by = ^ (P2 - Pl} (3.2.2) (p2q2 + Pi<3i> where p1 = x/n = l-q1, p2 = y/n = l-q2, with x, y and n as in Table 3.1. The second one, based on the asymptotic null distribution of P2~P1f is the Z-test with a pooled variance estimator and is given by /n (P, - Pi) Z = - — (3 2--21 P ( 2 p q >% (3*2'2) where p1 and p2 are as in (3.1.1) and p = (x+y)/2n - 1-q. The limiting distribution of both Z and Z is the standard . u p normal distribution and an approximate test of size a is based on the percentage points of $, the standard normal 26 distribution function. The test statistic Z is most P frequently used through an equivalent test statistic, the chi-square goodness of fit statistic given by Xp=Zp (3.2.3) 2 which has a X.. limiting distribution. Because this chi- square test deals with two-sided alternatives, the statistic Z is preferred in the present context of a one-sided test. The accuracy of the approximation was studied by various authors. The nominal significance level a was compared to the actual significance level by computing (3.4.3) for some values of p. Between the papers by Pearson(1947) and Berkson (1978), numerous studies have shown that for Z_, the actual level could be larger than the nominal level for some values of p, making this test a liberal one. To determine the sample size required in each group, two formulae are often used. The first one, based on Z and derived in Fleiss (1980) , determines the sample size in each group by [ z Up^)* - z, (p ,q +p q )h ]2 n = - ^ -^± — £_J (3.2.4) (P2-pl)2 where z is the upper 100y percentile of the standard normal distribution, a and 8 are the type I and type II error 27 probabilities, p. and p2 are the desired alternatives, arid P = (Pi+Pt)/^ = 1~P- The second formula, based on the variance stabilizing property of the arcsine transformation on proportions, is given in Cochran and Cox (1957) by n = (za + z3) (3.2.5) aS -1 -1 2 2 (sin /p, - sin /p_) Other formulae have been derived and are mostly corrected versions of (3.2.4). Kramer and Greenhouse (1959), arguing that the test based on Z was too liberal, adjusted (3.2.4) and found { 1+[1+8(P2-Pl)/Vi5 }2 , (3.2.6) nc np 2 4(p. - p, ) where n is the sample size found from (3.2.4). More recent- XT ly, namely since sample size tables based on the exact conditional test were computed, a further adjustment to (3.2.4) was suggested by Casagrande, Pike and Smith (1978a) in order to arrive closer to results based on the exact conditional test. They proposed the formula { l+tl+Mp^/np]* }2 \ {32.7) nr np 2 4(P2 - Px) 28 the derivation of which was based on a slight deviation from the derivation of (3.2.6). 3.3 Fisher's Exact Test The "exact" method of eliminating the nuisance parameter is based on a conditional argument and can be obtained via two different approaches. The first approach, put forward by Fisher (1935), is that of a permutation test. The permu- tation test argument is a conditional one in that the critical region is1 constructed on a space conditional on some information from the data. Fisher argues that, because the marginal totals of Table 3.1 alone do not supply any information about the equality of p, and p2, it is reasonable to test conditionally. 
Thus, given x+y and under HQ:p,=p2, the probability of Table 3.1 is given by the hypergeometric distribution, namely (n) (n) x y P(x,y|x+y) = • (3.3.1) (2n ) x+y' This is Fisher's exact test, the size of which is based on the tail areas of (3.3.1). The second approach is based on the Neyman-Pearson lemma for. testing hypotheses, a thorough treatment of which is given in Lehmann (1959:134) in the case for which a nuisance parameter is present. In the current case, the probability of Table 3 . 1 is given by 29 p(x,y) = 0 Px d-Pl)n-x (y} 4 d-p2)n-y (x)(y} (1-Pl)n (l-p,)n x exp{x log[p1/(l-p1)]+y log[p2/ (l-p2) ] } = (x)(y} (1-Pl)n (l-p2)n x exp{x logL /(1_p )j +(x+y)log[p2/(l-p2)]}. By Lemma 2 of Lehmann (1959:139), the uniformly most powerful unbiased (UMPU) level a test for comparing p, and p2 is based on the conditional distribution of X(=x) given T=X+Y(=t) and has the form (x,t) =1 when x < C(t) = y (t) when x = C(t) = 0 when x > C(t) where C and y are determined by E„ [ (j»(X,T) |T=t ] = a H0 for all t, that is 30 a = P [X px t ,n» , n v 2, U t-U p U=0 Pl/(1-Pl) where p = is the odds ratio, P2/(l-p2) and. under Hn, the distribution is given by (n) ( n ) PH (X=x|T=t) = , x=0,l,...,t, the same hypergeometric distribution found by Fisher's permutation method. Here, C(t) is taken to be the largest value such that (n) ( n V C(t)-1 V ^t-x; Z £ a. x=0 .2n. In practice, the nonrandomized version of <{> is used, that is $ without the random element y(t) . Therefore, the 3i conditional test always has size ^ a and, unlike , is not UMPU of level a. 3.4 An Exact Unconditional Test In this section, the methodology of Chapter 2 is used to compute the size of Z , the normal test statistic with unpooled variance estimator. This statistic was chosen on the basis of its computational simplicity and its intuitively appealing form. It is given by •n (p2 - px) >2q2 + Pxqr Z„ = = i — 5- (3.4.1) where p, = x/n = 1-q-, / P2 = Y/n = I'^o ' with x, y and n as in Table 3.1. The asymptotic null distribution (the standard normal) of Z is frequently used in this problem to approximate its actual size. The results of this chapter can thus be used to verify the accuracy of this approximation. Since x and y are outcomes of independent binomial random variables with parameters (n,p,) and (n,p2) respective- ly •■/. the power function of any test is given by n(p,,P2) = E E (x} px (l-p,)n"X Vp* (l-p-)n"y 1 Z (x,y)eC L L 2 2 (3.4.2) and under HQ:p,=p~ (=p say), the null power function, also 32 denoted by tt, is given by ir(p) = E E (x)(y} pX+y (l-p)2""^ (3.4.3) (x,y)eC where p is the nuisance parameter and C is the critical region defined by the test statistic. For the one-sided test of interest, the critical region defined by Z is given by C = {(x,y): Zu > zu; x,y=0(l)nf z^O }. (3.4.4) For an a level test, the critical value of Z , namely z , u' J u satisfies the equation * zu = inf {zu: sup tt (p) £ a }. (3.4.5) P Since (3.4.3) has the form of (2.1.1), the methodology of * Chapter 2 can be utilized to find a , a value at most 6 above sup tt (p) in (3.4.5) . First, to simplify the computations, (3.4.3) can be reduced to a single summation by solving the inequality zu > zux namely (-y(n-y) + x(n-x)^ y - x > z„ . (3,4.6) After squaring both sides of (3.4.6), the larger root for y in 33 n (y-x) = z2 [y(n-y) + x(n-x)] is found to be y = h(x) = 5 t (b2-^ 2a 2 where a = 1 + z /n , u b = 2x + z2 , ' ■ - 2 2 and c = ax - xz Hence (3.4.4) reduces to? C = {(x,y): y > h(x) ; x,y=0(l)n }. .. 
(3.4.7) Next, (3.4.3) can be written as w(P) -I E (x"y' px+y (l-p)2"-^ x=0 y>h(x) = Z (x} px (l-p)n-x Z (y> Py (l-p)^ x=0 y>h(x) v Z f(x) [ 1 - F(h(x)) ] (3.4.8) x=0 2 2 where v = int[ n /(n+z ) ], int[.] is the integer function, f (.) is the binomial probability mass function with parameters (n»p) and F(.) is its cummulative distribution function. 34 The derivative of ir (p) , which can also be reduced significantly to simplify the computations, is given by ir'(p) = E # (x+y) p^"1 (l-P)2n-x-y ,nx ,n C ,nx ,n -()()/ . \ x+y , , x 2n-x-y-l - E x y (n-x+n-y) p 2 (1-p) C where E denotes the double summation E E C (x,y)eC It can be rewritten as ./ x «. (n~h (n) x+y-1 ,, ,2n-x-y ir'(p) = E n vx-l' vy' p ■* (1-p) x ■ C C (n_1) (n) x+y ,, , 2n-x-y-l - E n x y p 2 (1-p) -1 C ( ) ( ) x+y ,, x 2n-x-y-l /r) . ft« - E n x' y p 2 (1-p) J , (3.4.9) C (n-l} (n-l} where v -V = v n ' = 0 Consider the boundary of the critical region C defined by W .'= {(x,y) : (x,y)eC and (x+l,y)£C }. The sum of the first and third terms Of (3.4.9) becomes, after cancellation of opposing signed identical contributions, 35 Tr-(p) = - I n (n;1)(y} px+y (l-p)211"^'1 . (3.4.10) 1 W The other boundary of C is defined by V = { (x,y) : (x,y)eC and (x,y-l)jzfc }. Then the sum of the second and fourth terms of (3.4.9) becomes ir'(p) = En (x)(?:i} p***"1 (l-p)2n-X^ . (3.4.11) V Upon combining (3.4.10) and (3.4.11), the derivative of "rr (p) is given by it- (p) = ir£(p) + ir^(p) , . and can be further reduced by noticing, from (3.4.4) and (3.4.6) , that (x0,y0)eC iff (n-y0,n-x0)eC so that (x,,y,)eV iff (n-y^n-x^ eC and (n-y1+l,n-x1) £C iff (n-y, ,n-x, ) eW. 36 The derivative of it (p) can finally be written as ir'(p) = En [<5)(y"l) p^'1 (l-p)211"^ V (n-l} ( n ) 2n-x-y ,, ,x+y-l , n-y' n-x p J (1-p) ] v (n) (n"h r x+y-1,, , 2n-x-y E n vx y-1 [ p J (1-p) V 2n-x-y (l-pjW-1 ] f - P a summation over the set of sample points that form a boundary of the critical region C and which are directly obtained from the reduction (3.4.7) . The methodology developed in Chapter 2 can now be * utilized to find a , the size (of precision 6=.001) of Z for any value z . For a test of significance of level a, * the critical value z can then be obtained by equation (3.4.5). This is done by using the 100a percentile point of the standard normal distribution as a starting of z . This * value is then incremented or decremented until a < a and * z is taken as the smallest value which satisfies this inequality. This procedure was implemented in a FORTRAN computer program, listed in Appendix C.l. * For n=10 (1)150 and a=.05 and .025, z , the exact * critical values and a , the size (of precision <5=.001) of Z , were computed and are given in Table A.l. Furthermore, Table A.l also contains a,, a lower bound for {ir(p); .05 0 =0 if u = 0 = -1 if u < 0 . Robbins (1977) has noted that |Zu| > |z | for the equal sample size case and has posed the question as to which of Z or Z is more powerful. This question was 40 investigated by Eberhardt and Fligner (1977) . They noticed, via a computational argument, that the increase in the significance level for Z is compensated fairly well by an increase in power. Moreover, they suggested that Z should not be used for small samples because Z is closer to a standard normal random variable . In view of the relation (3.5.5), Z and Z are monotonic increasing functions of each other and are therefore equivalent in the sense that Z , with some nominal significance level a, is equivalent to Z with some lower level a. 
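To fix notation, the correspondence between the N matched pairs and the cell counts {u, x, y, v} of Table 4.1 can be written out directly: u counts pairs with (R=0, S=0), x with (R=0, S=1), y with (R=1, S=0) and v with (R=1, S=1), so that x + y is the number of discordant pairs used by the tests discussed below. A small Python sketch with invented data (the dissertation's own programs, in Appendix C, are written in FORTRAN):

    from collections import Counter

    def table_4_1(pairs):
        """Cross-tabulate matched pairs (r, s) of binary outcomes into the cell
        counts of Table 4.1: u = #(0,0), x = #(0,1), y = #(1,0), v = #(1,1)."""
        c = Counter(pairs)
        return c[(0, 0)], c[(0, 1)], c[(1, 0)], c[(1, 1)]

    # invented data: responses (r, s) of N = 8 pairs under regimens R and S
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 1), (0, 0), (1, 1), (0, 1)]
    u, x, y, v = table_4_1(pairs)
    print(u, x, y, v, u + x + y + v)   # last value is N; x + y is the discordant count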
Thus, for the same nominal level a, Z will reject H- more often than Z will. 3.6 Power and Sample Sizes. Given the critical values of Table A.l, it is now possible to compute the exact power by (3.4.2) for a=.025 and a=. 05 and various values of p. and p2« The minimum sample size required per group to attain a power of 1-g and significance level of a can thus be computed by solving the equation n = min {n: II (p,,p2) >l-g} where the critical region that defines K (p. ,p2) is based on * z , a function of n. u This equation was solved for a=.05 and l-&=.80 and the results are given in Table A. 2 for various combinations of * p, and p-. Table A. 2 also contains the critical values ?; , 41 * the size a (of precision 6=.001) and the attained power * 1-6 . This table is thus sufficient for both the design and analysis of the 2x2 comparative trial. Table A. 3 compares the results of Table A. 2 to the exact conditional test sample sizes [n ] found in Gail and Gart (1973) , Haseman (1978) and Casagrande, Pike, and Smith (1978b). Furthermore , the approximate formulae given in * section 3.2 are also computed and compared to n and n in Table A. 3. For the configurations considered, it is * seen that n tend to be smaller than n , the sample sizes determined by Fisher's exact test. Furthermore, the sample sizes based on the arcsine formula [n 1 and those based on as Z , the pooled Z-test [n ] , tend to co-agree quite well and * to be, in general, slightly smaller than n . The other formulae discussed in section 3.2, namely n„ and n_ are c r * seen to exceed n and n^. e A direct comparision of the critical regions defined by Fisher's exact test and the exact Z-tests was performed numer- ically. It showed that, for all the cases considered, the critical region defined by Fisher's exact test is contained in the critical region defined by the exact Z-tests. Therefore, the exact Z-tests are uniformly more powerful than Fisher's exact test for the cases n=10 (1)150 and a=.05 and .025. CHAPTER 4 THE 2*2 TABLE FOR CORRELATED PROPORTIONS 4 . 1 Introduction When the dichotomous responses for each of two regimens are sampled in pairs, either by measuring the same experimen- tal unit under each regimen or by pairing experimental units with respect to some common characteristic, the problem of comparing the success rate of these two regimens involves two correlated proportions. Prior to 1947, this type of data was incorrectly analyzed as if they were independent binomial samples. McNemar (1947) derived the variance of the differ- ence between two correlated binary random variables under the null hypothesis of equal success rate and consequently, using an asymptotic approach, derived the well-known "McNemar' s test". This problem, like the independent binomial case, falls into the realm of testing a hypothesis in the presence of a nuisance parameter. Analogous to the independent binomial case (Chapter 3) , the most common methods of tack- ling this problem are based on asymptotic approximations or on the conditional approach. The problem is formulated as follows. 42 43 Let (R,S) represent a pair of binary random variables with joint distribution P(R=i,S=j) = pi. , i,j=0,l, ZZ p±. = 1. ij J The outcome of a random sample of N such matched pairs is usually displayed in the form of a 2x2 contingency table such as Table 4.1, Table 4 1 S 0 l totals u X u+x y V y+v totals u+y x+v N where {u,x,y,v} are the frequencies. 
The problem is to test, at level a, the null hypothesis HQ:P(R=1) = P(S=1) against one of the alternative hypotheses H :P(R=1) < P(S=1), H :P(R=1) > P(S=1) or H :P(R=1) ^ P (S=l) . a a a For the sake of illustration, only the alternative hypothesis H :P(R=1) < P(S=1) will be considered. Note that a. P(R=1) = P(R=1,S=0) + P(R=1,S=1) and P(S=1) = P(R=0,S=1) + P(R=1,S=1) 44 so that the problem becomes that of testing Hn:pnl=p1Q against H :p 1>p.fl. The likelihood of the sample is given by ( N ) P(u,x,y,v) = {u x y v' pjjg p^ p^0 pi;L the quadrinomial distribution with probabilities {p.. ; i,j=0,l}. Under the null hypothesis ho:Pqi=Pio (=P saY) i the likelihood of the sample becomes r> / \ ( ) u x+y , , o \ v PH (u,x,y,v) = 'u x y v' PQ0 P J (l-p0Q-2p) , a function of the unspecified common proportion p and an unknown probability P00^ The problem was first tackled by McNemar (1947) who used the asymptotic approach of the standardized sufficient statistic. Cochran (1950) , by an intuitive argument, reduced the problem to a sign test, which is the exact conditional test obtained by the Neyman-Pearson approach to the elimina- tion of nuisance parameters. Bennett (1967) has computed the chi-square goodness-of-fit test statistic and observed that it coincides with McNemar 's test. A point to note about these asymptotic tests is that they are also conditional in the sense that they only involve x and y, and not N. It turns out that they simply evolve from the asymptotic null distribution of the exact conditional test. No attempts have been made to compute the size of any of these tests, although Bennett and Underwood (1970) , in assessing the adequacy of 45 McNemar's test against its continuity-corrected form, have computed their null power functions for three values of the nuisance parameter p, namely p=.10, .50 and .90. Beyond this investigation, researchers have completely relied upon these asymptotic approximations and the conditional test. It is surprising that conditional tests were not contested in this problem, in light of the fact that they are solely based on the number of discordant pairs x and y, and not at all on the number of concordant pairs u and v. That these tests do not involve N could be disturbing. Lehmann (1959:147), discuss- ing in the context of the sign test with ties, has hinted that N enters the picture through the parameter p0Q when the unconditional power is computed. Approximate power calculations and derivations of sample size formulae were made by Miettinen (1968). Bennett and Underwood (1970) compared the exact and approximate powers of McNemar's test and its continuity-corrected form for alterna- tives close to the null state. Schork and Williams (1980) tabulated the required sample sizes based on the exact power function of the conditional test. In this chapter, McNemar's test and other asymptotic- type tests will be presented. The approximate sample size formulae will also be given. The exact conditional test will be derived via the Neyman-Pearson approach. The results of Chapter 2 will then be used in section 4.4 to compute and tabulate the size of McNemar's test for the one-sided case. 46 In section 4.5, the exact unconditional critical values obtained in 4.4 will be used to tabulate the required sample sizes for a significance level of a=.05 and a power of 1-3=- 80. It is also shown that this exact unconditional test is uniformly more powerful than the exact conditional sign test for the cases considered, namely a=.05, a=.025 and N=10 (1)200. 4.2. 
McNemar's Test, other Asymptotic Tests and Sample Size Formulae \ McNemar (1947) derived the mean and variance of S-R (as defined in Table 4.1) under the null hypothesis and thus proposed the asymptotic test statistic 2 (x - y)2 * = (4.2.1) x +; y for the two-sided alternative. This statistic has an asymp- 2 totic X^ null distribution. Cochran (1950) reduced the problem to a sign test, using the statistic j (x - hn) 2 (y - %n) 2 XZ = + %n ^n (x - y)2 x + y where n=x+y, the total number of discordant pairs. Bennett (1967) used the chi-square goodness-of-f it test, applied to 47 the quadrinomial frequencies of Table 4.1, to find (u- Np0Q)2 (x - NpQ1)2 (y - Np1Q)2 x = + + NPnn Npnl Np (v - NPll) 00 "^01 "F10 2 + Np 11 (x - y)2 x + y where the P-;^,s are the maximum likelihood estimators of the Pj^'s under HQ. The three methods lead to the same test statistic, namely McNemar' s, and therefore have the same asymptotic null distribution. To determine the required sample size, Miettinen (1968) derived two formulas. The first one, based on an approxima- 2 tion to the asymptotic unconditional power function of X (McNemar's test statistic) gives, for a one-sided test of significance level a and power 1-&, the required sample size as { zj + z (ij;2 - A2)*5 }2 Na = 2 § (4.2.2) 1 2 where ^=Poi+Pio' A=p10~p01 and zy ^s the uPPer 10°Y percen- tile of the standard normal distribution. The second formula 48 is based on a more precise approximation to the asymptotic 2 unconditional power function of X and is given by { z Jf + zR [^2 - %A2(3+i|;)]Js }2 N = 2 5 . (4.2.3) 2 2 For the purpose of comparision with exact conditional and exact unconditional results, these formulas were computed and are given in Table A. 6. These comparisions are discussed in section 4.5. 4 . 3 The Exact Conditional Test The exact conditional test is obtained by the Neyman- Pearson approach described in Lehmann (1959) . The probability of the sample is given by ( N ) x P(u,x,y,v) = u x y v; pj}0.P01 P^0 P^ and can be written in the exponential family form as ( N ) P(u,x,y,v) = vu x y v1 exp{ u log(p0Q) + x log(pQ1) + (N-u-x-v) log(p1Q) + v log(pi:L)} It can be reparametrized as 49 ( N ) P(u,x,y,v) = vu x y v' exp{ u log(p00/p1()) + x log(p01/p10) + v log(Pll/p10) + N log(p10) }. The new parameters are, in the notation of Lehmann (1959) , e "■- iog(p01/p10) v = ( log(p00/plp) , log(pi;L/p10) ), and the hypothesis to be tested becomes Hn:6=0 against 0 Ha:6>0. The sufficient statistics" are X = Z (l-R.)S. i-1 X X T= (U,V) = ( Z (1-R. ) (1-S.) , Z R. S. ). i=l x X i=l X X Therefore the UMPU test is given by (x,t) = 1 when x > C(t) = y ( t) when x = C(t) = 0 when x < C(t) where C and y are such that 50 E { <|>(X,U,V) | U=u, V=v } = a , all u,v. "o To find this conditional expectation, first notice that the distribution of (U,V) is P(u.v) = p", pjx d-P00-Pll)N-U-V so that the distribution of X given U=u and V=v is P(x|u,v) = px (l-p)n-x where n=x+y and p = pQ1/ (P01+P1()) ' Therefore, the null hypothesis HQ:p01=p10 reduces to HQ:p=%, the usual sign test problem based only on n, the total number of discordant pairs. Because this conditional distribution of X is discrete, the test <{> needs the randomization element y to become UMPU of level a. However, since the practice of using y is rare, the test without randomization will be a conservative one and not UMPU of level a. 
51 4.4 An Exact Unconditional Test As in the case of two independent proportions, the choice of the test statistic is based on the standardization of the sufficient statistic for the parameter being tested, namely Pqi'Pio' Tnis statistic is the square root of McNemar's test statistic and is given by x - y Zc = r (4.4.1) . (x+yp where x and y are as in Table 4.1. This statistic is often written in terms of n (=x+y) , the total number of discordant pairs, as x - %n Z„ = , (4.4.2) c Jj/ti the approximation to the sign test, referred to the standard normal distribution. In this section, the methodology of Chapter 2 is used to compute the size of Z and the exact critical values based on Z . These values will then provide a means of assessing the accuracy of the normal approximation. The power function of Z is given by ^Poi^io1 = (xyeC l (u x y v> Poo Poi Pio pIi ' a function only of pQ, and p,Q since it is based on the marginal distribution of (X,Y) which is obtained as 52 p p»o poi pjo <>i~ry-u - P10j the critical 53 region C defined by Z is given by C = {(x,n): Zc > zc; x=0(l)n, n=0(l)N, z >0}. (4.4.5) * For an a level test, the critical value of Z , namely z y satisfies the equation z = inf {z : sup it (p) < a} . (4.4.6) Note that ir (p) , as defined in (4.4.4), is a function of p as well as of z through (4.4.5). Since (4.4.4) has the form of (2.1.1), the methodology developed in Chapter 2 can be * utilized to solve (4.4.6) and thus find a , a value at most 6 above sup tt (p) . First to simplify the computation of (4.4.4), notice that the inequality Z > z is in fact x > Jjz /n + %n so that the critical region C reduces to C = {(x,n): x > h(n); x=0(l)n, n=0(l)N} , where h(n) = %{z /n + n}. The null power function (4.4.4) becomes 54 ir(p) = -E S V'x' %n pn (l-p)N"n n=0 x>h(n) I n p (1-p) E x % n=0 x>h(n) E (n} pn (l-p)N"n [1 - F (i )■] n=k n n where k = int[zc + 1], in= int[h(n)], int[.] is the integer function and F_(.) is the binomial cumulative distribution with parameters (n,%) . Notice that, since i > Jjn, it is more efficient to compute n ,n, E (x> hn x=i +1 n instead of 1-Fn(in). Then, by the symmetry of the binomial distribution with (n,^), the null power function of Z can be rewritten as N ,N. ir(p) = E V pn (l-p)N"n F (n-i -1), (4.4.7) n=k n n in the form of (2.1.1). The derivative of the null power function is 55 tt'(p) = I F (n-in-l) V [ n pn_i (l-p)N~n n=k n n (N-n) p (1-p) ] so that the methodology developed in Chapter 2 can now be * utilized to find a , the size (of precision S=.001) of Z for any value z . In this problem, the size was taken as sup{iT (p) :0 5/N} and a2, a lower bound for {ir(p): p > 10/N} as indicators of the stability of the null power function. These lower bounds on p are obtained for expected number of discordant pairs of at least 5 and 10 respectively. The null power function, tt (p) , of the exact conditional test, as well as of the exact and approximate unconditional tests, was plotted for some values of N and a nominal level of significance of a=.05. For N=10, Figure B. 7 contains the plot of it (p) based on the normal approximation of Z (critical value z =1.645). It is apparent that using the normal approximation in this case induces a liberal test for that range of the nuisance parameter p where the null power function exceeds the nominal level a=.05, namely • 30.74. Figure B.8 is the plot based on the unconditional critical region defined by the exact conditional test, namely the sign test. 
Here, the test is very conservative, its actual size being approximately .013. * In Figure B.9, the exact unconditional critical value z =1.90 of Table A. 4 is used to plot tt (p) . From these plots, the exact Z-test (Figure B.9) is seen to perform best at approaching the nominal significance level without exceeding it, although, because of the sparsity of its natural levels its size is only .0265. 57 The null power function of the exact unconditional test Z is seen to behave better for larger values of N. For N=30, Figure B.10 is the plot of tt (p) based on z =1.74 * and Figure B.ll is the plot of tt (p) based on z =1.68 and N=40. 4.5 Power and Sample Sizes Now that the exact critical values of Z have been c computed (Table A. 4) , the exact power in (4.4.3) can be readily obtained for a=.025, a=.05, N=10 (1)200 and various values of pQ1 and P10- Consequently, the minimum sample size required to achieve a power of 1-3 and a significance level of a for a combination of (Pqi/Pi0) can be computed by solving the equation N* = min { N: n(p01,p1{)) > 1-3} , (4.5.1) where the critical region that defines n(p01,p,0) ^s based * on z , a function of N. c Because all other sample size results are given in terms of the parameters ijj=pQ1+p10 and A=p,Q-p0, ' equation (4.5.1) was solved in terms of these parameters for the purpose of comparability. For a=.05 and 1-B=. 80, and various combina- tions of ty and A, the minimum sample sizes from (4.5.1) are given in Table A. 5. This table also contains the critical * values z , the size (of precision 6=.001) of Z and the * attained power 1-3 . Therefore, Table A. 5 is sufficient for 58 both the design and the analysis of the 2x2 table for comparing two correlated proportions. * In Table A.6f the exact unconditional sample sizes [N ] of Table A. 5 are compared to the exact conditional sample sizes [N ] found in Schork and Williams (1980) . Furthermore, the approximate formulae derived by Miettinen (1968) , namely N and N of section 4.2, are computed and also compared 12 to N and N in Table A. 6. The exact unconditional sample * sizes N are seen to be smaller than N , the sample sizes based on the exact conditional test, for all except some combinations of ty and A. This seems to happen for larger : values of \|> and A. The approximate sample sizes N and N al a2 are almost equal to each other, much smaller than N and * slightly smaller than N . Because these results suggest that the exact uncondi- tional test might be more powerful than the exact conditional test, the critical regions of each test were compared numerically. This comparision showed that, for all the cases considered, the critical region defined by the exact conditional test (sign test) is contained in the critical region defined by the exact Z-test. Therefore, the exact Z-test is uniformely more powerful than the exact conditional test for the cases considered, namely N=10(l)200, a=.025 and ot=.05. APPENDIX A TABLES These tables contain critical values and sample size determinations for the problems of comparing two independent proportions and of comparing two correlated proportions. For one-sided tests, the tables of critical values are produced for significance levels ot=.05 and .025, and the sample size tables for a level of a=.05 and 80% power. The legend for these tables is given below. Legend for Tables A.l, A. 2 and A. 
3; two independent proportions n = sample size in each group a = nominal significance level a, = lower bound for {ir(p): .05 5/N} a2 = lower bound for {it (p) : p > 10/N} Tr(p)= null power function * z„ = exact one-sided critical values of Z , the Z-test c c A = P1(TP01 * = P10+P01 p01 = p(R=0'S=1) a z Oti a-i a Z 1 2 c 1 .0203 2 .0203 .0251 C ' 130 .0388 .0403 .0499 1.68 1.98 131 .0388 .0403 .0498 1.68 .0203 .0203 .0251 1.98 132 .0387 .0404 .0498 1.68 .0202 .0203 .0251 1.98 133 .0387 .0389 .0499 1.68 .0202 .0203 .0245 2.00 134 .0386 .0397 .0499 1.68 .0202 .0202 .0245 2.00 135 .0386 .0397 .0500 1.67 .0202 .0202 .0245 2.00 136 .0386 .0398 .0499 1.67 .0202 .0202 .0250 1.98 137 .0385 .0398 .0499 1.67 .0202 .0202 .0250 1.98 138 .0385 .0399 .0499 1.67 .0201 .0202 .0250 1.98 139 .0385 .0399 .0499 1.67 .0201 .0201 .0250 1.98 140 .0384 .0400 .0499 1.67 .0199 .0199 .0250 1.98 141 .0383 .0400 .0498 1.67 .0201 .0201 .0250 1.98 142 .0383 .0401 .0497 1.68 .0200 .0201 .0251 1.98 143 .0382 .0401 .0497 1.68 .0200 .0201 .0251 1.98 144 .0382 .0402 .0499 1.67 .0094 .0197 .0241 2.01 145 .0381 .0402 .0499 1.67 .0095 .0197 .0237 2.01 146 .0381 .0403 .0499 1.67 .0200 .0200 .0249 1.99 147 .0380 .0403 .0498 1.67 .0199 .0200 .0246 1.99 148 .0380 .0404 .0498 1.67 .0199 .0200 .0246 1.99 149 .0379 .0395 .0498 1.67 .0199 .0199 .0246 1.99 150 .0379 .0395 .0498 1.67 .0199 .0199 .0246 1.99 151 .0379 .0395 .0498 1.67 .0198 .0199 .0250 1.98 152 .0379 .0396 .0499 1.67 .0198 .0199 .0250 1.98 153 .0378 .0396 .0499 1.67 .0198 .0199 .0250 1.98 154 .0377 .0387 .0498 1.67 .0198 .0198 .0250 1.98 155 .0376 .0387 .0498- 1.67 .0198 .0198 .0250 1.98 156 .0376 .0388 .0499 1.67 .0197 .0197 .0250 1.98 157 .0375 .0388 .0497 1.68 .0197 .0197 .0246 2.00 158 .0374 .0388 .0496 1.68 .0197 .0197 .0246 2.00 159 .0374 .0389 .0500 1.67 .0197 .0197 .0246 2 . 00 160 .0373 .0389 .0498 1.67 .0196 .0197 .0249 1.99 161 .0373 .0389 .0498 1.67 .0196 .0197 .0247 1.99 162 .0372 .0390 .0498 1.67 .0196 .0197 .0246 1.99 163 .0372 .0390 .0498 1.67 .0196 .0196 .0246 1.99 164 .0372 .0390 .0498 1.67 .0195 .0196 .0246 1.99 165 .0371 .0390 .0498 1.67 .0195 .0196 .0246 1.99 166 .0371 .0391 .0498 1.67 .0195 .0196 .0246 1.99 167 .0370 .0391 .0498 1.67 .0195 .0196 .0246 1.99 168 .0370 .0391 .0498 1.67 .0194 .0195 .0246 1.99 169 .0369 .0392 .0498 1.67 .0194 .0195 .0247 1.99 Table A. 
4 — continued 74 a = .05 a = .025 * * * * N al a2 a zc al a2 a zc 170 .0368 .0392 .0498 1.67 .0194 .0195 .0251 1.98 171 .0367 .0393 .0498 1.67 .0194 .0195 .0245 2.00 172 .0367 .0393 .0498 1.68 .0193 .0195 .0245 2.00 173 .0366 .0393 .0498 1.68 .0193 .0195 .0251 1.99 174 .0365 .0382 .0498 1.67 .0193 .0194 .0249 1.99 175 .0365 .0382 .0498 1.67 .0193 .0193 .0247 1.99 176 .0364 .0383 .0498 1.67 .0192 .0193 .0246 1.9.9 177 .0363 .0383 .0498 1.67 .0192 .0193 .0246 1.99 178 .0363 .0383 .0498 1.67 .0192 .0193 .0246 1.99 179 .0362 .0383 .0498 1.67 .0191 .0193 .0246 1.99 180 .0362 .0383 .0498 1.67 .0191 .0193 .0251 1.98 181 .0361 .0384 .0498 1.67 .0191 .0192 .0251 1.98 182 .0361 .0372 .0498 1.67 .0191 .0191 .0247 1.99 183 .0360 .0372 .0498 1.67 .0191 .0191 .0250 1.99 184 .0360 .0372 .0498 1.67 .0190 .0190 .0245 2.00 185 .0360 .0372 .0498 1.67 .0190 .0190 .0250 1.99 186 .0359 .0373 .0498 1.67 .0190 .0190 .0251 1.99 187 .0359 .0373 .0499 1.67 .0190 .0190 .0249 1.99 188 .0359 .0373 .0501 1.67 .0190 .0190 .0247 1.99 189 .0358 .0373 .0497 1.68 .0189 .0189 .0246 1.99 190 .0358 .0373 .0498 1.68 .0189 .0189 .0246 1.99 191 .0358 .0373 .0498 1.68 .0189 .0189 .0246 1.99 192 .0357 .0374 .0500 1.67 .0189 .0189 .0251 1.98 193 .0357 .0374 .0498 1.67 .0189 .0189 .0250 1.98 194 .0357 .0374 .0498 1.67 .0189 .0189 .0250 1.98 195 .0356 .0374 .0498 1.67 .0188 .0188 .0250 1.98 196 .0356 .0374 .0498 1.67 .0188 .0188 .0251 1.98 197 .0355 .0374 .0498 1.67 .0188 .0188 .0250 1.98 198 .0355 .0374 .0498 1.67 .0188 .0188 .0251 1.98 199 .0354 .0374 .0498 1.67 .0188 .0188 .0247 1.99 200 .0354 .0375 .0498 1.67 .0187 .0187 .0247 1.99 Table A. 5 Minimum Sample Sizes to Achieve 80% Power and a <.05 ; for One-sided Z-test for Comparing Two Correlated Proportions. 75 1-e .10 .30 185 1.67 .0498 .8006 .22 134 1.68 .0499 .8014 .14 80 1.69 .0499 .8038 .20 .98 153 1.67 .0499 .8003 .94 146 1.67 .0499 .8014 .90 139 1.67 .0499 .8014 .86 135 1.67 .0500 .8066 .82 129 1.68 .0498 .8018 .78 122 1.68 .0499 ^8018 .74 116 1.68 .0499 .8042 .70 108 1.67 .0499 .8016 .66 103 1.68 .0499 .8026 .62 96 1.67 .0499 .8027 .58 89 1.67 .0500 .8013 .54 82 1.67 .0500 .8025 .50 77 1.67 .0501 .8070 .46 70 1.68 .0499 .8005 .42 67 1.70 .0498 .8219 .38 58 1.68 .0501 .8193 .34 51 1.70 .0500 .8058 .30 44 1.68 .0500 .8186 .26 38 1.74 .0419 .8081 .22 30 1.74 .0450 .8124 .30 .88 63 1.74 .0427 .8029 .84 58 1.68 .0501 .8081 .80 57 1.73 .0501 .8088 .76 52 1.70 .0500 .8033 .72 49 1.70 .0501 .8035 .68 46 1.68 .0501 .8029 .64 44 1.68 .0500 .8117 .60 39 1.68 .0501 .8020 .56 39 1.68 .0501 .8324 .52 35 1.74 .0429 .8003 .48 33 1.74 .0458 .8120 .44 30 1.74 .0450 .8123 .40 26 1.74 .0434 .8075 .36 22 1.74 .0425 .8076 .32 19 1.74 .0399 .8195 .40 .98 39 1.68 .0501 .8209 .94 38 1.74 .0419 .8089 .90 35 1.74 .0429 .8037 .86 34 1.74 .0435 .8046 Table A. 
5 — continued 76 1-6 .40 .82 33 1.74 .0458 .8090 .78 31 1.74 .0407 .8114 .74 29 1.74 .0407 .8117 .70 27 1.74 .0413 .8108 .66 25 1.74 .0494 .8067 .62 23 1.74 .0427 .8003 .58 22 1.74 .0425 .8120 .54 20 1.79 .0395 .8134 .50 17 1.74 .0413 .8002 .46 15 1.81 .0370 .8030 .42 14 1.74 .0371 .8365 .50 .88 21 1.74 .0459 .8029 .84 21 1.74 .0459 .8201 .80 19 1.74 .0399 .8062 .76 18 1.74 .0446 .8049 .72 17 1.74 .0413 .8031 .68 16 1.74 .0454 .8119 .64 14 1.74 .0371 .8020 .60 13 1.74 .0430 .8166 .56 12 1.74 .0373 .8325 .52 11 1.74 .0395 .8517 .60 .98 18 1.74 .0446 .8522 .94 16 1.74 .0454 .8414 .90 16 1.74 .0454 .8525 .86 15 1.81 .0370 .8196 .82 13 1.74 .0430 .8023 .78 12 1.74 .0373 .8084 .74 11 1.74 .0395 .8107 .70 11 1.74 .0395 .8504 .66 11 1.74 .0395 .8935 77 Table A. 6 Comparision of Sample Sizes to Achieve 80! Power and a<.05 for One-sided Tests for Comparing Two Correlated Proportions . A. * N N N * N al a2 e .10 .30 179 180 199 185 .22 127 128 146 134 .14 70 74 94 80 .20 .98 150 150 159 153 .94 143 143 152 146 .90 137 137 147 139 .86 131 131 141 135 .82 125 125 135 129 .78 118 118 129 122 .74 112 112 122 116 .70 106 106 116 108 .66 99 99 110 103 .62 93 93 103 96 .58 86 87 96 89 .54 80 80 90 82 .50 73 74 83 77 .46 67 67 76 70 .42 60 61 70 67 .38 53 54 63 58 .34 46 48 56 51 .30 39 41 50 44 .26 31 33 44 38 .22 22 25 36 30 .30 .88 58 59 65 63 .84 56 56 61 58 .80 53 53 59 57 .76 50 50 56 52 .72 47 47 53 49 .68 44 44 50 46 .64 41 41 47 44 .60 38 38 44 39 .56 35 35 41 39 .52 32 32 38 35 .48 29 29 35 33 .44 25 26 32 30 .40 22 23 30 26 .36 18 20 27 22 .32 14 16 23 19 .40 .98 36 36 42 39 .94 34 35 38 38 .90 33 33 37 35 .86 31 31 35 34 78 Table A. 6 — continued A * N N N N al a2 e .40 .82 29 30 34 33 .78 28 28 33 31 .74 26 26 31 29 .70 24 25 29 27 .66 23 23 27 25 .62 21 21 25 23 .58 19 19 24 22 .54 17 18 22 20 .50 15 16 21 17 .46 13 14 19 15 .42 10 11 17 14 .50 .88 20 20 23 21 .84 19 19 22 21 .80 17 18 21 19 .76 16 16 20 18 .72 15 15 19 17 .68 14 14 18 16 .64 13 13 17 14 .60 11 12 16 13 .56 10 10 14 12 .52 8 9 13 11 .60 .98 15 15 18 18 .94 14 14 17 16 .90 13 13 16 16 .86 13 13 15 15 .82 12 12 15 13 .78 11 11 14 12 .74 10 10 13 11 .70 9 9 12 11 .66 8 8 11 11 APPENDIX B PLOTS OF THE NULL POWER FUNCTION In this appendix, plots of it (p) , the null power function, are given for the two problems considered here. The dotted line represents the nominal significance level on which tt(p) is based. These plots are referred to in section 3.4 for the two independent proportions case, and in section 4.4 for the two correlated proportions case. 79 80 U) c 0 •H 4J k (D 0 u 04 c O (0 u •H 04 H > c fl): •0 n3 0) a rH 5 •H +J > &i +j fi 0) •H 0) H 4J nS | 0u N fa O 4-1 u o n c 0 0 »w •H +J ^^ 0 in C o 3 ' • U-4 II n U - M 3 Cn ■H fa z3_j„i o_o: u_ 3 s: o ►— — o : 81 CO C o ■H +J M 0 a <1) o o M fi a m ■H 4J u c nj o) >,rd c T3 a) CD a rH O U iH ta II X C w w Cn ■H Cm z3_j_j Lazun: )ZO>— —oz 82 si -p •H 5 -P to CD -P -P o rci X 0 CD iH c 0 CD •H O p c 0 to c o 3 ■H U-l IH •H M fl CD cn £ •H O CO a H H (0 ■H c 3 ■H c: e 0 ■p C o N c <1) -P T3 : O c (TJ Qi X CU cu CD T3 CD G .G •H -P 0 iw 5 0 P c C7> 0 c •H ■H p M u (fl C a. 3 g m o o M cu n S 0 O «H. ft rH o H 00 3 • q pH II ■P* Q) 0 N (0 X u w o I- r o M 3 -H =r=D i i ildzljc: .DZOt—OZ 84 II s • en •. G o 0 CN •H II -P c i-l •—' 0 a p 0 (0 u X! 
a Q) o) 13 O c ro CD X Cu 0) CD n CD a 43 •H 4J 0 «W 5 0 -P c Cn 0 G •H •H +» in O (fl G 04 3 e M-l o 0 H d> n S o O M-l a rH n ■-I r» 3 • C H II -P* Qt O N nl X u W 0 cu 3 en -H O.CTUJC :ui-«oz 86 w p C en o (U ■H P -P 1 H ■S3 O a o O CD ■p c us 0 rH •H a; P u o u c 0 3 u m o M 5 0) P £ o tn 04 C •H r-\ u r-H (fl 3 Oj G g 0 P 0 0 • a) en -P a 0 c •H cn-P •H H CO O Q* a) 0 xi M -P Cb 5 0 Cn cu c ■rl rH M ■H m 3 fr- Ci ■6 0 ■p u 0 (0 M X o W4-I Z=3_l_l Q.D31UC U-OSLJt— ■— OZ II 53 ■*"" ' -P w OZ 8 9 I- o II s .. o m II 2 w -P CO a c ■rH H W rH (fl d a c g 0 ■p u o _!_: a-osrujee l.3Zok~o: 90 ■ o 1° o "* II J5 "^ P CO c, -p 0 u •H Id p X n. a) o a cu 0 X! M p Oi «H. "O o QJ p c ZIi_)f— «DS APPENDIX C COMPUTER PROGRAMS The listing of the FORTRAN computer programs used to compute the size of the Z-test for each of the two problems consdered is given in this Appendix. For the case of two independent proportions, Appendix C.l gives the exact p-value for any n=10 (1)150 and any value of the Z-test statistic with unpooled variance estimator. In Appendix C.2, the case of two correlated proportions, the exact p-value for N=10 (1)200 and any value of the Z-test statistic can be obtained. 91 92 APPENDIX C.l HEAL A,B,C.TBY(151) INTEGER X,U,Y,UU,B0UNDYn51) , PHI (151) DOUBLE PRECISION LFAC (15 1) ,P, Q, LP, LQ, PX (151 ) ,FX (151) , ♦PVALDE.DEHBND^ESU-DE^L^COMlPFJAT^pftATi. ' ' l ' ' *PaAXl,|MAX2,PHINl,Pi3IN2,PL,P&,PLl.PL2,PIJ^PU2,SGP,INF, *C1,C2fT.T3,T4,T5,T6,D1.D2,D3,5u,D$,D6'LPX'LFx' *MAXP,MAXIM,MI&IM5,MINl1o DOUBLE PRECISION DLOG, DLOG10, DEXP, DMAX1 ,DMIN1 /DABS WRITE (6 . 40) 40 FORMAT (« »,///, * ' N = SAMPLE SIZE IN EACH GROUP,'/ * ' Z = NORMAL STATISTIC WITH UNPOOLED VARIANCE.'/ * 'ENTER N, Z, IN FREE FORMAT'/) C ."-.'. 99 READ(9,*)N,Z MAXP=0.0 N1=N+1 LFAC(1)=0.D0 DO 1 J=2.N1 1 LFAC(J)=LFAC(J-1)+DLOG(DBLE(FLOAT(J-1))) C C SIMPLIFY THE COMPUTATIONS AND FORM THE C BOUNDARY OF THE CRITICAL REGION C C X=0 A=1 ._ A=1+ (Z**2)/N C=(X**2)*A-(Z**2) *X TBY (X+1) = (B+SQRT < (B**2) - (4*A*C) ) ) / (2*A) IF (TBY(X+1)-N) 3,3,4 I ) ) / \± *> 3 X=X+1 GO TO 2 4 U=X-1 UU=U+1 DO 5 X=1,UG BOUNDY(X) =INT(TBY(X) +1) PHI(X)=INT(TBY(X)) L=X- 1 ' 5 CONTINUE MAXIM=0.D0 MINIM5=1.D0 MINI10=1.D0 19=0 15=0 P=-.005D0 6 IFfP.GT.0.49D0) GO TO 76 72 TFfL.BQ. 1) P=P+.01D0 IFJL.EQ.1) PP=P+.005D0 Q=1-P LP=DLOG(P] LQ=DLOG(Q) DO 7 J=1,N1 X=J-1 T=LFAC (N-H)-LFAC (J) -LFAC (N+1-J+1 ) +X*LP+ (N-X) *LQ IF (T..LT.-180.0) PX(J)=0.D0 l ' W IF (T.GE. -180.0) PX(J)=DEXP(T) 93 IF (X 8,9.8 9 FX(J)=PX(JJ GO TO 10 8 FX(J)=FX(J-1)+PX(J) 10 CONTINUE 7 CONTINUE PVALUE=0 DEEU=0 DERL=0 DO 30 K=1,UU X=K— 1 IF (PX(K) .LE.0.D0) GO TO 11 LPX=DLOG10(PX(K)) IF {(1~FX(PHIJK) + LFX=DLOG10(1-FXJ[P IF (7LPX+LFX) .LT.-7g.b) GO TO 11 PVALOE=PVALOE+PX{K)* (1-FX(PHI{K) +1)) 11 Y=BOUNDY(K) LCOM=LFAC " (N+1)-LFAC(K)-LFAC(N + 1-K + 1) + LFAC(N+1)-LFAC(Y)-LFAC(N+1-Y-1 + 1) C C C LOCAL BOUND FOR THE DERIVATIVE OF C THE NULL POWER FUNCTION C C C1=FLOAT(X+Y-1) C2=FLOAT(2*N-X-Y) PHAT1 = C1/{C1+C2) PHAT2=1-PHAT1 PL=P- (. 005D0/ (2** (L- 1) ) ) PU=P+L005D0/J2**]L-1))) IF (PHAT1.GT.PD) GO TO 12 IF ((PHAT1.LE.PU) .AND. (PHAT1 . GE. PL) ) GO TO 13 IF (PHAT1.LT. PL) GO TO 14 12 PHAX1=PU PMIN1=PL GO TO 15 13 PMAX1=PHAT1 IF (PL.NE.O.DO) GO TO 21 PMIN1=PL GO TO 15 21 PL1=C1*DL0G(PL)+C2*DL0G(1-PL) PU1=C1*DLOGJPU)+C2*DLOG(1-P0) IF (PL1.LT. PU1J — '"- -* IF JPL1.GE.PU1 GO TO 15 14 PMAX1=PL PHIN1=PU 15 CONTINUE IF (PHaT2.GT.PU) GO TO 16 IF (jfPHAT2.LE.PU) .AND. (PHAT2. GE. PL) ) GO TO 17 IF JPHAT2.LT. 
PL) GO TO 1 8 " 16 PMAX2=PU PHIH2=PL GO TO 19 17 PMAX2=PHAT2 IF (PL.NE.O.DO) GO TO 22 PMIN2=PL GO TO 19 22 PL2=C2*DLOG (PL) +C1 *DLOG ( 1-PL) PU2=C2*DLOGJPU5+C1*DLOG(1-PU) IF (PL2.LT. PU2) PKIN2=PL IF JPL2.GT.PU2) PHIN2=PU GO TO 19 18 PMAX2=PL PHIN2=PU 19 CONTINUE T3=LCOi3+Cl*DLOG{paAXl)+C2*DLOG(1-P8AX1) IF (T3.LT. -180.0) D3=0.D0 IF (T3.GE.-180.0[ D3=DEXP(T3) IF (PHIN2.EQ.0.D0) GO TO 23 J)+C2*DLOG( I)+C2*DLOG{' I) PMIN1=PL I) PMIN1=PU 94 T4=LC0M+C2*DL0GfPMIN2) +C1*DL0G (1-PMIN2) IF (T4.LT. -180.0) D4=0.D0 IF JT4.GE. -180.0) D4=DEXP(T4) GO TO 24 23 D4=0.D0 24 IF (PHIN1.EQ.0.D0) GO TO 25 T5=LC0M+C1*DL0G(PMIN1)+C _. _. *--— .,+C2*DLOG(1-PMIN1) IF (T5.LT.-180.d) D5=0.D0 IF JT5.GE. -180.0) D5=DEXP(T5) GO TO 26 25 D5=0.D0 26 CONTINUE T6=LC0M+C2*DL0G{PHAX2)+C1*DL0G(1-PMAX2) IF (T6.LT. -180.0) D6=0.D0 IP (T6.GE. -180.0) D6=DEXP(T6) DERU=DERU+D3-D4 DERL=DERL+D5-D6 30 CONTINUE DERBND=DBAX1 (DABS (DERU) , DABS (DERL) ) C C COMPUTE THE SIZE C C SUP=PVALUE+ (.005D0/(2**(L-1)) ) *DERBND INF=PVALUE-(.005D0/(2**(L-1)))*DERBND HAXP=DMAX1 (MAXP,PVALUE) IF (P.GE.0.1D0) 19=1 IF (P.GE.0.05D0) 15=1 C C CHECK FOB PRECISION OF SIZE, HERE DELTA=.001 C 33 IF (I L=L+1 SOP-MAXP) .LE. 0.001) GO TO 74 P=P-(.005D0/(2**(L-1))) GO TO 72 74 MAXIM=DMAX1 (MAXIM, SUP) IF (I5.EQ.1) MINIK5=DMIN1 (MINIM5,INF) IF (I9.EQ.1) MINI10=DMIN1 (HINI10,INF) IF (L.EQ.1) GO TO 6 IF { (PP-PUJ.LT.0.00001D0} GO TO 75 P=P+].005D0/(2**{L-1)))*2 GO TO 72 75 P=PP-.005D0 GO TO 6 76 CONTINUE HRITEJ6.41) 41 FORMAT]1 ' 'PVALUE FOR COMPARING TWO INDEPENDENT1, * 'PROPORTIONS' /'FOR: N«, " 4X, »Z',4X,' I5:',2X,»MIN5« . 6X. ' MIN10 ' ,5X, ' P7ALUE'/) WRITE (6.33) N,Z,HINIM5, MINI 10, MAXIM FORMAT (' ' ,I4,F8.3,3F11.5) GO TO 99 END 95 APPENDIX C.2 INTEGER T.T1.T2 DOUBLE PRECISION LFAC {20 1 ■*wHipH|#.PL1FPD1fPL2#P02,DLOG#6EXP;DABSjDHAXlJpHINl" 40 FORMAT(* ',///, * 'THIS PROGRAM COMPOTES THE P-VALOE OF THE Z-TEST'/ * 'FOR COMPARING TWO CORRELATED PROPORTIONS FOR'/ * • H = TOTAL NUMBER OF MATCHED PAIRS,'/ * ' Z = NORBAL STATISTIC.'/ * 'ENTER N, Z, IN FREE FORMAT'/) LP5=DLOG(0.5D0) 99 READ {9,*)N,Z N1=N*1 LFAC(1)=0.D0 DO 1 J=2,N1 1 LFAC (J)=LFAC(J-1)+DLOG(DBLE(FLOAT(J-1)) ) C C SIMPLIFY THE COMPUTATIONS C C LN=INT((Z-. 0000001) **2)+1 DO 2 M=LN,N M1=M+1 T=INTJ (Z*SQRT (FLOAT(M) ) +H) /2) T1=T+l T2=T+2 PM=0.D0 AM=0.DO DO 3 J=T1.M C=LFAC(MM)-LFAC{J+1)-LFAC(H-J + 1)+M*LP5 IF (C.LT.-180.) PM=0.D0 IF (C.GE.-180.) PM=DEXP(C) 3 AH=AH+PH 2 LAHJM)=DLOG(AM) MAXP=0.D0 MAXIM=0.D0 MINIM5=1.D0 MINIH9=1.D0 19=0 15=0 P=-.005DO 6 IF (P.GT.0.98) GO TO 76 L=1 72 IF (L.EQ.1) P=P+.01D0 IF (L.EQ.1) PP=P+.005D0 Q=1-P LP=DL0G (P) LQ=DLOG(Q) DERU=0.D0 DERL=0.D0 PVALUE=0.D0 C C C LOCAL BOUND FOR THE DERIVATIVE C OF THE NOLL POSER FUNCTION 96 C C PL=P-(.005D0/(2**(L-1) )) PU=P+(.005D0/(2**(L-1))) DO 4 R=LH..N PHAT1=FL0AT(M-1) /FLOAT (N-1) PHAT2=FLOATJM) /FLOAT (N-1) IF (PHAT1.GT.PU) GO TO 12 IF {7PHAT1.IE.PU) .AND. (PHAT1.GE. PL)) GO TO 13 IF JPHAT1.LT. PL) GO TO 14 12 PMAX1=PU PMIN1=PL GO TO 15 13 PMAX1=PHAT1 IF (PL.NE.O.DO) GO TO 21 PMIN1=PL GO TO 15 21 PL1=(M-1)*DLOG(PL) + (N-M) *DLOG(1-PL) PU1=JM-1) *DLOG (PU) + (N-M) *DLOG (1-PU) IF (PL1.LT.PU1) PMIN1=PL IF JPL1.GE.PU1) PMIN1=PU GO TO 15 14 PMAX1=PL PHIN1=PU 15 CONTINUE IF (PHAT2.GT.PU) GO TO 16 IF ((PHAT2.LE.PU) .AND. (PHAT2.GE. PL)) GO TO 17 IF (PHAT2.LT. 
PL) GO TO 18
   16 PMAX2=PU
      PMIN2=PL
      GO TO 19
   17 PMAX2=PHAT2
      IF (PL.NE.0.D0) GO TO 22
      PMIN2=PL
      GO TO 19
   22 PL2=M*DLOG(PL)+(N-M-1)*DLOG(1-PL)
      PU2=M*DLOG(PU)+(N-M-1)*DLOG(1-PU)
      IF (PL2.LT.PU2) PMIN2=PL
      IF (PL2.GT.PU2) PMIN2=PU
      GO TO 19
   18 PMAX2=PL
      PMIN2=PU
   19 CONTINUE
      LCOEF=LAM(M)+LFAC(N+1)-LFAC(M+1)-LFAC(N-M+1)
      FU1=LCOEF+(M-1)*DLOG(PMAX1)+(N-M)*DLOG(1-PMAX1)
      IF (FU1.LT.-180.0) GU1=0.D0
      IF (FU1.GE.-180.0) GU1=M*DEXP(FU1)
      FL2=LCOEF+M*DLOG(PMAX2)+(N-M-1)*DLOG(1-PMAX2)
      IF (FL2.LT.-180.0) GL2=0.D0
      IF (FL2.GE.-180.0) GL2=(N-M)*DEXP(FL2)
      IF (PMIN2.EQ.0.D0) GO TO 23
      FU2=LCOEF+M*DLOG(PMIN2)+(N-M-1)*DLOG(1-PMIN2)
      IF (FU2.LT.-180.0) GU2=0.D0
      IF (FU2.GE.-180.0) GU2=(N-M)*DEXP(FU2)
      GO TO 24
   23 GU2=0.D0
   24 IF (PMIN1.EQ.0.D0) GO TO 25
      FL1=LCOEF+(M-1)*DLOG(PMIN1)+(N-M)*DLOG(1-PMIN1)
      IF (FL1.LT.-180.0) GL1=0.D0
      IF (FL1.GE.-180.0) GL1=M*DEXP(FL1)
      GO TO 26
   25 GL1=0.D0
   26 CONTINUE
      DERU=DERU+GU1-GU2
      DERL=DERL+GL1-GL2
      LPVAL=LCOEF+M*LP+(N-M)*LQ
      IF (LPVAL.LT.-180.) PVAL=0.D0
      IF (LPVAL.GE.-180.) PVAL=DEXP(LPVAL)
      PVALUE=PVALUE+PVAL
    4 CONTINUE
      DERBND=DMAX1(DABS(DERU),DABS(DERL))
C
C     COMPUTE THE SIZE
C
      SUP=PVALUE+(.005D0/(2**(L-1)))*DERBND
      INF=PVALUE-(.005D0/(2**(L-1)))*DERBND
      MAXP=DMAX1(MAXP,PVALUE)
      IF (P.GE.(10./FLOAT(N))) I9=1
      IF (P.GE.(5./FLOAT(N))) I5=1
C
C     CHECK FOR PRECISION OF SIZE, HERE DELTA=.001
C
      IF ((SUP-MAXP).LE.0.001) GO TO 74
      L=L+1
      P=P-(.005D0/(2**(L-1)))
      GO TO 72
   74 MAXIM=DMAX1(MAXIM,SUP)
      IF (I5.EQ.1) MINIM5=DMIN1(MINIM5,INF)
      IF (I9.EQ.1) MINIM9=DMIN1(MINIM9,INF)
      IF (L.EQ.1) GO TO 6
      IF ((PP-PU).LT.0.00001D0) GO TO 75
      P=P+(.005D0/(2**(L-1)))*2
      GO TO 72
   75 P=PP-.005D0
      GO TO 6
   76 CONTINUE
      WRITE (6,41)
   41 FORMAT (' ','PVALUE FOR COMPARING TWO CORRELATED ',
     * 'PROPORTIONS',/' FOR: N',4X,'Z',4X,'IS:',2X,'MIN5',6X,
     * 'MIN10',4X,'PVALUE'/)
      WRITE (6,33) N,Z,MINIM5,MINIM9,MAXIM
   33 FORMAT (' ',I4,F8.3,3F11.5)
      GO TO 99
      END
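As an orientation to what the listings compute, the following is a minimal free-form Fortran sketch written here for illustration; it is not taken from the dissertation. It evaluates the null power function π(p) of the one-sided Z-test with unpooled variance for two independent proportions by brute force over a grid of the nuisance parameter p and reports the largest value found, whereas the program in Appendix C.1 additionally bounds the derivative of π(p) so that the reported size is guaranteed to the stated precision. The sample size n = 20, the critical value 1.74, and the grid spacing .005 below are arbitrary illustrative choices.

! Illustrative sketch only; not the program listed above.
program size_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 20            ! common sample size per group (example)
  real(dp), parameter :: zcrit = 1.74_dp  ! critical value of the Z statistic (example)
  integer :: s1, s2, k
  real(dp) :: p, pi_p, size_max, z, p1h, p2h, v, lpmf1, lpmf2
  real(dp) :: lfac(0:n)

  lfac(0) = 0.0_dp                        ! log-factorials, as in the LFAC array
  do k = 1, n
     lfac(k) = lfac(k-1) + log(real(k, dp))
  end do

  size_max = 0.0_dp
  do k = 1, 199                           ! grid p = .005, .010, ..., .995
     p = 0.005_dp * real(k, dp)
     pi_p = 0.0_dp
     do s1 = 0, n
        do s2 = 0, n
           p1h = real(s1, dp) / real(n, dp)
           p2h = real(s2, dp) / real(n, dp)
           v = p1h*(1.0_dp - p1h)/n + p2h*(1.0_dp - p2h)/n
           if (v > 0.0_dp) then
              z = (p1h - p2h) / sqrt(v)   ! Z statistic with unpooled variance estimate
           else if (p1h > p2h) then
              z = huge(1.0_dp)            ! (n,0) table: counted as a rejection in this sketch
           else
              z = 0.0_dp                  ! degenerate tables with p1h <= p2h: not rejected
           end if
           if (z >= zcrit) then
              ! add P(S1=s1)*P(S2=s2) under the common null success probability p
              lpmf1 = lfac(n) - lfac(s1) - lfac(n-s1) + s1*log(p) + (n-s1)*log(1.0_dp-p)
              lpmf2 = lfac(n) - lfac(s2) - lfac(n-s2) + s2*log(p) + (n-s2)*log(1.0_dp-p)
              pi_p = pi_p + exp(lpmf1 + lpmf2)
           end if
        end do
     end do
     size_max = max(size_max, pi_p)       ! running maximum of pi(p) over the grid
  end do
  print '(a,f8.5)', ' maximum of pi(p) over the grid: ', size_max
end program size_sketch

The same brute-force idea, with the two-binomial kernel replaced by the discordant-pair distribution, applies to the correlated proportions case of Appendix C.2.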
REFERENCES

Barnard, G.A. (1947). Significance Tests for 2x2 Tables. Biometrika 34, 123.

Barnard, G.A. (1979). In Contradiction to J. Berkson's Dispraise: Conditional Tests Can Be More Efficient. Journal of Statistical Planning and Inference 3, 181.

Basu, D. (1977). On the Elimination of Nuisance Parameters. Journal of the American Statistical Association 72, 355.

Basu, D. (1979). Discussion of Joseph Berkson's Paper "In Dispraise of the Exact Test". Journal of Statistical Planning and Inference 3, 189.

Bennett, B.M. (1967). Tests of Hypotheses Concerning Matched Samples. Journal of the Royal Statistical Society B 29, 468.

Bennett, B.M., and Underwood, R.E. (1970). On McNemar's Test for the 2x2 Table and its Power Function. Biometrics 26, 339.

Berkson, J. (1978). In Dispraise of the Exact Test. Journal of Statistical Planning and Inference 2, 27.

Casagrande, J.T., Pike, M.C., and Smith, P.G. (1978a). An Improved Approximate Formula for Calculating Sample Sizes for Comparing Two Binomial Distributions. Biometrics 34, 483.

Casagrande, J.T., Pike, M.C., and Smith, P.G. (1978b). The Power Function of the "Exact" Test for Comparing Two Binomial Distributions. Applied Statistics 27, 176.

Cochran, W.G. (1950). The Comparison of Percentages in Matched Samples. Biometrika 37, 256.

Cochran, W.G., and Cox, G.M. (1957). Experimental Designs, 2nd ed. New York: John Wiley and Sons.

Corsten, L.C.A., and de Kroon, J.P.M. (1979). Comment on J. Berkson's Paper "In Dispraise of the Exact Test". Journal of Statistical Planning and Inference 3, 193.

Courant, R., and John, F. (1965). Introduction to Calculus and Analysis, Volume 1. New York: Interscience, John Wiley and Sons.

Eberhardt, K.R., and Fligner, M.A. (1977). A Comparison of Two Tests for Equality of Two Proportions. The American Statistician 31, 151.

Ferguson, T.S. (1967). Mathematical Statistics: A Decision Theoretic Approach. New York: Academic Press.

Fisher, R.A. (1935). The Logic of Inductive Inference. Journal of the Royal Statistical Society A 98, 39.

Fleiss, J.L. (1980). Statistical Methods for Rates and Proportions, 2nd ed. New York: John Wiley and Sons.

Gail, M., and Gart, J.J. (1973). The Determination of Sample Sizes for Use with the Exact Conditional Test in 2x2 Comparative Trials. Biometrics 29, 441.

Haseman, J.K. (1978). Exact Sample Sizes for Use with the Fisher-Irwin Test for 2x2 Tables. Biometrics 34, 106.

Kempthorne, O. (1979). In Dispraise of the Exact Test: Reactions. Journal of Statistical Planning and Inference 3, 199.

Kendall, M.G., and Stuart, A. (1967). The Advanced Theory of Statistics, Vol. 2, 2nd ed. New York: Hafner Publishing Company.

Kramer, M., and Greenhouse, S.W. (1959). Determination of Sample Size and Selection of Cases. In Psychopharmacology: Problems in Evaluation, J.O. Cole and R.W. Gerard (eds.), 356. Washington: National Academy of Sciences, National Research Council.

Lehmann, E.L. (1959). Testing Statistical Hypotheses. New York: John Wiley and Sons.

McDonald, L.L., Davis, B.M., and Milliken, G.A. (1977). A Nonrandomized Unconditional Test for Comparing Two Proportions in 2x2 Contingency Tables. Technometrics 19, 145.

McNemar, Q. (1947). Note on the Sampling Error of the Differences Between Correlated Proportions or Percentages. Psychometrika 12, 153.

Miettinen, O.S. (1968). The Matched Pairs Design in the Case of All-or-none Responses. Biometrics 24, 339.

Pearson, E.S. (1947). The Choice of Statistical Tests Illustrated on the Interpretation of Data Classed in a 2x2 Table. Biometrika 34, 139.

Randles, R.H., and Wolfe, D.A. (1979). Introduction to the Theory of Nonparametric Statistics. New York: John Wiley and Sons.

Robbins, H. (1977). A Fundamental Question of Practical Statistics, Letter to the Editor. The American Statistician 31, 97.

Schork, M.A., and Williams, G.W. (1980). Number of Observations Required for the Comparison of Two Correlated Proportions. Communications in Statistics B 9, 349.

Welch, B.L. (1939). On Confidence Limits and Sufficiency, with Particular Reference to Parameters of Location. Annals of Mathematical Statistics 10, 58.

BIOGRAPHICAL SKETCH

Samy Suissa was born on July 9, 1954, in Berrechid, a suburb of Casablanca, Morocco. In 1964, Samy's family moved to Montreal, Canada, where they still reside. There, he attended Baron Byng High School, famous for its role in "The Apprenticeship of Duddy Kravitz" by Mordecai Richler, until 1968, and Outremont High School until 1970. He then entered McGill University, where he received a Bachelor of Science degree in mathematics in June 1976 and a Master of Science degree in mathematical statistics in November 1977. Samy came to the University of Florida in the fall of 1977 to pursue a doctoral degree in statistics. While studying at the University of Florida, Samy has worked as a graduate assistant performing statistical consulting duties in the Department of Orthopaedics, among other places.
On July 1, 1982, Samy Suissa will join the "Service d'Epidemiologie Clinique" of the Montreal General Hospital's Research Institute in the capacity of biostatistician. He will also have an appointment as assistant professor in the Department of Epidemiology and Health at McGill University. He has recently been awarded a research grant from "Les Fonds de Recherche en Santé du Québec" for 1982-1983. He is married to the former Nicole Bonenfant of Montreal, Canada, and has a son, Daniel Moshe.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.