SOME BAYESIAN STATISTICAL TECHNIQUES USEFUL IN ESTIMATING FREQUENCY AND DENSITY

UNITED STATES DEPARTMENT OF THE INTERIOR
FISH AND WILDLIFE SERVICE
Special Scientific Report — Wildlife No. 203
Washington, D.C. • 1977

Library of Congress Cataloging in Publication Data
Johnson, Douglas H
Some Bayesian statistical techniques useful in estimating frequency and density.
(Special scientific report — wildlife; no. 203)
1. Animal populations — Mathematical models. 2. Bayesian statistical decision theory. I. Title. II. Series: United States. Fish and Wildlife Service. Special scientific report — wildlife; no. 203.
SK361.A256 no. 203 [QH352] 639'.97'90973s [598.2'5'24] 76-50075

NOTE: Use of trade names does not imply U.S. Government endorsement of commercial products.

Some Bayesian Statistical Techniques Useful in Estimating Frequency and Density

by

Douglas H. Johnson
U.S. Fish and Wildlife Service
Northern Prairie Wildlife Research Center
P.O. Box 1747, Jamestown, North Dakota 58401

Abstract

This paper presents some elementary applications of Bayesian statistics to problems faced by wildlife biologists. Bayesian confidence limits for frequency of occurrence are shown to be generally superior to classical confidence limits. Population density can be estimated from frequency data if the species is sparsely distributed relative to the size of the sample plot. For other situations, limits are developed based on the normal distribution and prior knowledge that the density is non-negative, which insures that the lower confidence limit is non-negative.
Conditions are described under which Bayesian confidence limits are superior to those calculated with classical methods; examples are also given on how prior knowledge of the density can be used to sharpen inferences drawn from a new sample.

Biologists often require estimates of distribution and abundance for a species of animal or plant. Many approaches, such as mark-recapture techniques, complete censuses, counts along prescribed transects, and animal signs, have been used to estimate frequency of occurrence or density (e.g., Overton 1969). These methods provide an index to abundance which may allow comparisons between years or between areas, but some do not permit valid estimates of the absolute population size. Variance estimates, which are necessary to assess the precision of a quantity, are typically not available. Because time and expense limit the amount of data that can be collected, the investigator generally desires estimators with as much precision as possible. This paper will explore methods of analyzing counts of organisms to obtain accurate estimators of population size and frequency of occurrence, as well as to gauge the precision of these estimators.

The techniques described here apply the Bayesian philosophy of statistical inference (e.g., Good 1965, Schmitt 1969, Lindley 1971) to description of frequency and density. Other authors have discussed applications of Bayesian theory to a variety of problems; see DeGroot (1970) for an extensive bibliography. The methods described here will be illustrated by examples of censuses of breeding pairs of birds (Stewart and Kantrud 1972), but the methods are applicable to a broad class of populations. The term "pair" is used for these examples, but in other situations the word "individual" might be substituted.
Stewart and Kantrud (1972), who studied the populations of breeding birds in North Dakota, drew a random sample of land areas (quarter-sections, 160 acres or 64.75 ha), and counted pairs of all avian species breeding on them. They wanted to estimate the statewide breeding population of each species and to measure the precision of those estimates. They assumed that a census of a quarter-section was essentially perfect; no pairs were missed and none counted more than once. Under this assumption, a census of all quarter-sections in the State would provide the exact population total, and no statistics would be required. Because complete censuses were not feasible for such a large area, however, sampling had to be employed.

A Bayesian Approach

We now consider some estimation procedures which employ the Bayesian philosophy of statistics. A distinguishing feature of this philosophy is the concept of a parameter. Whereas in classical thought a parameter is considered to be a fixed constant with unknown value, the Bayesian method treats a parameter as a random variable, the distribution of which depends on available knowledge about the parameter. The parameter does not vary, but our degree of belief in its value changes with increasing information, and the probability function reflects the belief.

We will use this philosophy in two ways. First, it will be convenient to combine information about the parameter from more than one source. The Bayesian approach provides a rigorous framework in which information about a parameter can be applied either from experiments performed simultaneously or sequentially, or from knowledge available before the experiment. Second, the concept of a confidence interval will be replaced by a more natural statement about the probability that a parameter is contained within an interval. A classical 95% confidence interval (a, b) for a parameter θ consists of two functions of the observations, a(x1, x2, …, xn) and b(x1, x2, …, xn), such that, if the experiment were repeated an infinite number of times under identical conditions, 95% of the intervals obtained would include the true value of the parameter θ. A comparable Bayesian confidence interval is more simply the interval (a, b) such that 95% of the probability content lies in that interval.

To oversimplify the Bayesian method, we will say that it combines information about a parameter θ from two sources. One source is the experiment (or, in the present case, the random sample of counts). Information from the experiment is contained in the probability function of the observations which, considered as a function of θ, is called the "likelihood" function. The other source of information is that which is available before the experiment occurs, and is contained in the "prior" distribution of θ, denoted p(θ). We combine the two sources of information by multiplying the prior and likelihood functions; the product is termed the "posterior" distribution function. The Bayesian draws his inference from the posterior distribution, but the classical statistician generally uses only the likelihood function. Often our prior knowledge is lacking, so we take p(θ) to be noninformative. A Bayesian employing a noninformative prior obtains results closely paralleling classical results, although the Bayesian may do so more directly. We will later demonstrate how even a modest amount of prior knowledge can sharpen the inferences drawn.

Application of the Bayesian Method to Frequency of Occurrence

To put this concept in focus, let X be a random variable taking the value 1 if breeding mallards (Anas platyrhynchos) are present in a quarter-section selected at random, and 0 if they are absent. Then X is distributed binomially with Pr{X = 1} = θ, and θ, the expected value of X, represents the frequency of occurrence of mallards per quarter-section.
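The prior-times-likelihood recipe just described can be sketched numerically by discretizing θ, multiplying the two functions on a grid, and normalizing. The fragment below is an illustration only (the grid size and normalization scheme are choices of this sketch, not of the report), using the mallard counts reported later in the paper (79 of 130 quarter-sections occupied):

```python
# Discretize theta and carry out prior x likelihood -> posterior numerically.
# Prior: theta^(-1/2) * (1 - theta)^(-1/2); data: z occupied units out of n.
n, z = 130, 79
grid = [(i + 0.5) / 1000 for i in range(1000)]            # theta values in (0, 1)
prior = [t ** -0.5 * (1 - t) ** -0.5 for t in grid]
likelihood = [t ** z * (1 - t) ** (n - z) for t in grid]
unnorm = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]                    # sums to 1 over the grid
mean = sum(t * p for t, p in zip(grid, posterior))
print(round(mean, 3))    # posterior mean; close to 79.5 / 131 = 0.607
```

The normalization step is what the paper calls evaluating "the constant"; on a grid it reduces to dividing by the sum.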
Useful prior knowledge may be available, either from censuses conducted previously on these quarter-sections or from a knowledge of how well the habitat requirements of the mallard are satisfied on each quarter-section. But, for the moment, assume prior knowledge is lacking; we then have to select a prior distribution for θ which expresses our ignorance. The Beta family of prior distributions

p(θ) = constant · θ^α (1−θ)^β, with α > −1, β > −1,

is appealing because it is of the same form as the likelihood function, so that its use results in a posterior distribution belonging to the same family. In addition, the Beta family is very flexible; by varying α and β the curve can assume a variety of shapes reflecting various degrees of prior belief. The choice of exponents α = β = −1/2 has been suggested (Jeffreys 1967; Good 1965) on the basis of invariance under transformation of the parameter θ. That is, if we have no prior knowledge of θ, then we are equally ignorant of θ⁻¹, log θ, etc., and the prior should reflect this invariance. The choice of exponents α, β will have little effect on our conclusions as long as they are small in absolute value relative to z and n − z (defined below). The prior distribution with α = β = −1/2 is graphed in Fig. 1. Note how little it varies over most of the range of θ.

We next take our random sample of censuses on n quarter-sections, yielding z = Σxᵢ quarter-sections with mallards present. The average frequency of occurrence is z/n and the likelihood function of θ, given the data, is

L(θ | x1, x2, …, xn) = θ^z (1−θ)^(n−z).

For example, Stewart and Kantrud (1972) found mallards breeding in 79 of their 130 sample quarter-sections. An estimate of the frequency of occurrence is then given by 79/130 = 0.608, and the likelihood function is

L(θ | x1, x2, …, xn) = θ^79 (1−θ)^51.
We next combine our prior information with the information from the sample by multiplying the corresponding probability functions, to obtain the posterior distribution of θ, given the observations, with probability density denoted by

p(θ | x1, x2, …, xn) = constant · L(θ | x1, x2, …, xn) p(θ),

where the constant is such that

∫₀¹ p(θ | x1, x2, …, xn) dθ = 1.

The term "posterior" is in reference to the fact that it represents information available "after" the experiment (or counts).

Fig. 1. Prior, likelihood, and posterior functions for frequency of occurrence of mallards, plotted against θ. "Probability" means probability density.

In the present case of a binomial variate,

L(θ | x1, x2, …, xn) = constant · θ^z (1−θ)^(n−z),

and

p(θ) = constant · θ^(−1/2) (1−θ)^(−1/2),

so the posterior distribution becomes

p(θ | x1, x2, …, xn) = constant · θ^(z−1/2) (1−θ)^(n−z−1/2),

or, in the mallard example,

p(θ | x1, x2, …, xn) = constant · θ^78.5 (1−θ)^50.5.

The appendix describes how the constant is evaluated. Because of the large sample size and moderately large mean of the mallard data, the likelihood function dominates the posterior distribution to such an extent that graphs of the functions cannot readily be distinguished (Fig. 1).

We can now find points θ1 and θ2 so that

Pr{θ1 < θ < θ2} = 1 − α.

The interval (θ1, θ2) is the Bayesian counterpart of a classical confidence interval. A two-sided confidence interval is ordinarily chosen so that α/2 of the likelihood is eliminated from each tail of the distribution. On the other hand, a Bayesian seeks the shortest interval (θ1, θ2) containing 1 − α of the probability by selecting θ1 and θ2 such that: (1) no point outside the interval has its probability function greater than any point within, and (2)

∫ from θ1 to θ2 of p(θ | x1, x2, …, xn) dθ = 1 − α.

Such an interval is called a Bayesian confidence region or a Highest Posterior Density (HPD) region (Box and Tiao 1965). The 95% HPD region for frequency of occurrence of mallards was (0.523, 0.690).
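Because the posterior here is a Beta distribution with parameters z + 1/2 and n − z + 1/2, the HPD region can be found by searching for the narrowest interval that holds 1 − α of the posterior mass. The following sketch assumes the SciPy library is available (the function name and search strategy are illustrative, not the report's own algorithm):

```python
from scipy.stats import beta
from scipy.optimize import minimize_scalar

def beta_hpd(z, n, alpha=0.05, a0=0.5, b0=0.5):
    """Shortest (1 - alpha) posterior interval for a binomial proportion
    under a Beta(a0, b0) prior; a0 = b0 = 1/2 is the Jeffreys prior."""
    post = beta(z + a0, n - z + b0)
    # Slide a window of fixed probability content (1 - alpha) along the
    # distribution, indexed by its lower-tail mass p, and keep the narrowest.
    width = lambda p: post.ppf(p + 1 - alpha) - post.ppf(p)
    res = minimize_scalar(width, bounds=(0.0, alpha), method="bounded")
    return post.ppf(res.x), post.ppf(res.x + 1 - alpha)

# Mallards: present on 79 of 130 quarter-sections.
print(beta_hpd(79, 130))    # close to the report's (0.523, 0.690)
```

For a unimodal density, the narrowest interval of fixed mass automatically has equal density at its endpoints, which is the report's condition (1).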
The calculation of HPD regions is demonstrated in the appendix.

The usual method for obtaining a confidence interval for a binomial parameter employs the normal approximation, which is fairly good for large samples as long as z (the number of samples in which the species was present) and n − z (the number of samples in which the species was absent) are not too small. The normal approximation gives limits on the mean as plus or minus two standard errors. Exact limits can be obtained by using the relationship between a binomial sum and the incomplete beta function (Johnson and Kotz 1969), which allows equal-tailed intervals to be obtained from tables of the F distribution (Bliss 1967). For mallards, with a moderate value of z (= Σxᵢ = 79) as well as a large sample size (n = 130), all three procedures yielded very similar confidence limits (Table 1). Consider, however, the burrowing owl (Speotyto cunicularia), which was present on only 2 of the 130 sample quarter-sections. The normal approximation fails here because the number present was small, and the Bayesian confidence interval is shorter than the exact interval (Table 1). Although there are some non-Bayesian methods for reducing the size of confidence regions, e.g., Hudson (1971), they do not follow as naturally as in the Bayesian framework and there is no unified approach to using available prior knowledge.

Application to Estimating Population Density

In certain circumstances the probability limits derived for the frequency of occurrence can also be used to estimate the density (Dice 1952). Suppose the sample areas censused are small relative to the territory size of a species, so that the probability of more than one pair on an area is zero. Then the frequency of occurrence is identical to the average density per unit, and the total number of pairs is equal to the number of sample units containing a pair. Thus the method described above will also give probability limits for the population total.
Table 1. — Confidence limits for frequency of occurrence of breeding pairs of mallards and burrowing owls on 130 sample quarter-sections in North Dakota.

Method                              Mallard         Burrowing owl
Normal approximation to binomial    (.522, .693)    (-.006, .037)
Exact binomial                      (.519, .690)    ( .002, .054)
Bayesian                            (.523, .690)    ( .001, .042)

To equate density to frequency, we have to assume that Pr{more than one pair per sample unit} is negligible. For a territorial species, the validity of this assumption can be assured by reducing the size of the sample unit, although the resultant unit may be impracticably small. The assumption can be examined if a large number of units are censused by noting whether or not any unit contains more than one pair. If not, then Pr{more than one pair per sample unit} is not likely to be appreciable. Even if more than one pair can occur in a unit, the estimator will be only slightly affected as long as Pr{more than one pair per sample unit} is small and the probability of a large number of pairs on a sample unit is zero. For a colonial species this latter probability is nonzero, so the relationship between frequency of occurrence and density does not hold, and this method should not be attempted.

Modified Normal Limits for Population Density

If the counts are reasonably large and sufficiently numerous, we have some confidence that the central limit theorem applies. That is, we expect the average of the counts to be nearly normally distributed. We will indicate how inferences can be sharpened by using the slight prior knowledge available. Even before sampling takes place, we know that the count on any sample area will be non-negative. Since the normal distribution extends into negative values, it is more general than we need, although this becomes a problem only when the average is small relative to its standard error. Let θ be the mean and σ² the variance of the counts.
The usual (Jeffreys 1967) noninformative prior distribution for (θ, σ²) is

p(θ, σ²) = p(θ) p(σ²),

with

p(θ) = constant for all θ,

and

p(σ²) = constant/σ² for σ² > 0.

But for counts, we know θ ≥ 0, so we can take

p(θ) = constant for θ ≥ 0, and p(θ) = 0 for θ < 0.

As before, a (1 − α) HPD region (θ1, θ2) must satisfy

p(θ | x1, x2, …, xn) ≥ p(θ* | x1, x2, …, xn) for every θ within the interval and every θ* outside it,   (1)

and

∫ from θ1 to θ2 of p(θ | x1, x2, …, xn) dθ = 1 − α.   (2)

We will develop HPD regions for a binomial parameter and for a normal mean restricted to non-negative values.

HPD Regions for the Binomial Parameter

If {xᵢ} are distributed binomially with parameter θ, then the posterior distribution of θ corresponding to the noninformative prior specified earlier is given by

p(θ | x1, x2, …, xn) = constant · θ^(z−1/2) (1−θ)^(n−z−1/2),   (3)

where z = Σxᵢ. The constant term is such that

1 = ∫₀¹ p(θ | x1, x2, …, xn) dθ = constant · ∫₀¹ θ^(z−1/2) (1−θ)^(n−z−1/2) dθ = constant · B(z+1/2, n−z+1/2),

where B(a, b) is the beta function (e.g., Feller 1966). Thus, the required constant is the reciprocal of B(z+1/2, n−z+1/2).

A (1−α) HPD interval is given by (θ1, θ2) satisfying equations (1) and (2) above. For a unimodal distribution the first requirement is equivalent to

p(θ1 | x1, x2, …, xn) = p(θ2 | x1, x2, …, xn),

which, using (3) and noting that the constant occurs on both sides and thus cancels, becomes

θ1^(z−1/2) (1−θ1)^(n−z−1/2) = θ2^(z−1/2) (1−θ2)^(n−z−1/2),

which has the same solution as does

f1(θ1, θ2) = 0,   (4)

where

f1(θ1, θ2) = (z−1/2)(log θ1 − log θ2) + (n−z−1/2)(log(1−θ1) − log(1−θ2)).

It follows from (2) that

[1/B(z+1/2, n−z+1/2)] ∫ from θ1 to θ2 of θ^(z−1/2) (1−θ)^(n−z−1/2) dθ = 1 − α,

or

f2(θ1, θ2) = 0,   (5)

where

f2(θ1, θ2) = I_θ2(z+1/2, n−z+1/2) − I_θ1(z+1/2, n−z+1/2) − (1 − α),

and I_w(a, b) is the incomplete beta integral with parameters a and b evaluated at the point w (Beyer 1968). This function has been tabled (Pearson 1968), and computer routines are available for its calculation (International Business Machines Corporation 1970).
We thus have two equations, (4) and (5), that θ1 and θ2 must simultaneously satisfy. Because both equations are nonlinear, iterative techniques are necessary. Two such approaches have been successfully used, although either can fail under certain extreme conditions.

The first and more direct approach is to use a computer algorithm designed to solve a system of nonlinear equations. In the present instance the system consists of two equations, (4) and (5). Not all computer facilities have such routines, but Brown (1973) presented an algorithm and Fortran program which can be used for this purpose.

The second approach requires only computer routines that are commonly available. The equations (4) and (5) can be solved by minimizing the quantity

|f1(θ1, θ2)| + |f2(θ1, θ2)|

with respect to θ1 and θ2. The minimum is zero, and is attained for values of θ1 and θ2 such that equations (4) and (5) are satisfied. Numerous algorithms have been developed to find the minimum of a function. Most computing centers have one or more available.

HPD Regions for the Normal Mean

The approximate posterior distribution of the mean of non-negative random variables {xᵢ} is asymptotically

p(θ | x1, x2, …, xn) = constant · s⁻¹ exp[−n(θ−x̄)²/(2s²)] for θ ≥ 0, and 0 for θ < 0,

where x̄ is the sample mean and s² the sample variance. Let

A = Φ(−√n x̄/s),

Φ(·) being the cumulative normal distribution function. Because

∫₀^∞ p(θ | x1, x2, …, xn) dθ = 1,

the distribution for positive values of θ is 1/(1−A) times that of a normal variate with mean x̄ and variance s²/n, i.e.,

p(θ | x1, x2, …, xn) = (1−A)⁻¹ (2πs²/n)^(−1/2) exp[−n(θ−x̄)²/(2s²)] for θ ≥ 0.

As with a binomial variable, we want values θ1 and θ2 which satisfy equations (1) and (2). The form that the HPD interval takes will depend on whether or not A < α/2 holds.

Case I: A < α/2. Here the interval will be symmetric about x̄. Equation (2) implies

1 − α = ∫ from θ1 to θ2 of (1−A)⁻¹ (2πs²/n)^(−1/2) exp[−n(θ−x̄)²/(2s²)] dθ,

or, substituting W = √n(θ−x̄)/s,

(1−α)(1−A) = Φ[√n(θ2−x̄)/s] − Φ[√n(θ1−x̄)/s].
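The second, minimization-based approach to equations (4) and (5) can be sketched in Python. The starting values (taken from the normal approximation) and the choice of Nelder-Mead as the minimizer are illustrative assumptions of this sketch; SciPy is used for the incomplete beta integral and the minimization:

```python
from math import log, sqrt
from scipy.special import betainc          # regularized incomplete beta I_w(a, b)
from scipy.optimize import minimize

def hpd_by_minimization(z, n, alpha=0.05):
    """Find the binomial HPD interval by minimizing |f1| + |f2|,
    where f1 and f2 are equations (4) and (5) of the text."""
    a, b = z + 0.5, n - z + 0.5
    def f1(t1, t2):    # equation (4): equal posterior density at both endpoints
        return (z - 0.5) * (log(t1) - log(t2)) \
             + (n - z - 0.5) * (log(1 - t1) - log(1 - t2))
    def f2(t1, t2):    # equation (5): 1 - alpha of the posterior mass between them
        return betainc(a, b, t2) - betainc(a, b, t1) - (1 - alpha)
    def objective(t):
        t1, t2 = t
        if not 0.0 < t1 < t2 < 1.0:        # keep the search inside (0, 1)
            return 1e6
        return abs(f1(t1, t2)) + abs(f2(t1, t2))
    p = z / n
    se = sqrt(p * (1 - p) / n)             # normal-approximation starting interval
    res = minimize(objective, x0=[p - 2 * se, p + 2 * se], method="Nelder-Mead")
    return tuple(res.x)

print(hpd_by_minimization(79, 130))    # near the report's (0.523, 0.690)
```

As the text warns, such a search can fail in extreme cases (e.g., very small z), where the HPD interval abuts the boundary of the parameter space.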
Equation (1) requires

p(θ1 | x1, x2, …, xn) = p(θ2 | x1, x2, …, xn),

which implies symmetry about x̄, so (θ1, θ2) simply corresponds to an ordinary confidence interval with a confidence coefficient of (1−α)(1−A) rather than (1−α). For the mallard, having a mean of 1.68 and a standard error of 0.152, the calculated value of A = Φ(−11.05) is so near zero that it can be ignored; thus, the HPD region coincides with the ordinary confidence interval.

Case II: A > α/2. Here Equation (1) insures that the lower limit will be zero, so we need to exclude α of the probability content from the upper range. Equation (2) becomes simply

1 − α = (1−A)⁻¹ ∫₀^θ2 (2πs²/n)^(−1/2) exp[−n(θ−x̄)²/(2s²)] dθ,

so that

θ1 = 0 and θ2 = x̄ + (s/√n) Φ⁻¹[1 − α(1−A)].

Case II is exemplified by the long-billed marsh wren. A mean of 0.39 and standard error of 0.223 resulted in A = Φ(−0.39/0.223) = 0.040 > α/2. Hence, θ1 = 0.0 and θ2 = 0.39 + 0.223 Φ⁻¹(0.952) = 0.76.

Calculating HPD regions for the normal mean requires simply a routine for the normal probability function (to determine A) and one for the inverse normal probability function (to determine the confidence limits).
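As the text notes, only normal CDF and inverse-CDF routines are needed, both of which Python's standard library provides. The function name and interface below are this sketch's own, not the report's:

```python
from statistics import NormalDist

def truncated_normal_hpd(xbar, se, alpha=0.05):
    """HPD limits for a normal mean known to be non-negative.
    xbar is the sample mean; se is its standard error, s / sqrt(n)."""
    nd = NormalDist()
    A = nd.cdf(-xbar / se)    # mass an untruncated normal would put below zero
    if A < alpha / 2:
        # Case I: symmetric about xbar, with coefficient (1 - alpha)(1 - A).
        k = nd.inv_cdf((1 + (1 - alpha) * (1 - A)) / 2)
        return xbar - k * se, xbar + k * se
    # Case II: lower limit pinned at zero.
    return 0.0, xbar + se * nd.inv_cdf(1 - alpha * (1 - A))

# Long-billed marsh wren: mean 0.39, standard error 0.223 (Case II).
print(truncated_normal_hpd(0.39, 0.223))    # roughly (0.0, 0.76)
# Mallard: mean 1.68, standard error 0.152 (Case I; A is negligible,
# so this coincides with the ordinary two-sided interval).
print(truncated_normal_hpd(1.68, 0.152))
```

In Case I the symmetric interval must hold 1 − α of the truncated posterior, i.e. (1−α)(1−A) of the untruncated normal mass, which gives the inverse-CDF argument (1 + (1−α)(1−A))/2.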