SOME BAYESIAN STATISTICAL TECHNIQUES USEFUL IN ESTIMATING FREQUENCY AND DENSITY

UNITED STATES DEPARTMENT OF THE INTERIOR
FISH AND WILDLIFE SERVICE
Special Scientific Report — Wildlife No. 203
Washington, D.C. • 1977

Library of Congress Cataloging in Publication Data
Johnson, Douglas H
Some Bayesian statistical techniques useful in estimating frequency and density.
(Special scientific report — wildlife; no. 203)
1. Animal populations — Mathematical models. 2. Bayesian statistical decision theory. I. Title. II. Series: United States. Fish and Wildlife Service. Special scientific report — wildlife; no. 203.
SK361.A256 no. 203 [QH352] 639'.97'90973s [598.2'5'24] 76-50075

NOTE: Use of trade names does not imply U.S. Government endorsement of commercial products.

Some Bayesian Statistical Techniques Useful in Estimating Frequency and Density

by

Douglas H. Johnson
U.S. Fish and Wildlife Service
Northern Prairie Wildlife Research Center
P.O. Box 1747, Jamestown, North Dakota 58401

Abstract

This paper presents some elementary applications of Bayesian statistics to problems faced by wildlife biologists. Bayesian confidence limits for frequency of occurrence are shown to be generally superior to classical confidence limits. Population density can be estimated from frequency data if the species is sparsely distributed relative to the size of the sample plot. For other situations, limits are developed based on the normal distribution and prior knowledge that the density is non-negative, which insures that the lower confidence limit is non-negative.
Conditions are described under which Bayesian confidence limits are superior to those calculated with classical methods; examples are also given on how prior knowledge of the density can be used to sharpen inferences drawn from a new sample.

Biologists often require estimates of distribution and abundance for a species of animal or plant. Many approaches, such as mark-recapture techniques, complete censuses, counts along prescribed transects, and animal signs, have been used to estimate frequency of occurrence or density (e.g., Overton 1969). These methods provide an index to abundance which may allow comparisons between years or between areas, but some do not permit valid estimates of the absolute population size. Variance estimates, which are necessary to assess the precision of a quantity, are typically not available. Because time and expense limit the amount of data that can be collected, the investigator generally desires estimators with as much precision as possible. This paper will explore methods of analyzing counts of organisms to obtain accurate estimators of population size and frequency of occurrence, as well as to gauge the precision of these estimators.

The techniques described here apply the Bayesian philosophy of statistical inference (e.g., Good 1965, Schmitt 1969, Lindley 1971) to description of frequency and density. Other authors have discussed applications of Bayesian theory to a variety of problems; see DeGroot (1970) for an extensive bibliography. The methods described here will be illustrated by examples of censuses of breeding pairs of birds (Stewart and Kantrud 1972), but the methods are applicable to a broad class of populations. The term "pair" is used for these examples, but in other situations the word "individual" might be substituted.
Stewart and Kantrud (1972), who studied the populations of breeding birds in North Dakota, drew a random sample of land areas (quarter-sections, 160 acres or 64.75 ha), and counted pairs of all avian species breeding on them. They wanted to estimate the statewide breeding population of each species and to measure the precision of those estimates. They assumed that a census of a quarter-section was essentially perfect; no pairs were missed and none counted more than once. Under this assumption, a census of all quarter-sections in the State would provide the exact population total, and no statistics would be required. Because complete censuses were not feasible for such a large area, however, sampling had to be employed.

A Bayesian Approach

We now consider some estimation procedures which employ the Bayesian philosophy of statistics. A distinguishing feature of this philosophy is the concept of a parameter. Whereas in classical thought a parameter is considered to be a fixed constant with unknown value, the Bayesian method treats a parameter as a random variable, the distribution of which depends on available knowledge about the parameter. The parameter does not vary, but our degree of belief in its value changes with increasing information, and the probability function reflects the belief.

We will use this philosophy in two ways. First, it will be convenient to combine information about the parameter from more than one source. The Bayesian approach provides a rigorous framework in which information about a parameter can be applied either from experiments performed simultaneously or sequentially, or from knowledge available before the experiment. Second, the concept of a confidence interval will be replaced by a more natural statement about the probability that a parameter is contained within an interval. A classical 95% confidence interval (a, b) for a parameter θ consists of two functions of the observations, a(x1, x2, …, xn) and b(x1, x2, …, xn), such that, if the experiment were repeated an infinite number of times under identical conditions, 95% of the intervals obtained would include the true value of the parameter θ. A comparable Bayesian confidence interval is more simply the interval (a, b) such that 95% of the probability content lies in that interval.

To oversimplify the Bayesian method, we will say that it combines information about a parameter θ from two sources. One source is the experiment (or, in the present case, the random sample of counts). Information from the experiment is contained in the probability function of the observations which, considered as a function of θ, is called the "likelihood" function. The other source of information is that which is available before the experiment occurs, and is contained in the "prior" distribution of θ, denoted p(θ). We combine the two sources of information by multiplying the prior and likelihood functions; the product is termed the "posterior" distribution function. The Bayesian draws his inference from the posterior distribution, but the classical statistician generally uses only the likelihood function. Often our prior knowledge is lacking, so we take p(θ) to be noninformative. A Bayesian employing a noninformative prior obtains results closely paralleling classical results, although the Bayesian may do so more directly. We will later demonstrate how even a modest amount of prior knowledge can sharpen the inferences drawn.

Application of the Bayesian Method to Frequency of Occurrence

To put this concept in focus, let X be a random variable taking the value 1 if breeding mallards (Anas platyrhynchos) are present in a quarter-section selected at random, and 0 if they are absent. Then X is distributed binomially with Pr{X = 1} = θ, and θ, the expected value of X, represents the frequency of occurrence of mallards per quarter-section.
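The prior-times-likelihood recipe just described can be sketched numerically by discretizing θ, multiplying the two functions on a grid, and normalizing. The fragment below is an illustration only (the grid size and normalization scheme are choices of this sketch, not of the report), using the mallard counts reported later in the paper (79 of 130 quarter-sections occupied):

```python
# Discretize theta and carry out prior x likelihood -> posterior numerically.
# Prior: theta^(-1/2) * (1 - theta)^(-1/2); data: z occupied units out of n.
n, z = 130, 79
grid = [(i + 0.5) / 1000 for i in range(1000)]            # theta values in (0, 1)
prior = [t ** -0.5 * (1 - t) ** -0.5 for t in grid]
likelihood = [t ** z * (1 - t) ** (n - z) for t in grid]
unnorm = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]                    # sums to 1 over the grid
mean = sum(t * p for t, p in zip(grid, posterior))
print(round(mean, 3))    # posterior mean; close to 79.5 / 131 = 0.607
```

The normalization step is what the paper calls evaluating "the constant"; on a grid it reduces to dividing by the sum.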
Useful prior knowledge may be available, either from censuses conducted previously on these quarter-sections or from a knowledge of how well the habitat requirements of the mallard are satisfied on each quarter-section. But, for the moment, assume prior knowledge is lacking; we then have to select a prior distribution for θ which expresses our ignorance. The Beta family of prior distributions

p(θ) = constant · θ^α (1−θ)^β, with α > −1, β > −1,

is appealing because it is of the same form as the likelihood function, so that its use results in a posterior distribution belonging to the same family. In addition, the Beta family is very flexible; by varying α and β the curve can assume a variety of shapes reflecting various degrees of prior belief. The choice of exponents α = β = −1/2 has been suggested (Jeffreys 1967; Good 1965) on the basis of invariance under transformation of the parameter θ. That is, if we have no prior knowledge of θ, then we are equally ignorant of θ⁻¹, log θ, etc., and the prior should reflect this invariance. The choice of exponents α, β will have little effect on our conclusions as long as they are small in absolute value relative to z and n − z (defined below). The prior distribution with α = β = −1/2 is graphed in Fig. 1. Note how little it varies over most of the range of θ.

We next take our random sample of censuses on n quarter-sections, yielding z = Σxᵢ quarter-sections with mallards present. The average frequency of occurrence is z/n and the likelihood function of θ, given the data, is

L(θ | x1, x2, …, xn) = θ^z (1−θ)^(n−z).

For example, Stewart and Kantrud (1972) found mallards breeding in 79 of their 130 sample quarter-sections. An estimate of the frequency of occurrence is then given by 79/130 = 0.608, and the likelihood function is

L(θ | x1, x2, …, xn) = θ^79 (1−θ)^51.
We next combine our prior information with the information from the sample by multiplying the corresponding probability functions, to obtain the posterior distribution of θ, given the observations, with probability density denoted by

p(θ | x1, x2, …, xn) = constant · L(θ | x1, x2, …, xn) p(θ),

where the constant is such that

∫₀¹ p(θ | x1, x2, …, xn) dθ = 1.

The term "posterior" is in reference to the fact that it represents information available "after" the experiment (or counts).

Fig. 1. Prior, likelihood, and posterior functions for frequency of occurrence of mallards, plotted against θ. "Probability" means probability density.

In the present case of a binomial variate,

L(θ | x1, x2, …, xn) = constant · θ^z (1−θ)^(n−z),

and

p(θ) = constant · θ^(−1/2) (1−θ)^(−1/2),

so the posterior distribution becomes

p(θ | x1, x2, …, xn) = constant · θ^(z−1/2) (1−θ)^(n−z−1/2),

or, in the mallard example,

p(θ | x1, x2, …, xn) = constant · θ^78.5 (1−θ)^50.5.

The appendix describes how the constant is evaluated. Because of the large sample size and moderately large mean of the mallard data, the likelihood function dominates the posterior distribution to such an extent that graphs of the functions cannot readily be distinguished (Fig. 1).

We can now find points θ1 and θ2 so that

Pr{θ1 < θ < θ2} = 1 − α.

The interval (θ1, θ2) is the Bayesian counterpart of a classical confidence interval. A two-sided confidence interval is ordinarily chosen so that α/2 of the likelihood is eliminated from each tail of the distribution. On the other hand, a Bayesian seeks the shortest interval (θ1, θ2) containing 1 − α of the probability by selecting θ1 and θ2 such that: (1) no point outside the interval has its probability function greater than any point within, and (2)

∫ from θ1 to θ2 of p(θ | x1, x2, …, xn) dθ = 1 − α.

Such an interval is called a Bayesian confidence region or a Highest Posterior Density (HPD) region (Box and Tiao 1965). The 95% HPD region for frequency of occurrence of mallards was (0.523, 0.690).
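Because the posterior here is a Beta distribution with parameters z + 1/2 and n − z + 1/2, the HPD region can be found by searching for the narrowest interval that holds 1 − α of the posterior mass. The following sketch assumes the SciPy library is available (the function name and search strategy are illustrative, not the report's own algorithm):

```python
from scipy.stats import beta
from scipy.optimize import minimize_scalar

def beta_hpd(z, n, alpha=0.05, a0=0.5, b0=0.5):
    """Shortest (1 - alpha) posterior interval for a binomial proportion
    under a Beta(a0, b0) prior; a0 = b0 = 1/2 is the Jeffreys prior."""
    post = beta(z + a0, n - z + b0)
    # Slide a window of fixed probability content (1 - alpha) along the
    # distribution, indexed by its lower-tail mass p, and keep the narrowest.
    width = lambda p: post.ppf(p + 1 - alpha) - post.ppf(p)
    res = minimize_scalar(width, bounds=(0.0, alpha), method="bounded")
    return post.ppf(res.x), post.ppf(res.x + 1 - alpha)

# Mallards: present on 79 of 130 quarter-sections.
print(beta_hpd(79, 130))    # close to the report's (0.523, 0.690)
```

For a unimodal density, the narrowest interval of fixed mass automatically has equal density at its endpoints, which is the report's condition (1).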
The calculation of HPD regions is demonstrated in the appendix.

The usual method for obtaining a confidence interval for a binomial parameter employs the normal approximation, which is fairly good for large samples as long as z (the number of samples in which the species was present) and n − z (the number of samples in which the species was absent) are not too small. The normal approximation gives limits on the mean as plus or minus two standard errors. Exact limits can be obtained by using the relationship between a binomial sum and the incomplete beta function (Johnson and Kotz 1969), which allows equal-tailed intervals to be obtained from tables of the F distribution (Bliss 1967). For mallards, with a moderate value of z (= Σxᵢ = 79) as well as a large sample size (n = 130), all three procedures yielded very similar confidence limits (Table 1). Consider, however, the burrowing owl (Speotyto cunicularia), which was present on only 2 of the 130 sample quarter-sections. The normal approximation fails here because the number present was small, and the Bayesian confidence interval is shorter than the exact interval (Table 1). Although there are some non-Bayesian methods for reducing the size of confidence regions, e.g., Hudson (1971), they do not follow as naturally as in the Bayesian framework and there is no unified approach to using available prior knowledge.

Application to Estimating Population Density

In certain circumstances the probability limits derived for the frequency of occurrence can also be used to estimate the density (Dice 1952). Suppose the sample areas censused are small relative to the territory size of a species, so that the probability of more than one pair on an area is zero. Then the frequency of occurrence is identical to the average density per unit, and the total number of pairs is equal to the number of sample units containing a pair. Thus the method described above will also give probability limits for the population total.
Table 1. — Confidence limits for frequency of occurrence of breeding pairs of mallards and burrowing owls on 130 sample quarter-sections in North Dakota.

Method                              Mallard         Burrowing owl
Normal approximation to binomial    (.522, .693)    (-.006, .037)
Exact binomial                      (.519, .690)    ( .002, .054)
Bayesian                            (.523, .690)    ( .001, .042)

To equate density to frequency, we have to assume that Pr{more than one pair per sample unit} is negligible. For a territorial species, the validity of this assumption can be assured by reducing the size of the sample unit, although the resultant unit may be impracticably small. The assumption can be examined if a large number of units are censused by noting whether or not any unit contains more than one pair. If not, then Pr{more than one pair per sample unit} is not likely to be appreciable. Even if more than one pair can occur in a unit, the estimator will be only slightly affected as long as Pr{more than one pair per sample unit} is small and the probability of a large number of pairs on a sample unit is zero. For a colonial species this latter probability is nonzero, so the relationship between frequency of occurrence and density does not hold, and this method should not be attempted.

Modified Normal Limits for Population Density

If the counts are reasonably large and sufficiently numerous, we have some confidence that the central limit theorem applies. That is, we expect the average of the counts to be nearly normally distributed. We will indicate how inferences can be sharpened by using the slight prior knowledge available. Even before sampling takes place, we know that the count on any sample area will be non-negative. Since the normal distribution extends into negative values, it is more general than we need, although this becomes a problem only when the average is small relative to its standard error. Let θ be the mean and σ² the variance of the counts.
The usual (Jeffreys 1967) noninformative prior distribution for (θ, σ²) is

p(θ, σ²) = p(θ) p(σ²),

with

p(θ) = constant for all θ,

and

p(σ²) = constant/σ² for σ² > 0.

But for counts, we know θ ≥ 0, so we can take

p(θ) = constant for θ ≥ 0, and p(θ) = 0 for θ < 0.

As before, a (1 − α) HPD region (θ1, θ2) must satisfy

p(θ | x1, x2, …, xn) ≥ p(θ* | x1, x2, …, xn) for every θ within the interval and every θ* outside it,   (1)

and

∫ from θ1 to θ2 of p(θ | x1, x2, …, xn) dθ = 1 − α.   (2)

We will develop HPD regions for a binomial parameter and for a normal mean restricted to non-negative values.

HPD Regions for the Binomial Parameter

If {xᵢ} are distributed binomially with parameter θ, then the posterior distribution of θ corresponding to the noninformative prior specified earlier is given by

p(θ | x1, x2, …, xn) = constant · θ^(z−1/2) (1−θ)^(n−z−1/2),   (3)

where z = Σxᵢ. The constant term is such that

1 = ∫₀¹ p(θ | x1, x2, …, xn) dθ = constant · ∫₀¹ θ^(z−1/2) (1−θ)^(n−z−1/2) dθ = constant · B(z+1/2, n−z+1/2),

where B(a, b) is the beta function (e.g., Feller 1966). Thus, the required constant is the reciprocal of B(z+1/2, n−z+1/2).

A (1−α) HPD interval is given by (θ1, θ2) satisfying equations (1) and (2) above. For a unimodal distribution the first requirement is equivalent to

p(θ1 | x1, x2, …, xn) = p(θ2 | x1, x2, …, xn),

which, using (3) and noting that the constant occurs on both sides and thus cancels, becomes

θ1^(z−1/2) (1−θ1)^(n−z−1/2) = θ2^(z−1/2) (1−θ2)^(n−z−1/2),

which has the same solution as does

f1(θ1, θ2) = 0,   (4)

where

f1(θ1, θ2) = (z−1/2)(log θ1 − log θ2) + (n−z−1/2)(log(1−θ1) − log(1−θ2)).

It follows from (2) that

[1/B(z+1/2, n−z+1/2)] ∫ from θ1 to θ2 of θ^(z−1/2) (1−θ)^(n−z−1/2) dθ = 1 − α,

or

f2(θ1, θ2) = 0,   (5)

where

f2(θ1, θ2) = I_θ2(z+1/2, n−z+1/2) − I_θ1(z+1/2, n−z+1/2) − (1 − α),

and I_w(a, b) is the incomplete beta integral with parameters a and b evaluated at the point w (Beyer 1968). This function has been tabled (Pearson 1968), and computer routines are available for its calculation (International Business Machines Corporation 1970).
We thus have two equations, (4) and (5), that θ1 and θ2 must simultaneously satisfy. Because both equations are nonlinear, iterative techniques are necessary. Two such approaches have been successfully used, although either can fail under certain extreme conditions.

The first and more direct approach is to use a computer algorithm designed to solve a system of nonlinear equations. In the present instance the system consists of two equations, (4) and (5). Not all computer facilities have such routines, but Brown (1973) presented an algorithm and Fortran program which can be used for this purpose.

The second approach requires only computer routines that are commonly available. The equations (4) and (5) can be solved by minimizing the quantity

|f1(θ1, θ2)| + |f2(θ1, θ2)|

with respect to θ1 and θ2. The minimum is zero, and is attained for values of θ1 and θ2 such that equations (4) and (5) are satisfied. Numerous algorithms have been developed to find the minimum of a function. Most computing centers have one or more available.

HPD Regions for the Normal Mean

The approximate posterior distribution of the mean of non-negative random variables {xᵢ} is asymptotically

p(θ | x1, x2, …, xn) = constant · s⁻¹ exp[−n(θ−x̄)²/(2s²)] for θ ≥ 0, and 0 for θ < 0,

where x̄ is the sample mean and s² the sample variance. Let

A = Φ(−√n x̄/s),

Φ(·) being the cumulative normal distribution function. Because

∫₀^∞ p(θ | x1, x2, …, xn) dθ = 1,

the distribution for positive values of θ is 1/(1−A) times that of a normal variate with mean x̄ and variance s²/n, i.e.,

p(θ | x1, x2, …, xn) = (1−A)⁻¹ (2πs²/n)^(−1/2) exp[−n(θ−x̄)²/(2s²)] for θ ≥ 0.

As with a binomial variable, we want values θ1 and θ2 which satisfy equations (1) and (2). The form that the HPD interval takes will depend on whether or not A < α/2 holds.

Case I: A < α/2. Here the interval will be symmetric about x̄. Equation (2) implies

1 − α = ∫ from θ1 to θ2 of (1−A)⁻¹ (2πs²/n)^(−1/2) exp[−n(θ−x̄)²/(2s²)] dθ,

or, substituting W = √n(θ−x̄)/s,

(1−α)(1−A) = Φ[√n(θ2−x̄)/s] − Φ[√n(θ1−x̄)/s].
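The second, minimization-based approach to equations (4) and (5) can be sketched in Python. The starting values (taken from the normal approximation) and the choice of Nelder-Mead as the minimizer are illustrative assumptions of this sketch; SciPy is used for the incomplete beta integral and the minimization:

```python
from math import log, sqrt
from scipy.special import betainc          # regularized incomplete beta I_w(a, b)
from scipy.optimize import minimize

def hpd_by_minimization(z, n, alpha=0.05):
    """Find the binomial HPD interval by minimizing |f1| + |f2|,
    where f1 and f2 are equations (4) and (5) of the text."""
    a, b = z + 0.5, n - z + 0.5
    def f1(t1, t2):    # equation (4): equal posterior density at both endpoints
        return (z - 0.5) * (log(t1) - log(t2)) \
             + (n - z - 0.5) * (log(1 - t1) - log(1 - t2))
    def f2(t1, t2):    # equation (5): 1 - alpha of the posterior mass between them
        return betainc(a, b, t2) - betainc(a, b, t1) - (1 - alpha)
    def objective(t):
        t1, t2 = t
        if not 0.0 < t1 < t2 < 1.0:        # keep the search inside (0, 1)
            return 1e6
        return abs(f1(t1, t2)) + abs(f2(t1, t2))
    p = z / n
    se = sqrt(p * (1 - p) / n)             # normal-approximation starting interval
    res = minimize(objective, x0=[p - 2 * se, p + 2 * se], method="Nelder-Mead")
    return tuple(res.x)

print(hpd_by_minimization(79, 130))    # near the report's (0.523, 0.690)
```

As the text warns, such a search can fail in extreme cases (e.g., very small z), where the HPD interval abuts the boundary of the parameter space.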
Equation (1) requires

p(θ1 | x1, x2, …, xn) = p(θ2 | x1, x2, …, xn),

which implies symmetry about x̄, so (θ1, θ2) simply corresponds to an ordinary confidence interval with a confidence coefficient of (1−α)(1−A) rather than (1−α). For the mallard, having a mean of 1.68 and a standard error of 0.152, the calculated value of A = Φ(−11.05) is so near zero that it can be ignored; thus, the HPD region coincides with the ordinary confidence interval.

Case II: A > α/2. Here Equation (1) insures that the lower limit will be zero, so we need to exclude α of the probability content from the upper range. Equation (2) becomes simply

1 − α = (1−A)⁻¹ ∫₀^θ2 (2πs²/n)^(−1/2) exp[−n(θ−x̄)²/(2s²)] dθ,

so that

θ1 = 0 and θ2 = x̄ + (s/√n) Φ⁻¹[1 − α(1−A)].

Case II is exemplified by the long-billed marsh wren. A mean of 0.39 and standard error of 0.223 resulted in A = Φ(−0.39/0.223) = 0.040 > α/2. Hence, θ1 = 0.0 and θ2 = 0.39 + 0.223 Φ⁻¹(0.952) = 0.76.

Calculating HPD regions for the normal mean requires simply a routine for the normal probability function (to determine A) and one for the inverse normal probability function (to determine the confidence limits).
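As the text notes, only normal CDF and inverse-CDF routines are needed, both of which Python's standard library provides. The function name and interface below are this sketch's own, not the report's:

```python
from statistics import NormalDist

def truncated_normal_hpd(xbar, se, alpha=0.05):
    """HPD limits for a normal mean known to be non-negative.
    xbar is the sample mean; se is its standard error, s / sqrt(n)."""
    nd = NormalDist()
    A = nd.cdf(-xbar / se)    # mass an untruncated normal would put below zero
    if A < alpha / 2:
        # Case I: symmetric about xbar, with coefficient (1 - alpha)(1 - A).
        k = nd.inv_cdf((1 + (1 - alpha) * (1 - A)) / 2)
        return xbar - k * se, xbar + k * se
    # Case II: lower limit pinned at zero.
    return 0.0, xbar + se * nd.inv_cdf(1 - alpha * (1 - A))

# Long-billed marsh wren: mean 0.39, standard error 0.223 (Case II).
print(truncated_normal_hpd(0.39, 0.223))    # roughly (0.0, 0.76)
# Mallard: mean 1.68, standard error 0.152 (Case I; A is negligible,
# so this coincides with the ordinary two-sided interval).
print(truncated_normal_hpd(1.68, 0.152))
```

In Case I the symmetric interval must hold 1 − α of the truncated posterior, i.e. (1−α)(1−A) of the untruncated normal mass, which gives the inverse-CDF argument (1 + (1−α)(1−A))/2.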