TECHNICAL NOTE 331
U.S. DEPARTMENT OF THE INTERIOR - BUREAU OF LAND MANAGEMENT
Filing Code 5210
Date Issued March 1979
3P Sampling Theory, Examples, and Rationale
by L. R. Grosenbaugh
Additional copies of Technical Notes are available from DSC, Federal Center Building 50, Denver, Colo., 80225
FOREWORD
The purpose of this Technical Note is to provide insight into various designs used to sample populations. The Bureau of Land Management (BLM) has utilized 3P (an acronym for Probability Proportional to Prediction) as an operational tool for sampling timber for sale purposes for approximately fourteen years. This efficient sample design was devised by Mr. L. R. Grosenbaugh in 1963. 3P sampling has proven to be a very valuable component of the timber measurement methods available to BLM, but little has been available in print to explain the technique.
In this paper, printed with the author's permission, Mr. Grosenbaugh provides narrative and tabular examples of a wide variety of designs available for sampling populations. These examples graphically demonstrate why 3P is such an efficient sample method. This Tech Note will provide an understanding of 3P and other sample designs for populations.
ABOUT THE AUTHOR
LEWIS R. GROSENBAUGH retired in June 1974 as Chief Mensurationist of the U.S. Forest Service's first pioneering research unit, which he initiated in 1961 at Berkeley, California and moved to Atlanta, Georgia in 1968. He received his B.A. from Dartmouth College in 1934 and his M.F. from Yale University in 1936.
His professional career began on the Ouachita, Ozark, and Florida National Forests as junior forester, assistant ranger, and timber management officer before transferring in 1946 to the Southern Forest Experiment Station in New Orleans as silviculturist, mensurationist, and finally chief of the division of forest management research. During World War II he served as gunnery officer on two destroyers with the Pacific Fleet and retired as LCDR, USNR.
In addition to being a member of Phi Beta Kappa, Sigma Xi, the Biometrics Society, ACM, and AAAS, he is a fellow of the Society of American Foresters (SAF), and author of more than 40 scientific papers or articles in the Journal of Forestry, Forest Science, Biometrics, or Communications of the ACM. In 1965 he received the Barrington Moore Memorial Award of the national SAF for outstanding achievement in biological research contributing to the advancement of forestry, and served as Visiting Scientist at University of Montana, Washington State University, Oregon State University, and Colorado State University under auspices of NSF-SAF. In 1974, the Southeastern Section of the SAF presented him with the award of excellence for distinguished accomplishment in research and development in forestry in the southeast; in the same year he was invited to lecture and consult at VPI as Visiting Scholar. In 1976 the Mexican National Academy of Forest Sciences made him an honorary member.
Presently, he is a forestry consultant and an adjunct professor, School of Forest Resources and Conservation, University of Florida, Gainesville, Florida.
SPANISH CITATION: Grosenbaugh, L.R. 1977. Teoria, ejemplos y base racional del muestreo 3P. (in Memoria de la cuarta reunion nacional sobre inventario forestal continuo, pp. 137-155, SARH SFF Direccion General del Inventario Nacional Forestal Publicacion Numero 33, Mexico, D.F.).
ENGLISH TITLE: 3P Sampling Theory, Examples, and Rationale.
ENGLISH ABSTRACT: The distributions of all possible sample estimates of a specified 10-tree population sampled by each of a wide variety of designs are analyzed to obtain expected mean squared error and bias. Different designs may employ equal or unequal selection probabilities, sample replacement or no sample replacement, fixed number of samples or variable number of samples (and in the last case, N = 0 may require a zero estimate or else resampling). The designs requiring an a priori list include direct, ratio, and PPS samples, while the nonlist designs include several types of 3P samples. A 110-tree population is specified to illustrate the effect of using probabilities that are variously correlated with the variable of interest, and a single sample drawn from the same population (clustered in a specified manner) is used to illustrate point-3P sampling. The 110-tree population is again used in another example to exemplify the generally beneficial effects of frequency-balancing on adjusted 3P sample estimates.
1/ Originally published in Spanish in 1977 in "Proceedings of the (1975) 4th National Conference on CFI" by the Mexican Government. Reprinted in English with permission of the author and Mexican Government.
3P SAMPLING THEORY, EXAMPLES, AND RATIONALE
L. R. Grosenbaugh
This discussion briefly compares procedures and results of ratio and list sampling (fixed N, selection probabilities equal or unequal, sample replacement or no sample replacement) with procedures and results of 3P sampling (variable N, selection probabilities unequal, no replacement, unadjusted or adjusted by ratio of expected number of samples to actual number of samples).
Early discussion and figures 2 through 6 compare distributions of all possible outcomes for each of the above types of sampling, given fixed N = 2, variable N = 0 to 10 with expectation 1.752, variable N = 1 to 10 with expectation 2, and a hypothetical 11-tree population specified as in figure 1 (the largest tree of which is always measured so that the actually sampled population has M = 10). Basal area and volume are expressed in square or cubic meters, but DBH and height are coded to convenient small integers, with decoding factors as indicated. The advantages and disadvantages of each design are brought out, with emphasis on situations where 3P sampling is preferable (more efficient, more feasible, less biased).
In order to illustrate 2-stage point-3P sampling of a spatially clustered tree population, this hypothetical 11-tree population is expanded 10-fold (M = 110), with stipulation that it be spatially clustered on one hectare of land in such a way that 60 small trees are Poisson-distributed on one-third hectare, 40 medium trees are Poisson-distributed on another one-third hectare, and 10 large trees are Poisson-distributed on the final one-third hectare. All 110 trees in this population (including the largest) are always subjected to any sampling technique that is being illustrated.
Figure 1 describes the 11-tree population and the 110-tree population (if cluster parameters are multiplied by 10). Figure 7 illustrates the effect of different assignments of relative probability (KPI) on efficiency. Figure 8 summarizes and illustrates symbols and formulae appropriate to point-3P sampling. Figure 9 shows how adjusted 3P estimates and frequency-balanced 3P estimates largely eliminate the adverse effects of unrepresentative samples so apparent in unadjusted 3P estimates. Figure 10 summarizes and defines symbols and formulae appropriate to pure 3P sampling.
Figure 2 illustrates equiprobable random selection of a fixed number of sample trees (N = 2) with replacement from a list of 10 sample trees, the simplest form of sampling. Whether planning direct or ratio samples, the sequenced list of 10 trees shown in figure 2 and the 2 random integers in the range 1 - 10 will be needed (with replacement, the integers might be identical and select the same tree twice, but in the example they are 5 and 8, resulting in selection of trees 5 and 8). For direct sample estimates, the volume of each sample tree is measured, and the sum is multiplied by the number of trees in the population divided by the number of sample trees selected. The estimate is unbiased, but highly variable.
Ratio estimates select and measure the volume of the 2 sample trees in the same manner, but some other variable (coded D²H in this case) must either be known (listed) or measured for all 10 trees in the population. Then the sum of ratios of volume/coded D²H for the 2 sample trees is multiplied by the sum of coded D²H for all 10 trees divided by the number of sample trees selected. The estimate is much less variable than the direct estimate, but it is biased.
When a fixed number of sample trees (N = 2) is to be selected with replacement and with unequal probability, the relative probability to be used for each tree on the list must be available prior to the selection of any sample tree; figure 3 shows this using coded D²H as relative probability. These relative probabilities must be progressively accumulated for each tree in the listed population. Two random integers in the range 1 - 60 must be generated (the upper limit is the sum of relative probabilities for all 10 trees in the population).
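The direct and ratio calculations described for figure 2 can be sketched in a few lines of Python. This is only an illustration: the variable and function names are hypothetical, the per-tree values are taken from figures 1 and 2 (trees 1-6 small, trees 7-10 medium), and the volume of the always-measured 11th tree is added to each estimate, as the figure title (PLUS 1 SURE TREE) suggests.

```python
# Values for the 10 listed trees (figures 1 and 2): trees 1-6 are small,
# trees 7-10 are medium. The 11th (largest) tree is always measured.
CODED_D2H = [2] * 6 + [12] * 4
VOLUME_M3 = [0.06521] * 6 + [0.36840] * 4
SURE_TREE_M3 = 1.9216


def direct_estimate(sample_trees):
    """(M / N) * sum of sample volumes, plus the sure tree."""
    m, n = len(VOLUME_M3), len(sample_trees)
    measured = sum(VOLUME_M3[t - 1] for t in sample_trees)
    return (m / n) * measured + SURE_TREE_M3


def ratio_estimate(sample_trees):
    """(sum of coded D2H / N) * sum of volume/D2H ratios, plus the sure tree."""
    n = len(sample_trees)
    ratios = sum(VOLUME_M3[t - 1] / CODED_D2H[t - 1] for t in sample_trees)
    return (sum(CODED_D2H) / n) * ratios + SURE_TREE_M3


sample = [5, 8]  # random integers 5 and 8 select trees 5 and 8
print(f"direct: {direct_estimate(sample):.4f} m3")
print(f"ratio:  {ratio_estimate(sample):.4f} m3")
```

With trees 5 and 8 the direct estimate comes to about 4.09 m³ and the ratio estimate to about 3.82 m³, against a population volume of 3.7865 m³.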
The random integers in the example are 34 and 55, selecting trees 8 and 10, because the accumulation (36) at tree 8 is the first accumulation "greater than or equal to" (GE) random 34, and the accumulation (60) at tree 10 is the first accumulation GE random 55. With replacement, the same sample tree may be selected twice. Once the samples are selected, the formula for estimates derived from samples selected with unequal probability is exactly the same as for ratio sampling (but larger trees will be selected more frequently than in ratio sampling). When the relative probability corresponds to some physical dimension of the tree (D²H, D², etc.), this type of list sampling with unequal probability is commonly called "PPS" sampling (for "probability proportional to size"). The PPS estimate is unbiased and has less variability than the ratio estimate.
List sampling with no replacement is simple in the case of equiprobable direct or ratio sampling. Simply reject any random integer that selects the same sample tree more than once and generate another integer. The same unbiased estimating formula is valid for equiprobable direct list sampling with or without replacement. Similarly, the estimating formula for equiprobable ratio sampling without replacement is the same as for equiprobable ratio sampling with replacement, but both equiprobable ratio estimates contain the same bias. Equiprobable list sampling (direct or ratio) tends to have less error variance with no replacement than with replacement.
Unhappily, selection of a fixed number of samples from a list with unequal probability (PPS) with no replacement is much more complicated. The simple device of rejecting random numbers that select the same sample tree more than once cannot be used without injecting bias. However, if samples are appropriately selected, the unbiased estimating formula is the same whether PPS sampling is with replacement or no replacement.
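The cumulative-list selection for PPS sampling with replacement can be sketched as follows. The names are hypothetical and the coded D²H values are those of the 10-tree example; the search for the first accumulation GE the random integer is done here with Python's standard bisect module, which is merely one convenient way to do it.

```python
from bisect import bisect_left
from itertools import accumulate

CODED_D2H = [2] * 6 + [12] * 4
VOLUME_M3 = [0.06521] * 6 + [0.36840] * 4
SURE_TREE_M3 = 1.9216

# Progressive accumulation of relative probabilities: 2, 4, ..., 12, 24, 36, 48, 60
ACCUMULATION = list(accumulate(CODED_D2H))


def pps_select(random_integer):
    """Return the 1-based number of the first tree whose accumulation is GE the integer."""
    return bisect_left(ACCUMULATION, random_integer) + 1


def pps_estimate(sample_trees):
    """Same formula as the ratio estimate; only the selection probabilities differ."""
    n = len(sample_trees)
    ratios = sum(VOLUME_M3[t - 1] / CODED_D2H[t - 1] for t in sample_trees)
    return (sum(CODED_D2H) / n) * ratios + SURE_TREE_M3


sample = [pps_select(r) for r in (34, 55)]
print(sample)
print(f"{pps_estimate(sample):.4f} m3")
```

Random integers 34 and 55 select trees 8 and 10, as in the text.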
One unbiased procedure for systematic PPS selection of a fixed number of samples from a list with no replacement employs a random integer to select the first sample (as illustrated in figure 3) and then increments that integer by a constant fraction of the sum of relative probabilities (sum of coded D²H in the example). Additional selection criteria may be obtained by incrementing the last criterion, etc. The increment (a fixed interval between selection criteria) must be greater than or equal to the maximum relative probability of any tree in the population and less than or equal to the total relative probability minus that maximum individual probability. When an incremented criterion exceeds the total relative probability, that total must be subtracted from the criterion. It is usually satisfactory and convenient to establish the increment using (1/N) as the constant fraction. In the example in figure 3, this would have resulted in an increment of (1/2)*(60) = 30; random 34 + increment 30 = 64; 64 - 60 = criterion 4; 2nd tree accumulation 4 GE criterion 4, so that PPS sampling with no replacement would have selected the 8th and 2nd trees as compared with the 8th and 10th trees selected with replacement in the example.
Under favorable circumstances, there is usually a gain in efficiency from systematic PPS sampling with no replacement, but it is occasionally less efficient than PPS sampling with replacement. The list should be randomly rearranged and reaccumulated before selecting a new set of N samples in repeated trials (i.e., in simulation studies). The effect of choice of interval size depends on population parameters and arrangement; for practical purposes, the effect is not predictable.
Figure 4 summarizes the parametric distribution of all possible 2-tree samples, estimates, and mean squared errors for all 6 types of sample discussed thus far: direct, ratio, PPS, each with replacement and with no replacement.
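The systematic PPS procedure with no replacement described above might be sketched as below. The function name and its arguments are hypothetical, the increment is taken as (1/N) of the total relative probability as the text suggests, and the random rearrangement of the list between repeated trials is left out.

```python
from bisect import bisect_left
from itertools import accumulate


def systematic_pps(relative_probs, first_random, n):
    """Select n trees using one random integer plus a fixed increment."""
    accumulation = list(accumulate(relative_probs))
    total = accumulation[-1]
    increment = total / n  # (1/N) as the constant fraction
    largest = max(relative_probs)
    # The interval must lie between the largest individual probability
    # and the total minus that largest probability.
    if not largest <= increment <= total - largest:
        raise ValueError("increment outside the permissible range")
    selected = []
    criterion = first_random
    for _ in range(n):
        selected.append(bisect_left(accumulation, criterion) + 1)
        criterion += increment
        if criterion > total:
            criterion -= total  # wrap around the end of the list
    return selected


coded_d2h = [2] * 6 + [12] * 4
print(systematic_pps(coded_d2h, first_random=34, n=2))
```

Starting from random 34 with increment 30, the second criterion is 64 - 60 = 4, so the 8th and 2nd trees are selected.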
The same 10-tree population shown in figures 2 and 3 is used, with the 11th tree (the largest) always measured. The figures substantiate the fact that sampling with no replacement reduces error variance of equiprobable direct and ratio estimates by the factor (M-N)/(M-1), although in the case of the ratio estimates, the fact is almost obscured by the large squared bias present in both mean squared errors. The ratio sample (no replacement), despite presence of a large bias, has a much smaller mean squared error than the comparable direct sample. The list PPS sample with no replacement (systematic interval = 1/2 total probabilities) is inferior to the list PPS sample with replacement in this particular population. However, a systematic PPS interval equal to 1/5 total probabilities would have resulted in the MSE dropping from .0012 to .0008. An MSE of .0008 could have also been obtained by using the systematic PPS interval equal to 1/2 total probabilities if the population was arranged in ascending order of D²H and never rearranged prior to selection of subsequent sets of N samples. This emphasizes the inherent complexity of systematic PPS sampling with no replacement.
Now let us examine figure 5. The same 10 small and medium-sized trees are denoted here by numbers 1 through 10 in ascending order of size (although this is immaterial). Someone assigns a relative probability (KPI) to each tree. A KPI is a subjective prediction or guess of some tree dimension or variable of interest. In this case the prediction is coded D²H, but it could be guessed tree height, basal area, volume, value, etc. A tree will be selected as a sample if its prediction is greater than or equal to (GE) some random integer. Since any selection is independent of the number of samples previously selected, a varying number of samples (N) will be obtained (as opposed to the fixed number obtained by the 6 types of list sample summarized in figure 4).
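The mean squared errors discussed for figure 4 can be checked by enumerating every ordered pair of sample trees. The sketch below does this for the equiprobable direct and ratio designs (with and without replacement) and for PPS with replacement; systematic PPS with no replacement is omitted because its distribution depends on list arrangement. All names are hypothetical, and the sure tree is left out since it is measured without error and does not affect the MSE.

```python
from itertools import product

CODED_D2H = [2] * 6 + [12] * 4
VOLUME_M3 = [0.06521] * 6 + [0.36840] * 4
M, N = 10, 2
TRUE_TOTAL = sum(VOLUME_M3)
TOTAL_D2H = sum(CODED_D2H)


def direct(i, j):
    return (M / N) * (VOLUME_M3[i] + VOLUME_M3[j])


def ratio(i, j):
    return (TOTAL_D2H / N) * (VOLUME_M3[i] / CODED_D2H[i] + VOLUME_M3[j] / CODED_D2H[j])


def equal_with_replacement(i, j):
    return 1 / M**2


def equal_no_replacement(i, j):
    return 0.0 if i == j else 1 / (M * (M - 1))


def pps_with_replacement(i, j):
    return CODED_D2H[i] * CODED_D2H[j] / TOTAL_D2H**2


def mse(estimator, pair_probability):
    """Probability-weighted squared error over all ordered pairs of trees."""
    return sum(
        pair_probability(i, j) * (estimator(i, j) - TRUE_TOTAL) ** 2
        for i, j in product(range(M), repeat=2)
    )


print(f"direct, replacement:    {mse(direct, equal_with_replacement):.4f}")
print(f"direct, no replacement: {mse(direct, equal_no_replacement):.4f}")
print(f"ratio, replacement:     {mse(ratio, equal_with_replacement):.4f}")
print(f"ratio, no replacement:  {mse(ratio, equal_no_replacement):.4f}")
print(f"PPS, replacement:       {mse(ratio, pps_with_replacement):.4f}")
```

The direct-sample figures differ by the factor (M-N)/(M-1) = 8/9, as stated in the text.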
Furthermore, the relative probability assigned can be arbitrary or subjective, which lets us make predictions proportional to our interest in the tree rather than to the size of some physical dimension or listed magnitude of the tree (thus, we could assign any Swietenia a KPI 5 times as large as that assigned to a Pinus of identical size). The acronym "3P" (shortened form of PPP for "probability proportional to prediction") has been used to distinguish this type of sampling from list PPS sampling with fixed N.
There is, moreover, a frequently neglected and rather subtle difference between 3P and nonlist PPS sampling with fixed N: 3P sampling can select samples in an unbiased way without any a priori list and still allow subjective decision as to "next" tree and its KPI, but allowing subjective decision as to order and KPI will inject bias into PPS sample selection with fixed N unless the subjective elements are crystallized in a list prior to any selection. Systematic PPS sampling will be the most biased, since after each selection the sampler knows that no other can occur until the accumulation has been increased by aggregate KPI equal to the systematic interval. However, even in PPS sampling with fixed N and with replacement, the sampler's decision as to "next" tree and its KPI will be affected by his partial knowledge about the gap between accumulation and the next criterion: there is less chance of an impending selection following a closely spaced group of selections, and there is greater chance of an impending selection following a long run with no selections. These serious drawbacks rule out the use of nonlist PPS sampling with fixed N, since the advantages of subjective assessment of KPI are great, and subjective choice of the "next" tree is almost inevitable in complete visitation of variously clustered tree populations.
However, foresters are exploiting one very useful unbiased form of nonlist geometric cluster sampling (PPS with variable N) with which I assume you are all familiar: horizontal point-sampling with an angle-gauge. How 2-stage point-3P sampling combines geometric PPS selection with 3P selection (both with variable N) will be discussed later, since it makes possible the efficient use of dendrometry in populations too large for pure 3P sampling.
Briefly, the 3P sampling procedure requires that we first guess total KPI of the population and decide how many sample trees we desire. If we guessed total KPI was 60 and wished 2 sample trees, we would calculate KZ = 60/2 = 30, and would draw 10 random integers from the range 1 through 30. Each random integer would be paired with a KPI assigned to a tree. If the KPI were "greater than or equal to" (GE) the integer, the tree would become a sample tree. Thus, KZ specifies the largest random number to be generated, which must be larger than the largest KPI in the population to be sampled. Also, KZ specifies that (on the average) 1 sample tree is desired for every KZ units of KPI in the total population.
It might happen that we pair a random number with each of the 10 trees in the population and none are selected. If so, there are 2 choices: accept the outcome and estimate zero volume for the population, or specify drawing a completely new sample. Generally, we specify resampling when N = 0. For small expected numbers of sample trees (ESN), the probability of having to resample (called P0) may be appreciable. If we require N to be nonzero by such a specification, the expected nonzero number of sample trees (ENZSN) will be ESN/(1-P0), which will be larger than ESN. It should be noted that ENZSN and ESN are practically identical for all but very small populations and ratios of N/M, so that we may ordinarily assume ENZSN = ESN. However, in the present situation, to specify ENZSN = 2, we must specify ESN = 1.7517225 and KZ = 34.252.
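The relation among KZ, ESN, P0, and ENZSN can be verified numerically for the 10-tree example. The sketch below assumes that each tree is selected independently with probability KPI/KZ (which follows from pairing each KPI with a random integer in the range 1 through KZ), so that P0 is the product of the individual non-selection probabilities; the variable names are hypothetical.

```python
import math

KPI = [2] * 6 + [12] * 4  # coded D2H predictions for the 10 trees
KZ = 34.252

# Expected number of sample trees: one per KZ units of total KPI.
esn = sum(KPI) / KZ

# Probability that no tree at all is selected (N = 0), assuming
# independent selection of each tree with probability KPI / KZ.
p0 = math.prod(1 - k / KZ for k in KPI)

# Expected number of sample trees when N = 0 triggers resampling.
enzsn = esn / (1 - p0)

print(f"ESN   = {esn:.7f}")
print(f"P0    = {p0:.5f}")
print(f"ENZSN = {enzsn:.4f}")
```

Under these assumptions P0 is roughly .124, and ENZSN comes out at 2 to within rounding of KZ.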
The decimal following 34 merely means that the integer 35 should be generated only 25.2 percent as often as any one of the first 34 integers. An alternative would be to have this "reduced 35" flagged by the generator as a NULL (usually denoted -99999 and implying that any tree with which it is paired will not be selected). A small computer program (RN3P) can produce random integer lists for a wide range of integer KZ, and can easily be adapted to the noninteger KZ discussed above.
Using KZ = 34.252 for the reason explained above, we generate 10 random integers and trees 8 and 10 happen to be selected as samples. The volume (YI) of each is measured, and the sum of the two YI/KPI ratios is multiplied by KZ for the unadjusted estimate, which would estimate 0 volume if N were 0. An unadjusted estimate which requires resampling would simply be the previous estimate multiplied by (1-P0). Finally, an adjusted estimate can be made which multiplies the previous estimate by ENZSN/N, but is more easily calculated as (ΣKPI/N) * Σ(YI/KPI). This condenses the basic facts about how to use 3P sampling into a minimum; necessary symbols and formulae are summarized in figure 10. If you can visualize the process, you can soon acquire the necessary facility to design sampling plans that achieve your objectives.
3P gains considerable advantage from usually sampling with no replacement, as in figure 5 (although sampling with replacement is conceivable), and even more advantage from sampling with unequal probability, as in figure 5 (although equiprobable 3P results from specifying each KPI to be unity). Figure 6 summarizes the parametric distribution of 3P samples and estimates of cubic volume from N = 0 through N = 10 with ESN = 1.7517225, drawn from our specified 10-tree population plus 1 "sure" tree. Note that there are really 34 possible sample combinations (excluding N = 0) summarized in the 10 sample classes.
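A minimal sketch of 3P selection and of the three estimates follows. It is not RN3P: the generator shown is only one possible way to obtain integers 1 through 34 with equal frequency and a NULL in place of the "reduced 35", and all function names are hypothetical. The estimates cover the 10 sampled trees only; the sure tree (1.9216 m³) is added afterward, and applying (1-P0) to the sampled portion alone is an assumption made here.

```python
import math
import random

NULL = -99999
KPI = [2] * 6 + [12] * 4
VOLUME_M3 = [0.06521] * 6 + [0.36840] * 4
KZ = 34.252
SURE_TREE_M3 = 1.9216


def draw_integer(kz, rng):
    """Integers 1..floor(KZ) equally likely; the fractional remainder yields NULL."""
    u = rng.random() * kz
    return int(u) + 1 if u < math.floor(kz) else NULL


def select_3p(kpis, kz, rng):
    """Pair one random integer with each tree; select when KPI is GE the integer."""
    selected = []
    for tree, kpi in enumerate(kpis, start=1):
        r = draw_integer(kz, rng)
        if r != NULL and kpi >= r:
            selected.append(tree)
    return selected


def estimates_3p(sample, kpis, volumes, kz):
    """Return (unadjusted, unadjusted with resampling, adjusted) for the sampled trees."""
    p0 = math.prod(1 - k / kz for k in kpis)
    ratio_sum = sum(volumes[t - 1] / kpis[t - 1] for t in sample)
    unadjusted = kz * ratio_sum
    unadjusted_resample = (1 - p0) * unadjusted
    adjusted = (sum(kpis) / len(sample)) * ratio_sum if sample else None
    return unadjusted, unadjusted_resample, adjusted


print(select_3p(KPI, KZ, random.Random()))  # a fresh random sample; N varies
unadj, unadj_rs, adj = estimates_3p([8, 10], KPI, VOLUME_M3, KZ)
print(f"{unadj + SURE_TREE_M3:.4f} {unadj_rs + SURE_TREE_M3:.4f} {adj + SURE_TREE_M3:.4f}")
```

For trees 8 and 10 the unadjusted estimate is about 4.02 m³ including the sure tree, while the other two estimates are both about 3.76 m³, since N happens to equal ENZSN.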
If all 10 trees had had different KPI (instead of 6 twos and 4 twelves), there would have been 2^10 - 1 = 1023 different sample outcomes; this explains why we have used a ridiculously small population to illustrate the principles and procedures.
There are 3 different estimates that can be made from each 3P sample: unadjusted (don't resample if N = 0), unadjusted (resample if N = 0), and adjusted (resample if N = 0). The first unadjusted estimate is unbiased and the most variable; it uses only KZ and information obtained from the sample trees. The second unadjusted estimate is also unbiased and much less variable than the first when dealing with very small populations or very small ratios of ESN/M; otherwise there would be no appreciable difference between the first two estimates, since P0 would be nearly zero. The last adjusted estimate contains a trivial bias (usually undetectable even after thousands of simulations); it uses total KPI divided by actual number of samples (N) in place of KZ in the first estimate. Its mean squared error is lower than that for most of the other types of sampling discussed except for certain situations where list PPS sampling with or without replacement might be slightly superior if feasible (which is rarely the case).
Good approximations of relative variance (using formulae in figure 10) indicate that adjusted 3P sampling with no replacement should be more efficient than PPS sampling with replacement whenever ESN/(M-ESN) is greater than VN, where VN is the relative variance of N about ENZSN. This inequality is always satisfied when ESN² + ESN is greater than M. The very low mean squared error of the adjusted estimate with ENZSN = 2 in figure 6, translated to a percentage of the expected value of the 10-tree portion of the population, is 100 * sqrt(.0011)/1.8649 = ±1.8 percent, certainly a remarkably small relative standard error when it is considered that the average number of samples is only 2 trees from a population of 10 trees.
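Because only two distinct KPI values occur, the 34 nonzero sample combinations can be enumerated directly to check the mean squared error of the adjusted estimate. The sketch below is an illustration under the assumption of independent selection with probability KPI/KZ, conditioned on N being nonzero (resampling when N = 0); names are hypothetical and the sure tree is excluded.

```python
import math

KZ = 34.252
N_SMALL, KPI_SMALL, VOL_SMALL = 6, 2, 0.06521
N_MEDIUM, KPI_MEDIUM, VOL_MEDIUM = 4, 12, 0.36840
TOTAL_KPI = N_SMALL * KPI_SMALL + N_MEDIUM * KPI_MEDIUM
TRUE_TOTAL = N_SMALL * VOL_SMALL + N_MEDIUM * VOL_MEDIUM


def binomial(n, k, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


outcomes = []  # (probability, adjusted estimate) for each (a small, b medium) sample
for a in range(N_SMALL + 1):
    for b in range(N_MEDIUM + 1):
        if a + b == 0:
            continue  # N = 0 is resampled
        prob = binomial(N_SMALL, a, KPI_SMALL / KZ) * binomial(N_MEDIUM, b, KPI_MEDIUM / KZ)
        ratio_sum = a * VOL_SMALL / KPI_SMALL + b * VOL_MEDIUM / KPI_MEDIUM
        outcomes.append((prob, (TOTAL_KPI / (a + b)) * ratio_sum))

p_nonzero = sum(p for p, _ in outcomes)
expected = sum(p * e for p, e in outcomes) / p_nonzero
bias = expected - TRUE_TOTAL
mse = sum(p * (e - TRUE_TOTAL) ** 2 for p, e in outcomes) / p_nonzero

print(len(outcomes), f"bias={bias:.5f}", f"mse={mse:.5f}")
print(f"relative standard error: {100 * math.sqrt(mse) / TRUE_TOTAL:.1f} percent")
```

The enumeration gives a mean squared error of about .0011 and a relative standard error of about 1.8 percent.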
The squared bias is .00000441, which is less than 1/2% of the overall mean squared error. Thousands of simulations with smaller and larger populations and different 3P sample designs have substantiated that adjusted 3P bias is trivial and can be neglected for all practical purposes.
Figure 7 was designed to help answer the question of what would happen if our guesses at KPI are poor. For the first time now, we will use the 110-tree population which contains 10 times as many trees as the 11-tree population, and we will select the samples from the entire population (M = 110) instead of treating the large trees as sure-to-be-measured. As KPI, we will use unity (which is uncorrelated with tree volume), tree volume (which is perfectly correlated with tree volume), and 4 tree dimensions which are weakly or strongly correlated with tree volume (coded values of height, diameter, squared diameter, and our old familiar D²H). In each case a KZ has been computed which will select a number of sample trees with expected value of 11 (ESN = ENZSN = 11). The figure shows that the expected distribution of these 3P sample trees by size class differs drastically, ranging from 6 small, 4 medium, 1 big when KPI is always unity to 1 small, 4 medium, 6 big when KPI is coded D²H. The approximate standard errors range from ±45 percent where KPI = constant unity to ±2.3 percent where KPI is coded D²H to .0 percent where KPI is constantly proportional to tree volume.
The last 3 columns of figure 7 show some important parametric relative variances: A² is the relvariance of the ratio volume/KPI, each ratio for a given tree being deemed to recur KPI times; G² is the squared coefficient of variation of tree KPI, and where KPI = YI (the variable of interest), G² = C², the squared coefficient of variation for YI (volume, in this case); VN is the relvariance of N about ENZSN for a given KZ. Obviously, A² is the major component of the relvariance of each estimate.
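The expected distribution of sample trees by size class under the different KPI assignments can be computed directly from the class parameters in figure 1 (multiplied by 10). In this sketch KZ is taken as total KPI divided by ESN, following the definition of KZ given earlier; the dictionary keys and function name are hypothetical.

```python
# 110-tree population: (number of trees, coded DBH, coded height, volume in m3)
CLASSES = {
    "small": (60, 1, 2, 0.06521),
    "medium": (40, 2, 3, 0.36840),
    "big": (10, 3, 8, 1.92158),
}
ESN = 11

KPI_CHOICES = {
    "unity": lambda de, hf, vol: 1,
    "coded H": lambda de, hf, vol: hf,
    "coded D": lambda de, hf, vol: de,
    "coded D2": lambda de, hf, vol: de**2,
    "coded D2H": lambda de, hf, vol: de**2 * hf,
    "volume": lambda de, hf, vol: vol,
}


def expected_counts(kpi_of):
    """Return KZ and the expected number of 3P sample trees in each size class."""
    total_kpi = sum(n * kpi_of(de, hf, vol) for n, de, hf, vol in CLASSES.values())
    kz = total_kpi / ESN
    counts = {
        name: n * kpi_of(de, hf, vol) / kz
        for name, (n, de, hf, vol) in CLASSES.items()
    }
    return kz, counts


for label, kpi_of in KPI_CHOICES.items():
    kz, counts = expected_counts(kpi_of)
    summary = ", ".join(f"{name} {count:.2f}" for name, count in counts.items())
    print(f"{label:10s} KZ = {kz:8.3f}  {summary}")
```

With KPI = unity the expectation is 6 small, 4 medium, 1 big; with KPI = coded D²H it is 1 small, 4 medium, 6 big, and KZ = 120.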
The most important principles derivable from the foregoing discussion and figures follow. There is little justification for equiprobable ratio sampling now that PPS and 3P sampling are available. Sampling with unequal probability will always be preferable to equiprobable sampling where the correlation between YI and relative probability (KPI in the case of 3P sampling) is greater than (1/2)*(G/C) and the cost of measuring a sample is nearly independent of YI, as in the case of rangefinder dendrometry. If, additionally, a complete list with desired probabilities is available prior to sample selection and trees selected in the office can be located in the woods through adequate monumentation or coordinates, PPS list sampling using D²H with fixed N and with no replacement may be slightly more efficient than adjusted 3P sampling in a few situations. In most other cases, adjusted 3P sampling will be preferable because PPS sampling with fixed N would be either biased (no a priori list) or infeasible (impossibility of relocating trees selected from a priori list). Adjusted 3P estimates have a negligible bias but are far more efficient than unadjusted 3P estimates where total KPI becomes known a posteriori. The more consistently KPI is assigned proportional to YI, the smaller the sampling error becomes. Bias in KPI does not inject bias into the estimate, but fluctuation in the ratio YI/KPI does inflate sampling variance.
Let us consider how 3P sampling may be combined with some earlier stage (or stages) of geometric cluster sampling to avoid visiting every tree in the population. Figure 8 postulates that 3 point-samples are randomly located on a hectare with 110 trees clustered as specified in figure 1. A 360° sweep is made at each point with an angle-gauge having a metric Basal Area Factor (BAF) = .547244 m² of basal area per hectare per gauge-selected tree. Each gauge-selected tree is assigned KPI = coded height (HF).
Coded height of each gauge-selected tree is compared with a random integer generated by KZ = 12. Trees both point-selected and 3P-selected are dendrometered for use in an adjusted point-3P estimate. Figure 8 shows that N1 = 31 trees were selected at 3 points and gives their guessed heights; all trees at point 1 are also in size-class I, etc., by definition. Coded sum H at the 3 points was 12, 48, 72, and with KZ = 12, expectation of 3P selection proportional to H would be 1 small, 4 medium, 6 big trees. Figure 8 shows that N2 = 11 trees (distributed 1, 4, 6) were indeed 3P-selected from the 31 point-selected trees (using coded H for KPI). The ratio of volume (YI) to (Basal Area * Coded Height) is shown for each class of selected tree. The sum of the 11 ratios is 17.2980. Beneath the table is the formula for computing the adjusted point-3P estimate. Such a sample is very efficient for permanent inventory samples (C.F.I. or growth, harvest, and mortality samples). The variance of a point-3P estimate depends largely on the variance of sum H among points, as is indicated by the approximate variance formula shown. The parametric variance (calculable only because cluster parameters are specified in a particularly simple manner) shows that the approximation is a good one.
Principles outlined above may be readily extended to cover more complicated designs: for instance, numerous equiprobable photo-points at each of which a photo-guess of sum D²H or volume per hectare is made, followed by PPS selection of a subset of these points at which a second-stage point-sample (on the ground) estimates sum H per point (proportional to sum D²H per hectare), supplemented by a phase using a completely independent set of ground-points that obtain point-3P dendrometry (as in figure 8) for estimating the mean volume/D²H ratio. A simpler design lacking the dendrometry phase has proved very efficient.
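The numbers quoted for figure 8 can be retraced from the cluster parameters. The sketch below first computes the tally expected at a point inside each cluster (stand basal area per hectare divided by BAF), then the 3P expectation with KZ = 12. The last step is an assumption, since the estimating formula beneath figure 8 is not reproduced in this text: it takes the adjusted point-3P estimate of volume per hectare to be BAF * (sum of KPI over point-selected trees / number of points) * (mean of the dendrometered YI/(BA*H) ratios). All names are hypothetical.

```python
BAF = 0.547244               # m2 of basal area per hectare per gauge-selected tree
BA_PER_CODED_D2 = 0.01824147  # m2 of basal area per unit of coded D squared (figure 1)
KZ = 12
CLUSTER_AREA_HA = 1 / 3

# One point per cluster: (trees in cluster, coded DBH, coded height, YI / (BA * coded height))
CLUSTERS = [(60, 1, 2, 1.7874), (40, 2, 3, 1.6830), (10, 3, 8, 1.4631)]

tallies, sum_h, expected_3p = [], [], []
for trees, de, hf, _ in CLUSTERS:
    ba_per_hectare = trees * BA_PER_CODED_D2 * de**2 / CLUSTER_AREA_HA
    tally = ba_per_hectare / BAF       # expected gauge-selected trees at the point
    tallies.append(tally)
    sum_h.append(tally * hf)           # coded sum H (sum of KPI) at the point
    expected_3p.append(tally * hf / KZ)

n1 = sum(tallies)                      # point-selected trees
n2 = sum(expected_3p)                  # 3P-selected trees
ratio_sum = sum(e * c[3] for e, c in zip(expected_3p, CLUSTERS))

# Assumed form of the adjusted point-3P estimate (see lead-in).
volume_per_hectare = BAF * (sum(sum_h) / len(CLUSTERS)) * (ratio_sum / n2)

print([round(t) for t in tallies], [round(h) for h in sum_h], [round(e) for e in expected_3p])
print(f"ratio sum = {ratio_sum:.4f}, volume = {volume_per_hectare:.3f} m3 per hectare")
```

Under these assumptions the tallies are 6, 16, and 9 trees (N1 = 31), the ratio sum is 17.298, and the estimate is close to the 37.865 m³ actually present on the hectare, as would be expected from a sample whose composition matches expectation exactly.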
The sampler goes through the woods rapidly, guessing sum D²H or volume per hectare at a large number of points. Each guess is immediately compared with an appropriate integer, and only a small number of these points will be 3P-selected to require a rigorous point-sample that provides an unbiased estimate of sum D²H or volume-table volume per hectare. The sum of guesses constitutes sum KPI, and the ratio of point-selected sum H to KPI at each point provides the YI/KPI ratios. This 3P technique costs far less than pure point-sampling for a specified precision.
There is one more aspect of 3P estimating that I would like to demonstrate without going into the mathematical procedure (which involves the generalized unique inverse of a matrix formed by a system of 3 or more constraints on the 3P estimate). We call the feature frequency-balancing, because it results in adjusting the frequencies represented by each sample tree in the adjusted 3P estimate so that their sum exactly equals population M, while the estimates of total volume and KPI remain unchanged.
Figure 9 illustrates frequency-balancing. We assume KPI = D²H and use KZ = 120. According to figure 7, expectation is 11 sample trees with 1 small, 4 medium, 6 big trees. However, in the illustration 13 sample trees were obtained, with 2 small, 3 medium, 8 big. This atypical sample resulted in a rather poor unadjusted estimate, which was considerably improved by adjustment. However, when frequency-balancing was invoked, the estimate of total frequency (M) was made perfect, while estimates of total surface and length were greatly improved, and the estimate of basal area deteriorated only slightly. This balancing is an optional feature in computer program STX, which handles a wide variety of dendrometers and sample designs. Those interested in the many powerful features of STX, RN3P, and USMT should refer to U.S. Forest Service Research Paper SE-117, published in 1974.
I have probably crowded far more into this talk than can be grasped initially, so I will not bore you with all the 3P symbology and formulae, which can be studied at leisure in figure 10. Let me only say this: the approximate formula for the relative variance of an adjusted 3P sample estimate is a very good one, despite some literature which indicates that no suitable parametric or sample-based approximation for adjusted 3P sampling error could be found. These papers had serious mistakes in mathematics and computer programming, and neither their numerical results nor their conclusions are valid.
Three factors in the formula for adjusted 3P relvariance may frequently be close to unity. [1 + VN] approaches 1 when ESN is greater than 100 (and thus disappears the only demonstrable advantage of list PPS sampling with fixed N). [1 - P0] approaches unity when P0 approaches zero, which is usually the case in practical applications where M is large and ESN exceeds 10. Finally, [1 - ESN/M] approaches unity for sampling fractions usually found in most practical applications. Where these 3 factors are close to unity, the relvariance of an adjusted 3P sample is A²/ESN, which is much smaller than C²/N appropriate to equiprobability sampling with fixed N, and much smaller than the corresponding unadjusted 3P relvariance (1 + A²)/ESN - (1 + C²)/M, the unbiased expectation of unadjusted 3P relvariance initially discovered by the author in 1965.
I hope I have given you enough facts and explanation to interest you in the many opportunities for improved sampling efficiency now open to us with 3P sampling and related designs using unequal probability. In some few cases cumulative list sampling (PPS) might be slightly preferable, but beware of the implicit selection biases in cumulative techniques where no a priori list is available.
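The simplified relvariance expressions quoted above can be evaluated for the 110-tree population. The sketch computes A² as the relvariance of YI/KPI with each tree's ratio weighted by its KPI, and C² as the squared coefficient of variation of volume, following the definitions given for figure 7. It uses only the simplified form A²/ESN (all three factors taken as unity), so the results are rough approximations, not the full formula of figure 10; names are hypothetical.

```python
import math

# 110-tree population: (number of trees, coded DBH, coded height, volume in m3)
CLASSES = [(60, 1, 2, 0.06521), (40, 2, 3, 0.36840), (10, 3, 8, 1.92158)]
M = sum(n for n, *_ in CLASSES)
ESN = 11


def weighted_relvariance(values, weights):
    """Weighted variance divided by the squared weighted mean."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return variance / mean**2


def a_squared(kpi_of):
    """Relvariance of volume/KPI, each tree's ratio recurring KPI times."""
    ratios = [vol / kpi_of(de, hf, vol) for _, de, hf, vol in CLASSES]
    weights = [n * kpi_of(de, hf, vol) for n, de, hf, vol in CLASSES]
    return weighted_relvariance(ratios, weights)


C2 = weighted_relvariance([vol for *_, vol in CLASSES], [n for n, *_ in CLASSES])


def adjusted_relvariance(a2):
    return a2 / ESN  # simplified form, all three factors taken as unity


def unadjusted_relvariance(a2):
    return (1 + a2) / ESN - (1 + C2) / M


for label, kpi_of in [
    ("unity", lambda de, hf, vol: 1),
    ("coded D2H", lambda de, hf, vol: de**2 * hf),
    ("volume", lambda de, hf, vol: vol),
]:
    a2 = a_squared(kpi_of)
    se_adj = 100 * math.sqrt(adjusted_relvariance(a2))
    se_unadj = 100 * math.sqrt(unadjusted_relvariance(a2))
    print(f"{label:10s} A2 = {a2:.5f}  adjusted {se_adj:5.1f}%  unadjusted {se_unadj:5.1f}%")
```

The simplified adjusted form gives roughly 45 percent for KPI = unity, a little over 2 percent for KPI = coded D²H, and zero for KPI = volume, in line with the standard errors quoted for figure 7.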
Figure 1
PARAMETERS FOR INDIVIDUAL TREES AND DISTRIBUTION OF AN 11-TREE POPULATION BY SIZE CLASS (EXPAND TO 110-TREE SPATIALLY CLUSTERED POPULATION AS SPECIFIED IN FOOTNOTE)

Individual Tree Parameters

Tree    DE =        HF =          YI = Cubic    (DE)2*(HF)   (BA)*(HF)    Volume/Prob Ratios
Class   Coded DBH   Coded Height  Volume (m3)   = D2H                     YI/D2H     YI/(BA*HF)
I       1           2             .06521        2            .036483      .032605    1.7874
II      2           3             .36840        12           .218898      .030700    1.6830
III     3           8             1.92158       72           1.313386     .026689    1.4631

Metric Factors for Decoding (DE), (DE)2, (HF), (DE)2*(HF)
.1524      * (DE)       = D in meters
.02322576  * (DE)2      = D2 in m2
.01824147  * (DE)2      = BA in m2
2.4384     * HF         = H in m
.05663369  * (DE)2*HF   = D2H in m3

Tree Class or Cluster Aggregates (M=11)*

Tree Class   Number   Sum   Sum     Sum   Sum       Sum         Sum
or Cluster   Trees    DE    (DE)2   HF    YI (m3)   (DE)2*HF    (BA*HF)
I            6        6     6       12    .3913     12          .21890
II           4        8     16      12    1.4736    48          .87559
III          1        3     9       8     1.9216    72          1.31339
Totals       11       17    31      32    3.7865    132         2.40788

*N.B. To specify spatially clustered distribution of 110-tree population on 1 hectare of land, multiply 11-tree cluster aggregates shown above by 10 and stipulate that each of 3 clusters be randomly distributed (Poisson spatial distribution) over 1/3 hectare with no intermixture of tree classes.

Figure 2
EXAMPLE OF SELECTION PROCESS AND CALCULATION OF ESTIMATES FOR LIST SAMPLING (N=2) FROM TREE POPULATION (M=10) PLUS 1 SURE TREE (EQUIPROBABLE DIRECT, EQUIPROBABLE RATIO, BOTH WITH REPLACEMENT)

[The unit-by-unit selection example (unit, cumulative probabilities, random number, YI measured volume, coded D2H, YI/coded D2H) is not legible in this copy. The distribution of estimates by sample composition follows.]

Sample        Equiprobable Direct          Equiprobable Ratio           PPS (D2H)
Composition   Prob   Vol.est.   MSE        Prob   Vol.est.   MSE        Prob   Vol.est.   MSE
(I, I)        .36    2.5737     1.4709     .36    3.8780     .0084      .04    3.8780     .0084
(I, II)       .48    4.0896     .0919      .48    3.8208     .0012      .32    3.8208     .0012
(II, II)      .16    5.6056     3.3091     .16    3.7636     .0005      .64    3.7636     .0005

Total probs                  = 1.00      1.00      1.00
Expected volume est. (m3)    = 3.7865    3.8322    3.7865
Population volume (m3)       = 3.7865    3.7865    3.7865
Bias                         = .0        +.0457    .0
Parametric mean squared error = 1.1031   .0037     .0010

LIST SAMPLING WITH NO REPLACEMENT

Sample        Equiprobable Direct Samples    Equiprobable Ratio Samples     PPS (D2H) Samples*
Composition   Prob     Vol.est.   MSE        Prob     Vol.est.   MSE        Prob     Vol.est.   MSE
(I, I)        .33333   2.5737     1.4709     .33333   3.8780     .0084      .05714   3.8780     .0084
(I, II)       .53334   4.0896     .0919      .53334   3.8208     .0012      .28572   3.8208     .0012
(II, II)      .13333   5.6056     3.3091     .13333   3.7636     .0005      .65714   3.7636     .0005

Total probs                  = 1.00000   1.00000   1.00000
Expected volume est. (m3)    = 3.7865    3.8322    3.7865
Population volume (m3)       = 3.7865    3.7865    3.7865
Bias                         = .0        +.0457    .0
Parametric mean squared error = .9805    .0035     .0012

*N.B. The parametric distribution of samples shown above for PPS with no replacement presumes use of systematic increment equal to 1/2 total probabilities, with list randomly rearranged prior to selection of each set of N samples. See text for some other selection possibilities.

Figure 5
EXAMPLE OF SELECTION PROCESS AND CALCULATION OF ESTIMATES FOR 3P SAMPLING WITH NO REPLACEMENT FROM TREE POPULATION (M=10) PLUS 1 SURE (ENZSN=2, ESN=1.7517225, SUM KPI=60, KZ=34.252)

[The tree-by-tree listing (KPI, random integer from 1 through KZ, YI/KPI) is not fully legible in this copy. Two trees were selected (N=2), each with YI/KPI = .03070. Totals: sum KPI = 60, sum YI/KPI = .06140.]

-9999 represents KZ = 34.252; it occurs only 25.2% as frequently as 35 would occur if a rectangular random integer distribution ranging from 1 through 35 were generated.

Unadjusted 3P estimate of total volume = KZ * Sum(YI/KPI) + SURE (volume estimate = 0 if N=0)
    34.252 * .06140 + SURE
    2.1031 + 1.9216 = 4.0247 m3

Unadjusted 3P estimate of total volume = (1 - P0) * (1st unadjusted estimate) + SURE (resample if N=0)
    (1 - .12415) * 2.1031 + SURE
    1.8420 + 1.9216 = 3.7636 m3

Adjusted 3P estimate of total volume = (Sum KPI/N) * Sum(YI/KPI) + SURE (always resample if N=0, but multiply 2nd estimate above by ENZSN/N, or more simply use formula shown without calculating either unadjusted estimate)
    (60/2) * .06140 + SURE
    1.8420 + 1.9216 = 3.7636 m3

Figure 6
PARAMETRIC DISTRIBUTION OF SAMPLES AND ESTIMATES FOR 3P SAMPLING WITH NO REPLACEMENT FROM TREE POPULATION (M=10) PLUS 1 SURE (ENZSN=2, ESN=1.7517225, SUM KPI=60, KZ=34.252)

Number of                     Unadjusted 3P       Unadjusted 3P       Adjusted 3P
Samples                       Volume est.         Volume est.         Volume est.
Obtained      Probabilities   (Estimate=0 if N=0) (Resample if N=0)   (Resample if N=0)
0 (+sure)     .12415          1.9216              Resample            Resample
1 (  "  )     .31400          2.9827              2.8510              3.7804
2 (  "  )     .32344          4.0477              3.7838              3.7838
3 (  "  )     .17453          5.1185              4.7216              3.7883
4 (  "  )     .05328          6.1988              5.6678              3.7947
5 (  "  )     .00949          7.2909              6.6243              3.8027
6 (  "  )     .00103          8.3915              7.5882              3.8105
7 (  "  )     .00007          9.4969              8.5564              3.8173
8 (  "  )     -               10.6047             9.5267              3.8229
9 (  "  )     -               11.7230             10.5061             3.8293
10 (  "  )    -               12.8286             11.4745             3.8322

Total probs                   = 1.00000
Expected volume estimate (m3) = 3.7865    3.7865    3.7844
Population volume (m3)        = 3.7865    3.7865    3.7865
Bias                          = .0        .0        -.0021
Parametric mean squared error = 1.1070    .8004     .0011

Figure 7
EFFECT OF DIFFERENT SPECIFICATION OF KPI ON DISTRIBUTION OF EXPECTED NUMBER OF 3P SAMPLES (ESN=11), ON SAMPLING ERROR (%) OF ADJUSTED 3P ESTIMATES OF VOLUME, AND ON CERTAIN RELVARIANCES, GIVEN TREE POPULATION (M=110)

Columns: Tree KPI; Appropriate KZ for ESN = 11; Expected Distribution of Sample Trees by Class (I, II, III); Approximate Standard Error of Adjusted 3P Estimate of Volume; Relvariances (A2, C2, 1 + VN).

[Table entries are not legible in this copy, apart from the first row: KPI = unity, KZ = 10.]

See figure 10 for definitions of A2, C2, and VN.

Figure 8
EXPECTED DISTRIBUTION OF POINT-3P SAMPLE TREES SELECTED IN 2 STAGES, CALCULATION OF POINT-3P VOLUME ESTIMATE AND ITS APPROXIMATE AND PARAMETRIC RELVARIANCES.
1ST STAGE IS HORIZONTAL ANGLE-GAUGE PPS-SELECTION (BA) OF TREES AT 3 RANDOM POINTS (GAUGE METRIC BASAL AREA FACTOR=.547244)
2ND STAGE IS 3P-SELECTION FROM GAUGE SELECTIONS (KPI=HF=CODED HEIGHT, KZ=12)

[The listing of point-selected samples and point-3P-selected samples is not legible in this copy. Point-3P volume estimate: 37.865 m3.]

Approximate Relvariance of Adjusted Point-3P Volume Estimate =
    Relvariance of mean Sum HF per point (= .1570)
    plus Relvariance of mean Sum (YI/(BA*HF)) per point (= .0033)
    plus 2 * Relcovariance of above means (= -.0589)

Equal weights apply to Sum HF relvariance calculations. Weights [illegible] and 1.000 apply to the other 2 calculations (they are reciprocals of (1 - P0) appropriate to each point).

Parametric Relvariance of Adjusted Point-3P Volume Estimate =
    Volume variance among clusters + Volume variance within clusters = .08643 + .01329

Approximate Relative Standard Error = 100 * sqrt(Relvariance) = 21.8%
Parametric Relative Standard Error = 100 * sqrt(Relvariance) = 31.6%

Figure 9
COMPARISON OF THREE 3P ESTIMATES (UNADJUSTED, ADJUSTED, FREQUENCY-BALANCED) RESULTING FROM ATYPICAL SAMPLE (N=13, ESN=11) DISTRIBUTED 2, 3, 8 INSTEAD OF 1, 4, 6, GIVEN TREE POPULATION (M=110)

Estimated Total Volume    Approximate Standard Error Estimates