STATE OF ILLINOIS
DEPARTMENT OF REGISTRATION AND EDUCATION
NATURAL HISTORY SURVEY DIVISION

The Use of Factor Analysis in Modeling Natural Communities of Plants and Animals

Robert W. Poole

Illinois Natural History Survey Biological Notes No. 72
Urbana, Illinois
February, 1971

THE PROBLEM OF MODELING

Communities of plants or animals can be studied either by observing the characteristics of the community as a whole or by determining the interactions among and within individual species. At the community level most attention has been focused on descriptive community analysis, species diversity, and energy flow. At the one- and two-species levels some aspects of the problem that have been, and are being, intensively studied are population demography, predation, competition, parasitism, and spatial distribution. These basic interactions have been reasonably well described, and they have been integrated in the modeling of spruce budworm populations in Canada (Morris 1963). However, even this one-species model is very complex and requires the determination of a large number of parameters.

It is just as conceivable to go from the community level to the individual species as from the species to the community. The purpose of this paper is to explore this approach using a statistical technique known as factor analysis. Factor analysis is a statistical technique for picking out the underlying factors causing the variance in a set of variables.

Factor analysis originated in the psychological sciences but is now also being used in the biological sciences.
Its first uses in biology were by Goodall (1954) and Sokal & Hunter (1955), and it has since been used extensively in numerical taxonomy (e.g. Sokal & Sneath 1963; Schnell 1970), in the delimiting of the natural associations of plants (e.g. Dagnelie 1965), and in palaeoecology by Reyment (1963).

Factor analysis, primarily the form known as principal components analysis, has been used in biology for the most part as a classification technique, although there have been some attempts to make associations between environmental variables and species using correlation coefficients in a factor-analysis framework (e.g. Dagnelie 1965). Factor analysis was originally developed to estimate and define the factors causing the observed responses in a series of variables and is here used in this sense rather than as a classification technique.

This paper is divided into three parts. The first gives a brief review of basic ecological principles necessary for the following two sections. The second section describes the statistical procedures considered and the analysis of a specific example. The third section considers the assumptions of the factor analytic model and compares them to the initial ecological generalities to see if the model really does mirror the workings of the community or if it only produces a set of mathematically correct but ecologically meaningless numbers.

I have tried to emphasize the implications of the assumptions underlying the factor analysis and deemphasize the mathematics. Many university computing centers have the programs used in this paper, and interested persons can find the mathematics underlying the technique in Harman (1967).

This paper is published by authority of the State of Illinois, IRS Ch. 127, Par. 58.12. Dr. Robert W. Poole is Assistant Taxonomist, Section of Faunistic Surveys and Insect Identification, Illinois Natural History Survey, Urbana.
I wish to express my appreciation to those persons who have either read the manuscript or helped with the analysis of the example used in this paper: Dr. George F. Kawash of the Department of Psychology, University of Illinois; Dr. K. W. Dickman of the SOUPAC office of the Department of Computer Science, University of Illinois; Dr. Robert H. Whittaker of the Division of Biological Sciences, Cornell University; Dr. Richard B. Root of the Department of Entomology, Cornell University; Mrs. Kathleen Eickwort of the Department of Entomology, Cornell University; and my wife Beverly. I also wish to thank Dr. Philip W. Smith and O. F. Glissendorf of the Illinois Natural History Survey for their editorial contributions to the paper.

FACTORS AND SPECIES POPULATIONS

A population of an animal rarely stays at a constant level; usually it is either increasing or decreasing. Whether and how much a population increases or decreases depends on the environmental factors controlling the limits of that population. If conditions are favorable, the population increases; if they are not favorable, it decreases. A species population can be affected by several factors, and the factors may be interacting among themselves. This basic relationship is diagrammed in Fig. 1. Not all of the factors are of equal importance to the species population, one factor usually being more important than the others. If the effect of a factor on a population depends on the density of the population, it is referred to as a density-dependent factor, and if it does not depend on density, it is referred to as a density-independent factor.

Fig. 1. Diagrammatic representation of the influence of three factors on one species.

In a community of two or more species, a factor influencing one species may also influence other species in the community. The effect of this common factor may vary from species to species, being more important for one than for another.
At the same time a species may be influenced by a factor or set of factors which affect it only. These will be referred to as specific factors. This relationship is diagrammed in Fig. 2. Even in this relatively simple community with uncorrelated factors, the complexity is evident.

Fig. 2. Diagrammatic representation of the influence of three specific and four common factors on three species.

If two species share a common factor or factors, the changes in their populations will be correlated. For example, if two species are both limited by rainfall and rainfall is increased, both populations will increase. However, if one species is only slightly dependent on rainfall and the other strongly so, the changes will be disproportionate and the correlation less. As a simple general rule, the correlations among a group of species making up a community are determined by the species' mutual association with a group of common factors.

In essence, factor analysis takes a matrix of correlation coefficients among a set of variables and reduces it to a series of mathematical common factors that account for the correlations among the variables.

TECHNIQUES

The procedures carried out in this paper are calculation of the correlation matrix, estimation of communalities (the amount of variance caused by factors common to other species), factoring of the matrix using the principal axis method, rotation to a specified hypothesis (transformation of the numbers to other biologically meaningful numbers), calculation of factor scores, and the formulation of the so-called specification equations for each species to serve as a model of the community. If all of the above steps except the factoring of the calculated correlation matrix are skipped, the result is a form of factor analysis normally referred to as principal components analysis.
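The rule that species sharing a common factor have correlated populations can be illustrated with a minimal sketch. It assumes standardized scores and uncorrelated common factors, in which case the correlation implied between two species is the sum of the products of their loadings on the shared factors. The function name `implied_correlations` and the loadings are hypothetical, chosen only to mimic the rainfall example; they are not taken from any data in this paper.

```python
import numpy as np

def implied_correlations(loadings):
    """Correlations implied by uncorrelated common factors.

    loadings: (n_species, n_factors) array of hypothetical loadings.
    Off-diagonal entry (i, j) is the sum over factors of
    loading[i, f] * loading[j, f]; the diagonal is set to 1.
    """
    a = np.asarray(loadings, dtype=float)
    r = a @ a.T
    np.fill_diagonal(r, 1.0)
    return r

# Hypothetical loadings on a single common factor (say, rainfall).
both_strong = implied_correlations([[0.9], [0.9]])
one_weak = implied_correlations([[0.9], [0.2]])

print(round(both_strong[0, 1], 2))  # both species strongly dependent
print(round(one_weak[0, 1], 2))     # one species only slightly dependent
```

Under these assumptions the pair that depends strongly on the shared factor shows the higher implied correlation, which is the pattern described for the rainfall example.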
Principal Components Analysis

Mathematically, factor analysis resolves a correlation matrix (a covariance matrix can also be used in some cases) into an n × k factor matrix where the number of factors, k, is usually smaller than n, the number of variables (in this case species). This factor matrix has the characteristic that when multiplied by its transpose (rows and columns interchanged) it restores the original correlation matrix. In matrix notation

    R = V_0 V_0'

where R is the correlation matrix, V_0 the factor matrix, and V_0' the transposed factor matrix. Basically the problem is to resolve the correlation matrix into its latent roots and vectors (also referred to as eigenvalues and eigenvectors).

Principal components analysis assumes that all of the variance of each species can be accounted for by a set of factors common to all of the other species in the community and lumps variance due to specific factors and error factors in with the common factors. In the actual computation, the loadings (weights) of the first factor on each species (the latent vector or eigenvector) are calculated in such a way as to remove the maximum amount of variance from the matrix as can be explained by one factor. The effects of this factor are then subtracted from the correlation matrix. A second factor is then calculated from this reduced correlation matrix, and so forth until the reduced correlation matrix consists of essentially all zeros.

These calculations have been carried out on data given by Hunter (1966). Hunter measured the species populations of Drosophila at several sites in Colombia, principally near Bogota. I have analyzed her data for "Pine Woods," a government-protected pine forest near Bogota. The census was carried out from September, 1961 to December, 1963 (28 months) by sweeping a net over bait.
In Hunter's table for Pine Woods, the figures for each month are lumped, and the abundance of each species expressed as a percentage of the total Drosophila community. In cases where a species frequency was less than 1 percent, it is listed only as present. In my analysis when a species is listed as "present," I have considered it to be absent because individuals of that species made up less than 1 percent of the total. Of the 11 species listed by Hunter, I have analyzed only 10 because the 11th, "dreyfusi 22," was very rare.

Table 1. Correlations among the frequencies of 10 species of Drosophila over a period of 28 months at Pine Woods (near Bogota, Colombia).

                           1     2     3     4     5     6     7     8     9    10
 1 melanogaster         1.00
 2 pseudoobscura         .37  1.00
 3 bandeirantorum        .34   .85  1.00
 4 "tripunctata 20"        ?   .46   .52  1.00
 5 hydei                -.08  -.09   .02  -.00  1.00
 6 immigrans             .09   .24     ?     ?     ?  1.00
 7 viracochi             .11     ?  -.00     ?   .38     ?  1.00
 8 mesophragmatica         ?     ?  -.80  -.50  -.09  -.56     ?  1.00
 9 brncici               .43   .45   .61   .28  -.09   .05     ?  -.41  1.00
10 gasici                  ?     ?  -.18     ?  -.01   .58   .01  -.26     ?  1.00

The correlation matrix was calculated (Table 1) using the Pearson product-moment correlation coefficient and factored using the principal axis method. The resulting factors are shown in Table 2, which also shows that the first three factors account for about 73 percent of the variance in the correlation matrix. The total number of factors extracted by the principal axis method cannot exceed the number of variables.

Table 2. Calculated factors from the principal components analysis.

Factor   Variance   Percent variance   Cumulative percent
   1      3.7921        37.9214             37.9214
   2      1.9637        19.6365             57.5579
   3      1.5131        15.1310             72.6889
   4      0.9162         9.1621             81.8510
   5      0.7103         7.1034             88.9544
   6      0.5365         5.3646             94.3189
   7      0.2941         2.9414             97.2603
   8      0.1728         1.7282             98.9885
   9      0.0971         0.9712             99.9596
  10      0.0040         0.0403            100.0000
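The percentage columns of Table 2 follow directly from the eigenvalues, and a short sketch can make the arithmetic explicit. It assumes that, with 1's in the diagonal, the total variance equals the number of variables (10), and it reads the first eigenvalue as 3.7921, the value implied by its listed 37.9214 percent. The variable names are illustrative only.

```python
import numpy as np

# Eigenvalues (the "Variance" column of Table 2). The first value is read as
# 3.7921, which is what its listed 37.9214 percent of 10 variables implies.
eigenvalues = np.array([3.7921, 1.9637, 1.5131, 0.9162, 0.7103,
                        0.5365, 0.2941, 0.1728, 0.0971, 0.0040])
n_variables = 10  # with 1's in the diagonal, total variance = number of variables

percent = 100.0 * eigenvalues / n_variables
cumulative = np.cumsum(percent)

# Print factor number, eigenvalue, percent variance, cumulative percent.
for k, (ev, p, c) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"{k:2d}  {ev:7.4f}  {p:8.4f}  {c:9.4f}")
```

The third cumulative value comes out near 72.69, the "about 73 percent" quoted for the first three factors.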
Each of the calculated factors is affected in part by the inclusion of error variance and variance due to specific rather than common factors. Therefore the factors become more and more trivial and unreliable as the factoring proceeds, so that the factors calculated after the first few have no real meaning. A commonly used breaking point in factoring is when the eigenvalue of the factor falls below 1.00 (listed under variance in Table 2). Using this criterion, the first three factors are significant. The factor loadings of each factor on the 10 species are given in Table 3. Factor loadings are a type of correlation between a factor and a variable, or more specifically, the weight of each factor in accounting for the variance of a given variable. In other words if factor 1 had a loading of .47 on a given variable, and factor 2 a loading of .03, factor 1 would be more important to the variable than would factor 2.

Table 3. Computed factor loadings from the principal components analysis on the 10 species of Drosophila.

                      Factor 1   Factor 2   Factor 3
melanogaster           -.4750     -.3843     -.0687
pseudoobscura          -.8581        ?        .0020
bandeirantorum         -.9002     -.2406        ?
"tripunctata 20"       -.6886      .4540     -.1654
hydei                   .0789     -.0303      .7648
immigrans              -.5498      .6897     -.0449
viracochi               .1171      .0108      .8662
mesophragmatica         .9091        ?       -.3291
brncici                -.6233     -.4481     -.1364
gasici                 -.0914      .8906      .0226

Rotation

The set of factors arrived at in the preceding section and the loadings of the factors on the variables are only one of an essentially infinite number of possibilities. In other words there is an infinite number of factor matrices that when multiplied by their transpose will restore the original correlation matrix. The factors as they come out of the principal axis method are orthogonal to each other (uncorrelated). These calculated factors do not necessarily correspond in any way with the real attributes of the environment controlling the fluctuations of the species populations.
One of this infinite array of answers is the correct one, however, and the problem is to find it.

The variables can be plotted on each factor as has been done in Fig. 3 for factors 1 and 2. Factor 3 could be included and the variables would then be in a three-dimensional space. The addition of a fourth factor would place them in hyperspace. Any of the possible solutions to the problem can be arrived at by rotating these axes (factors) and reading off the new factor loadings on each variable. This is an oversimplified explanation of rotation and a more complete account can be found in Cattell (1965) and Harman (1967).

In the principal axis method the first factor is calculated to account for as much of the variance in the correlation matrix as possible. The method attempts to have this factor loaded as heavily as possible with all of the variables. It is possible that a factor such as temperature would influence all of the species strongly, and in this case the calculated factor as it comes from the analysis would accurately reflect the actual environmental factor. However, it is also possible that a factor may be important to only two or three species and relatively unimportant to the others in a community. In this second case the factors coming from the principal axis analysis would not fit the real situation and must be rotated to a position where they do. The above situation is satisfied by rotation to what is known as simple structure.

The factors coming from the principal axis analysis are orthogonal to each other, but very often, probably usually, the factors operating on the species are correlated with each other. By rotating to simple structure, the factors are allowed to be correlated with each other. Mathematically, rotation to simple structure attempts to correlate a factor with the smallest number of variables possible. In other words each factor should affect only a few variables.
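The statement that many different factor matrices restore the same correlation matrix can be checked numerically for the simplest case, a rigid (orthogonal) rotation of two axes. The sketch below uses the factor 1 and factor 2 loadings of four species from Table 3; the helper `rotate` and the 30-degree angle are illustrative assumptions, and the oblique rotation to simple structure used in this paper is not what is shown here.

```python
import numpy as np

# Loadings on factors 1 and 2 for four species, copied from Table 3.
V = np.array([
    [-0.4750, -0.3843],   # melanogaster
    [-0.6886,  0.4540],   # "tripunctata 20"
    [-0.5498,  0.6897],   # immigrans
    [-0.0914,  0.8906],   # gasici
])

def rotate(loadings, degrees):
    """Rigidly rotate two orthogonal factor axes by the given angle."""
    t = np.radians(degrees)
    T = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return loadings @ T

V_rot = rotate(V, 30.0)  # 30 degrees is an arbitrary illustrative angle

# The loadings change, but the matrix product with the transpose does not.
reproduced_before = V @ V.T
reproduced_after = V_rot @ V_rot.T
print(np.allclose(reproduced_before, reproduced_after))  # True
```

Any other angle gives a different set of loadings with the same reproduced correlations, which is why the choice among rotations has to come from ecological reasoning rather than from the arithmetic.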
Fig. 3. Loadings of the 10 species of Drosophila on factors 1 and 2.

In rotation, the original factor matrix (V_0) is multiplied by a transformation matrix (T) giving a new matrix referred to as the reference vector matrix (V_{rs})

    V_{rs} = V_0 T

The reference vector matrix does not give the new loadings of the factors on the variables for reasons discussed by Cattell (1965). To calculate the new factor loadings, a new matrix termed the factor-pattern matrix is calculated as

    V_{fp} = V_{rs} D^{-1}

where D is the diagonal matrix of the reciprocal square roots of the diagonal elements of the inverted matrix of the reference-vector correlations. The reference correlations are computed by multiplying the transformation matrix by its transpose

    C_{rs} = T' T

where C_{rs} is the matrix of correlations between the reference vectors, T the transformation matrix, and T' the transpose of the transformation matrix.

Several mechanical programs are available for rotation to simple structure. The program Oblimax (Pinzka & Saunders 1954) was found to give the most reasonable answers in this case and has been used in this analysis. The factor-pattern matrix after rotation to simple structure is shown in Table 4. A comparison of Tables 3 and 4 shows few significant changes because of rotation to simple structure, using the Oblimax program (the signs have been changed in factor 2).

Table 4. New factor loadings (the factor-pattern matrix) after rotation to simple structure using the Oblimax program on the 10 species of Drosophila.

                      Factor 1   Factor 2   Factor 3
melanogaster              ?        .2648     -.0889
pseudoobscura          -.8942      .0453     -.0339
bandeirantorum         -.9589      .0280      .0925
"tripunctata 20"       -.4056     -.6187     -.2004
hydei                     ?        .0744      .7833
immigrans              -.2098     -.8148     -.0725
viracochi              -.0988      .0462      .8881
mesophragmatica         .8618      .3365     -.2956
brncici                   ?        .2903     -.1641
gasici                  .2786     -.9021      .0152

The new factors produced by rotation to simple structure are not necessarily orthogonal and may be correlated (oblique).
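The three matrix relations given for rotation can be strung together in a few lines. The sketch below is an illustration under stated assumptions, not the Oblimax program: it takes the factor-pattern relation to read V_fp = V_rs D^{-1}, normalizes the columns of T so that T'T is a correlation matrix, and uses a hypothetical unrotated matrix `V0_demo` and transformation `T_demo` that are not from this paper. How a program such as Oblimax chooses T is not shown.

```python
import numpy as np

def oblique_pattern(V0, T):
    """Reference structure, reference correlations, and factor pattern.

    V0: (n_variables, n_factors) unrotated factor matrix.
    T:  (n_factors, n_factors) transformation matrix; its columns are
        normalized to unit length here so that T'T has a unit diagonal.
    """
    V0 = np.asarray(V0, dtype=float)
    T = np.asarray(T, dtype=float)
    T = T / np.linalg.norm(T, axis=0)
    V_rs = V0 @ T                       # reference vector matrix
    C_rs = T.T @ T                      # correlations among reference vectors
    # D: reciprocal square roots of the diagonal of the inverse of C_rs.
    D = np.diag(1.0 / np.sqrt(np.diag(np.linalg.inv(C_rs))))
    V_fp = V_rs @ np.linalg.inv(D)      # factor-pattern matrix (assumed form)
    return V_rs, C_rs, V_fp

# Hypothetical inputs for illustration only.
V0_demo = np.array([[0.8, 0.1], [0.7, 0.3], [0.2, 0.9], [0.1, 0.6]])
T_demo = np.array([[1.0, 0.3], [0.0, 1.0]])

V_rs, C_rs, V_fp = oblique_pattern(V0_demo, T_demo)
print(np.round(C_rs, 3))
print(np.round(V_fp, 3))
```

With an identity transformation the pattern matrix equals the unrotated matrix; with an oblique one, each column of the pattern matrix is a constant multiple (at least 1 in absolute value) of the matching reference-vector column.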
The correlation matrix of these three factors is given in Table 5.

Table 5. Correlations among the factors after rotation to simple structure using the Oblimax program.

            Factor 1   Factor 2   Factor 3
Factor 1     1.0000     -.1875     -.1987
Factor 2     -.1875     1.0000      .0469
Factor 3     -.1987      .0469     1.0000

Communalities

In the principal components analysis 1's are entered in the diagonal of the correlation matrix because the correlation of a variable with itself is 1. In factoring the matrix this presumes that all of the variance of a species can be accounted for by factors common to other species. However, a species is normally affected not only by common factors, but by factors specific to it, and also error factors.

The variance of a species (\sigma_j^2) is equal to the variance explicable by common factors (\sigma_{cj}^2) plus the variance of the species due to specific factors (\sigma_{sj}^2) plus an error term (\sigma_{ej}^2),

    \sigma_j^2 = \sigma_{cj}^2 + \sigma_{sj}^2 + \sigma_{ej}^2

The term \sigma_{cj}^2 is usually referred to as a variable's communality.

To remove the variance of a species due to specific factors and error terms, communalities for each species must be calculated and substituted for the diagonal elements of the correlation matrix. Unfortunately there are many different techniques used to estimate communalities and none of them is "the best." Also, the subject of communalities is a controversial one.

In a practical sense, with large initial matrices the effect of not calculating communalities on the estimates of the factors is minimal and becomes less and less important for larger and larger matrices. The calculated communalities are important, however, in estimating the reliability of the predictive equations presented later.

Communalities in the factor analysis carried out in the following pages were calculated by replacing the diagonal entry for each row by

    r^*_{ix} (S_i - r^*_{ix}) / (S_x - r^*_{ix})

where r^*_{ix} = maximum absolute r_{ij}; S_i = \sum absolute r_{ij}; S_x = \sum absolute r_{xj}.

The calculated communalities are listed in Table 6. Other techniques of communality estimation were tried: (1) replacing the diagonal entry of a variable by the square of the multiple R of each variable with all other variables, and (2) replacing the diagonal entry of a row by the square root of the average r^2 across the row. The estimated communalities using these two methods are also given in Table 6. The Varimax-rotation program (Kaiser 1958) also gives iterative solutions for the communalities. The calculated communalities using this iterative technique are also given in Table 6.

Table 6. Estimated communalities of the 10 species of Drosophila using the following methods: 1) r^*_{ix} (S_i - r^*_{ix}) / (S_x - r^*_{ix}), 2) square of multiple R, 3) square root of average r^2, 4) iterative.

                         1        2        3        4
melanogaster           .2919    .8454    .4075    .3780
pseudoobscura          .8338    .9825    .5499    .7999
bandeirantorum         .8638    .9097    .5708    .8845
"tripunctata 20"       .7150    .7567    .4940    .1077
hydei                  .2333    .5736    .3484    .5921
immigrans              .8314      ?      .4669    .7801
viracochi              .6267    .9399    .3671    .7641
mesophragmatica        .9345    .9932    .5795      ?
brncici                .4282    .6633    .4592    .6079
gasici                 .4036    .9522    .4113    .8021

Factor Identification

The purpose of the analysis is to arrive, mathematically, at a set of factors corresponding to the real factors in the environment that cause changes in populations of the species in the community. This problem has been partially discussed under rotation. There it was shown that factors calculated by the principal axis analysis do not necessarily correspond to any real factors. To make these factors useful, the factor vectors must be rotated in hyperspace to a position where they do correspond to real parts of the environment. The problem of identification can be broken into two stages: (1) rotation of computed factors to where they correlate heavily with real factors of the environment, and (2) the identification of the environmental factors. I will discuss the second stage first.

A set of factors has been calculated that explain part of the variation in the population of a species.
However, to be useful these factors must correspond to real parts of the environment that can be identified. Basically we want to know that factor 1 is so highly correlated with rainfall that rainfall, for practical purposes, can be taken as factor 1. Often a person knows a priori, or suspects, that species "a" is heavily influenced by some factor such as maximum temperature. Therefore if this species has a heavy loading on one of the factors derived from the factor analysis, it is a good indication that this factor is either maximum temperature or is, in some way, closely correlated with maximum temperature. It is also possible, if measurements of maximum temperature are available, to include maximum temperature in the data matrices as another variable. If maximum temperature as a variable loads heavily with one factor and little with other factors, it is likely that this factor is in some way related to maximum temperature. Determining the identity of every significant factor is not easy and depends on extensive field work. However, factor analysis indicates how many significant factors to look for, and the weightings of these factors on every species in the community. Even if a factor is interpreted incorrectly as maximum temperature, the use of maximum temperature measurements for that factor may still give correct predictive answers, a procedure not very scientific but pragmatically important. It must be emphasized that the mathematical factors never exactly correspond to the environmental factors, but they may be so heavily loaded on the environmental factors that measurements of the environmental factors can be used as approximations to the mathematical factors.

The other problem in identification is the rotation of the factors derived from the principal axis method analysis to some position where they correspond to real parts of the environment.
If the factors are not rotated, the hypothesis is that the factors tend to influence all of the variables; however, if rotated to simple structure, it is assumed that the real factors tend to influence significantly only a few of the variables. In a real situation neither hypothesis may be the correct one. For example, if in a community of insects rainfall was important to all of the species, but at the same time each species was restricted in its choice of food plants, there would be one factor influencing all of the species, and several other factors that influenced only a few variables each. This situation clearly does not fit the hypothesis behind the factors as they come from the principal axis analysis or after rotation of the factors to simple structure.

It is also possible to rotate the factors to fit a specific hypothesis, but because it is not possible to formulate a specific hypothesis for the example used in this paper, this rotation has not been done. The most difficult problem connected with this type of community analysis should now be apparent. To rotate the calculated factors to a position where they represent real factors of the environment, a correct hypothesis of the type of factors involved and the relative numbers of each (such as two factors influencing all of the species and three factors influencing only two or three species) is needed. The problem is which stage is to be carried out first: the identification of factors or the rotation of the calculated factors to fit actual factors in the environment. Each is partially dependent on the other. As a working technique it should be possible, by extensive field work and experimentation, to formulate a rough hypothesis as to the percentage of significant factors that will influence a limited number of the species.
For example, it might be known that rainfall influences a certain number of species, and there is reason to believe that it is important to virtually all the species in the community. On the other hand, it might be known that most species in the community tend to be limited in their selection of food plants. Given four significant factors, a rough hypothesis might be that one factor influences all of the species, and three others influence only a few of them. From the set of calculated factors, the first (the one accounting for the most variance) is likely to be factor 1 of the hypothesis, with the other three factors being fitted to the groups of species that they load most heavily with. The factors could then be rotated to fit the rough hypothesis, and the hypothesis could possibly be reformulated as a result of the rotation.

Every possible rotation of the factor vectors is, of course, an approximation to the real situation. Some of the approximations will be good, others not so good. A question of practical importance is whether or not the answers derived from each rotation are much different from each other. The answer to this question will only come through use of the factor-analysis technique. In the example used in this paper, the differences between the factor loadings of the orthogonal factors and the factors rotated to simple structure are slight. It has usually been found in psychology that the changes in factor loadings by rotation to simple structure are slight (Kawash, personal communication). Simple structure rotation tends to rotate out small error factors and is used more for that reason than for the hypothesis it represents (Cattell 1965). Even if the factors are not correctly rotated, the approximation may still be close.

Computational Procedure

Having carried out a principal components analysis of the data and having partially explained the problems of rotation and communalities, the complete factor analysis will now be carried out.
In the following section the predictive equations are formulated and the possible usefulness of the technique is discussed.

As discussed in the preceding section, a possible aid in the identification of the factors is to place measurements of presumed factors into the correlation matrix as variables and then note if any of the calculated factors load heavily on them. Hunter (1966) gave data for rainfall, mean maximum temperature, and mean minimum temperature for each month of her study. She assumed that rainfall was one of the most important factors, pointing out that its effect probably acted upon the larvae, or perhaps initiated egg laying in the adults. Hunter stated that the average time for development from egg to adult is about 2 months. Because these three environmental measurements are more likely to be important to the larvae that later give rise to the adults than to the adults directly, the three measurements have been entered as variables with the species with a 2-month lead.

The correlation matrix of the 13 variables was computed. Rainfall was not significantly correlated with any of the other 12 variables.

Table 7. Correlations among the 10 species of Drosophila and 3 environmental variables, with calculated communalities in the diagonal.

                         1    2    3    4    5    6    7    8    9   10   11   12   13
 1 melanogaster        .30
 2 pseudoobscura         ?  .90
 3 bandeirantorum        ?    ?    ?
 4 "tripunctata 20"      ?    ?    ?    ?
 5 hydei                 ?    ?    ?    ?    ?
 6 immigrans           .09  .24    ?    ?    ?    ?
 7 viracochi             ?    ?    ?    ?    ?    ?  .58
 8 mesophragmatica       ?    ?    ?    ?    ?    ?    ?  .97
 9 brncici             .43  .45    ?  .28    ?  .05    ?    ?  .44
10 gasici                ?    ?    ?    ?    ?    ?    ?    ?    ?    ?
11 min. temperature      ?    ?    ?  .41    ?    ?    ?    ?    ?    ?    ?
12 max. temperature      ?    ?    ?    ?    ?    ?    ?    ?    ?    ?    ?    ?
13 rainfall              ? -.23 -.09  .09 -.10  .01 -.21  .29    ?    ? -.06  .15  .07
Communalities were estimated, as described earlier in this paper, and the estimated communalities were entered in the diagonal of the correlation matrix. The new correlation matrix, with all 13 variables and the estimated communalities in the diagonal, is given in Table 7.

This correlation matrix with estimated communalities was then factored by the principal axis method. The eigenvalues of the first three factors were 3.60, 2.28, and 1.38, about the same as in the principal components analysis. The second eigenvalue is slightly higher, and the first and third eigenvalues are slightly smaller, than in the principal components analysis (Table 2). There is, however, a significant drop from the eigenvalue of the third factor (1.38) to the fourth (.61). Using an associated eigenvalue of 1.00 as a criterion of significance, the loadings of the first three factors were computed and are listed in Table 8. Comparison of Tables 8 and 3 shows few major changes in the first two factors but several in factor 3. None of the three environmental variables is heavily loaded on any of the three factors, although minimum temperature has a moderately heavy loading on factor 2, suggesting that factor 2 may in some way be associated with minimum temperature. Rainfall is lightly loaded on all three factors, and it is thus unlikely that it has any association with the three factors.

Table 8. Factor loadings of the first three factors on the 10 species of Drosophila and 3 environmental variables.

                      Factor 1   Factor 2   Factor 3
melanogaster            .4001     -.2122      .2205
pseudoobscura           .8586     -.3278      .0661
bandeirantorum          .8675        ?        .0106
"tripunctata 20"        .6489      .3999      .0612
hydei                  -.0495     -.0746     -.3590
immigrans               .5451      .6116     -.0763
viracochi              -.0852        ?       -.7811
mesophragmatica        -.9466     -.0052      .3344
brncici                 .5231        ?        .2746
gasici                  .1217      .7909        ?
min. temperature        .1066      .7497     -.0947
max. temperature        .0181      .4004      .5398
rainfall               -.1726      .1044      .2062

The factor matrix was then rotated to simple structure using the Oblimax method, and the resulting reference-vector structure matrix is given in Table 9. The calculated factor-pattern matrix is given in Table 10.

Table 9. Calculated reference-vector structure matrix after rotation to simple structure using the Oblimax program.

                      Reference   Reference   Reference
                      Vector 1    Vector 2    Vector 3
melanogaster            .3869       .1672       .1778
pseudoobscura           .8813       .1214       .0121
bandeirantorum          .8917       .0705      -.0342
"tripunctata 20"        .4832      -.5052       .1608
hydei                   .0509      -.0243      -.3676
immigrans               .3563         ?         .0703
viracochi               .1471      -.0075      -.8113
mesophragmatica        -.9562       .3252       .3045
brncici                 .5194       .2502       .2109
gasici                    ?        -.8286      -.0488
min. temperature       -.0889      -.7476       .0727
max. temperature       -.2096         ?         .6141
rainfall               -.2346       .0042       .2201

Table 10. Calculated factor-pattern matrix after rotation to simple structure using the Oblimax program.

                      Factor 1   Factor 2   Factor 3
melanogaster            .3994      .1678      .1841
pseudoobscura           .9097      .1218      .0125
bandeirantorum          .9205      .0708     -.0354
"tripunctata 20"        .4987     -.5068      .1664
hydei                   .0525     -.0244     -.3806
immigrans               .3677     -.7196      .0727
viracochi               .1519     -.0075     -.8398
mesophragmatica        -.9870      .3263      .3153
brncici                 .5361      .2510      .2183
gasici                 -.0596     -.8312     -.0505
min. temperature       -.0918     -.7499      .0753
max. temperature       -.2164     -.2180      .6357
rainfall               -.2422      .0042      .2279

Predictive Equations

The next stage is the formulation of what are known as specification equations. These equations specify the weights to be given to each factor in accounting for the score (observed measurement of some kind) of each variable. The specification equation can be written in a general form as

    V_{ji} = s_{j1} F_{1i} + s_{j2} F_{2i} + \cdots + s_{jk} F_{ki} + s_j F_{ji} + s_{je} F_{ei}

as given by Cattell (1965).
If there are k observations, the score on a variable on one of these observations is equal to the sum of the scores of the factors (the F terms) influencing the variable, each modified by the significance or weight of that factor to the variable (the s terms). These factors include a series of common factors, any specific factors there may be, and an error factor. The specification equations will be the basic predictive equations. In the example analyzed in this paper there are 10 species measured at 28 observations, giving a total of 280 specification equations. To formulate the set of equations for all species in the community, it is necessary to calculate first the factor-score matrix ($F_p$) and secondly the factor-pattern matrix ($V_p$), which gives the necessary values of the s terms.

The factor-score matrix is computed by multiplying the reference-vector structure matrix by the basic diagonal of the original correlation matrix. In computation this step was done by inverting the correlation matrix, multiplying that by the matrix of standard scores for the variables standardized by rows, and multiplying the resulting matrix by the reference-vector structure matrix ($V_{rs}$), or

$$F_p = V_{rs}\,\delta$$

where $F_p$ is the factor-score matrix, $V_{rs}$ the reference-vector structure matrix, and $\delta$ the basic diagonal of the correlation matrix. The resulting factor-score matrix for the 28 observations is given in Table 11. The factor scores are the standard scores for the factors calculated for a particular rotation. If the factors have been rotated to where they correspond to real parts of the environment, the factor-score matrix gives estimated standard scores for the environmental factors. If the rotation is not the correct one, the numbers are only numbers that will reproduce the scores on the variables. It is, of course, impossible to use them predictively if they are not real.
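The computational route described above (invert the correlation matrix, multiply by the standard scores, multiply by the structure matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in this paper: the data are synthetic random numbers, the loadings matrix is hypothetical, observations are stored in rows and variables in columns, and the function name `factor_scores` is introduced here only for illustration.

```python
import numpy as np


def factor_scores(raw, structure):
    """Estimate factor scores as Z @ R^{-1} @ V.

    raw       : (observations x variables) array of measurements
    structure : (variables x factors) structure matrix (hypothetical here)
    """
    # Standardize each variable (column) to mean 0 and unit variance.
    z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
    # Correlation matrix of the variables.
    r = np.corrcoef(z, rowvar=False)
    # R^{-1} @ V, computed with a linear solve rather than an explicit inverse.
    weights = np.linalg.solve(r, structure)
    # One row of estimated factor scores per observation.
    return z @ weights


# Synthetic stand-in data: 28 observations of 5 variables (assumption).
rng = np.random.default_rng(0)
raw = rng.normal(size=(28, 5))
# Hypothetical structure matrix for 2 factors (assumption).
structure = rng.uniform(-1.0, 1.0, size=(5, 2))

scores = factor_scores(raw, structure)
print(scores.shape)
```

Because the standard scores have zero mean for every variable, the estimated factor scores also have zero mean for every factor, which is a quick sanity check on any implementation of this step.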
Having calculated the factor-score matrix and the factor-pattern matrix, it is now possible to estimate the value of a variable on any observation. As an example, the standard score of Drosophila pseudoobscura at observation 4 (December, 1961) equals the sum of the factor scores as weighted by the factor loadings for that period, plus specific factor scores, plus an error term. In other words

$$\text{D. pseudoobscura}_{(4)} = (.9097)(.7821) + (.1218)(-.1245) + (.0126)(-.7125) + \text{specific factors}_{(4)} + \text{error factors}_{(4)}$$

$$\text{D. pseudoobscura}_{(4)} = .6873 + \text{specific factors}_{(4)} + \text{error factors}_{(4)}$$

All scores are in standard form.

Table 11. Calculated factor score matrix for the 28 observations from the Oblimax rotation to simple structure.

Observation      Factor 1      Factor 2      Factor 3
     1             .4840        1.0297        -.5129
     2             .5836         .9663       -1.0332
     3            1.7794     [illegible]      -.6943
     4             .7821        -.1245        -.7125
     5            -.4793        -.0247         .8224
     6            -.1264         .3034        2.3076
     7            -.3645         .8503        1.6247
     8            1.1017         .4221         .7145
     9            1.2421         .0689        -.0601
    10            3.2278       -1.3368       -1.9288
    11            1.1410       -1.0098        -.6190
    12            1.1349         .3455         .5760
    13             .8159        -.0459        -.4046
    14             .9906        -.3259        -.8530
    15           -1.4451         .9605        1.8509
    16           -1.9348        1.2871        1.7579
    17           -1.2463         .6319         .7126
    18           -2.1194        1.3567        2.2356
    19            -.3433        -.2672         .0467
    20           -1.2586         .4976        1.0491
    21            -.4868       -3.9399         .4878
    22             .4718       -1.9417       -1.0491
    23            -.6639        -.4351        -.1375
    24            -.8399        1.2284        -.1013
    25             .0412         .0449       -2.7617
    26            -.5968        -.8071        -.5184
    27            -.8115        -.2905        -.5392
    28           -1.0795         .0832        -.8220

Theoretically, if the scores for the common factors, the specific factors, and the error factors were known, the predicted scores would exactly fit the actual scores of the variables (species population levels). However, in this case nothing is known of the specific factors and the error factors, and the predictions are based only on the variance attributable to common factors. Where common factors account for a large percentage of the variance of a species, the predictions should be fairly accurate.
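The arithmetic of the worked example can be checked with a few lines of code. The sketch below only reproduces the common-factor part of the specification equation for Drosophila pseudoobscura at observation 4, using the loadings and factor scores quoted in the example; the variable names are introduced here for illustration, and the specific and error terms are left out because they are unknown.

```python
import numpy as np

# Loadings of D. pseudoobscura on the three factors, as used in the worked example.
loadings = np.array([0.9097, 0.1218, 0.0126])
# Factor scores for observation 4 (December, 1961), as used in the worked example.
scores_obs4 = np.array([0.7821, -0.1245, -0.7125])

# Common-factor part of the specification equation: sum of loading * factor score.
common_part = float(loadings @ scores_obs4)
print(round(common_part, 4))  # 0.6873
```

Stacking the loadings of all species as rows of one matrix and the factor scores of all observations as rows of another would give every predicted standard score in a single matrix product.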
In a species population influenced to a large extent by specific factors and error factors, the predictions will not be as good. To a certain extent, the reliability of the estimates can be judged from the size of the species population's communality, species with large communalities being more predictable than those with small communalities. This procedure, in essence, pretends that specific and error factors do not exist.

Graphs of the predicted and observed abundances (as standard scores) of the 10 species are given in Fig. 4-13. It is clear that for many of the species, particularly the common ones, predicted and actual values agree quite well, although there are still some deviations. Deviations [...]

Fig. 4. Predicted versus observed abundances (standardized) for Drosophila melanogaster from September, 1961 to December, 1963.
Fig. 5. Predicted versus observed abundances (standardized) for Drosophila pseudoobscura from September, 1961 to December, 1963.
Fig. 6. Predicted versus observed abundances (standardized) for Drosophila "tripunctata 20" from September, 1961 to December, 1963.
Fig. 7. Predicted versus observed abundances (standardized) for Drosophila bandeirantorum from September, 1961 to December, 1963.
Fig. 8. Predicted versus observed abundances (standardized) for Drosophila hydei from September, 1961 to December, 1963.
Fig. 9. Predicted versus observed abundances (standardized) for Drosophila immigrans from September, 1961 to December, 1963.
Fig. 12. Predicted versus observed abundances (standardized) for Drosophila brncici from September, 1961 to December, 1963.
Fig. 13. Predicted versus observed abundances (standardized) for Drosophila gasici from September, 1961 to December, 1963.

[...] calculated factor scores, the population levels of the species for each period were recalculated. However, if the factors influencing the species of a given community have been identified by a previous factor analysis and the rotation properly carried out, it is possible later to make measurements of the factors, standardize them, and then calculate predicted standard scores for all of the species of the community using a set of specification equations as above. I have not been able to do so with these data from the literature, because the factors have not been identified or, if they had been, no measurements are available for them. Also, there is no way to check the validity of the results.

The use of these predictive equations can be illustrated by a possible application. Suppose that a factor-analysis study carried out on the community of fish in a river has determined that water temperature is one of the important factors affecting the fish community. It is also known that the establishment of a nuclear reactor on the banks of the river will progressively raise the temperature of the water. The question is: "How will the rise in temperature of the water affect the populations of the fish living in the river?" The expected rises in temperatures with time could be entered into the specification equations.
The other factors could be assumed to be constant, or estimates of their probable values might be entered, and the predicted population levels of all species of fish in the river estimated for time x. A weakness of the model is that it can never predict a species becoming extinct, although it will approach zero frequency as a limit.

DISCUSSION

Like any other statistical technique, factor analysis manipulates data in an attempt to reveal the underlying causes and their importance to the variables measured. Three important assumptions are made about the data when factor analysis is employed (Cattell 1965): (1) individual variables and factors are linearly interrelated, (2) two factors act additively in respect to any given variable, and (3) there are no interaction effects among the variables. No assumptions are made about the distributions of the variables. Various tests for significance of factors do make assumptions regarding the distributions of variables and, for that reason, have been avoided in this paper.

It is probable that in any real, relatively large community of organisms all three assumptions will be violated at one time or another. Because of the likelihood of some curvilinear or higher polynomial relationships between factors and variables and because of the existence of nonadditive factors, it is important to know how closely the linear model assumed by the factor analysis approximates the situation where there are some nonlinear relationships between variables and factors.

Cattell & Dickman (1962), using variables and factors between which the relationships were known, showed that if variables are not linearly related to the factors, the factor analysis approximates the determination of the variable by representing a product by a sum. Over a small range this is usually considered to be a good approximation.
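Why a sum can stand in for a product over a small range can be seen numerically. The sketch below is an illustration under assumptions of its own, not the procedure of Cattell & Dickman (1962): it uses a first-order (Taylor) linearization of the product of two factors about an arbitrary reference point (a, b) and compares the worst-case error over a narrow and a wide range. The reference point, the ranges, and all function names are hypothetical.

```python
import numpy as np


def product(f1, f2):
    """Response of a variable determined by two factors acting multiplicatively."""
    return f1 * f2


def linear_approx(f1, f2, a, b):
    """Additive (first-order Taylor) approximation of f1 * f2 about the point (a, b)."""
    return a * b + b * (f1 - a) + a * (f2 - b)


def max_abs_error(half_width, a=2.0, b=3.0, n=21):
    """Largest absolute error of the additive form on a square grid around (a, b)."""
    f1 = np.linspace(a - half_width, a + half_width, n)
    f2 = np.linspace(b - half_width, b + half_width, n)
    g1, g2 = np.meshgrid(f1, f2)
    return float(np.max(np.abs(product(g1, g2) - linear_approx(g1, g2, a, b))))


small = max_abs_error(0.1)  # narrow range of factor values
large = max_abs_error(1.0)  # range ten times wider
print(small, large)
```

The error of the additive form is exactly (F1 - a)(F2 - b), so it grows with the square of the range: widening the range tenfold raises the worst-case error a hundredfold in this sketch.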
For example, if a species were determined by two factors acting multiplicatively,

$$\text{Species} = sF_1F_2$$

then the factor-analysis model approximates it by

$$\text{Species} = sF_1 + sF_2$$

After the analysis has been carried out and the number and nature of the factors determined, the linear model can be modified and the predictions improved by experimentally locating nonadditive factors and modifying the series of specification equations. The same can be done with nonlinear relationships between variables and factors. Often the mathematical relationship of a factor to a community of species, if not linear, will be roughly the same for all species (i.e., if the relationship is exponential, it will be exponential for all species).

Two other common situations that modify the relationships between factors and variables are threshold levels and competition for a limited resource. Sometimes a factor influencing a set of variables may operate only above or below a critical value. For example, dispersal in some animals occurs when the population of a species reaches a critical density. The sigmoid curve of population ecology assumes that reaction to increasing density is gradual: the closer the population approaches the carrying capacity of the environment, the slower the rate of growth. It is also possible that there may be a situation where the curve is completely exponential until the carrying capacity has been reached, or surpassed, and a point is reached where density-dependent factors act suddenly. In some predators, search images are formed on abundant species of prey and, when the population of a prey species reaches a critical level, a predator population may begin to attack it to the exclusion of other less common species.

Competition between the members of a community may prove to be more of a problem, and depends on whether populations are controlled by density-dependent or density-independent factors.
It is the author's opinion that both types of factors are important in animal communities. One factor influencing a group of species in a community may be a common food resource, such as in a group of insects all feeding on the same species of plant. In the situation of two insect species feeding on one plant species, the feeding of species "a" reduces the amount of factor "X" (the plant) and therefore indirectly influences species "b," the other species feeding on the same species of plant. Factors of this type are referred to as "expendable" and, when they are shown to exist, the specification equations can be modified to take them into account.

The computational steps in the factor analysis technique presented in this paper are outlined in Fig. 14. The assumptions underlying each step of the procedure have been discussed in the Techniques section and will not be repeated here. The experimental steps can be roughly outlined as: (1) definition of the "community" of animals or plants or both to be studied, (2) carrying out of the census, (3) running of the factor analysis, (4) identification of the factors and rotation to a specified hypothesis, (5) formulation of the specification equations (first approximation), (6) discovery and analysis of nonlinear, factor-variable relationships and of nonadditive factors (second approximation), and (7) discovery and measurement of specific factors for each species (third approximation).

Fig. 14. Sequence of steps in creating a factor analytical model of a community. (Flow-chart boxes: Standard Scores; Data; Invert Correlation Matrix; Principal Axis Factor Analysis; Basic Diagonal; Rotation; Factor Pattern Matrix; Factor Score Matrix; Specification Equations.)

As the size of the community studied increases, the number of significant common factors discovered also increases.
By increasing the number of species measured, a factor originally specific to one species may now influence a second species and can be picked out by the factor analysis. As more species are considered, more factors must be identified.

Because of the tremendous amount of field work and experimentation needed for this technique, the decision to stop at the first, second, or third approximation will depend on how closely the first approximation predicts future changes (or spatial changes) in the species of the community and on how much time and money are available.

A rough approximation is often all that is needed. A farmer usually wants to know only which species, if any, of a set of possible pests will be abundant enough to damage his crops, given a set of conditions that he can predict (e.g., will the application of a certain pesticide in the spring cause an increase in the populations of some potential pest species later in the year?). He is not particularly interested in the exact level of each population.

The factor analysis technique is applicable to modeling communities in both space and time. The factor analysis approach is an improvement over the multiple regression approach (actually a form of factor analysis) in indicating not only how many factors to look for, but also which species are influenced by which factors and the extent of the influences. The psychologists have also found empirically (Kawash, personal communication) that the results of a factor analysis modeling of a situation using the specification equations tend to be much more useful when applied to similar situations (such as perhaps a model of one river being more applicable to the fishes in an adjacent river) than are the multiple regression equations.

Factor analysis is an extensive and complicated subject. Just how useful this proposed technique will prove can only be known after it has been more extensively used and studied.

LITERATURE CITED

Cattell, R. B. 1965. Factor analysis: An introduction to essentials. Biometrics 21:190-215, 405-435.
Cattell, R. B., and K. Dickman. 1962. A dynamic model of physical influences demonstrating the necessity of oblique simple structure. Psychological Bulletin 59:389-400.
Dagnelie, P. 1965. L'étude des communautés végétales par l'analyse statistique des liaisons entre les espèces et les variables écologiques: un exemple. Biometrics 21:890-907.
Goodall, D. W. 1954. Objective methods for the classification of vegetation: III. An essay in the use of factor analysis. Australian Journal of Botany 2:304-324.
Harman, H. H. 1967. Modern Factor Analysis. 2nd ed. University of Chicago Press, Chicago. 474 p.
Hunter, A. S. 1966. High-altitude Drosophila of Colombia (Diptera: Drosophilidae). Annals of the Entomological Society of America 59:413-423.
Kaiser, H. F. 1958. The varimax criterion for analytic rotation in factor analysis. Psychometrika 23:187-200.
Morris, R. F. [ed.] 1963. The dynamics of epidemic spruce budworm populations. Memoirs of the Entomological Society of Canada No. 31.
Pinzka, C., and D. R. Saunders. 1954. Analytic rotation to simple structure, II. Extension to an oblique solution. Research Bulletin RB-54-31. Princeton: Educational Testing Service.
Reyment, R. A. 1963. Multivariate analytical treatment of quantitative species associations: An example from palaeoecology. Journal of Animal Ecology 32:535-547.
Schnell, G. D. 1970. A phenetic study of the suborder Lari (Aves). I. Methods and results of principal components analyses. Systematic Zoology 19:35-57.
Sokal, R. R., and P. E. Hunter. 1955. A morphometric analysis of DDT-resistant and non-resistant house fly strains. Annals of the Entomological Society of America 48:499-507.
Sokal, R. R., and P. H. A. Sneath. 1963. Principles of numerical taxonomy. W. H. Freeman and Co., San Francisco. 359 p.