

STATE OF ILLINOIS 
DEPARTMENT OF REGISTRATION AND EDUCATION 
NATURAL HISTORY SURVEY DIVISION 


The Use of Factor Analysis in Modeling 
Natural Communities of Plants and Animals 


Robert W. Poole 




Illinois Natural History Survey 


Biological Notes No. 72 
Urbana, Illinois February, 1971 




The problem of modeling communities of plants or
animals can be studied either by observing the charac- 
teristics of the community as a whole or by determining 
the interactions among and within individual species. 
At the community level most attention has been focused 
on descriptive community analysis, species diversity, and 
energy flow. At the one- and two-species levels some 
aspects of the problem that have been, and are being, 
intensively studied are population demography, preda- 
tion, competition, parasitism, and spatial distribution. 
These basic interactions have been reasonably well de- 
scribed, and they have been integrated in the modeling 
of spruce budworm populations in Canada (Morris 
1963). However, even this one-species model is very 
complex and requires the determination of a large 
number of parameters. 

It is just as conceivable to go from the community 
level to the individual species as from the species to the 
community. The purpose of this paper is to explore 
this approach using a statistical technique known as 
factor analysis. Factor analysis is a statistical technique 
for picking out the underlying factors causing the vari- 
ance in a set of variables. 

Factor analysis originated in the psychological sciences
but is now also being used in the biological sciences.
Its first uses in biology were by Goodall (1954)
and Sokal & Hunter (1955), and it has since been used
extensively in numerical taxonomy (e.g. Sokal & Sneath
1963; Schnell 1970), in the delimiting of the natural
associations of plants (e.g. Dagnelie 1965), and in
palaeoecology by Reyment (1963).

Factor analysis, primarily the form known as prin- 
cipal components analysis, has been used in biology for 
the most part as a classification technique, although there 
have been some attempts to make associations between 
environmental variables and species using correlation co- 
efficients in a factor-analysis framework (e.g. Dagnelie 
1965). Factor analysis was originally developed to esti- 
mate and define the factors causing the observed re- 
sponses in a series of variables and is here used in this 
sense rather than as a classification technique. 

This paper is divided into three parts. The first 
gives a brief review of basic ecological principles nec- 
essary for the following two sections. The second section 
describes the statistical procedures considered and the 
analysis of a specific example. The third section considers
the assumptions of the factor analytic model and
compares them to the initial ecological generalities to
see if the model really does mirror the workings of the
community or if it only produces a set of mathematically
correct but ecologically meaningless numbers. I have
tried to emphasize the implications of the assumptions
underlying the factor analysis and deemphasize the
mathematics. Many university computing centers have
the programs used in this paper, and interested persons
can find the mathematics underlying the technique in
Harman (1967).

This paper is published by authority of the State of Illinois, IRS
Ch. 127, Par. 58.12. Dr. Robert W. Poole is Assistant Taxonomist,
Section of Faunistic Surveys and Insect Identification, Illinois Natural
History Survey, Urbana.

I wish to express my appreciation to those persons 
who have either read the manuscript or helped with 
the analysis of the example used in this paper: Dr. 
George F. Kawash of the Department of Psychology, 
University of Illinois; Dr. K. W. Dickman of the 
SOUPAC office of the Department of Computer Sci- 
ence, University of Illinois; Dr. Robert H. Whittaker 
of the Division of Biological Sciences, Cornell University;
Dr. Richard B. Root of the Department of Entomology,
Cornell University; Mrs. Kathleen Eickwort
of the Department of Entomology, Cornell University;
and my wife Beverly. I also wish to thank Dr. Philip
W. Smith and O. F. Glissendorf of the Illinois Natural
History Survey for their editorial contributions to the
paper.


FACTORS AND SPECIES POPULATIONS 


A population of an animal rarely stays at a constant 
level; usually it is either increasing or decreasing. 
Whether and how much a population increases or de- 
creases depends on the environmental factors controlling 
the limits of that population. If conditions are favor- 
able, the population increases; if they are not favor- 
able, it decreases. A species population can be affected 
by several factors, and the factors may be interacting 
among themselves. This basic relationship is diagram- 
med in Fig. 1. Not all of the factors are of equal im- 
portance to the species population, one factor usually 
being more important than the others. If the effect of 
a factor on a population depends on the density of the 
population, it is referred to as a density-dependent fac- 
tor, and if it does not depend on density, it is referred 
to as a density-independent factor. 

Fig. 1. Diagrammatic representation of the influence of
three factors on one species.

In a community of two or more species, a factor influencing
one species may also influence other species
in the community. The effect of this common factor
may vary from species to species, being more important
for one than for another. At the same time a species
may be influenced by a factor or set of factors that affect
it only. These will be referred to as specific factors.
This relationship is diagrammed in Fig. 2. Even in this
relatively simple community with uncorrelated factors,
the complexity is evident.


Fig. 2. Diagrammatic representation of the influence of
three specific and four common factors on three species.


If two species share a common factor or factors, the 
changes in their populations will be correlated. For 
example, if two species are both limited by rainfall and 
rainfall is increased, both populations will increase. 
However, if one species is only slightly dependent on 
rainfall and the other strongly so, the changes will be 
disproportionate and the correlation less. As a simple
general rule, the correlations among a group of species
making up a community are determined by the species’ 
mutual association with a group of common factors. 

In essence, factor analysis takes a matrix of correla- 
tion coefficients among a set of variables and reduces it 
to a series of mathematical common factors that ac- 
count for the correlations among the variables. 


TECHNIQUES 


The procedures carried out in this paper are calculation
of the correlation matrix, estimation of communalities
(the amount of variance caused by factors common
to other species), factoring of the matrix using
the principal axis method, rotation to a specified hypothesis
(transformation of the numbers to other biologically
meaningful numbers), calculation of factor scores, and
the formulation of the so-called specification equations
for each species to serve as a model of the community.
If all of the above steps except the factoring of the
calculated correlation matrix are skipped, the result is
a form of factor analysis normally referred to as principal
components analysis.


Principal Components Analysis 


Mathematically, factor analysis resolves a correlation
matrix (a covariance matrix can also be used in
some cases) into an n × k factor matrix where the number
of factors, k, is usually smaller than n, the number
of variables (in this case species). This factor matrix
has the characteristic that when multiplied by its transpose
(rows and columns interchanged) it restores the
original correlation matrix. In matrix notation

$$R = V_0 V_0'$$

where $R$ is the correlation matrix, $V_0$ the factor matrix,
and $V_0'$ the transposed factor matrix. Basically the
problem is to resolve the correlation matrix into its
latent roots and vectors (also referred to as eigenvalues
and eigenvectors).
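The decomposition just described can be sketched in a few lines. This is an illustration only: it assumes NumPy, and the 3 × 3 correlation matrix is made up rather than taken from Hunter's data. The factor matrix is formed by scaling each latent vector by the square root of its latent root.

```python
import numpy as np

# Hypothetical correlation matrix for three species (illustrative values only).
R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.20],
              [0.30, 0.20, 1.00]])

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]      # largest latent root first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Factor matrix: each eigenvector scaled by the square root of its eigenvalue.
V0 = eigenvectors * np.sqrt(eigenvalues)

R_restored = V0 @ V0.T                     # equals R when all factors are kept
R_one_factor = V0[:, :1] @ V0[:, :1].T     # what the first factor alone reproduces
```

With all n factors retained the product restores R exactly; with k < n factors it reproduces only the part of R that those factors account for.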

Principal components analysis assumes that all of 
the variance of each species can be accounted for by a 
set of factors common to all of the other species in the 
community and lumps variance due to specific factors 
and error factors in with the common factors. In the 
actual computation, the loadings (weights) of the first 
factor on each species (the latent vector or eigenvector) 
are calculated in such a way as to remove the maximum 
amount of variance from the matrix as can be explained 
by one factor. The effects of this factor are then sub- 
tracted from the correlation matrix. A second factor 
is then calculated from this reduced correlation matrix, 
and so forth until the reduced correlation matrix con- 
sists of essentially all zeros. 
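The extract-and-subtract procedure in this paragraph can be sketched as follows. This is a minimal illustration, not the program used for the analysis: it assumes NumPy, a made-up 3 × 3 correlation matrix, and simple power iteration to find each factor; `extract_factors` is a hypothetical helper name.

```python
import numpy as np

def extract_factors(R, n_factors, n_iter=1000):
    """Extract factors one at a time, subtracting each from the matrix."""
    residual = np.array(R, dtype=float)
    columns = []
    for _ in range(n_factors):
        v = np.ones(residual.shape[0])
        for _ in range(n_iter):                    # power iteration
            v = residual @ v
            v /= np.linalg.norm(v)
        latent_root = float(v @ residual @ v)      # variance removed by this factor
        column = v * np.sqrt(max(latent_root, 0.0))
        columns.append(column)
        residual = residual - np.outer(column, column)   # reduced correlation matrix
    return np.column_stack(columns), residual

# Hypothetical correlation matrix (illustrative values only).
R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.20],
              [0.30, 0.20, 1.00]])

loadings, residual = extract_factors(R, 3)
```

After the last factor has been removed, `residual` is essentially a matrix of zeros, as the text describes.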

These calculations have been carried out on data 
given by Hunter (1966). Hunter measured the species 
populations of Drosophila at several sites in Colombia, 
principally near Bogota. I have analyzed her data for 
“Pine Woods,” a government-protected pine forest near 
Bogota. The census was carried out from September, 
1961 to December, 1963 (28 months) by sweeping a 
net over bait. In Hunter’s table for Pine Woods, the 
figures for each month are lumped, and the abundance 
of each species expressed as a percentage of the total 
Drosophila community. In cases where a species frequency
was less than 1 percent, it is listed only as present.
In my analysis when a species is listed as “present,” I
have considered it to be absent because individuals of
that species made up less than 1 percent of the total.
Of the 11 species listed by Hunter, I have analyzed
only 10 because the 11th, “dreyfusi 22,” was very rare.


Table 1. Correlations among the frequencies of 10 species of Drosophila over a period of 28 months at Pine Woods (near
Bogota, Colombia). Columns are numbered in the same order as the rows. A question mark marks an entry that is illegible in the source.

                            1      2      3      4      5      6      7      8      9     10
 1 melanogaster          1.00
 2 pseudoobscura          .37   1.00
 3 bandeirantorum         .34    .85   1.00
 4 “tripunctata 20”         ?    .46    .52   1.00
 5 hydei                 -.08   -.09    .02   -.00   1.00
 6 immigrans              .09    .24      ?      ?      ?   1.00
 7 viracochi              .11      ?   -.00      ?    .38      ?   1.00
 8 mesophragmatica          ?      ?   -.80   -.50   -.09   -.56   -.14   1.00
 9 brncici                .43    .45    .61    .28   -.09    .05    .15   -.41   1.00
10 gasici                   ?      ?   -.18      ?   -.01    .58    .01   -.26      ?   1.00

The correlation matrix was calculated (Table 1)
using the Pearson product-moment correlation coefficient
and factored using the principal axis method. The
resulting factors are shown in Table 2, which also shows
that the first three factors account for about 73 percent
of the variance in the correlation matrix. The total
number of factors extracted by the principal axis method
cannot exceed the number of variables. Each of the
calculated factors is affected in part by the inclusion
of error variance and variance due to specific rather
than common factors. Therefore the factors become
more and more trivial and unreliable as the factoring
proceeds, so that the factors calculated after the first
few have no real meaning. A commonly used breaking
point in factoring is when the eigenvalue of the factor
falls below 1.00 (listed under variance in Table 2).
Using this criterion, the first three factors are significant.
The factor loadings of each factor on the 10
species are given in Table 3. Factor loadings are a
type of correlation between a factor and a variable, or
more specifically, the weight of each factor in accounting
for the variance of a given variable. In other words
if factor 1 had a loading of .47 on a given variable,
and factor 2 a loading of .03, factor 1 would be more
important to the variable than would factor 2.


Table 2. Calculated factors from the principal components
analysis.

Factor   Variance   Percent Variance   Cumulative Percent
   1      3.7921        37.9214              37.9214
   2      1.9637        19.6365              57.5579
   3      1.5131        15.1310              72.6889
   4      0.9162         9.1621              81.8510
   5      0.7103         7.1034              88.9544
   6      0.5365         5.3646              94.3189
   7      0.2941         2.9414              97.2603
   8      0.1728         1.7282              98.9885
   9      0.0971         0.9712              99.9596
  10      0.0040         0.0403             100.0000


Table 3. Computed factor loadings from the principal
components analysis on the 10 species of Drosophila. A question
mark marks an entry that is illegible in the source.

                     Factor 1   Factor 2   Factor 3
melanogaster          -.4750     -.3843     -.0687
pseudoobscura         -.8581          ?      .0020
bandeirantorum        -.9002     -.2406      .1278
“tripunctata 20”      -.6886      .4540     -.1654
hydei                  .0789     -.0303      .7648
immigrans             -.5498      .6897     -.0449
viracochi              .1171      .0108      .8662
mesophragmatica        .9091          ?     -.3291
brncici               -.6233     -.4481     -.1364
gasici                -.0914      .8906      .0226
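The arithmetic behind Table 2 and the eigenvalue cutoff can be checked with a short sketch. It assumes the first eigenvalue reads 3.7921 (consistent with its listed 37.9214 percent) and uses the factor 1 loadings of Table 3; the variable names are illustrative.

```python
# Eigenvalues as listed under "Variance" in Table 2
# (first entry assumed to be 3.7921, matching its 37.9214 percent).
eigenvalues = [3.7921, 1.9637, 1.5131, 0.9162, 0.7103,
               0.5365, 0.2941, 0.1728, 0.0971, 0.0040]

total = sum(eigenvalues)                 # close to 10, the number of species
percent = [100.0 * ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

# Breaking point: keep factors whose eigenvalue is at least 1.00.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev >= 1.0]

# Factor 1 loadings from Table 3; their squares should sum to the
# first eigenvalue, because each loading is a scaled eigenvector entry.
factor1 = [-0.4750, -0.8581, -0.9002, -0.6886, 0.0789,
           -0.5498, 0.1171, 0.9091, -0.6233, -0.0914]
ss_factor1 = sum(x * x for x in factor1)

print(retained, round(cumulative[2], 2), round(ss_factor1, 4))
```

The sum of squared loadings in a column is a convenient check on a transcribed loading table, since it must reproduce the eigenvalue of that factor.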


Rotation 

The set of factors arrived at in the preceding section 
and the loadings of the factors on the variables are only 
one of an essentially infinite number of possibilities. In 
other words there is an infinite number of factor matrices 
that when multiplied by their transpose will restore the 
original correlation matrix. The factors as they come 
out of the principal axis method are orthogonal to each 
other (uncorrelated). These calculated factors do not 
necessarily correspond in any way with the real attributes 
of the environment controlling the fluctuations of the 
species populations. One of this infinite array of answers 
is the correct one, however, and the problem is to find 
it. The variables can be plotted on each factor as has 
been done in Fig. 3 for factors 1 and 2. Factor 3 could 
be included and the variables would then be in a three- 
dimensional space. The addition of a fourth factor 
would be in hyperspace. Any of the possible solutions 
to the problem can be arrived at by rotating these axes 
(factors) and reading off the new factor loadings on 
each variable. This is an oversimplified explanation of
rotation and a more complete account can be found
in Cattell (1965) and Harman (1967).
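The claim that infinitely many factor matrices restore the same correlations can be illustrated with a rigid rotation of two axes. The sketch assumes NumPy; the loadings are those of three species on factors 1 and 2 in Table 3, and the 30-degree angle and the `rotate` helper are arbitrary choices made for illustration.

```python
import numpy as np

# Loadings of three species on factors 1 and 2, taken from Table 3.
V0 = np.array([[-0.4750, -0.3843],    # melanogaster
               [-0.5498,  0.6897],    # immigrans
               [-0.0914,  0.8906]])   # gasici

def rotate(loadings, degrees):
    """Rotate the two factor axes rigidly (an orthogonal transformation)."""
    theta = np.radians(degrees)
    T = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return loadings @ T

V30 = rotate(V0, 30.0)

# The loadings change, but the correlations they reproduce do not.
same_correlations = np.allclose(V30 @ V30.T, V0 @ V0.T)
```

Every angle gives a different set of loadings with the same reproduced correlations, which is why the choice among rotations has to come from outside the arithmetic.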




In the principal axis method the first factor is cal- 
culated to account for as much of the variance in the 
correlation matrix as possible. The method attempts to 
have this factor loaded as heavily as possible with all 
of the variables. It is possible that a factor such as 
temperature would influence all of the species strongly, 
and in this case the calculated factor as it comes from 
the analysis would accurately reflect the actual environ- 
mental factor. However, it is also possible that a factor 
may be important to only two or three species and rela- 
tively unimportant to the others in a community. In 
this second case the factors coming from the principal 
axis analysis would not fit the real situation and must 
be rotated to a position where they do. The above situ- 
ation is satisfied by rotation to what is known as simple 
structure. The factors coming from the principal axis 
analysis are orthogonal to each other, but very often, 
probably usually, the factors operating on the species
are correlated with each other. By rotating to simple 
structure, the factors are allowed to be correlated 
with each other. Mathematically, rotation to simple 
structure attempts to correlate a factor with the smallest 
number of variables possible. In other words each fac- 
tor should affect only a few variables. 

Fig. 3. Loadings of the 10 species of
Drosophila on factors 1 and 2 (axes: Factor 1 and Factor 2).

In rotation, the original factor matrix ($V_0$) is multiplied
by a transformation matrix ($T$) giving a new
matrix referred to as the reference vector matrix ($V_{rs}$)

$$V_{rs} = V_0 T$$

The reference vector matrix does not give the new
loadings of the factors on the variables for reasons discussed
by Cattell (1965). To calculate the new factor
loadings, a new matrix termed the factor-pattern matrix
is calculated as

$$V_{fp} = V_{rs} D^{-1}$$

where $D$ is the diagonal matrix of the reciprocal square
roots of the diagonal elements of the inverted matrix of
the reference-vector correlations. The reference correlations
are computed by multiplying the transformation
matrix by its transpose

$$C_{rs} = T' T$$

where $C_{rs}$ is the matrix of correlations between the
reference vectors, $T$ the transformation matrix, and $T'$
the transpose of the transformation matrix.
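The three matrix steps above can be sketched as follows. The sketch assumes NumPy, that the columns of the transformation matrix are scaled to unit length, and that the factor-pattern matrix is the reference-vector matrix post-multiplied by the inverse of D; the oblique transformation matrix is made up for illustration and `factor_pattern` is a hypothetical helper name.

```python
import numpy as np

def factor_pattern(V0, T):
    """Reference-vector matrix, reference correlations, and factor pattern."""
    T = np.asarray(T, dtype=float)
    T = T / np.linalg.norm(T, axis=0)       # unit-length reference vectors
    V_rs = V0 @ T                           # reference-vector matrix
    C_rs = T.T @ T                          # correlations among reference vectors
    # D: reciprocal square roots of the diagonal of the inverted C_rs.
    D = np.diag(1.0 / np.sqrt(np.diag(np.linalg.inv(C_rs))))
    V_fp = V_rs @ np.linalg.inv(D)          # factor-pattern matrix
    return V_rs, C_rs, V_fp

# Loadings of three species on factors 1 and 2 (Table 3).
V0 = np.array([[-0.4750, -0.3843],
               [-0.5498,  0.6897],
               [-0.0914,  0.8906]])

# Hypothetical oblique transformation (illustrative values only).
T_oblique = np.array([[1.0, 0.2],
                      [0.0, 1.0]])

V_rs, C_rs, V_fp = factor_pattern(V0, T_oblique)
```

When the transformation is orthogonal, the reference correlations form an identity matrix, D is an identity matrix, and the factor pattern equals the reference-vector matrix.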

Several mechanical programs are available for rotation
to simple structure. The program Oblimax (Pinzka
& Saunders 1954) was found to give the most reasonable
answers in this case and has been used in this
analysis. The factor-pattern matrix after rotation to
simple structure is shown in Table 4. A comparison of
Tables 3 and 4 shows few significant changes because
of rotation to simple structure, using the Oblimax program
(the signs have been changed in factor 2).


Table 4. New factor loadings (the factor-pattern matrix)
after rotation to simple structure using the Oblimax program on
the 10 species of Drosophila. A question mark marks an entry
that is illegible in the source.

                     Factor 1   Factor 2   Factor 3
melanogaster               ?      .2648     -.0889
pseudoobscura         -.8942      .0453     -.0339
bandeirantorum        -.9589      .0280      .0925
“tripunctata 20”      -.4056     -.6187     -.2004
hydei                      ?      .0744      .7833
immigrans             -.2098     -.8148     -.0725
viracochi             -.0988      .0462      .8881
mesophragmatica        .8618      .3365     -.2956
brncici                    ?      .2903     -.1641
gasici                 .2786     -.9021      .0152

The new factors produced by rotation to simple 
structure are not necessarily orthogonal and may be cor- 
related (oblique). The correlation matrix of these three 
factors is given in Table 5. 


Table 5. Correlations among the factors after rotation
to simple structure using the Oblimax program.

            Factor 1   Factor 2   Factor 3
Factor 1     1.0000     -.1875     -.1987
Factor 2     -.1875     1.0000      .0469
Factor 3     -.1987      .0469     1.0000


Communalities


In the principal components analysis 1’s are entered
in the diagonal of the correlation matrix because the
correlation of a variable with itself is 1. In factoring the
matrix this presumes that all of the variance of a species
can be accounted for by factors common to other
species. However, a species is normally affected not
only by common factors, but by factors specific to it,
and also error factors.

The variance of a species ($\sigma^2_{i}$) is equal to the variance
explicable by common factors ($\sigma^2_{ci}$) plus the variance
of the species due to specific factors ($\sigma^2_{si}$) plus
an error term ($\sigma^2_{ei}$),

$$\sigma^2_{i} = \sigma^2_{ci} + \sigma^2_{si} + \sigma^2_{ei}$$

The term $\sigma^2_{ci}$ is usually referred to as a variable’s
communality.

To remove the variance of a species due to specific
factors and error terms, communalities for each species
must be calculated and substituted for the diagonal elements
of the correlation matrix. Unfortunately there
are many different techniques used to estimate communalities
and none of them is “the best.” Also, the
subject of communalities is a controversial one.

In a practical sense, with large initial matrices the
effect of not calculating communalities on the estimates
of the factors is minimal and becomes less and less important
for larger and larger matrices. The calculated
communalities are important, however, in estimating the
reliability of the predictive equations presented later.

Communalities in the factor analysis carried out in
the following pages were calculated by replacing the
diagonal entry for each row by

$$\frac{r^{*}_{ix}\,(S_i - r^{*}_{ix})}{S_x - r^{*}_{ix}}$$

where

$r^{*}_{ix}$ = maximum absolute $r_{ij}$

$S_i = \sum$ absolute $r_{ij}$

$S_x = \sum$ absolute $r_{xj}$

The calculated communalities are listed in Table 6.
Other techniques of communality estimation were tried:
(1) replacing the diagonal entry of a variable by the
square of the multiple R of each variable with all other
variables, and (2) replacing the diagonal entry of a
row by the square root of the average $r^2$ across the row.
The estimated communalities using these two methods
are also given in Table 6. The Varimax-rotation program
(Kaiser 1958) also gives iterative solutions for
the communalities. The calculated communalities using
this iterative technique are also given in Table 6.


Table 6. Estimated communalities of the 10 species of
Drosophila using the following methods: 1) $r^{*}_{ix}(S_i - r^{*}_{ix})/(S_x - r^{*}_{ix})$,
2) square of multiple R, 3) square root of average
$r^2$, 4) iterative. A question mark marks an entry that is
illegible in the source.

                        1        2        3        4
melanogaster          .2919    .8454    .4075    .3780
pseudoobscura         .8338    .9825    .5499    .7999
bandeirantorum        .8638    .9097    .5708    .8845
“tripunctata 20”      .7150    .7567    .4940    .1077
hydei                 .2333    .5736    .3484    .5921
immigrans             .8314        ?    .4669    .7801
viracochi             .6267    .9399    .3671    .7641
mesophragmatica       .9345    .9932    .5795        ?
brncici               .4282    .6633    .4592    .6079
gasici                .4036    .9522    .4113    .8021
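Two of the alternative estimates, the squared multiple R and the square root of the average squared correlation, can be sketched as below. The sketch assumes NumPy and a made-up 3 × 3 correlation matrix. The squared multiple correlation uses the standard identity with the diagonal of the inverted matrix; taking the row average over every entry in the row, including the unit diagonal, is an assumption about how that average was formed. The function names are hypothetical.

```python
import numpy as np

def squared_multiple_r(R):
    """Squared multiple correlation of each variable with all the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def root_mean_square_r(R):
    """Square root of the average r^2 across each row (diagonal included,
    which is an assumption of this sketch)."""
    return np.sqrt((R ** 2).mean(axis=1))

# Hypothetical correlation matrix (illustrative values only).
R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.20],
              [0.30, 0.20, 1.00]])

smc = squared_multiple_r(R)
rms = root_mean_square_r(R)

# Reduced correlation matrix: communality estimates replace the unit diagonal.
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
```

The reduced matrix, rather than the original one with 1's in the diagonal, is what would then be factored.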


Factor Identification 


The purpose of the analysis is to arrive, mathematic- 
ally, at a set of factors corresponding to the real factors 
in the environment that cause changes in populations 
of the species in the community. This problem has been 
partially discussed under rotation. There it was shown 
that factors calculated by the principal axis analysis do 
not necessarily correspond to any real factors. To make 
these factors useful, the factor vectors must be rotated 
in hyperspace to a position where they do correspond 
to real parts of the environment. The problem of identi- 
fication can be broken into two stages: (1) rotation of 
computed factors to where they correlate heavily with 
real factors of the environment, and (2) the identifi- 
cation of the environmental factors. I will discuss the 
second stage first. 

A set of factors has been calculated that explain part 
of the variation in the population of a species. How- 
ever, to be useful these factors must correspond to real 
parts of the environment that can be identified. Basic- 
ally we want to know that factor 1 is so highly cor- 
related with rainfall that rainfall, for practical purposes, 
can be taken as factor 1. Often a person knows a priori, 
or suspects, that species “‘a” is heavily influenced by 
some factor such as maximum temperature. Therefore 
if this species has a heavy loading on one of the factors 
derived from the factor analysis, it is a good indication 
that this factor is either maximum temperature or is, in
some way, closely correlated with maximum temperature.
It is also possible, if measurements of maximum
temperature are available, to include maximum temperature
in the data matrices as another variable. If maximum
temperature as a variable loads heavily with one
factor and little with other factors, it is likely that this 
factor is in some way related to maximum tempera- 
ture. Determining the identity of every significant fac- 
tor is not easy and depends on extensive field work. 
However, factor analysis indicates how many significant 
factors to look for, and the weightings of these factors 
on every species in the community. Even if a factor
is incorrectly interpreted as maximum temperature, the
use of maximum temperature measurements for that
factor may still give correct predictive answers, a procedure
not very scientific but pragmatically important.
It must be emphasized that the mathematical factors 
never exactly correspond to the environmental factors, 
but they may be so heavily loaded on the environmental 
factors that measurements of the environmental factors 
can be used as approximations to the mathematical 
factors. 


The other problem in identification is the rotation 
of the factors derived from the principal axis method 
analysis to some position where they correspond to real 
parts of the environment. If the factors are not rotated, 
the hypothesis is that the factors tend to influence all 
of the variables; however, if rotated to simple structure, 
it is assumed that the real factors tend to influence 
significantly only a few of the variables. In a real situa- 
tion neither hypothesis may be the correct one. For 
example, if in a community of insects rainfall was im- 
portant to all of the species, but at the same time each 
species was restricted in its choice of food plants, there 
would be one factor influencing all of the species, and 
several other factors that influenced only a few variables 
each. This situation clearly does not fit the hypothesis 
behind the factors as they come from the principal axis 
analysis or after rotation of the factors to simple struc- 
ture. 


It is also possible to rotate the factors to fit a specific 
hypothesis, but because it is not possible to formulate a 
specific hypothesis for the example used in this paper, 
this rotation has not been done. The most difficult prob- 
lem connected with this type of community analysis 
should now be apparent. To rotate the calculated fac- 
tors to a position where they represent real factors of the 
environment, a correct hypothesis of the type of factors 
involved and the relative numbers of each (such as two 
factors influencing all of the species and three factors in- 
fluencing only two or three species) is needed. The prob- 
lem is what stage in the identification of factors is to be 
carried out first—the identification of factors or the rota- 
tion of the calculated factors to fit actual factors in the 
environment. Each is partially dependent on the other. As 
a working technique it should be possible, by extensive 
field work and experimentation, to formulate a rough hy- 
pothesis as to the percentage of significant factors that will 
influence a limited number of the species. For example, 
it might be known that rainfall influences a certain
number of species, and there is reason to believe that
it is important to virtually all the species in the com- 
munity. On the other hand, it might be known that 
most species in the community tend to be limited in 
their selection of food plants. Given four significant 
factors, a rough hypothesis might be that one factor in- 
fluences all of the species, and three others influence 
only a few of them. From the set of calculated factors, 
the first (the one accounting for the most variance) 
is likely to be factor 1 of the hypothesis, with the other 
three factors being fitted to the groups of species that 
they load most heavily with. The factors could then be 
rotated to fit the rough hypothesis, and the hypothesis 
could possibly be reformulated as a result of the rotation.

Every possible rotation of the factor vectors is, of 
course, an approximation to the real situation. Some 
of the approximations will be good, others not so good. 
A question of practical importance is whether or not 
the answers derived from each rotation are much dif- 
ferent from each other. The answer to this question 
will only come through use of the factor-analysis tech- 
nique. In the example used in this paper, the differences 
between the factor loadings of the orthogonal factors 
and the factors rotated to simple structure are slight. 
It has usually been found in psychology that the changes 
in factor loadings by rotation to simple structure are 
slight (Kawash, personal communication). Simple struc- 
ture rotation tends to rotate out small error factors and 
is used more for that reason than for the hypothesis it 
represents (Cattell 1965). Even if the factors are not 
correctly rotated, the approximation may still be close.


Computational Procedure 

Having carried out a principal components analysis 
of the data and having partially explained the problems 
of rotation and communalities, the complete factor 
analysis will now be carried out. In the following section
the predictive equations are formulated and the
possible usefulness of the technique is discussed.

As discussed in the preceding section, a possible aid 
in the identification of the factors is to place measure- 
ments of presumed factors into the correlation matrix as 
variables and then note if any of the calculated factors 
load heavily on them. Hunter (1966) gave data for rain- 
fall, mean maximum temperature, and mean minimum 
temperature for each month of her study. She assumed 
that rainfall was one of the most important factors, 
pointing out that its effect probably acted upon the 
larvae, or perhaps initiated egg laying in the adults. 
Hunter stated that the average time for development 
from egg to adult is about 2 months. Because these
three environmental measurements are more likely to be 
important to the larvae that later give rise to the adults 
than to the adults directly, the three measurements have 
been entered as variables with the species with a 2- 
month lead. 
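Entering an environmental measurement with a 2-month lead amounts to pairing each month's measurement with the species frequency observed two months later. A minimal sketch, assuming NumPy and made-up monthly series (not Hunter's data); `lagged_correlation` is a hypothetical helper:

```python
import numpy as np

def lagged_correlation(env, species, lead=2):
    """Correlate env[t] with species[t + lead]; lead must be positive."""
    if lead <= 0:
        raise ValueError("lead must be a positive number of months")
    env = np.asarray(env, dtype=float)
    species = np.asarray(species, dtype=float)
    # Drop the last `lead` environmental values and the first `lead`
    # species values so that the remaining months line up.
    return np.corrcoef(env[:-lead], species[lead:])[0, 1]

# Illustrative series: abundance follows rainfall two months earlier.
rainfall = [3, 8, 5, 1, 9, 4, 7, 2, 6, 5]
abundance = [12, 15] + [10 + 2 * r for r in rainfall[:-2]]

r_lead2 = lagged_correlation(rainfall, abundance, lead=2)
r_lead1 = lagged_correlation(rainfall, abundance, lead=1)
```

In this sketch the shift shortens each series by the length of the lead, so fewer paired months enter the correlation than were sampled.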

The correlation matrix of the 13 variables was com- 


Table 7. Correlations among the 10 species of Drosophila and 3 environmental variables, with calculated communalities
in the diagonal.

melanogaster 30 

pseudoobscura oi 90 

bandeirantorum 2) dea) fa) 

“tripunctata 20” SOAS 

hydei 03509 025 007, 

immigrans (09 e24 eo AD) Sail} affil 

viracochi Sil al) (000) eee fais} (0) 58 

mesophragmatica APE fe) ats) a) =O) a) 1) 97 

brncict 43) 456 228) 09) S05) 15) eel) 44: 

gasict S18) 520) lf} cele’ fl BY} AO A Aa) 

min. temperature Salle) sili Salil 41 Ws lyf Sa Sih et sh ef) 

max. temperature rk! AE IP = 0 AO) ARAB Fee EO Oy AS) nay 

rainfall =9 =23 =09 09 =i10 Ol —=21 29 =O) =(0, =06 15 07 


puted. Rainfall was not significantly correlated with any 
of the other 12 variables. Communalities were esti- 
mated, as described earlier in this paper, and the esti- 
mated communalities were entered in the diagonal of 
the correlation matrix. The new correlation matrix, 
with all 13 variables and the estimated communalities 
in the diagonal, is given in Table 7. 

This correlation matrix with estimated communalities
was then factored by the principal axis method. The
eigenvalues of the first three factors were 3.60, 2.28, and
1.38, about the same as in the principal components
analysis. The second eigenvalue is slightly higher, and
the first and third eigenvalues are slightly smaller, than
in the principal components analysis (Table 2). There
is, however, a significant drop from the eigenvalue of the
third factor (1.38) to the fourth (.61). Using an associated
eigenvalue of 1.00 as a criterion of significance,
the loadings of the first three factors were computed and
are listed in Table 8. Comparison of Tables 8 and 3
shows few major changes in the first two factors but
several in factor 3. None of the three environmental
variables is heavily loaded on any of the three factors,
although minimum temperature has a moderately heavy
loading on factor 2, suggesting that factor 2 may in some
way be associated with minimum temperature. Rainfall
is lightly loaded on all three factors, and it is thus unlikely
that it has any association with the three factors.

The factor matrix was then rotated to simple structure
using the Oblimax method, and the resulting reference-vector
structure matrix is given in Table 9. The
calculated factor-pattern matrix is given in Table 10.


Table 8. Factor loadings of the first three factors on the
10 species of Drosophila and 3 environmental variables. A question
mark marks an entry that is illegible in the source.

                     Factor 1   Factor 2   Factor 3
melanogaster           .4001     -.2122      .2205
pseudoobscura          .8586     -.3278      .0661
bandeirantorum         .8675          ?      .0106
“tripunctata 20”       .6489      .3999      .0612
hydei                 -.0495     -.0746     -.3590
immigrans              .5451      .6116     -.0763
viracochi             -.0852          ?     -.7811
mesophragmatica       -.9466     -.0052      .3344
brncici                .5231          ?      .2746
gasici                 .1217      .7909          ?
min. temperature       .1066      .7497     -.0947
max. temperature       .0181      .4004      .5398
rainfall              -.1726      .1044      .2062


Table 9. Calculated reference-vector structure matrix
after rotation to simple structure using the Oblimax program.

                     Reference   Reference   Reference
                     Vector 1    Vector 2    Vector 3
melanogaster           .3869       .1672       .1778
pseudoobscura          .8813       .1214       .0121
bandeirantorum         .8917       .0705      -.0342
“tripunctata 20”       .4832      -.5052       .1608
hydei                  .0509      -.0243      -.3676
immigrans              .3563      -.7173       .0703
viracochi              .1471      -.0075      -.8113
mesophragmatica       -.9562       .3252       .3045
brncici                .5194       .2502       .2109
gasici                -.0577      -.8286      -.0488
min. temperature      -.0889      -.7476       .0727
max. temperature      -.2096      -.2173       .6141
rainfall              -.2346       .0042       .2201


Table 10. Calculated factor-pattern matrix after rotation
to simple structure using the Oblimax program.

                     Factor 1   Factor 2   Factor 3
melanogaster           .3994      .1678      .1841
pseudoobscura          .9097      .1218      .0125
bandeirantorum         .9205      .0708     -.0354
“tripunctata 20”       .4987     -.5068      .1664
hydei                  .0525     -.0244     -.3806
immigrans              .3677     -.7196      .0727
viracochi              .1519     -.0075     -.8398
mesophragmatica       -.9870      .3263      .3153
brncici                .5361      .2510      .2183
gasici                -.0596     -.8312     -.0505
min. temperature      -.0918     -.7499      .0753
max. temperature      -.2164     -.2180      .6357
rainfall              -.2422      .0042      .2279


Predictive Equations 


The next stage is the formulation of what are known 
as specification equations. These equations specify the 
weights to be given to each factor in accounting for the 
score (observed measurement of some kind) of each 
variable. The specification equation can be written in a 
general form as 

$$V_{ji} = s_{j1}F_{1i} + s_{j2}F_{2i} + \cdots + s_{jk}F_{ki} + s_{j}F_{ji} + s_{je}F_{ei}$$

as given by Cattell (1965). If there are k common factors,
the score of variable j on observation i is equal to the
sum of the scores of the factors (the $F_{ki}$) influencing
the variable, each weighted by the significance of that
factor to the variable (the $s_{jk}$). These factors include
a series of common factors, any specific factors there may
be, and an error factor. The specification equations will
be the basic predictive equations. In the example analyzed
in this paper there are 10 species measured at 28
observations, giving a total of 280 specification
equations. To formulate the set of equations for all
species in the community, it is necessary to calculate
first the factor-score matrix ($F_p$) and secondly the
factor-pattern matrix ($V_{fp}$), which gives the necessary
values of the $s_{jk}$.

The factor-score matrix is computed by multiplying
the reference-vector structure matrix by the basic
diagonal of the original correlation matrix. In computation
this step was done by inverting the correlation matrix,
multiplying that by the matrix of standard scores for the
variables standardized by rows, and multiplying the
resulting matrix by the reference-vector structure matrix
(Vrs) or


Fp = Vrs 


where Fp is the factor-score matrix, Vrs the reference-vector
structure matrix, and 6 the basic diagonal of the
correlation matrix. The resulting factor-score matrix for 
the 28 observations is given in Table 11. The factor 
scores are the standard scores for the factors calculated 
for a particular rotation. If the factors have been ro- 
tated to where they correspond to real parts of the en- 
vironment, the factor-score matrix gives estimated stand- 
ard scores for the environmental factors. If the rota- 
tion is not the correct one, the numbers are only numbers 
that will reproduce the scores on the variables. It is, of 
course, impossible to use them predictively if they are 
not real. 
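
The computation described above can be sketched numerically. The following is a minimal illustration, not the program used in the paper: it reads the verbal description as "standard scores times inverted correlation matrix times reference-vector structure matrix", and every number in it (two variables, one factor, three observations) is hypothetical. The helper names `matmul` and `inverse_2x2` are introduced only for this sketch.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix (enough for this toy example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# All values below are hypothetical and chosen only for illustration.
R = [[1.0, 0.5], [0.5, 1.0]]                  # correlation matrix (variables x variables)
Z = [[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]]    # standard scores (observations x variables)
V_RS = [[0.8], [0.6]]                         # reference-vector structure (variables x factors)

# Assumed reading of the text: weights = R^-1 * V_rs, factor scores = Z * weights.
weights = matmul(inverse_2x2(R), V_RS)
factor_scores = matmul(Z, weights)

for obs, row in enumerate(factor_scores, start=1):
    print(obs, [round(value, 4) for value in row])
```

With 13 variables, as in the Drosophila example, a general matrix-inversion routine would be needed in place of `inverse_2x2`.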

Having calculated the factor-score matrix and the
factor-pattern matrix, it is now possible to estimate the
value of a variable on any observation. As an example,
the standard score of Drosophila pseudoobscura at
observation 4 (December, 1961) equals the sum of the
factor scores as weighted by the factor loadings for that
period, plus specific factor scores, plus an error term.
In other words

$$\text{D. pseudoobscura}_{4} = (.9097)(.7821) + (.1218)(-.1245) + (.0126)(-.7125) + \text{specific factors}_{4} + \text{error factors}_{4}$$

$$\text{D. pseudoobscura}_{4} = .6873 + \text{specific factors}_{4} + \text{error factors}_{4}$$

All scores are in standard form.


Table 11. Calculated factor-score matrix for the 28
observations from the Oblimax rotation to simple structure.
A question mark denotes an entry that is illegible or of
uncertain reading in the source.

Observation   Factor 1   Factor 2   Factor 3
     1          .4840     1.0297     -.5129
     2          .5836      .9663    -1.0332
     3         1.7794       ?        -.6943
     4          .7821     -.1245     -.7125
     5         -.4793     -.0247      .8224
     6         -.1264      .3034     2.3076
     7         -.3645      .8503     1.6247
     8         1.1017      .4221      .7145
     9         1.2421      .0689     -.0601
    10         3.2278    -1.3368    -1.9288
    11         1.1410    -1.0098     -.6190
    12         1.1349      .3455      .5760
    13          .8159     -.0459     -.4046
    14          .9906     -.3259?    -.8530
    15        -1.4451      .9605?    1.8509
    16        -1.9348     1.2871     1.7579
    17        -1.2463      .6319      .7126
    18        -2.1194     1.3567     2.2356
    19         -.3433     -.2672      .0467
    20        -1.2586      .4976     1.0491
    21         -.4868    -3.9399      .4878
    22          .4718    -1.9417    -1.0491
    23         -.6639     -.4351     -.1375?
    24         -.8399     1.2284     -.1013
    25          .0412      .0449    -2.7617
    26         -.5968     -.8071     -.5184
    27         -.8115     -.2905     -.5392
    28        -1.0795      .0832     -.8220
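
The arithmetic of the specification equation for Drosophila pseudoobscura at observation 4 can be checked with a few lines of Python. This is only a sketch of the common-factor part of the equation, using the loadings and factor scores quoted in the worked example; the function name `common_factor_score` is introduced here for illustration and is not from the paper.

```python
# Factor-pattern loadings of D. pseudoobscura as quoted in the worked example.
LOADINGS = (0.9097, 0.1218, 0.0126)
# Factor scores for observation 4 (December, 1961) as quoted in the worked example.
FACTOR_SCORES_OBS4 = (0.7821, -0.1245, -0.7125)


def common_factor_score(loadings, factor_scores):
    """Return the sum of loading * factor score over the common factors.

    Specific and error factors are unknown, so they are simply left out,
    which is what the first approximation in the text does.
    """
    if len(loadings) != len(factor_scores):
        raise ValueError("need exactly one factor score per loading")
    return sum(s * f for s, f in zip(loadings, factor_scores))


predicted = common_factor_score(LOADINGS, FACTOR_SCORES_OBS4)
print(f"{predicted:.4f}")  # 0.6873, matching the value given in the text
```

Repeating the call for each of the 10 species and 28 observations would give the 280 predicted standard scores discussed in the text.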


Theoretically if the scores for the common factors, 
the specific factors, and the error factors were known, the 
predicted scores would exactly fit the actual scores of 
the variables (species population levels). However, in this 
case nothing is known of the specific factors and the 
error factors, and the predictions are based only on the 
variance attributable to common factors. Where com- 
mon factors account for a large percentage of the vari- 
ance of a species, the predictions should be fairly accu- 
rate. In a species population influenced to a large ex- 
tent by specific factors and error factors, the predictions 
will not be as good. To a certain extent, the reliability
of the estimates can be judged from the size of the
species population's communality, species with large
communalities being more predictable than those with small
communalities. This procedure, in essence, pretends that
specific and error factors do not exist.
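
As a rough illustration of how communality separates well-modeled from poorly modeled species, the sketch below sums the squared unrotated loadings of two species as read from Table 8. It assumes that the three retained principal-axis factors are uncorrelated, so that the communality is the row sum of squared loadings; that assumption, and the function name `communality`, belong to this sketch rather than to the paper.

```python
def communality(loadings):
    """Sum of squared loadings: the share of a variable's variance
    attributed to the common factors (valid for uncorrelated factors)."""
    return sum(a * a for a in loadings)


# Unrotated loadings on factors 1-3 as read from Table 8.
TABLE_8_ROWS = {
    "pseudoobscura": (0.8586, -0.3278, 0.0661),
    "hydei": (-0.0495, -0.0746, -0.3590),
}

communalities = {name: communality(row) for name, row in TABLE_8_ROWS.items()}

for name, h2 in communalities.items():
    # 1 - h2 is the share left to specific and error factors.
    print(f"{name:15s} communality = {h2:.3f}  unexplained = {1 - h2:.3f}")
```

On this reading roughly 85 percent of the variance of pseudoobscura, but only about 14 percent of that of hydei, is carried by the common factors, so predictions for the first species should be the more reliable.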

Graphs of the predicted and observed abundances (as
standard scores) of the 10 species are given in Figs. 4-13.
It is clear that for many of the species, particularly the
common ones, predicted and actual values agree quite
well, although there are still some deviations. Deviations
of the predicted from the actual values (with the
predicted values converted to raw scores), as measured
by a Chi-square goodness-of-fit test for Drosophila
pseudoobscura and mesophragmatica, are highly significant;
the fit is hardly perfect.

Fig. 4. Predicted versus observed abundances (standardized)
for Drosophila melanogaster from September, 1961 to
December, 1963.

Fig. 5. Predicted versus observed abundances (standardized)
for Drosophila pseudoobscura from September, 1961 to
December, 1963.

Fig. 6. Predicted versus observed abundances (standardized)
for Drosophila “tripunctata 20” from September, 1961 to
December, 1963.

Fig. 7. Predicted versus observed abundances (standardized)
for Drosophila bandeirantorum from September, 1961 to
December, 1963.

Fig. 8. Predicted versus observed abundances (standardized)
for Drosophila hydei from September, 1961 to December,
1963.

Fig. 9. Predicted versus observed abundances (standardized)
for Drosophila immigrans from September, 1961 to
December, 1963.

Fig. 10. Predicted versus observed abundances (standardized)
for Drosophila viracochi from September, 1961 to
December, 1963.

Fig. 11. Predicted versus observed abundances (standardized)
for Drosophila mesophragmatica from September, 1961 to
December, 1963.

Some general observations are: 

1. Common species are better modeled than rarer 
ones. 

2. Large, long-term changes can usually be pre- 
dicted, but short-term, small fluctuations cannot, 
particularly for the rarer species. 

3. The last 14 months present a better fit than the 
first 14 months. 

In a practical sense the predicted population levels
from the above analysis are rather trivial because values
of a set of common factors were calculated from 28
observations on 10 variables (species); using these
calculated factor scores, the population levels of the species
for each period were recalculated. However, if the
factors influencing the species of a given community have
been identified by a previous factor analysis and the
rotation properly carried out, it is possible later to make
measurements of the factors, standardize them, and then
calculate predicted standard scores for all of the species
of the community using a set of specification equations
as above. I have not been able to do so with these data
from the literature, because the factors have not been
identified, or if they had been, no measurements are
available for them. Also, there is no way to check the
validity of the results.

Fig. 12. Predicted versus observed abundances (standardized)
for Drosophila brncici from September, 1961 to
December, 1963.

Fig. 13. Predicted versus observed abundances (standardized)
for Drosophila gasici from September, 1961 to December,
1963.

The use of these predictive equations can be illustrated
by a possible application. Suppose a factor-analysis study
carried out on the community of fish in a river has
determined that water temperature is one of the important
factors affecting the fish community, and it is known
that the establishment of a nuclear reactor on the banks
of the river will progressively raise the temperature of
the water. The question is: “How will the rise in
temperature of the water affect the populations of the fish
living in the river?” The expected rises in temperature
with time could be entered into the specification
equations. The other factors could be assumed to be constant,
or estimates of their probable values might be entered,
and the predicted population levels of all species of fish
in the river estimated for time x. A weakness of the
model is that it can never predict that a species will
become extinct, although the predicted abundance will
approach zero frequency as a limit.
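
The river example can be sketched in code. Everything in the block below is hypothetical: the three species names, their loadings on a temperature factor and on one other factor, and the projected temperature scores are invented solely to show how expected factor values would be entered into a set of specification equations while the other factor is held constant.

```python
# Hypothetical factor-pattern loadings: (temperature factor, other factor).
LOADINGS = {
    "species_a": (0.80, 0.10),
    "species_b": (-0.60, 0.30),
    "species_c": (0.05, 0.70),
}


def predict(loadings, factor_scores):
    """Common-factor part of the specification equation for one species."""
    return sum(s * f for s, f in zip(loadings, factor_scores))


def scenario(temperature_scores, other_factor=0.0):
    """Predicted standard scores for each species at each projected
    temperature score, with the other factor held at a constant value."""
    return {
        name: [predict(row, (t, other_factor)) for t in temperature_scores]
        for name, row in LOADINGS.items()
    }


# Hypothetical standardized temperature at three successive times.
projected_temperature = [0.0, 1.0, 2.0]
result = scenario(projected_temperature)

for name, scores in result.items():
    print(name, [round(score, 2) for score in scores])
```

In this invented case species_a rises with temperature, species_b declines, and species_c is nearly unaffected, which is the kind of qualitative answer the question in the text asks for.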


DISCUSSION 


Like any other statistical technique, factor analysis 
manipulates data in an attempt to reveal the underlying 
causes and their importance to the variables measured. 
Three important assumptions are made about the data 
when factor analysis is employed (Cattell 1965): (1) 
individual variables and factors are linearly interrelated, 
(2) two factors act additively in respect to any given 
variable, and (3) there are no interaction effects among 
the variables. 

No assumptions are made about the distributions of 
the variables. Various tests for significance of factors do 
make assumptions regarding the distributions of variables 
and, for that reason, have been avoided in this paper. It 
is probable that in any real, relatively large community 
of organisms all three assumptions will be violated at 
one time or another. Because of the likelihood of some 
curvilinear or higher polynomial relationships between 
factors and variables and because of the existence of non- 
additive factors, it is important to know how closely the 
linear model assumed by the factor analysis approximates 
the situation where there are some nonlinear relation- 
ships between variables and factors. 

Cattell & Dickman (1962), using variables and 
factors between which the relationships were known, 
showed that if variables are not linearly related to the 
factors, the factor analysis approximates the determina- 
tion of the variable by representing a product by a sum. 
Over a small range this is usually considered to be a good 
approximation. For example, if a species were deter- 
mined by two factors acting multiplicatively, 


$$\text{Species} = sF_1F_2$$
then the factor-analysis model approximates it by
$$\text{Species} = sF_1 + sF_2$$
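
The claim that a sum approximates a product well over a small range can be checked numerically. The sketch below is an illustration under stated assumptions, not Cattell and Dickman's procedure: it compares the product of two factors with its first-order linear approximation around a baseline point (a, b), which is a weighted sum of the two factors plus a constant. The baseline values and function names are hypothetical.

```python
def product_model(f1, f2, s=1.0):
    """Two factors acting multiplicatively."""
    return s * f1 * f2


def linear_approx(f1, f2, a, b, s=1.0):
    """First-order (additive) approximation of s*f1*f2 around (a, b)."""
    return s * (b * f1 + a * f2 - a * b)


def max_abs_error(half_width, a=2.0, b=3.0, steps=5):
    """Largest gap between product and sum when each factor varies
    within +/- half_width of its baseline value."""
    offsets = [-half_width + 2 * half_width * i / (steps - 1) for i in range(steps)]
    return max(
        abs(product_model(a + d1, b + d2) - linear_approx(a + d1, b + d2, a, b))
        for d1 in offsets
        for d2 in offsets
    )


small_range_error = max_abs_error(0.1)
large_range_error = max_abs_error(1.0)
print(round(small_range_error, 4), round(large_range_error, 4))
```

The error equals the product of the two deviations from the baseline, so it grows with the square of the range: narrowing the range tenfold reduces the worst-case error a hundredfold.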


After the analysis has been carried out and the num- 
ber and nature of the factors determined, the linear 
model can be modified and the predictions improved by 
experimentally locating nonadditive factors and modify- 
ing the series of specification equations. The same can 
be done with nonlinear relationships between variables 
and factors. Often the mathematical relationship of
a factor to a community of species, if not linear, will
be roughly the same for all species (i.e., if the
relationship is exponential, it will be exponential for all
species).

Two other common situations that modify the rela- 
tionships between factors and variables are threshold 
levels and competition for a limited resource. Some- 
times a factor influencing a set of variables may op- 
erate only above or below a critical value. For exam- 
ple, dispersal in some animals occurs when the popula- 
tion of a species reaches a critical density. The sigmoid 
curve of population ecology assumes that reaction to in- 
creasing density is gradual: the closer the population 
approaches the carrying capacity of the environment, 
the slower the rate of growth. It is also possible that 
there may be a situation where the curve is completely 
exponential until the carrying capacity has been reached, 
or surpassed, and a point is reached where density-de- 
pendent factors act suddenly. In some predators, search 
images are formed on abundant species of prey and, 
when the population of a prey species reaches a critical 
level, a predator population may begin to attack it to 
the exclusion of other less common species. 

Competition between the members of a community 
may prove to be more of a problem, and depends on 
whether populations are controlled by density-dependent 
or density-independent factors. It is the author’s opin- 
ion that both types of factors are important in animal 
communities. One factor influencing a group of species 
in a community may be a common food resource, such 
as in a group of insects all feeding on the same species 
of plant. In the situation of two insect species feeding 
on one plant species, the feeding of species “a” re- 
duces the amount of factor ““X”’ (the plant) and there- 
fore indirectly influences species “‘b,” the other species 
feeding on the same species of plant. Factors of this 
type are referred to as “expendable” and, when they 
are shown to exist, the specification equations can be 
modified to take them into account. 

The computational steps in the factor analysis technique
presented in this paper are outlined in Fig. 14.
The assumptions underlying each step of the procedure
have been discussed in the Techniques section and will
not be repeated here. The experimental steps can be
roughly outlined as: (1) definition of the “community”
of animals or plants or both to be studied, (2) carrying
out of the census, (3) running of the factor analysis,
(4) identification of the factors and rotation to a
specified hypothesis, (5) formulation of the specification
equations (first approximation), (6) discovery and
analysis of nonlinear, factor-variable relationships and of
nonadditive factors (second approximation), and (7)
discovery and measurement of specific factors for each
species (third approximation).

Fig. 14. Sequence of steps in creating a factor analytical
model of a community. [Flowchart; box labels: Standard
Scores Data; Invert Correlation Matrix; Principal Axis
Factor Analysis; Basic Diagonal; Rotation; Factor Pattern
Matrix; Factor Score Matrix; Specification Equations.]

As the size of the community studied increases, the 
number of significant common factors discovered also 
increases. By increasing the number of species measured, 
a factor originally specific to one species may now in- 
fluence a second species and can be picked out by the 
factor analysis. As more species are considered, more 
factors must be identified. 

Because of the tremendous amount of field work
and experimentation needed for this technique, the
decision to stop at the first, second, or third approximation
will depend on how accurately the first approximation
predicts future changes (or spatial changes)
in the species of the community and on how much time
and money are available.

A rough approximation is often all that is needed.
A farmer usually wants to know only which species, if
any, of a set of possible pests will be abundant enough
to damage his crops, given a set of conditions that he
can predict (e.g., will the application of a certain
pesticide in the spring cause an increase in the populations
of some potential pest species later in the year?). He
is not particularly interested in the exact level of each
population.

The factor analysis technique is applicable to model- 
ing communities in both space and time. The factor 
analysis approach is an improvement over the multiple 
regression approach (actually a form of factor analysis) 
in indicating not only how many factors to look for, 
but also which species are influenced by which factors 
and the extent of the influences. Psychologists have
also found empirically (Kawash, personal communication)
that the results of a factor analysis modeling of a
situation using the specification equations tend to be
much more useful when applied to similar situations
(for example, a model of one river applied to the
fishes in an adjacent river) than are multiple
regression equations.

Factor analysis is an extensive and complicated sub- 
ject. Just how useful this proposed technique will prove 
can only be known after it has been more extensively 
used and studied. 


LITERATURE CITED 


Cattell, R. B. 1965. Factor analysis: An introduction to
essentials. Biometrics 21:190-215, 405-435.

Cattell, R. B., and K. Dickman. 1962. A dynamic model of
physical influences demonstrating the necessity of oblique
simple structure. Psychological Bulletin 59:389-400.

Dagnelie, P. 1965. L'étude des communautés végétales par
l'analyse statistique des liaisons entre les espèces et les
variables écologiques: un exemple. Biometrics 21:890-907.

Goodall, D. W. 1954. Objective methods for the classification
of vegetation: III. An essay in the use of factor analysis.
Australian Journal of Botany 2:304-324.

Harman, H. H. 1967. Modern Factor Analysis. 2nd ed.
University of Chicago Press, Chicago. 474 p.

Hunter, A. S. 1966. High-altitude Drosophila of Colombia
(Diptera: Drosophilidae). Annals of the Entomological
Society of America 59:413-423.

Kaiser, H. F. 1958. The varimax criterion for analytic
rotation in factor analysis. Psychometrika 23:187-200.

Morris, R. F. [ed.] 1963. The dynamics of epidemic spruce
budworm populations. Memoirs of the Entomological
Society of Canada No. 31.

Pinzka, C., and D. R. Saunders. 1954. Analytic rotation to
simple structure, II. Extension to an oblique solution.
Research Bulletin RB-54-31. Princeton: Educational Testing
Service.

Reyment, R. A. 1963. Multivariate analytical treatment of
quantitative species associations: An example from
palaeoecology. Journal of Animal Ecology 32:535-547.

Schnell, G. D. 1970. A phenetic study of the suborder Lari
(Aves). I. Methods and results of principal components
analyses. Systematic Zoology 19:35-57.

Sokal, R. R., and P. E. Hunter. 1955. A morphometric
analysis of DDT-resistant and non-resistant house fly strains.
Annals of the Entomological Society of America 48:499-507.

Sokal, R. R., and P. H. A. Sneath. 1963. Principles of
numerical taxonomy. W. H. Freeman and Co., San Francisco.
359 p.

