WORKING PAPER
ALFRED P. SLOAN SCHOOL OF MANAGEMENT
A kth Nearest Neighbour Clustering Procedure
M. Anthony Wong and Tom Lane
WP#1213.81
May 1981
MASSACHUSETTS
INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASSACHUSETTS 02139
A kTH NEAREST NEIGHBOUR CLUSTERING PROCEDURE
M. Anthony Wong and Tom Lane
Sloan School of Management
Massachusetts Institute of Technology
Cambridge, MA 02139
SUMMARY
Due to the lack of development in the probabilistic and statistical
aspects of clustering research, clustering procedures are often
regarded as heuristics generating artificial clusters from a given
set of sample data. In this paper, a clustering procedure that is
useful for drawing statistical inference about the underlying
population from a random sample is developed. It is based on the
uniformly consistent kth nearest neighbour density estimate, and is
applicable to both case-by-variable data matrices and case-by-case
dissimilarity matrices. The proposed clustering procedure is shown
to be asymptotically consistent for high-density clusters in several
dimensions, and its small-sample behavior is illustrated by
empirical examples. A real application is also included to
demonstrate the practical utility of this clustering method.
Keywords: CLUSTERING PROCEDURE; HIGH-DENSITY CLUSTERS; kTH NEAREST NEIGHBOUR
DENSITY ESTIMATION; SET-CONSISTENCY.
1. INTRODUCTION
1.1 Shortcomings of Clustering Procedures
A recent study by Blashfield and Aldenderfer (1978) shows that numerous
clustering methods have been developed in the past two decades. A review of
many of these techniques can be found in Cormack (1971), Anderberg (1973),
Sneath and Sokal (1973), Everitt (1974), Hartigan (1975), and Spath (1980).
However, hardly any of the originators of these methods have approached the
clustering problem from within a theoretical framework. More often than not,
the concept of a real population cluster is vague and is left undefined.
Since no statistical evaluation of the sample clusters can be performed under
these circumstances, the validity of the clusters obtained by these methods is
always questionable. Consequently, the existing clustering procedures are
often regarded as heuristics generating artificial clusters from a given set
of sample data, and there is a need for clustering procedures that are useful
for drawing statistical inference about the underlying population from a
sample. In this paper, a clustering procedure based on the kth nearest
neighbour density estimate is proposed, and it is shown to be set-consistent
for high-density clusters in several dimensions. The set-consistency property
of a hierarchical clustering procedure will be defined next.
1.2 A Theoretical Approach to Evaluating Hierarchical Clustering Methods
In order to evaluate the sampling property of a clustering method, it is
necessary to have population clusters defined on population probability
density functions from which the observations are obtained, and to have some
ways of judging how the sample clusters deviate from the population clusters.
Let the observations x_1, x_2, ..., x_N in p-dimensional space be sampled from a
population with density f, taken with respect to Lebesgue measure. Using the
high-density clustering model given in Hartigan (1975, p. 205), the true
population clusters can be defined on f as follows: a high-density cluster at
level f_0 in the population is defined as a maximal connected set of the form
{x | f(x) >= f_0}. The family T of such clusters forms a tree, in that A ∈ T, B ∈
T implies either A ⊃ B, B ⊃ A, or A ∩ B = ∅. A hierarchical clustering
procedure, which produces a sample clustering tree T_N on the observations x_1,
..., x_N, may then be evaluated by examining whether T_N converges to T with
probability one as N approaches infinity. A clustering method (or
equivalently, T_N) is said to be strongly set-consistent for high-density
clusters (or T) if for any A, B ∈ T with A ∩ B = ∅,

    P{A_N ∩ B_N = ∅ as N → ∞} = 1,

where A_N and B_N are respectively the smallest clusters in the sample tree T_N
containing all the sample points in A and B. Since A ⊂ B implies A_N ⊂ B_N,
this limit result means that the tree relationship in T_N converges strongly to
the tree relationship in T.
Using this definition of consistency, hierarchical clustering methods can
be evaluated by examining whether they are strongly set-consistent for
high-density clusters. If a clustering procedure is set-consistent, the
sequence of enlarging hierarchical clusters that it produces in the sample are
groups of points lying within successively lower density contours in the
underlying distribution. Hence, these sample high-density clusters are useful
in indicating the number of modal regions in the population as well as
identifying their locations in the underlying space. And since it is the
geometrical shape of the population density contours that determines the
configuration of the sample high-density clusters, a set-consistent clustering
method does not impose structure on the clusters it produces. (See Everitt
(1974, Chapter 4) for some well-known clustering methods that impose a
spherical structure on the clusters they produce.) On the other hand, a
clustering procedure that is not set-consistent is not adaptive to the
underlying distribution, and is hence not suitable for detecting high-density
or "natural" clusters (see Carmichael et al., 1968).
Hartigan (1977a, 1977b, 1979) has examined the set-consistency of many of
the best known hierarchical clustering methods for high-density clusters. It
was shown that the complete linkage (Sorenson 1948) and average linkage
(Sneath and Sokal 1973) methods are not set-consistent, while single linkage
(Sneath 1957) is weakly set-consistent in one dimension but not in higher
dimensions. Thus most of the relevant evaluative work under the high-density
clustering model has been carried out. However, the important problem of
developing clustering procedures that are set-consistent for high-density
clusters did not receive much attention. In Hartigan and Wong (1979) and
Wong (1980), a hybrid clustering method is developed which is weakly
set-consistent for high-density clusters in one dimension, and there is
empirical evidence that similar consistency results hold in several
dimensions. However, although the hybrid method has the advantage of being
practicable for very large data sets, it is not well-suited for small samples
(n < 100) and it is only applicable to case-by-variable data matrices. In
this paper, a strongly set-consistent clustering procedure is developed which
is applicable to both case-by-variable data matrices and case-by-case distance
matrices, and its development is outlined next.
1.3 Development of the kth Nearest Neighbour Clustering Procedure
Under the high-density clustering model, density estimates can be used to
generate sample clusters, namely the high-density clusters defined on the
estimates. And a clustering procedure is expected to be set-consistent for
high-density clusters if it is based on a uniformly consistent density
estimate. Single linkage corresponds to nearest neighbour density estimation
(Hartigan 1977b), in which the density estimate f_N(x) at a point x is
inversely proportional to the volume of the smallest closed sphere including
one sample point. This density estimate is not consistent, in the sense that
f_N(x) does not approach f(x) in probability. An improved density estimate,
and perhaps improved clustering, can be obtained by the kth nearest neighbour
density estimate: the estimated density at point x is f_N(x) = k/(N V_k(x)),
where V_k(x) is the volume of the closed sphere centered at x containing k
sample points. Such a density estimate is uniformly consistent with
probability 1 if f is uniformly continuous and if k = k(N) satisfies k(N)/N → 0
and k(N)/log N → ∞. (See, for example, Devroye and Wagner 1977, and Moore
and Yackel 1977.)
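For concreteness, one admissible sequence satisfying both conditions is k(N) = ceil(sqrt(N)) (an illustrative choice, not one prescribed in this paper): then k(N)/N → 0 while k(N)/log N → ∞. A minimal sketch:

```python
import math

# Illustrative admissible sequence: k(N) = ceil(sqrt(N)).
# Then k(N)/N -> 0 while k(N)/log N -> infinity, the two conditions
# required for uniform consistency of the kth nearest neighbour estimate.
def k_of_N(N):
    return math.ceil(math.sqrt(N))

for N in (100, 10_000, 1_000_000):
    k = k_of_N(N)
    print(f"N={N:>9}  k={k:>5}  k/N={k / N:.4f}  k/log N={k / math.log(N):.1f}")
```

Running this shows the first ratio shrinking and the second growing as N increases, as the Devroye-Wagner conditions require.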
Wishart (1969), in an attempt to improve on the single linkage clustering
technique, developed a procedure entitled Mode Analysis which is related to
the kth nearest neighbour density estimate. However, Wishart's procedure was
not designed to obtain the high-density clusters defined on the density
estimate, and hence its set-consistency for high-density clusters was never
established. Moreover, since its computational algorithm is quite
complicated, Mode Analysis did not receive much attention in the
clustering literature. In this paper, a clustering algorithm for deriving the
tree of sample high-density clusters from the kth nearest neighbour density
estimate is developed. A detailed description of this clustering procedure is
given in Section 2. In Section 3, it is established that the proposed method
is strongly set-consistent for high-density clusters. Empirical examples are
given in Section 4 to illustrate the small-sample behavior of kth nearest
neighbour clustering. A real example is presented in Section 5 to demonstrate
the practical utility of the proposed clustering method.
2. A KTH NEAREST NEIGHBOUR CLUSTERING PROCEDURE
The proposed nearest neighbour clustering algorithm consists of two
stages. At the first stage, the kth nearest neighbour density estimation
procedure is used to obtain a uniformly consistent estimate of the underlying
density. The tree of sample high-density clusters defined on the estimated
density is computed at the second stage of the algorithm. At this latter
stage, a distance matrix is first computed in which the distance between two
"neighbouring" points (i.e. points with the property that at least one point
is among the k nearest neighbours of the other) is defined to be inversely
proportional to a pooled density estimate at the point halfway between them,
and the single linkage clustering algorithm (Sneath, 1957) is then applied to
this distance matrix to obtain the tree of sample clusters.
2.1 The Density Estimation Stage
The kth nearest neighbour density estimation procedure is used in this
stage of the clustering procedure because it provides a strongly uniformly
consistent estimate of the underlying density. Let x_1, ..., x_N be independent,
identically distributed random vectors with values in R^p, p ≥ 1, and with a
common probability density f. If V_k(x) is the volume of the smallest sphere
centered at x and containing at least k of the random vectors x_1, ..., x_N, then
the kth nearest neighbour density estimate of f at x is

    f_N(x) = k/(N V_k(x)).

In Devroye and Wagner (1977), the following strong uniform consistency
result for this estimate is shown:

Lemma (Devroye and Wagner, 1977): If f is uniformly continuous on R^p and
if k = k(N) is a sequence of positive integers satisfying
    (a) k(N)/N → 0, and
    (b) k(N)/log N → ∞, as N → ∞,
then
    sup_x |f_N(x) - f(x)| → 0 with probability 1.
One purpose of the kth nearest neighbour clustering method is to discover the
population high-density clusters given a random sample from some underlying
distribution F with density f. In this first step of the proposed procedure,
a uniformly consistent estimate of f is obtained. The high-density clusters
defined on the estimated density f_N can then be used as sample estimates of
the population high-density clusters defined on f. These hierarchical sample
high-density clusters are constructed in the second stage of the proposed
clustering algorithm.
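As a sketch of this estimation stage, the following example computes f_N at every sample point by brute force (the function name and the use of the exact p-dimensional sphere volume are my own illustrative choices; the O(N^2) distance computation is suitable only for small samples):

```python
from math import gamma, pi

import numpy as np

def knn_density(X, k):
    """kth nearest neighbour density estimate f_N(x) = k / (N V_k(x)),
    evaluated at each of the N sample points (brute-force sketch)."""
    n, p = X.shape
    # Pairwise Euclidean distances; sorted row i is 0, d_1(x_i), d_2(x_i), ...
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    dk = np.sort(d, axis=1)[:, k]                 # kth nearest neighbour distance
    unit_ball = pi ** (p / 2) / gamma(p / 2 + 1)  # volume of the unit ball in R^p
    Vk = unit_ball * dk ** p                      # V_k(x): sphere containing k points
    return k / (n * Vk)
```

As expected of a density estimate, points inside a tight cluster receive a much larger value of f_N than an outlying point, whose kth nearest neighbour sphere is large.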
2.2 The Hierarchical Clustering Stage
In this stage, a distance matrix D(x_i, x_j), 1 ≤ i, j ≤ N, for the N
observations is first computed using the following definitions:

Definition 1: Two observations x_i and x_j are said to be neighbours if
d*(x_i, x_j) ≤ d_k(x_i) or d_k(x_j), where d* is the Euclidean metric and d_k(x_i) is
the kth nearest neighbour distance to point x_i.

Definition 2: The distance D(x_i, x_j) between the observations x_i and x_j is

    D(x_i, x_j) = (1/2)[1/f_N(x_i) + 1/f_N(x_j)] = (N/2k)[V_k(x_i) + V_k(x_j)], if x_i and x_j are neighbours;
                = ∞, otherwise.
Hence, finite distances are defined only for pairs of observations which
are in the same neighbourhood in R^p, and the defined distance between a pair
of neighbouring observations is inversely proportional to a pooled density
estimate at the point halfway between them. The following single linkage
clustering technique is then applied to this distance matrix D to obtain the
tree of sample high-density clusters.
Given a set of objects x_1, ..., x_N with distances
D(x_i, x_j), 1 ≤ i < j ≤ N, single linkage clusters are defined as follows: let
x_i and x_j be the closest pair of objects; amalgamate them to form a cluster c
and define the distance between that cluster and any object x_l to be D(c, x_l) = min
[D(x_i, x_l), D(x_j, x_l)]; repeat the process, treating c as an object and ignoring
x_i and x_j. The amalgamation continues until all objects are grouped in one
large cluster. All clusters obtained in the course of this hierarchical
algorithm are single linkage clusters. (See Gower and Ross (1969), and
Hartigan (1975) for computational single linkage algorithms.) Single linkage
clustering is used in this step of the proposed procedure because it has the
following property: at every stage of the clustering, the single linkage
clusters are the maximal linked sets, if objects x_i and x_j are said to be
linked whenever D(x_i, x_j) is no greater than a given distance D_0. Now, since
the distance D between two "neighbouring" observations is inversely proportional
to the density estimate f_N at the midpoint between them, every cluster obtained by
applying single linkage to D has the property that the density estimates over
the objects in this cluster are greater than a certain density level f_0.
Moreover, as the distance measure D is defined only for pairs of
"neighbouring" observations, the resultant single linkage clusters correspond
to maximal connected sets of the form {x | f_N(x) ≥ f_0}, which are the
high-density clusters defined on f_N.
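The amalgamation process just described can be sketched directly (a naive O(N^3) illustration with my own naming; a practical implementation would use the minimum spanning tree methods of Gower and Ross (1969)):

```python
import numpy as np

def single_linkage(D):
    """Naive single linkage by repeated amalgamation: find the closest
    pair of current clusters, merge them into a cluster c with
    D(c, x) = min(D(a, x), D(b, x)), and repeat until one cluster remains.
    Returns the merge history as (member list, merge distance) pairs."""
    D = D.astype(float)                  # work on a copy
    np.fill_diagonal(D, np.inf)          # a cluster is never merged with itself
    clusters = {i: [i] for i in range(len(D))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        sub = D[np.ix_(keys, keys)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = keys[i], keys[j]          # closest pair of current clusters
        merges.append((sorted(clusters[a] + clusters[b]), float(D[a, b])))
        # merged cluster keeps label a; its distances are the row-wise minimum
        D[a, :] = np.minimum(D[a, :], D[b, :])
        D[:, a] = D[a, :]
        D[a, a] = np.inf
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

For three points on a line at 0, 1, and 3, the closest pair {0, 1} is amalgamated first at distance 1, and the merged cluster then absorbs the third object at the minimum distance 2.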
2.3 The Computational Algorithm
Since high-density clusters are invariant to monotone transformations of
the density function, the kth nearest neighbour distances d_k(x_i), i = 1, ...,
N, are used instead of the V_k(x_i)'s in the following computational algorithm of
the kth nearest neighbour clustering procedure:

STEP 1: For i = 1, 2, ..., N, compute d_k(x_i), the kth nearest neighbour
        distance of x_i. (For a computationally efficient algorithm to find
        the kth nearest neighbour distances, see Friedman et al., 1975.)

STEP 2: Compute the distance matrix D as follows:
        D(x_i, x_j) = (1/2)[d_k(x_i) + d_k(x_j)], if d*(x_i, x_j) ≤ d_k(x_i) or
                      d_k(x_j), where d* is the Euclidean metric;
                    = ∞, otherwise.
STEP 3: Apply the single linkage clustering algorithm to the computed
distance matrix D to obtain the sample tree of high-density clusters.
The computational requirements for STEP 1 and STEP 3 are O(pN log N) and
O(Nk) respectively. Hence, unlike the hybrid clustering method (Hartigan and
Wong, 1979 and Wong, 1980), this procedure is not practicable for large data
sets; but, it is better suited for small samples, and is applicable to both
case-by-variable data matrices and case-by-case dissimilarity matrices.
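Putting STEPs 1-3 together, a minimal sketch of the whole procedure follows (assuming SciPy for the single linkage step; since SciPy's linkage routine requires finite distances, a large constant stands in for the infinite distance between non-neighbouring pairs, and all names are my own):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def knn_clustering_tree(X, k, big=1e12):
    """Sketch of the kth nearest neighbour clustering procedure (STEPs 1-3)."""
    # STEP 1: kth nearest neighbour distances d_k(x_i) (brute force here;
    # Friedman et al. (1975) give an O(pN log N) alternative).
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    dk = np.sort(d, axis=1)[:, k]
    # STEP 2: D(x_i, x_j) = (1/2)[d_k(x_i) + d_k(x_j)] for neighbours, i.e.
    # when d*(x_i, x_j) <= d_k(x_i) or d_k(x_j); "infinite" otherwise.
    D = 0.5 * (dk[:, None] + dk[None, :])
    neighbours = (d <= dk[:, None]) | (d <= dk[None, :])
    D[~neighbours] = big                 # finite stand-in for infinity
    np.fill_diagonal(D, 0.0)
    # STEP 3: single linkage on D yields the sample tree of
    # high-density clusters.
    return linkage(squareform(D, checks=False), method="single")
```

Cutting the resulting tree at any level below the large constant yields the sample high-density clusters; two well-separated groups of points are then recovered as two disjoint clusters.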
3. STRONG SET-CONSISTENCY OF kTH NEAREST NEIGHBOUR CLUSTERING
The asymptotic consistency of the kth nearest neighbour clustering method
for high-density clusters in R^p, p ≥ 1, is given in the following theorem:

Theorem: Let f denote a positive, uniformly continuous density on R^p such
that {x | f(x) ≥ f_0} is the union of a finite number of compact subsets of R^p
for every f_0 > 0. Let T be the tree of population high-density clusters
defined on f. Suppose that A and B are any two disjoint high-density clusters
in T with connected interiors. Let x_1, ..., x_N be a random sample from f and
let T_N be the hierarchical clustering specified by the kth nearest neighbour
clustering algorithm. Then, provided that k = k(N) satisfies
    (a) k(N)/N → 0, and
    (b) k(N)/log N → ∞,
as N → ∞, there exist A_N, B_N ∈ T_N with A_N ⊇ A ∩ {x_1, ..., x_N}, B_N ⊇ B ∩
{x_1, ..., x_N}, and A_N ∩ B_N = ∅ with probability 1.
Proof: Since T_N is the tree of high-density clusters for f_N, this theorem is
a direct consequence of the Lemma, which states that

    sup_x |f_N(x) - f(x)| → 0 with probability 1.    (3.1)

By definition, for any two disjoint high-density clusters A and B in T, there
exist δ > 0, ε > 0 and λ > 0 such that
    (i) f(x) ≥ λ for all x ∈ A ∪ B, and    (3.2)
    (ii) each rectilinear path between A and B contains a segment, with
    length greater than δ, along which the density f(x) < λ - 3ε.    (3.3)
From (3.1), we have for N large,
    sup_x |f_N(x) - f(x)| < ε with probability 1.
Thus, it follows from (3.2) and (3.3) that for N large, with probability 1,
    (iii) f_N(x) > λ - ε for all x ∈ A ∪ B, and    (3.4)
    (iv) each rectilinear path between A and B contains a segment, with
    length greater than δ, along which the density estimate f_N(x) < λ - 2ε.    (3.5)
Since A and B are disjoint, it follows from (3.4) and (3.5) that
high-density clusters of the form {x | f_N(x) > λ - ε} separate the observations
in A and B. The theorem follows.
4. EMPIRICAL STUDY OF THE SMALL-SAMPLE BEHAVIOR OF THE kTH NEAREST NEIGHBOUR
CLUSTERING PROCEDURE
To illustrate the small-sample behavior of the kth nearest neighbour
clustering procedure, an empirical study was performed in which the procedure
was applied to various generated data sets. Results of three experiments, in
which bivariate data were used, are reported here.
1. Experiment One: 30 observations were generated so that two spherical
clusters of observations are present in this data set. The scatter-plot of
this sample set is shown in Figure 1a, in which the observation numbers are
plotted next to the observations. This data set is useful for illustrating
the effectiveness of the proposed procedure in identifying spherical clusters.
The dendrogram giving the hierarchical clustering obtained by the kth nearest
neighbour method (using k = 4) is shown in Figure 1b. It is clear that, in
this experiment, the kth nearest neighbour clustering indicates the presence
of two modal regions or clusters.
However, for a choice of k which is much too small, the dendrogram
produced by the proposed method would tend to suggest the presence of a few
extra modal regions. The reason for this moderate sensitivity of the proposed
method to the choice of k is that, if k is too small, extra modes tend to
appear in the kth nearest neighbour density estimate, and these bumps in the
estimated density function are identified as modal regions in the hierarchical
clustering stage of the algorithm.
2. Experiment Two: 58 observations were generated so that two elongated,
elliptical clusters of observations are present in this data set. The scatter
plot of this sample set is shown in Figure 2a, in which the observation
numbers are plotted next to the observations. This data set is useful for
illustrating the effectiveness of the proposed clustering procedure
in identifying non-spherical clusters. The dendrogram giving the hierarchical
clustering obtained by the kth nearest neighbour method (using k = 4) is shown
in Figure 2b. Two disjoint modal regions, corresponding to the two elliptical
clusters of observations shown in Figure 2a can be identified in this
dendrogram. However, observations 51, 58, and 37 form a minor modal region
within one of the two clusters; and, observations 22, 23, and 2 form a minor
modal region in the other cluster.
3. Experiment Three: 60 observations were generated so that two spherical
clusters of observations are present in this sample and they are connected by
a chain of "noise" observations (see Figure 3a). This data set is useful for
demonstrating the effectiveness of the proposed method when a moderate amount
of noise is present in the sample. The hierarchical clustering obtained by
the kth nearest neighbour method (using k = 4) is shown in Figure 3b. It can
be seen that the two spherical clusters are recovered by the proposed method
as modal regions, in spite of the presence of the noise observations.
5. A REAL EXAMPLE
In order to illustrate how the kth nearest neighbour clustering method works in
practice, it is applied to the well-known Iris data given in Fisher (1936).
The data consist of four characteristics for three species of Iris; the
species are Iris Setosa, Iris Versicolor and Iris Virginica, and the
characteristics are sepal length, sepal width, petal length, and petal width.
There are fifty samples from each species, and hence the total sample size is
150. This data set has been used by many authors to test the practical
utility of various clustering algorithms (e.g. Friedman and Rubin, 1967). It
has been found that there are two distinct clusters of samples in this data
set; one corresponding to the samples from Iris Setosa, and the other
corresponding to samples from the other two species. Moreover, the samples
from Iris Versicolor and Iris Virginica form a somewhat homogeneous group and
there is no clearcut distinction between samples from these two different
species (see, for example, Fisher, 1936; Friedman and Rubin, 1967; and
Gnanadesikan, 1977).
The hierarchical clustering obtained by applying the kth nearest neighbour
method to this data set, using a value of k = 8, is shown in Figure 4. Two
distinct modal regions can be identified in this sample tree of high-density
clusters; one corresponding to the samples from Iris Setosa, and the other
corresponding to the samples from Iris Versicolor and Iris Virginica.
Moreover, within the Versicolor-Virginica modal region, there are two
sub-modal regions: one consists of samples only from Iris
Versicolor, while the other consists of samples only from Iris
Virginica. However, it should be pointed out that if k were chosen to be much
larger than 8 (say, k = 12 or 15), the two sub-modal regions would not appear
in the hierarchical clustering and only the two well-known distinct clusters
would be identified by the kth nearest neighbour method.
Fig. 1a. Scatter-plot of the generated bivariate sample (N = 30)
used in Experiment One. Observation numbers are plotted
next to the observations.
Fig. 1b. Tree of sample high-density clusters for data shown in Fig. 1a,
derived from the kth nearest neighbour density estimate using
k = 4.
Fig. 2a. Scatter-plot of the generated bivariate sample (N = 58)
used in Experiment Two. Observation numbers are plotted
next to the observations.
Fig. 2b. Tree of sample high-density clusters for data shown in Figure 2a,
derived from the kth nearest neighbour density estimate using k = 4.
Fig. 3a. Scatter-plot of the generated bivariate sample (N = 60) used in
Experiment Three. Observation numbers are plotted next to the
observations.
Fig. 3b. Tree of sample high-density clusters for data shown in Fig. 3a, derived from
the kth nearest neighbour density estimate using k = 4.
Fig. 4. Tree of sample high-density clusters for the Iris data, derived from
the kth nearest neighbour density estimate using k = 8.
REFERENCES
Anderberg, M.R. (1973). Cluster Analysis for Applications. New York:
Academic Press.

Blashfield, R.K., and Aldenderfer, M.S. (1978). "The Literature on Cluster
Analysis". Multivariate Behavioral Research, 13, 271-295.

Carmichael, J.W., George, J.A., and Julius, R.S. (1968). "Finding natural
clusters". Systematic Zoology, 17, 144-150.

Cormack, R.M. (1971). "A Review of Classification". Journal of the Royal
Statistical Society, Series A, 134, 321-367.

Devroye, L.P., and Wagner, T.J. (1977). "The strong uniform consistency of
nearest neighbour density estimates". Annals of Statistics, 5, 536-540.

Everitt, B.S. (1974). Cluster Analysis. Halsted Press, New York: John Wiley.

Fisher, R.A. (1936). "Use of multiple measurements in taxonomic problems".
Ann. Eugen. Lond., 7, 179-188.

Friedman, H.P., and Rubin, J. (1967). "On some invariant criteria for grouping
data". Journal of the American Statistical Association, 62, 1159-1178.

Friedman, J.H., Bentley, J.L., and Finkel, R.A. (1975). "An algorithm for
finding best matches in logarithmic time". SLAC-PUB-1549, Feb., 1975.

Gnanadesikan, R. (1977). Statistical Data Analysis for Multivariate
Observations. New York: John Wiley & Sons.

Gower, J.C., and Ross, G.J.S. (1969). "Minimum spanning trees and single
linkage cluster analysis". Applied Statistics, 18, 54-64.

Hartigan, J.A. (1975). Clustering Algorithms. New York: John Wiley & Sons.

Hartigan, J.A. (1977a). "Distributional problems in clustering", in
Classification and Clustering, ed. J. Van Ryzin, New York: Academic Press.

Hartigan, J.A. (1977b). "Clusters as modes", in First International Symposium
on Data Analysis and Informatics, Vol. 2, IRIA, Versailles.

Hartigan, J.A. (1979). "Consistency of single linkage for high-density
clusters". Unpublished manuscript, Department of Statistics, Yale
University.

Hartigan, J.A., and Wong, M.A. (1979). "Hybrid Clustering". Proceedings of
the 12th Interface Symposium on Computer Science and Statistics, ed. Jane
Gentleman, U. of Waterloo Press, pp. 137-143.

Moore, D.S., and Yackel, J.W. (1977). "Consistency properties of nearest
neighbour density function estimators". Annals of Statistics, 5, 143-154.

Sneath, P.H.A. (1957). "The application of computers to taxonomy". Journal of
General Microbiology, 17, 201-226.

Sneath, P.H.A., and Sokal, R.R. (1973). Numerical Taxonomy. San Francisco:
W.H. Freeman.

Sorenson, T. (1948). "A method of estimating groups of equal amplitude in
plant sociology based on similarity of species content". K. Danske Vidensk.
Selsk. Skr. (Biol.), 5, 1-34.

Spath, H. (1980). Cluster Analysis Algorithms. Halsted Press, New York:
John Wiley.

Wishart, D. (1969). "Mode Analysis", in Numerical Taxonomy, ed. A.J.
Cole, New York: Academic Press.

Wong, M.A. (1980). "A hybrid clustering method for identifying high-density
clusters". Working Paper #2001-80, Sloan School of Management,
Massachusetts Institute of Technology.