# Full text of "A kth nearest neighbour clustering procedure"

## See other formats

HD28 .M414 Nc i^^'' Dewey MAY £6 1981 WORKING PAPER ALFRED P. SLOAN SCHOOL OF MANAGEMENT A kth Nearest Neighbour Clustering Procedure M. Anthony Wong and Tom Lane WP#1213.81 May 1981 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 50 MEMORIAL DRIVE CAMBRIDGE, MASSACHUSETTS 02139 A kth Nearest Neighbour Clustering Procedure M. Anthony Wong and Tom Lane WP#1213.81 May 1981 M.I.T. LIBRARIES MAY 2 6 1981 1 A kTH NEAREST NEIGHBOUR CLUSTERING PROCEDURE M. Anthony Wong and Torn Lane Sloan School of Management Massachusetts Institute of Technology Cambridge, MA 02139 SUMMARY Due to the lack of development in the probabilistic and statistical aspects of clustering research, clustering procedures are often regarded as heuristics generating artificial clusters from a given set of sample data. In this paper, a clustering procedure that is useful for drawing statistical inference about the underlying population from a random sample is developed. It is based on the uniformly consistent kth nearest neighbour density estimate, and is applicable to both case-by-variable data matrices and case-by-case dissimilarity matrices. The proposed clustering procedure is shown to be asymptotically consistent for high-density clusters in several dimensions, and its small-sample behavior is illustrated by empirical examples. A real application is also included to demonstrate the practical utility of this clustering method. Keywords: CLUSTERING PROCEDURE; HIGH-DENSITY CLUSTERS; kTH NEAREST NEIGHBOUR DENSITY ESTIMATION; SET -CONSISTENCY. (^742€26 1. INTRODUCTION 1. 1 Shortcomings of Clustering Procedures A recent study by Blashfield and Aldenderfer (1978) shows that numerous clustering methods have been developed in the past two decades. A review of many of these techniques can be found in Cormack (1971), Anderberg (1973), Sneath and Sokal (1973), Everitt (19 74), Hartigan (1975), and Spath (1980). However, hardly any of the originators of these methods have approached the clustering problem from within a theoretical framework. More often than not, the concept of a real population cluster is vague and is left undefined. Since no statistical evaluation of the sample clusters can be performed under the circumstance, the validity of the clusters obtained by these methods is always questionable. Consequently, the existing clustering procedures are often regarded as heuristics generating artificial clusters from a given set of sample data, and there is a need of clustering procedures that are useful for drawing statistical inference about the underlying population from a sample. In this paper, a clustering procedure based on the kth nearest neighbour density estimate is proposed, and it is shown to be set-consistent for high-density clusters in several dimensions. The set-consistency property of a hierarchical clustering procedure will be defined next . 1. 2 A Theoretical Approach to Evaluating Hierarchical Clustering Methods In order to evaluate the sampling property of a clustering method, it is necessary to have population clusters defined on population probability density functions from which the observations are obtained, and to have some ways of judging how the sample clusters deviate from the population clusters. Let the observations x, , x„, ..., x in p-dimensional space be sampled from a population with density f, taken with respect to Lebesque measure. Using the high-density clustering model given in Hartigan (1975, p. 205), the true population clusters can be defined on f as follows: a high-density cluster at level f in the population is defined as a maximal connected set of the form {x|f(x) > f }. The family T of such clusters forms a tree, in that At T, Be T implies either A ^ B, B => A, or A n B = 4>. A hierarchical clustering procedure, which produces a sample clustering tree \,on the observations x. , ..., X , may then be evaluated by examining whether T^ converges to T with probability one when N approaches infinity. A clustering method (or equivalently, T ) is said to be strongly set-consistent for high-density clusters (or T) if for any A, B e T, A n B = 4), P {A„ n B., = t) as N -► "} = 1, r N N where A^ and B^ are respectively the smallest cluster in the sample tree Tj^ containing all the sample points in A and B. Since A ^ b implies k^ c b^^, this limit result means that the tree relationship in T„ converges strongly to the tree relationship in T. Using this definition of consistency, hierarchical clustering methods can be evaluated by examining whether they are strongly set-consistent for high-density clusters. If a clustering procedure is set-consistent, the sequence of enlarging hierarchical clusters that it produces in the sample are groups of points lying within successively lower density contours in the underlying distribution. Hence, these sample high-density clusters are useful in indicating the number of modal regions in the population as well as identifying their locations in the underlying space. And since it is the geometrical shape of the population density contours that determines the configuration of the sample high-density clusters, a set-consistent clustering method does not impose structure on the clusters it produces. (See Everitt (1974, Chapter 4) for some well-known clustering methods that impose a spherical structure on the clusters they produce.) On the other hand, a clustering procedure that is not set-consistent is not adaptive to the underlying distribution, and is hence not suitable for detecting high-density or "natural" clusters (see Carmichael et . al . , 1968). Hartigan (1977a, 1977b, 1979) has examined the set-consistency of many of the best known hierarchical clustering methods for high-density clusters. It was shown that the complete linkage (Sorenson 1948) and average linkage (Sneath and Sokal 1973) methods are not set-consistent, while single linkage (Sneath 1957) is weakly set-consistent in one dimension but not in higher dimensions. Thus most of the relevant evaluative work under the high-density clustering model have been carried out. However, the important problem of developing clustering procedures that are set-consistent for high-density clusters did not receive much attention. In Hartigan and Wong (1979), and Wong (1980), a hybrid clustering method is developed which is weakly set-consistent for high-density clusters in one dimension; and, there exist empirical evidence that similar consistency results hold in several dimensions. However, although the hybrid method has the advantage of being practicable for very large data sets, it is not well-suited for small samples (n < 100) and it is only applicable to case-by-variable data matrices. In this paper, a strongly set-consistent clustering procedure is developed which is applicable to both case-by-variable data matrices and case-by-case distance matrices, and its development is outlined next. 1.3 Development of the kth Nearest Neighbour Clustering Procedure Under the high-density clustering model, density estimates can be used to generate sample clusters, namely the high-density clusters defined on the estimates. And a clustering procedure is expected to be set-consistent for high-density clusters if it is based on a uniformly consistent density estimate. Single linkage corresponds to nearest neighbour density estimation (Hartigan 1977b), in which the density estimate fxjCx) at a point x is inversely proportional to the volume of the smallest closed sphere including one sample point. This density estimate is not consistent in the sense that f„(x) does not approach f(x) in probability. An improved density estimate, and perhaps improved clustering, can be obtained by the kth nearest neighbour density estimate: the estimated density at point x is f^(x) = k/(N V (x)), where V (x) is the volume of the closed sphere centered at x containing k sample points. Such a density estimate is uniformly consistent with probability 1 if f is uniformly continuous and if k = k(N) satisfies k(N)/N * and k(N)/log N -► ». (See, for example, Devroye and Wagner 1977, and Moore and Yackel 19 77.) Wishart (1969), in an attempt to improve on the single linkage clustering technique, developed a procedure entitled Mode Analysis which is related to the kth nearest neighbour density estimate. However, Wishart' s procedure was not designed to obtain the high-density clusters defined on the density estimate, and hence its set-consistency for high-density clusters was never established. Moreover, since its computational algorithm is quite complicated, the Mode Algorithm did not receive much attention in the clustering literature. In this paper, a clustering algorithm for deriving the tree of sample high-density clusters from the kth nearest neighbour density estimate is developed. A detailed description of this clustering procedure is given in Section 2. In Section 3, it is established that the proposed method is strongly set-consistent for high-density clusters. Empirical examples are given in Section 4 to illustrate the small-sample behavior of kth nearest neighbour clustering. A real example is presented in Section 5 to demonstrate the practical utility of the proposed clustering method. 2. A KTH NEAREST NEIGHBOUR CLUSTERING PROCEDURE The proposed nearest neighbour clustering algorithm consists of two stages. At the first stage, the kth nearest neighbour density estimation procedure is used to obtain a uniformly consistent estimate of the underlying density. The tree of sample high-density clusters defined on the estimated density is computed at the second stage of the algorithm. At this latter stage, a distance matrix is first computed in which the distance between two "neighbouring" points (i.e. points with the property that at least one point is one of the kth nearest neighbour of the other) is defined to be inversely proportional to a pooled density estimate at the point halfway between them, and the single linkage clustering algorithm (Sneath, 1957) is then applied to this distance matrix to obtain the tree of sample clusters. 2.1 The Density Estimation Stage The kth nearest neighbour density estimation procedure is used in this stage of the clustering procedure because it provides a strongly uniform consistent estimate of the underlying density. Let x, , . . . ,Xj^ be independent, identically distributed random vectors with values in R , p ^ 1^ and with a common probability density f. If V, (x) is the volume of the smallest sphere centered at x and containing at least k of the random vectors Xj|^,...,Xj^, then the kth nearest neighbour density estimate of f at x is fj,(x) = k/(NVj^(x)) And in Devroye and Wagner (19 77), the following strong uniform consistency result of this estimate is shown: Lemma (Devroye and Wagner, 19 77): If f is uniformly continuous on R^ and if k = k(N) is a sequence of positive integers satisfying: (a) k(N)/N - 0, and (b) k(N)/log N -- «, as N -► », then ^"P I fjj(x) - f(x)| -.- with probability 1. One purpose of the kth nearest neighbour clustering method is to discover the population high-density clusters given a random sample from some underlying distribution F with density f. In this first step of the proposed procedure, a uniformly consistent estimate of f is obtained. The high-density clusters defined on the estimated density f„ can then be used as sample estimates of the population high-density clusters defined on f. These hierarchical sample high-density clusters are constructed in the second stage of the proposed clustering algorithm. 2. 2 The Hierarchical Clustering Stage In this stage, a distance matrix D(x.,x.), 1 ^ i, j ^ N, for the N observations is first computed using the following definitions: Definition 1: Two observations x. and x. are said to be neighbours if d*(x.,x.) < d, (x.) or d,(x.), where d* is the Euclidean metric and d,(x.) is the kth nearest neighbour distance to point x.. Definition 2: The distance D(»,*) between the observations x. and x. is D(x.,Xj) = (l/2)[l/fj^(x.) + l/fj^(x.)] = |i^[Vj^(x.) + Vj^CxJ], if x. and x^ are neighbors ; = °° , otherwise . Hence, finite distances are defined only for pairs of observations which are in the same neighbourhood in R^, and the defined distance between a pair of neighbouring observations is inversely proportional to a pooled density estimate at the point halfway between them. The following single linkage clustering technique is then applied to this distance matrix D to obtain the tree of sample high-density clusters. Given a set of observations of objects x., . . . , x„ with distances D(x.,x.), 1 ^ i < j ^ N, single linkage clusters are defined as follows: let X- and X. be the closest pair of objects; amalgamate them to form a cluster c and define the distance between that cluster and any object J^ be D(c,x ) = min [D(x. ,x ),D(x.,x )]; repeat the process treating c as an object and ignoring X. and X.. The amalgamation continues until all objects are grouped in one large cluster. All clusters obtained in the course of this hierarchical algorithm are single linkage clusters. (See Gower and Ross (1969), and Hartigan (19 75) for computational single linkage algorithms.) Single linkage clustering is used in this step of the proposed procedure because it has the following property: at every stage of the clustering, the single linkage clusters are the maximal linked sets if objects x^ and x. are said to be linked whenever D(x.,x.) is no greater than a given distance D . Now, since the distance D between two "neighboring" observations is reciprocal to the density estimate f^, at the midpoint between them, every cluster obtained by applying single linkage to D has the property that the density estimates over the objects in this cluster are greater than a certain density level f . Moreover, as the distance measure D is defined only for pairs of "neighbouring" observatons, the resultant single linkage clusters correspond to maximal connected sets of the form {x|fj^(x) ^ f^}, which are the high-density clusters defined on f„. 2. 3 The Computational Algorithm Since high-density clusters are invariant to monotone transformations of the density function, the kth nearest neighbour distances dj^(x^), i = 1, ..., N are used instead of the V,(x.)'s in the following computational algorithm of the kth nearest neighbour clustering procedure: STEP 1: For i = 1, 2, ..., N, compute d,(x.), the kth nearest neighbour distance of x- . (For a computationally efficient algorithm to find the kth nearest neighbour distances, see Friedman et . al . , 19 75.) STEP 2: Compute the distance matrix D as follows: D(x.,x.) = (l/2)[d,(x.) + d,(x.)] if 1 J K 1 "^ J d*(x.,x.) < d, (x.) or d,(x.), where d* is the Euclidean i' J k 1 k J ' metric ; = ", otherwise. STEP 3: Apply the single linkage clustering algorithm to the computed distance matrix D to obtain the sample tree of high-density clusters. The computational requirements for STEP 1 and STEP 3 are 0(p N log N) and 0(nk) respectively. Hence, unlike the hybrid clustering method (Hartigan and Wong, 1979 and Wong, 1980), this procedure is not practicable for large data sets; but, it is better suited for small samples, and is applicable to both case-by-variable data matrices and case-by-case dissimilarity matrices. 3. STRONG SET -CONSISTENCY OF kTH NEAREST NEIGHBOUR CLUSTERING The asymptotic consistency of the kth nearest neighbour clustering method for high-density clusters in R^, p > 1, is given in the following theorem: Theorem: Let f denote a positive, uniformly continuous function on R such that {x|f(x) ^ f } is the union of a finite number of compact subsets of R o for every f > 0. Let T be the tree of population high-density clusters defined on f. Supose that A and B are any two disjoint high-density clusters in T with connected interiors. Let x, , . . . , Xj^ be a random sample from f and let T be the hierarchical clustering specified by the kth nearest neighbour 10 clustering algorithm. Then, provided that k. = k(N) satisfies (a) k.(N)/N ♦ 0, and (b) k(N)/log N * ». as N - », there exist Aj^, B^^ e T^ with A^^ ^ A^^ n {x^ , . . . ,Xj^} , B^^ :^ B n {x,,...,Xj,} and A n Bj. = <^ with probability 1. Proof : Since T is the tree of high-density clusters for f„, this theorem is a direct consequence of the Lemma, which states that ^xP |fj^(x) - f(x)| - 0, with probability 1. (3.1) By definition, for any two disjoint high-density clusters A and B in T, there exist 6 > 0, c > and X > 0, such that (i) f(x) > X for all X e A u B, and (3.2) (ii) each rectilinear path between A and B contains a segment, with length greater than 6, along which the density f(x) < X - 3e . (3.3) From (3.1), we have for N large, *xP |fj^(x) - f(x)| < e w.p. 1. Thus, it follows fromm (3.2) and (3.3) that for N large, with probability 1, (iii) fxj(x) > X - e for all x e A u B, and (iv) each rectilinear path between A and B contains a segment, with length greater than 6, along which the density estimate fM(x) ^ X - 2e. Since A and B are disjoint, it follows from (3.4) and (3.5) that high-density clusters of the form {x|fj^.(x) > X - e} separate the observations in A and B. The theorem follows. XX 4. EMPIRICAL STUDY OF THE SMALL-SAMPLE BEHAVIOR OF THE kTH NEAREST NEIGHBOUR CLUSTERING PROCEDURE To illustrate the small-sample behavior of the kth nearest neighbor clustering procedure, an empirical study was performed in which the procedure is applied to various generated data sets. Results of three experiments, in which bivariate data were used, are reported here. 1. Experiment One : 30 observations were generated so that two spherical clusters of observations are present in this data set. The scatter-plot of this sample set is shown in Figure la, in which the observation numbers are plotted next to the observations. This data set is useful for illustrating the effectiveness of the proposed procedure in identifying spherical clusters. The dendrogram giving the hierarchical clustering obtained by the kth nearest neighbour method (using k = 4) is shown in Figure lb. It is clear that, in this experiment, the kth nearest neighbour clustering indicates the presence of two modal regions of clusters. However, for a choice of k which is much too small, the dendrogram produced by the hybrid method would tend to suggest the presence of a few extra modal regions. Tne reason for this moderate sensitivity of the proposed method to the choice of k is that, if k is too small, extra modes tend to appear in the kth nearest neighbour density estimate, and these bumps in the estimated density function are identified as modal regions in the hierarchical clustering stage of the algorithm. 2. Experiment Two : 58 observations were generated so that two elongated, elliptical clusters of observations are present in this data set. The scatter plot of this sample set is shown in Figure 2a, in which the observation numbers are plotted next to the observations. This data set is useful for 12 illustrating the effectiveness of the proposed clustering procedure inidentifying non-spherical clusters. The dendrogram giving the hierarchical clustering obtained by the kth nearest neighbour method (using k = 4) is shown in Figure 2b. Two disjoint modal regions, corresponding to the two elliptical clusters of observations shown in Figure 2a can be identified in this dendrogram. However, observations 51, 58, and 37 form a minor modal region within one of the two clusters; and, observations 22, 23, and 2 form a minor modal region in the other cluster. 3. Experiment Thre e: 60 observations were generated so that two spherical clusters of observations are present in this sample and they are connected by a chain of "noise" observations (see Figure 3a). This data set is useful for demonstrating the effectiveness of the proposed method when a moderate amount of noise is present in the sample. The hierarchical clustering obtained by the kth nearest neighbour method (using k = 4) is shown in Figure 3b. It can be seen that the two spherical clusters are recovered by the proposed method as modal regions, in spite of the presence of the noise observations. 5. A REAL EXAMPLE In order to illustrate how the hybrid clustering method works in practice, it is applied to the well-known Iris data given in Fisher (1936). The data consist of four characteristics for three species of Iris; the species are Iris Setosa, Iris Versicolor and Iris Virginica, and the characteristics are sepal length, sepal width, petal length, and petal width. There are fifty samples from each species, and hence the total sample size is 150. This data set has been used by many authors to test the practical utility of various clustering algorithms (e.g. Friedman and Rubin, 1967). It has been found that there are two distinct clusters of samples in this data iJ set; one corresponding to the samples from Iris Setosa, and the other corresponding to samples from the other two species. Moreover, the samples from Iris Versicolor and Iris Virginica form a somewhat homogeneous group and there is no clearcut distinction between samples from these two different species (see, for example, Fisher, 1936; Friedman and Rubin, 1967; and Gnanadesikan, 1977). The hierarchical clustering obtained by applying the kth nearest neighbour method is this data set, using a value of k = 8, is shown in Figure 4. Two distinct modal regions can be identified in this sample tree of high-density clusters; one corresponding to the samples from Iris Setosa, and the other corresponding to the samples from Iris Versicolor and Iris Virginica. Moreover, within the Versicolor-Virginica modal region, there are two sub-modal regions, one such region is consisted of samples only from Iris Versicolor, while the other region is consisted of samples only from Iris Virginica. However, it should be pointed out that if k was chosen to be much longer than 8 (say, k = 12 or 15), the two sub-modal regions would not appear in the hierarchical clustering and only the two well-known distinct clusters can be identified by the kth nearest neighbour method. 10.fi 7.5*- 5.e- 2.5- 0.0 0.0 ^' 25 • 20 g3 2S A3 3 2 .7 ^ ABAS ^^ 20 21 22 X^ XS X XX xs 1 I XB i_ „... I .... , l_ -U l.H 2.S 4.3 5.7 7.1 S.S 19. Fig. la. Scatter-plot of the generated bivariate sample (N = 30) used in Experiment One. Observation numbers are plotted next to the observations. Dist. Obser. Level No. ■ -. • .•U4 1 » -1 «.«>) «— . 4 — 1 e.tii I-. • -• 1 0.9H ».0» «0 -1 1.071 T -• i I.IK I-. 1 -> t l.lSt J -t t.31 > 1 t.St4 t* 1- 1 1.»T» l> t-. -• 1 <.7a> <• 1 -t • .•4S t . , •a I ' ».e» I^M. i« ».TJJ 5.4*1 19 • .7»1 >« -1 • .»9» i» -• i e.>]* I-. 17 -• 1 e.«»T >• -1 0.»7» >t -1 e.9>T e.9i» »» -1 . IS -• ! t.e9i I-. la -• 1 l.tti I-. t.as t- »• -• 1.32I )• ■* 1.3» -. «.4a la 1 la I «.BT» ir 1 1 a.i«» It t I- Fig. lb. Tree of sample high-density clusters for data shown in Fig. la, derived from the kth nearest neighbour density estimate using k = 4. 10,0 10.0 Fig. 2 a Scatter-plot of the generated bivariate sample (N = 58) used in Experiment Two. Observation numbers are plotted next to the observations. UlSt. Level 0.»-)7 ».'^%.» • .S»'. e.ti5 • .•»« t.to • .«>• • .(ji e.(»« a. 7 0.7]> O.T]> • .7 3* •.743 • .(47 • ••&» • .•71 • ••T* •.•• •.»■» t.O* l.13t 1.17* *.7»1 • .»>» • .•» • .•M • .»*» • .tS4 • ••71 • .tf» • .t>t • .•»• • .t>» •.711 • .731 • .734 • .734 • .734 • .734 • .777 • .•■» • .(•* ».«I7 • .941 • .♦»7 1.001 i.a»* i.>»i Ubser . No. — »i -. 33 I -I I 47 -I I- 3'> -I ; I I i« -I I I • 33 -' I I- 3» -' 40 a* 34 51 if 37 54 15 30 30 5» 41 4* at •• 41 45 51 55 •> 31 4* I I-. -• I I- I I -I 1-. -• I I-- 41 11 l» II 1» \* • • \ 1» 17 33 13 a 34 19 II 3 5 10 14 10 II • 4 1> 7 • B7 • -. -• I I Fig. 2b Tree of sample high-density clusters for data shown in Figure 2a, derived from the kth nearest neighbour density estimate using k = 4. 10.0 53 51 .43. sP£.3i^' 31 ^9633^'' -"^^ •*••) G=* 53 ^^^ 22 5.0 B IC ^^ ^ ^ S.5 0.0' 0.O a.e <4.o s.o 8.0 10.0 Fig. 3a Scatter-plot of the generated bivariate sample (N = 60) used in Experiment 3. Observation numbers are plotted next to the observations . Dist. Obser. Level No. 33 -. ».*tm 1 )« -1 ».*C 1 »« -1 • .4U 1 »« -1 9.«i» I 3* -1 • .«i'. 1 >» -1 0.«i» 1 • » -1 e.4tl I-. JJ -• 1 • .•f 1 J» •! • .«• «« -1 • .4TJ *0 -1 t.*it J* -! t.lo« 41 -J ».>«< «J -1 t.ili I-. »» -• 1 t.ili 1 «> -I • .»> 1 «» -1 P.lJt 1 »» -« • .14* 1 S» -I • •»>« 1- 4* • .>M «* • .>* a* • .•3J »o 0.»]4 , » • .«»> » -. • .«»] • -1 • .»•! <> -I • .XT !'• « -• 1 • .»]• 1 • -« • .»» 1 • 4 -1 ».»»» 1 \t ».»» » o.» y ».»»3 le ».*V« It ••.»»l <* • .TJ« 1* • .79» > • .T>» >* • .Til I • •>•• • ».>«« 19 O.tJl «T e.«}> at • .»)) 33 e.»77 33 • .»JT 34 1.»>» »» %.»0> 4» ■3.0»2 4* >.»«J ** ».<T SO 1.331 »• a.)>> • 3 !.»<' t3 ».TI3 • I ».T»» l» 3. (OS »» ».33l »t >.)J> S3 FiK . 3 b -• I »- I I— • I I-. -• I I I-. -• I I — . I I — Tree of sample high-density clusters for data shown in Fig. 3a, derived from the kth nearest neighbour density estimate using k = 4. !.-• H I 00 >i •H U nj in U 01 •H -H C a •H 01 60 a. u en •H T3 O B M O •H 0) c CO 0) u c u x: C •rl CO 3 CO •H O CO .H CU O -H CJ O •H aj to ex )-l C/3 I. I CO O CO J-l 0) (1) -H en U (1) CO D. •H C/> F . ^ 10 T> CO •H H (U X! O •a 0) c to XI o bO C •H (U U CO 3 iH U CJ •H x: u u M 0) •H PC 00 •H 14 REFERENCES Anderberg, M.R. (1973). Cluster Analysis for Applications . New York: Academic Press. Blashfield, R. K. , and Aldenderfer, M.S. (1978). "The Literature on Cluster Analysis". Multivariate Behavioral Research , 13, 271-295. Carmichael, J.W., George, J. A., and Julius, R. S. (1968), "Finding natural clusters". Systematic Zoology , 17, 144-150. Cormack, R. M. (1971). "A Review of Classification". Journal of the Royal Statistical Society , Series A, 134, 321-367. Devroye , L.P., and Wagner, T.J. (1977). "The strong uniform consistency of nearest neighbour density estimates". Annals of Statistics . 5, 536-540. Everitt, B. S. (1974). Cluster Analysis , Halsted Press, New York: John Wiley. Fisher, R.A. (1936). "Use of multiple measurements in taxonomic problems". Ann. Engen. Lond. , 7, 179-188. Friedman, H. P. , and Rubin, J. (1967). "On some invariant criteria for grouping data". Journal of the American Statistical Association , 62, 1159-1178. Friedman, J.H. , Bentley, J. L. , and Finkel , R.A. (1975). "An algorithm for finding best matches in logarithmic time." SLAC-PUB-1549, Feb., 1975. Gnanadesikan , R. (1977). Statistical Data Analysis for Multivariate Observations . New York: John Wiley & Sons, 317-221. Gower , J.C. and Ross, G.J.S. (1969). "Minimum spanning trees and single linkage cluster analysis". Applied Statistics , 18, 54-64. Hartigan, J. A. (1975). Clustering Algorithms . New York: John Wiley & Sons. (1977a). "Distributional problems in clustering", in Classification and Clustering , ed . J. Van Ryzin, New York: Academic Press, (1977b). "Clusters as modes", in First International Symposium on Data Analysis and Informatics , Vol. 2, IRIA, Versailles, (1979). "Consistency of single linkage for high-density clusters". Unpublished manuscript. Department of Statistics, Yale University . , and Wong, M.A. (1979). "Hybrid Clustering". Proceedings of the 12th Interface Symposium on Computer Science and Statistics, ed . Jane Gentleman, U. of Waterloo, Press, pp. 137-143. Moore, D.S. and Yac kel , J.W. (1977). "Consistency properties of nearest neighbour density function estimators. Annals of Statistics, 5, 143-154. 15 Sneath, P.H.A. (1957). "The application of computers to taxonomy", Journal of General Microbiology . 17, 201-226. , and Sokal , R. R. (1973). Numerical Taxonomy . San Francisco: W.H. Freeman, Sorenson, T. (1948). "A method of estimating groups of equal amplitude in plant sociology based on similarity of species content". K. Danske Vidensk. Selsk. Skr. (Biol.), 5, 1-34. Spath, H. (1980). Cluster Analysis Algorithms . Halsted Press, New York: John Wiley. Wishart, D. (1969). "Mode Analysis" in Numerical Taxonomy , edited by A.J. Cole, New York: Academic Press. Wong, M.A. (1980). "A hybrid clustering method for identifying high-density clusters". Working Paper #2001-80, Sloan School of Management, Massachusetts Institute of Technology. 7257 "oSi* OC 5 '82 mm^r Lib-26-67 HD28.IV1414 no1213- 81 Wong, M. Antho/A kth nearest neighbour " iliiiiiiiiiililliiiiiiiliil 3 TOflD DD2 DOS T21