© 1998 by British Computer Society
How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis
Department of Statistics, University of Washington, USA. Email: fraley{at}stat.washington.edu
We consider the problem of determining the structure of clustered data without prior knowledge of the number of clusters or any other information about their composition. Data are represented by a mixture model in which each component corresponds to a different cluster. Models with varying geometric properties are obtained through Gaussian components with different parametrizations and cross-cluster constraints. Noise and outliers can be modelled by adding a Poisson process component. Partitions are determined by the expectation-maximization (EM) algorithm for maximum likelihood, with initial values from agglomerative hierarchical clustering. Models are compared using an approximation to the Bayes factor based on the Bayesian information criterion (BIC). Unlike significance tests, this allows more than two models to be compared at the same time and removes the restriction that the compared models be nested. The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model. Moreover, the EM result provides a measure of uncertainty about the classification of each data point. Examples show that this approach can perform much better than standard procedures, which often fail to identify groups that are either overlapping or of varying sizes and shapes.
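The pipeline described above can be sketched in miniature for one-dimensional data: fit Gaussian mixtures by EM for several numbers of components and two variance constraints, score each fit with BIC, keep the highest-scoring model, and report each point's classification uncertainty as one minus its largest conditional membership probability. This is a minimal illustration under stated assumptions, not the authors' software. It uses only two univariate models (one variance shared by all components vs. a separate variance per component) in place of the paper's multivariate covariance parametrizations. It omits the Poisson noise component and replaces the hierarchical-clustering initialization with a simple split of the sorted data. BIC is written here as 2·loglik − (number of free parameters)·log(n), so larger values are better. All function names are hypothetical.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a univariate normal N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def e_step(data, weights, means, variances):
    """Return the mixture log-likelihood and membership probabilities z[i][j]."""
    loglik, z = 0.0, []
    for x in data:
        logs = [math.log(w) + log_normal(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick to avoid underflow
        lse = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += lse
        z.append([math.exp(l - lse) for l in logs])
    return loglik, z


def fit_mixture(data, k, equal_var, iters=200, tol=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM (hypothetical helper).

    equal_var=True: all components share one variance.
    equal_var=False: each component has its own variance.
    """
    n = len(data)
    xs = sorted(data)
    # Simplified initialization (assumption): split the sorted data into k
    # equal-size blocks. The paper starts EM from hierarchical clustering instead.
    blocks = [xs[j * n // k:(j + 1) * n // k] for j in range(k)]
    means = [sum(b) / len(b) for b in blocks]
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        loglik, z = e_step(data, weights, means, variances)
        if loglik - prev < tol:
            break  # converged: loglik and z match the current parameters
        prev = loglik
        # M-step: update mixing proportions, means and variances.
        nk = [sum(row[j] for row in z) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(row[j] * x for row, x in zip(z, data)) / nk[j] for j in range(k)]
        ss = [sum(row[j] * (x - means[j]) ** 2 for row, x in zip(z, data))
              for j in range(k)]
        if equal_var:
            variances = [max(sum(ss) / n, 1e-6)] * k
        else:
            variances = [max(ss[j] / nk[j], 1e-6) for j in range(k)]
    else:
        # Iteration limit reached: recompute loglik and z for the final parameters.
        loglik, z = e_step(data, weights, means, variances)
    return loglik, (weights, means, variances), z


def bic(loglik, k, equal_var, n):
    """2*loglik - (free parameters)*log(n); larger is better."""
    n_params = (k - 1) + k + (1 if equal_var else k)
    return 2.0 * loglik - n_params * math.log(n)


def choose_model(data, max_k=4):
    """Pick the number of clusters and variance model with the highest BIC."""
    best = None
    for k in range(1, max_k + 1):
        for equal_var in (True, False):
            loglik, params, z = fit_mixture(data, k, equal_var)
            score = bic(loglik, k, equal_var, len(data))
            if best is None or score > best["bic"]:
                best = {"bic": score, "k": k, "equal_var": equal_var,
                        "params": params, "z": z}
    # Classification uncertainty of each point: 1 - max_j z[i][j].
    best["uncertainty"] = [1.0 - max(row) for row in best["z"]]
    return best


if __name__ == "__main__":
    random.seed(0)
    data = ([random.gauss(0, 1) for _ in range(100)]
            + [random.gauss(8, 1) for _ in range(100)])
    result = choose_model(data)
    print(result["k"], result["equal_var"], round(result["bic"], 1))
```

On two well-separated groups like those in the usage example, one would expect two components to be selected: extra components raise the log-likelihood only slightly, and the log(n) penalty per added parameter outweighs that gain. Points far from the group boundary should have uncertainty near zero.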







