
The Computer Journal 1998 41(8):578-588; doi:10.1093/comjnl/41.8.578
© 1998 by British Computer Society

How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis

C. Fraley and A. E. Raftery

Department of Statistics, University of Washington, USA. Email: fraley@stat.washington.edu

We consider the problem of determining the structure of clustered data, without prior knowledge of the number of clusters or any other information about their composition. Data are represented by a mixture model in which each component corresponds to a different cluster. Models with varying geometric properties are obtained through Gaussian components with different parametrizations and cross-cluster constraints. Noise and outliers can be modelled by adding a Poisson process component. Partitions are determined by the expectation-maximization (EM) algorithm for maximum likelihood, with initial values from agglomerative hierarchical clustering. Models are compared using an approximation to the Bayes factor based on the Bayesian information criterion (BIC); unlike significance tests, this allows comparison of more than two models at the same time, and removes the restriction that the models compared be nested. The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model. Moreover, the EM result provides a measure of uncertainty about the associated classification of each data point. Examples are given, showing that this approach can give performance that is much better than standard procedures, which often fail to identify groups that are either overlapping or of varying sizes and shapes.
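The model-selection loop described in the abstract can be illustrated with a minimal sketch. It is a deliberate simplification and not the authors' implementation: it is univariate, it has no noise component, and it starts EM from equal-sized chunks of the sorted data rather than from agglomerative hierarchical clustering. The function names, the synthetic data, and the BIC convention (2 log L minus the number of free parameters times log n, so that larger is better) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return per-point membership probabilities and the log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = max(sum(dens), 1e-300)  # guard against numerical underflow
        resp.append([d / total for d in dens])
        loglik += math.log(total)
    return resp, loglik


def fit_mixture(data, k, iters=200, var_floor=1e-6):
    """Fit a k-component mixture by EM; each component has its own variance."""
    n = len(data)
    ordered = sorted(data)
    # Assumed stand-in for the hierarchical-clustering start: k equal chunks.
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]
    for _ in range(iters):
        resp, _loglik = e_step(data, weights, means, variances)
        for j in range(k):  # M-step: weighted updates for component j
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor)
    resp, loglik = e_step(data, weights, means, variances)
    n_params = 3 * k - 1  # (k - 1) weights + k means + k variances
    bic = 2.0 * loglik - n_params * math.log(n)
    return {"k": k, "means": means, "bic": bic, "resp": resp}


def synthetic_data():
    """Two illustrative groups built from standard normal quantiles."""
    base = [NormalDist(0.0, 1.0).inv_cdf((i + 0.5) / 100) for i in range(100)]
    return base + [q + 8.0 for q in base]


data = synthetic_data()
fits = [fit_mixture(data, k) for k in (1, 2, 3)]
best = max(fits, key=lambda f: f["bic"])  # larger BIC is better here

# Classification uncertainty: one minus the largest membership probability.
uncertainty = [1.0 - max(r) for r in best["resp"]]
print(best["k"], [round(m, 2) for m in best["means"]])
```

Because every candidate model is scored on the same scale, the number of components is chosen by a single comparison over all fits rather than by a sequence of pairwise tests. In the multivariate setting the abstract describes, the candidate set would also range over covariance parametrizations, so the same comparison would select the clustering method as well.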


This article has been cited by other articles:


N. G. Neblett
Patterns of Single Mothers' Work and Welfare Use: What Matters for Children's Well-Being?
Journal of Family Issues, August 1, 2007; 28(8): 1083 - 1112.

J. Lopez-Sintas and E. Garcia-Alvarez
Patterns of Audio-Visual Consumption: The Reflection of Objective Divisions in Class Structure
Eur. Sociol. Rev., September 1, 2006; 22(4): 397 - 411.

C. Hu, Y. Han, S. Jang, and B. Bai
E-Relational Characteristics on Hospitality and Tourism Program Web Sites
Journal of Hospitality & Tourism Research, November 1, 2005; 29(4): 508 - 522.

G. Celeux, O. Martin, and C. Lavergne
Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments
Statistical Modelling, October 1, 2005; 5(3): 243 - 267.

G. J. McLachlan and S. U. Chang
Mixture modelling for cluster analysis
Statistical Methods in Medical Research, October 1, 2004; 13(5): 347 - 361.

T. Lange, V. Roth, M. L. Braun, and J. M. Buhmann
Stability-Based Validation of Clustering Solutions
Neural Comput., June 1, 2004; 16(6): 1299 - 1323.

B. L. Jones, D. S. Nagin, and K. Roeder
A SAS Procedure Based on Mixture Models for Estimating Developmental Trajectories
Sociological Methods & Research, February 1, 2001; 29(3): 374 - 393.


