© 1989 by British Computer Society
Comparison of Hierarchic Agglomerative Clustering Methods for Document Retrieval
Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, UK
This paper considers the use of the single linkage, complete linkage, group average and Ward hierarchic agglomerative clustering methods for document retrieval. The methods are used to cluster seven document test collections for which queries and relevance judgements are available. Several retrieval strategies are described for searching the clustered document files produced by the four methods. These searches suggest that the group average method is the most suitable for document clustering purposes; however, searches of the unclustered document collections and of a simpler type of clustered file (based on pairs of nearest neighbours) usually result in better levels of retrieval effectiveness than searches of the clustered collections.
Received March 1987, revised June 1987.
* To whom all correspondence should be addressed.
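The four clustering methods named in the abstract can all be expressed through the standard Lance-Williams dissimilarity update, and the sketch below illustrates that on toy data. It is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names `lance_williams` and `agglomerate` are hypothetical, the input is a handful of numeric points rather than documents, and squared Euclidean distance is assumed as the dissimilarity for every method (it is the natural choice for Ward; for group average it means averaging squared distances).

```python
from itertools import combinations


def lance_williams(method, ni, nj, nk):
    """Lance-Williams coefficients (alpha_i, alpha_j, beta, gamma) for
    merging clusters i and j, as seen from another cluster k."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # group average
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    if method == "ward":
        n = ni + nj + nk
        return (ni + nk) / n, (nj + nk) / n, -nk / n, 0.0
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, method):
    """Naive agglomeration (hypothetical helper, toy scale only).

    Returns a list of merges (members_a, members_b, dissimilarity).
    Assumption: squared Euclidean distance between points.
    """
    def sqdist(p, q):
        return float(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = {i: (i,) for i in range(len(points))}
    dist = {
        frozenset((i, j)): sqdist(points[i], points[j])
        for i, j in combinations(clusters, 2)
    }
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        d_ij = dist[frozenset((i, j))]
        ni, nj = len(clusters[i]), len(clusters[j])
        # Update dissimilarities from every other cluster to the new one.
        for k in clusters:
            if k in (i, j):
                continue
            ai, aj, b, g = lance_williams(method, ni, nj, len(clusters[k]))
            d_ki = dist[frozenset((k, i))]
            d_kj = dist[frozenset((k, j))]
            dist[frozenset((k, next_id))] = (
                ai * d_ki + aj * d_kj + b * d_ij + g * abs(d_ki - d_kj)
            )
        merges.append((clusters[i], clusters[j], d_ij))
        clusters[next_id] = tuple(sorted(clusters.pop(i) + clusters.pop(j)))
        next_id += 1
    return merges


if __name__ == "__main__":
    toy = [(0,), (1,), (5,), (6,), (20,)]
    for name in ("single", "complete", "average", "ward"):
        print(name, agglomerate(toy, name))
```

The only thing that changes between the four methods here is the coefficient set, which makes it easy to see how they differ in the way a merged cluster's dissimilarity to the rest is defined. A real document collection would need a document similarity measure and a far more efficient algorithm than this cubic-time loop.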