Skip Navigation

The Computer Journal 1978 21(1):57-62; doi:10.1093/comjnl/21.1.57
© 1978 by British Computer Society
This Article
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (12)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Williams, P. W.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Criteria for choosing subsets to obtain maximum relative entropy

P. W. Williams *

Department of Computation, University of Manchester Institute of Science and Technology, P.O. Box 88, Sackville Street, Manchester, UK

An important factor in the performance of information retrieval systems is the choice of classification scheme, which should ideally assign an equal number of documents to each retrieval key. The concept of relative entropy, taken from information theory, has been used in the literature as a criterion of equifrequency but a mathematical basis for the methods used to generate index terms has not been presented.

This paper considers the general problem of grouping a collection of objects of known frequencies in order to balance the frequencies of the resulting sets, and presents mathematical criteria for increasing the balance of a grouping by re-arranging the sets. This leads to a method for monotonically increasing the relative entropy of the collection of objects by a sequence of multiway splitting or coalescing steps.

The theory is applied to the threshold method used by Lynch to generate equifrequent sets. It is shown that, for typical distributions, some steps in the threshold method will decrease the balance and furthermore, for some distributions, the threshold method will give very poor results.


Received April 1976.

* Department of Computation, University of Manchester Institute of Science and Technology, P.O. Box 88, Sackville Street, Manchester M60 1QD


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.