© 2002 by British Computer Society
How Weak Categorizers Based Upon Different Principles Strengthen Performance
1 Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, MK7 7AA, UK. Email: v.s.uren@open.ac.uk
2 Division of Computer Science and Mathematics, University of Portsmouth, Portsmouth PO1 2EG, UK
3 Author to whom correspondence should be addressed
Combining the results of classifiers has shown much promise in machine learning generally. However, published work on combining text categorizers suggests that, for this particular application, improvements in performance are hard to attain. Exploratory research using a simple voting system is presented and discussed in the light of a probabilistic model originally developed for safety-critical software. Typical categorization approaches were found to produce predictions that are too similar for combination to be effective, because they tend to fail on the same records. Further experiments using two less orthodox categorizers are also presented; these suggest that combining text categorizers can succeed, provided the essential element of difference is considered.
Received 6 July, 2001. Revised 26 February, 2002.
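The abstract's central point, that voting helps only when categorizers fail on different records, can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the voting system or probabilistic model used in the paper: the function names, the binary labels, and the ten-record data set are all hypothetical and chosen only to make the effect visible.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-categorizer label lists by taking the most common label per record."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def error_rate(predicted, truth):
    """Fraction of records on which the prediction differs from the true label."""
    return sum(p != t for p, t in zip(predicted, truth)) / len(truth)


def coincident_failure_rate(pred_a, pred_b, truth):
    """Fraction of records on which both categorizers are wrong at once."""
    return sum(a != t and b != t for a, b, t in zip(pred_a, pred_b, truth)) / len(truth)


def with_errors(truth, wrong_indices):
    """Hypothetical helper: copy the truth and flip the binary label at the given indices."""
    return [1 - t if i in wrong_indices else t for i, t in enumerate(truth)]


# Invented toy data: ten records with binary category labels.
truth = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

# Three "similar" categorizers that all fail on the same three records.
similar = [with_errors(truth, {0, 1, 2}) for _ in range(3)]

# Three "different" categorizers with the same individual error rate,
# but failing on disjoint records.
diverse = [
    with_errors(truth, {0, 1, 2}),
    with_errors(truth, {3, 4, 5}),
    with_errors(truth, {6, 7, 8}),
]

similar_vote_error = error_rate(majority_vote(similar), truth)
diverse_vote_error = error_rate(majority_vote(diverse), truth)
similar_overlap = coincident_failure_rate(similar[0], similar[1], truth)
diverse_overlap = coincident_failure_rate(diverse[0], diverse[1], truth)
```

In this toy, every categorizer is individually wrong on three of the ten records. Voting over the similar set leaves the error unchanged, because the failures coincide, while voting over the diverse set removes every error, because no record is misclassified by more than one voter. Real categorizers fall between these two extremes.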