© 1994 by British Computer Society
Semantic and Generative Models for Lossy Text Compression


1 Department of Computer Science, University of Waikato, Hamilton, New Zealand
2 Department of Computer Science, University of Canterbury, Christchurch, New Zealand
3 Department of Computer Science, University of Melbourne, Melbourne, Australia
4 Department of Computer Science, University of Calgary, Calgary, Canada
5 Department of Psychology, University of Stirling, Stirling, UK

The complementary paradigms of text compression and image compression suggest that there may be potential for applying methods developed for one domain to the other. In image coding, lossy techniques yield compression factors that are vastly superior to those of the best lossless schemes and we show that this is also the case for text. This paper investigates the resulting trade-off between subjective quality of the transmission and its compression factor. Two different methods are described, which can be combined into an extremely effective technique that provides far better compression than the present state of the art and yet preserves a reasonable degree of perceived match between the original and received text. The major challenge for lossy text compression is the quantitative evaluation of the quality of this match.
Received September 15, 1993; accepted April 1, 1994.
Correspondence to H. Thimbleby.