
The Computer Journal 1994 37(2):83-87; doi:10.1093/comjnl/37.2.83
© 1994 by British Computer Society

Semantic and Generative Models for Lossy Text Compression

I. H. Witten1, T. C. Bell2, A. Moffat3, C. G. Nevill-Manning1, T. C. Smith4 and H. Thimbleby5

1 Department of Computer Science, University of Waikato, Hamilton, New Zealand
2 Department of Computer Science, University of Canterbury, Christchurch, New Zealand
3 Department of Computer Science, University of Melbourne, Melbourne, Australia
4 Department of Computer Science, University of Calgary, Calgary, Canada
5 Department of Psychology, University of Stirling, Stirling, UK

The complementary paradigms of text compression and image compression suggest that there may be potential for applying methods developed for one domain to the other. In image coding, lossy techniques yield compression factors that are vastly superior to those of the best lossless schemes, and we show that this is also the case for text. This paper investigates the resulting trade-off between the subjective quality of the transmission and its compression factor. Two different methods are described, which can be combined into an extremely effective technique that provides far better compression than the present state of the art and yet preserves a reasonable degree of perceived match between the original and received text. The major challenge for lossy text compression is the quantitative evaluation of the quality of this match.


Received September 15, 1993; accepted April 1, 1994.


Correspondence to H. Thimbleby.

