The Computer Journal 1981 24(4):324-330; doi:10.1093/comjnl/24.4.324
© 1981 by British Computer Society

Text compression using a 4 bit coding scheme

J. Pike *

Holly Cottage, Chawston, Bedford, UK

The most frequently used words in natural or printed English are found, unexpectedly, to contain only an average proportion of the most frequently used letters. This independence of the word and letter frequency distributions is used to minimise the number of bits needed to code natural English text. It is shown that mean rates of less than 4 bits per character can be achieved for text using the full ASCII set of 96 characters, by combining a variable-bit-length representation of each character with a character-combination dictionary of 100 or more common words. A simple practical scheme is presented which uses 4, 8 or 12 bits to code the characters and dictionary words. Using this scheme with a 205-word dictionary, a mean code rate of 3.87 bits per character is achieved. It is indicated how even this rate might be improved with a larger dictionary, or by basing the dictionary on the more numerous word prefixes.
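The abstract gives only the code lengths (4, 8 or 12 bits) and the dictionary size, not the actual code table. The sketch below estimates a mean bit rate under loudly stated assumptions, so it should not be read as Pike's scheme. It assumes a hypothetical nibble layout: first nibbles 0 to 11 are complete 4-bit codes, 12 to 14 take one more nibble (8 bits), and 15 takes two more (12 bits). That gives 12 + 48 + 256 = 316 code slots, enough for 96 characters plus a 205-word dictionary. It also assumes greedy longest-match tokenisation and ranks symbols by their frequency in the sample itself, which is optimistic compared with a fixed table. The function names `tokenize`, `build_code_lengths` and `mean_bits_per_char` are illustrative only.

```python
from collections import Counter

# Hypothetical 4/8/12-bit layout (an assumption, not the paper's table):
#   first nibble 0..11            -> 4-bit code   (12 slots)
#   first nibble 12..14 + nibble  -> 8-bit code   (3 * 16 = 48 slots)
#   first nibble 15 + two nibbles -> 12-bit code  (16 * 16 = 256 slots)
SHORT, MEDIUM, LONG = 12, 48, 256


def tokenize(text, dictionary):
    """Greedy longest match: take a dictionary word if one starts here, else one character."""
    words = sorted(dictionary, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for w in words:
            if text.startswith(w, i):
                tokens.append(w)
                i += len(w)
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens


def build_code_lengths(symbol_counts):
    """Give the most frequent symbols the shortest codes (ties broken alphabetically)."""
    ranked = [s for s, _ in sorted(symbol_counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    if len(ranked) > SHORT + MEDIUM + LONG:
        raise ValueError("too many symbols for the 4/8/12-bit layout")
    lengths = {}
    for rank, sym in enumerate(ranked):
        if rank < SHORT:
            lengths[sym] = 4
        elif rank < SHORT + MEDIUM:
            lengths[sym] = 8
        else:
            lengths[sym] = 12
    return lengths


def mean_bits_per_char(text, dictionary):
    """Total coded bits divided by the number of source characters."""
    tokens = tokenize(text, dictionary)
    lengths = build_code_lengths(Counter(tokens))
    return sum(lengths[t] for t in tokens) / len(text)


sample = "the cat and the dog and the bird"
print(mean_bits_per_char(sample, ["the ", "and "]))
```

On this toy sentence there are only 12 distinct tokens, so every token gets a 4-bit code. The 17 tokens cost 68 bits for 32 characters, about 2.1 bits per character. Realistic text needs the longer codes and will score far worse than this.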


Received September 1980.

* Holly Cottage, Chawston, Bedford MK44 3BH, UK

