© 1995 by British Computer Society
Practical length-limited coding for large alphabets*


Department of Computer Science, The University of Melbourne, Parkville 3052, Australia Email: aht{at}cs.mu.oz.au
The use of minimum-cost coding for economical representation of a stream of symbols drawn from a defined source alphabet is widely known. However, for large-scale compression, minimum-cost coding has the drawback that the codewords generated may be longer than a machine word, limiting the usefulness of both software and hardware implementations on word-based architectures. The solution is to generate length-limited codes, and accept the consequent loss of compression effectiveness in order to preserve the simplicity and speed of the encoding and decoding software. Here we re-examine the package-merge algorithm for generating minimum-cost length-limited prefix-free codes and show that, with a considered reorganization of the key steps, it is possible for it to run quickly in significantly less memory than was required by previous implementations, while retaining asymptotic efficiency. As evidence of the practical usefulness of the improved method we describe experiments on an alphabet of over 1 million symbols, for which length-limited codes can be constructed in 11 Mb of memory and about 20 seconds of CPU time.
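For readers unfamiliar with package-merge, the following is a minimal sketch of the straightforward textbook formulation. It is not the memory-efficient reorganization this paper describes. The function name `length_limited_code_lengths` is hypothetical, and the sketch assumes non-negative integer symbol weights. Every item carries the full list of symbols it contains, so memory use grows with both the alphabet size and the length limit, which is the kind of cost the paper sets out to reduce.

```python
def length_limited_code_lengths(freqs, max_len):
    """Return codeword lengths of a minimum-cost prefix-free code in which
    no codeword is longer than max_len bits (simple package-merge sketch)."""
    n = len(freqs)
    if n == 0:
        return []
    if n == 1:
        return [1]  # convention assumed here: a lone symbol gets one bit
    if (1 << max_len) < n:
        raise ValueError("max_len is too small for this many symbols")

    # Each item is (weight, symbols contained); leaves are sorted by weight.
    leaves = sorted((w, (i,)) for i, w in enumerate(freqs))
    items = leaves  # list for the deepest level, max_len

    # Work upward from level max_len to level 1.
    for _ in range(max_len - 1):
        # Package: combine adjacent pairs, dropping an unpaired last item.
        packages = [
            (items[j][0] + items[j + 1][0], items[j][1] + items[j + 1][1])
            for j in range(0, len(items) - 1, 2)
        ]
        # Merge: a fresh copy of the leaves joins the packages. The sort is
        # stable, so leaves precede packages of equal weight.
        items = sorted(leaves + packages, key=lambda item: item[0])

    # The 2n - 2 cheapest items at level 1 define the code: a symbol's
    # codeword length is the number of times it occurs among them.
    lengths = [0] * n
    for _, symbols in items[: 2 * n - 2]:
        for s in symbols:
            lengths[s] += 1
    return lengths


print(length_limited_code_lengths([1, 1, 2, 4], 3))  # [3, 3, 2, 1]
print(length_limited_code_lengths([1, 1, 2, 4], 2))  # [2, 2, 2, 2]
```

With a limit of 3 bits the result matches the unrestricted minimum-cost code for these weights. Tightening the limit to 2 bits forces all four codewords to the same length, which is the loss of compression effectiveness accepted in exchange for short codewords.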
Received March 24, 1995; revised August 3, 1995.
* This paper includes materials presented in preliminary form at the 1995 Australasian Computer Science Conference.