© 1999 by British Computer Society
| ||||||||||||||||||||||||||||||||||||||||||||||||||||
Compressing Integers for Fast File Access
A1 Department of Computer Science, RMIT University, GPO Box 2476V, Melbourne 3001, Australia Email: {hugh,jz}@cs.rmit.edu.au
Fast access to files of integers is crucial for the efficient resolution of queries to databases. Integers are the basis of indexes used to resolve queries, for example, in large internet search systems, and numeric data forms a large part of most databases. Disk access costs can be reduced by compression, if the cost of retrieving a compressed representation from disk and the CPU cost of decoding such a representation is less than that of retrieving uncompressed data. In this paper we show experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access. We compare different approaches to compressing integers, including the Elias gamma and delta codes, Golomb coding, and a variable-byte integer scheme. As a conclusion, we recommend that, for fast access to integers, files be stored compressed.
Received 23 November, 1998. Revised 15 April, 1999.