Skip Navigation

The Computer Journal 2002 45(2):213-220; doi:10.1093/comjnl/45.2.213
© 2002 by British Computer Society
This Article
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Gibbons, A.
Right arrow Articles by Muthukrishnan, M.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Exact Analyses of a Simple Heuristic Employed in Array Compression

Alan Gibbons1, Ida Pu2 and Muthu Muthukrishnan3

1 Department of Computer Science, King's College London, UK Email: amg@dcs.kcl.ac.uk 2 Department of Mathematical and Computing Science, Goldsmiths College, London, UK 3 Bell Laboratories Innovations, Lucent Technologies, Murray Hill, NJ 07974, USA

We consider the classical static dictionary problem of storing a set $S$ of $x$ elements from Universe $[1,n]$ such that membership queries on elements of $S$ can be supported in worst case $O(1)$; time. This can be viewed as a table compression problem in which a table of size $n$ has $x$ ones and $(n-x)$ zeros; in our applications, $x\ll n$, that is, the table is sparse. As is well known, perfect hashing provides an optimum solution to this problem, the space requirement being $O(x)$ with constant time access to any element. Apart from perfect hashing, many applications employ simpler heuristics for compression and rely on setting inherent parameters. Although these may be known to behave usefully, often their precise behaviour is ill-understood. The object of this paper is to precisely analyse the behaviour of one such extremely simple heuristic which is known to give modest compression in practice. For the heuristic we prove that the expected asymptotic space requirement is, at worst, $a(k)n+b(k)x$ and that although its dependency on $n$ is inherent, it can be made arbitrarily small. Here $k$ is a parameter and $a(k)$ and $b(k)$ are, respectively, monotonically decreasing and increasing functions. Thus $k$ allows a trade-off between dependency on $n$ and $x$; for example, pairs $(a(k), b(k))$ can be $(0.1,3.26)$, $(0.03,5.57)$ and $(6\times10^{-4},33)$. We also show that for some applications the dependency of the space requirement on $n$ can be made sublinear. The heuristic allows constant time access to any element. Our analyses are over two different models for the uniform probability distribution and we derive exact formulae for the expected space used. We prove that the heuristic gives the same asymptotic performance in both models.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.