© 1989 by British Computer Society
| ||||||||||||||||||||||||||||||||||||||||||||||||||||
A Fixed-Size Bloom Filter for Searching Textual Documents


1 Computing Science Division, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5, 2 Technical University of Nova Scotia, Halifax, Nova Scotia, Canada B3J 2X4
The empirical false drop rate associated with a fixed-size Bloom filter used to represent textual documents may be quite different than the theoretical rate. This problem arises when the filter size is based on the expectation of a uniform distribution of the number of different terms per document. The distribution is, in fact, not uniform. This paper describes a method to determine the filter size for a database of textual documents, based on the desired false drop rate and the actual distribution of different words over the documents for that database. Theoretical and experimental results are reported and indicate that a filter size based on this method produces empirical false drop rates equivalent to the theoretical rates. The filter was also compared to variable-length filters with respect to storage requirements and search and search times.
Received June 1988. revised October 1988.
* Computing Science Division, Dalhousie University, Halifax, Nova Scotia, Canada B3H 3J5
Technical University of Nova Scotia, Halifax, Nova Scotia, Canada B3J 2X4