© 1998 by British Computer Society
An Efficient Hash-Based Algorithm for Sequence Data Searching
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong. Email: mhwong{at}cse.cuhk.edu.hk
In real life, data collected day by day often appear as sequences; such data are called sequence data. Searching for similar patterns in sequence data is important in many applications. We first point out some deficiencies in the existing definitions of sequence similarity. We then introduce a definition of sequence similarity based on the shape of sequences, and extend it to handle sequence matching with linear scaling in both the amplitude and time dimensions. We also propose a fast sequence searching algorithm based on extendible hashing. The algorithm can match linearly scaled sequences and guarantees that no qualifying data subsequence is falsely rejected. Several experiments on real data (stock price movements) and synthetic data measure the performance of the algorithm in different respects.
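The abstract does not spell out the shape-based similarity definition, so the sketch below is only an illustration of what "matching with linear scaling in both amplitude and time" can mean. It is not the paper's method. It assumes a simple stand-in criterion: a candidate subsequence is first linearly resampled to the query's length (uniform time scaling), and it then matches if a least-squares fit `candidate ≈ a * query + b` with `a > 0` leaves every residual within a tolerance `tol`. The function names `resample`, `fit_amplitude` and `matches_scaled` and the parameter `tol` are hypothetical. The sketch does not implement extendible hashing or the paper's no-false-dismissal guarantee.

```python
from typing import List, Sequence, Tuple


def resample(seq: Sequence[float], n: int) -> List[float]:
    """Map seq onto n evenly spaced points by linear interpolation.

    Illustrative assumption: this models a uniform (linear) stretch or
    shrink of the time axis; it is not taken from the paper.
    """
    if n < 2 or len(seq) < 2:
        raise ValueError("need at least two points on each side")
    m = len(seq)
    out: List[float] = []
    for i in range(n):
        pos = i * (m - 1) / (n - 1)  # fractional index into seq
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def fit_amplitude(query: Sequence[float], cand: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (a, b) minimising sum((a*q + b - c)**2) over paired points."""
    n = len(query)
    mq = sum(query) / n
    mc = sum(cand) / n
    sqq = sum((q - mq) ** 2 for q in query)
    if sqq == 0:
        raise ValueError("flat query: amplitude scale is undefined")
    sqc = sum((q - mq) * (c - mc) for q, c in zip(query, cand))
    a = sqc / sqq
    return a, mc - a * mq


def matches_scaled(query: Sequence[float], cand: Sequence[float], tol: float) -> bool:
    """Hypothetical criterion: after time resampling, cand lies within tol
    of a*query + b for some positive amplitude scale a and offset b."""
    c = resample(cand, len(query))
    a, b = fit_amplitude(query, c)
    worst = max(abs(a * q + b - v) for q, v in zip(query, c))
    return a > 0 and worst <= tol


if __name__ == "__main__":
    query = [1, 2, 3, 2, 1]
    stretched = [12, 13, 14, 15, 16, 15, 14, 13, 12]  # 2*shape + 10, twice as long
    print(matches_scaled(query, stretched, tol=1e-6))             # True
    print(matches_scaled(query, [12, 16, 12, 16, 12], tol=1e-6))  # False
```

Requiring `a > 0` keeps an upside-down copy of the query from counting as the same shape. A brute-force check like this would have to be run against every candidate subsequence; the point of a hash-based index such as the one the abstract describes is to avoid that exhaustive scan.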
Received October 20, 1997; revised September 30, 1998.