The Computer Journal Advance Access originally published online on January 29, 2008
The Computer Journal 2008 51(6):662-676; doi:10.1093/comjnl/bxm105
The Collective Index: A Technique for Efficient Processing of Progressive Queries
1 Department of Computer and Information Science The University of Michigan, Dearborn, MI 48128, USA
2 Research and Advanced Engineering Ford Motor Company, Dearborn, MI 48121, USA
* Corresponding author: qzhu{at}umich.edu
Received 12 April 2007; revised 12 November 2007
The emergence of modern data-intensive applications requires sophisticated database techniques for processing advanced types of user queries on massive data. In this paper, we study one such new type of query, called the progressive query. A progressive query is defined as a set of inter-related and incrementally formulated step-queries. A step-query in a progressive query PQ is specified on the fly based on the results of previously executed step-queries in PQ. Hence, a progressive query cannot be fully formulated before its execution begins, which raises challenges for its processing and optimization. We introduce a query model to characterize different types of progressive queries. We then present a new index structure, called the collective index, to efficiently process progressive queries. The collective index technique incrementally evaluates step-queries via dynamically maintained member indexes. Utilizing the special structure of a collective index, the (member) indexes on the input relation(s) of a step-query are efficiently transformed into indexes on the result relation. Algorithms based on the collective index are presented for efficiently processing single-input (unary) linear and multiple-input (join) linear progressive queries. Our experimental results show that the proposed collective index technique outperforms conventional query processing methods in processing progressive queries.
Key Words: progressive query; query processing and optimization; index structure; index maintenance; algorithm; performance
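The idea of carrying indexes forward from one step-query to the next can be illustrated with a minimal sketch. It is not the collective index structure of the paper, whose details the abstract does not give. It assumes a single-input linear progressive query made of equality selections, and it models each member index as a plain dictionary from attribute value to a set of row ids. The names `build_index`, `restrict_index`, and `run_step`, and the sample relation, are hypothetical.

```python
from collections import defaultdict


def build_index(rows, attr):
    """Map each value of `attr` to the set of row ids that hold it."""
    index = defaultdict(set)
    for rid, row in rows.items():
        index[row[attr]].add(rid)
    return dict(index)


def restrict_index(index, surviving_ids):
    """Derive an index on the result relation from an index on the input
    relation by keeping only the row ids that survived the step-query."""
    result = {}
    for value, rids in index.items():
        kept = rids & surviving_ids
        if kept:
            result[value] = kept
    return result


def run_step(indexes, attr, value):
    """Evaluate an equality step-query through the index on `attr`, then
    carry every member index forward to the result relation."""
    surviving = indexes[attr].get(value, set())
    carried = {a: restrict_index(idx, surviving) for a, idx in indexes.items()}
    return carried, surviving


# Hypothetical base relation: row id -> tuple.
rows = {
    1: {"make": "Ford", "color": "red", "year": 2007},
    2: {"make": "Ford", "color": "blue", "year": 2007},
    3: {"make": "Honda", "color": "red", "year": 2006},
    4: {"make": "Ford", "color": "red", "year": 2006},
}

# Member indexes on the base relation (assumed to exist up front).
indexes = {a: build_index(rows, a) for a in ("make", "color", "year")}

# Step-query 1 is posed first; step-query 2 is only formulated after
# seeing the result of step 1, and runs against the carried indexes.
indexes, step1_ids = run_step(indexes, "make", "Ford")
indexes, step2_ids = run_step(indexes, "color", "red")
```

In this sketch the second step-query never rescans the base relation or rebuilds an index from scratch: it reads the already restricted `color` index, and the `year` index remains available for any later step-query.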