© 2004 by British Computer Society
Piggyback Statistics Collection for Query Optimization: Towards a Self-Maintaining Database Management System
1 Department of Computer and Information Science, The University of Michigan, Dearborn, MI 48128, USA
2 Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI 48109, USA
3 IBM Toronto Laboratory, Markham, Ontario, L6G 1C7, Canada
A database management system (DBMS) performs query optimization based on statistical information about data in the underlying database. Out-of-date statistics may lead to inefficient query processing in the system. The existing utility method, which collects statistics in batch mode, suffers from drawbacks such as heavy administrative burden, high system load and tardy updates. In this paper, we study approaches to performing statistical analysis on the fly during query execution, taking advantage of data already resident in main memory. We propose a framework for on-the-fly statistics collection, which we term piggybacking, and analyze the tradeoffs of piggybacking various statistics collection techniques on top of query execution plans. We present a multiple-granularity interleaving algorithm to integrate a set of piggyback operations with an execution plan, and show how the algorithm can be incorporated into an existing query optimizer. Our experiments demonstrate that useful statistics can be obtained via the piggyback method with a small overhead.
Received 8 January 2002. Revised 28 August 2003.
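As a rough illustration of the piggybacking idea summarized in the abstract, the sketch below wraps a table scan so that simple column statistics are gathered from rows the query is already reading. It is only a minimal sketch under stated assumptions: the names `ColumnStats` and `piggyback_scan`, the dictionary row format, and the choice of statistics (row count, minimum, maximum, exact distinct values) are all hypothetical and are not taken from the paper, and the interleaving algorithm itself is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional, Set


@dataclass
class ColumnStats:
    """Statistics gathered as a side effect of a scan (hypothetical structure)."""
    row_count: int = 0
    minimum: Optional[Any] = None
    maximum: Optional[Any] = None
    distinct: Set[Any] = field(default_factory=set)

    def observe(self, value: Any) -> None:
        # Update every statistic from one value that is already in memory.
        self.row_count += 1
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        self.distinct.add(value)


def piggyback_scan(
    rows: Iterable[Dict[str, Any]], column: str, stats: ColumnStats
) -> Iterator[Dict[str, Any]]:
    """Pass rows through unchanged while recording statistics on one column."""
    for row in rows:
        stats.observe(row[column])
        yield row


# Hypothetical table and query: select ids of rows with price > 20.
table = [
    {"id": 1, "price": 30},
    {"id": 2, "price": 10},
    {"id": 3, "price": 30},
]
stats = ColumnStats()
result = [r["id"] for r in piggyback_scan(table, "price", stats) if r["price"] > 20]
```

The query result is unaffected by the wrapper; the statistics are a by-product of the same pass over the data. Keeping an exact set of distinct values is an assumption made for brevity and would be costly on large tables, where a sampling or sketch-based estimate would be a more realistic substitute.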