© 2002 by British Computer Society
| ||||||||||||||||||||||||||||||||||||||||||||||||||||
The Efficiency of Histogram-like Techniques for Database Query Optimization
1 School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, K1S 5B6, Canada Email: oommen@scs.carleton.ca
One of the most difficult tasks in modern day database management systems is information retrieval. Basically, this task involves a user query, written in a high-level language such as the Structured Query Language, and some internal operations, which are transparent to the user. The internal operations are carried out through very complex modules that decompose, optimize and execute the different operations. We consider the problem of Query Optimization which consists of the system choosing, among many different query evaluation plans (QEPs), the most economical one. Since the number of QEPs increases exponentially as the number of relations involving the query increases, query optimization is a very complex problem. Many estimation techniques have been developed in order to approximate the cost of a QEP. Histogram-based techniques are the most used methods in this context. In this paper, we discuss the efficiency of some of these methods: Equi-width, Equi-depth, the Rectangular Attribute Cardinality Map (R-ACM) and the Trapezoidal Attribute Cardinality Map (T-ACM). These methods are used to estimate the cost of the different QEP, whence they attempt to determine the optimal one. It has been shown that the errors of the estimates from R-ACM and T-ACM are significantly less than the corresponding errors obtained from Equi-width and Equi-depth. This fact has been formally demonstrated using reasonable statistical distributions for the cost of a QEP, the doubly exponential distribution and the normal distribution. For the empirical analysis, we have developed a formal, rigorous prototype model used to analyze these methods on random databases. Our empirical results demonstrate that R-ACM chooses a superior QEP more than two times as often as Equi-width and Equi-depth. Similar results have been obtained for T-ACM when compared to the traditional methods. Indeed, in the most general scenario, we analytically prove that under certain models the better the accuracy of an estimation technique, the greater the probability of choosing the most efficient QEP.
Received 26 June, 2001. Revised 25 January, 2002.