© 1996 by British Computer Society
Minimizing Data Transfers in Distributed Query Processing: A Comparative Study and Evaluation
School of Computer Science, University of Windsor, 401 Sunset, Windsor, Ontario, Canada N9B 3P4. Email: joan{at}cs.uwindsor.ca
A new heuristic, called Algorithm W, is presented as a method for efficiently generating cost-effective semijoin schedules for distributed query processing. It uses the concepts of profit, marginal profit and gain to determine a sequence of semijoins to minimize the total volume of data transferred over the network. Reducers, which are small in size and highly selective are constructed and applied in a cost-effective way. We show that in most cases the heuristic has complexity O(nm). However, static strategies, such as Algorithm W rely on accurate estimates to function properly. If the estimates are incorrect then the strategy may be less than optimal. Dynamic strategies, where the schedule of operations is monitored and modified during execution, are proposed as a solution to this problem. We propose two new dynamic heuristics: Algorithm D1 has minimal overheads and does not require schedule modification; Algorithm D2 is a more sophisticated dynamic strategy that has a look ahead phase to determine the best starting point for the optimization and it permits schedule modifications. We evaluate the performance of the heuristics and test the hypothesis that a dynamic strategy allows for better estimates and improved schedules.
Received November 27, 1995. revised November 22, 1996.
* School of Computer Science, University of Windsor, 401 Sunset, Windsor, Ontario, Canada N9B 3P4 Email: joan{at}cs.uwindsor.ca