Skip Navigation

The Computer Journal 2005 48(2):220-238; doi:10.1093/comjnl/bxh060
This Article
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (1)
Right arrowRequest Permissions
Google Scholar
Right arrow Articles by Corral, A.
Right arrow Articles by Vassilakopoulos, M.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© The Author 2005. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

On Approximate Algorithms for Distance-Based Queries using R-trees

Antonio Corral1 * and Michael Vassilakopoulos2 §

1 Department of Languages and Computation, University of Almeria, 04120 Almeria, Spain, 2 Department of Informatics, Technological Educational Institute of Thessaloniki, P.O. Box 14561, GR-541 01, Thessaloniki, Greece

In modern database applications the similarity or dissimilarity of complex objects is examined by performing distance-based queries (DBQs) on data of high dimensionality. The R-tree and its variations are commonly cited multidimensional access methods that can beused for answering such queries. Although the related algorithms work well for low-dimensional data spaces, their performance degrades as the number of dimensions increases (dimensionality curse). In order to obtain acceptable response time in high-dimensional data spaces, algorithms that obtain approximate solutions can be used. Approximation techniques, like N-consider (based on the tree structure), {alpha}-allowance and {varepsilon}-approximate (based on distance), or Time-consider (based on time) can be applied in branch-and-bound algorithms for DBQs inorder to control the trade-off between cost and accuracy of the result. In this paper, we improve previous approximate DBQ algorithms by applying a combination of the approximation techniques in the same query algorithm (hybrid approximation scheme). We investigate the performance of these improvements for one of the most representative DBQs (the K-closest pairs query, K-CPQ) in high-dimensional data spaces, as well as the influence of the algorithmic parameters on the control of the trade-off between the response time and the accuracy of the result. The outcome of the experimental evaluation, using synthetic and real datasets, is the derivation of the outperforming DBQ approximate algorithm for large high-dimensional point datasets.


Received 31 October 2003. revised 17 May 2004.

* Email: acorral{at}ual.es

§ Email: vasilako{at}it.teithe.gr


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?




Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.