

The Computer Journal Advance Access originally published online on June 21, 2006
The Computer Journal 2006 49(6):670-684; doi:10.1093/comjnl/bxl035

© The Author 2006. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Persistent Semi-Dynamic Ordered Partition Index

Alexander Thomasian* and Lijuan Zhang†

Department of Computer Science, New Jersey Institute of Technology (NJIT), Newark, NJ 07102, USA

*Corresponding author: alexthomasian@gmail.com

Similarity search is a popular paradigm in advanced database applications. In content-based image retrieval (CBIR), for example, images are transformed into feature vectors, which are then used for similarity search via k-nearest-neighbor (k-NN) queries in the feature vector space. Clustering by building a disk-resident index is one method to speed up the processing of k-NN queries. In the case of high-dimensional feature vectors, the dimensionality curse results in a high degree of overlap among the minimum bounding rectangles of the index, so that most pages of the index are accessed. This is especially detrimental to performance, since the disk positioning time for random disk accesses is high and is improving at a rate of only 8% annually. We propose an alternative solution to indexing high-dimensional data, which takes advantage of increasing main memory sizes and the 40% annual improvement in disk transfer rates. More specifically, we make the Ordered-Partition tree (OP-tree), which is a main-memory-resident index, persistent by writing it onto disk. We investigate the optimization of OP-tree parameters and compare its performance with the sequential scan method with and without the Karhunen–Loève transformation. We use serialization to compact the dynamically allocated nodes of the OP-tree in main memory, which form a linked list, into a contiguous area. The index can then be saved on disk as a single file and loaded into main memory by a single transfer. The original OP-tree is static, so we propose several methods to support the dynamic insertion of new points. We compare these methods from the viewpoints of time and space efficiency. We also study the effect of incrementally building the index with and without applying the Karhunen–Loève transformation. We compare the processing time of k-NN queries on persistent OP-trees and SR-trees to demonstrate the viability of the proposed method. We use one synthetic and three real-world datasets in our experiments.
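The serialization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` layout (split keys, child links, leaf point ids), the byte format, and the function names `serialize` and `deserialize` are all assumptions made for illustration. The sketch only shows the general idea of replacing in-memory pointers with indices so that pointer-linked nodes can be packed into one contiguous buffer, written as a single file, and read back in one transfer.

```python
import struct
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical index node (an assumption, not the paper's layout)."""
    keys: List[float] = field(default_factory=list)       # split values
    children: List["Node"] = field(default_factory=list)  # in-memory links
    point_ids: List[int] = field(default_factory=list)    # used by leaves


def serialize(root: Node) -> bytes:
    """Pack all nodes into one buffer; child links become node indices."""
    order, index, stack = [], {}, [root]
    while stack:  # preorder walk assigns each node a position
        node = stack.pop()
        index[id(node)] = len(order)
        order.append(node)
        stack.extend(reversed(node.children))
    out = bytearray(struct.pack("<I", len(order)))
    for node in order:
        child_idx = [index[id(c)] for c in node.children]
        out += struct.pack("<III", len(node.keys), len(child_idx),
                           len(node.point_ids))
        out += struct.pack(f"<{len(node.keys)}d", *node.keys)
        out += struct.pack(f"<{len(child_idx)}I", *child_idx)
        out += struct.pack(f"<{len(node.point_ids)}I", *node.point_ids)
    return bytes(out)


def deserialize(buf: bytes) -> Node:
    """Rebuild the linked nodes from the contiguous buffer."""
    (count,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    nodes = [Node() for _ in range(count)]
    links = []
    for node in nodes:
        nk, nc, npts = struct.unpack_from("<III", buf, pos)
        pos += 12
        node.keys = list(struct.unpack_from(f"<{nk}d", buf, pos))
        pos += 8 * nk
        links.append(struct.unpack_from(f"<{nc}I", buf, pos))
        pos += 4 * nc
        node.point_ids = list(struct.unpack_from(f"<{npts}I", buf, pos))
        pos += 4 * npts
    for node, child_idx in zip(nodes, links):  # restore links from indices
        node.children = [nodes[j] for j in child_idx]
    return nodes[0]
```

Under these assumptions, the bytes returned by `serialize` could be written to disk with a single `write` call and restored with a single `read` followed by `deserialize`, which mirrors the single-file, single-transfer property the abstract describes.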

Key Words: Multimedia databases • feature vectors • similarity search • nearest neighbor queries • high dimensional indexing • persistent index


