@salotz

Measuring the Difficulty of Distance-Based Indexing

. String Processing and Information Retrieval, page 103--114. Berlin, Heidelberg, Springer Berlin Heidelberg, (2005)

Abstract

Data structures for similarity search are commonly evaluated on data in vector spaces, but distance-based data structures are also applicable to non-vector spaces with no natural concept of dimensionality. The intrinsic dimensionality statistic of Chávez and Navarro provides a way to compare the performance of similarity indexing and search algorithms across different spaces, and predict the performance of index data structures on non-vector spaces by relating them to equivalent vector spaces. We characterise its asymptotic behaviour, and give experimental results to calibrate these comparisons.

Description

Measuring the Difficulty of Distance-Based Indexing | SpringerLink

Links and resources

Tags

community