Nearest Neighbors in High-Dimensional Data: The Emergence and Influence of Hubs (2009)

Authors

Abstract

High dimensionality can pose severe difficulties, widely recognized as different aspects of the curse of dimensionality. In this paper we study a new aspect of the curse pertaining to the distribution of k-occurrences, i.e., the number of times a point appears among the $k$ nearest neighbors of other points in a data set. We show that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high $k$-occurrences). We examine the origin of this phenomenon, showing that it is an inherent property of high-dimensional vector space, and explore its influence on applications based on measuring distances in vector spaces, notably classification, clustering, and information retrieval.

Discussion

Enter your comment (wiki syntax is allowed):
UZADK
 
paper/2009/360.txt · Last modified: 2009/05/24 18:43 (external edit)
 
Driven by DokuWiki