Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

Times cited: 0
Authors
Radovanovic, Milos [1]
Nanopoulos, Alexandros [2]
Ivanovic, Mirjana [1]
Affiliations
[1] Univ Novi Sad, Dept Math & Informat, Novi Sad 21000, Serbia
[2] Univ Hildesheim, Inst Comp Sci, D-31141 Hildesheim, Germany
Keywords
nearest neighbors; curse of dimensionality; classification; semi-supervised learning; clustering; POINT-PROCESSES; REDUCTION; ALGORITHMS;
DOI
Not available
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent "popular" nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families.
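The abstract defines the k-occurrence N_k(x) of a point x as the number of times x appears among the k nearest neighbors of the other points, and describes hubness as a growing skew of the N_k distribution. The sketch below illustrates how these quantities can be measured. It is not the authors' experimental code and rests on assumptions of our own: Euclidean distance, brute-force neighbor search in pure Python, i.i.d. uniform points in the unit hypercube, and skew measured as the standardized third moment of N_k. The function names and parameter values (n, k, dims, seed) are hypothetical choices for the demo.

```python
import random


def k_occurrences(points, k):
    """Return N_k(x) for every point x: how many times x appears among the
    k nearest neighbors (Euclidean distance) of the other points."""
    n = len(points)
    counts = [0] * n
    for i, p in enumerate(points):
        # Squared Euclidean distance from p to every other point;
        # squaring preserves the neighbor order, so no sqrt is needed.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        ]
        dists.sort()  # ties are broken by index j
        for _, j in dists[:k]:
            counts[j] += 1  # p "votes" for each of its k nearest neighbors
    return counts


def skewness(values):
    """Standardized third moment E[(N_k - mu)^3] / sigma^3 (population form)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # a constant distribution has no skew
    third = sum((v - mu) ** 3 for v in values) / n
    return third / var ** 1.5


def hubness_demo(n=300, dims=(3, 20, 100), k=5, seed=0):
    """Assumed setup: n i.i.d. uniform points in [0, 1]^d for each d in dims.
    Returns {d: (skewness of N_k, largest N_k)}."""
    rng = random.Random(seed)
    results = {}
    for d in dims:
        pts = [[rng.random() for _ in range(d)] for _ in range(n)]
        nk = k_occurrences(pts, k)
        results[d] = (skewness(nk), max(nk))
    return results


if __name__ == "__main__":
    for d, (skew, top) in hubness_demo().items():
        print(f"d={d:4d}  skew(N_5)={skew:6.2f}  max N_5={top}")
```

Running it prints the skewness and the largest N_5 for each dimensionality. If the effect described in the abstract holds for this setup, both figures should tend to grow with d; exact values depend on n, k and the seed. Because every point contributes exactly k neighbors, the mean of N_k is always k, so a rising maximum signals a few points absorbing a disproportionate share of the votes. Brute-force search costs O(n^2 d), so larger samples would call for a proper neighbor index.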
Pages: 2487 - 2531
Number of pages: 45
Related papers (50 in total)
  • [1] Hubs in space: Popular nearest neighbors in high-dimensional data
    Radovanović, Miloš
    Nanopoulos, Alexandros
    Ivanović, Mirjana
    Journal of Machine Learning Research, 2010, 11 : 2487 - 2531
  • [2] Reporting Neighbors in High-Dimensional Euclidean Space
    Aiger, Dror
    Kaplan, Haim
    Sharir, Micha
    PROCEEDINGS OF THE TWENTY-FOURTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA 2013), 2013, : 784 - 803
  • [3] Reporting Neighbors in High-Dimensional Euclidean Space
    Aiger, Dror
    Kaplan, Haim
    Sharir, Micha
    SIAM JOURNAL ON COMPUTING, 2014, 43 (04) : 1363 - 1395
  • [4] High-dimensional feature matching: Employing the concept of meaningful nearest neighbors
    Omercevic, Dusan
    Drbohlav, Ondrej
    Leonardis, Ales
    2007 IEEE 11TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS 1-6, 2007, : 409 - 416
  • [5] Can Shared Nearest Neighbors Reduce Hubness in High-Dimensional Spaces?
    Flexer, Arthur
    Schnitzer, Dominik
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2013, : 460 - 467
  • [6] Supporting K nearest neighbors query on high-dimensional data in P2P systems
    Li M.
    Lee W.-C.
    Sivasubramaniam A.
    Zhao J.
    Front. Comput. Sci. China, 2008, 3 : 234 - 247
  • [7] Weighted k-nearest neighbors feature selection for high-dimensional multi-class data
    Bugata, Peter
    Drotar, Peter
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 3066 - 3073
  • [9] Fast nearest neighbor search in high-dimensional space
    Berchtold, S
    Ertl, B
    Keim, DA
    Kriegel, HP
    Seidl, T
    14TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 1998, : 209 - 218
  • [10] An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors
    Feng Xiaokang
    Cui Jiangtao
    Li Hui
    Liu Yingfan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (17) : 24407 - 24429