K-Nearest Neighbor Search by Random Projection Forests

Cited by: 17
Authors
Yan, Donghui [1, 2]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
Affiliations
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
Keywords
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS;
DOI
10.1109/TBDATA.2019.2908178
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees, each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves remarkable accuracy, measured by a fast-decaying rate of missed kNNs and a fast-decaying discrepancy in the k-th nearest neighbor distances. As a tree-based method, rpForests has very low computational complexity. The ensemble nature of rpForests makes it easy to parallelize on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing that the probability of neighboring points being separated by an ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
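The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea it describes (several trees grown by recursive random projections, with their results combined), not the authors' implementation. Several details are assumptions made for illustration: the function names (`build_leaves`, `rp_forest_knn`, `exact_knn`, `missing_rate`) are hypothetical, each node is split at the median of the projected values, directions are drawn from a standard Gaussian, and candidate neighbors are taken to be the points that share a leaf with the query point in at least one tree.

```python
import math
import random


def build_leaves(points, idx, leaf_size, rng):
    """Recursively split the indices `idx` by random projections; return the leaves."""
    if len(idx) <= leaf_size:
        return [idx]
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Order the node's points by their projection onto the random direction.
    ordered = sorted(
        idx, key=lambda i: sum(d * x for d, x in zip(direction, points[i]))
    )
    mid = len(ordered) // 2  # assumption: median split, chosen for simplicity
    return (build_leaves(points, ordered[:mid], leaf_size, rng)
            + build_leaves(points, ordered[mid:], leaf_size, rng))


def rp_forest_knn(points, k, n_trees=10, leaf_size=20, seed=0):
    """Approximate kNN for every point: pool leaf-mates over all trees, then rank."""
    rng = random.Random(seed)
    candidates = [set() for _ in points]
    for _ in range(n_trees):
        leaves = build_leaves(points, list(range(len(points))), leaf_size, rng)
        for leaf in leaves:
            for i in leaf:
                candidates[i].update(leaf)
    result = []
    for i, cand in enumerate(candidates):
        cand.discard(i)  # a point is not its own neighbor
        ranked = sorted(cand, key=lambda j: math.dist(points[i], points[j]))
        result.append(ranked[:k])
    return result


def exact_knn(points, k):
    """Brute-force kNN, used here only to measure what the forest misses."""
    out = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        out.append(sorted(others, key=lambda j: math.dist(p, points[j]))[:k])
    return out


def missing_rate(approx, exact):
    """Fraction of true kNNs that the approximate result fails to return."""
    missed = sum(len(set(e) - set(a)) for a, e in zip(approx, exact))
    return missed / sum(len(e) for e in exact)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(5)] for _ in range(300)]
    truth = exact_knn(data, k=5)
    for n_trees in (1, 5, 20):
        approx = rp_forest_knn(data, k=5, n_trees=n_trees)
        print(n_trees, missing_rate(approx, truth))
```

In this sketch each tree is built independently of the others, so the loop over trees could be distributed across cores or machines, which is the property the abstract refers to when it discusses parallelization. With a fixed seed, a larger forest reuses the trees of a smaller one and only adds candidates, so the missing rate in the demo cannot increase as `n_trees` grows.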
Pages: 147-157
Number of pages: 11