K-Nearest Neighbor Search by Random Projection Forests

Cited by: 17
Authors
Yan, Donghui [1, 2]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
Affiliations
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
Keywords
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS;
DOI
10.1109/TBDATA.2019.2908178
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over recent decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees, each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves remarkable accuracy, as measured by a fast-decaying missing rate of kNNs and a fast-decaying discrepancy in the k-th nearest neighbor distances. As a tree-based methodology, rpForests has very low computational complexity. The ensemble nature of rpForests makes it easy to parallelize on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing that the probability of neighboring points being separated by ensemble random projection trees decays exponentially as the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
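The idea summarized in the abstract (grow several trees by recursive random projections, then combine them to find neighbors) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the splitting rule, the leaf size, or how the trees are combined, so the median split, the `leaf_size` parameter, the union-of-leaves candidate set, and all function names below are assumptions made for illustration.

```python
import numpy as np

def build_tree(X, idx, leaf_size, rng):
    """Recursively split the points in idx along random directions."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    direction = rng.standard_normal(X.shape[1])  # random projection direction
    proj = X[idx] @ direction
    threshold = np.median(proj)  # assumption: split at the median projection
    left, right = idx[proj <= threshold], idx[proj > threshold]
    if len(right) == 0:  # all projections tied; stop splitting
        return {"leaf": idx}
    return {
        "dir": direction,
        "thr": threshold,
        "left": build_tree(X, left, leaf_size, rng),
        "right": build_tree(X, right, leaf_size, rng),
    }

def find_leaf(tree, q):
    """Route a query point down one tree to the indices in its leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if q @ tree["dir"] <= tree["thr"] else tree["right"]
    return tree["leaf"]

def build_forest(X, n_trees=10, leaf_size=30, seed=0):
    """Grow n_trees independent random projection trees over all points."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(X.shape[0])
    return [build_tree(X, all_idx, leaf_size, rng) for _ in range(n_trees)]

def knn_query(X, forest, q, k):
    """Approximate kNN: pool the leaf of every tree, then rank exactly."""
    candidates = np.unique(np.concatenate([find_leaf(t, q) for t in forest]))
    dists = np.linalg.norm(X[candidates] - q, axis=1)
    return candidates[np.argsort(dists)[:k]]

# Usage on synthetic data (hypothetical sizes).
X = np.random.default_rng(1).standard_normal((500, 8))
forest = build_forest(X, n_trees=10, leaf_size=30)
neighbors = knn_query(X, forest, X[0], k=5)
```

Because the trees are grown independently, each call to `build_tree` could be dispatched to a separate core or machine, which is the property the abstract points to when it discusses parallel running time. Adding trees enlarges the pooled candidate set, so a true neighbor is missed only if every tree separates it from the query.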
Pages: 147-157
Number of pages: 11
Related papers
50 in total
  • [31] Applying an efficient k-nearest neighbor search to forest attribute imputation
    Finley, AO
    McRoberts, RE
    Ek, AR
    FOREST SCIENCE, 2006, 52 (02) : 130 - 135
  • [32] An Adaptive Search Range Method for HEVC with the K-Nearest Neighbor Algorithm
    Li, Yuchen
    Liu, Yitong
    Yang, Hongwen
    Yang, Dacheng
    2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2015,
  • [33] Navigating K-Nearest Neighbor Graphs to Solve Nearest Neighbor Searches
    Chavez, Edgar
    Sadit Tellez, Eric
    ADVANCES IN PATTERN RECOGNITION, 2010, 6256 : 270 - 280
  • [34] A Centroid k-Nearest Neighbor Method
    Zhang, Qingjiu
    Sun, Shiliang
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2010, PT I, 2010, 6440 : 278 - 285
  • [35] Robust Earthquake Cluster Analysis Based on K-Nearest Neighbor Search
    Hamid Reza Samadi
    Roohollah Kimiaefar
    Alireza Hajian
    Pure and Applied Geophysics, 2020, 177 : 5661 - 5671
  • [36] Asymmetric Locality Preserving Projection and Its Application to k-Nearest Neighbor Method
    Iwai, Yoshio
    Nishiyama, Masashi
    Yoshimura, Hiroki
    PROCEEDINGS OF THE FIFTEENTH IAPR INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS - MVA2017, 2017, : 55 - 58
  • [37] Validation of k-Nearest Neighbor Classifiers
    Bax, Eric
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (05) : 3225 - 3234
  • [38] Quantum K-nearest neighbor algorithm
    Chen, Hanwu
    Gao, Yue
    Zhang, Jun
    Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition), 2015, 45 (04) : 647 - 651
  • [39] Analysis of the k-nearest neighbor classification
    Li, Jing
    Cheng, Ming
    INFORMATION SCIENCE AND MANAGEMENT ENGINEERING, VOLS 1-3, 2014, 46 : 1911 - 1917
  • [40] Weighted K-Nearest Neighbor Revisited
    Bicego, M.
    Loog, M.
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 1642 - 1647