K-Nearest Neighbor Search by Random Projection Forests

Cited by: 17
Authors
Yan, Donghui [1, 2]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
Affiliations
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
Keywords
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS;
DOI
10.1109/TBDATA.2019.2908178
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the huge success of tree-based methodology and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees with each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves a remarkable accuracy in terms of fast decaying missing rate of kNNs and that of discrepancy in the k-th nearest neighbor distances. rpForests has a very low computational complexity as a tree-based methodology. The ensemble nature of rpForests makes it easily parallelized to run on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing the exponential decay of neighboring points being separated by ensemble random projection trees when the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
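The idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`build_rp_tree`, `rp_forest_knn`, `brute_force_knn`, `missing_rate`), the Gaussian projection direction, the uniformly random split point, and the `leaf_size` stopping rule are all assumptions made for this sketch. Each tree partitions the data by repeated random projections, points sharing a leaf become kNN candidates for one another, and the candidates from all trees are pooled before an exact distance ranking.

```python
import numpy as np

def build_rp_tree(X, idx, leaf_size, rng):
    """Recursively split the points in idx; return a list of leaf index arrays."""
    if len(idx) <= leaf_size:
        return [idx]
    # Project onto a random direction (assumed Gaussian) and split at a
    # point drawn uniformly between the smallest and largest projection.
    direction = rng.standard_normal(X.shape[1])
    proj = X[idx] @ direction
    lo, hi = proj.min(), proj.max()
    if lo == hi:
        return [idx]  # projections coincide; keep this node as a leaf
    split = rng.uniform(lo, hi)
    left, right = idx[proj < split], idx[proj >= split]
    if len(left) == 0 or len(right) == 0:
        return [idx]  # degenerate split; keep this node as a leaf
    return (build_rp_tree(X, left, leaf_size, rng)
            + build_rp_tree(X, right, leaf_size, rng))

def rp_forest_knn(X, k, n_trees=10, leaf_size=30, seed=0):
    """Approximate kNN indices for every row of X; -1 pads missing slots."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    candidates = [set() for _ in range(n)]
    for _ in range(n_trees):
        for leaf in build_rp_tree(X, np.arange(n), leaf_size, rng):
            members = leaf.tolist()
            for i in members:
                candidates[i].update(members)  # leaf-mates are candidates
    neighbors = np.full((n, k), -1, dtype=int)
    for i in range(n):
        cand = np.array(sorted(candidates[i] - {i}), dtype=int)
        dist = np.linalg.norm(X[cand] - X[i], axis=1)
        order = np.argsort(dist)[:k]
        neighbors[i, :len(order)] = cand[order]
    return neighbors

def brute_force_knn(X, k):
    """Exact kNN indices by computing all pairwise distances."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def missing_rate(approx, exact):
    """Fraction of true kNNs absent from the approximate result."""
    missed = sum(len(set(e) - set(a)) for a, e in zip(approx, exact))
    return missed / exact.size

if __name__ == "__main__":
    X = np.random.default_rng(1).standard_normal((500, 8))
    exact = brute_force_knn(X, 5)
    for n_trees in (1, 5, 20):
        approx = rp_forest_knn(X, 5, n_trees=n_trees)
        print(n_trees, missing_rate(approx, exact))
```

Because the trees are built independently of one another, the loop over trees could in principle be distributed across cores or machines, which matches the parallelism the abstract mentions. As a rough intuition only: if each tree separated a given pair of neighbors independently with probability p < 1, all T trees would do so with probability p^T, which is the kind of exponential decay in ensemble size the abstract refers to; the paper's actual result may be stated differently.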
Pages: 147 - 157
Number of pages: 11
Related papers
50 in total
  • [21] k-nearest neighbor search based on node density in MANETs
    Komai, Yuka
    Sasaki, Yuya
    Hara, Takahiro
    Nishio, Shojiro
    MOBILE INFORMATION SYSTEMS, 2014, 10 (04) : 385 - 405
  • [22] Boosting k-nearest neighbor classifier by means of input space projection
    Garcia-Pedrajas, Nicolas
    Ortiz-Boyer, Domingo
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (07) : 10570 - 10582
  • [23] Comparative Analysis of K-Nearest Neighbor and Modified K-Nearest Neighbor Algorithm for Data Classification
    Okfalisa
    Mustakim
    Gazalba, Ikbal
    Reza, Nurul Gayatri Indah
    2017 2ND INTERNATIONAL CONFERENCES ON INFORMATION TECHNOLOGY, INFORMATION SYSTEMS AND ELECTRICAL ENGINEERING (ICITISEE): OPPORTUNITIES AND CHALLENGES ON BIG DATA FUTURE INNOVATION, 2017, : 294 - 298
  • [24] Applying an efficient k-nearest neighbor search to forest attribute imputation
    Department of Forest Resources, University of Minnesota, 115 Green Hall, 1530 Cleveland Ave. North, St. Paul, MN 55108, United States
    Unknown
    For. Sci., 2006, 2 (130-135):
  • [25] Robust Earthquake Cluster Analysis Based on K-Nearest Neighbor Search
    Samadi, Hamid Reza
    Kimiaefar, Roohollah
    Hajian, Alireza
    PURE AND APPLIED GEOPHYSICS, 2020, 177 (12) : 5661 - 5671
  • [26] A real-time monitoring method using random projection and k-nearest neighbor rule for batch process
    Wu, Lan
    Wen, Chenglin
    Zhou, Mei
    Ren, Haipeng
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2017, 14 (06):
  • [27] DURS: A Distributed Method for k-Nearest Neighbor Search on Uncertain Graphs
    Li, Xiaodong
    2019 20TH INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM 2019), 2019, : 377 - 378
  • [28] MKNN: Modified K-Nearest Neighbor
    Parvin, Hamid
    Alizadeh, Hosein
    Minaei-Bidgoli, Behrouz
    WCECS 2008: WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, 2008, : 831 - 834
  • [29] A GENERALIZED K-NEAREST NEIGHBOR RULE
    PATRICK, EA
    FISCHER, FP
    INFORMATION AND CONTROL, 1970, 16 (02): : 128 - &
  • [30] Improved k-nearest neighbor classification
    Wu, YQ
    Ianakiev, K
    Govindaraju, V
    PATTERN RECOGNITION, 2002, 35 (10) : 2311 - 2318