K-Nearest Neighbor Search by Random Projection Forests

Cited by: 17
Authors
Yan, Donghui [1, 2]
Wang, Yingjie [3 ]
Wang, Jin [3 ]
Wang, Honggang [3 ]
Li, Zhenpeng [4 ]
Affiliations
[1] Univ Massachusetts, Dept Math, Dartmouth, MA 02747 USA
[2] Univ Massachusetts, Program Data Sci, Dartmouth, MA 02747 USA
[3] Univ Massachusetts, Dept Elect & Comp Engn, Dartmouth, MA 02747 USA
[4] Dali Univ, Dept Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
Keywords
Big Data; Vegetation; Computational complexity; Forestry; Data mining; Computers; Search problems; K-nearest neighbors; random projection forests; ensemble; unsupervised learning; ALGORITHMS
DOI
10.1109/TBDATA.2019.2908178
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
K-nearest neighbor (kNN) search is an important problem in data mining and knowledge discovery. Inspired by the success of tree-based and ensemble methods over the last decades, we propose a new method for kNN search, random projection forests (rpForests). rpForests finds nearest neighbors by combining multiple kNN-sensitive trees, each constructed recursively through a series of random projections. As demonstrated by experiments on a wide collection of real datasets, our method achieves remarkable accuracy, measured by a fast-decaying missing rate of kNNs and a fast-decaying discrepancy in the k-th nearest neighbor distances. As a tree-based methodology, rpForests has a very low computational complexity. Its ensemble nature makes it easy to parallelize on clustered or multicore computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights on rpForests by showing that the probability of neighboring points being separated by the ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can also be used to refine the choice of random projections in the growth of rpForests; experiments show that the effect is remarkable.
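The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `leaf_size` parameter, the Gaussian projection directions, and the rule of splitting at a point drawn uniformly between the smallest and largest projected value are all assumptions made for illustration. Each tree recursively splits the data along a random direction, points sharing a leaf become candidate neighbors, and the candidates from all trees are merged before an exact distance ranking.

```python
import numpy as np

def build_rp_tree(X, idx, leaf_size, rng):
    """Recursively split the points in `idx`; return the leaves as index arrays."""
    if len(idx) <= leaf_size:
        return [idx]
    # Project the node's points onto a random direction (assumed Gaussian).
    direction = rng.standard_normal(X.shape[1])
    proj = X[idx] @ direction
    lo, hi = proj.min(), proj.max()
    if lo == hi:  # all projections coincide, cannot split
        return [idx]
    # Assumed split rule: a point drawn uniformly between min and max.
    split = rng.uniform(lo, hi)
    left, right = idx[proj < split], idx[proj >= split]
    if len(left) == 0 or len(right) == 0:
        return [idx]
    return (build_rp_tree(X, left, leaf_size, rng)
            + build_rp_tree(X, right, leaf_size, rng))

def rp_forest_knn(X, k, n_trees=10, leaf_size=30, seed=0):
    """Approximate kNN: merge leaf-mates over all trees, then rank exactly."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    candidates = [set() for _ in range(n)]
    for _ in range(n_trees):
        for leaf in build_rp_tree(X, np.arange(n), leaf_size, rng):
            members = leaf.tolist()
            for i in members:
                candidates[i].update(members)
    neighbors = []
    for i in range(n):
        cand = np.array(sorted(candidates[i] - {i}), dtype=int)
        dist = np.linalg.norm(X[cand] - X[i], axis=1)
        neighbors.append(cand[np.argsort(dist)[:k]])
    return neighbors

def brute_force_knn(X, k):
    """Exact kNN by computing all pairwise distances (for evaluation only)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def missing_rate(approx, exact):
    """Fraction of true kNNs that the approximate search failed to return."""
    missed = sum(len(set(e.tolist()) - set(a.tolist()))
                 for a, e in zip(approx, exact))
    return missed / exact.size

if __name__ == "__main__":
    X = np.random.default_rng(1).standard_normal((300, 8))
    exact = brute_force_knn(X, k=5)
    for n_trees in (1, 5, 20):
        rate = missing_rate(rp_forest_knn(X, 5, n_trees=n_trees), exact)
        print(n_trees, "trees: missing rate", round(rate, 3))
```

Because the trees are built independently, the loop over `n_trees` is the natural place to parallelize, which is consistent with the abstract's remark about running time scaling with the number of cores or machines.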
Pages: 147-157
Page count: 11
Related papers
50 items in total
  • [1] K-nearest Neighbor Search by Random Projection Forests
    Yan, Donghui
    Wang, Yingjie
    Wang, Jin
    Wang, Honggang
    Li, Zhenpeng
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 4775 - 4781
  • [2] k-Nearest Neighbor Regressors Optimized by using Random Search
    Ortiz-Bejar, Jose
    Graff, Mario
    Tellez, Eric S.
    Ortiz-Bejar, Jesus
    Cerda Jacobo, Jaime
    2018 IEEE INTERNATIONAL AUTUMN MEETING ON POWER, ELECTRONICS AND COMPUTING (ROPEC), 2018,
  • [3] Distributed Sparse Random Projection Trees for Constructing K-Nearest Neighbor Graphs
    Ranawaka, Isuru
    Rahman, Md Khaledur
    Azad, Ariful
    2023 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS, 2023, : 36 - 46
  • [4] Continuous k-nearest neighbor search for moving objects
    Li, YF
    Yang, J
    Han, JW
    16TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2004, : 123 - 126
  • [5] Dimensional Testing for Reverse k-Nearest Neighbor Search
    Casanova, Guillaume
    Englmeier, Elias
    Houle, Michael E.
    Kroeger, Peer
    Nett, Michael
    Schubert, Erich
    Zimek, Arthur
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (07): : 769 - 780
  • [6] Anytime k-nearest neighbor search for database applications
    Xu, Weijia
    Miranker, Daniel P.
    Mao, Rui
    Ramakrishnan, Smriti
    SISAP 2008: FIRST INTERNATIONAL WORKSHOP ON SIMILARITY SEARCH AND APPLICATIONS, PROCEEDINGS, 2008, : 139 - +
  • [7] Reverse k-nearest neighbor search in the presence of obstacles
    Gao, Yunjun
    Liu, Qing
    Miao, Xiaoye
    Yang, Jiacheng
    INFORMATION SCIENCES, 2016, 330 : 274 - 292
  • [8] k-nearest reliable neighbor search in crowdsourced LBSs
    Jang, Hong-Jun
    Kim, Byoungwook
    Jung, Soon-Young
    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2021, 34 (02)
  • [9] Anytime K-nearest neighbor search for database applications
    Xu, Weijia
    Miranker, Daniel
    Mao, Rui
    Ramakrishnan, Smriti
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1 AND 2, 2008, : 586 - +
  • [10] Random K-nearest neighbor algorithm with learning process
    Fu Z.-L.
    Chen X.-Q.
    Ren W.
    Yao Y.
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2024, 54 (01): : 209 - 220