A novel algorithm for scalable k-nearest neighbour graph construction

Cited by: 5
Authors
Park, Youngki [1 ]
Hwang, Heasoo [2 ]
Lee, Sang-goo [1 ]
Affiliations
[1] Seoul Natl Univ, Seoul 151, South Korea
[2] Univ Seoul, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
collaborative filtering; k-nearest neighbour search; k-nearest neighbour graph construction;
DOI
10.1177/0165551515594728
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations, with wide application in areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features follow a Zipfian distribution, a characteristic observed in the majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms other algorithms on various types of high-dimensional datasets.
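The candidate-filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how prefixes are sized or which similarity measure is used, so the fixed prefix length (prefix_len), the cosine similarity, and the function names cosine and approx_knn_graph are all assumptions made for illustration. Each node keeps only its largest-valued dimensions as a prefix, an inverted index maps each prefix dimension to the nodes that hold it, and similarities are computed only between nodes that share a prefix dimension.

```python
from collections import defaultdict
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {dimension: value}."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def approx_knn_graph(vectors, k, prefix_len):
    """Approximate k-NN graph via prefix filtering (illustrative sketch).

    vectors: {node_id: {dimension: value}}.
    prefix_len is a hypothetical fixed prefix size; the paper's own rule
    for choosing prefixes is not given in the abstract.
    """
    # Prefix of a node: its prefix_len dimensions with the largest values
    # (ties broken by dimension name so the result is deterministic).
    prefixes = {
        node: sorted(vec, key=lambda d: (-vec[d], str(d)))[:prefix_len]
        for node, vec in vectors.items()
    }

    # Inverted index: prefix dimension -> nodes whose prefix contains it.
    index = defaultdict(set)
    for node, dims in prefixes.items():
        for d in dims:
            index[d].add(node)

    graph = {}
    for node, dims in prefixes.items():
        # Candidates are nodes sharing at least one prefix dimension;
        # every other pair is filtered out without computing similarity.
        candidates = set()
        for d in dims:
            candidates |= index[d]
        candidates.discard(node)
        ranked = sorted(
            candidates,
            key=lambda c: cosine(vectors[node], vectors[c]),
            reverse=True,
        )
        graph[node] = ranked[:k]
    return graph


# Toy data: "a" and "b" share the large-valued dimension "x".
vectors = {
    "a": {"x": 5.0, "y": 1.0},
    "b": {"x": 4.0, "z": 1.0},
    "c": {"y": 3.0, "z": 3.0},
    "d": {"w": 2.0},
}
graph = approx_knn_graph(vectors, k=1, prefix_len=1)
```

With prefix_len=1, node "c" gets no candidates even though it overlaps with "a" and "b" on small-valued dimensions, which shows the accuracy trade-off that prefix filtering accepts in exchange for fewer similarity computations.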
Pages: 274-288
Number of pages: 15