A novel algorithm for scalable k-nearest neighbour graph construction

Cited by: 5
Authors
Park, Youngki [1]
Hwang, Heasoo [2]
Lee, Sang-goo [1]
Affiliations
[1] Seoul Natl Univ, Seoul 151, South Korea
[2] Univ Seoul, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
collaborative filtering; k-nearest neighbour search; k-nearest neighbour graph construction;
DOI
10.1177/0165551515594728
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations, with wide application in areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features follow a Zipfian distribution, a characteristic observed in the majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms the other algorithms on various types of high-dimensional datasets.
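The candidate-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes sparse vectors stored as {dimension: value} dictionaries, cosine similarity as the measure, and a fixed prefix length per node, whereas the abstract says greedy filtering selects a fixed number of candidates per node. The names `cosine`, `approx_knn_graph`, `prefix_size`, and the toy data are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {dimension: value}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(val * v.get(dim, 0.0) for dim, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def approx_knn_graph(vectors, k, prefix_size):
    """Approximate k-NN graph from prefix-based candidate filtering.

    Assumption (not from the paper): every node uses the same prefix length.
    """
    # Prefix of a node: its `prefix_size` dimensions with the largest values.
    prefixes = {
        node: heapq.nlargest(prefix_size, vec, key=vec.get)
        for node, vec in vectors.items()
    }
    # Inverted index: dimension -> nodes whose prefix contains that dimension.
    index = defaultdict(list)
    for node, dims in prefixes.items():
        for dim in dims:
            index[dim].append(node)

    graph = {}
    for node, dims in prefixes.items():
        # Only nodes sharing at least one prefix dimension become candidates;
        # all other pairs are filtered out without computing a similarity.
        candidates = {other for dim in dims for other in index[dim] if other != node}
        scored = [(cosine(vectors[node], vectors[c]), c) for c in candidates]
        top = heapq.nlargest(k, scored, key=lambda pair: pair[0])
        graph[node] = [c for _, c in top]
    return graph


# Hypothetical toy data: "a" and "b" share their largest dimension "x".
toy = {
    "a": {"x": 5.0, "y": 1.0},
    "b": {"x": 4.0, "z": 1.0},
    "c": {"w": 3.0, "y": 0.5},
}
graph = approx_knn_graph(toy, k=1, prefix_size=1)
```

In this toy run, "c" ends up with no neighbours because its only prefix dimension is shared with no other node, which shows the trade-off of a short prefix: fewer similarity computations at the cost of possibly missed neighbours.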
Pages: 274-288
Number of pages: 15