A novel algorithm for scalable k-nearest neighbour graph construction

Cited by: 5
Authors
Park, Youngki [1]
Hwang, Heasoo [2]
Lee, Sang-goo [1]
Affiliations
[1] Seoul Natl Univ, Seoul 151, South Korea
[2] Univ Seoul, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
collaborative filtering; k-nearest neighbour search; k-nearest neighbour graph construction;
DOI
10.1177/0165551515594728
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Finding the k-nearest neighbours of every node in a dataset is one of the most important data operations, with wide application in areas such as recommendation and information retrieval. However, a major challenge is that the execution time of existing approaches grows rapidly as the number of nodes or dimensions increases. In this paper, we present greedy filtering, an efficient and scalable algorithm for finding an approximate k-nearest neighbour graph. It selects a fixed number of nodes as candidates for every node by filtering out node pairs that do not have any matching dimensions with large values. Greedy filtering achieves consistent approximation accuracy across nodes in linear execution time. We also present a faster version of greedy filtering that uses inverted indices on the node prefixes. Through theoretical analysis, we show that greedy filtering is effective for datasets whose features follow a Zipfian distribution, a characteristic observed in the majority of large datasets. We also conduct extensive comparative experiments against (a) three state-of-the-art algorithms, and (b) three algorithms in related research domains. Our experimental results show that greedy filtering consistently outperforms the other algorithms on various types of high-dimensional datasets.
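The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes sparse vectors stored as Python dicts, cosine similarity as the measure, and a fixed prefix length per node (the abstract does not say how prefixes are sized). The names `prefix_filter_knn`, `cosine`, and `prefix_len` are hypothetical. A node's "prefix" here is its `prefix_len` largest-valued dimensions, and an inverted index over those prefixes yields the candidate pairs.

```python
import heapq
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def prefix_filter_knn(vectors, k, prefix_len):
    """Approximate k-NN graph from prefix-based candidate filtering.

    Assumption: a fixed prefix length for every node; this is a
    simplification chosen for illustration, not the paper's rule.
    """
    # Prefix of a node: its prefix_len dimensions with the largest values.
    prefixes = {
        node: heapq.nlargest(prefix_len, vec, key=vec.get)
        for node, vec in vectors.items()
    }
    # Inverted index: dimension -> nodes whose prefix contains it.
    index = defaultdict(set)
    for node, dims in prefixes.items():
        for d in dims:
            index[d].add(node)

    graph = {}
    for node, dims in prefixes.items():
        # Candidates share at least one prefix dimension with this node;
        # every other pair is filtered out without computing a similarity.
        candidates = set().union(*(index[d] for d in dims)) - {node}
        scored = ((cosine(vectors[node], vectors[c]), c) for c in candidates)
        graph[node] = [c for _, c in heapq.nlargest(k, scored)]
    return graph
```

With a short prefix, a node whose largest dimensions match no other node's prefix ends up with no neighbours, which is why the prefix size matters for accuracy in a scheme like this.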
Pages: 274-288
Number of pages: 15
Related papers
50 in total
  • [21] A multilevel k-nearest neighbour learning algorithm based on k-means clustering
    Ying, Xu
    2007 International Symposium on Computer Science & Technology, Proceedings, 2007, : 250 - 253
  • [22] A Scalable K-Nearest Neighbor Algorithm for Recommendation System Problems
    Sagdic, A.
    Tekinbas, C.
    Arslan, E.
    Kucukyilmaz, T.
    2020 43RD INTERNATIONAL CONVENTION ON INFORMATION, COMMUNICATION AND ELECTRONIC TECHNOLOGY (MIPRO 2020), 2020, : 186 - 191
  • [23] Fast Parallel Cosine K-Nearest Neighbor Graph Construction
    Anastasiu, David C.
    Karypis, George
    PROCEEDINGS OF 2016 6TH WORKSHOP ON IRREGULAR APPLICATIONS: ARCHITECTURE AND ALGORITHMS (IA3), 2016, : 50 - 53
  • [24] Application of genetic algorithm and k-nearest neighbour method in medical fraud detection
    He, HX
    Graco, W
    Yao, X
    SIMULATED EVOLUTION AND LEARNING, 1999, 1585 : 74 - 81
  • [25] Dynamic Data Discretization Technique based on Frequency and K-Nearest Neighbour algorithm
    Ahmed, Almahdi Mohammed
    Abu Bakar, Azuraliza
    Hamdan, Abdul Razak
    2009 2ND CONFERENCE ON DATA MINING AND OPTIMIZATION, 2009, : 10 - 14
  • [26] Continuous k-Nearest Neighbour Strategies Using the mqrtree
    Osborn, Wendy
    ADVANCES IN NETWORK-BASED INFORMATION SYSTEMS, NBIS-2018, 2019, 22 : 168 - 181
  • [27] k-Nearest Neighbour method in functional nonparametric regression
    Burba, Florent
    Ferraty, Frederic
    Vieu, Philippe
    JOURNAL OF NONPARAMETRIC STATISTICS, 2009, 21 (04) : 453 - 469
  • [28] An empirical analysis of the probabilistic K-nearest neighbour classifier
    Manocha, S.
    Girolami, M. A.
    PATTERN RECOGNITION LETTERS, 2007, 28 (13) : 1818 - 1824
  • [29] Multivariate k-Nearest Neighbour Regression for Time Series data - a novel Algorithm for Forecasting UK Electricity Demand
    Al-Qahtani, Fahad H.
    Crone, Sven F.
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [30] Central limit theorems for k-nearest neighbour distances
    Penrose, MD
    STOCHASTIC PROCESSES AND THEIR APPLICATIONS, 2000, 85 (02) : 295 - 320