Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters

Cited by: 10
Authors
Dashti, Ali [1]
Komarov, Ivan [1]
D'Souza, Roshan M. [1]
Affiliations
[1] Univ Wisconsin, Complex Syst Simulat Lab, Dept Mech Engn, Milwaukee, WI 53201 USA
Source
PLOS ONE | 2013 / Volume 8 / Issue 09
Funding
National Science Foundation (USA);
Keywords
CONSTRUCTION;
DOI
10.1371/journal.pone.0074113
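Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract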
This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large, high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and scales through multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing compared with an optimized method running on a cluster of CPUs, and bring the hitherto impossible k-NNG generation for a dataset of twenty million images with 15k dimensions into the realm of practical possibility.
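For readers unfamiliar with the term, the following is a minimal CPU-only sketch of what brute-force exact k-NNG construction computes. It is not the authors' GPU implementation: the function names, the `tile_size` parameter, and the use of squared Euclidean distance are assumptions made for illustration. The tiling loop only hints at how independent blocks of query points could be handed to separate workers, as the abstract's multi-level parallelism suggests.

```python
import heapq


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_graph_brute_force(points, k, tile_size=2):
    """Exact k-NN graph: for each point, the indices of its k closest other points.

    Query points are processed in tiles (hypothetical parameter `tile_size`).
    Each tile is independent of the others, so in a parallel setting a tile
    could be assigned to a separate worker.
    """
    n = len(points)
    graph = [None] * n
    for start in range(0, n, tile_size):
        for i in range(start, min(start + tile_size, n)):
            # Compare point i against every other point (the brute-force step).
            candidates = (
                (squared_distance(points[i], points[j]), j)
                for j in range(n)
                if j != i
            )
            # Keep only the k smallest distances, nearest first.
            nearest = heapq.nsmallest(k, candidates)
            graph[i] = [j for _, j in nearest]
    return graph


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
graph = knn_graph_brute_force(points, k=2)
print(graph)  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

The cost is quadratic in the number of points, which is why the exact method becomes impractical on a single machine at the data sizes the abstract mentions.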
Pages: 12
Related papers
50 in total
  • [31] Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing
    Li S.
    Hu Y.
    Hao X.
    Zhang L.
    Hao Z.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (03): : 609 - 623
  • [32] GPU-Embedding of kNN-Graph Representing Large and High-Dimensional Data
    Minch, Bartosz
    Nowak, Mateusz
    Wcislo, Rafal
    Dzwinel, Witold
    COMPUTATIONAL SCIENCE - ICCS 2020, PT II, 2020, 12138 : 322 - 336
  • [33] LSR-forest: An locality sensitive hashing-based approximate k-nearest neighbor query algorithm on high-dimensional uncertain data
    Wang, Jiagang
    Qian, Tu
    Yang, Anbang
    Wang, Hui
    Qian, Jiangbo
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (08):
  • [34] Automatic high-dimensional association rule generation for large relational data sets
    Zhang, W
    Wang, G
    ICCI 2005: FOURTH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS - PROCEEDINGS, 2005, : 136 - 143
  • [35] Visualization of very large high-dimensional data sets as minimum spanning trees
    Daniel Probst
    Jean-Louis Reymond
    Journal of Cheminformatics, 12
  • [36] Visualization of very large high-dimensional data sets as minimum spanning trees
    Probst, Daniel
    Reymond, Jean-Louis
    JOURNAL OF CHEMINFORMATICS, 2020, 12 (01)
  • [37] An Efficient Framework for Approximate Nearest Neighbor Search on High-Dimensional Multi-metric Data
    Uemura, Reon
    Amagata, Daichi
    Hara, Takahiro
    SIMILARITY SEARCH AND APPLICATIONS, SISAP 2024, 2025, 15268 : 3 - 17
  • [38] A Valid Clustering Algorithm for High-dimensional Large Data Sets Based on Distributed Method
    Guo Xian e
    Yan Junmei
    PROCEEDINGS OF 2009 INTERNATIONAL WORKSHOP ON INFORMATION SECURITY AND APPLICATION, 2009, : 1 - 6
  • [39] SPARSE LEAST TRIMMED SQUARES REGRESSION FOR ANALYZING HIGH-DIMENSIONAL LARGE DATA SETS
    Alfons, Andreas
    Croux, Christophe
    Gelper, Sarah
    ANNALS OF APPLIED STATISTICS, 2013, 7 (01): : 226 - 248
  • [40] Approximate single linkage cluster analysis of large data sets in high-dimensional spaces
    Eddy, WF
    Mockus, A
    Oue, SG
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1996, 23 (01) : 29 - 43