Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters

Cited by: 10

Authors:

Dashti, Ali [1]

Komarov, Ivan [1]

D'Souza, Roshan M. [1]

Affiliations:

[1] Univ Wisconsin, Complex Syst Simulat Lab, Dept Mech Engn, Milwaukee, WI 53201 USA

Source:

PLOS ONE | 2013 / Vol. 8 / Issue 9

Funding:

U.S. National Science Foundation;

Keywords:

CONSTRUCTION;

DOI:

10.1371/journal.pone.0074113

Chinese Library Classification (CLC) codes:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large, high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and scales through multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing compared with an optimized method running on a cluster of CPUs, and bring k-NNG generation for a dataset of twenty million images with a dimensionality of 15 k, hitherto impossible, into the realm of practical possibility.
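
To make the idea of brute-force exact k-NNG construction concrete, the following is a minimal CPU-only sketch in plain Python. It is not the authors' GPU implementation: the function names (`squared_distance`, `knn_graph_tiled`) and the `tile_size` parameter are hypothetical, and the tiles only stand in for the chunks of reference points that a multi-level scheme might hand to different nodes or GPUs. Each tile produces partial candidates, which are merged into a running top-k list per point, so the result does not depend on how the data is split.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_graph_tiled(points: List[Point], k: int, tile_size: int) -> List[List[int]]:
    """Exact k-NN graph by brute force, scanning reference points tile by tile.

    Illustrative assumption: each tile plays the role of a chunk of work
    that could be processed independently (e.g. on a separate device).
    """
    n = len(points)
    # best[i] holds the (squared distance, index) pairs of the k closest
    # neighbours of point i found so far, sorted by distance then index.
    best: List[List[Tuple[float, int]]] = [[] for _ in range(n)]
    for start in range(0, n, tile_size):
        tile = range(start, min(start + tile_size, n))
        for i in range(n):
            # Distances from point i to every point in the current tile,
            # skipping the point itself.
            partial = [
                (squared_distance(points[i], points[j]), j)
                for j in tile
                if j != i
            ]
            # Merge the tile's candidates into the running top-k list.
            best[i] = heapq.nsmallest(k, best[i] + partial)
    return [[j for _, j in neighbours] for neighbours in best]
```

Calling `knn_graph_tiled(points, k=2, tile_size=2)` on a small list of coordinate tuples returns, for each point, the indices of its two nearest neighbours in order of increasing distance. The total work is quadratic in the number of points, which is why the brute-force approach depends on massive parallelism at the scale described in the abstract.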


Pages: 12

50 records in total

[31] Approximate k-Nearest Neighbor Query of High Dimensional Data Based on Dimension Grouping and Reducing
Li S.
Hu Y.
Hao X.
Zhang L.
Hao Z.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2021, 58 (03): 609-623
[32] GPU-Embedding of kNN-Graph Representing Large and High-Dimensional Data
Minch, Bartosz
Nowak, Mateusz
Wcislo, Rafal
Dzwinel, Witold
COMPUTATIONAL SCIENCE - ICCS 2020, PT II, 2020, 12138: 322-336
[33] LSR-forest: An locality sensitive hashing-based approximate k-nearest neighbor query algorithm on high-dimensional uncertain data
Wang, Jiagang
Qian, Tu
Yang, Anbang
Wang, Hui
Qian, Jiangbo
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (08):
[34] Automatic high-dimensional association rule generation for large relational data sets
Zhang, W
Wang, G
ICCI 2005: FOURTH IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS - PROCEEDINGS, 2005: 136-143
[36] Visualization of very large high-dimensional data sets as minimum spanning trees
Probst, Daniel
Reymond, Jean-Louis
JOURNAL OF CHEMINFORMATICS, 2020, 12 (01)
[37] An Efficient Framework for Approximate Nearest Neighbor Search on High-Dimensional Multi-metric Data
Uemura, Reon
Amagata, Daichi
Hara, Takahiro
SIMILARITY SEARCH AND APPLICATIONS, SISAP 2024, 2025, 15268: 3-17
[38] A Valid Clustering Algorithm for High-dimensional Large Data Sets Based on Distributed Method
Guo Xian e
Yan Junmei
PROCEEDINGS OF 2009 INTERNATIONAL WORKSHOP ON INFORMATION SECURITY AND APPLICATION, 2009: 1-6
[39] SPARSE LEAST TRIMMED SQUARES REGRESSION FOR ANALYZING HIGH-DIMENSIONAL LARGE DATA SETS
Alfons, Andreas
Croux, Christophe
Gelper, Sarah
ANNALS OF APPLIED STATISTICS, 2013, 7 (01): 226-248
[40] Approximate single linkage cluster analysis of large data sets in high-dimensional spaces
Eddy, WF
Mockus, A
Oue, SG
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1996, 23 (01): 29-43
