Evaluating Multi-GPU Sorting with Modern Interconnects

Cited by: 5

Authors:

Maltenberger, Tobias ^{[1]}

Ilic, Ivan ^{[1]}

Tolovski, Ilin ^{[1]}

Rabl, Tilmann ^{[1]}

Affiliations:

[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany

Source:

PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22) | 2022

Keywords:

multi-GPU sorting; high-speed interconnects; database acceleration; ALGORITHM; JOINS; CORE;

DOI:

10.1145/3514221.3517842

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

GPUs have become a mainstream accelerator for database operations such as sorting. Most GPU sorting algorithms are single-GPU approaches. They neither harness the full computational power nor exploit the high-bandwidth P2P interconnects of modern multi-GPU platforms. The latest NVLink 2.0 and NVLink 3.0-based NVSwitch interconnects promise unparalleled multi-GPU acceleration. So far, multi-GPU sorting has only been evaluated on systems with PCIe 3.0. In this paper, we analyze serial, parallel, and bidirectional data transfer rates to, from, and between multiple GPUs on systems with PCIe 3.0/4.0, NVLink 2.0/3.0, and NVSwitch. We measure up to 35x higher parallel P2P throughput with NVLink 3.0-based NVSwitch over PCIe 3.0. To study GPU-accelerated sorting on today's hardware, we implement a P2P-based GPU-only (P2P sort) and a heterogeneous (HET sort) multi-GPU sorting algorithm and evaluate them on three modern platforms. We observe speedups over state-of-the-art parallel CPU radix sort of up to 14x for P2P sort and 9x for HET sort. On systems with fast P2P interconnects, P2P sort outperforms HET sort by up to 1.65x. Finally, we show that overlapping GPU copy/compute operations does not mitigate the transfer bottleneck when sorting large out-of-core data.

Pages: 1795-1809

Page count: 15
