Evaluating Multi-GPU Sorting with Modern Interconnects

Cited by: 5
Authors
Maltenberger, Tobias [1]
Ilic, Ivan [1]
Tolovski, Ilin [1]
Rabl, Tilmann [1]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Keywords
multi-GPU sorting; high-speed interconnects; database acceleration; ALGORITHM; JOINS; CORE
DOI
10.1145/3514221.3517842
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
GPUs have become a mainstream accelerator for database operations such as sorting. Most GPU sorting algorithms are single-GPU approaches. They neither harness the full computational power nor exploit the high-bandwidth P2P interconnects of modern multi-GPU platforms. The latest NVLink 2.0 and NVLink 3.0-based NVSwitch interconnects promise unparalleled multi-GPU acceleration. So far, multi-GPU sorting has only been evaluated on systems with PCIe 3.0. In this paper, we analyze serial, parallel, and bidirectional data transfer rates to, from, and between multiple GPUs on systems with PCIe 3.0/4.0, NVLink 2.0/3.0, and NVSwitch. We measure up to 35x higher parallel P2P throughput with NVLink 3.0-based NVSwitch over PCIe 3.0. To study GPU-accelerated sorting on today's hardware, we implement a P2P-based GPU-only (P2P sort) and a heterogeneous (HET sort) multi-GPU sorting algorithm and evaluate them on three modern platforms. We observe speedups over state-of-the-art parallel CPU radix sort of up to 14x for P2P sort and 9x for HET sort. On systems with fast P2P interconnects, P2P sort outperforms HET sort by up to 1.65x. Finally, we show that overlapping GPU copy/compute operations does not mitigate the transfer bottleneck when sorting large out-of-core data.
Pages: 1795-1809
Page count: 15