Evaluating Multi-GPU Sorting with Modern Interconnects

Times cited: 5
Authors
Maltenberger, Tobias [1]
Ilic, Ivan [1]
Tolovski, Ilin [1]
Rabl, Tilmann [1]
Affiliation
[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Keywords
multi-GPU sorting; high-speed interconnects; database acceleration; ALGORITHM; JOINS; CORE
DOI
10.1145/3514221.3517842
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
GPUs have become a mainstream accelerator for database operations such as sorting. Most GPU sorting algorithms are single-GPU approaches. They neither harness the full computational power nor exploit the high-bandwidth P2P interconnects of modern multi-GPU platforms. The latest NVLink 2.0 and NVLink 3.0-based NVSwitch interconnects promise unparalleled multi-GPU acceleration. So far, multi-GPU sorting has only been evaluated on systems with PCIe 3.0. In this paper, we analyze serial, parallel, and bidirectional data transfer rates to, from, and between multiple GPUs on systems with PCIe 3.0/4.0, NVLink 2.0/3.0, and NVSwitch. We measure up to 35x higher parallel P2P throughput with NVLink 3.0-based NVSwitch than with PCIe 3.0. To study GPU-accelerated sorting on today's hardware, we implement a P2P-based GPU-only (P2P sort) and a heterogeneous (HET sort) multi-GPU sorting algorithm and evaluate them on three modern platforms. We observe speedups over state-of-the-art parallel CPU radix sort of up to 14x for P2P sort and 9x for HET sort. On systems with fast P2P interconnects, P2P sort outperforms HET sort by up to 1.65x. Finally, we show that overlapping GPU copy/compute operations does not mitigate the transfer bottleneck when sorting large out-of-core data.
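The abstract contrasts a GPU-only sort that exchanges data over P2P links (P2P sort) with a heterogeneous sort (HET sort). The Python sketch below is a hypothetical, CPU-only simulation of those two general data flows, assuming that a HET-style sort merges the GPU-sorted chunks on the CPU and that a P2P-style sort redistributes key ranges between GPUs before each GPU merges locally. The function names, the regular-sampling splitter selection, and the use of Python lists as stand-ins for GPU memory are all illustrative assumptions, not the authors' implementation or pivot selection.

```python
import heapq
import random
from bisect import bisect_left
from typing import List


def split_into_chunks(data: List[int], num_gpus: int) -> List[List[int]]:
    """Split the input into num_gpus contiguous chunks, one per simulated GPU."""
    size = -(-len(data) // num_gpus)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(num_gpus)]


def het_style_sort(data: List[int], num_gpus: int) -> List[int]:
    # Phase 1: each simulated GPU sorts its own chunk.
    runs = [sorted(c) for c in split_into_chunks(data, num_gpus)]
    # Phase 2 (assumed HET-style): the host CPU k-way merges all sorted runs.
    return list(heapq.merge(*runs))


def p2p_style_sort(data: List[int], num_gpus: int) -> List[int]:
    # Phase 1: each simulated GPU sorts its own chunk.
    runs = [sorted(c) for c in split_into_chunks(data, num_gpus)]

    # Phase 2: pick global splitters by regular sampling of the sorted runs
    # (a simplifying assumption, not the paper's pivot selection).
    samples = sorted(
        run[i * len(run) // num_gpus]
        for run in runs if run
        for i in range(1, num_gpus)
    )
    splitters = (
        [samples[j * len(samples) // num_gpus] for j in range(1, num_gpus)]
        if samples else []
    )

    # Phase 3: simulated exchange. Device d receives, from every run, the keys
    # in [splitters[d-1], splitters[d]); on real hardware this is the P2P copy.
    num_parts = len(splitters) + 1
    received: List[List[List[int]]] = [[] for _ in range(num_parts)]
    for run in runs:
        bounds = [0] + [bisect_left(run, s) for s in splitters] + [len(run)]
        for d in range(num_parts):
            received[d].append(run[bounds[d]:bounds[d + 1]])

    # Phase 4: each device merges what it received; concatenating the devices
    # in order yields the globally sorted output.
    result: List[int] = []
    for parts in received:
        result.extend(heapq.merge(*parts))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    keys = [rng.randrange(1_000) for _ in range(10_000)]
    assert het_style_sort(keys, 4) == sorted(keys)
    assert p2p_style_sort(keys, 4) == sorted(keys)
    print("both strategies produce a sorted output")
```

The sketch only checks that each data flow yields a correct result; it models none of the transfer costs that the abstract identifies as the bottleneck, so it cannot reproduce the reported speedups.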
Pages: 1795-1809
Number of pages: 15