Evaluating Multi-GPU Sorting with Modern Interconnects

Cited by: 5
Authors
Maltenberger, Tobias [1]
Ilic, Ivan [1]
Tolovski, Ilin [1]
Rabl, Tilmann [1]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Keywords
multi-GPU sorting; high-speed interconnects; database acceleration; ALGORITHM; JOINS; CORE
DOI
10.1145/3514221.3517842
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
GPUs have become a mainstream accelerator for database operations such as sorting. Most GPU sorting algorithms are single-GPU approaches. They neither harness the full computational power nor exploit the high-bandwidth P2P interconnects of modern multi-GPU platforms. The latest NVLink 2.0 and NVLink 3.0-based NVSwitch interconnects promise unparalleled multi-GPU acceleration. So far, multi-GPU sorting has only been evaluated on systems with PCIe 3.0. In this paper, we analyze serial, parallel, and bidirectional data transfer rates to, from, and between multiple GPUs on systems with PCIe 3.0/4.0, NVLink 2.0/3.0, and NVSwitch. We measure up to 35x higher parallel P2P throughput with NVLink 3.0-based NVSwitch over PCIe 3.0. To study GPU-accelerated sorting on today's hardware, we implement a P2P-based GPU-only (P2P sort) and a heterogeneous (HET sort) multi-GPU sorting algorithm and evaluate them on three modern platforms. We observe speedups over state-of-the-art parallel CPU radix sort of up to 14x for P2P sort and 9x for HET sort. On systems with fast P2P interconnects, P2P sort outperforms HET sort by up to 1.65x. Finally, we show that overlapping GPU copy/compute operations does not mitigate the transfer bottleneck when sorting large out-of-core data.
Pages: 1795-1809
Page count: 15