UPC++: A High-Performance Communication Framework for Asynchronous Computation

Cited by: 26
Authors
Bachan, John [1]
Baden, Scott B. [1]
Hofmeyr, Steven [1]
Jacquelin, Mathias [1]
Kamil, Amir [1, 2]
Bonachea, Dan [1]
Hargrove, Paul H. [1]
Ahmed, Hadia [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA
[2] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
Asynchronous; PGAS; RMA; RPC; Exascale
DOI
10.1109/IPDPS.2019.00104
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC). We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x. UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.
Pages: 963-973
Page count: 11
Related papers
50 in total
  • [21] Overlapping Communication and Computation with ExaMPI's Strong Progress and Modern C++ Design
    Schafer, Derek
    Hines, Thomas
    Suggs, Evan Drake
    Rufenacht, Martin
    Skjellum, Anthony
    PROCEEDINGS OF EXAMPI 2021: WORKSHOP ON EXASCALE MPI, 2021, : 18 - 26
  • [22] CARAVEL: A C++ framework for the computation of multi-loop amplitudes with numerical unitarity
    Abreu, S.
    Dormans, J.
    Cordero, F. Febres
    Ita, H.
    Kraus, M.
    Page, B.
    Pascual, E.
    Ruf, M. S.
    Sotnikov, V.
    COMPUTER PHYSICS COMMUNICATIONS, 2021, 267
  • [23] Thrust++: Extending Thrust Framework for Better Abstraction and Performance
    George, Ajai V.
    Manoj, Sankar
    Gupte, Sanket R.
    Mitra, Sayantan
    Sarkar, Santonu
    2017 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2017, : 368 - 377
  • [24] BrainGrid+Workbench: High-Performance/High-Quality Neural Simulation
    Stiber, Michael
    Kawasaki, Fumitaka
    Davis, Delmar B.
    Asuncion, Hazeline U.
    Lee, Jewel Yun-Hsuan
    Boyer, Destiny
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 2469 - 2476
  • [26] Generic C++ Implementation of High-Performance BFS-RBF-based Mesh Motion Schemes
    Gottschling, Peter
    Heinzl, Rene
    Weinhub, J.
    Kirchner, Nadejda
    Sauer, Martin
    Klomfass, Arno
    Steinhardt, Cornelius
    Wensch, Joerg
    NUMERICAL ANALYSIS AND APPLIED MATHEMATICS, VOLS I-III, 2010, 1281 : 1631 - +
  • [27] FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers
    Chai, Zheng
    Chen, Yujing
    Anwar, Ali
    Zhao, Liang
    Cheng, Yue
    Rangwala, Huzefa
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [28] IP+optical technology for high-performance backbone network in Japan
    Yamanaka, Naoaki
    2006 IEEE SARNOFF SYMPOSIUM, 2006, : 371 - 374
  • [29] PiCo: High-performance data analytics pipelines in modern C++
    Misale, Claudia
    Drocco, Maurizio
    Tremblay, Guy
    Martinelli, Alberto R.
    Aldinucci, Marco
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 87 : 392 - 403
  • [30] QTLS: High-Performance TLS Asynchronous Offload Framework with Intel® QuickAssist Technology
    Hu, Xiaokang
    Wei, Changzheng
    Li, Jian
    Will, Brian
    Yu, Ping
    Gong, Lu
    Guan, Haibing
    PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 158 - 172