UPC++: A High-Performance Communication Framework for Asynchronous Computation

Times cited: 26

Authors:

Bachan, John [1]
Baden, Scott B. [1]
Hofmeyr, Steven [1]
Jacquelin, Mathias [1]
Kamil, Amir [1, 2]
Bonachea, Dan [1]
Hargrove, Paul H. [1]
Ahmed, Hadia [1]
Affiliations:

[1] Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720 USA

[2] Univ Michigan, Ann Arbor, MI 48109 USA

Source:

2019 IEEE 33RD INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2019) | 2019

Keywords:

Asynchronous; PGAS; RMA; RPC; Exascale;

DOI:

10.1109/IPDPS.2019.00104

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812 ;

Abstract:

UPC++ is a C++ library that supports high-performance computation via an asynchronous communication framework. This paper describes a new incarnation that differs substantially from its predecessor, and we discuss the reasons for our design decisions. We present new design features, including future-based asynchrony management, distributed objects, and generalized Remote Procedure Call (RPC). We show microbenchmark performance results demonstrating that one-sided Remote Memory Access (RMA) in UPC++ is competitive with MPI-3 RMA; on a Cray XC40 UPC++ delivers up to a 25% improvement in the latency of blocking RMA put, and up to a 33% bandwidth improvement in an RMA throughput test. We showcase the benefits of UPC++ with irregular applications through a pair of application motifs, a distributed hash table and a sparse solver component. Our distributed hash table in UPC++ delivers near-linear weak scaling up to 34816 cores of a Cray XC40. Our UPC++ implementation of the sparse solver component shows robust strong scaling up to 2048 cores, where it outperforms variants communicating using MPI by up to 3.1x. UPC++ encourages the use of aggressive asynchrony in low-overhead RMA and RPC, improving programmer productivity and delivering high performance in irregular applications.
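The distributed hash table motif in the abstract combines two ideas: each key has a single owner chosen by hashing, and operations on remote keys are issued asynchronously, with completion tracked through futures. The block below is a minimal single-process sketch of that pattern in standard C++. It is not UPC++ code and does not use the UPC++ API. Each hypothetical `Shard` stands in for one rank's portion of the table, `std::async` stands in for an RPC executed at the owning rank, and `std::future` stands in for UPC++ futures. The names `Shard`, `SimulatedDHT`, and `demo`, along with the convention that an empty string means "not found", are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the portion of the table owned by one rank.
struct Shard {
    std::mutex m;
    std::unordered_map<std::string, std::string> table;
};

// Single-process simulation of a hash-partitioned table; not UPC++ code.
class SimulatedDHT {
public:
    explicit SimulatedDHT(std::size_t nranks) : shards_(nranks) {
        assert(nranks > 0 && "need at least one simulated rank");
    }

    // The owning rank of a key is its hash modulo the number of ranks.
    std::size_t owner(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

    // Asynchronous insert: the std::async task plays the role of an RPC
    // executed at the owner; the caller receives a future and keeps going.
    std::future<void> insert(const std::string& key, const std::string& value) {
        Shard* s = &shards_[owner(key)];
        return std::async(std::launch::async, [s, key, value] {
            std::lock_guard<std::mutex> lock(s->m);
            s->table[key] = value;
        });
    }

    // Asynchronous lookup; an empty string signals a missing key
    // (an assumption of this sketch).
    std::future<std::string> find(const std::string& key) {
        Shard* s = &shards_[owner(key)];
        return std::async(std::launch::async, [s, key] {
            std::lock_guard<std::mutex> lock(s->m);
            auto it = s->table.find(key);
            return it == s->table.end() ? std::string() : it->second;
        });
    }

    // Number of entries stored on one simulated rank.
    std::size_t local_size(std::size_t rank) {
        std::lock_guard<std::mutex> lock(shards_[rank].m);
        return shards_[rank].table.size();
    }

private:
    std::vector<Shard> shards_;  // one shard per simulated rank; never resized
};

// Usage sketch: issue all inserts without blocking, then wait on every
// future before counting the entries across shards.
std::size_t demo(std::size_t nranks, int nkeys) {
    SimulatedDHT dht(nranks);
    std::vector<std::future<void>> pending;
    for (int i = 0; i < nkeys; ++i) {
        pending.push_back(dht.insert("key" + std::to_string(i),
                                     "val" + std::to_string(i)));
    }
    for (auto& f : pending) f.get();  // wait for all outstanding inserts
    std::size_t total = 0;
    for (std::size_t r = 0; r < nranks; ++r) total += dht.local_size(r);
    return total;
}
```

Issuing every insert before waiting on any of them mirrors the "aggressive asynchrony" the abstract advocates: communication for many keys can be in flight at once instead of being serialized behind blocking round trips. A real UPC++ implementation would distribute the shards across processes and use the library's own RPC, futures, and distributed objects. This sketch illustrates only the hash-based routing and the wait-on-futures structure.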


Pages: 963-973

Page count: 11

Related records (50 in total):

[21] Overlapping Communication and Computation with ExaMPI's Strong Progress and Modern C++ Design
Schafer, Derek
Hines, Thomas
Suggs, Evan Drake
Rufenacht, Martin
Skjellum, Anthony
PROCEEDINGS OF EXAMPI 2021: WORKSHOP ON EXASCALE MPI, 2021, : 18 - 26
[22] CARAVEL: A C++ framework for the computation of multi-loop amplitudes with numerical unitarity
Abreu, S.
Dormans, J.
Cordero, F. Febres
Ita, H.
Kraus, M.
Page, B.
Pascual, E.
Ruf, M. S.
Sotnikov, V.
COMPUTER PHYSICS COMMUNICATIONS, 2021, 267
[23] Thrust++: Extending Thrust Framework for Better Abstraction and Performance
George, Ajai V.
Manoj, Sankar
Gupte, Sanket R.
Mitra, Sayantan
Sarkar, Santonu
2017 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2017, : 368 - 377
[24] BrainGrid+Workbench: High-Performance/High-Quality Neural Simulation
Stiber, Michael
Kawasaki, Fumitaka
Davis, Delmar B.
Asuncion, Hazeline U.
Lee, Jewel Yun-Hsuan
Boyer, Destiny
2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 2469 - 2476
[25] A perfect combination: high-precision optics plus high-performance camera
Author unknown
ANTI-CORROSION METHODS AND MATERIALS, 2013, 60 (04) : 217 - 217
[26] Generic C++ Implementation of High-Performance BFS-RBF-based Mesh Motion Schemes
Gottschling, Peter
Heinzl, Rene
Weinhub, J.
Kirchner, Nadejda
Sauer, Martin
Klomfass, Arno
Steinhardt, Cornelius
Wensch, Joerg
NUMERICAL ANALYSIS AND APPLIED MATHEMATICS, VOLS I-III, 2010, 1281 : 1631 - +
[27] FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers
Chai, Zheng
Chen, Yujing
Anwar, Ali
Zhao, Liang
Cheng, Yue
Rangwala, Huzefa
SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
[28] IP+optical technology for high-performance backbone network in Japan
Yamanaka, Naoaki
2006 IEEE SARNOFF SYMPOSIUM, 2006, : 371 - 374
[29] PiCo: High-performance data analytics pipelines in modern C++
Misale, Claudia
Drocco, Maurizio
Tremblay, Guy
Martinelli, Alberto R.
Aldinucci, Marco
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 87 : 392 - 403
[30] QTLS: High-Performance TLS Asynchronous Offload Framework with Intel® QuickAssist Technology
Hu, Xiaokang
Wei, Changzheng
Li, Jian
Will, Brian
Yu, Ping
Gong, Lu
Guan, Haibing
PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 158 - 172
