Lightweight Measurement and Analysis of HPC Performance Variability

Times cited: 3
Authors
Dominguez-Trujillo, Jered [1]
Haskins, Keira [1]
Khouzani, Soheila Jafari [1]
Leap, Christopher [1]
Tashakkori, Sahba [1]
Wofford, Quincy [1]
Estrada, Trilce [1]
Bridges, Patrick G. [1]
Widener, Patrick M. [2]
Affiliations
[1] Univ New Mexico, Comp Sci Dept, Albuquerque, NM 87131 USA
[2] Sandia Natl Labs, Ctr Comp Res, POB 5800, Albuquerque, NM 87185 USA
Funding
National Science Foundation (NSF); U.S. Department of Energy (DOE)
Keywords
BOOTSTRAP
DOI
10.1109/PMBS51919.2020.00011
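Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812 (Computer Science and Technology)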
Abstract
Performance variation deriving from hardware and software sources is common in modern scientific and data-intensive computing systems, and synchronization in parallel and distributed programs often exacerbates its impact at scale. Unfortunately, the decentralized and emergent effects of such variation are also difficult to systematically measure, analyze, and predict: modeling assumptions stringent enough to make analysis tractable frequently cannot be guaranteed at meaningful application scales, and longitudinal methods at such scales can require capturing and manipulating impractically large amounts of data. This paper describes a new, scalable, and statistically robust approach for effective modeling, measurement, and analysis of large-scale performance variation in HPC systems. Our approach avoids the need to reason about complex distributions of runtimes among large numbers of individual application processes by focusing instead on the maximum length of distributed workload intervals. We describe this approach and its implementation in MPI, which makes it applicable to a diverse set of HPC workloads. We also present evaluations, carried out on large-scale computing systems, of these techniques for quantifying and predicting performance variation, and discuss the strengths and limitations of the underlying modeling assumptions.
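The abstract's central idea, reasoning about the maximum duration of each distributed workload interval instead of the full distribution of per-process runtimes, can be illustrated with a short Python sketch. This is not the authors' implementation: the data layout `runtimes[i][p]`, the hypothetical function names `interval_maxima` and `bootstrap_ci`, and the use of a percentile bootstrap (suggested only by the record's BOOTSTRAP keyword) are assumptions, and the input below is synthetic.

```python
import random
import statistics


def interval_maxima(runtimes):
    """Return the maximum duration of each workload interval across processes.

    runtimes[i][p] is the (hypothetical) measured duration of interval i on
    process p. If all processes synchronize at the end of an interval, the
    interval cannot complete before its slowest process does, so the maximum
    is the quantity that governs overall progress.
    """
    return [max(per_process) for per_process in runtimes]


def bootstrap_ci(samples, stat=statistics.mean, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for stat(samples).

    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(samples)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(samples, k=n)) for _ in range(n_resamples))
    k = int(alpha / 2 * n_resamples)  # estimates trimmed from each tail
    return stat(samples), estimates[k], estimates[n_resamples - 1 - k]


# Synthetic data only: 200 intervals on 64 processes, each taking 1.0 time
# unit plus exponentially distributed noise with mean 0.05. These values are
# illustrative and are not measurements reported in the paper.
rng = random.Random(42)
runtimes = [[1.0 + rng.expovariate(20.0) for _ in range(64)] for _ in range(200)]

maxima = interval_maxima(runtimes)
estimate, lower, upper = bootstrap_ci(maxima)
print(f"mean interval maximum: {estimate:.4f} (95% CI {lower:.4f} to {upper:.4f})")
```

Keeping one value per interval, rather than one per process per interval, is what makes this kind of analysis cheap to store at scale. In an MPI program the per-interval maximum would typically be obtained with a reduction such as `MPI_Allreduce` using the `MPI_MAX` operation; the abstract does not say whether the paper's MPI implementation works exactly this way.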
Pages: 50-60
Number of pages: 11