Lightweight Measurement and Analysis of HPC Performance Variability

Times cited: 3
Authors
Dominguez-Trujillo, Jered [1]
Haskins, Keira [1]
Khouzani, Soheila Jafari [1]
Leap, Christopher [1]
Tashakkori, Sahba [1]
Wofford, Quincy [1]
Estrada, Trilce [1]
Bridges, Patrick G. [1]
Widener, Patrick M. [2]
Affiliations
[1] Univ New Mexico, Comp Sci Dept, Albuquerque, NM 87131 USA
[2] Sandia Natl Labs, Ctr Comp Res, POB 5800, Albuquerque, NM 87185 USA
Funding
US National Science Foundation; US Department of Energy
Keywords
BOOTSTRAP;
DOI
10.1109/PMBS51919.2020.00011
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Performance variation deriving from hardware and software sources is common in modern scientific and data-intensive computing systems, and synchronization in parallel and distributed programs often exacerbates its impact at scale. The decentralized and emergent effects of such variation are, unfortunately, also difficult to systematically measure, analyze, and predict: modeling assumptions that are stringent enough to make analysis tractable frequently cannot be guaranteed at meaningful application scales, and longitudinal methods at such scales can require capturing and manipulating impractically large amounts of data. This paper describes a new, scalable, and statistically robust approach for modeling, measuring, and analyzing large-scale performance variation in HPC systems. Our approach avoids the need to reason about complex distributions of runtimes among large numbers of individual application processes by focusing instead on the maximum length of distributed workload intervals. We describe this approach and its implementation in MPI, which makes it applicable to a diverse set of HPC workloads. We also present evaluations, carried out on large-scale computing systems, of these techniques for quantifying and predicting performance variation, and discuss the strengths and limitations of the underlying modeling assumptions.
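The abstract's central idea, reasoning about the maximum duration of each distributed workload interval rather than the full distribution of per-process runtimes, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`simulate`, `interval_maxima`, `bootstrap_ci`), the synthetic exponential-noise workload, and the choice of a percentile bootstrap on the mean of the per-interval maxima are all assumptions made for illustration. In an MPI code, the per-interval maximum would typically come from a reduction such as `MPI_Allreduce` with the `MPI_MAX` operation; here that reduction is simulated in plain Python.

```python
import random
import statistics


def simulate(n_intervals=200, n_procs=64, base=1.0, noise=0.05, seed=1):
    """Hypothetical synthetic workload: times[i][p] is the time (s) that
    process p spends in interval i, modeled as a fixed base cost plus
    exponentially distributed noise with mean `noise`."""
    rng = random.Random(seed)
    return [
        [base + rng.expovariate(1.0 / noise) for _ in range(n_procs)]
        for _ in range(n_intervals)
    ]


def interval_maxima(times_by_interval):
    """Keep only the slowest process time for each interval.

    In MPI this corresponds to reducing each process's interval time with
    MPI_MAX (e.g. via MPI_Allreduce), so one value per interval survives."""
    return [max(times) for times in times_by_interval]


def bootstrap_ci(samples, stat=statistics.fmean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the
    statistic, and return (point estimate, lower bound, upper bound)."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(stat(rng.choices(samples, k=n)) for _ in range(n_resamples))
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return stat(samples), lo, hi


if __name__ == "__main__":
    maxima = interval_maxima(simulate())
    est, lo, hi = bootstrap_ci(maxima)
    print(f"mean slowest-interval time: {est:.4f} s "
          f"(95% bootstrap CI {lo:.4f} to {hi:.4f})")
```

Because only one value per interval (the maximum) is retained, the data to be stored and analyzed grows with the number of intervals rather than with intervals times processes, which appears to be the scalability argument the abstract makes.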
Pages: 50-60
Number of pages: 11