Lightweight Measurement and Analysis of HPC Performance Variability

Cited by: 3
Authors
Dominguez-Trujillo, Jered [1]
Haskins, Keira [1]
Khouzani, Soheila Jafari [1]
Leap, Christopher [1]
Tashakkori, Sahba [1]
Wofford, Quincy [1]
Estrada, Trilce [1]
Bridges, Patrick G. [1]
Widener, Patrick M. [2]
Affiliations
[1] Univ New Mexico, Comp Sci Dept, Albuquerque, NM 87131 USA
[2] Sandia Natl Labs, Ctr Comp Res, POB 5800, Albuquerque, NM 87185 USA
Funding
U.S. National Science Foundation; U.S. Department of Energy
Keywords
BOOTSTRAP
DOI
10.1109/PMBS51919.2020.00011
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Performance variation deriving from hardware and software sources is common in modern scientific and data-intensive computing systems, and synchronization in parallel and distributed programs often exacerbates its impact at scale. The decentralized and emergent effects of such variation are, unfortunately, also difficult to measure, analyze, and predict systematically: modeling assumptions stringent enough to make analysis tractable frequently cannot be guaranteed at meaningful application scales, and longitudinal methods at such scales can require capturing and manipulating impractically large amounts of data. This paper describes a new, scalable, and statistically robust approach for modeling, measuring, and analyzing large-scale performance variation in HPC systems. Our approach avoids the need to reason about complex distributions of runtimes among large numbers of individual application processes by focusing instead on the maximum length of distributed workload intervals. We describe this approach and its implementation in MPI, which makes it applicable to a diverse set of HPC workloads. We also present evaluations, carried out on large-scale computing systems, of these techniques for quantifying and predicting performance variation, and discuss the strengths and limitations of the underlying modeling assumptions.
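The abstract's central idea, summarizing each synchronized workload interval by its longest per-process duration instead of modeling the full distribution of process runtimes, can be illustrated with a small Python sketch. This is not the authors' MPI implementation. The function names `interval_maxima` and `bootstrap_ci`, the use of a percentile bootstrap (suggested only by the record's BOOTSTRAP keyword), and the timing values are all hypothetical, chosen for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def interval_maxima(per_process_times: Sequence[Sequence[float]]) -> List[float]:
    """Summarize each workload interval by its slowest process.

    When processes synchronize at the end of an interval, every process
    waits for the slowest one, so the interval's length is the maximum of
    the per-process durations. (In an MPI code this reduction could be an
    MPI_Reduce with MPI_MAX; here it is plain Python.)
    """
    return [max(interval) for interval in per_process_times]


def bootstrap_ci(
    samples: Sequence[float],
    stat: Callable[[Sequence[float]], float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) confidence interval for stat(samples).

    Hypothetical helper: the resample count, alpha, and seed are
    illustrative defaults, not values taken from the paper.
    """
    if not samples:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)  # seeded so results are reproducible
    estimates = sorted(
        stat(rng.choices(samples, k=len(samples)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical per-process durations (seconds): 4 intervals x 3 processes.
timings = [
    [1.00, 1.02, 1.10],
    [0.98, 1.05, 1.01],
    [1.20, 1.00, 0.99],
    [1.01, 1.03, 1.04],
]
maxima = interval_maxima(timings)  # [1.1, 1.05, 1.2, 1.04]
low, high = bootstrap_ci(maxima, statistics.mean)
print(f"95% CI for mean interval length: [{low:.3f}, {high:.3f}] s")
```

Keeping one maximum per interval makes the retained data grow with the number of intervals rather than with intervals times processes, which reflects the data-volume concern the abstract raises. Which statistic to bootstrap (mean, a high quantile, and so on) is a separate modeling choice that this sketch does not settle.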
Pages: 50-60
Page count: 11
Related papers (50 total)
  • [41] PERFORMANCE OF THE HPC CALORIMETER IN DELPHI
    CHAN, A
    CRAWLEY, HB
    EDSALL, DM
    FIRESTONE, A
    GORBICS, MS
    GORN, L
    LAMSA, JW
    MCKAY, R
    MEYER, WT
    ROSENBERG, EI
    THOMAS, WD
    WAYNE, MR
    BODINI, M
    CAVALLO, F
    GIORDANO, V
    MASELLI, L
    NAVARRIA, FL
    PERROTTA, A
    ROSSI, U
    VALENTI, G
    BASTIE, C
    BELL, W
    BURMEISTER, H
    CAMPORESI, T
    CATTAI, A
    FEINDT, M
    FURSTENAU, H
    GILLESPIE, D
    MARTIN, J
    SCHAEL, S
    CONTRI, R
    CROSETTI, G
    MEOLA, G
    MORELLI, A
    MORETTINI, P
    PIANA, G
    SANNINO, M
    SQUARCIA, S
    TRASPEDINI, L
    ALGERI, A
    BOHMER, R
    DEBOER, W
    KOHLER, W
    KONIG, B
    KREUTER, C
    PODOBRIN, O
    SEITZ, A
    WIELERS, M
    BONIVENTO, W
    CALVI, M
    IEEE TRANSACTIONS ON NUCLEAR SCIENCE, 1995, 42 (04) : 491 - 498
  • [42] Performance Evaluation of Containers for HPC
    Ruiz, Cristian
    Jeanvoine, Emmanuel
    Nussbaum, Lucas
    EURO-PAR 2015: PARALLEL PROCESSING WORKSHOPS, 2015, 9523 : 813 - 824
  • [43] A Survey About Quantitative Measurement of Performance Variability in High Performance Computers
    Wu, Linping
    Xu, Xiaowen
    Wei, Yong
    Liu, Xu
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, 2017, 10561 : 76 - 86
  • [44] Performance of the HPC calorimeter in DELPHI
    Iowa State Univ, Ames, United States
    IEEE Transactions on Nuclear Science, 1995, 42 (4 pt 1) : 491 - 498
  • [45] Performance modeling of HPC applications
    Snavely, A
    Lee, C
    Carrington, L
    Wolter, N
    Labarta, J
    Gimenez, J
    Jones, P
    PARALLEL COMPUTING: SOFTWARE TECHNOLOGY, ALGORITHMS, ARCHITECTURES AND APPLICATIONS, 2004, 13 : 777 - 784
  • [46] A Performance Analysis And Optimization Of PMIx-based HPC Software Stacks
    Polyakov, Artem Y.
    Karasev, Boris I.
    Hursey, Joshua
    Ladd, Joshua
    Brinskii, Mikhail
    Shipunova, Elena
    EUROMPI'19: PROCEEDINGS OF THE 26TH EUROPEAN MPI USERS' GROUP MEETING, 2019
  • [47] Performance and Energy-efficiency Analysis of ARM Processors for HPC Workloads
    Carrington, Laura
    PROCEEDINGS OF CO-HPC 2015: 2ND INTERNATIONAL WORKSHOP ON HARDWARE-SOFTWARE CO-DESIGN FOR HIGH PERFORMANCE COMPUTING, 2015
  • [48] Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning
    Liu, Zhangyu
    Zhang, Cheng
    Wu, Huijun
    Fang, Jianbin
    Peng, Lin
    Ye, Guixin
    Tang, Zhanyong
    2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER, 2023 : 234 - 246
  • [49] Web-oriented visual performance analysis tool for HPC: THPTiii
    Shi, PZ
    Li, SL
    CHINESE JOURNAL OF ELECTRONICS, 2003, 12 (04) : 506 - 510
  • [50] Empirical Performance Analysis of HPC Benchmarks Across Variations in Cloud Computing
    Ahuja, Sanjay P.
    Mani, Sindhu
    INTERNATIONAL JOURNAL OF CLOUD APPLICATIONS AND COMPUTING, 2013, 3 (01) : 13 - 26