Validation and Uncertainty Assessment of Extreme-Scale HPC Simulation through Bayesian Inference

Cited by: 0
Authors
Wilke, Jeremiah J. [1]
Sargsyan, Khachik [1]
Kenny, Joseph P. [1]
Debusschere, Bert [1]
Najm, Habib N. [1]
Hendry, Gilbert [1]
Affiliations
[1] Sandia Natl Labs, Livermore, CA 94550 USA
Keywords
PERFORMANCE PREDICTION; PARALLEL;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Simulation of high-performance computing (HPC) systems plays a critical role in their development, especially as HPC moves toward the co-design model used for embedded systems, which ties hardware and software into a unified design cycle. Exploring system-wide trade-offs in hardware, middleware, and applications with high-fidelity, cycle-accurate simulation, however, is far too costly. Coarse-grained methods can provide efficient, accurate simulation, but they require rigorous uncertainty quantification (UQ) before their results can support design decisions. We present SST/macro, a coarse-grained structural simulator that provides flexible congestion models for low-cost simulation. We explore the accuracy limits of coarse-grained simulation by deriving error distributions of model parameters using Bayesian inference. Propagating these uncertainties through the model, we demonstrate SST/macro's utility in drawing conclusions about performance trade-offs for a series of MPI collectives. Low-cost, high-accuracy simulation coupled with UQ methodology makes SST/macro a powerful tool for rapidly prototyping systems to aid extreme-scale HPC co-design.
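The abstract's two UQ steps (inferring distributions over model parameters from measurements, then propagating them through the model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from SST/macro: the two-parameter latency/bandwidth cost model, the synthetic "measurements", the noise level, and the choice of a random-walk Metropolis sampler as the inference method. All function names are hypothetical.

```python
import math
import random


def model_time(size_kb, latency_us, inv_bw):
    """Hypothetical cost model: time (us) = latency + size * inverse bandwidth."""
    return latency_us + size_kb * inv_bw


def log_posterior(params, data, sigma):
    """Gaussian log-likelihood with a flat prior on positive parameters."""
    latency_us, inv_bw = params
    if latency_us <= 0.0 or inv_bw <= 0.0:
        return -math.inf
    sse = sum((t - model_time(s, latency_us, inv_bw)) ** 2 for s, t in data)
    return -0.5 * sse / sigma ** 2


def metropolis(data, sigma, start, steps, n_samples, burn_in, rng):
    """Random-walk Metropolis sampler over (latency_us, inv_bw)."""
    current = list(start)
    current_lp = log_posterior(current, data, sigma)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, steps)]
        lp = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp - current_lp)):
            current, current_lp = proposal, lp
        if i >= burn_in:
            samples.append(tuple(current))
    return samples


def calibrate_and_predict(seed=0, new_size_kb=1024.0):
    """Calibrate on synthetic data, then propagate the posterior to a new size."""
    rng = random.Random(seed)
    sigma = 1.0  # assumed measurement noise (us)
    true_latency, true_inv_bw = 2.0, 0.5  # used only to generate synthetic data
    sizes = [2.0 ** k for k in range(10)]  # 1 KB .. 512 KB
    data = [(s, model_time(s, true_latency, true_inv_bw) + rng.gauss(0.0, sigma))
            for s in sizes]

    samples = metropolis(data, sigma, start=(1.0, 0.45), steps=(0.3, 0.002),
                         n_samples=5000, burn_in=2000, rng=rng)

    # Propagation: push every posterior sample through the model.
    predictions = sorted(model_time(new_size_kb, a, b) for a, b in samples)
    n = len(predictions)
    return {
        "latency_mean": sum(a for a, _ in samples) / n,
        "inv_bw_mean": sum(b for _, b in samples) / n,
        "pred_mean": sum(predictions) / n,
        "pred_95": (predictions[int(0.025 * n)], predictions[int(0.975 * n)]),
    }


if __name__ == "__main__":
    result = calibrate_and_predict()
    print("posterior mean latency (us):", result["latency_mean"])
    print("predicted time at 1024 KB (us):", result["pred_mean"], result["pred_95"])
```

The returned interval is what a design comparison would use: two candidate configurations can be called different only if their predicted ranges separate, not merely their point estimates.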
Pages: 41-52
Page count: 12
Related papers
  • [1] Modeling and Simulation of Extreme-Scale Fat-Tree Networks for HPC Systems and Data Centers
    Liu, Ning
    Haider, Adnan
    Jin, Dong
    Sun, Xian-He
    ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION, 2017, 27 (02):
  • [2] Convergence of HPC and Big Data in extreme-scale data analysis through the DCEx programming model
    Garcia-Blas, Javier
    Fernandez Munoz, Javier
    Carretero, Jesus
    Marozzo, Fabrizio
    Talia, Domenico
    Trunfio, Paolo
    Fernandez-Pena, Alberto
    Martin de Blas, Daniel
    2022 IEEE 34TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2022), 2022, : 130 - 139
  • [3] Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems
    Gupta, Saurabh
    Tiwari, Devesh
    Jantzi, Christopher
    Rogers, James
    Maxwell, Don
    2015 45TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, 2015, : 37 - 44
  • [4] Extreme-Scale UQ for Bayesian Inverse Problems Governed by PDEs
    Bui-Thanh, Tan
    Burstedde, Carsten
    Ghattas, Omar
    Martin, James
    Stadler, Georg
    Wilcox, Lucas C.
    2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2012,
  • [5] Memory-Conscious Collective I/O for Extreme-Scale HPC Systems
    Lu, Yin
    Chen, Yong
    Thakur, Rajeev
    Zhuang, Yu
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1360 - 1360
  • [6] Memory-Conscious Collective I/O for Extreme-scale HPC Systems
    Lu, Yin
    Chen, Yong
    Thakur, Rajeev
    Zhuang, Yu
    2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC), 2012, : 1361 - +
  • [7] LFTI: A New Performance Metric for Assessing Interconnect Designs for Extreme-Scale HPC Systems
    Yuan, Xin
    Mahapatra, Santosh
    Lang, Michael
    Pakin, Scott
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [8] Physics-informed neural network uncertainty assessment through Bayesian inference
    Costa, Erbet Almeida
    Rebello, Carine Menezes
    Santana, Vinicius Viena
    Nogueira, Idelfonso B. R.
    IFAC PAPERSONLINE, 2024, 58 (14): : 652 - 657
  • [9] Canopus: A Paradigm Shift Towards Elastic Extreme-Scale Data Analytics on HPC Storage
    Lu, Tao
    Suchyta, Eric
    Pugmire, Dave
    Choi, Jong
    Klasky, Scott
    Liu, Qing
    Podhorszki, Norbert
    Ainsworth, Mark
    Wolf, Matthew
    2017 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2017, : 58 - 69
  • [10] Extreme-Scale Ab initio Quantum Raman Spectra Simulations on the Leadership HPC System in China
    Shang, Honghui
    Li, Fang
    Zhang, Yunquan
    Zhang, Libo
    Fu, You
    Gao, Yingxiang
    Wu, Yangjun
    Duan, Xiaohui
    Lin, Rongfen
    Liu, Xin
    Liu, Ying
    Chen, Dexun
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,