HPC AI500 V3.0: A scalable HPC AI benchmarking framework

Authors
Jiang Z. [1, 2]
Luo C. [1]
Gao W. [1]
Wang L. [1]
Zhan J. [1, 2]
Affiliations
[1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
Keywords
Artificial intelligence; Benchmarking; High performance computing; Scalability
DOI
10.1016/j.tbench.2022.100083
Abstract
In recent years, the convergence of High Performance Computing (HPC) and artificial intelligence (AI) has left the community in urgent need of a benchmark to guide the design of next-generation scalable HPC AI systems. The success of the HPL benchmarks and the affiliated TOP500 ranking indicates that scalability is the fundamental requirement for evaluating HPC systems. However, achieving scalability for emerging AI workloads such as deep learning (DL) raises nontrivial challenges. This paper formally and systematically analyzes the factor that limits scalability in DL workloads and presents HPC AI500 V3.0, a scalable HPC AI benchmarking framework. The HPC AI500 V3.0 methodology is inspired by bagging, which draws on the collective wisdom of an ensemble of base models and lets the benchmarks scale adaptively to HPC systems of different sizes. We implement HPC AI500 V3.0 in a highly customizable manner, preserving room for various optimizations at both the system and algorithm levels. By reusing the representative workloads in HPC AI500 V2.0, we evaluate HPC AI500 V3.0 on typical HPC systems, and the results show that it achieves near-linear scalability. Furthermore, building on the customizable design, we present a case study that explores the trade-off between AI model quality and training speed. The source code of HPC AI500 V3.0 is publicly available from the HPC AI500 project homepage: https://www.benchcouncil.org/aibench/hpcai500/. © 2022 The Authors
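As a rough illustration of the bagging idea the abstract refers to, the sketch below trains several toy base models on bootstrap replicates of a dataset and combines them by majority vote. It is not the HPC AI500 V3.0 implementation: the nearest-centroid base model, the function names, and the one-dimensional data format are all assumptions made for illustration. The point it tries to show is that each base model is trained independently of the others, which is the property that makes an ensemble straightforward to spread over more compute nodes.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]


def train_centroid_model(sample):
    """Toy base model (an assumption): the mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in sample:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_one(model, x):
    """Predict the class whose centroid lies closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def train_ensemble(data, n_models, seed=0):
    """Train n_models base models, each on its own bootstrap replicate.

    The models do not depend on one another, so in a distributed setting
    each one could be trained on a separate group of nodes.
    """
    rng = random.Random(seed)
    return [train_centroid_model(bootstrap_sample(data, rng))
            for _ in range(n_models)]


def predict_ensemble(models, x):
    """Aggregate the base models' predictions by majority vote."""
    votes = Counter(predict_one(model, x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical 1-D data: (feature, label) pairs for two classes.
    data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.9, 1), (1.0, 1), (1.1, 1)]
    ensemble = train_ensemble(data, n_models=5)
    print(predict_ensemble(ensemble, 0.15), predict_ensemble(ensemble, 1.05))
```

Using an odd number of base models avoids ties in the two-class vote. Growing the ensemble adds work that needs no communication during training, only at the final aggregation step.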