HPC AI500 V3.0: A scalable HPC AI benchmarking framework

Authors
Jiang Z. [1, 2]
Luo C. [1]
Gao W. [1]
Wang L. [1]
Zhan J. [1, 2]
Affiliations
[1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
Keywords
Artificial intelligence; Benchmarking; High performance computing; Scalability
DOI
10.1016/j.tbench.2022.100083
Abstract
In recent years, the convergence of high performance computing (HPC) and artificial intelligence (AI) has left the community in pressing need of a benchmark to guide the design of next-generation scalable HPC AI systems. The success of the HPL benchmarks and the affiliated TOP500 ranking indicates that scalability is the fundamental requirement for evaluating HPC systems. However, achieving scalability for emerging AI workloads such as deep learning (DL) raises nontrivial challenges. This paper formally and systematically analyzes the factor that limits scalability in DL workloads and presents HPC AI500 V3.0, a scalable HPC AI benchmarking framework. The HPC AI500 V3.0 methodology is inspired by bagging, which utilizes the collective wisdom of an ensemble of base models and enables the benchmarks to scale adaptively to HPC systems of different sizes. We implement HPC AI500 V3.0 in a highly customizable manner, preserving room for various optimizations at both the system and algorithm levels. By reusing the representative workloads in HPC AI500 V2.0, we evaluate HPC AI500 V3.0 on typical HPC systems, and the results show that it has near-linear scalability. Furthermore, based on the customizable design, we present a case study that trades off AI model quality against training speed. The source code of HPC AI500 V3.0 is publicly available from the HPC AI500 project homepage at https://www.benchcouncil.org/aibench/hpcai500/. © 2022 The Authors
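For readers unfamiliar with bagging, the general technique named in the abstract can be sketched as follows. This is a minimal, generic illustration and not the HPC AI500 V3.0 implementation: the toy centroid classifier, the function names, and the sample data are all assumptions made for the example. Each base model is trained on its own bootstrap resample and the ensemble predicts by majority vote. Because the base models do not depend on one another during training, they could in principle be trained on separate nodes, which is one plausible reading of why bagging suits scaling; the abstract itself does not spell out the mechanism.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]


def train_centroid_model(sample):
    """Toy base model (hypothetical): the per-class mean of a 1-D feature."""
    sums, counts = {}, {}
    for x, label in sample:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_one(model, x):
    """Predict the class whose centroid lies closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


def train_ensemble(data, n_models, seed=0):
    """Train n_models base models, each on an independent bootstrap resample."""
    rng = random.Random(seed)
    return [train_centroid_model(bootstrap_sample(data, rng))
            for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate the base-model predictions by majority vote."""
    votes = Counter(predict_one(model, x) for model in models)
    return votes.most_common(1)[0][0]


# Illustrative data only: (feature, label) pairs.
data = [(0.1, "a"), (0.2, "a"), (0.3, "a"),
        (0.9, "b"), (1.0, "b"), (1.1, "b")]
models = train_ensemble(data, n_models=8)
prediction = bagging_predict(models, 1.05)
```

In this sketch, changing `n_models` changes how many independent training tasks exist, which is the knob a bagging-style benchmark would turn to match the size of the machine under test.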