Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning

Cited by: 1
Authors
Liu, Zhangyu [1]
Zhang, Cheng [1]
Wu, Huijun [2]
Fang, Jianbin [2]
Peng, Lin [2]
Ye, Guixin
Tang, Zhanyong [1]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
HPC; Parallel I/O; Performance Optimization; Auto-tuning; Ensemble Learning
DOI
10.1109/CLUSTER52292.2023.00027
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To improve parallel I/O performance, the tunable parameters across the layers of the I/O software stack must be optimized. Finding an optimal configuration for a given scenario is difficult because the parameters interact in complex ways and the parameter space is large. Previous work has tuned these parameters with individual, independent algorithms; these approaches suffer from unstable performance results and slow convergence. This paper introduces OPRAEL, an auto-tuning approach for parallel I/O tasks that combines ensemble learning with performance modeling based on regression analysis. To test its effectiveness, we applied the approach on the Tianhe-II supercomputer using one well-known I/O benchmark (IOR) and two I/O kernels (S3D-I/O and BT-I/O). Leveraging our experience in predictive modeling, we optimized the tuning of the I/O stack parameters. Our experimental results show a 10.2x speedup in write performance for the BT-I/O optimization task with a 500x500x500 input. We also compared using a single search algorithm against using reinforcement learning search in the I/O parameter auto-optimization task. Our results show that OPRAEL outperforms the traditional approach, with up to an 8.4x improvement in write performance for the 128-process IOR optimization.
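The general idea of pairing several search strategies with a regression-based performance model can be illustrated with a toy sketch. This is not OPRAEL's implementation: the parameter names and ranges in `SPACE`, the synthetic `measure_bandwidth` function (a stand-in for actually running a benchmark such as IOR), the two search strategies, and the k-nearest-neighbour regressor used as the performance model are all assumptions chosen for brevity.

```python
import random

# Hypothetical tunable parameters; names and ranges are illustrative only.
SPACE = {
    "stripe_count": [1, 2, 4, 8, 16, 32],
    "stripe_size_mb": [1, 2, 4, 8, 16],
    "cb_nodes": [1, 2, 4, 8],
}

def measure_bandwidth(cfg):
    """Synthetic stand-in for a real I/O benchmark run (not measured data)."""
    return (
        100.0
        - 2.0 * abs(cfg["stripe_count"] - 16)
        - 3.0 * abs(cfg["stripe_size_mb"] - 4)
        - 5.0 * abs(cfg["cb_nodes"] - 4)
    )

def random_proposal(rng, history):
    """Search strategy 1: sample each parameter uniformly from the space."""
    return {name: rng.choice(values) for name, values in SPACE.items()}

def neighbor_proposal(rng, history):
    """Search strategy 2: shift one parameter of the best-known config by one step."""
    best_cfg, _ = max(history, key=lambda h: h[1])
    cfg = dict(best_cfg)
    name = rng.choice(sorted(SPACE))
    values = SPACE[name]
    i = values.index(cfg[name]) + rng.choice([-1, 1])
    cfg[name] = values[min(max(i, 0), len(values) - 1)]
    return cfg

def predict(cfg, history, k=3):
    """Performance model: k-nearest-neighbour regression over measured configs."""
    def dist(a, b):
        return sum(abs(SPACE[p].index(a[p]) - SPACE[p].index(b[p])) for p in SPACE)
    nearest = sorted(history, key=lambda h: dist(cfg, h[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)

def tune(budget=40, seed=0):
    """Each round, every strategy proposes a config; the model picks which to measure."""
    rng = random.Random(seed)
    first = random_proposal(rng, [])
    history = [(first, measure_bandwidth(first))]
    strategies = [random_proposal, neighbor_proposal]
    for _ in range(budget):
        proposals = [strategy(rng, history) for strategy in strategies]
        chosen = max(proposals, key=lambda c: predict(c, history))
        history.append((chosen, measure_bandwidth(chosen)))
    best = max(history, key=lambda h: h[1])
    return best, history[0]

if __name__ == "__main__":
    (best_cfg, best_bw), (first_cfg, first_bw) = tune()
    print("first:", first_cfg, first_bw)
    print("best: ", best_cfg, best_bw)
```

In this sketch the model only ranks proposals, so each round costs a single measurement regardless of how many strategies contribute candidates. On a real system, `measure_bandwidth` would be replaced by an actual benchmark run, which is the expensive step the model is meant to ration.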
Pages: 234-246
Page count: 13