Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning

Authors
Liu, Zhangyu [1]
Zhang, Cheng [1]
Wu, Huijun [2]
Fang, Jianbin [2]
Peng, Lin [2]
Ye, Guixin
Tang, Zhanyong [1]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
HPC; Parallel I/O; Performance Optimization; Auto-tuning; Ensemble Learning;
DOI
10.1109/CLUSTER52292.2023.00027
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Improving parallel I/O performance requires tuning the adjustable parameters at every layer of the I/O software stack. Finding an optimal configuration for a given scenario is difficult because the parameters interact in complex ways and the parameter space is large. Previous work has tuned these parameters with individual algorithms, but such approaches suffer from unstable performance results and slow convergence. This paper introduces OPRAEL, an auto-tuning approach for parallel I/O tasks that combines ensemble learning with regression-based performance modeling. To test its effectiveness, we applied the approach on the Tianhe-II supercomputer using one well-known I/O benchmark (IOR) and two I/O kernels (S3D-I/O and BT-I/O). Leveraging our experience in predictive modeling, we optimized the tuning of the I/O stack parameters. Our experimental results show a 10.2x write-performance speedup for the BT-I/O optimization task with a 500x500x500 input. We also compared the potential of a single search algorithm with that of reinforcement learning search on the I/O parameter auto-optimization task. Our results show that OPRAEL outperforms the traditional approach, with a maximum 8.4x improvement in write performance for the 128-process IOR optimization.
Pages: 234-246
Page count: 13