Understanding HPC Application I/O Behavior Using System Level Statistics

Cited by: 15
Authors
Paul, Arnab K. [1]
Faaland, Olaf [2]
Moody, Adam [2]
Gonsiorowski, Elsa [2]
Mohror, Kathryn [2]
Butt, Ali R. [1]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
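Funding
U.S. National Science Foundation
Keywords
FILE-ACCESS
DOI
10.1109/HiPC50609.2020.00034
Chinese Library Classification
TP3 [Computing technology, computer technology]
Subject classification code
0812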
Abstract
The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance bottlenecks in massively parallel HPC applications. Therefore, there is a need for improvements in storage and file system designs to meet the ever-growing I/O needs of HPC applications. Storage and file system designers require a deep understanding of how HPC application I/O behavior affects current storage system installations in order to improve them. In this work, we contribute to this understanding using application-agnostic file system statistics gathered on compute nodes as well as on metadata and object storage file system servers. We analyze file system statistics of more than 4 million jobs over a period of three years on two systems at Lawrence Livermore National Laboratory that include a 15 PiB Lustre file system for storage. The results of our study add to the state of the art in I/O understanding by providing insight into how general HPC workloads affect the performance of large-scale storage systems. Some key observations in our study show that reads and writes are evenly distributed across the storage system; applications that perform I/O spread that I/O across ~78% of the minutes of their runtime on average; less than 22% of HPC users who submit write-intensive jobs perform efficient writes to the file system; and I/O contention seriously impacts I/O performance.
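The finding that I/O-performing applications spread their I/O across roughly 78% of their runtime minutes rests on a simple per-job ratio. The sketch below shows one way such a metric could be computed. It is not the authors' analysis pipeline. It assumes that each job has a list of per-minute read+write byte counts, and the data layout, job IDs, and function names are hypothetical.

```python
from statistics import mean

def io_minute_fraction(per_minute_bytes):
    """Fraction of a job's runtime minutes with nonzero I/O.

    per_minute_bytes: hypothetical list of read+write byte counts, one per minute.
    """
    if not per_minute_bytes:
        return 0.0
    active = sum(1 for b in per_minute_bytes if b > 0)
    return active / len(per_minute_bytes)

def mean_io_spread(jobs):
    """Average I/O-minute fraction over jobs that performed any I/O.

    jobs: hypothetical mapping of job ID -> per-minute byte counts.
    Jobs with no I/O at all are excluded, so the average covers only
    "applications that perform I/O".
    """
    fractions = [io_minute_fraction(m) for m in jobs.values() if any(b > 0 for b in m)]
    return mean(fractions) if fractions else 0.0

jobs = {
    "job_a": [0, 512, 1024, 0],    # 2 of 4 minutes with I/O -> 0.5
    "job_b": [4096, 4096, 4096],   # 3 of 3 minutes with I/O -> 1.0
    "job_c": [0, 0, 0],            # no I/O, excluded from the average
}
print(mean_io_spread(jobs))  # 0.75
```

Excluding jobs with no I/O before averaging is a deliberate choice in this sketch. Including them would pull the average toward zero and would measure something other than how I/O-performing jobs distribute their I/O over time.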
Pages: 202-211
Number of pages: 10