Understanding HPC Application I/O Behavior Using System Level Statistics

Cited: 15
Authors
Paul, Arnab K. [1 ]
Faaland, Olaf [2 ]
Moody, Adam [2 ]
Gonsiorowski, Elsa [2 ]
Mohror, Kathryn [2 ]
Butt, Ali R. [1 ]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
Source
2020 IEEE 27TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2020) | 2020
Funding
National Science Foundation (USA)
Keywords
FILE-ACCESS;
DOI
10.1109/HiPC50609.2020.00034
CLC Number
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance bottlenecks in massively parallel HPC applications. Therefore, there is a need for improvements in storage and file system designs to meet the ever-growing I/O needs of HPC applications. Storage and file system designers require a deep understanding of how HPC application I/O behavior affects current storage system installations in order to improve them. In this work, we contribute to this understanding using application-agnostic file system statistics gathered on compute nodes as well as metadata and object storage file system servers. We analyze file system statistics of more than 4 million jobs over a period of three years on two systems at Lawrence Livermore National Laboratory that include a 15 PiB Lustre file system for storage. The results of our study add to the state of the art in I/O understanding by providing insight into how general HPC workloads affect the performance of large-scale storage systems. Some key observations in our study show that reads and writes are evenly distributed across the storage system; applications that perform I/O spread that I/O across ~78% of the minutes of their runtime on average; fewer than 22% of HPC users who submit write-intensive jobs perform efficient writes to the file system; and I/O contention seriously impacts I/O performance.
Pages: 202-211
Page count: 10
Related Papers
50 entries in total
  • [21] Understanding System Level Caching Behavior in Multimedia SoC
    Karandikar, Prashant
    Mody, Mihir
    Sanghvi, Hetul
    Easwaran, Vasant
    Shankar, Prithvi Y. A.
    Gulati, Rahul
    Nandan, Neeraj
    Manda, Dipan I.
    Das, Subrangshu
    2014 INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND SIGNAL PROCESSING (ICCSP), 2014,
  • [22] On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems
    Yildiz, Orcun
    Dorier, Matthieu
    Ibrahim, Shadi
    Ross, Rob
    Antoniu, Gabriel
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, : 750 - 759
  • [23] Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms
    Bez, Jean Luca
    Miranda, Alberto
    Nou, Ramon
    Boito, Francieli Zanon
    Cortes, Toni
    Navaux, Philippe
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 577 - 586
  • [24] Accelerating memory and I/O intensive HPC applications using hardware compression
    AlSaleh, Saleh
    Elrabaa, Muhammad E. S.
    El-Maleh, Aiman
    Daud, Khaled
    Hroub, Ayman
    Mudawar, Muhamed
    Tonellot, Thierry
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2024, 193
  • [25] uMMAP-IO: User-level Memory-mapped I/O for HPC
    Rivas-Gomez, Sergio
    Fanfarillo, Alessandro
    Valat, Sebastien
    Laferriere, Christophe
    Couvee, Philippe
    Narasimhamurthy, Sai
    Markidis, Stefano
    2019 IEEE 26TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC), 2019, : 363 - 372
  • [26] Failure prediction using machine learning in a virtualised HPC system and application
    Mohammed, Bashir
    Awan, Irfan
    Ugail, Hassan
    Younas, Muhammad
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (02): 471 - 485
  • [27] Failure prediction using machine learning in a virtualised HPC system and application
    Bashir Mohammed
    Irfan Awan
    Hassan Ugail
    Muhammad Younas
    Cluster Computing, 2019, 22 : 471 - 485
  • [28] Using system-level models to evaluate I/O subsystem designs
    Ganger, GR
    Patt, YN
    IEEE TRANSACTIONS ON COMPUTERS, 1998, 47 (06) : 667 - 678
  • [29] I/I REHAB SUCCESS COMES WITH UNDERSTANDING SYSTEM BEHAVIOR.
    Nogaj, Richard J.
  • [30] Using AWS EC2 as Test-Bed infrastructure in the I/O system configuration for HPC applications
    Gomez-Sanchez, Pilar
    Encinas, Diego
    Panadero, Javier
    Bezerra, Aprigio
    Mendez, Sandra
    Naiouf, Marcelo
    De Giusti, Armando
    Rexachs, Dolores
    Luque, Emilio
    JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2016, 16 (02): : 65 - 75