Understanding HPC Application I/O Behavior Using System Level Statistics

Cited: 15
Authors
Paul, Arnab K. [1 ]
Faaland, Olaf [2 ]
Moody, Adam [2 ]
Gonsiorowski, Elsa [2 ]
Mohror, Kathryn [2 ]
Butt, Ali R. [1 ]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
Source
2020 IEEE 27TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC 2020) | 2020
Funding
National Science Foundation (USA)
Keywords
FILE-ACCESS;
DOI
10.1109/HiPC50609.2020.00034
CLC Number
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance bottlenecks in massively parallel HPC applications. Therefore, there is a need for improvements in storage and file system designs to meet the ever-growing I/O needs of HPC applications. Storage and file system designers require a deep understanding of how HPC application I/O behavior affects current storage system installations in order to improve them. In this work, we contribute to this understanding using application-agnostic file system statistics gathered on compute nodes as well as metadata and object storage file system servers. We analyze file system statistics of more than 4 million jobs over a period of three years on two systems at Lawrence Livermore National Laboratory that include a 15 PiB Lustre file system for storage. The results of our study add to the state of the art in I/O understanding by providing insight into how general HPC workloads affect the performance of large-scale storage systems. Some key observations in our study show that reads and writes are evenly distributed across the storage system; applications that perform I/O spread that I/O across ~78% of the minutes of their runtime on average; fewer than 22% of HPC users who submit write-intensive jobs perform efficient writes to the file system; and I/O contention seriously impacts I/O performance.
Pages: 202-211
Page count: 10
Related Papers
50 entries in total
  • [21] Understanding System Level Caching Behavior in Multimedia SoC
    Karandikar, Prashant
    Mody, Mihir
    Sanghvi, Hetul
    Easwaran, Vasant
    Shankar, Prithvi Y. A.
    Gulati, Rahul
    Nandan, Neeraj
    Manda, Dipan I.
    Das, Subrangshu
    2014 INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND SIGNAL PROCESSING (ICCSP), 2014,
  • [22] On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems
    Yildiz, Orcun
    Dorier, Matthieu
    Ibrahim, Shadi
    Ross, Rob
    Antoniu, Gabriel
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, : 750 - 759
  • [23] Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms
    Bez, Jean Luca
    Miranda, Alberto
    Nou, Ramon
    Boito, Francieli Zanon
    Cortes, Toni
    Navaux, Philippe
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 577 - 586
  • [24] Accelerating memory and I/O intensive HPC applications using hardware compression
    AlSaleh, Saleh
    Elrabaa, Muhammad E. S.
    El-Maleh, Aiman
    Daud, Khaled
    Hroub, Ayman
    Mudawar, Muhamed
    Tonellot, Thierry
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2024, 193
  • [25] uMMAP-IO: User-level Memory-mapped I/O for HPC
    Rivas-Gomez, Sergio
    Fanfarillo, Alessandro
    Valat, Sebastien
    Laferriere, Christophe
    Couvee, Philippe
    Narasimhamurthy, Sai
    Markidis, Stefano
    2019 IEEE 26TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC), 2019, : 363 - 372
  • [26] Failure prediction using machine learning in a virtualised HPC system and application
    Mohammed, Bashir
    Awan, Irfan
    Ugail, Hassan
    Younas, Muhammad
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (02): 471 - 485
  • [27] Failure prediction using machine learning in a virtualised HPC system and application
    Bashir Mohammed
    Irfan Awan
    Hassan Ugail
    Muhammad Younas
    Cluster Computing, 2019, 22 : 471 - 485
  • [28] Using system-level models to evaluate I/O subsystem designs
    Ganger, GR
    Patt, YN
    IEEE TRANSACTIONS ON COMPUTERS, 1998, 47 (06) : 667 - 678
  • [29] I/I REHAB SUCCESS COMES WITH UNDERSTANDING SYSTEM BEHAVIOR.
    Nogaj, Richard J.
  • [30] Using AWS EC2 as Test-Bed infrastructure in the I/O system configuration for HPC applications
    Gomez-Sanchez, Pilar
    Encinas, Diego
    Panadero, Javier
    Bezerra, Aprigio
    Mendez, Sandra
    Naiouf, Marcelo
    De Giusti, Armando
    Rexachs, Dolores
    Luque, Emilio
    JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2016, 16 (02): : 65 - 75