Understanding HPC Application I/O Behavior Using System Level Statistics

Cited by: 15
Authors
Paul, Arnab K. [1 ]
Faaland, Olaf [2 ]
Moody, Adam [2 ]
Gonsiorowski, Elsa [2 ]
Mohror, Kathryn [2 ]
Butt, Ali R. [1 ]
Affiliations
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
Source
2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC 2020) | 2020
Funding
U.S. National Science Foundation
Keywords
FILE-ACCESS;
DOI
10.1109/HiPC50609.2020.00034
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
The processor performance of high performance computing (HPC) systems is increasing at a much higher rate than storage performance. This imbalance leads to I/O performance bottlenecks in massively parallel HPC applications. Therefore, there is a need for improvements in storage and file system designs to meet the ever-growing I/O needs of HPC applications. Storage and file system designers require a deep understanding of how HPC application I/O behavior affects current storage system installations in order to improve them. In this work, we contribute to this understanding using application-agnostic file system statistics gathered on compute nodes as well as metadata and object storage file system servers. We analyze file system statistics of more than 4 million jobs over a period of three years on two systems at Lawrence Livermore National Laboratory that include a 15 PiB Lustre file system for storage. The results of our study add to the state of the art in I/O understanding by providing insight into how general HPC workloads affect the performance of large-scale storage systems. Key observations from our study show that reads and writes are evenly distributed across the storage system; applications that perform I/O spread it across approximately 78% of the minutes of their runtime on average; less than 22% of HPC users who submit write-intensive jobs perform efficient writes to the file system; and I/O contention seriously impacts I/O performance.
Pages: 202 - 211
Page count: 10
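
For readers who want to experiment with this style of measurement, the sketch below (Python; not the authors' tooling) polls a Lustre client's llite statistics once per minute and reports the fraction of observed minutes in which any read or write traffic occurred, mirroring the "fraction of runtime minutes with I/O" observation in the abstract. The `lctl get_param -n llite.*.stats` invocation and the counter layout (a cumulative byte sum as the last field of the read_bytes/write_bytes lines) are assumptions about a typical Lustre client, not details confirmed by the paper.

#!/usr/bin/env python3
"""Sketch: estimate the fraction of runtime minutes with I/O activity
on a Lustre client, using cumulative llite byte counters.

Assumptions (not taken from the paper):
  * `lctl get_param -n llite.*.stats` prints lines such as
        read_bytes  123 samples [bytes] 4096 1048576 987654321
    where the last field is the cumulative byte sum.
  * One sample per minute approximates the per-minute granularity
    reported in the study.
"""
import re
import subprocess
import time

# Capture the trailing cumulative byte sum of read/write counter lines.
STAT_RE = re.compile(r"^(read_bytes|write_bytes)\s.*\s(\d+)\s*$")

def cumulative_bytes() -> int:
    """Sum the cumulative read+write byte counters across llite mounts."""
    out = subprocess.run(
        ["lctl", "get_param", "-n", "llite.*.stats"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sum(
        int(m.group(2))
        for line in out.splitlines()
        if (m := STAT_RE.match(line.strip()))
    )

def monitor(minutes: int = 10) -> float:
    """Poll once per minute; return the fraction of minutes with I/O."""
    active = 0
    prev = cumulative_bytes()
    for _ in range(minutes):
        time.sleep(60)
        cur = cumulative_bytes()
        if cur > prev:  # some bytes were read or written this minute
            active += 1
        prev = cur
    return active / minutes

if __name__ == "__main__":
    print(f"I/O-active minutes: {monitor(minutes=10):.0%}")

One-minute sampling is a deliberate simplification: it matches the coarse, application-agnostic granularity the study reports against, whereas finer intervals would change what counts as an "active" minute.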