Towards I/O analysis of HPC systems and a generic architecture to collect access patterns

Cited by: 11
Authors
Wiedemann, Marc C. [1, 2]
Kunkel, Julian M. [2 ]
Zimmer, Michaela [2 ]
Ludwig, Thomas [2 ]
Resch, Michael [3 ]
Boenisch, Thomas [3 ]
Wang, Xuan [3 ]
Chut, Andriy [3 ]
Aguilera, Alvaro [4 ]
Nagel, Wolfgang E. [4 ]
Kluge, Michael [4 ]
Mickler, Holger [4 ]
Affiliations
[1] Bundesstr 45a, D-20146 Hamburg, Germany
[2] Univ Hamburg, Deutsch Klimarechenzentrum GmbH, Hamburg, Germany
[3] Univ Stuttgart, High Performance Comp Ctr Stuttgart HLRS, Stuttgart, Germany
[4] Tech Univ Dresden, Zentrum Informationsdienste & Hochleistungsrechne, Dresden, Germany
Keywords
I/O analysis; I/O path; Causality tree
DOI
10.1007/s00450-012-0221-5
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In high-performance computing applications, a single high-level I/O call triggers activity on a multitude of hardware components. HPC systems are massively parallel machines backed by huge storage systems and several internal software layers, and the complex interplay of these parts currently makes it impossible to identify the causes and locations of I/O bottlenecks. Existing tools indicate when a bottleneck occurs but provide little guidance in identifying its cause or improving the situation. We have thus initiated Scalable I/O for Extreme Performance (SIOX) to find solutions to this problem. In SIOX, we will build a system that records access information on all layers and components, recognizes access patterns, and characterizes the I/O system. The system will ultimately be able to recognize the causes of I/O bottlenecks and propose optimizations for the I/O middleware that improve I/O performance, such as throughput and latency. Furthermore, the SIOX system will support decision making when planning new I/O systems. In this paper, we introduce the SIOX system and describe its current status. We first outline our approach for collecting the required access information. We then present the architectural concept, the methods for reconstructing the I/O path, and an excerpt of the interface for data collection. The paper focuses on the architecture, which collects and combines the relevant access information along the I/O path and is responsible for transferring this information efficiently. An abstract modelling approach helps us better understand the complexity of analysing I/O activities on parallel computing systems, and an abstract interface allows us to adapt the SIOX system to various HPC file systems.
Pages: 241-251
Page count: 11
Related papers
50 in total
  • [21] On the Load Imbalance Problem of I/O Forwarding Layer in HPC Systems
    Yu, Jie
    Liu, Guangming
    Dong, Wenrui
    Li, Xiaoyong
    Zhang, Jian
    Sun, Fuxing
    PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2017, : 2424 - 2428
  • [22] An empirical study of I/O separation for burst buffers in HPC systems
    Koo, Donghun
    Lee, Jaehwan
    Liu, Jialin
    Byun, Eun-Kyu
    Kwak, Jae-Hyuck
    Lockwood, Glenn K.
    Hwang, Soonwook
    Antypas, Katie
    Wu, Kesheng
    Eom, Hyeonsang
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 148 : 96 - 108
  • [23] Uncovering Access, Reuse, and Sharing Characteristics of I/O-Intensive Files on Large-Scale Production HPC Systems
    Patel, Tirthak
    Byna, Suren
    Lockwood, Glenn K.
    Wright, Nicholas J.
    Carns, Philip
    Ross, Robert
    Tiwari, Devesh
    PROCEEDINGS OF THE 18TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2020, : 91 - 101
  • [24] Grid data access architecture based on application I/O phases and I/O communities
    Perez, JM
    Carretero, J
    Garcia, JD
    Sanchez, LM
    PDPTA '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS 1-3, 2004, : 568 - 574
  • [25] Characterizing I/O Workloads of HPC Applications Through Online Analysis
    Dong, Wenrui
    Liu, Guangming
    Yu, Jie
    Zuo, You
    2015 IEEE 34TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2015,
  • [26] Optimizing HPC I/O Performance with Regression Analysis and Ensemble Learning
    Liu, Zhangyu
    Zhang, Cheng
    Wu, Huijun
    Fang, Jianbin
    Peng, Lin
    Ye, Guixin
    Tang, Zhanyong
    2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER, 2023, : 234 - 246
  • [27] Modeling Power Consumption of Lossy Compressed I/O for Exascale HPC Systems
    Wilkins, Grant
    Calhoun, Jon C.
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 1118 - 1126
  • [28] Characterizing Machine Learning I/O Workloads on Leadership Scale HPC Systems
    Paul, Arnab K.
    Karimi, Ahmad Maroof
    Wang, Feiyi
    29TH INTERNATIONAL SYMPOSIUM ON THE MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2021), 2021, : 198 - 205
  • [29] Spatio-temporal Analysis of HPC I/O and Connection Data
    Kim, Jinoh
    Choi, Jinhwan
    Sim, Alex
    2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 1585 - 1588
  • [30] HPC I/O Throughput Bottleneck Analysis with Explainable Local Models
    Isakov, Mihailo
    del Rosario, Eliakin
    Madireddy, Sandeep
    Balaprakash, Prasanna
    Carns, Philip
    Ross, Robert B.
    Kinsy, Michel A.
    PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020,