On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems

Cited by: 69
Authors
Yildiz, Orcun [1]
Dorier, Matthieu [2]
Ibrahim, Shadi [1]
Ross, Rob [2]
Antoniu, Gabriel [1]
Affiliations
[1] INRIA Rennes Bretagne Atlant, Rennes, France
[2] Argonne Natl Lab, Argonne, IL 60439 USA
Keywords
Exascale I/O; Parallel File Systems; Cross-Application Contention; Interference
DOI
10.1109/IPDPS.2016.50
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As we move toward the exascale era, performance variability in HPC systems remains a challenge. I/O interference, a major cause of this variability, is becoming more important every day with the growing number of concurrent applications that share larger machines. Earlier research efforts on mitigating I/O interference focus on a single potential cause of interference (e.g., the network). Yet the root causes of I/O interference can be diverse. In this work, we conduct an extensive experimental campaign to explore the various root causes of I/O interference in HPC storage systems. We use microbenchmarks on the Grid'5000 testbed to evaluate how the applications' access pattern, the network components, the file system's configuration, and the backend storage devices influence I/O interference. Our studies reveal that in many situations interference is a result of bad flow control in the I/O path, rather than being caused by some single bottleneck in one of its components. We further show that interference-free behavior is not necessarily a sign of optimal performance. To the best of our knowledge, our work provides the first deep insight into the role of each of the potential root causes of interference and their interplay. Our findings can help developers and platform owners improve I/O performance and motivate further research addressing the problem across all components of the I/O stack.
Pages: 750-759
Page count: 10