Fingerprinting the Checker Policies of Parallel File Systems

Cited by: 6
Authors
Han, Runzhou [1]
Zhang, Duo [1]
Zheng, Mai [1]
Affiliation
[1] Iowa State Univ, Ames, IA 50011 USA
DOI
10.1109/PDSW51947.2020.00013
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Parallel file systems (PFSes) play an essential role in high-performance computing. To ensure integrity, many PFSes are designed with a checker component, which serves as the last line of defense for bringing a corrupted PFS back to a healthy state. Motivated by real-world incidents of PFS corruption, in this paper we perform a fine-grained study of the capability of PFS checkers. We apply type-aware fault injection to specific PFS structures and meticulously examine the detection and repair policies of PFS checkers via a well-defined taxonomy. Our results on two representative PFS checkers show that they can handle a wide range of corruptions on important data structures. On the other hand, neither of them is perfect: there are multiple cases where the checkers behave sub-optimally, leading to kernel panics, wrong repairs, and other problems. Our work has led to a new patch for Lustre. We hope to develop our methodology into a generic framework for analyzing the checkers of diverse PFSes and to enable more elegant designs of PFS checkers for reliable high-performance computing.
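The abstract names two ingredients, type-aware fault injection and a taxonomy of checker outcomes, without showing how they fit together. The sketch below is a hypothetical illustration of that idea, not the authors' tool: the `Record` type stands in for a real PFS structure, the per-type corruption values in `STRATEGIES` are hand-picked assumptions, and the five-way `Outcome` taxonomy and `toy_checker` are invented for this example. None of these names come from the paper, Lustre, or any real checker.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum
from typing import Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified metadata record (not a real Lustre structure)."""
    object_id: int   # identifier of the object the record describes
    link_count: int  # number of directory entries referring to the object
    name: str        # name stored in a directory entry


# Type-aware corruption (assumed values): each field type gets "plausible but
# wrong" values chosen for that type, instead of blind random bit flips.
STRATEGIES = {
    int: lambda v: [0, -1, v + 1, 2**32 - 1],
    str: lambda v: ["", v[::-1], v + "\x00junk"],
}


def inject(record: Record) -> Iterator[Tuple[str, object, Record]]:
    """Yield (field name, injected value, corrupted copy) for every field."""
    for f in fields(record):
        original = getattr(record, f.name)
        for bad in STRATEGIES[f.type](original):
            if bad != original:  # only keep values that actually corrupt the field
                yield f.name, bad, replace(record, **{f.name: bad})


class Outcome(Enum):
    """Hypothetical taxonomy of how a checker reacts to one injected fault."""
    NOT_DETECTED = "not reported, corruption left in place"
    DETECTED_NOT_REPAIRED = "reported, corruption left in place"
    CORRECT_REPAIR = "restored to the original value"
    WRONG_REPAIR = "changed, but not back to the original value"
    CHECKER_FAILURE = "checker crashed or produced no result"


def classify(original: Record, corrupted: Record,
             after: Optional[Record], detected: bool) -> Outcome:
    """Compare the record before injection, after injection, and after checking."""
    if after is None:
        return Outcome.CHECKER_FAILURE
    if after == original:
        return Outcome.CORRECT_REPAIR
    if after == corrupted:
        return Outcome.DETECTED_NOT_REPAIRED if detected else Outcome.NOT_DETECTED
    return Outcome.WRONG_REPAIR


def toy_checker(r: Record) -> Tuple[bool, Optional[Record]]:
    """Hypothetical checker: flags only non-positive link counts and resets them to 1."""
    if r.link_count < 1:
        return True, replace(r, link_count=1)
    return False, r


original = Record(object_id=7, link_count=2, name="data")
tally: Dict[Outcome, int] = {}
for field_name, bad_value, corrupted in inject(original):
    detected, after = toy_checker(corrupted)
    outcome = classify(original, corrupted, after, detected)
    tally[outcome] = tally.get(outcome, 0) + 1
    print(f"{field_name}={bad_value!r}: {outcome.name}")
```

With the original link count set to 2, the toy checker's blanket "reset to 1" rule turns both detected link-count corruptions into wrong repairs, while every corruption of the other fields goes undetected. Comparing three snapshots (before injection, after injection, after checking) is what lets the classifier separate "no repair" from "wrong repair".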
Pages: 46–51
Page count: 6