Fingerprinting the Checker Policies of Parallel File Systems

Cited by: 6
Authors
Han, Runzhou [1]
Zhang, Duo [1]
Zheng, Mai [1]
Affiliation
[1] Iowa State Univ, Ames, IA 50011 USA
DOI
10.1109/PDSW51947.2020.00013
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Parallel file systems (PFSes) play an essential role in high-performance computing. To ensure integrity, many PFSes include a checker component, which serves as the last line of defense for bringing a corrupted PFS back to a healthy state. Motivated by real-world incidents of PFS corruption, we perform a fine-grained study of the capability of PFS checkers. We apply type-aware fault injection to specific PFS structures and examine the checkers' detection and repair policies through a well-defined taxonomy. Results for two representative PFS checkers show that they can handle a wide range of corruptions of important data structures. However, neither is perfect: in multiple cases the checkers behave sub-optimally, leading to kernel panics, wrong repairs, and similar problems. Our work has led to a new patch for Lustre. We hope to develop our methodology into a generic framework for analyzing the checkers of diverse PFSes and to enable more elegant designs of PFS checkers for reliable high-performance computing.
Pages: 46-51
Number of pages: 6