A machine-learning-guided framework for fault-tolerant DNNs

Cited by: 4
Authors
Traiola, Marcello [1]
Kritikakou, Angeliki [1]
Sentieys, Olivier [1]
Affiliations
[1] Univ Rennes, INRIA, CNRS, IRISA, Rennes, France
Keywords
Reliability Analysis; Fault Tolerance; Machine Learning; Neural Networks; Error
DOI
10.23919/DATE56975.2023.10137033
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Deep Neural Networks (DNNs) show promising performance in several application domains. Nevertheless, DNN results may be incorrect, not only because of the network's intrinsic inaccuracy, but also because of faults affecting the hardware. Ensuring the fault tolerance of DNNs is crucial, but common fault-tolerance approaches are not cost-effective because of their prohibitive overheads for large DNNs. This work proposes a comprehensive framework to assess the fault tolerance of DNN parameters and protect them cost-effectively. As a first step, the framework performs a statistical fault injection. In the second step, the results are used with classification-based machine learning methods to obtain a bit-accurate prediction of the criticality of all network parameters. Finally, Error Correction Codes (ECCs) are selectively inserted to protect only the critical parameters, which keeps the cost low. Using the proposed framework, we explored and protected two Convolutional Neural Networks (CNNs), each with four different data encodings. The results show that it is possible to protect the critical network parameters with selective ECCs while saving up to 79% of memory with respect to conventional ECC approaches.
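The abstract's first and third steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes parameters stored as IEEE 754 float32 words, a single bit-flip fault model, and a SEC-DED extended Hamming code applied per word; the helper names `flip_bit`, `secded_check_bits`, and `ecc_overhead_bits` are hypothetical and chosen only for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value` (bit 0 = LSB, bit 31 = sign).

    Assumption: a single bit flip in a stored parameter is the fault model.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a SEC-DED extended Hamming code for `data_bits` data bits."""
    r = 0
    # Hamming bound for single-error correction: 2^r >= data_bits + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit adds double-error detection


def ecc_overhead_bits(n_words: int, critical_fraction: float, data_bits: int = 32) -> int:
    """ECC storage (in bits) when only a fraction of the words is protected.

    Assumption: protection granularity is one whole word, which is coarser
    than the bit-accurate criticality described in the abstract.
    """
    protected_words = round(n_words * critical_fraction)
    return protected_words * secded_check_bits(data_bits)
```

For example, `flip_bit(1.0, 31)` returns `-1.0` (sign bit) and `flip_bit(1.0, 30)` returns `inf` (most significant exponent bit), which shows why some bit positions are far more critical than others. Under the stated assumptions, `ecc_overhead_bits(1000, 1.0)` gives 7000 check bits for full protection, against 1750 for `ecc_overhead_bits(1000, 0.25)`.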
Pages: 2