A machine-learning-guided framework for fault-tolerant DNNs

Cited by: 4
Authors
Traiola, Marcello [1]
Kritikakou, Angeliki [1]
Sentieys, Olivier [1]
Affiliations
[1] Univ Rennes, INRIA, CNRS, IRISA, Rennes, France
Keywords
Reliability Analysis; Fault Tolerance; Machine Learning; Neural Networks; ERROR;
DOI
10.23919/DATE56975.2023.10137033
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deep Neural Networks (DNNs) show promising performance in several application domains. Nevertheless, DNN results may be incorrect, not only because of the network's intrinsic inaccuracy, but also because of faults affecting the hardware. Ensuring the fault tolerance of DNNs is crucial, but common fault-tolerance approaches are not cost-effective because their overheads are prohibitive for large DNNs. This work proposes a comprehensive framework to assess the fault tolerance of DNN parameters and protect them cost-effectively. As a first step, the proposed framework performs a statistical fault injection. In the second step, the results are used with classification-based machine learning methods to obtain a bit-accurate prediction of the criticality of all network parameters. Last, Error Correction Codes (ECCs) are selectively inserted to protect only the critical parameters, hence entailing low cost. Using the proposed framework, we explored and protected two Convolutional Neural Networks (CNNs), each with four different data encodings. The results show that it is possible to protect the critical network parameters with selective ECCs while saving up to 79% memory compared with conventional ECC approaches.
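The three steps described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the "network" is a single linear layer with 8-bit two's-complement weights, the classifier is a deliberately trivial stand-in (a per-bit-position majority vote over the injection results), and the ECC cost is estimated with the single-error-correcting Hamming bound. All function names are hypothetical.

```python
import random

WORD_BITS = 8  # assumption: 8-bit two's-complement weights


def flip_bit(value, bit):
    """Flip one bit of a signed 8-bit integer and return the signed result."""
    raw = (value & 0xFF) ^ (1 << bit)
    return raw - 256 if raw >= 128 else raw


def predict(weights, x):
    """Toy 'network': one linear layer followed by argmax over the classes."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))


def is_critical(weights, inputs, golden, row, col, bit):
    """A bit flip counts as critical if it changes at least one prediction."""
    original = weights[row][col]
    weights[row][col] = flip_bit(original, bit)
    changed = any(predict(weights, x) != g for x, g in zip(inputs, golden))
    weights[row][col] = original  # restore the fault-free value
    return changed


def hamming_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def run_sketch(seed=0, n_classes=4, n_inputs=16, n_samples=200):
    rng = random.Random(seed)
    weights = [[rng.randint(-128, 127) for _ in range(n_inputs)]
               for _ in range(n_classes)]
    inputs = [[rng.randint(0, 15) for _ in range(n_inputs)] for _ in range(50)]
    golden = [predict(weights, x) for x in inputs]  # fault-free reference

    # Step 1: statistical fault injection on a random sample of fault sites.
    sites = [(r, c, b) for r in range(n_classes)
             for c in range(n_inputs) for b in range(WORD_BITS)]
    sample = rng.sample(sites, n_samples)
    labels = {s: is_critical(weights, inputs, golden, *s) for s in sample}

    # Step 2: stand-in classifier. A bit position is predicted critical when
    # most sampled faults at that position were critical. Positions with no
    # samples are conservatively treated as critical.
    critical_bits = []
    for b in range(WORD_BITS):
        outcomes = [crit for (_, _, bit), crit in labels.items() if bit == b]
        if not outcomes or sum(outcomes) / len(outcomes) > 0.5:
            critical_bits.append(b)

    # Step 3: selective ECC. Only the predicted-critical bit positions of each
    # word are covered, so fewer check bits are needed than for the full word.
    return {
        "critical_bits": critical_bits,
        "check_bits_selective": hamming_check_bits(len(critical_bits)),
        "check_bits_full": hamming_check_bits(WORD_BITS),
    }


if __name__ == "__main__":
    print(run_sketch())
```

The sketch only shows how the steps connect: injection results on a sample become training labels, the classifier generalizes them to every bit, and the ECC is sized for the bits predicted critical. It makes no claim about the classifiers, encodings, or ECC schemes the authors actually used.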
Pages: 2