Byzantine fault tolerance in distributed machine learning: a survey

Citations: 0
Authors
Bouhata, Djamila [1,2]
Moumen, Hamouma [1,2]
Mazari, Jocelyn Ahmed [3,4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna, Comp Sci Dept, 2 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; SUBGRADIENT METHODS; COORDINATE DESCENT; GRADIENT DESCENT; AGREEMENT; GENERALS;
DOI
10.1080/0952813X.2024.2391778
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Byzantine Fault Tolerance (BFT) is crucial for ensuring the resilience of Distributed Machine Learning (DML) systems during training under adversarial conditions. Within the growing body of research on BFT in DML, there is no comprehensive classification of techniques or broad analysis of different approaches. This paper provides an in-depth survey of recent advancements in BFT for DML, with a focus on first-order optimisation methods during the training phase, particularly the widely used Stochastic Gradient Descent (SGD). We offer a novel classification of BFT approaches based on characteristics such as the communication process, optimisation method, and topology setting. This classification aims to enhance the understanding of various BFT methods and guide future research in addressing open challenges in the field. This work provides the foundations for developing robust BFT systems, using a variety of optimisation methods to strengthen resilience.
Pages: 59
Related papers
50 in total
  • [21] A survey of methods for distributed machine learning
    Peteiro-Barral, Diego
    Guijarro-Berdinas, Bertha
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2013, 2 (01) : 1 - 11
  • [22] Fault Tolerance of Cloud Infrastructure with Machine Learning
    Kalaskar, Chetankumar
    Thangam, S.
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2023, 23 (04) : 26 - 50
  • [23] On misbehaviour and fault tolerance in machine learning systems
    Myllyaho, Lalli
    Raatikainen, Mikko
    Mannisto, Tomi
    Nurminen, Jukka K.
    Mikkonen, Tommi
    JOURNAL OF SYSTEMS AND SOFTWARE, 2022, 183
  • [24] Byzantine Fault Tolerance of Regenerating Codes
    Oggier, Frederique
    Datta, Anwitaman
    2011 IEEE INTERNATIONAL CONFERENCE ON PEER-TO-PEER COMPUTING (P2P), 2011, : 112 - 121
  • [25] CloudBFT: Elastic Byzantine Fault Tolerance
    Nogueira, Rodrigo
    Araujo, Filipe
    Barbosa, Raul
    2014 20TH IEEE PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING (PRDC 2014), 2014, : 180 - 189
  • [26] RBFT: Redundant Byzantine Fault Tolerance
    Aublin, Pierre-Louis
    Ben Mokhtar, Sonia
    Quema, Vivien
    2013 IEEE 33RD INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2013, : 297 - 306
  • [27] Efficient Byzantine Fault-Tolerance
    Veronese, Giuliana Santos
    Correia, Miguel
    Bessani, Alysson Neves
    Lung, Lau Cheuk
    Verissimo, Paulo
    IEEE TRANSACTIONS ON COMPUTERS, 2013, 62 (01) : 16 - 30
  • [28] Byzantine fault tolerance for nondeterministic applications
    Zhao, Wenbing
    DASC 2007: THIRD IEEE INTERNATIONAL SYMPOSIUM ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, PROCEEDINGS, 2007, : 108 - 115
  • [29] Byzantine fault tolerance for agent systems
    Araragi, Tadashi
    DEPCOS-RELCOMEX 2006, 2006, : 232 - 239
  • [30] Byzantine fault tolerance can be fast
    Castro, M
    Liskov, B
    INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS, 2001, : 513 - 518