Federated learning with workload-aware client scheduling in heterogeneous systems

Cited by: 13

Authors:

Li, Li [1]

Liu, Duo [1]

Duan, Moming [1]

Zhang, Yu [1]

Ren, Ao [1]

Chen, Xianzhang [1]

Tan, Yujuan [1]

Wang, Chengliang [1]

Affiliations:

[1] Chongqing Univ, Coll Comp Sci, Chongqing, Peoples R China

Source:

NEURAL NETWORKS | 2022 / Vol. 154

Funding:

National Natural Science Foundation of China;

Keywords:

Federated learning; Distributed machine learning; Heterogeneous systems; Neural Networks;

DOI:

10.1016/j.neunet.2022.07.030

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Federated Learning (FL) is a novel distributed machine learning paradigm that allows thousands of edge devices to train models locally without uploading data to the central server. Since devices in real federated settings are resource-constrained, FL encounters systems heterogeneity, which causes considerable stragglers and incurs significant accuracy degradation. To tackle the challenges of systems heterogeneity and improve the robustness of the global model, we propose a novel adaptive federated framework in this paper. Specifically, we propose FedSAE, which leverages the workload completion history of clients to adaptively predict the affordable training workload for each device. Consequently, FedSAE can significantly reduce stragglers in highly heterogeneous systems. We incorporate Active Learning into FedSAE to dynamically schedule participants. The server evaluates each device's training value based on its training loss in each round, and higher-value clients are selected with a higher probability. As a result, model convergence is accelerated. Furthermore, we propose q-FedSAE, which combines FedSAE and q-FFL to improve global fairness in highly heterogeneous systems. Evaluations conducted in a highly heterogeneous system demonstrate that both FedSAE and q-FedSAE converge faster than FedAvg. In particular, FedSAE outperforms FedAvg across multiple federated datasets: on average, it improves testing accuracy by 22.19% and reduces stragglers by 90.69%. Moreover, while matching the accuracy of FedSAE, q-FedSAE allows for more robust convergence and fairer model performance than q-FedAvg and FedSAE. (c) 2022 Elsevier Ltd. All rights reserved.
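The loss-based participant selection mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the FedSAE implementation: the abstract only states that clients with a larger training value are selected with a higher probability, so the proportional-to-loss weighting, the uniform fallback, and the names `selection_probabilities` and `sample_clients` are all assumptions made for illustration.

```python
import random
from typing import Dict, List


def selection_probabilities(losses: Dict[str, float]) -> Dict[str, float]:
    """Map each client's latest training loss to a selection probability.

    Assumption: probability is proportional to a non-negative loss. The
    abstract only says higher-value clients are picked more often.
    """
    total = sum(losses.values())
    if total <= 0:
        # No usable signal: fall back to uniform selection (assumption).
        return {cid: 1.0 / len(losses) for cid in losses}
    return {cid: loss / total for cid, loss in losses.items()}


def sample_clients(losses: Dict[str, float], k: int,
                   rng: random.Random) -> List[str]:
    """Draw up to k distinct clients, favouring those with larger loss."""
    remaining = dict(losses)
    chosen: List[str] = []
    for _ in range(min(k, len(remaining))):
        probs = selection_probabilities(remaining)
        ids = list(probs)
        # Weighted draw of one client, then remove it so it is not repeated.
        pick = rng.choices(ids, weights=[probs[c] for c in ids], k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen
```

For example, with hypothetical losses `{"a": 1.0, "b": 3.0}`, client `b` receives a selection probability of 0.75 under this weighting. A real scheduler would likely combine such a rule with the workload prediction described in the abstract, which is not modelled here.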


Pages: 560-573

Page count: 14
