Distributed Machine Learning based Mitigating Straggler in Big Data Environment

Cited by: 1

Authors:

Lu, Haodong [1]

Wang, Kun [2]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing, Peoples R China

[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA

Source:

IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021) | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

Parameter Server; Straggler; Deep Reinforcement Learning;

DOI:

10.1109/ICC42927.2021.9500531

Chinese Library Classification number:

TN [Electronic technology, communication technology];

Subject classification code:

0809;

Abstract:

In the big data era, the parameter server paradigm is regarded as an efficient and practical way to improve performance when processing deep learning (DL) applications. One of the main problems is that stragglers greatly hinder DL training progress, and previous methods do not fully consider the cluster's resource utilization when dealing with stragglers. To mitigate the straggler problem in the parameter server, we propose a Deep Reinforcement Learning (DRL)-based framework called Distributed Actor-critic Reinforcement Learning (DARL) that automatically adapts each worker's training load to the dynamic cluster without parameter settings. DARL employs state-of-the-art techniques to stabilize training and improve convergence, including a distributed framework, multiple actors, and prioritized experience replay. We also apply a customized experience sampling method to fully exploit potentially good samples. Experiments using real DL workloads show that DARL outperforms the representative Bulk Synchronous Parallel (BSP) scheme by 57.8% and Stale Synchronous Parallel (SSP) by 503% in terms of per-iteration time in a heterogeneous environment.
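
The straggler effect under BSP mentioned in the abstract can be illustrated with a minimal sketch. This is not the DARL method: the DRL policy is replaced here by a simple speed-proportional split, and the function names, worker speeds, and load size are hypothetical values chosen only for illustration.

```python
def bsp_iteration_time(loads, speeds):
    """Per-iteration time under BSP: every worker waits for the slowest one."""
    return max(load / speed for load, speed in zip(loads, speeds))


def proportional_loads(total_load, speeds):
    """Split total_load across workers in proportion to their speeds.

    Assumption: a stand-in heuristic, not the learned policy from the paper.
    """
    total_speed = sum(speeds)
    return [total_load * s / total_speed for s in speeds]


# Hypothetical cluster: three workers, the last one is a straggler.
speeds = [100.0, 100.0, 25.0]  # samples per second (made-up values)
total_load = 900               # samples per iteration (made-up value)

equal = [total_load / len(speeds)] * len(speeds)
balanced = proportional_loads(total_load, speeds)

print(bsp_iteration_time(equal, speeds))     # 12.0: the straggler sets the pace
print(bsp_iteration_time(balanced, speeds))  # 4.0: all workers finish together
```

With equal loads the slow worker alone determines the iteration time; matching each worker's load to its speed removes the wait. In a dynamic cluster the speeds change over time, which is the setting where the abstract proposes a learned policy instead of a fixed rule.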


Pages: 6

