Distributed Machine Learning based Mitigating Straggler in Big Data Environment

Cited by: 1

Authors:

Lu, Haodong [1]

Wang, Kun [2]

Affiliations:

[1] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing, Peoples R China

[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA

Source:

IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021) | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

Parameter Server; Straggler; Deep Reinforcement Learning;

DOI:

10.1109/ICC42927.2021.9500531

Chinese Library Classification (CLC) number:

TN [Electronic technology, communication technology];

Subject classification code:

0809;

Abstract:

In the big data era, the parameter server paradigm is regarded as an efficient and practical way to improve the performance of deep learning (DL) applications. One of its main problems is that stragglers greatly hinder DL training progress, and previous methods do not fully consider the cluster's resource utilization when dealing with stragglers. To mitigate the straggler problem in the parameter server, we propose a Deep Reinforcement Learning (DRL)-based framework called Distributed Actor-critic Reinforcement Learning (DARL) that automatically adapts each worker's training load to the dynamic cluster without requiring parameter settings. DARL employs state-of-the-art techniques to stabilize training and improve convergence, including a distributed framework, multiple actors, and prioritized experience replay. We also apply a customized experience sampling method to fully exploit potentially good samples. Experiments using real DL workloads show that DARL outperforms the representative Bulk Synchronous Parallel (BSP) scheme by 57.8% and the Stale Synchronous Parallel (SSP) scheme by 503% in terms of per-iteration time in a heterogeneous environment.
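The straggler effect described in the abstract can be illustrated with a minimal sketch. It is not DARL and is not taken from the paper: it assumes a toy cost model in which a worker's compute time is its load divided by its speed, and it uses a simple static heuristic (load proportional to speed) in place of the learned policy. The function names, worker speeds, and load figures are all hypothetical.

```python
def bsp_iteration_time(loads, speeds):
    """Per-iteration time under a BSP-style barrier.

    Every worker waits for the slowest one, so the iteration time is the
    maximum of load / speed over all workers (toy model, ignores
    communication cost).
    """
    return max(load / speed for load, speed in zip(loads, speeds))


def proportional_loads(total_load, speeds):
    """Split total_load in proportion to each worker's speed.

    This is an illustrative static heuristic, not the DRL-based policy
    proposed in the paper.
    """
    total_speed = sum(speeds)
    return [total_load * speed / total_speed for speed in speeds]


# Hypothetical cluster: three fast workers and one straggler (samples/sec).
speeds = [100.0, 100.0, 100.0, 25.0]
total_load = 1300.0  # samples per iteration, hypothetical

equal = [total_load / len(speeds)] * len(speeds)
balanced = proportional_loads(total_load, speeds)

print("equal split:       ", bsp_iteration_time(equal, speeds))
print("proportional split:", bsp_iteration_time(balanced, speeds))
```

With an equal split the straggler alone determines the iteration time, while a speed-aware split lets all workers reach the barrier together. A static split like this only works when worker speeds are known and stable, which is the gap an adaptive approach aims to close in a dynamic cluster.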


Pages: 6

50 items in total

[41] Scalable malware detection system using big data and distributed machine learning approach
Kumar, Manish
SOFT COMPUTING, 2022, 26 (08) : 3987 - 4003
[42] SCHEDULING THE ALLOCATION OF DATA FRAGMENTS IN A DISTRIBUTED DATABASE ENVIRONMENT - A MACHINE LEARNING APPROACH
CHATURVEDI, AR
CHOUBEY, AK
ROAN, JS
IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, 1994, 41 (02) : 194 - 207
[43] Spark Based Distributed Deep Learning Framework For Big Data Applications
Khumoyun, Akhmedov
Cui, Yun
Hanku, Lee
2016 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMMUNICATIONS TECHNOLOGIES (ICISCT), 2016,
[44] A MapReduce Based Distributed Framework for Similarity Search in Healthcare Big Data Environment
Sarma, Hiren K. D.
Dwivedi, Yogesh K.
Rana, Nripendra P.
Slade, Emma L.
OPEN AND BIG DATA MANAGEMENT AND INNOVATION, I3E 2015, 2015, 9373 : 173 - 182
[45] An Improvement of a Checkpoint-based Distributed Testing Technique on a Big Data Environment
Sudsee, Bhuridech
Kaewkasi, Chanwit
2019 21ST INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): ICT FOR 4TH INDUSTRIAL REVOLUTION, 2019, : 1081 - 1090
[46] Mitigating Straggler Effect in Federated Learning Based on Reconfigurable Intelligent Surface over Internet of Vehicles
Li Zejun
Wu Hao
Lu Yunlong
Dai Yueyue
Ai Bo
China Communications, 2024, 21 (08) : 62 - 78
[47] Mitigating Straggler Effect in Federated Learning Based on Reconfigurable Intelligent Surface over Internet of Vehicles
Li, Zejun
Wu, Hao
Lu, Yunlong
Dai, Yueyue
Ai, Bo
CHINA COMMUNICATIONS, 2024, 21 (08) : 62 - 78
[48] Security Threats and Defensive Approaches in Machine Learning System Under Big Data Environment
Chen Hongsong
Zhang Yongpeng
Cao Yongrui
Bharat Bhargava
Wireless Personal Communications, 2021, 117 : 3505 - 3525
[49] Performance Evaluation of Machine Learning Classifiers for Stock Market Prediction in Big Data Environment
Kalra, Sneh
Gupta, Sachin
Prasad, Jay Shankar
JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (05) : 295 - 306
[50] Cyber forensics framework for big data analytics in IoT environment using machine learning
Chhabra, Gurpal Singh
Singh, Varinder Pal
Singh, Maninder
MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (23-24) : 15881 - 15900
