Robustness challenges in Reinforcement Learning based time-critical cloud resource scheduling: A Meta-Learning based solution

Cited by: 7
Authors
Liu, Hongyun [1, 2]
Chen, Peng [3]
Ouyang, Xue [4]
Gao, Hui [5]
Yan, Bing [6]
Grosso, Paola [1]
Zhao, Zhiming [1]
Affiliations
[1] Univ Amsterdam, Informat Inst, NL-1098 XH Amsterdam, Netherlands
[2] Univ Amsterdam, Grad Sch Informat, NL-1098 XH Amsterdam, Netherlands
[3] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Peoples R China
[4] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Peoples R China
[5] Shaanxi Univ Sci & Technol, Coll Elect & Control Engn, Xian 710021, Peoples R China
[6] Univ Adelaide, Sch Elect & Elect Engn, Adelaide, SA 5005, Australia
Funding
National Natural Science Foundation of China
Keywords
Robustness; Reinforcement Learning; Meta Learning; Resource management; Task scheduling; Cloud computing; MANAGEMENT;
DOI
10.1016/j.future.2023.03.029
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Cloud computing attracts increasing attention for processing dynamic computing tasks and automating the software development and operation pipeline. In many cases, the computing tasks have strict deadlines. The cloud resource manager (e.g., an orchestrator) effectively manages the resources and provides Quality of Service (QoS) for tasks. Cloud task scheduling is difficult because both task workload and resource availability are dynamic. Reinforcement Learning (RL) has attracted considerable research attention in scheduling. However, RL-based approaches suffer from low robustness of scheduling performance when the task workload and resource availability change, particularly when handling time-critical tasks. This paper focuses on the two challenges of robustness and deadline guarantees in such RL-based, specifically Deep RL (DRL)-based, scheduling approaches. We quantify robustness as the retraining time and investigate how to improve both the robustness and the deadline guarantees of DRL-based scheduling. We propose MLR-TC-DRLS, a practical, robust Meta Deep Reinforcement Learning-based scheduling solution that provides deadline guarantees for time-critical tasks and fast adaptation under highly dynamic situations. We comprehensively evaluate the performance of MLR-TC-DRLS against scheduling approaches based on RL and on advanced RL variants, using real-world and synthetic data. The evaluations show that the proposed approach improves the robustness of scheduling performance of typical DRL-variant scheduling approaches, with 97%-98.5% deadline guarantees and 200%-500% faster adaptation.
Pages: 18-33
Number of pages: 16