SpotDAG: An RL-Based Algorithm for DAG Workflow Scheduling in Heterogeneous Cloud Environments

Cited by: 2
Authors
Lin, Liduo [1]
Pan, Li [1]
Liu, Shijun [1]
Affiliation
[1] Shandong University, School of Software, Jinan, People's Republic of China
Funding
National Key Research and Development Program of China
Keywords
Data processing; Job shop scheduling; Costs; Cloud computing; Optimization; Task analysis; Data models; Heterogeneous cloud environments; spot instance; on-demand instance; IaaS; TASKS;
DOI
10.1109/TSC.2024.3422828
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As increasingly complex functions are implemented in applications, directed acyclic graphs (DAGs) are widely used to model the inter-dependencies between individual functions. Cloud-based data processing platforms need to consider the complex topology of DAGs and arbitrary deadlines given by users for job scheduling, leading to an NP-hard decision-making problem. Leveraging spot instances in data processing platforms can achieve significant cost savings, but the unpredictable interruption of spot instances makes the problem of VM scaling and job scheduling more difficult. In this paper, a Reinforcement Learning (RL) based approach called SpotDAG is proposed to solve the auto-scaling problem for jobs modeled as DAGs on a data processing platform where spot instances are introduced. SpotDAG makes cluster scaling and job scheduling decisions at the same time by mapping its output to several meta-policies. This paper introduces the self-attention mechanism for feature extraction to help the intelligent agent learn faster. A mask layer after the output of the proposed RL-based algorithm circumvents illegal actions to ensure that a job is completed by its deadline. Extensive experimental results show that the proposed approach can significantly reduce the cost of instances for data processing platforms while ensuring that jobs are completed in time.
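The abstract says that a mask layer after the RL output rules out illegal actions so that deadlines are met. One common way to build such a layer is a masked softmax, sketched below. This is an illustrative assumption, not the authors' implementation: the function name `masked_softmax`, the three example scores, and the legality flags are all hypothetical, and the record does not say how SpotDAG decides which actions are illegal.

```python
import math


def masked_softmax(logits, legal):
    """Turn raw action scores into probabilities, forcing illegal actions to zero.

    logits: list of floats, one raw score per action (hypothetical policy output).
    legal:  list of bools, True where the action is allowed.
    """
    if len(logits) != len(legal):
        raise ValueError("logits and legal must have the same length")
    if not any(legal):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal score for numerical stability.
    peak = max(score for score, ok in zip(logits, legal) if ok)
    # Illegal actions get zero weight, so they can never be sampled.
    weights = [math.exp(score - peak) if ok else 0.0
               for score, ok in zip(logits, legal)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three candidate actions, where action 0 is assumed
# to risk missing the job deadline and is therefore masked out.
logits = [2.0, 1.0, 0.5]
legal = [False, True, True]
probs = masked_softmax(logits, legal)
```

Because the masked action receives exactly zero probability, the agent cannot select it even when its raw score is the highest; the remaining probabilities are renormalized over the legal actions only.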
Pages: 2904-2917
Number of pages: 14