SpotDAG: An RL-Based Algorithm for DAG Workflow Scheduling in Heterogeneous Cloud Environments

Cited by: 2

Authors:

Lin, Liduo [1]

Pan, Li [1]

Liu, Shijun [1]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan, Peoples R China

Source:

IEEE TRANSACTIONS ON SERVICES COMPUTING | 2024 / Vol. 17 / Issue 05

Funding:

National Key Research and Development Program of China;

Keywords:

Data processing; Job shop scheduling; Costs; Cloud computing; Optimization; Task analysis; Data models; Heterogeneous cloud environments; spot instance; on-demand instance; IaaS; TASKS;

DOI:

10.1109/TSC.2024.3422828

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

As increasingly complex functions are implemented in applications, directed acyclic graphs (DAGs) are widely used to model the inter-dependencies between individual functions. Cloud-based data processing platforms need to consider the complex topology of DAGs and arbitrary deadlines given by users for job scheduling, leading to an NP-hard decision-making problem. Leveraging spot instances in data processing platforms can achieve significant cost savings, but the unpredictable interruption of spot instances makes the problem of VM scaling and job scheduling more difficult. In this paper, a Reinforcement Learning (RL) based approach called SpotDAG is proposed to solve the auto-scaling problem for jobs modeled as DAGs on a data processing platform where spot instances are introduced. SpotDAG makes cluster scaling and job scheduling decisions at the same time by mapping its output to several meta-policies. This paper introduces the self-attention mechanism for feature extraction to help the intelligent agent learn faster. A mask layer after the output of the proposed RL-based algorithm circumvents illegal actions to ensure that a job is completed by its deadline. Extensive experimental results show that the proposed approach can significantly reduce the cost of instances for data processing platforms while ensuring that jobs are completed in time.
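The abstract mentions a mask layer that removes illegal actions from the policy output but does not describe how it is built. The following is a minimal sketch of the general action-masking technique, not the paper's implementation. It assumes a softmax policy over instance types; the feasibility rule in `legal_actions`, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def masked_softmax(logits, legal):
    """Softmax restricted to legal actions; illegal actions get probability 0."""
    if not any(legal):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    peak = max(x for x, ok in zip(logits, legal) if ok)
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def legal_actions(remaining_work, time_to_deadline, speeds):
    """Hypothetical feasibility rule (an assumption, not from the paper):
    an instance type is legal only if it can finish the remaining work
    before the deadline."""
    return [remaining_work / s <= time_to_deadline for s in speeds]


# Hypothetical example: three instance types with different speeds.
speeds = [1.0, 2.0, 4.0]   # work units per time unit
logits = [2.0, 0.5, 0.1]   # raw policy scores; the slowest type scores highest
mask = legal_actions(remaining_work=8.0, time_to_deadline=4.0, speeds=speeds)
probs = masked_softmax(logits, mask)
```

In this sketch the slowest instance type has the highest raw score, yet it receives zero probability because it would miss the deadline, so the agent can only sample from choices that keep the job on time.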


Pages: 2904-2917

Page count: 14

