A Data and Task Co-Scheduling Algorithm for Scientific Cloud Workflows

Cited by: 15
Authors
Deng, Kefeng [1]
Ren, Kaijun [1]
Zhu, Min [1]
Song, Junqiang [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cloud computing; scientific workflow; co-scheduling; data placement; task scheduling; strategy
DOI
10.1109/TCC.2015.2511745
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Cloud computing has emerged as a promising computational infrastructure for cost-efficient workflow execution by provisioning on-demand resources in a pay-as-you-go manner. Because scientific workflows require access to community-wide resources, they usually need to be performed in collaborative cloud environments composed of multiple datacenters. Although such environments facilitate scientific collaboration, moving input and intermediate datasets across geographically distributed datacenters may cause intolerable latency that hinders efficient execution of large-scale data-intensive scientific workflows. To address this problem, in this article we propose a novel multi-level K-cut graph partitioning algorithm that minimizes the volume of data transferred across datacenters while satisfying load-balancing and fixed-data constraints. The algorithm first contracts the fixed input datasets located in the same datacenter together with their consuming tasks, and then coarsens the contracted graph to a predefined scale level by level. Next, a K-cut algorithm partitions the resulting graph into K parts such that the cut size is minimized. Finally, the partitioned graph is projected back onto the original workflow graph, with the load-balancing constraint maintained during projection. We evaluate our algorithm using three real-world workflow applications, and the results demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.
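To make the objective in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a workflow is given as weighted edges `(producer, consumer, data_volume)`, and it simplifies the contraction step by merging only the fixed datasets of each datacenter into one super-node (the abstract also folds in consuming tasks). The function names `contract_fixed`, `cut_volume`, and `is_balanced` and the toy data are hypothetical.

```python
from collections import defaultdict

def contract_fixed(edges, fixed):
    """Merge fixed datasets that live in the same datacenter into one super-node.

    edges: list of (u, v, volume); fixed: dataset name -> datacenter index.
    Simplification (assumption): only datasets are merged, not their consumers.
    """
    def rename(node):
        return f"dc{fixed[node]}" if node in fixed else node

    merged = defaultdict(int)
    for u, v, w in edges:
        a, b = rename(u), rename(v)
        if a != b:                      # drop edges that collapse into a self-loop
            merged[(a, b)] += w         # parallel edges add their volumes
    return [(a, b, w) for (a, b), w in merged.items()]

def cut_volume(edges, part):
    """Data volume crossing datacenters: total weight of edges whose ends differ."""
    return sum(w for u, v, w in edges if part[u] != part[v])

def is_balanced(part, load, capacity):
    """Check that the summed task load placed in each part stays within capacity."""
    totals = defaultdict(int)
    for node, p in part.items():
        totals[p] += load.get(node, 0)
    return all(t <= capacity for t in totals.values())

# Toy workflow: three fixed datasets, two tasks, K = 2 datacenters.
edges = [("d1", "t1", 10), ("d2", "t1", 5), ("t1", "t2", 3), ("d3", "t2", 8)]
fixed = {"d1": 0, "d2": 0, "d3": 1}
coarse = contract_fixed(edges, fixed)

# Super-nodes are pinned to their datacenter; only the tasks are free to move.
good = {"dc0": 0, "dc1": 1, "t1": 0, "t2": 1}
bad = {"dc0": 0, "dc1": 1, "t1": 0, "t2": 0}
load = {"t1": 1, "t2": 1}
print(cut_volume(coarse, good), cut_volume(coarse, bad))
```

In this toy case, placing each task next to its fixed input cuts only the small task-to-task edge, whereas co-locating both tasks forces the larger dataset `d3` to cross datacenters.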
Pages: 349-362
Page count: 14