Fast Big Data Analysis in Geo-Distributed Cloud

Cited by: 2
Authors
Li, Yue [1]
Zhao, Laiping [2]
Cui, Chenzhou [3]
Yu, Ce [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Tianjin Univ, Sch Comp Software, Tianjin, Peoples R China
[3] CAS NAOC, Natl Astron Observ, Beijing, Peoples R China
DOI
10.1109/CLUSTER.2016.28
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As cloud services grow to span more and more globally distributed datacenters, there is an increasing need for scheduling algorithms that automatically place tasks across these datacenters. In a geo-distributed cloud, limited WAN bandwidth has become the major bottleneck for fast big data analytics. The scheduling algorithm needs to minimize the global completion time by jointly optimizing task scheduling and WAN data transfer. In this paper, we model task scheduling as a community detection problem over the dependency relations among tasks, data, and datacenters, and propose a Community Detection-based Scheduling (CDS) algorithm, which can minimize the WAN data transfer volume. We evaluate the proposed algorithm on the real China-Astronomy-Cloud network. Experimental results show that CDS reduces the total data transfer volume by up to 40.7% and the global completion time by up to 35.8%, compared with the Hypergraph Partition-based scheduling algorithm and the greedy scheduling algorithm.
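The idea of treating placement as community detection can be illustrated with a small sketch. This is not the paper's CDS algorithm, whose details are not given in the abstract; it swaps in a plain seeded label propagation as a stand-in. The graph, the node names (t1, d1, ...), the datacenter labels "A" and "B", and the edge weights (data volumes in arbitrary units) are all hypothetical. Datasets are pinned to the datacenter that stores them, each task joins the community that carries most of its data volume, and the WAN volume is the total weight of edges that cross communities.

```python
from collections import defaultdict


def seeded_label_propagation(edges, seeds, max_iters=20):
    """Give each unseeded node the label that carries the most edge weight
    among its labelled neighbours; seeded nodes never change label."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w

    labels = dict(seeds)
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):  # fixed visiting order keeps runs repeatable
            if node in seeds:
                continue
            votes = defaultdict(float)
            for nbr, w in adj[node].items():
                if nbr in labels:
                    votes[labels[nbr]] += w
            if not votes:
                continue
            # Ties go to the alphabetically first label.
            best = max(sorted(votes), key=votes.get)
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def wan_volume(edges, labels):
    """Total weight of edges whose endpoints sit in different datacenters."""
    return sum(w for u, v, w in edges if labels.get(u) != labels.get(v))


# Hypothetical workload: (node, node, data volume). Assumption for this
# sketch: t3 consumes the output of t2, modelled as a task-to-task edge.
edges = [
    ("t1", "d1", 10), ("t1", "d2", 5),
    ("t2", "d2", 4), ("t2", "d3", 8),
    ("t3", "t2", 6), ("t3", "d1", 3),
]
# Datasets pinned to the (hypothetical) datacenters that store them.
seeds = {"d1": "A", "d2": "A", "d3": "B"}

placement = seeded_label_propagation(edges, seeds)
cross_dc = wan_volume(edges, placement)
print({n: dc for n, dc in placement.items() if n.startswith("t")}, cross_dc)
```

Pinning the dataset nodes is a design choice of this sketch: it models data that cannot be moved cheaply, so only tasks migrate between communities. A scheduler that also relocates data would leave those nodes unseeded.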
Pages: 388-391
Page count: 4