Multi-job Hadoop scheduling to process Geo-distributed big data

Cited by: 0
Authors
Cavallo, Marco [1]
Di Modica, Giuseppe [1]
Polito, Carmelo [1]
Tomarchio, Orazio [1]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
Keywords
Big Data; MapReduce; Multi-job scheduling; Geographical computing environment; Hierarchical Hadoop
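DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract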
Effective big data analysis has been one of the most notable research challenges of the last few years. Hadoop, the most popular implementation of the MapReduce framework, is now widely used for processing large data sets on cloud resources. In many scenarios, however, data are geographically distributed across data centers, and moving them to a single site for processing may be extremely expensive, if it is feasible at all. A key challenge in running applications in such a geographically distributed environment is how to schedule the computation efficiently across the different data centers. In this work we present a job scheduler for a Hierarchical Hadoop Framework (H2F) that manages multiple job execution requests while ensuring efficient use of the available resources. Our experimental evaluations show that H2F significantly improves processing time for geo-distributed data sets compared with a plain Hadoop system.
Pages: 1175-1181
Page count: 7