Multi-job Hadoop scheduling to process Geo-distributed big data

Times cited: 0
Authors
Cavallo, Marco [1]
Di Modica, Giuseppe [1]
Polito, Carmelo [1]
Tomarchio, Orazio [1]
Affiliation
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
Keywords
Big Data; MapReduce; Multi-job scheduling; Geographical computing environment; Hierarchical Hadoop
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Effective big data analysis is one of the most notable research challenges of recent years. Hadoop, the most popular implementation of the MapReduce framework, has become widely used for processing large data sets on cloud resources. However, in many scenarios data are geographically distributed across data centers, and moving them to a single site for processing may prove extremely expensive, if it is feasible at all. A key challenge for running applications in such a geographically distributed environment is how to schedule the computation efficiently across the different data centers. In this work we present a job scheduler for a Hierarchical Hadoop Framework (H2F) that manages multiple job execution requests while ensuring efficient use of the available resources. Our experimental evaluations show that H2F significantly reduces processing time for geo-distributed data sets compared with a plain Hadoop system.
Pages: 1175-1181
Number of pages: 7