Multi-job Hadoop scheduling to process Geo-distributed big data

Authors
Cavallo, Marco [1]
Di Modica, Giuseppe [1]
Polito, Carmelo [1]
Tomarchio, Orazio [1]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
Keywords
Big Data; MapReduce; Multi-job scheduling; Geographical computing environment; Hierarchical Hadoop
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Effective big data analysis has been one of the most notable research challenges of the last few years. Hadoop, the most popular implementation of the MapReduce framework, is now widely used for processing large data sets on cloud resources. However, in many scenarios data are geographically distributed across data centers, and moving them to a single site for processing may be extremely expensive, if feasible at all. A key challenge for running applications in such a geographically distributed environment is how to schedule the computation efficiently across the different data centers. In this work we present a job scheduler for a Hierarchical Hadoop Framework (H2F) that manages multiple job execution requests while ensuring efficient use of the available resources. Our experimental evaluations show that using H2F significantly improves processing time for geo-distributed data sets with respect to a plain Hadoop system.
Pages: 1175-1181
Page count: 7