Multi-job Hadoop scheduling to process Geo-distributed big data

Times cited: 0
Authors
Cavallo, Marco [1]
Di Modica, Giuseppe [1]
Polito, Carmelo [1]
Tomarchio, Orazio [1]
Affiliation
[1] University of Catania, Department of Electrical, Electronic and Computer Engineering, Catania, Italy
Keywords
Big Data; MapReduce; Multi-job scheduling; Geographical computing environment; Hierarchical Hadoop
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Effective big data analysis has been one of the most notable research challenges of recent years. Hadoop, the most popular implementation of the MapReduce framework, is now widely used to process large data sets on cloud resources. In many scenarios, however, data are geographically distributed across data centers, and moving them to a single site for processing may be extremely expensive, if it is feasible at all. A key challenge for running applications in such a geographically distributed environment is how to schedule the computation efficiently across the different data centers. In this work we present a job scheduler for a Hierarchical Hadoop Framework (H2F) that manages multiple job execution requests while ensuring efficient use of the available resources. Our experimental evaluation shows that H2F significantly reduces processing time for geo-distributed data sets compared with a plain Hadoop system.
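The abstract does not describe the H2F scheduling algorithm itself, so the following is only a toy Python sketch of the trade-off it motivates: processing each job's data where it lives and shipping reduced partial results, versus first copying all raw data to one site. Everything here is a hypothetical simplification not taken from the paper: the names `Job`, `hierarchical_schedule`, `centralized_schedule` and `reduce_ratio`, the per-site compute rates in GB/s, a single shared WAN link into an aggregation site (`sink`), first-come-first-served handling of multiple jobs, and a fixed ratio by which local processing shrinks the data.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job model: GB of input data stored at each site."""
    name: str
    data_gb: dict[str, float]
    reduce_ratio: float = 0.1  # assumed partial-output size relative to input


def hierarchical_schedule(jobs, compute_gbps, wan_gbps, sink):
    """Toy two-level plan: each site processes its local share of every job
    (jobs served FIFO), ships the reduced partial result over one shared WAN
    link to `sink`, where a global merge step runs. Returns completion times."""
    site_free = {site: 0.0 for site in compute_gbps}  # next idle time per site
    link_free = 0.0  # next idle time of the shared link into `sink`
    finish = {}
    for job in jobs:
        arrivals = []
        for site, gb in job.data_gb.items():
            done = site_free[site] + gb / compute_gbps[site]  # local processing
            site_free[site] = done
            if site == sink:
                arrivals.append(done)  # partial result is already at the sink
            else:
                start = max(done, link_free)  # wait until the link is idle
                link_free = start + gb * job.reduce_ratio / wan_gbps
                arrivals.append(link_free)
        # Global merge at the sink once every partial result has arrived.
        merge_gb = sum(job.data_gb.values()) * job.reduce_ratio
        start = max(max(arrivals, default=0.0), site_free[sink])
        site_free[sink] = start + merge_gb / compute_gbps[sink]
        finish[job.name] = site_free[sink]
    return finish


def centralized_schedule(jobs, compute_gbps, wan_gbps, sink):
    """Baseline: copy each job's remote raw input to `sink` over the same
    shared link, then process the whole input there (jobs served FIFO)."""
    link_free = 0.0
    sink_free = 0.0
    finish = {}
    for job in jobs:
        for site, gb in job.data_gb.items():
            if site != sink:
                link_free += gb / wan_gbps  # raw data is ready at t = 0
        start = max(link_free, sink_free)
        sink_free = start + sum(job.data_gb.values()) / compute_gbps[sink]
        finish[job.name] = sink_free
    return finish


if __name__ == "__main__":
    compute = {"A": 1.0, "B": 1.0, "C": 1.0}  # made-up GB/s per site
    jobs = [Job(f"j{i}", {"A": 10, "B": 10, "C": 10}) for i in (1, 2)]
    print(hierarchical_schedule(jobs, compute, 0.1, "A"))
    print(centralized_schedule(jobs, compute, 0.1, "A"))
```

With these made-up inputs (three sites holding 10 GB each and a 0.1 GB/s link into site A), the toy model finishes the two jobs at t = 33 s and t = 53 s with the hierarchical plan, versus 230 s and 430 s with the centralized baseline. The numbers only illustrate why moving reduced partial results can beat moving raw data under these assumptions; they are not the paper's measurements.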
Pages: 1175-1181
Number of pages: 7