Multi-job Hadoop scheduling to process Geo-distributed big data

Cited by: 0
Authors
Cavallo, Marco [1]
Di Modica, Giuseppe [1]
Polito, Carmelo [1]
Tomarchio, Orazio [1]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
Keywords
Big Data; MapReduce; Multi-job scheduling; Geographical computing environment; Hierarchical Hadoop;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Effective big data analysis has been one of the most notable research challenges of the last few years. Hadoop, the most popular implementation of the MapReduce framework, is now widely used to process large data sets on cloud resources. However, in many scenarios data are geographically distributed across data centers, and moving them to a single site for processing can be extremely expensive, if it is feasible at all. A key challenge in running applications in such a geographically distributed environment is how to schedule the computation efficiently across the different data centers. In this work we present a job scheduler for a Hierarchical Hadoop Framework (H2F) that manages multiple job execution requests while ensuring efficient use of the available resources. Our experimental evaluations show that H2F significantly improves processing time for geo-distributed data sets compared with a plain Hadoop system.
Pages: 1175 - 1181
Number of pages: 7
Related papers
50 in total
  • [11] GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers
    Moïse W. Convolbo
    Jerry Chou
    Ching-Hsien Hsu
    Yeh Ching Chung
    Computing, 2018, 100 : 21 - 46
  • [12] Traffic-Aware Geo-Distributed Big Data Analytics with Predictable Job Completion Time
    Li, Peng
    Guo, Song
    Miyazaki, Toshiaki
    Liao, Xiaofei
    Jin, Hai
    Zomaya, Albert Y.
    Wang, Kun
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (06) : 1785 - 1796
  • [13] Octopus: Based on Congestion-aware Scheduling on Geo-distributed Big Data Analytics Cluster
    Du, Haizhou
    Zhang, Keke
    Yang, Zhenchen
    2018 5TH INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 2018, : 490 - 495
  • [14] MapReduce Task Scheduling in Heterogeneous Geo-Distributed Data Centers
    Li, Xiaoping
    Chen, Fuchao
    Ruiz, Ruben
    Zhu, Jie
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (06) : 3317 - 3329
  • [15] Joint Scheduling of Data and Computation in Geo-distributed Cloud Systems
    Yin, Lingyan
    Sun, Jizhou
    Zhao, Laiping
    Cui, Chenzhou
    Xiao, Jian
    Yu, Ce
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 657 - 666
  • [16] Data Centers Selection for Moving Geo-distributed Big Data to Cloud
    Zhang, Jiangtao
    Yuan, Qiang
    Chen, Shi
    Huang, Hejiao
    Wang, Xuan
    JOURNAL OF INTERNET TECHNOLOGY, 2019, 20 (01) : 111 - 122
  • [17] Fast, scalable and geo-distributed PCA for big data analytics
    Adnan, T. M. Tariq
    Tanjim, Md Mehrab
    Adnan, Muhammad Abdullah
    INFORMATION SYSTEMS, 2021, 98 (98)
  • [18] Cost Minimization for Big Data Processing in Geo-Distributed Data Centers
    Gu, Lin
    Zeng, Deze
    Li, Peng
    Guo, Song
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2014, 2 (03) : 314 - 323
  • [19] GOFS: Geo-distributed Scheduling in OpenFaaS
    Rossi, Fabiana
    Falvo, Simone
    Cardellini, Valeria
    26TH IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (IEEE ISCC 2021), 2021,
  • [20] VNF Deployment and Flow Scheduling in Geo-distributed Data Centers
    Gu, Lin
    Chen, Xiaoxiao
    Jin, Hai
    Lu, Feng
    2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,