Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments

Cited by: 45
Authors
Zhang, Zhuoyao [1]
Cherkasova, Ludmila [2]
Loo, Boon Thau [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Hewlett Packard Labs, Palo Alto, CA USA
Keywords
MapReduce; heterogeneous clusters; performance modeling; efficiency
DOI
10.1109/CLOUD.2013.107
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many companies have started using Hadoop for advanced data analytics over large datasets. While a traditional Hadoop deployment assumes a homogeneous cluster, many enterprise clusters grow incrementally over time and may contain a variety of different servers. The nodes' heterogeneity represents an additional challenge for efficient cluster and job management. Due to resource heterogeneity, it is often unclear which resources introduce inefficiency and bottlenecks, and how such a Hadoop cluster should be configured and optimized. In this work, we explore the efficiency and accuracy of the bounds-based performance model for predicting MapReduce job completion times in heterogeneous Hadoop clusters. We validate the accuracy of the proposed performance model using a diverse set of 13 realistic applications and two different heterogeneous clusters. Since one of the Hadoop clusters is formed by VM instances of different capacities in the Amazon EC2 environment, we additionally explore and discuss factors that impact MapReduce job performance in the Cloud.
Pages: 839-846
Page count: 8
Related papers
50 in total
  • [21] Salamander: a Holistic Scheduling of MapReduce Jobs on Ephemeral Cloud Resources
    Handaoui, Mohamed
    Dartois, Jean-Emile
    Lemarchand, Laurent
    Boukhobza, Jalil
    2020 20TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2020), 2020, : 320 - 329
  • [22] Modeling the Performance of Heterogeneous IaaS Cloud Centers
    Khazaei, Hamzeh
    Misic, Jelena
    Misic, Vojislav B.
    Mohammadi, Nasim Beigi
    2013 33RD IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW 2013), 2013, : 232 - 237
  • [23] Scheduling Heterogeneous MapReduce Jobs for Efficiency Improvement in Enterprise Clusters
    Yao, Yi
    Tai, Jianzhe
    Sheng, Bo
    Mi, Ningfang
    2013 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2013), 2013, : 872 - 875
  • [24] Predictive modelling of MapReduce job performance in cloud environments using machine learning techniques
    Bergui, Mohammed
    Hourri, Soufiane
    Najah, Said
    Nikolov, Nikola S.
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [25] Predicting Job Completion Time In Heterogeneous MapReduce Environments
    Singhal, Rekha
    Verma, Abhishek
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 17 - 27
  • [26] Cuckoo: Opportunistic MapReduce on Ephemeral and Heterogeneous Cloud Resources
    Dartois, Jean-Emile
    Ribeiro, Heverson B.
    Boukhobza, Jalil
    Barais, Olivier
    2019 IEEE 12TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (IEEE CLOUD 2019), 2019, : 396 - 403
  • [27] Resource Provisioning Framework for MapReduce Jobs with Performance Goals
    Verma, Abhishek
    Cherkasova, Ludmila
    Campbell, Roy H.
    MIDDLEWARE 2011, 2011, 7049 : 165 - +
  • [28] Scientific data processing using MapReduce in cloud environments
    Kong, Xiangsheng
    Information Technology Journal, 2013, 12 (23) : 7869 - 7873
  • [29] A Learning-based MapReduce Scheduler in Heterogeneous Environments
    Naik, Nenavath Srinivas
    Negi, Atul
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017, : 2020 - 2025
  • [30] Resource provisioning framework for mapreduce jobs with performance goals
    Verma, Abhishek
    Cherkasova, Ludmila
    Campbell, Roy H.
    HP Laboratories Technical Report, 2011, (173):