Predicting the performance of big data applications on the cloud

Cited by: 0
Authors
D. Ardagna
E. Barbierato
E. Gianniti
M. Gribaudo
T. B. M. Pinto
A. P. C. da Silva
J. M. Almeida
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria
[2] Universidade Federal de Minas Gerais,Departamento de Ciência da Computação
Keywords
Performance prediction; Apache Spark; Parallel computing; Data science; Big data; Analytical and simulation models
DOI: not available
Abstract
Data science applications have become widespread as a means to extract knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns, and are therefore often referred to as big data applications. These characteristics make it challenging for existing software and hardware infrastructures to meet the applications' resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, since its on-demand pricing model allows computing resources to be allocated effectively according to the application's needs. However, these properties pose an additional challenge to the accurate performance prediction of cloud-based applications, which is a key step toward adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad-hoc simulator, in various scenarios based on different applications and infrastructure setups. The considered approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with a relative error of up to 7%, on average.
Moreover, a comparison with the widely used event-based simulator available with the Java Modeling Tool (JMT) suite demonstrates that both the analytical model and dagSim run very fast: they require execution times at least two orders of magnitude lower than JMT while providing slightly better accuracy, which makes them practical for online prediction.
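The abstract reports accuracy as a relative error averaged over experiments. As a minimal sketch of how such a figure can be computed, assuming the usual definition |predicted - measured| / measured for each run, the following Python snippet averages the per-run errors. The function name and the sample execution times are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / measured over paired runs.

    Assumed definition for illustration; the article may aggregate differently.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical execution times in seconds (illustrative values, not from the article).
measured = [100.0, 200.0, 400.0]
predicted = [105.0, 190.0, 420.0]

error = mean_relative_error(predicted, measured)
print(f"Mean relative error: {error:.1%}")
```

With these made-up values every run is off by 5%, so the average is also 5%. A model that meets the bound stated in the abstract would yield an average of at most 7% on the real measurements.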
Pages: 1321-1353
Page count: 32