Predicting the performance of big data applications on the cloud

Authors
D. Ardagna
E. Barbierato
E. Gianniti
M. Gribaudo
T. B. M. Pinto
A. P. C. da Silva
J. M. Almeida
Affiliations
[1] Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria
[2] Universidade Federal de Minas Gerais, Departamento de Ciência da Computação
Keywords
Performance prediction; Apache Spark; Parallel computing; Data science; Big data; Analytical and simulation models
DOI
Not available
Abstract
Data science applications have become widespread as a means to extract knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns, and are thus often referred to as big data applications. These characteristics make it challenging for existing software and hardware infrastructures to meet the applications’ resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution for such applications, since its on-demand pricing model allows computing resources to be allocated effectively according to each application’s needs. However, these properties pose an extra challenge to the accurate performance prediction of cloud-based applications, which is a key step toward adequate capacity planning and management of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad-hoc simulator, in various scenarios based on different applications and infrastructure setups. The considered approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with only up to a 7% relative error, on average.
Moreover, a comparison with the widely used event-based simulator available with the Java Modeling Tool (JMT) suite demonstrates that both the analytical model and dagSim run very fast, requiring at least two orders of magnitude lower execution time than JMT while providing slightly better accuracy, being thus practical for online prediction.
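The accuracy figure quoted in the abstract is a relative error on predicted execution times. The abstract does not give the exact formula, so the following is only a minimal sketch under the usual definition, |predicted - measured| / measured, averaged over runs. The function names and the sample timings are hypothetical and are not taken from the paper.

```python
def relative_error(predicted: float, measured: float) -> float:
    """Absolute relative error of one prediction against a measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / abs(measured)


def mean_relative_error(pairs) -> float:
    """Average relative error over (predicted, measured) pairs."""
    errors = [relative_error(p, m) for p, m in pairs]
    if not errors:
        raise ValueError("at least one (predicted, measured) pair is required")
    return sum(errors) / len(errors)


# Hypothetical (predicted, measured) execution times in seconds;
# these values are illustrative only and do not come from the paper.
runs = [(95.0, 100.0), (210.0, 200.0), (48.0, 50.0)]
print(f"mean relative error: {mean_relative_error(runs):.1%}")
```

With these illustrative timings the individual errors are 5%, 5%, and 4%, so the mean falls just under 5%.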
Pages: 1321-1353
Number of pages: 32
Related papers
50 in total
  • [21] An Overview of Monitoring Tools for Big Data and Cloud Applications
    Iuhasz, Gabriel
    Dragan, Ioan
    2015 17TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC), 2016, : 363 - 366
  • [22] Predicting Performance Using Consumer Big Data
    Froot, Kenneth
    Kang, Namho
    Ozik, Gideon
    Sadka, Ronnie
    JOURNAL OF PORTFOLIO MANAGEMENT, 2022, 48 (03): : 47 - 61
  • [23] Cloud computing, IoT, and big data: Technologies and applications
    Bakhouya, Mohamed
    Zbakh, Mostapha
    Essaaidi, Mohamed
    Manneback, Pierre
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (17):
  • [24] Partitioning the Impact of Mobile Applications on Big Data Cloud
    Ahmed, Fayyaz
    8TH INTERNATIONAL CONFERENCE ON AMBIENT SYSTEMS, NETWORKS AND TECHNOLOGIES (ANT-2017) AND THE 7TH INTERNATIONAL CONFERENCE ON SUSTAINABLE ENERGY INFORMATION TECHNOLOGY (SEIT 2017), 2017, 109 : 1041 - 1046
  • [25] Cloud Based Web Scraping for Big Data Applications
    Chaulagain, Ram Sharan
    Pandey, Santosh
    Basnet, Sadhu Ram
    Shakya, Subarna
    2017 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD), 2017, : 138 - 143
  • [26] Cloud Infrastructure Resource Allocation for Big Data Applications
    Dai, Wenyun
    Qiu, Longfei
    Wu, Ana
    Qiu, Meikang
    IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (03) : 313 - 324
  • [27] Big Data Applications Performance Assurance
    Zibitsker, Boris
    ICPE'16 COMPANION: PROCEEDINGS OF THE 2016 COMPANION PUBLICATION FOR THE ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING, 2016, : 31 - 31
  • [28] Performance prediction of parallel computing models to analyze cloud-based big data applications
    Shen, Chao
    Tong, Weiqin
    Choo, Kim-Kwang Raymond
    Kausar, Samina
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2018, 21 (02): : 1439 - 1454
  • [29] Performance prediction of parallel computing models to analyze cloud-based big data applications
    Chao Shen
    Weiqin Tong
    Kim-Kwang Raymond Choo
    Samina Kausar
    Cluster Computing, 2018, 21 : 1439 - 1454
  • [30] A Hybrid Machine Learning Approach for Performance Modeling of Cloud-Based Big Data Applications
    Ataie, Ehsan
    Evangelinou, Athanasia
    Gianniti, Eugenio
    Ardagna, Danilo
    COMPUTER JOURNAL, 2022, 65 (12): : 3123 - 3140