On Machine Learning-based Stage-aware Performance Prediction of Spark Applications

Cited by: 1
Authors
Ye, Guangjun [1]
Liu, Wuji [2]
Wu, Chase Q. [2]
Shen, Wei [1]
Lyu, Xukang [3]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Informat Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[3] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Software, Tianjin 300354, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Big data computing; performance modeling; Spark; in-memory processing
DOI
10.1109/IPCCC50635.2020.9391564
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The data volume of large-scale applications in science, engineering, and business has grown explosively over the past decade and now far exceeds the computing capability and storage capacity of any single server. Such data is therefore often stored in distributed file systems and processed by parallel computing engines such as Spark, which has gained popularity over the traditional MapReduce framework due to its fast in-memory processing of streaming data. Spark engines are generally deployed in cloud environments such as Amazon EC2 and Alibaba Cloud. However, storage and computing resources in these environments are typically provisioned on a pay-as-you-go basis, so an accurate estimate of the execution time of Spark workloads is critical to fully utilizing cloud resources and meeting the performance requirements of end users. Our insight is that the execution patterns of many Spark workloads are qualitatively similar, which makes it possible to leverage historical performance data to predict the execution time of a given Spark application. We use execution information extracted from the Spark History Server as training data and develop a stage-aware hierarchical neural network model for performance prediction. Experimental results show that the proposed hierarchical model achieves higher accuracy than a holistic prediction model at the end-to-end level and also outperforms existing regression-based prediction methods.
Pages: 8