On Machine Learning-based Stage-aware Performance Prediction of Spark Applications

Cited by: 1
Authors
Ye, Guangjun [1]
Liu, Wuji [2]
Wu, Chase Q. [2]
Shen, Wei [1]
Lyu, Xukang [3]
Affiliations
[1] Zhejiang Sci Tech Univ, Sch Informat Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[3] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Software, Tianjin 300354, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Big data computing; performance modeling; Spark; in-memory processing;
DOI
10.1109/IPCCC50635.2020.9391564
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The data volume of large-scale applications in science, engineering, and business has grown explosively over the past decade and now far exceeds the computing capability and storage capacity of any single server. As a viable solution, such data is often stored in distributed file systems and processed by parallel computing engines such as Spark, which has gained popularity over the traditional MapReduce framework because of its fast in-memory processing of streaming data. Spark engines are generally deployed in cloud environments such as Amazon EC2 and Alibaba Cloud. However, storage and computing resources in these environments are typically provisioned on a pay-as-you-go basis, so an accurate estimate of the execution time of Spark workloads is critical to fully utilizing cloud resources and meeting the performance requirements of end users. Our insight is that the execution patterns of many Spark workloads are qualitatively similar, which makes it possible to leverage historical performance data to predict the execution time of a given Spark application. We use execution information extracted from the Spark History Server as training data and develop a stage-aware hierarchical neural network model for performance prediction. Experimental results show that the proposed hierarchical model achieves higher end-to-end accuracy than a holistic prediction model and also outperforms existing regression-based prediction methods.
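The stage-aware idea in the abstract can be illustrated with a minimal sketch: fit one small model per stage from historical runs, then combine the per-stage estimates into an end-to-end estimate. This is not the authors' model. It swaps the hierarchical neural network for a per-stage least-squares line, assumes stages run one after another so their durations can be summed, and uses hypothetical stage names (`read`, `shuffle`) with synthetic numbers in place of real Spark History Server data.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All inputs identical: fall back to the mean duration.
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return mean_y - slope * mean_x, slope


def fit_stage_models(records):
    """Fit one line per stage.

    records: iterable of (stage_name, input_gb, duration_s) tuples.
    Returns {stage_name: (intercept, slope)}.
    """
    by_stage = defaultdict(list)
    for stage, input_gb, duration_s in records:
        by_stage[stage].append((input_gb, duration_s))
    return {
        stage: fit_line([x for x, _ in rows], [y for _, y in rows])
        for stage, rows in by_stage.items()
    }


def predict_total(models, input_gb):
    """End-to-end estimate as the sum of per-stage estimates.

    Assumption: stages execute sequentially. Real Spark jobs can
    overlap independent stages, which this sketch ignores.
    """
    return sum(a + b * input_gb for a, b in models.values())


# Synthetic history (illustrative only, not measured data).
history = [
    ("read", 1.0, 12.0), ("read", 2.0, 22.0), ("read", 4.0, 42.0),
    ("shuffle", 1.0, 8.0), ("shuffle", 2.0, 13.0), ("shuffle", 4.0, 23.0),
]
models = fit_stage_models(history)
estimate = predict_total(models, 3.0)
print(f"estimated end-to-end time: {estimate:.1f} s")
```

A holistic model, by contrast, would fit a single function from application-level features directly to total run time, without the per-stage breakdown.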
Pages: 8