Data-Efficient Performance Modeling for Configurable Big Data Frameworks by Reducing Information Overlap Between Training Examples

Cited by: 2
Authors
Liu, Zhiqiang [1]
Shi, Xuanhua [1]
Jin, Hai [1]
Affiliation
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
Keywords
Big data framework; Highly configurable software; Performance model; Active sampling; MapReduce
DOI
10.1016/j.bdr.2022.100358
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To support the wide variety of big data analysis applications, big data processing frameworks are designed to be highly configurable. For common users, however, it is difficult to tailor these configurable frameworks to achieve optimal performance for every application. Recently, many automatic tuning methods have been proposed to configure these frameworks. These methods first build a performance prediction model by randomly sampling configurations and measuring the corresponding performance. They then conduct a heuristic search of the configuration space based on the performance prediction model. For most frameworks, building the performance model is too expensive because it requires measuring the performance of a large number of configurations, which incurs too much data collection overhead. In this paper, we propose a novel data-efficient method that builds the performance model with little impact on prediction accuracy. Compared to traditional methods, the proposed method reduces the overhead of data collection because it can train the performance model with far fewer training examples. Specifically, the proposed method actively samples the important examples according to the dynamic requirements of the performance model during iterative model updating. Hence, it makes full use of the collected informative data and trains the performance model with far fewer training examples. To sample the important training examples, we employ several virtual performance models to estimate the importance of all candidate configurations efficiently. Experimental results show that our method needs fewer training examples than traditional methods with little impact on prediction accuracy. (c) 2022 Elsevier Inc. All rights reserved.
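The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of active sampling during iterative model updating, not the authors' method. It assumes that the "virtual performance models" can be approximated by a small committee of nearest-neighbor regressors fit on bootstrap resamples, and that a candidate's importance is the variance of the committee's predictions. The function `measure`, the two-parameter configuration grid, and all names and default values are hypothetical.

```python
import random
import statistics


def measure(config):
    """Hypothetical stand-in for running a job and recording its runtime."""
    x, y = config
    return (x - 0.3) ** 2 + 2.0 * (y - 0.7) ** 2


def knn_predict(train, config, k=3):
    """Predict performance as the mean of the k nearest measured configurations."""
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], config))
    nearest = sorted(train, key=sq_dist)[:k]
    return statistics.fmean(perf for _, perf in nearest)


def disagreement(train, config, rng, n_models=5):
    """Importance proxy (assumption): variance of predictions across a
    committee of models, each fit on a bootstrap resample of the data."""
    preds = []
    for _ in range(n_models):
        resample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(resample, config))
    return statistics.pvariance(preds)


def active_sample(candidates, n_init=5, budget=20, seed=0):
    """Measure a few random configurations, then repeatedly measure the
    candidate on which the committee disagrees most, up to the budget."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    train = [(c, measure(c)) for c in pool[:n_init]]
    pool = pool[n_init:]
    while len(train) < budget and pool:
        best = max(pool, key=lambda c: disagreement(train, c, rng))
        pool.remove(best)
        train.append((best, measure(best)))
    return train


# Usage: 100 candidate configurations, only 20 of which are measured.
candidates = [(i / 9, j / 9) for i in range(10) for j in range(10)]
train = active_sample(candidates)
```

Only the configurations that end up in `train` are ever passed to `measure`, which is where the saving in data collection would come from in this sketch; the remaining candidates are scored by the committee alone.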
Pages: 10