Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Cited by: 0
Authors
Garralda-Barrio, Mariano [1]
Eiras-Franco, Carlos [1]
Bolon-Canedo, Veronica [1]
Affiliations
[1] Univ A Coruna, CITIC, La Coruna, Spain
Keywords
Performance modeling; Big data; Machine learning; Apache Spark; Distributed computing
DOI
10.1016/j.future.2025.107730
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration-exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.
Pages: 17
Related papers
50 in total
  • [41] Adaptive Learning Model and Implementation Based on Big Data
    Liang, Qilang
    Chang, Na
    2019 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD 2019), 2019, : 183 - 186
  • [42] Adaptive Deep Incremental Learning - Assisted Missing Data Imputation for Streaming Data
    Syavasya, C. V. S. R.
    Lakshmi, M. A.
    JOURNAL OF INTERCONNECTION NETWORKS, 2022, 22 (SUPP02)
  • [43] An Innovative Deep-Learning Algorithm for Supporting the Approximate Classification of Workloads in Big Data Environments
    Cuzzocrea, Alfredo
    Mumolo, Enzo
    Leung, Carson K.
    Grasso, Giorgio Mario
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING (IDEAL 2019), PT II, 2019, 11872 : 225 - 237
  • [44] Adaptive efficient analysis for big data ergodic diffusion models
    Leonid I. Galtchouk
    Serge M. Pergamenshchikov
    Statistical Inference for Stochastic Processes, 2022, 25 : 127 - 158
  • [45] Adaptive efficient analysis for big data ergodic diffusion models
    Galtchouk, Leonid I.
    Pergamenshchikov, Serge M.
    STATISTICAL INFERENCE FOR STOCHASTIC PROCESSES, 2022, 25 (01) : 127 - 158
  • [46] Optimal and Efficient Distributed Online Learning for Big Data
    Sayin, Muhammed O.
    Vanli, N. Denizcan
    Delibalta, Ibrahim
    Kozat, Suleyman S.
    2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 126 - 133
  • [47] A Novel Efficient Discriminative Projection Learning for Big Data
    Cui, Yan
    Jiang, Jielin
    Jiang, Xiaoyan
    Dai, Yue
    2022 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2022, : 658 - 665
  • [48] Big data transfer optimization through adaptive parameter tuning
    Arslan, Engin
    Pehlivan, Bahadir A.
    Kosar, Tevfik
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 120 : 89 - 100
  • [49] Efficient Learning in Adaptive Processing of Data Structures
    Siu-Yeung Cho
    Zheru Chi
    Zhiyong Wang
    Wan-Chi Siu
    Neural Processing Letters, 2003, 17 : 175 - 190
  • [50] Efficient learning in adaptive processing of data structures
    Cho, SY
    Chi, ZR
    Wang, ZY
    Siu, WC
    NEURAL PROCESSING LETTERS, 2003, 17 (02) : 175 - 190