Symbolic regression as a feature engineering method for machine and deep learning regression tasks

Cited by: 2
Authors
Shmuel, Assaf [1]
Glickman, Oren [1]
Lazebnik, Teddy [2,3]
Affiliations
[1] Bar Ilan Univ, Dept Comp Sci, Ramat Gan, Israel
[2] Ariel Univ, Dept Math, Ariel, Israel
[3] UCL, Canc Inst, Dept Canc Biol, London, England
Keywords
symbolic regression; neural network; data-driven physics; feature engineering; data science; FEATURE-SELECTION; BIG DATA; MODEL
DOI
10.1088/2632-2153/ad513a
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In machine learning (ML) and deep learning (DL) regression tasks, effective feature engineering (FE) plays a pivotal role in model performance. Traditional FE approaches often rely on domain expertise to manually design features for ML models. In DL models, FE is embedded in the neural network's architecture, which makes it hard to interpret. In this study, we propose integrating symbolic regression (SR) as an FE step before an ML model to improve its performance. Through extensive experiments on synthetic datasets and 21 real-world datasets, we show that incorporating SR-derived features significantly enhances the predictive capabilities of both ML and DL regression models, improving root mean square error (RMSE) by 34%-86% on synthetic datasets and by 4%-11.5% on real-world datasets. In an additional realistic use case, we show that the proposed method improves ML performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results highlight the potential of SR as an FE component in data-driven models, improving both their performance and their interpretability.
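The abstract describes a two-stage pipeline: an SR step derives new features, and an ordinary regressor is then trained on the original inputs plus those features, with gains reported as RMSE improvement. The following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation. Instead of a real SR engine, it uses a hypothetical brute-force search over a tiny expression grammar (squares, sines, pairwise products) with greedy forward selection. It assumes a least-squares linear model as the downstream regressor and defines relative RMSE improvement as 100 * (RMSE_base - RMSE_SR) / RMSE_base. All function names and the synthetic target are illustrative.

```python
import itertools

import numpy as np

# ASSUMPTION: hypothetical stand-in for a real SR engine. It runs an exhaustive search
# over a toy expression grammar and then greedy forward selection.
# This is not the method or library used in the paper.


def candidate_expressions(X):
    """Yield (name, values) pairs for every expression in the toy grammar."""
    n_cols = X.shape[1]
    for i in range(n_cols):
        yield f"x{i}^2", X[:, i] ** 2
        yield f"sin(x{i})", np.sin(X[:, i])
    for i, j in itertools.combinations(range(n_cols), 2):
        yield f"x{i}*x{j}", X[:, i] * X[:, j]


def build_features(X, names):
    """Append the named expressions (recomputed on this split) to the raw inputs."""
    lookup = dict(candidate_expressions(X))
    return np.column_stack([X] + [lookup[name] for name in names])


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept column, solved by numpy.linalg.lstsq."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def train_rmse(X, y, names):
    """Training RMSE of the linear model on raw inputs plus the named expressions."""
    Z = build_features(X, names)
    return rmse(y, fit_predict(Z, y, Z))


def select_sr_features(X, y, k=2):
    """Greedily pick k expressions, each time the one that most lowers training RMSE."""
    all_names = [name for name, _ in candidate_expressions(X)]
    chosen = []
    for _ in range(k):
        remaining = [n for n in all_names if n not in chosen]
        chosen.append(min(remaining, key=lambda n: train_rmse(X, y, chosen + [n])))
    return chosen


# Synthetic target with an interaction and a nonlinearity that a linear model cannot express.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(400, 3))
y = 3.0 * X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + 0.05 * rng.normal(size=400)
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

names = select_sr_features(X_tr, y_tr, k=2)  # selection uses the training split only
base = rmse(y_te, fit_predict(X_tr, y_tr, X_te))
sr = rmse(y_te, fit_predict(build_features(X_tr, names), y_tr, build_features(X_te, names)))
improvement = 100.0 * (base - sr) / base  # relative RMSE improvement, in percent
print(names, f"baseline RMSE={base:.3f}", f"with SR features={sr:.3f}", f"improvement={improvement:.1f}%")
```

On this toy target, the selected product term x0*x1 lets the same linear regressor represent an interaction it otherwise misses. A practical SR tool would search a far larger expression space, for example with genetic programming, than this exhaustive toy grammar.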
Pages: 14