Genetic Programming for Feature Selection Based on Feature Removal Impact in High-Dimensional Symbolic Regression

Cited by: 4
Authors
Al-Helali, Baligh [1, 2]
Chen, Qi [1, 2]
Xue, Bing [1, 2]
Zhang, Mengjie [1, 2]
Affiliations
[1] Victoria University of Wellington, Centre for Data Science and Artificial Intelligence, Wellington 6140, New Zealand
[2] Victoria University of Wellington, School of Engineering and Computer Science, Wellington 6140, New Zealand
Keywords
Feature selection; genetic programming; high dimensionality; symbolic regression; feature ranking; classification; evolutionary
DOI
10.1109/TETCI.2024.3369407
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Symbolic regression (SR) is increasingly important for discovering mathematical models for various prediction tasks. It works by searching for the arithmetic expressions that best represent a target variable using a set of input features. However, as the number of features increases, the search becomes more complex. To address high-dimensional symbolic regression, this work proposes a genetic programming (GP)-based feature selection method that measures the impact of removing a feature on the performance of SR models. Unlike existing Shapley value methods, which simulate feature absence at the data level, the proposed approach removes features at the model level. This avoids producing unrealistic data instances, a major limitation of Shapley value and permutation-based methods. Moreover, after the feature importances are calculated, a cut-off strategy is proposed for selecting the important features: it injects a number of random features and uses their importance to set a threshold automatically. Experimental results on artificial and real-world high-dimensional data sets show that, compared with state-of-the-art feature selection methods based on permutation importance and Shapley values, the proposed method not only improves SR accuracy but also selects smaller feature sets.
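The two ideas in the abstract, removal-based importance at the model level and a threshold taken from injected random features, can be illustrated with a minimal sketch. It assumes that "removing a feature at the model level" means replacing every terminal of that feature in an evolved expression tree with the constant 0 and measuring the resulting increase in training MSE, averaged over several models. It also assumes that the cut-off is the largest importance among the injected random features. The tuple tree encoding, the function names, and both of these choices are hypothetical illustrations, not the authors' implementation; the two hand-written trees stand in for GP-evolved models.

```python
import numpy as np

# Hypothetical tree encoding: ("x", i) is feature i, ("c", v) is a constant,
# (op, left, right) is a binary node with op in {"add", "sub", "mul", "div"}.
def evaluate(tree, X):
    kind = tree[0]
    if kind == "x":
        return X[:, tree[1]]
    if kind == "c":
        return np.full(X.shape[0], float(tree[1]))
    left, right = evaluate(tree[1], X), evaluate(tree[2], X)
    if kind == "add":
        return left + right
    if kind == "sub":
        return left - right
    if kind == "mul":
        return left * right
    if kind == "div":
        # Protected division, common in GP: output 1 where the denominator is ~0
        safe = np.abs(right) > 1e-9
        return np.where(safe, left / np.where(safe, right, 1.0), 1.0)
    raise ValueError(f"unknown node {kind!r}")

def remove_feature(tree, i, value=0.0):
    """Copy of the tree with every terminal x_i replaced by a constant.
    Assumed model-level removal: the data set itself is never modified."""
    if tree[0] == "x":
        return ("c", value) if tree[1] == i else tree
    if tree[0] == "c":
        return tree
    return (tree[0], remove_feature(tree[1], i, value), remove_feature(tree[2], i, value))

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

def removal_importance(models, X, y, n_features):
    """Average increase in MSE over the models when each feature is removed."""
    imp = np.zeros(n_features)
    for tree in models:
        base = mse(evaluate(tree, X), y)
        for i in range(n_features):
            imp[i] += mse(evaluate(remove_feature(tree, i), X), y) - base
    return imp / len(models)

def select_with_random_cutoff(importance, n_real):
    """Columns >= n_real are injected random features; their largest
    importance is used as the threshold (an assumed choice)."""
    threshold = importance[n_real:].max()
    selected = [i for i in range(n_real) if importance[i] > threshold]
    return selected, threshold

rng = np.random.default_rng(0)
n, n_real, n_rand = 200, 4, 2
X_real = rng.normal(size=(n, n_real))
# Inject random features as columns 4 and 5
X = np.hstack([X_real, rng.normal(size=(n, n_rand))])
y = 2 * X[:, 0] + X[:, 1] * X[:, 2]  # x3 is irrelevant to the target
# Hand-written stand-ins for evolved SR models, each with a small spurious term
m1 = ("add", ("mul", ("c", 2.0), ("x", 0)),
      ("add", ("mul", ("x", 1), ("x", 2)), ("mul", ("c", 0.05), ("x", 4))))
m2 = ("add", ("add", ("x", 0), ("x", 0)),
      ("add", ("mul", ("x", 1), ("x", 2)), ("mul", ("c", 0.05), ("x", 3))))
imp = removal_importance([m1, m2], X, y, n_real + n_rand)
selected, thr = select_with_random_cutoff(imp, n_real)
print(selected)  # [0, 1, 2]
```

Removing the spurious terms for x3 and x4 lowers the error, so their importance is negative. Feature x5 never appears in either model, so its importance is exactly 0, and that value becomes the threshold. Only x0, x1 and x2 exceed it. Because removal edits the expression instead of the data, no synthetic data instances are created, which is the property the abstract contrasts with permutation and Shapley value methods.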
Pages: 2269-2282
Number of pages: 14