A Genetic Programming approach for feature selection in highly dimensional skewed data

Cited by: 58
Authors
Viegas, Felipe [2]
Rocha, Leonardo [1]
Goncalves, Marcos [2]
Mourao, Fernando [1]
Sa, Giovanni [1]
Salles, Thiago [2]
Andrade, Guilherme [2]
Sandin, Isac [1]
Affiliations
[1] Univ Fed Sao Joao del Rei, Dept Comp Sci, Sao Joao Del Rei, MG, Brazil
[2] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
Keywords
Feature selection; Classification; Genetic Programming
DOI
10.1016/j.neucom.2017.08.050
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High dimensionality, also known as the curse of dimensionality, is still a major challenge for automatic classification solutions. Accordingly, several feature selection (FS) strategies have been proposed for dimensionality reduction over the years. However, they can perform poorly in the face of unbalanced data. In this work, we propose a novel feature selection strategy based on Genetic Programming that is resilient to data skewness; in other words, it works well with both balanced and unbalanced data. The proposed strategy combines the most discriminative feature sets selected by distinct feature selection metrics in order to obtain a more effective and impartial set of the most discriminative features. It starts from the hypothesis that distinct feature selection metrics produce different (and potentially complementary) feature space projections. We evaluated our proposal on biological and textual datasets. Our experimental results show that the proposed solution not only increases the efficiency of the learning process, reducing the size of the data space by up to 83%, but also significantly increases its effectiveness in some scenarios. © 2017 Elsevier B.V. All rights reserved.
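The abstract does not state which operators, terminals, or fitness function the authors use. As a loose illustration only, the idea of combining feature sets chosen by distinct metrics can be sketched as an expression tree over sets. The metric scores, feature names, the `top_k` and `evaluate` helpers, and the choice of union, intersection, and difference as operators are all assumptions made for this sketch, not details from the paper.

```python
# Illustrative only: the scores, feature names, and set operators below are
# assumptions, not details taken from the paper.

def top_k(scores, k):
    """Return the set of the k highest-scoring feature names."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

# Hypothetical per-feature scores from two different selection metrics.
metric_a = {"f1": 0.9, "f2": 0.8, "f3": 0.1, "f4": 0.4}
metric_b = {"f1": 0.2, "f2": 0.7, "f3": 0.9, "f4": 0.3}

set_a = top_k(metric_a, 2)
set_b = top_k(metric_b, 2)

# Assumed combination operators that a GP individual could apply to sets.
OPS = {
    "union": set.union,
    "intersection": set.intersection,
    "difference": set.difference,
}

def evaluate(tree):
    """Recursively evaluate a (operator, left, right) tree into a feature set."""
    if isinstance(tree, (set, frozenset)):
        return set(tree)
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

# One candidate individual: features from metric A, plus those only metric B picked.
individual = ("union", set_a, ("difference", set_b, set_a))
selected = evaluate(individual)
print(sorted(selected))
```

In a full Genetic Programming loop, many such trees would be generated, scored by some fitness measure (for example, classifier performance on the selected features), and evolved through crossover and mutation; that loop is omitted here.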
Pages: 554 - 569
Number of pages: 16
Related papers
50 in total
  • [21] A Light Causal Feature Selection Approach to High-Dimensional Data
    Ling, Zhaolong
    Li, Ying
    Zhang, Yiwen
    Yu, Kui
    Zhou, Peng
    Li, Bo
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (08) : 7639 - 7650
  • [22] Genetic programming with active data selection
    Zhang, BT
    Cho, DY
    SIMULATED EVOLUTION AND LEARNING, 1999, 1585 : 146 - 153
  • [23] An Enhanced Hybrid Feature Selection Approach for High Dimensional Data Processing
    Lincy, Blessy Trencia
    Nagarajan, Suresh Kumar
    JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 4161 - 4176
  • [24] A Hybridization Approach for Optimal Feature Subset Selection in High Dimensional Data
    Sharmili, K. C.
    Chilambuchelvan, A.
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2018, 26 (06) : 949 - 970
  • [25] Multistage feature selection approach for high-dimensional cancer data
    Alkuhlani, Alhasan
    Nassef, Mohammad
    Farag, Ibrahim
    SOFT COMPUTING, 2017, 21 (22) : 6895 - 6906
  • [26] Multistage feature selection approach for high-dimensional cancer data
    Alhasan Alkuhlani
    Mohammad Nassef
    Ibrahim Farag
    Soft Computing, 2017, 21 : 6895 - 6906
  • [27] Genetic Programming for Feature Selection and Feature Combination in Salient Object Detection
    Afzali, Shima
    Al-Sahaf, Harith
    Xue, Bing
    Hollitt, Christopher
    Zhang, Mengjie
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2019, 2019, 11454 : 308 - 324
  • [28] Aggressive and Effective Feature Selection using Genetic Programming
    Sandin, Isac
    Andrade, Guilherme
    Viegas, Felipe
    Madeira, Daniel
    Rocha, Leonardo
    Salles, Thiago
    Goncalves, Marcos
    2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012,
  • [29] Genetic programming for simultaneous feature selection and classifier design
    Muni, DP
    Pal, NR
    Das, J
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2006, 36 (01) : 106 - 117
  • [30] Feature selection for speaker verification using genetic programming
    Loughran R.
    Agapitos A.
    Kattan A.
    Brabazon A.
    O’Neill M.
    Evolutionary Intelligence, 2017, 10 (1-2) : 1 - 21