A Genetic Programming approach for feature selection in highly dimensional skewed data

Cited by: 58
Authors
Viegas, Felipe [2 ]
Rocha, Leonardo [1 ]
Goncalves, Marcos [2 ]
Mourao, Fernando [1 ]
Sa, Giovanni [1 ]
Salles, Thiago [2 ]
Andrade, Guilherme [2 ]
Sandin, Isac [1 ]
Affiliations
[1] Univ Fed Sao Joao del Rei, Dept Comp Sci, Sao Joao Del Rei, MG, Brazil
[2] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
Keywords
Feature selection; Classification; Genetic Programming
DOI
10.1016/j.neucom.2017.08.050
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High dimensionality, also known as the curse of dimensionality, is still a major challenge for automatic classification solutions. Accordingly, several feature selection (FS) strategies have been proposed for dimensionality reduction over the years. However, they can perform poorly in the face of unbalanced data. In this work, we propose a novel feature selection strategy based on Genetic Programming that is resilient to data skewness; in other words, it works well with both balanced and unbalanced data. The proposed strategy combines the most discriminative feature sets selected by distinct feature selection metrics to obtain a more effective and impartial set of the most discriminative features, starting from the hypothesis that distinct feature selection metrics produce different (and potentially complementary) feature space projections. We evaluated our proposal on biological and textual datasets. Our experimental results show that the proposed solution not only increases the efficiency of the learning process, reducing the size of the data space by up to 83%, but also significantly increases its effectiveness in some scenarios. (C) 2017 Elsevier B.V. All rights reserved.
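The hypothesis in the abstract, that distinct feature selection metrics pick different and potentially complementary feature sets, can be illustrated with a small sketch. This is not the authors' Genetic Programming method. It only scores a toy skewed dataset with two metrics and combines the resulting top-k sets with plain set operators. The dataset, the second metric (`rate_gap`), the helper names and the choice of union and intersection as combination operators are all assumptions made for illustration.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def info_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the labels."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(column):
        subset = [y for x, y in zip(column, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def rate_gap(column, labels, minority=1):
    """Hypothetical second metric: |P(x=1 | minority) - P(x=1 | rest)|."""
    pos = [x for x, y in zip(column, labels) if y == minority]
    neg = [x for x, y in zip(column, labels) if y != minority]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def top_k(scores, k):
    """Indices of the k highest-scoring features, as a set."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])

# Toy skewed data (assumed): 8 majority samples (0), 2 minority samples (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
columns = [
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # feature 0: strong under both metrics
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],  # feature 1: rare but pure
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],  # feature 2: covers the minority, noisier
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],  # feature 3: uninformative
]

by_gain = top_k([info_gain(c, labels) for c in columns], k=2)
by_gap = top_k([rate_gap(c, labels) for c in columns], k=2)

# The two metrics agree on feature 0 but disagree on the second pick.
print(sorted(by_gain | by_gap))  # union: [0, 1, 2]
print(sorted(by_gain & by_gap))  # intersection: [0]
```

On this toy data the two metrics share only one of their top-2 features, so the union keeps every feature that at least one metric considers discriminative, while the intersection keeps only the one both agree on. A search procedure such as Genetic Programming could, in principle, explore many such combinations instead of fixing one by hand.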
Pages: 554-569
Page count: 16