A hybrid feature selection method for text classification using a feature-correlation-based genetic algorithm
L. Farek, A. Benaidja

Cited by: 0
Authors
Lazhar Farek [1]
Amira Benaidja [3]
Affiliations
[1] University of Guelma, Computer Science Department
[2] University of Setif 1, Computer Science Department
[3] Laboratory of Vision and Artificial Intelligence (LAVIA)
[4] Larbi Tebessi University
Keywords
Feature redundancy; High-dimensionality; Correlation analysis; Redundancy removal; Optimization
DOI
10.1007/s00500-024-10386-x
Abstract
This paper introduces a new hybrid method to address the problem of redundant and irrelevant features selected by filter-based methods for text classification. The method uses an enhanced genetic algorithm called the “Feature Correlation-based Genetic Algorithm” (FC-GA). First, a filter-based method selects the feature subset with the highest classification accuracy. FC-GA then uses this subset to generate candidate solutions, taking into account the correlation between features that have similar classification weights and avoiding useless random solutions. In the encoding step, a feature is assigned the value 0 if it is highly correlated with other features that carry almost the same classification information beyond a specified context, while weakly correlated features retain their initial code of 1. Through iterative optimization with crossover and mutation operators, the algorithm is intended to remove strongly correlated, highly redundant features, which can improve classification performance at a lower computational cost. The study aims to improve the efficiency of filter-based methods, incorporate feature-correlation information into genetic algorithms, and use pre-optimized feature subsets to identify optimal solutions efficiently. To evaluate the proposed method, SVM and NB classifiers are applied to six public datasets, and the method is compared against five well-known and effective filter-based methods. The results indicate that a significant portion (about 50%) of the features selected by the reference filter-based methods are redundant, and that eliminating these redundant features significantly improves classification performance as measured by micro-F1.
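The abstract describes the FC-GA pipeline only at a high level. The Python sketch below is one illustrative, stdlib-only reading of it, not the authors' implementation. Every name in it is hypothetical, and several choices are assumptions: pairwise Pearson correlation as the correlation measure, the `corr_threshold` and `weight_tol` parameters, a greedy rule that keeps the first feature of each redundant pair, truncation selection with elitism, and `toy_fitness`, a made-up stand-in for the SVM/NB micro-F1 score the paper actually optimizes.

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]  # one bit per feature from the filter-selected subset


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def correlation_aware_chromosome(columns: Sequence[Sequence[float]], weights: Sequence[float],
                                 corr_threshold: float = 0.9, weight_tol: float = 0.05) -> Chromosome:
    """Start every feature at 1; set a feature to 0 when an earlier feature still coded 1
    has a filter weight within weight_tol and |correlation| >= corr_threshold (assumed rule)."""
    genes = [1] * len(columns)
    for j in range(len(columns)):
        for i in range(j):
            if (genes[i] == 1
                    and abs(weights[i] - weights[j]) <= weight_tol
                    and abs(pearson(columns[i], columns[j])) >= corr_threshold):
                genes[j] = 0  # treat feature j as a redundant copy of feature i
                break
    return genes


def one_point_crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two parents after a random cut point."""
    if len(a) < 2:
        return a[:], b[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(genes: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genes]


def fc_ga(columns, weights, fitness: Callable[[Chromosome], float], pop_size: int = 10,
          generations: int = 20, mutation_rate: float = 0.1, seed: int = 0) -> Chromosome:
    """Hypothetical FC-GA loop: correlation-seeded population, crossover, mutation, elitism."""
    rng = random.Random(seed)
    start = correlation_aware_chromosome(columns, weights)
    # Seed the population from the correlation-aware chromosome instead of uniform random bits.
    population = [start] + [bit_flip_mutation(start, mutation_rate, rng) for _ in range(pop_size - 1)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, pop_size // 2)]  # truncation selection (assumed)
        children = [ranked[0]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            for child in one_point_crossover(a, b, rng):
                children.append(bit_flip_mutation(child, mutation_rate, rng))
        population = children[:pop_size]
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(genes: Chromosome, columns, weights) -> float:
    """Hypothetical stand-in for classifier micro-F1: filter relevance minus a redundancy penalty."""
    chosen = [j for j, g in enumerate(genes) if g]
    if not chosen:
        return -1.0
    relevance = sum(weights[j] for j in chosen)
    redundancy = sum(abs(pearson(columns[i], columns[j]))
                     for k, i in enumerate(chosen) for j in chosen[k + 1:])
    return relevance - 0.5 * redundancy


if __name__ == "__main__":
    # Feature 1 is an exact multiple of feature 0; feature 2 is only moderately correlated.
    cols = [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [3, 1, 4, 1, 5, 9]]
    w = [0.50, 0.52, 0.49]
    print(correlation_aware_chromosome(cols, w))
    print(fc_ga(cols, w, lambda g: toy_fitness(g, cols, w)))
```

In this toy data, the second feature duplicates the first and has almost the same filter weight, so the correlation-aware encoding codes it 0 before any evolution starts. Seeding the population from that chromosome, rather than from random bit strings, is how this sketch interprets "avoiding useless random solutions". In a real experiment, `toy_fitness` would be replaced by a cross-validated classifier score.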
Pages: 13567-13593
Number of pages: 26
Related papers
  • [11] A hybrid feature selection method for text categorization
    Montanes, E.
    Quevedo, J. R.
    Combarro, E. F.
    Diaz, I.
    Ranilla, J.
    International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2007, 15(02): 133-151
  • [12] Feature Selection For Text Classification Using Genetic Algorithms
    Bidi, Noria
    Elberrichi, Zakaria
    Proceedings of 2016 8th International Conference on Modelling, Identification & Control (ICMIC 2016), 2016: 806-810
  • [13] A Hybrid Feature Selection Method for Classification Purposes
    Cateni, Silvia
    Colla, Valentina
    Vannucci, Marco
    UKSim-AMSS Eighth European Modelling Symposium on Computer Modelling and Simulation (EMS 2014), 2014: 39-44
  • [14] A New Filter Feature Selection Method for Text Classification
    Cekik, Rasim
    IEEE Access, 2024, 12: 139316-139335
  • [15] A parallel feature selection method study for text classification
    Li, Zhao
    Lu, Wei
    Sun, Zhanquan
    Xing, Weiwei
    Neural Computing & Applications, 2017, 28: S513-S524
  • [16] Statera: A Balanced Feature Selection Method for Text Classification
    Gama Bispo, Braian Varjao
    Rios, Tatiane Nogueira
    2018 7th Brazilian Conference on Intelligent Systems (BRACIS), 2018: 260-265
  • [18] A novel probabilistic feature selection method for text classification
    Uysal, Alper Kursat
    Gunal, Serkan
    Knowledge-Based Systems, 2012, 36: 226-235
  • [19] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    2016 24th Signal Processing and Communication Application Conference (SIU), 2016: 1777-1780
  • [20] A New Feature Selection Method for Text Classification Based on Independent Feature Space Search
    Liu, Yong
    Ju, Shenggen
    Wang, Junfeng
    Su, Chong
    Mathematical Problems in Engineering, 2020, 2020