A hybrid feature selection method for text classification using a feature-correlation-based genetic algorithm

Authors
Lazhar Farek [1]
Amira Benaidja [3]
Affiliations
[1] University of Guelma, Computer Science Department
[2] University of Setif 1, Computer Science Department
[3] Laboratory of Vision and Artificial Intelligence - LAVIA
[4] Larbi Tebessi University
Keywords
Feature redundancy; High-dimensionality; Correlation analysis; Redundancy removal; Optimization;
DOI
10.1007/s00500-024-10386-x
Abstract
This paper introduces a hybrid method that addresses the redundant and irrelevant features selected by filter-based methods for text classification. The method uses an enhanced genetic algorithm called the “Feature Correlation-based Genetic Algorithm” (FC-GA). First, a filter-based method selects the feature subset with the highest classification accuracy. FC-GA then uses this subset to generate candidate solutions, taking into account the correlation between features with similar classification weights and avoiding useless random solutions. In the encoding step, a feature is assigned the value 0 if it is highly correlated with other features that carry almost the same classification information beyond a specified context; weakly correlated features retain their initial code of 1. Through iterative optimization with crossover and mutation operators, the algorithm is intended to remove strongly correlated, highly redundant features, which could improve classification performance at a lower computational cost. The study aims to improve the efficiency of filter-based methods, incorporate feature correlation information into genetic algorithms, and use pre-optimized feature subsets to identify optimal solutions efficiently. To evaluate the method, SVM and NB classifiers are applied to six public datasets, and the results are compared with five well-known, effective filter-based methods. The results indicate that a significant portion (about 50%) of the features selected by the reference filter-based methods are redundant. Eliminating those redundant features leads to a significant improvement in classification performance as measured by micro-F1.
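The encoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the correlation measure or the thresholds, so the use of Pearson correlation, the greedy highest-score-first ordering, and the names `initial_chromosome`, `corr_threshold`, and `score_tolerance` are all assumptions made for illustration. The sketch builds one starting chromosome from a filter-selected subset, setting a bit to 0 when a feature is strongly correlated with an already kept feature of similar filter score.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def initial_chromosome(
    columns: Sequence[Sequence[float]],
    scores: Sequence[float],
    corr_threshold: float = 0.9,    # assumed cut-off for "highly correlated"
    score_tolerance: float = 0.05,  # assumed cut-off for "similar classification weight"
) -> List[int]:
    """Return one bit per feature: 1 = keep, 0 = flagged as redundant."""
    bits = [1] * len(columns)
    # Visit features from highest to lowest filter score, so the
    # higher-scoring member of a correlated pair is the one that is kept.
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    for pos, i in enumerate(order):
        if bits[i] == 0:
            continue
        for j in order[pos + 1:]:
            if bits[j] == 0:
                continue
            similar = abs(scores[i] - scores[j]) <= score_tolerance
            if similar and abs(pearson(columns[i], columns[j])) >= corr_threshold:
                bits[j] = 0
    return bits


# Toy data: feature 1 is a scaled copy of feature 0; feature 2 is only
# moderately correlated with feature 0.
columns = [[1, 0, 1, 1, 0], [2, 0, 2, 2, 0], [0, 1, 1, 0, 1]]
scores = [0.80, 0.78, 0.79]  # hypothetical filter scores
print(initial_chromosome(columns, scores))
```

In a full genetic algorithm, a chromosome like this would seed the population in place of purely random bit strings, and crossover and mutation would then search around it; the fitness function and operators are not specified in the abstract and are left out here.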
Pages: 13567-13593
Page count: 26
Related papers
50 in total
  • [21] Hybrid ACO and TOFA Feature Selection Approach for Text Classification
    Alghamdi, Hanan S.
    Tang, H. Lilian
    Alshomrani, Saleh
    2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012,
  • [22] A novel filter feature selection method for text classification: Extensive Feature Selector
    Parlak, Bekir
    Uysal, Alper Kursat
    JOURNAL OF INFORMATION SCIENCE, 2023, 49 (01) : 59 - 78
  • [23] Research on Feature Selection and kNN Classification Method in Chinese Text Classification
    Xiao Chao
    Wu Ping
    PROCEEDINGS OF THE 2015 4TH NATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER ENGINEERING ( NCEECE 2015), 2016, 47 : 956 - 962
  • [25] A feature selection method based on synonym merging in text classification system
    Yao, Haipeng
    Liu, Chong
    Zhang, Peiying
    Wang, Luyao
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2017,
  • [26] An improved method of feature selection based on concept attributes in text classification
    Liao, SS
    Jiang, MH
    ADVANCES IN NATURAL COMPUTATION, PT 1, PROCEEDINGS, 2005, 3610 : 1140 - 1149
  • [27] Intelligent Feature Selection Using Hybrid Based Feature Selection Method
    Nisar, Shibli
    Tariq, Muhammad
    2016 SIXTH INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING TECHNOLOGY (INTECH), 2016, : 168 - 172
  • [28] Feature selection using hybrid poor and rich optimization algorithm for text classification
    Thirumoorthy, K.
    Muneeswaran, K.
    PATTERN RECOGNITION LETTERS, 2021, 147 : 63 - 70
  • [29] A feature selection method to handle imbalanced data in text classification
    Chang, Fengxiang
    Guo, Jun
    Xu, Weiran
    Yao, Kejun
    Journal of Digital Information Management, 2015, 13 (03): : 169 - 175
  • [30] A Review on Feature Selection and Feature Extraction for Text Classification
    Shah, Foram P.
    Patel, Vibha
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016, : 2264 - 2268