A hybrid feature selection method for text classification using a feature-correlation-based genetic algorithm

Cited by: 0

Authors:

Lazhar Farek ^{[1]}

Amira Benaidja ^{[3]}

Affiliations:

[1] University of Guelma,Computer Science Department

[2] University of Setif 1,Computer Science Department

[3] Laboratory of Vision and Artificial Intelligence - LAVIA

[4] Larbi Tebessi University

Source:

Soft Computing | 2024 / Vol. 28 / Issue 23

Keywords:

Feature redundancy; High-dimensionality; Correlation analysis; Redundancy removal; Optimization

DOI:

10.1007/s00500-024-10386-x

Abstract:

This paper introduces a new hybrid method to address the redundant and irrelevant features that filter-based methods select for text classification. The method uses an enhanced genetic algorithm called the "Feature Correlation-based Genetic Algorithm" (FC-GA). First, a filter-based method selects the feature subset with the highest classification accuracy. The FC-GA then uses this subset to generate candidate solutions, taking into account the correlation between features with similar classification weights and thereby avoiding useless random solutions. In the encoding step, a feature is assigned the value 0 if it is highly correlated with other features that carry almost the same classification information beyond a specified context, while weakly correlated features retain their initial code of 1. Through iterative optimization with crossover and mutation operators, the algorithm should remove strongly correlated, redundant features, which could improve classification performance at a lower computational cost. The aims of this study are to improve the efficiency of filter-based methods, to incorporate feature correlation information into genetic algorithms, and to use pre-optimized feature subsets to identify optimal solutions efficiently. To evaluate the proposed method, SVM and NB classifiers are applied to six public datasets and the results are compared with those of five well-known and effective filter-based methods. The results indicate that a significant portion (about 50%) of the features selected by the reference filter-based methods are redundant. Eliminating those redundant features significantly improves classification performance as measured by the micro-F1 measure.
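The encoding step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `seed_chromosome`, the use of absolute Pearson correlation, and the two thresholds (`corr_threshold`, `score_tol`) are all assumptions made for illustration, since the abstract does not specify the correlation measure or the exact redundancy criterion. The sketch only shows how a correlation-informed initial chromosome might be built from a filter-selected subset; the crossover and mutation stages are omitted.

```python
import numpy as np


def seed_chromosome(X, scores, corr_threshold=0.9, score_tol=0.05):
    """Build a 0/1 chromosome over filter-selected features (hypothetical helper).

    X      : (n_documents, n_features) term-weight matrix.
    scores : (n_features,) filter scores, e.g. from a filter-based method.
    A feature is coded 0 when it is highly correlated with a kept feature
    whose filter score is nearly the same; all other features keep code 1.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    # Absolute Pearson correlation between feature columns.
    # Note: constant columns would yield NaN, which never passes the threshold.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    mask = np.ones(X.shape[1], dtype=int)
    # Visit features from highest to lowest filter score.
    order = np.argsort(-scores, kind="stable")
    for pos, i in enumerate(order):
        if mask[i] == 0:
            continue
        for j in order[pos + 1:]:
            if mask[j] == 0:
                continue
            similar_weight = abs(scores[i] - scores[j]) <= score_tol
            if similar_weight and corr[i, j] >= corr_threshold:
                mask[j] = 0  # redundant with the better-scored feature i
    return mask


# Toy data: feature 1 duplicates feature 0 and has almost the same score.
X = np.array([
    [1.0, 1.0, 0.0, 3.0],
    [2.0, 2.0, 1.0, 0.0],
    [3.0, 3.0, 0.0, 1.0],
    [4.0, 4.0, 1.0, 2.0],
])
scores = np.array([0.80, 0.79, 0.50, 0.78])

print(seed_chromosome(X, scores))  # [1 0 1 1]
```

In a full genetic algorithm, a mask like this would serve as one starting individual rather than a purely random bit string, and a fitness function (for example, classifier accuracy on the kept features) would drive the later generations.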

Pages: 13567-13593

Page count: 26
