A hybrid feature selection method for text classification using a feature-correlation-based genetic algorithm

Cited by: 0

Authors:

Lazhar Farek [1]

Amira Benaidja [3]

Affiliations:

[1] University of Guelma, Computer Science Department

[2] University of Setif 1, Computer Science Department

[3] Laboratory of Vision and Artificial Intelligence - LAVIA

[4] Larbi Tebessi University

Source:

Soft Computing | 2024 / Vol. 28 / Issue 23

Keywords:

Feature redundancy; High-dimensionality; Correlation analysis; Redundancy removal; Optimization

DOI:

10.1007/s00500-024-10386-x

Abstract:

This paper introduces a new hybrid method to address the redundant and irrelevant features selected by filter-based methods for text classification. The method uses an enhanced genetic algorithm called the “Feature Correlation-based Genetic Algorithm” (FC-GA). First, a filter-based method selects the feature subset with the highest classification accuracy. FC-GA then uses this subset to generate potential solutions by considering the correlation between features that have similar classification weights, avoiding useless random solutions. The encoding process assigns a value of 0 to features that are highly correlated with other features carrying almost the same classification information beyond a specified context, while weakly correlated features retain their initial code of 1. Through iterative optimization using crossover and mutation operators, the algorithm is intended to remove strongly correlated, highly redundant features, which could improve classification performance at a lower computational cost. The aims of this study are to improve the efficiency of filter-based methods, incorporate feature correlation information into genetic algorithms, and use pre-optimized feature subsets to identify optimal solutions efficiently. To evaluate the proposed method, SVM and NB classifiers are applied to six public datasets, and the method is compared with five well-known and effective filter-based methods. The results indicate that a significant portion (about 50%) of the features selected by the reference filter-based methods are redundant. Eliminating those redundant features leads to a significant improvement in classification performance as measured by the micro-F1 measure.
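
The encoding and optimization steps described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the correlation measure, the thresholds, the operators' details, or the fitness function, so the names `seed_chromosome` and `evolve`, the use of absolute Pearson correlation, the parameters `corr_threshold` and `weight_tol`, and the elitist single-point-crossover loop are all assumptions made for illustration. The fitness function is supplied by the caller; in the paper's setting it would presumably be a classifier score such as micro-F1.

```python
import numpy as np

def seed_chromosome(X, weights, corr_threshold=0.9, weight_tol=0.05):
    """Build a 0/1 mask over filter-selected features (hypothetical encoding).

    A feature gets 0 if it is highly correlated with an already-kept feature
    whose filter weight is nearly the same; otherwise it keeps its code of 1.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    weights = np.asarray(weights, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # |Pearson r| between columns
    mask = np.ones(X.shape[1], dtype=int)
    kept = []
    # Visit features from highest to lowest weight so the stronger member
    # of a redundant pair is the one that survives.
    for j in np.argsort(-weights, kind="stable"):
        redundant = any(
            corr[j, k] >= corr_threshold
            and abs(weights[j] - weights[k]) <= weight_tol
            for k in kept
        )
        if redundant:
            mask[j] = 0
        else:
            kept.append(j)
    return mask

def evolve(seed, fitness, generations=20, pop_size=10, mutation_rate=0.05, rng=None):
    """Minimal elitist GA started from the seed mask instead of random masks.

    Assumes pop_size >= 4 and at least two features.
    """
    rng = np.random.default_rng(rng)
    n = seed.size

    def mutate(individual):
        flips = rng.random(n) < mutation_rate  # bit-flip mutation
        return np.where(flips, 1 - individual, individual)

    # Initial population: the seed itself plus lightly mutated copies.
    population = [seed.copy()] + [mutate(seed) for _ in range(pop_size - 1)]
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(-scores)
        parents = [population[i] for i in order[: pop_size // 2]]
        children = [parents[0].copy()]  # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            point = rng.integers(1, n)  # single-point crossover
            child = np.concatenate([parents[a][:point], parents[b][point:]])
            children.append(mutate(child))
        population = children
    scores = np.array([fitness(ind) for ind in population])
    return population[int(np.argmax(scores))]
```

Because the seed is part of the initial population and the best individual is carried over each generation, the returned mask never scores worse than the seed under a deterministic fitness function. Seeding the search this way, rather than from random masks, is the part of the sketch that corresponds to the abstract's point about avoiding useless random solutions.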

Pages: 13567-13593

Page count: 26
