A hybrid feature selection method for text classification using a feature-correlation-based genetic algorithm

Times cited: 0

Authors:

Lazhar Farek [1]

Amira Benaidja [3]

Affiliations:

[1] University of Guelma, Computer Science Department

[2] University of Setif 1, Computer Science Department

[3] Laboratory of Vision and Artificial Intelligence (LAVIA)

[4] Larbi Tebessi University

Source:

Soft Computing | 2024 / Vol. 28 / No. 23

Keywords:

Feature redundancy; High-dimensionality; Correlation analysis; Redundancy removal; Optimization

DOI:

10.1007/s00500-024-10386-x

Abstract:

This paper introduces a new hybrid method that addresses the problem of redundant and irrelevant features selected by filter-based methods for text classification. The method relies on an enhanced genetic algorithm called the “Feature Correlation-based Genetic Algorithm” (FC-GA). First, a filter-based method selects the feature subset with the highest classification accuracy. FC-GA then uses this subset to generate candidate solutions, taking into account the correlation between features that have similar classification weights and avoiding useless random solutions. In the encoding step, a feature is assigned a value of 0 if it is highly correlated with other features that carry almost the same classification information beyond a specified context, while weakly correlated features keep their initial code of 1. Through iterative optimization with crossover and mutation operators, the algorithm is intended to remove strongly correlated, highly redundant features, which can improve classification performance at a lower computational cost. The aims of this study are to improve the efficiency of filter-based methods, to incorporate feature-correlation information into genetic algorithms, and to use pre-optimized feature subsets to identify optimal solutions efficiently. To evaluate the proposed method, SVM and NB classifiers are applied to six public datasets, and the results are compared with those of five well-known and effective filter-based methods. The results indicate that a significant portion (about 50%) of the features selected by the reference filter-based methods are redundant, and that eliminating these redundant features significantly improves classification performance as measured by micro-F1.
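The abstract describes FC-GA only at a high level, so the following Python sketch shows one possible reading of its two stages under explicit assumptions. It is not the authors' implementation. Pearson correlation between feature columns stands in for the paper's correlation measure. The 0/1 encoding is a greedy pass over features in descending filter-weight order. The search step is a simple elitist GA with one-point crossover and bit-flip mutation whose population is seeded from the correlation-based chromosome. The names `correlation_encoding`, `evolve`, `corr_threshold`, `weight_tol`, and `toy_fitness` are hypothetical. `toy_fitness` is only a placeholder for the micro-F1 of an SVM or NB classifier on held-out data.

```python
import math
import random
from typing import Callable, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two feature columns; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_encoding(columns: List[List[float]], weights: List[float],
                         corr_threshold: float = 0.9,
                         weight_tol: float = 0.05) -> List[int]:
    """Assumed encoding: 1 = keep, 0 = redundant.

    Features are visited from highest to lowest filter weight. A feature is coded 0
    when it is strongly correlated with an already-kept feature whose filter weight
    is within `weight_tol` of its own; otherwise it keeps its initial code of 1.
    """
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    code = [1] * len(weights)
    kept: List[int] = []
    for i in order:
        redundant = any(
            abs(weights[i] - weights[k]) <= weight_tol
            and abs(pearson(columns[i], columns[k])) >= corr_threshold
            for k in kept
        )
        if redundant:
            code[i] = 0
        else:
            kept.append(i)
    return code


def evolve(seed: List[int], fitness: Callable[[List[int]], float],
           pop_size: int = 20, generations: int = 30,
           p_mut: float = 0.05, rng_seed: int = 0) -> List[int]:
    """Elitist GA with one-point crossover and bit-flip mutation (pop_size >= 4)."""
    rng = random.Random(rng_seed)
    n = len(seed)

    def mutate(chrom: List[int]) -> List[int]:
        # Flip each bit independently with probability p_mut.
        return [1 - g if rng.random() < p_mut else g for g in chrom]

    # Start from mutants of the correlation-informed chromosome, not random bit strings.
    pop = [seed[:]] + [mutate(seed) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            children.append(mutate(a[:cut] + b[cut:]))
        pop = parents + children
    return max(pop, key=fitness)


# Toy data: 4 feature columns over 6 documents; feature 1 is exactly 2x feature 0.
columns = [[1, 0, 2, 0, 1, 3], [2, 0, 4, 0, 2, 6], [0, 1, 0, 1, 0, 0], [1, 1, 0, 0, 1, 0]]
weights = [0.50, 0.48, 0.30, 0.10]


def toy_fitness(chrom: List[int]) -> float:
    # Placeholder for a validation micro-F1: reward filter weight, penalize keeping
    # the redundant pair (0, 1) together and, slightly, the subset size.
    score = sum(w * g for w, g in zip(weights, chrom))
    return score - 0.6 * (chrom[0] and chrom[1]) - 0.01 * sum(chrom)


seed = correlation_encoding(columns, weights)
print(seed)                       # [1, 0, 1, 1]
print(evolve(seed, toy_fitness))  # [1, 0, 1, 1]
```

This sketch seeds the population with mutants of the correlation-based chromosome instead of random bit strings. That choice reflects the abstract's stated goal of avoiding useless random solutions. In a real pipeline, each fitness call would train and score a classifier, and that cost would dominate the run time.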

Pages: 13567-13593

Number of pages: 26