A hybrid feature selection method for text classification using a feature-correlation-based genetic algorithm

Authors
Lazhar Farek [1]
Amira Benaidja [3]
Affiliations
[1] University of Guelma, Computer Science Department
[2] University of Setif 1, Computer Science Department
[3] Laboratory of Vision and Artificial Intelligence (LAVIA)
[4] Larbi Tebessi University
Keywords
Feature redundancy; High-dimensionality; Correlation analysis; Redundancy removal; Optimization
DOI
10.1007/s00500-024-10386-x
Abstract
This paper introduces a new hybrid method that addresses the problem of redundant and irrelevant features selected by filter-based methods for text classification. The method uses an enhanced genetic algorithm called the Feature Correlation-based Genetic Algorithm (FC-GA). First, a filter-based method selects the feature subset that yields the highest classification accuracy. FC-GA then uses this subset to generate candidate solutions, taking into account the correlation between features that have similar classification weights and thereby avoiding useless random solutions. In the encoding step, a feature is assigned a value of 0 if it is highly correlated (beyond a specified threshold) with other features that carry almost the same classification information, while weakly correlated features keep their initial value of 1. Through iterative optimization with crossover and mutation operators, the algorithm is designed to remove strongly correlated, highly redundant features, which can improve classification performance at a lower computational cost. The study aims to improve the efficiency of filter-based methods, incorporate feature-correlation information into genetic algorithms, and use pre-optimized feature subsets to identify optimal solutions efficiently. To evaluate the proposed method, SVM and NB classifiers are applied to six public datasets, and the results are compared with those of five well-known and effective filter-based methods. The results indicate that a significant portion (about 50%) of the features selected by the reference filter-based methods are redundant, and that eliminating these redundant features significantly improves classification performance as measured by micro-F1.
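The abstract describes FC-GA only at a high level, so the Python sketch below illustrates the general idea under loudly stated assumptions. The correlation rule (the hypothetical parameters `corr_threshold` and `weight_tol`, and keeping the higher-scored feature of a redundant pair), one-point crossover, bit-flip mutation, tournament selection and elitism are illustrative choices, not the authors' published operators. `fitness` is a placeholder callable where the paper evaluates SVM and NB classifiers by micro-F1; the demo data, filter scores and `toy_fitness` are invented for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Chromosome = List[int]  # 1 = keep feature, 0 = drop it


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two feature columns; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def correlation_encoding(columns: List[List[float]], scores: List[float],
                         corr_threshold: float = 0.9,
                         weight_tol: float = 0.05) -> Chromosome:
    """Hypothetical encoding: all features start at 1; walking from highest to
    lowest filter score, a kept feature sets to 0 any lower-scored feature whose
    score is within weight_tol and whose |correlation| >= corr_threshold."""
    order = sorted(range(len(columns)), key=lambda i: -scores[i])
    bits = [1] * len(columns)
    for pos, i in enumerate(order):
        if bits[i] == 0:
            continue  # a dropped feature does not eliminate others
        for j in order[pos + 1:]:
            if (bits[j] == 1 and abs(scores[i] - scores[j]) <= weight_tol
                    and abs(pearson(columns[i], columns[j])) >= corr_threshold):
                bits[j] = 0
    return bits


def mutate(bits: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Bit-flip mutation: each gene flips independently with probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """One-point crossover (an assumed operator choice)."""
    if len(a) < 2:
        return a[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def fc_ga(seed_bits: Chromosome, fitness: Callable[[Chromosome], float],
          pop_size: int = 20, generations: int = 30,
          mutation_rate: float = 0.1, rng_seed: int = 0) -> Chromosome:
    """Elitist GA whose population is built around seed_bits, not random strings."""
    rng = random.Random(rng_seed)
    population = [seed_bits[:]] + [mutate(seed_bits, mutation_rate, rng)
                                   for _ in range(pop_size - 1)]
    best = max(population, key=fitness)

    def tournament() -> Chromosome:
        a, b = rng.sample(population, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            child = crossover(tournament(), tournament(), rng)
            children.append(mutate(child, mutation_rate, rng))
        population = children
        best = max(population, key=fitness)
    return best


# Toy demo (made-up data, not from the paper): 4 features over 5 documents.
columns = [[1, 2, 3, 4, 5],
           [2, 4, 6, 8, 10],  # perfectly correlated with feature 0
           [5, 1, 4, 2, 3],
           [0, 1, 0, 1, 0]]
scores = [0.50, 0.48, 0.40, 0.10]  # hypothetical filter scores
gain = [0.9, 0.9, 0.5, 0.05]       # hypothetical per-feature usefulness


def toy_fitness(bits: Chromosome) -> float:
    """Stand-in for classifier micro-F1: reward useful features, charge 0.2 per
    kept feature, and penalize keeping both redundant features 0 and 1."""
    value = sum(g for g, b in zip(gain, bits) if b) - 0.2 * sum(bits)
    return value - 0.8 if bits[0] and bits[1] else value


seed = correlation_encoding(columns, scores)
print(seed)  # -> [1, 0, 1, 1]
best = fc_ga(seed, toy_fitness)
print(best, toy_fitness(best))
```

Seeding the population from the correlation-based encoding, rather than from random bit strings, mirrors the abstract's stated goal of avoiding useless random solutions; because of elitism, the returned subset scores at least as well as the seed under whatever fitness function is supplied.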
Pages: 13567-13593
Number of pages: 26