Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification

Cited by: 1
Authors
Ige, Oluwaseun Peter [1, 2]
Gan, Keng Hoon [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Gelugor 11800, Malaysia
[2] Universal Basic Educ Commiss, Abuja 900284, Nigeria
Source
Keywords
Metaheuristic algorithms; text classification; multi-univariate filter feature selection; ensemble filter-wrapper techniques; bee colony algorithm; optimization
DOI
10.32604/cmes.2024.053373
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Feature selection is a crucial technique in text classification: by reducing the dataset's dimensionality, it improves the efficiency and effectiveness of classifiers and other machine learning techniques. It eliminates irrelevant, redundant, and noisy features to streamline the classification process. Methods in the literature range from single feature selection techniques to ensemble filter-wrapper methods. Metaheuristic algorithms have become popular because they can handle the complexity of the optimization problem and the continuous influx of text documents. Feature selection is inherently multi-objective: it must increase feature relevance and accuracy while reducing redundant features. This research therefore pursues two objectives. The first is to identify the top-ranked features using an ensemble of three multi-univariate filter methods, Information Gain (Infogain), Chi-Square (χ²), and Analysis of Variance (ANOVA), which aims to maximize feature relevance while minimizing redundancy. The second is to reduce the number of selected features and increase accuracy through a hybrid of Artificial Bee Colony (ABC) and Genetic Algorithms (GA), which operates in a wrapper framework to identify the most informative subset of text features. A Support Vector Machine (SVM) served as the performance evaluator for the proposed model, which was tested on two high-dimensional multiclass datasets. The experimental results indicate that the ensemble filter combined with the ABC+GA hybrid is a promising solution for text feature selection, outperforming the existing feature selection algorithms it was compared against.
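The abstract does not say how the three filter rankings are merged, so the following is only a minimal sketch of one plausible ensemble step, assuming mean-rank aggregation. The function names, the example terms, and all score values are hypothetical; in practice the scores would come from Information Gain, Chi-Square, and ANOVA computed on the document-term matrix, and the selected features would then be passed to the ABC+GA wrapper stage.

```python
from statistics import mean


def ranks(scores):
    """Map each feature to its rank under one filter (1 = highest score)."""
    # Ties are broken alphabetically so the result is deterministic.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}


def ensemble_top_k(filter_scores, k):
    """Return the k features with the best mean rank across all filters.

    Mean-rank aggregation is an assumption made for illustration; the
    paper may combine the filters differently.
    """
    per_filter = [ranks(scores) for scores in filter_scores.values()]
    features = sorted(per_filter[0])
    mean_rank = {f: mean(r[f] for r in per_filter) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical filter scores for four terms (not taken from the paper).
filter_scores = {
    "infogain": {"ball": 0.40, "game": 0.35, "the": 0.01, "vote": 0.30},
    "chi2": {"ball": 12.0, "game": 15.0, "the": 0.2, "vote": 9.0},
    "anova": {"ball": 8.0, "game": 7.5, "the": 0.1, "vote": 9.5},
}

selected = ensemble_top_k(filter_scores, k=2)
print(selected)  # ['ball', 'game']
```

Ranks are aggregated rather than raw scores because the three filters produce values on different scales, so a direct average would let the largest-valued filter dominate.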
Pages: 1847 - 1865
Number of pages: 19
Related papers
50 in total
  • [31] A Multi-objective hybrid filter-wrapper evolutionary approach for feature selection
    Hammami, Marwa
    Bechikh, Slim
    Hung, Chih-Cheng
    Ben Said, Lamjed
    MEMETIC COMPUTING, 2019, 11 (02) : 193 - 208
  • [32] A Multi-objective hybrid filter-wrapper evolutionary approach for feature selection
    Marwa Hammami
    Slim Bechikh
    Chih-Cheng Hung
    Lamjed Ben Said
    Memetic Computing, 2019, 11 : 193 - 208
  • [33] A new hybrid filter-wrapper feature selection method for clustering based on ranking
    Solorio-Fernandez, Saul
    Ariel Carrasco-Ochoa, J.
    Fco. Martinez-Trinidad, Jose
    NEUROCOMPUTING, 2016, 214 : 866 - 880
  • [34] A hybrid ensemble-filter wrapper feature selection approach for medical data classification
    Singh, Namrata
    Singh, Pradeep
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2021, 217
  • [35] Enhanced Filter Feature Selection Methods for Arabic Text Categorization
    Ghareb, Abdullah Saeed
    Abu Bakar, Azuraliza
    Al-Radaideh, Qasem A.
    Hamdan, Abdul Razak
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2018, 8 (02) : 1 - 24
  • [36] Robustness and Predictive Performance of Homogeneous Ensemble Feature Selection in Text Classification
    Mehta, Poornima
    Chandra, Satish
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2021, 11 (01) : 75 - 89
  • [37] A novel filter feature selection method for text classification: Extensive Feature Selector
    Parlak, Bekir
    Uysal, Alper Kursat
    JOURNAL OF INFORMATION SCIENCE, 2023, 49 (01) : 59 - 78
  • [38] Feature Selection Methods: Case of Filter and Wrapper Approaches for Maximising Classification Accuracy
    Wah, Yap Bee
    Ibrahim, Nurain
    Hamid, Hamzah Abdul
    Abdul-Rahman, Shuzlina
    Fong, Simon
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2018, 26 (01) : 329 - 339
  • [39] Application of a GA/Bayesian filter-wrapper feature selection method to classification of clinical depression from speech data
    Torres, Juan
    Saad, Ashraf
    Moore, Elliot
    SOFT COMPUTING IN INDUSTRIAL APPLICATIONS: RECENT AND EMERGING METHODS AND TECHNIQUES, 2007, 39 : 115 - +
  • [40] Feature selection methods for text classification: a systematic literature review
    Pintas, Julliano Trindade
    Fernandes, Leandro A. F.
    Garcia, Ana Cristina Bicharra
    ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (08) : 6149 - 6200