Ensemble Filter-Wrapper Text Feature Selection Methods for Text Classification

Cited by: 1
Authors
Ige, Oluwaseun Peter [1, 2]
Gan, Keng Hoon [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Gelugor 11800, Malaysia
[2] Universal Basic Educ Commiss, Abuja 900284, Nigeria
Keywords
Metaheuristic algorithms; text classification; multi-univariate filter feature selection; ensemble filter-wrapper techniques; bee colony algorithm; optimization
DOI
10.32604/cmes.2024.053373
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Feature selection is a crucial technique in text classification: by reducing the dimensionality of the dataset, it improves the efficiency and effectiveness of classifiers and other machine learning techniques. It does so by eliminating irrelevant, redundant, and noisy features, which streamlines the classification process. Methods used in the literature range from single feature selection techniques to ensemble filter-wrapper methods. Metaheuristic algorithms have become popular because they can handle the complexity of the optimization problem and the continuous influx of text documents. Feature selection is inherently multi-objective: it must balance higher feature relevance and accuracy against fewer redundant features. This research therefore pursues two objectives. The first is to identify the top-ranked features using an ensemble of three multi-univariate filter methods, Information Gain (Infogain), Chi-Square (χ²), and Analysis of Variance (ANOVA), with the aim of maximizing feature relevance while minimizing redundancy. The second is to reduce the number of selected features and increase accuracy through a hybrid of the Artificial Bee Colony (ABC) algorithm and Genetic Algorithms (GA), which operates in a wrapper framework to identify the most informative subset of text features. A Support Vector Machine (SVM) served as the performance evaluator for the proposed model, which was tested on two high-dimensional multiclass datasets. The experimental results show that the ensemble filter combined with the ABC+GA hybrid is a promising solution for text feature selection, offering better performance than other existing feature selection algorithms.
Pages: 1847-1865
Number of pages: 19
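
The first stage described in the abstract, an ensemble of Information Gain, Chi-Square, and ANOVA filters, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not say how the three rankings are combined, so the mean-rank aggregation below is an assumption, as are the function names (`info_gain`, `chi_square`, `anova_f`, `ensemble_top_k`) and the toy binary term-presence matrix.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(col, y):
    """Information gain of a discrete feature column with respect to labels y."""
    groups = defaultdict(list)
    for v, label in zip(col, y):
        groups[v].append(label)
    cond = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return entropy(y) - cond

def chi_square(col, y):
    """Pearson chi-square statistic of the (feature value x class) table."""
    n = len(y)
    obs = Counter(zip(col, y))
    v_tot, c_tot = Counter(col), Counter(y)
    stat = 0.0
    for v in v_tot:
        for c in c_tot:
            expected = v_tot[v] * c_tot[c] / n
            stat += (obs[(v, c)] - expected) ** 2 / expected
    return stat

def anova_f(col, y):
    """One-way ANOVA F statistic of a numeric feature grouped by class.
    Assumes at least two classes and more samples than classes."""
    groups = defaultdict(list)
    for v, label in zip(col, y):
        groups[label].append(v)
    n, k = len(col), len(groups)
    grand = sum(col) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return math.inf if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))

def ranks(scores):
    """Rank position of each feature (0 = best); a higher score is better."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    pos = [0] * len(scores)
    for r, j in enumerate(order):
        pos[j] = r
    return pos

def ensemble_top_k(X, y, k):
    """Indices of the k features with the best mean rank across the three
    filters (mean-rank aggregation is an assumption of this sketch)."""
    cols = list(zip(*X))
    per_filter = [ranks([f(c, y) for c in cols])
                  for f in (info_gain, chi_square, anova_f)]
    mean_rank = [sum(r[j] for r in per_filter) / len(per_filter)
                 for j in range(len(cols))]
    return sorted(range(len(cols)), key=lambda j: mean_rank[j])[:k]

# Hypothetical toy data: 6 documents, 4 binary term-presence features, 3 classes.
X = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0],
     [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
y = ["sport", "sport", "tech", "tech", "health", "health"]

selected = ensemble_top_k(X, y, 2)
print(selected)
```

In this toy matrix the first two terms each occur in only one class, while the last two are spread evenly across all classes, so the ensemble ranks the first two highest. The second stage from the abstract, the ABC+GA wrapper search with an SVM evaluator, is not shown here; it would take the filtered feature indices as its starting search space.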