A comparative study of feature selection methods for binary text streams classification

Authors
Matheus Bernardelli de Moraes
Andre Leon Sampaio Gradvohl
Affiliation
[1] University of Campinas, School of Technology
Source
Evolving Systems | 2021 / Volume 12
Keywords
Text streams; Feature drift; Feature selection; Evolving regularization; Binary classification; Concept drift;
DOI
Not available
Abstract
Text streams are continuous flows of high-dimensional text, transmitted at high volume and high velocity. They are expected to be classified in real time, which is challenging because of the high dimensionality of the feature space. Applying feature selection algorithms is one way to reduce the feature space of text streams and improve the learning process. However, since text streams are potentially unbounded, their probability distribution is expected to change over time, a phenomenon known as concept drift. Concept drift affects the feature selection process through feature drift, in which the relevance of features also changes over time. This paper presents a comparative study of six feature selection methods for binary text stream classification, even in the presence of feature drift. We also propose the Online Feature Selection with Evolving Regularization (OFSER) algorithm, a modified version of the Online Feature Selection (OFS) algorithm, which uses evolving regularization to dynamically penalize model complexity, reducing the impact of feature drift on the feature selection process. We conducted the experimental analysis on eleven commonly used real-world datasets for text classification. The OFSER algorithm achieved F1-scores up to 12.92% higher than the other algorithms in some cases. The results of the Iman and Davenport and Bergmann–Hommel tests show that the OFSER algorithm is statistically superior to the Information Gain and Extremal Feature Selection algorithms in terms of improving the predictive power of the base classifier.
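The abstract does not give the OFSER update rule, so the following is only a minimal sketch of the general idea it describes: an OFS-style online linear learner that keeps at most a fixed budget of non-zero weights, with a regularization strength that can change from step to step. The function names, the toy stream, the default parameters, and in particular `hypothetical_lambda` are assumptions made for illustration; they are not the authors' schedule or implementation.

```python
import math

def truncate(w, budget):
    """Keep the `budget` largest-magnitude weights and zero the rest."""
    if budget >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:budget])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]

def ofs_style_step(w, x, y, budget, eta=0.2, lam=0.01):
    """One online update on example (x, y) with y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin > 1.0:
        return w  # margin already large enough: leave the model unchanged
    # Regularized gradient step: lam shrinks every weight toward zero.
    w = [(1.0 - lam * eta) * wi + eta * y * xi for wi, xi in zip(w, x)]
    # Project onto the L2 ball of radius 1/sqrt(lam).
    norm = math.sqrt(sum(wi * wi for wi in w))
    radius = 1.0 / math.sqrt(lam)
    if norm > radius:
        w = [wi * radius / norm for wi in w]
    # Feature selection: only `budget` features keep non-zero weights.
    return truncate(w, budget)

def hypothetical_lambda(t, base=0.01, growth=0.001):
    """Placeholder time-varying penalty (assumption, NOT the OFSER rule)."""
    return base * (1.0 + growth * t)

# Toy stream: feature 0 carries the label, features 1 and 2 are weak noise.
stream = [([1.0, 0.0, 0.1], 1), ([-1.0, 0.1, 0.0], -1),
          ([1.0, 0.1, 0.0], 1), ([-1.0, 0.0, 0.1], -1)] * 5

w = [0.0, 0.0, 0.0]
for t, (x, y) in enumerate(stream):
    w = ofs_style_step(w, x, y, budget=1, lam=hypothetical_lambda(t))

selected = [i for i, wi in enumerate(w) if wi != 0.0]
print(selected)
```

Because truncation runs after every update, the set of selected features can change as the stream evolves, which is what lets an online selector of this kind react to feature drift.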
Pages: 997 - 1013
Number of pages: 16