A comparative study of feature selection methods for binary text streams classification

被引:0
|
作者
Matheus Bernardelli de Moraes
Andre Leon Sampaio Gradvohl
机构
[1] University of Campinas,School of Technology
来源
Evolving Systems | 2021年 / 12卷
关键词
Text streams; Feature drift; Feature selection; Evolving regularization; Binary classification; Concept drift;
D O I
暂无
中图分类号
学科分类号
摘要
Text streams are a continuous flow of high-dimensional text, transmitted at high-volume and high-velocities. They are expected to be classified in real-time, which is challenging due to the high dimensionality of feature space. Applying feature selection algorithms is one solution to reduce text streams feature space and improve the learning process. However, since text streams are potentially unbounded, it is expected a change in their probabilistic distribution over time, the so-called Concept Drift. The concept drift impacts the feature selection process due to the feature drift when the relevance of features is also subject to changes over time. This paper presents a comparative study of six feature selection methods for binary text streams classification, even in the presence of feature drift. We also propose the Online Feature Selection with Evolving Regularization (OFSER) algorithm, a modified version of the Online Feature Selection (OFS) algorithm, which uses evolving regularization to dynamically penalize model complexity, reducing feature drift impacts on the feature selection process. We conducted the experimental analysis on eleven real-world, commonly used datasets for text classification. The OFSER algorithm showed F1-scores up to 12.92% higher than other algorithms in some cases. The results using Iman and Davenport and Bergmann–Hommel’s tests show that OFSER algorithm is statistically superior to Information Gain and Extremal Feature Selection algorithms in terms of improving the base classifier predictive power.
引用
收藏
页码:997 / 1013
页数:16
相关论文
共 50 条
  • [1] A comparative study of feature selection methods for binary text streams classification
    de Moraes, Matheus Bernardelli
    Sampaio Gradvohl, Andre Leon
    EVOLVING SYSTEMS, 2021, 12 (04) : 997 - 1013
  • [2] Comparative Study of Feature Selection Methods for Medical Full Text Classification
    Adriano Goncalves, Carlos
    Lorenzo Iglesias, Eva
    Borrajo, Lourdes
    Camacho, Rui
    Seara Vieira, Adrian
    Goncalves, Celia Talma
    BIOINFORMATICS AND BIOMEDICAL ENGINEERING (IWBBIO 2019), PT II, 2019, 11466 : 550 - 560
  • [3] A Comparative Study on Feature Selection in Unbalance Text Classification
    Xu, Yan
    2012 INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING (ISISE), 2012, : 44 - 47
  • [4] An Experimental Study of Feature Selection Methods for Text Classification
    Uchyigit, Gulden
    Clark, Keith
    PERSONALIZATION TECHNIQUES AND RECOMMENDER SYSTEMS, 2008, : 303 - 320
  • [5] Feature Selection Methods for Text Classification
    Dasgupta, Anirban
    Drineas, Petros
    Harb, Boulos
    Josifovski, Vanja
    Mahoney, Michael W.
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 230 - +
  • [6] Arabic Text Classification: A Review Study on Feature Selection Methods
    Hijazi, Musab Mustafa
    Zeki, Akram
    Ismail, Amelia
    2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 554 - 559
  • [7] A comparative study on unsupervised feature selection methods for text clustering
    Liu, LY
    Kang, JC
    Yu, J
    Wang, ZL
    PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005, : 597 - 601
  • [8] Comparison on Feature Selection Methods for Text Classification
    Liu, Wenkai
    Xiao, Jiongen
    Hong, Ming
    2020 THE 4TH INTERNATIONAL CONFERENCE ON MANAGEMENT ENGINEERING, SOFTWARE ENGINEERING AND SERVICE SCIENCES (ICMSS 2020), 2020, : 82 - 86
  • [9] Different Classification Algorithms Based on Arabic Text Classification: Feature Selection Comparative Study
    Raho, Ghazi
    Al-Shalabi, Riyad
    Kanaan, Ghassan
    Asma'aNassar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2015, 6 (02) : 192 - 195
  • [10] Feature Selection by Using Heuristic Methods for Text Classification
    Sel, Ilhami
    Yeroglu, Celalettin
    Hanbay, Davut
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,