The impact of preprocessing on text classification

Cited by: 369
Authors
Uysal, Alper Kursat [1]
Gunal, Serkan [1]
Affiliations
[1] Anadolu Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Pattern recognition; Text categorization; Text classification; Text preprocessing; Feature selection; Algorithm; Model
DOI
10.1016/j.ipm.2013.08.006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Preprocessing is one of the key components in a typical text classification framework. This paper extensively examines the impact of preprocessing on text classification in terms of classification accuracy, text domain, text language, and dimension reduction. For this purpose, all possible combinations of widely used preprocessing tasks are comparatively evaluated on two domains, namely e-mail and news, and in two languages, namely Turkish and English. In this way, the contribution of the preprocessing tasks to classification success at various feature dimensions, possible interactions among these tasks, and the dependency of these tasks on the respective languages and domains are comprehensively assessed. Experimental analysis on benchmark datasets reveals that choosing appropriate combinations of preprocessing tasks, rather than enabling or disabling them all, may provide significant improvement in classification accuracy, depending on the domain and language under study. (C) 2013 Elsevier Ltd. All rights reserved.
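The "all possible combinations" design mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the individual preprocessing tasks, so the three switches used here (lowercasing, stop-word removal, stemming), the tiny stop-word set, and the toy suffix stripper are all assumptions chosen for illustration; this is not the authors' implementation, and no classifier or dataset is involved.

```python
from itertools import product

# Hypothetical task names; the abstract does not list the actual tasks.
TASKS = ("lowercase", "remove_stopwords", "stem")

# Tiny illustrative stop-word set (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "is", "of"}


def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text, lowercase, remove_stopwords, stem):
    """Whitespace-tokenize the text, then apply each enabled task in order."""
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stopwords:
        # Compare case-insensitively so this also works when lowercasing is off.
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return tokens


def all_combinations(tasks=TASKS):
    """Yield one dict of on/off switches per combination (2**n in total)."""
    for flags in product((False, True), repeat=len(tasks)):
        yield dict(zip(tasks, flags))


if __name__ == "__main__":
    sample = "The classifier is sorting the incoming Emails"
    for config in all_combinations():
        print(config, preprocess(sample, **config))
```

In a study of the kind the abstract describes, each configuration would feed a feature extraction and classification step whose accuracy is then compared across configurations; that step is deliberately left out of the sketch.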
Pages: 104-112
Number of pages: 9
Related papers
50 in total
  • [1] The Impact of Preprocessing on Classification Performance in Convolutional Neural Networks for Turkish Text
    Salur, Mehmet Umut
    Aydin, Ilhan
    2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018,
  • [2] The Importance of preprocessing in Turkish Text Classification
    Acikalin, Buse
    Bayazit, Nilgun Guler
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 2053 - 2056
  • [3] The Research of Text Preprocessing Effect on Text Documents Classification Efficiency
    Kurbatow, Andrew
    2015 INTERNATIONAL CONFERENCE "STABILITY AND CONTROL PROCESSES" IN MEMORY OF V.I. ZUBOV (SCP), 2015, : 653 - 655
  • [4] Connecting Text Classification with Image Classification: A New Preprocessing Method for Implicit Sentiment Text Classification
    Chen, Meikang
    Ubul, Kurban
    Xu, Xuebin
    Aysa, Alimjan
    Muhammat, Mahpirat
    SENSORS, 2022, 22 (05)
  • [5] Effects of Preprocessing on Text Classification in Balanced and Imbalanced Datasets
    Karaca, Mehmet F.
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2024, 18 (03): 591 - 609
  • [6] The Impact of Features and Preprocessing on Automatic Text Summarization
    Bal, Salih
    Sora Gunal, Efnan
    ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, 2022, 25 (02): 117 - 132
  • [7] The impact of text preprocessing on the prediction of review ratings
    Isik, Muhittin
    Dag, Hasan
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2020, 28 (03) : 1405 - 1421
  • [8] Investigating the Impact of Preprocessing Techniques and Representation Models on Arabic Text Classification using Machine Learning
    Masadeh, Mahmoud
    Moustapha, A.
    Sharada, B.
    Hanumanthappa, J.
    Hemachandran, K.
    Chola, Channabasava
    Muaad, Abdullah Y.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (01) : 1115 - 1123
  • [9] Impact of preprocessing on medical data classification
    Almuhaideb, Sarab
    Menai, Mohamed El Bachir
    FRONTIERS OF COMPUTER SCIENCE, 2016, 10 (06): 1082 - 1102