The Importance of Preprocessing in Turkish Text Classification

Authors
Acikalin, Buse [1]
Bayazit, Nilgun Guler [1]
Affiliations
[1] Yildiz Tekn Univ, Matemat Muhendisligi Bolumu, Istanbul, Turkey
Keywords
Text Mining; Latent Dirichlet Allocation; Topic Models; Support Vector Machine; Naive Bayes; Random Forest;
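DOI
Not available
Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809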
Abstract
This study investigates the effects of stop-word filtering and stemming on the classification of Turkish texts. The documents in a corpus consisting of summaries of conference and journal articles were classified with Naive Bayes, Support Vector Machine, and Random Forest methods, and the performances of these classifiers were compared. All models that employed preprocessing with stemming and stop-word elimination yielded an improvement in performance of between 2.26% and 4.94% over the models that did not employ such preprocessing.
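The two preprocessing steps named in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the stop-word set, the suffix list, and the function names `naive_stem` and `preprocess` are hypothetical, and a real system would use a full Turkish stop-word list and a proper morphological stemmer.

```python
# Illustrative resources only (assumptions, not taken from the study).
STOP_WORDS = {"ve", "bir", "bu", "ile", "için", "de", "da"}
# Longer suffixes are listed first so they are tried before their shorter forms.
SUFFIXES = ("ların", "lerin", "lar", "ler", "nin", "nın", "da", "de")


def naive_stem(token, min_stem_len=3):
    """Strip at most one listed suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on whitespace, drop stop words, then stem.

    Note: str.lower() is not Turkish-aware (dotted/dotless I), so this
    sketch assumes input that is already lowercase or free of 'I'/'İ'.
    """
    tokens = [t.strip(".,;:!?()\"'") for t in text.lower().split()]
    kept = [t for t in tokens if t and t not in STOP_WORDS]
    return [naive_stem(t) for t in kept]


if __name__ == "__main__":
    print(preprocess("bu makaleler ve kitaplar için bir özet"))
    # prints ['makale', 'kitap', 'özet']
```

The resulting token lists would then be turned into feature vectors and passed to a classifier such as Naive Bayes, a Support Vector Machine, or a Random Forest; that stage is omitted here.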
Pages: 2053-2056
Number of pages: 4