The Importance of Preprocessing in Turkish Text Classification

Times cited: 0
Authors
Acikalin, Buse [1]
Bayazit, Nilgun Guler [1]
Affiliation
[1] Yildiz Tekn Univ, Matemat Muhendisligi Bolumu, Istanbul, Turkey
Keywords
Text Mining; Latent Dirichlet Allocation; Topic Models; Support Vector Machine; Naive Bayes; Random Forest
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract
In this study, the effects of applying stop-word filtering and stemming on the classification of Turkish texts are investigated. The documents in a corpus consisting of summaries of conference and journal articles were classified with the Naive Bayes, Support Vector Machines, and Random Forests methods, and the performance of these methods was compared. All models that employed preprocessing with stemming and stop-word elimination yielded a performance improvement of between 2.26% and 4.94% over the models that did not employ such preprocessing.
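The preprocessing compared in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the stop-word set is a small sample of common Turkish function words, stemming is approximated by fixed-prefix truncation (keeping the first `PREFIX_LEN` = 5 characters, a simple stemmer sometimes used for Turkish in place of a full morphological analyzer), and only one of the three classifiers, a multinomial Naive Bayes with add-one smoothing, is implemented. The names `turkish_lower`, `tokenize`, `preprocess`, `STOP_WORDS`, `PREFIX_LEN`, and `MultinomialNB` are introduced for this sketch. Turkish needs its own lowercasing step because Python's `str.lower()` maps `I` to `i` rather than to the dotless `ı`.

```python
import math
from collections import Counter, defaultdict

# Assumption: a small illustrative subset of Turkish stop words,
# not the stop-word list used in the paper.
STOP_WORDS = {"ve", "bir", "bu", "ile", "için", "da", "de", "olarak", "çok", "gibi"}

# Assumption: fixed-prefix stemming length (hypothetical choice for this sketch).
PREFIX_LEN = 5


def turkish_lower(text: str) -> str:
    # str.lower() maps "I" to "i" and "İ" to "i" + combining dot;
    # Turkish requires I -> ı and İ -> i, so map those first.
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text: str) -> list[str]:
    # Split into runs of alphabetic characters; isalpha() accepts Turkish letters.
    tokens, current = [], []
    for ch in turkish_lower(text):
        if ch.isalpha():
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def preprocess(text: str, stem: bool = True, drop_stop: bool = True) -> list[str]:
    # With both flags False this returns the unpreprocessed baseline tokens.
    tokens = tokenize(text)
    if drop_stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        # Crude stemmer: keep only the first PREFIX_LEN characters of each word.
        tokens = [t[:PREFIX_LEN] for t in tokens]
    return tokens


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens: list[str]) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(count / n_docs)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[label][t] + 1) / (self.totals[label] + v)
                    )
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    # Hypothetical toy corpus: "spor" (sports) and "ekonomi" (economy) texts.
    train = [
        ("Futbol maçı gol attı", "spor"),
        ("Maçta iki gol", "spor"),
        ("Borsa hisse fiyatları düştü", "ekonomi"),
        ("Hisse senedi ve borsa", "ekonomi"),
    ]
    docs = [preprocess(text) for text, _ in train]
    labels = [label for _, label in train]
    model = MultinomialNB().fit(docs, labels)
    print(model.predict(preprocess("Bu maçta gol")))
```

Calling `preprocess(text, stem=False, drop_stop=False)` yields the unpreprocessed baseline, so the same classifier can be trained on both token streams and scored on a held-out split to make the kind of with/without comparison the abstract describes. Prefix truncation is used here only because it needs no external morphological analyzer; a dedicated Turkish stemmer would handle the language's agglutinative suffixes more accurately.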
Pages: 2053-2056
Page count: 4