The Importance of Preprocessing in Turkish Text Classification

Authors
Acikalin, Buse [1]
Bayazit, Nilgun Guler [1]
Affiliation
[1] Yildiz Tekn Univ, Matemat Muhendisligi Bolumu, Istanbul, Turkey
Keywords
Text Mining; Latent Dirichlet Allocation; Topic Models; Support Vector Machine; Naive Bayes; Random Forest;
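DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808 ; 0809 ;
Abstract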
This study investigates the effects of stop-word filtering and stemming on the classification of Turkish texts. The documents in a corpus consisting of summaries of conference and journal articles were classified with Naive Bayes, Support Vector Machine, and Random Forest methods, and the performance of these methods was compared. All models that employed preprocessing with stemming and stop-word elimination yielded an improvement in performance of between 2.26% and 4.94% over the models that did not employ such preprocessing.
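As a rough illustration of the two preprocessing steps named in the abstract, the sketch below filters stop words and then strips a suffix from each remaining token. It is not the authors' pipeline: the stop-word set, the suffix list, and the function names `strip_suffix` and `preprocess` are small hypothetical stand-ins chosen for the example, and a real Turkish stemmer would need proper morphological analysis.

```python
# Illustrative sketch only. The stop-word set and suffix list below are tiny,
# hand-picked assumptions, not the resources used in the study.
TURKISH_STOP_WORDS = {"ve", "bir", "bu", "ile", "için", "de", "da"}
SUFFIXES = ("lar", "ler", "dan", "den", "da", "de")  # hypothetical, far from complete


def strip_suffix(token, min_stem_len=3):
    """Remove at most one listed suffix, keeping at least min_stem_len characters."""
    # Try longer suffixes first so "dan" is preferred over "da" where both match.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Whitespace-tokenize, drop stop words, then crudely stem what remains."""
    # Note: str.lower() is not Turkish-aware (dotted/dotless I), and punctuation
    # is not handled here; both would need attention in real use.
    tokens = text.lower().split()
    kept = [t for t in tokens if t not in TURKISH_STOP_WORDS]
    return [strip_suffix(t) for t in kept]


if __name__ == "__main__":
    print(preprocess("bu kitaplar ve makaleler"))
```

The resulting token lists would then be turned into features (for example, term counts) and passed to a classifier such as Naive Bayes, a Support Vector Machine, or a Random Forest; that stage is omitted here.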
Pages: 2053-2056
Number of pages: 4