The Importance of Preprocessing in Turkish Text Classification

Cited by: 0
Authors
Acikalin, Buse [1]
Bayazit, Nilgun Guler [1]
Affiliations
[1] Yildiz Tekn Univ, Matemat Muhendisligi Bolumu, Istanbul, Turkey
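Keywords
Text Mining; Latent Dirichlet Allocation; Topic Models; Support Vector Machine; Naive Bayes; Random Forest
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract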
This study investigates the effects of stop-word filtering and stemming on the classification of Turkish texts. The documents in a corpus consisting of summaries of conference and journal articles were classified with Naive Bayes, Support Vector Machine, and Random Forest methods, and their performances were compared. All models that employed preprocessing with stemming and stop-word elimination yielded an improvement in performance of between 2.26% and 4.94% over the models that did not employ such preprocessing.
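As a rough illustration of the two preprocessing steps named in the abstract, the Python sketch below filters stop words and then strips a few plural and case suffixes. It is not the authors' implementation: the stop-word list, the suffix list, the `min_stem` threshold, and the function names are all hypothetical placeholders, and real Turkish stemming needs a proper morphological analyzer or stemmer.

```python
# Illustrative sketch only. STOP_WORDS and SUFFIXES are tiny hypothetical
# samples, not the resources used in the study.
STOP_WORDS = {"ve", "bir", "bu", "ile", "için"}

# Longest suffixes first, so "lerin" is tried before "ler".
SUFFIXES = ("lerin", "ların", "ler", "lar")


def strip_suffix(token, min_stem=3):
    """Remove the first matching suffix if at least min_stem characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stop_words=True, use_stemming=True):
    """Split on whitespace, then optionally drop stop words and strip suffixes."""
    # Input is assumed to be lowercase already: Python's str.lower() does not
    # apply the Turkish dotted/dotless i casing rules.
    tokens = text.split()
    if use_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if use_stemming:
        tokens = [strip_suffix(t) for t in tokens]
    return tokens


print(preprocess("bu makalelerin ve kitapların özeti"))
# → ['makale', 'kitap', 'özeti']
```

The resulting token lists would then be turned into feature vectors for a classifier such as Naive Bayes. Both steps shrink the vocabulary, which is one plausible reason preprocessing can help on a small corpus.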
Pages: 2053-2056
Page count: 4