Turkish Document Classification Based on Word2Vec and SVM Classifier

Cited by: 0
Authors
Sahin, Gurkan [1]
Affiliations
[1] Yildiz Tekn Univ, Bilgisayar Muhendisligi Bolumu, Istanbul, Turkey
Keywords
document categorization; SVM; word2vec
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this study, Turkish texts belonging to different categories were classified using word2vec word vectors. First, vectors were extracted for the words in all the texts; each text was then represented by the mean of the vectors of the words it contains. The texts were classified with an SVM, and an F-measure of 0.92 was obtained across seven categories. The experiments showed that word2vec outperforms tf-idf-based classification for Turkish document classification.
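
The document representation step described in the abstract, averaging the vectors of the words in a text, can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the embedding table `TOY_VECTORS`, the helper `mean_vector`, the 3-dimensional vectors, and the choice to skip out-of-vocabulary words are all assumptions made for the example. In practice the vectors would come from a trained word2vec model.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional embeddings, invented for illustration only.
# Real word2vec vectors would come from a trained model.
TOY_VECTORS: Dict[str, List[float]] = {
    "futbol": [1.0, 0.0, 0.0],
    "gol": [0.8, 0.2, 0.0],
    "borsa": [0.0, 1.0, 0.0],
    "dolar": [0.0, 0.8, 0.2],
}


def mean_vector(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the element-wise mean of the vectors of the known tokens.

    Tokens missing from `vectors` are skipped (an assumption of this
    sketch); a text with no known tokens maps to the zero vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the values of each dimension together.
    return [sum(column) / len(known) for column in zip(*known)]


# "bilinmeyen" is not in the toy table, so it is ignored.
doc_sport = mean_vector(["futbol", "gol", "bilinmeyen"], TOY_VECTORS, 3)
doc_economy = mean_vector(["borsa", "dolar"], TOY_VECTORS, 3)
```

Each text becomes one fixed-length vector regardless of its length, which is what allows a standard classifier such as an SVM to be trained on the resulting vectors together with the category labels.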
Pages: 4
Related papers
  • [1] A Study of Chinese Document Representation and Classification with Word2vec
    Zhu, Lei
    Wang, Guijun
    Zou, Xiancun
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 1, 2016, : 298 - 302
  • [2] Classification Turkish SMS with Deep Learning Tool Word2Vec
    Karasoy, Onur
    Balli, Serkan
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 294 - 297
  • [3] Research on Chinese Text Classification Based on Word2vec
    Yang, Zhi-Tong
    Zheng, Jun
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016, : 1166 - 1170
  • [4] Diacritic restoration of Turkish tweets with word2vec
    Ozer, Zeynep
    Ozer, Ilyas
    Findik, Oguz
    ENGINEERING SCIENCE AND TECHNOLOGY-AN INTERNATIONAL JOURNAL-JESTECH, 2018, 21 (06): : 1120 - 1127
  • [5] Microblogging Short Text Classification based on Word2Vec
    Zhang, Yonghui
    Liu, Jingang
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ELECTRONIC, MECHANICAL, INFORMATION AND MANAGEMENT SOCIETY (EMIM), 2016, 40 : 395 - 401
  • [6] Short Text Classification Based on Wikipedia and Word2vec
    Liu Wensen
    Cao Zewen
    Wang Jun
    Wang Xiaoyi
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016, : 1195 - 1200
  • [7] Document Classification Using Word2Vec and Chi-square on Apache Spark
    Choi, Mijin
    Jin, Rize
    Chung, Tae-Sun
    ADVANCES IN COMPUTER SCIENCE AND UBIQUITOUS COMPUTING, 2017, 421 : 867 - 872
  • [8] Text Classification Based on Word2vec and Convolutional Neural Network
    Li, Lin
    Xiao, Linlong
    Jin, Wenzhen
    Zhu, Hong
    Yang, Guocai
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 450 - 460
  • [9] Text Classification Research Based on Improved Word2vec and CNN
    Gao, Mengyuan
    Li, Tinghui
    Huang, Peifang
    SERVICE-ORIENTED COMPUTING, ICSOC 2018, 2019, 11434 : 126 - 135
  • [10] Automated Classification of Exchange Information Requirements for Construction Projects Using Word2Vec and SVM
    Mitera-Kielbasa, Ewelina
    Zima, Krzysztof
    INFRASTRUCTURES, 2024, 9 (11)