Text Classification Based on Word2vec and Convolutional Neural Network

Cited by: 5
Authors
Li, Lin [1]
Xiao, Linlong [1]
Jin, Wenzhen [1]
Zhu, Hong [1]
Yang, Guocai [1]
Affiliations
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
Keywords
Text classification; Text representation; Word2vec; Convolutional neural network
DOI
10.1007/978-3-030-04221-9_40
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text representations used in text classification usually have high dimensionality and lack semantics, which leads to poor classification performance. In this paper, TF-IDF is optimized by using optimization factors, word2vec vectors, which carry semantic information, are then weighted, and the single-text representation model CD_STR is obtained. Based on the CD_STR model, latent semantic indexing (LSI) and the TF-IDF weighted vector space model (T_VSM) are merged to obtain a fusion model, CD_MTR, which is more efficient. A text classification method, MTR_MCNN, which combines the fusion model CD_MTR with a convolutional neural network, is further proposed. This method first designs convolution kernels of different sizes and numbers so that they extract text features from different aspects. The text vectors trained by the CD_MTR model are then used as the input to the improved convolutional neural network. Tests on two datasets verify that the two models, CD_STR and CD_MTR, outperform other comparable text representation models. The MTR_MCNN method classifies better than the other methods compared, and its classification accuracy is higher than that of the CD_MTR model.
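As a rough illustration of the idea of weighting word2vec vectors by TF-IDF, the following minimal sketch builds one vector per document as a TF-IDF-weighted average of word vectors. It is not the authors' CD_STR model: the abstract does not specify the optimization factors, so plain smoothed TF-IDF is used instead, and the random toy embeddings, the function names `tfidf_weights` and `weighted_doc_vector`, and the two example documents are all assumptions made for illustration.

```python
import math
from collections import Counter

import numpy as np


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Relative term frequency times a smoothed inverse document frequency.
            tok: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return weights


def weighted_doc_vector(doc_weights, embeddings, dim):
    """TF-IDF-weighted average of the word vectors of one document."""
    total = np.zeros(dim)
    weight_sum = 0.0
    for tok, w in doc_weights.items():
        if tok in embeddings:  # skip out-of-vocabulary tokens
            total += w * embeddings[tok]
            weight_sum += w
    return total / weight_sum if weight_sum > 0 else total


# Toy data (assumption): random vectors stand in for trained word2vec embeddings.
docs = [["cat", "sat", "mat"], ["dog", "sat", "log"]]
rng = np.random.default_rng(0)
embeddings = {tok: rng.normal(size=4) for doc in docs for tok in doc}

weights = tfidf_weights(docs)
doc_vectors = np.stack([weighted_doc_vector(w, embeddings, 4) for w in weights])
```

In this toy corpus, "sat" appears in both documents and so receives a lower weight than "cat" or "dog", which means the document vectors are pulled toward the words that distinguish each text. In practice the embeddings would come from a trained word2vec model rather than a random generator.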
Pages: 450-460
Page count: 11
Related papers
50 in total
  • [31] Using Word2Vec to Process Big Text Data
    Ma, Long
    Zhang, Yanqing
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2895 - 2897
  • [32] Key word extraction for short text via word2vec, doc2vec, and textrank
    Li, Jun
    Huang, Guimin
    Fan, Chunli
    Sun, Zhenglin
    Zhu, Hongtao
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (03) : 1794 - 1805
  • [34] Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo
    Kale, Aparna Sunil
    Pandya, Vinay
    Di Troia, Fabio
    Stamp, Mark
    JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2023, 19 (01) : 1 - 16
  • [35] Multi-Label Chinese Question Classification Based on Word2vec
    Fan, Zhengyu
    Su, Lei
    Liu, Xi
    Wang, Shuaiyang
    2017 4TH INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 2017, : 546 - 550
  • [36] Malware Classification Based on Multilayer Perception and Word2Vec for IoT Security
    Qiao, Yanchen
    Zhang, Weizhe
    Du, Xiaojiang
    Guizani, Mohsen
    ACM TRANSACTIONS ON INTERNET TECHNOLOGY, 2022, 22 (01)
  • [37] Word Semantic Similarity Calculation Based on Word2vec
    Jin, Xiaolin
    Zhang, Shuwu
    Liu, Jie
    2018 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2018, : 12 - 16
  • [38] Chinese Sentiment Classification Using Extended Word2Vec
    张胜
    张鑫
    程佳军
    王晖
    Journal of Donghua University(English Edition), 2016, 33 (05) : 823 - 826
  • [39] Word Clustering based on Word2vec and Semantic Similarity
    Luo Jie
    Wang Qinglin
    Li Yuan
    2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 517 - 521
  • [40] Study on Tibetan Word Vector based on Word2vec
    Yang, Ning
    Li, Guanyu
    Ding, Hailan
    Gong, Chunwei
    2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187