Text Classification Based on Word2vec and Convolutional Neural Network

Cited by: 5
Authors
Li, Lin [1]
Xiao, Linlong [1]
Jin, Wenzhen [1]
Zhu, Hong [1]
Yang, Guocai [1]
Affiliation
[1] Southwest Univ, Sch Comp & Informat Sci, Chongqing, Peoples R China
Keywords
Text classification; Text representation; Word2vec; Convolutional neural network
DOI
10.1007/978-3-030-04221-9_40
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text representations in text classification usually have high dimensionality and lack semantics, which results in poor classification performance. In this paper, TF-IDF is optimized using optimization factors, word2vec vectors, which carry semantic information, are then weighted, and the single-text representation model CD_STR is obtained. Based on the CD_STR model, latent semantic indexing (LSI) and the TF-IDF weighted vector space model (T_VSM) are merged to obtain a fusion model, CD_MTR, which is more efficient. A text classification method, MTR_MCNN, which combines the fusion model CD_MTR with a convolutional neural network, is further proposed. This method first designs convolution kernels of different sizes and numbers, allowing them to extract text features from different aspects. The text vectors trained by the CD_MTR model are then used as the input to the improved convolutional neural network. Tests on two datasets verify that the two models, CD_STR and CD_MTR, outperform the other text representation models they were compared with. The MTR_MCNN method classifies better than the other methods compared, and its classification accuracy is higher than that of the CD_MTR model.
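The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' CD_STR model: the optimization factors are not given in this record, so the code uses plain TF-IDF with a smoothed IDF (an assumption), and the function names `tfidf_weights` and `weighted_doc_vector` and the two-dimensional toy vectors are hypothetical stand-ins for a trained word2vec model.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: plain TF-IDF with smoothed IDF, log((1 + N) / (1 + df)) + 1.
    The paper's optimization factors are not reproduced here.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


def weighted_doc_vector(doc_weights, embeddings, dim):
    """Weighted average of word vectors; terms without a vector are skipped."""
    vec = [0.0] * dim
    total = 0.0
    for term, weight in doc_weights.items():
        if term not in embeddings:
            continue
        for i, value in enumerate(embeddings[term]):
            vec[i] += weight * value
        total += weight
    if total == 0.0:
        return vec  # no known terms: return the zero vector
    return [v / total for v in vec]


# Hypothetical toy data standing in for a corpus and trained word vectors.
docs = [["cat", "sat"], ["cat", "ran"]]
toy_vectors = {"cat": [1.0, 0.0], "sat": [0.0, 1.0], "ran": [1.0, 1.0]}

for doc_weights in tfidf_weights(docs):
    print(weighted_doc_vector(doc_weights, toy_vectors, dim=2))
```

In the first toy document, "sat" occurs in fewer documents than "cat", so it receives the larger weight and pulls the document vector toward its own embedding. Dense vectors of this kind are what a convolutional classifier would take as input.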
Pages: 450-460
Page count: 11
Related papers
50 in total
  • [41] A Study of Chinese Document Representation and Classification with Word2vec
    Zhu, Lei
    Wang, Guijun
    Zou, Xiancun
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 1, 2016, : 298 - 302
  • [42] Research on image text generation based on word2vec visual vocabulary attention
    Li, Danyang
    Zhao, Yahui
    Cui, Rongyi
    Zhao, Linlin
    2021 ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE (ACCTCS 2021), 2021, : 344 - 348
  • [43] Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy
    Abdulateef, Samer
    Khan, Naseer Ahmed
    Chen, Bolin
    Shang, Xuequn
    INFORMATION, 2020, 11 (02)
  • [44] Word2Vec inversion and traditional text classifiers for phenotyping lupus
    Turner, Clayton A.
    Jacobs, Alexander D.
    Marques, Cassios K.
    Oates, James C.
    Kamen, Diane L.
    Anderson, Paul E.
    Obeid, Jihad S.
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2017, 17
  • [45] Generative Adversarial Networks for text using word2vec intermediaries
    Budhkar, Akshay
    Vishnubhotla, Krishnapriya
    Hossain, Safwan
    Rudzicz, Frank
    4TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP (REPL4NLP-2019), 2019, : 15 - 26
  • [46] Word2Vec inversion and traditional text classifiers for phenotyping lupus
    Clayton A. Turner
    Alexander D. Jacobs
    Cassios K. Marques
    James C. Oates
    Diane L. Kamen
    Paul E. Anderson
    Jihad S. Obeid
    BMC Medical Informatics and Decision Making, 17
  • [47] Text classification algorithm of tourist attractions subcategories with modified TF-IDF and Word2Vec
    Xiao, Lu
    Li, Qiaoxing
    Ma, Qian
    Shen, Jiasheng
    Yang, Yong
    Li, Danyang
    PLOS ONE, 2024, 19 (10):
  • [48] An Word2vec based on Chinese Medical Knowledge
    Zhu, Jiayi
    Ni, Pin
    Li, Yuming
    Peng, Junkun
    Dai, Zhenjin
    Le, Gangmin
    Bai, Xuming
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 6263 - 6265
  • [49] WEIGHTED WORD2VEC BASED ON THE DISTANCE OF WORDS
    Chang, Chia-Yang
    Lee, Shie-Jue
    Lai, Chih-Chin
    PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 2, 2017, : 563 - 568
  • [50] ECG analysis based on Word2Vec model
    Oliinyk, Yurii
    Tereschenko, Andrii
    Baklan, Igor
    Beraudo, Elisa
    IDDM 2021: INFORMATICS & DATA-DRIVEN MEDICINE: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INFORMATICS & DATA-DRIVEN MEDICINE (IDDM 2021), 2021, 3038 : 213 - 222