A study of damp-heat syndrome classification using Word2vec and TF-IDF

Cited by: 0
Authors
Zhu, Wei [1]
Zhang, Wei [1]
Li, Guo-Zheng [1]
He, Chong [1]
Zhang, Lei [2]
Affiliations
[1] Tongji Univ, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] China Acad Chinese Med Sci, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Keywords
Clinical record analysis; Word2vec; TF-IDF; TCM; Damp-heat syndrome classification
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
With people's growing concern about health, assessing health from medical records is becoming a potential demand. Most previous disease analysis research was conducted on structured datasets, which usually ignore the relationships between different symptoms and are expensive to obtain. In this paper, we propose a novel model based on Word2vec and Term Frequency-Inverse Document Frequency (TF-IDF) that can detect damp-heat syndrome directly from unstructured records. First, we adopt the ICTCLAS system, combined with a corpus collected in the field of Traditional Chinese Medicine (TCM), to segment the clinical records into words. Second, the Word2vec tool is used to train word vectors. Then, we construct the record representation vector from the word vectors and TF-IDF. This record representation method is named Word2vec+TF-IDF. To verify the effectiveness of the proposed method, we compare our record representation method with other text representation methods under four different classifiers. The experiment was conducted on a dataset collected from over 10 Chinese medicine hospitals. The experimental results show that our model performs better than state-of-the-art methods such as LSA and Doc2vec.
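The abstract does not say exactly how the word vectors and TF-IDF values are combined. One common reading is a TF-IDF-weighted average of the word vectors in each record, and the sketch below illustrates that reading only. The function names `tfidf_weights` and `record_vector`, the toy tokens, and the two-dimensional vectors are all hypothetical. In the described pipeline the tokens would come from ICTCLAS segmentation and the vectors from a trained Word2vec model.

```python
import math
from collections import Counter


def tfidf_weights(records):
    """Return one {word: tf-idf} dict per tokenized record.

    Assumption: tf = count / record length, idf = log(N / document frequency).
    The paper may use a different TF-IDF variant.
    """
    n = len(records)
    # Document frequency: number of records containing each word.
    df = Counter(word for rec in records for word in set(rec))
    weights = []
    for rec in records:
        tf = Counter(rec)
        weights.append({
            word: (count / len(rec)) * math.log(n / df[word])
            for word, count in tf.items()
        })
    return weights


def record_vector(record_weights, word_vectors, dim):
    """TF-IDF-weighted average of word vectors (assumed combination rule)."""
    vec = [0.0] * dim
    total = 0.0
    for word, weight in record_weights.items():
        if word not in word_vectors:
            continue  # skip words without a trained vector
        total += weight
        for i, x in enumerate(word_vectors[word]):
            vec[i] += weight * x
    if total > 0:
        vec = [v / total for v in vec]
    return vec


# Toy, hand-written inputs standing in for segmented clinical records
# and Word2vec output; these values are illustrative only.
records = [
    ["fever", "thirst", "yellow_tongue"],
    ["fatigue", "thirst"],
    ["fever", "fatigue", "fever"],
]
word_vectors = {
    "fever": [0.9, 0.1],
    "thirst": [0.7, 0.3],
    "yellow_tongue": [0.8, 0.6],
    "fatigue": [0.1, 0.9],
}

weights = tfidf_weights(records)
vectors = [record_vector(w, word_vectors, dim=2) for w in weights]
```

Each entry of `vectors` is a fixed-length representation of one record and could be passed to any standard classifier, which is how the abstract describes comparing representation methods.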
Pages: 1415-1420
Number of pages: 6
Related papers
50 in total
  • [31] SENTIMENT CLASSIFICATION USING TF-IDF FEATURES AND EXTENDED SPACE FOREST ENSEMBLE
    Cao, Nieqing
    Cao, Jingjing
    Lu, Haili
    Li, Bing
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOL. 2, 2015: 526-532
  • [32] Research on Chinese Text Classification Based on Word2vec
    Yang, Zhi-Tong
    Zheng, Jun
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016: 1166-1170
  • [33] Microblogging Short Text Classification based on Word2Vec
    Zhang, Yonghui
    Liu, Jingang
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ELECTRONIC, MECHANICAL, INFORMATION AND MANAGEMENT SOCIETY (EMIM), 2016, 40: 395-401
  • [34] Short Text Classification Based on Wikipedia and Word2vec
    Liu Wensen
    Cao Zewen
    Wang Jun
    Wang Xiaoyi
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016: 1195-1200
  • [35] Study on Tibetan Word Vector based on Word2vec
    Yang, Ning
    Li, Guanyu
    Ding, Hailan
    Gong, Chunwei
    2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [36] A deep learning analysis on question classification task using Word2vec representations
    Yilmaz, Seyhmus
    Toklu, Sinan
    NEURAL COMPUTING & APPLICATIONS, 2020, 32(07): 2909-2928
  • [37] A deep learning analysis on question classification task using Word2vec representations
    Seyhmus Yilmaz
    Sinan Toklu
    Neural Computing and Applications, 2020, 32: 2909-2928
  • [38] Document Classification Using Word2Vec and Chi-square on Apache Spark
    Choi, Mijin
    Jin, Rize
    Chung, Tae-Sun
    ADVANCES IN COMPUTER SCIENCE AND UBIQUITOUS COMPUTING, 2017, 421: 867-872
  • [39] KEYWORD EXTRACTION BASED ON WORD SYNONYMS USING WORD2VEC
    Ogul, Iskender Ulgen
    Ozcan, Caner
    Hakdagli, Ozlem
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019
  • [40] Text Classification Based on Word2vec and Convolutional Neural Network
    Li, Lin
    Xiao, Linlong
    Jin, Wenzhen
    Zhu, Hong
    Yang, Guocai
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305: 450-460