A Study of Damp-Heat Syndrome Classification Using Word2vec and TF-IDF

Cited by: 0
Authors
Zhu, Wei [1]
Zhang, Wei [1]
Li, Guo-Zheng [1]
He, Chong [1]
Zhang, Lei [2]
Affiliations
[1] Tongji Univ, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] China Acad Chinese Med Sci, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Keywords
Clinical record analysis; Word2vec; TF-IDF; TCM; Damp-heat syndrome classification
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
With growing public concern about health, assessing a person's health from medical records is becoming a potential demand. Most previous disease-analysis studies were conducted on structured datasets, which usually ignore the relationships between different symptoms and are expensive to obtain. In this paper, we propose a novel model based on Word2vec and Term Frequency-Inverse Document Frequency (TF-IDF) that detects damp-heat syndrome directly from unstructured records. First, we use the ICTCLAS system, combined with a corpus collected in the field of Traditional Chinese Medicine (TCM), to segment the clinical records into words. Second, we use the Word2vec tool to train word vectors. Then, we construct the record representation vector from the word vectors and TF-IDF; we name this record representation method Word2vec+TF-IDF. To verify the effectiveness of the proposed method, we compare it with other text representation methods under four different classifiers. The experiments were conducted on a dataset collected from more than 10 Chinese medicine hospitals. The experimental results show that our model performs better than state-of-the-art methods such as LSA and Doc2vec.
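The abstract does not give the exact formula that combines word vectors with TF-IDF. The following is a minimal Python sketch under one common assumption: each record vector is the TF-IDF-weighted average of the vectors of its words. The function names, the English stand-in tokens, and the two-dimensional `word_vectors` table are hypothetical; in the pipeline described above, the tokens would come from ICTCLAS segmentation and the vectors from a trained Word2vec model.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenized record."""
    n_docs = len(docs)
    # Document frequency: number of records that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return weights


def record_vector(doc_weights, word_vectors, dim):
    """TF-IDF-weighted average of the word vectors in one record (assumed scheme)."""
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in doc_weights.items():
        vec = word_vectors.get(word)
        if vec is None:  # skip words without a trained vector
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no informative words: return the zero vector
    return [x / weight_sum for x in total]


# Hypothetical segmented records (stand-ins for ICTCLAS output).
docs = [
    ["fever", "thirst", "greasy_tongue"],
    ["fever", "fatigue"],
]

# Hypothetical 2-d word vectors (stand-ins for trained Word2vec vectors).
word_vectors = {
    "fever": [1.0, 1.0],
    "thirst": [1.0, 0.0],
    "greasy_tongue": [0.0, 1.0],
    "fatigue": [2.0, 2.0],
}

weights = tfidf_weights(docs)
vectors = [record_vector(w, word_vectors, dim=2) for w in weights]
```

With this unsmoothed IDF, a word that occurs in every record (here `fever`) receives weight zero, so each record vector is driven by its more distinctive symptom words. The resulting `vectors` could then be passed to any standard classifier.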
Pages: 1415-1420
Number of pages: 6