A study of damp-heat syndrome classification Using Word2vec and TF-IDF

Cited by: 0
Authors
Zhu, Wei [1 ]
Zhang, Wei [1 ]
Li, Guo-Zheng [1 ]
He, Chong [1 ]
Zhang, Lei [2 ]
Affiliations
[1] Tongji Univ, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] Chinese Med Sci, China Acad, Inst Basic Res Clin Med, Beijing 100700, Peoples R China
Keywords
Clinical record analysis; Word2vec; TF-IDF; TCM; Damp-heat syndrome Classification;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
With growing public concern about health, assessing a person's health from medical records is becoming a potential demand. Most previous disease analysis studies were conducted on structured datasets, which usually ignore the relationships between symptoms and are expensive to obtain. In this paper, we propose a novel model based on Word2vec and Term Frequency-Inverse Document Frequency (TF-IDF) that detects damp-heat syndrome directly from unstructured records. First, we use the ICTCLAS system, combined with a corpus collected in the field of Traditional Chinese Medicine (TCM), to segment the clinical records into words. Second, the Word2vec tool is used to train word vectors. We then construct the record representation vector from the word vectors and TF-IDF; we call this representation method Word2vec+TF-IDF. To verify the effectiveness of the proposed method, we compare our record representation with other text representation methods under four different classifiers. The experiments were conducted on a dataset collected from over 10 Chinese medicine hospitals. The experimental results show that our model performs better than state-of-the-art methods such as LSA and Doc2vec.
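The abstract does not state how the word vectors and TF-IDF values are combined. One common choice, assumed here purely for illustration, is a TF-IDF-weighted average of the word vectors in each record. The sketch below uses hypothetical English tokens and a hand-written 2-dimensional embedding table in place of ICTCLAS segmentation and a trained Word2vec model; none of these names or values come from the paper.

```python
import math
from collections import Counter

# Hypothetical, already-segmented records (stand-ins for ICTCLAS output).
records = [
    ["fever", "thirst", "yellow_tongue"],
    ["fever", "cough"],
    ["fatigue", "yellow_tongue", "thirst"],
]

# Hypothetical 2-d word vectors (stand-ins for trained Word2vec embeddings).
word_vectors = {
    "fever": [1.0, 0.0],
    "cough": [0.0, 1.0],
    "thirst": [0.5, 0.5],
    "yellow_tongue": [1.0, 1.0],
    "fatigue": [0.2, 0.8],
}


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


def record_vector(weights, vectors, dim):
    """TF-IDF-weighted average of the word vectors of one record."""
    acc = [0.0] * dim
    total = 0.0
    for word, weight in weights.items():
        if word not in vectors:
            continue  # skip out-of-vocabulary words
        for i, value in enumerate(vectors[word]):
            acc[i] += weight * value
        total += weight
    # Fall back to the zero vector if no word carried any weight.
    return [a / total for a in acc] if total > 0 else acc


all_weights = tfidf_weights(records)
vectors = [record_vector(w, word_vectors, dim=2) for w in all_weights]
```

Each entry of `vectors` is a fixed-length record representation that could be passed to any standard classifier. Rare words such as `cough` pull the record vector toward their own embedding more strongly than common words such as `fever`.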
Pages: 1415 - 1420
Page count: 6
Related papers
50 in total
  • [1] Text classification algorithm of tourist attractions subcategories with modified TF-IDF and Word2Vec
    Xiao, Lu
    Li, Qiaoxing
    Ma, Qian
    Shen, Jiasheng
    Yang, Yong
    Li, Danyang
    PLOS ONE, 2024, 19 (10):
  • [2] Question classification based on Bloom's taxonomy cognitive domain using modified TF-IDF and word2vec
    Mohammed, Manal
    Omar, Nazlia
    PLOS ONE, 2020, 15 (03):
  • [3] News hotspot analysis based on TF-IDF and Word2vec
    王婧
    中国有线电视, 2023, (02) : 59 - 63
  • [4] Research on user review analysis based on TF-IDF and Word2vec
    刘宇韬
    施莉
    刘诗含
    成都航空职业技术学院学报, 2022, 38 (04) : 89 - 92
  • [5] Research on dialogue text classification based on TF-IDF and word2vec
    但宇豪
    黄继风
    杨琳
    高海
    上海师范大学学报(自然科学版), 2020, 49 (自然科学版) : 89 - 95
  • [6] Research on dialogue text classification based on TF-IDF and word2vec
    但宇豪
    黄继风
    杨琳
    高海
    上海师范大学学报(自然科学版), 2020, 49 (01) : 89 - 95
  • [7] Comparative Analysis of Machine Learning Algorithms for Email Phishing Detection Using TF-IDF, Word2Vec, and BERT
    Al Tawil, Arar
    Almazaydeh, Laiali
    Qawasmeh, Doaa
    Qawasmeh, Baraah
    Alshinwan, Mohammad
    Elleithy, Khaled
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 81 (02) : 3395 - 3412
  • [8] Research on a deep learning model based on Word2vec and an improved TF-IDF algorithm
    石琳
    徐瑞龙
    计算机与数字工程, 2021, 49 (05) : 966 - 970
  • [9] An automatic Chinese text summarization model based on TF-IDF and word2Vec
    龚永罡
    郭远南
    中国新通信, 2023, 25 (02) : 65 - 67
  • [10] A comparative study of TF-IDF and Word2vec in news text classification
    王丽
    肖小玲
    张乐乐
    电脑知识与技术, 2020, 16 (29) : 220 - 222