A Short Text Similarity Measure Based on Hidden Topics

Cited by: 0
Authors
Chen, Hong-chao [1 ,2 ]
Guo, Xiao-hua [1 ]
Liu, Ling-qiang [1 ]
Zhu, Xin-hua [1 ,2 ]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
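Source
COMPUTER SCIENCE AND TECHNOLOGY (CST2016) | 2017
Funding
National Natural Science Foundation of China;
Keywords
Short text; Similarity measure; Topic model; KNN; Information retrieval;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract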
Similarity measurement plays an important role in the classification of short text. However, traditional text similarity measures fail to achieve high accuracy because of the sparse features of short text. In this paper, we propose a new method based on different numbers of hidden topics, which are derived through well-known topic models such as Latent Dirichlet Allocation (LDA). We obtain the related topics and integrate them with the features of the short text in order to decrease sparseness and improve word co-occurrence. Numerous experiments were conducted on an open dataset (the Wikipedia dataset), and the results demonstrate that the proposed method improves classification accuracy by 14.03% with the k-nearest neighbors (KNN) algorithm. This indicates that the method outperforms other state-of-the-art methods that do not utilize hidden topics, and validates its effectiveness.
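The general idea in the abstract, enriching sparse short-text features with topic features before measuring similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the topic distributions below are hypothetical hard-coded values standing in for the output of a trained topic model such as LDA, and the helper names (`bow`, `augment`, `cosine`, `knn_predict`) and the `weight` parameter are assumptions made for illustration only.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def augment(text, topic_probs, weight=1.0):
    """Combine word counts with topic features.

    topic_probs is a HYPOTHETICAL precomputed topic distribution for the
    text; in practice it would be inferred by a trained topic model (e.g. LDA).
    Topic features use tuple keys so they cannot collide with word keys.
    """
    features = dict(bow(text))
    for k, p in enumerate(topic_probs):
        features[("topic", k)] = weight * p
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(v * b.get(key, 0.0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, labeled, k=1):
    """Majority label among the k most similar (features, label) pairs."""
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Two short texts on a related subject that share no words.
text_a = "apple releases new iphone"
text_b = "samsung unveils galaxy smartphone"
text_c = "team wins championship final"

# Hypothetical topic distributions over two topics (illustrative values only).
topics_a, topics_b, topics_c = [0.9, 0.1], [0.8, 0.2], [0.1, 0.9]

words_only = cosine(bow(text_a), bow(text_b))
with_topics = cosine(augment(text_a, topics_a), augment(text_b, topics_b))

training = [
    (augment(text_b, topics_b), "tech"),
    (augment(text_c, topics_c), "sports"),
]
predicted = knn_predict(augment(text_a, topics_a), training, k=1)
```

With word counts alone, the first two texts have no overlapping features, so their similarity is zero. The shared topic features give them a nonzero similarity, which is the kind of sparseness reduction the abstract describes. How the paper weights or selects topics is not stated in this record, so the `weight` parameter is purely illustrative.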
Pages: 1101 - 1108
Page count: 8
Related papers
50 in total
  • [21] A Dynamic Clustering Method of Hot Topics Based on User Interaction and Text Similarity
    Liu, Shan
    Wu, Xiaoqing
    Chai, Jianping
    2021 14TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2021), 2021,
  • [22] A COMBINED MEASURE FOR TEXT SEMANTIC SIMILARITY
    Li, Hao-Di
    Chen, Qing-Cai
    Wang, Xiao-Long
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 1869 - 1873
  • [23] Text matching to measure patent similarity
    Arts, Sam
    Cassiman, Bruno
    Carlos Gomez, Juan
    STRATEGIC MANAGEMENT JOURNAL, 2018, 39 (01) : 62 - 84
  • [24] FUSE (Fuzzy Similarity Measure) - A measure for determining fuzzy short text similarity using Interval Type-2 fuzzy sets
    Adel, Naeemeh
    Crockett, Keeley
    Crispin, Alan
    Chandran, David
    Carvalho, Joao P.
    2018 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2018,
  • [25] A Similarity Measure for Text Classification and Clustering
    Lin, Yung-Shen
    Jiang, Jung-Yi
    Lee, Shie-Jue
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (07) : 1575 - 1590
  • [26] An Effective TF/IDF-Based Text-to-Text Semantic Similarity Measure for Text Classification
    Albitar, Shereen
    Fournier, Sebastien
    Espinasse, Bernard
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2014, PT I, 2014, 8786 : 105 - 114
  • [27] An Improved Text Retrieval Algorithm Based on Suffix Tree Similarity Measure
    Huang, Cheng-hui
    Yin, Jian
    Han, Dong
    INFORMATION COMPUTING AND APPLICATIONS, PT 2, 2010, 106 : 150 - +
  • [28] Topic Model Based Text Similarity Measure for Chinese Judgment Document
    Wang, Yue
    Ge, Jidong
    Zhou, Yemao
    Feng, Yi
    Li, Chuanyi
    Li, Zhongjin
    Zhou, Xiaoyu
    Luo, Bin
    DATA SCIENCE, PT II, 2017, 728 : 42 - 54
  • [29] Clustering of Text Collections based on PART Neural Network and Similarity Measure
    Krakovsky, R.
    Mokris, I.
    IEEE INTERNATIONAL CONFERENCE ON SYSTEM SCIENCE AND ENGINEERING (ICSSE 2013), 2013, : 253 - 257
  • [30] Similarity measures for Chinese short text based on representation learning
    University of Science and Technology Beijing, Beijing, China
    Unknown
    J. Inf. Comput. Sci., 6 (2253-2263):