A Short Text Similarity Measure Based on Hidden Topics

Cited by: 0
Authors
Chen, Hong-chao [1, 2]
Guo, Xiao-hua [1]
Liu, Ling-qiang [1]
Zhu, Xin-hua [1, 2]
Affiliations
[1] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
Source
COMPUTER SCIENCE AND TECHNOLOGY (CST2016) | 2017
Funding
National Natural Science Foundation of China;
Keywords
Short text; Similarity measure; Topic model; KNN; Information retrieval;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Similarity measurement plays an important role in the classification of short text. However, traditional text similarity measures fail to achieve high accuracy because of the sparse features of short text. In this paper, we propose a new method based on different numbers of hidden topics, which are derived through well-known topic models such as Latent Dirichlet Allocation (LDA). We obtain the related topics and integrate them with the features of the short text in order to reduce sparseness and improve word co-occurrence. Numerous experiments were conducted on an open data set (the Wikipedia dataset), and the results demonstrate that the proposed method improves classification accuracy by 14.03% with the k-nearest neighbors algorithm (KNN). This indicates that our method outperforms other state-of-the-art methods that do not utilize hidden topics and validates its effectiveness.
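The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea: short texts that share no words can still be compared once topic information is appended to their sparse word features. The word-to-topic table `WORD_TOPICS`, the `weight` parameter, and all function names are hypothetical stand-ins; in practice the topic probabilities would come from a trained topic model such as LDA, not from a hand-written table.

```python
import math
from collections import Counter

# Hypothetical word -> topic-probability table (2 topics), standing in for
# the output of a trained LDA model. These numbers are invented for illustration.
WORD_TOPICS = {
    "football": [0.9, 0.1],
    "soccer": [0.9, 0.1],
    "goal": [0.8, 0.2],
    "stock": [0.1, 0.9],
    "market": [0.1, 0.9],
    "shares": [0.2, 0.8],
}
NUM_TOPICS = 2


def topic_vector(tokens):
    """Average the per-word topic probabilities; uniform if no word is known."""
    known = [WORD_TOPICS[t] for t in tokens if t in WORD_TOPICS]
    if not known:
        return [1.0 / NUM_TOPICS] * NUM_TOPICS
    return [sum(col) / len(known) for col in zip(*known)]


def features(tokens, weight=1.0):
    """Sparse features: word counts plus one weighted pseudo-feature per topic."""
    feats = dict(Counter(tokens))
    for k, p in enumerate(topic_vector(tokens)):
        feats[f"__topic_{k}__"] = weight * p
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(v * b.get(key, 0.0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train, k=1, weight=1.0):
    """Majority label among the k training texts most similar to the query."""
    q = features(query, weight)
    ranked = sorted(
        train,
        key=lambda example: cosine(q, features(example[0], weight)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [(["soccer"], "sports"), (["stock", "market"], "finance")]
    query = ["football", "goal"]
    # With weight=0.0 the topic features vanish and only word overlap counts.
    print(cosine(features(query, 0.0), features(["soccer"], 0.0)))
    print(cosine(features(query), features(["soccer"])))
    print(knn_predict(query, train))
```

Setting `weight=0.0` reduces the measure to plain bag-of-words cosine similarity, which makes it easy to compare the two behaviours on the same pair of texts.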
Pages: 1101 - 1108
Number of pages: 8