A Short Text Similarity Measure Based on Hidden Topics

Cited by: 0

Authors:

Chen, Hong-chao [1,2]

Guo, Xiao-hua [1]

Liu, Ling-qiang [1]

Zhu, Xin-hua [1,2]

Affiliations:

[1] Guangxi Normal Univ, Coll Comp Sci & IT, Guilin 541004, Peoples R China

[2] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

Source:

COMPUTER SCIENCE AND TECHNOLOGY (CST2016) | 2017

Funding:

National Natural Science Foundation of China;

Keywords:

Short text; Similarity measure; Topic model; KNN; Information retrieval;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Similarity measurement plays an important role in the classification of short text. However, traditional text similarity measures fail to achieve high accuracy because of the sparse features of short text. In this paper, we propose a new method based on different numbers of hidden topics, which are derived through well-known topic models such as Latent Dirichlet Allocation (LDA). We obtain the related topics and integrate them with the features of the short text in order to reduce sparseness and improve word co-occurrence. Numerous experiments were conducted on an open data set (the Wikipedia dataset), and the results demonstrate that our proposed method improves classification accuracy by 14.03% with the k-nearest neighbors algorithm (KNN). This indicates that our method outperforms other state-of-the-art methods that do not use hidden topics, and confirms that the method is effective.
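
The abstract does not spell out how topics are merged with the word features, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each short text already has a topic distribution inferred by some LDA model (the numbers below are made up for illustration), adds each sufficiently probable topic as a pseudo-term to the bag of words, and classifies with cosine-similarity KNN. The names `augment`, `threshold`, and `weight` are hypothetical.

```python
import math
from collections import Counter


def augment(tokens, topic_probs, threshold=0.2, weight=1.0):
    """Bag of words plus one pseudo-term per topic whose probability
    reaches the threshold (threshold and weight are illustrative)."""
    vec = {word: float(count) for word, count in Counter(tokens).items()}
    for topic_id, prob in enumerate(topic_probs):
        if prob >= threshold:
            vec[f"topic:{topic_id}"] = weight * prob
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, labeled, k=3):
    """Majority vote among the k labeled vectors most similar to query."""
    ranked = sorted(labeled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Two short texts about the same subject that share no words.
a_tokens = ["apple", "releases", "new", "phone"]
b_tokens = ["samsung", "unveils", "handset"]
# Assumed output of an LDA model; not computed here.
a_topics = [0.8, 0.1, 0.1]
b_topics = [0.7, 0.2, 0.1]

plain_sim = cosine(augment(a_tokens, []), augment(b_tokens, []))
topic_sim = cosine(augment(a_tokens, a_topics), augment(b_tokens, b_topics))

train = [
    (augment(a_tokens, a_topics), "tech"),
    (augment(["google", "launches", "tablet"], [0.75, 0.15, 0.1]), "tech"),
    (augment(["team", "wins", "final"], [0.1, 0.1, 0.8]), "sports"),
    (augment(["striker", "scores", "goal"], [0.05, 0.15, 0.8]), "sports"),
]
predicted = knn_predict(augment(b_tokens, b_topics), train, k=3)
print(plain_sim, topic_sim, predicted)
```

Without the topic pseudo-terms the two texts have no overlapping features, which is the sparseness problem the abstract describes; the shared dominant topic gives them a nonzero similarity that KNN can use.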

Citation

Pages: 1101-1108

Page count: 8
