Document Similarity Measure Based on Topic Model

Cited by: 0
Authors
He, Ming [1]
Wang, Zhen-zhen [1]
Du, Yong-ping [1]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China
Keywords
latent Dirichlet allocation; document similarity computation; topic model
DOI
10.4028/www.scientific.net/AMM.513-517.1280
Chinese Library Classification number
TU [Building Science]
Discipline classification code
0813
Abstract
Document similarity computation is an exciting research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute the similarity between documents. By mapping a document from its term-space representation into a topic space, a distribution over topics is derived and used to compute document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
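As a rough illustration of the topic-space comparison described in the abstract, the sketch below assumes that each document's distribution over topics has already been inferred by an LDA model. The abstract does not state which similarity function the authors use, so the Jensen-Shannon divergence here is an assumption, as are the function names and the toy three-topic distributions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Similarity in [0, 1]: one minus the base-2 Jensen-Shannon divergence.

    p and q are topic distributions (non-negative, summing to 1) of equal
    length, e.g. the per-document topic proportions produced by LDA.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


# Hypothetical topic distributions for three documents (illustration only).
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.1, 0.1, 0.8]

sim_ab = js_similarity(doc_a, doc_b)
sim_ac = js_similarity(doc_a, doc_c)
print(f"sim(a, b) = {sim_ab:.3f}, sim(a, c) = {sim_ac:.3f}")
```

Documents whose topic mixtures are close (doc_a and doc_b) score higher than documents dominated by different topics (doc_a and doc_c), even if they share few exact terms. Other measures over topic distributions, such as cosine similarity or Hellinger distance, could be substituted in the same place.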
Pages: 1280-1284
Page count: 5
Related papers
  • [1] Topic Model Based Text Similarity Measure for Chinese Judgment Document
    Wang, Yue
    Ge, Jidong
    Zhou, Yemao
    Feng, Yi
    Li, Chuanyi
    Li, Zhongjin
    Zhou, Xiaoyu
    Luo, Bin
    DATA SCIENCE, PT II, 2017, 728 : 42 - 54
  • [2] Novel Similarity Measure for Document Clustering Based on Topic Phrases
    ELdesoky, A. E.
    Saleh, M.
    Sakr, N. A.
    ICNM: 2009 INTERNATIONAL CONFERENCE ON NETWORKING & MEDIA CONVERGENCE, 2007, : 92 - +
  • [3] SimDoc: Topic Sequence Alignment based Document Similarity Framework
    Maheshwari, Gaurav
    Trivedi, Priyansh
    Sahijwani, Harshita
    Jha, Kunal
    Dasgupta, Sourish
    Lehmann, Jens
    K-CAP 2017: PROCEEDINGS OF THE KNOWLEDGE CAPTURE CONFERENCE, 2017,
  • [4] Learning a concept-based document similarity measure
    Huang, Lan
    Milne, David
    Frank, Eibe
    Witten, Ian H.
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2012, 63 (08): : 1593 - 1608
  • [5] Hierarchical Document Clustering based on Cosine Similarity measure
    Popat, Shraddha K.
    Deshmukh, Pramod B.
    Metre, Vishakha A.
    2017 1ST INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND INFORMATION MANAGEMENT (ICISIM), 2017, : 153 - 159
  • [6] Document Visual Similarity Measure For Document Search
    Ahmadullin, Ildus
    Allebach, Jan P.
    Damera-Venkata, Niranjan
    Fan, Jian
    Lee, Seungyon
    Lin, Qian
    Liu, Jerry
    DOCENG 2011: PROCEEDINGS OF THE 2011 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2011, : 139 - 142
  • [7] A topic model based on CRP and word similarity
    Zhang, Xiao-Ping
    Zhou, Xue-Zhong
    Huang, Hou-Kuan
    Feng, Qi
    Chen, Shi-Bo
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2010, 23 (01): : 72 - 76
  • [8] A topic-based document correlation model
    Jia, Xi-Ping
    Peng, Hong
    Zheng, Qj-Lun
    Jiang, Zhuo-Lin
    Li, Zhao
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 2487 - 2491
  • [9] A Topic based Document Relevance Ranking Model
    Gao, Yang
    Xu, Yue
    Li, Yuefeng
    WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 271 - 272
  • [10] Divergence-based similarity measure for spoken document retrieval
    Liu, Peng
    Soong, Frank K.
    Zhou, Jian-Lai
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 89 - +