Document Similarity Measure Based on Topic Model

Cited by: 0
Authors
He, Ming [1]
Wang, Zhen-zhen [1]
Du, Yong-ping [1]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China
Keywords
latent Dirichlet allocation; document similarity computation; topic model
DOI
10.4028/www.scientific.net/AMM.513-517.1280
Chinese Library Classification number
TU [Architectural Science]
Discipline classification code
0813
Abstract
Document similarity computation is an active research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute the similarity between documents. By mapping a document from its term-space representation into a topic space, a distribution over topics is derived and used to compute document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
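The last step described in the abstract, comparing two documents through their topic distributions, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use; the code below assumes Jensen-Shannon divergence, one common choice for comparing probability distributions. The function names and the three example distributions are hypothetical, and the distributions are made-up values standing in for the output of an LDA model.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Return 1 - JSD(p, q), which lies in [0, 1] when logs are base 2.

    Assumption: Jensen-Shannon divergence is used here for illustration;
    the source abstract does not name the measure.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


# Hypothetical document-topic distributions over 3 topics (made-up values,
# standing in for what an LDA model would infer for each document).
theta_a = [0.7, 0.2, 0.1]
theta_b = [0.6, 0.3, 0.1]
theta_c = [0.1, 0.1, 0.8]

sim_ab = js_similarity(theta_a, theta_b)  # documents with similar topic mix
sim_ac = js_similarity(theta_a, theta_c)  # documents with different topic mix
```

Under these assumptions, documents whose topic mixtures overlap (a and b) score higher than documents dominated by different topics (a and c), even if the documents share few literal terms.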
Pages: 1280-1284
Page count: 5
Related papers
50 entries in total
  • [11] A measure based on optimal matching in graph theory for document similarity
    Wan, XJ
    Peng, YX
    INFORMATION RETRIEVAL TECHNOLOGY, 2005, 3411 : 227 - 238
  • [12] Affinity-based similarity measure for web document clustering
    Shyu, ML
    Chen, SC
    Chen, M
    Rubin, SH
    PROCEEDINGS OF THE 2004 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI-2004), 2004, : 247 - 252
  • [13] Pairwise document similarity measure based on present term set
    Oghbaie M.
    Mohammadi Zanjireh M.
    Journal of Big Data, 2018, 5 (01)
  • [14] Similarity Measure for Semantic Document Interconnections
    Hwang, Myunggwon
    Choi, Dongjin
    Choi, Junho
    Kim, Hanil
    Kim, Pankoo
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2010, 13 (02) : 253 - 267
  • [15] An Embedding-Based Topic Model for Document Classification
    Seifollahi, Sattar
    Piccardi, Massimo
    Jolfaei, Alireza
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (03)
  • [16] The Topic Similarity Computation Model based Information Granularity
    Xu Liyong
    Dong Yanrong
    Xu Na
    Pei Caiyan
    Gu Liwei
    Kang Yan
    2009 WRI WORLD CONGRESS ON SOFTWARE ENGINEERING, VOL 2, PROCEEDINGS, 2009, : 12 - 15
  • [17] Document Representation Based on Semantic Smoothed Topic Model
    Liu, Ying
    Song, Wei
    Liu, Lizhen
    Wang, Hanshi
    2016 17TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2016, : 65 - 69
  • [18] Application of a Similarity Measure for Graphs to Web-based Document Structures
    Dehmer, Matthias
    Emmert-Streib, Frank
    Mehler, Alexander
    Kilian, Juergen
    Muehlhaeuser, Max
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 8, 2005, 8 : 77 - 81
  • [19] A novel document similarity measure based on earth mover's distance
    Wan, Xiaojun
    INFORMATION SCIENCES, 2007, 177 (18) : 3718 - 3730
  • [20] Weighted Similarity: A New Similarity Measure for Document Ranking Features
    Shirzad, Mehrnoush Barani
    Keyvanpour, Mohammad Reza
    ARTIFICIAL INTELLIGENCE TRENDS IN INTELLIGENT SYSTEMS, CSOC2017, VOL 1, 2017, 573 : 273 - 280