Document Similarity Measure Based on Topic Model

Authors
He, Ming [1]
Wang, Zhen-zhen [1]
Du, Yong-ping [1]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China
Keywords
latent Dirichlet allocation; document similarity computation; topic model
DOI
10.4028/www.scientific.net/AMM.513-517.1280
Chinese Library Classification (CLC) number
TU [Architectural Science]
Discipline classification code
0813
Abstract
Document similarity computation is an active research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute the similarity between documents. By mapping a document from its term-space representation into a topic space, a distribution over topics is derived and used to compute document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
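The idea of comparing documents through their topic distributions can be illustrated with a minimal sketch. The abstract does not state which similarity function the authors use, so the Jensen-Shannon-based score below is an assumption chosen for illustration, and the three topic distributions are made-up values standing in for the output of a trained LDA model. The names `kl_divergence` and `js_similarity` are hypothetical helpers, not part of the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Similarity in [0, 1], defined here as 1 minus the Jensen-Shannon divergence (base 2).

    This measure is an assumption for illustration; the source does not name one.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture of the two distributions
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


# Made-up document-topic distributions over three topics (each sums to 1).
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.65, 0.25, 0.10]
doc_c = [0.05, 0.15, 0.80]

sim_ab = js_similarity(doc_a, doc_b)  # documents with similar topic mixtures
sim_ac = js_similarity(doc_a, doc_c)  # documents dominated by different topics

print(f"sim(a, b) = {sim_ab:.4f}")
print(f"sim(a, c) = {sim_ac:.4f}")
```

In this sketch, two documents that share few terms can still score as similar if their topic mixtures are close, which is the motivation for moving from term space to topic space.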
Pages: 1280-1284
Number of pages: 5