Document Similarity Measure Based on Topic Model

Cited by: 0
Authors
He, Ming [1]
Wang, Zhen-zhen [1]
Du, Yong-ping [1]
Affiliations
[1] Beijing University of Technology, College of Computer Science, Beijing, People's Republic of China
Keywords
latent Dirichlet allocation; document similarity computation; topic model
DOI
10.4028/www.scientific.net/AMM.513-517.1280
Chinese Library Classification (CLC) number
TU [Building Science]
Discipline classification code
0813
Abstract
Document similarity computation is an active research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a latent Dirichlet allocation (LDA) topic model-based method to compute the similarity between documents. By mapping a document from its term-space representation into a topic space, a distribution over topics is derived and used to compute document similarity. An empirical study using a real data set demonstrates the efficiency of our method.
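The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors use, so the Jensen-Shannon-based score below is an assumption, as are the function names and the toy topic distributions (theta_a, theta_b, theta_c), which stand in for per-document topic mixtures that a fitted LDA model would produce.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_similarity(p, q):
    """Return 1 - Jensen-Shannon divergence (base 2), a score in [0, 1].

    p and q are topic distributions of equal length that each sum to 1.
    This measure is an assumed choice, not taken from the paper.
    """
    # Midpoint distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd

# Hypothetical topic mixtures over three topics (not data from the paper).
theta_a = [0.70, 0.20, 0.10]
theta_b = [0.65, 0.25, 0.10]
theta_c = [0.05, 0.15, 0.80]

sim_ab = js_similarity(theta_a, theta_b)  # documents with similar topic mixes
sim_ac = js_similarity(theta_a, theta_c)  # documents dominated by different topics
```

Under this assumed measure, documents a and b score higher than a and c because their probability mass falls on the same topics, even if they share few exact terms.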
Pages: 1280-1284
Page count: 5