Document Similarity Measure Based on Topic Model

Times cited: 0
Authors
He, Ming [1]
Wang, Zhen-zhen [1]
Du, Yong-ping [1]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing, Peoples R China
Keywords
latent Dirichlet allocation; document similarity computation; topic model
DOI
10.4028/www.scientific.net/AMM.513-517.1280
Chinese Library Classification number
TU [Building Science]
Discipline classification code
0813
Abstract
Document similarity computation is an active research topic in information retrieval (IR) and a key issue for automatic document categorization, clustering analysis, fuzzy query, and question answering. Topic modeling is an emerging field in natural language processing (NLP), IR, and machine learning (ML). In this paper, we apply a method based on the latent Dirichlet allocation (LDA) topic model to compute the similarity between documents. By mapping a document from its term-space representation into a topic space, a distribution over topics is derived and used to compute document similarity. An empirical study on a real data set demonstrates the efficiency of our method.
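The last step described in the abstract, comparing documents through their topic distributions, can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors use, so the Jensen-Shannon divergence below is an assumption chosen for illustration. The function names and the three document-topic vectors are hypothetical; in practice the vectors would come from a trained LDA model.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_similarity(p, q):
    """Similarity in [0, 1], defined as 1 minus the base-2 Jensen-Shannon divergence.

    Assumption: p and q are topic distributions of equal length that each sum to 1.
    """
    # The midpoint distribution is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - jsd


# Hypothetical document-topic distributions over three topics
# (stand-ins for the output of LDA inference, not data from the paper).
theta_a = [0.7, 0.2, 0.1]
theta_b = [0.6, 0.3, 0.1]
theta_c = [0.05, 0.05, 0.9]

if __name__ == "__main__":
    print("sim(a, b) =", round(js_similarity(theta_a, theta_b), 4))
    print("sim(a, c) =", round(js_similarity(theta_a, theta_c), 4))
```

Documents a and b put most of their mass on the same topic, so they score higher than a and c. Cosine similarity or a symmetrised KL divergence could be substituted for the measure used here without changing the overall structure.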
Pages: 1280-1284
Number of pages: 5