A probabilistic topic model based on short distance Co-occurrences

Cited by: 7
Authors
Rahimi, Marziea [1 ]
Zahedi, Morteza [1 ]
Mashayekhi, Hoda [1 ]
Affiliation
[1] Shahrood Univ Technol, Fac Comp Engn, Shahrood 3619995161, Iran
Keywords
Probabilistic topic model; Latent Dirichlet Allocation; Document clustering; Context window; Local co-occurrence; Word order; NOISY TEXT; DISCOVERY; CLASSIFICATION;
DOI
10.1016/j.eswa.2022.116518
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A limitation of many probabilistic topic models, such as Latent Dirichlet Allocation (LDA), is their inability to use local context. As a result, these models cannot directly benefit from short-distance co-occurrences, which are more likely to indicate meaningful word relationships. Some models, such as the Bigram Topic Model (BTM), consider local context by integrating language and topic models; however, because they take the exact word order into account, such models suffer severely from sparseness. Other models, like Latent Dirichlet Co-Clustering (LDCC), try to solve the problem by adding another level of granularity, treating a document as a bag of segments while ignoring the word order. In this paper, we introduce a new topic model that uses overlapping windows to encode local word relationships. In the proposed model, we assume a document is composed of fixed-size overlapping windows and formulate a new generative process accordingly. In the inference procedure, each word is sampled only once, in a single window, while influencing the sampling of the words it co-occurs with in other windows. Word relationships are discovered at the document level, but the topic of each word is derived considering only its neighboring words in a window, to emphasize local word relationships. By using overlapping windows, without assuming an explicit dependency between adjacent words, we avoid ignoring the word order completely. The proposed model is straightforward, is not severely prone to sparseness, and, as the experimental results show, produces more meaningful and more coherent topics than the three established models mentioned above.
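The abstract does not give the model's generative process, but the windowing idea it rests on — decomposing a document into fixed-size overlapping windows, so each word appears in several windows and shares local context with its neighbors — can be sketched as follows. The function name and parameters here are illustrative, not taken from the paper:

```python
def overlapping_windows(tokens, size=5, stride=1):
    """Slide a fixed-size window over a token list with overlap.

    With stride < size, each token falls inside several windows, so
    (as the abstract describes) the topic sampled for a word in one
    window can influence the words it co-occurs with in other windows.
    """
    if len(tokens) <= size:
        return [tokens]  # a short document is a single window
    return [tokens[i:i + size]
            for i in range(0, len(tokens) - size + 1, stride)]

doc = "topic models capture short distance word cooccurrence patterns".split()
for window in overlapping_windows(doc, size=4):
    print(window)
```

Note that unlike the non-overlapping segments of LDCC, consecutive windows here share `size - stride` tokens, which is what lets the model use local context without fully discarding word order.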
Pages: 14