A probabilistic topic model based on short distance Co-occurrences

Citations: 7
Authors
Rahimi, Marziea [1]
Zahedi, Morteza [1]
Mashayekhi, Hoda [1]
Affiliations
[1] Shahrood Univ Technol, Fac Comp Engn, Shahrood 3619995161, Iran
Keywords
Probabilistic topic model; Latent Dirichlet Allocation; Document clustering; Context window; Local co-occurrence; Word order; NOISY TEXT; DISCOVERY; CLASSIFICATION;
DOI
10.1016/j.eswa.2022.116518
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A limitation of many probabilistic topic models such as Latent Dirichlet Allocation (LDA) is their inflexibility in using local context. As a result, these models cannot directly benefit from short-distance co-occurrences, which are more likely to indicate meaningful word relationships. Some models, such as the Bigram Topic Model (BTM), consider local context by integrating language and topic models. However, because they take the exact word order into account, such models suffer severely from sparseness. Other models, like Latent Dirichlet Co-Clustering (LDCC), try to solve the problem by adding another level of granularity, treating a document as a bag of segments while ignoring the word order. In this paper, we introduce a new topic model which uses overlapping windows to encode local word relationships. In the proposed model, we assume a document is composed of fixed-size overlapping windows, and formulate a new generative process accordingly. In the inference procedure, each word is sampled once, in only a single window, while influencing the sampling of the words it co-occurs with in other windows. Word relationships are discovered at the document level, but the topic of each word is derived considering only its neighboring words in a window, to emphasize local word relationships. By using overlapping windows, without assuming an explicit dependency between adjacent words, we avoid ignoring the word order completely. The proposed model is straightforward, not severely prone to sparseness and, as the experimental results show, produces more meaningful and more coherent topics than the three established models mentioned above.
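To make the idea of fixed-size overlapping windows concrete, here is a minimal Python sketch. It is not the authors' model or inference procedure; it only illustrates how a token sequence might be cut into overlapping windows and how unordered co-occurrence pairs inside each window could be counted. The function names, the window size of 3, and the stride of one token are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from typing import List, Tuple


def overlapping_windows(tokens: List[str], size: int = 3) -> List[Tuple[str, ...]]:
    """Split a token list into fixed-size windows that overlap.

    Assumption: consecutive windows are shifted by one token.
    A document shorter than `size` yields one (shorter) window.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    if not tokens:
        return []
    if len(tokens) <= size:
        return [tuple(tokens)]
    return [tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def window_cooccurrences(windows: List[Tuple[str, ...]]) -> Counter:
    """Count unordered word pairs that appear together in a window.

    Word order inside a window is ignored (pairs are sorted), and a pair
    that falls into several overlapping windows is counted once per window.
    """
    counts: Counter = Counter()
    for window in windows:
        for i in range(len(window)):
            for j in range(i + 1, len(window)):
                pair = tuple(sorted((window[i], window[j])))
                counts[pair] += 1
    return counts


tokens = "topic models use local context".split()
windows = overlapping_windows(tokens, size=3)
pair_counts = window_cooccurrences(windows)
```

In this sketch adjacent words end up sharing more windows than distant ones, so short-distance pairs receive higher counts, while words further apart than the window size never co-occur at all. That is one simple way to see how overlapping windows retain some positional information without modelling exact word order.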
Pages: 14
Related papers
50 in total
  • [21] Headache and Tremor: Co-occurrences and Possible Associations
    Kuiper, Mathys
    Hendrikx, Suzan
    Koehler, Peter J.
    TREMOR AND OTHER HYPERKINETIC MOVEMENTS, 2015, 5
  • [22] Corpus of Syntactic Co-Occurrences: A Delayed Promise
    Klyshinsky, Eduard S.
    Lukashevich, Natalia Y.
    ARTIFICIAL INTELLIGENCE AND NATURAL LANGUAGE, 2018, 789 : 121 - 127
  • [23] A preliminary model of pronoun/verb co-occurrences in child-directed speech
    Laakso, A
    Smith, LB
    PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON COGNITIVE MODELING, 2004, : 136 - 141
  • [24] Analyzing Relatedness by Toponym Co-Occurrences on Web Pages
    Liu, Yu
    Wang, Fahui
    Kang, Chaogui
    Gao, Yong
    Lu, Yongmei
    TRANSACTIONS IN GIS, 2014, 18 (01) : 89 - 107
  • [25] Discovering Significant Co-Occurrences to Characterize Network Behaviors
    Arthur-Durett, Kristine
    Carroll, Thomas E.
    Chikkagoudar, Satish
    HUMAN INTERFACE AND THE MANAGEMENT OF INFORMATION: INTERACTION, VISUALIZATION, AND ANALYTICS, HIMI 2018 HELD AS PART OF HCI 2018, PART I, 2018, 10904 : 609 - 623
  • [26] Disentangling categorical relationships through a graph of co-occurrences
    Martinez-Romo, Juan
    Araujo, Lourdes
    Borge-Holthoefer, Javier
    Arenas, Alex
    Capitan, Jose A.
    Cuesta, Jose A.
    PHYSICAL REVIEW E, 2011, 84 (04)
  • [27] Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences
    Silva, Jorge
    Willett, Rebecca
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (03) : 563 - 569
  • [28] Visualization of health-subject analysis based on query term co-occurrences
    Zhang, Jin
    Wolfram, Dietmar
    Wang, Peiling
    Hong, Yi
    Gillis, Rick
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2008, 59 (12) : 1933 - 1947
  • [29] Diversification Improvements Through News Article Co-occurrences
    Yaros, John Robert
    Imielinski, Tomasz
    2014 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR FINANCIAL ENGINEERING & ECONOMICS (CIFER), 2014, : 130 - 137
  • [30] Attitudes From Mere Co-Occurrences Are Guided by Differentiation
    Alves, Hans
    Hoegden, Fabia
    Gast, Anne
    Aust, Frederik
    Unkelbach, Christian
    JOURNAL OF PERSONALITY AND SOCIAL PSYCHOLOGY, 2020, 119 (03) : 560 - 581