Sequential latent Dirichlet allocation

Cited by: 41
Authors
Du, Lan [1]
Buntine, Wray [1]
Jin, Huidong [2]
Chen, Changyou [1]
Affiliations
[1] Natl ICT Australia, Canberra, ACT 2601, Australia
[2] CSIRO Math Informat & Stat, Canberra, ACT, Australia
Funding
Australian Research Council
Keywords
Latent Dirichlet allocation; Poisson-Dirichlet process; Collapsed Gibbs sampler; Topic model; Document structure; MODEL;
DOI
10.1007/s10115-011-0425-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding how topics within a document evolve over the structure of the document is an interesting and potentially important problem in exploratory and predictive text analytics. In this article, we address this problem by presenting a novel variant of latent Dirichlet allocation (LDA): Sequential LDA (SeqLDA). This variant directly considers the underlying sequential structure, i.e. a document consists of multiple segments (e.g. chapters, paragraphs), each of which is correlated to its antecedent and subsequent segments. Such progressive sequential dependency is captured by using the hierarchical two-parameter Poisson-Dirichlet process (HPDP). We develop an efficient collapsed Gibbs sampling algorithm to sample from the posterior of the SeqLDA based on the HPDP. Our experimental results on patent documents show that by considering the sequential structure within a document, our SeqLDA model has a higher fidelity over LDA in terms of perplexity (a standard measure of dictionary-based compressibility). The SeqLDA model also yields a nicer sequential topic structure than LDA, as we show in experiments on several books such as Melville's 'Moby Dick'.
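The abstract compares SeqLDA and LDA by perplexity. As a minimal sketch of that measure only, and not the authors' evaluation code, the snippet below computes perplexity as the exponential of the negative mean per-token log-likelihood. The function name `perplexity` and the toy input are assumptions made for illustration; in practice the log-probabilities would come from a fitted topic model evaluated on held-out text.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean log-likelihood) over held-out tokens.

    `token_log_probs` is a hypothetical list of natural-log probabilities,
    one per held-out token, as assigned by some fitted model.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_lik = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_lik)


# Toy check: a model that spreads probability uniformly over a
# 4-word vocabulary has perplexity equal to the vocabulary size.
uniform_log_probs = [math.log(1 / 4)] * 10
result = perplexity(uniform_log_probs)
```

Lower perplexity means the model assigns higher probability to the held-out tokens, which is the sense in which the abstract reports SeqLDA improving on LDA.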
Pages: 475-503
Number of pages: 29