Sequential latent Dirichlet allocation

Cited by: 41
Authors
Du, Lan [1]
Buntine, Wray [1]
Jin, Huidong [2]
Chen, Changyou [1]
Affiliations
[1] Natl ICT Australia, Canberra, ACT 2601, Australia
[2] CSIRO Math Informat & Stat, Canberra, ACT, Australia
Funding
Australian Research Council
Keywords
Latent Dirichlet allocation; Poisson-Dirichlet process; Collapsed Gibbs sampler; Topic model; Document structure; MODEL
DOI
10.1007/s10115-011-0425-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Understanding how topics within a document evolve over the structure of the document is an interesting and potentially important problem in exploratory and predictive text analytics. In this article, we address this problem by presenting a novel variant of latent Dirichlet allocation (LDA): Sequential LDA (SeqLDA). This variant directly considers the underlying sequential structure, i.e. a document consists of multiple segments (e.g. chapters, paragraphs), each of which is correlated to its antecedent and subsequent segments. Such progressive sequential dependency is captured by using the hierarchical two-parameter Poisson-Dirichlet process (HPDP). We develop an efficient collapsed Gibbs sampling algorithm to sample from the posterior of the SeqLDA based on the HPDP. Our experimental results on patent documents show that by considering the sequential structure within a document, our SeqLDA model has a higher fidelity over LDA in terms of perplexity (a standard measure of dictionary-based compressibility). The SeqLDA model also yields a nicer sequential topic structure than LDA, as we show in experiments on several books such as Melville's 'Moby Dick'.
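As a rough illustration of the perplexity measure mentioned in the abstract, the sketch below computes per-token perplexity from the log-probabilities a model assigns to held-out tokens. It is not the authors' evaluation code: the function name `perplexity` and the argument `token_log_probs` are hypothetical, and the input is assumed to be a list of natural-log probabilities, one per held-out token. Lower values mean the model predicts the held-out text better.

```python
import math


def perplexity(token_log_probs):
    """Per-token perplexity: exp of the average negative log-likelihood.

    token_log_probs: natural-log probabilities, one per held-out token
    (a hypothetical input format chosen for this sketch).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_lik = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_lik)


# Toy check: a model that gives every held-out token probability 1/4
# is as uncertain as a uniform choice among four words.
uniform_quarter = [math.log(0.25)] * 8
print(perplexity(uniform_quarter))
```

In the toy check the result is approximately 4, matching the reading of perplexity as an effective number of equally likely word choices per token.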
Pages: 475-503
Page count: 29
Source: Knowledge and Information Systems, 2012, 31: 475-503