Sequential latent Dirichlet allocation

Cited by: 41
Authors
Du, Lan [1]
Buntine, Wray [1]
Jin, Huidong [2]
Chen, Changyou [1]
Affiliations
[1] Natl ICT Australia, Canberra, ACT 2601, Australia
[2] CSIRO Math Informat & Stat, Canberra, ACT, Australia
Funding
Australian Research Council
Keywords
Latent Dirichlet allocation; Poisson-Dirichlet process; Collapsed Gibbs sampler; Topic model; Document structure; MODEL;
DOI
10.1007/s10115-011-0425-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Understanding how topics within a document evolve over the structure of the document is an interesting and potentially important problem in exploratory and predictive text analytics. In this article, we address this problem by presenting a novel variant of latent Dirichlet allocation (LDA): Sequential LDA (SeqLDA). This variant directly considers the underlying sequential structure, i.e. a document consists of multiple segments (e.g. chapters, paragraphs), each of which is correlated with its antecedent and subsequent segments. Such progressive sequential dependency is captured by using the hierarchical two-parameter Poisson-Dirichlet process (HPDP). We develop an efficient collapsed Gibbs sampling algorithm to sample from the posterior of the SeqLDA based on the HPDP. Our experimental results on patent documents show that, by considering the sequential structure within a document, our SeqLDA model achieves higher fidelity than LDA in terms of perplexity (a standard measure of dictionary-based compressibility). The SeqLDA model also yields a nicer sequential topic structure than LDA, as we show in experiments on several books such as Melville's 'Moby Dick'.
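The abstract compares models by perplexity. As a minimal sketch of how that quantity is usually computed (the exponential of the negative mean per-token log-likelihood on held-out text), assuming the per-token log-probabilities are already available from some fitted model; the function name and the toy input are hypothetical and are not taken from the paper:

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean log-likelihood per token), using natural logs.

    `token_log_probs` is an assumed input: one log p(word) value per
    held-out token, produced by whatever topic model is being evaluated.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_lik = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_lik)


# Toy check: a model that spreads probability uniformly over 4 words
# is exactly as "perplexed" as a fair 4-way choice.
uniform_log_probs = [math.log(0.25)] * 8
print(perplexity(uniform_log_probs))
```

Lower values mean the model assigns higher probability to the held-out tokens, which is the sense in which the abstract reports SeqLDA outperforming LDA.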
Pages: 475-503
Page count: 29