Sequential latent Dirichlet allocation

Times cited: 41
Authors
Du, Lan [1 ]
Buntine, Wray [1 ]
Jin, Huidong [2 ]
Chen, Changyou [1 ]
Affiliations
[1] Natl ICT Australia, Canberra, ACT 2601, Australia
[2] CSIRO Math Informat & Stat, Canberra, ACT, Australia
Funding
Australian Research Council
Keywords
Latent Dirichlet allocation; Poisson-Dirichlet process; Collapsed Gibbs sampler; Topic model; Document structure; MODEL
DOI
10.1007/s10115-011-0425-1
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Understanding how the topics within a document evolve over the document's structure is an interesting and potentially important problem in exploratory and predictive text analytics. In this article, we address this problem by presenting a novel variant of latent Dirichlet allocation (LDA): sequential LDA (SeqLDA). This variant directly considers the underlying sequential structure of a document, i.e., a document consists of multiple segments (e.g., chapters or paragraphs), each of which is correlated with its antecedent and subsequent segments. This progressive sequential dependency is captured using the hierarchical two-parameter Poisson-Dirichlet process (HPDP). We develop an efficient collapsed Gibbs sampling algorithm to sample from the posterior of SeqLDA based on the HPDP. Our experimental results on patent documents show that, by considering the sequential structure within a document, our SeqLDA model achieves higher fidelity than LDA in terms of perplexity (a standard measure of dictionary-based compressibility). SeqLDA also yields a nicer sequential topic structure than LDA, as we show in experiments on several books such as Melville's 'Moby Dick'.
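The collapsed Gibbs sampler described above integrates out the topic distributions and resamples each token's topic assignment from its collapsed conditional. As a point of reference only, below is a minimal sketch of the standard collapsed Gibbs sampler for vanilla LDA, not the paper's SeqLDA/HPDP sampler; the function name lda_collapsed_gibbs, the toy corpus, and the hyperparameter values alpha and beta are illustrative assumptions.

```python
import numpy as np

def lda_collapsed_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for vanilla LDA (illustrative sketch only).

    docs: list of documents, each a list of word ids in [0, V).
    Returns document-topic counts and topic-word counts.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # document-topic counts
    nkw = np.zeros((K, V))   # topic-word counts
    nk = np.zeros(K)         # total tokens assigned to each topic
    # Randomly initialize a topic assignment for every token.
    z = [rng.integers(0, K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional p(z = k | rest), up to a constant:
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Toy corpus over a 4-word vocabulary (made-up data for demonstration).
docs = [[0, 1, 2, 1, 0], [2, 3, 3, 0, 2], [1, 1, 3, 2, 0]]
ndk, nkw = lda_collapsed_gibbs(docs, V=4, K=2, iters=100)
```

Perplexity, the fidelity measure cited in the abstract, is then computed as exp(-L/N), where L is the log-likelihood of held-out text under the trained model and N is the number of held-out tokens.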
Pages: 475 - 503
Number of pages: 29
Related papers
50 records in total
  • [31] Evaluation of Stability and Similarity of Latent Dirichlet Allocation
    Tang, Jun
    Huo, Ruilong
    Yao, Jiali
    2013 FOURTH WORLD CONGRESS ON SOFTWARE ENGINEERING (WCSE), 2013, : 78 - 83
  • [32] Weighted Latent Dirichlet Allocation for Cluster Ensemble
    Wang, Hongjun
    Li, Zhishu
    Cheng, Yang
    SECOND INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING: WGEC 2008, PROCEEDINGS, 2008, : 437 - 441
  • [33] Indexing by Latent Dirichlet Allocation and an Ensemble Model
    Wang, Yanshan
    Lee, Jae-Sung
    Choi, In-Chan
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2016, 67 (07) : 1736 - 1750
  • [34] Latent Dirichlet Allocation modeling of environmental microbiomes
    Kim, Anastasiia
    Sevanto, Sanna
    Moore, Eric R.
    Lubbers, Nicholas
    PLOS COMPUTATIONAL BIOLOGY, 2023, 19 (06)
  • [35] Unsupervised Object Localization with Latent Dirichlet Allocation
    Yang, Tong-feng
    Ma, Jun
    2013 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (ICCSAI 2013), 2013, : 230 - 234
  • [36] Latent Dirichlet Allocation for Internet Price War
    Li, Chenchen
    Yan, Xiang
    Deng, Xiaotie
    Qi, Yuan
    Chu, Wei
    Song, Le
    Qiao, Junlong
    He, Jianshan
    Xiong, Junwu
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 639 - 646
  • [37] Multi-dependent Latent Dirichlet Allocation
    Hsin, Wei-Cheng
    Huang, Jen-Wei
    2017 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2017, : 154 - 159
  • [38] Author Identification Using Latent Dirichlet Allocation
    Calvo, Hiram
    Hernandez-Castaneda, Angel
    Garcia-Flores, Jorge
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 303 - 312
  • [39] Unsupervised Feature Selection for Latent Dirichlet Allocation
    Xu Weiran
    Du Gang
    Chen Guang
    Guo Jun
    Yang Jie
    CHINA COMMUNICATIONS, 2011, 8 (05) : 54 - 62
  • [40] Latent Dirichlet Allocation Based Multilevel Classification
    Bhutada, Sunil
    Balaram, V. V. S. S. S.
    Bulusu, Vishnu Vardhan
    2014 INTERNATIONAL CONFERENCE ON CONTROL, INSTRUMENTATION, COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICCICCT), 2014, : 1020 - 1024