A Spectral Algorithm for Latent Dirichlet Allocation

Cited by: 25
Authors
Anandkumar, Anima [1]
Foster, Dean P. [2]
Hsu, Daniel [3]
Kakade, Sham M. [4]
Liu, Yi-Kai [5]
Affiliations
[1] Univ Calif Irvine, Irvine, CA USA
[2] Yahoo Labs, New York, NY USA
[3] Columbia Univ, New York, NY 10027 USA
[4] Microsoft Res, Cambridge, MA USA
[5] NIST, Gaithersburg, MD 20899 USA
Keywords
Topic models; Mixture models; Method of moments; Latent Dirichlet allocation; LEARNING MIXTURES; DECOMPOSITIONS; EM;
DOI
10.1007/s00453-014-9909-1
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
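The abstract's closing idea, decomposing a low-order moment tensor that has orthogonal structure, can be illustrated with a minimal sketch. It assumes the third-order moment has already been whitened into a symmetric tensor of the form T = sum_i w_i v_i ⊗ v_i ⊗ v_i with orthonormal v_i and positive w_i, and it uses the generic tensor power iteration with deflation, which is a stand-in technique and not necessarily the procedure of the paper. The function names `build_odeco_tensor` and `tensor_power_method` are hypothetical, and the tensor is synthetic rather than estimated from word trigrams.

```python
import numpy as np


def build_odeco_tensor(weights, vectors):
    """Build T = sum_i w_i * v_i (x) v_i (x) v_i.

    `vectors` holds one component per column; the columns are assumed
    to be orthonormal (illustrative setup, not estimated from data).
    """
    return np.einsum("i,ai,bi,ci->abc", weights, vectors, vectors, vectors)


def tensor_power_method(T, n_components, n_restarts=10, n_iters=100, seed=0):
    """Recover (weights, vectors) from a symmetric, orthogonally
    decomposable 3-way tensor by power iteration plus deflation."""
    rng = np.random.default_rng(seed)
    T = T.copy()  # deflation modifies the tensor, so work on a copy
    d = T.shape[0]
    weights, vectors = [], []
    for _ in range(n_components):
        best_val, best_vec = -np.inf, None
        for _ in range(n_restarts):
            # Random unit-norm starting point.
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            for _ in range(n_iters):
                # Power update: u <- T(I, u, u) / ||T(I, u, u)||.
                u = np.einsum("abc,b,c->a", T, u, u)
                u /= np.linalg.norm(u)
            # Generalized Rayleigh quotient T(u, u, u).
            val = np.einsum("abc,a,b,c->", T, u, u, u)
            if val > best_val:
                best_val, best_vec = val, u
        weights.append(best_val)
        vectors.append(best_vec)
        # Deflate: subtract the recovered rank-one term.
        T -= best_val * np.einsum("a,b,c->abc", best_vec, best_vec, best_vec)
    return np.array(weights), np.column_stack(vectors)
```

In an actual topic-model setting the recovered pairs would still have to be mapped back through the whitening transform to obtain topic-word distributions and prior parameters; that step is omitted here.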
Pages: 193 - 214
Number of pages: 22
Related papers
50 items in total
  • [31] Exploring Symmetrical and Asymmetrical Dirichlet Priors for Latent Dirichlet Allocation
    Syed, Shaheen
    Spruit, Marco
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2018, 12 (03) : 399 - 423
  • [32] Language Model Adaptation Using Latent Dirichlet Allocation and an Efficient Topic Inference Algorithm
    Heidel, Aaron
    Chang, Hung-an
    Lee, Lin-shan
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1145 - +
  • [33] A Document Clustering Algorithm Based on Semi-constrained Hierarchical Latent Dirichlet Allocation
    Xu, Jungang
    Zhou, Shilong
    Qiu, Lin
    Liu, Shengyuan
    Li, Pengfei
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2014, 2014, 8793 : 49 - 60
  • [34] Latent Dirichlet Allocation (LDA) Model and kNN Algorithm to Classify Research Project Selection
    Saf'ie, M. A.
    Utami, E.
    Fatta, H. A.
    INTERNATIONAL CONFERENCE ON ADVANCED MATERIALS FOR BETTER FUTURE 2017, 2018, 333
  • [35] A SPEECH EMOTION RECOGNITION FRAMEWORK BASED ON LATENT DIRICHLET ALLOCATION: ALGORITHM AND FPGA IMPLEMENTATION
    Shah, Mohit
    Miao, Lifeng
    Chakrabarti, Chaitali
    Spanias, Andreas
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 2553 - 2557
  • [36] Joint Latent Dirichlet Allocation for Social Tags
    Yao, Jiangchao
    Wang, Yanfeng
    Zhang, Ya
    Sun, Jun
    Zhou, Jun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (01) : 224 - 237
  • [37] Bug localization using latent Dirichlet allocation
    Lukins, Stacy K.
    Kraft, Nicholas A.
    Etzkorn, Letha H.
    INFORMATION AND SOFTWARE TECHNOLOGY, 2010, 52 (09) : 972 - 990
  • [38] BiModal Latent Dirichlet Allocation for Text and Image
    Liao, Xiaofeng
    Jiang, Qingshan
    Zhang, Wei
    Zhang, Kai
    2014 4TH IEEE INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2014, : 736 - 739
  • [39] Latent Dirichlet Allocation Models for Image Classification
    Rasiwasia, Nikhil
    Vasconcelos, Nuno
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (11) : 2665 - 2679
  • [40] Nonstationary Latent Dirichlet Allocation for Speech Recognition
    Chueh, Chuang-Hua
    Chien, Jen-Tzung
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 356 - 359