A Spectral Algorithm for Latent Dirichlet Allocation

Cited by: 25
Authors
Anandkumar, Anima [1]
Foster, Dean P. [2]
Hsu, Daniel [3]
Kakade, Sham M. [4]
Liu, Yi-Kai [5]
Affiliations
[1] Univ Calif Irvine, Irvine, CA USA
[2] Yahoo Labs, New York, NY USA
[3] Columbia Univ, New York, NY 10027 USA
[4] Microsoft Res, Cambridge, MA USA
[5] NIST, Gaithersburg, MD 20899 USA
Keywords
Topic models; Mixture models; Method of moments; Latent Dirichlet allocation; LEARNING MIXTURES; DECOMPOSITIONS; EM;
DOI
10.1007/s00453-014-9909-1
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
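To make the phrase "orthogonal tensor decomposition" concrete, the following is a minimal sketch on synthetic data. It builds a symmetric third-order tensor from known orthonormal components and recovers them by tensor power iteration with deflation. This is a generic illustration, not necessarily the procedure used in the paper: the function names (`power_iteration`, `decompose`), the weights, and the dimensions are assumptions chosen for the example, and the step that turns raw LDA moments into such a tensor is not shown.

```python
import numpy as np


def power_iteration(T, n_iter=100, n_restarts=10, seed=0):
    """Find one (weight, vector) pair of a symmetric 3rd-order tensor T.

    Repeats the map v -> T(I, v, v) / ||T(I, v, v)|| from several random
    starts and keeps the start with the largest value of T(v, v, v).
    """
    rng = np.random.default_rng(seed)
    dim = T.shape[0]
    best_val, best_vec = -np.inf, None
    for _ in range(n_restarts):
        v = rng.standard_normal(dim)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = np.einsum("ijk,j,k->i", T, v, v)
            v /= np.linalg.norm(v)
        val = np.einsum("ijk,i,j,k->", T, v, v, v)
        if val > best_val:
            best_val, best_vec = val, v
    return best_val, best_vec


def decompose(T, rank):
    """Recover `rank` components by repeated power iteration and deflation."""
    T = T.copy()
    weights, vectors = [], []
    for _ in range(rank):
        lam, v = power_iteration(T)
        weights.append(lam)
        vectors.append(v)
        # Deflate: subtract the rank-one term that was just found.
        T -= lam * np.einsum("i,j,k->ijk", v, v, v)
    return np.array(weights), np.array(vectors)


# Synthetic example (hypothetical values): three orthonormal components.
rng = np.random.default_rng(42)
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # columns are orthonormal
true_weights = np.array([3.0, 2.0, 1.0])
T = sum(w * np.einsum("i,j,k->ijk", V[:, i], V[:, i], V[:, i])
        for i, w in enumerate(true_weights))

weights, vectors = decompose(T, rank=3)
```

Each row of `vectors` should match one column of `V`, and `weights` should match `true_weights` up to ordering. With exact orthogonal structure the iteration converges quickly; with moments estimated from data, the tensor is only approximately of this form and the recovered components are correspondingly approximate.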
Pages: 193-214
Number of pages: 22