A Spectral Algorithm for Latent Dirichlet Allocation

Cited by: 25
Authors
Anandkumar, Anima [1]
Foster, Dean P. [2]
Hsu, Daniel [3]
Kakade, Sham M. [4]
Liu, Yi-Kai [5]
Affiliations
[1] Univ Calif Irvine, Irvine, CA USA
[2] Yahoo Labs, New York, NY USA
[3] Columbia Univ, New York, NY 10027 USA
[4] Microsoft Res, Cambridge, MA USA
[5] NIST, Gaithersburg, MD 20899 USA
Keywords
Topic models; Mixture models; Method of moments; Latent Dirichlet allocation; LEARNING MIXTURES; DECOMPOSITIONS; EM;
DOI
10.1007/s00453-014-9909-1
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
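As a rough illustration of what an orthogonal tensor decomposition involves, the sketch below runs tensor power iteration with deflation on a synthetic symmetric third-order tensor that is built to have orthonormal components. This is a generic technique swapped in for illustration, not necessarily the procedure used in the paper: it assumes the tensor is already orthogonally decomposable with positive weights, and it skips the steps of estimating moments from documents and adjusting them for the Dirichlet prior. All function and variable names here are hypothetical.

```python
import numpy as np

def tensor_apply(T, u):
    """Contract the last two modes of T with u, giving the vector T(I, u, u)."""
    return np.einsum("ijk,j,k->i", T, u, u)

def power_iteration(T, n_iter=100, n_restarts=10, rng=None):
    """Find one (weight, vector) pair of an orthogonally decomposable tensor.

    Several random restarts are tried and the candidate with the largest
    value T(u, u, u) is kept.
    """
    rng = np.random.default_rng(rng)
    d = T.shape[0]
    best_val, best_vec = -np.inf, None
    for _ in range(n_restarts):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            u = tensor_apply(T, u)
            u /= np.linalg.norm(u)
        val = np.einsum("ijk,i,j,k->", T, u, u, u)
        if val > best_val:
            best_val, best_vec = val, u
    return best_val, best_vec

def decompose(T, k, seed=0):
    """Recover k components by repeated power iteration and deflation."""
    rng = np.random.default_rng(seed)
    T = T.copy()
    weights, vectors = [], []
    for _ in range(k):
        lam, v = power_iteration(T, rng=rng)
        weights.append(lam)
        vectors.append(v)
        # Deflate: subtract the rank-one term that was just found.
        T -= lam * np.einsum("i,j,k->ijk", v, v, v)
    return np.array(weights), np.array(vectors)

# Synthetic example (assumed data): 3 orthonormal components in 5 dimensions.
demo_rng = np.random.default_rng(42)
true_v, _ = np.linalg.qr(demo_rng.normal(size=(5, 3)))  # columns are orthonormal
true_w = np.array([3.0, 2.0, 1.0])
T_demo = np.einsum("r,ir,jr,kr->ijk", true_w, true_v, true_v, true_v)

weights, vectors = decompose(T_demo, k=3)
```

On this synthetic input, `weights` should match `true_w` up to ordering and the rows of `vectors` should match the columns of `true_v`. With real data the tensor would be a noisy empirical estimate, so the recovered components would only be approximate.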
Pages: 193-214
Number of pages: 22
Related papers
50 items in total
  • [1] A Spectral Algorithm for Latent Dirichlet Allocation
    Anima Anandkumar
    Dean P. Foster
    Daniel Hsu
    Sham M. Kakade
    Yi-Kai Liu
    Algorithmica, 2015, 72 : 193 - 214
  • [2] An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm
    DeCarolis, Christopher
    Ram, Mukul
    Esmaeili, Seyed
    Wang, Yu-Xiang
    Huang, Furong
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [3] An end-to-end Differentially Private Latent Dirichlet Allocation Using a Spectral Algorithm
    DeCarolis, Christopher
    Ram, Mukul
    Esmaeili, Seyed
    Wang, Yu-Xiang
    Huang, Furong
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [4] Discovering Latent Topics by Gaussian Latent Dirichlet Allocation and Spectral Clustering
    Yuan, Bo
    Gao, Xinbo
    Niu, Zhenxing
    Tian, Qi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [5] A PERCEPTUAL HASHING ALGORITHM USING LATENT DIRICHLET ALLOCATION
    Vretos, Nicholas
    Nikolaidis, Nikos
    Pitas, Ioannis
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 362 - 365
  • [6] An Online Inference Algorithm for Labeled Latent Dirichlet Allocation
    Zhou, Qiang
    Huang, Heyan
    Mao, Xian-Ling
    WEB TECHNOLOGIES AND APPLICATIONS (APWEB 2015), 2015, 9313 : 17 - 28
  • [7] A Fast Algorithm for Posterior Inference with Latent Dirichlet Allocation
    Bui Thi-Thanh-Xuan
    Vu Van-Tu
    Takasu, Atsuhiro
    Khoat Than
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2018, PT II, 2018, 10752 : 137 - 146
  • [8] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [9] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 601 - 608
  • [10] Sequential latent Dirichlet allocation
    Du, Lan
    Buntine, Wray
    Jin, Huidong
    Chen, Changyou
    KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 31 (03) : 475 - 503