Topic modeling for large-scale text data

Cited by: 7
Authors
Li, Xi-ming [1, 2]
Ouyang, Ji-hong [1, 2]
Lu, You [1, 2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, MOE Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Latent Dirichlet allocation (LDA); Topic modeling; Online learning; Moving average
DOI
10.1631/FITEE.1400352
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence of the proposed algorithm and run a set of experiments on two large-scale collections that contain millions of documents. The experimental results indicate that, compared with stochastic variational inference and SGRLD, our algorithm converges faster and performs better.
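The smoothing idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's MASVI update, whose exact form the abstract does not give; it only assumes that the most recent noisy gradient estimates are averaged before each step. The names `smoothed_updates`, `noisy_gradient`, `window`, and `step_size` are hypothetical, and a one-dimensional toy objective stands in for the LDA variational objective.

```python
import random
from collections import deque


def smoothed_updates(noisy_gradient, start, steps=200, window=5, step_size=0.1):
    """Follow the average of the last `window` noisy gradients (hypothetical helper)."""
    value = start
    recent = deque(maxlen=window)  # holds only the most recent gradient estimates
    for _ in range(steps):
        recent.append(noisy_gradient(value))
        averaged = sum(recent) / len(recent)  # moving average damps the noise
        value += step_size * averaged
    return value


rng = random.Random(0)
target = 3.0  # assumed optimum of the toy objective


def noisy_gradient(value):
    # Toy stand-in for a natural gradient estimated from one minibatch:
    # it points toward the target, plus zero-mean Gaussian noise.
    return (target - value) + rng.gauss(0.0, 0.5)


estimate = smoothed_updates(noisy_gradient, start=0.0)
print(f"estimate after smoothing: {estimate:.3f}")
```

Here `window` trades noise reduction against lag: a larger window averages away more noise but reacts more slowly to the current gradient.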
Pages: 457-465
Number of pages: 9