Topic modeling for large-scale text data

Cited by: 7
Authors
Li, Xi-ming [1,2]
Ouyang, Ji-hong [1,2]
Lu, You [1,2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, MOE Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Latent Dirichlet allocation (LDA); Topic modeling; Online learning; Moving average
DOI
10.1631/FITEE.1400352
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results obtained in previous iterations to smooth out noisy natural gradients. We analyze the convergence properties of the proposed algorithm and conduct a set of experiments on two large-scale collections containing millions of documents. Experimental results indicate that, compared with stochastic variational inference (SVI) and stochastic gradient Riemannian Langevin dynamics (SGRLD), our algorithm converges faster and performs better.
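The abstract describes the core idea only at a high level: rather than stepping toward the single noisy estimate produced by one minibatch, MASVI reuses results from previous iterations. The sketch below is one hedged reading of that idea, not the paper's verified algorithm. It assumes the smoothed target is the plain mean of the last `window` intermediate estimates λ̂, fed into the standard SVI update λ ← (1 − ρ_t)λ + ρ_t · mean(λ̂) with step size ρ_t = (t + τ0)^(−κ). The class name `MovingAverageSVI`, the parameters `window`, `tau0`, and `kappa`, and the exact averaging rule are all hypothetical. Computing λ̂ from a minibatch (the LDA local step) is left to the caller.

```python
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


class MovingAverageSVI:
    """Hypothetical sketch: an SVI global update driven by a moving average
    of the last `window` intermediate estimates lambda_hat.

    Assumption (not taken from the paper): the smoothed target is the plain
    mean of the most recent `window` estimates, combined with the usual
    decreasing step size rho_t = (t + tau0) ** (-kappa).
    """

    def __init__(self, init: Sequence[float], window: int,
                 tau0: float = 1.0, kappa: float = 0.7) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        if tau0 <= 0:
            raise ValueError("tau0 must be > 0")
        if not 0.5 < kappa <= 1.0:
            raise ValueError("kappa must lie in (0.5, 1]")
        self.lam: Vector = list(init)  # current global variational parameters
        # Bounded buffer: appending beyond maxlen drops the oldest estimate.
        self.history: Deque[Vector] = deque(maxlen=window)
        self.tau0 = tau0
        self.kappa = kappa
        self.t = 0  # number of updates applied so far

    def step_size(self) -> float:
        # Decreasing step size; kappa in (0.5, 1] gives sum(rho) = inf
        # and sum(rho^2) < inf (the Robbins-Monro conditions).
        return (self.t + self.tau0) ** (-self.kappa)

    def update(self, lam_hat: Sequence[float]) -> Vector:
        """Apply one update given a noisy estimate lambda_hat from a minibatch."""
        if len(lam_hat) != len(self.lam):
            raise ValueError("lam_hat has the wrong dimension")
        self.history.append(list(lam_hat))
        n = len(self.history)
        # Element-wise mean over the stored estimates (the moving average).
        avg = [sum(col) / n for col in zip(*self.history)]
        rho = self.step_size()
        # Convex combination of the old parameters and the smoothed target.
        self.lam = [(1.0 - rho) * old + rho * a for old, a in zip(self.lam, avg)]
        self.t += 1
        return self.lam


if __name__ == "__main__":
    svi = MovingAverageSVI([0.0, 0.0], window=2, tau0=1.0, kappa=1.0)
    print(svi.update([2.0, 4.0]))  # rho = 1.0, mean = [2.0, 4.0] -> [2.0, 4.0]
    print(svi.update([4.0, 0.0]))  # rho = 0.5, mean = [3.0, 2.0] -> [2.5, 3.0]
```

With `window=1` the update reduces to plain SVI. Larger windows trade some lag (stale estimates from earlier minibatches) for lower variance in each step.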
Pages: 457-465
Number of pages: 9