Topic modeling for large-scale text data

Times cited: 7
Authors
Li, Xi-ming [1 ,2 ]
Ouyang, Ji-hong [1 ,2 ]
Lu, You [1 ,2 ]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, MOE Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Latent Dirichlet allocation (LDA); Topic modeling; Online learning; Moving average;
DOI
10.1631/FITEE.1400352
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence of the proposed algorithm and conduct experiments on two large-scale collections containing millions of documents. Experimental results indicate that, compared with stochastic variational inference (SVI) and stochastic gradient Riemannian Langevin dynamics (SGRLD), our algorithm converges faster and performs better.
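The abstract describes MASVI only at a high level. As a rough illustration, the following Python sketch shows one way a moving-average step could smooth the global update of stochastic variational inference. It assumes (this is not taken from the paper) that the algorithm keeps the last L intermediate estimates λ̂ of the global topic parameter, averages them into λ̄, and then applies the usual SVI step λ ← (1 − ρ_t) λ + ρ_t λ̄ with step size ρ_t = (τ0 + t)^(−κ). The class name MovingAverageUpdater, the window parameter, and the defaults τ0 = 1 and κ = 0.7 are hypothetical, and the per-document local step that produces λ̂ for LDA is omitted.

```python
from collections import deque
from typing import Deque, List


def step_size(t: int, tau0: float = 1.0, kappa: float = 0.7) -> float:
    """Robbins-Monro step size rho_t = (tau0 + t) ** -kappa, the schedule commonly used by SVI."""
    return (tau0 + t) ** (-kappa)


class MovingAverageUpdater:
    """Hypothetical sketch of a moving-average SVI global update.

    Assumption: the smoothing is modeled as averaging the last `window`
    intermediate estimates lambda_hat, then taking the standard SVI step
    lambda <- (1 - rho) * lambda + rho * lambda_bar.
    With window=1 this is exactly the plain SVI update.
    """

    def __init__(self, init: List[float], window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.lam = list(init)
        # deque with maxlen drops the oldest estimate automatically.
        self.history: Deque[List[float]] = deque(maxlen=window)
        self.t = 0

    def update(self, lam_hat: List[float]) -> List[float]:
        if len(lam_hat) != len(self.lam):
            raise ValueError("lam_hat must match the parameter dimension")
        self.history.append(list(lam_hat))
        n = len(self.history)
        # Coordinate-wise average of the stored noisy estimates.
        lam_bar = [sum(h[k] for h in self.history) / n for k in range(len(self.lam))]
        rho = step_size(self.t)
        # Standard SVI interpolation, but toward the smoothed estimate.
        self.lam = [(1.0 - rho) * old + rho * b for old, b in zip(self.lam, lam_bar)]
        self.t += 1
        return self.lam


if __name__ == "__main__":
    plain = MovingAverageUpdater([0.0], window=1)   # plain SVI
    smooth = MovingAverageUpdater([0.0], window=2)  # moving-average variant
    for i in range(200):
        est = [2.0] if i % 2 == 0 else [0.0]        # noisy estimates with mean 1.0
        plain.update(est)
        smooth.update(est)
    print("plain:", plain.lam, "smoothed:", smooth.lam)
```

With window=1 the update reduces to the standard SVI step. On noisy estimates that alternate around a common mean, a larger window damps the oscillation that the plain step exhibits, at the cost of reacting more slowly when the underlying estimate genuinely changes.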
Pages: 457 - 465
Page count: 9
Related papers
  • [1] Topic modeling for large-scale text data
    Li, Xi-ming
    Ouyang, Ji-hong
    Lu, You
    Frontiers of Information Technology & Electronic Engineering, 2015, 16 : 457 - 465
  • [2] A Large-scale Text Analysis with Word Embeddings and Topic Modeling
    Choi, Won-Joon
    Kim, Euhee
    JOURNAL OF COGNITIVE SCIENCE, 2019, 20 (01) : 147 - 187
  • [3] Topic Modeling of Large Scale Social Text
    Wang, Jia-wen
    Yang, Qun
    2ND INTERNATIONAL CONFERENCE ON COMMUNICATIONS, INFORMATION MANAGEMENT AND NETWORK SECURITY (CIMNS 2017), 2017 : 237 - 242
  • [4] Topic Modeling Techniques for Text Mining over a Large-Scale Scientific and Biomedical Text Corpus
    Avasthi, S.
    Chauhan, R.
    Acharjya, D.P.
    International Journal of Ambient Computing and Intelligence, 2022, 13 (01)
  • [5] Empath: Understanding Topic Signals in Large-Scale Text
    Fast, Ethan
    Chen, Binbin
    Bernstein, Michael S.
    34TH ANNUAL CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2016, 2016 : 4647 - 4657
  • [6] A Distributed Topic Model for Large-Scale Streaming Text
    Li, Yicong
    Feng, Dawei
    Lu, Menglong
    Li, Dongsheng
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2019, PT II, 2019, 11776 : 37 - 48
  • [7] LDA*: A Robust and Large-scale Topic Modeling System
    Yu, Lele
    Zhang, Ce
    Shao, Yingxia
    Cui, Bin
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (11) : 1406 - 1417
  • [8] Large-scale Analysis of Free-Text Data for Mental Health Surveillance with Topic Modelling
    Gu, Yang
    Leroy, Gondy
    AMCIS 2020 PROCEEDINGS, 2020
  • [9] Large-Scale High-Precision Topic Modeling on Twitter
    Yang, Shuang
    Kolcz, Alek
    Schlaikjer, Andy
    Gupta, Pankaj
    PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014 : 1907 - 1916
  • [10] Topic modeling and improvement of image representation for large-scale image retrieval
    Nguyen Anh Tu
    Dong-Luong Dinh
    Rasel, Mostofa Kamal
    Lee, Young-Koo
    INFORMATION SCIENCES, 2016, 366 : 99 - 120