Topic modeling for large-scale text data

Cited by: 7
Authors
Li, Xi-ming [1, 2]
Ouyang, Ji-hong [1, 2]
Lu, You [1, 2]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
[2] Jilin Univ, MOE Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Latent Dirichlet allocation (LDA); Topic modeling; Online learning; Moving average
DOI
10.1631/FITEE.1400352
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses the results of previous iterations to smooth out noisy natural gradients. We analyze the convergence properties of the proposed algorithm and conduct a set of experiments on two large-scale collections that contain millions of documents. Experimental results indicate that, compared with stochastic variational inference (SVI) and stochastic gradient Riemannian Langevin dynamics (SGRLD), our algorithm converges faster and achieves better performance.
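The smoothing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the standard SVI global update lambda <- lambda + rho_t * (lambda_hat_t - lambda) with the step size rho_t = (tau0 + t)^(-kappa), and, as an assumption about how "results obtained by previous iterations" are used, replaces the single noisy intermediate estimate lambda_hat_t with the element-wise average of the last L estimates. The class name `MovingAverageSVI`, the parameters `window`, `tau0`, and `kappa`, and the toy Gaussian noise are all hypothetical; the paper's exact averaging scheme may differ.

```python
import random
from collections import deque


def step_size(t, tau0=1.0, kappa=0.7):
    """Robbins-Monro schedule commonly used in SVI: rho_t = (tau0 + t) ** -kappa."""
    return (tau0 + t) ** (-kappa)


class MovingAverageSVI:
    """Hypothetical sketch: an SVI-style global update driven by a moving
    average of the last `window` noisy intermediate estimates lambda_hat."""

    def __init__(self, init_lam, window=10, tau0=1.0, kappa=0.7):
        self.lam = list(init_lam)            # current global variational parameter
        self.history = deque(maxlen=window)  # last L noisy estimates (oldest dropped)
        self.t = 0                           # iteration counter
        self.tau0, self.kappa = tau0, kappa

    def update(self, lam_hat):
        # Store the newest noisy estimate computed from a minibatch.
        self.history.append(list(lam_hat))
        n = len(self.history)
        # Element-wise average of the stored estimates smooths out the noise.
        avg = [sum(vals) / n for vals in zip(*self.history)]
        rho = step_size(self.t, self.tau0, self.kappa)
        # Step along the smoothed natural-gradient direction (avg - lam).
        self.lam = [l + rho * (a - l) for l, a in zip(self.lam, avg)]
        self.t += 1
        return self.lam


if __name__ == "__main__":
    # Toy check (hypothetical data): noisy estimates scattered around a fixed target.
    random.seed(0)
    target = [5.0, 2.0]
    svi = MovingAverageSVI([1.0, 1.0], window=20)
    for _ in range(5000):
        noisy = [x + random.gauss(0.0, 3.0) for x in target]
        svi.update(noisy)
    print([round(v, 2) for v in svi.lam])
```

Setting `window=1` reduces the sketch to the plain SVI update; larger windows lower the variance of the step direction at the cost of reacting more slowly when the estimates drift.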
Pages: 457-465
Number of pages: 9
Related papers (50 in total)
  • [41] LinkedIn Skills: Large-Scale Topic Extraction and Inference
    Bastian, Mathieu
    Hayes, Matthew
    Vaughan, William
    Shah, Sam
    Skomoroch, Peter
    Kim, Hyungjin
    PROCEEDINGS OF THE 8TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS'14), 2014: 1-8
  • [42] Big data techniques: Large-scale text analysis for scientific and journalistic research
    Arcila-Calderon, Carlos
    Barbosa-Caro, Eduar
    Cabezuelo-Lorenzo, Francisco
    PROFESIONAL DE LA INFORMACION, 2016, 25(04): 623-631
  • [43] WIKITABLET: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections
    Chen, Mingda
    Wiseman, Sam
    Gimpel, Kevin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 193-209
  • [44] A large-scale study based on topic modeling to determine the research interests and trends on computational thinking
    Ozyurt, Ozcan
    Ozyurt, Hacer
    EDUCATION AND INFORMATION TECHNOLOGIES, 2023, 28(03): 3557-3579
  • [45] TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling
    Gretarsson, Brynjar
    O'Donovan, John
    Bostandjiev, Svetlin
    Hoellerer, Tobias
    Asuncion, Arthur
    Newman, David
    Smyth, Padhraic
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2012, 3(02)
  • [47] Text Relevance Analysis Method over Large-Scale High-Dimensional Text Data Processing
    Wang, Ling
    Ding, Wei
    Zhou, Tie Hua
    Ryu, Keun Ho
    COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2015), PT I, 2015, 9329: 371-379
  • [48] Large-Scale Text Mining of Biomedical Literature
    Ginter, Filip
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2013, (116): 43-44
  • [49] Feature Extraction for Large-Scale Text Collections
    Gallagher, Luke
    Mallia, Antonio
    Culpepper, J. Shane
    Suel, Torsten
    Cambazoglu, B. Barla
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020: 3015-3022
  • [50] Large-Scale Text Similarity Computing with Spark
    Bao, Xiaoan
    Dai, Shichao
    Zhang, Na
    Yu, Chenghai
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9(04): 95-100