Distributed Latent Dirichlet Allocation on Streams

Cited by: 1
Authors
Guo, Yunyan [1]
Li, Jianzhong [1, 2]
Affiliations
[1] Harbin Inst Technol, 92 Xidazhi St, Harbin 15001, Heilongjiang, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed streams; learning system; variational inference; optimization; bursty
DOI
10.1145/3451528
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications in areas such as natural language processing and information retrieval. While LDA on small, static datasets has been extensively studied, practical scenarios pose several challenges because datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications because it ignores three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to deal with these challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. In addition, data turbulence is commonly present in streams due to real-life events. To handle it, the design of StreamFed-LDA allows the model to learn new characteristics from the most recent data while maintaining the historical information. On massive streaming data, providing real-time inference results is both crucial and difficult. To increase the throughput and reduce the latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better online inference performance than the baselines. At the same time, StreamFed-LDA reduces the latency by orders of magnitude on real-world datasets.
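The Bayesian updating idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the StreamFed-LDA algorithm, whose details the abstract does not give. It only shows how per-minibatch evidence can be folded into a Dirichlet posterior over topic-word parameters. The function name `bayesian_update`, the `decay` forgetting factor, and the assumption that expected topic-word counts for each minibatch are already available (for example, from a local variational step) are all hypothetical and chosen for illustration.

```python
def bayesian_update(lam, batch_stats, eta=0.01, decay=1.0):
    """Fold one minibatch into topic-word Dirichlet parameters.

    lam         -- K x V nested list: current variational parameters
    batch_stats -- K x V nested list: expected topic-word counts for the
                   minibatch (assumed to be computed elsewhere)
    eta         -- symmetric Dirichlet prior on topic-word distributions
    decay       -- hypothetical forgetting factor; 1.0 keeps all past
                   evidence, values below 1.0 discount older minibatches
    """
    return [
        # (l - eta) is the evidence accumulated so far; discount it,
        # then add the new minibatch's expected counts on top of the prior.
        [eta + decay * (l - eta) + s for l, s in zip(lam_k, stats_k)]
        for lam_k, stats_k in zip(lam, batch_stats)
    ]


if __name__ == "__main__":
    eta = 0.5
    lam = [[eta, eta, eta]]                      # K = 1 topic, V = 3 words
    stream = [[[2.0, 0.0, 1.0]], [[0.0, 4.0, 0.0]]]
    for stats in stream:
        lam = bayesian_update(lam, stats, eta=eta, decay=0.5)
    print(lam)
```

With `decay=1.0` the posterior after each minibatch simply becomes the prior for the next one. A value below 1.0 is one simple way to weight recent data more heavily while retaining some history, which is the trade-off the abstract describes for data turbulence.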
Pages: 20