Distributed Latent Dirichlet Allocation on Streams

Cited by: 1
Authors
Guo, Yunyan [1]
Li, Jianzhong [1, 2]
Affiliations
[1] Harbin Inst Technol, 92 Xidazhi St, Harbin 15001, Heilongjiang, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed streams; learning system; variational inference; VARIATIONAL INFERENCE; OPTIMIZATION; BURSTY;
DOI
10.1145/3451528
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications spanning various areas such as natural language processing and information retrieval. While LDA on small and static datasets has been extensively studied, several real-world challenges are posed in practical scenarios where datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications since it ignores three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm, referred to as StreamFed-LDA, to deal with these challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. On the other hand, data turbulence is commonly present in streams due to real-life events. In that case, the design of StreamFed-LDA allows the model to learn new characteristics from the most recent data while maintaining the historical information. On massive streaming data, it is difficult and crucial to provide real-time inference results. To increase the throughput and reduce the latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better performance of online inference compared with the baselines. At the same time, StreamFed-LDA also reduces the latency by orders of magnitude on real-world datasets.
Pages: 20
Related papers
50 in total
  • [21] Topic Selection in Latent Dirichlet Allocation
    Wang, Biao
    Liu, Zelong
    Li, Maozhen
    Liu, Yang
    Qi, Man
    2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2014, : 756 - 760
  • [22] Crowd labeling latent Dirichlet allocation
    Pion-Tonachini, Luca
    Makeig, Scott
    Kreutz-Delgado, Ken
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 53 (03) : 749 - 765
  • [23] The Auto Annotation Latent Dirichlet Allocation
    Xiang, Yingzhuo
    Yang, Dongmei
    Yan, Jikun
    PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INFORMATION SCIENCES, MACHINERY, MATERIALS AND ENERGY (ICISMME 2015), 2015, 126 : 1908 - 1911
  • [24] Exploring Symmetrical and Asymmetrical Dirichlet Priors for Latent Dirichlet Allocation
    Syed, Shaheen
    Spruit, Marco
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2018, 12 (03) : 399 - 423
  • [25] BiModal Latent Dirichlet Allocation for Text and Image
    Liao, Xiaofeng
    Jiang, Qingshan
    Zhang, Wei
    Zhang, Kai
    2014 4TH IEEE INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2014, : 736 - 739
  • [26] Joint Latent Dirichlet Allocation for Social Tags
    Yao, Jiangchao
    Wang, Yanfeng
    Zhang, Ya
    Sun, Jun
    Zhou, Jun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (01) : 224 - 237
  • [27] Bug localization using latent Dirichlet allocation
    Lukins, Stacy K.
    Kraft, Nicholas A.
    Etzkorn, Letha H.
    INFORMATION AND SOFTWARE TECHNOLOGY, 2010, 52 (09) : 972 - 990
  • [28] Latent Dirichlet Allocation Models for Image Classification
    Rasiwasia, Nikhil
    Vasconcelos, Nuno
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (11) : 2665 - 2679
  • [29] Nonstationary Latent Dirichlet Allocation for Speech Recognition
    Chueh, Chuang-Hua
    Chien, Jen-Tzung
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 356 - 359
  • [30] Weighted Latent Dirichlet Allocation for Cluster Ensemble
    Wang, Hongjun
    Li, Zhishu
    Cheng, Yang
    SECOND INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING: WGEC 2008, PROCEEDINGS, 2008, : 437 - 441