Distributed Latent Dirichlet Allocation on Streams

被引:1
|
作者
Guo, Yunyan [1 ]
Li, Jianzhong [1 ,2 ]
机构
[1] Harbin Inst Technol, 92 Xidazhi St, Harbin 15001, Heilongjiang, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Guangdong, Peoples R China
基金
中国国家自然科学基金;
关键词
Distributed streams; learning system; variational inference; VARIATIONAL INFERENCE; OPTIMIZATION; BURSTY;
D O I
10.1145/3451528
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Latent Dirichlet Allocation (LDA) has been widely used for topic modeling, with applications spanning various areas such as natural language processing and information retrieval. While LDA on small and static datasets has been extensively studied, several real-world challenges are posed in practical scenarios where datasets are often huge and are gathered in a streaming fashion. As the state-of-the-art LDA algorithm on streams, Streaming Variational Bayes (SVB) introduced Bayesian updating to provide a streaming procedure. However, the utility of SVB is limited in applications since it ignored three challenges of processing real-world streams: topic evolution, data turbulence, and real-time inference. In this article, we propose a novel distributed LDA algorithm-referred to as StreamFed-LDA-to deal with challenges on streams. For topic modeling of streaming data, the ability to capture evolving topics is essential for practical online inference. To achieve this goal, StreamFed-LDA is based on a specialized framework that supports lifelong (continual) learning of evolving topics. On the other hand, data turbulence is commonly present in streams due to real-life events. In that case, the design of StreamFed-LDA allows the model to learn new characteristics fromthe most recent data while maintaining the historical information. On massive streaming data, it is difficult and crucial to provide real-time inference results. To increase the throughput and reduce the latency, StreamFed-LDA introduces additional techniques that substantially reduce both computation and communication costs in distributed systems. Experiments on four real-world datasets show that the proposed framework achieves significantly better performance of online inference compared with the baselines. At the same time, StreamFed-LDA also reduces the latency by orders of magnitudes in real-world datasets.
引用
收藏
页数:20
相关论文
共 50 条
  • [31] Indexing by Latent Dirichlet Allocation and an Ensemble Model
    Wang, Yanshan
    Lee, Jae-Sung
    Choi, In-Chan
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2016, 67 (07) : 1736 - 1750
  • [32] Evaluation of Stability and Similarity of Latent Dirichlet Allocation
    Tang, Jun
    Huo, Ruilong
    Yao, Jiali
    2013 FOURTH WORLD CONGRESS ON SOFTWARE ENGINEERING (WCSE), 2013, : 78 - 83
  • [33] Latent Dirichlet Allocation modeling of environmental microbiomes
    Kim, Anastasiia
    Sevanto, Sanna
    Moore, Eric R.
    Lubbers, Nicholas
    PLOS COMPUTATIONAL BIOLOGY, 2023, 19 (06)
  • [34] Unsupervised Object Localization with Latent Dirichlet Allocation
    Yang, Tong-feng
    Ma, Jun
    2013 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (ICCSAI 2013), 2013, : 230 - 234
  • [35] Latent Dirichlet Allocation for Internet Price War
    Li, Chenchen
    Yan, Xiang
    Deng, Xiaotie
    Qi, Yuan
    Chu, Wei
    Song, Le
    Qiao, Junlong
    He, Jianshan
    Xiong, Junwu
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 639 - 646
  • [36] Multi-dependent Latent Dirichlet Allocation
    Hsin, Wei-Cheng
    Huang, Jen-Wei
    2017 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2017, : 154 - 159
  • [37] Latent Dirichlet Allocation for Automatic Document Categorization
    Biro, Istvan
    Szabo, Jacint
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 430 - 441
  • [38] Author Identification Using Latent Dirichlet Allocation
    Calvo, Hiram
    Hernandez-Castaneda, Angel
    Garcia-Flores, Jorge
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 303 - 312
  • [39] Latent Dirichlet Allocation Based Multilevel Classification
    Bhutada, Sunil
    Balaram, V. V. S. S. S.
    Bulusu, Vishnu Vardhan
    2014 INTERNATIONAL CONFERENCE ON CONTROL, INSTRUMENTATION, COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICCICCT), 2014, : 1020 - 1024
  • [40] Unsupervised Feature Selection for Latent Dirichlet Allocation
    Xu Weiran
    Du Gang
    Chen Guang
    Guo Jun
    Yang Jie
    CHINA COMMUNICATIONS, 2011, 8 (05) : 54 - 62