A context-enhanced Dirichlet model for online clustering in short text streams

Authors
Kumar, Jay [1, 3]
Shao, Junming [1, 2]
Kumar, Rajesh [1]
Din, Salah Ud [1]
Mawuli, Cobbinah B. [2]
Yang, Qinli [2]
Affiliations
[1] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Huzhou, Huzhou 313001, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] Dalhousie Univ, Inst Big Data Analyt, 6299 South St, Halifax, NS B3H 4R2, Canada
Funding
National Natural Science Foundation of China
Keywords
Text stream; Probabilistic model; Topic evolution; Micro-clusters
DOI
10.1016/j.eswa.2023.120262
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Online clustering of short text streams has become significant due to the popularity of news and social media platforms. The objective of online clustering is to maintain active topics (clusters) by automatically detecting new topics and forgetting outdated ones. Most existing approaches exploit static, high-dimensional semantic term representations of the text to enhance clustering quality. However, their inference procedures depend on a fixed batch size to reduce the number of clusters related to a given topic and bring it closer to the actual number of topics. This paper proposes a non-parametric Dirichlet model with episodic inference (EINDM) to cluster evolving short text streams by introducing a window-based, low-dimensional semantic term representation that captures the contextual relationships between words. In addition, an episodic inference procedure is introduced to reduce cluster sparsity in the model. Furthermore, a novel "word specificity" measure, based on neighborhood terms, is proposed to capture the evolving contexts of individual terms. Extensive empirical evaluation demonstrates that EINDM yields the best performance in terms of NMI, homogeneity, and cluster purity compared with recent state-of-the-art clustering models.
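Of the metrics named in the abstract, cluster purity is the simplest to state: each cluster is credited with its most frequent ground-truth topic, and purity is the fraction of documents covered by those majority topics. The sketch below follows that standard textbook definition; it is not taken from the paper's evaluation code, and the function name `cluster_purity` and the toy labels are illustrative assumptions.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, cluster_ids):
    """Standard cluster purity: sum of per-cluster majority counts / N.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        raise ValueError("at least one document is required")

    # Count how often each ground-truth topic occurs inside each cluster.
    counts_by_cluster = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        counts_by_cluster[cluster][label] += 1

    # Each cluster contributes the size of its most frequent topic.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(true_labels)


# Toy data (assumed): six short texts, three topics, three predicted clusters.
true_topics = ["sports", "sports", "politics", "politics", "tech", "tech"]
predicted = [0, 0, 0, 1, 2, 2]
purity = cluster_purity(true_topics, predicted)
print(f"purity = {purity:.3f}")
```

Purity alone rewards over-splitting, since one cluster per document scores 1.0. That is one reason it is usually reported alongside NMI and homogeneity, as the abstract does.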
Pages: 11
Related papers
50 in total
  • [1] An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering
    Kumar, Jay
    Shao, Junming
    Din, Salah ud
    Ali, Wazir
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 766 - 776
  • [2] A topic-enhanced dirichlet model for short text stream clustering
    Liu, Kan
    He, Jiarui
    Chen, Yu
    NEURAL COMPUTING & APPLICATIONS, 2024, : 8125 - 8140
  • [3] Efficient Clustering of Short Text Streams using Online-Offline Clustering
    Rakib, Md Rashadul Hasan
    Zeh, Norbert
    Milios, Evangelos
    PROCEEDINGS OF THE 21ST ACM SYMPOSIUM ON DOCUMENT ENGINEERING (DOCENG '21), 2021,
  • [4] Model-based Clustering of Short Text Streams
    Yin, Jianhua
    Chao, Daren
    Liu, Zhongkun
    Zhang, Wei
    Yu, Xiaohui
    Wang, Jianyong
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 2634 - 2642
  • [5] An Online Dirichlet Model based on Sentence Embedding and DBSCAN for Noisy Short Text Stream Clustering
    Si, XianLiang
    Li, Peipei
    Hu, Xuegang
    Zhang, Yuhong
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [6] Context-Enhanced Directed Model Checking
    Wehrle, Martin
    Kupferschmid, Sebastian
    MODEL CHECKING SOFTWARE, 2010, 6349 : 88 - 105
  • [7] An Online Semantic-Enhanced Graphical Model for Evolving Short Text Stream Clustering
    Kumar, Jay
    Din, Salah Ud
    Yang, Qinli
    Kumar, Rajesh
    Shao, Junming
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13809 - 13820
  • [8] An Adaptive Dirichlet Multinomial Mixture Model for Short Text Streaming Clustering
    Duan, Ruting
    Li, Chunping
    2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, : 49 - 55
  • [9] Online Clustering of Massive Text Data Streams
    Maha Ben-Fares
    Parisa Rastin
    Nistor Grozavu
    Pierre Holat
    SN Computer Science, 6 (5)
  • [10] Collaborative User Clustering for Short Text Streams
    Liang, Shangsong
    Ren, Zhaochun
    Yilmaz, Emine
    Kanoulas, Evangelos
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3504 - 3510