A context-enhanced Dirichlet model for online clustering in short text streams

Cited by: 6
Authors
Kumar, Jay [1 ,3 ]
Shao, Junming [1 ,2 ]
Kumar, Rajesh [1 ]
Din, Salah Ud [1 ]
Mawuli, Cobbinah B. [2 ]
Yang, Qinli [2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Huzhou, Huzhou 313001, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] Dalhousie Univ, Inst Big Data Analyt, 6299 South St, Halifax, NS B3H 4R2, Canada
Funding
National Natural Science Foundation of China;
Keywords
Text stream; Probabilistic model; Topic evolution; Micro-clusters;
DOI
10.1016/j.eswa.2023.120262
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online clustering of short text streams has become significant due to the popularity of news and social media platforms. The objective of online clustering is to maintain active topics (clusters) by automatically detecting new topics and forgetting outdated ones. Most existing approaches exploit a static, high-dimensional semantic term representation of the text to enhance clustering quality. However, these approaches use inference procedures that depend on a fixed batch size to reduce the number of clusters related to a given topic and bring it closer to the actual number of topics. This paper proposes a non-parametric Dirichlet model with episodic inference (EINDM) to cluster the evolving short text stream by introducing a window-based, low-dimensional semantic term representation that captures the contextual relationships between words. In addition, an episodic inference procedure is introduced to reduce cluster sparsity in the model. Furthermore, a novel "word specificity" measure, based on neighborhood terms, is proposed to capture the evolving contexts of individual terms. Extensive empirical evaluation demonstrates that EINDM yields the best performance, in terms of NMI, homogeneity, and cluster purity, compared to recent state-of-the-art clustering models.
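As a rough illustration of what a window-based term context could look like, the sketch below counts, for each term, the terms that occur within a fixed number of positions of it. This is only a minimal sketch under stated assumptions, not the EINDM representation from the paper: the function name `window_cooccurrence`, the `window` parameter, and the example sentence are all hypothetical.

```python
from collections import Counter, defaultdict


def window_cooccurrence(tokens, window=2):
    """Count, for each term, the terms seen within `window` positions of it.

    Hypothetical helper for illustration only; it is not the paper's method.
    """
    neighbors = defaultdict(Counter)
    for i, term in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the term's own position
                neighbors[term][tokens[j]] += 1
    return neighbors


# Hypothetical short text, tokenized by whitespace.
doc = "new phone launch boosts phone sales".split()
ctx = window_cooccurrence(doc, window=1)
```

With `window=1`, the neighborhood of "phone" in this example combines the contexts of both of its occurrences, which hints at how a term's context can accumulate as more documents arrive in a stream.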
Pages: 11
Related papers
50 in total
  • [31] Short text optimized topic model for service clustering
    Lu J.-W.
    Zheng J.-H.
    Li D.-N.
    Xu J.
    Xiao G.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (12) : 2416 - 2425+2444
  • [32] Online short text clustering using infinite extensions of discrete mixture models
    Hannachi, Samar
    Najar, Fatma
    Ennajari, Hafsa
    Bouguila, Nizar
    COMPUTATIONAL INTELLIGENCE, 2023, 39 (05) : 759 - 782
  • [33] Short Text Online Clustering Based on Incremental Robust Nonnegative Matrix Factorization
    He C.-B.
    Tang Y.
    Zhang Q.
    Liu S.-Y.
    Liu H.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2019, 47 (05) : 1086 - 1093
  • [34] Railway Fault Text Clustering Method Using an Improved Dirichlet Multinomial Mixture Model
    Yang, Ni
    Zhang, Youpeng
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [35] A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus
    Luo, J.
    Yu, D.
    Dai, Z.
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2020, 15 (02)
  • [36] Short Text Topic Model with Word Embeddings and Context Information
    Zhang, Xianchao
    Feng, Ran
    Liang, Wenxin
    RECENT ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY 2018, 2019, 769 : 55 - 64
  • [37] A Novel Short Text Clustering Model Based on Grey System Theory
    Hüseyin Fidan
    Mehmet Erkan Yuksel
    Arabian Journal for Science and Engineering, 2020, 45 : 2865 - 2882
  • [38] A Novel Short Text Clustering Model Based on Grey System Theory
    Fidan, Huseyin
    Yuksel, Mehmet Erkan
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2020, 45 (04) : 2865 - 2882
  • [39] Stochastic variational inference for clustering short text data with finite mixtures of Dirichlet-Multinomial distributions
    Massimo Bilancia
    Andrea Nigri
    Samuele Magro
    Statistical Papers, 2025, 66 (4)
  • [40] Short-text Sentiment Enhanced Achievement Prediction Method for Online Learners
    Ye J.-M.
    Luo D.-X.
    Chen S.
    Zidonghua Xuebao/Acta Automatica Sinica, 2020, 46 (09) : 1927 - 1940