A context-enhanced Dirichlet model for online clustering in short text streams

Cited by: 6
Authors
Kumar, Jay [1 ,3 ]
Shao, Junming [1 ,2 ]
Kumar, Rajesh [1 ]
Din, Salah Ud [1 ]
Mawuli, Cobbinah B. [2 ]
Yang, Qinli [2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Yangtze Delta Reg Inst Huzhou, Huzhou 313001, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[3] Dalhousie Univ, Inst Big Data Analyt, 6299 South St, Halifax, NS B3H 4R2, Canada
Funding
National Natural Science Foundation of China
Keywords
Text stream; Probabilistic model; Topic evolution; Micro-clusters;
DOI
10.1016/j.eswa.2023.120262
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online clustering of short text streams has become important with the popularity of news and social media platforms. The objective of online clustering is to maintain active topics (clusters) by automatically detecting new topics and forgetting outdated ones. Most existing approaches exploit a static, high-dimensional semantic term representation of the text to enhance clustering quality. These approaches also rely on inference procedures that depend on a fixed batch size to reduce the number of clusters related to a given topic and bring it closer to the actual number of topics. This paper proposes a non-parametric Dirichlet model with episodic inference (EINDM) to cluster evolving short text streams. It introduces a window-based, low-dimensional semantic term representation that captures the contextual relationships between words. In addition, an episodic inference procedure is introduced to reduce cluster sparsity in the model. Furthermore, a novel "word specificity" measure is proposed, based on neighborhood terms, for the evolving contexts of individual terms. Extensive empirical evaluation demonstrates that EINDM yields the best performance, in terms of NMI, homogeneity, and cluster purity, compared to recent state-of-the-art clustering models.
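The three evaluation measures named in the abstract (NMI, homogeneity, and cluster purity) have standard definitions, and a small sketch may help show how they respond to cluster sparsity. The code below is an illustration only, not taken from the paper: the function names are hypothetical, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the record does not say which variant the authors use.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, pred):
    """Mutual information between true topics and predicted clusters."""
    n = len(truth)
    joint = Counter(zip(truth, pred))
    t_counts, p_counts = Counter(truth), Counter(pred)
    return sum(
        (c / n) * log(c * n / (t_counts[t] * p_counts[p]))
        for (t, p), c in joint.items()
    )


def purity(truth, pred):
    """Fraction of documents that belong to their cluster's majority topic."""
    best = Counter()
    for (t, p), c in Counter(zip(truth, pred)).items():
        best[p] = max(best[p], c)
    return sum(best.values()) / len(truth)


def homogeneity(truth, pred):
    """1 - H(topic | cluster) / H(topic), which equals I / H(topic)."""
    h_t = entropy(truth)
    return 1.0 if h_t == 0 else mutual_information(truth, pred) / h_t


def nmi(truth, pred):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    denom = (entropy(truth) + entropy(pred)) / 2
    return 1.0 if denom == 0 else mutual_information(truth, pred) / denom


# Hypothetical toy stream: two true topics, three candidate clusterings.
truth = ["a", "a", "b", "b"]
exact = [0, 0, 1, 1]    # one cluster per topic
sparse = [0, 1, 2, 3]   # every document in its own cluster
merged = [0, 0, 0, 0]   # everything in one cluster

for name, pred in [("exact", exact), ("sparse", sparse), ("merged", merged)]:
    print(name, purity(truth, pred), homogeneity(truth, pred), nmi(truth, pred))
```

On the toy data, the over-split clustering keeps purity and homogeneity at their maximum while NMI drops, which is one reason a reduction in cluster sparsity is usually judged with NMI alongside the other two measures.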
Pages: 11