Tracking the Evolution of Clusters in Social Media Streams

Cited by: 4
Authors
Anwar, Tarique [1]
Nepal, Surya [2]
Paris, Cecile [2]
Yang, Jian [3]
Wu, Jia [3]
Sheng, Quan Z. [3]
Affiliations
[1] Univ York, York YO10 5DD, England
[2] CSIRO's Data61, Marsfield, NSW 2122, Australia
[3] Macquarie Univ, Sydney, NSW 2109, Australia
Keywords
COVID-19; Social networking (online); Clustering algorithms; Indexes; Big Data; Heuristic algorithms; Australia; Social media; stream clustering; density peak clustering; shared dependency tree; short text streams
DOI
10.1109/TBDATA.2022.3204207
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Tracking the evolution of clusters in social media streams is becoming increasingly important for many applications, such as the early detection and monitoring of natural disasters or pandemics. In contrast to clustering on a static set of data, streaming data clustering does not have a global view of the complete data. The local (or partial) view in a high-speed stream makes clustering a challenging task. In this paper, we propose a novel density-peak-based algorithm, TStream, for tracking the evolution of clusters and outliers in social media streams via the evolutionary actions of cluster adjustment, emergence, disappearance, split, and merge. TStream is based on a temporal decay model and text stream summarisation. The decay model captures the decreasing importance of textual documents over time. The stream summarisation represents them compactly in memory with the help of cells (aka micro-clusters). We also propose a novel, efficient index called the shared dependency tree (aka SD-Tree), based on the ideas of density peak and shared dependency. It maintains the dynamic dependency relationships in TStream and thereby improves the overall efficiency. We conduct extensive experiments on five real datasets. TStream outperforms the existing state-of-the-art solutions based on MStream, MStreamF, EDMStream, OSGM, and EStream in terms of the cluster mapping measure (CMM), by up to 17.8%, 18.6%, 6.9%, 16.4%, and 20.1%, respectively. It is also significantly more efficient than MStream, MStreamF, OSGM, and EStream in terms of response time and throughput.
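The abstract names two ingredients: a temporal decay model that fades old documents, and density-peak dependencies between cells. The sketch below illustrates a generic version of both and is not TStream's actual formulation. It assumes an exponential fading function 2^(-λΔt) (a common choice in density-based stream clustering), cells that store decayed term frequencies compared by cosine similarity, and a brute-force nearest-denser-cell search standing in for the SD-Tree index. The names `Cell`, `absorb`, `decay_to`, and `dependencies` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical micro-cluster summarising short texts (illustrative only)."""
    terms: dict = field(default_factory=dict)  # decayed term frequencies
    density: float = 0.0                        # decayed document count
    last_update: float = 0.0                    # time of the last decay/update

    def decay_to(self, now, lam):
        # Assumed fading function: scale all statistics by 2^(-lam * elapsed).
        factor = 2.0 ** (-lam * (now - self.last_update))
        self.density *= factor
        for t in self.terms:
            self.terms[t] *= factor
        self.last_update = now

    def absorb(self, doc_terms, now, lam):
        # Fade the cell to the current time, then add one document of weight 1.
        self.decay_to(now, lam)
        self.density += 1.0
        for t, c in doc_terms.items():
            self.terms[t] = self.terms.get(t, 0.0) + c


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dependencies(cells):
    """Density-peak style: link each cell to its nearest denser cell.

    Returns (parent_index or None, delta) per cell, where delta = 1 - cosine.
    Equal densities are broken by index so exactly one cell is the root.
    All cells should be decayed to the same time before calling this.
    """
    result = []
    for i, c in enumerate(cells):
        best, best_dist = None, math.inf
        for j, d in enumerate(cells):
            denser = d.density > c.density or (d.density == c.density and j < i)
            if j != i and denser:
                dist = 1.0 - cosine(c.terms, d.terms)
                if dist < best_dist:
                    best, best_dist = j, dist
        result.append((best, best_dist))
    return result


# Usage: three cells, all brought up to time 1.0 before comparing densities.
lam = 0.1
a, b, c = Cell(), Cell(), Cell()
a.absorb({"flood": 2, "rain": 1}, now=0.0, lam=lam)
a.absorb({"flood": 1}, now=1.0, lam=lam)
b.absorb({"flood": 1, "warning": 1}, now=1.0, lam=lam)
c.absorb({"covid": 1, "vaccine": 1}, now=1.0, lam=lam)
deps = dependencies([a, b, c])
# a is the root; c shares no terms with denser cells, so its delta is 1.0.
```

A large delta relative to its neighbours marks a cell as a candidate peak (a new or separate cluster), while a small delta attaches it to an existing one. A real system would replace the quadratic search with an index that updates these links incrementally, which is the role the abstract assigns to the SD-Tree.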
Pages: 701-715
Number of pages: 15