Topic Clustering for Social Media Texts with Heterogeneous Graph Neural Networks

Authors
Xiaodong F. [1]
Kangxin H. [1]
Affiliation
[1] School of Public Affairs and Administration, University of Electronic Science and Technology of China, Chengdu
Funding
National Natural Science Foundation of China
Keywords
Graph Neural Networks; Heterogeneous Information Network; Multiple Interactions; Social Media; Topic Clustering
DOI
10.11925/infotech.2096-3467.2022.0038
Abstract
[Objective] This paper develops an effective topic clustering method that addresses the semantic sparsity and multiple interactions characteristic of social media texts. [Methods] We modeled the multiple interaction relationships between social media users and online content as a heterogeneous information network. First, we used a word embedding method to obtain text representations as the initial input features. Then, we propagated and aggregated node representations with a heterogeneous graph neural network. Finally, we trained the model using the text-node representations and performed unsupervised topic clustering. [Results] On an English benchmark dataset, the model achieved normalized mutual information (NMI) scores of 0.8372 for original posts and 0.8689 for comments, higher than those of traditional LDA and of directly clustering word or text embedding vectors obtained with Word2Vec, Doc2Vec, or GloVe. [Limitations] Due to data limitations, we did not examine social relationships among users or online multimedia content. [Conclusions] The proposed model effectively improves topic clustering for social media texts. © 2022, Chinese Academy of Sciences. All rights reserved.
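The pipeline summarized in the abstract (embedding features for text nodes, propagation and aggregation over a heterogeneous network, unsupervised clustering, evaluation by NMI) can be illustrated with a minimal sketch. It is not the authors' model. It assumes a hypothetical toy graph of users, posts, and comments with made-up 2-d "embedding" vectors, replaces the trained heterogeneous GNN with an untrained per-relation mean aggregation, clusters with plain k-means, and computes NMI with arithmetic-mean normalization, a variant the abstract does not specify. All names (`FEATURES`, `EDGES`, `propagate`, `kmeans`, `nmi`) are illustrative only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy heterogeneous graph; nodes are (type, id) tuples.
# The 2-d vectors stand in for word-embedding features of text nodes;
# user nodes start without features.
FEATURES = {
    ("post", "p1"): [1.0, 0.0], ("post", "p2"): [0.9, 0.1],
    ("post", "p3"): [0.0, 1.0], ("post", "p4"): [0.1, 0.9],
    ("comment", "c1"): [0.8, 0.2], ("comment", "c2"): [0.2, 0.8],
}
# Typed edges (relation, node_a, node_b), treated as undirected.
EDGES = [
    ("replies_to", ("comment", "c1"), ("post", "p1")),
    ("replies_to", ("comment", "c2"), ("post", "p3")),
    ("authored", ("user", "u1"), ("post", "p1")),
    ("authored", ("user", "u1"), ("post", "p2")),
    ("authored", ("user", "u2"), ("post", "p3")),
    ("authored", ("user", "u2"), ("post", "p4")),
]


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def propagate(features, edges):
    """One untrained step: average a node's own vector with the mean
    neighbour vector of each relation type (stand-in for a trained HGNN)."""
    nbrs = defaultdict(lambda: defaultdict(list))
    for rel, a, b in edges:
        nbrs[a][rel].append(b)
        nbrs[b][rel].append(a)
    out = {}
    for node in set(features) | set(nbrs):
        parts = [features[node]] if node in features else []
        for rel_nbrs in nbrs[node].values():
            known = [features[n] for n in rel_nbrs if n in features]
            if known:  # skip relations whose neighbours have no vector yet
                parts.append(mean(known))
        if parts:
            out[node] = mean(parts)
    return out


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


def nmi(labels_true, labels_pred):
    """NMI with arithmetic-mean normalisation: MI / ((H(U) + H(V)) / 2)."""
    n = len(labels_true)
    cu, cv = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log((c / n) / ((cu[u] / n) * (cv[v] / n)))
             for (u, v), c in joint.items())

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    denom = (entropy(cu) + entropy(cv)) / 2
    return mi / denom if denom > 0 else 1.0


# Two layers let posts by the same user exchange information via the user node.
h = FEATURES
for _ in range(2):
    h = propagate(h, EDGES)
texts = sorted(n for n in h if n[0] in ("post", "comment"))
pred = kmeans([h[n] for n in texts], k=2)
gold = {"p1": 0, "p2": 0, "c1": 0, "p3": 1, "p4": 1, "c2": 1}  # hypothetical topics
score = nmi([gold[n[1]] for n in texts], pred)
print(f"NMI = {score:.4f}")
```

Two propagation steps are used so that p2, which has no comment, still receives information from p1 through their shared user node; this is one way a heterogeneous network can enrich sparse short texts. A trained model would instead learn relation-specific weights, and NMI is invariant to how the cluster ids are numbered.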
Pages: 9-19
Page count: 10