Topic representation model based on microblogging behavior analysis

Cited by: 10
Authors
Han, Weihong [1]
Tian, Zhihong [1]
Huang, Zizhong [2]
Li, Shudong [1]
Jia, Yan [3]
Affiliations
[1] Guangzhou Univ, Cyberspace Inst Adv Technol, Guangzhou 510006, Peoples R China
[2] Natl Univ Def Technol, Comp Sch, Changsha 410073, Peoples R China
[3] Cyberspace Secur Res Ctr, Peng Cheng Lab, Shenzhen 518000, Peoples R China
Keywords
Topic representation model; Behavior analysis; Word distribution; LDA model; Topic detection; INTERNET;
DOI
10.1007/s11280-020-00822-x
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Microblogging has become an important way for people to obtain information, express opinions, and make suggestions. Identifying new topics quickly and accurately from massive microblogging data plays a crucial role in recommending information and controlling public opinion. A topic representation model provides the basis for topic detection. In this paper, we propose a topic representation model for microblogging datasets based on user behavior analysis: the microblogging behavior analysis-latent Dirichlet allocation (MBA-LDA) model. The topic-word distribution is obtained from an LDA model that takes into account user behaviors (such as posting, forwarding, and commenting) and the distribution of words among documents within one topic and among different topics. The model also reassesses the importance of words in topic representation. The basic idea is that how a word is distributed within a topic and among different topics strongly influences whether it should be selected to represent a topic. If a word is evenly distributed among all documents of a topic, it is common to all documents in that topic and is therefore more suitable to represent it. If a word is evenly distributed among the various topics, it is common to all topics and cannot distinguish between them, so it is less suitable to represent any topic. Experiments on a real Sina Microblogging dataset show that the topic model based on the MBA-LDA algorithm gives representative words greater importance and increases the differentiation of topic words, which effectively improves the accuracy of subsequent topic detection and evolutionary analysis.
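The word re-weighting idea in the abstract can be illustrated with a small sketch. This is not the MBA-LDA formula, which the abstract does not give; it is a minimal illustration that assumes normalized entropy as the measure of evenness, uses raw token counts, and ignores user behaviors entirely. The names `normalized_entropy` and `representativeness` and the toy corpus are hypothetical.

```python
import math


def normalized_entropy(counts):
    """Evenness of a count vector in [0, 1]; 1.0 means perfectly even.

    Assumption: entropy divided by its maximum, log(len(counts)).
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


def representativeness(word, topic_docs, topic_id):
    """Hypothetical score: high when `word` is spread evenly over the
    documents of `topic_id` but concentrated in few topics overall.

    topic_docs maps a topic id to a list of tokenized documents.
    """
    # Evenness of the word among documents within the topic.
    within = normalized_entropy(
        [doc.count(word) for doc in topic_docs[topic_id]]
    )
    # Evenness of the word among all topics.
    across = normalized_entropy(
        [sum(doc.count(word) for doc in docs) for docs in topic_docs.values()]
    )
    return within * (1.0 - across)


# Toy corpus (invented for illustration only).
topic_docs = {
    "sports": [["match", "goal", "today"], ["match", "team", "today"]],
    "finance": [["stock", "market", "today"], ["stock", "bank", "today"]],
}

for w in ("match", "today", "goal"):
    print(w, round(representativeness(w, topic_docs, "sports"), 3))
```

In this toy corpus, "match" appears in every sports document and in no finance document, so it scores highest for the sports topic. "today" is spread evenly over both topics and "goal" appears in only one sports document, so both score zero.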
Pages: 3083 - 3097
Number of pages: 15
Related papers
50 in total
  • [1] Topic representation model based on microblogging behavior analysis
    Han, Weihong
    Tian, Zhihong
    Huang, Zizhong
    Li, Shudong
    Jia, Yan
    World Wide Web, 2020, 23 : 3083 - 3097
  • [2] A Prerecognition Model for Hot Topic Discovery Based on Microblogging Data
    Zhu, Tongyu
    Yu, Jianjun
    SCIENTIFIC WORLD JOURNAL, 2014,
  • [3] Influential user weighted sentiment analysis on topic based microblogging community
    Eliacik, Alpaslan Burak
    Erdogan, Nadia
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 92 : 403 - 418
  • [4] Natural disaster topic extraction in Sina microblogging based on graph analysis
    Ma, Tinghuai
    Zhao, YuWei
    Zhou, Honghao
    Tian, Yuan
    Al-Dhelaan, Abdullah
    Al-Rodhaan, Mznah
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 115 : 346 - 355
  • [5] Analysis of the propagation characteristics of the hot topic of microblogging
    Zhao, Longwen
    Yao, Haibo
    Zhou, Tingting
    Advances in Information Sciences and Service Sciences, 2012, 4 (19) : 256 - 263
  • [6] Sentiment polarity Analysis on Microblogging Hot Topic
    Xu Yabin
    Zhang Guanglei
    INTERNATIONAL JOURNAL OF SECURITY AND ITS APPLICATIONS, 2016, 10 (07) : 319 - 331
  • [7] Multimodal learning for topic sentiment analysis in microblogging
    Huang, Faliang
    Zhang, Shichao
    Zhang, Jilian
    Yu, Ge
    NEUROCOMPUTING, 2017, 253 : 144 - 153
  • [8] Social influence analysis in microblogging platforms - A topic-sensitive based approach
    Cano, Amparo E.
    Mazumdar, Suvodeep
    Ciravegna, Fabio
    SEMANTIC WEB, 2014, 5 (05) : 357 - 372
  • [9] A Phrase Topic Model Based on Distributed Representation
    Ma, Jialin
    Cheng, Jieyi
    Zhang, Lin
    Zhou, Lei
    Chen, Bolun
    CMC-COMPUTERS MATERIALS & CONTINUA, 2020, 64 (01) : 455 - 469
  • [10] A phrase topic model based on distributed representation
    Ma J.
    Cheng J.
    Zhang L.
    Zhou L.
    Chen B.
    Computers, Materials and Continua, 2020, 64 (01) : 455 - 469