Topic representation model based on microblogging behavior analysis

Cited by: 10
Authors
Han, Weihong [1]
Tian, Zhihong [1]
Huang, Zizhong [2]
Li, Shudong [1]
Jia, Yan [3]
Affiliations
[1] Guangzhou Univ, Cyberspace Inst Adv Technol, Guangzhou 510006, Peoples R China
[2] Natl Univ Def Technol, Comp Sch, Changsha 410073, Peoples R China
[3] Cyberspace Secur Res Ctr, Peng Cheng Lab, Shenzhen 518000, Peoples R China
Keywords
Topic representation model; Behavior analysis; Word distribution; LDA model; Topic detection; INTERNET;
DOI
10.1007/s11280-020-00822-x
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Microblogging has become an important way for people to obtain information, express opinions, and make suggestions. Identifying new topics quickly and accurately in massive microblogging data is crucial for recommending information and controlling public opinion, and the topic representation model provides the basis for topic detection. In this paper, we propose a topic representation model for microblogging datasets based on user behavior analysis: the microblogging behavior analysis-latent Dirichlet allocation (MBA-LDA) model. The topic-word distribution is obtained from an LDA model that takes into account user behaviors (such as posting, forwarding, and commenting) and the distribution of words among the documents within one topic and among different topics. The model also re-assesses the importance of words in topic representation. The basic idea is that how a word is distributed within a topic and among topics strongly influences whether it should be selected to express a topic. If a word is evenly distributed among all documents of a topic, it is common to all documents in that topic and is therefore more suitable to represent it. If a word is evenly distributed among many topics, it is common to all of them, cannot distinguish one topic from another, and is therefore less suitable to represent any topic. Experiments on a real Sina Microblogging data set show that the topic model based on the MBA-LDA algorithm gives representative words greater importance and increases the differentiation of topic words, which effectively improves the accuracy of subsequent topic detection and evolutionary analysis.
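The word-distribution idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual weighting formula, so everything here is an assumption made for illustration: normalized entropy is used as the measure of how evenly a word is distributed, the function names `normalized_entropy` and `representativeness` are hypothetical, and the user-behavior signals (posting, forwarding, commenting) and the LDA inference itself are left out.

```python
import math


def normalized_entropy(counts):
    """Evenness of a count vector in [0, 1].

    1 means the counts are spread perfectly evenly, 0 means they are
    concentrated in a single position (or the vector is empty).
    Assumption: entropy stands in for the paper's evenness measure.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p)
    return entropy / math.log(len(counts))


def representativeness(word, topic, topics):
    """Hypothetical score for how well `word` represents `topic`.

    High when the word is evenly spread over the documents of `topic`
    (within-topic evenness) and unevenly spread across topics
    (low across-topic evenness).
    """
    within = normalized_entropy([doc.count(word) for doc in topics[topic]])
    across = normalized_entropy(
        [sum(doc.count(word) for doc in docs) for docs in topics.values()]
    )
    return within * (1.0 - across)


# Toy data: each topic maps to a list of tokenized documents.
topics = {
    "sports": [
        ["match", "goal", "today"],
        ["goal", "team", "today"],
        ["goal", "coach", "today"],
    ],
    "finance": [
        ["stock", "market", "today"],
        ["market", "bank", "today"],
        ["stock", "rate", "today"],
    ],
}

for w in ("goal", "today", "coach"):
    print(w, round(representativeness(w, "sports", topics), 3))
```

In this toy data, "goal" appears in every sports document and in no finance document, so it scores highest for the sports topic. "today" appears evenly in both topics and "coach" appears in only one sports document, so both score near zero, which matches the two cases the abstract describes.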
Pages: 3083 - 3097
Number of pages: 15
Related papers
50 in total
  • [31] Topic-Grained Text Representation-Based Model for Document Retrieval
    Du, Mengxue
    Li, Shasha
    Jie, Yu
    Ma, Jun
    Bin, Ji
    Liu, Huijun
    Lin, Wuhang
    Yi, Zibo
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 776 - 788
  • [32] Study of Cross-Media Topic Analysis Based on Visual Topic Model
    Zhou, Yipeng
    Liang, Meiyu
    Du, Junping
    PROCEEDINGS OF THE 2012 24TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2012, : 3467 - 3470
  • [33] Double LDA: A Sentiment Analysis Model Based On Topic Model
    Chen, Xue
    Tang, Wenqing
    Xu, Hao
    Hu, Xiaofeng
    2014 10TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2014, : 49 - 56
  • [34] An alternative topic model based on Common Interest Authors for topic evolution analysis
    Jung, Sukhwan
    Yoon, Wan Chul
    JOURNAL OF INFORMETRICS, 2020, 14 (03)
  • [35] Topic Detection from Microblog Based on Text Clustering and Topic Model Analysis
    Huang, Siqi
    Yang, Yitao
    Li, Huakang
    Sun, Guozi
    2014 ASIA-PACIFIC SERVICES COMPUTING CONFERENCE (APSCC), 2014, : 88 - 92
  • [36] Sustainable Career Development of Chinese Generation Z (Post-00s) Attending and Graduating from University: Dynamic Topic Model Analysis Based on Microblogging
    Wang, Peng
    Zhang, Mengnan
    Wang, Yike
    Yuan, Xiqing
    SUSTAINABILITY, 2023, 15 (03)
  • [37] Topic Classification Based on Distributed Document Representation and Latent Topic Information
    Chen, Peixin
    Guo, Wu
    Wang, Qingnan
    Song, Yan
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 614 - 617
  • [38] The Hidden Markov Topic Model: A Probabilistic Model of Semantic Representation
    Andrews, Mark
    Vigliocco, Gabriella
    TOPICS IN COGNITIVE SCIENCE, 2010, 2 (01) : 101 - 113
  • [39] Determining Emotional Profile Based on Microblogging Analysis
    Martins, Ricardo
    Henriques, Pedro
    Novais, Paulo
    PROGRESS IN ARTIFICIAL INTELLIGENCE, PT II, 2019, 11805 : 159 - 171
  • [40] Topic Model Based Opinion Mining and Sentiment Analysis
    Krishna, Vamshi B.
    Pandey, Ajeet Kumar
    Kumar, Siva A. P.
    2018 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2018,