MR-LDA: An Efficient Topic Model for Classification of Short Text in Big Social Data

Cited by: 7
Authors
Pang, Xiongwen [1]
Wan, Benshuai [2]
Li, Huifang [1]
Lin, Weiwei [3]
Affiliations
[1] South China Normal Univ, Sch Comp, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Nanhai Rural Commercial Bank Co Ltd, Dept Informat Technol, Guangzhou, Guangdong, Peoples R China
[3] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Big Data; Latent Dirichlet Allocation; Micro-Blog; Social Network; Topic Mining;
DOI
10.4018/IJGHPC.2016100106
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Latent Dirichlet Allocation (LDA) is an efficient method for text mining, but applying it directly to Chinese micro-blog texts works poorly because micro-blogs are more social, brief, and closely related to each other. Based on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into account to help topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to address the short-text problem. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into account. As a result, MR-LDA is better suited to modeling Chinese micro-blog data. Gibbs sampling is used for model inference. Experimental results on real datasets show that the MR-LDA model offers an effective solution for text mining on Chinese micro-blogs.
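The two ingredients named in the abstract, aggregating short posts into longer documents and inferring topics by Gibbs sampling, can be illustrated with a minimal sketch. This is not the authors' MR-LDA: it runs collapsed Gibbs sampling for plain LDA and does not model relations between documents. The grouping key (here a user name), the function names `aggregate_posts` and `lda_gibbs`, the hyperparameter values, and the toy data are all assumptions made for illustration; the abstract does not state how posts are aggregated.

```python
import random
from collections import defaultdict


def aggregate_posts(posts):
    """Merge short posts that share a group key into one pseudo-document.

    `posts` is a list of (group_key, tokens) pairs. The choice of key
    (author, thread, ...) is an assumption, not taken from the paper.
    """
    merged = defaultdict(list)
    for key, tokens in posts:
        merged[key].extend(tokens)
    return dict(merged)


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (not MR-LDA)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab


# Toy data (hypothetical): four short posts from two users.
posts = [
    ("user_a", ["football", "match"]),
    ("user_a", ["goal", "match"]),
    ("user_b", ["stock", "market"]),
    ("user_b", ["market", "bank"]),
]
docs = aggregate_posts(posts)
doc_topic, topic_word, vocab = lda_gibbs(list(docs.values()), num_topics=2, iters=50)
```

Aggregation gives each pseudo-document more word co-occurrences than a single post would, which is the short-text issue the abstract refers to. A faithful MR-LDA sampler would additionally need the relation structure between documents, which the abstract does not specify.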
Pages: 100-113
Page count: 14