Hashtag-Based Tweet Expansion for Improved Topic Modeling

Cited by: 2
Authors
Kumar, Durgesh [1]
Singh, Loitongbam Gyanendro [1]
Singh, Sanasam Ranbir [1]
Affiliation
[1] IIT Guwahati, Dept Comp Sci & Engn, Gauhati 781039, Assam, India
Keywords
Feature extraction; Internet; Encyclopedias; Semantics; Online services; Task analysis; Social networking (online); BERT; Bi-directional Long Short Term Memory (BiLSTM); Co-occurrence Frequency Expansion (CoFE); Graph Convolution Network (GCN); hashtags; Latent Dirichlet Allocation (LDA); topic modeling; tweet expansion; Twitter; social media
DOI
10.1109/TCSS.2022.3171206
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Topic modeling on tweets is known to suffer from under-specificity and data sparsity because of the character limit on tweets. Earlier studies attempted to address this problem by either 1) tweet aggregation, where related tweets are combined into a single document, or 2) tweet expansion with related text from external sources. The first approach loses the topic distribution of individual tweets. In the second approach, finding relevant text in an external source for an arbitrary tweet is challenging for reasons such as differences in writing style, multilingual content, and informal text. In contrast to adding context from external resources or combining related tweets into a pool, this study uses the internal vocabulary (hashtags) to counter under-specificity and sparsity in tweets. Earlier studies have indicated that hashtags are an important feature for representing the underlying context of a tweet. Sequential models such as Bi-directional Long Short Term Memory (BiLSTM) and Convolutional Neural Network (CNN) over distributed representations of words have previously shown promising results in capturing semantic relationships between the words of a tweet. Motivated by the above, this article proposes a unified framework for hashtag-based tweet expansion that exploits text-based and network-based representation learning methods such as BiLSTM, BERT, and Graph Convolution Network (GCN). Tweets expanded with hashtags using the proposed framework significantly improve topic modeling performance compared with unexpanded (raw) tweets and hashtag-pooling-based approaches on two real-world tweet datasets of different natures. Furthermore, this article studies the significance of hashtags for topic modeling performance by experimenting with different combinations of word types such as hashtags, keywords, and user mentions.
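To make the idea of expanding a tweet with hashtags concrete, the following is a minimal sketch of a simple co-occurrence-based expansion. It is not the framework proposed in the article, which relies on learned representations (BiLSTM, BERT, GCN); it only illustrates the general notion of appending hashtags drawn from the corpus's own vocabulary to a short tweet before topic modeling. The function names `build_cooccurrence` and `expand_tweet`, the whitespace tokenizer, and the `top_k` parameter are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def tokenize(tweet):
    # Assumption: lowercase whitespace tokenization is enough for this sketch.
    return tweet.lower().split()


def build_cooccurrence(tweets):
    """Count how often each non-hashtag token shares a tweet with each hashtag."""
    cooc = defaultdict(Counter)
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        hashtags = {t for t in tokens if t.startswith("#")}
        words = tokens - hashtags
        for word in words:
            for tag in hashtags:
                cooc[word][tag] += 1
    return cooc


def expand_tweet(tweet, cooc, top_k=2):
    """Append up to top_k hashtags that co-occur most with the tweet's words."""
    tokens = tokenize(tweet)
    present = set(tokens)
    scores = Counter()
    for word in tokens:
        if not word.startswith("#"):
            # Sum co-occurrence counts over all words of the tweet.
            scores.update(cooc.get(word, {}))
    # Skip hashtags already in the tweet; break ties alphabetically.
    candidates = [(tag, n) for tag, n in scores.items() if tag not in present]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return tokens + [tag for tag, _ in candidates[:top_k]]


if __name__ == "__main__":
    corpus = [
        "flood in assam #assamfloods",
        "assam flood relief #assamfloods #relief",
        "match tonight #cricket",
    ]
    cooc = build_cooccurrence(corpus)
    print(expand_tweet("flood relief needed", cooc, top_k=2))
```

The expanded token lists could then be passed to a topic model such as LDA in place of the raw tweets. Unlike hashtag pooling, each tweet remains a separate document.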
Pages: 1211-1229
Page count: 19