Hashtag-Based Tweet Expansion for Improved Topic Modeling

Times cited: 2
Authors
Kumar, Durgesh [1]
Singh, Loitongbam Gyanendro [1]
Singh, Sanasam Ranbir [1]
Affiliation
[1] IIT Guwahati, Department of Computer Science and Engineering, Gauhati 781039, Assam, India
Keywords
Feature extraction; Internet; Encyclopedias; Semantics; Online services; Task analysis; Social networking (online); BERT; Bi-directional Long Short Term Memory (BiLSTM); Co-occurrence Frequency Expansion (CoFE); Graph Convolution Network (GCN); hashtags; Latent Dirichlet Allocation (LDA); topic modeling; tweet expansion; twitter; SOCIAL MEDIA; TWITTER;
DOI
10.1109/TCSS.2022.3171206
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Topic modeling on tweets is known to suffer from under-specificity and data sparsity because of the character limit on tweets. Earlier studies addressed this problem either by 1) tweet aggregation, where related tweets are combined into a single document, or 2) tweet expansion with related text from external sources. The first approach loses the topic distribution of individual tweets. In the second approach, finding relevant external text for an arbitrary tweet is challenging for several reasons, such as differences in writing style, multilingual content, and informal text. Instead of adding context from external resources or combining related tweets into a pool, this study uses the internal vocabulary of tweets (hashtags) to counter under-specificity and sparsity. Earlier studies have shown hashtags to be an important feature for representing the underlying context of a tweet. Sequential models such as Bi-directional Long Short Term Memory (BiLSTM) and Convolutional Neural Network (CNN) over distributed word representations have previously shown promising results in capturing semantic relationships between the words of a tweet. Motivated by these findings, this article proposes a unified framework for hashtag-based tweet expansion that exploits text-based and network-based representation learning methods such as BiLSTM, BERT, and Graph Convolution Network (GCN). Tweets expanded with hashtags using the proposed framework significantly improve topic modeling performance compared with un-expanded (raw) tweets and hashtag-pooling-based approaches on two real-world tweet datasets of different nature. Furthermore, this article studies the significance of hashtags for topic modeling performance by experimenting with different combinations of word types, such as hashtags, keywords, and user mentions.
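The abstract contrasts two ways of densifying tweets before topic modeling: pooling tweets that share a hashtag into one document, and expanding each tweet with additional hashtags while keeping it as its own document. The following is a minimal Python sketch of both ideas under simplifying assumptions. The expansion step is a plain co-occurrence-frequency heuristic, loosely in the spirit of the Co-occurrence Frequency Expansion (CoFE) keyword; it is not the article's BiLSTM, BERT, or GCN framework, and the article's exact CoFE definition is not given here. All function names (`tokenize`, `hashtag_pool`, `build_cooccurrence`, `expand_tweet`), the whitespace tokenizer, the toy tweets, and the top-k cutoff `k` are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(tweet):
    # Hypothetical minimal tokenizer: lowercase + whitespace split.
    # A real pipeline would also strip URLs, punctuation, and stopwords.
    return tweet.lower().split()


def is_hashtag(token):
    return token.startswith("#")


def hashtag_pool(tweets):
    """Hashtag pooling: tweets sharing a hashtag are merged into one
    pseudo-document per hashtag; tweets without hashtags stay as they are."""
    pools = defaultdict(list)
    singles = []
    for tweet in tweets:
        tokens = tokenize(tweet)
        tags = sorted({t for t in tokens if is_hashtag(t)})
        if not tags:
            singles.append(tokens)
        for tag in tags:
            pools[tag].extend(tokens)
    return list(pools.values()) + singles


def build_cooccurrence(tweets):
    """Count how often each non-hashtag word appears in the same tweet
    as each hashtag (each word-hashtag pair counted once per tweet)."""
    cooc = defaultdict(Counter)
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        tags = {t for t in tokens if is_hashtag(t)}
        for word in tokens - tags:
            for tag in tags:
                cooc[word][tag] += 1
    return cooc


def expand_tweet(tweet, cooc, k=2):
    """Append up to k hashtags that co-occur most often with the tweet's
    words; the tweet remains its own document for the topic model."""
    tokens = tokenize(tweet)
    present = {t for t in tokens if is_hashtag(t)}
    scores = Counter()
    for word in tokens:
        if not is_hashtag(word):
            scores.update(cooc.get(word, {}))
    # Rank by score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    new_tags = [tag for tag, _ in ranked if tag not in present]
    return tokens + new_tags[:k]


if __name__ == "__main__":
    tweets = [
        "vaccine rollout starts today #covid19",
        "new vaccine trial results #covid19 #health",
        "match tonight was amazing #football",
        "vaccine appointment booked",
    ]
    cooc = build_cooccurrence(tweets)
    print(expand_tweet("vaccine appointment booked", cooc))
    # ['vaccine', 'appointment', 'booked', '#covid19', '#health']
    print(len(hashtag_pool(tweets)))  # 4 pseudo-documents
```

Pooling merges the two "#covid19" tweets into one larger document, so their individual topic distributions can no longer be recovered, while expansion attaches "#covid19" and "#health" to the hashtag-less tweet and still leaves it as a separate document. This is the trade-off the abstract describes between aggregation and expansion.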
Pages: 1211-1229
Number of pages: 19