Hashtag-Based Tweet Expansion for Improved Topic Modeling

Cited by: 2
Authors
Kumar, Durgesh [1]
Singh, Loitongbam Gyanendro [1]
Singh, Sanasam Ranbir [1]
Affiliations
[1] IIT Guwahati, Dept Comp Sci & Engn, Gauhati 781039, Assam, India
Keywords
Feature extraction; Internet; Encyclopedias; Semantics; Online services; Task analysis; Social networking (online); BERT; Bi-directional Long Short Term Memory (BiLSTM); Co-occurrence Frequency Expansion (CoFE); Graph Convolution Network (GCN); hashtags; Latent Dirichlet Allocation (LDA); topic modeling; tweet expansion; Twitter; social media
DOI
10.1109/TCSS.2022.3171206
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Topic modeling on tweets is known to suffer from under-specificity and data sparsity because of the character limit on tweets. Earlier studies attempted to address this problem through either 1) tweet aggregation, where related tweets are combined into a single document, or 2) tweet expansion with related text from external sources. The first approach loses the topic distribution of individual tweets. The second is challenging because finding relevant external text for an arbitrary tweet is hindered by differences in writing style, multilingual content, and informal text. In contrast to adding context from external resources or combining related tweets into a pool, this study uses the internal vocabulary (hashtags) to counter under-specificity and sparsity in tweets. Earlier studies have indicated that hashtags are an important feature for representing the underlying context of a tweet. Sequential models such as Bi-directional Long Short Term Memory (BiLSTM) and Convolutional Neural Network (CNN), applied over distributed word representations, have previously shown promising results in capturing semantic relationships between the words of a tweet. Motivated by the above, this article proposes a unified framework of hashtag-based tweet expansion that exploits text-based and network-based representation learning methods such as BiLSTM, BERT, and Graph Convolution Network (GCN). Tweets expanded with hashtags under the proposed framework significantly improve topic modeling performance over un-expanded (raw) tweets and hashtag-pooling-based approaches on two real-world tweet datasets that differ in nature. Furthermore, this article studies the significance of hashtags for topic modeling performance by experimenting with different combinations of word types such as hashtags, keywords, and user mentions.
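
As a rough illustration of what hashtag-based tweet expansion can look like, the sketch below appends to each tweet the hashtags that most often co-occur with its words elsewhere in the corpus. This is only one plausible reading of the "Co-occurrence Frequency Expansion (CoFE)" keyword and is not taken from the article; it does not reproduce the BiLSTM, BERT, or GCN models of the proposed framework. The function names `tokenize`, `build_cooccurrence`, and `expand`, and the parameter `k`, are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches plain words and hashtags; a leading '#' is kept so hashtags stay distinguishable.
TOKEN = re.compile(r"#?\w+")


def tokenize(tweet):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return TOKEN.findall(tweet.lower())


def build_cooccurrence(tweets):
    """For every plain word, count how often each hashtag appears in the same tweet."""
    cooc = defaultdict(Counter)
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        tags = {t for t in tokens if t.startswith("#")}
        for word in tokens - tags:
            for tag in tags:
                cooc[word][tag] += 1
    return cooc


def expand(tweet, cooc, k=2):
    """Append the k hashtags that co-occur most often with the tweet's words.

    Assumption: scores are plain summed co-occurrence counts; ties are broken
    alphabetically so the result is deterministic.
    """
    tokens = tokenize(tweet)
    present = set(tokens)
    scores = Counter()
    for word in present:
        if not word.startswith("#"):
            scores.update(cooc.get(word, {}))
    # Do not re-add hashtags the tweet already carries.
    for tag in present:
        scores.pop(tag, None)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return tokens + [tag for tag, _ in ranked[:k]]
```

The expanded token lists could then be passed to a topic model such as LDA in place of the raw tweets. Unlike hashtag pooling, each tweet remains its own document, so a per-tweet topic distribution can still be estimated.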
Pages: 1211-1229
Page count: 19