Hashtag-Based Tweet Expansion for Improved Topic Modeling

Cited by: 2
Authors
Kumar, Durgesh [1]
Singh, Loitongbam Gyanendro [1]
Singh, Sanasam Ranbir [1]
Affiliation
[1] IIT Guwahati, Dept Comp Sci & Engn, Gauhati 781039, Assam, India
Keywords
Feature extraction; Internet; Encyclopedias; Semantics; Online services; Task analysis; Social networking (online); BERT; Bi-directional Long Short Term Memory (BiLSTM); Co-occurrence Frequency Expansion (CoFE); Graph Convolution Network (GCN); hashtags; Latent Dirichlet Allocation (LDA); topic modeling; tweet expansion; Twitter; social media
DOI
10.1109/TCSS.2022.3171206
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Topic modeling on tweets is known to suffer from under-specificity and data sparsity because of the character limit on tweets. Earlier studies attempted to address this problem by either 1) tweet aggregation, where related tweets are combined into a single document, or 2) tweet expansion with related text from external sources. The first approach loses the topic distribution of individual tweets. The second is challenging because finding relevant external text for an arbitrary tweet is hindered by differences in writing style, multilingual content, and informal text. In contrast to adding context from external resources or combining related tweets into a pool, this study uses the internal vocabulary (hashtags) to counter under-specificity and sparsity in tweets. Earlier studies have indicated that hashtags are an important feature for representing the underlying context of a tweet. Sequential models such as Bi-directional Long Short Term Memory (BiLSTM) and Convolutional Neural Network (CNN) over distributed word representations have shown promising results in capturing semantic relationships between the words of a tweet. Motivated by the above, this article proposes a unified framework for hashtag-based tweet expansion that exploits text-based and network-based representation learning methods such as BiLSTM, BERT, and Graph Convolution Network (GCN). Tweets expanded with hashtags using the proposed framework significantly improve topic modeling performance compared with unexpanded (raw) tweets and hashtag-pooling-based approaches on two real-world tweet datasets of different natures. Furthermore, this article studies the significance of hashtags in topic modeling performance by experimenting with different combinations of word types such as hashtags, keywords, and user mentions.
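
To make the idea of hashtag-based tweet expansion concrete, the following is a minimal Python sketch that appends to a tweet the hashtags most often seen alongside its words in a corpus. It is an illustrative co-occurrence heuristic only, not the authors' BiLSTM, BERT, or GCN framework, and it should not be read as the paper's CoFE method. The function names (tokenize, build_cooccurrence, expand_tweet), the top_k parameter, and the toy corpus are all assumptions introduced for this example.

```python
from collections import Counter, defaultdict


def tokenize(tweet):
    """Lowercase whitespace tokenizer; trims edge punctuation but keeps '#' and '@'."""
    tokens = []
    for raw in tweet.lower().split():
        tok = raw.strip(".,!?;:\"'()")
        if tok:
            tokens.append(tok)
    return tokens


def build_cooccurrence(tweets):
    """Count, per word, how many tweets contain that word together with each hashtag."""
    cooc = defaultdict(Counter)
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        hashtags = {t for t in tokens if t.startswith("#")}
        for word in tokens - hashtags:
            for tag in hashtags:
                cooc[word][tag] += 1
    return cooc


def expand_tweet(tweet, cooc, top_k=2):
    """Append up to top_k hashtags that co-occur most with the tweet's words.

    Hashtags already present in the tweet are never added again.
    Ties are broken alphabetically so the output is deterministic.
    """
    tokens = tokenize(tweet)
    present = {t for t in tokens if t.startswith("#")}
    scores = Counter()
    for word in set(tokens) - present:
        scores.update(cooc.get(word, {}))
    for tag in present:
        scores.pop(tag, None)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return tokens + [tag for tag, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    corpus = [
        "Heavy rain floods the city #flood #weather",
        "Rain warning issued today #weather",
        "City council votes on budget #politics",
    ]
    cooc = build_cooccurrence(corpus)
    print(expand_tweet("rain in the city", cooc, top_k=2))
```

The expanded token lists could then be passed to a topic model such as LDA in place of the raw tweets. Raw co-occurrence counts favor frequent hashtags, which is one reason learned representations may be preferable in practice.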
Pages: 1211-1229
Number of pages: 19
Related papers
50 in total
  • [21] Adaptive query generation for topic-based tweet retrieval
    Cotelo, Juan M.
    Cruz, Fermin L.
    Troyano, Jose A.
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2012, (48): : 57 - 64
  • [22] Improved Parsimonious Topic Modeling Based on the Bayesian Information Criterion
    Wang, Hang
    Miller, David
    ENTROPY, 2020, 22 (03)
  • [23] TweetSift: Tweet Topic Classification Based on Entity Knowledge Base and Topic Enhanced Word Embedding
    Li, Quanzhi
    Shah, Sameena
    Liu, Xiaomo
    Nourbakhsh, Armineh
    Fang, Rui
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 2429 - 2432
  • [24] Cognitive Modeling for Topic Expansion (Short Paper)
    Kulkarni, Sumant
    Srinivasa, Srinath
    Arora, Rajeev
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2013 CONFERENCES, 2013, 8185 : 703 - 710
  • [25] A Tweet Classification Model Based on Dynamic and Static Component Topic Vectors
    Nand, Parma
    Perera, Rivindu
    Klette, Gisela
    AI 2015: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2015, 9457 : 424 - 430
  • [26] FINDING NEWS-TOPIC ORIENTED INFLUENTIAL TWITTER USERS BASED ON TOPIC RELATED HASHTAG COMMUNITY DETECTION
    Xiao, Feng
    Noro, Tomoya
    Tokuda, Takehiro
    JOURNAL OF WEB ENGINEERING, 2014, 13 (5-6): : 405 - 429
  • [27] Agent-Based Document Expansion for Information Retrieval Based on Topic Modeling of Local Information
    Strauss, Oliver
    Kutzias, Damian
    Kett, Holger
    2022 9TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE, ISCMI, 2022, : 198 - 202
  • [28] Impacts of biomedical hashtag-based Twitter campaign: #DHPSP utilization for promotion of open innovation in digital health, patient safety, and personalized medicine
    Kletecka-Pulker, Maria
    Mondal, Himel
    Wang, Dongdong
    Parra, R. Gonzalo
    Maigoro, Abdulkadir Yusif
    Lee, Soojin
    Garg, Tushar
    Mulholland, Eoghan J.
    Devkota, Hari Prasad
    Konwar, Bikramjit
    Patnaik, Sourav S.
    Lordan, Ronan
    Nawaz, Faisal A.
    Tsagkaris, Christos
    Rayan, Rehab A.
    Louka, Anna Maria
    De, Ronita
    Badhe, Pravin
    Schaden, Eva
    Willschke, Harald
    Maleczek, Mathias
    Boyina, Hemanth Kumar
    Khalid, Garba M.
    Uddin, Md Sahab
    Sanusi
    Khan, Johra
    Odimegwu, Joy I.
    Yeung, Andy Wai Kan
    Akram, Faizan
    Sai, Chandragiri Siva
    Bucher, Sherri
    Paswan, Shravan Kumar
    Singla, Rajeev K.
    Shen, Bairong
    Di Lonardo, Sara
    Tosevska, Anela
    Simal-Gandara, Jesus
    Zec, Manja
    Gonzalez-Burgos, Elena
    Habijan, Marija
    Battino, Maurizio
    Giampieri, Francesca
    Tikhonov, Aleksei
    Cianciosi, Danila
    Forbes-Hernandez, Tamara Y.
    Quiles, Jose L.
    Mezzetti, Bruno
    Babiaka, Smith B.
    Ahmed, Mosa E. O.
    Piccard, Paula
    CURRENT RESEARCH IN BIOTECHNOLOGY, 2021, 3 : 146 - 153
  • [29] Corpus-based Topic Derivation and Timestamp-based Popular Hashtag Prediction in Twitter
    Kumar, Sharath B. R.
    Wang, Kuochen
    Shen, Shi-Min
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2019, 35 (03) : 675 - 696
  • [30] Topic Model-based Freshness Estimation Towards Diverse Tweet Recommendation
    Yokoyama, Makoto
    Ma, Qiang
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2019, : 9 - 16