Examining the Coherence of the Top Ranked Tweet Topics

Cited by: 8
Authors
Fang, Anjie [1]
Macdonald, Craig [1]
Ounis, Iadh [1]
Habel, Philip [1]
Affiliations
[1] Univ Glasgow, Glasgow, Lanark, Scotland
DOI
10.1145/2911451.2914731
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Topic modelling approaches help scholars to examine the topics discussed in a corpus. Due to the popularity of Twitter, two distinct methods have been proposed to accommodate the brevity of tweets: the tweet pooling method and Twitter LDA. Both methods produce more interpretable topics than standard Latent Dirichlet Allocation (LDA) when applied to tweets. However, while various metrics have been proposed to estimate the coherence of topics generated from tweets, the coherence of the top ranked topics, those most likely to be examined by users, has not been investigated. In addition, the effect of the number of generated topics K on the topic coherence scores has not been studied. In this paper, we conduct large-scale experiments using three topic modelling approaches over two Twitter datasets, and apply a state-of-the-art coherence metric to study the coherence of the top ranked topics and how K affects that coherence. Inspired by ranking metrics such as precision at n, we use coherence at n to assess the coherence of a topic model. To verify our results, we conduct a pairwise user study to obtain human preferences over topics. Our findings are threefold: we find evidence that Twitter LDA outperforms both LDA and the tweet pooling method because the top ranked topics it generates are more coherent; we demonstrate that a larger number of topics (K) helps to generate more coherent topics; and finally, we show that coherence at n is more effective than the average coherence score when evaluating the coherence of a topic model.
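The contrast between coherence at n and the average coherence score can be sketched in a few lines. The abstract does not give the exact formula, so this is only an illustration under a stated assumption: topics are ranked by their coherence score, and coherence at n is the mean score of the n top ranked topics, by analogy with precision at n. The function name `coherence_at_n` and the score lists are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def coherence_at_n(scores: Sequence[float], n: int) -> float:
    """Mean coherence of the n highest-scoring topics.

    Assumed definition for illustration only; the abstract does not
    state the exact formula used in the paper.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not scores:
        raise ValueError("no topic scores given")
    # Rank topics from most to least coherent and keep the top n.
    top_ranked = sorted(scores, reverse=True)[:n]
    return mean(top_ranked)


# Hypothetical per-topic coherence scores for two models with K = 5.
model_a = [0.9, 0.8, 0.1, 0.1, 0.1]  # two very coherent topics, three poor ones
model_b = [0.5, 0.5, 0.5, 0.5, 0.5]  # uniformly mediocre topics

# The average score prefers model_b, while coherence at 2 prefers
# model_a, whose top ranked topics are the ones a user would inspect.
print("average:", mean(model_a), mean(model_b))
print("coherence@2:", coherence_at_n(model_a, 2), coherence_at_n(model_b, 2))
```

With these made-up numbers the two measures disagree on which model is better, which is the kind of situation where looking only at the top ranked topics can change the conclusion.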
Pages: 825-828
Page count: 4