Topic-Specific Post Identification in Microblog Streams

Authors
Karunasekera, Shanika [1]
Harwood, Aaron [1]
Samarawickrama, Sameendra [1]
Ramamohanarao, Kotagiri [1]
Robins, Garry [2]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia
[2] Univ Melbourne, Melbourne Sch Psychol Sci, Melbourne, Vic 3010, Australia
Keywords
microblog; topic; keyword; query; document; term
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Tracking microblog discussion on a given topic is useful for a wide range of higher-level applications. Microblog services such as Twitter provide a simple keyword-based tracking capability that returns any tweet containing a keyword. Because microblog posts are short, tracking with a small number of topic-specific query words reduces recall. A larger number of keywords (compared to regular document retrieval) is generally required to obtain good recall, but this returns a large number of off-topic posts, resulting in low precision. In our work, we consider the scenario of using a large number of query terms to maintain high recall for automated tracking of microblog streams. The challenge we address is how to score each returned microblog post with respect to the query, online and in an unsupervised manner, so as to identify those that are on topic. To this end, we propose a new term-scoring expression, which we call Adjusted Information Gain (AIG), and compare it to other term-scoring expressions: inverse document frequency, Dice, Jaccard and keyword frequency. Our comparisons consider a selection of document-scoring functions applied to roughly 40 million tweets collected over a 20-day period for each of two topics. Our results show significant improvements (of 8% to 40% in the area under the ROC curve) over existing term-scoring expressions, depending on topic and specificity, and provide insight into further work on query expansion techniques.
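As a rough illustration of the baseline term-scoring expressions named in the abstract, the sketch below computes inverse document frequency, Dice and Jaccard scores for a query term from document frequencies, then scores a post by summing the weights of the query terms it contains. This is a minimal sketch under stated assumptions, not the authors' method: it uses the common textbook definitions, measures co-occurrence against a single seed keyword, and the names `term_scores`, `score_post` and the toy posts are hypothetical. The abstract does not define AIG, so it is not shown.

```python
import math


def term_scores(posts, seed, term):
    """Score `term` relative to a seed keyword (assumed textbook definitions).

    posts: list of sets of tokens, one set per post.
    """
    n = len(posts)
    df_t = sum(1 for p in posts if term in p)        # posts containing term
    df_s = sum(1 for p in posts if seed in p)        # posts containing seed
    df_both = sum(1 for p in posts if term in p and seed in p)

    idf = math.log(n / df_t) if df_t else 0.0
    dice = 2 * df_both / (df_t + df_s) if (df_t + df_s) else 0.0
    union = df_t + df_s - df_both
    jaccard = df_both / union if union else 0.0
    return {"idf": idf, "dice": dice, "jaccard": jaccard}


def score_post(post, weights):
    """Assumed document-scoring function: sum of weights of matched terms."""
    return sum(w for t, w in weights.items() if t in post)


# Toy data (hypothetical), each post reduced to a set of tokens.
posts = [
    {"flood", "rain", "river"},
    {"flood", "rescue"},
    {"rain", "umbrella"},
    {"football", "rain"},
]

scores = term_scores(posts, seed="flood", term="rain")
weights = {"flood": 1.0, "rain": scores["dice"]}
ranked = sorted(posts, key=lambda p: score_post(p, weights), reverse=True)
```

Posts that match only a weakly associated term such as "rain" receive a low score, which is the intuition behind ranking the returned posts instead of accepting every keyword match.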
Pages: 7