Topic-Specific Post Identification in Microblog Streams

Cited by: 0

Authors:

Karunasekera, Shanika [1]

Harwood, Aaron [1]

Samarawickrama, Sameendra [1]

Ramamohanarao, Kotagiri [1]

Robins, Garry [2]

Affiliations:

[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia

[2] Univ Melbourne, Melbourne Sch Psychol Sci, Melbourne, Vic 3010, Australia

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2014

Keywords:

microblog; topic; keyword; query; document; term

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Tracking microblog discussion on a given topic is useful for a wide range of higher-level applications. Microblog services like Twitter provide a simple keyword-based tracking capability, where any tweet containing a keyword is returned. Because microblog posts are short, tracking with a small number of topic-specific query words hurts recall. A larger number of keywords (compared to regular document retrieval) is generally required to obtain good recall, but this returns a large number of off-topic posts, resulting in low precision. In our work, we consider the scenario of using a large number of query terms to maintain high recall for automated tracking of microblog streams. The challenge we address is how to score each returned microblog with respect to the query, online and in an unsupervised manner, so as to identify those that are on topic. To this end, we propose a new term-scoring expression, which we call Adjusted Information Gain (AIG), and we compare it to other term-scoring expressions: inverse document frequency, Dice, Jaccard and keyword frequency. Our comparisons consider a selection of document-scoring functions applied to roughly 40 million tweets collected over a 20-day period for each of two topics. Our results show significant improvements (from 8% to 40% in the area under the ROC curve) over existing term-scoring expressions, depending on topic and specificity, and provide insight into further work in query expansion techniques.
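
To make the baseline term-scoring expressions named in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It uses the textbook forms of inverse document frequency, Dice and Jaccard. How the paper instantiates Dice and Jaccard is not stated in the abstract; here they are assumed to measure the overlap between the posts containing a query term and the posts containing a hypothetical seed keyword. The additive post score, the whitespace tokenizer and all function names are likewise illustrative assumptions. AIG is not shown because the abstract does not define it.

```python
import math
from typing import Dict, List, Set


def tokenize(post: str) -> Set[str]:
    # Assumption: lowercase whitespace tokenization; real tweets need more care.
    return set(post.lower().split())


def build_index(posts: List[str]) -> Dict[str, Set[int]]:
    # Map each term to the set of post positions that contain it.
    index: Dict[str, Set[int]] = {}
    for i, post in enumerate(posts):
        for term in tokenize(post):
            index.setdefault(term, set()).add(i)
    return index


def idf(term: str, index: Dict[str, Set[int]], n_posts: int) -> float:
    # Textbook inverse document frequency: log(N / df); 0.0 for unseen terms.
    df = len(index.get(term, ()))
    return math.log(n_posts / df) if df else 0.0


def dice(term: str, seed: str, index: Dict[str, Set[int]]) -> float:
    # Dice coefficient between the posting sets of `term` and a seed keyword.
    a, b = index.get(term, set()), index.get(seed, set())
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard(term: str, seed: str, index: Dict[str, Set[int]]) -> float:
    # Jaccard coefficient between the same two posting sets.
    a, b = index.get(term, set()), index.get(seed, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_post(post: str, term_scores: Dict[str, float]) -> float:
    # Assumed document-scoring function: sum of scores of query terms present.
    return sum(term_scores.get(t, 0.0) for t in tokenize(post))


if __name__ == "__main__":
    posts = [
        "flu vaccine clinic open today",
        "flu symptoms fever cough",
        "fever pitch football match",
        "new phone launch today",
    ]
    index = build_index(posts)
    query_terms = ["flu", "fever", "today"]
    scores = {t: dice(t, "flu", index) for t in query_terms}
    for post in posts:
        print(f"{score_post(post, scores):.2f}  {post}")
```

In this toy data the two posts that mention the seed keyword receive the highest scores, while posts that match only a broad query term such as "today" score lower. Thresholding such scores is one way returned posts could be separated into on-topic and off-topic.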


Pages: 7
