Topic-Specific Post Identification in Microblog Streams

Cited by: 0
Authors
Karunasekera, Shanika [1]
Harwood, Aaron [1]
Samarawickrama, Sameendra [1]
Ramamohanarao, Kotagiri [1]
Robins, Garry [2]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia
[2] Univ Melbourne, Melbourne Sch Psychol Sci, Melbourne, Vic 3010, Australia
Keywords
microblog; topic; keyword; query; document; term;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Tracking microblog discussion on a given topic is useful for a wide range of higher-level applications. Microblog services such as Twitter provide a simple keyword-based tracking capability, in which any tweet containing a keyword is returned. Because microblog posts are short, tracking with a small number of topic-specific query words harms recall. A larger number of keywords (compared to regular document retrieval) is generally required to obtain good recall, but this returns a large number of off-topic posts and therefore lowers precision. In our work, we consider the scenario of using a large number of query terms to maintain high recall for automated tracking of microblog streams. The challenge we address is how to score each returned microblog post with respect to the query, online and in an unsupervised manner, so as to identify those that are on topic. To this end, we propose a new term-scoring expression, which we call Adjusted Information Gain (AIG), and we compare it to other term-scoring expressions: inverse document frequency, Dice, Jaccard and keyword frequency. Our comparisons consider a selection of document-scoring functions applied to roughly 40 million tweets collected over a 20-day period for each of two topics. Our results show significant improvements (from 8% to 40% in the area under the ROC curves) over existing term-scoring expressions, depending on topic and specificity, and provide insight into further work on query expansion techniques.
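As a rough illustration of the baseline term-scoring expressions named in the abstract, the sketch below computes inverse document frequency, Dice and Jaccard scores over a toy collection and sums term scores to rank posts. It is not the paper's method: the abstract does not define AIG or the document-scoring functions, so AIG is omitted, and the co-occurrence definitions of Dice and Jaccard against a single seed keyword, the additive `score_post` function, and the sample posts are all assumptions made for illustration.

```python
import math

def build_index(posts):
    """Map each term to the set of post indices that contain it."""
    index = {}
    for i, post in enumerate(posts):
        for term in set(post.lower().split()):
            index.setdefault(term, set()).add(i)
    return index

def idf(term, index, n_posts):
    """Inverse document frequency: log(N / df); 0.0 for unseen terms."""
    df = len(index.get(term, ()))
    return math.log(n_posts / df) if df else 0.0

def dice(term, seed, index):
    """Dice coefficient between the post sets of a term and a seed keyword (assumed definition)."""
    a, b = index.get(term, set()), index.get(seed, set())
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def jaccard(term, seed, index):
    """Jaccard coefficient between the post sets of a term and a seed keyword (assumed definition)."""
    a, b = index.get(term, set()), index.get(seed, set())
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def score_post(post, term_scores):
    """Hypothetical document score: sum of the scores of query terms present in the post."""
    return sum(term_scores.get(t, 0.0) for t in set(post.lower().split()))

# Toy data, invented for this example.
posts = [
    "flood warning issued for brisbane river",
    "brisbane flood evacuation underway",
    "river cruise tickets on sale",
    "great coffee this morning",
]
index = build_index(posts)
query_terms = ["flood", "brisbane", "river"]

# Score every query term against the seed keyword "flood" using Dice.
term_scores = {t: dice(t, "flood", index) for t in query_terms}
scores = [score_post(p, term_scores) for p in posts]
```

In this toy run the ambiguous term "river" receives a lower weight than "brisbane", so the off-topic river-cruise post ranks below both flood posts even though it matches a query term. Swapping `dice` for `jaccard` or `idf` in the `term_scores` line changes only the term weights, which mirrors the kind of comparison the abstract describes.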
Pages: 7