Topic-Specific Post Identification in Microblog Streams

Authors
Karunasekera, Shanika [1]
Harwood, Aaron [1]
Samarawickrama, Sameendra [1]
Ramamohanarao, Kotagiri [1]
Robins, Garry [2]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia
[2] Univ Melbourne, Melbourne Sch Psychol Sci, Melbourne, Vic 3010, Australia
Keywords
microblog; topic; keyword; query; document; term
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Tracking microblog discussion on a given topic is useful for a wide range of higher-level applications. Microblog services such as Twitter provide a simple keyword-based tracking capability, in which any tweet containing a keyword is returned. Because microblog posts are short, tracking with only a small number of topic-specific query words reduces recall. A larger number of keywords (compared to regular document retrieval) is generally required to obtain good recall, but this returns a large number of off-topic posts and therefore lowers precision. In our work, we consider the scenario of using a large number of query terms to maintain high recall for automated tracking of microblog streams. The challenge we address is how to score each returned microblog post with respect to the query, online and in an unsupervised manner, so as to identify the posts that are on topic. To this end, we propose a new term-scoring expression, which we call Adjusted Information Gain (AIG), and we compare it to other term-scoring expressions: inverse document frequency, Dice, Jaccard and keyword frequency. Our comparisons consider a selection of document-scoring functions applied to roughly 40 million tweets collected over a 20-day period for each of two topics. Our results show significant improvements (8% to 40% in the area under the ROC curve) over existing term-scoring expressions, depending on topic and specificity, and provide insight into further work on query expansion techniques.
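The baseline term-scoring expressions named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so everything below is an assumption: the co-occurrence forms of Dice and Jaccard (measured against a hypothetical seed keyword), the additive document-scoring function, the whitespace tokenizer, and the names `term_scores` and `score_post` are all illustrative. AIG is not shown because its definition is not stated here.

```python
import math
from collections import Counter


def tokenize(post):
    # Assumption: naive lowercase whitespace tokenization, terms kept as a set.
    return set(post.lower().split())


def term_scores(posts, query_terms, seed):
    """Score each query term by IDF, Dice and Jaccard.

    Dice and Jaccard are computed from co-occurrence with a seed keyword;
    this is one common formulation, not necessarily the paper's.
    """
    docs = [tokenize(p) for p in posts]
    n = len(docs)
    df = Counter(t for d in docs for t in d)  # document frequency per term
    seed_df = df[seed]
    scores = {}
    for t in query_terms:
        both = sum(1 for d in docs if t in d and seed in d)
        union = df[t] + seed_df - both
        scores[t] = {
            "idf": math.log(n / df[t]) if df[t] else 0.0,
            "dice": 2 * both / (df[t] + seed_df) if (df[t] + seed_df) else 0.0,
            "jaccard": both / union if union else 0.0,
        }
    return scores


def score_post(post, scores, kind):
    # Assumption: a post's score is the sum of the scores of the query
    # terms it contains (one simple document-scoring function).
    tokens = tokenize(post)
    return sum(s[kind] for t, s in scores.items() if t in tokens)


# Toy data, invented for illustration only.
posts = [
    "flood warning issued for the river",
    "flood water rising near the river bank",
    "river cruise tickets on sale",
    "great coffee this morning",
]
scores = term_scores(posts, ["flood", "river", "water"], seed="flood")
ranked = sorted(posts, key=lambda p: score_post(p, scores, "jaccard"), reverse=True)
```

In this toy setting, the post that matches only the ambiguous term "river" scores lower than the two flood-related posts, which is the kind of separation between on-topic and off-topic matches that the abstract describes.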
Pages: 7