Improving Retrieval of Short Texts Through Document Expansion

Cited by: 0
Authors
Efron, Miles [1]
Organisciak, Peter [1]
Fenlon, Katrina [1]
Affiliations
[1] Univ Illinois, 501 E Daniel St, MC 492, Champaign, IL 61820 USA
Keywords
Information retrieval; microblogs; Twitter; Dublin Core; document expansion; language models; temporal IR
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.
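The pseudo-query idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scorer (Dirichlet-smoothed query likelihood), the pooling of the top-ranked documents, the parameters `k`, `alpha`, and `mu`, and the toy corpus are all assumptions chosen for illustration. The sketch covers only the lexical side of expansion, not the temporal modeling the abstract mentions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def query_likelihood(query_terms, doc_counts, coll_counts, coll_len, mu=100.0):
    """Log query likelihood of a document under Dirichlet smoothing (assumed scorer)."""
    doc_len = sum(doc_counts.values())
    score = 0.0
    for t in query_terms:
        p_coll = coll_counts.get(t, 0) / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the collection; skip it
        score += math.log((doc_counts.get(t, 0) + mu * p_coll) / (doc_len + mu))
    return score


def expand_document(idx, docs, k=2, alpha=0.5, mu=100.0):
    """Return an expanded unigram model (term -> probability) for docs[idx].

    The document is submitted as a pseudo-query, the top-k other documents
    are pooled, and the pooled term distribution is interpolated with the
    document's own maximum-likelihood model (weight alpha on the original).
    Assumes docs[idx] contains at least one token.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    coll_counts = Counter()
    for c in counts:
        coll_counts.update(c)
    coll_len = sum(coll_counts.values())

    query = tokenize(docs[idx])
    ranked = sorted(
        (j for j in range(len(docs)) if j != idx),
        key=lambda j: query_likelihood(query, counts[j], coll_counts, coll_len, mu),
        reverse=True,
    )[:k]

    pooled = Counter()
    for j in ranked:
        pooled.update(counts[j])
    pooled_len = sum(pooled.values())
    own_len = sum(counts[idx].values())

    if pooled_len == 0:
        # Nothing retrieved: fall back to the unexpanded document model.
        return {t: c / own_len for t, c in counts[idx].items()}

    vocab = set(counts[idx]) | set(pooled)
    return {
        t: alpha * counts[idx][t] / own_len + (1 - alpha) * pooled[t] / pooled_len
        for t in vocab
    }


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "earthquake hits japan",
    "japan earthquake tsunami warning issued",
    "tsunami warning after japan earthquake",
    "new phone released today",
]
model = expand_document(0, docs, k=2)
```

In this toy run the first document never mentions "tsunami", yet the expanded model assigns that term non-zero probability because the two most similar documents do. That is the sense in which expansion compensates for the sparse term frequency information in short texts.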
Pages: 911 - 920
Number of pages: 10