Improving Retrieval of Short Texts Through Document Expansion

Authors
Efron, Miles [1]
Organisciak, Peter [1]
Fenlon, Katrina [1]
Affiliation
[1] Univ Illinois, 501 E Daniel St, MC 492, Champaign, IL 61820 USA
Keywords
Information retrieval; microblogs; Twitter; Dublin Core; document expansion; language models; temporal IR
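DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812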
Abstract
Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.
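The pseudo-query idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the Dirichlet-smoothed query-likelihood scorer, the uniform weighting of retrieved neighbors, and the names `expand_document`, `nearest_neighbors`, `k`, `alpha`, and `mu` are all assumptions chosen for illustration. The sketch covers only the lexical side of expansion and does not model time.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization is a simplifying assumption.
    return text.lower().split()


def mle(tokens):
    # Maximum-likelihood unigram model: term -> relative frequency.
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}


def query_log_likelihood(query, doc, coll_counts, coll_len, mu):
    # Log-likelihood of the query under a Dirichlet-smoothed document model.
    # Every query term is assumed to occur somewhere in the collection.
    doc_counts = Counter(doc)
    total = 0.0
    for term in query:
        p_coll = coll_counts[term] / coll_len
        total += math.log((doc_counts[term] + mu * p_coll) / (len(doc) + mu))
    return total


def nearest_neighbors(doc_id, docs, k, mu=10.0):
    # Submit document doc_id as a pseudo-query against the rest of the corpus
    # and return the indices of the k best-scoring documents.
    coll_counts = Counter(t for d in docs for t in d)
    coll_len = sum(len(d) for d in docs)
    scored = [
        (query_log_likelihood(docs[doc_id], d, coll_counts, coll_len, mu), i)
        for i, d in enumerate(docs)
        if i != doc_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def expand_document(doc_id, texts, k=2, alpha=0.5):
    # Mix the original document model (weight alpha) with the average model
    # of its retrieved neighbors (weight 1 - alpha). Documents are assumed
    # to be non-empty.
    docs = [tokenize(t) for t in texts]
    neighbors = nearest_neighbors(doc_id, docs, k)
    if not neighbors:
        return mle(docs[doc_id]), []
    expanded = Counter()
    for term, p in mle(docs[doc_id]).items():
        expanded[term] += alpha * p
    for i in neighbors:
        for term, p in mle(docs[i]).items():
            expanded[term] += (1 - alpha) * p / len(neighbors)
    return dict(expanded), neighbors
```

Calling `expand_document(0, texts)` on a small list of short strings returns the expanded term distribution for the first text together with the indices of the documents it drew on. Terms that appear only in the retrieved neighbors receive non-zero probability, which is how expansion compensates for the sparse term-frequency evidence in a short document.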
Pages: 911-920
Page count: 10