Improving Retrieval of Short Texts Through Document Expansion

Cited by: 0
Authors
Efron, Miles [1]
Organisciak, Peter [1]
Fenlon, Katrina [1]
Affiliation
[1] Univ Illinois, 501 E Daniel St, MC 492, Champaign, IL 61820 USA
Keywords
Information retrieval; microblogs; Twitter; Dublin Core; document expansion; language models; temporal IR
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.
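The pseudo-query idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the term-overlap scoring, the interpolation weight `lam`, the cutoff `k`, and every function name below are assumptions made for illustration, and the temporal modeling the abstract mentions is not covered. The sketch treats a short document as a query, retrieves its nearest neighbors from the same corpus, and mixes their term distributions into the document's own language model.

```python
from collections import Counter


def tokenize(text):
    # Toy tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def mle(tokens):
    # Maximum-likelihood unigram model over a token list.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def overlap_score(query_tokens, doc_tokens):
    # Toy retrieval score (assumption): number of distinct shared terms.
    return len(set(query_tokens) & set(doc_tokens))


def expand_document(doc_id, corpus, k=2, lam=0.5):
    """Return an expanded unigram model for corpus[doc_id].

    The document is submitted as a pseudo-query; the top-k other
    documents with a nonzero score contribute their term
    distributions, weighted by their normalized retrieval scores.
    """
    query = tokenize(corpus[doc_id])
    original = mle(query)
    scored = [
        (overlap_score(query, tokenize(text)), other_id)
        for other_id, text in corpus.items()
        if other_id != doc_id
    ]
    top = [(s, i) for s, i in sorted(scored, reverse=True)[:k] if s > 0]
    if not top:
        # Nothing retrieved: fall back to the unexpanded model.
        return original
    norm = sum(s for s, _ in top)
    model = {term: lam * p for term, p in original.items()}
    for score, other_id in top:
        weight = (1 - lam) * score / norm
        for term, p in mle(tokenize(corpus[other_id])).items():
            model[term] = model.get(term, 0.0) + weight * p
    return model
```

With a toy corpus in which one short document reads "earthquake japan" and its neighbors mention "tsunami", the expanded model assigns nonzero probability to "tsunami" while still ranking the document's original terms highest.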
Pages: 911-920
Page count: 10