Multilingual information retrieval in the language modeling framework

Cited by: 12
Authors
Rahimi, Razieh [1]
Shakery, Azadeh [1, 2]
King, Irwin [3]
Affiliations
[1] Univ Tehran, Coll Engn, Sch Elect & Comp Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China
Source
INFORMATION RETRIEVAL JOURNAL | 2015 / Vol. 18 / Issue 03
Keywords
Multilingual information retrieval; Multilingual language models; KL-divergence framework; Language modeling framework; Multilingual feedback; MERGING STRATEGY; SYSTEM;
DOI
10.1007/s10791-015-9255-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multilingual information retrieval (MLIR) provides results that are more comprehensive than those of mono- and cross-lingual retrieval. Methods for MLIR are categorized as: (1) Fusion-based methods that merge results from multiple retrieval runs, and (2) Direct methods that build a unique index for the entire collection. Merging results of individual runs reduces the overall effectiveness, while more effective direct methods suffer from either time complexity and memory overhead, or over-weighting of index terms. In this paper, we propose a direct MLIR approach by using the language modeling framework that includes a novel multilingual language model estimation for documents, and a new way to globally estimate word statistics. These contributions enable ranking documents in multiple languages in one retrieval phase without having the problems of the previous direct methods. Moreover, our approach has the advantage of accommodating multilingual feedback information which helps to prevent query drift, and consequently to improve the performance. Finally, we effectively address the common case of incomplete coverage of translation resources in our proposed estimation methods. Experimental results show that the proposed approach outperforms the previous MLIR approaches.
Pages: 246-281
Page count: 36