Multilingual information retrieval in the language modeling framework

Cited by: 12
Authors
Rahimi, Razieh [1]
Shakery, Azadeh [1, 2]
King, Irwin [3]
Affiliations
[1] Univ Tehran, Coll Engn, Sch Elect & Comp Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China
Source
INFORMATION RETRIEVAL JOURNAL | 2015 / Vol. 18 / No. 03
Keywords
Multilingual information retrieval; Multilingual language models; KL-divergence framework; Language modeling framework; Multilingual feedback; MERGING STRATEGY; SYSTEM
DOI
10.1007/s10791-015-9255-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multilingual information retrieval (MLIR) provides results that are more comprehensive than those of mono- and cross-lingual retrieval. Methods for MLIR are categorized as: (1) Fusion-based methods that merge results from multiple retrieval runs, and (2) Direct methods that build a unique index for the entire collection. Merging results of individual runs reduces the overall effectiveness, while more effective direct methods suffer from either time complexity and memory overhead, or over-weighting of index terms. In this paper, we propose a direct MLIR approach by using the language modeling framework that includes a novel multilingual language model estimation for documents, and a new way to globally estimate word statistics. These contributions enable ranking documents in multiple languages in one retrieval phase without having the problems of the previous direct methods. Moreover, our approach has the advantage of accommodating multilingual feedback information which helps to prevent query drift, and consequently to improve the performance. Finally, we effectively address the common case of incomplete coverage of translation resources in our proposed estimation methods. Experimental results show that the proposed approach outperforms the previous MLIR approaches.
Pages: 246-281
Page count: 36
Related papers
50 in total
  • [21] MULTILINGUAL THESAURI FOR INFORMATION-RETRIEVAL
    Unknown
    NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 1-ORGANIZATSIYA I METODIKA INFORMATSIONNOI RABOTY, 1975, (09): : 31 - 32
  • [22] Linguistic Information in Multilingual Image Retrieval
    Hernandez-Aranda, David
    Fresno Fernandez, Victor
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2013, (51): : 33 - 40
  • [23] Dependency structure applied to language modeling for information retrieval
    Lee, Changki
    Lee, Gary Geunbae
    Jang, Myung-Gil
    ETRI JOURNAL, 2006, 28 (03) : 337 - 346
  • [24] Conceptual indexing for multilingual information retrieval
    Guyot, Jacques
    Radhouani, Said
    Falquet, Gilles
    ACCESSING MULTILINGUAL INFORMATION REPOSITORIES, 2006, 4022 : 102 - 112
  • [25] Merging mechanisms in multilingual information retrieval
    Lin, WC
    Chen, HH
    ADVANCES IN CROSS-LANGUAGE INFORMATION RETRIEVAL, 2003, 2785 : 175 - 186
  • [26] Neural Approaches to Multilingual Information Retrieval
    Lawrie, Dawn
    Yang, Eugene
    Oard, Douglas W.
    Mayfield, James
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT I, 2023, 13980 : 521 - 536
  • [27] Language-Preference-Based Re-ranking for Multilingual Swahili Information Retrieval
    Telemala, Joseph P.
    Suleman, Hussein
    PROCEEDINGS OF THE 2022 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2022, 2022, : 54 - 62
  • [28] Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval
    Ye, Zheng
    Huang, Jimmy Xiangji
    He, Ben
    Lin, Hongfei
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2012, 63 (12): : 2474 - 2487
  • [29] A NEURAL DOCUMENT LANGUAGE MODELING FRAMEWORK FOR SPOKEN DOCUMENT RETRIEVAL
    Yen, Li-Phen
    Wu, Zhen-Yu
    Chen, Kuan-Yu
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8139 - 8143
  • [30] Research on Chinese information retrieval based on a hybrid language modeling
    Zheng, De-Quan
    Zhao, Tie-Jun
    Yu, Feng
    Li, Sheng
    Yu, Hao
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 2586 - +