Multilingual information retrieval in the language modeling framework

Cited by: 12
Authors
Rahimi, Razieh [1 ]
Shakery, Azadeh [1 ,2 ]
King, Irwin [3 ]
Affiliations
[1] Univ Tehran, Coll Engn, Sch Elect & Comp Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Shatin, Hong Kong, Peoples R China
Source
INFORMATION RETRIEVAL JOURNAL | 2015 / Vol. 18 / Issue 03
Keywords
Multilingual information retrieval; Multilingual language models; KL-divergence framework; Language modeling framework; Multilingual feedback; MERGING STRATEGY; SYSTEM;
DOI
10.1007/s10791-015-9255-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Multilingual information retrieval (MLIR) provides results that are more comprehensive than those of mono- and cross-lingual retrieval. Methods for MLIR are categorized as: (1) Fusion-based methods that merge results from multiple retrieval runs, and (2) Direct methods that build a unique index for the entire collection. Merging results of individual runs reduces the overall effectiveness, while more effective direct methods suffer from either time complexity and memory overhead, or over-weighting of index terms. In this paper, we propose a direct MLIR approach by using the language modeling framework that includes a novel multilingual language model estimation for documents, and a new way to globally estimate word statistics. These contributions enable ranking documents in multiple languages in one retrieval phase without having the problems of the previous direct methods. Moreover, our approach has the advantage of accommodating multilingual feedback information which helps to prevent query drift, and consequently to improve the performance. Finally, we effectively address the common case of incomplete coverage of translation resources in our proposed estimation methods. Experimental results show that the proposed approach outperforms the previous MLIR approaches.
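The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea of ranking documents in several languages in a single pass under the KL-divergence framework. It is not the authors' method: it swaps in a simple query-side translation model and Dirichlet-smoothed document models with collection-wide word statistics. The toy documents, the translation table, and the names `DOCS`, `TRANSLATIONS`, `collection_model`, `query_model`, `score`, and `rank` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: documents in two languages share one index.
DOCS = {
    "en1": "retrieval of multilingual documents".split(),
    "fr1": "recherche de documents multilingues".split(),
    "en2": "cooking pasta at home".split(),
}

# Hypothetical translation table p(target term | source term).
TRANSLATIONS = {
    "retrieval": {"retrieval": 0.5, "recherche": 0.5},
    "multilingual": {"multilingual": 0.5, "multilingues": 0.5},
}


def collection_model(docs):
    """Word probabilities estimated globally over the whole mixed collection."""
    counts = Counter(w for tokens in docs.values() for w in tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def query_model(query_tokens, translations):
    """Spread each query term's mass over its translations (assumed table)."""
    model = Counter()
    for w in query_tokens:
        # Terms without an entry keep all their mass (incomplete coverage).
        for target, p in translations.get(w, {w: 1.0}).items():
            model[target] += p / len(query_tokens)
    return dict(model)


def score(q_model, doc_tokens, coll, mu=10.0):
    """Cross-entropy score, rank-equivalent to negative KL divergence."""
    counts = Counter(doc_tokens)
    total = 0.0
    for w, p_q in q_model.items():
        # Dirichlet-smoothed document model p(w | d).
        p_d = (counts[w] + mu * coll.get(w, 0.0)) / (len(doc_tokens) + mu)
        if p_d > 0.0:  # terms unseen in the whole collection are skipped
            total += p_q * math.log(p_d)
    return total


def rank(query_tokens, docs=DOCS, translations=TRANSLATIONS):
    """Rank all documents, regardless of language, in one retrieval pass."""
    coll = collection_model(docs)
    q_model = query_model(query_tokens, translations)
    return sorted(docs, key=lambda d: score(q_model, docs[d], coll), reverse=True)
```

Calling `rank(["multilingual", "retrieval"])` places the English and French documents about retrieval ahead of the unrelated one, without running and merging separate per-language retrieval runs.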
Pages: 246-281
Number of pages: 36
Related papers
50 in total
  • [41] Extending the language modeling framework for sentence retrieval to include local context
    Ronald T. Fernández
    David E. Losada
    Leif A. Azzopardi
    Information Retrieval, 2011, 14 : 355 - 389
  • [42] LANGUAGE-AGNOSTIC MULTILINGUAL MODELING
    Datta, Arindrima
    Ramabhadran, Bhuvana
    Emond, Jesse
    Kannan, Anjuli
    Roark, Brian
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8239 - 8243
  • [43] Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval
    Déjean, H
    Gaussier, E
    Renders, JM
    Sadat, F
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2005, 33 (02) : 111 - 124
  • [44] Cross-Lingual Information Retrieval from Multilingual Construction Documents Using Pretrained Language Models
    Kim, Jungyeon
    Chung, Sehwan
    Chi, Seokho
    JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT, 2024, 150 (06)
  • [45] Using Web resources to construct multilingual medical thesaurus for cross-language medical information retrieval
    Lu, Wen-Hsiang
    Lin, Ray S.
    Chan, Yi-Che
    Chen, Kuan-Hsi
    DECISION SUPPORT SYSTEMS, 2008, 45 (03) : 585 - 595
  • [46] Multilingual single document keyword extraction for information retrieval
    Bracewell, DB
    Ren, FJ
    Kuroiwa, S
    Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05), 2005, : 517 - 522
  • [47] Multilingual information retrieval using English and Chinese queries
    Chen, AT
    EVALUATION OF CROSS-LANGUAGE INFORMATION RETRIEVAL SYSTEMS, 2002, 2406 : 44 - 58
  • [48] Multilingual Information Retrieval in Thoracic Radiology: Feasibility Study
    Castilla, Andre Coutinho
    Furuie, Sergio Shiguemi
    Mendonca, Eneida A.
    MEDINFO 2007: PROCEEDINGS OF THE 12TH WORLD CONGRESS ON HEALTH (MEDICAL) INFORMATICS, PTS 1 AND 2: BUILDING SUSTAINABLE HEALTH SYSTEMS, 2007, 129 : 387 - +
  • [49] An Information Retrieval Based Approach for Multilingual Ontology Matching
    Rexha, Andi
    Dragoni, Mauro
    Kern, Roman
    Kroell, Mark
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2016, 2016, 9612 : 433 - 439
  • [50] An Exploration of Users' Needs for Multilingual Information Retrieval and Access
    Vassilakaki, Evgenia
    Garoufallou, Emmanouel
    Johnson, Frances
    Hartley, R. J.
    METADATA AND SEMANTICS RESEARCH, MTSR 2015, 2015, 544 : 249 - 258