Cross-Lingual Phrase Retrieval

Authors
Zheng, Heqi [1 ,2 ]
Zhang, Xiao [1 ]
Chi, Zewen [1 ]
Huang, Heyan [1 ,2 ]
Yan, Tan [1 ]
Lan, Tian [1 ]
Wei, Wei [3 ]
Mao, Xian-Ling [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[3] Huazhong Univ Sci & Technol, Wuhan, Hubei, Peoples R China
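Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract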
Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines that use word-level or sentence-level representations. XPR also shows strong zero-shot transferability, which enables the model to perform retrieval on a language pair that was unseen during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.
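The retrieval setting described in the abstract can be illustrated with a minimal sketch. This is not the XPR implementation: the vectors below are hand-written toy values standing in for encoder outputs, and the helper names (mean_vector, cosine, retrieve) are hypothetical. The sketch assumes only that a phrase representation is formed by averaging the vectors of the phrase across its example sentences, and that candidates in the target language are ranked by cosine similarity.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, candidates):
    """Return candidate phrases sorted by descending similarity to the query."""
    return sorted(
        candidates,
        key=lambda phrase: cosine(query_vec, candidates[phrase]),
        reverse=True,
    )


# Toy data (assumption): each inner list stands in for the vector of one
# phrase occurrence in one example sentence, as an encoder might produce.
query_vec = mean_vector([[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]])  # "machine learning"

candidates = {
    "apprentissage automatique": mean_vector([[0.8, 0.2, 0.1], [0.8, 0.2, -0.1]]),
    "moteur de recherche": mean_vector([[0.4, 0.8, 0.1]]),
    "pomme de terre": mean_vector([[0.0, 0.1, 0.9]]),
}

ranking = retrieve(query_vec, candidates)
print(ranking)
```

In this toy setup the French phrase whose averaged vector points in the same direction as the English query ranks first. A real system would replace the hand-written vectors with representations from a trained multilingual encoder.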
Pages: 4193-4204
Page count: 12