Cross-Lingual Phrase Retrieval

被引:0
|
作者
Zheng, Heqi [1 ,2 ]
Zhang, Xiao [1 ]
Chi, Zewen [1 ]
Huang, Heyan [1 ,2 ]
Yan, Tan [1 ]
Lan, Tian [1 ]
Wei, Wei [3 ]
Mao, Xian-Ling [1 ]
机构
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[3] Huazhong Univ Sci & Technol, Wuhan, Hubei, Peoples R China
基金
中国国家自然科学基金;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during training. Our dataset, code, and trained models are publicly available at github.com/cwszz/XPR/.
引用
收藏
页码:4193 / 4204
页数:12
相关论文
共 50 条
  • [1] Unsupervised Cross-Lingual Mapping for Phrase Embedding Spaces
    Ayana, Abraham G.
    Cao, Hailong
    Zhao, Tiejun
    ADVANCES IN INFORMATION AND COMMUNICATION, VOL 2, 2020, 1130 : 512 - 524
  • [2] Semantic Cross-Lingual Information Retrieval
    Pourmahmoud, Solmaz
    Shamsfard, Mehrnoush
    23RD INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2008, : 80 - +
  • [3] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Ghanbari, Elham
    Shakery, Azadeh
    APPLIED INTELLIGENCE, 2022, 52 (03) : 3156 - 3174
  • [4] A Learning to rank framework based on cross-lingual loss function for cross-lingual information retrieval
    Elham Ghanbari
    Azadeh Shakery
    Applied Intelligence, 2022, 52 : 3156 - 3174
  • [5] On cross-lingual retrieval with multilingual text encoders
    Litschko, Robert
    Vulic, Ivan
    Ponzetto, Simone Paolo
    Glavas, Goran
    INFORMATION RETRIEVAL JOURNAL, 2022, 25 (02): : 149 - 183
  • [6] Query by Example for Cross-Lingual Event Retrieval
    Sarwar, Sheikh Muhammad
    Allan, James
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1601 - 1604
  • [7] Cross-lingual information retrieval by feature vectors
    Lilleng, Jeanine
    Tomassen, Stein L.
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS, 2007, 4592 : 229 - +
  • [8] Dictionary methods for cross-lingual information retrieval
    Ballesteros, L
    Croft, B
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, 1996, 1134 : 791 - 801
  • [9] A system for supporting cross-lingual information retrieval
    Capstick, J
    Diagne, AK
    Erbach, G
    Uszkoreit, H
    Leisenberg, A
    Leisenberg, M
    INFORMATION PROCESSING & MANAGEMENT, 2000, 36 (02) : 275 - 289
  • [10] Cross-lingual Language Model Pretraining for Retrieval
    Yu, Puxuan
    Fei, Hongliang
    Li, Ping
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 1029 - 1039