Multilingual Universal Sentence Encoder for Semantic Retrieval

Authors
Yang, Yinfei [1]
Cer, Daniel [1]
Ahmad, Amin [1]
Guo, Mandy [1]
Law, Jax [1]
Constant, Noah [1]
Abrego, Gustavo Hernandez [1]
Yuan, Steve [2]
Tar, Chris [1]
Sung, Yun-Hsuan [1]
Strope, Brian [1]
Kurzweil, Ray [1]
Affiliations
[1] Google AI, Mountain View, CA 94043 USA
[2] Google, Cambridge, MA USA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present easy-to-use, retrieval-focused multilingual sentence embedding models, made available on TensorFlow Hub. The models embed text from 16 languages into a shared semantic space using a multi-task trained dual-encoder that learns tied cross-lingual representations via translation bridge tasks (Chidambaram et al., 2018). The models achieve a new state of the art on monolingual and cross-lingual semantic retrieval (SR). Competitive performance is obtained on the related tasks of translation pair bitext retrieval (BR) and retrieval question answering (ReQA). On transfer learning tasks, our multilingual embeddings approach, and in some cases exceed, the performance of English-only sentence embeddings.
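The retrieval step that a shared semantic space enables can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `normalize` and `retrieve` are hypothetical, the 3-dimensional vectors are hand-made stand-ins for encoder outputs, and cosine similarity is an assumed scoring function. The idea shown is only that sentences with the same meaning, whatever their language, should sit close together, so a query in one language can rank candidates in others.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def retrieve(query_vec, candidate_vecs, top_k=1):
    """Return up to top_k (candidate_index, score) pairs, best match first."""
    q = normalize(query_vec)
    scored = []
    for idx, cand in enumerate(candidate_vecs):
        c = normalize(cand)
        score = sum(a * b for a, b in zip(q, c))
        scored.append((idx, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (assumption): made-up vectors standing in for sentence embeddings.
candidates = [
    "How old are you?",            # English
    "Quel âge as-tu ?",            # French, same meaning
    "The weather is nice today.",  # unrelated
]
candidate_vecs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.2, 0.9],
]
query_vec = [0.85, 0.15, 0.05]  # stand-in for a Spanish query with the same meaning

for idx, score in retrieve(query_vec, candidate_vecs, top_k=3):
    print(f"{score:.3f}  {candidates[idx]}")
```

With real embeddings, the candidate vectors would be computed once and stored, and only the query would be encoded at search time.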
Pages: 87-94
Page count: 8
Related papers
50 in total
  • [41] Multilingual sentence categorization and novelty mining
    Zhang, Yi
    Tsai, Flora S.
    Kwee, Agus Trisnajaya
    INFORMATION PROCESSING & MANAGEMENT, 2011, 47 (05) : 667 - 675
  • [42] THEMATIZATION AND SENTENCE RETRIEVAL
    PERFETTI, CA
    GOLDMAN, SR
    JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR, 1974, 13 (01) : 70 - 79
  • [43] SENTENCE STORAGE AND RETRIEVAL
    WEARING, AJ
    PSYCHONOMIC SCIENCE, 1969, 17 (02) : 118 - &
  • [44] CODING AND SENTENCE RETRIEVAL
    DOSHER, B
    JOURNAL OF PSYCHOLINGUISTIC RESEARCH, 1983, 12 (05) : 528 - 528
  • [45] Multilingual Corpus Creation for Multilingual Semantic Similarity Task
    Ahmed, Mahtab
    Dixit, Chahna
    Mercer, Robert E.
    Khan, Atif
    Samee, Muhammad Rifayat
    Urra, Felipe
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4190 - 4196
  • [46] Distillation for Multilingual Information Retrieval
    Yang, Eugene
    Lawrie, Dawn
    Mayfield, James
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2368 - 2373
  • [47] The ICSI plus Multilingual Sentence Segmentation System
    Zimmerman, M.
    Hakkani-Tuer, D.
    Fung, J.
    Mirghafori, N.
    Gottlieb, L.
    Shriberg, E.
    Liu, Y.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 117 - 120
  • [48] Toward Computational Models of Multilingual Sentence Processing
    Frank, Stefan L.
    LANGUAGE LEARNING, 2021, 71 : 193 - 218
  • [49] Multilingual information retrieval system
    Hong, Z
    Syin, C
    Lia, KF
    MULTIMEDIA STORAGE AND ARCHIVING SYSTEMS, 1996, 2916 : 33 - 44
  • [50] Multilingual Retrieval of Radiology Images
    Kahn, Charles E., Jr.
    RADIOGRAPHICS, 2009, 29 (01) : 23 - 29