Multilingual Universal Sentence Encoder for Semantic Retrieval

Authors
Yang, Yinfei [1]
Cer, Daniel [1]
Ahmad, Amin [1]
Guo, Mandy [1]
Law, Jax [1]
Constant, Noah [1]
Abrego, Gustavo Hernandez [1]
Yuan, Steve [2]
Tar, Chris [1]
Sung, Yun-Hsuan [1]
Strope, Brian [1]
Kurzweil, Ray [1]
Affiliations
[1] Google AI, Mountain View, CA 94043 USA
[2] Google, Cambridge, MA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present easy-to-use, retrieval-focused multilingual sentence embedding models, made available on TensorFlow Hub. The models embed text from 16 languages into a shared semantic space using a multi-task trained dual-encoder that learns tied cross-lingual representations via translation bridge tasks (Chidambaram et al., 2018). The models achieve new state-of-the-art performance on monolingual and cross-lingual semantic retrieval (SR). Competitive performance is obtained on the related tasks of translation pair bitext retrieval (BR) and retrieval question answering (ReQA). On transfer learning tasks, our multilingual embeddings approach, and in some cases exceed, the performance of English-only sentence embeddings.
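To illustrate what retrieval in a shared semantic space involves, the following is a minimal sketch, not the authors' implementation. It assumes sentence embeddings have already been computed; the three-dimensional vectors and the helper names `cosine` and `retrieve` are hypothetical and chosen only for illustration. Real embeddings would come from the published models and have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, candidates, top_k=1):
    """Rank (text, vector) candidates by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up toy embeddings (assumption): sentences with the same meaning in
# different languages are given nearby vectors, as a shared space intends.
candidates = [
    ("How old are you?", [0.90, 0.10, 0.00]),
    ("Wie alt bist du?", [0.85, 0.20, 0.05]),
    ("The weather is nice today.", [0.00, 0.30, 0.95]),
]

# Hypothetical embedding of the query "What is your age?"
query = [0.88, 0.15, 0.02]

results = retrieve(query, candidates, top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Because all languages map into one space, the same ranking function serves both monolingual and cross-lingual retrieval; only the candidate pool changes.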
Pages: 87-94
Number of pages: 8