Multilingual Universal Sentence Encoder for Semantic Retrieval

Authors
Yang, Yinfei [1]
Cer, Daniel [1]
Ahmad, Amin [1]
Guo, Mandy [1]
Law, Jax [1]
Constant, Noah [1]
Abrego, Gustavo Hernandez [1]
Yuan, Steve [2]
Tar, Chris [1]
Sung, Yun-Hsuan [1]
Strope, Brian [1]
Kurzweil, Ray [1]
Affiliations
[1] Google AI, Mountain View, CA 94043 USA
[2] Google, Cambridge, MA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present easy-to-use, retrieval-focused multilingual sentence embedding models, made available on TensorFlow Hub. The models embed text from 16 languages into a shared semantic space using a multi-task trained dual-encoder that learns tied cross-lingual representations via translation bridge tasks (Chidambaram et al., 2018). The models achieve new state-of-the-art performance on monolingual and cross-lingual semantic retrieval (SR). Competitive performance is obtained on the related tasks of translation pair bitext retrieval (BR) and retrieval question answering (ReQA). On transfer learning tasks, our multilingual embeddings approach, and in some cases exceed, the performance of English-only sentence embeddings.
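To make the retrieval setting concrete, the following is a minimal sketch of how candidates might be ranked once a query and a set of candidate sentences have been embedded into a shared space. It is not the authors' implementation. The function names `normalize` and `rank_candidates` are hypothetical, the three-dimensional vectors are made-up placeholders standing in for real model output, and the use of cosine similarity as the score is an assumption, since the abstract does not state the scoring function.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_candidates(query_vec, candidate_vecs):
    """Return (candidate_index, score) pairs sorted from best to worst match.

    Scoring by cosine similarity is an assumption made for this sketch.
    """
    q = normalize(query_vec)
    scored = []
    for idx, cand in enumerate(candidate_vecs):
        c = normalize(cand)
        score = sum(a * b for a, b in zip(q, c))
        scored.append((idx, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings (hypothetical values, not produced by any model).
# In practice these would come from a sentence encoder, and the query and
# candidates could be written in different languages.
query = [0.9, 0.1, 0.0]
candidates = [
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
    [-0.5, 0.0, 0.5],
]

ranking = rank_candidates(query, candidates)
best_index = ranking[0][0]
```

Because all languages share one embedding space, the same ranking routine would apply unchanged to monolingual and cross-lingual retrieval; only the source of the vectors differs.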
Pages: 87-94
Page count: 8