Meemi: A simple method for post-processing and integrating cross-lingual word embeddings

Times cited: 0
Authors
Doval, Yerai [1 ]
Camacho-Collados, Jose [2 ]
Espinosa-Anke, Luis [2 ]
Schockaert, Steven [2 ]
Affiliations
[1] Univ Vigo, Escola Super Enxenaria Informat, Grp COLE, Ourense, Spain
[2] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 3AA, Wales
Keywords
DOI
10.1017/S1351324921000280
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space where word embeddings from two or more languages are integrated together. Current state-of-the-art approaches learn these embeddings by aligning two disjoint monolingual vector spaces through an orthogonal transformation which preserves the structure of the monolingual counterparts. In this work, we propose to apply an additional transformation after this initial alignment step, which aims to bring the vector representations of a given word and its translations closer to their average. Since this additional transformation is non-orthogonal, it also affects the structure of the monolingual spaces. We show that our approach improves both the integration of the monolingual spaces and the quality of the monolingual spaces themselves. Furthermore, because our transformation can be applied to an arbitrary number of languages, we are able to effectively obtain a truly multilingual space. The resulting (monolingual and multilingual) spaces show consistent gains over the current state-of-the-art in standard intrinsic tasks, namely dictionary induction and word similarity, as well as in extrinsic tasks such as cross-lingual hypernym discovery and cross-lingual natural language inference.
Pages: 746-768
Number of pages: 23
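The abstract describes the core post-processing step: after an initial orthogonal alignment, each dictionary word and its translation are pulled towards their average by an additional, unconstrained linear transformation learned per language. Below is a minimal sketch of that averaging step, assuming NumPy, a plain least-squares fit, and an integer-indexed bilingual dictionary; the function name meemi and the variable names are illustrative and are not taken from the authors' released code.

```python
import numpy as np

def meemi(X, Y, pairs):
    """Unconstrained linear post-processing of two already-aligned
    embedding matrices (a sketch of the averaging idea in the abstract).

    X, Y  : (n_src, d) and (n_tgt, d) arrays, assumed to have been aligned
            beforehand with an orthogonal mapping (e.g. Procrustes).
    pairs : list of (i, j) index pairs from a bilingual dictionary, where
            X[i] and Y[j] are translations of each other.
    """
    src_idx = np.array([i for i, _ in pairs])
    tgt_idx = np.array([j for _, j in pairs])

    # The point "in the middle": the average of each translation pair.
    M = (X[src_idx] + Y[tgt_idx]) / 2.0

    # One non-orthogonal linear map per language, fitted by least squares
    # so that dictionary words land close to their pair averages.
    W_src, _, _, _ = np.linalg.lstsq(X[src_idx], M, rcond=None)
    W_tgt, _, _, _ = np.linalg.lstsq(Y[tgt_idx], M, rcond=None)

    # Apply the learned maps to the full vocabularies of both languages.
    return X @ W_src, Y @ W_tgt
```

For more than two languages, the same recipe extends naturally: average each word's vector with all of its available translations and fit one least-squares map per language onto those averages, in the spirit of the multilingual space mentioned in the abstract.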
Related papers (50 records in total)
  • [21] Neural topic-enhanced cross-lingual word embeddings for CLIR
    Zhou, Dong
    Qu, Wei
    Li, Lin
    Tang, Mingdong
    Yang, Aimin
    INFORMATION SCIENCES, 2022, 608: 809-824
  • [22] Adversarial training with Wasserstein distance for learning cross-lingual word embeddings
    Li, Yuling
    Zhang, Yuhong
    Yu, Kui
    Hu, Xuegang
    APPLIED INTELLIGENCE, 2021, 51: 7666-7678
  • [23] Cross-Lingual Word Embeddings for Low-Resource Language Modeling
    Adams, Oliver
    Makarucha, Adam
    Neubig, Graham
    Bird, Steven
    Cohn, Trevor
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017: 937-947
  • [24] Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings
    Otani, Naoki
    Ozaki, Satoru
    Zhao, Xingyuan
    Li, Yucen
    St Johns, Micaelah
    Levin, Lori
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 4451-4464
  • [25] A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018: 789-798
  • [26] Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings
    Vulic, Ivan
    Moens, Marie-Francine
    SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2015: 363-372
  • [27] Learning Cross-lingual Word Embeddings via Matrix Co-factorization
    Shi, Tianze
    Liu, Zhiyuan
    Liu, Yang
    Sun, Maosong
    PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2, 2015: 567-572
  • [28] Fully unsupervised word translation from cross-lingual word embeddings especially for healthcare professionals
    Chauhan, Shweta
    Saxena, Shefali
    Daniel, Philemon
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2022, 13: 28-37
  • [29] Fully unsupervised word translation from cross-lingual word embeddings especially for healthcare professionals
    Chauhan, Shweta
    Saxena, Shefali
    Daniel, Philemon
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2022, 13 (SUPPL 1): 28-37
  • [30] A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
    Levy, Omer
    Sogaard, Anders
    Goldberg, Yoav
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017: 765-774