Meemi: A simple method for post-processing and integrating cross-lingual word embeddings

Cited by: 0
Authors
Doval, Yerai [1]
Camacho-Collados, Jose [2]
Espinosa-Anke, Luis [2]
Schockaert, Steven [2]
Affiliations
[1] Univ Vigo, Escola Super Enxenaria Informat, Grp COLE, Ourensevigo, Spain
[2] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 3AA, Wales
DOI
10.1017/S1351324921000280
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Word embeddings have become a standard resource in the toolset of any Natural Language Processing practitioner. While monolingual word embeddings encode information about words in the context of a particular language, cross-lingual embeddings define a multilingual space in which word embeddings from two or more languages are integrated. Current state-of-the-art approaches learn these embeddings by aligning two disjoint monolingual vector spaces through an orthogonal transformation, which preserves the structure of the monolingual counterparts. In this work, we propose to apply an additional transformation after this initial alignment step, which aims to bring the vector representations of a given word and its translations closer to their average. Since this additional transformation is non-orthogonal, it also affects the structure of the monolingual spaces. We show that our approach improves both the integration of the monolingual spaces and the quality of the monolingual spaces themselves. Furthermore, because our transformation can be applied to an arbitrary number of languages, we are able to obtain a truly multilingual space. The resulting (monolingual and multilingual) spaces show consistent gains over the current state of the art in standard intrinsic tasks, namely dictionary induction and word similarity, as well as in extrinsic tasks such as cross-lingual hypernym discovery and cross-lingual natural language inference.
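The two-step idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `orthogonal_align` and `average_maps`, the synthetic data, and the choice of ordinary least squares for the second (unconstrained) transformation are all assumptions made for illustration. Step 1 solves the orthogonal Procrustes problem; step 2 fits one linear map per language that sends each word vector toward the average of itself and its translation.

```python
import numpy as np

def orthogonal_align(X, Y):
    """Hypothetical helper: orthogonal W minimizing ||X W - Y||_F (Procrustes)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def average_maps(X_aligned, Y):
    """Hypothetical helper: unconstrained linear maps toward the pairwise average.

    Assumption: the maps are fitted by ordinary least squares.
    """
    M = (X_aligned + Y) / 2.0                      # average of each translation pair
    A, *_ = np.linalg.lstsq(X_aligned, M, rcond=None)
    B, *_ = np.linalg.lstsq(Y, M, rcond=None)
    return A, B

# Synthetic "dictionary": row i of X and row i of Y are translations of each other.
rng = np.random.default_rng(0)
n, d = 200, 5
Y = rng.normal(size=(n, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))       # random orthogonal matrix
X = Y @ Q.T + 0.1 * rng.normal(size=(n, d))        # rotated, noisy copy of Y

W = orthogonal_align(X, Y)                         # step 1: orthogonal alignment
X_al = X @ W
A, B = average_maps(X_al, Y)                       # step 2: non-orthogonal maps
X_new, Y_new = X_al @ A, Y @ B

# Total distance between translation pairs before and after step 2.
before = np.linalg.norm(X_al - Y)
after = np.linalg.norm(X_new - Y_new)
print(f"pair distance before: {before:.4f}, after: {after:.4f}")
```

Because `A` and `B` are not constrained to be orthogonal, they can also change distances within each monolingual space, which is the effect the abstract describes. On this toy data the reduction in pair distance is small, since the noise is random rather than structured.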
Pages: 746-768
Number of pages: 23
Related papers
50 in total
  • [11] Multi-Adversarial Learning for Cross-Lingual Word Embeddings
    Wang, Haozhou
    Henderson, James
    Merlo, Paola
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 463 - 472
  • [12] Learning Tibetan-Chinese cross-lingual word embeddings
    Ma, Wei
    Yu, Hongzhi
    Zhao, Kun
    Zhao, Deshun
    2019 15TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2019), 2019, : 49 - 53
  • [13] A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
    Wei, Liangchen
    Deng, Zhi-Hong
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4165 - 4171
  • [14] Cross-Lingual Word Representations via Spectral Graph Embeddings
    Oshikiri, Takamasa
    Fukui, Kazuki
    Shimodaira, Hidetoshi
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2016), VOL 2, 2016, : 493 - 498
  • [15] A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages
    Khatri, Jyotsana
    Murthy, Rudra
    Bhattacharyya, Pushpak
    PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020, : 347 - 348
  • [16] A Closer Look on Unsupervised Cross-lingual Word Embeddings Mapping
    Plucinski, Kamil
    Lango, Mateusz
    Zimniewicz, Michal
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5555 - 5562
  • [17] Unsupervised cross-lingual word embeddings learning with adversarial training
    Li, Yuling
    Zhang, Yuhong
    Li, Peipei
    Hu, Xuegang
    2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 150 - 156
  • [18] Evaluating Sub-word embeddings in cross-lingual models
    Parizi, Ali Hakimi
    Cook, Paul
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2712 - 2719
  • [19] Non-Linearity in mapping based Cross-Lingual Word Embeddings
    Zhao, Jiawei
    Gilman, Andrew
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3583 - 3589
  • [20] Adversarial training with Wasserstein distance for learning cross-lingual word embeddings
    Li, Yuling
    Zhang, Yuhong
    Yu, Kui
    Hu, Xuegang
    APPLIED INTELLIGENCE, 2021, 51 (11) : 7666 - 7678