Adversarial training with Wasserstein distance for learning cross-lingual word embeddings

Cited by: 5
Authors
Li, Yuling [1 ]
Zhang, Yuhong [1 ]
Yu, Kui [2 ]
Hu, Xuegang [2 ]
Affiliations
[1] Hefei Univ Technol, Hefei, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
Funding
US National Science Foundation;
Keywords
Cross-lingual word embeddings; Generative adversarial networks; Noise; NETWORKS; SPACE;
DOI
10.1007/s10489-020-02136-x
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent studies have managed to learn cross-lingual word embeddings in a completely unsupervised manner through generative adversarial networks (GANs). These GAN-based methods can approximately align two monolingual embedding spaces, but their performance on the embeddings of low-frequency words (LFEs) remains unsatisfactory. The existing solution is to assign low sampling rates to LFEs based on word-frequency information. However, this solution has two shortcomings. First, it relies on word-frequency information, which is not always available in real scenarios. Second, the uneven sampling may cause the models to overlook the distributional information of LFEs, thereby hurting their performance. In this study, we propose a novel unsupervised GAN-based method that effectively improves the quality of LFEs while circumventing both issues. Our method is based on the observation that LFEs tend to be densely clustered in the embedding space, and that fine-grained alignment is difficult to obtain through adversarial training in such dense regions. Building on this idea, we introduce a noise function that disperses the dense embedding points to a certain extent. In addition, we train a Wasserstein critic network to encourage the noise-added embeddings and the original embeddings to have similar semantics. We evaluate our approach on two common tasks, namely bilingual lexicon induction and cross-lingual word similarity. Experimental results show that the proposed model achieves stronger or competitive performance compared with supervised and unsupervised baselines.
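The abstract names three components: an adversarially trained mapping between the two monolingual spaces, a noise function that disperses densely clustered low-frequency embeddings, and a Wasserstein critic that keeps noise-added embeddings semantically close to the originals. The PyTorch sketch below shows one way these pieces could fit together; it is not the authors' implementation, and the embedding dimension, network sizes, learnable noise scale, optimizer settings, weight-clipping constant, and names such as NoiseFunction and wasserstein_gap are assumptions made for illustration.

```python
import torch
import torch.nn as nn

DIM = 300  # embedding dimensionality (assumption: fastText-style vectors)

class NoiseFunction(nn.Module):
    """Disperses dense embedding points with Gaussian noise of learnable scale (hypothetical design)."""
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.full((dim,), -2.3))  # initial scale ~0.1
    def forward(self, x):
        return x + self.log_scale.exp() * torch.randn_like(x)

mapping = nn.Linear(DIM, DIM, bias=False)      # maps the source space into the target space
noise = NoiseFunction(DIM)
discriminator = nn.Sequential(                 # adversary: which language did a vector come from?
    nn.Linear(DIM, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1), nn.Sigmoid())
critic = nn.Sequential(                        # Wasserstein critic: original vs. noise-added embeddings
    nn.Linear(DIM, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))

opt_d = torch.optim.RMSprop(discriminator.parameters(), lr=1e-4)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=1e-4)
opt_g = torch.optim.RMSprop(list(mapping.parameters()) + list(noise.parameters()), lr=1e-4)
bce = nn.BCELoss()

def wasserstein_gap(real, noisy):
    """Critic's estimate of the Wasserstein-1 distance between the two batches."""
    return critic(real).mean() - critic(noisy).mean()

# One illustrative training step on random stand-in batches
# (real use would sample monolingual word embeddings instead).
src = torch.randn(32, DIM)
tgt = torch.randn(32, DIM)

# 1) Discriminator step: mapped noisy source labelled 0, genuine target labelled 1.
mapped = mapping(noise(src))
d_loss = (bce(discriminator(mapped.detach()), torch.zeros(32, 1))
          + bce(discriminator(tgt), torch.ones(32, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# 2) Critic step: maximise the Wasserstein gap; weight clipping keeps it roughly 1-Lipschitz.
c_loss = -wasserstein_gap(src, noise(src).detach())
opt_c.zero_grad()
c_loss.backward()
opt_c.step()
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)

# 3) Mapping/noise step: fool the discriminator while the critic keeps
#    noise-added embeddings semantically close to the originals.
noisy_src = noise(src)
g_loss = (bce(discriminator(mapping(noisy_src)), torch.ones(32, 1))
          + wasserstein_gap(src, noisy_src))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```

Weight clipping is used here only as one simple way to keep the critic approximately 1-Lipschitz; a gradient-penalty variant would serve the same purpose, and the paper's actual constraint and noise parameterisation may differ.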
Pages: 7666-7678 (13 pages)
Related Papers (50 items)
  • [21] A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
    Wei, Liangchen
    Deng, Zhi-Hong
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4165 - 4171
  • [22] Cross-Lingual Word Representations via Spectral Graph Embeddings
    Oshikiri, Takamasa
    Fukui, Kazuki
    Shimodaira, Hidetoshi
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2016), VOL 2, 2016, : 493 - 498
  • [23] A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages
    Khatri, Jyotsana
    Murthy, Rudra
    Bhattacharyya, Pushpak
    PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020, : 347 - 348
  • [24] A Closer Look on Unsupervised Cross-lingual Word Embeddings Mapping
    Plucinski, Kamil
    Lango, Mateusz
    Zimniewicz, Michal
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5555 - 5562
  • [25] Evaluating Sub-word embeddings in cross-lingual models
    Parizi, Ali Hakimi
    Cook, Paul
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2712 - 2719
  • [26] Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision
    Feng, Yanlin
    Wan, Xiaojun
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 420 - 429
  • [27] Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification
    Dong, Xin
    Zhu, Yaxin
    Zhang, Yupeng
    Fu, Zuohui
    Xu, Dongkuan
    Yang, Sen
    de Melo, Gerard
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1541 - 1544
  • [28] Non-Linearity in mapping based Cross-Lingual Word Embeddings
    Zhao, Jiawei
    Gilman, Andrew
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3583 - 3589
  • [29] Neural topic-enhanced cross-lingual word embeddings for CLIR
    Zhou, Dong
    Qu, Wei
    Li, Lin
    Tang, Mingdong
    Yang, Aimin
    INFORMATION SCIENCES, 2022, 608 : 809 - 824
  • [30] A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 789 - 798