Adversarial training with Wasserstein distance for learning cross-lingual word embeddings

Cited by: 5
Authors
Li, Yuling [1]
Zhang, Yuhong [1]
Yu, Kui [2]
Hu, Xuegang [2]
Affiliations
[1] Hefei Univ Technol, Hefei, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Cross-lingual word embeddings; Generative adversarial networks; Noise; NETWORKS; SPACE;
DOI
10.1007/s10489-020-02136-x
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent studies have learned cross-lingual word embeddings in a completely unsupervised manner through generative adversarial networks (GANs). These GAN-based methods approximately align two monolingual embedding spaces, but their performance on the embeddings of low-frequency words (LFEs) remains unsatisfactory. The existing solution is to set low sampling rates for LFEs based on word-frequency information. This solution has two shortcomings. First, it relies on word-frequency information that is not always available in real scenarios. Second, the uneven sampling may cause the models to overlook the distribution information of LFEs, which degrades their performance. In this study, we propose a novel unsupervised GAN-based method that effectively improves the quality of LFEs while circumventing both issues. Our method is based on the observation that LFEs tend to be densely clustered in the embedding space, and that fine-grained alignment through adversarial training is difficult to obtain among such dense embedding points. Building on this observation, we introduce a noise function that disperses the dense embedding points to a certain extent. In addition, we train a Wasserstein critic network to encourage the noise-added embeddings and the original embeddings to have similar semantics. We test our approach on two common evaluation tasks: bilingual lexicon induction and cross-lingual word similarity. Experimental results show that the proposed model performs better than or competitively with supervised and unsupervised baselines.
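The two ingredients named in the abstract, a noise function that disperses dense points and a Wasserstein critic that compares noisy and original embeddings, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the noise function or the critic architecture, so additive Gaussian noise, a fixed linear critic, and every function name below are assumptions chosen for illustration. The Lipschitz constraint that a real Wasserstein critic needs is omitted.

```python
import math
import random


def add_gaussian_noise(embeddings, sigma, rng):
    """Return a copy of each vector with i.i.d. Gaussian noise added.

    Assumption: the paper's actual noise function may differ.
    """
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in embeddings]


def mean_nearest_neighbor_distance(embeddings):
    """Average Euclidean distance from each point to its closest other point."""
    total = 0.0
    for i, a in enumerate(embeddings):
        total += min(math.dist(a, b) for j, b in enumerate(embeddings) if j != i)
    return total / len(embeddings)


def critic_gap(critic_weights, original, noisy):
    """Mean critic score on original minus mean critic score on noisy vectors.

    Uses a hypothetical linear critic f(x) = w . x. A Wasserstein critic is
    trained to maximize this gap; a small gap suggests the two sets look alike.
    """
    def score(vec):
        return sum(w * x for w, x in zip(critic_weights, vec))

    mean_original = sum(score(v) for v in original) / len(original)
    mean_noisy = sum(score(v) for v in noisy) / len(noisy)
    return mean_original - mean_noisy


rng = random.Random(0)
# A tight cluster standing in for densely packed low-frequency embeddings.
dense = [[rng.gauss(0.0, 0.01) for _ in range(8)] for _ in range(50)]
noisy = add_gaussian_noise(dense, sigma=0.1, rng=rng)

spread_before = mean_nearest_neighbor_distance(dense)
spread_after = mean_nearest_neighbor_distance(noisy)
gap = critic_gap([1.0] * 8, dense, noisy)
print(spread_before, spread_after, gap)
```

In this toy setting, the nearest-neighbor spread grows after noise is added, which is the dispersal effect the abstract describes. In an actual training loop, the critic weights would be learned rather than fixed.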
Pages: 7666-7678
Number of pages: 13