Adversarial training with Wasserstein distance for learning cross-lingual word embeddings

Cited by: 5
Authors
Li, Yuling [1 ]
Zhang, Yuhong [1 ]
Yu, Kui [2 ]
Hu, Xuegang [2 ]
Affiliations
[1] Hefei Univ Technol, Hefei, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
Funding
US National Science Foundation;
Keywords
Cross-lingual word embeddings; Generative adversarial networks; Noise; NETWORKS; SPACE;
DOI
10.1007/s10489-020-02136-x
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent studies have managed to learn cross-lingual word embeddings in a completely unsupervised manner through generative adversarial networks (GANs). These GAN-based methods can approximately align two monolingual embedding spaces, but their performance on the embeddings of low-frequency words (LFEs) remains unsatisfactory. The existing solution is to assign low sampling rates to LFEs based on word-frequency information. However, this solution has two shortcomings. First, it relies on word-frequency information, which is not always available in real scenarios. Second, the uneven sampling may cause the models to overlook the distributional information of LFEs, thereby hurting their performance. In this study, we propose a novel unsupervised GAN-based method that effectively improves the quality of LFEs while circumventing both issues. Our method is based on the observation that LFEs tend to be densely clustered in the embedding space, and that fine-grained alignment is difficult to obtain through adversarial training in such dense regions. Building on this idea, we introduce a noise function that disperses the dense embedding points to a certain extent. In addition, we train a Wasserstein critic network to encourage the noise-added embeddings and the original embeddings to have similar semantics. We evaluate our approach on two common tasks, namely bilingual lexicon induction and cross-lingual word similarity. Experimental results show that the proposed model achieves stronger or competitive performance compared with supervised and unsupervised baselines.
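The abstract names three components: an adversarially trained mapping between the two monolingual spaces, a noise function that disperses densely clustered low-frequency embeddings, and a Wasserstein critic that keeps noise-added embeddings semantically close to the originals. The PyTorch sketch below shows one way these pieces could fit together; it is not the authors' implementation, and the embedding dimension, network sizes, learnable noise scale, optimizer settings, weight-clipping constant, and names such as NoiseFunction and wasserstein_gap are assumptions made for illustration.

```python
import torch
import torch.nn as nn

DIM = 300  # embedding dimensionality (assumption: fastText-style vectors)

class NoiseFunction(nn.Module):
    """Disperses dense embedding points with Gaussian noise of learnable scale (hypothetical design)."""
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.full((dim,), -2.3))  # initial scale ~0.1
    def forward(self, x):
        return x + self.log_scale.exp() * torch.randn_like(x)

mapping = nn.Linear(DIM, DIM, bias=False)      # maps the source space into the target space
noise = NoiseFunction(DIM)
discriminator = nn.Sequential(                 # adversary: which language did a vector come from?
    nn.Linear(DIM, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1), nn.Sigmoid())
critic = nn.Sequential(                        # Wasserstein critic: original vs. noise-added embeddings
    nn.Linear(DIM, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))

opt_d = torch.optim.RMSprop(discriminator.parameters(), lr=1e-4)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=1e-4)
opt_g = torch.optim.RMSprop(list(mapping.parameters()) + list(noise.parameters()), lr=1e-4)
bce = nn.BCELoss()

def wasserstein_gap(real, noisy):
    """Critic's estimate of the Wasserstein-1 distance between the two batches."""
    return critic(real).mean() - critic(noisy).mean()

# One illustrative training step on random stand-in batches
# (real use would sample monolingual word embeddings instead).
src = torch.randn(32, DIM)
tgt = torch.randn(32, DIM)

# 1) Discriminator step: mapped noisy source labelled 0, genuine target labelled 1.
mapped = mapping(noise(src))
d_loss = (bce(discriminator(mapped.detach()), torch.zeros(32, 1))
          + bce(discriminator(tgt), torch.ones(32, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# 2) Critic step: maximise the Wasserstein gap; weight clipping keeps it roughly 1-Lipschitz.
c_loss = -wasserstein_gap(src, noise(src).detach())
opt_c.zero_grad()
c_loss.backward()
opt_c.step()
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)

# 3) Mapping/noise step: fool the discriminator while the critic keeps
#    noise-added embeddings semantically close to the originals.
noisy_src = noise(src)
g_loss = (bce(discriminator(mapping(noisy_src)), torch.ones(32, 1))
          + wasserstein_gap(src, noisy_src))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```

Weight clipping is used here only as one simple way to keep the critic approximately 1-Lipschitz; a gradient-penalty variant would serve the same purpose, and the paper's actual constraint and noise parameterisation may differ.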
Pages: 7666-7678 (13 pages)
Related Papers (50 items)
  • [21] A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
    Wei, Liangchen
    Deng, Zhi-Hong
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4165 - 4171
  • [22] Cross-Lingual Word Representations via Spectral Graph Embeddings
    Oshikiri, Takamasa
    Fukui, Kazuki
    Shimodaira, Hidetoshi
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2016), VOL 2, 2016, : 493 - 498
  • [23] A Study of Efficacy of Cross-lingual Word Embeddings for Indian Languages
    Khatri, Jyotsana
    Murthy, Rudra
    Bhattacharyya, Pushpak
    PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020, : 347 - 348
  • [24] A Closer Look on Unsupervised Cross-lingual Word Embeddings Mapping
    Plucinski, Kamil
    Lango, Mateusz
    Zimniewicz, Michal
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5555 - 5562
  • [25] Evaluating Sub-word embeddings in cross-lingual models
    Parizi, Ali Hakimi
    Cook, Paul
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2712 - 2719
  • [26] Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision
    Feng, Yanlin
    Wan, Xiaojun
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 420 - 429
  • [27] Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification
    Dong, Xin
    Zhu, Yaxin
    Zhang, Yupeng
    Fu, Zuohui
    Xu, Dongkuan
    Yang, Sen
    de Melo, Gerard
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 1541 - 1544
  • [28] Non-Linearity in mapping based Cross-Lingual Word Embeddings
    Zhao, Jiawei
    Gilman, Andrew
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3583 - 3589
  • [29] Neural topic-enhanced cross-lingual word embeddings for CLIR
    Zhou, Dong
    Qu, Wei
    Li, Lin
    Tang, Mingdong
    Yang, Aimin
    INFORMATION SCIENCES, 2022, 608 : 809 - 824
  • [30] A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 789 - 798