Evaluating Sub-word Embeddings in Cross-lingual Models

Authors
Parizi, Ali Hakimi [1]
Cook, Paul [1]
Affiliations
[1] Univ New Brunswick, Fredericton, NB, Canada
Keywords
Cross-lingual Word Embeddings; Low-resource Languages; Morphologically-rich Languages
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Cross-lingual word embeddings create a shared space for embeddings in two languages, and enable knowledge to be transferred between languages for tasks such as bilingual lexicon induction. One problem, however, is out-of-vocabulary (OOV) words, for which no embeddings are available. This is particularly problematic for low-resource and morphologically-rich languages, which often have relatively high OOV rates. Approaches to learning sub-word embeddings have been proposed to address the problem of OOV words, but most prior work has not considered sub-word embeddings in cross-lingual models. In this paper, we consider whether sub-word embeddings can be leveraged to form cross-lingual embeddings for OOV words. Specifically, we consider a novel bilingual lexicon induction task focused on OOV words, for language pairs covering several language families. Our results indicate that cross-lingual representations for OOV words can indeed be formed from sub-word embeddings, including in the case of a truly low-resource morphologically-rich language.
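The idea in the abstract, building a vector for an OOV word from sub-word units and then using it for bilingual lexicon induction, can be illustrated with a minimal sketch. The abstract does not specify the sub-word model or the cross-lingual mapping, so everything here is an assumption: character n-grams with boundary markers (in the style of fastText), averaging of n-gram vectors, a pre-learned linear map `mapping` from source to target space, and cosine nearest-neighbour retrieval. All function and parameter names are hypothetical.

```python
import numpy as np


def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of the word padded with '<' and '>' markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of the word's known n-grams (zero vector if none)."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


def induce_translation(word, src_ngram_vectors, mapping, tgt_words, tgt_matrix):
    """Map the composed source vector into the target space and return the
    target word with the highest cosine similarity.

    Assumption: `mapping` has shape (source_dim, target_dim) and was learned
    beforehand, for example from a seed dictionary of in-vocabulary words.
    """
    v = oov_vector(word, src_ngram_vectors, mapping.shape[0]) @ mapping
    norms = np.linalg.norm(tgt_matrix, axis=1) * np.linalg.norm(v)
    scores = (tgt_matrix @ v) / np.maximum(norms, 1e-12)
    return tgt_words[int(np.argmax(scores))]


# Toy data, invented purely for illustration.
toy_src_ngrams = {"cat": np.array([1.0, 0.0]), "ts>": np.array([1.0, 0.0])}
toy_mapping = np.eye(2)
toy_tgt_words = ["gatos", "perro"]
toy_tgt_matrix = np.array([[1.0, 0.1], [0.0, 1.0]])

print(induce_translation("cats", toy_src_ngrams, toy_mapping,
                         toy_tgt_words, toy_tgt_matrix))
```

In this toy setup the source word "cats" has no word-level vector, but two of its n-grams do, so a vector can still be composed and matched against the target vocabulary. Averaging is only one possible composition; the paper may use a different one.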
Pages: 2712-2719
Page count: 8