Evaluating Sub-word Embeddings in Cross-lingual Models

Authors
Parizi, Ali Hakimi [1]
Cook, Paul [1]
Affiliations
[1] Univ New Brunswick, Fredericton, NB, Canada
Keywords
Cross-lingual Word Embeddings; Low-resource Languages; Morphologically-rich Languages
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Cross-lingual word embeddings create a shared space for embeddings in two languages, and enable knowledge to be transferred between languages for tasks such as bilingual lexicon induction. One problem, however, is out-of-vocabulary (OOV) words, for which no embeddings are available. This is particularly problematic for low-resource and morphologically-rich languages, which often have relatively high OOV rates. Approaches to learning sub-word embeddings have been proposed to address the problem of OOV words, but most prior work has not considered sub-word embeddings in cross-lingual models. In this paper, we consider whether sub-word embeddings can be leveraged to form cross-lingual embeddings for OOV words. Specifically, we consider a novel bilingual lexicon induction task focused on OOV words, for language pairs covering several language families. Our results indicate that cross-lingual representations for OOV words can indeed be formed from sub-word embeddings, including in the case of a truly low-resource morphologically-rich language.
Pages: 2712-2719
Number of pages: 8