Learning Tibetan-Chinese cross-lingual word embeddings

Cited by: 1
Authors
Ma, Wei [1 ]
Yu, Hongzhi [1 ]
Zhao, Kun [1 ]
Zhao, Deshun [1 ]
Affiliations
[1] Northwest Minzu University, Key Laboratory of China's Ethnic Languages and Information Technology, Ministry of Education, Lanzhou, People's Republic of China
Keywords
word vectors; cross-lingual; fastText; CCA
DOI
10.1109/SKG49510.2019.00017
Chinese Library Classification: TP3 [computing technology; computer technology]
Subject classification code: 0812
Abstract
The idea of word embeddings rests on the distributional hypothesis of the linguist Harris (1954), which holds that words with similar meanings occur in similar contexts. Learning vector-space word embeddings is a technique of central importance in natural language processing. In recent years, cross-lingual word vectors have received increasing attention: they enable knowledge transfer between languages, and, most importantly, this transfer can take place between resource-rich and low-resource languages. This paper trains monolingual word vectors on Tibetan and Chinese Wikipedia corpora, mainly using the fastText training method, and then aligns the two monolingual embedding spaces with canonical correlation analysis (CCA) to obtain Tibetan-Chinese cross-lingual word vectors. In our experiments, we evaluated the resulting word representations on standard lexical semantic evaluation tasks; the results show that this method yields a measurable improvement in the semantic quality of the word vectors.
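The CCA alignment step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it uses synthetic vectors in place of fastText embeddings of Tibetan-Chinese dictionary word pairs, and the dimensionalities and pair counts are arbitrary. A standard SVD-based CCA finds linear projections of the two spaces into a shared space where paired vectors are maximally correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for fastText vectors of translation word pairs:
# row i of X and row i of Y would be the embeddings of a Tibetan word
# and its Chinese translation. Here Y is a noisy linear map of X.
n_pairs, d = 200, 50
X = rng.standard_normal((n_pairs, d))                      # "Tibetan" side
Y = 0.5 * (X @ rng.standard_normal((d, d)))                # "Chinese" side
Y += 0.1 * rng.standard_normal((n_pairs, d))               # alignment noise

def cca_projections(X, Y, k):
    """Return projection matrices (Wx, Wy) mapping both embedding
    spaces into a shared k-dimensional CCA space."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Whiten each view via its thin SVD: Xc = Ux Sx Vxt, so Ux is
    # the whitened version of Xc (orthonormal columns).
    Ux, Sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, Sy, Vyt = np.linalg.svd(Yc, full_matrices=False)
    # SVD of the cross-correlation of the whitened views gives the
    # canonical directions; singular values are canonical correlations.
    U, S, Vt = np.linalg.svd(Ux.T @ Uy)
    Wx = Vxt.T @ np.diag(1.0 / Sx) @ U[:, :k]
    Wy = Vyt.T @ np.diag(1.0 / Sy) @ Vt.T[:, :k]
    return Wx, Wy

Wx, Wy = cca_projections(X, Y, k=40)
Zx = (X - X.mean(axis=0)) @ Wx
Zy = (Y - Y.mean(axis=0)) @ Wy
# After projection, paired vectors should be strongly correlated
# along each shared dimension.
corr = np.mean([np.corrcoef(Zx[:, i], Zy[:, i])[0, 1] for i in range(40)])
```

In the paper's actual setting, `X` and `Y` would be the trained monolingual fastText vectors looked up for a Tibetan-Chinese seed dictionary, and `Wx`, `Wy` would then be applied to the full vocabularies to obtain the cross-lingual space.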
Pages: 49-53 (5 pages)