Learning Tibetan-Chinese cross-lingual word embeddings

Cited by: 1
Authors
Ma, Wei [1 ]
Yu, Hongzhi [1 ]
Zhao, Kun [1 ]
Zhao, Deshun [1 ]
Affiliations
[1] Northwest Minzu Univ Lanzhou, Key Lab Chinas Ethn Languages & Informat Technol, Minist Educ, Lanzhou, Peoples R China
Keywords
word vectors; cross-lingual; fastText; CCA;
DOI
10.1109/SKG49510.2019.00017
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
The idea of word embedding is based on the distributional hypothesis of the linguist Harris (1954), which holds that words with similar meanings occur in similar contexts. Learning vector-space word embeddings is a technique of central importance in natural language processing. In recent years, cross-lingual word vectors have received increasing attention. Cross-lingual word vectors enable knowledge transfer between languages; most importantly, this transfer can take place between resource-rich and low-resource languages. This paper uses Tibetan and Chinese Wikipedia corpora to train monolingual word vectors, mainly with the fastText training method, and then relates the two monolingual vector spaces by canonical correlation analysis (CCA) to obtain Tibetan-Chinese cross-lingual word vectors. In experiments, we evaluated the resulting word representations on standard lexical semantic evaluation tasks, and the results show that this method yields a measurable improvement in the semantic quality of the word vectors.
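The pipeline described in the abstract, training monolingual fastText vectors and then aligning the two spaces with CCA, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes you already have two matrices `X` and `Y` of monolingual Tibetan and Chinese vectors whose rows correspond to translation pairs from a seed dictionary, and it implements CCA directly with NumPy.

```python
import numpy as np

def cca_align(X, Y, dim):
    """Project paired embedding matrices X (n x d1) and Y (n x d2),
    one translation pair per row, into a shared dim-dimensional space
    via canonical correlation analysis (CCA)."""
    # Center each view so covariances equal cross-products.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    def _whiten(M):
        # Economy SVD: M = U diag(s) Vt. U has orthonormal columns, so U
        # is a whitened copy of M, and V diag(1/s) is the whitening map.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U, Vt.T / s

    Xw, Wx = _whiten(Xc)
    Yw, Wy = _whiten(Yc)
    # SVD of the cross-covariance of the whitened views yields the
    # canonical directions; its singular values are the correlations.
    U, corr, Vt = np.linalg.svd(Xw.T @ Yw)
    A = Wx @ U[:, :dim]      # projection for the Tibetan view
    B = Wy @ Vt.T[:, :dim]   # projection for the Chinese view
    return Xc @ A, Yc @ B, corr[:dim]
```

After alignment, the projections `A` and `B` can be applied to the full monolingual vocabularies, so that nearest-neighbor search in the shared space retrieves cross-lingual translations beyond the seed dictionary.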
Pages: 49-53
Page count: 5
Related papers
50 records in total
  • [21] A Closer Look on Unsupervised Cross-lingual Word Embeddings Mapping
    Plucinski, Kamil
    Lango, Mateusz
    Zimniewicz, Michal
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5555 - 5562
  • [22] Evaluating Sub-word embeddings in cross-lingual models
    Parizi, Ali Hakimi
    Cook, Paul
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2712 - 2719
  • [23] Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision
    Feng, Yanlin
    Wan, Xiaojun
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 420 - 429
  • [24] Non-Linearity in mapping based Cross-Lingual Word Embeddings
    Zhao, Jiawei
    Gilman, Andrew
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3583 - 3589
  • [25] Neural topic-enhanced cross-lingual word embeddings for CLIR
    Zhou, Dong
    Qu, Wei
    Li, Lin
    Tang, Mingdong
    Yang, Aimin
    INFORMATION SCIENCES, 2022, 608 : 809 - 824
  • [26] A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
    Artetxe, Mikel
    Labaka, Gorka
    Agirre, Eneko
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 789 - 798
  • [27] Weakly-Supervised Concept-based Adversarial Learning for Cross-lingual Word Embeddings
    Wang, Haozhou
    Henderson, James
    Merlo, Paola
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 4419 - 4430
  • [28] Cross-Lingual Word Embeddings for Low-Resource Language Modeling
    Adams, Oliver
    Makarucha, Adam
    Neubig, Graham
    Bird, Steven
    Cohn, Trevor
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 937 - 947
  • [29] Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings
    Otani, Naoki
    Ozaki, Satoru
    Zhao, Xingyuan
    Li, Yucen
    St Johns, Micaelah
    Levin, Lori
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 4451 - 4464
  • [30] Exploiting Common Characters in Chinese and Japanese to Learn Cross-lingual Word Embeddings via Matrix Factorization
    Wang, Jilei
    Luo, Shiying
    Shi, Weiyan
    Dai, Tao
    Xia, Shu-Tao
    REPRESENTATION LEARNING FOR NLP, 2018, : 113 - 121