Learning Tibetan-Chinese cross-lingual word embeddings

Cited by: 1
Authors
Ma, Wei [1 ]
Yu, Hongzhi [1 ]
Zhao, Kun [1 ]
Zhao, Deshun [1 ]
Affiliations
[1] Northwest Minzu University, Key Laboratory of China's Ethnic Languages and Information Technology, Ministry of Education, Lanzhou, People's Republic of China
Keywords
word vectors; cross-lingual; fastText; CCA;
DOI
10.1109/SKG49510.2019.00017
CLC Number
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The idea of word embedding is based on the distributional hypothesis of the linguist Harris (1954), which holds that words with similar meanings occur in similar contexts. Learning vector-space word embeddings is a technique of central importance in natural language processing. In recent years, cross-lingual word vectors have received increasing attention. Cross-lingual word vectors enable knowledge transfer between different languages; most importantly, this transfer can take place between resource-rich and low-resource languages. This paper uses Tibetan and Chinese Wikipedia corpora to train monolingual word vectors with the fastText method, and then aligns the two monolingual embedding spaces via canonical correlation analysis (CCA), thus obtaining Tibetan-Chinese cross-lingual word vectors. In our experiments, we evaluated the resulting word representations on standard lexical semantic evaluation tasks, and the results show that this method improves the semantic representation of the word vectors.
Pages: 49-53 (5 pages)