Learning Tibetan-Chinese cross-lingual word embeddings

Cited by: 1
Authors
Ma, Wei [1 ]
Yu, Hongzhi [1 ]
Zhao, Kun [1 ]
Zhao, Deshun [1 ]
Affiliation
[1] Northwest Minzu University, Key Laboratory of China's Ethnic Languages and Information Technology, Ministry of Education, Lanzhou, People's Republic of China
Keywords
word vectors; cross-lingual; fastText; CCA;
DOI: 10.1109/SKG49510.2019.00017
CLC number: TP3 [computing technology; computer technology]
Subject classification code: 0812
Abstract
Word embedding is rooted in the distributional hypothesis of the linguist Harris (1954), which holds that words with similar meanings occur in similar contexts. Learning vector-space word embeddings is a technique of central importance in natural language processing, and in recent years cross-lingual word vectors have received increasing attention. Cross-lingual word vectors enable knowledge transfer between languages; most importantly, this transfer can take place between resource-rich and low-resource languages. This paper trains monolingual word vectors on Tibetan and Chinese Wikipedia corpora using the fastText method, and then aligns the two monolingual spaces with canonical correlation analysis (CCA) to obtain Tibetan-Chinese cross-lingual word vectors. In our experiments, we evaluate the resulting word representations on standard lexical semantic evaluation tasks; the results show that this method yields a measurable improvement in the semantic quality of the word vectors.
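The alignment step the abstract describes, projecting two independently trained monolingual embedding spaces into a shared space via CCA, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the toy "Tibetan" and "Chinese" matrices, variable names, and dimensions are all assumptions for the example, and the rows are taken to be vectors of translation pairs from a seed bilingual dictionary.

```python
import numpy as np

def cca_align(X, Y, dim):
    """Learn CCA projections mapping two embedding spaces (rows are
    vectors of dictionary translation pairs) into a shared `dim`-d space."""
    Xc = X - X.mean(axis=0)  # center each space
    Yc = Y - Y.mean(axis=0)
    # Thin SVDs whiten each space: Xc = Ux @ diag(Sx) @ Vxt, so the
    # whitened representation Xc @ Vxt.T @ diag(1/Sx) equals Ux.
    Ux, Sx, Vxt = np.linalg.svd(Xc, full_matrices=False)
    Uy, Sy, Vyt = np.linalg.svd(Yc, full_matrices=False)
    # SVD of the cross-correlation of the whitened bases gives the
    # canonical directions; its singular values are the canonical correlations.
    U, corrs, Vt = np.linalg.svd(Ux.T @ Uy)
    # Projection matrices from each original space into the shared CCA space.
    Wx = Vxt.T @ np.diag(1.0 / Sx) @ U[:, :dim]
    Wy = Vyt.T @ np.diag(1.0 / Sy) @ Vt.T[:, :dim]
    return Wx, Wy

# Toy demonstration: the "Chinese" space is an exact rotation of the
# "Tibetan" space, so CCA should recover a near-perfect alignment.
rng = np.random.default_rng(0)
tib = rng.standard_normal((200, 10))                  # stand-in Tibetan vectors
rot, _ = np.linalg.qr(rng.standard_normal((10, 10)))  # random orthogonal map
zh = tib @ rot                                        # stand-in Chinese vectors
Wx, Wy = cca_align(tib, zh, dim=5)
proj_tib = (tib - tib.mean(axis=0)) @ Wx
proj_zh = (zh - zh.mean(axis=0)) @ Wy
```

In practice one would load the fastText vectors trained on each Wikipedia corpus (e.g. with gensim) and build `X` and `Y` from the vectors of a Tibetan-Chinese seed dictionary; nearest-neighbor search in the shared space then retrieves cross-lingual translations.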
Pages: 49-53
Page count: 5