Learning Bilingual Lexicon for Low-Resource Language Pairs

Authors
Zhu, ShaoLin [1, 2, 3]
Li, Xiao [1, 2]
Yang, YaTing [1, 2]
Wang, Lei [1, 2]
Mi, ChengGang [1, 2]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
[2] Key Lab Speech Language Informat Proc Xinjiang, Urumqi, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
West Light Foundation of the Chinese Academy of Sciences
DOI
10.1007/978-3-319-73618-1_66
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning bilingual lexicons from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for obtaining a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon and uses Canonical Correlation Analysis (CCA) to construct a shared latent space that explains how the two monolingual embedding spaces are linked. Experimental results show that a bilingual lexicon of considerable precision and size can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.
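
The abstract's core idea, fitting CCA on a small seed lexicon and then comparing words in the shared latent space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`inv_sqrt`, `fit_cca`, `induce_lexicon`, `demo_accuracy`), the regularization term, the synthetic embeddings, and the cosine nearest-neighbour retrieval step are all assumptions made for illustration, since the abstract does not specify how lexicon entries are extracted from the latent space.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(x_seed, y_seed, k, reg=1e-6):
    """Fit CCA on row-aligned seed embeddings.

    Returns the two means and the two projection matrices (k components).
    `reg` is a small ridge term (an assumption) that keeps the covariances
    invertible when the seed lexicon is small.
    """
    mean_x, mean_y = x_seed.mean(axis=0), y_seed.mean(axis=0)
    xc, yc = x_seed - mean_x, y_seed - mean_y
    n = xc.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(xc.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(yc.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # SVD of the whitened cross-covariance gives the canonical directions.
    u, _, vt = np.linalg.svd(wx @ cxy @ wy)
    return mean_x, mean_y, wx @ u[:, :k], wy @ vt.T[:, :k]


def induce_lexicon(src_emb, tgt_emb, seed_pairs, k):
    """For every source word, return the index of its nearest target word
    (by cosine similarity) in the shared CCA space."""
    src_idx = [s for s, _ in seed_pairs]
    tgt_idx = [t for _, t in seed_pairs]
    mean_x, mean_y, a, b = fit_cca(src_emb[src_idx], tgt_emb[tgt_idx], k)
    p = (src_emb - mean_x) @ a
    q = (tgt_emb - mean_y) @ b
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    return (p @ q.T).argmax(axis=1)


def demo_accuracy(n_words=200, dim=5, n_seed=50, noise=0.01, rng_seed=0):
    """Toy check on synthetic data: both 'languages' are noisy rotations of
    the same latent vectors, and word i translates to word i."""
    rng = np.random.default_rng(rng_seed)
    latent = rng.standard_normal((n_words, dim))
    rot_x = np.linalg.qr(rng.standard_normal((dim, dim)))[0]
    rot_y = np.linalg.qr(rng.standard_normal((dim, dim)))[0]
    src = latent @ rot_x + noise * rng.standard_normal((n_words, dim))
    tgt = latent @ rot_y + noise * rng.standard_normal((n_words, dim))
    seed_pairs = [(i, i) for i in range(n_seed)]
    pred = induce_lexicon(src, tgt, seed_pairs, k=dim)
    held_out = np.arange(n_seed, n_words)  # words not in the seed lexicon
    return float((pred[held_out] == held_out).mean())


if __name__ == "__main__":
    print("held-out translation accuracy:", demo_accuracy())
```

Running the script prints the fraction of held-out toy words whose nearest neighbour in the shared space is the correct translation. With real monolingual embeddings, the rows of `src_emb` and `tgt_emb` would come from separately trained embedding models, and `seed_pairs` would hold the vocabulary indices of the seed lexicon entries.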
Pages: 760-770
Page count: 11