Learning Bilingual Lexicon for Low-Resource Language Pairs

Authors
Zhu, ShaoLin [1, 2, 3]
Li, Xiao [1, 2]
Yang, YaTing [1, 2]
Wang, Lei [1, 2]
Mi, ChengGang [1, 2]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
[2] Key Lab Speech Language Informat Proc Xinjiang, Urumqi, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
West Light Foundation of the Chinese Academy of Sciences
DOI
10.1007/978-3-319-73618-1_66
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning a bilingual lexicon from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for obtaining a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon, and we use Canonical Correlation Analysis to construct a shared latent space that explains how the two monolingual embedding spaces are linked. Experimental results show that a bilingual lexicon of considerable precision and size can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.
Pages: 760-770
Number of pages: 11
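
The idea summarised in the abstract (fit Canonical Correlation Analysis on a small seed lexicon, then compare words in the shared latent space) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are synthetic, the function names `fit_cca` and `induce_lexicon` are hypothetical, and cosine nearest-neighbour retrieval is assumed because the abstract does not say how translation candidates are selected.

```python
import numpy as np

def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(x, y, k, reg=1e-6):
    """Fit CCA on row-aligned seed embeddings x (n, dx) and y (n, dy).

    Returns the two means, the two projection matrices onto the top-k
    canonical directions, and the k canonical correlations.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    # Regularised covariance matrices (reg keeps them invertible).
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # SVD of the whitened cross-covariance gives the canonical directions.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    return mx, my, wx @ u[:, :k], wy @ vt.T[:, :k], s[:k]

def induce_lexicon(src, tgt, model):
    """Index of the nearest target row (cosine) for every source row."""
    mx, my, a, b, _ = model
    ps = (src - mx) @ a
    pt = (tgt - my) @ b
    ps /= np.linalg.norm(ps, axis=1, keepdims=True)
    pt /= np.linalg.norm(pt, axis=1, keepdims=True)
    return np.argmax(ps @ pt.T, axis=1)

# Synthetic stand-in for two monolingual embedding spaces: both are noisy
# linear views of the same hidden 5-dimensional "meaning" vectors, and
# row i of src is the translation of row i of tgt.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
src = latent @ rng.normal(size=(5, 8)) + 0.01 * rng.normal(size=(200, 8))
tgt = latent @ rng.normal(size=(5, 6)) + 0.01 * rng.normal(size=(200, 6))

# Only the first 50 word pairs are treated as the seed lexicon.
model = fit_cca(src[:50], tgt[:50], k=5)
predicted = induce_lexicon(src, tgt, model)
accuracy = float(np.mean(predicted == np.arange(200)))
print(f"top-1 accuracy on synthetic data: {accuracy:.2f}")
```

With real data, `src` and `tgt` would be monolingual word embeddings trained separately for each language, and the seed rows would come from the small seed bilingual lexicon.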