Learning Bilingual Lexicon for Low-Resource Language Pairs

Authors
Zhu, ShaoLin [1 ,2 ,3 ]
Li, Xiao [1 ,2 ]
Yang, YaTing [1 ,2 ]
Wang, Lei [1 ,2 ]
Mi, ChengGang [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi, Peoples R China
[2] Key Lab Speech Language Informat Proc Xinjiang, Urumqi, Peoples R China
[3] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
West Light Foundation of the Chinese Academy of Sciences;
DOI
10.1007/978-3-319-73618-1_66
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a bilingual lexicon from monolingual data is a novel idea in natural language processing that can benefit many low-resource language pairs. In this paper, we present an approach for obtaining a bilingual lexicon from monolingual data. Our method requires only a small seed bilingual lexicon and uses Canonical Correlation Analysis (CCA) to construct a shared latent space that explains how the two monolingual embeddings are linked. Experimental results show that a bilingual lexicon of considerable precision and size can be learned from Chinese-Uyghur and Chinese-Kazakh monolingual data.
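The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: fit CCA on the embedding pairs of a seed lexicon, project both monolingual spaces into the shared latent space, and match words by cosine similarity. The function names (`fit_cca`, `induce_lexicon`), the regularization value, and the toy data (a target space that is a rotated copy of the source space) are all assumptions made for illustration, not the authors' method or data. The sketch assumes NumPy is available.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def fit_cca(X, Y, k, reg=1e-8):
    """Fit CCA on seed pairs: row i of X and row i of Y are translations.

    Returns the two means and the projection matrices A, B into a
    k-dimensional shared space. `reg` is a small ridge term (an assumption)
    that keeps the covariance matrices invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the
    # canonical directions.
    U, _, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return mx, my, Wx @ U[:, :k], Wy @ Vt.T[:, :k]


def induce_lexicon(src, tgt, model):
    """For each source vector, return the index of the nearest target
    vector by cosine similarity in the shared space."""
    mx, my, A, B = model
    P, Q = (src - mx) @ A, (tgt - my) @ B
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    Q /= np.linalg.norm(Q, axis=1, keepdims=True)
    return (P @ Q.T).argmax(axis=1)


# Toy data (hypothetical): 70 "words" with 4-dimensional embeddings; the
# target space is a rotated copy of the source space.
rng = np.random.default_rng(0)
d = 4
src_all = rng.normal(size=(70, d))
rotation = np.linalg.qr(rng.normal(size=(d, d)))[0]
tgt_all = src_all @ rotation

# The first 50 pairs play the role of the seed lexicon.
model = fit_cca(src_all[:50], tgt_all[:50], k=d)

# The remaining 20 words are held out; shuffle the targets so that the
# correct match is not simply the same row index.
perm = rng.permutation(20)
src_test, tgt_test = src_all[50:], tgt_all[50:][perm]
pred = induce_lexicon(src_test, tgt_test, model)
accuracy = float((perm[pred] == np.arange(20)).mean())
```

Here `accuracy` is the fraction of held-out source words whose nearest target neighbour is the true translation. On real embeddings the two spaces are not exact linear images of each other, so retrieval would be noisier than in this toy setting.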
Pages: 760-770
Number of pages: 11