Enhancing distant low-resource neural machine translation with semantic pivot

Authors
Zhu, Enchang [1, 2]
Huang, Yuxin [2 ]
Xian, Yantuan [2 ]
Zhu, Junguo [1 ]
Gao, Minghu [1 ]
Yu, Zhiqiang [1 ]
Affiliations
[1] Yunnan Minzu Univ, Sch Math & Comp Sci, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Machine translation; Chinese-Lao; Pivot; Adapter; Similar linguistic feature;
DOI
10.1016/j.aej.2024.12.073
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
Prior work has shown that pivot-based methods can boost the performance of neural machine translation (NMT). However, in low-resource scenarios, the effectiveness of pivot-based methods is severely impaired by data sparsity. As a typical low-resource language pair, Chinese-Lao faces the same performance problem in NMT. In addition, because of the significant linguistic gap between Chinese and Lao, some traditional and effective low-resource translation methods, such as introducing external similarity knowledge, sharing a word space, and literal translation, are not suitable for this language pair. Fortunately, the pair is well suited to a pivot strategy, because there is a pivot language, Thai, that is highly similar to the target language, Lao. Here, we propose a novel approach for incorporating the linguistic features shared by Thai and Lao into the Chinese-Lao translation model. First, we conduct an in-depth analysis of the linguistic similarity between Thai and Lao. Second, we apply a pivot-based translation framework with a KL adapter. Experiments on the Chinese-Lao translation task show that our approach helps transfer more linguistic knowledge from the Chinese encoder to the Lao decoder via similar linguistic features, achieving substantial improvements over the baseline models.
Pages: 633-643
Number of pages: 11