Improving Multilingual Neural Machine Translation System for Indic Languages

Cited by: 3
Authors
Das, Sudhansu Bala [1]
Biradar, Atharv [2]
Mishra, Tapas Kumar [1]
Patra, Bidyut Kr. [3]
Affiliations
[1] Natl Inst Technol NIT, Rourkela 769008, Odisha, India
[2] Pune Inst Comp Technol PICT, Pune, Maharashtra, India
[3] Indian Inst Technol IIT, Varanasi, Uttar Pradesh, India
Keywords
Multilingual neural machine translation system (MNMT); Indic languages (ILs); low-resource language; corpus; BLEU score
DOI
10.1145/3587932
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A machine translation system (MTS) serves as an effective communication tool by translating text or speech from one language to another. Recently, neural machine translation (NMT) has become popular for its performance and cost-effectiveness. However, NMT systems are limited in translating low-resource languages because a large quantity of data is required to learn useful mappings across languages. The need for an efficient translation system is evident in a large multilingual environment like India. Indian languages (ILs) are still treated as low-resource languages because of the unavailability of corpora. To address this asymmetry, the multilingual neural machine translation (MNMT) system has emerged as a well-suited approach. An MNMT system translates many languages using a single model, which simplifies training and lowers online maintenance costs. It also helps improve low-resource translation. In this article, we propose an MNMT system to address the issues related to low-resource language translation. Our model comprises two MNMT systems, one for English-Indic (one-to-many) and one for Indic-English (many-to-one), with a shared encoder-decoder covering 15 language pairs (30 translation directions). Since most IL pairs have only a scant amount of parallel corpora, insufficient for training any machine translation model, we explore various augmentation strategies to improve overall translation quality with the proposed model. A state-of-the-art transformer architecture is used to realize the proposed model. In addition, the article examines the use of language relationships (in terms of dialect, script, etc.), particularly the role of high-resource languages of the same family in boosting the performance of low-resource languages. Moreover, the experimental results show the advantage of back-translation and domain adaptation for ILs in enhancing the translation quality of both source and target languages. Using all these key approaches, our proposed model proves more efficient than the baseline model in terms of the evaluation metric, i.e., the BLEU (BiLingual Evaluation Understudy) score, for a set of ILs.
Pages: 24