Morphologically Motivated Input Variations and Data Augmentation in Turkish-English Neural Machine Translation

Cited by: 0
Authors
Yirmibesoglu, Zeynep [1]
Gungor, Tunga [1]
Affiliations
[1] Bogazici Univ, Comp Engn, Istanbul, Turkiye
Keywords
Neural machine translation; morphology; low-resource; Transformer; encoder-decoder; attention; data augmentation; word segmentation
DOI
10.1145/3571073
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The success of neural networks in natural language processing has paved the way for neural machine translation (NMT), which rapidly became the mainstream approach in machine translation. Significant improvements in translation performance have been achieved with breakthroughs such as encoder-decoder networks, the attention mechanism, and the Transformer architecture. However, the need for large amounts of parallel data to train an NMT system and the presence of rare words in translation corpora are issues yet to be overcome. In this article, we approach NMT of the low-resource Turkish-English language pair. We employ state-of-the-art NMT architectures and data augmentation methods that exploit monolingual corpora. We point out the importance of input representation for the morphologically rich Turkish language and make a comprehensive analysis of linguistically and non-linguistically motivated input segmentation approaches. We prove the effectiveness of morphologically motivated input segmentation for the Turkish language. Moreover, we show the superiority of the Transformer architecture over attentional encoder-decoder models for the Turkish-English language pair. Among the employed data augmentation approaches, we observe back-translation to be the most effective and confirm the benefit of increasing the amount of parallel data on translation quality. This research presents a comprehensive analysis of NMT architectures with different hyperparameters, data augmentation methods, and input representation techniques, and proposes ways of tackling the low-resource setting of Turkish-English NMT.
Pages: 31