Morphologically Motivated Input Variations and Data Augmentation in Turkish-English Neural Machine Translation

Authors
Yirmibesoglu, Zeynep [1]
Gungor, Tunga [1]
Affiliation
[1] Bogazici University, Department of Computer Engineering, Istanbul, Turkiye
Keywords
Neural machine translation; morphology; low-resource; Transformer; encoder-decoder; attention; data augmentation; word segmentation
DOI
10.1145/3571073
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The success of neural networks in natural language processing has paved the way for neural machine translation (NMT), which rapidly became the mainstream approach in machine translation. Significant improvements in translation performance have been achieved through breakthroughs such as encoder-decoder networks, the attention mechanism, and the Transformer architecture. However, the need for large amounts of parallel data to train an NMT system and the presence of rare words in translation corpora remain open issues. In this article, we address NMT for the low-resource Turkish-English language pair. We employ state-of-the-art NMT architectures and data augmentation methods that exploit monolingual corpora. We point out the importance of input representation for the morphologically rich Turkish language and make a comprehensive analysis of linguistically and non-linguistically motivated input segmentation approaches. We demonstrate the effectiveness of morphologically motivated input segmentation for Turkish. Moreover, we show the superiority of the Transformer architecture over attentional encoder-decoder models for the Turkish-English language pair. Among the data augmentation approaches employed, we find back-translation to be the most effective, and we confirm that increasing the amount of parallel data improves translation quality. This research presents a comprehensive analysis of NMT architectures with different hyperparameters, data augmentation methods, and input representation techniques, and proposes ways of tackling the low-resource setting of Turkish-English NMT.
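The abstract contrasts linguistically motivated (morphological) and non-linguistically motivated input segmentation. As an illustration of the non-linguistic kind only, the following is a minimal Python sketch of byte-pair encoding (BPE) merge learning, the frequency-based subword procedure commonly used in NMT. It is not the authors' pipeline: the function names (`learn_bpe`, `apply_bpe`, and helpers) and the toy Turkish word counts are hypothetical. A morphologically motivated segmenter would instead split a word such as *evlerde* ("in the houses") at morpheme boundaries (*ev+ler+de*) using a morphological analyzer, which this sketch does not attempt.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Fuse every standalone occurrence of `pair` into a single symbol."""
    bigram = re.escape(" ".join(pair))
    # Lookarounds ensure we only match whole symbols, not pieces of longer ones.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} dict."""
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


def apply_bpe(word, merges):
    """Segment an unseen word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    # Hypothetical toy counts for forms of "ev" (house); not data from the article.
    corpus = {"ev": 5, "evler": 3, "evlerde": 2, "evlerimiz": 1}
    merges, _ = learn_bpe(corpus, num_merges=6)
    print(merges)
    print(apply_bpe("evlerimizde", merges))
```

Because merges are chosen purely by pair frequency, the learned units need not coincide with Turkish morpheme boundaries; that mismatch is the kind of difference a comparison of linguistically and non-linguistically motivated segmentation probes.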
Pages: 31