DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation

Cited by: 0
Authors
Park, Cheonbok [1]
Kim, Hantae [1]
Calapodescu, Ioan [2]
Cho, Hyunchang [1]
Nikoulina, Vassilina [2]
Affiliations
[1] NAVER Corp, Papago, Seongnam Si, South Korea
[2] NAVER LABS Europe, Meylan, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Domain Adaptation (DA) of a Neural Machine Translation (NMT) model often relies on a pretrained general-domain NMT model, which is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the number of parallel samples it would require. Such an estimate is, however, desirable: it could help MT practitioners make an informed decision before investing resources in dataset creation. We propose a Domain Adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance from in-domain monolingual samples in the source language. Our model relies on NMT encoder representations combined with various instance- and corpus-level features. We demonstrate that the instance-level framework is better able to distinguish between different domains than the corpus-level frameworks proposed in previous studies (Xia et al., 2020; Kolachina et al., 2012). Finally, we perform in-depth analyses of the results, highlighting the limitations of our approach, and provide directions for future research.
Pages: 1789-1807
Number of pages: 19