LegoNN: Building Modular Encoder-Decoder Models

Cited by: 3
Authors
Dalmia, Siddharth [1]
Okhonko, Dmytro [4]
Lewis, Mike [2]
Edunov, Sergey [2]
Watanabe, Shinji [1]
Metze, Florian [1,2]
Zettlemoyer, Luke [2]
Mohamed, Abdelrahman [3]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Meta Platforms Inc, Menlo Pk, CA 94025 USA
[3] Rembrand Inc, Palo Alto, CA 94062 USA
[4] Samaya AI, Mountain View, CA 94040 USA
Keywords
End-to-end; encoder-decoder models; modularity; speech recognition; machine translation; transformer
DOI: 10.1109/TASLP.2023.3296019
CLC classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract
State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g., a high-resource decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without any fine-tuning. To achieve this reusability, the interface between the encoder and decoder modules is grounded in a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals: one is differentiable, allowing gradients to flow across the entire network; the other is gradient-isolating. To enable the portability of decoder modules between MT tasks with different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder with a length-control mechanism that dynamically adapts the encoder's output length to match the input length range expected by pre-trained decoders. We present several experiments to demonstrate the effectiveness of LegoNN models: a trained language-generation LegoNN decoder module from the German-English (De-En) MT task can be reused without any fine-tuning for the Europarl English ASR and the Romanian-English (Ro-En) MT tasks, matching or beating the performance of the baseline. After fine-tuning, LegoNN models improve the Ro-En MT task by 1.5 BLEU points and achieve a 12.5% relative WER reduction on the Europarl ASR task. To show how the approach generalizes, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on a different dataset, achieving an overall WER reduction of 19.5%.
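The abstract's key idea is that encoder and decoder communicate only through per-position marginal distributions over a shared discrete vocabulary, which the decoder can ingest as an expected (probability-weighted) embedding in the differentiable variant. A minimal NumPy sketch of that interface, with toy dimensions and variable names that are illustrative assumptions rather than the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy sizes (hypothetical): T output positions, V vocabulary items, D embedding dims.
T, V, D = 4, 10, 8
rng = np.random.default_rng(0)

# Encoder side: produce logits over the shared discrete vocabulary
# and normalize them into per-position marginal distributions.
encoder_logits = rng.normal(size=(T, V))
marginals = softmax(encoder_logits)          # shape (T, V); each row sums to 1

# Decoder side (differentiable ingestion): consume the marginals as a
# probability-weighted sum of vocabulary embeddings, so gradients can
# flow back through the interface into the encoder.
vocab_embed = rng.normal(size=(V, D))
decoder_input = marginals @ vocab_embed      # shape (T, D)

# Gradient-isolating ingestion would instead stop gradients at this
# boundary (e.g., detach the marginals before the weighted sum).
```

Because the interface is just a (T, V) stochastic matrix over a fixed vocabulary, any decoder trained against this contract can, in principle, be paired with a different encoder that emits the same kind of marginals.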
Pages: 3112-3126 (15 pages)