LegoNN: Building Modular Encoder-Decoder Models

Cited by: 3
Authors
Dalmia, Siddharth [1 ]
Okhonko, Dmytro [4 ]
Lewis, Mike [2 ]
Edunov, Sergey [2 ]
Watanabe, Shinji [1 ]
Metze, Florian [1 ,2 ]
Zettlemoyer, Luke [2 ]
Mohamed, Abdelrahman [3 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Meta Platforms Inc, Menlo Pk, CA 94025 USA
[3] Rembrand Inc, Palo Alto, CA 94062 USA
[4] Samaya AI, Mountain View, CA 94040 USA
Keywords
End-to-end; encoder-decoder models; modularity; speech recognition; machine translation; transformer
DOI
10.1109/TASLP.2023.3296019
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g., a high-resource decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without any fine-tuning. To achieve this reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals: one is differentiable, allowing gradients to flow across the entire network, and the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder with a length-control mechanism that dynamically adapts the encoder's output length to match the expected input length range of pre-trained decoders. We present several experiments demonstrating the effectiveness of LegoNN models: a trained language-generation LegoNN decoder module from a German-English (De-En) MT task can be reused, without any fine-tuning, for the Europarl English ASR and the Romanian-English (Ro-En) MT tasks, matching or beating the performance of baseline systems. After fine-tuning, LegoNN models improve the Ro-En MT task by 1.5 BLEU points and achieve a 12.5% relative WER reduction on the Europarl ASR task. To show how the approach generalizes, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on a different dataset, achieving an overall WER reduction of 19.5%.
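The abstract's core idea — exposing a sequence of marginal distributions over a shared vocabulary as the encoder-decoder interface, with differentiable and gradient-isolating ways to consume it — can be sketched numerically. This is a minimal illustration, not the paper's implementation; all shapes and names (`logits`, `embed`, `T`, `V`, `d`) are assumptions for the sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, V, d = 5, 8, 4                 # output length, vocabulary size, embedding dim
logits = rng.normal(size=(T, V))  # per-position vocabulary logits from an encoder
embed = rng.normal(size=(V, d))   # decoder's input embedding table

# Interface: one marginal distribution over the shared vocabulary per position.
marginals = softmax(logits)       # (T, V); each row sums to 1

# Differentiable ingestion: the expected embedding under each marginal.
# In an autodiff framework, gradients would flow through `marginals`
# back into the encoder.
soft_inputs = marginals @ embed   # (T, d)

# Gradient-isolating ingestion: embed only the argmax token, which cuts
# any gradient path between decoder and encoder.
hard_inputs = embed[marginals.argmax(axis=-1)]  # (T, d)
```

Because both modules speak in terms of the same discrete vocabulary, a decoder trained against one encoder's marginals can, in principle, consume another encoder's marginals without fine-tuning.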
Pages: 3112-3126
Page count: 15