LegoNN: Building Modular Encoder-Decoder Models

Cited by: 3
Authors
Dalmia, Siddharth [1 ]
Okhonko, Dmytro [4 ]
Lewis, Mike [2 ]
Edunov, Sergey [2 ]
Watanabe, Shinji [1 ]
Metze, Florian [1 ,2 ]
Zettlemoyer, Luke [2 ]
Mohamed, Abdelrahman [3 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Meta Platforms Inc, Menlo Pk, CA 94025 USA
[3] Rembrand Inc, Palo Alto, CA 94062 USA
[4] Samaya AI, Mountain View, CA 94040 USA
Keywords
End-to-end; encoder-decoder models; modularity; speech recognition; machine translation; Transformer
DOI
10.1109/TASLP.2023.3296019
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g., a high-resource decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without any fine-tuning. To achieve this reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals: one is differentiable, allowing the flow of gradients across the entire network, while the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder with a length-control mechanism that dynamically adapts the encoder's output length to match the expected input length range of pre-trained decoders. We present several experiments demonstrating the effectiveness of LegoNN models: a trained language-generation LegoNN decoder module from a German-English (De-En) MT task can be reused, without any fine-tuning, for the Europarl English ASR and Romanian-English (Ro-En) MT tasks, matching or beating the performance of baselines. After fine-tuning, LegoNN models improve the Ro-En MT task by 1.5 BLEU points and achieve a 12.5% relative WER reduction on the Europarl ASR task. To show how the approach generalizes, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on a different dataset, achieving an overall WER reduction of 19.5%.
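A minimal pure-Python sketch of the vocabulary-grounded interface the abstract describes. The shapes, the softmax normalization, and the embedding-mixture ingestion are illustrative assumptions, not the paper's exact implementation; the gradient-isolating variant would additionally stop gradients (detach) on the marginals before mixing:

```python
import math
import random

def softmax(logits):
    """Normalize a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy sizes: T encoder output positions, vocabulary size V, decoder embedding dim D.
T, V, D = 5, 10, 8
random.seed(0)

# The encoder module ends in per-position logits over a pre-defined discrete
# vocabulary; the resulting (T x V) marginal distributions are the module
# interface that any compatible decoder can consume.
encoder_logits = [[random.gauss(0, 1) for _ in range(V)] for _ in range(T)]
marginals = [softmax(row) for row in encoder_logits]

# Decoder ingestion (differentiable variant): each decoder input vector is the
# marginal-weighted mixture of the decoder's own vocabulary embeddings.
decoder_vocab_emb = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]
ingested = [
    [sum(p * decoder_vocab_emb[v][d] for v, p in enumerate(dist)) for d in range(D)]
    for dist in marginals
]
```

Because the interface is a distribution over a shared vocabulary rather than an arbitrary hidden representation, a decoder trained against one encoder can, in principle, ingest the output of a different encoder (e.g., from another source language or modality) without fine-tuning.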
Pages: 3112-3126 (15 pages)