LegoNN: Building Modular Encoder-Decoder Models

Cited by: 3
|
Authors
Dalmia, Siddharth [1 ]
Okhonko, Dmytro [4 ]
Lewis, Mike [2 ]
Edunov, Sergey [2 ]
Watanabe, Shinji [1 ]
Metze, Florian [1 ,2 ]
Zettlemoyer, Luke [2 ]
Mohamed, Abdelrahman [3 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Meta Platforms Inc, Menlo Pk, CA 94025 USA
[3] Rembrand Inc, Palo Alto, CA 94062 USA
[4] Samaya AI, Mountain View, CA 94040 USA
Keywords
End-to-end; encoder-decoder models; modularity; speech recognition; machine translation; TRANSFORMER;
DOI
10.1109/TASLP.2023.3296019
CLC Classification
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g., a high-resource decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without any fine-tuning. To achieve this reusability, the interface between encoder and decoder modules is grounded in a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals: one is differentiable, allowing the flow of gradients across the entire network, while the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder with a length-control mechanism that dynamically adapts the encoder's output length to match the expected input length range of pre-trained decoders. We present several experiments to demonstrate the effectiveness of LegoNN models: a trained language-generation LegoNN decoder module from a German-English (De-En) MT task can be reused, without any fine-tuning, for the Europarl English ASR and the Romanian-English (Ro-En) MT tasks, matching or beating the performance of baseline systems. After fine-tuning, LegoNN models improve the Ro-En MT task by 1.5 BLEU points and achieve a 12.5% relative WER reduction on the Europarl ASR task. To show how the approach generalizes, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on a different dataset, achieving an overall WER reduction of 19.5%.
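The abstract's key idea, a module interface grounded in marginal distributions over a shared vocabulary, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all sizes and variable names are hypothetical, and the decoder ingests the marginals by mixing its own embedding table (in the paper's gradient-isolating variant, the marginals would additionally be cut off from the encoder's gradient graph, e.g. via a detach operation in an autograd framework).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: the encoder emits T positions of logits over a
# pre-defined discrete vocabulary of size V; decoder embeddings have dim D.
T, V, D = 5, 100, 16
rng = np.random.default_rng(0)

enc_logits = rng.normal(size=(T, V))   # raw encoder-module output
marginals = softmax(enc_logits)        # the interface: one marginal
                                       # distribution over V per position

# A decoder module ingests the marginals by taking, at each position,
# the expectation of its own vocabulary embeddings under that marginal.
dec_embed = rng.normal(size=(V, D))    # decoder-side embedding table
dec_input = marginals @ dec_embed      # (T, D) continuous decoder input
```

Because the interface is a distribution over a shared discrete vocabulary rather than an opaque hidden vector, any decoder trained against this contract can, in principle, consume the output of a different encoder, which is what enables the cross-task module reuse reported above.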
Pages: 3112-3126 (15 pages)
Related Papers (50 total)
  • [41] Chen, Bin; Li, Da; Gu, Zheming; Wu, Yunlong; Fan, Yudi; Liu, Wenjie; Wu, Jianguo. A Scheme for Inverse Design of Encoded Metasurfaces Supported by Encoder-Decoder and Generative Models. 2024 International Conference on Microwave and Millimeter Wave Technology (ICMMT), 2024.
  • [42] Ma, Jingjing; Wu, Linlin; Tang, Xu; Liu, Fang; Zhang, Xiangrong; Jiao, Licheng. Building Extraction of Aerial Images by a Global and Multi-Scale Encoder-Decoder Network. Remote Sensing, 2020, 12(15).
  • [43] Ellis, Matthew J.; Chinde, Venkatesh. An encoder-decoder LSTM-based EMPC framework applied to a building HVAC system. Chemical Engineering Research & Design, 2020, 160: 508-520.
  • [44] Wang, Bingning; Yao, Ting; Zhang, Qi; Xu, Jingfang; Liu, Kang; Tian, Zhixing; Zhao, Jun. Unsupervised Story Comprehension with Hierarchical Encoder-Decoder. Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR '19), 2019: 92-99.
  • [45] Zhang, Xu; Li, Yifeng; Peng, Xueping; Qiao, Xinxiao; Zhang, Hui; Lu, Wenpeng. Correlation Encoder-Decoder Model for Text Generation. 2022 International Joint Conference on Neural Networks (IJCNN), 2022.
  • [46] Tang, W.; Wang, P.; Cai, D.; Zhang, G.; Wang, Y. Encoder-decoder based process generation method. Computer Integrated Manufacturing Systems (CIMS), 2023, 29(11): 3656-3668.
  • [47] Farina, Marco; Pappadopulo, Duccio; Gupta, Anant; Huang, Leslie; Irsoy, Ozan; Solorio, Thamar. Distillation of encoder-decoder transformers for sequence labelling. 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), 2023: 2539-2549.
  • [48] Liu, Xuhui; Hu, Yutao; Zhang, Baochang; Zhen, Xiantong; Luo, Xiaoyan; Cao, Xianbin. Attentive encoder-decoder networks for crowd counting. Neurocomputing, 2022, 490: 246-257.
  • [49] Zini, Simone; Buzzelli, Marco. Laplacian encoder-decoder network for raindrop removal. Pattern Recognition Letters, 2022, 158: 24-33.
  • [50] Zolotarev, Fedor; Eerola, Tuomas; Lensu, Lasse; Kalviainen, Heikki; Haario, Heikki; Heikkinen, Jere; Kauppi, Tomi. Timber Tracing with Multimodal Encoder-Decoder Networks. Computer Analysis of Images and Patterns (CAIP 2019), Pt II, 2019, 11679: 342-353.