LegoNN: Building Modular Encoder-Decoder Models

Cited by: 3
|
Authors
Dalmia, Siddharth [1 ]
Okhonko, Dmytro [4 ]
Lewis, Mike [2 ]
Edunov, Sergey [2 ]
Watanabe, Shinji [1 ]
Metze, Florian [1 ,2 ]
Zettlemoyer, Luke [2 ]
Mohamed, Abdelrahman [3 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Meta Platforms Inc, Menlo Pk, CA 94025 USA
[3] Rembrand Inc, Palo Alto, CA 94062 USA
[4] Samaya AI, Mountain View, CA 94040 USA
Keywords
End-to-end; encoder-decoder models; modularity; speech recognition; machine translation; TRANSFORMER;
DOI
10.1109/TASLP.2023.3296019
CLC Classification
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g., a high-resource decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without any fine-tuning. To achieve this reusability, the interface between encoder and decoder modules is grounded in a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals: one is differentiable, allowing the flow of gradients across the entire network, while the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder with a length-control mechanism that dynamically adapts the encoder's output length to match the expected input length range of pre-trained decoders. We present several experiments to demonstrate the effectiveness of LegoNN models: a trained language-generation LegoNN decoder module from a German-English (De-En) MT task can be reused, without any fine-tuning, for the Europarl English ASR and the Romanian-English (Ro-En) MT tasks, matching or beating the performance of baseline systems. After fine-tuning, LegoNN models improve the Ro-En MT task by 1.5 BLEU points and achieve a 12.5% relative WER reduction on the Europarl ASR task. To show how the approach generalizes, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on a different dataset, achieving an overall WER reduction of 19.5%.
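The abstract's key idea, a module interface grounded in marginal distributions over a shared vocabulary, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all sizes and variable names are hypothetical, and the decoder ingests the marginals by mixing its own embedding table (in the paper's gradient-isolating variant, the marginals would additionally be cut off from the encoder's gradient graph, e.g. via a detach operation in an autograd framework).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: the encoder emits T positions of logits over a
# pre-defined discrete vocabulary of size V; decoder embeddings have dim D.
T, V, D = 5, 100, 16
rng = np.random.default_rng(0)

enc_logits = rng.normal(size=(T, V))   # raw encoder-module output
marginals = softmax(enc_logits)        # the interface: one marginal
                                       # distribution over V per position

# A decoder module ingests the marginals by taking, at each position,
# the expectation of its own vocabulary embeddings under that marginal.
dec_embed = rng.normal(size=(V, D))    # decoder-side embedding table
dec_input = marginals @ dec_embed      # (T, D) continuous decoder input
```

Because the interface is a distribution over a shared discrete vocabulary rather than an opaque hidden vector, any decoder trained against this contract can, in principle, consume the output of a different encoder, which is what enables the cross-task module reuse reported above.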
Pages: 3112-3126 (15 pages)
Related Papers (50 total)
  • [41] Chen, Bin; Li, Da; Gu, Zheming; Wu, Yunlong; Fan, Yudi; Liu, Wenjie; Wu, Jianguo. A Scheme for Inverse Design of Encoded Metasurfaces Supported by Encoder-Decoder and Generative Models. 2024 International Conference on Microwave and Millimeter Wave Technology (ICMMT), 2024.
  • [42] Ma, Jingjing; Wu, Linlin; Tang, Xu; Liu, Fang; Zhang, Xiangrong; Jiao, Licheng. Building Extraction of Aerial Images by a Global and Multi-Scale Encoder-Decoder Network. Remote Sensing, 2020, 12(15).
  • [43] Ellis, Matthew J.; Chinde, Venkatesh. An encoder-decoder LSTM-based EMPC framework applied to a building HVAC system. Chemical Engineering Research & Design, 2020, 160: 508-520.
  • [44] Wang, Bingning; Yao, Ting; Zhang, Qi; Xu, Jingfang; Liu, Kang; Tian, Zhixing; Zhao, Jun. Unsupervised Story Comprehension with Hierarchical Encoder-Decoder. Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR '19), 2019: 92-99.
  • [45] Zhang, Xu; Li, Yifeng; Peng, Xueping; Qiao, Xinxiao; Zhang, Hui; Lu, Wenpeng. Correlation Encoder-Decoder Model for Text Generation. 2022 International Joint Conference on Neural Networks (IJCNN), 2022.
  • [46] Tang, W.; Wang, P.; Cai, D.; Zhang, G.; Wang, Y. Encoder-decoder based process generation method. Computer Integrated Manufacturing Systems (CIMS), 2023, 29(11): 3656-3668.
  • [47] Farina, Marco; Pappadopulo, Duccio; Gupta, Anant; Huang, Leslie; Irsoy, Ozan; Solorio, Thamar. Distillation of encoder-decoder transformers for sequence labelling. 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), 2023: 2539-2549.
  • [48] Liu, Xuhui; Hu, Yutao; Zhang, Baochang; Zhen, Xiantong; Luo, Xiaoyan; Cao, Xianbin. Attentive encoder-decoder networks for crowd counting. Neurocomputing, 2022, 490: 246-257.
  • [49] Zini, Simone; Buzzelli, Marco. Laplacian encoder-decoder network for raindrop removal. Pattern Recognition Letters, 2022, 158: 24-33.
  • [50] Zolotarev, Fedor; Eerola, Tuomas; Lensu, Lasse; Kalviainen, Heikki; Haario, Heikki; Heikkinen, Jere; Kauppi, Tomi. Timber Tracing with Multimodal Encoder-Decoder Networks. Computer Analysis of Images and Patterns (CAIP 2019), Pt II, 2019, 11679: 342-353.