Cold Fusion: Training Seq2Seq Models Together with Language Models

Cited by: 0
Authors:
Sriram, Anuroop [1 ]
Jun, Heewoo [1 ]
Satheesh, Sanjeev [1 ]
Coates, Adam [1 ]
Affiliations:
[1] Baidu Research, Sunnyvale, CA 94089, USA
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Sequence-to-sequence (Seq2Seq) models with attention have excelled at tasks that involve generating natural language sentences, such as machine translation, image captioning, and speech recognition. Performance has been further improved by leveraging unlabeled data, often in the form of a language model. In this work, we present the Cold Fusion method, which leverages a pre-trained language model during training, and show its effectiveness on the speech recognition task. We show that Seq2Seq models with Cold Fusion are able to better utilize language information, enjoying (i) faster convergence and better generalization, and (ii) almost complete transfer to a new domain while using less than 10% of the labeled training data.
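The abstract describes the mechanism only at a high level: a frozen pre-trained language model is fused into the Seq2Seq decoder during training. Below is a minimal, illustrative PyTorch-style sketch of a cold-fusion-style output layer that gates features derived from the language model's logits into the decoder state. The class name ColdFusionLayer, the layer sizes, and the exact gating form are assumptions made for illustration, not details taken from this record.

import torch
import torch.nn as nn

class ColdFusionLayer(nn.Module):
    # Illustrative cold-fusion-style layer (names and sizes are assumptions):
    # combines the Seq2Seq decoder state with features derived from a frozen,
    # pre-trained language model's logits via a learned sigmoid gate.
    def __init__(self, dec_dim, lm_vocab, hidden_dim, out_vocab):
        super().__init__()
        self.lm_proj = nn.Linear(lm_vocab, hidden_dim)           # LM logits -> LM features
        self.gate = nn.Linear(dec_dim + hidden_dim, hidden_dim)  # elementwise gate over LM features
        self.out = nn.Sequential(
            nn.Linear(dec_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_vocab),
        )

    def forward(self, dec_state, lm_logits):
        h_lm = torch.relu(self.lm_proj(lm_logits))                        # LM-derived features
        g = torch.sigmoid(self.gate(torch.cat([dec_state, h_lm], -1)))   # learned gate
        fused = torch.cat([dec_state, g * h_lm], -1)                      # gated fusion
        return self.out(fused)                                            # output token logits

# Hypothetical usage (shapes chosen only for illustration):
# layer = ColdFusionLayer(dec_dim=512, lm_vocab=10000, hidden_dim=256, out_vocab=10000)
# logits = layer(torch.randn(8, 512), torch.randn(8, 10000))  # -> (8, 10000)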
Pages: 387-391
Page count: 5
Related Papers (50 total)
  • [1] Sparsing and Smoothing for the seq2seq Models. Zhao S.; Liang Z.; Wen J.; Chen J. IEEE Transactions on Artificial Intelligence, 2023, 4(3): 464-472.
  • [2] Learning the Dyck Language with Attention-based Seq2Seq Models. Yu, Xiang; Ngoc Thang Vu; Kuhn, Jonas. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP at ACL 2019, 2019: 138-146.
  • [3] Application of Seq2Seq Models on Code Correction. Huang, Shan; Zhou, Xiao; Chin, Sang. Frontiers in Artificial Intelligence, 2021, 4.
  • [4] A Primer on Seq2Seq Models for Generative Chatbots. Scotti, Vincenzo; Sbattella, Licia; Tedesco, Roberto. ACM Computing Surveys, 2024, 56(3).
  • [5] Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models. Soltan, Saleh; Rosenbaum, Andy; Falke, Tobias; Lu, Qin; Rumshisky, Anna; Hamza, Wael. Findings of the Association for Computational Linguistics (ACL 2023), 2023: 9380-9394.
  • [6] Profanity-Avoiding Training Framework for Seq2seq Models with Certified Robustness. Zhang, Hengtong; Zheng, Tianhang; Li, Yaliang; Gao, Jing; Su, Lu; Li, Bo. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 5151-5161.
  • [7] Seq2Seq models for recommending short text conversations. Torres, Johnny; Vaca, Carmen; Teran, Luis; Abad, Cristina L. Expert Systems with Applications, 2020, 150.
  • [8] Seq2Seq Deep Learning Models for Microtext Normalization. Satapathy, Ranjan; Li, Yang; Cavallari, Sandro; Cambria, Erik. 2019 International Joint Conference on Neural Networks (IJCNN), 2019.
  • [9] Learning Transductions and Alignments with RNN Seq2seq Models. Wang, Zhengxiang. International Conference on Grammatical Inference, 2023, 217: 223-249.
  • [10] Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference. Charles, Giovanni; Wolock, Timothy M.; Winskill, Peter; Ghani, Azra; Bhatt, Samir; Flaxman, Seth. Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023, 37(12): 14170-14177.