Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

Cited by: 4
|
Authors
Zeng, Zhiping [1 ,2 ]
Pham, Van Tung [1 ]
Xu, Haihua [1 ]
Khassanov, Yerbolat [1 ,3 ]
Chng, Eng Siong [1 ]
Ni, Chongjia [4 ]
Ma, Bin [4 ]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[2] Huya AI, Guangzhou, Peoples R China
[3] Nazarbayev Univ, ISSAI, Astana, Kazakhstan
[4] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China
Keywords
cross-lingual transfer learning; transformer; lstm; unpaired text; independent language model;
DOI
10.1109/ISCSLP49672.2021.9362086
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we study leveraging extra text data to improve low-resource end-to-end ASR in a cross-lingual transfer learning setting. To this end, we extend the prior work [1] and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data thanks to its LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus, which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained on the limited labeled data. Starting from this, we obtain a further 25.4% relative WER reduction by transfer learning from another resource-rich language, and an additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference than both the LSTM and Transformer architectures.
Pages: 5
Related Papers
(50 items in total)
  • [1] PRE-TRAINING TRANSFORMER DECODER FOR END-TO-END ASR MODEL WITH UNPAIRED TEXT DATA
    Gao, Changfeng
    Cheng, Gaofeng
    Yang, Runyan
    Zhu, Han
    Zhang, Pengyuan
    Yan, Yonghong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6543 - 6547
  • [2] LAYER-NORMALIZED LSTM FOR HYBRID-HMM AND END-TO-END ASR
    Zeineldeen, Mohammad
    Zeyer, Albert
    Schlueter, Ralf
    Ney, Hermann
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7679 - 7683
  • [3] AN END-TO-END SPEECH ACCENT RECOGNITION METHOD BASED ON HYBRID CTC/ATTENTION TRANSFORMER ASR
    Gao, Qiang
    Wu, Haiwei
    Sun, Yanqing
    Duan, Yitao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7253 - 7257
  • [4] Bootstrap an End-to-end ASR System by Multilingual Training, Transfer Learning, Text-to-text Mapping and Synthetic Audio
    Giollo, Manuel
    Gunceler, Deniz
    Liu, Yulan
    Willett, Daniel
    INTERSPEECH 2021, 2021, : 2416 - 2420
  • [5] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Maekaku, Takashi
    Fujita, Yuya
    Peng, Yifan
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 1071 - 1075
  • [6] Data Augmentation Using CycleGAN for End-to-End Children ASR
    Singh, Dipesh K.
    Amin, Preet P.
    Sailor, Hardik B.
    Patil, Hemant A.
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 511 - 515
  • [7] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
  • [8] Semi-Supervised Learning with Data Augmentation for End-to-End ASR
    Weninger, Felix
    Mana, Franco
    Gemello, Roberto
    Andres-Ferrer, Jesus
    Zhan, Puming
    INTERSPEECH 2020, 2020, : 2802 - 2806
  • [9] End-to-End generation of Multiple-Choice questions using Text-to-Text transfer Transformer models
    Rodriguez-Torrealba, Ricardo
    Garcia-Lopez, Eva
    Garcia-Cabot, Antonio
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 208