Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

Times cited: 4
Authors
Zeng, Zhiping [1, 2]
Pham, Van Tung [1]
Xu, Haihua [1]
Khassanov, Yerbolat [1, 3]
Chng, Eng Siong [1]
Ni, Chongjia [4]
Ma, Bin [4]
Affiliations
[1] Nanyang Technological University, School of Computer Science and Engineering, Singapore
[2] Huya AI, Guangzhou, People's Republic of China
[3] Nazarbayev University, ISSAI, Astana, Kazakhstan
[4] Alibaba Group, Machine Intelligence Technology, Hangzhou, People's Republic of China
Keywords
cross-lingual transfer learning; Transformer; LSTM; unpaired text; independent language model
DOI
10.1109/ISCSLP49672.2021.9362086
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we study how to leverage extra text data to improve low-resource end-to-end ASR in a cross-lingual transfer learning setting. To this end, we extend prior work [1] and propose a hybrid Transformer-LSTM architecture. This architecture not only exploits the highly effective encoding capacity of the Transformer network but also benefits from extra text data through its LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus, which contains limited labeled data and a large amount of extra text. When both are trained on the limited labeled data only, the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER). Starting from this model, we obtain a further 25.4% relative WER reduction by transfer learning from another, resource-rich language. We then obtain an additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Finally, the proposed hybrid architecture offers much faster inference than both the LSTM and Transformer architectures.
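The abstract reports three relative WER reductions in sequence. Assuming each one is measured against the WER of the model from the preceding step (the natural reading of "starting from this" and "further", though the abstract gives no absolute WERs), the reductions compound multiplicatively rather than add. The sketch below shows that arithmetic only; the function names and the example WER values are hypothetical illustrations, not figures from the paper. The 11.9% gain over the vanilla Transformer uses a different baseline and cannot be derived this way.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer with respect to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


def compound(reductions: list[float]) -> float:
    """Overall relative reduction when each step is measured against
    the WER left by the previous step (an assumption, see lead-in)."""
    remaining = 1.0  # fraction of the original WER still remaining
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


# Hypothetical WERs: a baseline of 40.0 reduced to 30.32 is a 24.2% relative drop.
print(f"{relative_reduction(40.0, 30.32):.3f}")

# Steps from the abstract: hybrid vs. LSTM, transfer learning, text-boosted decoder.
steps = [0.242, 0.254, 0.136]
print(f"{compound(steps):.1%}")  # 51.1%
```

Under this reading, the text-boosted transferred model would be about 51% below the LSTM-based baseline trained on limited labeled data. Summing the three percentages (64.2%) would overstate the gain.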
Pages: 5