Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning

Cited by: 4
Authors
Zeng, Zhiping [1 ,2 ]
Pham, Van Tung [1 ]
Xu, Haihua [1 ]
Khassanov, Yerbolat [1 ,3 ]
Chng, Eng Siong [1 ]
Ni, Chongjia [4 ]
Ma, Bin [4 ]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[2] Huya AI, Guangzhou, Peoples R China
[3] Nazarbayev Univ, ISSAI, Astana, Kazakhstan
[4] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China
Keywords
cross-lingual transfer learning; transformer; lstm; unpaired text; independent language model;
DOI
10.1109/ISCSLP49672.2021.9362086
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we study leveraging extra text data to improve low-resource end-to-end ASR in a cross-lingual transfer learning setting. To this end, we extend the prior work [1] and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data through its LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus, which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained on the limited labeled data. Starting from this model, we obtain a further 25.4% relative WER reduction by transfer learning from another, resource-rich language. Moreover, we obtain an additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference than both the LSTM and Transformer architectures.
Pages: 5