SEMI-SUPERVISED TRAINING FOR END-TO-END MODELS VIA WEAK DISTILLATION

Cited by: 0
Authors:
Li, Bo [1]
Sainath, Tara N. [1]
Pang, Ruoming [1]
Wu, Zelin [1]
Affiliation:
[1] Google LLC, Mountain View, CA 94043 USA
Keywords:
semi-supervised training; sequence-to-sequence
DOI: 10.1109/icassp.2019.8682172
CLC Classification Number: O42 [Acoustics]
Subject Classification Codes: 070206; 082403
Abstract
End-to-end (E2E) models are a promising research direction in speech recognition, as a single all-neural E2E system offers a much simpler and more compact solution than a conventional model, which has separate acoustic (AM), pronunciation (PM), and language models (LM). However, it has been noted that E2E models perform poorly on tail words and proper nouns, likely because end-to-end optimization requires paired audio-text data and does not take advantage of the additional lexicons and large amounts of text-only data used to train the LMs in conventional models. There have been numerous efforts to train an RNN-LM on text-only data and fuse it into the end-to-end model. In this work, we contrast this approach with training the E2E model on audio-text pairs generated from unsupervised speech data. To target the proper-noun issue specifically, we adopt a Part-of-Speech (POS) tagger to filter the unsupervised data so that only utterances containing proper nouns are used. We show that training with the filtered unsupervised data provides up to a 13% relative reduction in word error rate (WER), and, when used in conjunction with a cold-fusion RNN-LM, up to a 17% relative improvement.
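The filtering step described in the abstract, keeping only machine-transcribed utterances whose hypotheses contain proper nouns, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NLTK's off-the-shelf POS tagger as a stand-in for whatever tagger the paper used, and the helper names (contains_proper_noun, filter_unsupervised_pairs) and example data are hypothetical.

# Illustrative sketch only (not the paper's code): keep only unsupervised
# (audio, machine-transcript) pairs whose transcript contains a proper noun
# (Penn Treebank tags NNP/NNPS) before adding them to E2E training data.
import nltk

# Tokenizer and tagger models; these downloads are no-ops if already present.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def contains_proper_noun(hypothesis: str) -> bool:
    """Return True if the transcript hypothesis contains at least one proper noun."""
    tokens = nltk.word_tokenize(hypothesis)
    return any(tag in PROPER_NOUN_TAGS for _, tag in nltk.pos_tag(tokens))


def filter_unsupervised_pairs(pairs):
    """Keep only (audio_path, machine_transcript) pairs whose transcript has a
    proper noun; the remaining unsupervised data is discarded."""
    return [(audio, text) for audio, text in pairs if contains_proper_noun(text)]


if __name__ == "__main__":
    # Hypothetical pairs: audio paths with transcripts produced by a
    # conventional (teacher) recognizer on unsupervised speech.
    pairs = [
        ("utt_0001.wav", "Play the latest album by Ariana Grande"),
        ("utt_0002.wav", "Turn up the volume please"),
    ]
    print(filter_unsupervised_pairs(pairs))

In this sketch, only the first pair would typically survive the filter, mirroring the paper's goal of biasing the semi-supervised data toward proper-noun-rich utterances.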
Pages: 2837-2841
Number of pages: 5