SEMI-SUPERVISED TRAINING FOR END-TO-END MODELS VIA WEAK DISTILLATION

Cited by: 0
Authors:
Li, Bo [1]
Sainath, Tara N. [1]
Pang, Ruoming [1]
Wu, Zelin [1]
Affiliation:
[1] Google LLC, Mountain View, CA 94043 USA
Keywords:
semi-supervised training; sequence to sequence;
DOI:
10.1109/icassp.2019.8682172
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline classification codes: 070206; 082403
Abstract:
End-to-end (E2E) models are a promising research direction in speech recognition, as a single all-neural E2E system offers a much simpler and more compact solution than a conventional system, which has separate acoustic (AM), pronunciation (PM) and language models (LM). However, E2E models have been observed to perform poorly on tail words and proper nouns, likely because end-to-end optimization requires paired audio-text data and cannot take advantage of the additional lexicons and large amounts of text-only data used to train the LMs of conventional systems. There have been numerous efforts to train an RNN-LM on text-only data and fuse it into the end-to-end model. In this work, we contrast that approach with training the E2E model on audio-text pairs generated from unsupervised speech data. To target the proper-noun issue specifically, we use a Part-of-Speech (POS) tagger to filter the unsupervised data so that only utterances containing proper nouns are kept. We show that training with the filtered unsupervised data provides up to a 13% relative reduction in word error rate (WER), and, when used in conjunction with a cold-fusion RNN-LM, up to a 17% relative improvement.
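The abstract's filtering step can be made concrete: unsupervised audio is transcribed by a teacher model, the transcripts are run through a POS tagger, and only utterances containing proper nouns are kept for training the E2E student. Below is a minimal sketch of that filter, assuming spaCy's small English model as a stand-in for the paper's unspecified tagger and hypothetical (audio_path, transcript) pairs; it is an illustration, not the authors' implementation.

    import spacy

    # Assumption: spaCy's small English model stands in for the paper's unspecified POS tagger.
    nlp = spacy.load("en_core_web_sm")

    def has_proper_noun(transcript: str) -> bool:
        # A teacher transcript qualifies if any token is tagged as a proper noun (PROPN).
        return any(token.pos_ == "PROPN" for token in nlp(transcript))

    def filter_unsupervised_pairs(pairs):
        # pairs: iterable of (audio_path, teacher_transcript) tuples obtained by decoding
        # unsupervised audio with a teacher model; keep only proper-noun utterances.
        return [(audio, text) for audio, text in pairs if has_proper_noun(text)]

    # Hypothetical example: only the second utterance survives the filter.
    example = [
        ("utt_001.wav", "turn up the volume"),
        ("utt_002.wav", "navigate to Mountain View"),
    ]
    print(filter_unsupervised_pairs(example))

The surviving audio-text pairs would then be mixed with the supervised training set for the E2E model, optionally together with a cold-fusion RNN-LM as the abstract describes.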
Pages: 2837 - 2841
Number of pages: 5
Related papers
50 items in total
  • [41] Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition
    Do, Cong-Thanh
    Doddipatla, Rama
    Hain, Thomas
    2021, arXiv
  • [42] MULTIPLE-HYPOTHESIS CTC-BASED SEMI-SUPERVISED ADAPTATION OF END-TO-END SPEECH RECOGNITION
    Do, Cong-Thanh
    Doddipatla, Rama
    Hain, Thomas
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6978 - 6982
  • [43] Framewise Supervised Training towards End-to-End Speech Recognition Models: First Results
    Li, Mohan
    Cao, Yuanjiang
    Zhou, Weicong
    Liu, Min
    INTERSPEECH 2019, 2019, : 1641 - 1645
  • [44] A new end-to-end semi-supervised deep learning framework for mastering robot-written character identification
    Jia, Qilong
    Fan, Song
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (06) : 7833 - 7846
  • [45] IMPROVING SEMI-SUPERVISED END-TO-END AUTOMATIC SPEECH RECOGNITION USING CYCLEGAN AND INTER-DOMAIN LOSSES
    Li, Chia-Yu
    Vu, Ngoc Thang
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 822 - 829
  • [46] Self Correspondence Distillation for End-to-End Weakly-Supervised Semantic Segmentation
    Xu, Rongtao
    Wang, Changwei
    Sun, Jiaxi
    Xu, Shibiao
    Meng, Weiliang
    Zhang, Xiaopeng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3045 - 3053
  • [47] Teaching Semi-Supervised Classifier via Generalized Distillation
    Gong, Chen
    Chang, Xiaojun
    Fang, Meng
    Yang, Jian
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2156 - 2162
  • [48] A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning
    Zhang, Yichi
    Ou, Zhijian
    Hu, Min
    Feng, Junlan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 9207 - 9219
  • [49] Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer
    Piao, Jingtan
    Qian, Chen
    Li, Hongsheng
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 9397 - 9406
  • [50] END-TO-END TRAINING APPROACHES FOR DISCRIMINATIVE SEGMENTAL MODELS
    Tang, Hao
    Wang, Weiran
    Gimpel, Kevin
    Livescu, Karen
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 496 - 502