SEMI-SUPERVISED TRAINING FOR END-TO-END MODELS VIA WEAK DISTILLATION

Cited by: 0
Authors:
Li, Bo [1]
Sainath, Tara N. [1]
Pang, Ruoming [1]
Wu, Zelin [1]
Affiliation:
[1] Google LLC, Mountain View, CA 94043 USA
Keywords:
semi-supervised training; sequence to sequence;
DOI:
10.1109/icassp.2019.8682172
Chinese Library Classification (CLC): O42 [Acoustics]
Discipline classification codes: 070206; 082403
Abstract:
End-to-end (E2E) models are a promising research direction in speech recognition, as a single all-neural E2E system offers a much simpler and more compact solution than a conventional system, which has separate acoustic (AM), pronunciation (PM) and language models (LM). However, E2E models have been observed to perform poorly on tail words and proper nouns, likely because end-to-end optimization requires paired audio-text data and cannot take advantage of the additional lexicons and large amounts of text-only data used to train the LMs of conventional systems. There have been numerous efforts to train an RNN-LM on text-only data and fuse it into the end-to-end model. In this work, we contrast that approach with training the E2E model on audio-text pairs generated from unsupervised speech data. To target the proper-noun issue specifically, we use a Part-of-Speech (POS) tagger to filter the unsupervised data so that only utterances containing proper nouns are kept. We show that training with the filtered unsupervised data provides up to a 13% relative reduction in word error rate (WER), and, when used in conjunction with a cold-fusion RNN-LM, up to a 17% relative improvement.
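The abstract's filtering step can be made concrete: unsupervised audio is transcribed by a teacher model, the transcripts are run through a POS tagger, and only utterances containing proper nouns are kept for training the E2E student. Below is a minimal sketch of that filter, assuming spaCy's small English model as a stand-in for the paper's unspecified tagger and hypothetical (audio_path, transcript) pairs; it is an illustration, not the authors' implementation.

    import spacy

    # Assumption: spaCy's small English model stands in for the paper's unspecified POS tagger.
    nlp = spacy.load("en_core_web_sm")

    def has_proper_noun(transcript: str) -> bool:
        # A teacher transcript qualifies if any token is tagged as a proper noun (PROPN).
        return any(token.pos_ == "PROPN" for token in nlp(transcript))

    def filter_unsupervised_pairs(pairs):
        # pairs: iterable of (audio_path, teacher_transcript) tuples obtained by decoding
        # unsupervised audio with a teacher model; keep only proper-noun utterances.
        return [(audio, text) for audio, text in pairs if has_proper_noun(text)]

    # Hypothetical example: only the second utterance survives the filter.
    example = [
        ("utt_001.wav", "turn up the volume"),
        ("utt_002.wav", "navigate to Mountain View"),
    ]
    print(filter_unsupervised_pairs(example))

The surviving audio-text pairs would then be mixed with the supervised training set for the E2E model, optionally together with a cold-fusion RNN-LM as the abstract describes.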
Pages: 2837 - 2841
Number of pages: 5
Related papers
50 items in total
  • [41] Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition
    Do, Cong-Thanh
    Doddipatla, Rama
    Hain, Thomas
    2021, arXiv
  • [42] MULTIPLE-HYPOTHESIS CTC-BASED SEMI-SUPERVISED ADAPTATION OF END-TO-END SPEECH RECOGNITION
    Do, Cong-Thanh
    Doddipatla, Rama
    Hain, Thomas
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6978 - 6982
  • [43] Framewise Supervised Training towards End-to-End Speech Recognition Models: First Results
    Li, Mohan
    Cao, Yuanjiang
    Zhou, Weicong
    Liu, Min
    INTERSPEECH 2019, 2019, : 1641 - 1645
  • [44] A new end-to-end semi-supervised deep learning framework for mastering robot-written character identification
    Jia, Qilong
    Fan, Song
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (06) : 7833 - 7846
  • [45] IMPROVING SEMI-SUPERVISED END-TO-END AUTOMATIC SPEECH RECOGNITION USING CYCLEGAN AND INTER-DOMAIN LOSSES
    Li, Chia-Yu
    Vu, Ngoc Thang
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 822 - 829
  • [46] Self Correspondence Distillation for End-to-End Weakly-Supervised Semantic Segmentation
    Xu, Rongtao
    Wang, Changwei
    Sun, Jiaxi
    Xu, Shibiao
    Meng, Weiliang
    Zhang, Xiaopeng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3045 - 3053
  • [47] Teaching Semi-Supervised Classifier via Generalized Distillation
    Gong, Chen
    Chang, Xiaojun
    Fang, Meng
    Yang, Jian
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2156 - 2162
  • [48] A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning
    Zhang, Yichi
    Ou, Zhijian
    Hu, Min
    Feng, Junlan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 9207 - 9219
  • [49] Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer
    Piao, Jingtan
    Qian, Chen
    Li, Hongsheng
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 9397 - 9406
  • [50] END-TO-END TRAINING APPROACHES FOR DISCRIMINATIVE SEGMENTAL MODELS
    Tang, Hao
    Wang, Weiran
    Gimpel, Kevin
    Livescu, Karen
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 496 - 502