SEMI-SUPERVISED TRAINING FOR END-TO-END MODELS VIA WEAK DISTILLATION

Cited by: 0

Authors:

Li, Bo [1]

Sainath, Tara N. [1]

Pang, Ruoming [1]

Wu, Zelin [1]

Affiliations:

[1] Google LLC, Mountain View, CA 94043 USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

semi-supervised training; sequence to sequence

DOI:

10.1109/icassp.2019.8682172

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

End-to-end (E2E) models are a promising research direction in speech recognition, as a single all-neural E2E system offers a much simpler and more compact solution than a conventional model, which has separate acoustic (AM), pronunciation (PM), and language (LM) models. However, it has been noted that E2E models perform poorly on tail words and proper nouns, likely because end-to-end optimization requires paired audio-text data and does not take advantage of the additional lexicons and large amounts of text-only data used to train the LMs in conventional models. There have been numerous efforts to train an RNN-LM on text-only data and fuse it into the end-to-end model. In this work, we contrast this approach with training the E2E model on audio-text pairs generated from unsupervised speech data. To target the proper noun issue specifically, we adopt a Part-of-Speech (POS) tagger to filter the unsupervised data, keeping only utterances that contain proper nouns. We show that training with filtered unsupervised data provides up to a 13% relative reduction in word error rate (WER), and, when used in conjunction with a cold-fusion RNN-LM, up to a 17% relative improvement.
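The proper-noun filtering step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `Tagger` type, the Penn Treebank tag names (`NNP`, `NNPS`), and the capitalization-based `toy_tagger` are all assumptions made for illustration, and the transcripts are assumed to have been produced beforehand by some existing recognizer. In practice a real POS tagger would be passed in place of the toy one.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: a tagger maps a transcript to (token, POS tag) pairs.
# Penn Treebank tag names are an assumption; the abstract does not name a tagset.
Tagger = Callable[[str], List[Tuple[str, str]]]

PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def has_proper_noun(transcript: str, tagger: Tagger) -> bool:
    """Return True if the tagger marks at least one token as a proper noun."""
    return any(tag in PROPER_NOUN_TAGS for _, tag in tagger(transcript))


def filter_pseudo_labeled(
    pairs: Iterable[Tuple[str, str]], tagger: Tagger
) -> List[Tuple[str, str]]:
    """Keep only (audio_id, transcript) pairs whose transcript has a proper noun."""
    return [(audio_id, text) for audio_id, text in pairs
            if has_proper_noun(text, tagger)]


def toy_tagger(transcript: str) -> List[Tuple[str, str]]:
    """Stand-in tagger for the demo only: capitalized tokens count as proper nouns."""
    return [(tok, "NNP" if tok[:1].isupper() else "NN")
            for tok in transcript.split()]


# Hypothetical machine-generated transcripts for unlabeled audio.
pairs = [
    ("utt1", "play songs by Taylor Swift"),
    ("utt2", "what time is it"),
    ("utt3", "directions to Mountain View"),
]
kept = filter_pseudo_labeled(pairs, toy_tagger)
print(kept)
```

Passing the tagger as a callable keeps the filter independent of any particular tagging library; only the tag names in `PROPER_NOUN_TAGS` would need to change for a different tagset.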

Pages: 2837-2841

Page count: 5

