IMPROVING SEMI-SUPERVISED END-TO-END AUTOMATIC SPEECH RECOGNITION USING CYCLEGAN AND INTER-DOMAIN LOSSES

Cited by: 1

Authors:

Li, Chia-Yu [1]

Vu, Ngoc Thang [1]

Affiliations:

[1] Univ Stuttgart, Inst Nat Language Proc IMS, Stuttgart, Germany

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Keywords:

speech recognition; end-to-end; semi-supervised training; CycleGAN

DOI:

10.1109/SLT54892.2023.10022448

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a novel method that combines CycleGAN and inter-domain losses for semi-supervised end-to-end automatic speech recognition. Inter-domain loss targets the extraction of an intermediate shared representation of speech and text inputs using a shared network. CycleGAN uses the cycle-consistent loss and the identity mapping loss to preserve relevant characteristics of the input feature after converting from one domain to another. As such, both approaches are suitable for training end-to-end models on unpaired speech-text inputs. In this paper, we exploit the advantages of both inter-domain loss and CycleGAN to achieve a better shared representation of unpaired speech and text inputs and thus improve the speech-to-text mapping. Our experimental results on WSJ eval92 and Voxforge (non-English) show an 8-8.5% character error rate reduction over the baseline, and the results on LibriSpeech test-clean also show noticeable improvement.
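
The two CycleGAN terms named in the abstract can be illustrated with a minimal sketch. It assumes the generic CycleGAN formulation with L1 distances and two mappings, here called `g` (domain X to domain Y) and `f` (domain Y to domain X); these names, the toy linear mappings, and the sample arrays are hypothetical and are not taken from the paper, whose actual speech and text encoders are not described in this record.

```python
import numpy as np

def cycle_consistency_loss(x, y, g, f):
    """L1 cycle loss: f(g(x)) should reconstruct x, and g(f(y)) should reconstruct y."""
    forward = np.mean(np.abs(f(g(x)) - x))
    backward = np.mean(np.abs(g(f(y)) - y))
    return forward + backward

def identity_mapping_loss(x, y, g, f):
    """L1 identity loss: g should leave inputs already in Y unchanged, and f those in X."""
    return np.mean(np.abs(g(y) - y)) + np.mean(np.abs(f(x) - x))

# Hypothetical toy mappings: f exactly inverts g, so the cycle loss vanishes,
# while neither mapping is the identity, so the identity loss is positive.
g = lambda v: 2.0 * v
f = lambda v: 0.5 * v

x = np.array([[1.0, -2.0], [0.5, 3.0]])   # stand-in features from domain X
y = np.array([[4.0, 0.0], [-1.0, 2.0]])   # stand-in features from domain Y

cyc = cycle_consistency_loss(x, y, g, f)
idt = identity_mapping_loss(x, y, g, f)
print(cyc, idt)
```

In this toy case the cycle term is zero because the round trip is lossless, whereas the identity term penalizes both mappings for altering inputs that are already in their target domain.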

Pages: 822-829

Page count: 8
