IMPROVING SEMI-SUPERVISED END-TO-END AUTOMATIC SPEECH RECOGNITION USING CYCLEGAN AND INTER-DOMAIN LOSSES

Cited by: 1
Authors
Li, Chia-Yu [1]
Vu, Ngoc Thang [1]
Affiliations
[1] Univ Stuttgart, Inst Nat Language Proc IMS, Stuttgart, Germany
Keywords
speech recognition; end-to-end; semi-supervised training; CycleGAN
DOI
10.1109/SLT54892.2023.10022448
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel method that combines CycleGAN and inter-domain losses for semi-supervised end-to-end automatic speech recognition. The inter-domain loss targets the extraction of an intermediate shared representation of speech and text inputs using a shared network. CycleGAN uses a cycle-consistency loss and an identity mapping loss to preserve relevant characteristics of the input feature after converting from one domain to another. As such, both approaches are suitable for training end-to-end models on unpaired speech-text inputs. In this paper, we exploit the advantages of both the inter-domain loss and CycleGAN to achieve a better shared representation of unpaired speech and text inputs and thus improve the speech-to-text mapping. Our experimental results on WSJ eval92 and Voxforge (non-English) show an 8% to 8.5% character error rate reduction over the baseline, and the results on LibriSpeech test-clean also show noticeable improvement.
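The two CycleGAN terms named in the abstract can be illustrated with a minimal sketch. It follows the generic CycleGAN formulation with an L1 distance, which is an assumption here: the abstract does not state the distance measure, the loss weights, or the layers the losses are applied to. The names `g_xy`, `g_yx`, `double`, and `halve` are hypothetical toy stand-ins for the two domain mappings, not the authors' networks.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Mapping = Callable[[Vector], List[float]]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Vector, y: Vector,
                           g_xy: Mapping, g_yx: Mapping) -> float:
    """Penalize round trips X -> Y -> X and Y -> X -> Y that do not
    return to the starting sample."""
    return l1_distance(g_yx(g_xy(x)), x) + l1_distance(g_xy(g_yx(y)), y)


def identity_mapping_loss(x: Vector, y: Vector,
                          g_xy: Mapping, g_yx: Mapping) -> float:
    """Penalize a mapping that alters a sample already in its target domain."""
    return l1_distance(g_xy(y), y) + l1_distance(g_yx(x), x)


# Hypothetical toy mappings (assumption): exact inverses of each other.
def double(v: Vector) -> List[float]:
    return [2.0 * e for e in v]


def halve(v: Vector) -> List[float]:
    return [0.5 * e for e in v]


x = [1.0, 2.0, 3.0]  # toy sample from domain X
y = [2.0, 4.0, 6.0]  # toy sample from domain Y

cycle = cycle_consistency_loss(x, y, double, halve)
identity = identity_mapping_loss(x, y, double, halve)
print(cycle, identity)
```

Because the toy mappings invert each other, the cycle term vanishes while the identity term does not, which shows that the two losses constrain different properties of the mappings.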
Pages: 822-829
Number of pages: 8