Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition

Cited by: 10
Authors
Dey, Subhadeep [1 ]
Motlicek, Petr [1 ]
Bui, Trung [2 ]
Dernoncourt, Franck [2 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Adobe Res, San Jose, CA USA
Keywords
speech recognition; semi-supervised learning; end-to-end ASR; dropout; neural networks
DOI
10.21437/Interspeech.2019-3246
CLC numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
In this paper, we explore several approaches to semi-supervised learning in an end-to-end automatic speech recognition (ASR) framework. The first step in our approach is to train a seed model on the limited amount of labelled data. Additional unlabelled speech data is then employed through a data-selection mechanism: the best hypothesized output is obtained and used to retrain the seed model. However, a single hypothesis may not capture the model's uncertainty well. In contrast to this technique, we apply a dropout mechanism to capture uncertainty by obtaining multiple hypothesized text transcripts of a speech recording. We assume that the diversity of automatically generated transcripts for an utterance implicitly increases the reliability of the model. Finally, the data-selection process is also applied to these hypothesized transcripts to reduce uncertainty. Experiments on the freely available TED-LIUM corpus and a proprietary internal Adobe dataset show that the proposed approach significantly reduces ASR errors compared to the baseline model.
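The selection step described in the abstract can be sketched in outline: decode each unlabelled utterance several times with dropout kept active, measure how much the resulting hypotheses agree, and retain only utterances whose consensus transcript seems reliable enough for retraining. The sketch below is illustrative only, assuming a token-level pairwise agreement score and an arbitrary 0.8 threshold; `select_pseudo_labels` and its parameters are our own names, not the paper's.

```python
from itertools import combinations

def edit_distance(a, b):
    # Token-level Levenshtein distance between two token lists.
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            # prev holds dp[i-1][j-1]; dp[j] still holds dp[i-1][j].
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ta != tb))  # substitution
    return dp[-1]

def agreement(h1, h2):
    # Pairwise agreement in [0, 1]: 1 - normalized edit distance.
    t1, t2 = h1.split(), h2.split()
    if not t1 and not t2:
        return 1.0
    return 1.0 - edit_distance(t1, t2) / max(len(t1), len(t2))

def select_pseudo_labels(hypotheses, threshold=0.8):
    """hypotheses: N >= 2 transcripts of one utterance, decoded with
    dropout active. Returns (consensus_transcript, mean_agreement)
    if the hypotheses agree above the threshold, else None
    (the utterance is rejected as too uncertain)."""
    pairs = list(combinations(hypotheses, 2))
    mean_agr = sum(agreement(a, b) for a, b in pairs) / len(pairs)
    if mean_agr < threshold:
        return None
    # Consensus: the hypothesis closest on average to all the others.
    best = min(hypotheses,
               key=lambda h: sum(edit_distance(h.split(), o.split())
                                 for o in hypotheses))
    return best, mean_agr
```

For example, three near-identical dropout decodes of one utterance would clear the threshold and yield a consensus transcript, while three wildly different decodes would be rejected; the selected (utterance, transcript) pairs then augment the labelled set when the seed model is retrained.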
Pages: 734-738
Page count: 5