CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Authors
Alon, Uri [1, 2]
Pundak, Golan [2 ]
Sainath, Tara N. [2 ]
Affiliations
[1] Technion, Haifa, Israel
[2] Google Inc, Mountain View, CA USA
Keywords
speech recognition; sequence-to-sequence models; phonetics; attention; biasing
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and select the correct context items. We show that our proposed method gives up to 53.1% relative improvement in word error rate (WER) across several benchmarks.
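Illustrative sketch (not taken from the paper): the idea of choosing phonetically similar phrases as negatives can be approximated in a few lines of Python. The record does not say how the authors measure phonetic similarity, so everything here is an assumption: `phonetic_key`, `similarity`, and `hard_negatives` are hypothetical names, the spelling rewrites are a crude stand-in for a pronunciation lexicon or grapheme-to-phoneme model, and the score is a plain string-similarity ratio from the standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, very rough spelling-to-sound rewrites. A real system would
# likely use a pronunciation lexicon or a grapheme-to-phoneme model instead.
_REWRITES = [("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s"), ("y", "i")]


def phonetic_key(phrase):
    """Map a phrase to a crude sound-alike key (assumption, not real phonemes)."""
    key = re.sub(r"[^a-z]", "", phrase.lower())  # keep letters only
    for src, dst in _REWRITES:
        key = key.replace(src, dst)
    return re.sub(r"(.)\1+", r"\1", key)  # collapse doubled letters


def similarity(a, b):
    """Similarity in [0, 1] between the crude keys of two phrases."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def hard_negatives(target, candidates, k=2):
    """Return the k candidates that sound most like `target` without matching it."""
    target_key = phonetic_key(target)
    # Drop candidates whose key equals the target's: they are not negatives.
    others = [c for c in candidates if phonetic_key(c) != target_key]
    return sorted(others, key=lambda c: similarity(target, c), reverse=True)[:k]


if __name__ == "__main__":
    # The proper noun "Joan" appears in the reference transcript; the other
    # phrases are a made-up pool of possible context items.
    print(hard_negatives("Joan", ["John", "Jon", "Madrid", "Tel Aviv"], k=2))
```

In a training loop, the returned phrases would be placed in the context list next to the correct proper noun, so the model has to tell apart items that sound alike rather than items that are trivially different.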
Pages: 6440-6444
Page count: 5