CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Times cited: 0
Authors
Alon, Uri [1, 2]
Pundak, Golan [2]
Sainath, Tara N. [2]
Affiliations
[1] Technion, Haifa, Israel
[2] Google Inc., Mountain View, CA, USA
Keywords
speech recognition; sequence-to-sequence models; phonetics; attention; biasing
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and to use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and to select the correct context items. We show that our proposed method yields up to a 53.1% relative improvement in word error rate (WER) across several benchmarks.
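The core idea in the abstract, pairing each proper noun in the reference transcript with phonetically similar distractor phrases, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy pronunciation lexicon `LEXICON`, the lexicon-lookup stand-in for proper-noun detection, and the use of length-normalized phoneme edit distance as the similarity measure are all hypothetical choices made for this example. The paper's actual phonetic similarity measure and context-list construction may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical toy pronunciation lexicon (ARPAbet-style phonemes, hand-written
# for illustration only; a real system would use a full pronunciation dictionary).
LEXICON: Dict[str, List[str]] = {
    "john smith": ["JH", "AA", "N", "S", "M", "IH", "TH"],
    "joan smith": ["JH", "OW", "N", "S", "M", "IH", "TH"],
    "john smit": ["JH", "AA", "N", "S", "M", "IH", "T"],
    "mary jones": ["M", "EH", "R", "IY", "JH", "OW", "N", "Z"],
    "boston": ["B", "AO", "S", "T", "AH", "N"],
}


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over phoneme symbols (unit insert/delete/substitute cost)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute (free when phonemes match)
            )
        prev = curr
    return prev[-1]


def find_proper_nouns(transcript: str, lexicon: Dict[str, List[str]], max_words: int = 3) -> List[str]:
    """Hypothetical stand-in for proper-noun detection: greedy longest-match lexicon lookup."""
    words = transcript.lower().split()
    found: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                found.append(phrase)
                i += n
                break
        else:  # no lexicon phrase starts at word i
            i += 1
    return found


def mine_hard_negatives(target: str, lexicon: Dict[str, List[str]], k: int) -> List[str]:
    """Return the k lexicon phrases whose pronunciations are closest to the target's."""
    target_phones = lexicon[target]
    scored = []
    for phrase, phones in lexicon.items():
        if phrase == target:
            continue
        dist = phoneme_edit_distance(target_phones, phones)
        # Normalize by the longer pronunciation so long and short phrases are comparable.
        scored.append((dist / max(len(target_phones), len(phones), 1), phrase))
    scored.sort()  # ascending distance; ties broken alphabetically by phrase
    return [phrase for _, phrase in scored[:k]]


def build_training_context(transcript: str, lexicon: Dict[str, List[str]], k: int) -> List[str]:
    """Context (biasing) list: each reference proper noun plus its k hardest negatives."""
    context: List[str] = []
    for noun in find_proper_nouns(transcript, lexicon):
        for phrase in [noun] + mine_hard_negatives(noun, lexicon, k):
            if phrase not in context:
                context.append(phrase)
    return context


print(mine_hard_negatives("john smith", LEXICON, k=2))  # ['joan smith', 'john smit']
```

In this sketch, `build_training_context("call john smith in boston", LEXICON, k=2)` returns a biasing list in which "john smith" sits next to its near-homophones "joan smith" and "john smit", so a context mechanism trained on such lists must discriminate between similar-sounding phrases rather than between easily separable ones such as "mary jones".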
Pages: 6440-6444
Number of pages: 5