CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Times cited: 0
Authors
Alon, Uri [1 ,2 ]
Pundak, Golan [2 ]
Sainath, Tara N. [2 ]
Affiliations
[1] Technion, Haifa, Israel
[2] Google Inc, Mountain View, CA USA
Keywords
speech recognition; sequence-to-sequence models; phonetics; attention; biasing
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and to use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and to select the correct context items. We show that our proposed method gives up to 53.1% relative improvement in word error rate (WER) across several benchmarks.
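The central idea in the abstract, pairing each proper noun with phonetically similar distractors, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how phonetic similarity is measured, so the sketch assumes a plain Levenshtein distance over phoneme sequences. The lexicon, its ARPAbet-style transcriptions, and the function names `phoneme_edit_distance` and `mine_hard_negatives` are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical pronunciation lexicon (ARPAbet-style phonemes), for illustration only.
LEXICON: Dict[str, List[str]] = {
    "Joan": ["JH", "OW", "N"],
    "John": ["JH", "AA", "N"],
    "Jones": ["JH", "OW", "N", "Z"],
    "Rachel": ["R", "EY", "CH", "AH", "L"],
    "Michelle": ["M", "IH", "SH", "EH", "L"],
}


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (assumed similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr[j] = min(prev[j] + 1,         # delete pa
                          curr[j - 1] + 1,     # insert pb
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[-1]


def mine_hard_negatives(target: str,
                        lexicon: Dict[str, List[str]],
                        k: int = 2) -> List[str]:
    """Return the k phrases whose pronunciations are closest to the target's."""
    target_phones = lexicon[target]
    candidates = [p for p in lexicon if p != target]
    # Sort by phonetic distance, breaking ties alphabetically for determinism.
    candidates.sort(key=lambda p: (phoneme_edit_distance(target_phones, lexicon[p]), p))
    return candidates[:k]


if __name__ == "__main__":
    # The true entity plus its hard negatives would form one training context list.
    print(["Joan"] + mine_hard_negatives("Joan", LEXICON, k=2))
```

With this hypothetical lexicon, `mine_hard_negatives("Joan", LEXICON, k=2)` returns `["John", "Jones"]`, each one phoneme edit away, while the dissimilar names are never selected. Under the sketch's assumptions, adding such distractors to the context list forces a biasing mechanism to separate near-homophones rather than unrelated phrases, which is the kind of discriminative pressure the abstract describes.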
Pages: 6440-6444
Page count: 5