CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Authors
Alon, Uri [1, 2]
Pundak, Golan [2 ]
Sainath, Tara N. [2 ]
Affiliations
[1] Technion, Haifa, Israel
[2] Google Inc, Mountain View, CA USA
Keywords
speech recognition; sequence-to-sequence models; phonetics; attention; biasing
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and select the correct context items. We show that our proposed method gives up to 53.1% relative improvement in word error rate (WER) across several benchmarks.
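The abstract's main idea, choosing phonetically similar phrases as negative examples for a proper noun, can be illustrated with a small sketch. This is not the authors' implementation: the record gives no algorithmic detail, so the crude spelling-based `phonetic_key`, the `similarity` score built on `difflib.SequenceMatcher`, and the `hard_negatives` helper are all hypothetical stand-ins. A real system would presumably compare phoneme sequences from a pronunciation lexicon.

```python
from difflib import SequenceMatcher


def phonetic_key(phrase: str) -> str:
    """Crude spelling-based stand-in for a phoneme sequence (assumption)."""
    s = phrase.lower()
    # Collapse a few spellings that usually sound alike in English.
    for src, dst in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s"), ("y", "i")):
        s = s.replace(src, dst)
    return "".join(ch for ch in s if ch.isalpha())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the phonetic keys of two phrases."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def hard_negatives(target: str, candidates: list[str], k: int = 2) -> list[str]:
    """Return the k candidates that sound most like `target`.

    Candidates with the same phonetic key as the target are skipped,
    on the assumption that exact homophones cannot be told apart by sound.
    """
    pool = [c for c in candidates if phonetic_key(c) != phonetic_key(target)]
    return sorted(pool, key=lambda c: similarity(target, c), reverse=True)[:k]


if __name__ == "__main__":
    names = ["Caitlyn", "Katelyn", "Boston", "Kathleen"]
    print(hard_negatives("Kaitlin", names, k=2))
```

In this toy setup the similar-sounding names rank ahead of an unrelated word such as "Boston", which is the property that makes them difficult negatives for a context-selection model.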
Pages: 6440-6444
Page count: 5
Related papers
50 in total
  • [31] Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition
    Bleeker, Maurits
    Swietojanski, Pawel
    Braun, Stefan
    Zhuang, Xiaodan
    INTERSPEECH 2023, 2023, : 939 - 943
  • [32] The Benefits of Contextual Information for Speech Recognition Systems
    Kinch, Martin W.
    Melis, Wim J. C.
    Keates, Simeon
    2018 10TH COMPUTER SCIENCE AND ELECTRONIC ENGINEERING CONFERENCE (CEEC), 2018, : 225 - 230
  • [33] Listeners normalize speech for contextual speech rate even without an explicit recognition task
    Maslowski, Merel
    Meyer, Antje S.
    Bosker, Hans Rutger
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 146 (01): : 179 - 188
  • [34] DIFFICULT SPEECH-RECOGNITION TECHNOLOGY SHOWS SIGNS OF MATURITY
    MARTIN, SL
    COMPUTER DESIGN, 1986, 25 (14): : 23 - &
  • [35] Improving Neural Network Performances - Training with Negative Examples
    Cernazanu-Glavan, Cosmin
    Holban, Stefan
    NOVEL ALGORITHMS AND TECHNIQUES IN TELECOMMUNICATIONS, AUTOMATION AND INDUSTRIAL ELECTRONICS, 2008, : 49 - 53
  • [36] Use of negative examples in training the HVS semantic model
    Jurcicek, Filip
    Svec, Jan
    Zahradil, Jiri
    Jelinek, Libor
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2006, 4188 : 605 - 612
  • [37] Joint evaluation of multiple speech patterns for speech recognition and training
    Nair, Nishanth Ulhas
    Sreenivas, T. V.
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (02): : 307 - 340
  • [38] Validation of Speech Data for Training Automatic Speech Recognition Systems
    Krizaj, Janes
    Gros, Jerneja Zganec
    Dobrisek, Simon
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1165 - 1169
  • [39] Training Speech Recognition Model with Speech Synthesis and Text Discriminator
    Lin, Hou-an
    Chen, Chia-ping
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2024, 40 (02) : 359 - 373
  • [40] Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
    Qin, Yao
    Carlini, Nicholas
    Goodfellow, Ian
    Cottrell, Garrison
    Raffel, Colin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97