CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Cited by: 0

Authors:

Alon, Uri [1,2]

Pundak, Golan [2]

Sainath, Tara N. [2]

Affiliations:

[1] Technion, Haifa, Israel

[2] Google Inc, Mountain View, CA USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

speech recognition; sequence-to-sequence models; phonetics; attention; biasing;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and select the correct context items. We show that our proposed method gives up to 53.1% relative improvement in word error rate (WER) across several benchmarks.
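The idea of choosing phonetically similar phrases as negatives can be illustrated with a minimal sketch. This record does not say how the authors measure phonetic similarity, so everything below is an assumption for illustration: `phonetic_key` is a crude spelling-based stand-in for a real phoneme sequence, and `hard_negatives` is a hypothetical helper that ranks candidate phrases by how closely their keys match the key of a proper noun taken from the reference transcript.

```python
from difflib import SequenceMatcher

# Assumption: a rough merge of similar-sounding letters stands in for phonemes.
# A real system would use a pronunciation lexicon or a grapheme-to-phoneme model.
_MERGE = {
    **{v: "a" for v in "aeiouy"},   # collapse all vowels to one symbol
    "c": "k", "q": "k",
    "z": "s",
    "d": "t",
    "b": "p",
    "v": "f",
    "g": "j",
    "m": "n",
}


def phonetic_key(phrase: str) -> str:
    """Map a phrase to a coarse sound-alike key (hypothetical, illustration only)."""
    out = []
    for ch in phrase.lower():
        if not ch.isalpha():
            continue  # drop spaces, digits, and punctuation
        sym = _MERGE.get(ch, ch)
        if not out or out[-1] != sym:  # squeeze repeated symbols
            out.append(sym)
    return "".join(out)


def phonetic_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the coarse keys of two phrases."""
    return SequenceMatcher(None, phonetic_key(a), phonetic_key(b)).ratio()


def hard_negatives(target: str, candidates: list[str], k: int = 3) -> list[str]:
    """Return the k candidates that sound most like `target`, excluding the target."""
    pool = [c for c in candidates if c.lower() != target.lower()]
    ranked = sorted(pool, key=lambda c: phonetic_similarity(target, c), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    names = ["John", "Jean", "Boston", "Joan", "Madrid"]
    # "Joan" is the proper noun from the reference transcript; the rest are distractors.
    print(hard_negatives("Joan", names, k=2))
```

During training, the selected phrases would be presented as context items alongside the correct one, so that the model has to separate near-homophones rather than trivially different phrases.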


Pages: 6440-6444

Page count: 5
