CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Times cited: 0

Authors:

Alon, Uri [1,2]

Pundak, Golan [2]

Sainath, Tara N. [2]

Affiliations:

[1] Technion, Haifa, Israel

[2] Google Inc, Mountain View, CA USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

speech recognition; sequence-to-sequence models; phonetics; attention; biasing

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and to use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and to select the correct context items. We show that our proposed method gives up to 53.1% relative improvement in word error rate (WER) across several benchmarks.
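The core idea of pairing a proper noun from the reference transcript with phonetically similar distractor phrases can be illustrated with a minimal Python sketch. It assumes a hypothetical toy pronunciation lexicon (`LEXICON`, ARPAbet-style phones) and uses phone-level Levenshtein distance as the similarity measure. This measure is a stand-in chosen for illustration, not necessarily the phonetic similarity used by the authors, and the helper names `edit_distance`, `phones`, and `hard_negatives` are likewise hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy pronunciation lexicon (ARPAbet-style phones). A real system
# would use a full pronunciation dictionary or a grapheme-to-phoneme model.
LEXICON: Dict[str, List[str]] = {
    "john": ["JH", "AA", "N"],
    "jon": ["JH", "AA", "N"],
    "joan": ["JH", "OW", "N"],
    "june": ["JH", "UW", "N"],
    "paris": ["P", "EH", "R", "IH", "S"],
    "harris": ["HH", "EH", "R", "IH", "S"],
    "boston": ["B", "AO", "S", "T", "AH", "N"],
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def phones(phrase: str) -> List[str]:
    """Concatenate lexicon pronunciations; every word must be in LEXICON."""
    out: List[str] = []
    for word in phrase.lower().split():
        out.extend(LEXICON[word])
    return out


def hard_negatives(target: str, candidates: List[str], k: int = 2) -> List[str]:
    """Return the k candidates phonetically closest to target.

    The target itself is excluded; homophones with a different spelling are
    kept, since they are exactly the confusable items a context mechanism must
    learn to reject. Ties are broken alphabetically for determinism.
    """
    target_phones = phones(target)
    scored: List[Tuple[int, str]] = [
        (edit_distance(target_phones, phones(c)), c)
        for c in candidates
        if c.lower() != target.lower()
    ]
    scored.sort()
    return [c for _, c in scored[:k]]
```

For example, `hard_negatives("john", ["john", "jon", "joan", "june", "paris", "harris", "boston"], k=2)` returns `["jon", "joan"]`: the homophone "jon" is at distance 0 and "joan" at distance 1, while "paris", "harris", and "boston" are far away. Random negatives would mostly resemble the latter group, which is why phonetically close distractors make the selection task harder and the learned representations more discriminative.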


Pages: 6440–6444

Number of pages: 5
