CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Cited by: 0

Authors:

Alon, Uri [1, 2]

Pundak, Golan [2]

Sainath, Tara N. [2]

Affiliations:

[1] Technion, Haifa, Israel

[2] Google Inc, Mountain View, CA USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

speech recognition; sequence-to-sequence models; phonetics; attention; biasing;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and select the correct context items. We show that our proposed method gives up to 53.1% relative improvement in word error rate (WER) across several benchmarks.
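
The idea of choosing similar-sounding phrases as negatives can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes phonetic similarity, whereas this sketch substitutes character-level similarity from Python's standard `difflib` as a stand-in (a phoneme-level comparison would need a pronunciation lexicon). The function names, the example phrases, and the parameter `k` are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two strings.

    Assumption: character overlap is used here as a rough proxy for
    phonetic similarity; it is not a phonetic measure.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hard_negatives(target: str, candidates: list[str], k: int = 2) -> list[str]:
    """Return the k candidates most similar to the target, excluding the target."""
    pool = [c for c in candidates if c.lower() != target.lower()]
    ranked = sorted(pool, key=lambda c: similarity(target, c), reverse=True)
    return ranked[:k]


# Hypothetical proper noun from a reference transcript and a candidate pool.
proper_noun = "Joan"
candidate_phrases = ["John", "Jean", "Mountain View", "Joan", "Haifa"]

# The most confusable phrases become the difficult negative examples.
negatives = hard_negatives(proper_noun, candidate_phrases, k=2)
print(negatives)
```

In this toy pool the near-homophones rank above unrelated phrases, so they would be the ones presented to the model as distractors alongside the correct context item.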

Citation

Pages: 6440-6444

Number of pages: 5
