CONTEXTUAL SPEECH RECOGNITION WITH DIFFICULT NEGATIVE TRAINING EXAMPLES

Cited by: 0

Authors:

Alon, Uri [1,2]

Pundak, Golan [2]

Sainath, Tara N. [2]

Affiliations:

[1] Technion, Haifa, Israel

[2] Google Inc, Mountain View, CA USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

speech recognition; sequence-to-sequence models; phonetics; attention; biasing

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with difficult negative examples. The main idea is to focus on proper nouns (e.g., unique entities such as names of people and places) in the reference transcript and use phonetically similar phrases as negative examples, encouraging the neural model to learn more discriminative representations. We apply our approach to an end-to-end contextual ASR model that jointly learns to transcribe and select the correct context items. We show that our proposed method gives up to 53.1% relative improvement in word error rate (WER) across several benchmarks.

Citation

Pages: 6440-6444

Number of pages: 5

50 items in total

[21] Automatic Speech Recognition and Pronunciation Training
Xiao, Wenqi
PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON EDUCATION, ECONOMICS AND MANAGEMENT RESEARCH (ICEEMR 2018), 2018, 182 : 466 - 468
[22] TRAINING AND SEARCH METHODS FOR SPEECH RECOGNITION
JELINEK, F
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1995, 92 (22) : 9964 - 9969
[23] HMM speech recognition with reduced training
Foo, SW
Yap, T
ICICS - PROCEEDINGS OF 1997 INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, VOLS 1-3: THEME: TRENDS IN INFORMATION SYSTEMS ENGINEERING AND WIRELESS MULTIMEDIA COMMUNICATIONS, 1997, : 1016 - 1019
[24] Keyword Spotting for Google Assistant Using Contextual Speech Recognition
Michaely, Assaf Hurwitz
Zhang, Xuedong
Simko, Gabor
Parada, Carolina
Aleksic, Petar
2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 272 - 278
[25] The demiphone: An efficient contextual subword unit for continuous speech recognition
Mariño, JB
Nogueiras, A
Pachès-Leal, P
Bonafonte, A
SPEECH COMMUNICATION, 2000, 32 (03) : 187 - 197
[26] INTER-FRAME CONTEXTUAL MODELLING FOR VISUAL SPEECH RECOGNITION
Pass, Adrian
Ming, Ji
Hanna, Philip
Zhang, Jianguo
Stewart, Darryl
2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 93 - 96
[27] Automatic Speech Recognition Transformer with Global Contextual Information Decoder
Qian, Yukun
Zhuang, Xuyi
Wang, Mingjiang
INTERSPEECH 2023, 2023, : 4474 - 4478
[28] INTRODUCING CONTEXTUAL TRANSCRIPTION RULES IN LARGE VOCABULARY SPEECH RECOGNITION
Gravier, Guillaume
Yvon, Francois
Jacob, Bruno
Bimbot, Frederic
INTEGRATION OF PHONETIC KNOWLEDGE IN SPEECH TECHNOLOGY, 2005, 25 : 87 - 106
[29] Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Xu, Tianyi
Yang, Zhanheng
Huang, Kaixun
Guo, Pengcheng
Zhang, Ao
Li, Biao
Chen, Changru
Li, Chao
Xie, Lei
INTERSPEECH 2023, 2023, : 1668 - 1672
[30] DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION
Pundak, Golan
Sainath, Tara N.
Prabhavalkar, Rohit
Kannan, Anjuli
Zhao, Ding
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 418 - 425
