RNN TRANSDUCER MODELS FOR SPOKEN LANGUAGE UNDERSTANDING

Cited by: 7

Authors:

Thomas, Samuel [1]

Kuo, Hong-Kwang J. [1]

Saon, George [1]

Tuske, Zoltan [1]

Kingsbury, Brian [1]

Kurata, Gakuto [1]

Kons, Zvi [1]

Hoory, Ron [1]

Affiliations:

[1] IBM Res AI, Yorktown Hts, NY 10598 USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

spoken language understanding; automatic speech recognition

DOI:

10.1109/ICASSP39728.2021.9414029

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU). These end-to-end (E2E) models are constructed in three practical settings: a case where verbatim transcripts are available, a constrained case where the only available annotations are SLU labels and their values, and a more restrictive case where transcripts are available but not corresponding audio. We show how RNN-T SLU models can be developed starting from pre-trained automatic speech recognition (ASR) systems, followed by an SLU adaptation step. In settings where real audio data is not available, artificially synthesized speech is used to successfully adapt various SLU models. When evaluated on two SLU data sets, the ATIS corpus and a customer call center data set, the proposed models closely track the performance of other E2E models and achieve state-of-the-art results.


Pages: 7493-7497

Page count: 5
