RNN TRANSDUCER MODELS FOR SPOKEN LANGUAGE UNDERSTANDING

Cited by: 7
Authors
Thomas, Samuel [1]
Kuo, Hong-Kwang J. [1]
Saon, George [1]
Tuske, Zoltan [1]
Kingsbury, Brian [1]
Kurata, Gakuto [1]
Kons, Zvi [1]
Hoory, Ron [1]
Affiliations
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
spoken language understanding; automatic speech recognition
DOI
10.1109/ICASSP39728.2021.9414029
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU). These end-to-end (E2E) models are constructed in three practical settings: a case where verbatim transcripts are available, a constrained case where the only available annotations are SLU labels and their values, and a more restrictive case where transcripts are available but not corresponding audio. We show how RNN-T SLU models can be developed starting from pre-trained automatic speech recognition (ASR) systems, followed by an SLU adaptation step. In settings where real audio data is not available, artificially synthesized speech is used to successfully adapt various SLU models. When evaluated on two SLU data sets, the ATIS corpus and a customer call center data set, the proposed models closely track the performance of other E2E models and achieve state-of-the-art results.
Pages: 7493 - 7497
Number of pages: 5
Related papers
50 in total
  • [41] Chinese spoken language understanding in SHTQS
    毛家菊
    郭荣
    陆汝占
    Journal of Harbin Institute of Technology, 2005, (02) : 225 - 230
  • [42] Combining classifiers for spoken language understanding
    Karahan, M
    Hakkani-Tür, D
    Riccardi, G
    Tur, G
    ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 589 - 594
  • [43] Compositional Generalization in Spoken Language Understanding
    Ray, Avik
    Shen, Yilin
    Jin, Hongxia
    INTERSPEECH 2023, 2023, : 750 - 754
  • [44] Understanding spoken language through TalkBank
    Brian MacWhinney
    Behavior Research Methods, 2019, 51 : 1919 - 1927
  • [45] PARSING COORDINATION FOR SPOKEN LANGUAGE UNDERSTANDING
    Agarwal, Sanchit
    Goel, Rahul
    Chung, Tagyoung
    Sethi, Abhishek
    Mandal, Arindam
    Matsoukas, Spyros
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 677 - 684
  • [46] Understanding spoken language through TalkBank
    MacWhinney, Brian
    BEHAVIOR RESEARCH METHODS, 2019, 51 (04) : 1919 - 1927
  • [47] APHASIC DIFFICULTIES UNDERSTANDING SPOKEN LANGUAGE
    SCHUELL, H
    NEUROLOGY, 1953, 3 (03) : 176 - 184
  • [48] Recent advances in spoken language understanding
    De Mori, Renato
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 14 - 14
  • [49] TechWare: Spoken language understanding resources
    Conversational Systems Research Center, Microsoft Research, Mountain View, CA, United States
    Unknown
    IEEE Signal Process Mag, 2013, 3 (187-189):
  • [50] Spoken language understanding for social robotics
    Romero-Gonzalez, Cristina
    Martinez-Gomez, Jesus
    Garcia-Varea, Ismael
    2020 IEEE INTERNATIONAL CONFERENCE ON AUTONOMOUS ROBOT SYSTEMS AND COMPETITIONS (ICARSC 2020), 2020, : 152 - 157