RNN TRANSDUCER MODELS FOR SPOKEN LANGUAGE UNDERSTANDING

Cited by: 7
Authors
Thomas, Samuel [1 ]
Kuo, Hong-Kwang J. [1 ]
Saon, George [1 ]
Tuske, Zoltan [1 ]
Kingsbury, Brian [1 ]
Kurata, Gakuto [1 ]
Kons, Zvi [1 ]
Hoory, Ron [1 ]
Affiliations
[1] IBM Res AI, Yorktown Hts, NY 10598 USA
Keywords
spoken language understanding; automatic speech recognition
DOI
10.1109/ICASSP39728.2021.9414029
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU). These end-to-end (E2E) models are constructed in three practical settings: a case where verbatim transcripts are available, a constrained case where the only available annotations are SLU labels and their values, and a more restrictive case where transcripts are available but not corresponding audio. We show how RNN-T SLU models can be developed starting from pre-trained automatic speech recognition (ASR) systems, followed by an SLU adaptation step. In settings where real audio data is not available, artificially synthesized speech is used to successfully adapt various SLU models. When evaluated on two SLU data sets, the ATIS corpus and a customer call center data set, the proposed models closely track the performance of other E2E models and achieve state-of-the-art results.
Pages: 7493 - 7497
Number of pages: 5
Related papers
50 in total
  • [21] RNN-Transducer based Chinese Sign Language Recognition
    Gao, Liqing
    Li, Haibo
    Liu, Zhijian
    Liu, Zekang
    Wan, Liang
    Feng, Wei
    NEUROCOMPUTING, 2021, 434 : 45 - 54
  • [22] ON LANGUAGE MODEL INTEGRATION FOR RNN TRANSDUCER BASED SPEECH RECOGNITION
    Zhou, Wei
    Zheng, Zuoyun
    Schlueter, Ralf
    Ney, Hermann
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8407 - 8411
  • [23] Conditional models for detecting lambda-functions in a Spoken Language Understanding System
    Duvert, Frederic
    De Mori, Renato
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2434 - 2437
  • [24] Discriminative Spoken Language Understanding Using Statistical Machine Translation Alignment Models
    Aliannejadi, Mohammad
    Khadivi, Shahram
    Ghidary, Saeed Shiry
    Bokaei, Mohammad Hadi
    ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING, AISP 2013, 2014, 427 : 194 - +
  • [25] IMPROVING END-TO-END MODELS FOR SET PREDICTION IN SPOKEN LANGUAGE UNDERSTANDING
    Kuo, Hong-Kwang J.
    Tuske, Zoltan
    Thomas, Samuel
    Kingsbury, Brian
    Saon, George
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7162 - 7166
  • [26] Benchmarking Transformers-based models on French Spoken Language Understanding tasks
    Cattan, Oralie
    Ghannay, Sahar
    Servan, Christophe
    Rosset, Sophie
    INTERSPEECH 2022, 2022, : 1238 - 1242
  • [27] Predicting temporal performance drop of deployed production spoken language understanding models
    Do, Quynh
    Gaspers, Judith
    Sorokin, Daniil
    Lehnen, Patrick
    INTERSPEECH 2021, 2021, : 1249 - 1253
  • [28] SPOKEN LANGUAGE UNDERSTANDING FROM UNALIGNED DATA USING DISCRIMINATIVE CLASSIFICATION MODELS
    Mairesse, F.
    Gasic, M.
    Jurcicek, F.
    Keizer, S.
    Thomson, B.
    Yu, K.
    Young, S.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4749 - 4752
  • [29] Is ATIS too shallow to go deeper for benchmarking Spoken Language Understanding models?
    Bechet, Frederic
    Raymond, Christian
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3449 - 3453
  • [30] Chinese spoken language understanding in SHTQS
    Mao, Jia-Ju
    Guo, Rong
    Lu, Ru-Zhan
    Journal of Harbin Institute of Technology (New Series), 2005, 12 (02) : 225 - 230