Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding

Cited by: 0
Authors
Mdhaffar, Salima [1 ]
Pelloin, Valentin [2 ]
Caubriere, Antoine [1 ]
Laperriere, Gaelle [1]
Ghannay, Sahar [3 ]
Jabaian, Bassam [1 ]
Camelin, Nathalie [2 ]
Esteve, Yannick [1 ]
Affiliations
[1] Avignon Univ, LIA, Avignon, France
[2] Le Mans Univ, LIUM, Le Mans, France
[3] Univ Paris Saclay, CNRS, LISN, Paris, France
Funding
EU Horizon 2020
Keywords
Spoken Language Understanding; Slot Filling; Error Analysis; Self-supervised models;
DOI
Not available
Chinese Library Classification
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Pretrained models obtained through self-supervised learning have recently been introduced for both acoustic and language modeling. Applied to spoken language understanding tasks, these models have shown great potential by improving state-of-the-art performance on challenging benchmark datasets. In this paper, we present an analysis of the errors made by such models on the French MEDIA benchmark, known as one of the most challenging publicly available benchmarks for the slot-filling task. One year ago, the state-of-the-art system reached a Concept Error Rate (CER) of 13.6% with an end-to-end neural architecture. A few months later, a cascade approach based on the sequential use of a fine-tuned wav2vec 2.0 model and a fine-tuned BERT model reached a CER of 11.2%. This significant improvement raises questions about the types of errors that remain difficult to handle, but also about those that have been corrected by these models pretrained through self-supervised learning on large amounts of data. This study provides some answers, in order to better understand the limits of such models and to open new perspectives for further improving performance.
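A note on the metric: the Concept Error Rate quoted above is computed like a word error rate, but over sequences of semantic concepts instead of words, i.e. the Levenshtein distance between the reference and hypothesis concept sequences divided by the reference length. Below is a minimal Python sketch of this standard formulation; the concept labels in the example are invented for illustration and are not actual MEDIA annotations.

    def levenshtein(ref, hyp):
        """Minimum number of substitutions, insertions and deletions
        needed to turn the hypothesis sequence into the reference."""
        row = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            diag, row[0] = row[0], i
            for j, h in enumerate(hyp, 1):
                diag, row[j] = row[j], min(row[j] + 1,       # deletion
                                           row[j - 1] + 1,   # insertion
                                           diag + (r != h))  # substitution
        return row[-1]

    def concept_error_rate(ref_concepts, hyp_concepts):
        """CER = (substitutions + insertions + deletions) / reference length."""
        return levenshtein(ref_concepts, hyp_concepts) / len(ref_concepts)

    # Invented example: the hypothesis misses one of three reference concepts.
    reference = ["command", "localisation-ville", "nombre-chambre"]
    hypothesis = ["command", "nombre-chambre"]
    print(f"CER = {concept_error_rate(reference, hypothesis):.1%}")  # CER = 33.3%

The cascade approach mentioned in the abstract chains a fine-tuned speech model with a fine-tuned text model: wav2vec 2.0 transcribes the audio, then BERT labels the transcript with concept tags, framed as token classification. The sketch below only illustrates this general pattern with the Hugging Face transformers library; the "example/..." checkpoint names are hypothetical placeholders, not the models used in the paper.

    import torch
    from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                              Wav2Vec2ForCTC, Wav2Vec2Processor)

    # Stage 1: speech-to-text with a fine-tuned wav2vec 2.0 model.
    asr_processor = Wav2Vec2Processor.from_pretrained("example/wav2vec2-french-asr")
    asr_model = Wav2Vec2ForCTC.from_pretrained("example/wav2vec2-french-asr")

    def transcribe(waveform, sampling_rate=16_000):
        inputs = asr_processor(waveform, sampling_rate=sampling_rate,
                               return_tensors="pt")
        with torch.no_grad():
            logits = asr_model(inputs.input_values).logits
        ids = torch.argmax(logits, dim=-1)
        return asr_processor.batch_decode(ids)[0]

    # Stage 2: slot filling as token classification with a fine-tuned BERT.
    slu_tokenizer = AutoTokenizer.from_pretrained("example/bert-media-slots")
    slu_model = AutoModelForTokenClassification.from_pretrained("example/bert-media-slots")

    def label_slots(transcript):
        encoded = slu_tokenizer(transcript, return_tensors="pt")
        with torch.no_grad():
            logits = slu_model(**encoded).logits
        pred_ids = logits.argmax(dim=-1)[0].tolist()
        tokens = slu_tokenizer.convert_ids_to_tokens(encoded["input_ids"][0])
        return [(tok, slu_model.config.id2label[i])
                for tok, i in zip(tokens, pred_ids)]

One design trade-off worth noting: each stage of such a cascade can be fine-tuned on its own data, but transcription errors from the first stage propagate into the concept labels of the second, which is one reason an error analysis of the kind described above is informative.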
Pages: 2949-2956
Page count: 8