Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding

Citations: 0
Authors
Mdhaffar, Salima [1 ]
Pelloin, Valentin [2 ]
Caubriere, Antoine [1 ]
Laperriere, Gaelle
Ghannay, Sahar [3 ]
Jabaian, Bassam [1 ]
Camelin, Nathalie [2 ]
Esteve, Yannick [1 ]
Affiliations
[1] Avignon Univ, LIA, Avignon, France
[2] Le Mans Univ, LIUM, Le Mans, France
[3] Univ Paris Saclay, CNRS, LISN, Paris, France
Funding
EU Horizon 2020
Keywords
Spoken Language Understanding; Slot Filling; Error Analysis; Self-supervised models;
DOI
Not available
CLC Number
TP39 [Computer Applications]
Discipline Codes
081203; 0835
Abstract
Pretrained models obtained through self-supervised learning have recently been introduced for both acoustic and language modeling. Applied to spoken language understanding tasks, these models have shown great potential by improving state-of-the-art performance on challenging benchmark datasets. In this paper, we present an error analysis carried out using such models on the French MEDIA benchmark dataset, known as one of the most challenging benchmarks for the slot filling task among those accessible to the entire research community. One year ago, the state-of-the-art system reached a Concept Error Rate (CER) of 13.6% through the use of an end-to-end neural architecture. Some months later, a cascade approach based on the sequential use of a fine-tuned wav2vec 2.0 model and a fine-tuned BERT model reached a CER of 11.2%. This significant improvement raises questions about the types of errors that remain difficult to handle, but also about those that have been corrected by these models pretrained through self-supervised learning on large amounts of data. This study brings some answers in order to better understand the limits of such models and opens new perspectives for continuing to improve performance.
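For reference, the Concept Error Rate reported in the abstract is conventionally computed like a word error rate, but over the sequences of semantic concept (slot) labels rather than words: the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis concept sequence into the reference one, divided by the number of reference concepts. A minimal sketch of such a metric (the slot labels used below are purely illustrative, not taken from MEDIA):

```python
def concept_error_rate(reference, hypothesis):
    """Levenshtein-based error rate over concept (slot) label sequences:
    (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = minimal edit cost between reference[:i] and hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # all reference concepts deleted
    for j in range(m + 1):
        dp[0][j] = j  # all hypothesis concepts inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[n][m] / max(n, 1)

# Hypothetical slot sequences for one turn of a hotel-booking dialogue
ref = ["command", "hotel-name", "date", "room-count"]
hyp = ["command", "date", "room-count", "price"]
print(f"CER = {concept_error_rate(ref, hyp):.2%}")  # one deletion + one insertion -> 50.00%
```

In practice, published MEDIA scores are produced with dedicated alignment-based scoring tools, but the underlying edit-distance definition is the same as in this sketch.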
Pages: 2949-2956
Page count: 8