Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding

Cited: 0
Authors
Mdhaffar, Salima [1 ]
Pelloin, Valentin [2 ]
Caubriere, Antoine [1 ]
Laperriere, Gaelle
Ghannay, Sahar [3 ]
Jabaian, Bassam [1 ]
Camelin, Nathalie [2 ]
Esteve, Yannick [1 ]
Affiliations
[1] Avignon Univ, LIA, Avignon, France
[2] Le Mans Univ, LIUM, Le Mans, France
[3] Univ Paris Saclay, CNRS, LISN, Paris, France
Funding
EU Horizon 2020
Keywords
Spoken Language Understanding; Slot Filling; Error Analysis; Self-supervised models;
DOI
None available
Chinese Library Classification
TP39 [Computer Applications];
Discipline Classification Codes
081203; 0835;
Abstract
Models pretrained through self-supervised learning have recently been introduced for both acoustic and language modeling. Applied to spoken language understanding tasks, these models have shown great potential by improving state-of-the-art performance on challenging benchmark datasets. In this paper, we present an error analysis of such models on the French MEDIA benchmark dataset, known as one of the most challenging publicly accessible benchmarks for the slot-filling task. One year ago, the state-of-the-art system reached a Concept Error Rate (CER) of 13.6% through the use of an end-to-end neural architecture. Some months later, a cascade approach based on the sequential use of a fine-tuned wav2vec 2.0 model and a fine-tuned BERT model reached a CER of 11.2%. This significant improvement raises questions about the types of errors that remain difficult to handle, as well as those that have been corrected by these models pretrained through self-supervised learning on large amounts of data. This study provides some answers to better understand the limits of such models and opens new perspectives for further improving performance.
Pages: 2949-2956
Page count: 8
Related Papers
50 items total
  • [41] LEVERAGING ACOUSTIC AND LINGUISTIC EMBEDDINGS FROM PRETRAINED SPEECH AND LANGUAGE MODELS FOR INTENT CLASSIFICATION
    Sharma, Bidisha
    Madhavi, Maulik
    Li, Haizhou
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7498 - 7502
  • [42] Combining Statistical and Syntactical Systems for Spoken Language Understanding with Graphical Models
    Schwaerzler, S.
    Geiger, J.
    Schenk, J.
    Al-Hames, M.
    Hoernler, B.
    Ruske, G.
    Rigoll, G.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1590 - 1593
  • [43] Impact of web based language modeling on speech understanding
    Sarikaya, R
    Kuo, HKJ
    Gao, YQ
    2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 268 - 271
  • [44] A Self-Attentive Model with Gate Mechanism for Spoken Language Understanding
    Li, Changliang
    Li, Liang
    Qi, Ji
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3824 - 3833
  • [45] Multimodality Self-distillation for Fast Inference of Vision and Language Pretrained Models
    Kong, Jun
    Wang, Jin
    Yu, Liang-Chih
    Zhang, Xuejie
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8928 - 8940
  • [46] BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
    Lavechin, Marvin
    Sy, Yaya
    Titeux, Hadrien
    Blandon, Maria Andrea Cruz
    Rasanen, Okko
    Bredin, Herve
    Dupoux, Emmanuel
    Cristia, Alejandrina
    INTERSPEECH 2023, 2023, : 4588 - 4592
  • [47] A Self-Supervised Integration Method of Pretrained Language Models and Word Definitions
    Jo, Hwiyeol
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 14 - 26
  • [48] On the use of structures for spoken language understanding: A two-step approach
    Jeong, Minwoo
    Lee, Gary Geunbae
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (05) : 1552 - 1561
  • [49] The Use of Clinical Language Models Pretrained on Institutional EHR Data for Downstream Tasks
    Suvirat, Kerdkiat
    Chairat, Sawrawit
    Horsiritham, Kanakorn
    Ingviya, Thammasin
    Kongkamol, Chanon
    Chaichulee, Sitthichok
    2024 21ST INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING, JCSSE 2024, 2024, : 648 - 655
  • [50] New Perspectives on Spoken Language Understanding: Does Machine Need to Fully Understand Speech?
    Kawahara, Tatsuya
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 46 - 50