On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Cited by: 0
Authors
Arora, Siddhant [1 ]
Pasad, Ankita [2 ]
Chien, Chung-Ming [2 ]
Han, Jionghao [1 ]
Sharma, Roshan [1 ]
Jung, Jee-weon [1 ]
Dhamyal, Hira [1 ]
Chen, William [1 ]
Shon, Suwon [3]
Lee, Hung-yi [4 ]
Livescu, Karen [2 ]
Watanabe, Shinji [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Toyota Technol Inst Chicago, Chicago, IL USA
[3] ASAPP, New York, NY USA
[4] Natl Taiwan Univ, Taipei, Taiwan
Funding
U.S. National Science Foundation
Keywords
RECOGNITION;
Abstract
The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFMs) for these SLU tasks. However, the community still lacks a fine-grained understanding of the comparative utility of different SFMs. Inspired by this, we ask: which SFMs offer the most benefits for these complex SLU tasks, and what is the most effective approach for incorporating them? To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs under several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head. Although the supervised SFMs are pre-trained on much more labeled speech recognition data, they do not always outperform self-supervised SFMs; the latter tend to perform at least as well as, and sometimes better than, supervised SFMs, especially on the sequence generation tasks in SLUE. While no single way of incorporating SFMs is universally optimal, the complex prediction head gives the best performance for most tasks, although it increases inference time. We also introduce SLUE-PERB, an open-source toolkit and performance leaderboard for these tasks and modeling strategies.
Pages: 11923-11938
Number of pages: 16
Related papers
50 records in total
  • [21] Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation. Deng, Keqi; Watanabe, Shinji; Shi, Jiatong; Arora, Siddhant. INTERSPEECH 2022, 2022: 1746-1750.
  • [22] On-line Adaptation of Semantic Models for Spoken Language Understanding. Bayer, Ali Orkan; Riccardi, Giuseppe. 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2013: 90-95.
  • [23] Speech-Language Pre-training for End-to-End Spoken Language Understanding. Qian, Yao; Bian, Ximo; Shi, Yu; Kanda, Naoyuki; Shen, Leo; Xiao, Zhen; Zeng, Michael. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021: 7458-7462.
  • [24] Training Spoken Language Understanding Systems with Non-Parallel Speech and Text. Sari, Leda; Thomas, Samuel; Hasegawa-Johnson, Mark. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 8109-8113.
  • [25] A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech. Wang, Pu; BabaAli, Bagher; Van Hamme, Hugo. INTERSPEECH 2021, 2021: 36-40.
  • [26] Adapting Dependency Parsing to Spontaneous Speech for Open Domain Spoken Language Understanding. Bechet, Frederic; Nasr, Alexis; Favre, Benoit. 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014), Vols 1-4, 2014: 135-139.
  • [27] An Evaluation Framework for Natural Language Understanding in Spoken Dialogue Systems. Gordon, Joshua B.; Passonneau, Rebecca J. LREC 2010 - Seventh International Conference on Language Resources and Evaluation, 2010: 72-77.
  • [28] The Sogou Spoken Language Understanding System for the NLPCC 2018 Evaluation. Gong, Neng; Shen, Tongtong; Wang, Tianshu; Qi, Diandian; Li, Meng; Wang, Jia; Li, Chi-Ho. Natural Language Processing and Chinese Computing, Pt I, 2018, 11108: 454-463.
  • [29] Spoken Language Understanding on the Edge. Saade, Alaa; Dureau, Joseph; Leroy, David; Caltagirone, Francesco; Coucke, Alice; Ball, Adrien; Doumouro, Clement; Lavril, Thibaut; Caulier, Alexandre; Bluche, Theodore; Gisselbrecht, Thibault; Primet, Mael. Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS 2019), 2019: 57-61.
  • [30] Spoken Language Understanding: A Survey. De Mori, Renato. 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, Vols 1 and 2, 2007: 365-376.