A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech

Cited by: 7
Authors
Wang, Pu [1 ]
BabaAli, Bagher [2 ]
Van Hamme, Hugo [1 ]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT, Leuven, Belgium
[2] Univ Tehran, Coll Sci, Sch Math Stat & Comp Sci, Tehran, Iran
Source
INTERSPEECH 2021
Keywords
dysarthric speech; spoken language understanding; pre-training; capsule networks; recognition
DOI
10.21437/Interspeech.2021-1720
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
End-to-end (E2E) spoken language understanding (SLU) systems avoid an intermediate textual representation by mapping speech directly to intents with slot values. This approach requires considerable domain-specific training data, which is a major concern in low-resource scenarios such as the present study on SLU for dysarthric speech. Pre-training part of the SLU model with automatic speech recognition (ASR) targets helps, but no research has shown to what extent SLU on dysarthric speech benefits from knowledge transferred from other dysarthric speech tasks. This paper investigates the effectiveness of pre-training strategies for SLU tasks on dysarthric speech. The designed SLU system consists of a time-delay neural network (TDNN) acoustic model for feature encoding and a capsule network for intent and slot decoding. The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech and fine-tuning on a mixture of dysarthric and normal speech. By introducing the intelligibility score as a metric of impairment severity, the paper quantitatively analyzes the relation between generalization and pathology severity for dysarthric speech.
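The abstract outlines a concrete architecture (a TDNN acoustic encoder feeding a capsule-network intent/slot decoder) and a two-stage pre-training schedule. Below is a minimal PyTorch sketch of such a pipeline; the layer sizes, the mel-filterbank input, the average pooling, and the `train_asr`/data-loader helpers are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the SLU pipeline described in the abstract (assumed
# hyper-parameters and helpers throughout; not the authors' implementation).
import torch
import torch.nn as nn


class TDNNEncoder(nn.Module):
    """Time-delay neural network: stacked dilated 1-D convolutions over time."""

    def __init__(self, n_mels=40, hidden=512, dilations=(1, 2, 3, 1)):
        super().__init__()
        layers, in_ch = [], n_mels
        for d in dilations:
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3, dilation=d),
                       nn.ReLU(),
                       nn.BatchNorm1d(hidden)]
            in_ch = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, feats):          # feats: (batch, n_mels, time)
        return self.net(feats)         # -> (batch, hidden, time')


def squash(s, dim=-1):
    """Capsule non-linearity: keeps a vector's direction, bounds its length in [0, 1)."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / (norm2.sqrt() + 1e-8)


class CapsuleDecoder(nn.Module):
    """One output capsule per intent/slot label; the capsule's length is read
    out as the probability that the label is present in the utterance."""

    def __init__(self, hidden=512, n_labels=30, caps_dim=16):
        super().__init__()
        self.proj = nn.Linear(hidden, n_labels * caps_dim)
        self.n_labels, self.caps_dim = n_labels, caps_dim

    def forward(self, enc):                      # enc: (batch, hidden, time')
        pooled = enc.mean(dim=-1)                # average pooling over time (assumption)
        caps = self.proj(pooled).view(-1, self.n_labels, self.caps_dim)
        return squash(caps).norm(dim=-1)         # (batch, n_labels), values in [0, 1)


encoder, decoder = TDNNEncoder(), CapsuleDecoder()

# Two-stage pre-training of the acoustic model, as described in the abstract
# (train_asr and the loaders are hypothetical helpers):
#   Stage 1: initialize the encoder on a corpus of normal speech.
# train_asr(encoder, normal_speech_loader)
#   Stage 2: fine-tune on a mixture of dysarthric and normal speech.
# train_asr(encoder, mixed_speech_loader)

# After pre-training, the capsule decoder is trained on the SLU labels:
label_probs = decoder(encoder(torch.randn(2, 40, 200)))  # dummy feature batch
print(label_probs.shape)                                 # torch.Size([2, 30])
```

Reading label presence off capsule lengths follows the usual capsule-network convention; the paper's decoder may additionally use routing between capsule layers, which this sketch omits.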
Pages: 36-40
Number of pages: 5
Related papers
50 records in total
  • [31] Pre-training Universal Language Representation
    Li, Yian
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5122 - 5133
  • [32] MarkupLM: Pre-training of Text and Markup Language for Visually Rich Document Understanding
    Li, Junlong
    Xu, Yiheng
    Cui, Lei
    Wei, Furu
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 6078 - 6087
  • [33] SPOKEN LANGUAGE UNDERSTANDING WITHOUT SPEECH RECOGNITION
    Chen, Yuan-Ping
    Price, Ryan
    Bangalore, Srinivas
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6189 - 6193
  • [34] Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding
    Li, Shiyang
    Yavuz, Semih
    Chen, Wenhu
    Yan, Xifeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1006 - 1015
  • [35] Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
    Zhang, Wangyou
    Qian, Yanmin
    INTERSPEECH 2023, 2023, : 3517 - 3521
  • [36] ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks
    Pelloin, Valentin
    Dary, Franck
    Herve, Nicolas
    Favre, Benoit
    Camelin, Nathalie
    Laurent, Antoine
    Besacier, Laurent
    INTERSPEECH 2022, 2022, : 3453 - 3457
  • [37] Neural speech enhancement with unsupervised pre-training and mixture training
    Hao, Xiang
    Xu, Chenglin
    Xie, Lei
    NEURAL NETWORKS, 2023, 158 : 216 - 227
  • [38] TOWARDS REDUCING THE NEED FOR SPEECH TRAINING DATA TO BUILD SPOKEN LANGUAGE UNDERSTANDING SYSTEMS
    Thomas, Samuel
    Kuo, Hong-Kwang J.
    Kingsbury, Brian
    Saon, George
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7932 - 7936
  • [39] SELF-TRAINING AND PRE-TRAINING ARE COMPLEMENTARY FOR SPEECH RECOGNITION
    Xu, Qiantong
    Baevski, Alexei
    Likhomanenko, Tatiana
    Tomasello, Paden
    Conneau, Alexis
    Collobert, Ronan
    Synnaeve, Gabriel
    Auli, Michael
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3030 - 3034
  • [40] PreQR: Pre-training Representation for SQL Understanding
    Tang, Xiu
    Wu, Sai
    Song, Mingli
    Ying, Shanshan
    Li, Feifei
    Chen, Gang
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 204 - 216