A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech

Cited by: 7
Authors
Wang, Pu [1]
BabaAli, Bagher [2]
Van Hamme, Hugo [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT, Leuven, Belgium
[2] Univ Tehran, Coll Sci, Sch Math Stat & Comp Sci, Tehran, Iran
Source
Keywords
dysarthric speech; spoken language understanding; pre-training; capsule networks; recognition
DOI
10.21437/Interspeech.2021-1720
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
End-to-end (E2E) spoken language understanding (SLU) systems avoid an intermediate textual representation by mapping speech directly into intents with slot values. This approach requires considerable domain-specific training data. In low-resource scenarios this is a major concern, e.g., in the present study dealing with SLU for dysarthric speech. Pre-training part of the SLU model on automatic speech recognition targets helps, but no research has shown to what extent SLU on dysarthric speech benefits from knowledge transferred from other dysarthric speech tasks. This paper investigates the efficiency of pre-training strategies for SLU tasks on dysarthric speech. The designed SLU system consists of a TDNN acoustic model for feature encoding and a capsule network for intent and slot decoding. The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech and fine-tuning on a mixture of dysarthric and normal speech. By introducing the intelligibility score as a metric of the impairment severity, this paper quantitatively analyzes the relation between generalization and pathology severity for dysarthric speech.
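The training order described in the abstract can be written down as a small data structure. The following is only a sketch, not the authors' code: the `Stage` class, the stage names, and the part names `tdnn_encoder` and `capsule_decoder` are hypothetical labels. Two points are assumptions rather than statements from the abstract: that both pre-training stages use ASR targets, and that only the capsule decoder is updated in the final SLU stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str        # hypothetical label for the stage
    data: tuple      # kinds of speech used in this stage
    objective: str   # training target for this stage
    trains: tuple    # model parts updated in this stage


def build_plan():
    """Return the training stages in the order the abstract lists them."""
    return [
        # Stage 1: initialize the acoustic model on normal speech.
        # Assumption: ASR targets are used here.
        Stage("initialize", ("normal",), "ASR", ("tdnn_encoder",)),
        # Stage 2: fine-tune on a mixture of dysarthric and normal speech.
        # Assumption: ASR targets are used here as well.
        Stage("fine-tune", ("dysarthric", "normal"), "ASR", ("tdnn_encoder",)),
        # Stage 3: train the SLU task on dysarthric speech.
        # Assumption: the abstract does not say whether the encoder is
        # frozen at this point; only the decoder is listed as trained.
        Stage("slu", ("dysarthric",), "intent+slots", ("capsule_decoder",)),
    ]


plan = build_plan()
for stage in plan:
    print(f"{stage.name}: data={'+'.join(stage.data)} objective={stage.objective}")
```

Keeping the plan as plain data makes it easy to compare alternative strategies, for example dropping the second stage to measure how much the mixed-speech fine-tuning contributes.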
Pages: 36 - 40
Number of pages: 5