CFDRN: A Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition

Authors
Lin, Yuqin [1]
Wang, Longbiao [1,2]
Yang, Yanbing [1]
Dang, Jianwu [1]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin 300350, Peoples R China
[2] Huiyan Technol Tianjin Co Ltd, Tianjin 300350, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptation; automatic speech recognition; dysarthria; AUDITORY-CORTEX; OSCILLATIONS; ADAPTATION; PROGRESS;
DOI
10.1109/TASLP.2023.3319276
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
As an essential technology in human-computer interaction, automatic speech recognition (ASR) makes life more convenient for healthy people; however, people with speech disorders, who truly need support from such a technology, have difficulty using ASR. Disordered ASR is challenging because of the large variability in disordered speech. Humans tend to process different spectro-temporal features of speech separately in the left and right hemispheres of the brain, and they show significantly better speech perception than machines, especially for disordered speech. Inspired by human speech processing, this article proposes a cognition-inspired feature decomposition and recombination network (CFDRN) for dysarthric ASR. In the CFDRN, slow- and rapid-varying temporal processors are designed to decompose features into stable and changeable features, respectively. A gated fusion module is developed to selectively recombine the decomposed features. Moreover, this study uses an adaptation approach based on unsupervised pre-training techniques to alleviate data scarcity in dysarthric ASR. The CFDRNs are added to the layers of the pre-trained model, and the entire model is adapted from normal speech to disordered speech. The effectiveness of the proposed method was validated on the widely used TORGO and UASpeech dysarthria datasets under three popular unsupervised pre-training techniques: wav2vec 2.0, HuBERT, and data2vec. Compared with the baseline methods, the proposed CFDRN with the three pre-training techniques achieved word error rate reductions of 13.73% to 16.23% on TORGO and 4.50% to 13.20% on UASpeech. Furthermore, this study clarifies several major factors affecting dysarthric ASR performance.
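The abstract does not give the internal design of the temporal processors or the gated fusion module, so the following is only a minimal illustrative sketch of the general idea (decompose features into a slowly varying part and a rapidly varying part, then recombine them with a learned gate). Everything in it is an assumption made for illustration: the moving-average split, the function names `decompose` and `gated_fusion`, the `window` parameter, and the single linear gate are hypothetical and are not taken from the paper.

```python
import numpy as np


def decompose(features, window=5):
    """Split (time, dim) features into slow and rapid parts.

    Assumption: a moving average over time stands in for a
    "slow-varying" processor, and the residual stands in for the
    "rapid-varying" one. `window` must be odd.
    """
    pad = window // 2
    # Repeat edge frames so the smoothed output keeps the input length.
    padded = np.pad(features, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    slow = np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid")
         for d in range(features.shape[1])],
        axis=1,
    )
    rapid = features - slow
    return slow, rapid


def gated_fusion(slow, rapid, weight, bias):
    """Recombine the two streams with an element-wise sigmoid gate.

    Assumption: the gate is one linear layer over the concatenated
    streams; `weight` has shape (2 * dim, dim), `bias` has shape (dim,).
    """
    gate_input = np.concatenate([slow, rapid], axis=1) @ weight + bias
    gate = 1.0 / (1.0 + np.exp(-gate_input))
    # gate near 1 favours the stable stream, near 0 the changeable one.
    return gate * slow + (1.0 - gate) * rapid


# Usage with random stand-in features: 20 frames, 8 dimensions.
rng = np.random.default_rng(0)
features = rng.standard_normal((20, 8))
slow, rapid = decompose(features, window=5)
weight = rng.standard_normal((16, 8)) * 0.1
bias = np.zeros(8)
fused = gated_fusion(slow, rapid, weight, bias)
```

In this sketch the two streams sum back to the input, so the gate only redistributes emphasis between them; a trained model would learn `weight` and `bias` during adaptation rather than sampling them at random.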
Pages: 3824-3836
Number of pages: 13