Strategy for developing a speech recognition model specialized for patients with depression or Parkinson's disease with small size speech database

Cited by: 0
Authors
Yoon, Seojin [1]
Maeng, Seri [2]
Kim, Ryul [3]
Lee, Sangmin [1]
Affiliations
[1] Inha Univ, Dept Elect Engn, Incheon 22212, South Korea
[2] Inha Univ, Inha Univ Hosp, Coll Med, Dept Psychiat, Incheon 22332, South Korea
[3] Inha Univ, Inha Univ Hosp, Coll Med, Dept Neurol, Incheon 22332, South Korea
Keywords
Speech recognition; Depression; Parkinson's disease; Deep learning
DOI
10.1007/s13534-024-00389-w
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Most speech recognition models currently in use have been developed for the speech of normal subjects. The speech recognition rate for patients with depression or Parkinson's disease (PD), whose speech characteristics differ from those of normal subjects, is lower than that for normal subjects. This study explores a model that improves speech recognition accuracy for individuals with depression or PD, with the aim of providing them with more accurate service. Considering the speech features of patients with depression or PD, we designed the model on the assumption that understanding the overall meaning and context of speech by using global information, rather than local information, is more effective in improving recognition accuracy. We propose the m-Globalformer, a model based on the Globalformer architecture, which combines the squeeze-and-excitation (SE) module with the Transformer. The m-Globalformer enhances the use of global information by modifying the base SE module. Because speech data from these patients are limited, the model employs a pre-training and fine-tuning strategy: it was first trained on a large-scale normal speech dataset and then fine-tuned on a small-scale dataset of speech from patients with depression or PD. In our experiments, the m-Globalformer demonstrated superior performance, achieving character error rates (CER) of 11.28% for depression and 19.67% for PD.
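For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the reference transcript and the recognizer output, divided by the number of reference characters. The following is a minimal sketch of that conventional definition, not the authors' evaluation code. The function names `edit_distance` and `cer` are hypothetical, and the abstract does not say how whitespace or punctuation were normalized, so counting every character as-is is an assumption here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Assumption: all characters, including spaces, are counted unchanged.
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a reported CER of 11.28% corresponds to roughly 11 character edits per 100 reference characters. Because insertions are counted, CER can exceed 100% when the output is much longer than the reference.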
Pages: 1049-1055
Number of pages: 7
Related papers
50 in total
  • [1] Effective speech recognition system for patients with Parkinson's disease
    Bak, Huiyong
    Kim, Ryul
    Lee, Sangmin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2022, 41 (06): : 655 - 661
  • [2] Neurocomputational model of speech recognition for pathological speech detection: a case study on Parkinson's disease speech detection
    Hovsepyan, Sevada
    Magimai-Doss, Mathew
    INTERSPEECH 2024, 2024, : 3590 - 3594
  • [4] Speech disorders in patients with Parkinson's disease
    Kerschan, K
    Pankl, W
    Auff, E
    WIENER KLINISCHE WOCHENSCHRIFT, 1998, 110 (08) : 279 - 286
  • [5] Application for detecting depression, Parkinson's disease and dysphonic speech
    Kiss, Gabor
    Sztaho, David
    Tulles, Miklos Gabriel
    INTERSPEECH 2021, 2021, : 956 - 957
  • [6] Parkinson's Disease Recognition by Speech Acoustic Parameters Classification
    Meghraoui, D.
    Boudraa, B.
    Merazi-Meksen, T.
    Boudraa, M.
    MODELLING AND IMPLEMENTATION OF COMPLEX SYSTEMS, MISC 2016, 2016, : 165 - 173
  • [7] The effect of levodopa on speech in patients with Parkinson's disease
    Mrackova, M.
    Marecek, R.
    Mekyska, J.
    Kostalova, M.
    Rektorova, R.
    MOVEMENT DISORDERS, 2022, 37 : S86 - S86
  • [8] Speech dysfluency characteristics in patients with Parkinson's disease
    Manor, Y
    Patel, S
    Menachemi, M
    Shabtai, H
    Ezrati-Vinacour, R
    Giladi, N
    MOVEMENT DISORDERS, 2002, 17 : S82 - S82
  • [9] Automatic Speech Recognition in Noise for Parkinson's Disease: A Pilot Study
    Goudarzi, Alireza
    Moya-Gale, Gemma
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2021, 4
  • [10] Machine Learning Applied to Speech Recordings for Parkinson's Disease Recognition
    Aversano, Lerina
    Bernardi, Mario L.
    Cimitile, Marta
    Iammarino, Martina
    Madau, Antonella
    Verdone, Chiara
    DEEP LEARNING THEORY AND APPLICATIONS, DELTA 2023, 2023, 1875 : 101 - 114