Unraveling the complexities of pathological voice through saliency analysis

Cited by: 1
Authors
Shaikh, Abdullah Abdul Sattar [1]
Bhargavi, M. S. [1]
Naik, Ganesh R. [2]
Affiliations
[1] Bangalore Inst Technol, Dept Comp Sci & Engn, Bangalore 560004, Karnataka, India
[2] Flinders Univ S Australia, Adelaide Inst Sleep Hlth, Adelaide, SA 5042, Australia
Keywords
Pathological voice; Saliency analysis; Autoencoders; Multi-class classification; UNet++; AUTOMATIC DETECTION; CLASSIFICATION; SPEECH; IMPAIRMENTS; FEATURES; HEALTHY
DOI
10.1016/j.compbiomed.2023.107566
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
The human voice is an essential communication tool, but various disorders and habits can disrupt it, which makes the diagnosis of pathological and abnormal voices important. Conventional diagnosis of voice pathologies can be invasive and costly, whereas Artificial Intelligence and computer-aided classification tools can detect them effectively. Previous studies focused primarily on binary classification and paid limited attention to multi-class classification. This study proposes three neural network architectures to investigate the feature characteristics of three voice pathologies (Hyperkinetic Dysphonia, Hypokinetic Dysphonia, and Reflux Laryngitis) and of healthy voices, using multi-class classification on the Voice ICar fEDerico II (VOICED) dataset. To overcome noisy data, the study proposes UNet++ autoencoder-based denoiser techniques for accurate feature extraction. The architectures are a Multi-Layer Perceptron (MLP) trained on structured feature sets, a Short-Time Fourier Transform (STFT) model, and a Mel-Frequency Cepstral Coefficients (MFCC) model. The MLP model, trained on 143 features, achieved 97.1% accuracy, while the STFT model showed similar performance with an increased sensitivity of 99.8%. The MFCC model maintained 97.1% accuracy with a smaller model size and improved accuracy on the Reflux Laryngitis class. Saliency analysis identifies the crucial features and reveals that detecting voice abnormalities requires identifying regions of inaudible high-pitch sounds. The study also highlights the challenges posed by limited and disjointed pathological voice databases and proposes solutions for improving voice abnormality classification. Overall, the findings have potential uses in clinical settings and in specialized audio-capturing tools.
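The abstract does not say which saliency method the study uses. As a rough illustration only, the sketch below assumes plain gradient saliency: the absolute gradient of one class's log-probability with respect to each input feature of a small MLP. The 143 features and 4 classes come from the abstract; the hidden size, the random weights, the input vector, and the function names `mlp_forward` and `saliency` are hypothetical stand-ins, not the authors' model.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Return (hidden activations, class probabilities) for a one-hidden-layer MLP."""
    h = np.tanh(W1 @ x + b1)
    logits = W2 @ h + b2
    e = np.exp(logits - logits.max())  # shift logits for numerical stability
    return h, e / e.sum()

def saliency(x, W1, b1, W2, b2, target):
    """Absolute gradient of log p[target] with respect to each input feature."""
    h, p = mlp_forward(x, W1, b1, W2, b2)
    onehot = np.zeros_like(p)
    onehot[target] = 1.0
    dlogits = onehot - p               # d log p[target] / d logits
    dh = W2.T @ dlogits                # backpropagate through the output layer
    dpre = dh * (1.0 - h ** 2)         # derivative of tanh
    return np.abs(W1.T @ dpre)         # backpropagate to the input features

# Hypothetical setup: random weights stand in for a trained classifier.
rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 143, 16, 4
W1 = rng.normal(scale=0.1, size=(n_hidden, n_features))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_classes, n_hidden))
b2 = np.zeros(n_classes)

x = rng.normal(size=n_features)        # one made-up feature vector
scores = saliency(x, W1, b1, W2, b2, target=2)
top5 = np.argsort(scores)[::-1][:5]    # indices of the five most salient features
print("Most salient feature indices:", top5)
```

With a trained model, features with consistently high scores across many recordings would be the candidates for the "crucial features" the abstract mentions.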
Pages: 16