Unraveling the complexities of pathological voice through saliency analysis

Cited by: 1
Authors
Shaikh, Abdullah Abdul Sattar [1]
Bhargavi, M. S. [1]
Naik, Ganesh R. [2]
Affiliations
[1] Bangalore Inst Technol, Dept Comp Sci & Engn, Bangalore 560004, Karnataka, India
[2] Flinders Univ S Australia, Adelaide Inst Sleep Hlth, Adelaide, SA 5042, Australia
Keywords
Pathological voice; Saliency analysis; Autoencoders; Multi-class classification; UNet++; AUTOMATIC DETECTION; CLASSIFICATION; SPEECH; IMPAIRMENTS; FEATURES; HEALTHY
DOI
10.1016/j.compbiomed.2023.107566
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
The human voice is an essential communication tool, but various disorders and habits can disrupt it. Diagnosing pathological and abnormal voices is important, yet conventional diagnosis of voice pathologies can be invasive and costly. Artificial Intelligence and computer-aided voice pathology classification tools can detect these disorders effectively. Previous studies focused primarily on binary classification and paid limited attention to multi-class classification. This study proposes three neural network architectures to investigate the feature characteristics of three voice pathologies (Hyperkinetic Dysphonia, Hypokinetic Dysphonia, and Reflux Laryngitis) and healthy voices, using multi-class classification on the Voice ICar fEDerico II (VOICED) dataset. To cope with noisy data, the study proposes UNet++ autoencoder-based denoising techniques for accurate feature extraction. The architectures are a Multi-Layer Perceptron (MLP) trained on structured feature sets, a Short-Time Fourier Transform (STFT) model, and a Mel-Frequency Cepstral Coefficients (MFCC) model. The MLP model, trained on 143 features, achieved 97.1% accuracy, while the STFT model showed similar performance with an increased sensitivity of 99.8%. The MFCC model maintained 97.1% accuracy with a smaller model size and improved accuracy on the Reflux Laryngitis class. Saliency analysis identifies the crucial features and reveals that detecting voice abnormalities requires identifying regions of inaudible high-pitch sounds. The study also highlights the challenges posed by limited and disjointed pathological voice databases and proposes solutions for improving voice abnormality classification. Overall, the findings have potential applications in clinical settings and in specialized audio-capturing tools.
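The abstract does not state which saliency method the authors used. As a minimal sketch of the general idea, assuming plain gradient saliency on a one-hidden-layer MLP, the example below scores each input feature by the absolute gradient of one class logit with respect to that feature. The weights, the three-feature input, and the function names are hypothetical toy values chosen for illustration; only the four output classes (three pathologies plus healthy) mirror the abstract.

```python
import math

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP with tanh activation."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return hidden, logits

def gradient_saliency(x, W1, b1, W2, b2, target):
    """Absolute gradient of the target-class logit w.r.t. each input feature."""
    hidden, _ = mlp_forward(x, W1, b1, W2, b2)
    scores = []
    for i in range(len(x)):
        # Chain rule: d logit_t / d x_i = sum_j W2[t][j] * (1 - h_j^2) * W1[j][i]
        grad = sum(W2[target][j] * (1.0 - hidden[j] ** 2) * W1[j][i]
                   for j in range(len(hidden)))
        scores.append(abs(grad))
    return scores

# Hypothetical toy model: 3 input features, 2 hidden units, 4 classes.
# The third feature has zero input weights, so the model ignores it.
W1 = [[0.5, -1.0, 0.0],
      [0.2, 0.3, 0.0]]
b1 = [0.0, 0.1]
W2 = [[1.0, -0.5],
      [0.3, 0.8],
      [-0.7, 0.2],
      [0.1, 0.1]]
b2 = [0.0, 0.0, 0.0, 0.0]

x = [0.4, -0.2, 0.9]
scores = gradient_saliency(x, W1, b1, W2, b2, target=0)

# Feature indices ordered from most to least salient for class 0.
ranking = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 4) for s in scores])
```

In this toy setup the ignored third feature receives a saliency of zero, which is the behaviour one would expect from any gradient-based attribution. Applied to spectrogram inputs such as STFT or MFCC frames, the same kind of score would be computed per time-frequency bin rather than per structured feature.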
Pages: 16