Parallel convolutional neural network and hybrid architectures for accented speech recognition in Malayalam

Authors
Rizwana Kallooravi Thandil [1]
V. K. Muneer [1]
B. Premjith [2]
Affiliations
[1] University of Calicut
[2] Amrita School of Artificial Intelligence, Amrita Vishwa Vidyapeetham
Keywords
Accented speech recognition; Malayalam speech recognition; Speech signal preprocessing; Speech data augmentation; Dimensionality reduction; Speech feature extraction; Neural networks
DOI
10.1007/s42044-024-00212-w
Abstract
This study investigates approaches to recognizing accented speech in Malayalam, a language spoken in southern India. Because no freely available dataset existed in this domain, a dataset covering different accents was constructed for the study. The collected data was preprocessed with band-pass filtering and audio normalization, then augmented by time-stretching, pitch shifting, and adding Gaussian noise. A total of 585 acoustic features were extracted from the speech signals using an adaptive fast Fourier transform (FFT) window size, spectral contrast, Tonnetz and polyfeatures, harmonic-to-noise ratio (HNR) and formants, zero-crossing rate (ZCR) and short-time Fourier transform (STFT), root mean square (RMS) and Mel spectrogram, and Mel-frequency cepstral coefficients (MFCC) and their deltas. Five accented-speech models were built using a 2D parallel convolutional neural network (CNN), a 4D parallel CNN without an attention block, a 4D parallel CNN with an attention block, a bidirectional long short-term memory (BiLSTM) network, and a hybrid CNN–long short-term memory (CNN-LSTM) network. Of the five architectures, the 4D parallel CNN with an attention block and the hybrid CNN-LSTM performed best, with higher accuracy and lower error rates than the others.
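As a rough illustration of two steps named in the abstract (audio normalization and Gaussian-noise augmentation), the sketch below uses only the Python standard library. The peak-normalization scheme, the 20 dB signal-to-noise ratio, the 16 kHz sample rate, and the synthetic tone standing in for a recording are all assumptions made for this example; the abstract does not state the authors' actual settings or tooling.

```python
import math
import random


def peak_normalize(signal, peak=1.0):
    """Scale the signal so its largest absolute sample equals `peak`."""
    max_abs = max(abs(x) for x in signal)
    if max_abs == 0.0:
        return list(signal)  # silent input: nothing to scale
    return [x * peak / max_abs for x in signal]


def add_gaussian_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) of `noisy` relative to the `clean` reference."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical input: a 220 Hz tone, 1 s at 16 kHz, standing in for speech.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 220 * t / sample_rate)
        for t in range(sample_rate)]

normalized = peak_normalize(tone)
rng = random.Random(0)  # fixed seed so the augmentation is reproducible
augmented = add_gaussian_noise(normalized, snr_db=20.0, rng=rng)

# The measured SNR should land close to the 20 dB target.
print(round(measured_snr_db(normalized, augmented), 1))
```

Specifying the noise level as an SNR rather than a fixed standard deviation keeps the augmentation strength comparable across recordings of different loudness, which is one common reason to normalize before adding noise.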
Pages: 125-149
Page count: 24