Parallel convolutional neural network and hybrid architectures for accented speech recognition in Malayalam

Authors
Rizwana Kallooravi Thandil [1]
V. K. Muneer [1]
B. Premjith [2]
Affiliations
[1] University of Calicut, Amrita School of Artificial Intelligence
[2] Amrita Vishwa Vidyapeetham
Keywords
Accented speech recognition; Malayalam speech recognition; Speech signal preprocessing; Speech data augmentation; Dimensionality reduction; Speech feature extraction; Neural networks
DOI
10.1007/s42044-024-00212-w
Abstract
This study investigates different approaches to accented speech recognition for Malayalam, a language spoken in southern India. Because no freely available dataset existed in this domain, a dataset covering several accents of the language was constructed for the study. The collected data were preprocessed with band-pass filtering and audio normalization, and the speech dataset was augmented by time-stretching, pitch shifting, and adding Gaussian noise. A total of 585 acoustic features were extracted from the speech signals using an adaptive fast Fourier transform (FFT) window size, spectral contrast, Tonnetz and polyfeatures, harmonic-to-noise ratio (HNR) and formants, zero-crossing rate (ZCR) and short-time Fourier transform (STFT), root mean square (RMS) and Mel spectrogram, and Mel-frequency cepstral coefficients (MFCC) and their deltas. Five accented speech models were constructed using a 2D parallel convolutional neural network (CNN), a 4D parallel CNN without an attention block, a 4D parallel CNN with an attention block, a bidirectional long short-term memory (BiLSTM) network, and a hybrid CNN–long short-term memory (CNN–LSTM) network. Of the five architectures, the 4D parallel CNN with an attention block and the hybrid CNN–LSTM performed best, with higher accuracy and lower error rates than the other three.
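As a rough illustration of two of the steps named in the abstract (audio normalization and Gaussian-noise augmentation), the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the function names `peak_normalize` and `add_gaussian_noise`, the choice of peak normalization, the 20 dB signal-to-noise ratio, and the synthetic 440 Hz test tone are all assumptions made for illustration, since the abstract does not give these details.

```python
import math
import random


def peak_normalize(samples, peak=1.0):
    """Scale samples so the largest absolute value equals `peak` (assumed scheme)."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silent or empty signal: nothing to scale
    scale = peak / max_abs
    return [s * scale for s in samples]


def add_gaussian_noise(samples, snr_db, rng=None):
    """Return a copy of `samples` with zero-mean Gaussian noise at roughly `snr_db`."""
    if not samples:
        return []
    rng = rng or random.Random()
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Hypothetical input: one second of a 440 Hz tone at 16 kHz standing in for speech.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]

normalized = peak_normalize(tone)
noisy = add_gaussian_noise(normalized, snr_db=20, rng=random.Random(0))
```

In a real pipeline the same two steps would typically run on arrays loaded from audio files, after band-pass filtering and alongside time-stretching and pitch shifting, which are omitted here.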
Pages: 125-149
Page count: 24