Parallel convolutional neural network and hybrid architectures for accented speech recognition in Malayalam

Cited by: 0

Authors:

Rizwana Kallooravi Thandil [1]

V. K. Muneer [1]

B. Premjith [2]

Affiliations:

[1] University of Calicut, Amrita School of Artificial Intelligence

[2] Amrita Vishwa Vidyapeetham

Source:

Iran Journal of Computer Science | 2025, Vol. 8, No. 1

Keywords:

Accented speech recognition; Malayalam speech recognition; Speech signal preprocessing; Speech data augmentation; Dimensionality reduction; Speech feature extraction; Neural networks

DOI:

10.1007/s42044-024-00212-w

Abstract:

This study investigates different approaches to recognizing accented speech in Malayalam, a language spoken in southern India. Because no freely available dataset existed in this domain, a dataset covering different accents was constructed for the study. The collected data were preprocessed with band-pass filtering and audio normalization, and the dataset was augmented using time-stretching, pitch shifting, and added Gaussian noise. A total of 585 acoustic features were extracted from the speech signals using an adaptive fast Fourier transform (FFT) window size, spectral contrast, Tonnetz and polyfeatures, harmonic-to-noise ratio (HNR) and formants, zero-crossing rate (ZCR) and short-time Fourier transform (STFT), root mean square (RMS) and Mel spectrogram, and Mel-frequency cepstral coefficients (MFCC) and their deltas. Five accented-speech models were constructed using a 2D parallel convolutional neural network (CNN), a 4D parallel CNN without an attention block, a 4D parallel CNN with an attention block, a bidirectional long short-term memory (BiLSTM) network, and a hybrid CNN–long short-term memory (CNN–LSTM) network. Of the five architectures, the 4D parallel CNN with an attention block and the hybrid CNN–LSTM performed better than the others, with high accuracy and low error rates.
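
The abstract names audio normalization and Gaussian-noise augmentation but does not give parameters. The following is a minimal sketch of those two steps only, using the Python standard library; it is not the authors' implementation. The function names, the peak-normalization choice, the 20 dB signal-to-noise ratio, and the synthetic 440 Hz test tone are all assumptions made for illustration. Band-pass filtering, time-stretching, and pitch shifting are not covered here.

```python
import math
import random


def peak_normalize(signal, peak=1.0):
    """Scale the signal so its largest absolute sample equals `peak`."""
    max_abs = max((abs(s) for s in signal), default=0.0)
    if max_abs == 0.0:
        return list(signal)  # silent or empty input: nothing to scale
    return [s * peak / max_abs for s in signal]


def add_gaussian_noise(signal, snr_db, rng=None):
    """Add zero-mean Gaussian noise at an assumed target SNR (in dB)."""
    if not signal:
        return []
    rng = rng or random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10 * log10(signal_power / noise_power), solved for noise std.
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in signal]


# Toy input: 0.1 s of a 440 Hz tone at 16 kHz (illustrative values only).
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

normalized = peak_normalize(tone)
noisy = add_gaussian_noise(normalized, snr_db=20, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the augmentation reproducible; in practice the noise level would be chosen per dataset rather than fixed at the value assumed here.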

Pages: 125-149

Page count: 24
