Parallel convolutional neural network and hybrid architectures for accented speech recognition in Malayalam

Cited by: 0

Authors:

Rizwana Kallooravi Thandil [1]

V. K. Muneer [1]

B. Premjith [2]

Affiliations:

[1] University of Calicut, Amrita School of Artificial Intelligence

[2] Amrita Vishwa Vidyapeetham

Source:

Iran Journal of Computer Science | 2025, Vol. 8, No. 1

Keywords:

Accented speech recognition; Malayalam speech recognition; Speech signal preprocessing; Speech data augmentation; Dimensionality reduction; Speech feature extraction; Neural networks

DOI:

10.1007/s42044-024-00212-w


Abstract:

This study investigates different approaches to recognizing accented speech in Malayalam, a language spoken in southern India. Because no freely available datasets existed in this domain, a dataset covering different accents was constructed for the study. The collected data were preprocessed with band-pass filtering and audio normalization, and the dataset was augmented using time-stretching, pitch shifting, and added Gaussian noise. A total of 585 acoustic features were extracted from the speech signals using an adaptive fast Fourier transform (FFT) window size, spectral contrast, Tonnetz and polyfeatures, harmonic-to-noise ratio (HNR) and formants, zero-crossing rate (ZCR) and short-time Fourier transform, root mean square (RMS) and Mel spectrogram, and Mel-frequency cepstral coefficients (MFCC) and their deltas. Five accented-speech models were constructed using a 2D parallel convolutional neural network (CNN), a 4D parallel CNN without an attention block, a 4D parallel CNN with an attention block, a bidirectional long short-term memory network, and a hybrid CNN–long short-term memory network. Of the five architectures, the 4D parallel CNN with an attention block and the hybrid CNN–long short-term memory model performed best, with higher accuracy and lower error rates than the other models.
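As an illustration of two of the steps named in the abstract (audio normalization and Gaussian-noise augmentation), a minimal sketch follows. It is not the authors' code. The function names, the use of peak normalization, the 20 dB signal-to-noise ratio, and the synthetic sine-wave input are all assumptions made for this example, since the abstract does not specify them.

```python
import math
import random


def peak_normalize(samples, peak=1.0):
    """Scale samples so the largest absolute value equals `peak` (assumed scheme)."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silent or empty signal: nothing to scale
    scale = peak / max_abs
    return [s * scale for s in samples]


def add_gaussian_noise(samples, snr_db=20.0, seed=None):
    """Add zero-mean Gaussian noise at a target SNR in dB (SNR value is an assumption)."""
    if not samples:
        return []
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]

normalized = peak_normalize(tone)
noisy = add_gaussian_noise(normalized, snr_db=20.0, seed=0)
```

Time-stretching and pitch shifting, the other two augmentations mentioned, need resampling or phase-vocoder processing and are usually done with an audio library rather than by hand, so they are left out of this sketch.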


Pages: 125-149

Page count: 24
