Parallel convolutional neural network and hybrid architectures for accented speech recognition in Malayalam


Authors:

Rizwana Kallooravi Thandil [1]

V. K. Muneer [1]

B. Premjith [2]

Affiliations:

[1] University of Calicut, Amrita School of Artificial Intelligence

[2] Amrita Vishwa Vidyapeetham

Source:

Iran Journal of Computer Science | 2025 / Vol. 8 / Issue 1

Keywords:

Accented speech recognition; Malayalam speech recognition; Speech signal preprocessing; Speech data augmentation; Dimensionality reduction; Speech feature extraction; Neural networks

DOI:

10.1007/s42044-024-00212-w


Abstract:

This study investigates approaches to recognizing accented speech in Malayalam, a language spoken in southern India. Because no freely available dataset existed in this domain, a dataset covering different accents of the language was constructed for the study. The collected data were preprocessed with band-pass filtering and audio normalization, and the dataset was augmented using time-stretching, pitch shifting, and added Gaussian noise. A total of 585 acoustic features were extracted from the speech signals using an adaptive fast Fourier transform (FFT) window size, spectral contrast, Tonnetz and polyfeatures, harmonic-to-noise ratio (HNR) and formants, zero-crossing rate (ZCR) and short-time Fourier transform (STFT), root mean square (RMS) and Mel spectrogram, and Mel-frequency cepstral coefficients (MFCCs) and their deltas. Five accented-speech models were constructed using a 2D parallel convolutional neural network (CNN), a 4D parallel CNN without an attention block, a 4D parallel CNN with an attention block, a bidirectional long short-term memory (BiLSTM) network, and a hybrid CNN–long short-term memory (CNN–LSTM) network. Of the five architectures, the 4D parallel CNN with an attention block and the hybrid CNN–LSTM performed best, combining high accuracy with low error rates.
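As a rough illustration of the preprocessing, noise augmentation, and two of the frame-level features named in the abstract, the following NumPy sketch may help. It is not the authors' pipeline: the abstract does not give the filter design, cutoff frequencies, noise level, or frame sizes, so the brick-wall FFT filter, the 300-3400 Hz band, the 20 dB SNR, and the 400/160-sample framing are all assumptions chosen for the example. Time-stretching and pitch shifting are omitted because they normally rely on a phase vocoder or a dedicated audio library.

```python
import numpy as np


def bandpass_fft(signal, sample_rate, low_hz, high_hz):
    """Brick-wall band-pass: zero every FFT bin outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def peak_normalize(signal, peak=1.0):
    """Scale the signal so its largest absolute sample equals `peak`."""
    max_abs = np.max(np.abs(signal))
    return signal if max_abs == 0 else signal * (peak / max_abs)


def add_gaussian_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def frame_features(signal, frame_len=400, hop=160):
    """Return per-frame RMS energy and zero-crossing rate (ZCR)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    rms = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        rms[i] = np.sqrt(np.mean(frame ** 2))
        # Fraction of adjacent sample pairs whose sign differs.
        zcr[i] = np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))
    return rms, zcr


# Synthetic stand-in for a recording: 50 Hz hum plus a 1 kHz tone, 1 s at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
raw = 0.5 * np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 1000 * t)

# Assumed 300-3400 Hz speech band; the paper's actual cutoffs are not stated.
clean = peak_normalize(bandpass_fft(raw, sample_rate, 300.0, 3400.0))
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
rms, zcr = frame_features(noisy)
```

Here the band-pass step removes the 50 Hz component, normalization rescales the remaining tone to unit peak, and the noisy copy would serve as one extra training example alongside the clean one. A production pipeline would more likely use a proper IIR or FIR filter design than FFT bin masking, which can introduce ringing on real recordings.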

Pages: 125-149

Page count: 24
