Hybrid deep learning based automatic speech recognition model for recognizing non-Indian languages

Cited by: 0
Authors
Gupta, Astha [1]
Kumar, Rakesh [1]
Kumar, Yogesh [2]
Affiliations
[1] Chandigarh Univ, Dept Comp Sci & Engn, Mohali, Punjab, India
[2] Indus Univ, Indus Inst Technol & Engn, Ahmadabad, Gujarat, India
Keywords
Automatic Speech Recognition; Spectrogram; Short Term Fourier transform; MFCC; ResNet10; Inception V3; VGG16; DenseNet201; EfficientNetB0
DOI
10.1007/s11042-023-16748-1
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Speech is a natural and significant mode of human communication, which can be divided into two categories: human-to-human and human-to-machine. Human-to-human communication depends on the language the speaker uses. Human-to-machine communication, by contrast, relies on machines recognizing human speech and acting accordingly, a task often termed Automatic Speech Recognition (ASR). Recognizing non-Indian languages is challenging because of pitch variations and other factors such as accent and pronunciation. This paper proposes a novel hybrid model based on DenseNet201 and EfficientNetB0 for speech recognition. Initially, 76,263 speech samples are taken from 11 non-Indian languages, including Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Japanese, Russian, Spanish and Persian. Once collected, the speech samples are pre-processed by removing noise. Features are then extracted from the samples using the Spectrogram, Short-Term Fourier Transform (STFT), Spectral Rolloff-Bandwidth, Mel-frequency Cepstral Coefficients (MFCC), and Chroma features. The proposed approach is then compared with other Deep Learning (DL) models: ResNet10, Inception V3, VGG16, DenseNet201, and EfficientNetB0. Standard measures, namely Precision, Recall, F1-Score, Confusion Matrix, Accuracy, and Loss curves, are used to evaluate each model on speech samples from all of the languages listed above. The experimental results show that the hybrid model outperforms all the other models, achieving the highest recognition accuracy of 99.84% with a loss of 0.004%.
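The evaluation measures named in the abstract (Precision, Recall, F1-Score, Accuracy) can all be read off a confusion matrix. The following is a minimal sketch of that computation, not the authors' code: the function names `per_class_metrics` and `accuracy` are hypothetical, and the 3-class matrix holds made-up counts rather than results from the paper.

```python
from typing import Dict, List


def per_class_metrics(confusion: List[List[int]]) -> List[Dict[str, float]]:
    """Return precision, recall and F1 for each class.

    confusion[i][j] counts samples whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        # Predicted as class k but truly another class.
        fp = sum(confusion[i][k] for i in range(n)) - tp
        # Truly class k but predicted as another class.
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Toy confusion matrix for three languages (made-up counts, not from the paper).
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]
metrics = per_class_metrics(toy)
overall = accuracy(toy)
```

For an 11-language task such as the one described, the same functions would apply to an 11 x 11 matrix, with one precision, recall and F1 value per language.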
Pages: 30145-30166
Page count: 22