Research on a Deep Learning Method for Speech Recognition

Authors
Xiao, Jia [1]
Xiaolin, Sun [1]
Affiliations
[1] Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
Keywords
Audition - Convolution - Deep neural networks - Speech enhancement - Speech recognition
DOI
Not available
Abstract
Deep convolutional neural networks (CNNs) are widely used in speech recognition, and models based on deep CNNs can effectively improve the quality of human-computer interaction. However, existing CNNs use a fixed convolutional kernel size, which is a disadvantage for feature extraction: it is hard to determine whether the extracted features are sufficient. To address this problem, a self-tuning convolutional kernel (STCK) algorithm is proposed. First, the computational process of the STCK algorithm is derived, and the formula for the convolutional kernel size is obtained. The Bark spectrum is then used to extract the spectrogram of the speech signal, which serves as the CNN input so that it matches human hearing. In addition, two data enhancement strategies are proposed, frame channel shielding and Bark-band channel shielding, which further improve the generalization ability of the recognition model. Experimental results show that the proposed method achieves the lowest training loss among the three models compared, the other two being the CNN model without the STCK algorithm and the CNN model without the data enhancement strategy. Relative to these two models, the recognition error rates on the test samples are reduced by 3.9% and 1%, respectively. © (2024), (International Association of Engineers). All Rights Reserved.
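The abstract names the two data enhancement strategies but does not give their details. As a rough illustration only, the sketch below assumes that "shielding" means zeroing one random contiguous run of channels along either the frame (time) axis or the Bark-band (frequency) axis of a spectrogram. The function name `mask_channels`, the array layout (bands x frames), and the width limits are hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np


def mask_channels(spec, axis, max_width, rng):
    """Return a copy of `spec` with one random contiguous run zeroed along `axis`.

    Assumption: "shielding" is modeled as setting the selected channels to 0.
    The paper may use a different fill value or several masks per sample.
    """
    masked = spec.copy()
    size = spec.shape[axis]
    # Mask width is drawn uniformly from 0..max_width (inclusive).
    width = int(rng.integers(0, max_width + 1))
    # Start index is chosen so the run stays inside the array.
    start = int(rng.integers(0, size - width + 1))
    index = [slice(None)] * spec.ndim
    index[axis] = slice(start, start + width)
    masked[tuple(index)] = 0.0
    return masked


rng = np.random.default_rng(0)
# Hypothetical input: 24 Bark bands by 100 frames, filled with ones.
spec = np.ones((24, 100))
# Frame channel shielding (assumed): mask up to 10 consecutive frames.
aug = mask_channels(spec, axis=1, max_width=10, rng=rng)
# Bark-band channel shielding (assumed): mask up to 4 consecutive bands.
aug = mask_channels(aug, axis=0, max_width=4, rng=rng)
```

Applying both masks to each training spectrogram hides part of the input along time and along frequency, which is one plausible way such strategies could improve generalization.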
Pages: 1272-1280