Research on a Deep Learning Method for Speech Recognition

Authors
Xiao, Jia [1]
Xiaolin, Sun [1]
Affiliations
[1] Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, 473061, China
Keywords
Audition - Convolution - Deep neural networks - Speech enhancement - Speech recognition
DOI
Not available
Abstract
Deep convolutional neural networks (CNNs) are widely used in speech recognition, and models based on deep CNNs can effectively improve the quality of human-computer interaction. However, existing CNNs with a fixed convolutional kernel size are at a disadvantage when extracting data features: it is hard to determine whether the extracted features are sufficient. To address this problem, a self-tuning convolutional kernel (STCK) algorithm is proposed. First, the computational process of the STCK algorithm is derived, and the formula for calculating the convolutional kernel size is obtained. The Bark spectrum is then introduced to extract the spectrogram of the speech signal, which serves as the CNN input and is adapted to human hearing. In addition, two data enhancement strategies are proposed, namely frame channel shielding and Bark-band channel shielding, which further improve the generalization ability of the recognition model. Experimental results show that, compared with two other models (the CNN model without the STCK algorithm and the CNN model without the data enhancement strategies), the proposed method achieves the lowest training loss, and the recognition error rates on the test samples are reduced by 3.9% and 1%, respectively. © (2024), (International Association of Engineers). All Rights Reserved.
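The abstract names the two data enhancement strategies but does not describe how they are carried out. The following is a minimal sketch of one plausible reading, assuming that "shielding" means zeroing a random contiguous run of time frames and a random contiguous run of Bark bands in the input spectrogram. The function name `shield_channels`, its parameters, and the toy spectrogram are hypothetical and do not come from the paper.

```python
import random


def shield_channels(spec, max_frames=2, max_bands=2, seed=None):
    """Return a masked copy of `spec`, a list of frames where each frame is a
    list of Bark-band energies. Hypothetical helper, not the paper's code."""
    rng = random.Random(seed)
    n_frames, n_bands = len(spec), len(spec[0])
    out = [frame[:] for frame in spec]  # copy so the input stays untouched

    # Frame channel shielding (assumed): zero a contiguous run of time frames.
    width = rng.randint(0, min(max_frames, n_frames))
    start = rng.randint(0, n_frames - width)
    for t in range(start, start + width):
        out[t] = [0.0] * n_bands

    # Bark-band channel shielding (assumed): zero a contiguous run of bands
    # across every frame.
    width = rng.randint(0, min(max_bands, n_bands))
    start = rng.randint(0, n_bands - width)
    for frame in out:
        for b in range(start, start + width):
            frame[b] = 0.0
    return out


spec = [[1.0] * 6 for _ in range(8)]  # toy spectrogram: 8 frames, 6 bands
masked = shield_channels(spec, seed=0)
```

Applying a fresh random mask to each training example is a common way to discourage a model from relying on any single frame or band; whether the paper uses zeros, a mean value, or multiple masks per example is not stated in the abstract.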
Pages: 1272-1280
Related papers
  • [31] Fake Speech Recognition Using Deep Learning
    Camacho, Steven
    Maria Ballesteros, Dora
    Renza, Diego
    APPLIED COMPUTER SCIENCES IN ENGINEERING, WEA 2021, 2021, 1431 : 38 - 48
  • [32] Correction to: Deep learning approaches for speech emotion recognition: state of the art and research challenges
    Rashid Jahangir
    Ying Wah Teh
    Faiqa Hanif
    Ghulam Mujtaba
    Multimedia Tools and Applications, 2021, 80 : 23813 - 23813
  • [33] Research on Speech Accurate Recognition Technology Based on Deep Learning DNN-HMM
    Xia Wanyu
    Qiu Wu
    Feng Xiancheng
    MIPPR 2019: PATTERN RECOGNITION AND COMPUTER VISION, 2020, 11430
  • [34] Research on Image Recognition Method of Class Graph Based on Deep Learning
    Wang, Kai
    Liu, Wei
    Gao, Sheng
    Mu, Yongan
    Xu, Fan
    2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE INNOVATION, ICAII 2023, 2023, : 65 - 71
  • [35] Research on deep feature learning and condition recognition method for bearing vibration
    Zhu, Xiaoxun
    Luo, Xuezhi
    Zhao, Jianhong
    Hou, Dongnan
    Han, Zhonghe
    Wang, Yu
    APPLIED ACOUSTICS, 2020, 168
  • [36] Research on Recognition Method of Zanthoxylum Armatum Rust Based on Deep Learning
    Xu, Jie
    Wei, Haoliang
    Ye, Meng
    Wang, Wei
    ICCBB 2019: PROCEEDINGS OF THE 2019 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, : 84 - 88
  • [37] Research on Face Recognition Method Based on Deep Learning in Natural Environment
    Yan, Jiali
    Zhang, Longfei
    Wu, YuFeng
    Guo, Penghui
    Zhang, Fuquan
    Tang, Shuo
    Ding, Gangyi
    Zhang, Fuquan
    Xu, Lin
    2017 IEEE 8TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST), 2017, : 501 - 506
  • [38] AVIATION PROFILING METHOD BASED ON DEEP LEARNING TECHNOLOGY FOR EMOTION RECOGNITION BY SPEECH SIGNAL
    Koshekov, K. T.
    Savostin, A. A.
    Seidakhmetov, B. K.
    Anayatova, R. K.
    Fedorov, I. O.
    TRANSPORT AND TELECOMMUNICATION JOURNAL, 2021, 22 (04) : 471 - 481
  • [39] Embedded deep learning models for multilingual speech recognition
    Rahmouni, Mohamed Hedi
    Salhi, Mohamed Salah
    Touti, Ezzeddine
    Allagui, Hatem
    Aoudia, Mouloud
    Barr, Mohammad
    COMPUTERS & ELECTRICAL ENGINEERING, 2025, 123
  • [40] DISTRIBUTED DEEP LEARNING STRATEGIES FOR AUTOMATIC SPEECH RECOGNITION
    Zhang, Wei
    Cui, Xiaodong
    Finkler, Ulrich
    Kingsbury, Brian
    Saon, George
    Kung, David
    Picheny, Michael
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 5706 - 5710