A Chinese acoustic model based on convolutional neural network

Cited by: 3
Authors
Zhang, Qiang [1, 2]
Sang, Jun [1, 2]
Alam, Mohammad S. [3]
Cai, Bin [1, 2]
Yang, Li [1, 2]
Affiliations
[1] Chongqing Univ, Minist Educ, Key Lab Dependable Serv Comp Cyber Phys Soc, Chongqing 400044, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
[3] Texas A&M Univ Kingsville, Frank H Dotterweich Coll Engn, Kingsville, TX 78363 USA
Source
Keywords
Speech recognition; acoustic model; Chinese; convolutional neural network; connectionist temporal classification (CTC)
DOI
10.1117/12.2520356
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition has long been a research focus in human-computer communication and interaction. The main purpose of automatic speech recognition (ASR) is to convert speech waveform signals into text. The acoustic model is the main component of ASR; it connects the observed features of the speech signal with the speech modeling units. In recent years, deep learning has become the mainstream technology in speech recognition. This paper proposes a convolutional neural network architecture that combines VGG with the Connectionist Temporal Classification (CTC) loss function as a speech recognition acoustic model. Traditional acoustic model training relies on frame-level labels with a cross-entropy criterion, which requires a tedious label alignment procedure. The CTC loss was adopted to learn the alignments between speech frames and label sequences automatically, so that training is end-to-end. The architecture can exploit the temporal and spectral structure of speech signals simultaneously. Batch normalization (BN) was used to normalize each layer's input and reduce internal covariate shift. To prevent overfitting, dropout was used during training to improve the network's generalization ability. The speech signal was transformed into a spectral image through a series of processing steps and used as the input of the neural network. The input feature has 200 dimensions, and the output labels of the acoustic model are 415 Chinese pronunciations without tone. The experimental results demonstrated that the proposed model achieves character error rates (CER) of 17.97% and 23.86% on the public Mandarin speech corpora AISHELL-1 and ST-CMDS-20170001_1, respectively.
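For readers unfamiliar with the two mechanisms the abstract relies on, the following is a minimal sketch, not the authors' implementation. It shows (a) the standard CTC collapse rule, which turns a per-frame label path into a label sequence by merging repeats and dropping blanks, and (b) the usual definition of character error rate as edit distance divided by reference length. The function names, the integer label IDs, and the choice of index 0 for the blank symbol are all assumptions made for illustration; the record does not describe the paper's decoding or scoring code.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse a per-frame label path: merge consecutive repeats, drop blanks.

    `blank=0` is an assumed convention, not taken from the paper.
    """
    collapsed = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


def edit_distance(reference, hypothesis):
    """Levenshtein distance (substitutions, insertions, deletions cost 1)."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current_row.append(min(
                previous_row[j] + 1,                            # deletion
                current_row[j - 1] + 1,                         # insertion
                previous_row[j - 1] + (ref_item != hyp_item),   # substitution
            ))
        previous_row = current_row
    return previous_row[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / number of reference characters."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical per-frame argmax labels; 0 is the blank.
    path = [0, 3, 3, 0, 3, 5, 5, 0]
    print(ctc_greedy_collapse(path))
    print(character_error_rate([3, 3, 5], [3, 5]))
```

The blank symbol is what lets CTC represent a genuinely repeated label: in the hypothetical path above, the two runs of label 3 are separated by a blank, so both survive the collapse.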
Pages: 7