A Chinese acoustic model based on convolutional neural network

Cited by: 3
Authors
Zhang, Qiang [1, 2]
Sang, Jun [1, 2]
Alam, Mohammad S. [3]
Cai, Bin [1, 2]
Yang, Li [1, 2]
Affiliations
[1] Chongqing Univ, Minist Educ, Key Lab Dependable Serv Comp Cyber Phys Soc, Chongqing 400044, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
[3] Texas A&M Univ Kingsville, Frank H Dotterweich Coll Engn, Kingsville, TX 78363 USA
Keywords
Speech recognition; acoustic model; Chinese; convolutional neural network; connectionist temporal classification (CTC)
DOI
10.1117/12.2520356
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech recognition has long been one of the main research topics in human-computer communication and interaction. The main purpose of automatic speech recognition (ASR) is to convert speech waveforms into text. The acoustic model is the main component of an ASR system: it connects the observed features of the speech signal to the speech modeling units. In recent years, deep learning has become the mainstream technology in speech recognition. In this paper, a convolutional neural network architecture that combines a VGG-based network with the Connectionist Temporal Classification (CTC) loss function is proposed as the acoustic model for speech recognition. Traditional acoustic model training relies on frame-level labels and the cross-entropy criterion, which requires a tedious label alignment procedure. Here the CTC loss is adopted to learn the alignments between speech frames and label sequences automatically, so that training is end-to-end. The architecture can exploit the temporal and spectral structure of speech signals simultaneously. Batch normalization (BN) is used to normalize each layer's input and reduce internal covariate shift, and dropout is applied during training to prevent overfitting and improve the network's generalization ability. The speech signal is transformed into a spectral image (spectrogram) through a series of processing steps, and this image serves as the input to the neural network. The input features are 200-dimensional, and the output labels of the acoustic model are 415 Chinese pronunciations (syllables) without tone. Experimental results show that the proposed model achieves character error rates (CER) of 17.97% and 23.86% on the public Mandarin speech corpora AISHELL-1 and ST-CMDS-20170001_1, respectively.
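The abstract states that the waveform is turned into a spectral image with 200-dimensional input features, but it does not give the framing parameters. The following is a minimal sketch of one common way to obtain exactly 200 features per frame. It assumes 16 kHz audio, a 400-sample (25 ms) Hamming window, a 160-sample (10 ms) hop, and keeps the first 200 of the 201 non-negative-frequency DFT bins. These values and the function name `log_spectrogram` are assumptions for illustration, not the authors' pipeline. A naive DFT keeps the example dependency-free; a real front end would use an FFT.

```python
import math
from typing import List, Sequence


def log_spectrogram(samples: Sequence[float],
                    win_len: int = 400,   # 25 ms at an assumed 16 kHz sample rate
                    hop: int = 160,       # 10 ms frame shift (assumed)
                    n_bins: int = 200     # feature dimension stated in the abstract
                    ) -> List[List[float]]:
    """Return one row of n_bins log-magnitude values per frame (a 'spectral image')."""
    # Symmetric Hamming window to reduce spectral leakage at the frame edges
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win_len - 1))
              for n in range(win_len)]
    frames: List[List[float]] = []
    for start in range(0, len(samples) - win_len + 1, hop):
        seg = [samples[start + n] * window[n] for n in range(win_len)]
        row: List[float] = []
        # Naive DFT over bins 0..n_bins-1; a 400-point DFT has 201 non-negative
        # bins, so keeping 200 drops only the Nyquist bin
        for k in range(n_bins):
            re = im = 0.0
            for n, x in enumerate(seg):
                angle = 2 * math.pi * k * n / win_len
                re += x * math.cos(angle)
                im -= x * math.sin(angle)
            # Small floor avoids log(0) on silent frames
            row.append(math.log(math.hypot(re, im) + 1e-10))
        frames.append(row)
    return frames


# Usage: a 1 kHz tone sampled at 16 kHz peaks in bin 1000 * 400 / 16000 = 25
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(560)]
spec = log_spectrogram(tone)
print(len(spec), len(spec[0]))  # 2 frames x 200 bins
```

Stacking these rows over time gives the time-by-frequency image that a VGG-style stack of 2-D convolutions can treat like a picture, which is how such a network can exploit temporal and spectral structure at once.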
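The abstract relies on the CTC loss to learn frame-to-label alignments without a separate alignment step. The following is a minimal sketch of the CTC negative log-likelihood, computed with the standard forward (alpha) recursion in log space, for illustration only. It assumes per-frame log-probabilities (for example a log-softmax over the output labels and a blank symbol) and a blank index of 0. The names `log_add` and `ctc_loss` are hypothetical, and this is the textbook formulation, not the authors' code.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_loss(log_probs: Sequence[Sequence[float]], labels: Sequence[int],
             blank: int = 0) -> float:
    """-log P(labels | input) under CTC; log_probs is a T x V matrix.

    blank=0 is an assumption of this sketch; labels must not contain it.
    """
    # Extended sequence with blanks around every label: _ l1 _ l2 ... lN _
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), len(log_probs)

    # alpha[s]: log-probability of all partial paths ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            total = alpha[s]                            # stay on the same symbol
            if s >= 1:
                total = log_add(total, alpha[s - 1])    # advance by one position
            # Skip a blank only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total = log_add(total, alpha[s - 2])
            new[s] = total + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank
    log_likelihood = alpha[S - 1]
    if S > 1:
        log_likelihood = log_add(log_likelihood, alpha[S - 2])
    return -log_likelihood


# Usage: uniform probabilities over {blank=0, label=1} for two frames
uniform = [[math.log(0.5), math.log(0.5)]] * 2
print(ctc_loss(uniform, [1]))  # -log(0.75), about 0.2877
```

In the usage example, the paths `1 1`, `0 1` and `1 0` all collapse to the target `[1]`, each with probability 0.25, so the loss is -log(0.75). Summing over every such path is what removes the need for frame-level labels.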
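The results above are reported as character error rates. The following is a minimal sketch of two pieces commonly involved in such an evaluation. The first is best-path (greedy) CTC decoding, which merges repeated frame predictions and then drops blanks. The second is CER, computed as the Levenshtein edit distance divided by the number of reference characters, summed over the corpus. The blank index 0, the corpus-level aggregation and the function names are assumptions. The abstract does not say how the syllable outputs are decoded or converted to characters before scoring.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: merge repeats, then remove blanks (blank=0 assumed)."""
    out: List[int] = []
    prev = None
    for sym in frame_ids:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym  # a blank resets prev, so "a _ a" decodes to two a's
    return out


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            row[j] = min(prev_row[j] + 1,             # deletion
                         row[j - 1] + 1,              # insertion
                         prev_row[j - 1] + (r != h))  # substitution or match
        prev_row = row
    return prev_row[-1]


def cer(refs: Sequence[Sequence], hyps: Sequence[Sequence]) -> float:
    """Corpus-level CER: total edits over total reference characters (assumed convention)."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references contain no characters")
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / total


print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
print(cer(["你好世界"], ["你好世"]))                # 0.25
```

Averaging per-utterance CERs instead would weight short utterances more heavily, which is why the aggregation choice is worth stating when comparing numbers across corpora.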
Pages: 7