A Chinese acoustic model based on convolutional neural network

Cited by: 3
Authors
Zhang, Qiang [1, 2]
Sang, Jun [1, 2]
Alam, Mohammad S. [3]
Cai, Bin [1, 2]
Yang, Li [1, 2]
Affiliations
[1] Chongqing Univ, Minist Educ, Key Lab Dependable Serv Comp Cyber Phys Soc, Chongqing 400044, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
[3] Texas A&M Univ Kingsville, Frank H Dotterweich Coll Engn, Kingsville, TX 78363 USA
Source
Keywords
Speech recognition; acoustic model; Chinese; convolutional neural network; connectionist temporal classification (CTC);
DOI
10.1117/12.2520356
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech recognition has long been a research focus in human-computer communication and interaction. The main purpose of automatic speech recognition (ASR) is to convert speech waveform signals into text. The acoustic model is the main component of ASR; it connects the observed features of the speech signal with the speech modeling units. In recent years, deep learning has become the mainstream technology in speech recognition. In this paper, a convolutional neural network architecture combining VGG with the Connectionist Temporal Classification (CTC) loss function is proposed as an acoustic model for speech recognition. Traditional acoustic model training relies on frame-level labels with a cross-entropy criterion, which requires a tedious label alignment procedure. The CTC loss is adopted to learn the alignments between speech frames and label sequences automatically, so that training is end-to-end. The architecture can exploit the temporal and spectral structures of speech signals simultaneously. Batch normalization (BN) is used to normalize each layer's input and reduce internal covariate shift. To prevent overfitting, dropout is applied during training to improve the network's generalization ability. The speech signal is transformed through a series of processing steps into a spectral image that serves as the input to the neural network. The input features are 200-dimensional, and the output labels of the acoustic model are 415 Chinese pronunciations without tone. The experimental results demonstrate that the proposed model achieves character error rates (CER) of 17.97% and 23.86% on the public Mandarin speech corpora AISHELL-1 and ST-CMDS-20170001_1, respectively.
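As an illustration of two ideas the abstract mentions, the sketch below shows how a per-frame CTC label path is typically collapsed into an output sequence (merge repeated labels, then drop blanks) and how a character error rate can be computed from an edit distance. This is a minimal sketch, not the authors' implementation: the function names, the choice of blank index 0, and the toy label values are all assumptions made for illustration.

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse a per-frame label path: merge consecutive repeats, then drop blanks.

    `blank=0` is an assumed blank index, not taken from the paper.
    """
    collapsed = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            current_row.append(min(
                previous_row[j] + 1,                            # deletion
                current_row[j - 1] + 1,                         # insertion
                previous_row[j - 1] + (ref_item != hyp_item),   # substitution or match
            ))
        previous_row = current_row
    return previous_row[-1]


def error_rate(reference, hypothesis):
    """Edit distance divided by reference length (CER when items are characters)."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical frame-level argmax output; 0 is the assumed blank.
    path = [0, 3, 3, 0, 3, 5, 5, 0]
    print(ctc_greedy_collapse(path))
    print(error_rate("abcd", "abxd"))
```

The blank between the two runs of label 3 is what lets CTC emit the same label twice in a row; without it, the repeats would merge into a single output.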
Pages: 7