A Chinese acoustic model based on convolutional neural network

Cited by: 3

Authors:

Zhang, Qiang [1,2]

Sang, Jun [1,2]

Alam, Mohammad S. [3]

Cai, Bin [1,2]

Yang, Li [1,2]

Affiliations:

[1] Chongqing Univ, Minist Educ, Key Lab Dependable Serv Comp Cyber Phys Soc, Chongqing 400044, Peoples R China

[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China

[3] Texas A&M Univ Kingsville, Frank H Dotterweich Coll Engn, Kingsville, TX 78363 USA

Source:

PATTERN RECOGNITION AND TRACKING XXX | 2019 / Vol. 10995

Keywords:

Speech recognition; acoustic model; Chinese; convolutional neural network; connectionist temporal classification (CTC);

DOI:

10.1117/12.2520356

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech recognition has long been a research focus in the field of human-computer communication and interaction. The main purpose of automatic speech recognition (ASR) is to convert speech waveform signals into text. The acoustic model is a main component of ASR; it connects the observed features of speech signals with the speech modeling units. In recent years, deep learning has become the mainstream technology in the field of speech recognition. In this paper, a convolutional neural network architecture combining VGG with the Connectionist Temporal Classification (CTC) loss function is proposed as the acoustic model for speech recognition. Traditional acoustic model training is based on frame-level labels with a cross-entropy criterion, which requires a tedious label alignment procedure. The CTC loss is adopted to learn the alignments between speech frames and label sequences automatically, so that the training process is end-to-end. The architecture can exploit the temporal and spectral structures of speech signals simultaneously. Batch normalization (BN) is used to normalize each layer's input and reduce internal covariate shift. To prevent overfitting, dropout is used during training to improve the network's generalization ability. The speech signal is transformed through a series of processing steps into a spectral image, which serves as the input of the neural network. The input feature has 200 dimensions, and the output labels of the acoustic model are 415 Chinese pronunciations without tone. The experimental results demonstrate that the proposed model achieves a character error rate (CER) of 17.97% and 23.86% on the public Mandarin speech corpora AISHELL-1 and ST-CMDS-20170001_1, respectively.
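The abstract names two mechanisms that a short sketch may make more concrete: how a CTC-trained network's per-frame predictions are turned into a label sequence, and how a character error rate is computed. The Python below is only an illustration under stated assumptions, not the authors' implementation. The function names (`ctc_collapse`, `edit_distance`, `character_error_rate`), the choice of label id 0 as the CTC blank, and the example label ids are all hypothetical. The record also does not say which decoding strategy the paper uses, so best-path (greedy) collapsing is assumed here.

```python
from typing import List, Sequence


def ctc_collapse(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse a per-frame label path into an output sequence.

    CTC rule: merge consecutive repeated labels, then drop blanks.
    The blank id (0) is an assumption made for this sketch.
    """
    output: List[int] = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


def edit_distance(reference: Sequence, hypothesis: Sequence) -> int:
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            row.append(min(prev_row[j] + 1,           # deletion
                           row[j - 1] + 1,            # insertion
                           prev_row[j - 1] + cost))   # substitution or match
        prev_row = row
    return prev_row[-1]


def character_error_rate(reference: Sequence, hypothesis: Sequence) -> float:
    """CER = edit distance divided by the reference length."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical per-frame argmax output; 0 is the blank, other ids
    # stand in for toneless syllable labels.
    path = [0, 3, 3, 0, 3, 5, 5, 0]
    print(ctc_collapse(path))
    print(character_error_rate("今天天气好", "今天气很好"))
```

The blank between the two runs of label 3 is what lets CTC emit the same label twice in a row; without it the repeats would be merged into one. CER is normalized by the reference length, so it can exceed 100% when the hypothesis contains many insertions.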


Pages: 7

