English speech recognition based on deep learning with multiple features

Cited by: 2

Author:

Zhaojuan Song

Affiliation:

[1] School of Translation Studies of Qufu Normal University,

Source:

Computing | 2020 / Vol. 102

Keywords:

Deep neural network; Fusion; Speech recognition; Multiple features; 68T10; 68T35; 68T50;

DOI:

Not available

Abstract:

English is one of the most widely used languages. As the world grows more connected, smart home devices, in-vehicle voice systems, and speech recognition software that use English as the recognition language have gradually entered everyday life and have won over most users through their practical accuracy. Deep learning, with its hierarchical feature learning and data modeling capabilities, has outperformed shallow learning techniques on many tasks. This paper therefore takes English speech as the research object and proposes a deep learning speech recognition algorithm that combines speech features and speech attributes. First, a deep neural network trained with supervised learning extracts high-level features of the speech; the output of a fixed hidden layer is selected as the new speech feature for the newly generated network, and a GMM–HMM acoustic model is trained with these new features. Second, speech attribute extractors based on deep neural networks are trained for multiple speech attributes, and the extracted attributes are classified into phonemes by a deep neural network. Finally, the speech features and the speech attribute features are merged into the same CNN framework by a neural network based on a linear feature fusion algorithm. The experimental results show that the proposed multi-feature English speech recognition algorithm can directly and effectively combine the two methods by combining the speech features and the speech attributes of the speaker at the input layer of the deep neural network, and that it significantly improves the performance of the English speech recognition system.
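
The abstract does not give the fusion formula, so the following is only a minimal sketch of one plausible reading of "linear feature fusion" at the network input: each per-frame feature stream is scaled by a weight and the two are concatenated. The function name `fuse_features`, the weights, and the feature dimensions (40 and 21) are hypothetical and are not taken from the paper.

```python
import numpy as np

def fuse_features(speech_feats, attr_feats, w_speech=1.0, w_attr=1.0):
    """Scale two per-frame feature matrices and concatenate them per frame.

    Assumption: both inputs are (num_frames, dim) arrays aligned frame by frame.
    """
    speech_feats = np.asarray(speech_feats, dtype=np.float64)
    attr_feats = np.asarray(attr_feats, dtype=np.float64)
    if speech_feats.shape[0] != attr_feats.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    # Linear weighting followed by concatenation along the feature axis.
    return np.concatenate([w_speech * speech_feats, w_attr * attr_feats], axis=1)

# Hypothetical data: 100 frames, 40-dim hidden-layer features,
# 21 speech-attribute scores per frame.
rng = np.random.default_rng(0)
bottleneck = rng.standard_normal((100, 40))
attributes = rng.random((100, 21))

fused = fuse_features(bottleneck, attributes, w_speech=1.0, w_attr=0.5)
print(fused.shape)
```

The fused matrix would then serve as the input to the acoustic network; how the paper actually weights or normalizes the two streams is not stated in the abstract.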

Pages: 663-682

Page count: 19
