Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network

Cited by: 10

Authors:

Li, Juan [1,2]

Zhang, Xueying [1]

Huang, Lixia [1]

Li, Fenglian [1]

Duan, Shufei [1]

Sun, Ying [1]

Affiliations:

[1] Taiyuan Univ Technol, Coll Informat & Comp, Jinzhong 030600, Peoples R China

[2] Yuncheng Univ, Dept Phys & Elect Engn, Yuncheng 044000, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / No. 19

Funding:

National Natural Science Foundation of China

Keywords:

speech emotion recognition; deep learning; Mel spectrogram; IMel spectrogram; stacked sparse autoencoder; spectral features; stress recognition; neural network; model; PSO

DOI:

10.3390/app12199518

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Subject classification code:

0703

Abstract:

Featured Application: Emotion recognition is a computer's automatic recognition of the emotional state conveyed by input speech. It is an active research field at the intersection of phonetics, psychology, digital signal processing, pattern recognition, and artificial intelligence. Speech emotion recognition is currently used in intelligent signal processing, smart medical care, business intelligence, assisted lie detection, criminal investigation, the service industry, self-driving cars, smartphone voice assistants, and human psychoanalysis, among other fields.

In the context of artificial intelligence, smooth communication between people and machines has become an important goal. The Mel spectrogram is a common representation in speech emotion recognition and focuses on the low-frequency part of speech. To analyze emotions more comprehensively, the inverse Mel (IMel) spectrogram, which focuses on the high-frequency part, is proposed. Because the convolutional neural network-stacked sparse autoencoder (CNN-SSAE) can extract deep optimized features, a Mel-IMel dual-channel complementary structure is proposed. In the first channel, a CNN extracts the low-frequency information of the Mel spectrogram; the other channel extracts the high-frequency information of the IMel spectrogram. This information is then fed into an SSAE, which reduces its dimensionality and yields optimized features. Experimental results show that the highest recognition rates achieved on the EMO-DB, SAVEE, and RAVDESS datasets were 94.79%, 88.96%, and 83.18%, respectively. The recognition rate obtained with the two spectrograms combined was higher than that obtained with either spectrogram alone, which shows that the two spectrograms are complementary. Placing the SSAE after the CNN to obtain optimized features further improved the recognition rate, which demonstrates the effectiveness of the CNN-SSAE network.
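The abstract describes the IMel spectrogram only as emphasizing the high-frequency part of speech and does not give its construction. The minimal NumPy sketch below shows how the two input channels could be produced, assuming the construction used for inverted-Mel cepstral coefficients: the Mel filterbank is flipped along both the filter and frequency axes, so its triangles become narrow and dense at high frequencies instead of low ones. The Mel formula `2595 * log10(1 + f / 700)`, the settings (40 filters, 512-point FFT, 160-sample hop at 16 kHz), and the function names `mel_filterbank`, `imel_filterbank`, and `dual_channel_spectrograms` are hypothetical choices for illustration, not the paper's definition or parameters.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced uniformly on the Mel scale (dense at low frequencies)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bin_points[i - 1], bin_points[i], bin_points[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge, peak of 1.0 at the center bin
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def imel_filterbank(n_filters, n_fft, sr):
    """Assumed IMel bank: flip the Mel bank in frequency (and filter order),
    so filters are narrow and dense at high frequencies."""
    return mel_filterbank(n_filters, n_fft, sr)[::-1, ::-1]


def power_spectrogram(signal, n_fft=512, hop=160):
    """Hann-windowed short-time power spectrum, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def dual_channel_spectrograms(signal, sr, n_filters=40, n_fft=512, hop=160):
    """Return (log-Mel, log-IMel) spectrograms, one per input channel."""
    power = power_spectrogram(signal, n_fft, hop)
    mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    imel = np.log(power @ imel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return mel, imel
```

For one second of 16 kHz audio, each channel has shape (97, 40), that is, 97 frames by 40 bands. In the structure described above, each channel would feed its own CNN branch before the SSAE; flipping the Mel bank keeps the same number of bands, so the two channels line up frame by frame.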

Pages: 20
