Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network

Cited by: 10
Authors
Li, Juan [1, 2]
Zhang, Xueying [1 ]
Huang, Lixia [1 ]
Li, Fenglian [1 ]
Duan, Shufei [1 ]
Sun, Ying [1 ]
Affiliations
[1] Taiyuan Univ Technol, Coll Informat & Comp, Jinzhong 030600, Peoples R China
[2] Yuncheng Univ, Dept Phys & Elect Engn, Yuncheng 044000, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 19
Funding
National Natural Science Foundation of China;
Keywords
speech emotion recognition; deep learning; Mel spectrogram; IMel spectrogram; STACKED SPARSE AUTOENCODER; SPECTRAL FEATURES; STRESS RECOGNITION; NEURAL-NETWORK; MODEL; PSO;
DOI
10.3390/app12199518
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Featured Application: Speech emotion recognition is the automatic recognition by a computer of the emotional state conveyed by input speech. It is an active research field arising from the intersection of phonetics, psychology, digital signal processing, pattern recognition, and artificial intelligence. Speech emotion recognition is now widely used in intelligent signal processing, smart medical care, business intelligence, assisted lie detection, criminal investigation, the service industry, self-driving cars, smartphone voice assistants, and human psychoanalysis. In the context of artificial intelligence, smooth communication between people and machines has become a widely pursued goal.
The Mel spectrogram is a common representation in speech emotion recognition that emphasizes the low-frequency part of speech. In contrast, the inverse Mel (IMel) spectrogram, which emphasizes the high-frequency part, is proposed so that emotions can be analyzed more comprehensively. Because the convolutional neural network-stacked sparse autoencoder (CNN-SSAE) can extract deep optimized features, a Mel-IMel dual-channel complementary structure is proposed. In the first channel, a CNN extracts the low-frequency information of the Mel spectrogram. The other channel extracts the high-frequency information of the IMel spectrogram. This information is passed to an SSAE to reduce the number of dimensions and obtain the optimized information. Experimental results show that the highest recognition rates achieved on the EMO-DB, SAVEE, and RAVDESS datasets were 94.79%, 88.96%, and 83.18%, respectively. The recognition rate obtained with the two spectrograms together was higher than that of either spectrogram alone, which shows that the two spectrograms are complementary. Placing the SSAE after the CNN to obtain the optimized information further improved the recognition rate, which shows the effectiveness of the CNN-SSAE network.
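The contrast between the Mel and IMel frequency emphasis can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common formula mel = 2595 log10(1 + f/700) and, as a further assumption, builds the "inverse" spacing by mirroring the Mel-spaced filter centers about the middle of the band. The function names, the 26 filters, and the 8000 Hz upper limit are all hypothetical choices made for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_max_hz):
    """Filter centers equally spaced on the mel axis (dense at low Hz)."""
    # n_filters + 2 points include the two band edges, which are dropped.
    mel_points = np.linspace(0.0, hz_to_mel(f_max_hz), n_filters + 2)
    return mel_to_hz(mel_points)[1:-1]

def imel_center_frequencies(n_filters, f_max_hz):
    """Assumed inverse-Mel spacing: Mel centers mirrored about f_max/2,
    so the filters become dense at high frequencies instead."""
    return np.sort(f_max_hz - mel_center_frequencies(n_filters, f_max_hz))

# Hypothetical settings: 26 filters over a 0-8000 Hz band.
mel = mel_center_frequencies(26, 8000.0)
imel = imel_center_frequencies(26, 8000.0)

print("Mel centers below 4 kHz: ", int(np.sum(mel < 4000.0)))
print("IMel centers above 4 kHz:", int(np.sum(imel > 4000.0)))
```

Under these assumptions most Mel centers fall in the lower half of the band and most IMel centers in the upper half, which is the complementarity the abstract describes. Triangular filters built on each set of centers and applied to a power spectrum would give the two spectrogram inputs for the two channels.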
Pages: 20
Related papers
50 in total
  • [31] Bioinspired dual-channel speech recognition using graphene-based electromyographic and mechanical sensors
    Tian, He
    Li, Xiaoshi
    Wei, Yuhong
    Ji, Shourui
    Yang, Qisheng
    Gou, Guang-Yang
    Wang, Xuefeng
    Wu, Fan
    Jian, Jinming
    Guo, Hao
    Qiao, Yancong
    Wang, Yu
    Gu, Wen
    Guo, Yizhe
    Yang, Yi
    Ren, Tian-Ling
    CELL REPORTS PHYSICAL SCIENCE, 2022, 3 (10):
  • [32] A Text Emotion Analysis Method Using the Dual-Channel Convolution Neural Network in Social Networks
    Wu, Di
    Zhang, Jianpei
    Zhao, Qingchao
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [33] Time-Continuous Emotion Recognition Using Spectrogram Based CNN-RNN Modelling
    Fedotov, Dmitrii
    Kim, Bobae
    Karpov, Alexey
    Minker, Wolfgang
    SPEECH AND COMPUTER, SPECOM 2019, 2019, 11658 : 93 - 102
  • [34] SPEECH TRANSMISSION INDEX MEASUREMENTS USING A DUAL-CHANNEL ANALYZER
    HOEJBERG, K
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 1986, 34 (12): : 1038 - 1038
  • [35] Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS
    Toyoshima, Itsuki
    Okada, Yoshifumi
    Ishimaru, Momoko
    Uchiyama, Ryunosuke
    Tada, Mayu
    SENSORS, 2023, 23 (03)
  • [36] MFF-SAug: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network
    Jothimani, S.
    Premalatha, K.
    CHAOS SOLITONS & FRACTALS, 2022, 162
  • [37] Dual-channel convolutional neural network for power edge image recognition
    Fangrong Zhou
    Yi Ma
    Bo Wang
    Gang Lin
    Journal of Cloud Computing, 10
  • [38] Pain Expression Recognition Based on Dual-Channel Convolutional Neural Network
    Xu, Xuebin
    Lei, Meng
    Liu, Dehua
    Wang, Muyu
    ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153 : 35 - 42
  • [39] Dual-channel convolutional neural network for power edge image recognition
    Zhou, Fangrong
    Ma, Yi
    Wang, Bo
    Lin, Gang
    JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2021, 10 (01):
  • [40] Speech emotion recognition using the novel PEmoNet (Parallel Emotion Network)
    Bhangale, Kishor B.
    Kothandaraman, Mohanaprasad
    APPLIED ACOUSTICS, 2023, 212