Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information

Cited by: 5
Authors
Hu, Zhangfang [1 ]
LingHu, Kehuan [1 ]
Yu, Hongling [1 ]
Liao, Chenzhuo [1 ]
Affiliations
[1] Chongqing University of Posts and Telecommunications (CQUPT), Key Laboratory of Optoelectronic Information Sensing and Technology, Chongqing 400065, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Speech recognition; Mel frequency cepstral coefficient; Gender issues; Feature extraction; Convolutional neural networks; Three-dimensional displays; SER; convolutional neural network; gender information; attention; GRU; features
DOI
10.1109/ACCESS.2023.3278106
Chinese Library Classification (CLC) number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Emotion recognition is susceptible to interference such as feature redundancy and speaker gender differences, which lowers recognition accuracy. This paper proposes a speech emotion recognition (SER) method based on an attention mixed convolutional neural network (MCNN) combined with gender information, comprising two stages: gender recognition and emotion recognition. (1) An MCNN identifies the speaker's gender and separates speech samples into male and female groups. (2) Based on the first-stage classification, a gender-specific emotion recognition model is built by introducing coordinate attention and a stack of gated recurrent units coupled with an attention mechanism (A-GRUs), producing emotion recognition results for each gender. Both stages take as input dynamic 3D MFCC features generated from the original speech database. The proposed method achieves 95.02% and 86.34% accuracy on the EMO-DB and RAVDESS datasets, respectively. The experimental results show that the proposed SER system combined with gender information significantly improves recognition performance.
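The abstract describes a two-stage pipeline: dynamic 3D MFCC features (static MFCC stacked with first- and second-order deltas as three channels) feed an MCNN that classifies speaker gender, and the gender decision then routes the same features to a gender-specific emotion model built from coordinate attention and attention-equipped GRUs (A-GRUs). The following is a minimal sketch of that pipeline, assuming PyTorch and librosa; the layer sizes, kernel shapes, and the names dynamic_3d_mfcc, MCNNGender, and AGRUEmotion are illustrative assumptions rather than the paper's exact architecture, and the coordinate-attention block is approximated here by a simple additive attention over GRU time steps.

import librosa
import numpy as np
import torch
import torch.nn as nn

def dynamic_3d_mfcc(path, sr=16000, n_mfcc=40):
    # Stack static MFCC with its first- and second-order deltas as 3 channels.
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    d1 = librosa.feature.delta(mfcc, order=1)
    d2 = librosa.feature.delta(mfcc, order=2)
    feat = np.stack([mfcc, d1, d2], axis=0)                  # (3, n_mfcc, frames)
    return torch.from_numpy(feat).float()

class MCNNGender(nn.Module):
    # Stage 1: convolutional gender classifier that routes each utterance
    # to a male or female emotion model (illustrative layer sizes).
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 2)  # male / female

    def forward(self, x):           # x: (batch, 3, n_mfcc, frames)
        return self.fc(self.conv(x).flatten(1))

class AGRUEmotion(nn.Module):
    # Stage 2: gender-specific model; a CNN front end followed by a GRU whose
    # time steps are pooled with additive attention (standing in for the A-GRUs).
    def __init__(self, n_emotions=7, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((2, 1)),   # halve the MFCC axis, keep the time axis
        )
        self.gru = nn.GRU(input_size=64 * 20, hidden_size=hidden,
                          batch_first=True, bidirectional=True)  # assumes n_mfcc=40
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_emotions)

    def forward(self, x):                          # x: (batch, 3, 40, frames)
        h = self.conv(x)                           # (batch, 64, 20, frames)
        h = h.permute(0, 3, 1, 2).flatten(2)       # (batch, frames, 64*20)
        seq, _ = self.gru(h)                       # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(seq), dim=1)   # attention weights over time
        return self.out((w * seq).sum(dim=1))      # attention-pooled emotion logits

# Hypothetical routing of one utterance through both stages:
#   x = dynamic_3d_mfcc("sample.wav").unsqueeze(0)
#   branch = MCNNGender()(x).argmax(dim=1).item()
#   logits = {0: male_model, 1: female_model}[branch](x)

The per-gender routing above mirrors the abstract's description of separate emotion models for male and female speech; n_emotions defaults to 7 here because EMO-DB covers seven emotion classes, while RAVDESS would use eight.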
Pages: 50285-50294 (10 pages)