Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information

Cited: 5
Authors
Hu, Zhangfang [1 ]
LingHu, Kehuan [1 ]
Yu, Hongling [1 ]
Liao, Chenzhuo [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun CQUPT, Key Lab Optoelect Informat Sensing & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Speech recognition; Mel frequency cepstral coefficient; Gender issues; Feature extraction; Convolutional neural networks; Three-dimensional displays; SER; convolutional neural network; gender information; attention; GRU; FEATURES;
DOI
10.1109/ACCESS.2023.3278106
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Emotion recognition is susceptible to interference such as feature redundancy and speaker gender differences, which lowers recognition accuracy. This paper proposes a speech emotion recognition (SER) method based on an attention mixed convolutional neural network (MCNN) combined with gender information, comprising two stages: gender recognition and emotion recognition. (1) An MCNN identifies the speaker's gender and divides the speech samples into male and female groups. (2) Based on the first-stage classification output, gender-specific emotion recognition models are built by introducing coordinated attention and a series of gated recurrent units coupled with an attention mechanism (A-GRUs), yielding emotion recognition results for each gender. The inputs to both stages are dynamic 3D MFCC features generated from the original speech database. The proposed method achieves 95.02% and 86.34% accuracy on the EMO-DB and RAVDESS datasets, respectively. The experimental results show that the proposed SER system combined with gender information significantly improves recognition performance.
Pages: 50285 - 50294
Page count: 10
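The abstract outlines a two-stage pipeline: dynamic 3D MFCC features feed an MCNN gender classifier, whose prediction routes each utterance to a gender-specific emotion model. The sketch below illustrates that flow in Python, assuming librosa for MFCC extraction and the common reading of "dynamic 3D MFCC" as static coefficients stacked with their first- and second-order deltas; the gender_model and emotion_models callables are hypothetical placeholders, not the authors' MCNN or A-GRU implementations.

import numpy as np
import librosa

def dynamic_3d_mfcc(wav_path, sr=16000, n_mfcc=40):
    """Build a 3-channel MFCC tensor, shaped (3, n_mfcc, n_frames):
    static coefficients plus first- and second-order deltas."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # static
    d1 = librosa.feature.delta(mfcc)                        # delta
    d2 = librosa.feature.delta(mfcc, order=2)               # delta-delta
    return np.stack([mfcc, d1, d2], axis=0)

def two_stage_predict(features, gender_model, emotion_models):
    """Stage 1: predict gender; stage 2: route the same features to the
    emotion recognizer trained for that gender."""
    gender = gender_model(features)             # e.g. "male" or "female"
    emotion = emotion_models[gender](features)  # gender-specific model
    return gender, emotion

Under this reading, each gender-specific emotion model would be trained only on samples of its own gender, which is the premise behind the reported accuracy gains over a single gender-agnostic model.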