Speech Emotion Recognition Based on Attention MCNN Combined With Gender Information

Times cited: 5
Authors
Hu, Zhangfang [1]
LingHu, Kehuan [1]
Yu, Hongling [1]
Liao, Chenzhuo [1]
Affiliation
[1] Chongqing Univ Posts & Telecommun CQUPT, Key Lab Optoelect Informat Sensing & Technol, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Emotion recognition; Speech recognition; Mel frequency cepstral coefficient; Gender issues; Feature extraction; Convolutional neural networks; Three-dimensional displays; SER; convolutional neural network; gender information; attention; GRU; FEATURES;
DOI
10.1109/ACCESS.2023.3278106
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Emotion recognition is susceptible to interference such as feature redundancy and differences between speaker genders, which lowers recognition accuracy. This paper proposes a speech emotion recognition (SER) method based on an attention mixed convolutional neural network (MCNN) combined with gender information, consisting of two stages: gender recognition and emotion recognition. (1) An MCNN identifies the speaker's gender and classifies speech samples as male or female. (2) Based on the first-stage classification output, a gender-specific emotion recognition model is built by introducing coordinated attention and a series of gated recurrent units connected to an attention mechanism (A-GRUs), producing emotion recognition results for each gender. The inputs to both stages are dynamic 3D MFCC features generated from the original speech databases. The proposed method achieves 95.02% and 86.34% accuracy on the EMO-DB and RAVDESS datasets, respectively. The experimental results show that combining gender information with the SER system significantly improves recognition performance.
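The abstract does not spell out how the "dynamic 3D MFCC" input is built or how the two stages are wired together. The sketch below is illustrative only and assumes (as is common in SER work, but not confirmed by this record) that the three channels are the static MFCCs plus their first- and second-order deltas, computed with the standard regression formula over a window of +/-2 frames with edge frames replicated. The classifiers passed to `two_stage_predict` are hypothetical stand-ins for the MCNN gender model and the gender-specific coordinated-attention + A-GRU emotion models; `delta`, `dynamic_3d_mfcc`, and `two_stage_predict` are hypothetical names, not the authors' code.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]  # rows are frames, columns are MFCC coefficients


def delta(features: Matrix, width: int = 2) -> Matrix:
    """Regression delta over time (assumed window +/-width), replicating edge frames."""
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for k in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices so frames past either end reuse the edge frame.
                ahead = features[min(t + n, num_frames - 1)][k]
                behind = features[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def dynamic_3d_mfcc(mfcc: Matrix) -> List[Matrix]:
    """Stack static, delta, and delta-delta MFCCs as three channels (assumed layout)."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [mfcc, d1, d2]


def two_stage_predict(
    features: List[Matrix],
    gender_model: Callable[[List[Matrix]], str],
    emotion_models: Dict[str, Callable[[List[Matrix]], str]],
) -> Tuple[str, str]:
    """Stage 1 predicts gender; stage 2 runs the emotion model for that gender."""
    gender = gender_model(features)
    emotion = emotion_models[gender](features)
    return gender, emotion


if __name__ == "__main__":
    # Toy input: coefficient 0 rises linearly over 6 frames, coefficient 1 is constant.
    feats = dynamic_3d_mfcc([[float(t), 1.0] for t in range(6)])
    print(feats[1])  # first-order deltas per frame
    print(two_stage_predict(
        feats,
        lambda f: "male",  # hypothetical gender classifier
        {"male": lambda f: "angry", "female": lambda f: "happy"},  # hypothetical models
    ))
```

Routing through a dictionary keyed by the predicted gender means each emotion model only ever sees one gender's samples, which is the separation the two-stage design relies on; it also means a stage-(1) gender error sends the sample to the wrong stage-(2) model.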
Pages: 50285-50294
Page count: 10