A Combined CNN Architecture for Speech Emotion Recognition

Cited by: 1
Authors
Begazo, Rolinson [1]
Aguilera, Ana [2,3]
Dongo, Irvin [1,4]
Cardinale, Yudith [5]
Affiliations
[1] Univ Catolica San Pablo, Elect & Elect Engn Dept, Arequipa 04001, Peru
[2] Univ Valparaiso, Fac Ingn, Escuela Ingn Informat, Valparaiso 2340000, Chile
[3] Univ Valparaiso, Interdisciplinary Ctr Biomed Res & Hlth Engn MEDIN, Valparaiso 2340000, Chile
[4] Univ Bordeaux, ESTIA Inst Technol, F-64210 Bidart, France
[5] Univ Int Valencia, Grp Invest Ciencia Datos, Valencia 46002, Spain
Keywords
speech emotion recognition; deep learning; spectral features; spectrogram imaging; feature fusion; convolutional neural network; NEURAL-NETWORKS; FEATURES; CORPUS;
DOI
10.3390/s24175797
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline classification codes
070302; 081704;
Abstract
Speech emotion recognition is used in a range of Human-Computer Interaction (HCI) scenarios. Existing approaches have achieved significant results, but limitations persist; the quantity and diversity of available data become especially limiting when deep learning techniques are used. The lack of a standard for feature selection leads to continuous development and experimentation, and choosing and designing an appropriate network architecture is a further challenge. This study addresses the recognition of emotions in the human voice with deep learning. It proposes a comprehensive approach that develops preprocessing and feature selection stages and constructs a dataset, EmoDSc, by combining several available databases. The synergy between spectral features and spectrogram images is investigated. Used independently, spectral features alone gave a weighted accuracy of 89%, and spectrogram images alone gave 90%. These results, although surpassing previous research, highlight the strengths and limitations of each input when used in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified dataset EmoDSc, achieves an accuracy of 96%.
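The abstract reports "weighted accuracy" without defining it. A minimal sketch of one common reading in speech emotion recognition follows: weighted accuracy (WA) as the overall fraction of correctly classified samples, contrasted with unweighted accuracy (UA), the mean of per-class recalls. This reading is an assumption, not the authors' stated definition, and the function names and labels below are hypothetical illustrations rather than data or code from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples.

    Equivalent to averaging per-class recall weighted by class support.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)  # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical, deliberately imbalanced labels (not from EmoDSc).
y_true = ["happy"] * 6 + ["sad"] * 2
y_pred = ["happy"] * 6 + ["sad", "happy"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA={wa:.3f} UA={ua:.3f}")
```

On imbalanced data the two metrics diverge, because WA is dominated by the majority class while UA penalizes errors on small classes more heavily.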
Pages: 39