A Combined CNN Architecture for Speech Emotion Recognition

Cited by: 1
Authors
Begazo, Rolinson [1]
Aguilera, Ana [2, 3]
Dongo, Irvin [1, 4]
Cardinale, Yudith [5]
Affiliations
[1] Univ Catolica San Pablo, Elect & Elect Engn Dept, Arequipa 04001, Peru
[2] Univ Valparaiso, Fac Ingn, Escuela Ingn Informat, Valparaiso 2340000, Chile
[3] Univ Valparaiso, Interdisciplinary Ctr Biomed Res & Hlth Engn MEDIN, Valparaiso 2340000, Chile
[4] Univ Bordeaux, ESTIA Inst Technol, F-64210 Bidart, France
[5] Univ Int Valencia, Grp Invest Ciencia Datos, Valencia 46002, Spain
Keywords
speech emotion recognition; deep learning; spectral features; spectrogram imaging; feature fusion; convolutional neural network; NEURAL-NETWORKS; FEATURES; CORPUS;
DOI
10.3390/s24175797
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Emotion recognition through speech is a technique employed in various Human-Computer Interaction (HCI) scenarios. Existing approaches have achieved significant results; however, limitations persist, and the quantity and diversity of data become especially limiting when deep learning techniques are used. The lack of a standard for feature selection leads to continuous development and experimentation, and choosing and designing an appropriate network architecture is a further challenge. This study addresses the recognition of emotions in the human voice using deep learning techniques. It proposes a comprehensive approach that develops preprocessing and feature selection stages and constructs a dataset, EmoDSc, by combining several available databases. The synergy between spectral features and spectrogram images is investigated. Used independently, spectral features alone yielded a weighted accuracy of 89%, while spectrogram images alone reached a weighted accuracy of 90%. These results, although surpassing previous research, highlight the strengths and limitations of each input type when used in isolation. Based on this exploration, a neural network architecture composed of a CNN1D, a CNN2D, and an MLP that fuses spectral features and spectrogram images is proposed. The model, supported by the unified dataset EmoDSc, achieves an accuracy of 96%.
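The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give layer counts, kernel sizes, input shapes, or the number of emotion classes, so every such value below is an assumption chosen only to keep the example small. The function names (`conv1d`, `conv2d`, `dense`, `fused_forward`, `init_params`) are hypothetical, the weights are random and untrained, and fusion is assumed to be plain concatenation. The sketch shows only the data flow: a 1D convolution over a spectral feature vector, a 2D convolution over a spectrogram image, concatenation of both outputs, and an MLP that produces class probabilities.

```python
import math
import random

def conv1d(x, kernel):
    """Valid 1D cross-correlation followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)))
            for i in range(len(x) - k + 1)]

def conv2d(img, kernel):
    """Valid 2D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(img) - kh + 1):
        row = []
        for c in range(len(img[0]) - kw + 1):
            s = sum(img[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def dense(x, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def fused_forward(spectral, spectrogram, params):
    """Forward pass: CNN1D branch + CNN2D branch -> concatenation -> MLP."""
    branch_1d = conv1d(spectral, params["k1d"])
    branch_2d = [v for row in conv2d(spectrogram, params["k2d"]) for v in row]
    fused = branch_1d + branch_2d  # assumed fusion: simple concatenation
    hidden = [max(0.0, h) for h in dense(fused, params["w_h"], params["b_h"])]
    return softmax(dense(hidden, params["w_o"], params["b_o"]))

def init_params(n_spec, img_h, img_w, n_hidden, n_classes, seed=0):
    """Random, untrained weights; 3-tap and 3x3 kernels are assumptions."""
    rng = random.Random(seed)
    k = 3
    n_fused = (n_spec - k + 1) + (img_h - k + 1) * (img_w - k + 1)

    def rand(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "k1d": rand(k),
        "k2d": [rand(k) for _ in range(k)],
        "w_h": [rand(n_fused) for _ in range(n_hidden)],
        "b_h": [0.0] * n_hidden,
        "w_o": [rand(n_hidden) for _ in range(n_classes)],
        "b_o": [0.0] * n_classes,
    }

# Toy inputs: an 8-value spectral vector and a 6x6 "spectrogram".
params = init_params(n_spec=8, img_h=6, img_w=6, n_hidden=4, n_classes=3)
spectral = [0.1 * i for i in range(8)]
spectrogram = [[(r + c) / 10 for c in range(6)] for r in range(6)]
probs = fused_forward(spectral, spectrogram, params)
```

Here `probs` is a list of three class probabilities that sum to 1. Because the weights are untrained, the values carry no meaning; a real model would stack several convolutional layers per branch and learn the weights from a labelled dataset such as EmoDSc.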
Pages: 39
Related papers
50 items in total
  • [21] SPEECH EMOTION RECOGNITION WITH DUAL-SEQUENCE LSTM ARCHITECTURE
    Wang, Jianyou
    Xue, Michael
    Culhane, Ryan
    Diao, Enmao
    Ding, Jie
    Tarokh, Vahid
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6474 - 6478
  • [22] Static, Dynamic and Acceleration Features for CNN-Based Speech Emotion Recognition
    Khalifa, Intissar
    Ejbali, Ridha
    Napoletano, Paolo
    Schettini, Raimondo
    Zaied, Mourad
    AIXIA 2021 - ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13196 : 348 - 358
  • [23] A novel concatenated 1D-CNN model for speech emotion recognition
    Flower, T. Mary Little
    Jaya, T.
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 93
  • [24] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
    Mishra, Swami
    Bhatnagar, Nehal
    Prakasam, P.
    Sureshkumar, T. R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (13) : 37603 - 37620
  • [25] A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition
    Mustaqeem
    Kwon, Soonil
    SENSORS, 2020, 20 (01)
  • [26] Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model
    Swami Mishra
    Nehal Bhatnagar
    Prakasam P
    Sureshkumar T. R
    Multimedia Tools and Applications, 2024, 83 : 37603 - 37620
  • [27] EFFICIENT SPEECH EMOTION RECOGNITION USING MULTI-SCALE CNN AND ATTENTION
    Peng, Zixuan
    Lu, Yu
    Pan, Shengfeng
    Liu, Yunfeng
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3020 - 3024
  • [28] Hybrid LSTM-Attention and CNN Model for Enhanced Speech Emotion Recognition
    Makhmudov, Fazliddin
    Kutlimuratov, Alpamis
    Cho, Young-Im
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [29] Speech Emotion Recognition Using Combined Multiple Pairwise Classifiers
    Heracleous, Panikos
    Mohammad, Yasser
    Yoneyama, Akio
    HCI INTERNATIONAL 2021 - LATE BREAKING POSTERS, HCII 2021, PT I, 2021, 1498 : 115 - 118
  • [30] Emotion Recognition Based On CNN
    Cao, Guolu
    Ma, Yuliang
    Meng, Xiaofei
    Gao, Yunyuan
    Meng, Ming
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 8627 - 8630