Development and Analysis of Convolutional Neural Network based Accurate Speech Emotion Recognition Models

Authors
Vijayan, Divya M. [1]
Arun, A. V. [1]
Ganeshnath, R. [2]
Nath, Ajay S. A. [1]
Roy, Rajesh Cherian [3]
Affiliations
[1] Model Engn Coll Kochi, Dept Elect, Ernakulam, India
[2] TKM Coll Engn, Dept Elect, Kollam, India
[3] Muthoot Inst Technol & Sci, Dept Comp Sci, Ernakulam, India
Keywords
Speech Emotion Recognition; CNN; LSTM; Transformer encoder; Accuracy; RAVDESS dataset; CLASSIFICATION; DEEP;
DOI
10.1109/INDICON56171.2022.10040174
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Automatic speech recognition is a major topic in artificial intelligence and machine learning, with the aim of developing machines that can communicate with humans through speech. Recently, with the emergence of the deep-learning paradigm, end-to-end models that extract features and train directly on the raw speech signal have been developed. With the goal of classifying emotions from speech more precisely, this paper presents a comparative analysis of two deep-learning architectures that improve on the accuracy of models available in the literature. Using a combined CNN-LSTM architecture and a CNN-Transformer encoder architecture, this work analyses the complete deep-learning strategy for extracting distinct spatial and temporal features and classifying emotions from speech. Experiments are carried out on the RAVDESS dataset. Of the two networks, the CNN-Transformer encoder achieves the higher accuracy, 82%, while the CNN-LSTM network achieves 74%.
Pages: 6