Development and Analysis of Convolutional Neural Network based Accurate Speech Emotion Recognition Models

Cited by: 0
Authors
Vijayan, Divya M. [1 ]
Arun, A., V [1 ]
Ganeshnath, R. [2 ]
Nath, Ajay S. A. [1 ]
Roy, Rajesh Cherian [3 ]
Affiliations
[1] Model Engn Coll Kochi, Dept Elect, Ernakulam, India
[2] TKM Coll Engn, Dept Elect, Kollam, India
[3] Muthoot Inst Technol & Sci, Dept Comp Sci, Ernakulam, India
Keywords
Speech Emotion Recognition; CNN; LSTM; Transformer encoder; Accuracy; RAVDESS dataset; CLASSIFICATION; DEEP;
DOI
10.1109/INDICON56171.2022.10040174
Chinese Library Classification
TP39 [Computer Applications];
Subject Classification
081203; 0835;
Abstract
Automatic speech recognition is a major topic in artificial intelligence and machine learning, with the aim of developing machines that can communicate with humans through speech. Recently, with the emergence of the deep-learning paradigm, end-to-end models that extract features and train directly on the raw speech signal have been developed. With the goal of classifying emotions from speech more precisely, this paper presents a comparative analysis of two deep-learning architectures that improve on the models available in the literature in terms of accuracy. Using a combined CNN-LSTM architecture and a CNN-Transformer encoder architecture, this work analyses a complete deep-learning strategy for extracting distinct spatial and temporal features and classifying emotions from speech. Experiments are carried out on the RAVDESS dataset. Of the two networks, the CNN-Transformer encoder achieves the higher accuracy of 82%, while the CNN-LSTM network achieves 74%.
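The two-stage pipeline the abstract describes, a CNN front-end extracting local spatial features from a spectrogram followed by a temporal model that summarizes them for classification, can be sketched in plain NumPy. This is a minimal illustrative sketch, not the authors' model: the input shape, kernel sizes, hidden width, the simple tanh recurrence standing in for the LSTM, and the eight-class output (matching RAVDESS's eight emotion labels) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    """Valid 1-D convolution over time: x (T, F) -> (T-k+1, n_kernels), ReLU."""
    k, _, n = kernels.shape  # kernel length, feature dim, number of kernels
    T = x.shape[0] - k + 1
    out = np.empty((T, n))
    for t in range(T):
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)

def simple_rnn(x, Wx, Wh):
    """Minimal tanh recurrence standing in for the LSTM; returns last state."""
    h = np.zeros(Wh.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh)
    return h

# Toy "spectrogram": 100 frames x 40 mel bands (illustrative sizes).
spec = rng.standard_normal((100, 40))

kernels = rng.standard_normal((5, 40, 16)) * 0.1  # CNN stage: spatial features
Wx = rng.standard_normal((16, 32)) * 0.1          # recurrence input weights
Wh = rng.standard_normal((32, 32)) * 0.1          # recurrence state weights
Wc = rng.standard_normal((32, 8)) * 0.1           # classifier: 8 emotion classes

feats = conv1d(spec, kernels)    # (96, 16): local spatial features per frame
h = simple_rnn(feats, Wx, Wh)    # (32,): temporal summary of the sequence
logits = h @ Wc                  # (8,): one score per emotion class
pred = int(np.argmax(logits))    # predicted emotion index
```

In the paper's second variant, the recurrence would be replaced by a Transformer encoder (self-attention over the CNN feature frames) with the classifier applied to a pooled representation; the CNN stage is shared by both designs.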
Pages: 6