Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network

Cited by: 0
Authors
Shalini Kapoor
Tarun Kumar
Affiliations
[1] Research Scholar, Department of Computer Science & Engineering
[2] Dr. A.P.J Abdul Kalam Technical University
[3] Radha Govind Group of Institution
Source
Keywords
Speech emotion recognition; Convolutional neural networks; Deep learning; Emotion change detection; Spectrograms
DOI
Not available
Abstract
Stress and anger are two negative emotions that affect individuals both mentally and physically, and they need to be addressed as early as possible. Automated systems are needed to monitor mental states and to detect early signs of emotional health issues. In the present work, a convolutional neural network (CNN) is proposed for anger and stress detection using handcrafted features together with deep learned features from the spectrogram. The combined feature set gathers information from two different representations of the speech signal in order to obtain more prominent features and to boost recognition accuracy. The proposed method is more computationally efficient than similar emotion assessment approaches. Preliminary results from an experimental evaluation on three datasets, the Toronto Emotional Speech Set (TESS), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Emotional Database (EMO-DB), indicate that categorical accuracy is boosted and cross-entropy loss is reduced to a considerable extent. The proposed CNN obtains training (T) and validation (V) categorical accuracy of T = 93.7%, V = 95.6% for TESS; T = 97.5%, V = 95.6% for EMO-DB; and T = 96.7%, V = 96.7% for RAVDESS.
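The abstract names two ideas that a small example can make concrete: combining a handcrafted feature vector with a deep learned one, and scoring predictions by categorical accuracy and cross-entropy loss. The sketch below is illustrative only and is not the authors' implementation. It assumes fusion by plain concatenation, and the helper names (`fuse_features`, `softmax`, `categorical_cross_entropy`, `categorical_accuracy`), the three-class setup, and all numeric values are hypothetical.

```python
import math


def fuse_features(handcrafted, deep):
    """Feature-level fusion by concatenation (an assumed strategy, not confirmed by the abstract)."""
    return list(handcrafted) + list(deep)


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index, eps=1e-12):
    """Negative log-probability assigned to the true class of one sample."""
    return -math.log(max(probs[true_index], eps))


def categorical_accuracy(batch_probs, true_indices):
    """Fraction of samples whose highest-probability class equals the label."""
    correct = 0
    for probs, label in zip(batch_probs, true_indices):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(true_indices)


# Hypothetical toy vectors: three handcrafted statistics and a two-value
# stand-in for a CNN embedding of the spectrogram.
handcrafted = [0.12, -0.40, 0.88]
deep = [0.05, 0.91]
fused = fuse_features(handcrafted, deep)

# Hypothetical classifier scores for three utterances over three classes.
batch_probs = [
    softmax([2.0, 0.5, 0.1]),
    softmax([0.2, 0.1, 1.5]),
    softmax([0.3, 1.2, 0.4]),
]
labels = [0, 2, 0]

accuracy = categorical_accuracy(batch_probs, labels)
mean_loss = sum(
    categorical_cross_entropy(p, t) for p, t in zip(batch_probs, labels)
) / len(labels)
```

In this toy batch the third utterance is misclassified, so accuracy falls below 1 while the mean loss stays positive. In a real pipeline the fused vector would feed a trained classifier instead of the fixed scores used here.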
Pages: 31107 - 31128
Page count: 21
Related papers
50 in total
  • [1] Fusing traditionally extracted features with deep learned features from the speech spectrogram for anger and stress detection using convolution neural network
    Kapoor, Shalini
    Kumar, Tarun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (21) : 31107 - 31128
  • [2] A deep neural network approach to heart murmur detection using spectrogram and peak interval features
    Han, Soyul
    Kang, Taein
    Lee, Jungguk
    Kim, Narin
    Won, Hyejin
    Kim, Yeong-Hwa
    Gong, Wuming
    Kwak, Il-Youp
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [3] Interpretation of a deep analysis of speech imagery features extracted by a capsule neural network
    Macias-Macias, Jose M.
    Ramirez-Quintana, Juan A.
    Chacon-Murguia, Mario I.
    Torres-Garcia, Alejandro A.
    Corral-Martinez, Luis F.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 159
  • [4] Detection of microcytic hypochromia using cbc and blood film features extracted from convolution neural network by different classifiers
    Shikha Purwar
    Rajiv Kumar Tripathi
    Ravi Ranjan
    Renu Saxena
    Multimedia Tools and Applications, 2020, 79 : 4573 - 4595
  • [5] Detection of microcytic hypochromia using cbc and blood film features extracted from convolution neural network by different classifiers
    Purwar, Shikha
    Tripathi, Rajiv Kumar
    Ranjan, Ravi
    Saxena, Renu
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (7-8) : 4573 - 4595
  • [6] Investigating Modulation Spectrogram Features for Deep Neural Network-based Automatic Speech Recognition
    Baby, Deepak
    Van Hamme, Hugo
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2479 - 2483
  • [7] Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition
    Zhang, Linjuan
    Wang, Longbiao
    Dang, Jianwu
    Guo, Lili
    Guan, Haotian
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV, 2018, 11304 : 62 - 71
  • [8] Mandarin Speech Recognition Using Convolution Neural Network with Augmented Tone Features
    Hu, Xinhui
    Lu, Xugang
    Hori, Chiori
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 15 - 18
  • [9] Arrhythmia Detection Using Deep Belief Network Extracted Features From ECG Signals
    Gourisaria, Mahendra Kumar
    Harshvardhan, G. M.
    Agrawal, Rakshit
    Patra, Sudhansu Shekhar
    Rautaray, Siddharth Swarup
    Pandey, Manjusha
    INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS, 2021, 12 (06)
  • [10] Speech Enhancement using Convolution Neural Network-based Spectrogram Denoising
    Hu Xuhong
    Yan Lin-Huang
    Lu Xun
    Guan Yuan-Sheng
    Hu Wenlin
    Wang Jie
    PROCEEDINGS OF 2021 7TH INTERNATIONAL CONFERENCE ON CONDITION MONITORING OF MACHINERY IN NON-STATIONARY OPERATIONS (CMMNO), 2021, : 310 - 318