TOWARD ROBUST SPEECH EMOTION RECOGNITION AND CLASSIFICATION USING NATURAL LANGUAGE PROCESSING WITH DEEP LEARNING MODEL

Cited by: 0
Authors
Alahmari, Saad [1]
Al-Shathry, Najla I. [2]
Eltahir, Majdy M. [3]
Alzaidi, Muhammad Swaileh A. [4]
Alghamdi, Ayman Ahmad [5]
Mahmud, Ahmed [6]
Affiliations
[1] Northern Border Univ, Appl Coll, Dept Comp Sci, Ar Ar, Saudi Arabia
[2] Princess Nourah Bint Abdulrahman Univ, Arab Language Teaching Inst, Dept Language Preparat, POB 84428, Riyadh 11671, Saudi Arabia
[3] King Khalid Univ, Appl Coll Mahayil, Dept Informat Syst, Abha, Saudi Arabia
[4] King Saud Univ, Coll Language Sci, Dept English Language, POB 145111, Riyadh, Saudi Arabia
[5] Umm Al Qura Univ, Arab Language Inst, Dept Arab Teaching, Mecca, Saudi Arabia
[6] Future Univ Egypt, Res Ctr, New Cairo 11835, Egypt
Keywords
Speech Emotion Recognition; Deep Learning; Fractal Seagull Optimization Algorithm; Feature Extraction
DOI
10.1142/S0218348X25400225
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Speech Emotion Recognition (SER) plays a significant role in human-machine interaction applications. Many SER systems have been proposed over the last decade, yet their performance remains limited by noise, high system complexity and ineffective feature discrimination. Feature extraction is therefore critical to SER performance. Deep Learning (DL)-based techniques have emerged as proficient solutions for SER owing to their ability to learn from unlabeled data, their superior feature representation, and their capacity to handle larger datasets and complex features. DL techniques such as Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs) have been applied successfully to automated SER. This study proposes a Robust SER and Classification using Natural Language Processing with DL (RSERC-NLPDL) model, which aims to identify the emotions in speech signals. First, pre-processing transforms the input speech signal into a valid format. The technique then extracts a set of features comprising Mel-Frequency Cepstral Coefficients (MFCCs), Zero-Crossing Rate (ZCR), Harmonics-to-Noise Ratio (HNR) and Teager Energy Operator (TEO). Next, feature selection is carried out using the Fractal Seagull Optimization Algorithm (FSOA). A Temporal Convolutional Autoencoder (TCAE) model is applied to identify speech emotions, and its hyperparameters are selected using the fractal Sand Cat Swarm Optimization (SCSO) algorithm. The RSERC-NLPDL method is evaluated through simulation on speech databases. The experimental analysis showed accuracies of 94.32% and 95.25% on the EMODB and RAVDESS datasets, respectively, outperforming the other models compared across several measures.
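Two of the features named in the abstract, ZCR and TEO, have simple textbook definitions that can be illustrated directly. The following is a minimal sketch using only the Python standard library; it is not the authors' implementation, and the function names, frame length and hop size are assumptions chosen for illustration. It uses the common discrete form of the Teager Energy Operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and defines ZCR as the fraction of adjacent sample pairs whose signs differ.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def teager_energy(frame):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the result has len(frame) - 2 items.
    """
    return [
        frame[n] ** 2 - frame[n - 1] * frame[n + 1]
        for n in range(1, len(frame) - 1)
    ]


def frame_features(signal, frame_len, hop):
    """Hypothetical helper: (ZCR, mean TEO) for each full frame of the signal."""
    if frame_len < 3:
        raise ValueError("frame_len must be at least 3 to compute the TEO")
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        teo = teager_energy(frame)
        feats.append((zero_crossing_rate(frame), sum(teo) / len(teo)))
    return feats


if __name__ == "__main__":
    # Assumed test signal: 0.1 s of a 440 Hz sine sampled at 8 kHz.
    sample_rate = 8000
    tone = [
        math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 10)
    ]
    for zcr, teo in frame_features(tone, frame_len=200, hop=100)[:3]:
        print(f"ZCR={zcr:.3f}  mean TEO={teo:.4f}")
```

For a pure sinusoid A sin(wn), the discrete TEO equals A^2 sin^2(w) at every sample, which is why it is often read as a joint measure of amplitude and frequency. MFCC and HNR extraction need spectral analysis and are usually taken from an audio library rather than written by hand.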
Pages: 15
Related papers
50 in total
  • [21] Speech Emotion Recognition Using Deep Neural Networks, Transfer Learning, and Ensemble Classification Techniques
    Mihalache, Serban
    Burileanu, Dragos
    ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, 2023, 26 (3-4): : 375 - 387
  • [22] Speech Emotion Recognition Using Deep Learning Techniques: A Review
    Khalil, Ruhul Amin
    Jones, Edward
    Babar, Mohammad Inayatullah
    Jan, Tariqullah
    Zafar, Mohammad Haseeb
    Alhussain, Thamer
    IEEE ACCESS, 2019, 7 : 117327 - 117345
  • [23] Emotion recognition from speech using deep learning on spectrograms
    Li, Xingguang
    Song, Wenjun
    Liang, Zonglin
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 2791 - 2796
  • [24] Speech Emotion Recognition Using Deep Learning on audio recordings
    Suganya, S.
    Charles, E. Y. A.
    2019 19TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER - 2019), 2019,
  • [25] Developing a negative speech emotion recognition model for safety systems using deep learning
    Jena, Shreya
    Basak, Sneha
    Agrawal, Himanshi
    Saini, Bunny
    Gite, Shilpa
    Kotecha, Ketan
    Alfarhood, Sultan
    JOURNAL OF BIG DATA, 2025, 12 (01)
  • [26] Towards Robust Speech Emotion Recognition using Deep Residual Networks for Speech Enhancement
    Triantafyllopoulos, Andreas
    Keren, Gil
    Wagner, Johannes
    Steiner, Ingmar
    Schuller, Bjorn W.
    INTERSPEECH 2019, 2019, : 1691 - 1695
  • [27] Toward Language-Agnostic Speech Emotion Recognition
    Ntalampiras, Stavros
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2020, 68 (1-2): : 7 - 13
  • [28] Language dialect based speech emotion recognition through deep learning techniques
    Sukumar Rajendran
    Sandeep Kumar Mathivanan
    Prabhu Jayagopal
    Maheshwari Venkatasen
    Thanapal Pandi
    Manivannan Sorakaya Somanathan
    Muthamilselvan Thangaval
    Prasanna Mani
    International Journal of Speech Technology, 2021, 24 : 625 - 635
  • [29] Language dialect based speech emotion recognition through deep learning techniques
    Rajendran, Sukumar
    Mathivanan, Sandeep Kumar
    Jayagopal, Prabhu
    Venkatasen, Maheshwari
    Pandi, Thanapal
    Sorakaya Somanathan, Manivannan
    Thangaval, Muthamilselvan
    Mani, Prasanna
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (03) : 625 - 635
  • [30] Noise-Robust Deep Learning Model for Emotion Classification Using Facial Expressions
    Oh, Seungjun
    Kim, Dong-Keun
    IEEE ACCESS, 2024, 12 : 143074 - 143089