TOWARD ROBUST SPEECH EMOTION RECOGNITION AND CLASSIFICATION USING NATURAL LANGUAGE PROCESSING WITH DEEP LEARNING MODEL

Cited by: 0
Authors
Alahmari, Saad [1]
Al-shathry, Najla I. [2]
Eltahir, Majdy M. [3]
Alzaidi, Muhammad Swaileh A. [4]
Alghamdi, Ayman Ahmad [5]
Mahmud, Ahmed [6]
Affiliations
[1] Northern Border Univ, Appl Coll, Dept Comp Sci, Ar Ar, Saudi Arabia
[2] Princess Nourah Bint Abdulrahman Univ, Arab Language Teaching Inst, Dept Language Preparat, POB 84428, Riyadh 11671, Saudi Arabia
[3] King Khalid Univ, Appl Coll Mahayil, Dept Informat Syst, Abha, Saudi Arabia
[4] King Saud Univ, Coll Language Sci, Dept English Language, POB 145111, Riyadh, Saudi Arabia
[5] Umm Al Qura Univ, Arab Language Inst, Dept Arab Teaching, Mecca, Saudi Arabia
[6] Future Univ Egypt, Res Ctr, New Cairo 11835, Egypt
Keywords
Speech Emotion Recognition; Deep Learning; Fractal Seagull Optimization Algorithm; Feature Extraction
DOI
10.1142/S0218348X25400225
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Speech Emotion Recognition (SER) plays a significant role in human-machine interaction applications. Over the last decade, many SER systems have been proposed. However, SER performance remains limited by noise, high system complexity, and weak feature discrimination, which makes feature extraction critical. Deep Learning (DL)-based techniques have emerged as effective solutions for SER owing to their ability to learn from unlabeled data, their strong feature representation, and their capacity to handle large datasets and complex features. Various DL techniques, such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Deep Neural Network (DNN), have been applied successfully to automated SER. This study proposes a Robust SER and Classification using Natural Language Processing with DL (RSERC-NLPDL) model, which aims to identify emotions in speech signals. In the RSERC-NLPDL technique, pre-processing is first performed to transform the input speech signal into a valid format. The technique then extracts a feature set comprising Mel-Frequency Cepstral Coefficients (MFCCs), Zero-Crossing Rate (ZCR), Harmonics-to-Noise Ratio (HNR), and Teager Energy Operator (TEO). Next, feature selection is carried out using the Fractal Seagull Optimization Algorithm (FSOA). A Temporal Convolutional Autoencoder (TCAE) model is applied to identify speech emotions, and its hyperparameters are selected using the fractal Sand Cat Swarm Optimization (SCSO) algorithm. The RSERC-NLPDL method is evaluated in simulation on a speech database. The experimental analysis showed superior accuracy of 94.32% and 95.25% on the EMODB and RAVDESS datasets, respectively, compared with other models across distinct measures.
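As an illustration of two of the hand-crafted features named in the abstract, the following minimal sketch computes a frame-level Zero-Crossing Rate and the discrete Teager Energy Operator using the standard textbook definitions. It is not the authors' implementation: the function names, the frame length, the hop size, and the synthetic test tone are all assumptions made for this example, and MFCC, HNR, FSOA feature selection, and the TCAE classifier are not covered.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def teager_energy(frame):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [
        frame[n] ** 2 - frame[n - 1] * frame[n + 1]
        for n in range(1, len(frame) - 1)
    ]


def frame_features(signal, frame_len=400, hop=160):
    """Return one (ZCR, mean TEO) pair per frame.

    frame_len and hop are illustrative values (25 ms and 10 ms at 16 kHz),
    not settings taken from the paper.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        teo = teager_energy(frame)
        feats.append((zero_crossing_rate(frame), sum(teo) / len(teo)))
    return feats


# Hypothetical input: 0.1 s of a 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 10)]
features = frame_features(tone)
```

For a pure sinusoid of unit amplitude, the Teager energy is constant and equal to sin^2 of the angular frequency per sample, so the mean TEO of every frame in this example should match that value closely. Real speech would give frame-to-frame variation in both features.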
Pages: 15