Speech Databases, Speech Features, and Classifiers in Speech Emotion Recognition: A Review

Cited by: 0
Authors
Dar, G. H. Mohmad [1]
Delhibabu, Radhakrishnan [2]
Affiliations
[1] Vellore Inst Technol, Sch Adv Sci, Vellore 632014, Tamil Nadu, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Speech emotion recognition; machine learning; deep learning; affective computing; support vector machine; random forest; Gaussian mixture model; audio features; databases; classifiers; COMMUNICATING EMOTION; INFORMATION FUSION; REPRESENTATIONS; IMPLEMENTATION; AUTOENCODER; GENERATION; DEPRESSION; EXPRESSION; NETWORKS; VALENCE;
DOI
10.1109/ACCESS.2024.3476960
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Emotion recognition from speech signals plays a crucial role in Human-Machine Interaction (HMI), particularly in the development of applications such as affective computing and interactive systems. This review provides an in-depth examination of current methodologies in speech emotion recognition (SER), with a focus on databases, feature extraction techniques, and classification models. Traditionally, SER has relied on low-level descriptors (LLDs) such as Mel-Frequency Cepstral Coefficients (MFCCs), linear predictive coding (LPC), and pitch-based features, used with classifiers such as Support Vector Machines (SVM), Random Forests (RF), and Gaussian Mixture Models (GMM). The development of deep learning techniques, however, has substantially changed the field: models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have proven better at capturing the complex temporal and spectral features of speech. This paper reviews prominent speech emotion datasets, exploring their linguistic diversity, annotation processes, and emotional labels. It also analyzes the efficacy of different speech features and classifiers in handling challenges such as data imbalance, limited data availability, and cross-lingual variations. The review highlights the need for future work to address real-time processing, context-sensitive emotion detection, and the integration of multi-modal data to enhance the performance of SER systems. By consolidating recent advancements and identifying areas for further research, this paper aims to provide a clearer path for optimizing feature extraction and classification techniques in the field of emotion recognition.
Pages: 151122 - 151152
Page count: 31
Related papers
50 in total
  • [21] Speech Emotion Recognition with Cross-lingual Databases
    Chiou, Bo-Chang
    Chen, Chia-Ping
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 558 - 561
  • [22] Emotion recognition from speech: a review
    Koolagudi, Shashidhar G.
    Rao, K. Sreenivasa
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (02) : 99 - 117
  • [23] From Simulated Speech to Natural Speech, What are the Robust Features for Emotion Recognition?
    Li, Ya
    Chao, Linlin
    Liu, Yazhu
    Bao, Wei
    Tao, Jianhua
    2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 368 - 373
  • [24] Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features
    Mohanty, Aniruddha
    Cherukuri, Ravindranath C.
    Prusty, Alok Ranjan
    THIRD CONGRESS ON INTELLIGENT SYSTEMS, CIS 2022, VOL 1, 2023, 608 : 117 - 129
  • [25] Emotion recognition of mandarin speech for different speech corpora based on nonlinear features
    Gao, Hui
    Chen, Shanguang
    An, Ping
    Su, Guangchuan
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 567 - +
  • [26] Novel acoustic features for speech emotion recognition
    ROH Yong-Wan
    KIM Dong-Ju
    LEE Woo-Seok
    HONG Kwang-Seok
    Science in China (Series E: Technological Sciences), 2009, 52 (07) : 1838 - 1848
  • [27] Exploiting the potentialities of features for speech emotion recognition
    Li, Dongdong
    Zhou, Yijun
    Wang, Zhe
    Gao, Daqi
    INFORMATION SCIENCES, 2021, 548 : 328 - 343
  • [28] Significance of Phonological Features in Speech Emotion Recognition
    Wei Wang
    Paul A. Watters
    Xinyi Cao
    Lingjie Shen
    Bo Li
    International Journal of Speech Technology, 2020, 23 : 633 - 642
  • [29] Learning Transferable Features for Speech Emotion Recognition
    Marczewski, Alison
    Veloso, Adriano
    Ziviani, Nivio
    PROCEEDINGS OF THE THEMATIC WORKSHOPS OF ACM MULTIMEDIA 2017 (THEMATIC WORKSHOPS'17), 2017, : 529 - 536
  • [30] Applying articulatory features to speech emotion recognition
    Zhou, Yu
    Sun, Yanqing
    Yang, Lin
    Yan, Yonghong
    2009 INTERNATIONAL CONFERENCE ON RESEARCH CHALLENGES IN COMPUTER SCIENCE, ICRCCS 2009, 2009, : 73 - 76