SPEECH EMOTION RECOGNITION USING AUTOENCODER BOTTLENECK FEATURES AND LSTM

Cited by: 0
Authors
Huang, Kun-Yi [1]
Wu, Chung-Hsien [1]
Yang, Tsung-Hsien [1]
Su, Ming-Hsiang [1]
Chou, Jia-Hui [1]
Affiliations
[1] National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan, Taiwan
Keywords
Speech emotion recognition; bottleneck features; long-short term memory
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A complete emotional expression follows a complex temporal course over a conversation. Prior work on utterance-level and segment-level processing does not account for subtle differences in characteristics or for historical information. Because the Deep Scattering Spectrum (DSS) captures more detailed energy distributions in the frequency domain than Low-Level Descriptors (LLDs), this work combines LLDs and DSS as the speech features. An autoencoder neural network is then applied to extract bottleneck features for dimensionality reduction. Finally, a long short-term memory (LSTM) network is employed to characterize the temporal variation of speech emotion for emotion recognition. The MHMC emotion database was collected and used for performance evaluation. Experimental results show that the proposed method, using bottleneck features from the combination of LLDs and DSS, achieved an emotion recognition accuracy of 98.1%, outperforming systems that use LLDs or DSS individually.
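The pipeline described in the abstract (concatenate LLD and DSS features, compress them through an autoencoder bottleneck, then model the segment sequence with an LSTM) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: every dimension, the number of emotion classes, the `tanh` encoder, the single-layer LSTM, and all function names are hypothetical, and the weights are random rather than trained, so the output probabilities carry no meaning beyond showing the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these values come from the paper.
N_LLD, N_DSS = 32, 64      # per-segment LLD and DSS feature dimensions
N_BOTTLENECK = 16          # width of the autoencoder bottleneck
N_HIDDEN = 8               # LSTM state size
N_EMOTIONS = 4             # number of emotion classes
N_SEGMENTS = 10            # segments in one utterance


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode(x, w_enc, b_enc):
    """Encoder half of the autoencoder: features -> bottleneck features."""
    return np.tanh(x @ w_enc + b_enc)


def decode(z, w_dec, b_dec):
    """Decoder half: bottleneck -> reconstructed features (training target)."""
    return z @ w_dec + b_dec


def lstm_last_state(seq, w, u, b, n_hidden):
    """Run a single-layer LSTM over seq (T, D); return the final hidden state."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in seq:
        gates = x_t @ w + h @ u + b          # shape (4 * n_hidden,)
        i, f, o, g = np.split(gates, 4)      # input, forget, output, candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                    # update cell state
        h = o * np.tanh(c)                   # update hidden state
    return h


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


# Random stand-ins for real per-segment LLD and DSS features.
lld = rng.normal(size=(N_SEGMENTS, N_LLD))
dss = rng.normal(size=(N_SEGMENTS, N_DSS))
features = np.concatenate([lld, dss], axis=1)   # (N_SEGMENTS, N_LLD + N_DSS)

# Untrained random weights, used only to show shapes and data flow.
d_in = N_LLD + N_DSS
w_enc = rng.normal(scale=0.1, size=(d_in, N_BOTTLENECK))
b_enc = np.zeros(N_BOTTLENECK)
w_dec = rng.normal(scale=0.1, size=(N_BOTTLENECK, d_in))
b_dec = np.zeros(d_in)
w = rng.normal(scale=0.1, size=(N_BOTTLENECK, 4 * N_HIDDEN))
u = rng.normal(scale=0.1, size=(N_HIDDEN, 4 * N_HIDDEN))
b = np.zeros(4 * N_HIDDEN)
w_out = rng.normal(scale=0.1, size=(N_HIDDEN, N_EMOTIONS))

bottleneck = encode(features, w_enc, b_enc)          # dimensionality reduction
reconstruction = decode(bottleneck, w_dec, b_dec)
recon_error = float(np.mean((reconstruction - features) ** 2))
h_last = lstm_last_state(bottleneck, w, u, b, N_HIDDEN)
probs = softmax(h_last @ w_out)                      # per-class probabilities
```

In a trained system the autoencoder would first be fitted by minimizing a reconstruction loss such as `recon_error`, after which only the encoder output is passed to the LSTM classifier.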
Pages: 1-4
Page count: 4