PREDICTION-BASED LEARNING FOR CONTINUOUS EMOTION RECOGNITION IN SPEECH

Cited: 0
Authors
Han, Jing [1 ]
Zhang, Zixing [1 ]
Ringeval, Fabien [2 ]
Schuller, Bjorn [1 ,3 ]
Affiliations
[1] Univ Passau, Chair Complex Intelligent Syst, Passau, Germany
[2] Univ Grenoble Alpes, Lab Informat Grenoble, Grenoble, France
[3] Imperial Coll London, Dept Comp, London, England
Funding
European Union Seventh Framework Programme (FP7)
Keywords
Affective computing; hierarchical regression models; support vector regression; long short-term memory; ALGORITHM;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In this paper, a prediction-based learning framework is proposed for the continuous prediction task of emotion recognition from speech, one of the key components of affective computing in multimedia. The main goal of this framework is to exploit the complementary advantages of different regression models to the fullest extent. To this end, we take two widely used regression models as examples: support vector regression and the bidirectional long short-term memory recurrent neural network. We concatenate the two models in a tandem structure in different ways, forming a unified cascaded framework: the predictions of the former model are combined with the original features and fed as input to the subsequent model, which produces the final predictions. Experimental results on a time- and value-continuous spontaneous emotion database (RECOLA) show that the prediction-based learning framework significantly outperforms the individual models for both the arousal and valence dimensions, and yields significantly better results than other state-of-the-art methods on this corpus.
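The cascade described in the abstract can be sketched as follows: a first regressor produces an initial prediction, which is appended to the original feature vector and passed to a second regressor for the final output. This is a minimal illustrative sketch only, not the authors' implementation: it uses scikit-learn's `SVR` for the first stage and an `MLPRegressor` as a stand-in for the BLSTM network, with synthetic data in place of RECOLA features.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))              # synthetic acoustic features
y = 0.5 * X[:, 0] + rng.normal(0, 0.1, 200)     # synthetic continuous label (e.g. arousal)

# Stage 1: the first regressor (SVR) produces an initial prediction.
svr = SVR(kernel="rbf").fit(X, y)
p1 = svr.predict(X).reshape(-1, 1)

# Stage 2: the stage-1 prediction is concatenated with the original
# features and fed to the second regressor for the final prediction.
X_cascade = np.hstack([X, p1])
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                   random_state=0).fit(X_cascade, y)
y_final = mlp.predict(X_cascade)
print(y_final.shape)  # (200,)
```

In the paper's setting, the second stage would be a BLSTM operating on feature sequences, and the two models can also be cascaded in the opposite order; the essential idea is the same augmentation of the input with the preceding model's output.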
Pages: 5005-5009 (5 pages)