Dimensional Speech Emotion Recognition Review

Authors
Li H.-F. [1, 2]
Chen J. [1]
Ma L. [1, 2]
Bo H.-J. [2]
Xu C. [1]
Li H.-W. [1]
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin
[2] Shenzhen Academy of Aerospace Technology, Shenzhen
Source
Ruan Jian Xue Bao/Journal of Software | 2020, Vol. 31, No. 8
Funding
National Natural Science Foundation of China
Keywords
Affective computing; Cognitive theory of speech emotion; Dimensional emotion model; Dimensional emotion recognition algorithm; Emotion-related feature extraction from speech
DOI
10.13328/j.cnki.jos.006078
Abstract
Emotion recognition is an interdisciplinary research field that draws on cognitive science, psychology, signal processing, pattern recognition, and artificial intelligence, among other fields. It aims to help computers understand human emotional states so as to enable natural human-computer interaction. This survey first introduces the psychological theory of emotion as the theoretical basis for the emotion models used in emotion recognition, covering appraisal theory, dimensional models of emotion, brain mechanisms, and computational models. It then presents in detail the advanced techniques for dimensional emotion recognition from an artificial intelligence perspective, including speech emotion corpora, feature extraction, and classification. Finally, it discusses the challenges of dimensional emotion recognition and proposes workable solutions and future research directions. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
Pages: 2465-2491
Page count: 26
Related papers
153 entries in total
  • [1] Crystal D., Non-segmental phonology in language acquisition: A review of the issues, Lingua, 32, 1-2, pp. 1-45, (1973)
  • [2] Liebenthal E, Silbersweig DA, Stern E., The language, tone and prosody of emotions: Neural substrates and dynamics of spoken-word emotion perception, Frontiers in Neuroscience, 10, 506, (2016)
  • [3] Murray IR, Arnott JL., Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion, The Journal of the Acoustical Society of America, 93, 2, pp. 1097-1108, (1993)
  • [4] Williams CE, Stevens KN., Emotions and speech: Some acoustical correlates, The Journal of the Acoustical Society of America, 52, 4B, pp. 1238-1250, (1972)
  • [5] Murray IR, Arnott JL., Synthesizing emotions in speech: Is it time to get excited?, Proc. of the 4th Int'l Conf. on Spoken Language Processing (ICSLP'96), (1996)
  • [6] Valstar M, Gratch J, Schuller B, et al., Depression, mood, and emotion recognition workshop and challenge, Proc. of the 6th Int'l Workshop on Audio/Visual Emotion Challenge (AVEC 2016), pp. 3-10, (2016)
  • [7] Dhall A, Kaur A, Goecke R, Gedeon T., EmotiW 2018: Audio-video, student engagement and group-level affect prediction, Proc. of the 2018 Int'l Conf. on Multimodal Interaction, (2018)
  • [8] Li Y, Tao J, Schuller B, Shan S, Jiang D, Jia J., MEC 2016: The multimodal emotion recognition challenge of CCPR 2016, Proc. of the Chinese Conf. on Pattern Recognition, (2016)
  • [9] Li Y, Tao J, Schuller B, Shan S, Jiang D, Jia J., MEC 2017: Multimodal emotion recognition challenge, Proc. of the 2018 1st Asian Conf. on Affective Computing and Intelligent Interaction (ACII Asia), (2018)
  • [10] Christianson SA., The Handbook of Emotion and Memory: Research and Theory, (2014)