Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011

Cited by: 261
Authors
Anagnostopoulos, Christos-Nikolaos [1 ]
Iliou, Theodoros [1 ]
Giannoukos, Ioannis [1 ]
Institutions
[1] Univ Aegean, Cultural Technol & Commun Dept, Lesvos Isl 81100, Lesbos, Greece
Keywords
Speech features; Emotion recognition; Classifiers; CLASSIFICATION; FUSION; AUDIO; ASR;
DOI
10.1007/s10462-012-9368-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker emotion recognition is achieved through processing methods that include isolation of the speech signal and extraction of selected features for the final classification. In terms of acoustics, speech processing techniques offer extremely valuable paralinguistic information derived mainly from prosodic and spectral features. In some cases, the process is assisted by speech recognition systems, which contribute to the classification using linguistic information. Both frameworks deal with a very challenging problem, as emotional states do not have clear-cut boundaries and often differ from person to person. In this article, research papers that investigate emotion recognition from audio channels are surveyed and classified, based mostly on extracted and selected features and their classification methodology. Important topics from different classification techniques, such as databases available for experimentation, appropriate feature extraction and selection methods, classifiers and performance issues are discussed, with emphasis on research published in the last decade. This survey also provides a discussion on open trends, along with directions for future research on this topic.
Pages: 155-177
Page count: 23