Context-Independent Multilingual Emotion Recognition from Speech Signals

Cited by: 42
Authors
Vladimir Hozjan
Zdravko Kačič
Affiliations
University of Maribor, Faculty of Electrical Engineering and Computer Science
Keywords
emotions; speech; emotion recognition; cross language emotion recognition
DOI
10.1023/A:1023426522496
Abstract
This paper presents and discusses an analysis of multilingual emotion recognition from speech using database-specific emotional features. Recognition was performed on the English, Slovenian, Spanish, and French InterFace emotional speech databases. The InterFace databases include several neutral speaking styles and six emotions: disgust, surprise, joy, fear, anger, and sadness. Speech features for emotion recognition were determined in two steps: first, low-level features were defined, and second, high-level features were calculated from them. The low-level features comprise pitch, the derivative of pitch, energy, the derivative of energy, and the duration of speech segments. The high-level features are statistical representations of the low-level features. Database-specific emotional features were selected as the high-level features that carry the most information about emotion in speech. Speaker-dependent and monolingual emotion recognisers were defined, as well as multilingual recognisers. Emotion recognition was performed using artificial neural networks. The achieved recognition accuracy was highest for speaker-dependent emotion recognition, lower for monolingual emotion recognition, and lowest for multilingual recognition. The database-specific emotional features are most suitable for multilingual emotion recognition: among speaker-dependent, monolingual, and multilingual recognition, the difference in accuracy between using all high-level features and using only the database-specific emotional features is smallest for multilingual recognition, at 3.84%.
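As an illustration of the two-step feature scheme described in the abstract, the following Python sketch derives low-level contours (pitch, energy, and their derivatives) per utterance, summarises them with high-level statistics, and trains a small neural-network classifier. It is a minimal sketch under stated assumptions: the particular statistics, the scikit-learn MLPClassifier, and the helper names (low_level_features, high_level_features, train_recogniser) are illustrative choices, since the paper's actual feature set and network configuration are not given in the abstract.

    # Minimal sketch (not the paper's implementation): two-step features and an ANN classifier.
    import numpy as np
    from sklearn.neural_network import MLPClassifier   # stand-in for the paper's neural network
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def low_level_features(pitch, energy):
        # Step 1: low-level contours -- pitch, energy, and their frame-to-frame derivatives.
        d_pitch = np.diff(pitch, prepend=pitch[0])
        d_energy = np.diff(energy, prepend=energy[0])
        return pitch, d_pitch, energy, d_energy

    def high_level_features(pitch, energy, segment_durations):
        # Step 2: high-level features as statistics over the low-level contours
        # (assumed statistic set: mean, std, min, max, range), plus duration statistics.
        feats = []
        for contour in low_level_features(pitch, energy):
            valid = contour[~np.isnan(contour)]          # skip unvoiced / undefined frames
            feats += [valid.mean(), valid.std(), valid.min(), valid.max(),
                      valid.max() - valid.min()]
        feats += [np.mean(segment_durations), np.std(segment_durations)]
        return np.array(feats)

    def train_recogniser(X, y):
        # X: one row of high-level features per utterance; y: neutral or one of the six emotions.
        model = make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000))
        return model.fit(X, y)

Selecting the database-specific subset would then amount to keeping only the most emotion-informative columns of X (for example by a mutual-information ranking) before training, which is again an assumed stand-in for the selection procedure used in the paper.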
Pages: 311 - 320
Number of pages: 9