Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features

Authors
Koolagudi, Shashidhar [1]
Krothapalli, Sreenivasa [1]
Affiliations
[1] Indian Inst Technol Kharagpur, Sch Informat Technol, Kharagpur 721302, W Bengal, India
Keywords
Emotion recognition; Consonant region; CV transition region; Pitch synchronous analysis; Spectral features; Vowel onset point; Vowel region
DOI
10.1007/s10772-012-9150-8
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
In this work, spectral features extracted from sub-syllabic regions and from pitch synchronous analysis are proposed for speech emotion recognition. Linear prediction cepstral coefficients, mel frequency cepstral coefficients, and features extracted from high amplitude regions of the spectrum are used to represent emotion specific spectral information. These features are extracted from the consonant, vowel, and transition regions of each syllable to study the contribution of these regions toward recognition of emotions. The consonant, vowel, and transition regions are determined using vowel onset points. Spectral features extracted from each pitch cycle are also used to recognize emotions present in speech. The emotions used in this study are anger, fear, happy, neutral, and sad. The emotion recognition performance using sub-syllabic speech segments is compared with the results of the conventional block processing approach, where the entire speech signal is processed frame by frame. The proposed emotion specific features are evaluated using the simulated emotion speech corpus IITKGP-SESC (Indian Institute of Technology, KharaGPur-Simulated Emotion Speech Corpus). The emotion recognition results obtained using IITKGP-SESC are compared with the results for the Berlin emotion speech corpus. Emotion recognition systems are developed using Gaussian mixture models and auto-associative neural networks. The purpose of this study is to explore sub-syllabic regions to identify the emotions embedded in a speech signal and, if possible, to avoid processing the entire speech signal for emotion recognition without serious compromise in performance.
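The abstract states that the consonant, vowel, and transition regions are located from vowel onset points (VOPs) but does not give the boundary rule. The sketch below is one plausible reading, not the authors' procedure: it assumes the consonant region runs from the syllable start to the VOP, the transition region is a fixed-length window beginning at the VOP, and the vowel region is whatever remains. The names `SyllableRegions`, `split_syllable`, and `transition_len` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SyllableRegions:
    # Each region is a half-open sample range [start, end).
    consonant: Tuple[int, int]
    transition: Tuple[int, int]
    vowel: Tuple[int, int]


def split_syllable(start: int, end: int, vop: int,
                   transition_len: int) -> SyllableRegions:
    """Split the syllable [start, end) into three regions around the VOP.

    Assumption (not from the paper): the transition region is a window of
    `transition_len` samples starting at the VOP, clipped to the syllable end.
    """
    if not (start <= vop <= end):
        raise ValueError("VOP must lie inside the syllable")
    if transition_len < 0:
        raise ValueError("transition_len must be non-negative")
    transition_end = min(vop + transition_len, end)
    return SyllableRegions(
        consonant=(start, vop),
        transition=(vop, transition_end),
        vowel=(transition_end, end),
    )


# Illustrative values only: a 3200-sample syllable with the VOP at sample 800
# and a 640-sample transition window.
regions = split_syllable(start=0, end=3200, vop=800, transition_len=640)
print(regions)
```

Spectral features such as the cepstral coefficients named in the abstract would then be computed separately on each returned range, so that each region's contribution to recognition can be compared.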
Pages: 495-511
Page count: 17