EMOTION RECOGNITION FROM SPEECH VIA BOOSTED GAUSSIAN MIXTURE MODELS

被引:0
|
作者
Tang, Hao [1 ]
Chu, Stephen M. [2 ]
Hasegawa-Johnson, Mark [1 ]
Huang, Thomas S. [1 ]
机构
[1] Univ Illinois, Dept Elect & Comp Engn, 1406 W Green St, Urbana, IL 61801 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
关键词
Emotion recognition; Gaussian mixture model; Bayesian optimal classifier; EM algorithm; boosting;
D O I
暂无
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Gaussian mixture models (GMMs) and the minimum error rate classifier (i.e. Bayesian optimal classifier) are popular and effective tools for speech emotion recognition. Typically, GMMs are used to model the class-conditional distributions of acoustic features and their parameters are estimated by the expectation maximization (EM) algorithm based on a training data set Then, classification is performed to minimize the classification error w.r.t. the estimated class-conditional distributions. We call this method the EM-GMM algorithm. In this paper, we introduce a boosting algorithm for reliably and accurately estimating the class-conditional GMMs. The resulting algorithm is named the Boosted-GMM algorithm. Our speech emotion recognition experiments show that the emotion recognition rates are effectively and significantly "boosted" by the Boosted-GMM algorithm as compared to the EM-GMM algorithm. This is due to the fact that the boosting algorithm can lead to more accurate estimates of the class-conditional GMMs, namely the class-conditional distributions of acoustic features.
引用
收藏
页码:294 / +
页数:2
相关论文
共 50 条
  • [41] Skew Gaussian mixture models for speaker recognition
    Matza, Avi
    Bistritz, Yuval
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 12 - 15
  • [42] Skew Gaussian mixture models for speaker recognition
    Matza, Avi
    Bistritz, Yuval
    IET SIGNAL PROCESSING, 2014, 8 (08) : 860 - 867
  • [43] Speech Emotion Recognition Based on Dynamic Models
    Lv, Guoyun
    Hu, Shuixian
    Lu, Xipan
    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 480 - 484
  • [44] Speech emotion recognition: Features and classification models
    Chen, Lijiang
    Mao, Xia
    Xue, Yuli
    Cheng, Lee Lung
    DIGITAL SIGNAL PROCESSING, 2012, 22 (06) : 1154 - 1160
  • [45] Cross-lingual subspace Gaussian mixture models for low-resource speech recognition
    1600, Institute of Electrical and Electronics Engineers Inc., United States (22):
  • [46] MAXIMUM A POSTERIORI ADAPTATION OF SUBSPACE GAUSSIAN MIXTURE MODELS FOR CROSS-LINGUAL SPEECH RECOGNITION
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4877 - 4880
  • [47] Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition
    Lu, Liang
    Ghoshal, Arnab
    Renals, Steve
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (01) : 17 - 27
  • [48] DEEP NEURAL NETWORKS WITH AUXILIARY GAUSSIAN MIXTURE MODELS FOR REAL-TIME SPEECH RECOGNITION
    Lei, Xin
    Lin, Hui
    Heigold, Georg
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7634 - 7638
  • [49] Cross-lingual subspace Gaussian mixture models for low-resource speech recognition
    1600, Institute of Electrical and Electronics Engineers Inc., United States (22):
  • [50] Speech emotion recognition via learning analogies
    Ntalampiras, Stavros
    PATTERN RECOGNITION LETTERS, 2021, 144 : 21 - 26