Speaker normalisation for speech-based emotion detection

Times cited: 32
Authors
Sethu, Vidhyasaharan [1, 2]
Ambikairajah, Eliathamby [1, 2]
Epps, Julien [1, 3]
Affiliations
[1] Univ New S Wales, Sch Elect Engn & Telecommun, Sydney, NSW 2052, Australia
[2] NICTA, Sydney, NSW, Australia
[3] UNSW Asia, Singapore 248922, Singapore
Keywords
feature warping; cumulative distribution mapping; emotion detection; hidden Markov model
DOI
10.1109/ICDSP.2007.4288656
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The focus of this paper is on speech-based emotion detection utilising only acoustic data, i.e. without using any linguistic or semantic information. However, this approach generally suffers from the fact that acoustic data is speaker-dependent, which can result in inefficient estimation of the statistics modelled by classifiers such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). We propose the use of speaker-specific feature warping as a means of normalising acoustic features to overcome the problem of speaker dependency. In this paper we compare the performance of a system that uses feature warping with one that does not. The back-end employs an HMM-based classifier that captures the temporal variations of the feature vectors by modelling them as transitions between different states. Evaluations conducted on the LDC Emotional Prosody speech corpus reveal a relative increase in classification accuracy of up to 20%.
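Speaker-specific feature warping, as described in the abstract, can be illustrated with a minimal sketch that maps the empirical distribution of each feature dimension to a standard normal target (a cumulative distribution mapping). The sketch rests on several assumptions that do not come from the paper. Each dimension is warped over all of one speaker's frames rather than over a sliding window. Ties are broken by sort order rather than averaged. The function names `warp_dimension` and `speaker_feature_warping` are hypothetical. The paper's exact configuration may differ.

```python
from collections import defaultdict
from statistics import NormalDist


def warp_dimension(values):
    """Warp one feature dimension to N(0, 1) via its empirical ranks.

    Assumption: ties get distinct ranks in sort order (not averaged).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist(0.0, 1.0)
    warped = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # (rank - 0.5) / n lies strictly inside (0, 1), so inv_cdf is defined.
        warped[idx] = std_normal.inv_cdf((rank - 0.5) / n)
    return warped


def speaker_feature_warping(frames, speakers):
    """Apply feature warping separately to each speaker's frames.

    frames:   list of equal-length feature vectors (one per frame)
    speakers: speaker label for each frame (hypothetical input format)
    Returns a new list of warped feature vectors; the input is not modified.
    """
    by_speaker = defaultdict(list)
    for i, spk in enumerate(speakers):
        by_speaker[spk].append(i)

    dim = len(frames[0])
    out = [list(f) for f in frames]
    for indices in by_speaker.values():
        for d in range(dim):
            # Warp dimension d using only this speaker's data.
            column = [frames[i][d] for i in indices]
            for i, w in zip(indices, warp_dimension(column)):
                out[i][d] = w
    return out


frames = [[100.0, 1.0], [200.0, 2.0], [150.0, 3.0], [10.0, 5.0], [20.0, 4.0]]
speakers = ["A", "A", "A", "B", "B"]
warped = speaker_feature_warping(frames, speakers)
```

After warping, speaker A's values (around 100 to 200) and speaker B's values (around 10 to 20) in the first dimension fall on the same standard-normal scale. This is the kind of speaker-dependent offset that the normalisation is intended to remove before the HMM back-end is trained.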
Pages: 611 / +
Number of pages: 2