Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech

Cited: 0
|
Authors
Leem, Seong-Gyun [1 ]
Fulford, Daniel [2 ]
Onnela, Jukka-Pekka [3 ]
Gard, David [4 ]
Busso, Carlos [1 ]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Richardson, TX 75080 USA
[2] Boston Univ, Occupat Therapy & Psychol & Brain Sci, Boston, MA 02215 USA
[3] Harvard Univ, Harvard TH Chan Sch Publ Hlth, Dept Biostat, Cambridge, MA 02138 USA
[4] San Francisco State Univ, Dept Psychol, San Francisco, CA 94132 USA
Funding
US National Institutes of Health (NIH);
Keywords
Speech enhancement; Noise measurement; Speech recognition; Task analysis; Acoustics; Recording; Training; Feature selection; noisy speech; speech enhancement; speech emotion recognition; MODEL; CORPUS;
DOI
10.1109/TASLP.2023.3340603
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A speech emotion recognition (SER) system deployed in a real-world application can encounter speech contaminated with unconstrained background noise. To deal with this issue, a speech enhancement (SE) module can be attached to the SER system to compensate for environmental differences in the input. Although the SE module can improve the quality and intelligibility of a given speech signal, it risks distorting discriminative acoustic features for SER that are already resilient to environmental differences. Exploring this idea, we propose to enhance only the weak features that degrade emotion recognition performance. Our model first identifies weak feature sets by using multiple models, each trained on clean speech with one acoustic feature at a time. After training the single-feature models, we rank each speech feature by three criteria: performance, robustness, and a joint ranking that combines performance and robustness. We group the weak features by cumulatively adding features from the bottom to the top of each ranking. Once the weak feature set is defined, we enhance only those weak features, keeping the resilient features unchanged. We implement these ideas with low-level descriptors (LLDs). We show that directly enhancing the weak LLDs leads to better performance than extracting LLDs from an enhanced speech signal. Our experiments with clean and noisy versions of the MSP-Podcast corpus show that the proposed approach yields 17.7% (arousal), 21.2% (dominance), and 3.3% (valence) performance gains over a system that enhances all the LLDs under the 10 dB signal-to-noise ratio (SNR) condition.
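The selection-then-enhancement pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the joint ranking is assumed here to be a simple rank-sum of the performance and robustness orderings, and `enhance_fn` stands in for whatever SE module is attached; the function names and the rank-sum choice are hypothetical.

```python
import numpy as np

def joint_rank(performance, robustness):
    """Combine per-feature performance and robustness scores into one
    ranking by summing the two rank positions (lower sum = stronger).
    Assumed rank-sum scheme, for illustration only."""
    perf_rank = np.argsort(np.argsort(-performance))    # 0 = best performance
    robust_rank = np.argsort(np.argsort(-robustness))   # 0 = most robust
    return perf_rank + robust_rank

def select_weak_features(performance, robustness, k):
    """Return indices of the k weakest features under the joint ranking,
    mimicking the cumulative bottom-up grouping of weak features."""
    ranks = joint_rank(performance, robustness)
    return np.argsort(-ranks)[:k]                       # largest rank sum = weakest

def selective_enhance(lld, weak_idx, enhance_fn):
    """Apply the enhancement module only to the weak LLD dimensions,
    leaving the resilient features untouched.
    lld: (frames, features) matrix of low-level descriptors."""
    out = lld.copy()
    out[:, weak_idx] = enhance_fn(out[:, weak_idx])
    return out
```

In this sketch the per-feature `performance` and `robustness` scores would come from the single-feature models trained on clean speech, and `k` would be swept cumulatively to find the weak-feature set that maximizes SER performance on noisy data.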
Pages: 917 - 929
Page count: 13
Related papers
50 records in total
  • [41] Composite Feature Extraction for Speech Emotion Recognition
    Fu, Yangzhi
    Yuan, Xiaochen
    2020 IEEE 23RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2020), 2020, : 72 - 77
  • [42] Evolutionary feature generation in speech emotion recognition
    Schuller, Bjorn
    Reiter, Stephan
    Rigoll, Gerhard
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 5 - +
  • [43] Speech emotion recognition with unsupervised feature learning
    Huang, Zheng-wei
    Xue, Wen-tao
    Mao, Qi-rong
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2015, 16 (05) : 358 - 366
  • [44] An algorithm study for speech emotion recognition based speech feature analysis
    Zhengbiao, Ji
    Feng, Zhou
    Ming, Zhu
    International Journal of Multimedia and Ubiquitous Engineering, 2015, 10 (11): : 33 - 42
  • [45] Speech emotion recognition with unsupervised feature learning
    Zheng-wei Huang
    Wen-tao Xue
    Qi-rong Mao
    Frontiers of Information Technology & Electronic Engineering, 2015, 16 : 358 - 366
  • [46] Speech Emotion Recognition with Discriminative Feature Learning
    Zhou, Huan
    Liu, Kai
    INTERSPEECH 2020, 2020, : 4094 - 4097
  • [47] Feature selection for emotion recognition of mandarin speech
    College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
    (author unknown)
    Zhejiang Daxue Xuebao (Gongxue Ban), 2007, 11 (1816-1822):
  • [48] Discriminative Feature Learning for Speech Emotion Recognition
    Zhang, Yuying
    Zou, Yuexian
    Peng, Junyi
    Luo, Danqing
    Huang, Dongyan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV, 2019, 11730 : 198 - 210
  • [49] Deep Spectrum Feature Representations for Speech Emotion Recognition
    Zhao, Ziping
    Zhao, Yiqin
    Bao, Zhongtian
    Wang, Haishuai
    Zhang, Zixing
    Li, Chao
    PROCEEDINGS OF THE JOINT WORKSHOP OF THE 4TH WORKSHOP ON AFFECTIVE SOCIAL MULTIMEDIA COMPUTING AND FIRST MULTI-MODAL AFFECTIVE COMPUTING OF LARGE-SCALE MULTIMEDIA DATA (ASMMC-MMAC'18), 2018, : 27 - 33
  • [50] Speech Emotion Recognition Based on Feature Fusion
    Shen, Qi
    Chen, Guanggen
    Chang, Lin
    PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON MATERIALS SCIENCE, MACHINERY AND ENERGY ENGINEERING (MSMEE 2017), 2017, 123 : 1071 - 1074