Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech

Cited by: 0
Authors
Leem, Seong-Gyun [1]
Fulford, Daniel [2]
Onnela, Jukka-Pekka [3]
Gard, David [4]
Busso, Carlos [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Richardson, TX 75080 USA
[2] Boston Univ, Occupat Therapy & Psychol & Brain Sci, Boston, MA 02215 USA
[3] Harvard Univ, Harvard TH Chan Sch Publ Hlth, Dept Biostat, Cambridge, MA 02138 USA
[4] San Francisco State Univ, Dept Psychol, San Francisco, CA 94132 USA
Funding
National Institutes of Health (United States)
Keywords
Speech enhancement; Noise measurement; Speech recognition; Task analysis; Acoustics; Recording; Training; Feature selection; noisy speech; speech enhancement; speech emotion recognition; MODEL; CORPUS;
DOI
10.1109/TASLP.2023.3340603
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A speech emotion recognition (SER) system deployed in a real-world application can encounter speech contaminated with unconstrained background noise. To deal with this issue, a speech enhancement (SE) module can be attached to the SER system to compensate for the environmental difference of an input. Although the SE module can improve the quality and intelligibility of a given speech signal, there is a risk of affecting discriminative acoustic features for SER that are resilient to environmental differences. Exploring this idea, we propose to enhance only weak features that degrade the emotion recognition performance. Our model first identifies weak feature sets by using multiple models, each trained with one acoustic feature at a time using clean speech. After training the single-feature models, we rank each speech feature according to three criteria: performance, robustness, and a joint ranking that combines performance and robustness. We group the weak features by cumulatively adding features from the bottom to the top of each ranking. Once the weak feature set is defined, we only enhance those weak features, keeping the resilient features unchanged. We implement these ideas with the low-level descriptors (LLDs). We show that directly enhancing the weak LLDs leads to better performance than extracting LLDs from an enhanced speech signal. Our experiment with clean and noisy versions of the MSP-Podcast corpus shows that the proposed approach yields performance gains of 17.7% (arousal), 21.2% (dominance), and 3.3% (valence) over a system that enhances all the LLDs for the 10 dB signal-to-noise ratio (SNR) condition.
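The ranking and grouping steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The feature names, the scores, the rule that the joint ranking sums the two rank positions, and the `enhance` placeholder are all assumptions made here for illustration; the abstract does not specify how the joint ranking is computed or how features are enhanced.

```python
def rank_features(scores):
    """Order feature names from best (first) to worst (last); higher score = better."""
    return sorted(scores, key=scores.get, reverse=True)


def joint_ranking(performance, robustness):
    """Combine two rankings by summing rank positions (assumed rule, not from the paper)."""
    perf_order = rank_features(performance)
    rob_order = rank_features(robustness)
    rank_sum = {
        name: perf_order.index(name) + rob_order.index(name)
        for name in performance
    }
    # Smaller summed rank = better on both criteria.
    return sorted(rank_sum, key=rank_sum.get)


def cumulative_weak_sets(ordering):
    """Grow candidate weak sets from the bottom of a ranking toward the top."""
    bottom_up = ordering[::-1]
    return [bottom_up[:k] for k in range(1, len(bottom_up) + 1)]


def selectively_enhance(features, weak_set, enhance):
    """Apply `enhance` only to weak features; leave the others unchanged."""
    return {
        name: enhance(values) if name in weak_set else values
        for name, values in features.items()
    }


# Hypothetical per-feature scores from single-feature models (toy numbers).
performance = {"f0": 0.60, "loudness": 0.55, "mfcc1": 0.40, "jitter": 0.30}
robustness = {"f0": 0.50, "loudness": 0.90, "mfcc1": 0.70, "jitter": 0.20}

joint = joint_ranking(performance, robustness)
candidates = cumulative_weak_sets(joint)

# Placeholder "enhancement": halves each value. A real system would use a
# trained feature-enhancement model here.
def enhance(values):
    return [v * 0.5 for v in values]

noisy = {"f0": [1.0, 2.0], "loudness": [3.0], "mfcc1": [4.0], "jitter": [6.0]}
enhanced = selectively_enhance(noisy, candidates[1], enhance)
```

In this sketch each candidate weak set would be evaluated separately, and the one that gives the best emotion recognition performance would be kept; that selection step is omitted because the abstract does not describe it in detail.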
Pages: 917-929
Page count: 13
Related papers
50 in total
  • [31] Feature selection enhancement and feature space visualization for speech-based emotion recognition
    Kanwal S.
    Asghar S.
    Ali H.
    PeerJ Computer Science, 2022, 8
  • [32] Feature selection enhancement and feature space visualization for speech-based emotion recognition
    Kanwal, Sofia
    Asghar, Sohail
    Ali, Hazrat
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [33] Energy contour enhancement for noisy speech recognition
    Hwang, TH
    Chang, SC
    2004 International Symposium on Chinese Spoken Language Processing, Proceedings, 2004, : 249 - 252
  • [34] Speech enhancement strategy for speech recognition microcontroller under noisy environments
    Chan, Kit Yan
    Nordholm, Sven
    Yiu, Ka Fai Cedric
    Togneri, Roberto
    NEUROCOMPUTING, 2013, 118 : 279 - 288
  • [35] Speech Enhancement and Recognition of Compressed Speech Signal in Noisy Reverberant Conditions
    Suman, Maloji
    Khan, Habibulla
    Latha, M. Madhavi
    Kumari, Devarakonda Aruna
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 379 - +
  • [36] Auditory driven subband speech enhancement for automatic recognition of noisy speech
    Upadhyay N.
    Rosales H.G.
    International Journal of Speech Technology, 2016, 19 (4) : 869 - 880
  • [37] Adversarial Domain Adaptation for Noisy Speech Emotion Recognition
    Cho, Sunyoung
    Yoon, Soosung
    Song, Hyunseung
    2022 22ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2022), 2022, : 1966 - 1970
  • [38] Speech Emotion Recognition Based on EMD in Noisy Environments
    Chu, Yunyun
    Xiong, Weihua
    Chen, Wei
    ADVANCES IN CIVIL ENGINEERING AND BUILDING MATERIALS III, 2014, 831 : 460 - 464
  • [39] Advancing Speech Recognition With No Speech Or With Noisy Speech
    Krishna, Gautam
    Tran, Co
    Carnahan, Mason
    Tewfik, Ahmed
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [40] Speech emotion recognition with unsupervised feature learning
    Zheng-wei HUANG
    Wen-tao XUE
    Qi-rong MAO
    Frontiers of Information Technology & Electronic Engineering, 2015, 16 (05) : 358 - 366