Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech

Cited by: 0
Authors
Leem, Seong-Gyun [1]
Fulford, Daniel [2]
Onnela, Jukka-Pekka [3]
Gard, David [4]
Busso, Carlos [1]
Affiliations
[1] Univ Texas Dallas, Dept Elect & Comp Engn, Richardson, TX 75080 USA
[2] Boston Univ, Occupat Therapy & Psychol & Brain Sci, Boston, MA 02215 USA
[3] Harvard Univ, Harvard TH Chan Sch Publ Hlth, Dept Biostat, Cambridge, MA 02138 USA
[4] San Francisco State Univ, Dept Psychol, San Francisco, CA 94132 USA
Funding
National Institutes of Health (NIH), USA
Keywords
Speech enhancement; Noise measurement; Speech recognition; Task analysis; Acoustics; Recording; Training; Feature selection; noisy speech; speech enhancement; speech emotion recognition; MODEL; CORPUS;
DOI
10.1109/TASLP.2023.3340603
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
A speech emotion recognition (SER) system deployed in a real-world application can encounter speech contaminated with unconstrained background noise. To deal with this issue, a speech enhancement (SE) module can be attached to the SER system to compensate for the environmental difference of an input. Although the SE module can improve the quality and intelligibility of a given speech signal, there is a risk of affecting discriminative acoustic features for SER that are resilient to environmental differences. Exploring this idea, we propose to enhance only the weak features that degrade emotion recognition performance. Our model first identifies weak feature sets by using multiple models, each trained on clean speech with one acoustic feature at a time. After training the single-feature models, we rank each speech feature according to three criteria: performance, robustness, and a joint ranking that combines performance and robustness. We group the weak features by cumulatively adding features from the bottom to the top of each ranking. Once the weak feature set is defined, we enhance only those weak features, keeping the resilient features unchanged. We implement these ideas with low-level descriptors (LLDs). We show that directly enhancing the weak LLDs leads to better performance than extracting LLDs from an enhanced speech signal. Our experiment with clean and noisy versions of the MSP-Podcast corpus shows that the proposed approach yields performance gains of 17.7% (arousal), 21.2% (dominance), and 3.3% (valence) over a system that enhances all the LLDs for the 10 dB signal-to-noise ratio (SNR) condition.
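The ranking and grouping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The feature names, the scores, the sum-of-rank-positions rule for the joint ranking, and the placeholder enhance function are all assumptions made for illustration; the abstract does not specify how robustness is measured or how the two rankings are combined.

```python
def rank_features(scores):
    """Order feature names from best (highest score) to worst."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def joint_ranking(performance, robustness):
    """Combine two rankings. Assumption: sum of rank positions, ties broken by name."""
    perf_pos = {name: i for i, name in enumerate(rank_features(performance))}
    rob_pos = {name: i for i, name in enumerate(rank_features(robustness))}
    return sorted(performance, key=lambda n: (perf_pos[n] + rob_pos[n], n))


def cumulative_weak_sets(ranking):
    """Grow candidate weak sets from the bottom of a ranking toward the top."""
    bottom_up = list(reversed(ranking))
    return [bottom_up[:k] for k in range(1, len(bottom_up) + 1)]


def selectively_enhance(frame, weak, enhance):
    """Apply `enhance` to weak features only; resilient features pass through."""
    return {
        name: enhance(name, value) if name in weak else value
        for name, value in frame.items()
    }


# Hypothetical single-feature scores (higher is better); not taken from the paper.
performance = {"loudness": 0.55, "mfcc1": 0.45, "f0": 0.30, "jitter": 0.10}
robustness = {"loudness": 0.80, "f0": 0.70, "jitter": 0.50, "mfcc1": 0.30}

joint = joint_ranking(performance, robustness)
weak_sets = cumulative_weak_sets(joint)

# Placeholder enhancement (halves the value); a real system would use a trained model.
frame = {"loudness": 1.0, "f0": 2.0, "mfcc1": 3.0, "jitter": 4.0}
enhanced = selectively_enhance(frame, set(weak_sets[1]), lambda name, v: v * 0.5)
```

In this toy setup each entry of weak_sets is one candidate weak feature set; choosing among them would require evaluating the downstream SER model on each candidate, which the sketch leaves out.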
Pages: 917-929
Page count: 13
Related papers
50 in total
  • [1] Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions
    Zhou, Hengshun
    Du, Jun
    Tu, Yan-Hui
    Lee, Chin-Hui
    INTERSPEECH 2020, 2020, : 4098 - 4102
  • [2] Joint enhancement and classification constraints for noisy speech emotion recognition
    Sun, Linhui
    Lei, Yunlong
    Wang, Shun
    Chen, Shuaitong
    Zhao, Min
    Li, Pingan
    DIGITAL SIGNAL PROCESSING, 2024, 151
  • [3] Noisy speech recognition based on speech enhancement
    Wang, Xia
    Tang, Hongmei
    Zhao, Xiaoqun
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007, : 713 - +
  • [4] Knowledge enhancement for speech emotion recognition via multi-level acoustic feature
    Zhao, Huan
    Huang, Nianxin
    Chen, Haijiao
    CONNECTION SCIENCE, 2024, 36 (01)
  • [5] Word graph based feature enhancement for noisy speech recognition
    Yan, Zhi-Jie
    Soong, Frank K.
    Wang, Ren-Hua
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 373 - +
  • [6] Model-based feature enhancement for noisy speech recognition
    Couvreur, C
    Van hamme, H
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1719 - 1722
  • [7] Speech Enhancement Based on Masking Approach Considering Speech Quality and Acoustic Confidence for Noisy Speech Recognition
    Chu, Shih-Chuan
    Wu, Chung-Hsien
    Lin, Yun-Wen
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 536 - 540
  • [8] Emotion recognition from noisy speech
    You, Mingyu
    Chen, Chun
    Bu, Jiajun
    Liu, Jia
    Tao, Jianhua
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1653 - +
  • [9] Speech emotion recognition in noisy environment
    Chenchah, Farah
    Lachiri, Zied
    2016 2ND INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP), 2016, : 788 - 792
  • [10] Sample Reconstruction and Secondary Feature Selection in Noisy Speech Emotion Recognition
    Jiang, Xiaoqing
    2016 17TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2016, : 207 - 212