Nonnegative Matrix Factorization Based Transfer Subspace Learning for Cross-Corpus Speech Emotion Recognition

Cited by: 25
Authors
Luo, Hui [1 ]
Han, Jiqing [2 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Non-negative matrix factorization; transfer subspace learning; cross-corpus; speech emotion recognition; ALGORITHMS;
DOI
10.1109/TASLP.2020.3006331
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
This article focuses on the cross-corpus speech emotion recognition (SER) task. To overcome the problem that the distribution of training (source) samples is inconsistent with that of testing (target) samples, we propose a non-negative matrix factorization based transfer subspace learning method (NMFTSL). Our method tries to find a shared feature subspace for the source and target corpora in which the discrepancy between the two distributions is eliminated as much as possible and their individual components are excluded, so that the knowledge of the source corpus can be transferred to the target corpus. Specifically, in this induced subspace, we minimize the distances not only between the marginal distributions but also between the conditional distributions, where both distances are measured by the maximum mean discrepancy criterion. To estimate the conditional distribution of the target corpus, we propose to integrate the prediction of target labels and the learning of the feature representation into a joint learning model. Meanwhile, we introduce a difference loss to exclude the individual components from the shared subspace, which further reduces the mutual interference between the source and target individual components. Moreover, we propose a discrimination loss to incorporate label information into the shared subspace, which improves the discriminative ability of the feature representation. We also provide the solution to the corresponding optimization problem. To evaluate the performance of our method, we construct 30 cross-corpus SER schemes using six popular speech emotion corpora. Experimental results show that our approach achieves better overall performance than state-of-the-art methods.
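The abstract describes two core ingredients of NMFTSL: a shared nonnegative subspace learned jointly from source and target data, and maximum mean discrepancy (MMD) used to measure the remaining distribution gap in that subspace. The sketch below is only a minimal illustration of these two pieces, assuming a shared-basis NMF with standard multiplicative updates and a linear-kernel MMD between the learned encodings; the function and variable names are hypothetical, and the paper's conditional-MMD, difference, and discrimination losses (and its actual optimization) are deliberately omitted.

```python
import numpy as np

def mmd_linear(Zs, Zt):
    """Empirical MMD with a linear kernel: squared distance between the
    mean embeddings of two sample sets (rows are samples)."""
    return float(np.sum((Zs.mean(axis=0) - Zt.mean(axis=0)) ** 2))

def shared_nmf_sketch(Xs, Xt, k=30, n_iter=200, seed=0):
    """Factorize source data Xs (d x ns) and target data Xt (d x nt) with
    one shared nonnegative basis U, so that Xs ~ U @ Vs and Xt ~ U @ Vt.
    Only the Frobenius reconstruction loss is minimized here; the full
    NMFTSL objective additionally penalizes marginal/conditional MMD and
    adds difference and discrimination terms."""
    rng = np.random.default_rng(seed)
    d, ns = Xs.shape
    _, nt = Xt.shape
    U = rng.random((d, k)) + 1e-3    # shared nonnegative basis
    Vs = rng.random((k, ns)) + 1e-3  # source encodings
    Vt = rng.random((k, nt)) + 1e-3  # target encodings
    eps = 1e-9
    X = np.hstack([Xs, Xt])
    for _ in range(n_iter):
        V = np.hstack([Vs, Vt])
        # standard multiplicative updates for ||X - U V||_F^2
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        Vs *= (U.T @ Xs) / (U.T @ U @ Vs + eps)
        Vt *= (U.T @ Xt) / (U.T @ U @ Vt + eps)
    return U, Vs, Vt, mmd_linear(Vs.T, Vt.T)

if __name__ == "__main__":
    # Random nonnegative "features" standing in for acoustic functionals
    # (shapes are hypothetical: 1582-dim vectors, 100/80 utterances).
    rng = np.random.default_rng(1)
    Xs = rng.random((1582, 100))
    Xt = rng.random((1582, 80))
    U, Vs, Vt, gap = shared_nmf_sketch(Xs, Xt, k=30, n_iter=50)
    print("residual MMD between source/target encodings:", gap)
```

In the published method this residual MMD is not merely reported but penalized inside the factorization objective, which is what drives the shared subspace toward corpus-invariant emotion representations.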
Pages: 2047-2060
Number of pages: 14
Related articles
50 items in total
  • [21] Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
    Fu, Hongliang
    Zhuang, Zhihao
    Wang, Yang
    Huang, Chen
    Duan, Wenzhuo
    ENTROPY, 2023, 25 (01)
  • [22] Latent sparse transfer subspace learning for cross-corpus facial expression recognition
    Zhang, Wenjing
    Song, Peng
    Chen, Dongliang
    Zhang, Weijian
DIGITAL SIGNAL PROCESSING, 2021, 116
  • [24] Cross-Corpus Speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation
    Ahn, Youngdo
    Lee, Sung Joo
    Shin, Jong Won
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1190 - 1194
  • [25] Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition
    Ye, Jiaxin
    Wei, Yujie
    Wen, Xin-Cheng
    Ma, Chenglong
    Huang, Zhizhong
    Liu, Kunhong
    Shan, Hongming
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5956 - 5965
  • [26] A STUDY ON CROSS-CORPUS SPEECH EMOTION RECOGNITION AND DATA AUGMENTATION
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 24 - 30
  • [27] Auditory attention model based on Chirplet for cross-corpus speech emotion recognition
    Zhang X.
    Song P.
    Zha C.
    Tao H.
    Zhao L.
Journal of Southeast University, 32: 402 - 407
  • [28] Filter-based multi-task cross-corpus feature learning for speech emotion recognition
    Bakhtiari, Behzad
    Kalhor, Elham
    Ghafarian, Seyed Hossein
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04) : 3145 - 3153
  • [30] Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Tang, Chuangao
    Lian, Hailun
    Chang, Hongli
    Zhu, Jie
    Li, Sunan
    Zhao, Yan
    ELECTRONICS, 2022, 11 (17)