Unsupervised Domain Adaptation Integrating Transformers and Mutual Information for Cross-Corpus Speech Emotion Recognition

Cited by: 11
Authors
Zhang, Shiqing [1]
Liu, Ruixin [1,2]
Yang, Yijiao [1,2]
Zhao, Xiaoming [1]
Yu, Jun [3]
Affiliations
[1] Taizhou Univ, Taizhou, Peoples R China
[2] Zhejiang Univ Sci & Technol, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Unsupervised domain adaptation; Cross-corpus speech emotion recognition; Transformer; Mutual Information; AUTOENCODERS; FUSION;
DOI
10.1145/3503161.3548328
CLC Number
TP39 [Computer Applications];
Subject Classification Code
081203; 0835;
Abstract
This paper focuses on unsupervised cross-corpus Speech Emotion Recognition (SER), in which the labelled training (source) corpus and the unlabelled testing (target) corpus have different feature distributions, resulting in a discrepancy between the source and target domains. To address this issue, this paper proposes an unsupervised domain adaptation method integrating Transformers and Mutual Information (MI) for cross-corpus SER. First, our method employs Transformer encoder layers to capture long-term temporal dynamics within an utterance from the extracted segment-level log-Mel spectrogram features, producing utterance-level features for each utterance in both domains. Then, we propose an unsupervised feature decomposition method with a hybrid Max-Min MI strategy that separately learns domain-invariant features and domain-specific features from the mixed utterance-level features, so that the discrepancy between the two domains is reduced as much as possible while their individual characteristics are preserved. Finally, an interactive Multi-Head attention fusion strategy is designed to learn the complementarity between domain-invariant and domain-specific features so that they can be interactively fused for SER. Extensive experiments on the IEMOCAP and MSP-Improv datasets demonstrate the effectiveness of our proposed method on unsupervised cross-corpus SER tasks, outperforming state-of-the-art unsupervised cross-corpus SER methods.
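The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical PyTorch sketch of one way to read it: a Transformer encoder pools segment-level log-Mel features into an utterance-level vector, two linear branches split it into domain-invariant and domain-specific parts trained with a Max-Min mutual-information objective, and a multi-head attention step fuses the two parts for classification. All layer sizes, the mean pooling, the MINE-style critic, and the exact form of the MI terms are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of the pipeline outlined in the abstract (PyTorch).
# Dimensions, layer counts, the pooling choice, and the MINE-style MI
# estimator are assumptions, not the paper's exact design.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class MINECritic(nn.Module):
    """Stand-in MI estimator (Donsker-Varadhan lower bound); an assumption."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def mi_lower_bound(self, x, y):
        joint = self.net(torch.cat([x, y], dim=-1))                  # samples from the joint
        marginal = self.net(torch.cat([x, y[torch.randperm(y.size(0))]], dim=-1))
        return joint.mean() - (torch.logsumexp(marginal, dim=0) - math.log(y.size(0))).squeeze()


class CrossCorpusSER(nn.Module):
    """Transformer encoder -> feature decomposition -> attention fusion -> classifier."""

    def __init__(self, n_mels=64, d_model=256, n_heads=4, n_layers=2, n_classes=4):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_model)                       # segment features -> model dim
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)        # long-term temporal dynamics
        self.invariant = nn.Linear(d_model, d_model)                 # domain-invariant branch
        self.specific = nn.Linear(d_model, d_model)                  # domain-specific branch
        self.fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, log_mel):                                      # (batch, segments, n_mels)
        h = self.encoder(self.proj(log_mel))
        utt = h.mean(dim=1)                                          # utterance-level feature (mean pooling: assumption)
        z_inv, z_spec = self.invariant(utt), self.specific(utt)
        pair = torch.stack([z_inv, z_spec], dim=1)                   # let the two branches attend to each other
        fused, _ = self.fusion(pair, pair, pair)
        return self.classifier(fused.mean(dim=1)), z_inv, z_spec


# Usage sketch with random tensors standing in for log-Mel segments.
# The hybrid Max-Min objective is paraphrased here as: maximize MI between
# source and target domain-invariant features, minimize MI between the
# invariant and specific features of the same utterances (our reading only).
model = CrossCorpusSER()
critic_align, critic_split = MINECritic(256), MINECritic(256)
src, tgt = torch.randn(8, 30, 64), torch.randn(8, 30, 64)            # labelled source / unlabelled target batches
logits_s, inv_s, spec_s = model(src)
_, inv_t, spec_t = model(tgt)
ce = F.cross_entropy(logits_s, torch.randint(0, 4, (8,)))            # emotion labels exist only for the source
mi_max = critic_align.mi_lower_bound(inv_s, inv_t)                   # to be maximized (domain alignment)
mi_min = critic_split.mi_lower_bound(inv_s, spec_s)                  # to be minimized (feature decomposition)
loss = ce - mi_max + mi_min
loss.backward()
```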
Pages: 10
Related Papers
50 records in total
  • [41] Target-Adapted Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Chen, Xiuzhen
    Zhou, Xiaoyan
    Lu, Cheng
    Zong, Yuan
    Zheng, Wenming
    Tang, Chuangao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (12) : 2632 - 2636
  • [42] Cross-Corpus Arabic and English Emotion Recognition
    Meftah, Ali
    Seddiq, Yasser
    Alotaibi, Yousef
    Selouani, Sid-Ahmed
    2017 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2017, : 377 - 381
  • [43] Progressive distribution adapted neural networks for cross-corpus speech emotion recognition
    Zong, Yuan
    Lian, Hailun
    Zhang, Jiacheng
    Feng, Ercui
    Lu, Cheng
    Chang, Hongli
    Tang, Chuangao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [44] Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Zhang, Weijian
    Song, Peng
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 (28) : 307 - 318
  • [45] CROSS-CORPUS SPEECH EMOTION RECOGNITION USING JOINT DISTRIBUTION ADAPTIVE REGRESSION
    Zhang, Jiacheng
    Jiang, Lin
    Zong, Yuan
    Zheng, Wenming
    Zhao, Li
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3790 - 3794
  • [46] Auditory attention model based on Chirplet for cross-corpus speech emotion recognition
    Zhang X.
    Song P.
    Zha C.
    Tao H.
    Zhao L.
Zhao L.
Journal of Southeast University, 32 : 402 - 407
  • [47] A Novel DBN Feature Fusion Model for Cross-Corpus Speech Emotion Recognition
    Zou Cairong
    Zhang Xinran
    Zha Cheng
    Zhao Li
    JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, 2016, 2016
  • [48] Unsupervised domain adaptation for speech emotion recognition using PCANet
    Huang, Zhengwei
    Xue, Wentao
    Mao, Qirong
    Zhan, Yongzhao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (05) : 6785 - 6799
  • [50] Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition
    Ye, Jiaxin
    Wei, Yujie
    Wen, Xin-Cheng
    Ma, Chenglong
    Huang, Zhizhong
    Liu, Kunhong
    Shan, Hongming
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5956 - 5965