Unsupervised Domain Adaptation Integrating Transformers and Mutual Information for Cross-Corpus Speech Emotion Recognition

Cited by: 11
Authors
Zhang, Shiqing [1]
Liu, Ruixin [1, 2]
Yang, Yijiao [1, 2]
Zhao, Xiaoming [1]
Yu, Jun [3]
Affiliations
[1] Taizhou Univ, Taizhou, Peoples R China
[2] Zhejiang Univ Sci & Technol, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Unsupervised domain adaptation; Cross-corpus speech emotion recognition; Transformer; Mutual Information; Autoencoders; Fusion
DOI
10.1145/3503161.3548328
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
This paper focuses on unsupervised cross-corpus Speech Emotion Recognition (SER), in which the labelled training (source) corpus and the unlabelled testing (target) corpus have different feature distributions, resulting in a discrepancy between the source and target domains. To address this issue, this paper proposes an unsupervised domain adaptation method that integrates Transformers and Mutual Information (MI) for cross-corpus SER. First, our method uses Transformer encoder layers to capture long-term temporal dynamics within an utterance from the extracted segment-level log-Mel spectrogram features, producing an utterance-level feature for each utterance in both domains. Then, we propose an unsupervised feature decomposition method with a hybrid Max-Min MI strategy that separately learns domain-invariant features and domain-specific features from the extracted mixed utterance-level features, eliminating the discrepancy between the two domains as far as possible while preserving their individual characteristics. Finally, an interactive multi-head attention fusion strategy is designed to learn the complementarity between domain-invariant and domain-specific features so that the two can be fused interactively for SER. Extensive experiments on the IEMOCAP and MSP-Improv datasets demonstrate the effectiveness of the proposed method on unsupervised cross-corpus SER tasks, where it outperforms state-of-the-art unsupervised cross-corpus SER methods.
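The abstract does not say how the mutual information terms in the hybrid Max-Min strategy are estimated. As a hedged illustration only, the sketch below implements InfoNCE, one widely used contrastive lower bound on the MI between paired feature vectors. The choice of InfoNCE, the dot-product critic, and the helper names `infonce_mi`, `dot`, and `logsumexp` are assumptions made for this example, not the authors' implementation. It uses only the Python standard library so it runs without a deep learning framework.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product used as the critic score f(x, y) (assumed critic)."""
    return sum(a * b for a, b in zip(u, v))


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) via max subtraction."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_mi(xs: List[Vector], ys: List[Vector], temperature: float = 1.0) -> float:
    """InfoNCE lower bound on I(X; Y) from N paired samples (xs[i], ys[i]).

    Assumption: this is a generic MI estimator chosen for illustration;
    the paper's estimator is not given in the abstract.
    Each positive pair (xs[i], ys[i]) is scored against all N candidates
    ys[0..N-1], so the estimate can never exceed log(N).
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    total = 0.0
    for i in range(n):
        scores = [dot(xs[i], ys[j]) / temperature for j in range(n)]
        # log of the positive score's share of the softmax, plus log(N)
        total += scores[i] - (logsumexp(scores) - math.log(n))
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    n, dim = 64, 8
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    # Dependent pairs: each y is a slightly noisy copy of its x.
    ys_dep = [[a + 0.1 * rng.gauss(0, 1) for a in x] for x in xs]
    # Independent pairs: each y is drawn afresh, unrelated to x.
    ys_ind = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    print("dependent  :", round(infonce_mi(xs, ys_dep), 3))
    print("independent:", round(infonce_mi(xs, ys_ind), 3))
    print("log N bound:", round(math.log(n), 3))
```

In a Max-Min scheme, an estimator like this could be maximized for feature pairs whose shared information should be kept. For terms that must be minimized (plausibly the dependence between the domain-invariant and domain-specific parts, although the abstract does not say so), an upper bound such as CLUB is often preferred, because pushing a lower bound down does not by itself guarantee that the true MI decreases.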
Pages: 10