Unsupervised Domain Adaptation Integrating Transformers and Mutual Information for Cross-Corpus Speech Emotion Recognition

Cited by: 11
Authors
Zhang, Shiqing [1]
Liu, Ruixin [1, 2]
Yang, Yijiao [1, 2]
Zhao, Xiaoming [1]
Yu, Jun [3]
Affiliations
[1] Taizhou Univ, Taizhou, Peoples R China
[2] Zhejiang Univ Sci & Technol, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Unsupervised domain adaptation; Cross-corpus speech emotion recognition; Transformer; Mutual Information; AUTOENCODERS; FUSION;
DOI
10.1145/3503161.3548328
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
This paper focuses on unsupervised cross-corpus Speech Emotion Recognition (SER), in which the labelled training (source) corpus and the unlabelled testing (target) corpus have different feature distributions, resulting in a discrepancy between the source and target domains. To address this issue, this paper proposes an unsupervised domain adaptation method integrating Transformers and Mutual Information (MI) for cross-corpus SER. Initially, our method employs encoder layers of Transformers to capture long-term temporal dynamics in an utterance from the extracted segment-level log-Mel spectrogram features, thereby producing the corresponding utterance-level features for each utterance in the two domains. Then, we propose an unsupervised feature decomposition method with a hybrid Max-Min MI strategy to separately learn domain-invariant features and domain-specific features from the extracted mixed utterance-level features, in which the discrepancy between the two domains is eliminated as much as possible while their individual characteristics are preserved. Finally, an interactive Multi-Head attention fusion strategy is designed to learn the complementarity between domain-invariant features and domain-specific features so that they can be interactively fused for SER. Extensive experiments on the IEMOCAP and MSP-Improv datasets demonstrate the effectiveness of our proposed method on unsupervised cross-corpus SER tasks, outperforming state-of-the-art unsupervised cross-corpus SER methods.
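As a rough illustration of the quantity behind the Max-Min MI strategy, the sketch below computes a plug-in estimate of mutual information between two discrete variables. It is not the authors' implementation: the abstract does not say which estimator is used or which pairs of variables have their MI maximised or minimised. The function name `mutual_information`, the toy "quantised feature codes", and the pairing of features with a domain label are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: a binary domain label (0 = source, 1 = target)
# and two quantised feature codes standing in for learned features.
domain = [0, 0, 0, 0, 1, 1, 1, 1]
specific = [0, 0, 0, 0, 1, 1, 1, 1]   # fully determined by the domain
invariant = [0, 1, 0, 1, 0, 1, 0, 1]  # statistically independent of the domain

mi_specific = mutual_information(specific, domain)
mi_invariant = mutual_information(invariant, domain)
print(f"MI(specific, domain)  = {mi_specific:.4f} nats")
print(f"MI(invariant, domain) = {mi_invariant:.4f} nats")
```

In this toy setting, a code that tracks the domain carries high MI with the domain label, while a code that ignores the domain carries none. A max-min objective could, under the assumption above, push one branch toward the first behaviour and the other toward the second; in practice, continuous features would need a neural or kernel-based MI estimator rather than counting.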
Pages: 10
Related papers
50 in total
  • [1] Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation
    ZHAO Huijuan
    YE Ning
    WANG Ruchuan
    Chinese Journal of Electronics, 2023, 32 (03) : 640 - 646
  • [2] Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation
    Zhao Huijuan
    Ye Ning
    Wang Ruchuan
    CHINESE JOURNAL OF ELECTRONICS, 2023, 32 (03) : 640 - 646
  • [3] Transfer Subspace Learning for Unsupervised Cross-Corpus Speech Emotion Recognition
    Liu, Na
    Zhang, Baofeng
    Liu, Bin
    Shi, Jingang
    Yang, Lei
    Li, Zhiwei
    Zhu, Junchao
    IEEE ACCESS, 2021, 9 : 95925 - 95937
  • [4] UNSUPERVISED CROSS-CORPUS SPEECH EMOTION RECOGNITION USING DOMAIN-ADAPTIVE SUBSPACE LEARNING
    Liu, Na
    Zong, Yuan
    Zhang, Baofeng
    Liu, Li
    Chen, Jie
    Zhao, Guoying
    Zhu, Junchao
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5144 - 5148
  • [5] Cross-corpus speech emotion recognition using semi-supervised domain adaptation network
    Zhang, Yumei
    Jia, Maoshen
    Cao, Xuan
    Ru, Jiawei
    Zhang, Xinfeng
    SPEECH COMMUNICATION, 2025, 168
  • [6] Convolutional Auto-Encoder and Adversarial Domain Adaptation for Cross-Corpus Speech Emotion Recognition
    Wang, Yang
    Fu, Hongliang
    Tao, Huawei
    Yang, Jing
    Ge, Hongyi
    Xie, Yue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (10) : 1803 - 1806
  • [7] Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
    Latif, Siddique
    Rana, Rajib
    Khalifa, Sara
    Jurdak, Raja
    Schuller, Bjorn
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 1912 - 1926
  • [8] Cross-Corpus Speech Emotion Recognition Based on Few-Shot Learning and Domain Adaptation
    Ahn, Youngdo
    Lee, Sung Joo
    Shin, Jong Won
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1190 - 1194
  • [9] DOMAIN GENERALIZATION WITH TRIPLET NETWORK FOR CROSS-CORPUS SPEECH EMOTION RECOGNITION
    Lee, Shi-wook
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 389 - 396
  • [10] Cross-Corpus Speech Emotion Recognition Based on Causal Emotion Information Representation
    Fu, Hongliang
    Li, Qianqian
    Tao, Huawei
    Zhu, Chunhua
    Xie, Yue
    Guo, Ruxue
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (08) : 1097 - 1100