Unsupervised Domain Adaptation Integrating Transformers and Mutual Information for Cross-Corpus Speech Emotion Recognition

被引:11
|
作者
Zhang, Shiqing [1 ]
Liu, Ruixin [1 ,2 ]
Yang, Yijiao [1 ,2 ]
Zhao, Xiaoming [1 ]
Yu, Jun [3 ]
机构
[1] Taizhou Univ, Taizhou, Peoples R China
[2] Zhejiang Univ Sci & Technol, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Hangzhou, Peoples R China
基金
美国国家科学基金会;
关键词
Unsupervised domain adaptation; Cross-corpus speech emotion recognition; Transformer; Mutual Information; AUTOENCODERS; FUSION;
D O I
10.1145/3503161.3548328
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
This paper focuses on an interesting task, i.e., unsupervised cross-corpus Speech Emotion Recognition (SER), in which the labelled training (source) corpus and the unlabelled testing (target) corpus have different feature distributions, resulting in the discrepancy between the source and target domains. To address this issue, this paper proposes an unsupervised domain adaptation method integrating Transformers and Mutual Information (MI) for cross-corpus SER. Initially, our method employs encoder layers of Transformers to capture long-term temporal dynamics in an utterance from the extracted segment-level log-Mel spectrogram features, thereby producing the corresponding utterance-level features for each utterance in two domains. Then, we propose an unsupervised feature decomposition method with a hybrid Max-Min MI strategy to separately learn domain-invariant features and domain-specific features from the extracted mixed utterance-level features, in which the discrepancy between two domains is eliminated as much as possible and meanwhile their individual characteristic is preserved. Finally, an interactive Multi-Head attention fusion strategy is designed to learn the complementarity between domain-invariant features and domain-specific features so that they can be interactively fused for SER. Extensive experiments on the IEMOCAP and MSP-Improv datasets demonstrate the effectiveness of our proposed method on unsupervised cross-corpus SER tasks, outperforming state-of-the-art unsupervised cross-corpus SER methods.
引用
收藏
页数:10
相关论文
共 50 条
  • [21] A STUDY ON CROSS-CORPUS SPEECH EMOTION RECOGNITION AND DATA AUGMENTATION
    Braunschweiler, Norbert
    Doddipatla, Rama
    Keizer, Simon
    Stoyanchev, Svetlana
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 24 - 30
  • [22] Cross-Corpus Speech Emotion Recognition Based on Multi-Task Learning and Subdomain Adaptation
    Fu, Hongliang
    Zhuang, Zhihao
    Wang, Yang
    Huang, Chen
    Duan, Wenzhuo
    ENTROPY, 2023, 25 (01)
  • [23] Contrastive Learning for Domain Transfer in Cross-Corpus Emotion Recognition
    Yin, Yufeng
    Lu, Liupei
    Xiao, Yao
    Xu, Zhi
    Cai, Kaijie
    Jiang, Haonan
    Gratch, Jonathan
    Soleymani, Mohammad
    2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2021,
  • [24] Unsupervised Cross-Corpus Speech Emotion Recognition Using a Multi-Source Cycle-GAN
    Su, Bo-Hao
    Lee, Chi-Chun
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 1991 - 2004
  • [25] Implicitly Aligning Joint Distributions for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Zong, Yuan
    Tang, Chuangao
    Lian, Hailun
    Chang, Hongli
    Zhu, Jie
    Li, Sunan
    Zhao, Yan
    ELECTRONICS, 2022, 11 (17)
  • [26] Synthesized speech for model training in cross-corpus recognition of human emotion
    Schuller, Bjorn
    Zhang, Zixing
    Weninger, Felix
    Burkhardt, Felix
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2012, 15 (03) : 313 - 323
  • [27] Cross-Corpus Speech Emotion Recognition Based on Hybrid Neural Networks
    Rehman, Abdul
    Liu, Zhen-Tao
    Li, Dan-Yun
    Wu, Bao-Han
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 7464 - 7468
  • [28] A Cross-Corpus Recognition of Emotional Speech
    Xiao, Zhongzhe
    Wu, Di
    Zhang, Xiaojun
    Tao, Zhi
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016, : 42 - 46
  • [29] Transfer Linear Subspace Learning for Cross-Corpus Speech Emotion Recognition
    Song, Peng
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (02) : 265 - 275
  • [30] Progressively Discriminative Transfer Network for Cross-Corpus Speech Emotion Recognition
    Lu, Cheng
    Tang, Chuangao
    Zhang, Jiacheng
    Zong, Yuan
    ENTROPY, 2022, 24 (08)