MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation

Cited: 5
Authors
Liang, Xingwei [1 ,2 ]
Zou, You [1 ]
Zhuang, Xinnan [1 ]
Yang, Jie [3 ]
Niu, Taiyu [2 ]
Xu, Ruifeng [2 ]
Affiliations
[1] Konka Corp, Shenzhen 518053, Peoples R China
[2] Harbin Inst Technol, Joint Lab HIT Konka, Shenzhen 518055, Peoples R China
[3] No Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86011 USA
Funding
National Natural Science Foundation of China
Keywords
emotion recognition in conversation; Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation; multi-task learning; multimodal fusion;
DOI
10.3390/electronics12071534
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The accurate recognition of emotions in conversations helps understand the speaker's intentions and facilitates various analyses in artificial intelligence, especially in human-computer interaction systems. However, most previous methods lack the ability to track the distinct emotional states of each speaker in a dialogue. To alleviate this dilemma, we propose a new approach, Multi-Task Learning and Multi-Fusion AudioText Emotion Recognition in Conversation (MMATERIC), for emotion recognition in conversation. MMATERIC draws on and combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in a dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. Meanwhile, we adopt multiple fusion strategies at different stages, mainly model fusion and decision-stage fusion, to improve the model's accuracy. Our multimodal framework also allows features to interact across modalities and permits potential adaptation flows from one modality to another. Experimental results on two benchmark datasets show that our proposed method is effective and outperforms state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of the three core modules of MMATERIC and the different fusion methods adopted at each stage.
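The abstract mentions decision-stage fusion as one of the strategies used to combine the text and speech emotion recognizers. The paper does not publish its fusion weights or label set, so the sketch below is a minimal, hypothetical illustration of the general technique: each modality's classifier emits a probability distribution over emotion labels, and the fused prediction is a weighted average of the two distributions. The `EMOTIONS` list, `fuse_decisions` helper, and `text_weight` value are all assumptions for illustration only.

```python
# Hypothetical sketch of decision-stage (late) fusion of two unimodal
# emotion classifiers. All names and weights here are illustrative; the
# MMATERIC paper's actual fusion parameters are not reproduced.

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # assumed label set

def fuse_decisions(text_probs, audio_probs, text_weight=0.6):
    """Weighted average of per-modality emotion distributions.

    `text_weight` is a hypothetical hyperparameter controlling how much
    the text modality contributes relative to audio.
    """
    audio_weight = 1.0 - text_weight
    fused = [text_weight * t + audio_weight * a
             for t, a in zip(text_probs, audio_probs)]
    total = sum(fused)                 # renormalize so probabilities sum to 1
    return [p / total for p in fused]

def predict(text_probs, audio_probs, text_weight=0.6):
    """Return the emotion label with the highest fused probability."""
    fused = fuse_decisions(text_probs, audio_probs, text_weight)
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

if __name__ == "__main__":
    text_probs = [0.10, 0.60, 0.20, 0.10]    # text model leans "sad"
    audio_probs = [0.05, 0.30, 0.55, 0.10]   # audio model leans "angry"
    # With text_weight=0.6, fused "sad" (0.48) beats "angry" (0.34).
    print(predict(text_probs, audio_probs))  # -> sad
```

Decision-stage fusion like this keeps each unimodal model independent, which is what distinguishes it from the model-stage fusion the abstract also mentions, where intermediate features interact before classification.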
Pages: 15
Related Papers (50 records)
  • [31] LEVERAGING VALENCE AND ACTIVATION INFORMATION VIA MULTI-TASK LEARNING FOR CATEGORICAL EMOTION RECOGNITION
    Xia, Rui
    Liu, Yang
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5301 - 5305
  • [32] Multi-Task and Attention Collaborative Network for Facial Emotion Recognition
    Wang, Xiaohua
    Yu, Cong
    Gu, Yu
    Hu, Min
    Ren, Fuji
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2021, 16 (04) : 568 - 576
  • [33] SELECTIVE MULTI-TASK LEARNING FOR SPEECH EMOTION RECOGNITION USING CORPORA OF DIFFERENT STYLES
    Zhang, Heran
    Mimura, Masato
    Kawahara, Tatsuya
    Ishizuka, Kenkichi
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7707 - 7711
  • [34] MTLFuseNet: A novel emotion recognition model based on deep latent feature fusion of EEG signals and multi-task learning
    Li, Rui
    Ren, Chao
    Ge, Yiqing
    Zhao, Qiqi
    Yang, Yikun
    Shi, Yuhan
    Zhang, Xiaowei
    Hu, Bin
    KNOWLEDGE-BASED SYSTEMS, 2023, 276
  • [35] MULTI-MODAL MULTI-TASK DEEP LEARNING FOR SPEAKER AND EMOTION RECOGNITION OF TV-SERIES DATA
    Novitasari, Sashi
    Quoc Truong Do
    Sakti, Sakriani
    Lestari, Dessi
    Nakamura, Satoshi
    2018 ORIENTAL COCOSDA - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2018, : 37 - 42
  • [36] Multi-label emotion classification based on adversarial multi-task learning
    Lin, Nankai
    Fu, Sihui
    Lin, Xiaotian
    Wang, Lianxi
    INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (06)
  • [37] A Novel DE-CNN-BiLSTM Multi-Fusion Model for EEG Emotion Recognition
    Cui, Fachang
    Wang, Ruqing
    Ding, Weiwei
    Chen, Yao
    Huang, Liya
    MATHEMATICS, 2022, 10 (04)
  • [38] MULTI-OBJECTIVE MULTI-TASK LEARNING ON RNNLM FOR SPEECH RECOGNITION
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 197 - 203
  • [39] Multi-Domain and Multi-Task Learning for Human Action Recognition
    Liu, An-An
    Xu, Ning
    Nie, Wei-Zhi
    Su, Yu-Ting
    Zhang, Yong-Dong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (02) : 853 - 867
  • [40] Multi-EmoNet: A Novel Multi-Task Neural Network for Driver Emotion Recognition
    Cui, Yaodong
    Ma, Yintao
    Li, Wenbo
    Bian, Ning
    Li, Guofa
    Cao, Dongpu
    IFAC PAPERSONLINE, 2020, 53 (05): : 650 - 655