Topics Guided Multimodal Fusion Network for Conversational Emotion Recognition

Cited by: 0
Authors
Yuan, Peicong [1 ]
Cai, Guoyong [1 ]
Chen, Ming [1 ]
Tang, Xiaolv [1 ]
Affiliations
[1] Guilin Univ Elect Technol, Guilin, Peoples R China
Keywords
Emotion Recognition in Conversation; Neural Topic Model; Multimodal Fusion;
DOI
10.1007/978-981-97-5669-8_21
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Emotion Recognition in Conversation (ERC) is a challenging task. Previous methods capture the semantic dependencies between utterances through complex conversational context modeling but ignore the topic information contained in the utterances; moreover, the commonality of multimodal information has not been effectively explored. To this end, the Topics Guided Multimodal Fusion Network (TGMFN) is proposed to extract effective utterance topic information and to exploit cross-modal commonality and complementarity for improved performance. First, a VAE-based neural topic model is used to build a conversational topic model, with a new topic sampling strategy that differs from the traditional reparameterization trick and makes topic modeling more suitable for utterances. Second, a facial feature extraction method for multi-party conversations is proposed to extract rich facial features from video. Finally, the Topic-Guided Vision-Audio features Aware fusion (TGV2A) module is designed around the conversation topic; it fuses modality information such as the speaker's facial features and topic-related co-occurrence information, capturing the commonality and complementarity among modalities to enrich feature semantics. Extensive experiments on two multimodal ERC datasets, IEMOCAP and MELD, show that the proposed TGMFN outperforms leading baseline methods.
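To make the two core ideas in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' code) of a VAE-based neural topic model over utterance bag-of-words vectors and a topic-guided cross-modal attention block. All class names, dimensions, and the use of the standard reparameterization trick here are assumptions; the paper proposes its own topic sampling strategy and a different TGV2A fusion design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UtteranceNTM(nn.Module):
    """Sketch: VAE-style neural topic model over utterance bag-of-words vectors."""

    def __init__(self, vocab_size: int, num_topics: int = 50, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)
        self.decoder = nn.Linear(num_topics, vocab_size)  # topic-word weights

    def forward(self, bow: torch.Tensor):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        # Standard reparameterization sampling shown for illustration only;
        # the paper replaces this step with its own topic sampling strategy.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        theta = F.softmax(z, dim=-1)                      # utterance topic proportions
        recon = F.log_softmax(self.decoder(theta), dim=-1)
        rec_loss = -(bow * recon).sum(-1).mean()          # reconstruction term
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return theta, rec_loss + kl                       # negative ELBO


class TopicGuidedFusion(nn.Module):
    """Sketch: topic vectors condition cross-modal attention from text onto audio/vision."""

    def __init__(self, d_model: int = 256, num_topics: int = 50, heads: int = 4):
        super().__init__()
        self.topic_proj = nn.Linear(num_topics, d_model)
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

    def forward(self, text, audio, vision, theta):
        # Bias every modality with the utterance topic, then let topic-conditioned
        # text queries attend over the concatenated audio/visual context.
        topic = self.topic_proj(theta).unsqueeze(1)       # (B, 1, d_model)
        query = text + topic
        context = torch.cat([audio, vision], dim=1) + topic
        fused, _ = self.attn(query, context, context)
        return fused
```

In this sketch, the topic proportions theta produced by UtteranceNTM would be passed to TopicGuidedFusion together with per-utterance text, audio, and visual feature sequences that share the same model dimension; how TGMFN actually couples these components is described in the paper itself.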
Pages: 250-262
Page count: 13