Topics Guided Multimodal Fusion Network for Conversational Emotion Recognition

Cited by: 0
Authors
Yuan, Peicong [1 ]
Cai, Guoyong [1 ]
Chen, Ming [1 ]
Tang, Xiaolv [1 ]
Institutions
[1] Guilin Univ Elect Technol, Guilin, Peoples R China
Keywords
Emotion Recognition in Conversation; Neural Topic Model; Multimodal Fusion;
DOI
10.1007/978-981-97-5669-8_21
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion Recognition in Conversation (ERC) is a highly challenging task. Previous methods capture semantic dependencies between utterances through complex conversational context modeling but ignore the topic information contained in the utterances; moreover, the commonality of multimodal information has not been effectively explored. To this end, the Topics Guided Multimodal Fusion Network (TGMFN) is proposed to extract effective utterance-topic information and to exploit cross-modal commonality and complementarity. First, a VAE-based neural topic model is used to build a conversational topic model, with a new topic sampling strategy, different from the traditional reparameterization trick, that makes topic modeling better suited to utterances. Second, a facial-feature extraction method for multi-party conversations is proposed to extract rich facial features from video. Finally, the Topic-Guided Vision-Audio features Aware fusion (TGV2A) module is designed around the conversation topic; it fuses modal information such as the speaker's facial features and topic-related co-occurrence information, and captures the commonality and complementarity among modalities to enrich the feature semantics. Extensive experiments on two multimodal ERC datasets, IEMOCAP and MELD, show that TGMFN outperforms the leading baseline methods.
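For context on the topic-modeling component: the abstract contrasts TGMFN's sampling strategy with the traditional reparameterization trick used in VAE-based neural topic models. The sketch below shows only that standard baseline step (encode a bag-of-words utterance to a Gaussian posterior, sample via reparameterization, softmax to topic proportions); all dimensions and weight names are hypothetical, and TGMFN's actual sampling strategy is not described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: vocabulary V, hidden width H, number of topics K.
V, H, K = 1000, 64, 10
W_h  = rng.normal(scale=0.02, size=(V, H))   # encoder hidden layer
W_mu = rng.normal(scale=0.02, size=(H, K))   # posterior mean head
W_lv = rng.normal(scale=0.02, size=(H, K))   # posterior log-variance head

def topic_encoder(bow):
    """Map a bag-of-words utterance vector to topic proportions theta.

    Uses the standard VAE reparameterization trick (z = mu + sigma * eps).
    TGMFN replaces this sampling step with its own strategy, whose details
    are not given in the abstract.
    """
    h = np.tanh(bow @ W_h)
    mu, logvar = h @ W_mu, h @ W_lv
    eps = rng.standard_normal(K)
    z = mu + np.exp(0.5 * logvar) * eps      # reparameterized sample
    return softmax(z)                        # topic distribution over K topics

bow = np.zeros(V)
bow[[3, 17, 256]] = 1.0                      # toy utterance with 3 word types
theta = topic_encoder(bow)
print(theta.shape, round(float(theta.sum()), 6))
```

The softmax keeps `theta` a valid distribution over the K topics, which is what a topic-guided fusion module would consume downstream.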
Pages: 250 - 262
Page count: 13