Cross-Modal Semantic Fusion Video Emotion Analysis Based on Attention Mechanism

Cited by: 0
Authors
Zhao, Lianfen [1 ]
Pan, Zhengjun [1 ]
Affiliations
[1] Software Engn Inst Guangzhou, Dept Network Technol, Guangzhou, Peoples R China
Keywords
emotional analysis; cross-modal; semantic fusion; pre-training model; self-attention mechanism;
DOI
10.1109/ICCCBDA56900.2023.10154781
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To address the semantic gap in existing multimodal video emotion analysis models, which arises from insufficient fusion of the different modalities and degrades emotion analysis performance, a multimodal video emotion analysis model based on a multi-head attention mechanism and multimodal cross fusion is proposed. The model first extracts unimodal features from the text, audio, and visual (image) streams of the video. A GRU network then extracts the contextual temporal characteristics of each single modality. Next, a multi-head attention mechanism fuses the text-audio, text-visual, and audio-visual modality pairs, after which the unimodal features and the pairwise fused features are cross-modally fused. Finally, the fused features are passed through an attention mechanism and fed into an emotion classification network. Experimental results show that, compared with existing models, the proposed model better fuses attribute features both within and across modalities, with improvements in emotion recognition accuracy and F1 score.
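The pairwise fusion step described above centers on multi-head attention with one modality as the query and another as the key/value context (e.g. text-audio). The NumPy fragment below is a minimal sketch of that mechanism only, not the paper's implementation: the random projection matrices stand in for learned weights, and all shapes, names, and the two-head configuration are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query, context, num_heads, rng):
    """Fuse one modality (query, e.g. text) with another (context,
    e.g. audio) via multi-head scaled dot-product attention.
    Shapes: query (Tq, d), context (Tc, d); d divisible by num_heads."""
    t_q, d = query.shape
    d_h = d // num_heads
    # Random projections as placeholders for learned weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = query @ Wq, context @ Wk, context @ Wv
    # Split the feature dimension into heads: (heads, T, d_h).
    split = lambda X: X.reshape(X.shape[0], num_heads, d_h).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_h)  # (heads, Tq, Tc)
    out = softmax(scores) @ Vh                          # (heads, Tq, d_h)
    # Concatenate the heads back into a (Tq, d) fused representation.
    return out.transpose(1, 0, 2).reshape(t_q, d)

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 8))    # 5 text steps, feature dim 8
audio = rng.standard_normal((7, 8))   # 7 audio frames, feature dim 8
fused_ta = multi_head_cross_attention(text, audio, num_heads=2, rng=rng)
print(fused_ta.shape)  # -> (5, 8): one fused vector per text step
```

In the full model, analogous text-visual and audio-visual fusions would be computed the same way and then combined with the unimodal GRU features in the cross-modal fusion stage.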
Pages: 381 - 386
Page count: 6
Related papers
50 records in total
  • [31] VIDEO QUESTION GENERATION VIA SEMANTIC RICH CROSS-MODAL SELF-ATTENTION NETWORKS LEARNING
    Wang, Yu-Siang
    Su, Hung-Ting
    Chang, Chen-Hsi
    Liu, Zhe-Yu
    Hsu, Winston H.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2423 - 2427
  • [32] RGB-D Saliency Detection Based on Attention Mechanism and Multi-Scale Cross-Modal Fusion
    Cui Z.
    Feng Z.
    Wang F.
    Liu Q.
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (06): : 893 - 902
  • [33] MCFusion: infrared and visible image fusion based multiscale receptive field and cross-modal enhanced attention mechanism
    Jiang, Min
    Wang, Zhiyuan
    Kong, Jun
    Zhuang, Danfeng
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (01)
  • [34] Kernel Cross-Modal Factor Analysis for Information Fusion With Application to Bimodal Emotion Recognition
    Wang, Yongjin
    Guan, Ling
    Venetsanopoulos, Anastasios N.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2012, 14 (03) : 597 - 607
  • [35] Cross-modal image fusion guided by subjective visual attention
    Fang, Aiqing
    Zhao, Xinbo
    Zhang, Yanning
    NEUROCOMPUTING, 2020, 414: 333 - 345
  • [36] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao Xiaopeng
    Zhang Linying
    Chen Qiuxian
    Ning Hailong
    Dong Yizhuo
    The Journal of China Universities of Posts and Telecommunications, 2024, 31 (06) : 16 - 25
  • [37] Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams
    Hou, Yuanbo
    Yu, Zhesong
    Liang, Xia
    Du, Xingjian
    Zhu, Bilei
    Ma, Zejun
    Botteldooren, Dick
    INTERSPEECH 2021, 2021, : 321 - 325
  • [38] Cross-modal semantic priming
    Tabossi, P
    LANGUAGE AND COGNITIVE PROCESSES, 1996, 11 (06): : 569 - 576
  • [39] Cross-Modal Semantic Communications
    Li, Ang
    Wei, Xin
    Wu, Dan
    Zhou, Liang
    IEEE WIRELESS COMMUNICATIONS, 2022, 29 (06) : 144 - 151
  • [40] CCMA: CapsNet for audio-video sentiment analysis using cross-modal attention
    Li, Haibin
    Guo, Aodi
    Li, Yaqian
    VISUAL COMPUTER, 2025, 41 (03): : 1609 - 1620