Cross-Modal Semantic Fusion Video Emotion Analysis Based on Attention Mechanism

Citations: 0
Authors
Zhao, Lianfen [1]
Pan, Zhengjun [1]
Affiliations
[1] Software Engineering Institute of Guangzhou, Department of Network Technology, Guangzhou, People's Republic of China
Keywords
emotional analysis; cross-modal; semantic fusion; pre-training model; self-attention mechanism;
DOI
10.1109/ICCCBDA56900.2023.10154781
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
To address the semantic gap in existing multimodal video emotion analysis models, which arises from insufficient fusion of the different modalities and degrades emotion analysis performance, a multimodal video emotion analysis model based on a multi-head attention mechanism and multimodal cross fusion is proposed. The model first extracts unimodal features from the text, speech, and visual (image) streams of a video. A GRU network then captures the temporal context within each modality. Next, a multi-head attention mechanism fuses the text-speech, text-visual, and speech-visual pairs, and the unimodal features are cross-modally fused with these pairwise features. Finally, the fused features are passed through an attention mechanism into an emotion classification network. Experimental results show that, compared with existing models, the proposed model better fuses attribute features both within and across modalities, and improves both emotion recognition accuracy and F1 score.
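The pipeline described in the abstract (unimodal extraction, GRU temporal encoding, pairwise multi-head attention fusion, cross-modal fusion, attention pooling, classification) can be sketched in PyTorch as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name CrossModalFusionModel, the feature dimensions, the head count, and the learned-query pooling are all hypothetical choices, since the paper's exact configuration is not given here.

```python
import torch
import torch.nn as nn

class CrossModalFusionModel(nn.Module):
    """Sketch of the abstract's pipeline: per-modality GRUs -> pairwise
    multi-head attention fusion -> cross-modal concatenation -> attention
    pooling -> emotion classifier. All dimensions are illustrative."""

    def __init__(self, d_text=300, d_audio=74, d_vision=35,
                 d_model=128, n_heads=4, n_classes=2):
        super().__init__()
        # Per-modality GRUs capture temporal context within each modality.
        self.gru_t = nn.GRU(d_text, d_model, batch_first=True)
        self.gru_a = nn.GRU(d_audio, d_model, batch_first=True)
        self.gru_v = nn.GRU(d_vision, d_model, batch_first=True)
        # One multi-head attention block per modality pair.
        self.att_ta = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.att_tv = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.att_av = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Attention pooling over unimodal + pairwise-fused features.
        self.pool = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x_t, x_a, x_v):
        # Temporal encoding of each modality: (B, T, d_model).
        h_t, _ = self.gru_t(x_t)
        h_a, _ = self.gru_a(x_a)
        h_v, _ = self.gru_v(x_v)
        # Pairwise fusion: one modality's sequence attends to another's.
        f_ta, _ = self.att_ta(h_t, h_a, h_a)   # text attends to speech
        f_tv, _ = self.att_tv(h_t, h_v, h_v)   # text attends to vision
        f_av, _ = self.att_av(h_a, h_v, h_v)   # speech attends to vision
        # Cross-modal fusion: concatenate unimodal and pairwise features.
        feats = torch.cat([h_t, h_a, h_v, f_ta, f_tv, f_av], dim=1)
        # Attention pooling with a learned query yields one vector per clip.
        q = self.query.expand(feats.size(0), -1, -1)
        pooled, _ = self.pool(q, feats, feats)
        return self.classifier(pooled.squeeze(1))

# Example: a batch of 8 clips with 20 time steps per modality.
model = CrossModalFusionModel()
logits = model(torch.randn(8, 20, 300), torch.randn(8, 20, 74),
               torch.randn(8, 20, 35))
print(logits.shape)  # torch.Size([8, 2])
```

The concatenated sequence thus carries both intra-modal context (from the GRUs) and inter-modal context (from the pairwise attention blocks) before a single attended summary vector is classified.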
Pages: 381 - 386
Number of Pages: 6
Related Papers
50 records in total
  • [1] Research on cross-modal emotion recognition based on multi-layer semantic fusion
    Xu, Z.
    Gao, Y.
    Mathematical Biosciences and Engineering, 2024, 21 (02): 2488 - 2514
  • [2] A cross-modal conditional mechanism based on attention for text-video retrieval
    Du, Wanru
    Jing, Xiaochuan
    Zhu, Quan
    Wang, Xiaoyin
    Liu, Xuan
    Mathematical Biosciences and Engineering, 2023, 20 (11): 20073 - 20092
  • [3] Multi-corpus emotion recognition method based on cross-modal gated attention fusion
    Ryumina, Elena
    Ryumin, Dmitry
    Axyonov, Alexandr
    Ivanko, Denis
    Karpov, Alexey
    Pattern Recognition Letters, 2025, 190: 192 - 200
  • [4] Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism
    Deng, Lujuan
    Liu, Boyi
    Li, Zuhe
    CMC-Computers, Materials & Continua, 2024, 78 (01): 1157 - 1170
  • [5] Cross-modal attention fusion network for RGB-D semantic segmentation
    Zhao, Qiankun
    Wan, Yingcai
    Xu, Jiqian
    Fang, Lijin
    Neurocomputing, 2023, 548
  • [6] Cross-Modal Attention Mechanism for Weakly Supervised Video Anomaly Detection
    Sun, Wenwen
    Cao, Lin
    Guo, Yanan
    Du, Kangning
    Biometric Recognition, CCBR 2023, 2023, 14463: 437 - 446
  • [7] Cross-Modal Video Emotion Analysis Method Based on Multi-Task Learning
    Miao, Yuqing
    Dong, Han
    Zhang, Wanzhen
    Zhou, Ming
    Cai, Guoyong
    Du, Huawei
    Computer Engineering and Applications, 2023, 59 (12): 141 - 147
  • [8] A Short Video Classification Framework Based on Cross-Modal Fusion
    Pang, Nuo
    Guo, Songlin
    Yan, Ming
    Chan, Chien Aun
    Sensors, 2023, 23 (20)
  • [9] Semantic Guidance Fusion Network for Cross-Modal Semantic Segmentation
    Zhang, Pan
    Chen, Ming
    Gao, Meng
    Sensors, 2024, 24 (08)
  • [10] Cross-modal domain generalization semantic segmentation based on fusion features
    Yue, Wanlin
    Zhou, Zhiheng
    Cao, Yinglie
    Liu, Man
    Knowledge-Based Systems, 2024, 302