Cross-Modal Semantic Fusion Video Emotion Analysis Based on Attention Mechanism

Cited: 0
Authors
Zhao, Lianfen [1 ]
Pan, Zhengjun [1 ]
Affiliations
[1] Software Engineering Institute of Guangzhou, Department of Network Technology, Guangzhou, People's Republic of China
Keywords
emotion analysis; cross-modal; semantic fusion; pre-training model; self-attention mechanism
DOI
10.1109/ICCCBDA56900.2023.10154781
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Existing multimodal video emotion analysis models suffer from a semantic gap caused by insufficient fusion of the different modalities, which degrades emotion analysis performance. To address this, a multimodal video emotion analysis model based on a multi-head attention mechanism and multimodal cross fusion is proposed. The model first extracts unimodal features from the text, audio, and visual (image) streams of the video. A GRU network then captures the temporal context of each modality. Next, a multi-head attention mechanism fuses the text-audio, text-visual, and audio-visual pairs, after which the unimodal features and the pairwise fused features are fused across modalities. Finally, the fused features are passed through an attention mechanism into the emotion classification network. Experimental results show that, compared with existing models, the proposed model better fuses attribute features within and between modalities and achieves improvements in emotion recognition accuracy and F1 score.
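The pipeline the abstract describes (unimodal extraction, GRU temporal encoding, pairwise multi-head attention fusion, cross-modal fusion, attention-based pooling and classification) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the module names, dimensions, fusion order, and pooling choice below are illustrative assumptions, and pre-extracted unimodal feature sequences are taken as given.

```python
# Minimal sketch (not the authors' code) of the fusion pipeline described in
# the abstract. All names, sizes, and design details are assumptions.
import torch
import torch.nn as nn

class PairwiseFusionEmotionModel(nn.Module):
    def __init__(self, dim=128, heads=4, num_classes=2):
        super().__init__()
        # Step 2: one GRU per modality to capture temporal context.
        self.grus = nn.ModuleDict({
            m: nn.GRU(dim, dim, batch_first=True)
            for m in ("text", "audio", "visual")
        })
        # Step 3: one multi-head attention block per modality pair.
        self.pair_attn = nn.ModuleDict({
            f"{a}-{b}": nn.MultiheadAttention(dim, heads, batch_first=True)
            for a, b in (("text", "audio"), ("text", "visual"), ("audio", "visual"))
        })
        # Steps 4-5: attention-weighted pooling over the unimodal and
        # pairwise-fused features, then emotion classification.
        self.score = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats):
        # feats: dict of (batch, seq_len, dim) tensors keyed by modality name.
        ctx = {m: self.grus[m](x)[0] for m, x in feats.items()}
        fused = [
            attn(ctx[a], ctx[b], ctx[b])[0]  # modality a attends to modality b
            for (a, b), attn in
            ((k.split("-"), v) for k, v in self.pair_attn.items())
        ]
        # Cross-modal fusion: concatenate unimodal and pairwise-fused sequences.
        all_feats = torch.cat(list(ctx.values()) + fused, dim=1)
        # Attention pooling: softmax-normalized scores weight each time step.
        weights = torch.softmax(self.score(all_feats), dim=1)
        pooled = (weights * all_feats).sum(dim=1)
        return self.classifier(pooled)
```

Under these assumptions, `feats` would hold per-modality sequences (e.g., pre-trained text embeddings and audio/visual frame features projected to a shared width); the paper's actual layer sizes, fusion details, and classifier head may differ.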
Pages: 381-386
Page count: 6