Cross-Modal Semantic Fusion Video Emotion Analysis Based on Attention Mechanism

Cited: 0
Authors
Zhao, Lianfen [1 ]
Pan, Zhengjun [1 ]
Affiliations
[1] Software Engineering Institute of Guangzhou, Department of Network Technology, Guangzhou, People's Republic of China
Keywords
emotion analysis; cross-modal; semantic fusion; pre-training model; self-attention mechanism
DOI
10.1109/ICCCBDA56900.2023.10154781
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Existing multimodal video emotion analysis models suffer from a semantic gap caused by insufficient fusion of the different modalities, which degrades emotion analysis performance. To address this, a multimodal video emotion analysis model based on a multi-head attention mechanism and multimodal cross fusion is proposed. The model first extracts unimodal features from the text, audio, and visual (image) streams of the video. A GRU network then captures the temporal context of each modality. Next, a multi-head attention mechanism fuses the text-audio, text-visual, and audio-visual pairs, after which the unimodal features and the pairwise fused features are fused across modalities. Finally, the fused features are passed through an attention mechanism into the emotion classification network. Experimental results show that, compared with existing models, the proposed model better fuses attribute features within and between modalities and achieves improvements in emotion recognition accuracy and F1 score.
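The pipeline the abstract describes (unimodal extraction, GRU temporal encoding, pairwise multi-head attention fusion, cross-modal fusion, attention-based pooling and classification) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the module names, dimensions, fusion order, and pooling choice below are illustrative assumptions, and pre-extracted unimodal feature sequences are taken as given.

```python
# Minimal sketch (not the authors' code) of the fusion pipeline described in
# the abstract. All names, sizes, and design details are assumptions.
import torch
import torch.nn as nn

class PairwiseFusionEmotionModel(nn.Module):
    def __init__(self, dim=128, heads=4, num_classes=2):
        super().__init__()
        # Step 2: one GRU per modality to capture temporal context.
        self.grus = nn.ModuleDict({
            m: nn.GRU(dim, dim, batch_first=True)
            for m in ("text", "audio", "visual")
        })
        # Step 3: one multi-head attention block per modality pair.
        self.pair_attn = nn.ModuleDict({
            f"{a}-{b}": nn.MultiheadAttention(dim, heads, batch_first=True)
            for a, b in (("text", "audio"), ("text", "visual"), ("audio", "visual"))
        })
        # Steps 4-5: attention-weighted pooling over the unimodal and
        # pairwise-fused features, then emotion classification.
        self.score = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats):
        # feats: dict of (batch, seq_len, dim) tensors keyed by modality name.
        ctx = {m: self.grus[m](x)[0] for m, x in feats.items()}
        fused = [
            attn(ctx[a], ctx[b], ctx[b])[0]  # modality a attends to modality b
            for (a, b), attn in
            ((k.split("-"), v) for k, v in self.pair_attn.items())
        ]
        # Cross-modal fusion: concatenate unimodal and pairwise-fused sequences.
        all_feats = torch.cat(list(ctx.values()) + fused, dim=1)
        # Attention pooling: softmax-normalized scores weight each time step.
        weights = torch.softmax(self.score(all_feats), dim=1)
        pooled = (weights * all_feats).sum(dim=1)
        return self.classifier(pooled)
```

Under these assumptions, `feats` would hold per-modality sequences (e.g., pre-trained text embeddings and audio/visual frame features projected to a shared width); the paper's actual layer sizes, fusion details, and classifier head may differ.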
Pages: 381-386
Page count: 6