Cross-Modal Semantic Fusion Video Emotion Analysis Based on Attention Mechanism

Cited by: 0

Authors:

Zhao, Lianfen [1]

Pan, Zhengjun [1]

Affiliations:

[1] Software Engineering Institute of Guangzhou, Department of Network Technology, Guangzhou, People's Republic of China

Source:

2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA | 2023

Keywords:

emotional analysis; cross-modal; semantic fusion; pre-training model; self-attention mechanism;

DOI:

10.1109/ICCCBDA56900.2023.10154781

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing multimodal video emotion analysis models suffer from a semantic gap caused by insufficient fusion of the different modalities, which degrades emotion analysis performance. To address this, a multimodal video emotion analysis model based on a multi-head attention mechanism and multimodal cross fusion is proposed. The model first extracts single-modal features from the video, such as text, voice, vision (image) and so on. A GRU network then extracts the temporal context features of each modality. Next, the multi-head attention mechanism fuses the text-voice, text-video, and voice-video pairs, and the single-modal features are cross-modally fused with these pairwise fused features. Finally, after passing through an attention mechanism, the fused features are input into the emotion classification network for classification. Experimental results show that, compared with existing models, this model better fuses attribute features within and between modalities and achieves some improvement in emotion recognition accuracy and F1 score.
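
The pairwise fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `cross_modal_attention`, the dimensions, and the randomly initialised projection matrices (stand-ins for learned weights) are all assumptions made for illustration, and the random input arrays stand in for the per-modality GRU outputs. The final concatenation is only one plausible reading of "cross-mode fused".

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_seq, context_seq, num_heads, rng):
    """Multi-head scaled dot-product attention in which one modality
    (query_seq) attends to another (context_seq). Hypothetical helper."""
    t_q, d = query_seq.shape
    t_c, _ = context_seq.shape
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_head = d // num_heads
    # Random projections stand in for learned weights (assumption).
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)
    )
    # Project, then split the feature axis into heads: (heads, time, d_head).
    q = (query_seq @ w_q).reshape(t_q, num_heads, d_head).transpose(1, 0, 2)
    k = (context_seq @ w_k).reshape(t_c, num_heads, d_head).transpose(1, 0, 2)
    v = (context_seq @ w_v).reshape(t_c, num_heads, d_head).transpose(1, 0, 2)
    # Attention weights per head: (heads, t_q, t_c), each row sums to 1.
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = weights @ v
    # Merge the heads back into one feature axis and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(t_q, d)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
T, D, HEADS = 5, 8, 2  # time steps, feature size, attention heads (assumed)

# Stand-ins for the per-modality temporal features.
text, voice, vision = (rng.standard_normal((T, D)) for _ in range(3))

# Pairwise fusion: text-voice, text-video, voice-video.
text_voice, tv_weights = cross_modal_attention(text, voice, HEADS, rng)
text_vision, _ = cross_modal_attention(text, vision, HEADS, rng)
voice_vision, _ = cross_modal_attention(voice, vision, HEADS, rng)

# Combine single-modal and pairwise-fused features by concatenation (assumption).
fused = np.concatenate(
    [text, voice, vision, text_voice, text_vision, voice_vision], axis=-1
)
```

In this sketch `fused` would then be passed through a further attention layer and a classifier, which are omitted here.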

Pages: 381-386

Page count: 6

50 items in total

[31] VIDEO QUESTION GENERATION VIA SEMANTIC RICH CROSS-MODAL SELF-ATTENTION NETWORKS LEARNING
Wang, Yu-Siang
Su, Hung-Ting
Chang, Chen-Hsi
Liu, Zhe-Yu
Hsu, Winston H.
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 2423 - 2427
[32] RGB-D Saliency Detection Based on Attention Mechanism and Multi-Scale Cross-Modal Fusion
Cui Z.
Feng Z.
Wang F.
Liu Q.
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35 (06) : 893 - 902
[33] MCFusion: infrared and visible image fusion based multiscale receptive field and cross-modal enhanced attention mechanism
Jiang, Min
Wang, Zhiyuan
Kong, Jun
Zhuang, Danfeng
JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (01)
[34] Kernel Cross-Modal Factor Analysis for Information Fusion With Application to Bimodal Emotion Recognition
Wang, Yongjin
Guan, Ling
Venetsanopoulos, Anastasios N.
IEEE TRANSACTIONS ON MULTIMEDIA, 2012, 14 (03) : 597 - 607
[35] Cross-modal image fusion guided by subjective visual attention
Fang, Aiqing
Zhao, Xinbo
Zhang, Yanning
NEUROCOMPUTING, 2020, 414 (414) : 333 - 345
[36] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
Cao Xiaopeng
Zhang Linying
Chen Qiuxian
Ning Hailong
Dong Yizhuo
The Journal of China Universities of Posts and Telecommunications, 2024, 31 (06) : 16 - 25
[37] Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams
Hou, Yuanbo
Yu, Zhesong
Liang, Xia
Du, Xingjian
Zhu, Bilei
Ma, Zejun
Botteldooren, Dick
INTERSPEECH 2021, 2021 : 321 - 325
[38] Cross-modal semantic priming
Tabossi, P
LANGUAGE AND COGNITIVE PROCESSES, 1996, 11 (06) : 569 - 576
[39] Cross-Modal Semantic Communications
Li, Ang
Wei, Xin
Wu, Dan
Zhou, Liang
IEEE WIRELESS COMMUNICATIONS, 2022, 29 (06) : 144 - 151
[40] CCMA: CapsNet for audio-video sentiment analysis using cross-modal attention
Li, Haibin
Guo, Aodi
Li, Yaqian
VISUAL COMPUTER, 2025, 41 (03) : 1609 - 1620