Research on Video Retrieval Technology based on Multimodal Fusion and Attention Mechanism

Times cited: 0

Authors:

Tai, Tianyang [1]

Zeng, Fanfeng [1]

Affiliation:

[1] North China University of Technology, College of Information, Beijing, People's Republic of China

Source:

Proceedings of 2023 7th International Conference on Electronic Information Technology and Computer Engineering, EITCE 2023 | 2023

Keywords:

Multimodal fusion; Video retrieval; Attention mechanism

DOI:

10.1145/3650400.3650477

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Feature extraction and matching are crucial in video retrieval. However, existing algorithms often overlook the motion features of action-related videos and focus only on global static features. Because key action features are hard to distinguish from background features, global dependencies are difficult to capture during convolution, which yields less expressive features and lower retrieval accuracy. In this paper, we propose a video retrieval model that combines multimodal fusion with an attention mechanism. The model uses the SlowFast backbone network, extracting skeleton motion features and static image features from video sequences with the Slow and Fast networks, respectively. To fuse these features, we introduce a 3D residual attention structure between the two branches. By incorporating bilateral connections and hash encoding, we construct a hash layer that maps features into binary codes, improving retrieval efficiency. Experimental results on the UCF101 and HMDB51 datasets validate the effectiveness of our approach and demonstrate its advantages over state-of-the-art video retrieval methods.
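The abstract says the hash layer maps features into binary codes to make retrieval efficient, but it does not say how the codes are produced or compared. The sketch below is a minimal illustration under stated assumptions, not the authors' hash layer: it assumes sign-style binarization (a bit is 1 when the corresponding feature value is positive) and ranks database videos by Hamming distance, two common choices in hashing-based retrieval. The function names `to_binary_code`, `hamming` and `retrieve` are hypothetical, and the learned hash layer, the 3D residual attention and the SlowFast feature extraction are not modeled; the inputs stand in for already-fused feature vectors.

```python
from typing import List, Sequence


def to_binary_code(features: Sequence[float]) -> int:
    """Pack a real-valued feature vector into an integer bit string.

    Assumption: sign-style hashing, bit i is 1 if features[i] > 0.
    A learned hash layer would replace this fixed threshold.
    """
    code = 0
    for i, x in enumerate(features):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count differing bits between two packed codes (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def retrieve(query: Sequence[float], database: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k database items closest to the query in Hamming space."""
    q = to_binary_code(query)
    codes = [to_binary_code(v) for v in database]
    # sorted() is stable, so items with equal distance keep their database order.
    order = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return order[:k]


if __name__ == "__main__":
    # Hypothetical 4-dimensional fused features for three database videos.
    database = [[0.9, -0.2, 0.4, -0.7], [-0.5, 0.3, -0.1, 0.8], [0.7, -0.6, 0.2, 0.1]]
    query = [0.8, -0.1, 0.5, -0.3]
    print(retrieve(query, database, 2))  # prints [0, 2]
```

With packed integer codes, comparing two videos costs one XOR and a bit count, which is why binary codes can speed up search compared with floating-point distances over full feature vectors.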


Pages: 470-474

Number of pages: 5

Related documents (50 in total; entries 1-10 listed):

[1] Attention-Based Multimodal Fusion for Video Description
Hori, Chiori
Hori, Takaaki
Lee, Teng-Yok
Zhang, Ziming
Harsham, Bret
Hershey, John R.
Marks, Tim K.
Sumi, Kazuhiko
2017 IEEE International Conference on Computer Vision (ICCV), 2017: 4203-4212
[2] Hierarchical attention-based multimodal fusion for video captioning
Wu, Chunlei
Wei, Yiwei
Chu, Xiaoliang
Weichen, Sun
Su, Fei
Wang, Leiquan
Neurocomputing, 2018, 315: 362-370
[3] Voice Keyword Retrieval Method Using Attention Mechanism and Multimodal Information Fusion
Zhang, Hongli
Scientific Programming, 2021, 2021
[4] Multimodal Keyless Attention Fusion for Video Classification
Long, Xiang
Gan, Chuang
de Melo, Gerard
Liu, Xiao
Li, Yandong
Li, Fu
Wen, Shilei
Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 7202-7209
[5] Multimodal Fusion Method Based on Self-Attention Mechanism
Zhu, Hu
Wang, Ze
Shi, Yu
Hua, Yingying
Xu, Guoxia
Deng, Lizhen
Wireless Communications & Mobile Computing, 2020, 2020
[6] Multimodal False News Detection Based on Fusion Attention Mechanism
Liu, Hualing
Chen, Shanghui
Qiao, Liang
Liu, Yaxin
Computer Engineering and Applications, 2023, 59(09): 95-103
[7] Research on multimodal hate speech detection based on self-attention mechanism feature fusion
Mao, Junjie
Shi, Hanxiao
Li, Xiaojun
Journal of Supercomputing, 2025, 81(01)
[8] The research of video retrieval technology
Wang, J. X.
Wang, Y. L.
Ma, Z. H.
Information Technology and Computer Application Engineering, 2014: 561-563
[9] Research on a Microexpression Recognition Technology Based on Multimodal Fusion
Kang, Jie
Chen, Xiao Ying
Liu, Qi Yuan
Jin, Si Han
Yang, Cheng Han
Hu, Cong
Complexity, 2021, 2021
[10] MAF: Multimodal Auto Attention Fusion for Video Classification
Zheng, Chengjie
Ding, Wei
Shen, Shigian
Chen, Ping
Advances and Trends in Artificial Intelligence. Theory and Applications, IEA/AIE 2023, Pt I, 2023, 13925: 253-264
