Research on Video Retrieval Technology based on Multimodal Fusion and Attention Mechanism

Authors
Tai, Tianyang [1]
Zeng, Fanfeng [1]
Affiliations
[1] North China Univ Technol, Coll Informat, Beijing, Peoples R China
Keywords
Multimodal fusion; Video retrieval; Attention mechanism
DOI
10.1145/3650400.3650477
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Feature extraction and matching are crucial in video retrieval tasks. However, existing algorithms often overlook motion features in action-related videos and focus only on global static features. Key action features are also hard to distinguish from background features, which hinders the capture of global dependencies during convolution. The result is less expressive features and lower retrieval accuracy. In this paper, we propose a video retrieval model that combines multimodal fusion with an attention mechanism. Our model uses the SlowFast backbone network, extracting skeleton motion features and static image features from video sequences using the Slow and Fast networks respectively. To address feature fusion, we introduce a 3D residual attention structure between the two branches. By incorporating bilateral connections and hash encoding, we construct a hash layer that maps features into binary codes, improving retrieval efficiency. Experimental results on the UCF101 and HMDB51 datasets validate the effectiveness of our approach and demonstrate its advantages over state-of-the-art video retrieval methods.
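
The abstract does not say how the hash layer is built or trained, so the following is only a minimal sketch of the general idea it names: mapping real-valued features to binary codes and ranking database items by Hamming distance. The fixed random projection stands in for a learned hash layer, and every name here (`make_projection`, `hash_code`, `hamming`, `retrieve`) is hypothetical and not taken from the paper.

```python
import random


def make_projection(feature_dim, code_bits, seed=0):
    """Assumption: a fixed random projection replaces the paper's learned hash layer."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
            for _ in range(code_bits)]


def hash_code(feature, projection):
    """Project the feature and keep the sign of each output as one bit of an integer code."""
    code = 0
    for row in projection:
        activation = sum(w * x for w, x in zip(row, feature))
        code = (code << 1) | (1 if activation >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query in Hamming distance."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    projection = make_projection(feature_dim=4, code_bits=8)
    # Made-up feature vectors, used only to exercise the functions.
    database = [[0.9, 0.1, -0.3, 0.5], [-0.7, 0.2, 0.8, -0.1], [0.8, 0.0, -0.2, 0.6]]
    codes = [hash_code(f, projection) for f in database]
    query = hash_code([0.85, 0.05, -0.25, 0.55], projection)
    print(retrieve(query, codes, top_k=2))
```

Comparing compact codes with XOR and a bit count is much cheaper than computing distances between floating-point feature vectors, which is the usual reason hashing is said to improve retrieval efficiency.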
Pages: 470-474
Page count: 5