Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

Authors
Wang, Yimu [1]
Shi, Peng [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON, Canada
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While recent progress in video-text retrieval has been advanced by the exploration of better representation learning, in this paper, we present a novel multi-grained sparse learning framework, S3MA, to learn an aligned sparse space shared between the video and the text for video-text retrieval. The shared sparse space is initialized with a finite number of sparse concepts, each of which refers to a number of words. With the text data at hand, we learn and update the shared sparse space in a supervised manner using the proposed similarity and alignment losses. Moreover, to enable multi-grained alignment, we incorporate frame representations for better modeling the video modality and calculating fine-grained and coarse-grained similarities. Benefiting from the learned shared sparse space and multi-grained similarities, extensive experiments on several video-text retrieval benchmarks demonstrate the superiority of S3MA over existing methods. Our code is available at link.
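The abstract mentions coarse-grained and fine-grained similarities computed from frame representations, without giving formulas. The sketch below is only an illustration of that general idea, not the S3MA implementation: it assumes cosine similarity, mean pooling for the coarse (video-level) score, max pooling over frames for the fine (frame-level) score, and an equal-weight combination. All function names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def coarse_similarity(frames, text):
    """Video-level score: mean-pool the frame vectors, then compare to the text."""
    dim = len(text)
    pooled = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    return cosine(pooled, text)

def fine_similarity(frames, text):
    """Frame-level score: similarity of the best-matching frame (assumed max pooling)."""
    return max(cosine(f, text) for f in frames)

# Toy embeddings (made up for illustration): two frames and one sentence.
frames = [[1.0, 0.0], [0.0, 1.0]]
text = [1.0, 0.0]

coarse = coarse_similarity(frames, text)
fine = fine_similarity(frames, text)
combined = 0.5 * (coarse + fine)  # equal weighting is an assumption
```

In this toy case the fine-grained score is higher than the coarse one, because a single frame matches the text exactly while mean pooling dilutes it with the unrelated frame.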
Pages: 633-649
Page count: 17