Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

Times cited: 0

Authors:

Wang, Yimu [1]

Shi, Peng [1]

Affiliation:

[1] Univ Waterloo, Waterloo, ON, Canada

Source:

Findings of the Association for Computational Linguistics: EMNLP 2023 | 2023


DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent progress in video-text retrieval has been driven largely by better representation learning. In this paper, we present S3MA, a novel multi-grained sparse learning framework that learns an aligned sparse space shared between video and text for video-text retrieval. The shared sparse space is initialized with a finite number of sparse concepts, each of which refers to a number of words. Using the available text data, we learn and update the shared sparse space in a supervised manner with the proposed similarity and alignment losses. Moreover, to enable multi-grained alignment, we incorporate frame representations to better model the video modality and to compute both fine-grained and coarse-grained similarities. Extensive experiments on several video-text retrieval benchmarks show that S3MA, benefiting from the learned shared sparse space and the multi-grained similarities, outperforms existing methods. Our code is available at link.
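The abstract combines three ingredients: a sparse concept space shared by both modalities, video-level (coarse-grained) and frame-level (fine-grained) similarities, and a supervised similarity loss. The Python sketch below illustrates, on toy 2-d embeddings, how such pieces could fit together. It is not S3MA's implementation. The ReLU plus top-k projection onto concept vectors, the max-over-frames fine-grained score, the equal-weight averaging of the two granularities, the symmetric InfoNCE loss standing in for the paper's similarity loss, and all function names (`sparse_project`, `multi_grained_similarity`, `symmetric_infonce`) are hypothetical assumptions; the alignment loss is not modeled.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)


def sparse_project(x: Vector, concepts: List[Vector], k: int) -> Vector:
    # ASSUMPTION (hypothetical): score each concept with a ReLU'd dot product and
    # keep only the top-k scores, so an embedding activates at most k concepts.
    scores = [max(dot(x, c), 0.0) for c in concepts]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    keep = set(top)
    return [s if i in keep else 0.0 for i, s in enumerate(scores)]


def mean_pool(frames: List[Vector]) -> Vector:
    # Average frame embeddings into a single video-level embedding.
    return [sum(col) / len(frames) for col in zip(*frames)]


def multi_grained_similarity(frames: List[Vector], text: Vector,
                             concepts: List[Vector], k: int) -> float:
    text_sparse = sparse_project(text, concepts, k)
    # Coarse-grained: the whole video (mean of its frames) against the sentence.
    coarse = cosine(sparse_project(mean_pool(frames), concepts, k), text_sparse)
    # Fine-grained: the best-matching single frame against the sentence.
    fine = max(cosine(sparse_project(f, concepts, k), text_sparse) for f in frames)
    # ASSUMPTION (hypothetical): equal-weight average of the two granularities.
    return 0.5 * (coarse + fine)


def symmetric_infonce(sim: List[List[float]], temperature: float = 0.05) -> float:
    # Standard symmetric contrastive loss over a video-by-text similarity matrix
    # whose diagonal holds the matched pairs (a stand-in, not S3MA's own loss).
    def cross_entropy(rows: List[List[float]]) -> float:
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(z - m) for z in logits))
            total += log_z - logits[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy(sim) + cross_entropy(columns))


# Toy batch: two videos (two 2-d frames each) and their matching captions.
concepts = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
videos = [[[1.0, 0.1], [0.9, 0.0]], [[0.1, 1.0], [0.0, 0.9]]]
texts = [[1.0, 0.0], [0.0, 1.0]]

sim = [[multi_grained_similarity(v, t, concepts, k=2) for t in texts] for v in videos]
loss = symmetric_infonce(sim)
```

In this toy batch each video scores higher against its own caption than against the other one, so the contrastive loss is small; permuting the captions so that mismatched pairs sit on the diagonal increases it. The top-k step is one simple way to keep the shared representation sparse, and the temperature value is likewise an illustrative choice rather than a setting reported for S3MA.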


Pages: 633-649

Number of pages: 17

Related records (50 in total):

[1] X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
Ma, Yiwei; Xu, Guohai; Sun, Xiaoshuai; Yan, Ming; Zhang, Ji; Ji, Rongrong
Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022
[2] Multi-event Video-Text Retrieval
Zhang, Gengyuan; Ren, Jisen; Gu, Jindong; Tresp, Volker
2023 IEEE/CVF International Conference on Computer Vision (ICCV 2023), 2023: 22056-22066
[3] Deep learning for video-text retrieval: a review
Zhu, Cunjuan; Jia, Qi; Chen, Wei; Guo, Yanming; Liu, Yu
International Journal of Multimedia Information Retrieval, 2023, 12(1)
[4] Fine-Grained Cross-Modal Contrast Learning for Video-Text Retrieval
Liu, Hui; Lv, Gang; Gu, Yanhong; Nian, Fudong
Advanced Intelligent Computing Technology and Applications, Pt V, ICIC 2024, 2024, 14866: 298-310
[5] Exploiting Unlabeled Videos for Video-Text Retrieval via Pseudo-Supervised Learning
Lu, Yu; Quan, Ruijie; Zhu, Linchao; Yang, Yi
IEEE Transactions on Image Processing, 2024, 33: 6748-6760
[6] MGSGA: Multi-grained and Semantic-Guided Alignment for Text-Video Retrieval
Wu, Xiaoyu; Qian, Jiayao; Yang, Lulu
Neural Processing Letters, 2024, 56(2)
[7] SViTT: Temporal Learning of Sparse Video-Text Transformers
Li, Yi; Min, Kyle; Tripathi, Subarna; Vasconcelos, Nuno
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 18919-18929
[8] Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Chen, Shizhe; Zhao, Yida; Jin, Qin; Wu, Qi
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10635-10644
