Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

Cited by: 0

Authors:

Wang, Yimu [1]

Shi, Peng [1]

Affiliations:

[1] Univ Waterloo, Waterloo, ON, Canada

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

While recent progress in video-text retrieval has been advanced by the exploration of better representation learning, in this paper, we present a novel multi-grained sparse learning framework, S3MA, to learn an aligned sparse space shared between the video and the text for video-text retrieval. The shared sparse space is initialized with a finite number of sparse concepts, each of which refers to a number of words. With the text data at hand, we learn and update the shared sparse space in a supervised manner using the proposed similarity and alignment losses. Moreover, to enable multi-grained alignment, we incorporate frame representations for better modeling the video modality and calculating fine-grained and coarse-grained similarities. Benefiting from the learned shared sparse space and multi-grained similarities, extensive experiments on several video-text retrieval benchmarks demonstrate the superiority of S3MA over existing methods. Our code is available at link.

Pages: 633 - 649

Page count: 17

50 records in total

[41] Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Chen, Yizhen
Wang, Jie
Lin, Lijian
Qi, Zhongang
Ma, Jin
Shan, Ying
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023 : 396 - 404
[42] Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
Mithun, Niluthpol Chowdhury
Li, Juncheng
Metze, Florian
Roy-Chowdhury, Amit K.
ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018 : 19 - 27
[44] Multi-grained unsupervised evidence retrieval for question answering
You, Hao
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (28) : 21247 - 21257
[45] Video-Text Pre-training with Learned Regions for Retrieval
Yan, Rui
Shou, Mike Zheng
Ge, Yixiao
Wang, Jinpeng
Lin, Xudong
Cai, Guanyu
Tang, Jinhui
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023 : 3100 - 3108
[46] Mask to Reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Fang, Han
Yang, Zhifei
Zang, Xianghao
Ban, Chao
He, Zhongjiang
Sun, Hao
Zhou, Lanxiang
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 3847 - 3856
[47] Dual Alignment Unsupervised Domain Adaptation for Video-Text Retrieval
Hao, Xiaoshuai
Zhang, Wanqian
Wu, Dayan
Zhu, Fei
Li, Bo
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 18962 - 18972
[48] Adaptive Token Excitation with Negative Selection for Video-Text Retrieval
Yu, Juntao
Ni, Zhangkai
Su, Taiyi
Wang, Hanli
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260 : 349 - 361
[49] Uncertainty-Aware with Negative Samples for Video-Text Retrieval
Song, Weitao
Chen, Weiran
Xu, Jialiang
Ji, Yi
Li, Ying
Liu, Chunping
PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 318 - 332
[50] Using Multimodal Contrastive Knowledge Distillation for Video-Text Retrieval
Ma, Wentao
Chen, Qingchao
Zhou, Tongqing
Zhao, Shan
Cai, Zhiping
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5486 - 5497