Text-guided distillation learning to diversify video embeddings for text-video retrieval

Cited by: 0
Authors
Lee, Sangmin [1]
Kim, Hyung-Il [2]
Ro, Yong Man [3]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Elect & Telecommun Res Inst, Visual Intelligence Res Sect, Daejeon 34129, South Korea
[3] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon 34141, South Korea
Keywords
Text-video retrieval; Diverse video embedding; Text-guided distillation learning; Text-agnostic; One-to-many correspondence
DOI
10.1016/j.patcog.2024.110754
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conventional text-video retrieval methods typically match a video with a text in a one-to-one manner. However, a single video can contain diverse semantics, and text descriptions can vary significantly. Therefore, such methods fail to match a video with multiple texts simultaneously. In this paper, we propose a novel approach to tackle this one-to-many correspondence problem in text-video retrieval. We devise diverse temporal aggregation and a multi-key memory to address temporal and semantic diversity, thereby constructing multiple video embedding paths from a single video. Additionally, we introduce text-guided distillation learning that enables each video path to acquire meaningful, distinct competencies in representing varied semantics. Our video embedding approach is text-agnostic, so the prepared video embeddings can be reused for any new text query. Experiments show that our method outperforms existing methods on four datasets. We further validate the effectiveness of our designs with ablation studies and analyses of the diverse video embeddings.
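The one-to-many, text-agnostic retrieval setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the choice to score a video by the maximum cosine similarity over its embeddings are all assumptions made for illustration. The abstract does not say how the multiple video embeddings are combined at query time.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def video_score(text_emb, video_embs):
    """Score a video by its best-matching embedding.

    Taking the max over embedding paths is an assumption of this sketch,
    not a detail stated in the abstract.
    """
    return max(cosine(text_emb, emb) for emb in video_embs)


def rank_videos(text_emb, index):
    """Return video ids sorted from best to worst match for the query."""
    return sorted(
        index,
        key=lambda vid: video_score(text_emb, index[vid]),
        reverse=True,
    )


# Hypothetical toy index: each video stores several embeddings that were
# computed once, without access to any text query (text-agnostic).
index = {
    "video_a": [[1.0, 0.0], [0.0, 1.0]],  # two paths, two different semantics
    "video_b": [[0.7, 0.7], [0.6, 0.8]],
}

# Two different text queries (toy vectors) that both describe video_a.
ranking_1 = rank_videos([1.0, 0.0], index)
ranking_2 = rank_videos([0.0, 1.0], index)
print(ranking_1, ranking_2)
```

In this toy setup both queries rank `video_a` first, because each query aligns with a different one of its stored embeddings. A single averaged embedding for `video_a` would match neither query as closely.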
Pages: 10