Improving Video Retrieval Using Multilingual Knowledge Transfer

Cited by: 2
Authors
Madasu, Avinash [1 ,2 ]
Aflalo, Estelle [1 ]
Stan, Gabriela Ben Melech [1 ]
Tseng, Shao-Yen [1 ]
Bertasius, Gedas [2 ]
Lal, Vasudev [1 ]
Affiliations
[1] Intel Labs, Cognit Comp Res, Santa Clara, CA 95054 USA
[2] Univ North Carolina Chapel Hill, Chapel Hill, NC USA
Keywords
Video-retrieval; Multi-lingual; Multi-modal;
DOI
10.1007/978-3-031-28244-7_42
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Video retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which demands substantial manual effort. In this paper, we propose MKTVR, a framework that uses knowledge transfer from a multilingual model to boost the performance of video retrieval. We first use state-of-the-art machine translation models to construct pseudo ground-truth multilingual video-text pairs. We then use this data to learn a video-text representation in which English and non-English text queries are represented in a common embedding space based on pretrained multilingual models. We evaluate our approach on four English video retrieval datasets: MSRVTT, MSVD, DiDeMo and Charades. Experimental results demonstrate that our approach achieves state-of-the-art results on all four datasets, outperforming previous models. Finally, we evaluate our model on a multilingual video retrieval dataset encompassing six languages and show that it outperforms previous multilingual video retrieval models in a zero-shot setting.
Pages: 669-684
Number of pages: 16