Improving Video Retrieval Using Multilingual Knowledge Transfer

Cited by: 2
Authors
Madasu, Avinash [1, 2]
Aflalo, Estelle [1]
Stan, Gabriela Ben Melech [1]
Tseng, Shao-Yen [1]
Bertasius, Gedas [2]
Lal, Vasudev [1]
Affiliations
[1] Intel Labs, Cognit Comp Res, Santa Clara, CA 95054 USA
[2] Univ North Carolina Chapel Hill, Chapel Hill, NC USA
Keywords
Video-retrieval; Multi-lingual; Multi-modal;
DOI
10.1007/978-3-031-28244-7_42
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Video retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which takes substantial manual effort to produce. In this paper, we propose MKTVR, a framework that uses knowledge transfer from a multilingual model to boost the performance of video retrieval. We first use state-of-the-art machine translation models to construct pseudo ground-truth multilingual video-text pairs. We then use this data to learn a video-text representation in which English and non-English text queries are represented in a common embedding space based on pretrained multilingual models. We evaluate our proposed approach on four English video retrieval datasets: MSRVTT, MSVD, DiDeMo and Charades. Experimental results demonstrate that our approach achieves state-of-the-art results on all four datasets, outperforming previous models. Finally, we also evaluate our model on a multilingual video retrieval dataset covering six languages and show that it outperforms previous multilingual video retrieval models in a zero-shot setting.
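The idea of placing videos, English captions and machine-translated captions in one embedding space can be illustrated with a minimal sketch. It is not the MKTVR implementation: the abstract does not state the training objective, so the symmetric contrastive loss, the temperature value, and the function names (`contrastive_loss`, `multilingual_loss`, `rank_videos`) are all assumptions made here for illustration, and plain Python lists stand in for the outputs of real video and multilingual text encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(videos, texts):
    """Cosine similarity between every video and every text embedding."""
    vs = [l2_normalize(v) for v in videos]
    ts = [l2_normalize(t) for t in texts]
    return [[sum(a * b for a, b in zip(v, t)) for t in ts] for v in vs]

def contrastive_loss(videos, texts, temperature=0.05):
    """Symmetric contrastive loss (assumed objective): video i should match text i."""
    sim = similarity_matrix(videos, texts)
    n = len(sim)

    def directional_loss(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_z - logits[i]
        return total / n

    columns = [list(c) for c in zip(*sim)]  # text-to-video direction
    return 0.5 * (directional_loss(sim) + directional_loss(columns))

def multilingual_loss(videos, captions_by_lang, temperature=0.05):
    """Average the video-text loss over English and each pseudo-translated language."""
    losses = [contrastive_loss(videos, caps, temperature)
              for caps in captions_by_lang.values()]
    return sum(losses) / len(losses)

def rank_videos(query, videos):
    """Return video indices sorted from most to least similar to one text query."""
    scores = [row[0] for row in similarity_matrix(videos, [query])]
    return sorted(range(len(videos)), key=lambda i: -scores[i])

# Toy embeddings (hypothetical values, not real encoder outputs).
videos = [[1.0, 0.0], [0.0, 1.0]]
aligned = {"en": [[0.9, 0.1], [0.1, 0.9]], "fr": [[0.8, 0.2], [0.2, 0.8]]}
swapped = {"en": [[0.1, 0.9], [0.9, 0.1]], "fr": [[0.2, 0.8], [0.8, 0.2]]}
print(multilingual_loss(videos, aligned) < multilingual_loss(videos, swapped))
print(rank_videos([0.8, 0.2], videos))
```

Because every language is scored against the same video embeddings, a query in any of the languages can be ranked against the videos with the same `rank_videos` call, which is the property a shared embedding space is meant to provide.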
Pages: 669-684
Page count: 16
Related papers
50 in total
  • [1] Improving Deliverable Speech-to-text Systems with Multilingual Knowledge Transfer
    Ma, Jeff
    Keith, Francis
    Ng, Tim
    Siu, Man-hung
    Kimball, Owen
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 127 - 131
  • [2] Using knowledge representation languages for video annotation and retrieval
    Bertini, M.
    D'Amico, G.
    Del Bimbo, A.
    Torniai, C.
    FLEXIBLE QUERY ANSWERING SYSTEMS, PROCEEDINGS, 2006, 4027 : 634 - 646
  • [3] Multilingual Information Retrieval using GHSOM
    Yang, Hsin-Chang
    Lee, Chung-Hong
    ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 1, PROCEEDINGS, 2008, : 225 - +
  • [4] Improving Causality in Interpretable Video Retrieval
    Devi, Varsha
    Mulhem, Philippe
    Quenot, Georges
    20TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2023, 2023, : 249 - 255
  • [5] Improving Video Retrieval by Adaptive Margin
    He, Feng
    Wang, Qi
    Feng, Zhifan
    Jiang, Wenbin
    Lu, Yajuan
    Zhu, Yong
    Tan, Xiao
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1359 - 1368
  • [6] Improving multimedia retrieval with a video OCR
    Das, Dipanjan
    Chen, Datong
    Hauptmann, Alexander G.
    MULTIMEDIA CONTENT ACCESS: ALGORITHMS AND SYSTEMS II, 2008, 6820
  • [7] Using Multimodal Contrastive Knowledge Distillation for Video-Text Retrieval
    Ma, Wentao
    Chen, Qingchao
    Zhou, Tongqing
    Zhao, Shan
    Cai, Zhiping
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5486 - 5497
  • [8] Multilingual Knowledge Graph Completion via Ensemble Knowledge Transfer
    Chen, Xuelu
    Chen, Muhao
    Fan, Changjun
    Uppunda, Ankith
    Sun, Yizhou
    Zaniolo, Carlo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020,
  • [9] Knowledge Distillation-Based Multilingual Fusion Code Retrieval
    Li, Wen
    Xu, Junfei
    Chen, Qi
    ALGORITHMS, 2022, 15 (01)
  • [10] VIDLANKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
    Tang, Zineng
    Cho, Jaemin
    Tan, Hao
    Bansal, Mohit
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,