Improving Video Retrieval Using Multilingual Knowledge Transfer

Cited by: 2

Authors:

Madasu, Avinash [1,2]

Aflalo, Estelle [1]

Stan, Gabriela Ben Melech [1]

Tseng, Shao-Yen [1]

Bertasius, Gedas [2]

Lal, Vasudev [1]

Affiliations:

[1] Intel Labs, Cognit Comp Res, Santa Clara, CA 95054 USA

[2] Univ North Carolina Chapel Hill, Chapel Hill, NC USA

Source:

ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT I | 2023 / Vol. 13980

Keywords:

Video-retrieval; Multi-lingual; Multi-modal;

DOI:

10.1007/978-3-031-28244-7_42

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Video retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which takes a huge manual effort to produce. In this paper, we propose a framework, MKTVR, that utilizes knowledge transfer from a multilingual model to boost the performance of video retrieval. We first use state-of-the-art machine translation models to construct pseudo ground-truth multilingual video-text pairs. We then use this data to learn a video-text representation where English and non-English text queries are represented in a common embedding space based on pretrained multilingual models. We evaluate our proposed approach on four English video retrieval datasets: MSRVTT, MSVD, DiDeMo and Charades. Experimental results demonstrate that our approach achieves state-of-the-art results on all four datasets, outperforming previous models. Finally, we also evaluate our model on a multilingual video-retrieval dataset encompassing six languages and show that it outperforms previous multilingual video retrieval models in a zero-shot setting.
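The idea of placing video, English captions and machine-translated captions in one embedding space can be illustrated with a small sketch. The abstract does not state the training objective, so the symmetric contrastive (InfoNCE-style) loss below is an assumption, not the MKTVR implementation. The function names `info_nce` and `multilingual_video_text_loss` are hypothetical, and the toy 2-D vectors stand in for the outputs of real video and multilingual text encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, keys, temperature=0.05):
    """Mean cross-entropy of matching queries[i] to keys[i] against all keys."""
    queries = [normalize(q) for q in queries]
    keys = [normalize(k) for k in keys]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)


def multilingual_video_text_loss(video, text_en, text_xx, temperature=0.05):
    """Assumed objective: symmetric video-text loss averaged over the English
    captions and their machine-translated (pseudo ground-truth) counterparts."""
    losses = []
    for text in (text_en, text_xx):
        losses.append(info_nce(video, text, temperature))  # video -> text
        losses.append(info_nce(text, video, temperature))  # text -> video
    return sum(losses) / len(losses)


# Toy embeddings: row i of each list describes the same video.
video = [[1.0, 0.0], [0.0, 1.0]]
text_en = [[0.9, 0.1], [0.1, 0.9]]   # English captions
text_xx = [[0.8, 0.2], [0.2, 0.8]]   # translated captions (hypothetical)

aligned = multilingual_video_text_loss(video, text_en, text_xx)
shuffled = multilingual_video_text_loss(video, text_en[::-1], text_xx[::-1])
print(aligned < shuffled)
```

In this sketch, correctly paired captions in either language yield a lower loss than mismatched ones, which is the property a shared embedding space needs for retrieval with English and non-English queries.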


Pages: 669-684

Page count: 16
