An Empirical Study of Frame Selection for Text-to-Video Retrieval

Cited by: 0

Authors:

Wu, Mengxia [1]

Cao, Min [1]

Bai, Yang [1]

Zeng, Ziyin [1]

Chen, Chen [2]

Nie, Liqiang [3]

Zhang, Min [1]

Affiliations:

[1] Soochow Univ, Suzhou, Peoples R China

[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[3] Harbin Inst Technol, Shenzhen, Peoples R China

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023 | 2023

Funding:

National Science Foundation (USA);

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text-to-video retrieval (TVR) aims to find the most relevant video in a large video gallery given a query text. The intricate and abundant context of the video challenges the performance and efficiency of TVR. To handle the serialized video contexts, existing methods typically select a subset of frames within a video to represent the video content for TVR. How to select the most representative frames is a crucial issue: the selected frames must not only retain the semantic information of the video but also promote retrieval efficiency by excluding temporally redundant frames. In this paper, we present the first empirical study of frame selection for TVR. We systematically classify existing frame selection methods into text-free and text-guided ones, and under this taxonomy we analyze six frame selection methods in detail in terms of effectiveness and efficiency. Two of these methods are newly developed in this paper. Based on a comprehensive analysis on multiple TVR benchmarks, we empirically conclude that TVR with proper frame selection can significantly improve retrieval efficiency without sacrificing retrieval performance.


Pages: 6821-6832

Page count: 12

