An Empirical Study of Frame Selection for Text-to-Video Retrieval

Citations: 0
Authors
Wu, Mengxia [1]
Cao, Min [1]
Bai, Yang [1]
Zeng, Ziyin [1]
Chen, Chen [2]
Nie, Liqiang [3]
Zhang, Min [1]
Affiliations
[1] Soochow Univ, Suzhou, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Harbin Inst Technol, Shenzhen, Peoples R China
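Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract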
Text-to-video retrieval (TVR) aims to find the most relevant video in a large video gallery given a query text. The intricate and abundant context of a video challenges both the performance and the efficiency of TVR. To handle the serialized video contexts, existing methods typically select a subset of frames within a video to represent its content. How to select the most representative frames is a crucial issue: the selected frames must not only retain the semantic information of the video but also improve retrieval efficiency by excluding temporally redundant frames. In this paper, we present the first empirical study of frame selection for TVR. We systematically classify existing frame selection methods into text-free and text-guided ones, and within this taxonomy we analyze six frame selection methods in detail in terms of effectiveness and efficiency. Two of these methods are newly proposed in this paper. Based on a comprehensive analysis on multiple TVR benchmarks, we empirically conclude that TVR with proper frame selection can significantly improve retrieval efficiency without sacrificing retrieval performance.
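The abstract distinguishes text-free from text-guided frame selection without giving details of the six methods studied. The sketch below is only an illustration of that distinction, not the authors' methods: it assumes uniform sampling as a typical text-free strategy and top-k cosine similarity against a query embedding as a typical text-guided one. The function names (uniform_select, text_guided_select) and the toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def uniform_select(num_frames: int, k: int) -> List[int]:
    """Text-free (assumed example): pick k indices spread evenly over the video."""
    if k >= num_frames:
        return list(range(num_frames))
    # Take the centre of each of k equal-length temporal segments.
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def text_guided_select(
    frame_embs: Sequence[Sequence[float]],
    text_emb: Sequence[float],
    k: int,
) -> List[int]:
    """Text-guided (assumed example): keep the k frames most similar to the query."""
    scores = [cosine(f, text_emb) for f in frame_embs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Return indices in temporal order so downstream models see frames in sequence.
    return sorted(top)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real frame and text features.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    query = [1.0, 0.0]
    print(uniform_select(12, 4))
    print(text_guided_select(frames, query, 2))
```

In this sketch the text-free selector needs only the frame count, so it can be computed once per video offline, whereas the text-guided selector depends on the query and must be evaluated per text-video pair. That difference is one reason the efficiency trade-off mentioned in the abstract arises.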
Pages: 6821-6832
Page count: 12