Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 4
Authors
Lu, Yu [1]
Quan, Ruijie [2]
Zhu, Linchao [2]
Yang, Yi [2]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding
Australian Research Council
Keywords
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION
DOI
10.1109/TIP.2024.3365249
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
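The two stages named in the abstract, lookup and verification, can be illustrated with a toy sketch. This is not the authors' implementation. The hand-written embedding vectors, the function names `lookup` and `verify`, the cosine-similarity scoring, and the margin-based verification rule are all assumptions made only for illustration; the abstract does not specify how retrieval or verification is actually computed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def lookup(proposal_emb, concept_pool, top_k=2):
    """Stage 1 (hypothetical): retrieve the top-k concepts closest to a video proposal."""
    ranked = sorted(
        concept_pool,
        key=lambda name: cosine(proposal_emb, concept_pool[name]),
        reverse=True,
    )
    return ranked[:top_k]

def verify(candidates, proposal_emb, context_emb, concept_pool, margin=0.1):
    """Stage 2 (hypothetical): keep a concept only if it matches the proposal
    better than the surrounding video context by at least `margin`."""
    kept = []
    for name in candidates:
        emb = concept_pool[name]
        if cosine(proposal_emb, emb) - cosine(context_emb, emb) > margin:
            kept.append(name)
    return kept

# Toy data (assumed): an open-concept pool and embeddings for one proposal
# and for the rest of the video.
concept_pool = {
    "open door": [1.0, 0.0, 0.0],
    "person": [0.6, 0.6, 0.0],
    "drink water": [0.0, 0.0, 1.0],
}
proposal_emb = [0.9, 0.3, 0.0]
context_emb = [0.3, 0.9, 0.1]

candidates = lookup(proposal_emb, concept_pool, top_k=2)
pseudo_query_concepts = verify(candidates, proposal_emb, context_emb, concept_pool)
print(candidates)             # ['open door', 'person']
print(pseudo_query_concepts)  # ['open door']
```

In this toy run, "person" is retrieved but then discarded because it describes the whole video about as well as the proposal, while "open door" is specific to the proposal and survives as a pseudo-query concept.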
Pages: 1643-1654
Page count: 12
Related papers
50 in total
  • [21] Pseudo Transfer with Marginalized Corrupted Attribute for Zero-shot Learning
    Long, Teng
    Xu, Xing
    Li, Youyou
    Shen, Fumin
    Song, Jingkuan
    Shen, Heng Tao
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1802 - 1810
  • [22] GenQREnsemble: Zero-Shot LLM Ensemble Prompting for Generative Query Reformulation
    Dhole, Kaustubh D.
    Agichtein, Eugene
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT III, 2024, 14610 : 326 - 335
  • [23] DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning
    Chen, Zhuo
    Huang, Yufeng
    Chen, Jiaoyan
    Geng, Yuxia
    Zhang, Wen
    Fang, Yin
    Pan, Jeff Z.
    Chen, Huajun
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 405 - 413
  • [24] Zero-shot visual grounding via coarse-to-fine representation learning
    Mi, Jinpeng
    Jin, Shaofei
    Chen, Zhiqian
    Liu, Dan
    Wei, Xian
    Zhang, Jianwei
    NEUROCOMPUTING, 2024, 610
  • [25] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
    Yang, Shuai
    Zhou, Yifan
    Liu, Ziwei
    Loy, Chen Change
    PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS, 2023,
  • [26] Efficient and consistent zero-shot video generation with diffusion models
    Frakes, Ethan
    Khalid, Umar
    Chen, Chen
    REAL-TIME IMAGE PROCESSING AND DEEP LEARNING 2024, 2024, 13034
  • [27] Prompt-based Zero-shot Video Moment Retrieval
    Wang, Guolong
    Wu, Xun
    Liu, Zhaoyuan
    Yan, Junchi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [28] Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization
    Pang, Zongshang
    Nakashima, Yuta
    Otani, Mayu
    Nagahara, Hajime
    JOURNAL OF IMAGING, 2024, 10 (09)
  • [29] SKETCHQL Demonstration: Zero-shot Video Moment Querying with Sketches
    Wu, Renzhi
    Chunduri, Pramod
    Shah, Dristi J.
    Aravind, Ashmitha Julius
    Payani, Ali
    Chu, Xu
    Arulraj, Joy
    Rong, Kexin
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (12): 4429 - 4432
  • [30] Video Attribute Prototype Network: A New Perspective for Zero-Shot Video Classification
    Wang, Bo
    Zhao, Kaili
    Zhao, Hongyang
    Pu, Shi
    Xiao, Bo
    Guo, Jun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 315 - 324