Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

被引:4
|
作者
Lu, Yu [1 ]
Quan, Ruijie [2 ]
Zhu, Linchao [2 ]
Yang, Yi [2 ]
机构
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
基金
澳大利亚研究理事会;
关键词
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION;
D O I
10.1109/TIP.2024.3365249
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
引用
收藏
页码:1643 / 1654
页数:12
相关论文
共 50 条
  • [1] Language-free Training for Zero-shot Video Grounding
    Kim, Dahye
    Park, Jungin
    Lee, Jiyoung
    Park, Seongheon
    Sohn, Kwanghoon
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2538 - 2547
  • [2] Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities
    Wang, Ping
    Sun, Li
    Wang, Liuan
    Sun, Jun
    SUSTAINABILITY, 2023, 15 (01)
  • [3] Zero-shot Query Contextualization for Conversational Search
    Krasakis, Antonios Minas
    Yates, Andrew
    Kanoulas, Evangelos
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 1880 - 1884
  • [4] Zero-shot Query Reformulation for Conversational Search
    Yang, Dayu
    Zhang, Yue
    Fang, Hui
    PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023, 2023, : 257 - 263
  • [5] VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
    Xu, Yifang
    Sun, Yunzhuo
    Xie, Zien
    Zhai, Benxiang
    Du, Sidan
    APPLIED SCIENCES-BASEL, 2024, 14 (05):
  • [6] Zero-shot Fact Verification by Claim Generation
    Pan, Liangming
    Chen, Wenhu
    Xiong, Wenhan
    Kan, Min-Yen
    Wang, William Yang
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 476 - 483
  • [7] Zero-shot Natural Language Video Localization
    Nam, Jinwoo
    Ahn, Daechul
    Kang, Dongyeop
    Ha, Seong Jong
    Choi, Jonghyun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1450 - 1459
  • [8] Zero-Shot Grounding of Objects from Natural Language Queries
    Sadhu, Arka
    Chen, Kan
    Nevatia, Ram
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4693 - 4702
  • [9] VidToMe: Video Token Merging for Zero-Shot Video Editing
    Li, Xirui
    Ma, Chao
    Yang, Xiaokang
    Yang, Ming-Hsuan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 7486 - 7495
  • [10] Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization
    Zheng, Minghang
    Gong, Shaogang
    Jin, Hailin
    Peng, Yuxin
    Liu, Yang
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14197 - 14209