Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Times cited: 4
Authors
Lu, Yu [1]
Quan, Ruijie [2]
Zhu, Linchao [2]
Yang, Yi [2]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding
Australian Research Council
Keywords
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; Zero-shot learning; Vision and language; Network; Localization
DOI
10.1109/TIP.2024.3365249
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video grounding, the task of localizing the moment in an untrimmed video that corresponds to a natural language query, has become a popular topic in video understanding. However, fully supervised video grounding approaches require large amounts of annotated data, which are expensive and time-consuming to collect. Recently, zero-shot video grounding (ZS-VG) methods have been developed that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models. These approaches, however, are limited in recognizing diverse categories and in capturing the specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats pseudo-query generation as a video-to-concept retrieval problem. Our approach extracts diverse concepts from an open-concept pool and employs a verification process to ensure that the retrieved concepts are relevant to the objects or events of interest in the video proposals. Comprehensive experiments on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
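The abstract frames pseudo-query generation as a lookup (video-to-concept retrieval) followed by verification. As a rough illustration only, the sketch below assumes that each video proposal and each concept in the open-concept pool has already been embedded in a shared vector space (the embeddings here are toy 2-D values). It uses top-k cosine-similarity retrieval for the lookup step and substitutes a simple score threshold for the verification step. The function names `lookup`, `verify`, and `pseudo_query` and the parameters `k` and `threshold` are hypothetical; this is not the authors' LoVe implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lookup(proposal_emb, concept_pool, k):
    """Hypothetical lookup step: rank every concept in the pool by cosine
    similarity to the proposal embedding and return the top k (name, score) pairs."""
    scored = [(name, cosine(proposal_emb, emb)) for name, emb in concept_pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def verify(candidates, threshold):
    """Hypothetical verification step (a plain threshold, assumed for illustration):
    keep only retrieved concepts whose similarity score clears the threshold."""
    return [name for name, score in candidates if score >= threshold]


def pseudo_query(proposal_emb, concept_pool, k=3, threshold=0.5):
    """Compose a pseudo query from the verified concepts, in ranked order."""
    return " ".join(verify(lookup(proposal_emb, concept_pool, k), threshold))


if __name__ == "__main__":
    # Toy concept pool; real embeddings would come from a pretrained encoder (assumption).
    pool = {
        "person": [1.0, 0.0],
        "door": [0.0, 1.0],
        "opening": [0.7, 0.7],
        "car": [-1.0, 0.0],
    }
    print(pseudo_query([1.0, 0.2], pool))
```

With these toy vectors, the lookup returns "person", "opening", and "door" as the three nearest concepts, and the threshold discards "door" (similarity of roughly 0.2). This leaves the pseudo query "person opening". The two-stage split mirrors the abstract's description: retrieval keeps the concept vocabulary open, and a separate check filters out retrieved concepts that do not actually match the proposal.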
Pages: 1643-1654
Number of pages: 12