Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 4
Authors
Lu, Yu [1]
Quan, Ruijie [2]
Zhu, Linchao [2]
Yang, Yi [2]
Affiliations
[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia
[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China
Funding
Australian Research Council;
Keywords
Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION;
DOI
10.1109/TIP.2024.3365249
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised video grounding approaches require large amounts of annotated data, which are expensive and time-consuming to collect. Recently, zero-shot video grounding (ZS-VG) methods have been developed that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models. However, these approaches are limited in recognizing diverse categories and in capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach extracts diverse concepts from an open-concept pool and employs a verification process to ensure that the retrieved concepts are relevant to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
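The abstract frames pseudo-query generation as video-to-concept retrieval followed by verification. The following is a minimal sketch of that general two-stage idea under stated assumptions; it is not the LoVe implementation and does not reproduce the paper's models or scoring. The toy embeddings, the concept pool, the mean-pooled cosine lookup, and the per-frame support rule used for verification (`frame_thresh`, `min_support`) are all hypothetical stand-ins chosen for illustration; in practice the vectors would come from some pretrained encoder, which is simply assumed here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical types: an embedding is just a sequence of floats in this sketch.
Vector = Sequence[float]
ConceptPool = Dict[str, Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pool(frames: List[Vector]) -> List[float]:
    """Average per-frame embeddings into a single proposal-level embedding."""
    if not frames:
        raise ValueError("a proposal needs at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def lookup(frames: List[Vector], pool: ConceptPool, top_k: int) -> List[Tuple[str, float]]:
    """Stage 1 (lookup): rank every concept against the pooled proposal, keep top_k."""
    query = mean_pool(frames)
    scored = [(name, cosine(query, emb)) for name, emb in pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def verify(frames: List[Vector], pool: ConceptPool,
           candidates: List[Tuple[str, float]],
           frame_thresh: float, min_support: float) -> List[str]:
    """Stage 2 (verification, hypothetical rule): keep a concept only if a
    sufficient fraction of individual frames match it on their own."""
    kept = []
    for name, _score in candidates:
        hits = sum(1 for f in frames if cosine(f, pool[name]) >= frame_thresh)
        if hits / len(frames) >= min_support:
            kept.append(name)
    return kept


def pseudo_query(frames: List[Vector], pool: ConceptPool, top_k: int = 3,
                 frame_thresh: float = 0.8, min_support: float = 0.5) -> str:
    """Join the verified concepts, in lookup order, into a bag-of-concepts pseudo-query."""
    candidates = lookup(frames, pool, top_k)
    return " ".join(verify(frames, pool, candidates, frame_thresh, min_support))


if __name__ == "__main__":
    # Toy 3-D "embeddings" (assumed); a real system would use a learned encoder.
    pool = {"person": [1.0, 0.0, 0.0], "door": [0.0, 1.0, 0.0], "dog": [0.0, 0.0, 1.0]}
    frames = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(pseudo_query(frames, pool))                   # person
    print(pseudo_query(frames, pool, min_support=0.3))  # person door
```

In the toy run, "door" survives lookup but is supported by only one of three frames, so the default `min_support=0.5` filters it out; lowering the threshold to 0.3 lets it through. Separating a cheap pooled lookup from a stricter per-frame check is only one possible way to realize "lookup then verify", picked here for clarity.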
Pages: 1643-1654
Page count: 12
Related papers
50 in total
  • [41] Zero-Shot Video Classification Combined with 3D DenseNet
    Yin M.
    Zhao X.
    Guo S.
    Chen Z.
    Zhang J.
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2023, 48 (03): 480 - 488
  • [42] Zero-shot Video Classification with Appropriate Web and Task Knowledge Transfer
    Zhuo, Junbao
    Zhu, Yan
    Cui, Shuhao
    Wang, Shuhui
    Ma, Bin
    Huang, Qingming
    Wei, Xiaoming
    Wei, Xiaolin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5761 - 5772
  • [43] Zero-shot Video Moment Retrieval With Off-the-Shelf Models
    Diwan, Anuj
    Peng, Puyuan
    Mooney, Raymond J.
    TRANSFER LEARNING FOR NATURAL LANGUAGE PROCESSING WORKSHOP, VOL 203, 2022, 203 : 10 - 21
  • [44] Zero-Shot Video Moment Retrieval With Angular Reconstructive Text Embeddings
    Jiang, Xun
    Xu, Xing
    Zhou, Zailei
    Yang, Yang
    Shen, Fumin
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 9657 - 9670
  • [45] Motion-Attentive Transition for Zero-Shot Video Object Segmentation
    Zhou, Tianfei
    Wang, Shunzhou
    Zhou, Yi
    Yao, Yazhou
    Li, Jianwu
    Shao, Ling
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13066 - 13073
  • [46] ReGen: A good Generative zero-shot video classifier should be Rewarded
    Bulat, Adrian
    Sanchez, Enrique
    Martinez, Brais
    Tzimiropoulos, Georgios
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13477 - 13487
  • [47] Hierarchical Graph Pattern Understanding for Zero-Shot Video Object Segmentation
    Pei, Gensheng
    Shen, Fumin
    Yao, Yazhou
    Chen, Tao
    Hua, Xian-Sheng
    Shen, Heng-Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 5909 - 5920
  • [48] Generalized Zero-Shot Video Classification via Generative Adversarial Networks
    Hong, Mingyao
    Li, Guorong
    Zhang, Xinfeng
    Huang, Qingming
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2419 - 2426
  • [49] Semantic matters: A constrained approach for zero-shot video action recognition
    Quan, Zhenzhen
    Chen, Jialei
    Deguchi, Daisuke
    Sun, Jie
    Zhang, Chenkai
    Li, Yujun
    Murase, Hiroshi
    PATTERN RECOGNITION, 2025, 162
  • [50] Zero-Shot Learning on Human-Object Interaction Recognition in video
    Maraghi, Vali Ollah
    Faez, Karim
    2019 5TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS 2019), 2019,