Zero-Shot Video Grounding With Pseudo Query Lookup and Verification

Cited by: 4

Authors:

Lu, Yu [1]

Quan, Ruijie [2]

Zhu, Linchao [2]

Yang, Yi [2]

Affiliations:

[1] Univ Technol Sydney, Australian Artificial Intelligence Inst, ReLER Lab, Ultimo, NSW 2007, Australia

[2] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou 310027, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Funding:

Australian Research Council;

Keywords:

Grounding; Detectors; Proposals; Training; Task analysis; Visualization; Semantics; Video grounding; zero-shot learning; vision and language; NETWORK; LOCALIZATION;

DOI:

10.1109/TIP.2024.3365249

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed. However, these approaches have limitations in recognizing diverse categories and capturing specific dynamics and interactions in the video context. To tackle these challenges, we introduce a novel two-stage ZS-VG framework called Lookup-and-Verification (LoVe), which treats the pseudo-query generation procedure as a video-to-concept retrieval problem. Our approach allows for the extraction of diverse concepts from an open-concept pool and employs a verification process to ensure the relevance of the retrieved concepts to the objects or events of interest in the video proposals. Comprehensive experimental results on the Charades-STA, ActivityNet-Captions, and DiDeMo datasets demonstrate the effectiveness of the LoVe framework.
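
The lookup-then-verify idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the embeddings, the concept pool, the function names `lookup` and `verify`, and the margin-based verification rule are all hypothetical choices made here for illustration. It assumes that proposals and concepts share one embedding space, retrieves the top-k concepts for a proposal by cosine similarity, and then keeps only the concepts that match the proposal clearly better than they match the rest of the video.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup(proposal_vec, concept_pool, k=2):
    """Stage 1 (hypothetical): retrieve the k concepts closest to the proposal."""
    scored = [(name, cosine(proposal_vec, vec)) for name, vec in concept_pool.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def verify(candidates, concept_pool, context_vec, margin=0.1):
    """Stage 2 (hypothetical rule): keep a concept only if it matches the
    proposal better than the surrounding video context by at least `margin`."""
    kept = []
    for name, proposal_score in candidates:
        context_score = cosine(context_vec, concept_pool[name])
        if proposal_score - context_score >= margin:
            kept.append(name)
    return kept


# Toy 3-dimensional embeddings; real systems would use a pre-trained
# vision-language encoder (an assumption, not stated in this record).
concept_pool = {
    "open door": [1.0, 0.0, 0.0],
    "drink water": [0.0, 1.0, 0.0],
    "person": [1.0, 1.0, 1.0],
}
proposal_vec = [0.9, 0.1, 0.2]  # embedding of one video proposal
context_vec = [0.1, 0.8, 0.3]   # embedding of the rest of the video

candidates = lookup(proposal_vec, concept_pool, k=2)
pseudo_query = verify(candidates, concept_pool, context_vec)
print(pseudo_query)
```

In this toy setup the generic concept "person" is retrieved but then discarded, because it describes the whole video about as well as the proposal, while "open door" survives as the pseudo query.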

Pages: 1643-1654

Number of pages: 12
