Video Attribute Prototype Network: A New Perspective for Zero-Shot Video Classification

Times cited: 0

Authors:

Wang, Bo [1]

Zhao, Kaili [1]

Zhao, Hongyang [1]

Pu, Shi

Xiao, Bo [1]

Guo, Jun [1]

Affiliation:

[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW | 2023

DOI:

10.1109/ICCVW60793.2023.00039

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Video attributes, which leverage video contents to instantiate class semantics, play a critical role in diversifying semantics in zero-shot video classification, thereby facilitating semantic transfer from seen to unseen classes. However, few prior works discuss video attributes, and most methods use class names as class semantics, which tend to be loosely defined. In this paper, we propose a Video Attribute Prototype Network (VAPNet) that generates video attributes by learning in-context semantics between video captions and class semantics. Specifically, we introduce a cross-attention module in the Transformer decoder that treats video captions as queries to probe and pool semantically associated class-wise features. To mitigate noise in pre-extracted captions, we learn caption features through a stochastic representation derived from a Gaussian distribution whose variance encodes uncertainty. We use a joint video-to-attribute and video-to-video contrastive loss to calibrate visual and semantic features. Experiments show that VAPNet significantly outperforms the state of the art (SoTA), with relative improvements of 14.3% on UCF101 and 8.8% on HMDB51, and further surpasses the pre-trained vision-language SoTA by 4.1% and 17.2%, respectively. Code is available.
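The three mechanisms named in the abstract (a Gaussian caption representation, cross-attention with captions as queries, and a contrastive objective) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the use of a log-variance parameter, the absence of learned projection matrices, the single-sample InfoNCE-style loss, and the temperature value are all assumptions made for illustration.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1).

    A larger variance models a noisier (less certain) caption feature.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_pool(query, class_feats):
    """Caption feature (query) attends over class-wise features (keys/values).

    Assumption: identity projections; a real model would learn Q/K/V maps.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in class_feats])
    pooled = [sum(w * k[i] for w, k in zip(weights, class_feats))
              for i in range(len(query))]
    return pooled, weights


def contrastive_loss(video, attributes, label, temperature=0.1):
    """InfoNCE-style loss for one video against all class attributes."""
    def unit(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    v = unit(video)
    logits = [dot(v, unit(a)) / temperature for a in attributes]
    return -math.log(softmax(logits)[label])


rng = random.Random(0)
class_feats = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical class features
caption = sample_gaussian([1.0, 0.0], [-2.0, -2.0], rng)
attribute, weights = cross_attention_pool(caption, class_feats)
loss = contrastive_loss([0.9, 0.1], class_feats, label=0)
```

In this toy setup the loss is lower when the video feature is paired with the attribute of its own class than with another class, which is the calibration behavior a contrastive objective is meant to encourage.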

Pages: 315-324

Page count: 10
