Video Attribute Prototype Network: A New Perspective for Zero-Shot Video Classification

Cited by: 0
Authors
Wang, Bo [1]
Zhao, Kaili [1]
Zhao, Hongyang [1]
Pu, Shi
Xiao, Bo [1]
Guo, Jun [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
DOI
10.1109/ICCVW60793.2023.00039
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video attributes, which leverage video contents to instantiate class semantics, play a critical role in diversifying semantics in zero-shot video classification, thereby facilitating semantic transfer from seen to unseen classes. However, few existing works discuss video attributes, and most methods treat class names as class semantics, which tend to be loosely defined. In this paper, we propose a Video Attribute Prototype Network (VAPNet) that generates video attributes by learning in-context semantics between video captions and class semantics. Specifically, we introduce a cross-attention module in the Transformer decoder that uses video captions as queries to probe and pool semantically associated class-wise features. To alleviate noise in pre-extracted captions, we learn caption features through a stochastic representation derived from a Gaussian representation whose variance encodes uncertainty. We use a joint video-to-attribute and video-to-video contrastive loss to calibrate visual and semantic features. Experiments show that VAPNet significantly outperforms the state of the art (SoTA) by relative improvements of 14.3% on UCF101 and 8.8% on HMDB51, and further surpasses the pre-trained vision-language SoTA by 4.1% and 17.2%. Code is available.
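The three ingredients named in the abstract (a Gaussian caption representation, cross-attention with captions as queries, and a contrastive loss) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head attention with no learned projections, a diagonal Gaussian parameterized by a mean and log-variance, and an InfoNCE-style loss over cosine similarities. All function names, the temperature value, and the toy vectors are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_caption_feature(mean, log_var, rng):
    """Reparameterized draw z = mean + sigma * eps, with eps ~ N(0, 1).

    A larger log-variance means a noisier, less certain caption feature.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def cross_attention_pool(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    The caption feature is the query; class-wise features act as keys
    and values (an assumption made for this sketch).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim)]
    return pooled, weights


def contrastive_loss(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE-style loss: negative log-probability of the positive candidate."""
    def cosine(a, b):
        return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

    probs = softmax([cosine(anchor, c) / temperature for c in candidates])
    return -math.log(probs[positive_index])


rng = random.Random(0)
caption_mean = [1.0, 0.0, 0.0, 0.0]
caption_log_var = [-4.0, -4.0, -4.0, -4.0]  # small variance: low uncertainty
class_features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

query = sample_caption_feature(caption_mean, caption_log_var, rng)
attribute, weights = cross_attention_pool(query, class_features, class_features)
loss = contrastive_loss(attribute, class_features, positive_index=0)
```

In this toy setup the sampled caption query lies close to the first class feature, so the attention weights concentrate on that class and the pooled vector plays the role of a video attribute. A real model would add learned query, key, and value projections, multiple heads, and separate video-to-attribute and video-to-video terms.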
Pages: 315 - 324
Page count: 10
Related papers
50 in total
  • [31] Attention-Based Video Disentangling and Matching Network for Zero-Shot Action Recognition
    Su, Yong
    Zhu, Shuang
    Xing, Meng
    Xu, Hengpeng
    Li, Zhengtao
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, VOL. 1, 2022, 878 : 368 - 375
  • [32] Hierarchical Co-Attention Propagation Network for Zero-Shot Video Object Segmentation
    Pei, Gensheng
    Yao, Yazhou
    Shen, Fumin
    Huang, Dan
    Huang, Xingguo
    Shen, Heng-Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2348 - 2359
  • [33] MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation
    Zhou, Tianfei
    Li, Jianwu
    Wang, Shunzhou
    Tao, Ran
    Shen, Jianbing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 8326 - 8338
  • [34] Dual Prototype Contrastive Network for Generalized Zero-Shot Learning
    Jiang, Huajie
    Li, Zhengxian
    Hu, Yongli
    Yin, Baocai
    Yang, Jian
    van den Hengel, Anton
    Yang, Ming-Hsuan
    Qi, Yuankai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1111 - 1122
  • [35] Efficient and consistent zero-shot video generation with diffusion models
    Frakes, Ethan
    Khalid, Umar
    Chen, Chen
    REAL-TIME IMAGE PROCESSING AND DEEP LEARNING 2024, 2024, 13034
  • [36] Prompt-based Zero-shot Video Moment Retrieval
    Wang, Guolong
    Wu, Xun
    Liu, Zhaoyuan
    Yan, Junchi
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022
  • [37] Attribute subspaces for zero-shot learning
    Zhou, Lei
    Liu, Yang
    Bai, Xiao
    Li, Na
    Yu, Xiaohan
    Zhou, Jun
    Hancock, Edwin R.
    PATTERN RECOGNITION, 2023, 144
  • [38] Zero-Shot Learning with Attribute Selection
    Guo, Yuchen
    Ding, Guiguang
    Han, Jungong
    Tang, Sheng
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018 : 6870 - 6877
  • [39] Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization
    Pang, Zongshang
    Nakashima, Yuta
    Otani, Mayu
    Nagahara, Hajime
    JOURNAL OF IMAGING, 2024, 10 (09)
  • [40] SKETCHQL Demonstration: Zero-shot Video Moment Querying with Sketches
    Wu, Renzhi
    Chunduri, Pramod
    Shah, Dristi J
    Aravind, Ashmitha Julius
    Payani, Ali
    Chu, Xu
    Arulraj, Joy
    Rong, Kexin
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (12) : 4429 - 4432