Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels

Cited by: 1
Authors
Wan, Bo [1]
Tuytelaars, Tinne [1]
Affiliations
[1] Katholieke Univ Leuven, ESAT, Leuven, Belgium
Funding
European Research Council
DOI
10.1109/WACV57701.2024.00182
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the task of zero-shot human-object interaction (HOI) detection, a novel paradigm for identifying HOIs without the need for task-specific annotations. To address this challenging task, we employ CLIP, a large-scale pre-trained vision-language model (VLM), for knowledge distillation on multiple levels. Specifically, we design a multi-branch neural network that leverages CLIP for learning HOI representations at various levels, including global images, local union regions encompassing human-object pairs, and individual instances of humans or objects. To train our model, CLIP is utilized to generate HOI scores for both global images and local union regions that serve as supervision signals. The extensive experiments demonstrate the effectiveness of our novel multi-level CLIP knowledge integration strategy. Notably, the model achieves strong performance, which is even comparable with some fully-supervised and weakly-supervised methods on the public HICO-DET benchmark. Code is available at https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP.
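The supervision idea in the abstract, where CLIP produces HOI scores for global images and union regions that then train the detector, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the function names, the temperature value, the use of a softmax over prompts, and the KL-divergence loss are all hypothetical choices. The embeddings are plain Python lists standing in for CLIP image and text features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_hoi_scores(visual_embedding, prompt_embeddings, temperature=0.1):
    """Soft HOI scores from a frozen teacher (hypothetical helper).

    visual_embedding: feature of a global image or a union region.
    prompt_embeddings: one text feature per HOI class prompt.
    temperature: assumed value, not taken from the paper.
    """
    sims = [cosine(visual_embedding, p) for p in prompt_embeddings]
    return softmax([s / temperature for s in sims])


def distillation_loss(student_logits, teacher_scores):
    """KL(teacher || student); an assumed loss, used here for illustration."""
    student = softmax(student_logits)
    return sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(teacher_scores, student)
        if t > 0
    )


def multi_level_loss(global_pair, union_pair):
    """Sum the losses of the global-image and union-region branches.

    Each argument is a (student_logits, teacher_scores) tuple.
    """
    return distillation_loss(*global_pair) + distillation_loss(*union_pair)
```

In this sketch the teacher scores are computed once per image or region and treated as fixed targets, while only the student logits would carry gradients in a real training loop.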
Pages: 1794-1804
Page count: 11
Related papers
50 in total
  • [1] CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection
    Mao, Yunyao
    Deng, Jiajun
    Zhou, Wengang
    Li, Li
    Fang, Yao
    Li, Houqiang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
    Wu, Mingrui
    Gu, Jiaxin
    Shen, Yunhang
    Lin, Mingbao
    Chen, Chao
    Sun, Xiaoshuai
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 2839 - 2846
  • [3] Zero-Shot Knowledge Distillation in Deep Networks
    Nayak, Gaurav Kumar
    Mopuri, Konda Reddy
    Shaj, Vaisakh
    Babu, R. Venkatesh
    Chakraborty, Anirban
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [4] CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
    Wang, Hualiang
    Li, Yi
    Yao, Huifeng
    Li, Xiaomeng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1802 - 1812
  • [5] HOICS: ZERO-SHOT HOI DETECTION VIA COMPATIBILITY SELF-LEARNING
    Jiang, Miao
    Li, Min
    Ren, Junxing
    Huang, Weiqing
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 4185 - 4189
  • [6] Exploring Conditional Multi-modal Prompts for Zero-Shot HOI Detection
    Lei, Ting
    Yin, Shaofeng
    Peng, Yuxin
    Liu, Yang
    COMPUTER VISION-ECCV 2024, PT LXXXII, 2025, 15140 : 1 - 19
  • [7] Towards Zero-Shot Knowledge Distillation for Natural Language Processing
    Rashid, Ahmad
    Lioutas, Vasileios
    Ghaddar, Abbas
    Rezagholizadeh, Mehdi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 6551 - 6561
  • [8] Online Zero-Shot Classification with CLIP
    Qian, Qi
    Hu, Juhua
    COMPUTER VISION - ECCV 2024, PT LXXVII, 2024, 15135 : 462 - 477
  • [9] Knowledge Distillation Classifier Generation Network for Zero-Shot Learning
    Yu, Yunlong
    Li, Bin
    Ji, Zhong
    Han, Jungong
    Zhang, Zhongfei
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (06) : 3183 - 3194
  • [10] Attribute Distillation for Zero-Shot Recognition
    Li, Houjun
    Wei, Boquan
    Computer Engineering and Applications, 60 (09) : 219 - 227