Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels

Cited by: 1
Authors
Wan, Bo [1]
Tuytelaars, Tinne [1]
Affiliations
[1] Katholieke Univ Leuven, ESAT, Leuven, Belgium
Funding
European Research Council
DOI
10.1109/WACV57701.2024.00182
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate the task of zero-shot human-object interaction (HOI) detection, a novel paradigm for identifying HOIs without the need for task-specific annotations. To address this challenging task, we employ CLIP, a large-scale pre-trained vision-language model (VLM), for knowledge distillation at multiple levels. Specifically, we design a multi-branch neural network that leverages CLIP to learn HOI representations at various levels, including global images, local union regions encompassing human-object pairs, and individual instances of humans or objects. To train our model, CLIP is used to generate HOI scores for both global images and local union regions, and these scores serve as supervision signals. Extensive experiments demonstrate the effectiveness of our novel multi-level CLIP knowledge integration strategy. Notably, the model achieves strong performance, comparable even to some fully-supervised and weakly-supervised methods on the public HICO-DET benchmark. Code is available at https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP.
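The supervision idea in the abstract, where CLIP scores a global image or a union region against HOI descriptions and those scores then train a student branch, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the temperature value, the softmax over prompts, and the cross-entropy form of the loss are all assumptions made for illustration, and the embeddings are toy vectors standing in for real CLIP image and text features.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_hoi_scores(region_emb, prompt_embs, temperature=0.1):
    """Hypothetical teacher: score one image/region embedding against
    one text embedding per HOI prompt (e.g. "a person riding a bike").
    The temperature is an assumed value, not taken from the paper."""
    r = l2_normalize(region_emb)
    sims = [sum(a * b for a, b in zip(r, l2_normalize(p))) for p in prompt_embs]
    return softmax([s / temperature for s in sims])

def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy between the teacher's soft scores and the student's
    softmax output; an assumed loss form chosen for simplicity."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s + 1e-12)
                for t, s in zip(teacher_probs, student_probs))

# Toy 2-D embeddings: the region matches the first prompt most closely.
scores = teacher_hoi_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
aligned = distillation_loss([5.0, 0.0, 2.0], scores)
misaligned = distillation_loss([0.0, 5.0, 0.0], scores)
```

In this toy setup a student whose logits agree with the teacher's ranking incurs a lower loss than one that favors a different interaction class. In a multi-level setting, the same kind of term would presumably be computed once for the global image and once per union region; see the released code for the actual formulation.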
Pages: 1794-1804
Page count: 11