Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels

Cited by: 1

Authors:

Wan, Bo [1]

Tuytelaars, Tinne [1]

Affiliations:

[1] Katholieke Univ Leuven, ESAT, Leuven, Belgium

Source:

2024 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV 2024 | 2024

Funding:

European Research Council

Keywords:

DOI:

10.1109/WACV57701.2024.00182

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we investigate the task of zero-shot human-object interaction (HOI) detection, a novel paradigm for identifying HOIs without the need for task-specific annotations. To address this challenging task, we employ CLIP, a large-scale pre-trained vision-language model (VLM), for knowledge distillation on multiple levels. Specifically, we design a multi-branch neural network that leverages CLIP for learning HOI representations at various levels, including global images, local union regions encompassing human-object pairs, and individual instances of humans or objects. To train our model, CLIP is utilized to generate HOI scores for both global images and local union regions that serve as supervision signals. The extensive experiments demonstrate the effectiveness of our novel multi-level CLIP knowledge integration strategy. Notably, the model achieves strong performance, which is even comparable with some fully-supervised and weakly-supervised methods on the public HICO-DET benchmark. Code is available at https://github.com/bobwan1995/Zeroshot-HOI-with-CLIP.
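The supervision scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes CLIP-style scoring, that is, a softmax over temperature-scaled cosine similarities between an image or region embedding and one text embedding per HOI class, and it uses a KL-divergence loss as one common distillation choice. The function names, the temperature value, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np

def union_box(human_box, object_box):
    """Smallest (x1, y1, x2, y2) box enclosing both a human box and an object box."""
    hx1, hy1, hx2, hy2 = human_box
    ox1, oy1, ox2, oy2 = object_box
    return (min(hx1, ox1), min(hy1, oy1), max(hx2, ox2), max(hy2, oy2))

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def teacher_hoi_scores(image_emb, text_embs, temperature=0.01):
    """CLIP-style teacher scores (assumed form): softmax over scaled cosine similarities.

    image_emb: (d,) embedding of a whole image or a cropped union region.
    text_embs: (num_classes, d) embeddings of one text prompt per HOI class.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return softmax(text_embs @ image_emb / temperature)

def distillation_loss(student_logits, teacher_probs, eps=1e-12):
    """KL(teacher || student); an assumed loss, the paper's exact objective may differ."""
    student_probs = softmax(student_logits)
    return float(np.sum(
        teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps))
    ))

# Random vectors stand in for real CLIP image and text embeddings.
rng = np.random.default_rng(0)
region = union_box((10, 20, 50, 80), (40, 10, 90, 60))
teacher = teacher_hoi_scores(rng.normal(size=16), rng.normal(size=(5, 16)))
loss = distillation_loss(rng.normal(size=5), teacher)
```

In a real pipeline the stand-in vectors would be replaced by CLIP embeddings of the full image and of each cropped union region, and the student logits would come from the corresponding branch of the detection network.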


Pages: 1794-1804

Number of pages: 11

50 records in total

[21] A dynamic semantic knowledge graph for zero-shot object detection
Wen Lv
Hongbo Shi
Shuai Tan
Bing Song
Yang Tao
The Visual Computer, 2023, 39 : 4513 - 4527
[22] A dynamic semantic knowledge graph for zero-shot object detection
Lv, Wen
Shi, Hongbo
Tan, Shuai
Song, Bing
Tao, Yang
VISUAL COMPUTER, 2023, 39 (10): : 4513 - 4527
[23] Generalized Zero-shot Intent Detection via Commonsense Knowledge
Siddique, A. B.
Jamour, Fuad
Xu, Luxun
Hristidis, Vagelis
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1925 - 1929
[24] Zero-Shot Object Detection
Bansal, Ankan
Sikka, Karan
Sharma, Gaurav
Chellappa, Rama
Divakaran, Ajay
COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 397 - 414
[25] Anomaly Detection Based on Zero-Shot Outlier Synthesis and Hierarchical Feature Distillation
Rivera, Adin Ramirez
Khan, Adil
Bekkouch, Imad Eddine Ibrahim
Sheikh, Taimoor Shakeel
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 281 - 291
[26] Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts
Hou, Yanning
Xu, Ke
Li, Junfa
Ruan, Yanran
Qiu, Jianfeng
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT II, 2025, 15032 : 46 - 60
[27] Relationship-Preserving Knowledge Distillation for Zero-Shot Sketch Based Image Retrieval
Tian, Jialin
Xu, Xing
Wang, Zheng
Shen, Fumin
Liu, Xin
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5473 - 5481
[28] Zero-Shot Cross-Lingual Knowledge Transfer in VQA via Multimodal Distillation
Weng, Yu
Dong, Jun
He, Wenbin
Chaomurilige
Liu, Xuan
Liu, Zheng
Gao, Honghao
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, : 1 - 11
[29] Zero-Shot Visual Sentiment Prediction via Cross-Domain Knowledge Distillation
Moroto, Yuya
Ye, Yingrui
Maeda, Keisuke
Ogawa, Takahiro
Haseyama, Miki
IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 177 - 185
[30] Enhancing Zero-shot and Few-shot Stance Detection with Commonsense Knowledge Graph
Liu, Rui
Lin, Zheng
Tan, Yutong
Wang, Weiping
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3152 - 3157
