Discovering Syntactic Interaction Clues for Human-Object Interaction Detection

Cited by: 2

Authors:

Lu, Jinguo [1]

Ren, Weihong [1,2]

Jiang, Weibo [1]

Chen, Xi'ai [2,3]

Wang, Qiang [4,5]

Han, Zhi [2,3]

Liu, Honghai [1]

Affiliations:

[1] Harbin Inst Technol, Shenzhen, Peoples R China

[2] Shenyang Univ, Shenyang, Peoples R China

[3] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang, Peoples R China

[4] Chinese Acad Sci, Inst Robot, Beijing, Peoples R China

[5] Chinese Acad Sci, Inst Intelligent Mfg, Beijing, Peoples R China

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1109/CVPR52733.2024.02665

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, Vision-Language Models (VLMs) have greatly advanced Human-Object Interaction (HOI) detection. Existing VLM-based HOI detectors typically adopt a handcrafted template (e.g., a photo of a person [action] a/an [object]) to acquire text knowledge through the VLM text encoder. However, such approaches encode the action-specific text prompts only at the vocabulary level, and may suffer from learning ambiguity because they do not explore fine-grained clues from the perspective of interaction context. In this paper, we propose a novel method to discover Syntactic Interaction Clues for HOI detection (SICHOI) using a VLM. Specifically, we first investigate what the essential elements of an interaction context are, and then establish a syntactic interaction bank at three levels: spatial relationship, action-oriented posture, and situational condition. Further, to align visual features with the syntactic interaction bank, we adopt a multi-view extractor to jointly aggregate visual features from the instance, interaction, and image levels accordingly. In addition, we introduce a dual cross-attention decoder to perform context propagation between text knowledge and visual features, thereby enhancing HOI detection. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on HICO-DET and V-COCO.
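
To make the prompt idea concrete, the following is a minimal sketch of how the handcrafted template quoted in the abstract could be filled in, and how such a prompt might be extended with clues at the three levels named there. It is not the authors' implementation: the function names (`article_for`, `build_prompt`, `build_bank_prompts`), the clue wording in `example_clues`, and the way a clue is appended to the base prompt are all assumptions made for illustration. The abstract does not specify how the syntactic interaction bank is actually phrased or encoded.

```python
VOWELS = "aeiou"


def article_for(noun: str) -> str:
    """Pick 'an' before a vowel letter, else 'a' (a spelling heuristic only)."""
    return "an" if noun[:1].lower() in VOWELS else "a"


def build_prompt(action: str, obj: str) -> str:
    """Fill the template 'a photo of a person [action] a/an [object]'."""
    return f"a photo of a person {action} {article_for(obj)} {obj}"


def build_bank_prompts(action: str, obj: str, clues: dict) -> dict:
    """Append one clue per level to the base prompt (hypothetical scheme)."""
    base = build_prompt(action, obj)
    return {level: f"{base}, {clue}" for level, clue in clues.items()}


# Hypothetical clue texts, keyed by the three levels listed in the abstract.
example_clues = {
    "spatial relationship": "the person is on top of the object",
    "action-oriented posture": "the person is seated with hands forward",
    "situational condition": "the scene is an outdoor road",
}

prompts = build_bank_prompts("riding", "bicycle", example_clues)
for level, text in prompts.items():
    print(f"{level}: {text}")
```

In a real pipeline each resulting string would be passed to the VLM text encoder; that step is omitted here because the abstract does not name the encoder or its interface.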


Pages: 28212-28222

Number of pages: 11
