Discovering Syntactic Interaction Clues for Human-Object Interaction Detection

Cited by: 2

Authors:

Lu, Jinguo [1]

Ren, Weihong [1,2]

Jiang, Weibo [1]

Chen, Xi'ai [2,3]

Wang, Qiang [4,5]

Han, Zhi [2,3]

Liu, Honghai [1]

Affiliations:

[1] Harbin Inst Technol, Shenzhen, Peoples R China

[2] Shenyang Univ, Shenyang, Peoples R China

[3] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang, Peoples R China

[4] Chinese Acad Sci, Inst Robot, Beijing, Peoples R China

[5] Chinese Acad Sci, Inst Intelligent Mfg, Beijing, Peoples R China

Source:

2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024

Funding:

National Natural Science Foundation of China


DOI:

10.1109/CVPR52733.2024.02665

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, Vision-Language Models (VLMs) have greatly advanced Human-Object Interaction (HOI) detection. Existing VLM-based HOI detectors typically adopt a handcrafted template (e.g., "a photo of a person [action] a/an [object]") to acquire text knowledge through the VLM text encoder. However, such approaches encode action-specific text prompts only at the vocabulary level, and may suffer from learning ambiguity because they do not explore fine-grained clues from the perspective of interaction context. In this paper, we propose a novel method to discover Syntactic Interaction Clues for HOI detection (SICHOI) using a VLM. Specifically, we first investigate which elements are essential to an interaction context, and then establish a syntactic interaction bank at three levels: spatial relationship, action-oriented posture, and situational condition. Further, to align visual features with the syntactic interaction bank, we adopt a multi-view extractor that jointly aggregates visual features from the instance, interaction, and image levels. In addition, we introduce a dual cross-attention decoder to perform context propagation between text knowledge and visual features, thereby enhancing HOI detection. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on HICO-DET and V-COCO.
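
The handcrafted template quoted in the abstract can be illustrated with a minimal sketch. It only shows how a vocabulary-level prompt of the form "a photo of a person [action] a/an [object]" might be filled in before being passed to a VLM text encoder; it is not the authors' code. The function names `article_for` and `build_prompt`, the article heuristic, and the example action-object pairs are all assumptions made for illustration.

```python
def article_for(noun: str) -> str:
    """Choose 'an' before a vowel letter, else 'a' (a spelling heuristic only)."""
    return "an" if noun.lower().startswith(("a", "e", "i", "o", "u")) else "a"


def build_prompt(action: str, obj: str) -> str:
    """Fill the handcrafted template quoted in the abstract (hypothetical helper)."""
    return f"a photo of a person {action} {article_for(obj)} {obj}"


# Hypothetical (action, object) pairs, chosen only for illustration.
pairs = [("riding", "bicycle"), ("eating", "apple"), ("holding", "umbrella")]
prompts = [build_prompt(action, obj) for action, obj in pairs]

for prompt in prompts:
    print(prompt)
```

Each such prompt names only the action and the object, which is the vocabulary-level limitation the abstract points to: nothing in the string describes spatial relationship, posture, or situational condition.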


Pages: 28212-28222

Page count: 11
