Discovering Syntactic Interaction Clues for Human-Object Interaction Detection

Cited by: 2
Authors
Lu, Jinguo [1]
Ren, Weihong [1, 2]
Jiang, Weibo [1]
Chen, Xi'ai [2, 3]
Wang, Qiang [4, 5]
Han, Zhi [2, 3]
Liu, Honghai [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Shenyang Univ, Shenyang, Peoples R China
[3] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang, Peoples R China
[4] Chinese Acad Sci, Inst Robot, Beijing, Peoples R China
[5] Chinese Acad Sci, Inst Intelligent Mfg, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.02665
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, Vision-Language Models (VLMs) have greatly advanced Human-Object Interaction (HOI) detection. Existing VLM-based HOI detectors typically adopt a handcrafted template (e.g., "a photo of a person [action] a/an [object]") to acquire text knowledge through the VLM text encoder. However, such approaches encode the action-specific text prompts only at the vocabulary level and may suffer from learning ambiguity because they do not explore fine-grained clues from the perspective of interaction context. In this paper, we propose a novel method to discover Syntactic Interaction Clues for HOI detection (SICHOI) by using a VLM. Specifically, we first investigate what the essential elements of an interaction context are, and then establish a syntactic interaction bank at three levels: spatial relationship, action-oriented posture, and situational condition. Further, to align visual features with the syntactic interaction bank, we adopt a multi-view extractor that jointly aggregates visual features from the instance, interaction, and image levels. In addition, we introduce a dual cross-attention decoder to perform context propagation between text knowledge and visual features, thereby enhancing HOI detection. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on HICO-DET and V-COCO.
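The handcrafted template quoted in the abstract can be illustrated with a minimal Python sketch. It only fills the template string for given action/object pairs; it is not the SICHOI method itself, and the helper names (`article_for`, `build_prompt`, `build_prompts`) are hypothetical. The sketch assumes the action is already supplied in its "-ing" form and picks "a" or "an" with a simple spelling heuristic. Passing the resulting strings to a VLM text encoder is left out.

```python
from typing import Iterable, List, Tuple

VOWELS = {"a", "e", "i", "o", "u"}


def article_for(noun: str) -> str:
    """Return 'an' if the noun starts with a vowel letter, else 'a'.

    This is a spelling heuristic (an assumption of this sketch), so words
    such as 'umbrella' work but phonetic exceptions such as 'hour' do not.
    """
    return "an" if noun[:1].lower() in VOWELS else "a"


def build_prompt(action: str, obj: str) -> str:
    """Fill the template 'a photo of a person [action] a/an [object]'."""
    return f"a photo of a person {action} {article_for(obj)} {obj}"


def build_prompts(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one prompt per (action, object) pair, preserving input order."""
    return [build_prompt(action, obj) for action, obj in pairs]


if __name__ == "__main__":
    # Example pairs chosen for illustration only.
    for prompt in build_prompts([("riding", "bicycle"), ("eating", "apple")]):
        print(prompt)
```

Each prompt names only the action and the object, which is the vocabulary-level encoding the abstract describes as a source of ambiguity.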
Pages: 28212-28222
Number of pages: 11