Discovering Syntactic Interaction Clues for Human-Object Interaction Detection

Cited by: 2
Authors
Lu, Jinguo [1]
Ren, Weihong [1, 2]
Jiang, Weibo [1]
Chen, Xi'ai [2, 3]
Wang, Qiang [4, 5]
Han, Zhi [2, 3]
Liu, Honghai [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Shenyang Univ, Shenyang, Peoples R China
[3] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang, Peoples R China
[4] Chinese Acad Sci, Inst Robot, Beijing, Peoples R China
[5] Chinese Acad Sci, Inst Intelligent Mfg, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.02665
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, Vision-Language Models (VLMs) have greatly advanced Human-Object Interaction (HOI) detection. Existing VLM-based HOI detectors typically adopt a handcrafted template (e.g., "a photo of a person [action] a/an [object]") to acquire text knowledge through the VLM text encoder. However, such approaches encode action-specific text prompts only at the vocabulary level and do not explore fine-grained clues from the perspective of interaction context, so they may suffer from learning ambiguity. In this paper, we propose a novel method that uses a VLM to discover Syntactic Interaction Clues for HOI detection (SICHOI). Specifically, we first investigate the essential elements of an interaction context, and then establish a syntactic interaction bank at three levels: spatial relationship, action-oriented posture, and situational condition. Further, to align visual features with the syntactic interaction bank, we adopt a multi-view extractor that jointly aggregates visual features from the instance, interaction, and image levels accordingly. In addition, we introduce a dual cross-attention decoder that performs context propagation between text knowledge and visual features, thereby enhancing HOI detection. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on HICO-DET and V-COCO.
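The abstract names two concrete mechanisms: the handcrafted prompt template used by existing VLM-based detectors, and a dual cross-attention decoder that propagates context between text knowledge and visual features. The Python sketch below illustrates both under simplifying assumptions; it is not the SICHOI authors' implementation. `build_prompt` fills the template quoted in the abstract and picks "a" or "an" with a first-letter vowel heuristic (an assumption that misfires on words such as "unicycle"). `cross_attention` and `dual_cross_attention` are hypothetical helpers showing generic scaled dot-product cross-attention in both directions with residual connections; they omit the learned query/key/value projections, multiple heads, and feed-forward layers a real decoder would use. Treating the text vectors as encoded entries of the syntactic interaction bank is likewise an assumption.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are token/feature vectors of equal dimension


def build_prompt(action: str, obj: str) -> str:
    """Fill the handcrafted template 'a photo of a person [action] a/an [object]'.

    The a/an choice is a simple vowel heuristic (assumption), not a full
    English article rule.
    """
    article = "an" if obj[0].lower() in "aeiou" else "a"
    return f"a photo of a person {action} {article} {obj}"


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention of `queries` over `context`.

    Simplification (assumption): the context vectors serve directly as both
    keys and values, with no learned projections and a single head.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted average of the context vectors: one output row per query.
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return out


def dual_cross_attention(text: Matrix, visual: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical dual cross-attention step: text attends to visual features
    and visual features attend to text, each followed by a residual add."""
    text_ctx = cross_attention(text, visual)
    visual_ctx = cross_attention(visual, text)
    new_text = [[a + b for a, b in zip(t, c)] for t, c in zip(text, text_ctx)]
    new_visual = [[a + b for a, b in zip(v, c)] for v, c in zip(visual, visual_ctx)]
    return new_text, new_visual


if __name__ == "__main__":
    print(build_prompt("riding", "horse"))      # a photo of a person riding a horse
    print(build_prompt("holding", "umbrella"))  # a photo of a person holding an umbrella
    text = [[1.0, 0.0], [0.0, 1.0]]                # e.g. two encoded text vectors
    visual = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]  # e.g. three visual feature vectors
    new_text, new_visual = dual_cross_attention(text, visual)
    print(len(new_text), len(new_visual))       # 2 3
```

Running the sketch as a script prints the two filled prompts and the row counts of the updated text and visual matrices. Each direction keeps its own number of rows because the queries, not the context, determine the output length; in a trained decoder each direction would also have separate learned projections.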
Pages: 28212-28222
Number of pages: 11