Discovering Syntactic Interaction Clues for Human-Object Interaction Detection

Cited by: 2
Authors
Lu, Jinguo [1]
Ren, Weihong [1, 2]
Jiang, Weibo [1]
Chen, Xi'ai [2, 3]
Wang, Qiang [4, 5]
Han, Zhi [2, 3]
Liu, Honghai [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Shenyang Univ, Shenyang, Peoples R China
[3] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang, Peoples R China
[4] Chinese Acad Sci, Inst Robot, Beijing, Peoples R China
[5] Chinese Acad Sci, Inst Intelligent Mfg, Beijing, Peoples R China
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.02665
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, Vision-Language Models (VLMs) have greatly advanced Human-Object Interaction (HOI) detection. Existing VLM-based HOI detectors typically adopt a handcrafted template (e.g., "a photo of a person [action] a/an [object]") to acquire text knowledge through the VLM text encoder. However, such approaches encode action-specific text prompts only at the vocabulary level, and may therefore suffer from learning ambiguity because they do not explore fine-grained clues from the perspective of interaction context. In this paper, we propose a novel method that uses a VLM to discover Syntactic Interaction Clues for HOI detection (SICHOI). Specifically, we first investigate which elements are essential to an interaction context, and then establish a syntactic interaction bank at three levels: spatial relationship, action-oriented posture, and situational condition. Further, to align visual features with the syntactic interaction bank, we adopt a multi-view extractor that jointly aggregates visual features from the instance, interaction, and image levels. In addition, we introduce a dual cross-attention decoder to perform context propagation between text knowledge and visual features, thereby enhancing HOI detection. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on HICO-DET and V-COCO.
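To make the contrast in the abstract concrete, the following is a minimal sketch of a handcrafted template prompt next to a bank of clues grouped under the three levels the abstract names. Only the template wording and the three level names come from the abstract; the function names, the dictionary layout, and the example clue strings are hypothetical and are not taken from the paper or its code.

```python
# Illustrative sketch only. The template wording and the three level names
# come from the abstract; every helper name and clue string is hypothetical.

TEMPLATE = "a photo of a person {action} {article} {obj}"


def article_for(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough spelling heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def template_prompt(action: str, obj: str) -> str:
    """Build the vocabulary-level prompt used by template-based detectors."""
    return TEMPLATE.format(action=action, article=article_for(obj), obj=obj)


def build_interaction_bank(action, obj, spatial, posture, situation):
    """Group free-text clues for one interaction under three levels."""
    return {
        "prompt": template_prompt(action, obj),
        "spatial_relationship": list(spatial),
        "action_oriented_posture": list(posture),
        "situational_condition": list(situation),
    }


# Hypothetical clue strings, written by hand for illustration.
bank = build_interaction_bank(
    "riding",
    "bicycle",
    spatial=["the person is above the bicycle"],
    posture=["hands gripping the handlebars"],
    situation=["on a road or trail"],
)

for level, value in bank.items():
    print(level, "->", value)
```

In a real pipeline each string would presumably be passed through the VLM text encoder; that step is omitted here because the abstract does not specify which encoder or interface is used.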
Pages: 28212-28222
Page count: 11