Discovering Syntactic Interaction Clues for Human-Object Interaction Detection

Cited by: 2
Authors
Lu, Jinguo [1]
Ren, Weihong [1, 2]
Jiang, Weibo [1]
Chen, Xi'ai [2, 3]
Wang, Qiang [4, 5]
Han, Zhi [2, 3]
Liu, Honghai [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Shenyang Univ, Shenyang, Peoples R China
[3] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang, Peoples R China
[4] Chinese Acad Sci, Inst Robot, Beijing, Peoples R China
[5] Chinese Acad Sci, Inst Intelligent Mfg, Beijing, Peoples R China
Source
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2024
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52733.2024.02665
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, Vision-Language Models (VLMs) have greatly advanced Human-Object Interaction (HOI) detection. Existing VLM-based HOI detectors typically adopt a handcrafted template (e.g., "a photo of a person [action] a/an [object]") to acquire text knowledge through the VLM text encoder. However, such approaches encode action-specific text prompts only at the vocabulary level, and may therefore suffer from learning ambiguity because they do not explore fine-grained clues from the perspective of interaction context. In this paper, we propose a novel method that uses a VLM to discover Syntactic Interaction Clues for HOI detection (SICHOI). Specifically, we first investigate the essential elements of an interaction context, and then establish a syntactic interaction bank at three levels: spatial relationship, action-oriented posture, and situational condition. Further, to align visual features with the syntactic interaction bank, we adopt a multi-view extractor that jointly aggregates visual features from the instance, interaction, and image levels. In addition, we introduce a dual cross-attention decoder that performs context propagation between text knowledge and visual features, thereby enhancing HOI detection. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on HICO-DET and V-COCO.
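The handcrafted template quoted in the abstract can be illustrated with a short sketch. This is only an illustration of how such vocabulary-level prompts might be built before they are passed to a VLM text encoder; it is not the authors' code. The function names `article_for` and `build_prompt` are hypothetical, and the a/an choice is a simple spelling heuristic assumed here for the example.

```python
def article_for(noun: str) -> str:
    """Return 'an' if the noun starts with a vowel letter, else 'a'.

    Assumption: a spelling-based heuristic, not a pronunciation rule.
    """
    return "an" if noun and noun[0].lower() in "aeiou" else "a"


def build_prompt(action: str, obj: str) -> str:
    """Fill the template 'a photo of a person [action] a/an [object]'."""
    return f"a photo of a person {action} {article_for(obj)} {obj}"


if __name__ == "__main__":
    # Hypothetical (action, object) pairs chosen only for illustration.
    for action, obj in [("riding", "bicycle"), ("eating", "apple")]:
        print(build_prompt(action, obj))
```

Each prompt produced this way varies only in the action and object words, which is the vocabulary-level limitation the abstract refers to: nothing in the text describes spatial relationship, posture, or situational condition.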
Pages: 28212 - 28222
Number of pages: 11
Related papers
50 in total
  • [31] Hierarchical Reasoning Network for Human-Object Interaction Detection
    Gao, Yiming
    Kuang, Zhanghui
    Li, Guanbin
    Zhang, Wayne
    Lin, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 8306 - 8317
  • [32] Transferable Interactiveness Knowledge for Human-Object Interaction Detection
    Li, Yong-Lu
    Liu, Xinpeng
    Wu, Xiaoqian
    Huang, Xijie
    Xu, Liang
    Lu, Cewu
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (07) : 3870 - 3882
  • [33] Weakly-supervised Human-object Interaction Detection
    Sugimoto, Masaki
    Furuta, Ryosuke
    Taniguchi, Yukinobu
    VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 5: VISAPP, 2021, : 293 - 300
  • [34] Exploiting Scene Graphs for Human-Object Interaction Detection
    He, Tao
    Gao, Lianli
    Song, Jingkuan
    Li, Yuan-Fang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 15964 - 15973
  • [35] Highlighting Object Category Immunity for the Generalization of Human-Object Interaction Detection
    Liu, Xinpeng
    Li, Yong-Lu
    Lu, Cewu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1819 - 1827
  • [36] Discovering Human-Object Interaction Concepts via Self-Compositional Learning
    Hou, Zhi
    Yu, Baosheng
    Tao, Dacheng
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 461 - 478
  • [37] ERNet: An Efficient and Reliable Human-Object Interaction Detection Network
    Lim, JunYi
    Baskaran, Vishnu Monn
    Lim, Joanne Mun-Yee
    Wong, KokSheik
    See, John
    Tistarelli, Massimo
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 964 - 979
  • [38] Multi-stream Network for Human-object Interaction Detection
    Wang, Chang
    Sun, Jinyu
    Ma, Shiwei
    Lu, Yuqiu
    Liu, Wang
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (08)
  • [39] Polysemy Deciphering Network for Robust Human-Object Interaction Detection
    Zhong, Xubin
    Ding, Changxing
    Qu, Xian
    Tao, Dacheng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (06) : 1910 - 1929
  • [40] Disentangled Pre-training for Human-Object Interaction Detection
    Li, Zhuolong
    Li, Xingao
    Ding, Changxing
    Xu, Xiangmin
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 28191 - 28201