Image attention transformer network for indoor 3D object detection

Authors
REN KeYan
YAN Tong
HU ZhaoXin
HAN HongGui
ZHANG YunLu
Affiliation
[1] Faculty of Information Technology, Beijing University of Technology
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP391.41
Discipline classification code
080203
Abstract
Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and show remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel yet effective fusion model named the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature through an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through an attention mechanism, and generates and refines 3D detection results with a transformer model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two widely used benchmarks for indoor 3D object detection, SUN RGB-D and NYU Depth V2, and ablation studies are provided to analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
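The abstract describes assigning a weight to each RGB feature through attention while taking point cloud features into account. The sketch below illustrates that general idea with plain scaled dot-product attention; it is not the authors' LI-Attention model, whose details are not given here. The function names, the feature shapes, and the choice to concatenate the pooled RGB feature onto the point feature are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(point_feat, rgb_feats):
    """Weight each RGB feature by its similarity to a point feature.

    point_feat: list of d floats (hypothetical point cloud feature, the query).
    rgb_feats:  list of K lists of d floats (hypothetical RGB features).
    Returns (weights, fused), where fused is the point feature concatenated
    with the attention-weighted sum of the RGB features (an assumed design).
    """
    d = len(point_feat)
    # Scaled dot-product score between the point feature and each RGB feature.
    scores = [
        sum(p * r for p, r in zip(point_feat, feat)) / math.sqrt(d)
        for feat in rgb_feats
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the RGB features, one value per dimension.
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, rgb_feats))
        for i in range(d)
    ]
    return weights, point_feat + pooled


# Toy inputs: one 2-D point feature and three 2-D RGB features.
point = [1.0, 0.0]
rgb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, fused = attention_fuse(point, rgb)
```

In this toy input, the RGB feature most aligned with the point feature receives the largest weight, and the weights sum to one. A learned model would typically apply trainable projections before scoring, which this sketch omits.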
Pages: 2176-2190
Page count: 15