Image attention transformer network for indoor 3D object detection

Cited by: 0
Authors
REN KeYan
YAN Tong
HU ZhaoXin
HAN HongGui
ZHANG YunLu
Affiliation
[1] Faculty of Information Technology, Beijing University of Technology
Keywords
DOI
Not available
Chinese Library Classification number
TP391.41
Discipline classification code
080203
Abstract
Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and show remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel and effective fusion model named the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature through an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called the image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through an attention mechanism, and generates and refines 3D detection results with a transformer model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two widely used benchmarks for indoor 3D object detection, SUN RGB-D and NYU Depth V2, and ablation studies are provided to analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
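The idea of assigning an attention weight to each RGB feature before fusing it with point cloud features can be illustrated with a minimal sketch. This is not the authors' LI-Attention model, whose details are not given in the abstract; it is generic scaled dot-product attention, and the function names `softmax` and `attention_fuse`, the feature shapes, and the concatenation-based fusion are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def attention_fuse(point_feats, rgb_feats):
    """Fuse point features with attention-weighted RGB features.

    Hypothetical helper, NOT the paper's LI-Attention model.
    point_feats: (N, D) array, one row per point (used as queries).
    rgb_feats:   (M, D) array, one row per RGB feature (keys and values).
    Returns the fused (N, 2 * D) features and the (N, M) attention weights.
    """
    d = point_feats.shape[1]
    # Similarity between every point feature and every RGB feature.
    scores = point_feats @ rgb_feats.T / np.sqrt(d)
    # Each row holds one weight per RGB feature; each row sums to 1.
    weights = softmax(scores, axis=1)
    # Weighted sum of RGB features for each point.
    attended_rgb = weights @ rgb_feats
    # Assumed fusion step: concatenate along the channel axis.
    fused = np.concatenate([point_feats, attended_rgb], axis=1)
    return fused, weights


rng = np.random.default_rng(0)
demo_points = rng.normal(size=(4, 8))  # 4 points, 8 channels
demo_rgb = rng.normal(size=(3, 8))     # 3 RGB features, 8 channels
demo_fused, demo_weights = attention_fuse(demo_points, demo_rgb)
print(demo_fused.shape)    # (4, 16)
print(demo_weights.shape)  # (4, 3)
```

In this sketch the point features act as queries, so each point receives its own weighting over the RGB features. The repository linked in the abstract is the place to check how the published method actually computes and applies its weights.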
Pages: 2176-2190
Page count: 15