Image attention transformer network for indoor 3D object detection

Cited by: 0

Authors:

REN KeYan

YAN Tong

HU ZhaoXin

HAN HongGui

ZHANG YunLu

Affiliation:

[1] Faculty of Information Technology, Beijing University of Technology

Source:

Science China (Technological Sciences) | 2024 / Vol. 67 / No. 07

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.41

Discipline classification code:

080203

Abstract:

Point clouds and RGB images are both critical data for 3D object detection. While recent multi-modal methods combine them directly and show remarkable performance, they ignore the distinct forms of these two types of data. To mitigate the influence of this intrinsic difference on performance, we propose a novel and effective fusion model named the LI-Attention model, which takes both RGB features and point cloud features into consideration and assigns a weight to each RGB feature through an attention mechanism. Furthermore, based on the LI-Attention model, we propose a 3D object detection method called image attention transformer network (IAT-Net), specialized for indoor RGB-D scenes. Compared with previous work on multi-modal detection, IAT-Net fuses elaborate RGB features from 2D detection results with point cloud features through an attention mechanism, and generates and refines 3D detection results with a transformer model. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two widely used benchmarks of indoor 3D object detection, SUN RGB-D and NYU Depth V2, and ablation studies are provided to analyze the effect of each module. The source code for the proposed IAT-Net is publicly available at https://github.com/wisper181/IAT-Net.
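
The idea of assigning an attention weight to each RGB feature can be illustrated with a generic scaled dot-product cross-attention, in which point cloud features act as queries over RGB features. This is a minimal sketch under stated assumptions and not the authors' LI-Attention model: the function name `attention_fuse`, the feature shapes, the absence of learned projections, and the concatenation step are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(point_feats, rgb_feats):
    """Hypothetical fusion helper (not the paper's implementation).

    point_feats: (N, d) array, one feature vector per point/proposal.
    rgb_feats:   (M, d) array, one feature vector per RGB (2D) feature.
    Returns the fused features (N, 2d) and the attention weights (N, M).
    """
    d = point_feats.shape[1]
    # Similarity between every point feature and every RGB feature.
    scores = point_feats @ rgb_feats.T / np.sqrt(d)
    # Each row is a distribution over the M RGB features.
    weights = softmax(scores, axis=1)
    # Weighted sum of RGB features for each point feature.
    attended_rgb = weights @ rgb_feats
    # Assumed fusion step: concatenate geometric and attended RGB features.
    fused = np.concatenate([point_feats, attended_rgb], axis=1)
    return fused, weights

rng = np.random.default_rng(0)
point_feats = rng.normal(size=(4, 8))  # 4 point features, dimension 8
rgb_feats = rng.normal(size=(3, 8))    # 3 RGB features, dimension 8
fused, weights = attention_fuse(point_feats, rgb_feats)
```

Each row of `weights` sums to 1, so an RGB feature that matches a given point feature poorly contributes little to that point's fused representation. The actual model is available in the repository linked in the abstract.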


Pages: 2176-2190

Page count: 15
