CDAF3D: Cross-Dimensional Attention Fusion for Indoor 3D Object Detection

Cited by: 0
Authors
Wang, Shilin [1 ]
Huang, Hai [1 ]
Zhu, Yueyan [1 ]
Tang, Zhenqi [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
Funding
National Key R&D Program of China;
Keywords
Indoor 3D Object Detection; Fusion Features; Point Cloud;
DOI
10.1007/978-981-97-8493-6_12
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
3D object detection is a crucial task in computer vision and autonomous systems, with wide applications in robotics, autonomous driving, and augmented reality. As input devices have advanced, researchers have proposed using multimodal information to improve detection accuracy. However, effectively integrating 2D and 3D features to exploit their complementary nature remains a challenge. In this paper, we observe that the complementarity of geometric and visual texture information can substantially strengthen feature fusion, which plays a key role in detection. To this end, we propose a Cross-Dimensional Attention Fusion-based indoor 3D object detection method (CDAF3D). It dynamically associates geometric information with the corresponding 2D image texture details through a cross-dimensional attention mechanism, enabling the model to capture and integrate spatial and textural information effectively. Additionally, because physically intersecting objects with different labels are unrealistic in 3D scenes, we further propose a Preventive 3D Intersect Loss (P3DIL), which improves detection accuracy by penalizing intersections between objects of different labels. We evaluate CDAF3D on the SUN RGB-D and ScanNet v2 datasets. Our method achieves 78.2 mAP@0.25 and 66.5 mAP@0.50 on ScanNet v2, and 70.3 mAP@0.25 and 54.1 mAP@0.50 on SUN RGB-D, outperforming all multi-sensor-based methods at 3D IoU thresholds of 0.25 and 0.5.
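The abstract names two technical components but does not specify them: a cross-dimensional attention mechanism fusing 3D geometric features with 2D texture features, and the Preventive 3D Intersect Loss (P3DIL). The PyTorch sketches below are therefore only plausible readings of the abstract, not the paper's implementation. First, a minimal cross-attention fusion in which per-point 3D features act as queries attending to 2D image features as keys/values; all module names, feature dimensions, and the single-block layout are assumptions.

```python
# Hypothetical sketch of cross-dimensional attention fusion, assuming it
# resembles standard cross-attention: 3D geometric features (queries) attend
# to 2D image texture features (keys/values). Not the paper's actual design.
import torch
import torch.nn as nn

class CrossDimensionalAttentionFusion(nn.Module):
    """Fuse per-point 3D features with 2D image features via cross-attention."""

    def __init__(self, dim_3d: int = 256, dim_2d: int = 256, num_heads: int = 8):
        super().__init__()
        self.proj_2d = nn.Linear(dim_2d, dim_3d)  # align channel widths
        self.attn = nn.MultiheadAttention(dim_3d, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim_3d)

    def forward(self, feats_3d: torch.Tensor, feats_2d: torch.Tensor) -> torch.Tensor:
        # feats_3d: (B, N_points, dim_3d)  geometric features from the point cloud
        # feats_2d: (B, N_pixels, dim_2d)  texture features from the RGB image
        kv = self.proj_2d(feats_2d)
        fused, _ = self.attn(query=feats_3d, key=kv, value=kv)
        # residual connection preserves the original geometric information
        return self.norm(feats_3d + fused)
```

Second, a loss in the spirit of P3DIL: since intersecting objects with different labels are physically implausible indoors, one can penalize volumetric overlap between differently-labeled predicted boxes. The axis-aligned pairwise-overlap formulation below is an assumption; the abstract does not give the exact definition.

```python
# Hypothetical P3DIL-style penalty on overlap between boxes of different labels.
import torch

def p3d_intersect_loss(boxes: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # boxes: (M, 6) axis-aligned, as (x_min, y_min, z_min, x_max, y_max, z_max)
    # labels: (M,) integer class ids
    mins = torch.maximum(boxes[:, None, :3], boxes[None, :, :3])  # (M, M, 3)
    maxs = torch.minimum(boxes[:, None, 3:], boxes[None, :, 3:])  # (M, M, 3)
    inter = (maxs - mins).clamp(min=0).prod(dim=-1)               # overlap volumes
    diff_label = labels[:, None] != labels[None, :]               # cross-label pairs
    off_diag = ~torch.eye(len(boxes), dtype=torch.bool, device=boxes.device)
    pair_mask = diff_label & off_diag
    # mean overlap volume over differently-labeled box pairs
    return inter[pair_mask].sum() / pair_mask.sum().clamp(min=1)
```

In training, such a term would presumably be added to the usual classification and box regression losses with a weighting coefficient, which the abstract does not state.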
Pages: 165-177
Number of pages: 13