Multi-Modal 3D Object Detection by Box Matching

Cited by: 2
Authors
Liu, Zhe [1 ]
Ye, Xiaoqing [2 ]
Zou, Zhikang [2 ]
He, Xinwei [3 ]
Tan, Xiao [2 ]
Ding, Errui [2 ]
Wang, Jingdong [2 ]
Bai, Xiang [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan 430074, Peoples R China
[2] Baidu Inc, Beijing 100085, Peoples R China
[3] Huazhong Agr Univ, Coll Informat, Wuhan 430070, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Software, Wuhan 430074, Peoples R China
Keywords
Three-dimensional displays; Laser radar; Feature extraction; Cameras; Sensors; Proposals; Object detection; Multi-modal; 3D object detection; feature alignment; box matching;
DOI
10.1109/TITS.2024.3453963
Chinese Library Classification (CLC)
TU [Architectural Science];
Discipline Classification Code
0813;
Abstract
Multi-modal 3D object detection has received growing attention because the information from different sensors, such as LiDAR and cameras, is complementary. Most fusion methods for 3D detection rely on accurate alignment and calibration between 3D point clouds and RGB images. However, this assumption is not reliable in a real-world self-driving system, as the alignment between modalities is easily affected by asynchronous sensors and disturbed sensor placement. We propose a novel Fusion network by Box Matching (FBMNet) for multi-modal 3D detection, which offers an alternative way to achieve cross-modal feature alignment by learning the correspondence at the bounding-box level, removing the dependency on calibration during inference. With the learned assignments between 3D and 2D object proposals, fusion for detection can be performed effectively by combining their ROI features. Extensive experiments on the nuScenes dataset demonstrate that our method is much more robust than existing fusion methods in challenging cases such as asynchronous sensors, misaligned sensor placement, and degraded camera images. We hope that our method can provide a practical solution for handling these challenging cases and improving safety in real autonomous driving scenarios.
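To illustrate the box-matching idea described in the abstract, the sketch below pairs 3D and 2D proposal ROI features through a learned soft assignment rather than geometric projection with calibration. This is a minimal PyTorch sketch under assumed shapes and names (BoxMatchingFusion, proj_3d, proj_2d, and the feature dimensions are hypothetical), not the authors' FBMNet implementation.

```python
# Minimal sketch (not the authors' code): fuse 3D and 2D ROI features
# via a learned soft assignment between proposals, with no calibration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoxMatchingFusion(nn.Module):
    """Match N 3D proposals to M 2D proposals and fuse their ROI features.

    Shapes and dimensions are illustrative assumptions.
    """
    def __init__(self, dim_3d: int, dim_2d: int, dim_out: int):
        super().__init__()
        self.proj_3d = nn.Linear(dim_3d, dim_out)    # embed 3D ROI features
        self.proj_2d = nn.Linear(dim_2d, dim_out)    # embed 2D ROI features
        self.fuse = nn.Linear(2 * dim_out, dim_out)  # combine matched features

    def forward(self, feats_3d: torch.Tensor, feats_2d: torch.Tensor) -> torch.Tensor:
        # feats_3d: (N, dim_3d) ROI features of 3D proposals
        # feats_2d: (M, dim_2d) ROI features of 2D proposals
        q = self.proj_3d(feats_3d)                   # (N, dim_out)
        k = self.proj_2d(feats_2d)                   # (M, dim_out)
        # Pairwise affinity between every 3D/2D proposal pair.
        affinity = q @ k.t() / q.shape[-1] ** 0.5    # (N, M)
        assign = F.softmax(affinity, dim=-1)         # soft assignment per 3D box
        matched_2d = assign @ k                      # (N, dim_out) aggregated 2D cues
        return self.fuse(torch.cat([q, matched_2d], dim=-1))

# Usage with random features for 50 3D and 40 2D proposals.
fusion = BoxMatchingFusion(dim_3d=256, dim_2d=256, dim_out=128)
out = fusion(torch.randn(50, 256), torch.randn(40, 256))
print(out.shape)  # torch.Size([50, 128])
```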
Pages: 12
Related Papers
50 records in total
  • [21] Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection
    Liu, Zhanwen
    Cheng, Juanru
    Fan, Jin
    Lin, Shan
    Wang, Yang
    Zhao, Xiangmo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 707 - 717
  • [22] Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection
    Li, Jiahao
    Chen, Lingshan
    Li, Zhen
    IEEE ACCESS, 2025, 13 : 52385 - 52396
  • [23] Frustum FusionNet: Amodal 3D Object Detection with Multi-Modal Feature Fusion
    Zuo, Liangyu
    Li, Yaochen
    Han, Mengtao
    Li, Qiao
    Liu, Yuehu
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021, : 2746 - 2751
  • [24] Enhancing 3D object detection through multi-modal fusion for cooperative perception
    Xia, Bin
    Zhou, Jun
    Kong, Fanyu
    You, Yuhe
    Yang, Jiarui
    Lin, Lin
    ALEXANDRIA ENGINEERING JOURNAL, 2024, 104 : 46 - 55
  • [25] Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
    Huang, Linyan
    Li, Zhiqi
    Sima, Chonghao
    Wang, Wenhai
    Wang, Jingdong
    Qiao, Yu
    Li, Hongyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [26] MMDistill: Multi-Modal BEV Distillation Framework for Multi-View 3D Object Detection
    Jiao, Tianzhe
    Chen, Yuming
    Zhang, Zhe
    Guo, Chaopeng
    Song, Jie
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 81 (03): : 4307 - 4325
  • [27] SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
    Xie, Yichen
    Xu, Chenfeng
    Rakotosaona, Marie-Julie
    Rim, Patrick
    Tombari, Federico
    Keutzer, Kurt
    Tomizuka, Masayoshi
    Zhan, Wei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 17545 - 17556
  • [28] Multi-modal information fusion for LiDAR-based 3D object detection framework
    Ma, Ruixin
    Yin, Yong
    Chen, Jing
    Chang, Rihao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 : 7995 - 8012
  • [29] Dual-domain deformable feature fusion for multi-modal 3D object detection
    Wang, Shihao
    Deng, Tao
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (06)
  • [30] LSSAttn: Towards Dense and Accurate View Transformation for Multi-modal 3D Object Detection
    Jiang, Qi
    Sun, Hao
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 6600 - 6606