Fully Sparse Fusion for 3D Object Detection

Cited by: 7
Authors
Li, Yingyan [1 ,2 ]
Fan, Lue [1 ,2 ]
Liu, Yang [1 ]
Huang, Zehao [3 ]
Chen, Yuntao [4 ]
Wang, Naiyan [3 ]
Zhang, Zhaoxiang [1 ,2 ,4 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Ctr Research on Intelligent Percept & Comp CRIPAC, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Future Technol, Beijing 100049, Peoples R China
[3] TuSimple, Beijing 100020, Peoples R China
[4] Chinese Acad Sci HKISICAS, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Feature extraction; Laser radar; Cameras; Detectors; Instance segmentation; Point cloud compression; 3D object detection; multi-sensor fusion; fully sparse architecture; autonomous driving; long-range perception;
DOI
10.1109/TPAMI.2024.3392303
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Currently prevalent multi-modal 3D detection methods rely on dense detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps grows quadratically with the detection range, which makes them unscalable for long-range detection. Recently, LiDAR-only fully sparse architectures have been gaining attention for their high efficiency in long-range perception. In this paper, we study how to develop a multi-modal fully sparse detector. Specifically, our proposed detector integrates the well-studied 2D instance segmentation into the LiDAR side, which is parallel to the 3D instance segmentation part in the LiDAR-only baseline. The proposed instance-based fusion framework maintains full sparsity while overcoming the constraints associated with the LiDAR-only fully sparse detector. Our framework showcases state-of-the-art performance on the widely used nuScenes dataset, Waymo Open Dataset, and the long-range Argoverse 2 dataset. Notably, the inference speed of our proposed method under the long-range perception setting is 2.7x faster than that of other state-of-the-art multi-modal 3D detection methods.
Pages: 7217-7231
Page count: 15