HCPVF: Hierarchical Cascaded Point-Voxel Fusion for 3D Object Detection

Cited by: 6
Authors
Fan, Baojie [1,2,3]
Zhang, Kexin [1,2,3]
Tian, Jiandong [4]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Wuzhou Univ, Guangxi Key Lab Machine Vis & Intelligent Control, Wuzhou 543002, Peoples R China
[4] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Beijing 100045, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Feature extraction; Point cloud compression; Proposals; Object detection; Detectors; Transformers; 3D object detection; BEV; voxel; point cloud;
DOI
10.1109/TCSVT.2023.3268849
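Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;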
Abstract
With the rapid development of 3D sensors, point-cloud-based 3D object detection is attracting increasing attention from both industry and academia and is widely applied in fields such as robotics and autonomous driving. However, balancing 3D object detection accuracy against speed remains a challenging problem. In this paper, we study this issue and propose a novel and effective 3D point cloud object detection network based on hierarchical cascaded point-voxel fusion, called HCPVF. First, a novel bird's-eye-view (BEV) attention mechanism with linear complexity is developed to improve the point cloud feature backbone. It mines point-to-point similarity in the BEV and is easy to implement with two cascaded linear layers and two normalization layers. This operation captures long-range dependencies and mitigates the uneven sampling of sparse BEV features, making the extracted point cloud features more discriminative. Second, the proposed HCPVF is equipped with a dual-level hierarchical cascaded detection head consisting of a voxel level followed by a point level. The voxel level combines coarse Region of Interest (RoI) pooling and fine RoI pooling, which cooperate to aggregate voxel features from different grid divisions and predict relatively coarse detection boxes. The subsequent point level is based on a Key Points Transformer, which first encodes the spatial context between the original points and the voxel-level boxes. A novel dual-weighted decoder then strengthens the context interaction by weighting the channel and spatial dimensions, yielding more accurate detection results. This design combines the high computational efficiency of voxel-based methods with the more complete spatial information of point-based methods, fusing low-level voxel features and high-level point features through a hierarchical cascaded strategy.
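The abstract says only that the linear-complexity BEV attention is built from two cascaded linear layers and two normalization layers. One plausible reading, offered here as an assumption rather than the authors' implementation, is an external-attention-style block (Guo et al., 2021). In this reading, N flattened BEV cells with C channels attend to S learnable memory slots instead of to each other, so the cost grows linearly in N rather than quadratically. The names `bev_linear_attention`, `double_normalize`, `mem_k`, `mem_v`, and the slot count S are hypothetical. The double normalization shown (softmax over cells, then L1 over slots) follows the external-attention formulation, not a detail confirmed by this record.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def double_normalize(attn: Matrix) -> Matrix:
    """Assumed double normalization: softmax over the N cells (each column),
    then L1-normalize each row over the S memory slots."""
    n, s = len(attn), len(attn[0])
    out = [[0.0] * s for _ in range(n)]
    for j in range(s):
        col_max = max(attn[i][j] for i in range(n))  # for numerical stability
        exps = [math.exp(attn[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            out[i][j] = exps[i] / total
    for i in range(n):
        row_sum = sum(out[i]) + 1e-9  # epsilon guards against division by zero
        out[i] = [v / row_sum for v in out[i]]
    return out


def bev_linear_attention(features: Matrix, mem_k: Matrix, mem_v: Matrix) -> Matrix:
    """features: N x C flattened BEV cells; mem_k, mem_v: S x C hypothetical
    learnable memories acting as two bias-free linear layers.
    Both products cost O(N * S * C), i.e. linear in the number of cells N."""
    attn = matmul(features, transpose(mem_k))  # linear layer 1: C -> S, gives N x S
    attn = double_normalize(attn)              # the two normalization steps
    return matmul(attn, mem_v)                 # linear layer 2: S -> C, gives N x C


if __name__ == "__main__":
    feats = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [0.0, 1.0, 1.0], [2.0, 2.0, -1.0]]
    mem_k = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # S = 2 memory slots
    mem_v = [[0.1, 0.2, 0.3], [-0.3, 0.0, 0.5]]
    out = bev_linear_attention(feats, mem_k, mem_v)
    print(len(out), len(out[0]))  # output keeps the N x C shape of the input
```

Because each cell's attention row sums to one over only S slots, enlarging the BEV grid adds rows but never an N x N similarity matrix. That is the property the "linear complexity" claim relies on.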
Extensive experiments demonstrate that the proposed HCPVF achieves state-of-the-art 3D detection performance while maintaining computational efficiency on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
Pages: 8997–9009
Number of pages: 13