HCPVF: Hierarchical Cascaded Point-Voxel Fusion for 3D Object Detection

Cited by: 6
Authors
Fan, Baojie [1, 2, 3]
Zhang, Kexin [1, 2, 3]
Tian, Jiandong [4]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Wuzhou Univ, Guangxi Key Lab Machine Vis & Intelligent Control, Wuzhou 543002, Peoples R China
[4] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Beijing 100045, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Feature extraction; Point cloud compression; Proposals; Object detection; Detectors; Transformers; 3D object detection; BEV; voxel; point cloud;
DOI
10.1109/TCSVT.2023.3268849
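Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract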
With the rapid development of 3D sensors, point cloud based 3D object detection is attracting increasing attention from both industry and academia and is widely applied in fields such as robotics and autonomous driving. However, balancing 3D detection accuracy and speed remains a challenging problem. In this paper, we study this issue and propose a novel and effective 3D point cloud object detection network based on hierarchical cascaded point-voxel fusion, called HCPVF. First, a novel bird's-eye-view (BEV) attention mechanism with linear complexity is developed to improve the point cloud feature backbone network. It can be implemented easily, using two cascaded linear layers and two normalization layers, to mine point-to-point similarity in the BEV. This operation captures long-range dependencies and reduces the uneven sampling of sparse BEV features, making the extracted point cloud features more discriminative. Second, the proposed HCPVF module is equipped with a dual-level hierarchical cascaded detection head, consisting of a voxel level followed by a point level. The voxel level is composed of coarse region-of-interest (RoI) pooling and fine RoI pooling, which cooperate to aggregate voxel features from different grid divisions and predict relatively coarse detection boxes. The point level is based on a Key Points Transformer. It first encodes the spatial context information between the original points and the voxel-level boxes. Then, a novel dual-weighted decoder enhances the context interaction by weighting the channel and spatial dimensions to obtain more accurate detection results. This design combines the high computational efficiency of the voxel-based method with the more complete spatial information of the point-based method, fusing low-level voxel features and high-level point features through a hierarchical cascaded strategy. Extensive experiments demonstrate that the proposed HCPVF achieves state-of-the-art 3D detection performance while maintaining computational efficiency on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
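The abstract describes the BEV attention only at a high level (two cascaded linear layers, two normalization layers, linear complexity). The sketch below shows one way such a mechanism could look, assuming an external-attention-style design: the first linear layer maps each BEV feature to a small set of memory slots, the result is normalized twice (softmax across positions, then L1 across slots), and the second linear layer maps back to the feature dimension. The function names, the weight shapes, and the choice of normalizations are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_over_positions(a):
    """Softmax along axis 0, so each column sums to 1 across all N positions."""
    out_cols = []
    for col in zip(*a):
        peak = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        out_cols.append([e / total for e in exps])
    return [list(row) for row in zip(*out_cols)]


def l1_normalize_slots(a, eps=1e-9):
    """L1-normalize along axis 1, so each row sums to (almost exactly) 1."""
    return [[v / (sum(row) + eps) for v in row] for row in a]


def bev_linear_attention(features, w_key, w_value):
    """Hypothetical BEV attention sketch (not the paper's exact formulation).

    features: N x C BEV features, w_key: C x S, w_value: S x C.
    Cost is O(N * S * C), i.e. linear in the number of positions N,
    because no N x N similarity matrix is ever formed.
    """
    attn = matmul(features, w_key)       # first linear layer: N x S
    attn = softmax_over_positions(attn)  # first normalization
    attn = l1_normalize_slots(attn)      # second normalization
    return matmul(attn, w_value)         # second linear layer: N x C


if __name__ == "__main__":
    random.seed(0)
    n, c, s = 6, 4, 3  # positions, channels, memory slots (illustrative sizes)
    feats = [[random.gauss(0, 1) for _ in range(c)] for _ in range(n)]
    w_k = [[random.gauss(0, 1) for _ in range(s)] for _ in range(c)]
    w_v = [[random.gauss(0, 1) for _ in range(c)] for _ in range(s)]
    out = bev_linear_attention(feats, w_k, w_v)
    print(len(out), len(out[0]))
```

Because the number of memory slots S is fixed and small, the cost grows linearly with the number of BEV positions, which is the property the abstract emphasizes.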
Pages: 8997-9009
Page count: 13