PiSFANet: Pillar Scale-Aware Feature Aggregation Network for Real-Time 3D Pedestrian Detection

Cited by: 0
Authors
Yan, Weiqing [1]
Liu, Shile [1]
Tang, Chang [2]
Zhou, Wujie [3]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 261400, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[3] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Zhejiang 310023, Peoples R China
Keywords
Feature extraction; Pedestrians; Three-dimensional displays; Point cloud compression; Encoding; Real-time systems; Object detection; 3D object detection; real-time; scale-aware; pillar-based;
DOI
10.1109/LSP.2024.3426294
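Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract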
Detecting pedestrians in 3D from point cloud data in real time while accounting for scale is crucial in various robotic and autonomous driving applications. Currently, the most successful methods for 3D object detection rely on voxel-based techniques, but these tend to be too computationally inefficient for deployment in aerial scenarios. Conversely, pillar-based approaches employ only 2D convolution and therefore require fewer computational resources, albeit potentially at the cost of detection accuracy relative to voxel-based methods. Previous pillar-based approaches suffered from inadequate pillar feature encoding. In this letter, we introduce PiSFANet, a real-time and scale-aware 3D pedestrian detection network that incorporates a robust encoder designed for effective pillar feature extraction. The proposed TriFocus Attention module (TriFA) integrates external attention with similar attention strategies based on Squeeze-and-Excitation. By comprehensively supervising the point-wise, channel-wise, and pillar-wise dimensions of pillar features, it strengthens pillar encoding, suppresses noise in pillar features, and improves their expressive power. The proposed Bidirectional Scale-Aware Feature Pyramid module (BiSAFP) integrates a scale-aware module into the multi-scale pyramid structure. This addition enhances the network's ability to perceive pedestrians in low-level features and ensures that the significance of feature maps at every feature level is fully taken into account. BiSAFP is a lightweight multi-scale pyramid network that has minimal impact on inference time while substantially boosting network performance. Our approach achieves real-time detection, processing up to 30 frames per second (FPS).
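To make the idea of gating pillar features along three axes concrete, the following is a minimal illustrative sketch and not the authors' TriFA module. It assumes pillar features are stored as a nested list of shape (P pillars, N points per pillar, C channels) and applies a parameter-free Squeeze-and-Excitation-style gate per axis: average over the other two axes, pass the mean through a sigmoid, and rescale. A real Squeeze-and-Excitation block uses learned fully connected layers between the squeeze and the sigmoid; those are omitted here. The names `axis_gates` and `tri_axis_gating` are hypothetical.

```python
import math
from typing import List

# Assumed layout: pillars[p][n][c] with P pillars, N points, C channels.
Pillars = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def axis_gates(pillars: Pillars, axis: int) -> List[float]:
    """Squeeze over the two other axes (mean), then excite (sigmoid).

    axis 0 -> one gate per pillar, 1 -> per point slot, 2 -> per channel.
    Parameter-free simplification: no learned layers are applied.
    """
    sizes = (len(pillars), len(pillars[0]), len(pillars[0][0]))
    sums = [0.0] * sizes[axis]
    for p in range(sizes[0]):
        for n in range(sizes[1]):
            for c in range(sizes[2]):
                sums[(p, n, c)[axis]] += pillars[p][n][c]
    # Number of elements averaged for each gate.
    count = (sizes[0] * sizes[1] * sizes[2]) / sizes[axis]
    return [sigmoid(s / count) for s in sums]


def tri_axis_gating(pillars: Pillars) -> Pillars:
    """Rescale every feature by its pillar, point, and channel gates."""
    g_p = axis_gates(pillars, 0)
    g_n = axis_gates(pillars, 1)
    g_c = axis_gates(pillars, 2)
    return [
        [
            [x * g_p[p] * g_n[n] * g_c[c] for c, x in enumerate(point)]
            for n, point in enumerate(pillar)
        ]
        for p, pillar in enumerate(pillars)
    ]


# Toy input: 2 pillars, 2 points each, 2 channels; the second pillar is empty.
pillars = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
gated = tri_axis_gating(pillars)
```

In this toy input the all-zero pillar receives a lower pillar gate than the populated one, which is the kind of reweighting an attention-based pillar encoder is meant to learn; the output keeps the input shape, so it could feed the same downstream 2D backbone.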
Pages: 2000-2004
Number of pages: 5