ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Cited: 0
|
Authors
He, Chenhang [1 ]
Li, Ruihuang [1 ,2 ]
Zhang, Guowen [1 ]
Zhang, Lei [1 ,2 ]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] OPPO Res, Shenzhen, Peoples R China
Source
Keywords
3D Object Detection; Voxel Transformer;
DOI
10.1007/978-3-031-73397-0_5
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Window-based transformers excel in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner. However, the sparse nature of point clouds leads to a significant variance in the number of voxels per window. Existing methods group the voxels in each window into fixed-length sequences through extensive sorting and padding operations, resulting in non-negligible computational and memory overhead. In this paper, we introduce ScatterFormer, which, to the best of our knowledge, is the first to directly apply attention to voxels across different windows as a single sequence. The key to ScatterFormer is a Scattered Linear Attention (SLA) module, which leverages the pre-computation of key-value pairs in linear attention to enable parallel computation on the variable-length voxel sequences divided by windows. Leveraging the hierarchical structure of GPUs and shared memory, we propose a chunk-wise algorithm that reduces the SLA module's latency to less than 1 millisecond on moderate GPUs. Furthermore, we develop a cross-window interaction module that improves the locality and connectivity of voxel features across different windows, eliminating the need for extensive window shifting. Our proposed ScatterFormer achieves 73.8 mAP (L2) on the Waymo Open Dataset and 72.4 NDS on the NuScenes dataset, running at an outstanding detection rate of 23 FPS. The code is available at https://github.com/skyhehe123/ScatterFormer.
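The idea described in the abstract can be illustrated with a minimal PyTorch sketch of window-wise linear attention over a flat voxel sequence: per-window key-value statistics are accumulated with a scatter-add and then gathered back per voxel, so no sorting or padding into fixed-length sequences is needed. This is an illustrative sketch under assumptions, not the authors' implementation; the function name, the ReLU feature map, and the use of index_add_ (in place of the chunk-wise shared-memory kernel described in the paper) are choices made here for clarity.

import torch

def scattered_linear_attention(q, k, v, win_idx, num_windows, eps=1e-6):
    # q, k: (N, H, D); v: (N, H, E); win_idx: (N,) long tensor giving each voxel's window id.
    phi_q, phi_k = torch.relu(q), torch.relu(k)  # simple positive feature map
    # Per-window sums of the outer products phi(k) v^T and of phi(k), via scatter-add.
    kv = torch.zeros(num_windows, q.size(1), q.size(2), v.size(2), device=q.device, dtype=q.dtype)
    kv.index_add_(0, win_idx, torch.einsum('nhd,nhe->nhde', phi_k, v))
    z = torch.zeros(num_windows, q.size(1), q.size(2), device=q.device, dtype=q.dtype)
    z.index_add_(0, win_idx, phi_k)
    # Gather each voxel's window statistics and normalize.
    out = torch.einsum('nhd,nhde->nhe', phi_q, kv[win_idx])
    den = torch.einsum('nhd,nhd->nh', phi_q, z[win_idx]).unsqueeze(-1) + eps
    return out / den

# Example usage with random voxel features (N voxels, H heads, W windows):
# q = k = torch.randn(1000, 4, 32); v = torch.randn(1000, 4, 32)
# out = scattered_linear_attention(q, k, v, torch.randint(0, 64, (1000,)), 64)  # (1000, 4, 32)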
Pages: 74 - 92
Number of Pages: 19