ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Cited by: 0
Authors
He, Chenhang [1]
Li, Ruihuang [1,2]
Zhang, Guowen [1]
Zhang, Lei [1,2]
Affiliations
[1] The Hong Kong Polytechnic University, Hong Kong, China
[2] OPPO Research, Shenzhen, China
Keywords
3D Object Detection; Voxel Transformer
DOI
10.1007/978-3-031-73397-0_5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Window-based transformers excel in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner. However, the sparse nature of point clouds leads to a significant variance in the number of voxels per window. Existing methods group the voxels in each window into fixed-length sequences through extensive sorting and padding operations, resulting in a non-negligible computational and memory overhead. In this paper, we introduce ScatterFormer, which, to the best of our knowledge, is the first to directly apply attention to voxels across different windows as a single sequence. The key to ScatterFormer is a Scattered Linear Attention (SLA) module, which leverages the pre-computation of key-value pairs in linear attention to enable parallel computation on the variable-length voxel sequences divided by windows. Leveraging the hierarchical structure of GPUs and shared memory, we propose a chunk-wise algorithm that reduces the SLA module's latency to less than 1 millisecond on moderate GPUs. Furthermore, we develop a cross-window interaction module that improves the locality and connectivity of voxel features across different windows, eliminating the need for extensive window shifting. Our proposed ScatterFormer achieves 73.8 mAP (L2) on the Waymo Open Dataset and 72.4 NDS on the NuScenes dataset, running at an outstanding detection rate of 23 FPS. The code is available at https://github.com/skyhehe123/ScatterFormer.
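
The abstract describes SLA as window-wise linear attention in which key-value products are pre-computed per window so that variable-length voxel sequences can be processed in parallel. The following is a minimal PyTorch sketch of that idea, assuming a flat voxel sequence with a per-voxel window_ids index; the function name, the elu+1 feature map, and the gather-based formulation are illustrative assumptions, not the authors' chunk-wise shared-memory kernel.

    import torch

    def scattered_linear_attention(q, k, v, window_ids):
        # Hypothetical signature for illustration (not the authors' CUDA kernel).
        # q, k, v:     (N, H, D) voxel queries/keys/values, N voxels in total
        # window_ids:  (N,) integer id of the window each voxel belongs to
        q = torch.nn.functional.elu(q) + 1   # assumed non-negative feature map
        k = torch.nn.functional.elu(k) + 1

        num_windows = int(window_ids.max()) + 1
        H, D = q.shape[1], q.shape[2]

        # Pre-compute per-window key-value products: sum_j k_j v_j^T -> (W, H, D, D)
        kv = q.new_zeros(num_windows, H, D, D)
        kv.index_add_(0, window_ids, torch.einsum('nhd,nhe->nhde', k, v))

        # Per-window normalizer: sum_j k_j -> (W, H, D)
        z = q.new_zeros(num_windows, H, D)
        z.index_add_(0, window_ids, k)

        # Each voxel attends within its own window; the cost is O(N * D^2),
        # independent of how many voxels any single window contains.
        num = torch.einsum('nhd,nhde->nhe', q, kv[window_ids])
        den = torch.einsum('nhd,nhd->nh', q, z[window_ids]).clamp_min(1e-6)
        return num / den.unsqueeze(-1)

    # Example: 6 voxels spread over 2 windows, 4 heads, 16 channels per head.
    feats = torch.randn(6, 4, 16)
    ids = torch.tensor([0, 0, 0, 1, 1, 1])
    out = scattered_linear_attention(feats, feats, feats, ids)   # (6, 4, 16)

Because the per-window sums are accumulated with scatter-style additions over a single flat sequence, no sorting or padding to a fixed window length is needed, which is the property the paper's chunk-wise algorithm exploits on the GPU.
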
Pages: 74-92
Number of pages: 19
Related Papers (showing items 31-40 of 50)
  • [31] Du, Jing; Tang, Manting; Zhao, Li. Transformer-like Model with Linear Attention for Speech Emotion Recognition. Journal of Southeast University (English Edition), 2021, 37(2): 164-170.
  • [32] Lu, Haodong; Mei, Qichang; Wang, Kun. An Efficient Piecewise Linear Approximation of Non-linear Operations for Transformer Inference. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2023: 206.
  • [33] Tomida, Yuto; Katayama, Takafumi; Song, Tian; Shimamoto, Takashi. Efficient Deraining Model Using Transformer and Kernel Basis Attention for UAVs. 2024 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), 2024.
  • [34] Deng, Anping; Han, Guangliang; Zhang, Zhongbo; Chen, Dianbing; Ma, Tianjiao; Liu, Zhichao. Cross-Parallel Attention and Efficient Match Transformer for Aerial Tracking. Remote Sensing, 2024, 16(6).
  • [35] Yi, Zengrui; Meng, Hua; Gao, Lu; He, Zhonghang; Yang, Meng. Efficient Convolutional Dual-Attention Transformer for Automatic Modulation Recognition. Applied Intelligence, 2025, 55(3).
  • [36] Pu, Yifan; Xia, Zhuofan; Guo, Jiayi; Han, Dongchen; Li, Qixiu; Li, Duo; Yuan, Yuhui; Li, Ji; Han, Yizeng; Song, Shiji; Huang, Gao; Li, Xiu. Efficient Diffusion Transformer with Step-Wise Dynamic Attention Mediators. Computer Vision - ECCV 2024, Part XV, 2025, 15073: 424-441.
  • [37] Lee, Eunho; Hwang, Youngbae. Decomformer: Decompose Self-Attention of Transformer for Efficient Image Restoration. IEEE Access, 2024, 12: 38672-38684.
  • [38] Qin, Bosheng; Li, Juncheng; Tang, Siliang; Zhuang, Yueting. DBA: Efficient Transformer With Dynamic Bilinear Low-Rank Attention. IEEE Transactions on Neural Networks and Learning Systems, 2025.
  • [39] Zheng, Ruizhe; Li, Jun; Wang, Yi; Luo, Tian; Yu, Yuguo. ScatterFormer: Locally-Invariant Scattering Transformer for Patient-Independent Multispectral Detection of Epileptiform Discharges. Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 1, 2023: 148-158.
  • [40] Suwanwimolkul, Suwichaya; Komorita, Satoshi. Efficient Linear Attention for Fast and Accurate Keypoint Matching. Proceedings of the 2022 International Conference on Multimedia Retrieval (ICMR 2022), 2022: 330-341.