ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention

Cited: 0
|
Authors
He, Chenhang [1 ]
Li, Ruihuang [1 ,2 ]
Zhang, Guowen [1 ]
Zhang, Lei [1 ,2 ]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] OPPO Res, Shenzhen, Peoples R China
Source
Keywords
3D Object Detection; Voxel Transformer;
DOI
10.1007/978-3-031-73397-0_5
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Window-based transformers excel in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner. However, the sparse nature of point clouds leads to a significant variance in the number of voxels per window. Existing methods group the voxels in each window into fixed-length sequences through extensive sorting and padding operations, resulting in non-negligible computational and memory overhead. In this paper, we introduce ScatterFormer, which, to the best of our knowledge, is the first to directly apply attention to voxels across different windows as a single sequence. The key to ScatterFormer is a Scattered Linear Attention (SLA) module, which leverages the pre-computation of key-value pairs in linear attention to enable parallel computation on the variable-length voxel sequences divided by windows. Leveraging the hierarchical structure of GPUs and shared memory, we propose a chunk-wise algorithm that reduces the SLA module's latency to less than 1 millisecond on moderate GPUs. Furthermore, we develop a cross-window interaction module that improves the locality and connectivity of voxel features across different windows, eliminating the need for extensive window shifting. Our proposed ScatterFormer achieves 73.8 mAP (L2) on the Waymo Open Dataset and 72.4 NDS on the NuScenes dataset, running at an outstanding detection rate of 23 FPS. The code is available at https://github.com/skyhehe123/ScatterFormer.
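The idea described in the abstract can be illustrated with a minimal PyTorch sketch of window-wise linear attention over a flat voxel sequence: per-window key-value statistics are accumulated with a scatter-add and then gathered back per voxel, so no sorting or padding into fixed-length sequences is needed. This is an illustrative sketch under assumptions, not the authors' implementation; the function name, the ReLU feature map, and the use of index_add_ (in place of the chunk-wise shared-memory kernel described in the paper) are choices made here for clarity.

import torch

def scattered_linear_attention(q, k, v, win_idx, num_windows, eps=1e-6):
    # q, k: (N, H, D); v: (N, H, E); win_idx: (N,) long tensor giving each voxel's window id.
    phi_q, phi_k = torch.relu(q), torch.relu(k)  # simple positive feature map
    # Per-window sums of the outer products phi(k) v^T and of phi(k), via scatter-add.
    kv = torch.zeros(num_windows, q.size(1), q.size(2), v.size(2), device=q.device, dtype=q.dtype)
    kv.index_add_(0, win_idx, torch.einsum('nhd,nhe->nhde', phi_k, v))
    z = torch.zeros(num_windows, q.size(1), q.size(2), device=q.device, dtype=q.dtype)
    z.index_add_(0, win_idx, phi_k)
    # Gather each voxel's window statistics and normalize.
    out = torch.einsum('nhd,nhde->nhe', phi_q, kv[win_idx])
    den = torch.einsum('nhd,nhd->nh', phi_q, z[win_idx]).unsqueeze(-1) + eps
    return out / den

# Example usage with random voxel features (N voxels, H heads, W windows):
# q = k = torch.randn(1000, 4, 32); v = torch.randn(1000, 4, 32)
# out = scattered_linear_attention(q, k, v, torch.randint(0, 64, (1000,)), 64)  # (1000, 4, 32)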
Pages: 74 - 92
Number of Pages: 19