FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Cited by: 28

Authors:

Liu, Zhijian [1]

Yang, Xinyu [1,2]

Tang, Haotian [1]

Yang, Shang [1,3]

Han, Song [1]

Affiliations:

[1] MIT, Cambridge, MA 02139 USA

[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China

[3] Tsinghua Univ, Beijing, Peoples R China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023

Funding:

National Science Foundation (USA);

Keywords:

VISION;

DOI:

10.1109/CVPR52729.2023.00122

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind sparse convolution-based models (3x slower), hindering their usage in resource-constrained, latency-sensitive applications (such as autonomous driving). This inefficiency comes from point clouds' sparse and irregular nature, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal sizes rather than windows of equal shapes. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on Waymo Open Dataset with 4.6x speedup over (transformer-based) SST and 1.4x speedup over (sparse convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.
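The pipeline described in the abstract (window-based sorting, equal-size groups, attention within groups, alternating sort axes, shifted windows) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 2D coordinates, the half-window shift, the padding by repeating indices, and the weight-free single-head attention are all assumptions made for illustration.

```python
import numpy as np


def flatten_and_group(coords, window=(4.0, 4.0), group_size=8,
                      axis_major="x", shift=False):
    """Sort 2D points window by window, then cut them into equal-size groups.

    Returns point indices with shape (num_groups, group_size).
    Hypothetical helper; assumes at least one point.
    """
    coords = np.asarray(coords, dtype=np.float64)
    size = np.asarray(window, dtype=np.float64)
    # Assumption: "shifting" moves the window grid by half a window.
    offset = size / 2 if shift else np.zeros(2)
    win = np.floor((coords + offset) / size).astype(np.int64)
    x, y = coords[:, 0], coords[:, 1]
    # np.lexsort treats the LAST key as the primary sort key.
    if axis_major == "x":
        keys = (y, x, win[:, 1], win[:, 0])
    else:
        keys = (x, y, win[:, 0], win[:, 1])
    order = np.lexsort(keys)
    # Assumption: pad the tail by cyclically repeating indices so that
    # every group has exactly group_size points.
    pad = (-len(order)) % group_size
    order = np.resize(order, len(order) + pad)
    return order.reshape(-1, group_size)


def group_self_attention(features, groups):
    """Softmax self-attention inside each group (no learned projections)."""
    features = np.asarray(features, dtype=np.float64)
    x = features[groups]                                   # (G, S, C)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)           # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ x                                      # (G, S, C)
    result = np.zeros_like(features)
    # Scatter back to the original point order; padded duplicates overwrite.
    result[groups.reshape(-1)] = out.reshape(-1, features.shape[-1])
    return result


rng = np.random.default_rng(0)
points = rng.uniform(0.0, 16.0, size=(30, 2))
feats = rng.normal(size=(30, 4))
# Alternate the sorting axis and toggle the window shift between layers.
for axis, shift in [("x", False), ("y", False), ("x", True), ("y", True)]:
    groups = flatten_and_group(points, axis_major=axis, shift=shift)
    feats = group_self_attention(feats, groups)
```

Because every group holds the same number of points, the attention step runs as one dense batched matrix product, which is the regularity the abstract trades spatial proximity for.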

Pages: 1200 - 1211

Page count: 12

50 items in total

[1] PatchFormer: An Efficient Point Transformer with Patch Attention
Zhang, Cheng
Wan, Haocheng
Shen, Xinyi
Wu, Zizhao
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11789 - 11798
[2] SWPT: Spherical Window-Based Point Cloud Transformer
Guo, Xindong
Sun, Yu
Zhao, Rong
Kuang, Liqun
Han, Xie
COMPUTER VISION - ACCV 2022, PT I, 2023, 13841 : 396 - 412
[3] OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer
Gao, Jun-Jie
Dong, Qiu-Jie
Wang, Rui-An
Chen, Shuang-Min
Xin, Shi-Qing
Tu, Chang-He
Wang, Wenping
Journal of Computer Science and Technology, 2024, 39 (04) : 755 - 770
[4] Multiscale geometric window transformer for orthodontic teeth point cloud registration
Wang, Hao
Tian, Yan
Xu, Yongchuan
Xu, Jiahui
Yang, Tao
Lu, Yan
Chen, Hong
MULTIMEDIA SYSTEMS, 2024, 30 (03)
[5] PointSwin: Modeling Self-Attention with Shifted Window on Point Cloud
Jiang, Cheng
Peng, Yuanxi
Tang, Xuebin
Li, Chunchao
Li, Teng
APPLIED SCIENCES-BASEL, 2022, 12 (24):
[6] PReFormer: A memory-efficient transformer for point cloud semantic segmentation
Akwensi, Perpetual Hope
Wang, Ruisheng
Guo, Bo
INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 128
[7] PCT: Point cloud transformer
Meng-Hao Guo
Jun-Xiong Cai
Zheng-Ning Liu
Tai-Jiang Mu
Ralph R. Martin
Shi-Min Hu
Computational Visual Media, 2021, 7 (02) : 187 - 199
[8] PCT: Point cloud transformer
Guo, Meng-Hao
Cai, Jun-Xiong
Liu, Zheng-Ning
Mu, Tai-Jiang
Martin, Ralph R.
Hu, Shi-Min
COMPUTATIONAL VISUAL MEDIA, 2021, 7 (02) : 187 - 199
[9] PCT: Point cloud transformer
Meng-Hao Guo
Jun-Xiong Cai
Zheng-Ning Liu
Tai-Jiang Mu
Ralph R. Martin
Shi-Min Hu
Computational Visual Media, 2021, 7 : 187 - 199
[10] Transformer Tracking with Cyclic Shifting Window Attention
Song, Zikai
Yu, Junqing
Chen, Yi-Ping Phoebe
Yang, Wei
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8781 - 8790
