FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer

Cited by: 28
Authors
Liu, Zhijian [1]
Yang, Xinyu [1, 2]
Tang, Haotian [1]
Yang, Shang [1, 3]
Han, Song [1]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Shanghai Jiao Tong Univ, Shanghai, People's Republic of China
[3] Tsinghua Univ, Beijing, People's Republic of China
Funding
National Science Foundation (USA);
Keywords
VISION;
DOI
10.1109/CVPR52729.2023.00122
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformers, as an alternative to CNNs, have proven effective in many modalities (e.g., text and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind that of sparse convolution-based models (3x slower), hindering their use in resource-constrained, latency-sensitive applications such as autonomous driving. This inefficiency comes from the sparse and irregular nature of point clouds, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal sizes rather than windows of equal shapes. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate the sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on the Waymo Open Dataset with a 4.6x speedup over (transformer-based) SST and a 1.4x speedup over (sparse convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks.
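The grouping idea in the abstract (sort points window by window, cut the sorted sequence into equal-size groups, attend within each group) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `flattened_window_groups` and `group_self_attention`, the 2D integer coordinates, the wrap-around padding of the last group, and the projection-free attention are all assumptions made for illustration only.

```python
import numpy as np


def flattened_window_groups(coords, window_shape, group_size,
                            axis_order=(0, 1), shift=(0, 0)):
    """Sort points window by window, then split into equal-size groups.

    coords: (N, 2) integer array of (x, y) voxel coordinates (assumed 2D).
    axis_order: (major, minor) axes; swapping them alternates the sorting axis.
    shift: offset added before windowing, to mimic shifted windows.
    Returns (order, groups), where groups has shape (num_groups, group_size).
    """
    coords = np.asarray(coords) + np.asarray(shift)
    window_shape = np.asarray(window_shape)
    win = coords // window_shape    # index of the window each point falls in
    local = coords % window_shape   # position of the point inside its window
    major, minor = axis_order
    # np.lexsort treats the LAST key as the most significant one.
    order = np.lexsort((local[:, minor], local[:, major],
                        win[:, minor], win[:, major]))
    n = len(order)
    padded_len = -(-n // group_size) * group_size  # round up to a multiple
    # Assumption: fill the last group by wrapping around to the first points.
    padded = order[np.arange(padded_len) % n]
    return order, padded.reshape(-1, group_size)


def group_self_attention(features, groups):
    """Plain softmax self-attention inside each group (no learned projections)."""
    x = features[groups]                               # (G, S, C)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                                 # (G, S, C)


coords = np.array([[0, 0], [5, 5], [1, 1], [4, 0], [0, 4], [2, 3]])
features = np.random.default_rng(0).normal(size=(6, 8))
order, groups = flattened_window_groups(coords, (4, 4), group_size=4)
out = group_self_attention(features, groups)
```

Because every group holds exactly `group_size` points, the attention runs as one dense batched matrix product regardless of how many points each spatial window contains. Calling the function again with `axis_order=(1, 0)` or a non-zero `shift` gives a different grouping of the same points.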
Pages: 1200-1211
Number of pages: 12
Related papers
50 in total
  • [1] PatchFormer: An Efficient Point Transformer with Patch Attention
    Zhang, Cheng
    Wan, Haocheng
    Shen, Xinyi
    Wu, Zizhao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11789 - 11798
  • [2] SWPT: Spherical Window-Based Point Cloud Transformer
    Guo, Xindong
    Sun, Yu
    Zhao, Rong
    Kuang, Liqun
    Han, Xie
    COMPUTER VISION - ACCV 2022, PT I, 2023, 13841 : 396 - 412
  • [3] OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer
    Gao, Jun-Jie
    Dong, Qiu-Jie
    Wang, Rui-An
    Chen, Shuang-Min
    Xin, Shi-Qing
    Tu, Chang-He
    Wang, Wenping
    Journal of Computer Science and Technology, 2024, 39 (04) : 755 - 770
  • [4] Multiscale geometric window transformer for orthodontic teeth point cloud registration
    Wang, Hao
    Tian, Yan
    Xu, Yongchuan
    Xu, Jiahui
    Yang, Tao
    Lu, Yan
    Chen, Hong
    MULTIMEDIA SYSTEMS, 2024, 30 (03)
  • [5] PointSwin: Modeling Self-Attention with Shifted Window on Point Cloud
    Jiang, Cheng
    Peng, Yuanxi
    Tang, Xuebin
    Li, Chunchao
    Li, Teng
    APPLIED SCIENCES-BASEL, 2022, 12 (24):
  • [6] PReFormer: A memory-efficient transformer for point cloud semantic segmentation
    Akwensi, Perpetual Hope
    Wang, Ruisheng
    Guo, Bo
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 128
  • [7] PCT: Point cloud transformer
    Meng-Hao Guo
    Jun-Xiong Cai
    Zheng-Ning Liu
    Tai-Jiang Mu
    Ralph R. Martin
    Shi-Min Hu
    Computational Visual Media, 2021, 7 (02) : 187 - 199
  • [8] PCT: Point cloud transformer
    Guo, Meng-Hao
    Cai, Jun-Xiong
    Liu, Zheng-Ning
    Mu, Tai-Jiang
    Martin, Ralph R.
    Hu, Shi-Min
    COMPUTATIONAL VISUAL MEDIA, 2021, 7 (02) : 187 - 199
  • [9] PCT: Point cloud transformer
    Meng-Hao Guo
    Jun-Xiong Cai
    Zheng-Ning Liu
    Tai-Jiang Mu
    Ralph R. Martin
    Shi-Min Hu
    Computational Visual Media, 2021, 7 : 187 - 199
  • [10] Transformer Tracking with Cyclic Shifting Window Attention
    Song, Zikai
    Yu, Junqing
    Chen, Yi-Ping Phoebe
    Yang, Wei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8781 - 8790