CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

Cited by: 42
Authors
Yu, Qihang [1, 4]
Wang, Huiyu [1]
Kim, Dahun [2]
Qiao, Siyuan [3]
Collins, Maxwell [3]
Zhu, Yukun [3]
Adam, Hartwig [3]
Yuille, Alan [1]
Chen, Liang-Chieh [3]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[3] Google Res, Mountain View, CA USA
[4] Google, Mountain View, CA 94043 USA
DOI
10.1109/CVPR52688.2022.00259
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. It rethinks the existing transformer architectures used in segmentation and detection; CMT-DeepLab considers the object queries as cluster centers, which fill the role of grouping the pixels when applied to segmentation. The clustering is computed with an alternating procedure, by first assigning pixels to the clusters by their feature affinity, and then updating the cluster centers and pixel features. Together, these operations comprise the Clustering Mask Transformer (CMT) layer, which produces cross-attention that is denser and more consistent with the final segmentation task. CMT-DeepLab improves the performance over prior art significantly by 4.4% PQ, achieving a new state-of-the-art of 55.7% PQ on the COCO test-dev set.
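The alternating procedure described in the abstract (assign pixels to clusters by feature affinity, then update the cluster centers) can be illustrated with a minimal NumPy sketch. This is a generic soft k-means-style loop written under stated assumptions, not the CMT layer from the paper: the dot-product affinity, the softmax over clusters, the weighted-mean center update, and the names `cluster_step` and `cluster` are all hypothetical choices made for illustration, and the pixel-feature update mentioned in the abstract is omitted.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cluster_step(pixels, centers):
    """One alternating step (illustrative assumption, not the paper's layer).

    pixels:  (N, D) array of per-pixel feature vectors.
    centers: (K, D) array of cluster centers (the role played by object queries).
    Returns the soft assignment (N, K) and the updated centers (K, D).
    """
    # Step 1: assign pixels to clusters by feature affinity.
    affinity = pixels @ centers.T                 # (N, K) dot-product affinity
    assign = softmax(affinity, axis=1)            # each pixel's weights sum to 1

    # Step 2: update each center as the assignment-weighted mean of the pixels.
    weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
    new_centers = weights.T @ pixels              # (K, D)
    return assign, new_centers


def cluster(pixels, centers, num_steps=3):
    """Run a few alternating steps and return hard labels and final centers."""
    for _ in range(num_steps):
        assign, centers = cluster_step(pixels, centers)
    return assign.argmax(axis=1), centers
```

Calling `cluster(pixels, centers)` with an `(N, D)` feature array and `(K, D)` initial centers returns one cluster index per pixel. In a segmentation setting those indices would correspond to mask identities, which is the sense in which the abstract treats object queries as cluster centers.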
Pages: 2550-2560
Page count: 11
Related papers
26 in total
  • [1] MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
    Wang, Huiyu
    Zhu, Yukun
    Adam, Hartwig
    Yuille, Alan
    Chen, Liang-Chieh
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 5459 - 5470
  • [2] Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
    Li, Zhiqi
    Wang, Wenhai
    Xie, Enze
    Yu, Zhiding
    Anandkumar, Anima
    Alvarez, Jose M.
    Luo, Ping
    Lu, Tong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 1270 - 1279
  • [3] DEEP MARKOV CLUSTERING FOR PANOPTIC SEGMENTATION
    Ye, Minxiang
    Zhang, Yifei
    Zhu, Shiqiang
    Xie, Anhuan
    Zhang, Dan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 2380 - 2384
  • [4] Time-Space Transformers for Video Panoptic Segmentation
    Petrovai, Andra
    Nedevschi, Sergiu
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022: 2643 - 2652
  • [5] ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
    Qiao, Siyuan
    Zhu, Yukun
    Adam, Hartwig
    Yuille, Alan
    Chen, Liang-Chieh
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 3996 - 4007
  • [6] A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
    Van Gansbeke, Wouter
    De Brabandere, Bert
    COMPUTER VISION - ECCV 2024, PT XV, 2025, 15073 : 78 - 97
  • [7] Mask-Pyramid Network: A Novel Panoptic Segmentation Method
    Xian, Peng-Fei
    Po, Lai-Man
    Xiong, Jing-Jing
    Zhao, Yu-Zhi
    Yu, Wing-Yin
    Cheung, Kwok-Wai
    SENSORS, 2024, 24 (05)
  • [8] Mask-Based Panoptic LiDAR Segmentation for Autonomous Driving
    Marcuzzi, Rodrigo
    Nunes, Lucas
    Wiesmann, Louis
    Behley, Jens
    Stachniss, Cyrill
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (02) : 1141 - 1148
  • [9] Panoptic Segmentation of UAV Images with Deformable Convolution Network and Mask Scoring
    Chen, Hongwei
    Ding, Laihui
    Yao, Fengqin
    Ren, Pengfei
    Wang, Shengke
    TWELFTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2020), 2021, 11720
  • [10] Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering
    Robert, Damien
    Raguet, Hugo
    Landrieu, Loic
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024: 179 - 189