MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Cited by: 271
Authors
Wang, Huiyu [1, 3]
Zhu, Yukun [2 ]
Adam, Hartwig [2 ]
Yuille, Alan [1 ]
Chen, Liang-Chieh [2 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Google Res, Mountain View, CA USA
[3] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
DOI
10.1109/CVPR46437.2021.00542
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.
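The abstract reports results in PQ (panoptic quality) and describes the training loss only as "panoptic quality inspired". For readers unfamiliar with the metric, the following is a minimal sketch of the standard PQ definition: a predicted segment matches a ground-truth segment of the same class when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by TP + 0.5 FP + 0.5 FN. It is not the MaX-DeepLab loss or the authors' code. The names `Segment`, `iou`, `panoptic_quality`, `toy_gt`, and `toy_pred` are illustrative, and the sketch assumes segments within one image do not overlap.

```python
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]
# A segment is (class id, set of pixel coordinates); hypothetical representation.
Segment = Tuple[int, FrozenSet[Pixel]]


def iou(a: FrozenSet[Pixel], b: FrozenSet[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred: List[Segment], gt: List[Segment]) -> float:
    """Standard PQ for one image, assuming non-overlapping segments."""
    matched_pred = set()
    iou_sum = 0.0
    for g_cls, g_mask in gt:
        for pi, (p_cls, p_mask) in enumerate(pred):
            if pi in matched_pred or p_cls != g_cls:
                continue
            score = iou(p_mask, g_mask)
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if score > 0.5:
                matched_pred.add(pi)
                iou_sum += score
                break
    tp = len(matched_pred)
    fp = len(pred) - tp  # predictions without a matching ground truth
    fn = len(gt) - tp    # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# Toy example: one good match (IoU 0.75) and one prediction with the wrong class.
toy_gt: List[Segment] = [
    (1, frozenset({(0, 0), (0, 1), (0, 2), (0, 3)})),
    (2, frozenset({(1, 0), (1, 1)})),
]
toy_pred: List[Segment] = [
    (1, frozenset({(0, 0), (0, 1), (0, 2)})),
    (3, frozenset({(1, 0), (1, 1)})),
]
toy_pq = panoptic_quality(toy_pred, toy_gt)  # 0.75 / (1 + 0.5 + 0.5) = 0.375
```

In the toy example the wrongly classified prediction counts once as a false positive and its ground-truth segment once as a false negative, which is why a single class error halves the denominator-normalized score.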
Pages: 5459-5470
Number of pages: 12
Related papers
50 in total
  • [1] CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
    Yu, Qihang
    Wang, Huiyu
    Kim, Dahun
    Qiao, Siyuan
    Collins, Maxwell
    Zhu, Yukun
    Adam, Hartwig
    Yuille, Alan
    Chen, Liang-Chieh
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2550 - 2560
  • [2] An End-to-End Network for Panoptic Segmentation
    Liu, Huanyu
    Peng, Chao
    Yu, Changqian
    Wang, Jingbo
    Liu, Xu
    Yu, Gang
    Jiang, Wei
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6165 - 6174
  • [3] Panoster: End-to-End Panoptic Segmentation of LiDAR Point Clouds
    Gasperini, Stefano
    Mahani, Mohammad-Ali Nikouei
    Marcos-Ramiro, Alvaro
    Navab, Nassir
    Tombari, Federico
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 3216 - 3223
  • [4] End-to-End Video Instance Segmentation with Transformers
    Wang, Yuqing
    Xu, Zhaoliang
    Wang, Xinlong
    Shen, Chunhua
    Cheng, Baoshan
    Shen, Hao
    Xia, Huaxia
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8737 - 8746
  • [5] Mask DeepLab: End-to-end image segmentation for change detection in high-resolution remote sensing images
    Wang, Yanheng
    Gao, Lianru
    Hong, Danfeng
    Sha, Jianjun
    Liu, Lian
    Zhang, Bing
    Rong, Xianhui
    Zhang, Yonggang
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2021, 104
  • [6] Mask4D: End-to-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences
    Marcuzzi, Rodrigo
    Nunes, Lucas
    Wiesmann, Louis
    Marks, Elias
    Behley, Jens
    Stachniss, Cyrill
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (11) : 7487 - 7494
  • [7] EfficientDPS: Efficient and End-to-End Depth-aware Panoptic Segmentation
    Wu, Shengkai
    Ren, Liangliang
    Gao, Linfeng
    Li, Yupeng
    Liu, Wenyu
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024, : 16199 - 16206
  • [8] Segmentation mask guided end-to-end person search
    Zheng, Dingyuan
    Xiao, Jimin
    Huang, Kaizhu
    Zhao, Yao
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2020, 86
  • [9] End-to-End Referring Video Object Segmentation with Multimodal Transformers
    Botach, Adam
    Zheltonozhskii, Evgenii
    Baskin, Chaim
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4975 - 4985
  • [10] Panoptic Segmentation with an End-to-End Cell R-CNN for Pathology Image Analysis
    Zhang, Donghao
    Song, Yang
    Liu, Dongnan
    Jia, Haozhe
    Liu, Siqi
    Xia, Yong
    Huang, Heng
    Cai, Weidong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2018, PT II, 2018, 11071 : 237 - 244