MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Cited by: 271
Authors
Wang, Huiyu [1, 3]
Zhu, Yukun [2]
Adam, Hartwig [2]
Yuille, Alan [1]
Chen, Liang-Chieh [2]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Google Res, Mountain View, CA USA
[3] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
DOI
10.1109/CVPR46437.2021.00542
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.
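The abstract mentions training "via bipartite matching" between predicted and ground-truth class-labeled masks. The sketch below is one possible illustration of that idea, not the authors' implementation. It assumes a similarity equal to the predicted probability of the true class multiplied by the Dice coefficient of the two masks, and it finds the best one-to-one assignment by brute force. The names `dice`, `similarity`, `match`, and the dictionary layout are hypothetical and chosen only for this example.

```python
from itertools import permutations


def dice(mask_a, mask_b):
    """Dice coefficient between two flat masks with values in [0, 1]."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * intersection / total if total > 0 else 0.0


def similarity(gt, pred):
    """Assumed similarity: probability of the true class times mask Dice."""
    return pred["class_probs"].get(gt["label"], 0.0) * dice(gt["mask"], pred["mask"])


def match(gts, preds):
    """Brute-force one-to-one matching that maximizes total similarity.

    Returns (gt_index, pred_index) pairs. Assumes len(preds) >= len(gts).
    """
    best_score, best_assignment = -1.0, ()
    for assignment in permutations(range(len(preds)), len(gts)):
        score = sum(similarity(gts[i], preds[j]) for i, j in enumerate(assignment))
        if score > best_score:
            best_score, best_assignment = score, assignment
    return list(enumerate(best_assignment))


# Toy example: two ground-truth masks, three predictions on a 4-pixel image.
gts = [
    {"label": "cat", "mask": [1, 1, 0, 0]},
    {"label": "sky", "mask": [0, 0, 1, 1]},
]
preds = [
    {"class_probs": {"sky": 0.9, "cat": 0.1}, "mask": [0, 0, 1, 1]},
    {"class_probs": {"cat": 0.8, "sky": 0.2}, "mask": [1, 1, 0, 0]},
    {"class_probs": {"cat": 0.5, "sky": 0.5}, "mask": [0, 1, 1, 0]},
]
pairs = match(gts, preds)
```

Brute force is used only to keep the sketch dependency-free; for realistic numbers of masks, an assignment solver such as the Hungarian algorithm would be the usual choice.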
Pages: 5459-5470
Number of pages: 12