MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Cited by: 271
Authors
Wang, Huiyu [1, 3]
Zhu, Yukun [2]
Adam, Hartwig [2]
Yuille, Alan [1]
Chen, Liang-Chieh [2]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Google Res, Mountain View, CA USA
[3] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
DOI
10.1109/CVPR46437.2021.00542
Chinese Library Classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.
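The PQ figures quoted in the abstract refer to the panoptic quality metric. As a minimal sketch, the standard per-class PQ definition (sum of IoUs over matched segments, divided by TP + 0.5 FP + 0.5 FN) can be written as below. This illustrates the evaluation metric only, not the paper's PQ-inspired training loss or its bipartite matching; the function names and the representation of masks as sets of pixel indices are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two masks given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred, gt):
    """PQ for a single class.

    pred, gt: lists of masks (sets of pixel indices). Masks within each list
    are assumed not to overlap, as in panoptic segmentation, so the
    IoU > 0.5 rule yields at most one match per segment.
    """
    matched_gt = set()
    iou_sum = 0.0
    tp = 0
    for p in pred:
        for j, g in enumerate(gt):
            if j in matched_gt:
                continue
            score = iou(p, g)
            if score > 0.5:  # a pair counts as a true positive only above 0.5 IoU
                matched_gt.add(j)
                iou_sum += score
                tp += 1
                break
    fp = len(pred) - tp  # predicted segments with no match
    fn = len(gt) - tp    # ground-truth segments with no match
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# Toy example: one good match (IoU 0.75), one false positive, one false negative.
gt_masks = [{0, 1, 2, 3}, {4, 5}]
pred_masks = [{0, 1, 2}, {6, 7}]
print(panoptic_quality(pred_masks, gt_masks))  # 0.375
```

In the toy example the single matched pair contributes an IoU of 0.75, and the denominator is 1 + 0.5 + 0.5 = 2, giving a PQ of 0.375.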
Pages: 5459-5470
Number of pages: 12
Related papers
50 results
  • [21] End-to-End diagnosis of breast biopsy images with transformers
    Mehta, Sachin
    Lu, Ximing
    Wu, Wenjun
    Weaver, Donald
    Hajishirzi, Hannaneh
    Elmore, Joann G.
    Shapiro, Linda G.
    MEDICAL IMAGE ANALYSIS, 2022, 79
  • [22] RETR: END-TO-END REFERRING EXPRESSION COMPREHENSION WITH TRANSFORMERS
    Rui, Yang
    2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022
  • [23] Towards End-to-End Image Compression and Analysis with Transformers
    Bai, Yuanchao
    Yang, Xu
    Liu, Xianming
    Jiang, Junjun
    Wang, Yaowei
    Ji, Xiangyang
    Gao, Wen
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 104 - 112
  • [24] TRANSBUILDING: AN END-TO-END POLYGONAL BUILDING EXTRACTION WITH TRANSFORMERS
    Zhang, Mingming
    Liu, Qingjie
    Wang, Wei
    Wang, Yunhong
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 460 - 464
  • [25] REGTR: End-to-end Point Cloud Correspondences with Transformers
    Yew, Zi Jian
    Lee, Gim Hee
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 6667 - 6676
  • [26] On the Use of Transformers for End-to-End Optical Music Recognition
    Rios-Vila, Antonio
    Inesta, Jose M.
    Calvo-Zaragoza, Jorge
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022), 2022, 13256 : 470 - 481
  • [27] VPDETR: End-to-End Vanishing Point DEtection TRansformers
    Chen, Taiyan
    Ying, Xianghua
    Yang, Jinfa
    Wang, Ruibin
    Guo, Ruohao
    Xing, Bowei
    Shi, Ji
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1192 - 1200
  • [28] End-to-End Graph-Constrained Vectorized Floorplan Generation with Panoptic Refinement
    Liu, Jiachen
    Xue, Yuan
    Duarte, Jose
    Shekhawat, Krishnendra
    Zhou, Zihan
    Huang, Xiaolei
    COMPUTER VISION - ECCV 2022, PT XV, 2022, 13675 : 547 - 562
  • [29] Modeling Stroke Mask for End-to-End Text Erasing
    Du, Xiangcheng
    Zhou, Zhao
    Zheng, Yingbin
    Ma, Tianlong
    Wu, Xingjiao
    Jin, Cheng
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 6140 - 6148
  • [30] End-to-End Supervised Lung Lobe Segmentation
    Ferreira, Filipe T.
    Sousa, Patrick
    Galdran, Adrian
    Sousa, Marta R.
    Campilho, Aurelio
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018