MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Cited by: 271
Authors
Wang, Huiyu [1, 3]
Zhu, Yukun [2]
Adam, Hartwig [2]
Yuille, Alan [1]
Chen, Liang-Chieh [2]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Google Res, Mountain View, CA USA
[3] Google, Mountain View, CA 94043 USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
DOI
10.1109/CVPR46437.2021.00542
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.
Pages: 5459-5470
Page count: 12
Related papers
50 in total
  • [41] Semantic Mask for Transformer based End-to-End Speech Recognition
    Wang, Chengyi
    Wu, Yu
    Du, Yujiao
    Li, Jinyu
    Liu, Shujie
    Lu, Liang
    Ren, Shuo
    Ye, Guoli
    Zhao, Sheng
    Zhou, Ming
    INTERSPEECH 2020, 2020, : 971 - 975
  • [42] ColorRL: Reinforced Coloring for End-to-End Instance Segmentation
    Tuan, Tran Anh
    Khoa, Nguyen Tuan
    Tran Minh Quan
    Jeong, Won-Ki
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 16722 - 16731
  • [43] Evaluating Subtitle Segmentation for End-to-end Generation Systems
    Karakanta, Alina
    Buet, Franc
    Cettolo, Mauro
    Yvon, Francois
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3069 - 3078
  • [44] An end-to-end generative framework for video segmentation and recognition
    Kuehne, Hilde
    Gall, Juergen
    Serre, Thomas
    2016 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2016), 2016,
  • [45] End-to-End Segmentation-based News Summarization
    Liu, Yang
    Zhu, Chenguang
    Zeng, Michael
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 544 - 554
  • [46] Learned Watershed: End-to-End Learning of Seeded Segmentation
    Wolf, Steffen
    Schott, Lukas
    Koethe, Ullrich
    Hamprecht, Fred
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2030 - 2038
  • [47] End-to-End Simultaneous Speech Translation with Differentiable Segmentation
    Zhang, Shaolei
    Feng, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 7659 - 7680
  • [48] Liver Segmentation: A Weakly End-to-End Supervised Model
    Ouassit, Youssef
    Ardchir, Soufiane
    Moulouki, Reda
    El Ghoumari, Mohammed Yassine
    Azzouazi, Mohamed
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2020, 16 (09) : 77 - 87
  • [49] UniInst: Unique representation for end-to-end instance segmentation
    Ou, Yimin
    Yang, Rui
    Ma, Lufan
    Liu, Yong
    Yan, Jiangpeng
    Xu, Shang
    Wang, Chengjie
    Li, Xiu
    NEUROCOMPUTING, 2022, 514 : 551 - 562
  • [50] An End-to-End Tree Based Approach for Instance Segmentation
    Manohar, K. V.
    Niitani, Yusuke
    COMPUTER VISION - ECCV 2018 WORKSHOPS, PT V, 2019, 11133 : 521 - 527