MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Cited by: 271

Authors:

Wang, Huiyu [1,3]

Zhu, Yukun [2]

Adam, Hartwig [2]

Yuille, Alan [1]

Chen, Liang-Chieh [2]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Google Res, Mountain View, CA USA

[3] Google, Mountain View, CA 94043 USA

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

DOI:

10.1109/CVPR46437.2021.00542

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline that depends heavily on surrogate sub-tasks and hand-designed components, such as box detection, non-maximum suppression, thing-stuff merging, etc. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, our MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic quality inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layers. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves 3.0% PQ over DETR with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test time augmentation, achieves new state-of-the-art 51.3% PQ on COCO test-dev set.
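The abstract reports results in PQ (panoptic quality) and says the training loss is inspired by that metric. As a minimal sketch of the standard PQ metric only (not the paper's loss, and not its bipartite matching procedure), the following assumes each segment is given as a class id plus a set of pixel indices, and that segments within one prediction or one ground truth do not overlap. The names `Segment`, `iou`, and `panoptic_quality` are illustrative and do not come from the record.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (class_id, set of flat pixel indices).
Segment = Tuple[int, Set[int]]


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds: List[Segment], gts: List[Segment]) -> float:
    """PQ = (sum of IoU over matched pairs) / (TP + 0.5 * FP + 0.5 * FN).

    A prediction matches a ground-truth segment when the classes agree
    and IoU > 0.5. For non-overlapping segments this threshold makes
    the match unique, so a greedy scan is sufficient here.
    """
    matched_pred: Set[int] = set()
    true_positives = 0
    iou_sum = 0.0
    for g_cls, g_mask in gts:
        for pi, (p_cls, p_mask) in enumerate(preds):
            if pi in matched_pred or p_cls != g_cls:
                continue
            score = iou(p_mask, g_mask)
            if score > 0.5:
                matched_pred.add(pi)
                true_positives += 1
                iou_sum += score
                break
    false_positives = len(preds) - len(matched_pred)
    false_negatives = len(gts) - true_positives
    denom = true_positives + 0.5 * false_positives + 0.5 * false_negatives
    return iou_sum / denom if denom else 0.0
```

For example, with two ground-truth segments, one prediction matching at IoU 0.75, one matching at IoU 1.0, and one spurious prediction, the score is (0.75 + 1.0) / (2 + 0.5) = 0.7.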

Pages: 5459-5470

Page count: 12
