Masked-attention Mask Transformer for Universal Image Segmentation

Cited by: 1023

Authors:

Cheng, Bowen [1,2]

Misra, Ishan [1]

Schwing, Alexander G. [2]

Kirillov, Alexander [1]

Girdhar, Rohit [1]

Affiliations:

[1] Facebook AI Research (FAIR), Menlo Park, CA 94025, USA

[2] University of Illinois Urbana-Champaign (UIUC), Champaign, IL 61820, USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022

Keywords:

DOI:

10.1109/CVPR52688.2022.00135

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
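The masked attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single attention head, a 0.5 threshold on the sigmoid of the predicted mask logits, standard scaling by the square root of the feature dimension, and a fallback that lets a query attend everywhere when its predicted mask is empty. The function name `masked_cross_attention` and all parameter names are hypothetical.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, mask_logits, threshold=0.5):
    """Cross-attention restricted to each query's predicted mask region.

    queries:     (Q, D) one embedding per object query
    keys/values: (N, D) flattened image features, N = H * W
    mask_logits: (Q, N) mask prediction for each query (assumed input)
    Returns the attended features (Q, D) and the attention weights (Q, N).
    """
    # Binarize the predicted masks: True marks positions a query may attend to.
    allowed = 1.0 / (1.0 + np.exp(-mask_logits)) >= threshold

    # Assumed safeguard: a query with an empty mask attends to all positions,
    # otherwise its softmax row would have no finite entry.
    allowed[~allowed.any(axis=1)] = True

    # Scaled dot-product scores; positions outside the mask get -inf.
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax over image positions.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    return weights @ values, weights
```

Positions outside a query's mask receive exactly zero attention weight, so each query aggregates features only from the region it currently predicts, which is the localization effect the abstract refers to.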


Pages: 1280-1289

Number of pages: 10

50 entries in total (entries 1-10 listed here)

[1] MATIS: MASKED-ATTENTION TRANSFORMERS FOR SURGICAL INSTRUMENT SEGMENTATION
Ayobi, Nicolas
Perez-Rondon, Alejandra
Rodriguez, Santiago
Arbelaez, Pablo
2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023
[2] EEND-M2F: Masked-attention mask transformers for speaker diarization
Harkonen, Marc
Broughton, Samuel J.
Samarakoon, Lahiru
INTERSPEECH 2024, 2024: 37-41
[3] Masked-attention diffusion guidance for spatially controlling text-to-image generation
Endo, Yuki
VISUAL COMPUTER, 2024, 40 (09): 6033-6045
[4] Enhancing Semantically Masked Transformer With Local Attention for Semantic Segmentation
Xia, Zhengyu
Kim, Joohee
IEEE ACCESS, 2023, 11: 122345-122356
[5] COM: Contrastive Masked-attention model for incomplete multimodal learning
Qian, Shuwei
Wang, Chongjun
NEURAL NETWORKS, 2023, 162: 443-455
[6] Mask2Anomaly: Mask Transformer for Universal Open-Set Segmentation
Rai, Shyam Nandan
Cermelli, Fabio
Caputo, Barbara
Masone, Carlo
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12): 9286-9302
[7] OneFormer: One Transformer to Rule Universal Image Segmentation
Jain, Jitesh
Li, Jiachen
Chiu, MangTik
Hassani, Ali
Orlov, Nikita
Shi, Humphrey
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 2989-2998
[8] De-noising mask transformer for referring image segmentation
Wang, Yehui
Lei, Fang
Wang, Baoyan
Zhang, Qiang
Zhen, Xiantong
Zhang, Lei
IMAGE AND VISION COMPUTING, 2025, 154
[9] MaskDGNets: Masked-attention guided dynamic graph aggregation network for event extraction
Zhang, Guangwei
Xie, Fei
Yu, Lei
PLOS ONE, 2024, 19 (11)
[10] Mask-Attention-Free Transformer for 3D Instance Segmentation
Lai, Xin
Yuan, Yuhui
Chu, Ruihang
Chen, Yukang
Hu, Han
Jia, Jiaya
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023: 3670-3680
