Masked-attention Mask Transformer for Universal Image Segmentation

Cited by: 1023
Authors
Cheng, Bowen [1, 2]
Misra, Ishan [1 ]
Schwing, Alexander G. [2 ]
Kirillov, Alexander [1 ]
Girdhar, Rohit [1 ]
Affiliations
[1] Facebook AI Research (FAIR), Menlo Park, CA 94025, USA
[2] University of Illinois at Urbana-Champaign (UIUC), Champaign, IL 61820, USA
DOI
10.1109/CVPR52688.2022.00135
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
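The masked attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `masked_cross_attention`, the 0.5 threshold, the scaling by the square root of the feature dimension, and the fallback to full attention when a predicted mask is empty are all assumptions made for illustration. Residual connections, learned projections, and multiple heads are omitted.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, mask_probs, threshold=0.5):
    """Cross-attention restricted to each query's predicted mask region.

    queries:    (N, d)  one feature vector per object query
    keys:       (HW, d) flattened image features used as keys
    values:     (HW, d) flattened image features used as values
    mask_probs: (N, HW) per-query mask probabilities in [0, 1]
    Returns the attended features (N, d) and the attention weights (N, HW).
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)  # (N, HW) similarity scores

    # A location is visible to a query only if its mask probability
    # reaches the threshold (assumed value: 0.5).
    allowed = mask_probs >= threshold

    # Assumption: a query with an empty mask attends to every location,
    # so the softmax below never sees a row that is entirely -inf.
    empty = ~allowed.any(axis=1, keepdims=True)
    allowed = allowed | empty

    # Locations outside the mask get -inf and therefore zero weight.
    logits = np.where(allowed, logits, -np.inf)

    # Numerically stable softmax over the spatial locations.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    return weights @ values, weights
```

Each query's attention weights sum to one over the locations inside its mask and are exactly zero elsewhere, which is what restricts the extracted features to the predicted region.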
Pages: 1280-1289
Page count: 10