Masked-attention Mask Transformer for Universal Image Segmentation

Cited by: 1023
|
Authors
Cheng, Bowen [1,2]
Misra, Ishan [1]
Schwing, Alexander G. [2]
Kirillov, Alexander [1]
Girdhar, Rohit [1]
Affiliations
[1] Facebook AI Research (FAIR), Menlo Park, CA 94025, USA
[2] University of Illinois Urbana-Champaign (UIUC), Champaign, IL 61820, USA
Keywords
DOI
10.1109/CVPR52688.2022.00135
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
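The masked-attention mechanism summarized in the abstract can be illustrated with a minimal sketch: cross-attention logits are suppressed everywhere outside each query's currently predicted mask, so a query aggregates features only from its own foreground region. The code below is an illustrative PyTorch reimplementation based solely on that description, not the authors' released code; the tensor shapes, the 0.5 sigmoid threshold, and the fallback for queries with empty masks are assumptions.

```python
# Illustrative sketch of masked cross-attention (not the official Mask2Former code).
import torch
import torch.nn.functional as F

def masked_cross_attention(queries, keys, values, mask_logits):
    """queries: (N, d) object queries; keys/values: (HW, d) flattened image features;
    mask_logits: (N, HW) per-query predicted mask logits."""
    # Standard scaled dot-product attention logits.
    scores = queries @ keys.t() / keys.shape[-1] ** 0.5            # (N, HW)
    # Restrict attention to each query's predicted foreground (assumed 0.5 threshold).
    attn_mask = mask_logits.sigmoid() > 0.5                        # (N, HW) boolean
    scores = scores.masked_fill(~attn_mask, float("-inf"))
    # Assumption: if a query's mask is empty, fall back to uniform attention
    # instead of producing NaNs from an all -inf row.
    has_fg = attn_mask.any(dim=-1, keepdim=True)                   # (N, 1)
    scores = torch.where(has_fg, scores, torch.zeros_like(scores))
    attn = F.softmax(scores, dim=-1)
    return attn @ values                                           # (N, d) updated queries

# Toy usage: 100 queries over a 32x32 feature map with 256-dim features.
q = torch.randn(100, 256)
k = v = torch.randn(32 * 32, 256)
m = torch.randn(100, 32 * 32)
print(masked_cross_attention(q, k, v, m).shape)  # torch.Size([100, 256])
```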
Pages: 1280-1289
Page count: 10
Related Papers
50 entries in total
  • [31] Bayesian Transformer Using Disentangled Mask Attention
    Chien, Jen-Tzung
    Huang, Yu-Han
    INTERSPEECH 2022, 2022, : 1761 - 1765
  • [32] Mask Attention Networks: Rethinking and Strengthen Transformer
    Fan, Zhihao
    Gong, Yeyun
    Liu, Dayiheng
    Wei, Zhongyu
    Wang, Siyuan
    Jiao, Jian
    Duan, Nan
    Zhang, Ruofei
    Huang, Xuanjing
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1692 - 1701
  • [33] Adaptive Masked Autoencoder Transformer for image classification
    Chen, Xiangru
    Liu, Chenjing
    Hu, Peng
    Lin, Jie
    Gong, Yunhong
    Chen, Yingke
    Peng, Dezhong
    Geng, Xue
    APPLIED SOFT COMPUTING, 2024, 164
  • [34] An effective masked transformer network for image denoising
    Xu, Shaoping
    Xiao, Nan
    Tao, Wuyong
    Zhou, Changfei
    Xiong, Minghai
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (6-7) : 4997 - 5010
  • [35] Masked Diffusion Transformer is a Strong Image Synthesizer
    Gao, Shanghua
    Zhou, Pan
    Cheng, Ming-Ming
    Yan, Shuicheng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 23107 - 23116
  • [36] Masked hybrid attention with Laplacian query fusion and tripartite sequence matching for medical image segmentation
    Ekong, Favour
    Yu, Yongbin
    Patamia, Rutherford Agbeshi
    Sarpong, Kwabena
    Ukwuoma, Chiagoziem C.
    Wang, Xiangxiang
    Ukot, Akpanika Robert
    Cai, Jingye
    NEURAL COMPUTING AND APPLICATIONS, 2025, 37 (8) : 5891 - 5911
  • [37] Contrastive Transformer Masked Image Hashing for Degraded Image Retrieval
    Shen, Xiaobo
    Cai, Haoyu
    Gong, Xiuwen
    Zheng, Yuhui
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1218 - 1226
  • [38] SMART: Semantic-Aware Masked Attention Relational Transformer for Multi-label Image Recognition
    Wu, Hongjun
    Xu, Cheng
    Liu, Hongzhe
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2158 - 2162
  • [39] Mask Transformer: Unpaired Text Style Transfer Based on Masked Language
    Wu, Chunhua
    Chen, Xiaolong
    Li, Xingbiao
    APPLIED SCIENCES-BASEL, 2020, 10 (18):
  • [40] Semantic Segmentation Method of UAV Image Based on Window Attention Aggregation Swin Transformer
    Li, Junjie
    Yi, Shi
    He, Runhua
    Liu, Xi
    COMPUTER ENGINEERING AND APPLICATIONS, 2024, 60 (15) : 198 - 210