SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Citations: 0
Authors
Li, Gang [1, 2]
Zheng, Heliang [3]
Liu, Daqing [3]
Wang, Chaoyue [3]
Su, Bing [4]
Zheng, Changwen [1]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] JD Explore Acad, Beijing, Peoples R China
[4] Renmin Univ China, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, significant progress has been made in masked image modeling to catch up to masked language modeling. However, unlike words in NLP, the lack of semantic decomposition of images still makes masked autoencoding (MAE) different between vision and language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy. Compared to widely adopted random masking, our masking strategy can gradually guide the network to learn various information, i.e., from intra-part patterns to inter-part relations. In particular, we achieve this in two steps. 1) Semantic part learning: we design a self-supervised part learning method to obtain semantic parts by leveraging and refining the multi-head attention of a ViT-based encoder. 2) Semantic-guided MAE (SemMAE) training: we design a masking strategy that varies from masking a portion of patches in each part to masking a portion of (whole) parts in an image. Extensive experiments on various vision tasks show that SemMAE can learn better image representation by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, which outperforms the vanilla MAE by 1.4%. In the semantic segmentation and fine-grained recognition tasks, SemMAE also brings significant improvements and yields the state-of-the-art performance. Our code is available at https://github.com/ucasligang/SemMAE.
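The abstract names two ends of the masking strategy: masking a portion of the patches inside each part, and masking a portion of whole parts. The Python sketch below illustrates only those two endpoints on a toy patch grid. It is an assumption-laden illustration, not the authors' implementation: the function names, the hand-written `part_ids` labels, and the 0.5 mask ratio are all hypothetical, and the schedule that moves between the two endpoints during training is not described in the abstract and is not modeled here.

```python
import random
from collections import defaultdict


def group_by_part(part_ids):
    """Map each part label to the list of patch indices carrying it."""
    groups = defaultdict(list)
    for idx, part in enumerate(part_ids):
        groups[part].append(idx)
    return groups


def mask_within_parts(part_ids, mask_ratio, rng):
    """Mask roughly `mask_ratio` of the patches inside every part."""
    masked = set()
    for patches in group_by_part(part_ids).values():
        k = round(len(patches) * mask_ratio)
        masked.update(rng.sample(patches, k))
    return masked


def mask_whole_parts(part_ids, mask_ratio, rng):
    """Mask entire parts, in random order, until the patch budget is met."""
    budget = round(len(part_ids) * mask_ratio)
    groups = list(group_by_part(part_ids).values())
    rng.shuffle(groups)
    masked = set()
    for patches in groups:
        if len(masked) >= budget:
            break
        room = budget - len(masked)
        # Take the whole part if it fits, otherwise a random subset of it.
        chosen = patches if len(patches) <= room else rng.sample(patches, room)
        masked.update(chosen)
    return masked


# Toy 4x4 patch grid with four hypothetical parts of four patches each.
part_ids = [0, 0, 1, 1,
            0, 0, 1, 1,
            2, 2, 3, 3,
            2, 2, 3, 3]
rng = random.Random(0)
intra = mask_within_parts(part_ids, 0.5, rng)  # some patches hidden in every part
inter = mask_whole_parts(part_ids, 0.5, rng)   # some parts hidden entirely
```

In the first case every part stays partly visible, so reconstruction can rely on patterns within a part. In the second case whole parts disappear, so reconstruction has to rely on relations between parts, which matches the "intra-part patterns to inter-part relations" progression the abstract describes.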
Pages: 13
Related papers
50 in total
  • [21] ColorMAE: Exploring Data-Independent Masking Strategies in Masked AutoEncoders
    Hinojosa, Carlos
    Liu, Shuming
    Ghanem, Bernard
    COMPUTER VISION - ECCV 2024, PT XX, 2025, 15078 : 432 - 449
  • [22] SEGSID: A Semantic-Guided Framework for Sonar Image Despeckling
    Liu, Shaohua
    Lu, Junzhe
    Dou, Hongkun
    Li, Jiajun
    Deng, Yue
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2025, 34 : 652 - 666
  • [23] Semantic-Guided Feature Selection for Industrial Automation Systems
    Ringsquandl, Martin
    Lamparter, Steffen
    Brandt, Sebastian
    Hubauer, Thomas
    Lepratti, Raffaello
    SEMANTIC WEB - ISWC 2015, PT II, 2015, 9367 : 225 - 240
  • [24] Improving Masked Autoencoders by Learning Where to Mask
    Chen, Haijian
    Zhang, Wendong
    Wang, Yunbo
    Yang, Xiaokang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 377 - 390
  • [25] SGFNet: Semantic-Guided Fusion Network for RGB-Thermal Semantic Segmentation
    WangLi, Yike
    Li, Gongyang
    Liu, Zhi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7737 - 7748
  • [26] Semantic-Guided Class-Imbalance Learning Model for Zero-Shot Image Classification
    Ji, Zhong
    Yu, Xuejie
    Yu, Yunlong
    Pang, Yanwei
    Zhang, Zhongfei
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (07) : 6543 - 6554
  • [27] Semantic-Guided High-Order Region Attention Embedding for Zero-Shot Learning
    Zhang, Rui
    Xu, Xiangyu
    Zhu, Qi
    2021 13TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2021, : 266 - 272
  • [28] Masked Autoencoders for Medical Ultrasound Videos Using ROI-Aware Masking
    Szijarto, Adam
    Magyar, Balint
    Szeier, Thomas A.
    Tolvaj, Mate
    Fabian, Alexandra
    Lakatos, Balint K.
    Ladanyi, Zsuzsanna
    Bagyura, Zsolt
    Merkely, Bela
    Kovacs, Attila
    Tokodi, Marton
    SIMPLIFYING MEDICAL ULTRASOUND, ASMUS 2024, 2025, 15186 : 167 - 176
  • [29] A Sub-captions Semantic-Guided Network for Image Captioning
    Tian, Wei-Dong
    Zhu, Jun-jun
    Wu, Shuang
    Zhao, Zhong-Qiu
    Zhang, Yu-Zheng
    Zhang, Tian-yu
    INTELLIGENT COMPUTING METHODOLOGIES, PT III, 2022, 13395 : 367 - 379
  • [30] Semantic-Guided Transformer Network for Crop Classification in Hyperspectral Images
    Pi, Weiqiang
    Zhang, Tao
    Wang, Rongyang
    Ma, Guowei
    Wang, Yong
    Du, Jianmin
    JOURNAL OF IMAGING, 2025, 11 (02)