SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Times cited: 0
Authors
Li, Gang [1, 2]
Zheng, Heliang [3]
Liu, Daqing [3]
Wang, Chaoyue [3]
Su, Bing [4]
Zheng, Changwen [1]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] JD Explore Acad, Beijing, Peoples R China
[4] Renmin Univ China, Beijing, Peoples R China
DOI
Not available
CLC classification number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recently, significant progress has been made in masked image modeling to catch up with masked language modeling. However, unlike text, which splits naturally into words, images lack a semantic decomposition, and this still makes masked autoencoding (MAE) differ between vision and language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy. Compared with the widely adopted random masking, our masking strategy gradually guides the network to learn different kinds of information, progressing from intra-part patterns to inter-part relations. Specifically, we achieve this in two steps. 1) Semantic part learning: we design a self-supervised part learning method that obtains semantic parts by leveraging and refining the multi-head attention of a ViT-based encoder. 2) Semantic-guided MAE (SemMAE) training: we design a masking strategy that varies from masking a portion of the patches in each part to masking a portion of the (whole) parts in an image. Extensive experiments on various vision tasks show that SemMAE learns better image representations by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, outperforming the vanilla MAE by 1.4%. In semantic segmentation and fine-grained recognition, SemMAE also brings significant improvements and yields state-of-the-art performance. Our code is available at https://github.com/ucasligang/SemMAE.
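The shift described in step 2, from masking some patches inside every part to masking whole parts, can be illustrated with a minimal sketch. It is not the official SemMAE code from the repository above. It assumes each patch already carries an integer part label from the part-learning step, and the linear blend between the two masking modes controlled by a `progress` value in [0, 1] is an assumption of this sketch, not the paper's schedule. The function name `semantic_guided_mask` and its parameters are hypothetical.

```python
import numpy as np


def semantic_guided_mask(part_ids, mask_ratio, progress, rng):
    """Hypothetical sketch of a part-aware masking schedule.

    part_ids:   integer part label for each of the N patches (assumed to come
                from a separate part-learning step).
    mask_ratio: fraction of the N patches to mask.
    progress:   training progress in [0, 1]; 0 = mask patches inside every
                part, 1 = mask whole parts. The linear blend in between is an
                assumption of this sketch.
    Returns a boolean array of shape (N,), True where a patch is masked.
    """
    part_ids = np.asarray(part_ids)
    n = part_ids.size
    budget = int(round(mask_ratio * n))
    mask = np.zeros(n, dtype=bool)
    parts = rng.permutation(np.unique(part_ids))

    # Inter-part masking: spend up to progress * budget on whole parts,
    # visiting parts in random order and masking every part that still fits.
    whole_budget = int(round(progress * budget))
    used = 0
    for p in parts:
        size = int(np.sum(part_ids == p))
        if used + size <= whole_budget:
            mask[part_ids == p] = True
            used += size

    # Intra-part masking: spread the rest of the budget over the parts that
    # are still fully visible, masking about the same fraction of each part.
    remaining = budget - used
    visible_parts = [p for p in parts if not mask[part_ids == p].any()]
    if remaining > 0 and visible_parts:
        rate = remaining / (n - used)
        for p in visible_parts:
            idx = np.flatnonzero(part_ids == p)
            quota = int(np.floor(rate * idx.size))
            mask[rng.choice(idx, size=quota, replace=False)] = True
        # Top up with random visible patches so exactly `budget` are masked.
        leftover = budget - int(mask.sum())
        if leftover > 0:
            free = np.flatnonzero(~mask)
            mask[rng.choice(free, size=leftover, replace=False)] = True
    return mask


# Toy example: 196 patches (a 14x14 grid) split into 4 parts of 49 patches.
rng = np.random.default_rng(0)
part_ids = np.repeat(np.arange(4), 49)
early = semantic_guided_mask(part_ids, 0.75, 0.0, rng)  # 147 masked, spread over all 4 parts
late = semantic_guided_mask(part_ids, 0.75, 1.0, rng)   # 147 masked: 3 whole parts, 1 part untouched
```

With `progress=0.0` every part loses roughly 75% of its patches, so the visible patches of a part must be used to reconstruct the rest of that same part (intra-part patterns). With `progress=1.0` entire parts disappear, so they can only be recovered from the other parts (inter-part relations). The equal-size toy parts are chosen only to make the counts easy to check.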
Pages: 13