SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Cited by: 0
Authors
Li, Gang [1, 2]
Zheng, Heliang [3]
Liu, Daqing [3]
Wang, Chaoyue [3]
Su, Bing [4]
Zheng, Changwen [1]
Affiliations
[1] Institute of Software, Chinese Academy of Sciences, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
[3] JD Explore Academy, Beijing, People's Republic of China
[4] Renmin University of China, Beijing, People's Republic of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, significant progress has been made in masked image modeling to catch up with masked language modeling. However, unlike words in NLP, the lack of semantic decomposition of images still makes masked autoencoding (MAE) different between vision and language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy. Compared to widely adopted random masking, our masking strategy can gradually guide the network to learn various information, i.e., from intra-part patterns to inter-part relations. In particular, we achieve this in two steps. 1) Semantic part learning: we design a self-supervised part learning method to obtain semantic parts by leveraging and refining the multi-head attention of a ViT-based encoder. 2) Semantic-guided MAE (SemMAE) training: we design a masking strategy that varies from masking a portion of patches in each part to masking a portion of (whole) parts in an image. Extensive experiments on various vision tasks show that SemMAE can learn better image representations by integrating semantic information. In particular, SemMAE achieves 84.5% fine-tuning accuracy on ImageNet-1k, which outperforms the vanilla MAE by 1.4%. In the semantic segmentation and fine-grained recognition tasks, SemMAE also brings significant improvements and yields state-of-the-art performance. Our code is available at https://github.com/ucasligang/SemMAE.
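The two extremes of the masking strategy described in step 2 can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes each patch already has a part label, which in the paper would come from the part-learning step. The function names `intra_part_mask` and `whole_part_mask`, the `part_ids` list, and the 75% ratio are all hypothetical choices made for illustration.

```python
import random


def intra_part_mask(part_ids, ratio, rng):
    """Mask roughly `ratio` of the patches inside every part."""
    mask = [False] * len(part_ids)
    for part in sorted(set(part_ids)):
        # Indices of the patches that belong to this part.
        idx = [i for i, p in enumerate(part_ids) if p == part]
        for i in rng.sample(idx, round(ratio * len(idx))):
            mask[i] = True
    return mask


def whole_part_mask(part_ids, ratio, rng):
    """Mask whole parts in random order until about `ratio` of all patches are hidden."""
    target = round(ratio * len(part_ids))
    mask = [False] * len(part_ids)
    parts = sorted(set(part_ids))
    rng.shuffle(parts)
    masked = 0
    for part in parts:
        if masked >= target:
            break
        # Hide every patch of the chosen part (this may overshoot the target).
        for i, p in enumerate(part_ids):
            if p == part:
                mask[i] = True
                masked += 1
    return mask


# Hypothetical example: 16 patches grouped into 4 parts of 4 patches each.
part_ids = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
rng = random.Random(0)
intra = intra_part_mask(part_ids, 0.75, rng)
whole = whole_part_mask(part_ids, 0.75, rng)
print(sum(intra), sum(whole))
```

With intra-part masking, every part keeps some visible patches, so reconstruction can rely on patterns within a part. With whole-part masking, a hidden part has no visible patches, so reconstruction has to rely on relations to the other parts. How the paper moves between these two settings during training is not specified in the abstract and is not modeled here.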
Pages: 13
Related papers
50 results in total
  • [41] Semantic-Guided Inpainting Network for Complex Urban Scenes Manipulation
    Ardino, Pierfrancesco
    Liu, Yahui
    Ricci, Elisa
    Lepri, Bruno
    de Nadai, Marco
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9280 - 9287
  • [42] Semantic Communication for Efficient Image Transmission Tasks based on Masked Autoencoders
    Wu, Jiale
    Wu, Celimuge
    Lin, Yangfei
    Bao, Jingjing
    Du, Zhaoyang
    Zhong, Lei
    Chen, Xianfu
    Ji, Yusheng
    2023 IEEE 98TH VEHICULAR TECHNOLOGY CONFERENCE, VTC2023-FALL, 2023,
  • [43] Enhancing Representation Learning of EEG Data with Masked Autoencoders
    Zhou, Yifei
    Liu, Sitong
    AUGMENTED COGNITION, PT II, AC 2024, 2024, 14695 : 88 - 100
  • [44] Contextual Semantic-Guided Entity-Centric GCN for Relation Extraction
    Long, Jun
    Liu, Lei
    Fei, Hongxiao
    Xiang, Yiping
    Li, Haoran
    Huang, Wenti
    Yang, Liu
    MATHEMATICS, 2022, 10 (08)
  • [45] BCT-Net: semantic-guided breast cancer segmentation on BUS
    Xin, Junchang
    Yu, Yaqi
    Shen, Qi
    Zhang, Shudi
    Su, Na
    Wang, Zhiqiong
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2025,
  • [46] Strongly representative semantic-guided segmentation network for pancreatic and pancreatic tumors
    Cao, Luyang
    Li, Jianwei
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 87
  • [47] Semantic-guided multi-scale human skeleton action recognition
    Qi, Yongfeng
    Hu, Jinlin
    Zhuang, Liqiang
    Pei, Xiaoxu
    APPLIED INTELLIGENCE, 2023, 53 (09) : 9763 - 9778
  • [48] Image style transfer with collection representation space and semantic-guided reconstruction
    Ma, Zhuoqi
    Li, Jie
    Wang, Nannan
    Gao, Xinbo
    NEURAL NETWORKS, 2020, 129 : 123 - 137
  • [49] Neuron Semantic-Guided Test Generation for Deep Neural Networks Fuzzing
    Huang, Li
    Sun, Weifeng
    Yan, Meng
    Liu, Zhongxin
    Lei, Yan
    Lo, David
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2025, 34 (01)
  • [50] Semantic-Guided Completion Network for Video Inpainting in Complex Urban Scene
    Wang, Jianan
    Xuan, Hanyu
    Wu, Zhiliang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XI, 2024, 14435 : 224 - 236