Evolved Hierarchical Masking for Self-Supervised Learning

Authors
Feng, Zhanzhou [1]
Zhang, Shiliang [1, 2]
Affiliations
[1] Peking Univ, Sch Comp Sci, State Key Lab Multimedia Informat Proc, Beijing 100871, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Keywords
Visualization; Training; Semantics; Self-supervised learning; Semantic segmentation; Image classification; Computational modeling; Neurons; Representation learning; Predictive models; masked image modeling; efficient learning; model pretraining
DOI
10.1109/TPAMI.2024.3490776
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Existing Masked Image Modeling (MIM) methods apply fixed mask patterns to guide self-supervised training. Because these mask patterns rely on different criteria to depict image contents, adhering to a single fixed pattern limits the ability to model visual cues. This paper introduces an evolved hierarchical masking method to pursue general visual cue modeling in self-supervised learning. The proposed method uses the vision model being trained to parse the input visual cues into a hierarchical structure, which is then used to generate masks. The accuracy of the hierarchy is on par with the capability of the model being trained, so the mask patterns evolve across training stages. Initially, the generated masks focus on low-level visual cues to capture basic textures; they then gradually evolve to depict higher-level cues, reinforcing the learning of more complex object semantics and contexts. The method requires no extra pre-trained models or annotations and ensures training efficiency by evolving the training difficulty. We conduct extensive experiments on seven downstream tasks, including partial-duplicate image retrieval, which relies on low-level details, as well as image classification and semantic segmentation, which require semantic parsing capability. Experimental results demonstrate that the method substantially boosts performance across these tasks. For instance, it surpasses the recent MAE by 1.1% in ImageNet-1K classification and 1.4% in ADE20K segmentation with the same number of training epochs. We also align the proposed method with the current research focus on LLMs. The proposed approach bridges the gap with large-scale pre-training on semantically demanding tasks and enhances intricate detail perception in tasks requiring low-level feature recognition.
Pages: 1013-1027
Page count: 15
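The abstract describes masks that start at a fine, texture-level granularity and move toward coarser, object-level regions as training proceeds. The sketch below is only a toy illustration of that coarse-to-fine schedule, not the authors' method: the paper derives its masks from a hierarchy parsed by the model being trained, whereas this stand-in simply grows the side length of randomly placed square blocks with training progress. All names (`block_size_for_progress`, `evolving_mask`) and all defaults (a 14x14 patch grid, a 0.75 mask ratio, a maximum block side of 7) are assumptions made for illustration.

```python
import random


def block_size_for_progress(progress, max_block=7):
    """Map training progress in [0, 1] to a block side length.

    Hypothetical linear schedule: single patches at the start of
    training, blocks of side `max_block` at the end.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return 1 + int(progress * (max_block - 1))


def evolving_mask(progress, grid=14, mask_ratio=0.75, max_block=7, seed=None):
    """Return a grid x grid boolean mask (True = masked patch).

    Square blocks are dropped at random positions until exactly
    int(mask_ratio * grid * grid) patches are masked. This is a
    stand-in for hierarchy-driven masks, not the paper's procedure.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    block = block_size_for_progress(progress, max_block)
    target = int(mask_ratio * grid * grid)
    mask = [[False] * grid for _ in range(grid)]
    masked = 0
    while masked < target:
        top = rng.randrange(grid)
        left = rng.randrange(grid)
        # Mask the block, clipped at the grid border, stopping at the target.
        for r in range(top, min(top + block, grid)):
            for c in range(left, min(left + block, grid)):
                if not mask[r][c] and masked < target:
                    mask[r][c] = True
                    masked += 1
    return mask


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        m = evolving_mask(p, seed=0)
        print(p, block_size_for_progress(p), sum(map(sum, m)))
```

Under these assumed defaults the block side is 1, 4, and 7 at progress 0.0, 0.5, and 1.0, while the number of masked patches stays fixed, so only the spatial coherence of the masked regions changes over training.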