Evolved Hierarchical Masking for Self-Supervised Learning

Cited: 0
Authors
Feng, Zhanzhou [1 ]
Zhang, Shiliang [1 ,2 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, State Key Lab Multimedia Informat Proc, Beijing 100871, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Keywords
Visualization; Training; Semantics; Self-supervised learning; Semantic segmentation; Image classification; Computational modeling; Neurons; Representation learning; Predictive models; masked image modeling; efficient learning; model pretraining
DOI
10.1109/TPAMI.2024.3490776
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Existing Masked Image Modeling methods apply fixed mask patterns to guide self-supervised training. Because different mask patterns rely on different criteria to depict image contents, sticking to a fixed pattern limits the range of visual cues the model can learn. This paper introduces an evolved hierarchical masking method to pursue general visual cue modeling in self-supervised learning. The proposed method leverages the vision model being trained to parse the input visual cues into a hierarchical structure, which is then used to generate masks. The accuracy of this hierarchy tracks the capability of the model being trained, yielding mask patterns that evolve across training stages. Initially, the generated masks focus on low-level visual cues to grasp basic textures, then gradually evolve to depict higher-level cues that reinforce the learning of more complicated object semantics and contexts. Our method does not require extra pre-trained models or annotations, and it ensures training efficiency by gradually increasing the training difficulty. We conduct extensive experiments on seven downstream tasks, including partial-duplicate image retrieval, which relies on low-level details, as well as image classification and semantic segmentation, which require semantic parsing capability. Experimental results demonstrate that our method substantially boosts performance across these tasks. For instance, it surpasses the recent MAE by 1.1% on ImageNet-1K classification and 1.4% on ADE20K segmentation with the same number of training epochs. We also align the proposed method with the current research focus on LLMs: the proposed approach narrows the gap with large-scale pre-training on semantically demanding tasks and enhances intricate detail perception in tasks requiring low-level feature recognition.
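The curriculum described in the abstract, masking low-level patches early and semantically important patches later, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a per-patch saliency score (e.g., mean attention received in the model being trained) as a stand-in for the parsed hierarchy, and a scalar training-progress value that blends the two masking objectives.

```python
import numpy as np

def evolved_mask(patch_scores, mask_ratio, progress, rng):
    """Generate a boolean mask over image patches.

    patch_scores: per-patch saliency from the model being trained
                  (hypothetical proxy for the visual-cue hierarchy);
                  higher = more semantically important.
    mask_ratio:   fraction of patches to mask.
    progress:     training progress in [0, 1]; early training masks
                  low-score (low-level) patches, late training masks
                  high-score (high-level) patches.
    rng:          numpy random generator for stochastic tie-breaking.
    """
    n = len(patch_scores)
    k = int(round(mask_ratio * n))
    # Blend "mask easy, low-level patches" with "mask hard, semantic
    # patches" as training progresses.
    priority = (1.0 - progress) * (-patch_scores) + progress * patch_scores
    # Tiny noise keeps masks stochastic across iterations.
    priority = priority + rng.normal(0.0, 1e-4, n)
    mask = np.zeros(n, dtype=bool)
    mask[np.argsort(priority)[-k:]] = True  # top-k priority gets masked
    return mask

rng = np.random.default_rng(0)
scores = rng.random(196)  # e.g., a 14x14 ViT patch grid
early = evolved_mask(scores, 0.75, progress=0.0, rng=rng)
late = evolved_mask(scores, 0.75, progress=1.0, rng=rng)
# Early masks cover lower-saliency patches than late masks.
print(scores[early].mean() < scores[late].mean())
```

In the actual method, the hierarchy is re-parsed by the model itself as it improves, so the masking difficulty evolves with model capability rather than following a fixed schedule.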
Pages: 1013-1027 (15 pages)
Related Papers
50 records in total
  • [31] Quantum self-supervised learning
    Jaderberg, B.
    Anderson, L. W.
    Xie, W.
    Albanie, S.
    Kiffner, M.
    Jaksch, D.
    QUANTUM SCIENCE AND TECHNOLOGY, 2022, 7 (03):
  • [32] Self-Supervised Learning for Electroencephalography
    Rafiei, Mohammad H.
    Gauthier, Lynne V.
    Adeli, Hojjat
    Takabi, Daniel
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 1457 - 1471
  • [33] Self-supervised speech representation learning based on positive sample comparison and masking reconstruction
    Zhang, Wenlin
    Liu, Xuepeng
    Niu, Tong
    Chen, Qi
    Qu, Dan
    Tongxin Xuebao/Journal on Communications, 2022, 43 (07): : 163 - 171
  • [34] Spectral Salt-and-Pepper Patch Masking for Self-Supervised Speech Representation Learning
    Kim, June-Woo
    Chung, Hoon
    Jung, Ho-Young
    MATHEMATICS, 2023, 11 (15)
  • [35] Generative Self-Supervised Learning With Spectral-Spatial Masking for Hyperspectral Target Detection
    Chen, Xi
    Zhang, Yuxiang
    Dong, Yanni
    Du, Bo
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 1
  • [36] GDMol: Generative Double-Masking Self-Supervised Learning for Molecular Property Prediction
    Liu, Yingxu
    Fan, Qing
    Xu, Chengcheng
    Ning, Xiangzhen
    Wang, Yu
    Liu, Yang
    Zhang, Yanmin
    Chen, Yadong
    Liu, Haichun
    MOLECULAR INFORMATICS, 2024,
  • [37] Self-supervised Adversarial Masking for 3D Point Cloud Representation Learning
    Szachniewicz, Michal
    Kozlowski, Wojciech
    Stypulkowski, Michal
    Zieba, Maciej
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, ACIIDS 2024, 2024, 14796 : 156 - 168
  • [38] Learning a self-supervised tone mapping operator via feature contrast masking loss
    Wang, C.
    Chen, B.
    Seidel, HP.
    Myszkowski, K.
    Serrano, A.
    COMPUTER GRAPHICS FORUM, 2022, 41 (02) : 71 - 84
  • [39] Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
    Qing, Zhiwu
    Zhang, Shiwei
    Huang, Ziyuan
    Xu, Yi
    Wang, Xiang
    Tang, Mingqian
    Gao, Changxin
    Jin, Rong
    Sang, Nong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13811 - 13821
  • [40] A New Self-supervised Method for Supervised Learning
    Yang, Yuhang
    Ding, Zilin
    Cheng, Xuan
    Wang, Xiaomin
    Liu, Ming
    INTERNATIONAL CONFERENCE ON COMPUTER VISION, APPLICATION, AND DESIGN (CVAD 2021), 2021, 12155