Evolved Hierarchical Masking for Self-Supervised Learning

Times cited: 0
Authors
Feng, Zhanzhou [1]
Zhang, Shiliang [1, 2]
Affiliations
[1] Peking University, School of Computer Science, State Key Laboratory of Multimedia Information Processing, Beijing 100871, People's Republic of China
[2] Peng Cheng Laboratory, Shenzhen 518055, People's Republic of China
Keywords
Visualization; Training; Semantics; Self-supervised learning; Semantic segmentation; Image classification; Computational modeling; Neurons; Representation learning; Predictive models; masked image modeling; efficient learning; model pretraining
DOI
10.1109/TPAMI.2024.3490776
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Existing Masked Image Modeling (MIM) methods apply fixed mask patterns to guide self-supervised training. Because these mask patterns rely on different criteria to depict image content, sticking to a single fixed pattern limits the capability to model diverse visual cues. This paper introduces an evolved hierarchical masking method that pursues general visual cue modeling in self-supervised learning. The proposed method uses the vision model being trained to parse the input visual cues into a hierarchical structure, which is then used to generate masks. The accuracy of this hierarchy tracks the capability of the model being trained, so the mask patterns evolve across training stages. Initially, the generated masks focus on low-level visual cues to capture basic textures; they then gradually evolve to depict higher-level cues, reinforcing the learning of more complex object semantics and contexts. Our method requires no extra pre-trained models or annotations, and it maintains training efficiency by evolving the training difficulty. We conduct extensive experiments on seven downstream tasks, including partial-duplicate image retrieval, which relies on low-level details, as well as image classification and semantic segmentation, which require semantic parsing capability. Experimental results demonstrate that the method substantially boosts performance across these tasks. For instance, with the same number of training epochs, it surpasses the recent MAE by 1.1% on ImageNet-1K classification and by 1.4% on ADE20K segmentation. We also align the proposed method with the current research focus on LLMs. The proposed approach narrows the gap with large-scale pre-training on semantically demanding tasks and enhances the perception of intricate details in tasks requiring low-level feature recognition.
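The abstract describes the masking mechanism only at a high level. The Python sketch below illustrates the general idea under explicit assumptions; it is not the paper's algorithm. It assumes that patch features come from the model being trained. It also assumes that the hierarchy is built by single-linkage merging of 4-connected patches ranked by cosine similarity, that the hierarchy level is chosen by a linear schedule over training progress, and that the mask ratio is 0.75 (MAE's default). The names `build_hierarchy` and `evolved_mask` are hypothetical. Early in training the sketch masks many small regions (texture-level cues); later it masks a few large, merged regions (part- or object-level cues).

```python
# Hypothetical sketch of evolved hierarchical masking; NOT the paper's method.
import math
import random


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def build_hierarchy(features, h, w, levels):
    """Assumed scheme: merge adjacent patches, most similar first (single linkage).

    features: h*w feature vectors in row-major order (assumed model output).
    levels: target region counts from fine to coarse, e.g. [32, 8, 2].
    Returns one list of region labels per level; coarser levels nest finer ones.
    """
    parent = list(range(h * w))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges between horizontally and vertically adjacent patches.
    edges = []
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:
                edges.append((cosine(features[i], features[i + 1]), i, i + 1))
            if r + 1 < h:
                edges.append((cosine(features[i], features[i + w]), i, i + w))
    edges.sort(reverse=True)  # most similar pairs are merged first

    hierarchy, count, e = [], h * w, 0
    for target in levels:
        while count > target and e < len(edges):
            _, a, b = edges[e]
            e += 1
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
                count -= 1
        hierarchy.append([find(i) for i in range(h * w)])  # snapshot this level
    return hierarchy


def evolved_mask(hierarchy, progress, mask_ratio=0.75, rng=random):
    """Mask whole regions; the level coarsens linearly with progress in [0, 1]."""
    level = min(int(progress * len(hierarchy)), len(hierarchy) - 1)
    regions = {}
    for idx, lab in enumerate(hierarchy[level]):
        regions.setdefault(lab, []).append(idx)
    order = list(regions.values())
    rng.shuffle(order)
    masked, budget = set(), int(mask_ratio * len(hierarchy[level]))
    for patches in order:
        if len(masked) >= budget:
            break
        masked.update(patches)  # whole regions, so the budget may be overshot
    return level, masked


if __name__ == "__main__":
    rng = random.Random(0)
    h = w = 8
    # Random stand-in for patch features produced by the model being trained.
    feats = [[rng.random() for _ in range(4)] for _ in range(h * w)]
    hier = build_hierarchy(feats, h, w, levels=[32, 8, 2])
    for p in (0.0, 0.5, 1.0):
        level, masked = evolved_mask(hier, p, rng=rng)
        print(f"progress={p} level={level} regions={len(set(hier[level]))} masked={len(masked)}")
```

Masking whole regions rather than independent patches is what lets the mask pattern follow the hierarchy. Because the features here are random, the regions are arbitrary. With a real, improving encoder, the merged regions would be expected to align more closely with image parts as training proceeds.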
Pages: 1013-1027
Number of pages: 15