Evolved Hierarchical Masking for Self-Supervised Learning

Cited by: 0

Authors:

Feng, Zhanzhou ^{[1]}

Zhang, Shiliang ^{[1,2]}

Affiliations:

[1] Peking Univ, Sch Comp Sci, State Key Lab Multimedia Informat Proc, Beijing 100871, Peoples R China

[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2025 / Vol. 47 / No. 02

Keywords:

Visualization; Training; Semantics; Self-supervised learning; Semantic segmentation; Image classification; Computational modeling; Neurons; Representation learning; Predictive models; masked image modeling; efficient learning; model pretraining;

DOI:

10.1109/TPAMI.2024.3490776

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing Masked Image Modeling methods apply fixed mask patterns to guide self-supervised training. As those mask patterns resort to different criteria to depict image contents, sticking to a fixed pattern limits the capability to model visual cues. This paper introduces an evolved hierarchical masking method to pursue general visual cues modeling in self-supervised learning. The proposed method leverages the vision model being trained to parse the input visual cues into a hierarchical structure, which is then used to generate masks. The accuracy of the hierarchy is on par with the capability of the model being trained, leading to evolved mask patterns at different training stages. Initially, generated masks focus on low-level visual cues to grasp basic textures, then gradually evolve to depict higher-level cues to reinforce the learning of more complicated object semantics and contexts. Our method does not require extra pre-trained models or annotations and ensures training efficiency by evolving the training difficulty. We conduct extensive experiments on seven downstream tasks including partial-duplicate image retrieval relying on low-level details, as well as image classification and semantic segmentation that require semantic parsing capability. Experimental results demonstrate that the method substantially boosts performance across these tasks. For instance, it surpasses the recent MAE by 1.1% in ImageNet-1K classification and 1.4% in ADE20K segmentation with the same training epochs. We also align the proposed method with the current research focus on LLMs. The proposed approach bridges the gap with large-scale pre-training on semantically demanding tasks and enhances intricate detail perception in tasks requiring low-level feature recognition.

Pages: 1013-1027

Page count: 15

50 items in total

[41] Hierarchical Self-supervised Augmented Knowledge Distillation
Yang, Chuanguang
An, Zhulin
Cai, Linhang
Xu, Yongjun
PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1217 - 1223
[42] Mining Nuanced Weibo Sentiment with Hierarchical Graph Modeling and Self-Supervised Learning
Wang, Chuyang
Konpang, Jessada
Sirikham, Adisorn
Tian, Shasha
ELECTRONICS, 2025, 14 (01)
[43] Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
Chen, Richard J.
Chen, Chengkuan
Li, Yicong
Chen, Tiffany Y.
Trister, Andrew D.
Krishnan, Rahul G.
Mahmood, Faisal
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16123 - 16134
[44] Cluster Head Detection for Hierarchical UAV Swarm With Graph Self-Supervised Learning
Mou, Zhiyu
Gao, Feifei
Liu, Jun
Yun, Xiang
Wu, Qihui
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 5517 - 5532
[45] Image Masking for Robust Self-Supervised Monocular Depth Estimation
Chawla, Hemang
Jeeveswaran, Kishaan
Arani, Elahe
Zonooz, Bahram
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 10054 - 10060
[46] Enhancing image quality prediction with self-supervised visual masking
Cogalan, U.
Bemana, M.
Seidel, H. P.
Myszkowski, K.
COMPUTER GRAPHICS FORUM, 2024, 43 (02)
[47] Adaptive-Masking Policy with Deep Reinforcement Learning for Self-Supervised Medical Image Segmentation
Xu, Gang
Wang, Shengxin
Lukasiewicz, Thomas
Xu, Zhenghua
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2285 - 2290
[48] SSIM: self-supervised learning method based on spatially selected shifts and irregular image masking
Shao, Yunxue
Wang, Zhiyang
Wang, Lingfeng
JOURNAL OF SUPERCOMPUTING, 2025, 81 (03)
[49] Self-supervised graph learning with target-adaptive masking for session-based recommendation
Wang, Yitong
Cai, Fei
Pan, Zhiqiang
Song, Chengyu
FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2023, 24 (01) : 73 - 87
[50] Self-Supervised Adversarial Variational Learning
Ye, Fei
Bors, Adrian G.
PATTERN RECOGNITION, 2024, 148