AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders

Cited by: 11
Authors
Bandara, Wele Gedara Chaminda [1 ]
Patel, Naman [2 ]
Gholami, Ali [2 ]
Nikkhah, Mehdi [2 ]
Agrawal, Motilal [2 ]
Patel, Vishal M. [1 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Zippin, San Francisco, CA USA
DOI
10.1109/CVPR52729.2023.01394
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Masked Autoencoders (MAEs) learn generalizable representations for image, text, audio, video, and other modalities by reconstructing masked input data from tokens of the visible data. Current MAE approaches for videos rely on random patch-, tube-, or frame-based masking strategies to select these tokens. This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. Our adaptive masking strategy samples visible tokens based on the semantic context using an auxiliary sampling network. This network estimates a categorical distribution over spacetime-patch tokens. The tokens that increase the expected reconstruction error are rewarded and selected as visible tokens, motivated by the policy gradient algorithm in reinforcement learning. We show that AdaMAE samples more tokens from regions with high spatiotemporal information, allowing us to mask 95% of tokens, which lowers memory requirements and speeds up pre-training. We conduct ablation studies on the Something-Something v2 (SSv2) dataset to demonstrate the efficacy of our adaptive sampling approach, and report state-of-the-art top-1 accuracies of 70.0% on SSv2 and 81.7% on Kinetics-400 action classification with a ViT-Base backbone and 800 pre-training epochs. Code and pre-trained models are available at: https://github.com/wgcban/adamae.git.
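The adaptive sampling idea in the abstract — draw visible tokens from a learned categorical distribution, then train the sampler with a REINFORCE-style reward equal to the reconstruction error — can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' implementation: the helper names, the token count (1568, e.g. an 8×14×14 token grid), and sampling without replacement are assumptions for the example.

```python
import numpy as np

def sample_visible_tokens(logits, mask_ratio=0.95, rng=None):
    """Sample visible token indices from a categorical distribution
    over all space-time patch tokens (hypothetical helper)."""
    rng = rng or np.random.default_rng(0)
    n = logits.shape[0]
    n_visible = int(round(n * (1.0 - mask_ratio)))
    # softmax over token logits -> sampling probabilities
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # draw the small visible set without replacement, weighted by probs
    visible = rng.choice(n, size=n_visible, replace=False, p=probs)
    return visible, probs

def sampling_loss(probs, visible, recon_error):
    """Policy-gradient-style objective: tokens whose selection coincides
    with high expected reconstruction error are rewarded, i.e. we
    maximize E[log p(token) * error] by minimizing its negative."""
    log_p = np.log(probs[visible] + 1e-9)
    return -np.mean(log_p * recon_error[visible])

logits = np.zeros(1568)  # untrained sampler: uniform over all tokens
visible, probs = sample_visible_tokens(logits, mask_ratio=0.95)
# stand-in per-token reconstruction errors from the MAE decoder
err = np.abs(np.random.default_rng(1).normal(size=1568))
loss = sampling_loss(probs, visible, err)
```

With a 95% mask ratio only 5% of the tokens are kept visible, which is what makes pre-training cheap; in AdaMAE the gradient of this sampling loss pushes probability mass toward high-information regions, while the MAE itself is trained with the usual reconstruction loss.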
Pages: 14507-14517 (11 pages)