AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders

Cited by: 11
Authors
Bandara, Wele Gedara Chaminda [1 ]
Patel, Naman [2 ]
Gholami, Ali [2 ]
Nikkhah, Mehdi [2 ]
Agrawal, Motilal [2 ]
Patel, Vishal M. [1 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Zippin, San Francisco, CA USA
Keywords
DOI
10.1109/CVPR52729.2023.01394
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Masked Autoencoders (MAEs) learn generalizable representations for images, text, audio, video, and other modalities by reconstructing masked input data from tokens of the visible data. Current MAE approaches for videos rely on random patch-, tube-, or frame-based masking strategies to select these tokens. This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. Our adaptive masking strategy samples visible tokens based on the semantic context using an auxiliary sampling network. This network estimates a categorical distribution over space-time patch tokens. The tokens that increase the expected reconstruction error are rewarded and selected as visible tokens, motivated by the policy gradient algorithm in reinforcement learning. We show that AdaMAE samples more tokens from regions with high spatiotemporal information, thereby allowing us to mask 95% of tokens, resulting in lower memory requirements and faster pre-training. We conduct ablation studies on the Something-Something v2 (SSv2) dataset to demonstrate the efficacy of our adaptive sampling approach and report state-of-the-art results of 70.0% and 81.7% top-1 accuracy on the SSv2 and Kinetics-400 action classification datasets with a ViT-Base backbone and 800 pre-training epochs. Code and pre-trained models are available at: https://github.com/wgcban/adamae.git.
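To make the sampling mechanism in the abstract concrete, below is a minimal PyTorch sketch of one adaptive token-sampling step in the spirit of AdaMAE: an auxiliary scorer produces a categorical distribution over space-time patch tokens, visible tokens are drawn from it, and a REINFORCE-style surrogate loss raises the sampling probability of tokens associated with high (detached) reconstruction error. All identifiers (TokenScorer, split_visible_masked, sampling_loss) and the exact form of the surrogate loss are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of adaptive token sampling for a video MAE (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenScorer(nn.Module):
    """Auxiliary network mapping space-time patch tokens to a categorical
    distribution over token positions (hypothetical module name)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) -> sampling probabilities over the N tokens: (B, N)
        logits = self.score(tokens).squeeze(-1)
        return F.softmax(logits, dim=-1)


def split_visible_masked(probs: torch.Tensor, mask_ratio: float = 0.95):
    """Draw visible-token indices from the categorical distribution; the rest
    are masked. Sampling without replacement approximates drawing a visible set."""
    B, N = probs.shape
    num_visible = int(round(N * (1.0 - mask_ratio)))
    vis_idx = torch.multinomial(probs, num_visible, replacement=False)  # (B, Nv)
    all_idx = torch.arange(N, device=probs.device).expand(B, N)
    is_masked = torch.ones(B, N, dtype=torch.bool, device=probs.device)
    is_masked.scatter_(1, vis_idx, False)
    mask_idx = all_idx[is_masked].view(B, N - num_visible)              # (B, Nm)
    return vis_idx, mask_idx


def sampling_loss(probs: torch.Tensor, mask_idx: torch.Tensor,
                  per_token_err: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate: increase sampling probability where the
    (detached) reconstruction error is high, so information-dense regions
    are more likely to be kept visible. One plausible formulation, not
    necessarily the paper's exact objective."""
    log_p = torch.log(probs.gather(1, mask_idx) + 1e-8)   # (B, Nm)
    reward = per_token_err.detach()                       # no gradient into the MAE
    return -(log_p * reward).mean()


# Toy usage with random tokens and a stand-in per-masked-token error signal.
B, N, D = 2, 1568, 768              # e.g. 8 x 14 x 14 space-time tokens for ViT-Base
tokens = torch.randn(B, N, D)
scorer = TokenScorer(D)
probs = scorer(tokens)
vis_idx, mask_idx = split_visible_masked(probs, mask_ratio=0.95)
per_token_err = torch.rand(B, mask_idx.shape[1])   # stand-in for per-token MSE
loss = sampling_loss(probs, mask_idx, per_token_err)
loss.backward()
```

In this sketch the reconstruction error acts purely as a reward signal (it is detached), so only the scorer is updated by the sampling loss, while the MAE itself is trained with its usual reconstruction objective.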
Pages: 14507-14517
Number of pages: 11