AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders

Cited by: 11
Authors
Bandara, Wele Gedara Chaminda [1]
Patel, Naman [2]
Gholami, Ali [2]
Nikkhah, Mehdi [2]
Agrawal, Motilal [2]
Patel, Vishal M. [1]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
[2] Zippin, San Francisco, CA USA
DOI
10.1109/CVPR52729.2023.01394
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Masked Autoencoders (MAEs) learn generalizable representations for image, text, audio, video, and other data by reconstructing masked input data from tokens of the visible data. Current MAE approaches for videos rely on random patch-, tube-, or frame-based masking strategies to select these tokens. This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. Our adaptive masking strategy samples visible tokens based on the semantic context using an auxiliary sampling network. This network estimates a categorical distribution over spacetime-patch tokens. The tokens that increase the expected reconstruction error are rewarded and selected as visible tokens, motivated by the policy gradient algorithm in reinforcement learning. We show that AdaMAE samples more tokens from regions with high spatiotemporal information, thereby allowing us to mask 95% of tokens, resulting in lower memory requirements and faster pre-training. We conduct ablation studies on the Something-Something v2 (SSv2) dataset to demonstrate the efficacy of our adaptive sampling approach and report state-of-the-art results of 70.0% and 81.7% in top-1 accuracy on the SSv2 and Kinetics-400 action classification datasets with a ViT-Base backbone and 800 pre-training epochs. Code and pre-trained models are available at: https://github.com/wgcban/adamae.git.
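The sampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (that lives in the repository linked above). The function names are hypothetical, the random logits and per-token errors stand in for the outputs of the sampling network and the MAE decoder, and the 8 x 14 x 14 token grid is an assumption that is not stated in this record. The loss shown is a generic REINFORCE-style surrogate in which the reconstruction error of masked tokens is treated as a constant reward; the paper's exact formulation may differ.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


def sample_visible(logits, mask_ratio, rng):
    """Draw visible token indices without replacement from a categorical distribution."""
    probs = softmax(logits)
    n = probs.size
    n_visible = int(round(n * (1.0 - mask_ratio)))
    visible = rng.choice(n, size=n_visible, replace=False, p=probs)
    masked = np.setdiff1d(np.arange(n), visible)
    return probs, np.sort(visible), masked


def sampling_loss(probs, masked, per_token_error):
    """REINFORCE-style surrogate (assumed form): minimizing it raises the
    probability of tokens whose reconstruction error is high."""
    reward = per_token_error[masked]  # treated as a constant, no gradient
    return float(-(np.log(probs[masked] + 1e-12) * reward).sum())


rng = np.random.default_rng(0)
num_tokens = 8 * 14 * 14              # assumed spacetime token grid (1568 tokens)
logits = rng.normal(size=num_tokens)  # stand-in for sampling-network outputs
probs, visible, masked = sample_visible(logits, mask_ratio=0.95, rng=rng)
errors = rng.random(num_tokens)       # stand-in for per-token reconstruction error
loss = sampling_loss(probs, masked, errors)
print(visible.size, masked.size)
```

Under the assumed grid, a 95% masking ratio leaves only 78 of 1568 tokens for the encoder, which is where the memory and speed savings mentioned in the abstract would come from.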
Pages: 14507-14517
Number of pages: 11
Related papers
50 in total
  • [1] SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders
    Li, Gang
    Zheng, Heliang
    Liu, Daqing
    Wang, Chaoyue
    Su, Bing
    Zheng, Changwen
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [2] Masked Autoencoders As Spatiotemporal Learners
    Feichtenhofer, Christoph
    Fan, Haoqi
    Li, Yanghao
    He, Kaiming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [3] Efficient Masked Autoencoders With Self-Consistency
    Li, Zhaowen
    Zhu, Yousong
    Chen, Zhiyang
    Li, Wei
    Zhao, Rui
    Zhao, Chaoyang
    Tang, Ming
    Wang, Jinqiao
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8743 - 8757
  • [4] ColorMAE: Exploring Data-Independent Masking Strategies in Masked AutoEncoders
    Hinojosa, Carlos
    Liu, Shuming
    Ghanem, Bernard
    COMPUTER VISION - ECCV 2024, PT XX, 2025, 15078 : 432 - 449
  • [5] Masked Autoencoders are Efficient Class Incremental Learners
    Zhai, Jiang-Tian
    Liu, Xialei
    Bagdanov, Andrew D.
    Li, Ke
    Cheng, Ming-Ming
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023 : 19047 - 19056
  • [6] MAR: Masked Autoencoders for Efficient Action Recognition
    Qing Z.
    Zhang S.
    Huang Z.
    Wang X.
    Wang Y.
    Lv Y.
    Gao C.
    Sang N.
    IEEE Transactions on Multimedia, 2024, 26 : 218 - 233
  • [7] Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders
    Li, Chuang
    Wang, Yuyao
    Zhang, Yibing
    Mai, Xueqi
    Tao, Dapeng
    Wu, Jia
    Hu, Wenbin
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024 : 2180 - 2188
  • [8] Masked Autoencoders Enable Efficient Knowledge Distillers
    Bai, Yutong
    Wang, Zeyu
    Xiao, Junfei
    Wei, Chen
    Wang, Huiyu
    Yuille, Alan
    Zhou, Yuyin
    Xie, Cihang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 24256 - 24265
  • [9] Improving Masked Autoencoders by Learning Where to Mask
    Chen, Haijian
    Zhang, Wendong
    Wang, Yunbo
    Yang, Xiaokang
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 377 - 390
  • [10] Masked Autoencoders for Medical Ultrasound Videos Using ROI-Aware Masking
    Szijarto, Adam
    Magyar, Balint
    Szeier, Thomas A.
    Tolvaj, Mate
    Fabian, Alexandra
    Lakatos, Balint K.
    Ladanyi, Zsuzsanna
    Bagyura, Zsolt
    Merkely, Bela
    Kovacs, Attila
    Tokodi, Marton
    SIMPLIFYING MEDICAL ULTRASOUND, ASMUS 2024, 2025, 15186 : 167 - 176