AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders

Cited by: 11

Authors:

Bandara, Wele Gedara Chaminda [1]

Patel, Naman [2]

Gholami, Ali [2]

Nikkhah, Mehdi [2]

Agrawal, Motilal [2]

Patel, Vishal M. [1]

Affiliations:

[1] Johns Hopkins Univ, Baltimore, MD 21218 USA

[2] Zippin, San Francisco, CA USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.01394

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Masked Autoencoders (MAEs) learn generalizable representations for image, text, audio, video, etc., by reconstructing masked input data from tokens of the visible data. Current MAE approaches for videos rely on random patch, tube, or frame based masking strategies to select these tokens. This paper proposes AdaMAE, an adaptive masking strategy for MAEs that is end-to-end trainable. Our adaptive masking strategy samples visible tokens based on the semantic context using an auxiliary sampling network. This network estimates a categorical distribution over spacetime-patch tokens. The tokens that increase the expected reconstruction error are rewarded and selected as visible tokens, motivated by the policy gradient algorithm in reinforcement learning. We show that AdaMAE samples more tokens from the high spatiotemporal information regions, thereby allowing us to mask 95% of tokens, resulting in lower memory requirements and faster pre-training. We conduct ablation studies on the Something-Something v2 (SSv2) dataset to demonstrate the efficacy of our adaptive sampling approach and report state-of-the-art results of 70.0% and 81.7% in top-1 accuracy on SSv2 and Kinetics-400 action classification datasets with a ViT-Base backbone and 800 pre-training epochs. Code and pre-trained models are available at: https://github.com/wgcban/adamae.git.
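The sampling idea described in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation: the function names, the random stand-ins for the sampling network's scores and for the per-token reconstruction error, and the exact form of the surrogate loss are all assumptions made for illustration. Only the 95% mask ratio and the overall idea (a categorical distribution over tokens, with high-error tokens rewarded in a policy-gradient style) come from the abstract.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a categorical distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_visible(probs, mask_ratio, rng):
    """Draw visible token indices without replacement, proportional to probs."""
    num_visible = max(1, round(len(probs) * (1.0 - mask_ratio)))
    remaining = list(range(len(probs)))
    visible = []
    for _ in range(num_visible):
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        visible.append(pick)
        remaining.remove(pick)
    # Whatever was not picked is treated as masked.
    return sorted(visible), remaining


def sampling_loss(probs, masked, recon_error):
    """REINFORCE-style surrogate (assumed form): minimizing it raises the
    probability of tokens whose reconstruction error is high."""
    return -sum(math.log(probs[i]) * recon_error[i] for i in masked)


rng = random.Random(0)
num_tokens = 200
# Hypothetical stand-in for the auxiliary sampling network's per-token scores.
logits = [rng.gauss(0.0, 1.0) for _ in range(num_tokens)]
# Hypothetical stand-in for the decoder's per-token reconstruction error.
recon_error = [rng.random() for _ in range(num_tokens)]

probs = softmax(logits)
visible, masked = sample_visible(probs, mask_ratio=0.95, rng=rng)
loss = sampling_loss(probs, masked, recon_error)
print(len(visible), len(masked))
```

In a real training loop the reconstruction error would come from the MAE decoder and would normally be treated as a constant reward when differentiating the sampling loss, so that gradients reach only the sampling network.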

Pages: 14507-14517

Page count: 11

50 records in total

[1] SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders
Li, Gang
Zheng, Heliang
Liu, Daqing
Wang, Chaoyue
Su, Bing
Zheng, Changwen
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[2] Masked Autoencoders As Spatiotemporal Learners
Feichtenhofer, Christoph
Fan, Haoqi
Li, Yanghao
He, Kaiming
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[3] Efficient Masked Autoencoders With Self-Consistency
Li, Zhaowen
Zhu, Yousong
Chen, Zhiyang
Li, Wei
Zhao, Rui
Zhao, Chaoyang
Tang, Ming
Wang, Jinqiao
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 8743 - 8757
[4] ColorMAE: Exploring Data-Independent Masking Strategies in Masked AutoEncoders
Hinojosa, Carlos
Liu, Shuming
Ghanem, Bernard
COMPUTER VISION - ECCV 2024, PT XX, 2025, 15078 : 432 - 449
[5] Masked Autoencoders are Efficient Class Incremental Learners
Zhai, Jiang-Tian
Liu, Xialei
Bagdanov, Andrew D.
Li, Ke
Cheng, Ming-Ming
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023 : 19047 - 19056
[6] MAR: Masked Autoencoders for Efficient Action Recognition
Qing, Z.
Zhang, S.
Huang, Z.
Wang, X.
Wang, Y.
Lv, Y.
Gao, C.
Sang, N.
IEEE Transactions on Multimedia, 2024, 26 : 218 - 233
[7] Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders
Li, Chuang
Wang, Yuyao
Zhang, Yibing
Mai, Xueqi
Tao, Dapeng
Wu, Jia
Hu, Wenbin
PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024 : 2180 - 2188
[8] Masked Autoencoders Enable Efficient Knowledge Distillers
Bai, Yutong
Wang, Zeyu
Xiao, Junfei
Wei, Chen
Wang, Huiyu
Yuille, Alan
Zhou, Yuyin
Xie, Cihang
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 24256 - 24265
[9] Improving Masked Autoencoders by Learning Where to Mask
Chen, Haijian
Zhang, Wendong
Wang, Yunbo
Yang, Xiaokang
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 377 - 390
[10] Masked Autoencoders for Medical Ultrasound Videos Using ROI-Aware Masking
Szijarto, Adam
Magyar, Balint
Szeier, Thomas A.
Tolvaj, Mate
Fabian, Alexandra
Lakatos, Balint K.
Ladanyi, Zsuzsanna
Bagyura, Zsolt
Merkely, Bela
Kovacs, Attila
Tokodi, Marton
SIMPLIFYING MEDICAL ULTRASOUND, ASMUS 2024, 2025, 15186 : 167 - 176
