Adaptive Masked Autoencoder Transformer for image classification

Cited by: 1
Authors
Chen, Xiangru [1 ,2 ]
Liu, Chenjing [1 ,2 ]
Hu, Peng
Lin, Jie [1 ,2 ]
Gong, Yunhong
Chen, Yingke [4 ]
Peng, Dezhong [1 ,3 ]
Geng, Xue [2 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[3] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China
[4] Northumbria Univ, Dept Comp & Informat Sci, Newcastle Upon Tyne NE1 8ST, England
Funding
National Natural Science Foundation of China;
Keywords
Vision transformer; Masked image modeling; Image classification;
DOI
10.1016/j.asoc.2024.111958
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision Transformers (ViTs) have exhibited exceptional performance across a broad spectrum of visual tasks. Nonetheless, their computational requirements often surpass those of prevailing CNN-based models. Token sparsity techniques have been employed to alleviate this issue, but they often discard semantic information and consequently degrade performance. To address these challenges, we propose the Adaptive Masked Autoencoder Transformer (AMAT), a masked image modeling-based method. AMAT integrates a novel adaptive masking mechanism and a training objective function for both the pre-training and fine-tuning stages. Our primary objective is to reduce the complexity of Vision Transformer models while concurrently enhancing their final accuracy. In experiments on the ILSVRC-2012 dataset, our method surpasses the original ViT while achieving up to 40% FLOPs savings, and it outperforms the efficient DynamicViT model by 0.1% while saving 4% of FLOPs. Furthermore, on the Places365 dataset, AMAT saves 21% of FLOPs compared to MAE with only a 0.3% accuracy loss. These findings demonstrate the efficacy of AMAT in mitigating computational complexity while maintaining a high level of accuracy.
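The abstract describes an adaptive masking mechanism that prunes patch tokens so that later encoder blocks process fewer tokens, which is where the FLOPs savings come from. As a rough illustration only, not the authors' AMAT implementation, a minimal PyTorch sketch of score-based token pruning might look like the following; the scoring head, the keep ratio, and all module and variable names are assumptions introduced for this example.

# Illustrative sketch of adaptive token masking for a ViT encoder.
# NOTE: this is NOT the paper's AMAT implementation; the scoring head,
# keep ratio, and all names here are assumptions for illustration only.
import torch
import torch.nn as nn

class AdaptiveTokenMasking(nn.Module):
    """Keeps only the highest-scoring patch tokens for later encoder blocks."""
    def __init__(self, dim: int, keep_ratio: float = 0.6):
        super().__init__()
        self.keep_ratio = keep_ratio
        # Lightweight head predicting one importance score per token.
        self.score_head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim)
        b, n, d = tokens.shape
        k = max(1, int(n * self.keep_ratio))
        scores = self.score_head(tokens).squeeze(-1)        # (b, n)
        keep_idx = scores.topk(k, dim=1).indices            # (b, k)
        keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, d) # (b, k, d)
        # Subsequent attention blocks then run on fewer tokens.
        return torch.gather(tokens, dim=1, index=keep_idx)

# Usage: prune ~40% of the 196 patch tokens of a ViT-B/16 feature map.
x = torch.randn(2, 196, 768)
pruned = AdaptiveTokenMasking(dim=768, keep_ratio=0.6)(x)  # -> (2, 117, 768)

In a scheme of this kind, the keep ratio (or a per-layer schedule of ratios) would control the accuracy/FLOPs trade-off of the sort reported in the abstract.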
Pages: 10