Improving Visual Representation Learning through Perceptual Understanding

Cited by: 2
Authors
Tukra, Samyakh [1]
Hoffman, Frederick [1]
Chatfield, Ken [1]
Affiliations
[1] Tractable AI, London, England
Keywords
DOI
10.1109/CVPR52729.2023.01392
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present an extension to masked autoencoders (MAE) that improves the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by (i) introducing a perceptual similarity term between generated and real images and (ii) incorporating several techniques from the adversarial training literature, including multi-scale training and adaptive discriminator augmentation. The combination of these results not only in better pixel reconstruction but also in representations that appear to capture higher-level details within images. More consequentially, we show that our method, Perceptual MAE, leads to better performance on downstream tasks, outperforming previous methods. We achieve 78.1% top-1 accuracy with linear probing on ImageNet-1K and up to 88.1% when fine-tuning, with similar results for other downstream tasks, all without the use of additional pre-trained models or data.
Pages: 14486-14495
Page count: 10
Related papers
50 in total
  • [1] Editorial: Improving visual deficits with perceptual learning
    Campana, Gianluca
    Maniglia, Marcello
    FRONTIERS IN PSYCHOLOGY, 2015, 6
  • [2] Category representation in primary visual cortex after visual perceptual learning
    Zhaofan Liu
    Yin Yan
    Da-Hui Wang
    Cognitive Neurodynamics, 2024, 18 : 23 - 35
  • [3] Category representation in primary visual cortex after visual perceptual learning
    Liu, Zhaofan
    Yan, Yin
    Wang, Da-Hui
    COGNITIVE NEURODYNAMICS, 2024, 18 (01) : 23 - 35
  • [4] Improving Visual Reasoning Through Semantic Representation
    Zheng, Wenfeng
    Liu, Xiangjun
    Ni, Xubin
    Yin, Lirong
    Yang, Bo
    IEEE ACCESS, 2021, 9 : 91476 - 91486
  • [5] Cross-cultural understanding through visual representation
    Beckman, Kristina
    Smith, Susan N.
    COLOMBIAN APPLIED LINGUISTICS JOURNAL, 2006, 8 : 137 - 151
  • [6] TURL: Table Understanding through Representation Learning
    Deng, Xiang
    Sun, Huan
    Lees, Alyssa
    Wu, You
    Yu, Cong
    SIGMOD RECORD, 2022, 51 (01) : 33 - 40
  • [7] TURL: Table Understanding through Representation Learning
    Deng, Xiang
    Sun, Huan
    Lees, Alyssa
    Wu, You
    Yu, Cong
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (03) : 307 - 319
  • [8] Enhancing visual communication through representation learning
    Wei, Yuhan
    Lee, Changwook
    Han, Seokwon
    Kim, Anna
    FRONTIERS IN NEUROSCIENCE, 2024, 18
  • [9] Perceptual Visual Feature Learning With Applications in Sports Educational Image Understanding
    Liu, Tengsheng
    Xu, Minghui
    IEEE ACCESS, 2024, 12 : 41168 - 41179
  • [10] Improving learning and understanding through concept mapping
    Canas, Alberto J.
    Reiska, Priit
    Shvaikovsky, Oleg
    KNOWLEDGE MANAGEMENT & E-LEARNING-AN INTERNATIONAL JOURNAL, 2023, 15 (03) : 369 - 380