Peripheral Vision Transformer

Cited by: 0
Authors
Min, Juhong [1 ]
Zhao, Yucheng [2 ,3 ]
Luo, Chong [2 ]
Cho, Minsu [1 ]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Pohang, South Korea
[2] Microsoft Res Asia MSRA, Beijing, Peoples R China
[3] Univ Sci & Technol China, Hefei, Peoples R China
Keywords
DOI
n/a
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human vision possesses a special type of visual processing system called peripheral vision. By partitioning the entire visual field into multiple contour regions based on the distance to the center of our gaze, peripheral vision provides us with the ability to perceive different visual features in each region. In this work, we take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition. We propose incorporating peripheral position encoding into the multi-head self-attention layers to let the network learn, from the training data, to partition the visual field into diverse peripheral regions. We evaluate the proposed network, dubbed PerViT, on ImageNet-1K and systematically investigate the inner workings of the model for machine perception, showing that the network learns to perceive visual data similarly to the way human vision does. Performance improvements in image classification over the baselines across different model sizes demonstrate the efficacy of the proposed method.
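The core idea in the abstract, adding a position-dependent bias to the self-attention logits so that attention weights vary with a token's distance from a reference point, can be illustrated with a minimal sketch. This is an assumption-laden simplification, not PerViT's actual implementation: PerViT learns the position-to-bias mapping from data, whereas here a fixed distance penalty (`slope`) stands in for the learned peripheral encoding, and the attention is single-head over a tiny patch grid.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def peripheral_bias(n_side, slope=-1.0):
    # Additive bias over all patch pairs of an n_side x n_side grid,
    # proportional to the Euclidean distance between patch centers.
    # Hypothetical fixed form; the paper learns this mapping instead.
    coords = [(i, j) for i in range(n_side) for j in range(n_side)]
    n = n_side * n_side
    return [[slope * math.dist(coords[q], coords[k]) for k in range(n)]
            for q in range(n)]

def attention_with_bias(Q, K, V, bias):
    # Scaled dot-product attention with an additive position bias:
    # out = softmax(Q K^T / sqrt(d) + B) V, computed row by row.
    d = len(Q[0])
    out = []
    for qi, q in enumerate(Q):
        logits = [sum(qc * kc for qc, kc in zip(q, K[ki])) / math.sqrt(d)
                  + bias[qi][ki] for ki in range(len(K))]
        w = softmax(logits)
        out.append([sum(wk * V[k][c] for k, wk in enumerate(w))
                    for c in range(len(V[0]))])
    return out
```

With identical queries and keys (so the content term is uniform), the bias alone shapes the attention map: each token attends most strongly to itself and progressively less to distant patches, mimicking a center-weighted, peripherally attenuated receptive field.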
Pages: 15
Related papers
50 records (items [31]–[40] shown)
  • [31] Vision Transformer with Deformable Attention
    Xia, Zhuofan
    Pan, Xuran
    Song, Shiji
    Li, Li Erran
    Huang, Gao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4784 - 4793
  • [32] Vision Transformer With Quadrangle Attention
    Zhang, Qiming
    Zhang, Jing
    Xu, Yufei
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (05) : 3608 - 3624
  • [33] CONTINUAL LEARNING IN VISION TRANSFORMER
    Takeda, Mana
    Yanai, Keiji
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 616 - 620
  • [34] Adder Attention for Vision Transformer
    Shu, Han
    Wang, Jiahao
    Chen, Hanting
    Li, Lin
    Yang, Yujiu
    Wang, Yunhe
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [35] On the Faithfulness of Vision Transformer Explanations
    Wu, Junyi
    Kang, Weitai
    Tang, Hao
    Hong, Yuan
    Yan, Yan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 10936 - 10945
  • [36] ViViT: A Video Vision Transformer
    Arnab, Anurag
    Dehghani, Mostafa
    Heigold, Georg
    Sun, Chen
    Lucic, Mario
    Schmid, Cordelia
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6816 - 6826
  • [37] Spiking Convolutional Vision Transformer
    Talafha, Sameerah
    Rekabdar, Banafsheh
    Mousas, Christos
    Ekenna, Chinwe
    2023 IEEE 17TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, ICSC, 2023, : 225 - 226
  • [38] Towards Robust Vision Transformer
    Mao, Xiaofeng
    Qi, Gege
    Chen, Yuefeng
    Li, Xiaodan
    Duan, Ranjie
    Ye, Shaokai
    He, Yuan
    Xue, Hui
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 12032 - 12041
  • [39] A lightweight vision transformer with symmetric modules for vision tasks
    Liang, Shengjun
    Yu, Mingxin
    Lu, Wenshuai
    Ji, Xinglong
    Tang, Xiongxin
    Liu, Xiaolin
    You, Rui
    INTELLIGENT DATA ANALYSIS, 2023, 27 (06) : 1741 - 1757
  • [40] FLatten Transformer: Vision Transformer using Focused Linear Attention
    Han, Dongchen
    Pan, Xuran
    Han, Yizeng
    Song, Shiji
    Huang, Gao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5938 - 5948