Adaptive Hybrid Vision Transformer for Small Datasets

Cited by: 1
Authors
Yin, Mingjun [1 ]
Chang, Zhiyong [2 ]
Wang, Yan [3 ]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Peking Univ, Beijing, Peoples R China
[3] Xiaochuan Chuhai, Beijing, Peoples R China
Keywords
Vision Transformer; Small Dataset; Self-Attention;
DOI
10.1109/ICTAI59109.2023.00132
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, vision Transformers (ViTs) have achieved competitive performance on many computer vision tasks. However, when trained from scratch on small datasets, vision Transformers underperform Convolutional Neural Networks (CNNs), which is attributed to their lack of a locality inductive bias. This impedes the application of vision Transformers to small-size datasets. In this work, we propose the Adaptive Hybrid Vision Transformer (AHVT) as a solution for boosting the performance of vision Transformers on small-scale datasets. Specifically, on the spatial dimension, we exploit a Convolutional Overlapping Patch Embedding (COPE) layer to inject the desired inductive bias into the model, forcing it to learn local token features. On the channel dimension, we insert an adaptive channel feature aggregation block into the vanilla feed-forward network to calibrate channel responses. Meanwhile, we add several extra learnable "cardinality tokens" to the patch token sequence to capture cross-channel interactions. We present extensive experiments validating the effectiveness of our method on five small/medium datasets: CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet-1k. Our approach attains state-of-the-art performance on the four small datasets above when training from scratch.
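The overlapping patch embedding idea from the abstract can be sketched as follows: patches are extracted with a strided convolution whose kernel is larger than its stride, so neighboring patches share pixels and local structure leaks across tokens. This is a minimal illustrative sketch, assuming PyTorch; the layer sizes (kernel 7, stride 4, embedding dimension 192) are placeholder assumptions, not the paper's exact COPE configuration.

```python
# Hypothetical sketch of a Convolutional Overlapping Patch Embedding (COPE)
# layer: kernel_size > stride makes adjacent patches overlap, injecting a
# locality bias that plain non-overlapping ViT patchification lacks.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class ConvOverlappingPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=192, kernel_size=7, stride=4):
        super().__init__()
        # kernel_size > stride => overlapping receptive fields between patches
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                 # x: (B, C, H, W)
        x = self.proj(x)                  # (B, embed_dim, H/stride, W/stride)
        x = x.flatten(2).transpose(1, 2)  # (B, num_tokens, embed_dim)
        return self.norm(x)


# A 32x32 CIFAR-sized image with stride 4 yields an 8x8 grid of 64 tokens.
tokens = ConvOverlappingPatchEmbed()(torch.randn(2, 3, 32, 32))
print(tokens.shape)
```

Compared with a non-overlapping patch split (kernel equal to stride), the overlap means each token already mixes information with its spatial neighbors before any attention layer runs.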
Pages: 873-880 (8 pages)