Adaptive Hybrid Vision Transformer for Small Datasets

Cited by: 1
Authors
Yin, Mingjun [1]
Chang, Zhiyong [2]
Wang, Yan [3]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Peking Univ, Beijing, Peoples R China
[3] Xiaochuan Chuhai, Beijing, Peoples R China
Keywords
Vision Transformer; Small Dataset; Self-Attention
DOI
10.1109/ICTAI59109.2023.00132
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, vision Transformers (ViTs) have achieved competitive performance on many computer vision tasks. However, when trained from scratch on small datasets, ViTs underperform Convolutional Neural Networks (CNNs), which is attributed to their lack of a locality inductive bias. This impedes the application of ViTs to small datasets. In this work, we propose the Adaptive Hybrid Vision Transformer (AHVT) to boost the performance of vision Transformers on small-scale datasets. Specifically, on the spatial dimension, we use a Convolutional Overlapping Patch Embedding (COPE) layer to inject the desired inductive bias into the model, forcing it to learn local token features. On the channel dimension, we insert an adaptive channel feature aggregation block into the vanilla feed-forward network to calibrate channel responses. Meanwhile, we add several extra learnable "cardinality tokens" to the patch token sequences to capture cross-channel interaction. We present extensive experiments validating the effectiveness of our method on five small and medium datasets: CIFAR10, CIFAR100, SVHN, Tiny-ImageNet, and ImageNet-1k. Our approach attains state-of-the-art performance on the four small datasets above when trained from scratch.
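The abstract does not give the COPE layer's kernel size, stride, or padding. As a rough illustration only, the sketch below uses the standard convolution output-size formula to compare a non-overlapping patch embedding with an overlapping one on a 32x32 input (the CIFAR image size). The helper names and the specific kernel, stride, and padding values are assumptions chosen for illustration, not settings from the paper.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Standard convolution output length: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_grid(image_size: int, kernel: int, stride: int, padding: int = 0):
    """Return (tokens per side, total tokens, pixels shared by adjacent patches per axis)."""
    side = conv_output_size(image_size, kernel, stride, padding)
    overlap = max(kernel - stride, 0)
    return side, side * side, overlap


# Hypothetical settings for a 32x32 image; NOT taken from the paper.
vanilla = patch_grid(32, kernel=4, stride=4)                  # non-overlapping patches
overlapping = patch_grid(32, kernel=7, stride=4, padding=3)   # overlapping patches

print(vanilla)      # (8, 64, 0)
print(overlapping)  # (8, 64, 3)
```

Under these assumed values both embeddings yield the same 8x8 token grid, but in the overlapping case each patch shares 3 pixels per axis with its neighbour, which is one way a convolutional embedding can expose local context to each token.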
Pages: 873-880
Page count: 8