Adaptive Hybrid Vision Transformer for Small Datasets

Cited by: 1
Authors
Yin, Mingjun [1 ]
Chang, Zhiyong [2 ]
Wang, Yan [3 ]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Peking Univ, Beijing, Peoples R China
[3] Xiaochuan Chuhai, Beijing, Peoples R China
Keywords
Vision Transformer; Small Dataset; Self-Attention;
DOI
10.1109/ICTAI59109.2023.00132
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, vision Transformers (ViTs) have achieved competitive performance on many computer vision tasks. However, when trained from scratch on small datasets, vision Transformers underperform Convolutional Neural Networks (CNNs), which is attributed to their lack of a locality inductive bias. This impedes the application of vision Transformers to small-size datasets. In this work, we propose the Adaptive Hybrid Vision Transformer (AHVT) as a solution for boosting the performance of vision Transformers on small-scale datasets. Specifically, on the spatial dimension, we exploit a Convolutional Overlapping Patch Embedding (COPE) layer to inject the desired inductive bias into the model, forcing it to learn local token features. On the channel dimension, we insert an adaptive channel feature aggregation block into the vanilla feed-forward network to calibrate channel responses. Meanwhile, we add several extra learnable "cardinality tokens" to the patch token sequence to capture cross-channel interactions. We present extensive experiments validating the effectiveness of our method on five small/medium datasets: CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet-1k. Our approach attains state-of-the-art performance on the four small datasets above when training from scratch.
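The overlapping patch embedding idea from the abstract can be sketched as follows: patches are extracted with a strided convolution whose kernel is larger than its stride, so neighboring patches share pixels and local structure leaks across tokens. This is a minimal illustrative sketch, assuming PyTorch; the layer sizes (kernel 7, stride 4, embedding dimension 192) are placeholder assumptions, not the paper's exact COPE configuration.

```python
# Hypothetical sketch of a Convolutional Overlapping Patch Embedding (COPE)
# layer: kernel_size > stride makes adjacent patches overlap, injecting a
# locality bias that plain non-overlapping ViT patchification lacks.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class ConvOverlappingPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=192, kernel_size=7, stride=4):
        super().__init__()
        # kernel_size > stride => overlapping receptive fields between patches
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                 # x: (B, C, H, W)
        x = self.proj(x)                  # (B, embed_dim, H/stride, W/stride)
        x = x.flatten(2).transpose(1, 2)  # (B, num_tokens, embed_dim)
        return self.norm(x)


# A 32x32 CIFAR-sized image with stride 4 yields an 8x8 grid of 64 tokens.
tokens = ConvOverlappingPatchEmbed()(torch.randn(2, 3, 32, 32))
print(tokens.shape)
```

Compared with a non-overlapping patch split (kernel equal to stride), the overlap means each token already mixes information with its spatial neighbors before any attention layer runs.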
Pages: 873-880 (8 pages)