Adaptive Hybrid Vision Transformer for Small Datasets

Cited by: 1

Authors:

Yin, Mingjun [1]

Chang, Zhiyong [2]

Wang, Yan [3]

Affiliations:

[1] Univ Melbourne, Melbourne, Vic, Australia

[2] Peking Univ, Beijing, Peoples R China

[3] Xiaochuan Chuhai, Beijing, Peoples R China

Source:

2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI | 2023

Keywords:

Vision Transformer; Small Dataset; Self-Attention;

DOI:

10.1109/ICTAI59109.2023.00132

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recently, vision Transformers (ViTs) have achieved competitive performance on many computer vision tasks. However, when trained from scratch on small datasets, vision Transformers underperform Convolutional Neural Networks (CNNs), which is attributed to their lack of a locality inductive bias. This impedes the application of vision Transformers to small datasets. In this work, we propose the Adaptive Hybrid Vision Transformer (AHVT) to boost the performance of vision Transformers on small-scale datasets. Specifically, on the spatial dimension, we use a Convolutional Overlapping Patch Embedding (COPE) layer to inject the desired inductive bias into the model, forcing it to learn local token features. On the channel dimension, we insert an adaptive channel feature aggregation block into the vanilla feed-forward network to calibrate channel responses. Meanwhile, we add several extra learnable "cardinality tokens" to the patch token sequences to capture cross-channel interaction. We present extensive experiments to validate the effectiveness of our method on five small/medium datasets, including CIFAR10/100, SVHN, Tiny-ImageNet and ImageNet-1k. Our approach attains state-of-the-art performance on the four small datasets above when trained from scratch.
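
To make the idea of overlapping patch embedding concrete, the sketch below compares how many tokens a non-overlapping and an overlapping patch layout produce on a CIFAR-sized 32x32 input, using the standard convolution output-size formula. It is only an illustration under stated assumptions: the abstract does not give COPE's kernel size, stride, or padding, so every setting here is hypothetical, and the helper names (`conv_output_size`, `num_tokens`, `patch_starts`) are made up for this example.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def num_tokens(image_size, kernel, stride, padding=0):
    """Number of patch tokens produced for a square image."""
    side = conv_output_size(image_size, kernel, stride, padding)
    return side * side


def patch_starts(size, kernel, stride):
    """Start offsets of each patch window along one dimension (no padding)."""
    return list(range(0, size - kernel + 1, stride))


# Hypothetical settings, NOT taken from the paper.
# Non-overlapping: stride equals kernel size, as in a vanilla ViT patch embedding.
non_overlapping = num_tokens(32, kernel=4, stride=4)
# Overlapping: stride smaller than kernel size, so neighbouring patches share pixels.
overlapping = num_tokens(32, kernel=3, stride=2, padding=1)

# Windows along one axis of an 8-pixel row, to show where patches overlap.
windows_disjoint = patch_starts(8, kernel=4, stride=4)
windows_shared = patch_starts(8, kernel=4, stride=2)
```

With a stride smaller than the kernel size, adjacent windows cover shared pixels, which is one way a convolutional embedding can expose local structure to the tokens that follow.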


Pages: 873-880

Number of pages: 8

50 items in total

[41] Hybrid Vision Transformer for Domain Adaptable Person Re-identification
Waseem, Muhammad Danish
Tahir, Muhammad Atif
Durrani, Muhammad Nouman
ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2021), 2021, 1463 : 114 - 122
[42] Facial Expression Recognition Based on Vision Transformer with Hybrid Local Attention
Tian, Yuan
Zhu, Jingxuan
Yao, Huang
Chen, Di
APPLIED SCIENCES-BASEL, 2024, 14 (15):
[43] CSFNet: a compact and efficient convolution-transformer hybrid vision model
Feng, Jian
Wu, Peng
Xu, Renjie
Zhang, Xiaoming
Wang, Tao
Li, Xuan
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 72679 - 72699
[44] PlantVitGnet: A Hybrid Model of Vision Transformer and GoogLeNet for Plant Disease Identification
Gupta, Pradeep
Jadon, Rakesh Singh
JOURNAL OF PHYTOPATHOLOGY, 2025, 173 (02)
[45] Vision Transformer With Hybrid Shifted Windows for Gastrointestinal Endoscopy Image Classification
Wang, Wei
Yang, Xin
Tang, Jinhui
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4452 - 4461
[46] Hybrid Transformer for Lesion Segmentation on Adaptive Optics Retinal Images
Liu, Jianfei
Li, Joanne
Wolde, Amday
Cukras, Catherine
Tam, Johnny
MEDICAL IMAGING 2022: COMPUTER-AIDED DIAGNOSIS, 2022, 12033
[47] Depth-Wise Convolutions in Vision Transformers for efficient training on small datasets
Zhang, Tianxiao
Xu, Wenju
Luo, Bo
Wang, Guanghui
NEUROCOMPUTING, 2025, 617
[48] Transfer Learning Methods as a New Approach in Computer Vision Tasks with Small Datasets
Brodzicki, Andrzej
Piekarski, Michal
Kucharski, Dariusz
Jaworek-Korjakowska, Joanna
Gorgon, Marek
FOUNDATIONS OF COMPUTING AND DECISION SCIENCES, 2020, 45 (03) : 179 - 193
[49] Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Lu, Zhiying
Xie, Hongtao
Liu, Chuanbin
Zhang, Yongdong
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
[50] DAEEGViT: A domain adaptive vision transformer framework for EEG cognitive state identification
Ouyang, Yu
Liu, Yang
Shan, Liang
Jia, Zhe
Qian, Dongguan
Zeng, Tao
Zeng, Hong
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 100
