Adaptive Hybrid Vision Transformer for Small Datasets

Cited: 1
Authors
Yin, Mingjun [1 ]
Chang, Zhiyong [2 ]
Wang, Yan [3 ]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Peking Univ, Beijing, Peoples R China
[3] Xiaochuan Chuhai, Beijing, Peoples R China
Keywords
Vision Transformer; Small Dataset; Self-Attention;
DOI
10.1109/ICTAI59109.2023.00132
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Number
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, vision Transformers (ViTs) have achieved competitive performance on many computer vision tasks. However, when trained from scratch on small datasets, vision Transformers underperform Convolutional Neural Networks (CNNs), which is commonly attributed to their lack of a locality inductive bias. This impedes the application of vision Transformers to small-scale datasets. In this work, we propose the Adaptive Hybrid Vision Transformer (AHVT) to boost the performance of vision Transformers on small-scale datasets. Specifically, along the spatial dimension, we exploit a Convolutional Overlapping Patch Embedding (COPE) layer to inject the desirable inductive bias into the model, forcing it to learn local token features. Along the channel dimension, we insert an adaptive channel feature aggregation block into the vanilla feed-forward network to calibrate channel responses. Meanwhile, we add several extra learnable "cardinality tokens" to the patch token sequence to capture cross-channel interaction. We present extensive experiments validating the effectiveness of our method on five small/medium datasets: CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet-1k. Our approach attains state-of-the-art performance on the first four (small) datasets when trained from scratch.
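The COPE idea described above can be sketched in a few lines: patches overlap because the embedding convolution's stride is smaller than its kernel size, unlike the non-overlapping patchification of a vanilla ViT. This is a minimal illustrative sketch, not the paper's implementation; the kernel size, stride, and embedding dimension below are assumed values, and the class name `ConvOverlappingPatchEmbed` is hypothetical.

```python
import torch
import torch.nn as nn

class ConvOverlappingPatchEmbed(nn.Module):
    """Sketch of a convolutional overlapping patch embedding (COPE-style) layer.

    Because stride < kernel_size, adjacent patches share pixels, injecting a
    locality inductive bias that plain ViT patch splitting lacks.
    Hyperparameters here are illustrative assumptions, not the paper's.
    """
    def __init__(self, in_chans=3, embed_dim=192, kernel_size=7, stride=4, padding=3):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size, stride, padding)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                  # (B, embed_dim, H', W')
        x = x.flatten(2).transpose(1, 2)  # (B, H'*W', embed_dim) token sequence
        return self.norm(x)

# A 32x32 CIFAR-style image becomes an 8x8 grid of overlapping patch tokens.
tokens = ConvOverlappingPatchEmbed()(torch.randn(2, 3, 32, 32))
print(tokens.shape)  # torch.Size([2, 64, 192])
```

The resulting token sequence can then be fed to standard Transformer blocks (after adding any class or cardinality tokens).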
Pages: 873-880
Page count: 8
Related Papers
50 records total
  • [31] AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets
    Du, Siyi
    Bayasi, Nourhan
    Hamarneh, Ghassan
    Garbi, Rafeef
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023 WORKSHOPS, 2023, 14393 : 25 - 36
  • [32] Comparative Analysis of Vision Transformer Models for Facial Emotion Recognition Using Augmented Balanced Datasets
    Bobojanov, Sukhrob
    Kim, Byeong Man
    Arabboev, Mukhriddin
    Begmatov, Shohruh
    APPLIED SCIENCES-BASEL, 2023, 13 (22):
  • [33] Adaptive Differential Privacy Algorithm for Federated Learning on Small Datasets
    Xia, Lei
    Yang, Huanbo
    2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024, : 497 - 502
  • [34] Image-Adaptive Hint Generation via Vision Transformer for Outpainting
    Kong, Daehyeon
    Kong, Kyeongbo
    Kim, Kyunghun
    Min, Sung-Jun
    Kang, Suk-Ju
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 4029 - 4038
  • [35] ViT-CAPS: Vision transformer with contrastive adaptive prompt segmentation
    Rashid, Khawaja Iftekhar
    Yang, Chenhui
    NEUROCOMPUTING, 2025, 625
  • [36] Adaptive Parking Slot Occupancy Detection Using Vision Transformer and LLIE
    Pannerselvam, Karthick
    2021 IEEE INTERNATIONAL SMART CITIES CONFERENCE (ISC2), 2021,
  • [37] Residual Vision Transformer and Adaptive Fusion Autoencoders for Monocular Depth Estimation
    Yang, Wei-Jong
    Wu, Chih-Chen
    Yang, Jar-Ferr
    SENSORS, 2025, 25 (01)
  • [38] Exploring the synergies of hybrid convolutional neural network and Vision Transformer architectures for computer vision: A survey
    Haruna, Yunusa
    Qin, Shiyin
    Chukkol, Abdulrahman Hamman Adama
    Yusuf, Abdulganiyu Abdu
    Bello, Isah
    Lawan, Adamu
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 144
  • [39] Systematic Review of Hybrid Vision Transformer Architectures for Radiological Image Analysis
    Kim, Ji Woong
    Khan, Aisha Urooj
    Banerjee, Imon
    JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2025,
  • [40] HaViT: Hybrid-Attention Based Vision Transformer for Video Classification
    Li, Li
    Zhuang, Liansheng
    Gao, Shenghua
    Wang, Shafei
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 502 - 517