Efficient Low-rank Backpropagation for Vision Transformer Adaptation

Cited by: 0
Authors
Yang, Yuedong [1]
Chiang, Hung-Yueh [1]
Li, Guihong [1]
Marculescu, Diana [1]
Marculescu, Radu [1]
Affiliations
[1] The University of Texas at Austin, Chandra Family Department of Electrical and Computer Engineering, Austin, TX 78712, USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The increasing scale of vision transformers (ViT) has made the efficient fine-tuning of these large models for specific needs a significant challenge in various applications. This issue originates from the computationally demanding matrix multiplications required during backpropagation through the linear layers in ViT. In this paper, we tackle this problem by proposing a new Low-rank Back-Propagation via Walsh-Hadamard Transformation (LBP-WHT) method. Intuitively, LBP-WHT projects the gradient into a low-rank space and carries out backpropagation there. This approach substantially reduces the computation needed for adapting ViT, as matrix multiplication in the low-rank space is far less resource-intensive. We conduct extensive experiments with different models (ViT and hybrid convolution-ViT models) on multiple datasets to demonstrate the effectiveness of our method. For instance, when adapting an EfficientFormer-L1 model on CIFAR100, our LBP-WHT achieves 10.4% higher accuracy than the state-of-the-art baseline while requiring 9 MFLOPs less computation. As the first work to accelerate ViT adaptation with low-rank backpropagation, our LBP-WHT method is complementary to many prior efforts and can be combined with them for better performance. Code: https://github.com/SLDGroup/LBP-WHT
Pages: 12
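
The following is a minimal, hypothetical PyTorch sketch of the idea the abstract describes, written for intuition only: during the backward pass of a linear layer, the output gradient and the saved input are projected onto the first k sequency-ordered Walsh-Hadamard basis vectors, so the expensive weight-gradient multiplication happens in a rank-k space. The names (walsh_rows, LowRankLinearFn) and simplifications (e.g., keeping the input gradient exact) are assumptions of this sketch, not the authors' implementation; the real LBP-WHT code is in the linked repository.

import torch

def walsh_rows(n: int, k: int) -> torch.Tensor:
    # First k rows of an n x n Hadamard matrix (n must be a power of two),
    # reordered by sequency (number of sign changes) so the leading rows
    # capture the lowest-"frequency" components of the gradient.
    H = torch.tensor([[1.0]])
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    sign_changes = (H[:, 1:] != H[:, :-1]).sum(dim=1)
    return H[torch.argsort(sign_changes)][:k]          # shape (k, n)

class LowRankLinearFn(torch.autograd.Function):
    # Hypothetical linear layer: exact forward pass, low-rank backward
    # pass for the weight gradient.
    @staticmethod
    def forward(ctx, x, weight, B):
        # x: (n_tokens, in_features), weight: (out_features, in_features)
        ctx.save_for_backward(x, weight, B)
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, weight, B = ctx.saved_tensors
        n = x.shape[0]
        # Project along the token dimension onto the k Hadamard rows
        # (each row has squared norm n, hence the 1/n factor):
        # grad_w ~= grad_out^T (B^T B / n) x = (B grad_out / n)^T (B x)
        g_low = (B @ grad_out) / n                     # (k, out_features)
        x_low = B @ x                                  # (k, in_features)
        grad_w = g_low.t() @ x_low                     # rank-k weight gradient
        grad_x = grad_out @ weight                     # kept exact in this sketch
        return grad_x, grad_w, None

# Toy usage: rank-8 backpropagation over 64 tokens.
x = torch.randn(64, 32, requires_grad=True)
w = torch.randn(16, 32, requires_grad=True)
B = walsh_rows(64, 8)
LowRankLinearFn.apply(x, w, B).sum().backward()

With n = 64 tokens and rank k = 8, the contraction that forms the weight gradient shrinks from length 64 to length 8, which illustrates where the FLOP savings reported in the abstract come from; the projection itself is cheap because a fast Walsh-Hadamard transform uses only additions and subtractions.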