Layer-Wise Invertibility for Extreme Memory Cost Reduction of CNN Training

Cited by: 3
Authors
Hascoet, Tristan [1 ]
Febvre, Quentin [2 ]
Zhuang, Weihao [1 ]
Ariki, Yasuo [1 ]
Takiguchi, Tetusya [1 ]
Affiliations
[1] Kobe Univ, Kobe, Hyogo, Japan
[2] Sicara, Paris, France
Keywords
DOI
10.1109/ICCVW.2019.00258
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Convolutional Neural Networks (CNN) have demonstrated state-of-the-art results on various computer vision problems. However, training CNNs requires specialized GPUs with large memory. GPU memory has been a major bottleneck of the CNN training procedure, limiting the size of both inputs and model architectures. Given the ubiquity of CNNs in computer vision, optimizing the memory consumption of CNN training would have widespread practical benefits. Recently, reversible neural networks have been proposed to alleviate this memory bottleneck by recomputing hidden activations through inverse operations during the backward pass of the backpropagation algorithm. In this paper, we push this idea to the extreme and design a reversible neural network with minimal training memory consumption. The results demonstrate that we can train on the CIFAR10 dataset using an Nvidia GTX750 GPU with only 1GB of memory and achieve 93% accuracy within 67 minutes.
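The reversible-recomputation idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): an additive coupling block in PyTorch whose inputs can be reconstructed exactly from its outputs, so hidden activations need not be stored for the backward pass. The sub-networks `f` and `g`, the channel sizes, and the correctness check below are illustrative assumptions.

```python
# Minimal sketch of a reversible (additive coupling) block, assuming PyTorch.
# Forward:  y1 = x1 + F(x2),  y2 = x2 + G(y1)
# Inverse:  x2 = y2 - G(y1),  x1 = y1 - F(x2)
# Because the inputs are recoverable from the outputs, a memory-efficient
# training loop can discard them and recompute activations during backprop.
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        # Additive coupling: neither f nor g needs to be invertible itself.
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Reconstruct the inputs from the outputs (used in the backward pass
        # instead of storing the activations).
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

if __name__ == "__main__":
    # Illustrative sub-networks and tensor shapes, not the paper's architecture.
    f = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
    g = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
    block = ReversibleBlock(f, g)
    x1, x2 = torch.randn(2, 16, 32, 32), torch.randn(2, 16, 32, 32)
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
    print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))
```

Stacking such blocks lets activation memory stay roughly constant in depth, at the cost of recomputing `f` and `g` once more during the backward pass.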
Pages: 2049-2052
Page count: 4