Layer-Wise Invertibility for Extreme Memory Cost Reduction of CNN Training

Cited by: 3
Authors
Hascoet, Tristan [1 ]
Febvre, Quentin [2 ]
Zhuang, Weihao [1 ]
Ariki, Yasuo [1 ]
Takiguchi, Tetusya [1 ]
Affiliations
[1] Kobe Univ, Kobe, Hyogo, Japan
[2] Sicara, Paris, France
Keywords
DOI
10.1109/ICCVW.2019.00258
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Convolutional Neural Networks (CNN) have demonstrated state-of-the-art results on various computer vision problems. However, training CNNs requires specialized GPUs with large memory. GPU memory has been a major bottleneck of the CNN training procedure, limiting the size of both inputs and model architectures. Given the ubiquity of CNNs in computer vision, optimizing the memory consumption of CNN training would have widespread practical benefits. Recently, reversible neural networks have been proposed to alleviate this memory bottleneck by recomputing hidden activations through inverse operations during the backward pass of the backpropagation algorithm. In this paper, we push this idea to the extreme and design a reversible neural network with minimal training memory consumption. The results demonstrate that we can train on the CIFAR10 dataset using an Nvidia GTX750 GPU with only 1GB of memory and achieve 93% accuracy within 67 minutes.
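The reversible-recomputation idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): an additive coupling block in PyTorch whose inputs can be reconstructed exactly from its outputs, so hidden activations need not be stored for the backward pass. The sub-networks `f` and `g`, the channel sizes, and the correctness check below are illustrative assumptions.

```python
# Minimal sketch of a reversible (additive coupling) block, assuming PyTorch.
# Forward:  y1 = x1 + F(x2),  y2 = x2 + G(y1)
# Inverse:  x2 = y2 - G(y1),  x1 = y1 - F(x2)
# Because the inputs are recoverable from the outputs, a memory-efficient
# training loop can discard them and recompute activations during backprop.
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        # Additive coupling: neither f nor g needs to be invertible itself.
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Reconstruct the inputs from the outputs (used in the backward pass
        # instead of storing the activations).
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

if __name__ == "__main__":
    # Illustrative sub-networks and tensor shapes, not the paper's architecture.
    f = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
    g = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
    block = ReversibleBlock(f, g)
    x1, x2 = torch.randn(2, 16, 32, 32), torch.randn(2, 16, 32, 32)
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
    print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))
```

Stacking such blocks lets activation memory stay roughly constant in depth, at the cost of recomputing `f` and `g` once more during the backward pass.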
Pages: 2049-2052
Page count: 4