SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

Cited by: 0
Authors
Wang, Linnan [1]
Ye, Jinmian [2]
Zhao, Yiyang [2]
Wu, Wei [3]
Li, Ang [4]
Song, Shuaiwen Leon [4]
Xu, Zenglin [5]
Kraska, Tim [1,5]
Source
ACM SIGPLAN Notices, 2018, Vol. 53, Issue 1. Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, United States.
Funding
National Natural Science Foundation of China;
Keywords
GPU memory management; neural networks; runtime scheduling;
DOI
10.1145/3178487.3178491
Abstract
Going deeper and wider in neural architectures improves their accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desirable network architectures, or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity. SuperNeurons features three memory optimizations, Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation; together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in these memory-saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow demonstrate that SuperNeurons trains at least 3.2432x deeper networks than current frameworks with the leading performance. In particular, SuperNeurons can train ResNet2500, which has 10^4 basic network layers, on a 12GB K40c. © 2018 ACM.
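To make the abstract's memory-saving idea more concrete, the following is a minimal sketch, not SuperNeurons' actual code or API, of two of the ideas it names: liveness-aware freeing (a tensor is released once no later computation needs it) and cost-aware recomputation (cheap activations such as ReLU or pooling outputs are dropped after use and recomputed during the backward pass). The function forward_peak_mb, the layer kinds, and all sizes below are hypothetical illustrations.

def forward_peak_mb(layers, recompute_kinds):
    """Simulate peak forward-pass memory (in MB) for a linear chain of layers.

    layers          : list of (kind, activation_mb) tuples
    recompute_kinds : layer kinds whose activations are freed right after the
                      next layer consumes them and recomputed during backward
    """
    resident = 0.0            # activations kept alive for the backward pass
    peak = 0.0
    prev_mb, prev_kept = 0.0, True
    for kind, mb in layers:
        # While computing this layer, its input and output are both live;
        # count the input again if it was dropped, since it is needed now.
        peak = max(peak, resident + mb + (0.0 if prev_kept else prev_mb))
        if kind in recompute_kinds:
            prev_mb, prev_kept = mb, False   # cheap layer: drop after use
        else:
            resident += mb                   # liveness extends to backward
            prev_mb, prev_kept = mb, True
    return peak

chain = [("conv", 200.0), ("relu", 200.0), ("pool", 50.0)] * 10  # toy 30-layer net
print(forward_peak_mb(chain, recompute_kinds=set()))             # keep every activation
print(forward_peak_mb(chain, recompute_kinds={"relu", "pool"}))  # recompute cheap layers

In this toy setting the second call reports roughly half the peak of the first; the real system additionally offloads checkpointed tensors to host memory through its Unified Tensor Pool, which this sketch does not model.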
Pages: 41 - 53
Number of pages: 12
Related Papers
50 records in total
  • [1] SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
    Wang, Linnan
    Ye, Jinmian
    Zhao, Yiyang
    Wu, Wei
    Li, Ang
    Song, Shuaiwen Leon
    Xu, Zenglin
    Kraska, Tim
    ACM SIGPLAN NOTICES, 2018, 53 (01) : 41 - 53
  • [2] Dynamic Memory Management for GPU-based training of Deep Neural Networks
    Shriram, S. B.
    Garg, Anshuj
    Kulkarni, Purushottam
    2019 IEEE 33RD INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2019), 2019, : 200 - 209
  • [3] pommDNN: Performance optimal GPU memory management for deep neural network training
    Chen, Weiduo
    Dong, Xiaoshe
    Chen, Xinhang
    Liu, Song
    Xia, Qin
    Wang, Qiang
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 152 : 160 - 169
  • [4] AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks
    Guo, Jinrong
    Liu, Wantao
    Wang, Wang
    Yao, Chunrong
    Han, Jizhong
    Li, Ruixuan
    Lu, Yijun
    Hu, Songlin
    2019 IEEE 37TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2019), 2019, : 65 - 72
  • [5] A Scalable GPU-enabled Framework for Training Deep Neural Networks
    Del Monte, Bonaventura
    Prodan, Radu
    2016 2ND INTERNATIONAL CONFERENCE ON GREEN HIGH PERFORMANCE COMPUTING (ICGHPC), 2016,
  • [6] POSTER: A GPU Memory Efficient Speed-up Scheme for Training Ultra-Deep Neural Networks
    Guo, Jinrong
    Liu, Wantao
    Wang, Wang
    Lu, Qu
    Hu, Songlin
    Han, Jizhong
    Li, Ruixuan
    PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 397 - 398
  • [7] FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks
    Patel, Shaurya
    Liu, Tongping
    Guan, Hui
    PROCEEDINGS OF MCHPC 2021: WORKSHOP ON MEMORY CENTRIC HIGH PERFORMANCE COMPUTING, 2021, : 1 - 8
  • [8] Training of deep neural networks for the generation of dynamic movement primitives
    Pahic, Rok
    Ridge, Barry
    Gams, Andrej
    Morimoto, Jun
    Ude, Ales
    NEURAL NETWORKS, 2020, 127 : 121 - 131
  • [9] Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores
    Kim, Hyeonjin
    Ahn, Sungwoo
    Oh, Yunho
    Kim, Bogil
    Ro, Won Woo
    Song, William J.
    2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 2020), 2020, : 725 - 737
  • [10] Neural Networks Training on Graphics Processing Unit (GPU) Using Dynamic Parallelism (DP)
    Hall, Will
    Tian, Yun
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2023, 543 : 811 - 818