Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training

Cited by: 13
Authors
Choi, Hyeonseong [1 ]
Lee, Jaehwan [1 ]
Affiliations
[1] Korea Aerospace University, School of Electronics and Information Engineering, Goyang-si 10540, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 21
Funding
National Research Foundation of Singapore;
Keywords
deep learning; large-scale model; CUDA Unified Memory; PyTorch;
DOI
10.3390/app112110377
CLC Number
O6 [Chemistry];
Discipline Code
0703;
Abstract
To achieve high accuracy in deep learning, it is necessary to use a large-scale training model. However, due to the limited capacity of GPU memory, it is difficult to train such models on a single GPU. NVIDIA introduced CUDA Unified Memory with CUDA 6 to overcome this limitation by virtually combining GPU memory and CPU memory into a single address space. In addition, CUDA 8 introduced memory advise options so that Unified Memory can be utilized more efficiently. In this work, we propose a newly optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise to each data type according to its access pattern during deep learning training. We apply CUDA Unified Memory technology to PyTorch to evaluate the performance of large-scale models on the expanded GPU memory, and we conduct comprehensive experiments on how to utilize Unified Memory efficiently by applying memory advises during training. As a result, when the data used for deep learning are divided into three types and a memory advise is applied to each type according to its access pattern, deep learning execution time is reduced by 9.4% compared to default Unified Memory.
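
As a rough illustration of the mechanism the abstract describes, the sketch below allocates managed buffers with cudaMallocManaged and attaches access-pattern hints with cudaMemAdvise. The three advise options shown (cudaMemAdviseSetReadMostly, cudaMemAdviseSetPreferredLocation, cudaMemAdviseSetAccessedBy) are the standard CUDA 8 hints; the assignment of each hint to a particular data type (weights, activations, gradients) is an illustrative assumption for this sketch, not necessarily the exact mapping evaluated in the paper.

// unified_memory_advise.cu -- minimal sketch of CUDA Unified Memory with
// per-buffer memory advise hints; the hint-to-data-type mapping below is
// illustrative, not the paper's measured assignment.
#include <cuda_runtime.h>

int main() {
    int device = 0;
    cudaSetDevice(device);

    const size_t bytes = 256UL * 1024 * 1024;  // e.g., one large tensor

    // Managed allocations share one address space between CPU and GPU;
    // the driver migrates pages on demand, so the total allocation may
    // exceed physical GPU memory.
    float *weights = nullptr, *activations = nullptr, *gradients = nullptr;
    cudaMallocManaged(&weights, bytes, cudaMemAttachGlobal);
    cudaMallocManaged(&activations, bytes, cudaMemAttachGlobal);
    cudaMallocManaged(&gradients, bytes, cudaMemAttachGlobal);

    // Weights are read repeatedly in the forward pass -> ReadMostly lets
    // the driver keep read-only copies near each processor.
    cudaMemAdvise(weights, bytes, cudaMemAdviseSetReadMostly, device);

    // Activations are produced and consumed on the GPU -> preferring GPU
    // residency avoids ping-ponging pages to host memory.
    cudaMemAdvise(activations, bytes, cudaMemAdviseSetPreferredLocation, device);

    // Gradients may be evicted to host memory under pressure -> mapping
    // them into the GPU's page tables lets device accesses succeed
    // without page faults even when pages reside on the CPU.
    cudaMemAdvise(gradients, bytes, cudaMemAdviseSetAccessedBy, device);

    // ... launch forward/backward kernels on these buffers here ...

    cudaDeviceSynchronize();
    cudaFree(weights);
    cudaFree(activations);
    cudaFree(gradients);
    return 0;
}

The hints only steer the driver's page-migration policy; correctness is unchanged if they are omitted, which is why tuning of this kind can be compared directly against default Unified Memory, as the abstract's 9.4% figure does.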
Pages: 17
Related Papers
50 records in total
  • [1] On Efficient Training of Large-Scale Deep Learning Models
    Shen, Li
    Sun, Yan
    Yu, Zhiyuan
    Ding, Liang
    Tian, Xinmei
    Tao, Dacheng
    ACM COMPUTING SURVEYS, 2025, 57 (03)
  • [2] Enabling Efficient Large-Scale Deep Learning Training with Cache Coherent Disaggregated Memory Systems
    Wang, Zixuan
    Sim, Joonseop
    Lim, Euicheol
    Zhao, Jishen
    2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), 2022, : 126 - 140
  • [3] GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
    Guo, Cong
    Zhang, Rui
    Xu, Jiale
    Leng, Jingwen
    Liu, Zihan
    Huang, Ziyu
    Guo, Minyi
    Wu, Hao
    Zhao, Shouren
    Zhao, Junping
    Zhang, Ke
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 2, 2024, : 450 - 466
  • [4] Efficient Large-scale Deep Learning Framework for Heterogeneous Multi-GPU Cluster
    Kim, Youngrang
    Choi, Hyeonseong
    Lee, Jaehwan
    Kim, Jik-Soo
    Jei, Hyunseung
    Roh, Hongchan
    2019 IEEE 4TH INTERNATIONAL WORKSHOPS ON FOUNDATIONS AND APPLICATIONS OF SELF* SYSTEMS (FAS*W 2019), 2019, : 176 - 181
  • [5] Efficient MPI-AllReduce for large-scale deep learning on GPU-clusters
    Truong Thao Nguyen
    Wahib, Mohamed
    Takano, Ryousei
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (12)
  • [6] Training large-scale language models with limited GPU memory: a survey
    Tang, Yu
    Qiao, Linbo
    Yin, Lujia
    Liang, Peng
    Shen, Ao
    Yang, Zhilin
    Zhang, Lizhi
    Li, Dongsheng
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2025, 26 (03): 309 - 331
  • [7] Resource-efficient Federated Learning for Large-scale Model Training
    Song, Zilin
    Li, Zhengze
    Yuan, Tingting
    Fu, Xiaoming
    PROCEEDINGS OF THE WORKSHOP ON MOBILITY IN THE EVOLVING INTERNET ARCHITECTURE TO BE HELD IN CONJUNCTION WITH MOBICOM 2024, MOBIARCH 2024, 2024, : 43 - 48
  • [8] Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR
    Long, Yanhua
    Li, Yijie
    Wei, Shuang
    Zhang, Qiaozheng
    Yang, Chunxia
    IEEE ACCESS, 2019, 7 : 133615 - 133627
  • [9] Memory-Efficient Learning for Large-Scale Computational Imaging
    Kellman, Michael
    Zhang, Kevin
    Markley, Eric
    Tamir, Jon
    Bostan, Emrah
    Lustig, Michael
    Waller, Laura
    IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, 2020, 6 : 1403 - 1414