Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training

Cited by: 13
Authors
Choi, Hyeonseong [1]
Lee, Jaehwan [1]
Affiliations
[1] Korea Aerospace University, School of Electronics and Information Engineering, Goyang-si 10540, South Korea
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, No. 21
Funding
National Research Foundation of Singapore
Keywords
deep learning; large-scale model; CUDA Unified Memory; PyTorch;
DOI
10.3390/app112110377
Chinese Library Classification
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
To achieve high accuracy in deep learning, it is often necessary to train a large-scale model. However, owing to the limited capacity of GPU memory, it is difficult to train such large models on a single GPU. With CUDA 6, NVIDIA introduced CUDA Unified Memory, which overcomes this limitation by virtually combining GPU memory and CPU memory; CUDA 8 added memory advise options that allow applications to use Unified Memory more efficiently. In this work, we propose an optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise to each data type according to its access pattern during deep learning training. We integrate CUDA Unified Memory into PyTorch to evaluate the performance of large-scale models through the expanded GPU memory, and we conduct comprehensive experiments on how to apply memory advises effectively when performing deep learning. As a result, when the data used for deep learning are divided into three types and a memory advise is applied to each type according to its access pattern, the deep learning execution time is reduced by 9.4% compared with default Unified Memory.
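Although this record is bibliographic, the mechanism the abstract describes is concrete enough to sketch. The CUDA C++ snippet below is a minimal, hedged illustration (not the authors' actual PyTorch integration) of allocating buffers in Unified Memory via cudaMallocManaged and attaching a different cudaMemAdvise hint per buffer. The split into weights, inputs, and activations, and the particular advise chosen for each, are assumptions made for illustration, standing in for the paper's three access-pattern-based data types.

// memadvise_sketch.cu -- compile with nvcc. Illustrative only; the
// buffer roles and advise choices below are assumptions, not the
// paper's actual mapping of data types to advises.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int device = 0;
    cudaSetDevice(device);

    const size_t n = 1 << 24;  // 16M floats (~64 MB per buffer), arbitrary
    float *weights = nullptr, *inputs = nullptr, *activations = nullptr;

    // Unified Memory: one address space spanning CPU and GPU memory,
    // with pages migrated on demand.
    cudaMallocManaged(&weights,     n * sizeof(float));
    cudaMallocManaged(&inputs,      n * sizeof(float));
    cudaMallocManaged(&activations, n * sizeof(float));

    // Type 1 (assumed): weights are touched by the GPU every iteration,
    // so prefer keeping their pages resident on the device.
    cudaMemAdvise(weights, n * sizeof(float),
                  cudaMemAdviseSetPreferredLocation, device);

    // Type 2 (assumed): input batches are written once on the host and
    // then only read, so read-mostly duplication avoids migrations.
    cudaMemAdvise(inputs, n * sizeof(float),
                  cudaMemAdviseSetReadMostly, device);

    // Type 3 (assumed): activations are produced and consumed on the
    // GPU; declaring the GPU as an accessor keeps mappings established.
    cudaMemAdvise(activations, n * sizeof(float),
                  cudaMemAdviseSetAccessedBy, device);

    // Prefetching ahead of the first kernel avoids cold page faults.
    cudaMemPrefetchAsync(weights, n * sizeof(float), device, 0);

    cudaDeviceSynchronize();
    cudaFree(weights);
    cudaFree(inputs);
    cudaFree(activations);
    printf("advises applied: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}

In the paper's scheme, the reported 9.4% reduction comes from matching each data type's advise to its observed access pattern; the mapping above is only one plausible assignment of that idea.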
Pages: 17
Related Papers
50 items in total
• [31] Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training
    Zhao, Mark; Agarwal, Niket; Basant, Aarti; Gedik, Bugra; Pan, Satadru; Ozdal, Mustafa; Komuravelli, Rakesh; Pan, Jerry; Bao, Tianshu; Lu, Haowei; Narayanan, Sundaram; Langman, Jack; Wilfong, Kevin; Rastogi, Harsha; Wu, Carole-Jean; Kozyrakis, Christos; Pol, Parik
    PROCEEDINGS OF THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22), 2022: 1042-1057
• [32] Large-scale transport simulation by deep learning
    Pan, Jie
    NATURE COMPUTATIONAL SCIENCE, 2021, 1: 306
• [33] Tractable large-scale deep reinforcement learning
    Sarang, Nima; Poullis, Charalambos
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 232
• [34] Large-scale transport simulation by deep learning
    Pan, Jie
    NATURE COMPUTATIONAL SCIENCE, 2021, 1 (05): 306
• [35] The three pillars of large-scale deep learning
    Hoefler, Torsten
    2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2021: 908
• [36] Learning Deep Representation with Large-scale Attributes
    Ouyang, Wanli; Li, Hongyang; Zeng, Xingyu; Wang, Xiaogang
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 1895-1903
• [37] Large-scale Pollen Recognition with Deep Learning
    de Geus, Andre R.; Barcelos, Celia A. Z.; Batista, Marcos A.; da Silva, Sergio F.
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019
• [38] Deep Learning on Large-scale Muticore Clusters
    Sakiyama, Kazumasa; Kato, Shinpei; Ishikawa, Yutaka; Hori, Atsushi; Monrroy, Abraham
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018: 314-321
• [39] Predicting Statistics of Asynchronous SGD Parameters for a Large-Scale Distributed Deep Learning System on GPU Supercomputers
    Oyama, Yosuke; Nomura, Akihiro; Sato, Ikuro; Nishimura, Hiroki; Tamatsu, Yukimasa; Matsuoka, Satoshi
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 66-75
• [40] Efficient Machine Learning On Large-Scale Graphs
    Erickson, Parker; Lee, Victor E.; Shi, Feng; Tang, Jiliang
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 4788-4789