GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

Cited by: 1
Authors
Guo, Cong [1 ]
Zhang, Rui [2 ]
Xu, Jiale [1 ]
Leng, Jingwen [1 ]
Liu, Zihan [1 ]
Huang, Ziyu [1 ]
Guo, Minyi [1 ]
Wu, Hao [2 ]
Zhao, Shouren [2 ]
Zhao, Junping [2 ]
Zhang, Ke [2 ]
Institutions
[1] Shanghai Jiao Tong Univ, Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[2] Ant Grp, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 2 | 2024
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Memory Defragmentation; GPU; Deep Learning; Virtual Memory Stitching;
DOI
10.1145/3620665.3640423
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational power and resources, and the memory capacity of a single acceleration device such as a GPU is one of the most important bottlenecks. Owing to the prohibitively large overhead (e.g., 10x) of the GPU's native memory allocator, DNN frameworks like PyTorch and TensorFlow adopt a caching allocator that maintains a memory pool with a splitting mechanism for fast memory (de)allocation. Unfortunately, the caching allocator's efficiency degrades quickly under popular memory-reduction techniques such as recomputation, offloading, distributed training, and low-rank adaptation. The primary reason is that these techniques introduce frequent and irregular memory (de)allocation requests, causing severe fragmentation in the splitting-based caching allocator. To mitigate this fragmentation problem, we propose GPU memory lake (GMLake), a novel memory allocation framework built on low-level GPU virtual memory management. GMLake employs a novel virtual memory stitching (VMS) mechanism that fuses non-contiguous memory blocks through virtual memory address mapping. Across eight LLM models on an A100 GPU with 80 GB of memory, GMLake reduces GPU memory usage by an average of 9.2 GB (up to 25 GB) and fragmentation by 15% (up to 33%). GMLake is completely transparent to DNN models and memory-reduction techniques, ensuring the seamless execution of resource-intensive deep-learning tasks. We have open-sourced GMLake at https://github.com/intelligent-machinelearning/glake/tree/main/GMLake.
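The stitching idea in the abstract rests on CUDA's low-level virtual memory management (VMM) driver API, which decouples physical allocations from virtual addresses. The sketch below is a minimal illustration of that mechanism under stated assumptions, not GMLake's implementation: it maps two separate physical blocks back to back into one reserved virtual range, so a request larger than either block can still be served contiguously. Build with nvcc and link against the driver library (-lcuda); error handling is elided for brevity.

```cuda
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Physical allocations must be multiples of the allocation granularity.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    size_t chunk = 0;
    cuMemGetAllocationGranularity(&chunk, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);

    // Two separate physical blocks, standing in for cached free blocks
    // that are non-contiguous in physical memory.
    CUmemGenericAllocationHandle h0, h1;
    cuMemCreate(&h0, chunk, &prop, 0);
    cuMemCreate(&h1, chunk, &prop, 0);

    // Reserve one contiguous virtual address range covering both blocks.
    CUdeviceptr va;
    cuMemAddressReserve(&va, 2 * chunk, 0, 0, 0);

    // Stitch: map each physical block at consecutive virtual offsets.
    cuMemMap(va,         chunk, 0, h0, 0);
    cuMemMap(va + chunk, chunk, 0, h1, 0);

    // Enable read/write access over the whole stitched range.
    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(va, 2 * chunk, &access, 1);

    // The range [va, va + 2 * chunk) now behaves as one contiguous buffer.
    cuMemsetD8(va, 0, 2 * chunk);
    printf("stitched %zu bytes at 0x%llx\n", 2 * chunk,
           (unsigned long long)va);

    // Teardown: unmap, release physical handles, free the virtual range.
    cuMemUnmap(va, 2 * chunk);
    cuMemRelease(h0);
    cuMemRelease(h1);
    cuMemAddressFree(va, 2 * chunk);
    cuCtxDestroy(ctx);
    return 0;
}
```

Because only the virtual mapping changes, a caching allocator built this way can recombine fragmented free blocks without copying tensor data or involving the framework, which is what makes the approach transparent to the model code.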
Pages: 450-466
Page count: 17