GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

Cited by: 1
Authors
Guo, Cong [1 ]
Zhang, Rui [2 ]
Xu, Jiale [1 ]
Leng, Jingwen [1 ]
Liu, Zihan [1 ]
Huang, Ziyu [1 ]
Guo, Minyi [1 ]
Wu, Hao [2 ]
Zhao, Shouren [2 ]
Zhao, Junping [2 ]
Zhang, Ke [2 ]
Institutions
[1] Shanghai Jiao Tong Univ, Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[2] Ant Grp, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ASPLOS 2024, VOL 2 | 2024
Funding
National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Memory Defragmentation; GPU; Deep Learning; Virtual Memory Stitching;
DOI
10.1145/3620665.3640423
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational power and resources, and the memory capacity of a single acceleration device such as a GPU is one of the most important bottlenecks. Owing to the prohibitively large overhead (e.g., 10x) of the GPU's native memory allocator, DNN frameworks like PyTorch and TensorFlow adopt a caching allocator that maintains a memory pool with a splitting mechanism for fast memory (de)allocation. Unfortunately, the caching allocator's efficiency degrades quickly under popular memory-reduction techniques such as recomputation, offloading, distributed training, and low-rank adaptation. The primary reason is that these techniques introduce frequent and irregular memory (de)allocation requests, causing severe fragmentation in the splitting-based caching allocator. To mitigate this fragmentation problem, we propose GPU memory lake (GMLake), a novel memory allocation framework built on low-level GPU virtual memory management. GMLake employs a novel virtual memory stitching (VMS) mechanism that fuses non-contiguous memory blocks through virtual memory address mapping. Across eight LLM models on an A100 GPU with 80 GB of memory, GMLake reduces GPU memory usage by an average of 9.2 GB (up to 25 GB) and fragmentation by 15% (up to 33%). GMLake is completely transparent to DNN models and memory-reduction techniques, ensuring the seamless execution of resource-intensive deep-learning tasks. We have open-sourced GMLake at https://github.com/intelligent-machinelearning/glake/tree/main/GMLake.
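The stitching idea in the abstract rests on CUDA's low-level virtual memory management (VMM) driver API, which decouples physical allocations from virtual addresses. The sketch below is a minimal illustration of that mechanism under stated assumptions, not GMLake's implementation: it maps two separate physical blocks back to back into one reserved virtual range, so a request larger than either block can still be served contiguously. Build with nvcc and link against the driver library (-lcuda); error handling is elided for brevity.

```cuda
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Physical allocations must be multiples of the allocation granularity.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    size_t chunk = 0;
    cuMemGetAllocationGranularity(&chunk, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);

    // Two separate physical blocks, standing in for cached free blocks
    // that are non-contiguous in physical memory.
    CUmemGenericAllocationHandle h0, h1;
    cuMemCreate(&h0, chunk, &prop, 0);
    cuMemCreate(&h1, chunk, &prop, 0);

    // Reserve one contiguous virtual address range covering both blocks.
    CUdeviceptr va;
    cuMemAddressReserve(&va, 2 * chunk, 0, 0, 0);

    // Stitch: map each physical block at consecutive virtual offsets.
    cuMemMap(va,         chunk, 0, h0, 0);
    cuMemMap(va + chunk, chunk, 0, h1, 0);

    // Enable read/write access over the whole stitched range.
    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(va, 2 * chunk, &access, 1);

    // The range [va, va + 2 * chunk) now behaves as one contiguous buffer.
    cuMemsetD8(va, 0, 2 * chunk);
    printf("stitched %zu bytes at 0x%llx\n", 2 * chunk,
           (unsigned long long)va);

    // Teardown: unmap, release physical handles, free the virtual range.
    cuMemUnmap(va, 2 * chunk);
    cuMemRelease(h0);
    cuMemRelease(h1);
    cuMemAddressFree(va, 2 * chunk);
    cuCtxDestroy(ctx);
    return 0;
}
```

Because only the virtual mapping changes, a caching allocator built this way can recombine fragmented free blocks without copying tensor data or involving the framework, which is what makes the approach transparent to the model code.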
Pages: 450-466
Page count: 17