GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching

Cited by: 1
Authors
Guo, Cong [1 ]
Zhang, Rui [2 ]
Xu, Jiale [1 ]
Leng, Jingwen [1 ]
Liu, Zihan [1 ]
Huang, Ziyu [1 ]
Guo, Minyi [1 ]
Wu, Hao [2 ]
Zhao, Shouren [2 ]
Zhao, Junping [2 ]
Zhang, Ke [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[2] Ant Grp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Memory Defragmentation; GPU; Deep Learning; Virtual Memory Stitching;
DOI
10.1145/3620665.3640423
CLC (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale deep neural networks (DNNs), such as large language models (LLMs), have revolutionized the artificial intelligence (AI) field and become increasingly popular. However, training or fine-tuning such models requires substantial computational power and resources, and the memory capacity of a single acceleration device such as a GPU is one of the most important bottlenecks. Owing to the prohibitively large overhead (e.g., 10x) of the GPU's native memory allocator, DNN frameworks like PyTorch and TensorFlow adopt a caching allocator that maintains a memory pool with a splitting mechanism for fast memory (de)allocation. Unfortunately, the caching allocator's efficiency degrades quickly under popular memory-reduction techniques such as recomputation, offloading, distributed training, and low-rank adaptation. The primary reason is that these techniques introduce frequent and irregular memory (de)allocation requests, leading to severe fragmentation in the splitting-based caching allocator. To mitigate this fragmentation problem, we propose a novel memory allocation framework based on low-level GPU virtual memory management, called GPU memory lake (GMLake). GMLake employs a novel virtual memory stitching (VMS) mechanism, which fuses non-contiguous physical memory blocks into a contiguous region through virtual memory address mapping. GMLake reduces GPU memory usage by 9.2 GB on average (up to 25 GB) and fragmentation by 15% (up to 33%) across eight LLM workloads on an A100 GPU with 80 GB of memory. GMLake is completely transparent to the DNN models and memory-reduction techniques and ensures the seamless execution of resource-intensive deep-learning tasks. We have open-sourced GMLake at https://github.com/intelligent-machinelearning/glake/tree/main/GMLake.
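The stitching idea rests on the CUDA low-level virtual memory management (VMM) driver API, which lets an allocator reserve a contiguous virtual address range and map separately created physical memory chunks into it. The sketch below is a minimal illustration of that primitive sequence (cuMemAddressReserve, cuMemCreate, cuMemMap, cuMemSetAccess); it is not GMLake's allocator itself, and the single-granule chunk size and simplistic error handling are assumptions made for brevity.

// Minimal sketch: stitch two separately allocated physical chunks into one
// contiguous virtual address range with the CUDA driver VMM API. This only
// illustrates the primitives that VMS-style stitching builds on; it omits
// GMLake's pooling, splitting, and reuse logic.
#include <cuda.h>
#include <cstdio>

#define CHECK(call)                                                        \
  do {                                                                     \
    CUresult err = (call);                                                 \
    if (err != CUDA_SUCCESS) {                                             \
      fprintf(stderr, "CUDA driver error %d at line %d\n", err, __LINE__); \
      return 1;                                                            \
    }                                                                      \
  } while (0)

int main() {
  CHECK(cuInit(0));
  CUdevice dev;
  CUcontext ctx;
  CHECK(cuDeviceGet(&dev, 0));
  CHECK(cuCtxCreate(&ctx, 0, dev));

  // Physical allocations must respect the device's allocation granularity.
  CUmemAllocationProp prop = {};
  prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
  prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
  prop.location.id = dev;
  size_t gran = 0;
  CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                      CU_MEM_ALLOC_GRANULARITY_MINIMUM));
  size_t chunk = gran;  // one granule per physical chunk (assumption)

  // Two physically separate chunks, standing in for non-contiguous free blocks.
  CUmemGenericAllocationHandle h0, h1;
  CHECK(cuMemCreate(&h0, chunk, &prop, 0));
  CHECK(cuMemCreate(&h1, chunk, &prop, 0));

  // Reserve one contiguous virtual address range large enough for both chunks.
  CUdeviceptr va = 0;
  CHECK(cuMemAddressReserve(&va, 2 * chunk, gran, 0, 0));

  // Map the two physical chunks back-to-back into the reserved range.
  CHECK(cuMemMap(va,         chunk, 0, h0, 0));
  CHECK(cuMemMap(va + chunk, chunk, 0, h1, 0));

  // Enable read/write access so kernels can use [va, va + 2*chunk) as one buffer.
  CUmemAccessDesc access = {};
  access.location = prop.location;
  access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
  CHECK(cuMemSetAccess(va, 2 * chunk, &access, 1));

  // ... the stitched range can now back a single large tensor ...

  // Teardown: unmap, release physical handles, free the virtual range.
  CHECK(cuMemUnmap(va, 2 * chunk));
  CHECK(cuMemRelease(h0));
  CHECK(cuMemRelease(h1));
  CHECK(cuMemAddressFree(va, 2 * chunk));
  CHECK(cuCtxDestroy(ctx));
  return 0;
}

Because the stitched region is contiguous only in virtual address space, the physical chunks can come from anywhere in the pool, which is what allows a splitting-based cache to satisfy a large request without finding a physically contiguous free block.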
Pages: 450-466
Page count: 17