Grape: Practical and Efficient Graph-based Executions for Dynamic Deep Neural Networks on GPUs

Cited by: 0
Authors
Zheng, Bojian [1]
Yu, Cody Hao [2]
Wang, Jie [2]
Ding, Yaoyao [1]
Liu, Yizhi [2]
Wang, Yida [2]
Pekhimenko, Gennady [1,3]
Affiliations
[1] Univ Toronto, CentML, Toronto, ON, Canada
[2] Amazon, Santa Clara, CA, USA
[3] Vector Inst, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Canada Foundation for Innovation;
Keywords
machine learning compilers; CUDA graphs; dynamic neural networks;
DOI
10.1145/3613424.3614248
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Achieving high performance in machine learning workloads is a crucial yet difficult task. To achieve high runtime performance on hardware platforms such as GPUs, graph-based executions such as CUDA graphs are often used to eliminate CPU runtime overheads by submitting jobs in the granularity of multiple kernels. However, many machine learning workloads, especially dynamic deep neural networks (DNNs) with varying-sized inputs or data-dependent control flows, face challenges when directly using CUDA graphs to achieve optimal performance. We observe that the use of graph-based executions poses three key challenges in terms of efficiency and even practicability: (1) Extra data movements when copying input values to graphs' placeholders. (2) High GPU memory consumption due to the numerous CUDA graphs created to efficiently support dynamic-shape workloads. (3) Inability to handle data-dependent control flows. To address those challenges, we propose Grape, a new graph compiler that enables practical and efficient graph-based executions for dynamic DNNs on GPUs. Grape comprises three key components: (1) an alias predictor that automatically removes extra data movements by leveraging code positions at the Python frontend, (2) a metadata compressor that efficiently utilizes the data redundancy in CUDA graphs' memory regions by compressing them, and (3) a predication rewriter that safely replaces control flows with predication contexts while preserving programs' semantics. The three components improve the efficiency and broaden the optimization scope of graph-based executions while allowing machine learning practitioners to program dynamic DNNs at the Python level with minimal source code changes. We evaluate Grape on state-of-the-art text generation (GPT-2, GPT-J) and speech recognition (Wav2Vec2) workloads, which include both training and inference, using real systems with modern GPUs.
Our evaluation shows that Grape achieves up to 36.43x less GPU memory consumption and up to 1.26x better performance than prior works on graph-based executions that directly use CUDA graphs. Furthermore, Grape can optimize workloads that are impractical for prior works due to the three key challenges, achieving 1.78x and 1.82x better performance on GPT-J and Wav2Vec2 respectively than the original implementations that do not use graph-based executions.
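As a rough illustration of the third challenge, the sketch below contrasts a data-dependent branch with a predicated form that always runs the same sequence of operations, which is the property a pre-recorded graph needs. It is a pure-Python stand-in under stated assumptions, not Grape's predication rewriter: the function names `branchy_step` and `predicated_step` are hypothetical, and the arithmetic select is exact only for the integer inputs used here.

```python
# Illustrative only: a pure-Python stand-in for predicated execution.
# `branchy_step` and `predicated_step` are hypothetical names, not Grape APIs.


def branchy_step(x, done):
    """Data-dependent control flow: the work performed depends on `done`."""
    if done:
        return x
    return x * 2 + 1


def predicated_step(x, done):
    """Same result, but with a fixed operation sequence and no branch."""
    candidate = x * 2 + 1        # always computed, whatever `done` is
    keep = int(bool(done))       # predicate encoded as 0 or 1
    return keep * x + (1 - keep) * candidate  # select by arithmetic


# Both forms agree on every integer input tried here.
for x in range(-3, 4):
    for done in (False, True):
        assert branchy_step(x, done) == predicated_step(x, done)
```

The predicated form does redundant work when `done` is true; the trade is extra computation in exchange for a static operation sequence that can be recorded once and replayed.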
Pages: 1364-1380
Page count: 17
Related papers
50 in total
  • [41] Graph-Based Audio Classification Using Pre-Trained Models and Graph Neural Networks
    Castro-Ospina, Andres Eduardo
    Solarte-Sanchez, Miguel Angel
    Vega-Escobar, Laura Stella
    Isaza, Claudia
    Martinez-Vargas, Juan David
    SENSORS, 2024, 24 (07)
  • [42] Efficient Graph-Based Image Segmentation
    Pedro F. Felzenszwalb
    Daniel P. Huttenlocher
    International Journal of Computer Vision, 2004, 59 : 167 - 181
  • [43] Performance of Training Sparse Deep Neural Networks on GPUs
    Wang, Jianzong
    Huang, Zhangcheng
    Kong, Lingwei
    Xiao, Jing
    Wang, Pengyu
    Zhang, Lu
    Li, Chao
    2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019,
  • [44] Detailed Characterization of Deep Neural Networks on GPUs and FPGAs
    Karki, Aajna
    Keshava, Chethan Palangotu
    Shivakumar, Spoorthi Mysore
    Skow, Joshua
    Hegde, Goutam Madhukeshwar
    Jeon, Hyeran
    12TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPUS (GPGPU 12), 2019, : 12 - 21
  • [45] Efficient graph-based image segmentation
    Felzenszwalb, PF
    Huttenlocher, DP
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2004, 59 (02) : 167 - 181
  • [46] AN EFFICIENT GRAPH-BASED VISUAL RERANKING
    Huang, Chong
    Dong, Yuan
    Bai, Hongliang
    Wang, Lezi
    Zhao, Nan
    Cen, Shusheng
    Zhao, Jian
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 1671 - 1675
  • [47] Dynamic graph-based software fingerprinting
    Collberg, Christian S.
    Thomborson, Clark
    Townsend, Gregg M.
    ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS, 2007, 29 (06):
  • [48] Dynamic Graph-Based Malware Classifier
    Jazi, Hossein Hadian
    Ghorbani, Ali A.
    2016 14TH ANNUAL CONFERENCE ON PRIVACY, SECURITY AND TRUST (PST), 2016,
  • [49] HatchEnsemble: an efficient and practical uncertainty quantification method for deep neural networks
    Yufeng Xia
    Jun Zhang
    Tingsong Jiang
    Zhiqiang Gong
    Wen Yao
    Ling Feng
    Complex & Intelligent Systems, 2021, 7 : 2855 - 2869
  • [50] Graph-Based Global Reasoning Networks
    Chen, Yunpeng
    Rohrbach, Marcus
    Yan, Zhicheng
    Yan, Shuicheng
    Feng, Jiashi
    Kalantidis, Yannis
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 433 - 442