Grape: Practical and Efficient Graph-based Executions for Dynamic Deep Neural Networks on GPUs

Cited by: 0
Authors
Zheng, Bojian [1]
Yu, Cody Hao [2]
Wang, Jie [2]
Ding, Yaoyao [1]
Liu, Yizhi [2]
Wang, Yida [2]
Pekhimenko, Gennady [1, 3]
Affiliations
[1] Univ Toronto, CentML, Toronto, ON, Canada
[2] Amazon, Santa Clara, CA, USA
[3] Vector Inst, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC); Canada Foundation for Innovation (CFI)
Keywords
machine learning compilers; CUDA graphs; dynamic neural networks
DOI
10.1145/3613424.3614248
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Achieving high performance in machine learning workloads is a crucial yet difficult task. To achieve high runtime performance on hardware platforms such as GPUs, graph-based executions such as CUDA graphs are often used to eliminate CPU runtime overheads by submitting jobs at the granularity of multiple kernels. However, many machine learning workloads, especially dynamic deep neural networks (DNNs) with varying-sized inputs or data-dependent control flows, face challenges when directly using CUDA graphs to achieve optimal performance. We observe that the use of graph-based executions poses three key challenges in terms of efficiency and even practicability: (1) Extra data movements when copying input values to graphs' placeholders. (2) High GPU memory consumption due to the numerous CUDA graphs created to efficiently support dynamic-shape workloads. (3) Inability to handle data-dependent control flows. To address those challenges, we propose Grape, a new graph compiler that enables practical and efficient graph-based executions for dynamic DNNs on GPUs. Grape comprises three key components: (1) an alias predictor that automatically removes extra data movements by leveraging code positions at the Python frontend, (2) a metadata compressor that efficiently utilizes the data redundancy in CUDA graphs' memory regions by compressing them, and (3) a predication rewriter that safely replaces control flows with predication contexts while preserving programs' semantics. The three components improve the efficiency and broaden the optimization scope of graph-based executions while allowing machine learning practitioners to program dynamic DNNs at the Python level with minimal source code changes. We evaluate Grape on state-of-the-art text generation (GPT-2, GPT-J) and speech recognition (Wav2Vec2) workloads, which include both training and inference, using real systems with modern GPUs. Our evaluation shows that Grape achieves up to 36.43x less GPU memory consumption and up to 1.26x better performance than prior works on graph-based executions that directly use CUDA graphs. Furthermore, Grape can optimize workloads that are impractical for prior works due to the three key challenges, achieving 1.78x and 1.82x better performance on GPT-J and Wav2Vec2 respectively than the original implementations that do not use graph-based executions.
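The third challenge in the abstract, data-dependent control flow, can be illustrated with a small sketch of the general idea behind predication: the branch is replaced by a fixed sequence of operations whose result is selected by a predicate value, so the operation sequence no longer depends on the data. This is a generic pure-Python illustration and assumes nothing about Grape's actual predication rewriter; the function names `branchy` and `predicated` are hypothetical.

```python
# Illustrative only: generic predication, NOT Grape's actual rewriter.
# The function names below are hypothetical.

def branchy(xs, threshold):
    """Data-dependent control flow: the path taken depends on the values in xs."""
    if sum(xs) > threshold:
        return [2.0 * x for x in xs]
    return [x + 1.0 for x in xs]


def predicated(xs, threshold):
    """Same result, computed with a fixed sequence of operations.

    Both candidate results are always computed; an integer predicate
    (0 or 1) then selects which one is kept for each element.
    """
    pred = int(sum(xs) > threshold)        # 1 if the condition holds, else 0
    taken = [2.0 * x for x in xs]          # result of the "then" path
    not_taken = [x + 1.0 for x in xs]      # result of the "else" path
    # Index into the (else, then) pair with the predicate instead of branching.
    return [(b, a)[pred] for a, b in zip(taken, not_taken)]
```

In this sketch the predicated version does redundant work by evaluating both paths; the trade-off it illustrates is a data-independent operation sequence, which is the property a pre-recorded graph of kernels needs.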
Pages: 1364-1380
Page count: 17