Coordinated Batching and DVFS for DNN Inference on GPU Accelerators

Cited by: 35
Authors
Nabavinejad, Seyed Morteza [1 ]
Reda, Sherief [2 ]
Ebrahimi, Masoumeh [3 ]
Affiliations
[1] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran 1953833511, Iran
[2] Brown Univ, Sch Engn, Providence, RI 02912 USA
[3] KTH Royal Inst Technol, S-11428 Stockholm, Sweden
Funding
Swedish Research Council
Keywords
Throughput; Graphics processing units; Power demand; Runtime; Bayes methods; Resource management; Optimization; Deep neural networks; GPU accelerator; power consumption; throughput; batch size; dynamic voltage frequency scaling;
DOI
10.1109/TPDS.2022.3144614
Chinese Library Classification
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Employing hardware accelerators to improve the performance and energy efficiency of DNN applications is on the rise. One challenge of using hardware accelerators, including GPU-based ones, is that their performance is limited by internal and external factors, such as power caps. A common approach to meeting a power cap is Dynamic Voltage and Frequency Scaling (DVFS); however, the functionality of this technique is limited and platform-dependent. To tackle this challenge, we propose a new control knob: the size of the input batches fed to the GPU accelerator in DNN inference applications. We first evaluate the impact of batch size on the power consumption and performance of DNN inference. Then, we introduce the design and implementation of a fast and lightweight runtime system, called BatchDVFS. Dynamic batching is implemented in BatchDVFS to adaptively change the batch size and thereby trade off throughput against power consumption; it employs a binary-search-based approach to find a suitable batch size within a short period of time. By combining dynamic batching with DVFS, BatchDVFS can control power consumption over a wider range and hence yield higher throughput in the presence of power caps. For long-running jobs that can afford a profiling overhead significantly larger than that of BatchDVFS, we also design an approach, called BOBD, that employs Bayesian optimization to efficiently explore the vast state space formed by the combination of batch-size and DVFS settings. Through experiments with a modern GPU and several DNN models and input datasets, we show that BatchDVFS significantly surpasses techniques based solely on DVFS or batching in terms of throughput (by up to 11.2x and 2.2x, respectively), while successfully meeting the power cap.
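As a rough illustration of the dynamic-batching idea described in the abstract, the Python sketch below performs a binary search for the largest batch size whose measured power stays under the cap. The run_batch callback and the NVML-based power reading are hypothetical stand-ins for the paper's runtime, not the authors' implementation.

# Minimal sketch: binary search for the largest batch size that keeps
# measured GPU power under the cap. run_batch is a hypothetical callback
# that executes one inference batch; power is read via NVML (pynvml).
import pynvml

pynvml.nvmlInit()
_handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes GPU 0

def measure_power_watts(batch_size, run_batch):
    # Execute one batch, then sample board power; a real runtime would
    # average several samples taken while the batch is in flight.
    run_batch(batch_size)
    return pynvml.nvmlDeviceGetPowerUsage(_handle) / 1000.0  # mW -> W

def find_batch_size(power_cap_w, run_batch, lo=1, hi=1024):
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_power_watts(mid, run_batch) <= power_cap_w:
            best, lo = mid, mid + 1   # under the cap: try larger batches
        else:
            hi = mid - 1              # over the cap: shrink the batch
    return best

Because power consumption grows roughly monotonically with batch size, the search converges after only a logarithmic number of probe batches, which is consistent with the abstract's claim that a suitable batch size is found within a short period of time.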
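The BOBD variant can likewise be sketched with an off-the-shelf Bayesian optimization library. The fragment below uses scikit-optimize's gp_minimize purely for illustration; the synthetic profile model, the penalty constant, and the frequency list are invented for the sketch, and the paper does not prescribe this library.

# Illustrative joint search over (batch size, DVFS core clock) with Bayesian
# optimization; gp_minimize stands in for the paper's BO engine.
from skopt import gp_minimize
from skopt.space import Integer, Categorical

POWER_CAP_W = 200.0
FREQS_MHZ = [705, 810, 915, 1020, 1125]  # hypothetical supported core clocks

def profile(batch_size, freq_mhz):
    # Stand-in for real profiling: a deployment would set the clock (e.g.,
    # via nvidia-smi application clocks), run inference, and measure. A
    # synthetic model is used here so the sketch runs without a GPU.
    throughput = batch_size * freq_mhz / (batch_size + 64.0)  # saturating curve
    power = 60.0 + 0.09 * freq_mhz + 0.05 * batch_size        # made-up numbers
    return throughput, power

def objective(x):
    batch_size, freq_mhz = x
    throughput, power = profile(batch_size, freq_mhz)
    penalty = 1e6 if power > POWER_CAP_W else 0.0  # reject cap violations
    return -throughput + penalty                   # gp_minimize minimizes

result = gp_minimize(
    objective,
    dimensions=[Integer(1, 1024), Categorical(FREQS_MHZ)],
    n_calls=40,        # profiling budget; BOBD targets long-running jobs
    random_state=0,
)
best_batch, best_freq = result.x

The 40-call budget reflects the abstract's framing: BOBD accepts a profiling overhead that is significant relative to BatchDVFS in exchange for a near-optimal operating point.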
Pages: 2496-2508 (13 pages)