Coordinated Batching and DVFS for DNN Inference on GPU Accelerators

Cited by: 35
Authors
Nabavinejad, Seyed Morteza [1 ]
Reda, Sherief [2 ]
Ebrahimi, Masoumeh [3 ]
Affiliations
[1] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran 1953833511, Iran
[2] Brown Univ, Sch Engn, Providence, RI 02912 USA
[3] KTH Royal Inst Technol, S-11428 Stockholm, Sweden
Funding
Swedish Research Council
Keywords
Throughput; Graphics processing units; Power demand; Runtime; Bayes methods; Resource management; Optimization; Deep neural networks; GPU accelerator; power consumption; throughput; batch size; dynamic voltage frequency scaling;
DOI
10.1109/TPDS.2022.3144614
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Subject Classification Code
081202;
Abstract
Employing hardware accelerators to improve the performance and energy efficiency of DNN applications is on the rise. One challenge of using hardware accelerators, including GPU-based ones, is that their performance is limited by internal and external factors, such as power caps. A common approach to meeting a power cap is the Dynamic Voltage Frequency Scaling (DVFS) technique. However, the functionality of this technique is limited and platform-dependent. To tackle this challenge, we propose a new control knob: the size of the input batches fed to the GPU accelerator in DNN inference applications. We first evaluate the impact of batch size on the power consumption and performance of DNN inference. We then present the design and implementation of a fast and lightweight runtime system, called BatchDVFS. BatchDVFS implements dynamic batching to adaptively change the batch size and thereby trade off throughput against power consumption; it uses a binary-search-based approach to find a suitable batch size within a short period of time. By combining dynamic batching with DVFS, BatchDVFS can control power consumption over a wider range, and hence yield higher throughput in the presence of power caps. For long-running jobs that can afford a profiling overhead considerably larger than that of BatchDVFS, we also design an approach, called BOBD, that employs Bayesian Optimization to wisely explore the vast state space resulting from the combination of batch size and DVFS settings in search of near-optimal solutions. Through experiments with a modern GPU and several DNN models and input datasets, we show that BatchDVFS significantly surpasses techniques based solely on DVFS or batching in terms of throughput (by up to 11.2x and 2.2x, respectively), while successfully meeting the power cap.
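The dynamic-batching idea in the abstract lends itself to a short illustration. Below is a minimal Python sketch of a binary search over batch size under a power cap, in the spirit of the BatchDVFS runtime; the callables `run_inference` and `measure_power_watts` are hypothetical placeholders, not the authors' implementation, and the sketch assumes GPU power draw grows monotonically with batch size, which is what makes a binary search applicable.

```python
def find_max_batch(power_cap_w, min_batch, max_batch,
                   run_inference, measure_power_watts):
    """Binary search for the largest batch size whose measured GPU power
    stays under the cap (assumes power is monotonic in batch size)."""
    best = min_batch
    lo, hi = min_batch, max_batch
    while lo <= hi:
        mid = (lo + hi) // 2
        run_inference(batch_size=mid)             # short profiling run
        if measure_power_watts() <= power_cap_w:  # e.g., a GPU power reading
            best, lo = mid, mid + 1               # feasible: try a larger batch
        else:
            hi = mid - 1                          # over the cap: shrink batch
    return best
```

Likewise, the joint batch-size/DVFS search that BOBD performs can be sketched with an off-the-shelf Bayesian optimizer. The version below uses scikit-optimize's `gp_minimize` purely for illustration; the frequency levels, power cap, `set_gpu_clock`, and `profile` are assumptions standing in for platform-specific DVFS and profiling code, not the paper's actual setup.

```python
from skopt import gp_minimize
from skopt.space import Categorical, Integer

POWER_CAP_W = 200                  # illustrative power cap
FREQ_LEVELS = [810, 1110, 1380]    # hypothetical GPU core clocks (MHz)

def set_gpu_clock(freq_mhz):
    """Stub: a real runtime would apply this DVFS setting to the GPU."""
    pass

def profile(batch_size):
    """Stub returning (throughput, power); a real runtime would run a
    short inference burst and read the GPU power sensor."""
    return batch_size * 10.0, 0.3 * batch_size + 50.0  # toy cost model

def objective(x):
    batch, freq = x
    set_gpu_clock(freq)
    throughput, power = profile(batch)
    if power > POWER_CAP_W:
        return 1e9                 # heavily penalize cap violations
    return -throughput             # gp_minimize minimizes, so negate

result = gp_minimize(objective,
                     [Integer(1, 512), Categorical(FREQ_LEVELS)],
                     n_calls=30, random_state=0)
best_batch, best_freq = result.x
```

In this framing, the optimizer's repeated profiling runs correspond to the profiling overhead that the abstract says only long-running jobs can afford.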
Pages: 2496-2508
Page count: 13