HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions

Cited by: 0
Authors
Chen, Jiabin [1 ]
Xu, Fei [1 ]
Gu, Yikun [1 ]
Chen, Li [2 ]
Liu, Fangming [3 ]
Zhou, Zhi [4 ]
Affiliations
[1] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, Shanghai, Peoples R China
[2] Univ Louisiana Lafayette, Lafayette, LA 70504 USA
[3] Peng Cheng Lab, Shenzhen, Peoples R China
[4] Sun Yat Sen Univ, Guangzhou, Peoples R China
Keywords
serverless computing; resource provisioning; DNN inference; SLO guarantee;
DOI
10.1109/IWQoS61813.2024.10682915
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to its potential for substantial budget savings. Existing work on serverless DNN inference optimizes only the batching of requests from a single application with a single Service Level Objective (SLO) on CPU functions. However, production serverless DNN inference traces indicate that the request arrival rate of applications is surprisingly low, which inevitably causes long batching times and SLO violations. Hence, there is an urgent need for batching multiple DNN inference requests with diverse SLOs (i.e., multi-SLO DNN inference) on serverless platforms. Moreover, the potential performance and cost benefits of deploying heterogeneous (i.e., CPU and GPU) functions for DNN inference have received scant attention. In this paper, we present HarmonyBatch, a cost-efficient resource provisioning framework designed to achieve predictable performance for multi-SLO DNN inference with heterogeneous serverless functions. Specifically, we construct an analytical performance and cost model of DNN inference on both CPU and GPU functions by explicitly considering the GPU time-slicing scheduling mechanism and the request arrival rate distribution. Based on this model, we devise a two-stage merging strategy in HarmonyBatch that judiciously batches multi-SLO DNN inference requests into application groups, minimizing the budget of function provisioning for each group while guaranteeing the diverse performance SLOs of inference applications. We have implemented a prototype of HarmonyBatch on Alibaba Cloud Function Compute. Extensive prototype experiments with representative DNN inference workloads demonstrate that HarmonyBatch provides predictable performance to serverless DNN inference workloads while reducing monetary cost by up to 82.9% compared with state-of-the-art methods.
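The two-stage merging strategy described in the abstract can be made concrete with a toy sketch. Everything below is a hypothetical illustration, not HarmonyBatch's actual model: the batching-delay term ((batch - 1) / rate, the expected wait for further Poisson arrivals), the flat per-GB-second price, and the names `App`, `best_config`, and `two_stage_merge` are all assumptions of this sketch; the real framework additionally models GPU time-slicing and chooses between heterogeneous CPU and GPU functions.

```python
# Toy sketch of SLO-aware batching and two-stage merging.
# Hypothetical cost/latency model; not HarmonyBatch's actual formulation.
from dataclasses import dataclass
from typing import List, Optional, Tuple

PRICE_PER_GB_S = 0.00009  # hypothetical per-GB-second function price ($)

@dataclass
class App:
    name: str
    rate: float  # mean request arrival rate (req/s), assumed Poisson
    slo: float   # end-to-end latency SLO (s)

def expected_latency(batch: int, rate: float, infer: float) -> float:
    """Latency seen by the first request in a batch: it waits for
    (batch - 1) further Poisson arrivals, then for one inference."""
    return (batch - 1) / rate + infer

def cost_per_request(batch: int, infer: float, mem_gb: float) -> float:
    """Per-request cost when `batch` requests share one invocation."""
    return mem_gb * infer * PRICE_PER_GB_S / batch

def best_config(rate: float, slo: float, infer: float,
                mem_gb: float) -> Optional[Tuple[int, float]]:
    """Largest batch size (up to 64) that still meets the SLO."""
    best = None
    for b in range(1, 65):
        if expected_latency(b, rate, infer) <= slo:
            best = (b, cost_per_request(b, infer, mem_gb))
    return best  # None if even batch size 1 violates the SLO

def group_cost(group: List[App], infer: float, mem_gb: float) -> float:
    """Expected $/s of a group batched together under its strictest SLO."""
    rate = sum(a.rate for a in group)
    slo = min(a.slo for a in group)  # merged group obeys the tightest SLO
    cfg = best_config(rate, slo, infer, mem_gb)
    return float("inf") if cfg is None else rate * cfg[1]

def two_stage_merge(apps: List[App], infer: float,
                    mem_gb: float) -> List[List[App]]:
    # Stage 1: order applications by SLO so only adjacent merges matter.
    groups = [[a] for a in sorted(apps, key=lambda a: a.slo)]
    # Stage 2: greedily merge adjacent groups whenever sharing one batch
    # (higher aggregate rate, but the strictest SLO) lowers total cost.
    merged = True
    while merged:
        merged = False
        for i in range(len(groups) - 1):
            separate = (group_cost(groups[i], infer, mem_gb)
                        + group_cost(groups[i + 1], infer, mem_gb))
            joint = group_cost(groups[i] + groups[i + 1], infer, mem_gb)
            if joint < separate:
                groups[i:i + 2] = [groups[i] + groups[i + 1]]
                merged = True
                break
    return groups

if __name__ == "__main__":
    apps = [App("a", 2.0, 0.5), App("b", 5.0, 1.0), App("c", 1.0, 3.0)]
    for g in two_stage_merge(apps, infer=0.1, mem_gb=2.0):
        print([a.name for a in g], f"cost/s = {group_cost(g, 0.1, 2.0):.2e}")
```

Running this sketch with the three example applications merges them into a single group: a higher aggregate arrival rate shortens batching delay, which permits larger (and per-request cheaper) batches even though the merged group must honor the strictest member SLO. This is the core trade-off the abstract's merging strategy navigates.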
Pages: 10