Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs

Cited by: 0

Authors:

Jeon, Jaebeom [1]

Koo, Gunjae [2]

Yoon, Myung Kuk [3]

Oh, Yunho [1]

Affiliations:

[1] Korea Univ, Sch Elect Engn, Seoul 02857, South Korea

[2] Korea Univ, Dept Comp Sci & Engn, Seoul 02857, South Korea

[3] Ewha Womans Univ, Dept Comp Sci & Engn, Seoul 03760, South Korea

Source:

IEEE EMBEDDED SYSTEMS LETTERS | 2024 / Vol. 16 / Issue 04

Funding:

National Research Foundation of Singapore;

Keywords:

Kernel; Throughput; Graphics processing units; Delays; Task analysis; Software; Switches; Embedded graphics processing unit (GPU); inference; multitenancy;

DOI:

10.1109/LES.2024.3351753

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

This letter proposes a new scheme that improves throughput and reduces queuing delay while running multiple inferences in embedded graphics processing unit (GPU)-based systems. We observe that an embedded system runs inference with a fixed number of deep learning models and that inference requests often use the same model. Unlike prior work that proposed kernel fusion or scheduling techniques, this letter proposes a new software technique that merges and fuses kernels by monitoring the requests in a queue. The proposed technique first monitors a fixed number of requests and groups the requests running the same model. Then, it creates the kernels that iteratively process the grouped requests. We call such a technique kernel merging. After that, the proposed technique performs kernel fusion with merged kernels. Eventually, our idea minimizes the number of concurrent kernels, thus mitigating stalls caused by frequent context switching in a GPU. In our evaluation, the proposed kernel merge and fusion achieve $2.7\times$ better throughput, 47% shorter average kernel execution time, and 63% shorter tail latency than prior work.
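The grouping step described in the abstract can be illustrated with a minimal host-side sketch. This is not the authors' implementation: the request representation, the function names `merge_requests` and `run_merged`, the window size, and the model names are all hypothetical, and the later kernel-fusion step is left out. It only shows how a fixed window of queued requests might be grouped by model so that each group is handled by one iterating "merged" unit of work rather than one launch per request.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical request representation: (request_id, model_name).
Request = Tuple[int, str]


def merge_requests(queue: List[Request], window: int) -> Dict[str, List[int]]:
    """Group the first `window` queued requests by the model they run."""
    groups: Dict[str, List[int]] = {}
    for request_id, model in queue[:window]:
        groups.setdefault(model, []).append(request_id)
    return groups


def run_merged(groups: Dict[str, List[int]],
               models: Dict[str, Callable[[int], str]]) -> Dict[int, str]:
    """Run one merged unit per model, iterating over its grouped requests."""
    results: Dict[int, str] = {}
    for model, request_ids in groups.items():
        infer = models[model]
        # This loop stands in for the iteration inside a merged kernel.
        for request_id in request_ids:
            results[request_id] = infer(request_id)
    return results


# Assumed example workload: five requests over three placeholder models.
queue = [(0, "model_a"), (1, "model_b"), (2, "model_a"),
         (3, "model_c"), (4, "model_b")]
models = {name: (lambda rid, n=name: f"{n}:{rid}")
          for name in ("model_a", "model_b", "model_c")}

groups = merge_requests(queue, window=4)
results = run_merged(groups, models)
```

With a window of four, the four monitored requests collapse into three groups, so three merged units run instead of four separate ones. The fifth request stays in the queue for the next window.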


Pages: 421-424

Number of pages: 4

