Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs

Authors
Jeon, Jaebeom [1]
Koo, Gunjae [2]
Yoon, Myung Kuk [3]
Oh, Yunho [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul 02857, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul 02857, South Korea
[3] Ewha Womans Univ, Dept Comp Sci & Engn, Seoul 03760, South Korea
Funding
National Research Foundation, Singapore
Keywords
Kernel; Throughput; Graphics processing units; Delays; Task analysis; Software; Switches; Embedded graphics processing unit (GPU); inference; multitenancy
DOI
10.1109/LES.2024.3351753
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This letter proposes a new scheme that improves throughput and reduces queuing delay while running multiple inferences in embedded graphics processing unit (GPU)-based systems. We observe that an embedded system runs inference with a fixed number of deep learning models and that inference requests often use the same model. Unlike prior work that proposed kernel fusion or scheduling techniques, this letter proposes a new software technique that merges and fuses kernels by monitoring the requests in a queue. The proposed technique first monitors a fixed number of requests and groups the requests running the same model. Then, it creates the kernels that iteratively process the grouped requests. We call such a technique kernel merging. After that, the proposed technique performs kernel fusion with merged kernels. Eventually, our idea minimizes the number of concurrent kernels, thus mitigating stalls caused by frequent context switching in a GPU. In our evaluation, the proposed kernel merge and fusion achieve $2.7\times$ better throughput, 47% shorter average kernel execution time, and 63% shorter tail latency than prior work.
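The flow described in the abstract (monitor a fixed number of queued requests, group those that use the same model, build one merged kernel per group, then fuse the merged kernels) can be illustrated with a small host-side sketch. This is not the authors' implementation: models are stood in for by plain Python callables, and the names `group_by_model`, `merge_kernel`, `fuse`, and `plan_launch`, as well as the `(model, input)` request shape, are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict, deque
from typing import Callable, Deque, Dict, List, Tuple

# Hypothetical request shape: (model name, input value).
Request = Tuple[str, float]
Kernel = Callable[[float], float]


def group_by_model(queue: Deque[Request], window: int) -> "OrderedDict[str, List[float]]":
    """Pop up to `window` requests and group their inputs by model name."""
    groups: "OrderedDict[str, List[float]]" = OrderedDict()
    for _ in range(min(window, len(queue))):
        model, x = queue.popleft()
        groups.setdefault(model, []).append(x)
    return groups


def merge_kernel(kernel: Kernel, inputs: List[float]) -> Callable[[], List[float]]:
    """Kernel merging: one callable that iterates over all grouped inputs."""
    def merged() -> List[float]:
        return [kernel(x) for x in inputs]
    return merged


def fuse(merged: Dict[str, Callable[[], List[float]]]) -> Callable[[], Dict[str, List[float]]]:
    """Stand-in for kernel fusion: one callable that runs every merged kernel."""
    def fused() -> Dict[str, List[float]]:
        return {model: run() for model, run in merged.items()}
    return fused


def plan_launch(queue: Deque[Request], window: int,
                kernels: Dict[str, Kernel]) -> Callable[[], Dict[str, List[float]]]:
    """Monitor `window` requests, merge per model, then fuse into a single launch."""
    groups = group_by_model(queue, window)
    merged = {m: merge_kernel(kernels[m], xs) for m, xs in groups.items()}
    return fuse(merged)
```

In this sketch, a window of four requests that touch two models yields a single fused callable instead of four separate launches, which mirrors the stated goal of minimizing the number of concurrent kernels. On a real embedded GPU the merged and fused kernels would be device code rather than Python closures.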
Pages: 421-424
Number of pages: 4