Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs

Authors
Jeon, Jaebeom [1]
Koo, Gunjae [2]
Yoon, Myung Kuk [3]
Oh, Yunho [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul 02857, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul 02857, South Korea
[3] Ewha Womans Univ, Dept Comp Sci & Engn, Seoul 03760, South Korea
Funding
National Research Foundation of Singapore
Keywords
Kernel; Throughput; Graphics processing units; Delays; Task analysis; Software; Switches; Embedded graphics processing unit (GPU); inference; multitenancy;
DOI
10.1109/LES.2024.3351753
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This letter proposes a new scheme that improves throughput and reduces queuing delay when multiple inferences run concurrently on embedded graphics processing unit (GPU)-based systems. We observe that an embedded system runs inference with a fixed number of deep learning models and that inference requests often use the same model. Unlike prior work that proposed kernel fusion or scheduling techniques, this letter proposes a software technique that merges and fuses kernels by monitoring the requests in a queue. The technique first monitors a fixed number of requests and groups those that run the same model. It then creates kernels that iteratively process the grouped requests; we call this step kernel merging. Next, it performs kernel fusion on the merged kernels. As a result, the technique minimizes the number of concurrent kernels, which mitigates stalls caused by frequent context switching in a GPU. In our evaluation, the proposed kernel merge and fusion achieve $2.7\times$ higher throughput, 47% shorter average kernel execution time, and 63% shorter tail latency than prior work.
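The grouping, merging, and fusion steps described in the abstract can be illustrated with a small host-side sketch. This is not the authors' implementation: it models kernels as plain Python callables rather than GPU kernels, and the names `merge_requests`, `make_merged_kernel`, `fuse`, the `window` parameter, and the model names `model_a` and `model_b` are all hypothetical, introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical request type: (model name, input value).
Request = Tuple[str, float]


def merge_requests(queue: List[Request], window: int) -> Dict[str, List[float]]:
    """Look at the first `window` queued requests and group inputs by model."""
    groups: Dict[str, List[float]] = {}
    for model, payload in queue[:window]:
        groups.setdefault(model, []).append(payload)
    return groups


def make_merged_kernel(
    model_fn: Callable[[float], float], inputs: List[float]
) -> Callable[[], List[float]]:
    """Build one callable that iterates over every grouped input (kernel merging)."""
    def merged() -> List[float]:
        return [model_fn(x) for x in inputs]
    return merged


def fuse(
    kernels: Dict[str, Callable[[], List[float]]]
) -> Callable[[], Dict[str, List[float]]]:
    """Combine the merged kernels into a single launch unit (stand-in for fusion)."""
    def fused() -> Dict[str, List[float]]:
        return {name: kernel() for name, kernel in kernels.items()}
    return fused


# Assumed stand-ins for two deployed models.
models: Dict[str, Callable[[float], float]] = {
    "model_a": lambda x: x * 2.0,
    "model_b": lambda x: x + 1.0,
}

queue: List[Request] = [
    ("model_a", 1.0), ("model_b", 2.0), ("model_a", 3.0),
    ("model_b", 4.0), ("model_a", 5.0),
]

groups = merge_requests(queue, window=4)
merged = {name: make_merged_kernel(models[name], xs) for name, xs in groups.items()}
launch = fuse(merged)
results = launch()
```

In this sketch, four monitored requests collapse into two merged kernels and then into one fused launch, which mirrors the abstract's goal of reducing the number of concurrent kernels. The fifth request falls outside the monitoring window and would be handled in a later round.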
Pages: 421-424
Number of pages: 4