Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs

Cited by: 0
Authors
Jeon, Jaebeom [1]
Koo, Gunjae [2]
Yoon, Myung Kuk [3]
Oh, Yunho [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul 02857, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul 02857, South Korea
[3] Ewha Womans Univ, Dept Comp Sci & Engn, Seoul 03760, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Kernel; Throughput; Graphics processing units; Delays; Task analysis; Software; Switches; Embedded graphics processing unit (GPU); inference; multitenancy;
DOI
10.1109/LES.2024.3351753
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This letter proposes a new scheme that improves throughput and reduces queuing delay while running multiple inferences in embedded graphics processing unit (GPU)-based systems. We observe that an embedded system runs inference with a fixed number of deep learning models and that inference requests often use the same model. Unlike prior work that proposed kernel fusion or scheduling techniques, this letter proposes a new software technique that merges and fuses kernels by monitoring the requests in a queue. The proposed technique first monitors a fixed number of requests and groups the requests running the same model. Then, it creates kernels that iteratively process the grouped requests. We call this technique kernel merging. After that, the proposed technique performs kernel fusion with the merged kernels. Ultimately, this approach minimizes the number of concurrent kernels, thus mitigating stalls caused by frequent context switching in a GPU. In our evaluation, the proposed kernel merge and fusion achieve $2.7\times$ better throughput, 47% shorter average kernel execution time, and 63% shorter tail latency than prior work.
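The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Request`, `merge_window`, and `merged_launches`, the window size, and the model names are all hypothetical, and the code only models the host-side bookkeeping (which requests would share one merged kernel), not GPU execution or kernel fusion.

```python
from typing import Dict, List, NamedTuple, Tuple


class Request(NamedTuple):
    """Hypothetical inference request: an id and the model it targets."""
    request_id: int
    model: str


def merge_window(queue: List[Request], window: int) -> Dict[str, List[Request]]:
    """Group the first `window` queued requests by model, keeping arrival order."""
    groups: Dict[str, List[Request]] = {}
    for req in queue[:window]:
        groups.setdefault(req.model, []).append(req)
    return groups


def merged_launches(groups: Dict[str, List[Request]]) -> List[Tuple[str, List[int]]]:
    """One entry per merged kernel: the model and the request ids it iterates over."""
    return [(model, [r.request_id for r in reqs]) for model, reqs in groups.items()]


if __name__ == "__main__":
    # Assumed example workload: two models, five queued requests, window of four.
    queue = [
        Request(0, "resnet"),
        Request(1, "yolo"),
        Request(2, "resnet"),
        Request(3, "resnet"),
        Request(4, "yolo"),
    ]
    for model, ids in merged_launches(merge_window(queue, window=4)):
        print(model, ids)
```

In this toy workload, the four monitored requests collapse into two merged launches (one per model), which is the sense in which merging reduces the number of concurrent kernels; request 4 falls outside the window and would be handled in the next one.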
Pages: 421-424
Page count: 4