Adaptive Kernel Merge and Fusion for Multi-Tenant Inference in Embedded GPUs

Times cited: 0
Authors
Jeon, Jaebeom [1]
Koo, Gunjae [2]
Yoon, Myung Kuk [3]
Oh, Yunho [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Seoul 02857, South Korea
[2] Korea Univ, Dept Comp Sci & Engn, Seoul 02857, South Korea
[3] Ewha Womans Univ, Dept Comp Sci & Engn, Seoul 03760, South Korea
Funding
National Research Foundation, Singapore
Keywords
Kernel; Throughput; Graphics processing units; Delays; Task analysis; Software; Switches; Embedded graphics processing unit (GPU); inference; multitenancy
DOI
10.1109/LES.2024.3351753
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This letter proposes a new scheme that improves throughput and reduces queuing delay when running multiple inferences on embedded graphics processing unit (GPU)-based systems. We observe that an embedded system runs inference with a fixed number of deep learning models and that inference requests often use the same model. Unlike prior work that proposed kernel fusion or scheduling techniques, this letter proposes a new software technique that merges and fuses kernels by monitoring the requests in a queue. The proposed technique first monitors a fixed number of requests and groups the requests that run the same model. Then, it creates kernels that iteratively process the grouped requests. We call this technique kernel merging. After that, the proposed technique performs kernel fusion on the merged kernels. As a result, our approach minimizes the number of concurrent kernels, thus mitigating stalls caused by frequent context switching in a GPU. In our evaluation, the proposed kernel merge and fusion achieve $2.7\times$ better throughput, 47% shorter average kernel execution time, and 63% shorter tail latency than prior work.
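The grouping step described in the abstract can be illustrated with a minimal host-side sketch in Python. It is not the authors' implementation: the `Request` type, the `merge_window` function, and the `window` parameter are hypothetical names introduced here, and the sketch only models how a fixed number of queued requests might be grouped by model so that one merged kernel per model could iterate over its group. The subsequent kernel fusion step and any GPU execution are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """Hypothetical inference request; field names are assumptions."""
    request_id: int
    model: str  # name of the deep learning model the request runs


def merge_window(queue: List[Request], window: int) -> Dict[str, List[Request]]:
    """Group the first `window` queued requests by model.

    Each group stands in for one merged kernel that would process its
    requests iteratively. Arrival order is preserved within each group.
    """
    groups: Dict[str, List[Request]] = {}
    for req in queue[:window]:
        groups.setdefault(req.model, []).append(req)
    return groups


# Illustrative queue; the model names are placeholders.
queue = [
    Request(0, "model_a"),
    Request(1, "model_b"),
    Request(2, "model_a"),
    Request(3, "model_a"),
    Request(4, "model_b"),  # outside the monitored window of 4
]
groups = merge_window(queue, window=4)
for model, reqs in groups.items():
    print(model, [r.request_id for r in reqs])
```

In this toy queue, four monitored requests collapse into two groups, which mirrors the abstract's goal of lowering the number of concurrent kernels; how large the window should be is a tuning question the sketch does not address.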
Pages: 421-424
Page count: 4