Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

Times Cited: 0
Authors
Du, Zelin [1 ,3 ]
Zhang, Wei [1 ,2 ]
Zhou, Zimeng [1 ,2 ]
Shao, Zili [3 ]
Ju, Lei [1 ,2 ]
Affiliations
[1] Shandong Univ, Sch Cyber Sci & Technol, Qingdao, Peoples R China
[2] Quan Cheng Lab, Jinan, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC | 2023
Keywords
FPGA; CNN accelerator; DPU; pipeline;
DOI
10.1109/DAC56929.2023.10247793
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Deep Learning Processor (DPU) programmable engine released with the official Xilinx Vitis AI toolchain has become one of the commercial off-the-shelf (COTS) solutions for Convolutional Neural Network (CNN) inference on Xilinx FPGAs. While modern FPGA devices generally have enough hardware resources to accommodate multiple DPUs simultaneously, the Xilinx toolchain currently only supports the deployment of multiple homogeneous DPU engines running independent inference tasks (task-level parallelism). In this work, we demonstrate that deploying multiple heterogeneous DPU engines achieves better resource efficiency for a given FPGA device. Moreover, we show that pipelined execution of a CNN inference task over heterogeneous multi-DPU engines may further improve overall inference throughput with carefully designed CNN layers-to-DPU mapping and scheduling. Finally, for a given CNN model and FPGA device, we propose a comprehensive framework that automatically determines the optimal heterogeneous DPU deployment and adaptively chooses the execution scheme between task-level and pipelined parallelism. Compared with the state-of-the-art solution with homogeneous multi-DPU engines and network-level parallelism, the proposed framework shows an average improvement of 13% (up to 19%) and 6.6% (up to 10%) on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and ZCU102 platforms, respectively.
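
As a rough illustration of the adaptive scheme selection described above, the following Python sketch estimates steady-state throughput under task-level parallelism and under a two-stage layer pipeline across heterogeneous DPUs, then picks the faster scheme. The DPU configuration names (B4096, B1152), the per-layer latency numbers, and the brute-force cut-point search are illustrative assumptions only; they are not the paper's actual mapping/scheduling algorithm, nor the Vitis AI API.

# Illustrative sketch with assumed profile data, not the paper's framework.
# layer_latency_ms[d][i]: assumed latency of CNN layer i on DPU configuration d.
layer_latency_ms = {
    "B4096": [0.8, 1.2, 1.5, 0.9, 0.6],   # larger DPU configuration (assumed)
    "B1152": [2.1, 3.0, 3.9, 2.4, 1.6],   # smaller DPU configuration (assumed)
}

def task_level_throughput(dpus):
    # Each DPU runs the whole network on independent inputs;
    # aggregate throughput is the sum of the per-DPU frame rates.
    return sum(1000.0 / sum(layer_latency_ms[d]) for d in dpus)

def pipeline_throughput(dpus, cut_points):
    # Consecutive layer groups are mapped to DPUs in order; in steady state
    # the slowest pipeline stage bounds the overall throughput.
    num_layers = len(next(iter(layer_latency_ms.values())))
    bounds = [0, *cut_points, num_layers]
    stage_ms = [sum(layer_latency_ms[d][bounds[k]:bounds[k + 1]])
                for k, d in enumerate(dpus)]
    return 1000.0 / max(stage_ms)

dpus = ["B4096", "B1152"]
best_cut, best_pipe = max(
    ((c, pipeline_throughput(dpus, [c])) for c in range(1, 5)),
    key=lambda x: x[1])
tlp = task_level_throughput(dpus)
scheme = "pipelined" if best_pipe > tlp else "task-level"
print(f"task-level: {tlp:.1f} fps, best pipeline (cut at layer {best_cut}): "
      f"{best_pipe:.1f} fps -> choose {scheme}")

With the assumed numbers above, task-level parallelism happens to win; the paper's framework makes this choice automatically for a given model and device, and additionally searches over heterogeneous DPU deployments rather than taking them as fixed.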
Pages: 6