Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

Cited by: 0
Authors
Du, Zelin [1 ,3 ]
Zhang, Wei [1 ,2 ]
Zhou, Zimeng [1 ,2 ]
Shao, Zili [3 ]
Ju, Lei [1 ,2 ]
Affiliations
[1] Shandong Univ, Sch Cyber Sci & Technol, Qingdao, Peoples R China
[2] Quan Cheng Lab, Jinan, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
2023 60th ACM/IEEE Design Automation Conference (DAC), 2023
Keywords
FPGA; CNN accelerator; DPU; pipeline
DOI
10.1109/DAC56929.2023.10247793
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The Deep Learning Processor (DPU) programmable engine released by the official Xilinx Vitis AI toolchain has become one of the commercial off-the-shelf (COTS) solutions for Convolutional Neural Network (CNN) inference on Xilinx FPGAs. While modern FPGA devices generally have enough hardware resources to accommodate multiple DPUs simultaneously, the Xilinx toolchain currently only supports the deployment of multiple homogeneous DPU engines that run independent inference tasks (task-level parallelism). In this work, we demonstrate that deploying multiple heterogeneous DPU engines achieves better resource efficiency for a given FPGA device. Moreover, we show that pipelined execution of a CNN inference task over heterogeneous multi-DPU engines may further improve overall inference throughput with a carefully designed CNN layer-to-DPU mapping and schedule. Finally, for a given CNN model and FPGA device, we propose a comprehensive framework that automatically determines the optimal heterogeneous DPU deployment and adaptively chooses between task-level and pipelined parallelism. Compared with the state-of-the-art solution with homogeneous multi-DPU engines and network-level parallelism, the proposed framework shows an average improvement of 13% (up to 19%) and 6.6% (up to 10%) on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and ZCU102 platforms, respectively.
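The record does not reproduce the paper's actual deployment and scheduling algorithm, but the adaptive choice the abstract describes can be illustrated with a small sketch. The Python snippet below (hypothetical function names and toy latency numbers, not the authors' framework) estimates throughput for two schemes on a heterogeneous pair of DPUs: task-level parallelism, where each DPU runs whole inferences independently, and pipelined parallelism, where contiguous groups of CNN layers are assigned to different DPUs and steady-state throughput is bounded by the slowest stage; the scheme with the higher estimate is then chosen.

```python
# Hypothetical sketch (not the paper's algorithm): compare task-level vs.
# pipelined parallelism over heterogeneous DPUs from per-layer latency
# estimates, and pick the scheme with the higher estimated throughput.
from itertools import combinations
from typing import Dict, List, Tuple


def task_level_throughput(layer_lat: Dict[str, List[float]]) -> float:
    """Each DPU runs whole inferences independently; throughputs add up."""
    return sum(1.0 / sum(lats) for lats in layer_lat.values())


def pipeline_throughput(layer_lat: Dict[str, List[float]]) -> Tuple[float, list]:
    """Split the layer sequence into contiguous stages, one per DPU (in the
    given order); throughput is limited by the slowest stage. Brute-force
    over cut points, which is fine for networks with tens of layers."""
    dpus = list(layer_lat.keys())
    n_layers = len(next(iter(layer_lat.values())))
    best = (0.0, [])
    for cuts in combinations(range(1, n_layers), len(dpus) - 1):
        bounds = [0, *cuts, n_layers]
        stage_lat = [sum(layer_lat[d][bounds[i]:bounds[i + 1]])
                     for i, d in enumerate(dpus)]
        thr = 1.0 / max(stage_lat)
        if thr > best[0]:
            best = (thr, list(zip(dpus, zip(bounds[:-1], bounds[1:]))))
    return best


if __name__ == "__main__":
    # Toy per-layer latencies (ms) of a 6-layer CNN on two hypothetical
    # heterogeneous DPU configurations (e.g. a large B4096 and a smaller B1152).
    lat = {"dpu_b4096": [1.0, 2.0, 2.5, 1.5, 1.0, 0.5],
           "dpu_b1152": [2.2, 4.1, 5.0, 3.3, 2.1, 1.1]}
    t_task = task_level_throughput(lat)
    t_pipe, mapping = pipeline_throughput(lat)
    scheme = "pipelined" if t_pipe > t_task else "task-level"
    print(f"task-level: {t_task:.3f} img/ms, pipelined: {t_pipe:.3f} img/ms "
          f"-> choose {scheme}; layer mapping: {mapping}")
```

In this toy example the heterogeneous pair favors task-level parallelism because the stages cannot be balanced well; with different layer latency profiles the pipelined mapping wins, which is the kind of per-model decision the proposed framework is described as making automatically.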
Pages: 6