Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

Cited by: 0
Authors
Du, Zelin [1 ,3 ]
Zhang, Wei [1 ,2 ]
Zhou, Zimeng [1 ,2 ]
Shao, Zili [3 ]
Ju, Lei [1 ,2 ]
Affiliations
[1] Shandong Univ, Sch Cyber Sci & Technol, Qingdao, Peoples R China
[2] Quan Cheng Lab, Jinan, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC | 2023
Keywords
FPGA; CNN accelerator; DPU; pipeline
DOI
10.1109/DAC56929.2023.10247793
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The Deep Learning Processing Unit (DPU) programmable engine released with the official Xilinx Vitis AI toolchain has become one of the commercial off-the-shelf (COTS) solutions for Convolutional Neural Network (CNN) inference on Xilinx FPGAs. While modern FPGA devices generally have enough hardware resources to accommodate multiple DPUs simultaneously, the Xilinx toolchain currently only supports the deployment of multiple homogeneous DPU engines running independent inference tasks (task-level parallelism). In this work, we demonstrate that deploying multiple heterogeneous DPU engines achieves better resource efficiency for a given FPGA device. Moreover, we show that pipelined execution of a CNN inference task over heterogeneous multi-DPU engines can further improve overall inference throughput with a carefully designed CNN layer-to-DPU mapping and schedule. Finally, for a given CNN model and FPGA device, we propose a comprehensive framework that automatically determines the optimal heterogeneous DPU deployment and adaptively chooses between task-level and pipelined parallelism as the execution scheme. Compared with the state-of-the-art solution with homogeneous multi-DPU engines and network-level parallelism, the proposed framework shows an average throughput improvement of 13% (up to 19%) and 6.6% (up to 10%) on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and ZCU102 platforms, respectively.
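To make the kind of decision the abstract describes concrete, the Python sketch below is a minimal, hypothetical model of the search, not the paper's implementation: enumerate heterogeneous DPU combinations that fit a device resource budget, derive a greedy layer-to-engine mapping, and keep whichever execution scheme (task-level or pipelined) yields the higher estimated throughput. The B-series names are real Vitis AI DPU configuration sizes, but the DSP costs, performance figures, per-layer workloads, and the proportional split heuristic are all invented placeholders.

# Toy design-space exploration over heterogeneous DPU deployments.
# All numbers are illustrative placeholders, not values from the paper.
from itertools import combinations_with_replacement

# Hypothetical per-engine resource cost (DSP slices) and relative peak speed.
DPU_CONFIGS = {
    "B512":  {"dsp": 118, "perf": 0.9},
    "B1152": {"dsp": 218, "perf": 2.0},
    "B2304": {"dsp": 438, "perf": 4.0},
    "B4096": {"dsp": 710, "perf": 7.1},
}
DSP_BUDGET = 1728                                     # placeholder device budget
PER_LAYER_WORK = [1.5, 2.0, 1.0, 2.5, 1.2, 0.8, 1.0]  # toy per-layer costs

def feasible_deployments(max_engines=3):
    """Yield every multiset of DPU engines that fits the DSP budget."""
    for n in range(1, max_engines + 1):
        for combo in combinations_with_replacement(DPU_CONFIGS, n):
            if sum(DPU_CONFIGS[c]["dsp"] for c in combo) <= DSP_BUDGET:
                yield combo

def task_level_rate(combo):
    """Each engine runs whole, independent inferences; their rates simply add."""
    return sum(DPU_CONFIGS[c]["perf"] for c in combo) / sum(PER_LAYER_WORK)

def split_layers(combo):
    """Greedy contiguous layer-to-engine mapping, sized to engine speed."""
    cap = sum(DPU_CONFIGS[c]["perf"] for c in combo)
    total = sum(PER_LAYER_WORK)
    stages, i = [0.0] * len(combo), 0
    for w in PER_LAYER_WORK:
        target = total * DPU_CONFIGS[combo[i]]["perf"] / cap
        if stages[i] >= target and i < len(combo) - 1:
            i += 1          # current stage has its fair share; open the next
        stages[i] += w
    return stages

def pipelined_rate(combo, stages):
    """One layer group per engine; the slowest stage bounds the steady rate."""
    if min(stages) <= 0:
        return 0.0          # an idle engine means the mapping is invalid
    return min(DPU_CONFIGS[c]["perf"] / w for c, w in zip(combo, stages))

best = (0.0, None, None)
for combo in feasible_deployments():
    for scheme, rate in (("task-level", task_level_rate(combo)),
                         ("pipelined", pipelined_rate(combo, split_layers(combo)))):
        if rate > best[0]:
            best = (rate, scheme, combo)
print("best: %s via %s parallelism, %.2f inf/unit-time" % (best[2], best[1], best[0]))

Under such a crude cost model a contiguous layer split rarely balances perfectly, so task-level parallelism tends to win the comparison; the real framework must account for effects this toy ignores, such as per-layer efficiency differences across DPU sizes and scheduling overheads, which is presumably where the carefully designed layer-to-DPU mapping and scheduling pay off.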
Pages: 6
Related Papers
50 records in total (records [31]-[40] shown)
  • [31] CoEdge: Cooperative DNN Inference With Adaptive Workload Partitioning Over Heterogeneous Edge Devices
    Zeng, Liekang
    Chen, Xu
    Zhou, Zhi
    Yang, Lei
    Zhang, Junshan
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2021, 29 (02) : 595 - 608
  • [32] Jily: Cost-Aware AutoScaling of Heterogeneous GPU for DNN Inference in Public Cloud
    Wang, Zhaoxing
    Tang, Xuehai
    Liu, Qiuyang
    Han, Jizhong
    2019 IEEE 38TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2019,
  • [33] Sub-Word Parallel Precision-Scalable MAC Engines for Efficient Embedded DNN Inference
    Mei, Linyan
    Dandekar, Mohit
    Rodopoulos, Dimitrios
    Constantin, Jeremy
    Debacker, Peter
    Lauwereins, Rudy
    Verhelst, Marian
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 6 - 10
  • [34] Efficient Single- and Multi-DNN Inference Using TensorRT Framework
    Zhdanovskiy, Vyacheslav
    Teplyakov, Lev
    Belyaev, Philipp
    SIXTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2023, 2024, 13072
  • [35] Joint Optimization of Device Placement and Model Partitioning for Cooperative DNN Inference in Heterogeneous Edge Computing
    Dai, Penglin
    Han, Biao
    Li, Ke
    Xu, Xincao
    Xing, Huanlai
    Liu, Kai
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (01) : 210 - 226
  • [36] Multi-Exit DNN Inference Acceleration Based on Multi-Dimensional Optimization for Edge Intelligence
    Dong, Fang
    Wang, Huitian
    Shen, Dian
    Huang, Zhaowu
    He, Qiang
    Zhang, Jinghui
    Wen, Liangsheng
    Zhang, Tingting
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (09) : 5389 - 5405
  • [37] On Accelerating Multi-Layered Heterogeneous Network Embedding Learning
    Shuai, Hong-Han
    Tsai, Cheng-Ming
    Hsu, Yun-Jui
    Hsiao, Ta-Che
    2018 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2018,
  • [38] Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization
    Balaskas, Konstantinos
    Khdr, Heba
    Sikal, Mohammed Bakr
    Kress, Fabian
    Siozios, Kostas
    Becker, Jurgen
    Henkel, Jorg
    IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (04) : 317 - 320
  • [39] A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system
    Shi, Lei
    Xu, Zhigang
    Sun, Yabo
    Shi, Yi
    Fan, Yuqi
    Ding, Xu
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2021, 14 : 4031 - 4045
  • [40] Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU
    Yu, Fuxun
    Bray, Shawn
    Wang, Di
    Shangguan, Longfei
    Tang, Xulong
    Liu, Chenchen
    Chen, Xiang
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,