Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

Cited by: 0
Authors
Du, Zelin [1 ,3 ]
Zhang, Wei [1 ,2 ]
Zhou, Zimeng [1 ,2 ]
Shao, Zili [3 ]
Ju, Lei [1 ,2 ]
Affiliations
[1] Shandong Univ, Sch Cyber Sci & Technol, Qingdao, Peoples R China
[2] Quan Cheng Lab, Jinan, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
2023 60th ACM/IEEE Design Automation Conference (DAC), 2023
Keywords
FPGA; CNN accelerator; DPU; pipeline
DOI
10.1109/DAC56929.2023.10247793
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The Deep Learning Processor (DPU) programmable engine released by the official Xilinx Vitis AI toolchain has become one of the commercial off-the-shelf (COTS) solutions for Convolutional Neural Network (CNN) inference on Xilinx FPGAs. While modern FPGA devices generally have enough hardware resources to accommodate multiple DPUs simultaneously, the Xilinx toolchain currently only supports the deployment of multiple homogeneous DPU engines that run independent inference tasks (task-level parallelism). In this work, we demonstrate that deploying multiple heterogeneous DPU engines achieves better resource efficiency for a given FPGA device. Moreover, we show that pipelined execution of a CNN inference task over heterogeneous multi-DPU engines may further improve overall inference throughput with a carefully designed CNN layer-to-DPU mapping and schedule. Finally, for a given CNN model and FPGA device, we propose a comprehensive framework that automatically determines the optimal heterogeneous DPU deployment and adaptively chooses between task-level and pipelined parallelism. Compared with the state-of-the-art solution with homogeneous multi-DPU engines and network-level parallelism, the proposed framework shows an average improvement of 13% (up to 19%) and 6.6% (up to 10%) on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and ZCU102 platforms, respectively.
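The record does not reproduce the paper's actual deployment and scheduling algorithm, but the adaptive choice the abstract describes can be illustrated with a small sketch. The Python snippet below (hypothetical function names and toy latency numbers, not the authors' framework) estimates throughput for two schemes on a heterogeneous pair of DPUs: task-level parallelism, where each DPU runs whole inferences independently, and pipelined parallelism, where contiguous groups of CNN layers are assigned to different DPUs and steady-state throughput is bounded by the slowest stage; the scheme with the higher estimate is then chosen.

```python
# Hypothetical sketch (not the paper's algorithm): compare task-level vs.
# pipelined parallelism over heterogeneous DPUs from per-layer latency
# estimates, and pick the scheme with the higher estimated throughput.
from itertools import combinations
from typing import Dict, List, Tuple


def task_level_throughput(layer_lat: Dict[str, List[float]]) -> float:
    """Each DPU runs whole inferences independently; throughputs add up."""
    return sum(1.0 / sum(lats) for lats in layer_lat.values())


def pipeline_throughput(layer_lat: Dict[str, List[float]]) -> Tuple[float, list]:
    """Split the layer sequence into contiguous stages, one per DPU (in the
    given order); throughput is limited by the slowest stage. Brute-force
    over cut points, which is fine for networks with tens of layers."""
    dpus = list(layer_lat.keys())
    n_layers = len(next(iter(layer_lat.values())))
    best = (0.0, [])
    for cuts in combinations(range(1, n_layers), len(dpus) - 1):
        bounds = [0, *cuts, n_layers]
        stage_lat = [sum(layer_lat[d][bounds[i]:bounds[i + 1]])
                     for i, d in enumerate(dpus)]
        thr = 1.0 / max(stage_lat)
        if thr > best[0]:
            best = (thr, list(zip(dpus, zip(bounds[:-1], bounds[1:]))))
    return best


if __name__ == "__main__":
    # Toy per-layer latencies (ms) of a 6-layer CNN on two hypothetical
    # heterogeneous DPU configurations (e.g. a large B4096 and a smaller B1152).
    lat = {"dpu_b4096": [1.0, 2.0, 2.5, 1.5, 1.0, 0.5],
           "dpu_b1152": [2.2, 4.1, 5.0, 3.3, 2.1, 1.1]}
    t_task = task_level_throughput(lat)
    t_pipe, mapping = pipeline_throughput(lat)
    scheme = "pipelined" if t_pipe > t_task else "task-level"
    print(f"task-level: {t_task:.3f} img/ms, pipelined: {t_pipe:.3f} img/ms "
          f"-> choose {scheme}; layer mapping: {mapping}")
```

In this toy example the heterogeneous pair favors task-level parallelism because the stages cannot be balanced well; with different layer latency profiles the pipelined mapping wins, which is the kind of per-model decision the proposed framework is described as making automatically.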
Pages: 6