Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

Cited by: 0
Authors
Du, Zelin [1 ,3 ]
Zhang, Wei [1 ,2 ]
Zhou, Zimeng [1 ,2 ]
Shao, Zili [3 ]
Ju, Lei [1 ,2 ]
Affiliations
[1] Shandong Univ, Sch Cyber Sci & Technol, Qingdao, Peoples R China
[2] Quan Cheng Lab, Jinan, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC | 2023
Keywords
FPGA; CNN accelerator; DPU; pipeline
DOI
10.1109/DAC56929.2023.10247793
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The Deep Learning Processing Unit (DPU) programmable engine released with the official Xilinx Vitis AI toolchain has become one of the commercial off-the-shelf (COTS) solutions for Convolutional Neural Network (CNN) inference on Xilinx FPGAs. While modern FPGA devices generally have enough hardware resources to accommodate multiple DPUs simultaneously, the Xilinx toolchain currently only supports the deployment of multiple homogeneous DPU engines running independent inference tasks (task-level parallelism). In this work, we demonstrate that deploying multiple heterogeneous DPU engines achieves better resource efficiency for a given FPGA device. Moreover, we show that pipelined execution of a CNN inference task over heterogeneous multi-DPU engines can further improve overall inference throughput with a carefully designed CNN layer-to-DPU mapping and schedule. Finally, for a given CNN model and FPGA device, we propose a comprehensive framework that automatically determines the optimal heterogeneous DPU deployment and adaptively chooses between task-level and pipelined parallelism as the execution scheme. Compared with the state-of-the-art solution with homogeneous multi-DPU engines and network-level parallelism, the proposed framework shows an average throughput improvement of 13% (up to 19%) and 6.6% (up to 10%) on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and ZCU102 platforms, respectively.
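To make the kind of decision the abstract describes concrete, the Python sketch below is a minimal, hypothetical model of the search, not the paper's implementation: enumerate heterogeneous DPU combinations that fit a device resource budget, derive a greedy layer-to-engine mapping, and keep whichever execution scheme (task-level or pipelined) yields the higher estimated throughput. The B-series names are real Vitis AI DPU configuration sizes, but the DSP costs, performance figures, per-layer workloads, and the proportional split heuristic are all invented placeholders.

# Toy design-space exploration over heterogeneous DPU deployments.
# All numbers are illustrative placeholders, not values from the paper.
from itertools import combinations_with_replacement

# Hypothetical per-engine resource cost (DSP slices) and relative peak speed.
DPU_CONFIGS = {
    "B512":  {"dsp": 118, "perf": 0.9},
    "B1152": {"dsp": 218, "perf": 2.0},
    "B2304": {"dsp": 438, "perf": 4.0},
    "B4096": {"dsp": 710, "perf": 7.1},
}
DSP_BUDGET = 1728                                     # placeholder device budget
PER_LAYER_WORK = [1.5, 2.0, 1.0, 2.5, 1.2, 0.8, 1.0]  # toy per-layer costs

def feasible_deployments(max_engines=3):
    """Yield every multiset of DPU engines that fits the DSP budget."""
    for n in range(1, max_engines + 1):
        for combo in combinations_with_replacement(DPU_CONFIGS, n):
            if sum(DPU_CONFIGS[c]["dsp"] for c in combo) <= DSP_BUDGET:
                yield combo

def task_level_rate(combo):
    """Each engine runs whole, independent inferences; their rates simply add."""
    return sum(DPU_CONFIGS[c]["perf"] for c in combo) / sum(PER_LAYER_WORK)

def split_layers(combo):
    """Greedy contiguous layer-to-engine mapping, sized to engine speed."""
    cap = sum(DPU_CONFIGS[c]["perf"] for c in combo)
    total = sum(PER_LAYER_WORK)
    stages, i = [0.0] * len(combo), 0
    for w in PER_LAYER_WORK:
        target = total * DPU_CONFIGS[combo[i]]["perf"] / cap
        if stages[i] >= target and i < len(combo) - 1:
            i += 1          # current stage has its fair share; open the next
        stages[i] += w
    return stages

def pipelined_rate(combo, stages):
    """One layer group per engine; the slowest stage bounds the steady rate."""
    if min(stages) <= 0:
        return 0.0          # an idle engine means the mapping is invalid
    return min(DPU_CONFIGS[c]["perf"] / w for c, w in zip(combo, stages))

best = (0.0, None, None)
for combo in feasible_deployments():
    for scheme, rate in (("task-level", task_level_rate(combo)),
                         ("pipelined", pipelined_rate(combo, split_layers(combo)))):
        if rate > best[0]:
            best = (rate, scheme, combo)
print("best: %s via %s parallelism, %.2f inf/unit-time" % (best[2], best[1], best[0]))

Under such a crude cost model a contiguous layer split rarely balances perfectly, so task-level parallelism tends to win the comparison; the real framework must account for effects this toy ignores, such as per-layer efficiency differences across DPU sizes and scheduling overheads, which is presumably where the carefully designed layer-to-DPU mapping and scheduling pay off.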
Pages: 6
Related Papers
50 records in total (records [31]-[40] shown)
  • [31] CoEdge: Cooperative DNN Inference With Adaptive Workload Partitioning Over Heterogeneous Edge Devices
    Zeng, Liekang
    Chen, Xu
    Zhou, Zhi
    Yang, Lei
    Zhang, Junshan
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2021, 29 (02) : 595 - 608
  • [32] Jily: Cost-Aware AutoScaling of Heterogeneous GPU for DNN Inference in Public Cloud
    Wang, Zhaoxing
    Tang, Xuehai
    Liu, Qiuyang
    Han, Jizhong
    2019 IEEE 38TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2019,
  • [33] Sub-Word Parallel Precision-Scalable MAC Engines for Efficient Embedded DNN Inference
    Mei, Linyan
    Dandekar, Mohit
    Rodopoulos, Dimitrios
    Constantin, Jeremy
    Debacker, Peter
    Lauwereins, Rudy
    Verhelst, Marian
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 6 - 10
  • [34] Efficient Single- and Multi-DNN Inference Using TensorRT Framework
    Zhdanovskiy, Vyacheslav
    Teplyakov, Lev
    Belyaev, Philipp
    SIXTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION, ICMV 2023, 2024, 13072
  • [35] Joint Optimization of Device Placement and Model Partitioning for Cooperative DNN Inference in Heterogeneous Edge Computing
    Dai, Penglin
    Han, Biao
    Li, Ke
    Xu, Xincao
    Xing, Huanlai
    Liu, Kai
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (01) : 210 - 226
  • [36] Multi-Exit DNN Inference Acceleration Based on Multi-Dimensional Optimization for Edge Intelligence
    Dong, Fang
    Wang, Huitian
    Shen, Dian
    Huang, Zhaowu
    He, Qiang
    Zhang, Jinghui
    Wen, Liangsheng
    Zhang, Tingting
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2023, 22 (09) : 5389 - 5405
  • [37] On Accelerating Multi-Layered Heterogeneous Network Embedding Learning
    Shuai, Hong-Han
    Tsai, Cheng-Ming
    Hsu, Yun-Jui
    Hsiao, Ta-Che
    2018 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2018,
  • [38] Heterogeneous Accelerator Design for Multi-DNN Workloads via Heuristic Optimization
    Balaskas, Konstantinos
    Khdr, Heba
    Sikal, Mohammed Bakr
    Kress, Fabian
    Siozios, Kostas
    Becker, Jurgen
    Henkel, Jorg
    IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (04) : 317 - 320
  • [39] A DNN inference acceleration algorithm combining model partition and task allocation in heterogeneous edge computing system
    Shi, Lei
    Xu, Zhigang
    Sun, Yabo
    Shi, Yi
    Fan, Yuqi
    Ding, Xu
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2021, 14 : 4031 - 4045
  • [40] Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU
    Yu, Fuxun
    Bray, Shawn
    Wang, Di
    Shangguan, Longfei
    Tang, Xulong
    Liu, Chenchen
    Chen, Xiang
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,