Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

Times Cited: 0
Authors
Du, Zelin [1 ,3 ]
Zhang, Wei [1 ,2 ]
Zhou, Zimeng [1 ,2 ]
Shao, Zili [3 ]
Ju, Lei [1 ,2 ]
Affiliations
[1] Shandong Univ, Sch Cyber Sci & Technol, Qingdao, Peoples R China
[2] Quan Cheng Lab, Jinan, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC | 2023
Keywords
FPGA; CNN accelerator; DPU; pipeline;
DOI
10.1109/DAC56929.2023.10247793
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Deep Learning Processor (DPU) programmable engine released with the official Xilinx Vitis AI toolchain has become one of the commercial off-the-shelf (COTS) solutions for Convolutional Neural Network (CNN) inference on Xilinx FPGAs. While modern FPGA devices generally have enough hardware resources to accommodate multiple DPUs simultaneously, the Xilinx toolchain currently only supports the deployment of multiple homogeneous DPU engines running independent inference tasks (task-level parallelism). In this work, we demonstrate that deploying multiple heterogeneous DPU engines achieves better resource efficiency for a given FPGA device. Moreover, we show that pipelined execution of a CNN inference task over heterogeneous multi-DPU engines may further improve overall inference throughput with carefully designed CNN layers-to-DPU mapping and scheduling. Finally, for a given CNN model and FPGA device, we propose a comprehensive framework that automatically determines the optimal heterogeneous DPU deployment and adaptively chooses the execution scheme between task-level and pipelined parallelism. Compared with the state-of-the-art solution with homogeneous multi-DPU engines and network-level parallelism, the proposed framework shows an average improvement of 13% (up to 19%) and 6.6% (up to 10%) on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and ZCU102 platforms, respectively.
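
As a rough illustration of the adaptive scheme selection described above, the following Python sketch estimates steady-state throughput under task-level parallelism and under a two-stage layer pipeline across heterogeneous DPUs, then picks the faster scheme. The DPU configuration names (B4096, B1152), the per-layer latency numbers, and the brute-force cut-point search are illustrative assumptions only; they are not the paper's actual mapping/scheduling algorithm, nor the Vitis AI API.

# Illustrative sketch with assumed profile data, not the paper's framework.
# layer_latency_ms[d][i]: assumed latency of CNN layer i on DPU configuration d.
layer_latency_ms = {
    "B4096": [0.8, 1.2, 1.5, 0.9, 0.6],   # larger DPU configuration (assumed)
    "B1152": [2.1, 3.0, 3.9, 2.4, 1.6],   # smaller DPU configuration (assumed)
}

def task_level_throughput(dpus):
    # Each DPU runs the whole network on independent inputs;
    # aggregate throughput is the sum of the per-DPU frame rates.
    return sum(1000.0 / sum(layer_latency_ms[d]) for d in dpus)

def pipeline_throughput(dpus, cut_points):
    # Consecutive layer groups are mapped to DPUs in order; in steady state
    # the slowest pipeline stage bounds the overall throughput.
    num_layers = len(next(iter(layer_latency_ms.values())))
    bounds = [0, *cut_points, num_layers]
    stage_ms = [sum(layer_latency_ms[d][bounds[k]:bounds[k + 1]])
                for k, d in enumerate(dpus)]
    return 1000.0 / max(stage_ms)

dpus = ["B4096", "B1152"]
best_cut, best_pipe = max(
    ((c, pipeline_throughput(dpus, [c])) for c in range(1, 5)),
    key=lambda x: x[1])
tlp = task_level_throughput(dpus)
scheme = "pipelined" if best_pipe > tlp else "task-level"
print(f"task-level: {tlp:.1f} fps, best pipeline (cut at layer {best_cut}): "
      f"{best_pipe:.1f} fps -> choose {scheme}")

With the assumed numbers above, task-level parallelism happens to win; the paper's framework makes this choice automatically for a given model and device, and additionally searches over heterogeneous DPU deployments rather than taking them as fixed.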
Pages: 6