Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

Cited: 0
Authors
Du, Zelin [1 ,3 ]
Zhang, Wei [1 ,2 ]
Zhou, Zimeng [1 ,2 ]
Shao, Zili [3 ]
Ju, Lei [1 ,2 ]
Affiliations
[1] Shandong Univ, Sch Cyber Sci & Technol, Qingdao, Peoples R China
[2] Quan Cheng Lab, Jinan, Peoples R China
[3] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Source
2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC | 2023
Keywords
FPGA; CNN accelerator; DPU; pipeline;
DOI
10.1109/DAC56929.2023.10247793
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Deep Learning Processor (DPU) programmable engine released with the official Xilinx Vitis AI toolchain has become one of the commercial off-the-shelf (COTS) solutions for Convolutional Neural Network (CNN) inference on Xilinx FPGAs. While modern FPGA devices generally have enough hardware resources to accommodate multiple DPUs simultaneously, the Xilinx toolchain currently only supports the deployment of multiple homogeneous DPU engines that run independent inference tasks (task-level parallelism). In this work, we demonstrate that deploying multiple heterogeneous DPU engines achieves better resource efficiency for a given FPGA device. Moreover, we show that pipelined execution of a CNN inference task over heterogeneous multi-DPU engines can further improve overall inference throughput with a carefully designed CNN layers-to-DPU mapping and schedule. Finally, for a given CNN model and FPGA device, we propose a comprehensive framework that automatically determines the optimal heterogeneous DPU deployment and adaptively chooses the execution scheme between task-level and pipelined parallelism. Compared with the state-of-the-art solution with homogeneous multi-DPU engines and network-level parallelism, the proposed framework shows an average improvement of 13% (up to 19%) and 6.6% (up to 10%) on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and ZCU102 platforms, respectively.
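The pipelined scheme described in the abstract boils down to a layers-to-DPU partitioning problem: the model's layer sequence is cut into contiguous stages, one per DPU, and the slowest stage bounds steady-state throughput. The sketch below is not the authors' framework; all function names, latency figures, and DPU sizes are illustrative assumptions. It enumerates contiguous splits against per-layer latency estimates and then picks between pipelined and task-level execution, mirroring the adaptive choice the abstract describes.

# Hypothetical sketch (not the authors' tool): given per-layer latency
# estimates for each DPU in a heterogeneous deployment, split the CNN's
# layer sequence into contiguous pipeline stages (one stage per DPU) so
# that the slowest stage, the pipeline bottleneck, is as fast as possible,
# then compare the resulting throughput against plain task-level
# parallelism, where every DPU runs the whole network on its own inputs.
from itertools import combinations

def pipeline_throughput(layer_latency, num_dpus):
    """Return (throughput, cut_points) for the best contiguous split.

    layer_latency[d][l] = estimated latency of layer l on DPU d (ms).
    Stages are assigned in order: stage i runs on DPU i.
    """
    num_layers = len(layer_latency[0])
    best = (0.0, None)
    # Enumerate all ways to place num_dpus - 1 cuts between layers.
    for cuts in combinations(range(1, num_layers), num_dpus - 1):
        bounds = (0,) + cuts + (num_layers,)
        stage_times = [
            sum(layer_latency[d][bounds[d]:bounds[d + 1]])
            for d in range(num_dpus)
        ]
        throughput = 1000.0 / max(stage_times)  # frames per second
        if throughput > best[0]:
            best = (throughput, cuts)
    return best

def task_level_throughput(layer_latency):
    """Each DPU runs the full model independently on its own requests."""
    return sum(1000.0 / sum(per_dpu) for per_dpu in layer_latency)

if __name__ == "__main__":
    # Toy numbers: a larger DPU (row 0) is faster per layer than a smaller one.
    latency = [
        [1.2, 2.5, 2.0, 3.1, 1.4],   # big DPU, e.g. a B4096-sized engine
        [2.0, 4.1, 3.3, 5.0, 2.3],   # small DPU, e.g. a B1600-sized engine
    ]
    pipe_fps, cuts = pipeline_throughput(latency, num_dpus=2)
    task_fps = task_level_throughput(latency)
    scheme = "pipelined" if pipe_fps > task_fps else "task-level"
    print(f"pipeline: {pipe_fps:.1f} fps (cut at {cuts}), "
          f"task-level: {task_fps:.1f} fps -> choose {scheme}")

In practice the framework would also have to search the space of heterogeneous DPU configurations that fit the device's resources; this sketch assumes the deployment is already fixed and only illustrates the mapping and scheme-selection step.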
Pages: 6