Pflow: An end-to-end heterogeneous acceleration framework for CNN inference on FPGAs

Authors
Wan, Yi [1]
Xie, Xianzhong [1]
Yi, Lingjie [1]
Jiang, Bo [1]
Chen, Junfan [2]
Jiang, Yi [1]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing 400065, Peoples R China
[2] Chongqing Haiyunjiexun Technol Co Ltd, Chongqing, Peoples R China
Keywords
Heterogeneous computing; Computation graph reconstruction; Acceleration framework; FPGA; Convolutional neural networks; Design; Flow
DOI
10.1016/j.sysarc.2024.103113
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Field-Programmable Gate Arrays (FPGAs), renowned for their high performance per watt, are widely used to accelerate Convolutional Neural Networks (CNNs) in edge computing environments, primarily through dataflow-based and instruction-set-based approaches. Compared with the instruction-set-based approach, which features fast and versatile circuit design, the dataflow-based approach can significantly enhance performance at the expense of design versatility. Edge computing environments, however, require both high energy efficiency and adaptability to various scenarios. This paper proposes a novel end-to-end heterogeneous acceleration framework for CNN inference on FPGAs, named Pflow. The basic idea is to decouple network deployment from hardware details through a hardware-software co-design approach. First, a dataflow accelerator with an adaptive scheduling strategy is proposed. The adaptive scheduling strategy, along with a scalable design, maximizes hardware utilization in terms of computing resources and bandwidth. Second, we design a novel operator-perception method to automate the processes of network reconstruction and operator fusion. Third, we integrate Pflow into the industrial-grade deep learning framework Paddle-Lite. We evaluate Pflow by implementing several networks on two representative FPGA platforms. Experimental results demonstrate that Pflow achieves energy efficiencies of 46.5 GOPS/W on the Xilinx Zynq UltraScale+ MPSoC 3EG and 59.4 GOPS/W on the Virtex UltraScale+ XCVU13P. It also reaches a throughput of up to 255.7 GOPS on the former and 3.686 TOPS on the latter.
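The abstract mentions operator fusion without describing how Pflow performs it. As a generic illustration only, and not the authors' operator-perception method, the sketch below folds a batch-normalization step into the preceding linear (1x1 convolution-like) layer, a common form of fusion in CNN inference. All function names and numeric values here are hypothetical and chosen for the example.

```python
import math

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel batch-norm parameters into the preceding layer.

    weights holds one list of input weights per output channel.
    Returns (fused_weights, fused_bias) so that a single layer reproduces
    layer + batch-norm. Hypothetical helper, not part of Pflow or Paddle-Lite.
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([w * scale for w in w_c])
        fused_b.append((b_c - m) * scale + bt)
    return fused_w, fused_b

def linear(x, weights, bias):
    """Per-output-channel dot product plus bias (a 1x1 convolution at one pixel)."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b_c
            for w_c, b_c in zip(weights, bias)]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]

# Illustrative values only (3 input channels, 2 output channels).
x = [1.0, -2.0, 0.5]
weights = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.3]
mean, var = [0.2, 0.1], [0.9, 1.6]

# Two-step reference path versus the single fused layer.
reference = batchnorm(linear(x, weights, bias), gamma, beta, mean, var)
fused = linear(x, *fold_batchnorm(weights, bias, gamma, beta, mean, var))
```

Because the fused layer computes the same affine function, `reference` and `fused` agree up to floating-point rounding, while the fused path needs one pass over the data instead of two.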
Pages: 14