FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA

Cited by: 19
Authors
Basalama, Suhail [1]
Sohrabizadeh, Atefeh [1]
Wang, Jie [1]
Guo, Licheng [1]
Cong, Jason [1]
Affiliations
[1] Univ Calif Los Angeles, 404 Westwood Blvd Engn,6 Room 468, Los Angeles, CA 90095 USA
Keywords
FPGA; CNN; ONNX; systolic array; transposed convolution; dilated convolution; OpenPose; U-Net; E-Net;
DOI
10.1145/3570928
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations, their efficiency drops dramatically for three reasons: (1) the different dimensions within same-type layers, (2) the different types of convolution layers, especially transposed and dilated convolutions, and (3) the CNN's complex dataflow graph. Furthermore, significant overheads arise when integrating FPGAs into machine learning frameworks. Therefore, we present a flexible, composable architecture called FlexCNN, which delivers high computation efficiency by employing dynamic tiling, layer fusion, and data layout optimizations. Additionally, we implement a novel versatile SA to process normal, transposed, and dilated convolutions efficiently. FlexCNN also uses a fully pipelined software-hardware integration that alleviates the software overheads. Moreover, with an automated compilation flow, FlexCNN takes a CNN in the ONNX representation, performs a design space exploration, and generates an FPGA accelerator. The framework is tested using three complex CNNs: OpenPose, U-Net, and E-Net. The architecture optimizations achieve a 2.3x performance improvement. Compared to a standard SA, the versatile SA achieves close-to-ideal speedups, with up to 15.98x and 13.42x for transposed and dilated convolutions, respectively, with a 6% average area overhead. The pipelined integration leads to a 5x speedup for OpenPose.
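As a rough illustration of why transposed and dilated convolutions can hurt a standard SA, the sketch below counts multiply-accumulate operations (MACs) in one dimension. It assumes the common zero-insertion formulation, in which a transposed convolution is run as a normal convolution over an input padded with inserted zeros, and a dilated convolution is run with a kernel padded with inserted zeros. That formulation, the function names, and the example sizes are assumptions made for illustration; the abstract does not describe how the baseline or the versatile SA is implemented.

```python
def count_macs_transposed_1d(n, k, s):
    """Count (naive, useful) MACs for a 1D transposed convolution
    computed by zero insertion (hypothetical baseline, not FlexCNN's design).

    n: number of real input samples, k: kernel size, s: stride.
    """
    up_len = (n - 1) * s + 1                      # length after inserting s-1 zeros between samples
    is_real = [i % s == 0 for i in range(up_len)]  # True where a real sample sits
    pad = k - 1                                    # full padding on both sides
    padded = [False] * pad + is_real + [False] * pad
    out_len = len(padded) - k + 1                  # equals (n - 1) * s + k
    naive = out_len * k                            # every window multiplies all k weights
    useful = sum(padded[o + j] for o in range(out_len) for j in range(k))
    return naive, useful


def count_macs_dilated_1d(out_len, k, d):
    """Count (naive, useful) MACs for a 1D dilated convolution
    computed with a zero-inserted kernel (hypothetical baseline).

    out_len: number of outputs, k: number of real weights, d: dilation rate.
    """
    eff_k = d * (k - 1) + 1                        # kernel length after inserting d-1 zeros between weights
    tap_is_real = [j % d == 0 for j in range(eff_k)]
    naive = out_len * eff_k                        # every output multiplies all eff_k taps
    useful = out_len * sum(tap_is_real)            # only k taps carry real weights
    return naive, useful


if __name__ == "__main__":
    naive_t, useful_t = count_macs_transposed_1d(n=4, k=3, s=2)
    print("transposed:", naive_t, useful_t, naive_t / useful_t)
    naive_d, useful_d = count_macs_dilated_1d(out_len=10, k=3, d=2)
    print("dilated:", naive_d, useful_d, naive_d / useful_d)
```

Under these assumptions the wasted fraction grows with the stride or dilation rate, and in two dimensions the effect applies along both axes. The numbers produced here are illustrative only and are not meant to reproduce the speedups reported in the abstract.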
Pages: 32