A High-Level Modeling Framework for Estimating Hardware Metrics of CNN Accelerators

Cited by: 9
Authors
Juracy, Leonardo Rezende [1]
Moreira, Matheus Trevisan [2]
Amory, Alexandre de Morais [3]
Hampel, Alexandre F. [1]
Moraes, Fernando Gehm [1]
Affiliations
[1] Pontifical Catholic Univ Rio Grande Sul PUCRS, Sch Technol, BR-90619900 Porto Alegre, RS, Brazil
[2] Chronos Tech, San Diego, CA 92122 USA
[3] TeCIP Inst, Scuola Super SantAnna, I-56124 Pisa, Italy
Keywords
Convolutional neural networks; Space exploration; Estimation; Computer architecture; Training; Hardware acceleration; Convolution; CNN; convolution hardware accelerator; system simulator; PPA; design space exploration
DOI
10.1109/TCSI.2021.3104644
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
GPUs became the reference platform for both the training and inference phases of Convolutional Neural Networks (CNNs) because their architecture is tailored to CNN operators. However, GPUs are power-hungry architectures. One path to deploying CNNs in energy-constrained devices is to adopt hardware accelerators for the inference phase. Design space exploration of CNNs using standard approaches, such as RTL, is limited by their complexity. Designers therefore need frameworks that enable design space exploration and deliver accurate hardware estimation metrics for deploying CNNs. This work proposes a framework to explore the CNN design space, providing power, performance, and area (PPA) estimations. The heart of the framework is a system simulator. The system simulator front-end is TensorFlow, and the back-end consists of performance estimations obtained from the physical synthesis of hardware accelerators, rather than only from components such as multipliers and adders. The first set of results evaluates the CNN accuracy using integer quantization, the accelerators' PPA after physical synthesis, and the benefits of using a system simulator. These results allow a rich design space exploration, enabling the selection of the best set of CNN parameters to meet the design constraints.
Pages: 4783-4795
Number of pages: 13