Laius: An 8-bit Fixed-point CNN Hardware Inference Engine

Cited by: 34
Authors
Li, Zhisheng [1]
Wang, Lei [1]
Guo, Shasha [1]
Deng, Yu [1]
Dou, Qiang [1]
Zhou, Haifang [1]
Lu, Wenyuan [2]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha, Hunan, Peoples R China
[2] Xian Satellite Monitoring & Control Ctr, Xian, Shaanxi, Peoples R China
Keywords
CNN accelerator; FPGA; LeNet; Inference; Implementation
DOI
10.1109/ISPA/IUCC.2017.00030
CLC number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Convolutional Neural Networks (CNNs) are among the most effective neural network models for many classification tasks, such as voice recognition, computer vision, and biological information processing. Unfortunately, CNN computation is both memory-intensive and computation-intensive, which poses a major challenge for the design of hardware accelerators. Industry and academia have designed a large number of hardware accelerators for CNN inference. Most of these engines are based on 32-bit floating-point matrix multiplication, whose data precision is over-provisioned for inference and whose hardware cost is too high. In this paper, an 8-bit fixed-point LeNet inference engine (Laius) is designed and implemented on an FPGA. To reduce FPGA resource consumption, we propose a methodology to find the optimal bit-length for the weights and biases in LeNet, which results in 8-bit fixed-point arithmetic for most of the computation and 16-bit fixed-point arithmetic for the rest. A PE (Processing Element) design is proposed, and pipelining and PE tiling techniques are used to improve the performance of the inference engine. Through theoretical analysis, we conclude that DSP blocks are the most critical FPGA resource and must be used carefully during the design process. We implement the inference engine on a Xilinx 485T FPGA. Experimental results show that the designed LeNet inference engine achieves a throughput of 44.9 GOPS with 8-bit fixed-point operations after pipelining. Moreover, with only a 1% loss of accuracy, the 8-bit fixed-point engine reduces latency by 31.43%, LUT consumption by 87.01%, BRAM consumption by 66.50%, DSP consumption by 65.11%, and power by 47.95% compared to a 32-bit fixed-point inference engine with the same structure.
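The abstract does not say how the bit-length search works internally. The following is a minimal sketch under several stated assumptions. It assumes a signed fixed-point format that uses round-to-nearest and saturation. It also uses the maximum absolute quantization error against a tolerance as a hypothetical stand-in for the accuracy check. The paper itself judges bit-lengths by LeNet classification accuracy, which is not reproduced here. The function names `quantize_fixed`, `frac_bits_for`, and `min_bits`, as well as the example weights, are hypothetical illustrations and do not come from the paper.

```python
import math

import numpy as np


def quantize_fixed(x, total_bits, frac_bits):
    """Quantize x to signed fixed point: `total_bits` bits, `frac_bits` of them fractional.

    Rounds to the nearest step (NumPy rounds exact halves to even) and
    saturates values outside the representable range.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) * scale), qmin, qmax)
    return q / scale


def frac_bits_for(x, total_bits):
    """Choose fractional bits so the largest |x| fits beside the sign and integer bits."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return total_bits - 1  # all zeros: every non-sign bit can be fractional
    int_bits = math.floor(math.log2(max_abs)) + 1  # negative for small magnitudes
    return total_bits - 1 - int_bits


def min_bits(x, tol, candidates=range(2, 33)):
    """Return the smallest (total_bits, frac_bits) whose max abs error is <= tol.

    Hypothetical proxy criterion: the paper evaluates LeNet accuracy instead.
    """
    for bits in candidates:
        frac = frac_bits_for(x, bits)
        err = float(np.max(np.abs(quantize_fixed(x, bits, frac) - x)))
        if err <= tol:
            return bits, frac
    raise ValueError("no candidate bit-length meets the tolerance")


# Usage with hypothetical conv-layer weights (shape and scale are assumptions).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(6, 5, 5))
bits, frac = min_bits(weights, tol=2 ** -8)
print(bits, frac)
```

A real flow would score each candidate width by re-running inference and comparing classification accuracy, as the abstract implies. The error tolerance here only keeps the example self-contained, and this sketch does not derive the 8-bit/16-bit split reported above.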
Pages: 143-150
Number of pages: 8
Related papers (50 in total)
  • [21] Strong 8-bit Sboxes with efficient masking in hardware extended version
    Boss E.
    Grosso V.
    Güneysu T.
    Leander G.
    Moradi A.
    Schneider T.
    Journal of Cryptographic Engineering, 2017, 7 (2) : 149 - 165
  • [22] Classifying 8-bit to 8-bit S-boxes based on power mappings from the point of DDT and LAT distributions
    Aslan, Bora
    Sakalli, M. Tolga
    Bulus, Ercan
    ARITHMETIC OF FINITE FIELDS, PROCEEDINGS, 2008, 5130 : 123 - +
  • [23] Fixed-point multiplication: A probabilistic bit-pattern view
    Ahmadi, A.
    Zwolinski, M.
    MICROELECTRONICS RELIABILITY, 2011, 51 (04) : 790 - 796
  • [24] Fixed-point DSP implementation for a low bit rate vocoder
    Yao, Fengying
    Li, Bizhou
    Zhang, Min
    International Conference on Solid-State and Integrated Circuit Technology Proceedings, 1998 : 365 - 368
  • [25] A fixed-point DSP implementation for a low bit rate vocoder
    Yao, FY
    Li, BZ
    Zhang, M
    1998 5TH INTERNATIONAL CONFERENCE ON SOLID-STATE AND INTEGRATED CIRCUIT TECHNOLOGY PROCEEDINGS, 1998 : 365 - 368
  • [26] $5.25 buys a 16-bit fixed-point DSP
    Unknown
    COMPUTER DESIGN, 1996, 35 (09) : 112 - 112
  • [27] A SystemC profiling framework to improve fixed-point hardware utilization
    Linhares, Alisson
    Rusa, Henrique
    Formiga, Daniel
    Azevedo, Rodolfo
    33RD SYMPOSIUM ON INTEGRATED CIRCUITS AND SYSTEMS DESIGN (SBCCI 2020), 2020
  • [28] Optimize Hardware with Fixed-Point Variable Length Phase Factors
    Schmuland, Todd E.
    Jamali, Mohsin M.
    Longbrake, Matthew B.
    Buxa, Peter E.
    2012 IEEE 10TH INTERNATIONAL NEW CIRCUITS AND SYSTEMS CONFERENCE (NEWCAS), 2012 : 113 - 116
  • [29] Modification of Theoretical Fixed-point LMS Algorithm for Implementation in Hardware
    Hu, Zheng-wei
    Xie, Zhi-yuan
    PROCEEDINGS OF THE SECOND INTERNATIONAL SYMPOSIUM ON ELECTRONIC COMMERCE AND SECURITY, VOL II, 2009 : 174 - 178
  • [30] AN OSCILLOSCOPIC POINT-PLOTTER INTERFACE FOR 8-BIT MICROCOMPUTERS
    FINLEY, GP
    DILOLLO, V
    BEHAVIOR RESEARCH METHODS & INSTRUMENTATION, 1981, 13 (01) : 51 - 54