Performance-Driven LSTM Accelerator Hardware Using Split-Matrix-Based MVM

Cited by: 1
Authors
Joseph, Tresa [1]
Bindiya, T. S. [1]
Affiliation
[1] Natl Inst Technol Calicut, Dept Elect & Commun Engn, Kattangal 673601, Kerala, India
Keywords
Recurrent neural network; Long short-term memory; Systolic array architecture; Parallel computing
DOI
10.1007/s00034-023-02412-4
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject classification codes
0808; 0809
Abstract
This paper proposes a new hardware approach for accelerating matrix-vector multiplication (MVM) that employs a systolic array architecture and parallel data processing units, which is particularly useful in multiplication-intensive applications such as neural networks. The hardware complexity of the parallel computations is reduced by a split-matrix approach, in which larger matrices are split into smaller matrices. The proposed architecture uses an 8-bit fixed-point representation and treats the matrices as circulant. The resulting MVM architecture benefits from reduced implementation complexity in terms of cell area, delay, and power consumption. It achieves a 13.9% reduction in logic cell area and a 38.15% reduction in total power consumption compared with the latest baseline design, and it reaches a considerably improved minimum permissible clock period of 0.410 ns. A long short-term memory (LSTM) architecture built on the proposed design further demonstrates the effectiveness of the proposed MVM architecture. The LSTM developed using the proposed MVM provides a 37.57% reduction in cell area and a 22.86% reduction in total power in comparison with the latest baseline design, and it achieves a minimum clock period of 0.42 ns.
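The abstract does not give implementation details, so the following is only a minimal software sketch of the general idea: a large weight matrix split into smaller circulant blocks, each multiplied by its slice of the input vector using 8-bit fixed-point values. The block size, the Q3.4 number format, and the function names `to_fixed8`, `circulant_block_mvm`, and `split_matrix_mvm` are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only. Block size, Q-format, and all function names
# are assumptions for this example, not details from the paper.

FRAC_BITS = 4  # assumed Q3.4 format: 1 sign bit, 3 integer bits, 4 fractional bits


def to_fixed8(x):
    """Quantize a real number to a signed 8-bit fixed-point integer, saturating."""
    q = round(x * (1 << FRAC_BITS))
    return max(-128, min(127, q))


def circulant_block_mvm(first_row, vec):
    """Multiply a circulant block, stored as its first row only, by a vector.

    Row i of the block is the first row rotated right by i positions,
    so element (i, j) equals first_row[(j - i) % n].
    """
    n = len(first_row)
    return [sum(first_row[(j - i) % n] * vec[j] for j in range(n))
            for i in range(n)]


def split_matrix_mvm(blocks, vec, block_size):
    """Compute y = W x, where W is split into square circulant blocks.

    blocks[r][c] holds the first row of the block at block-row r and
    block-column c. The block products are independent of one another,
    which is what would allow parallel hardware units to compute them.
    """
    out = []
    for block_row in blocks:
        acc = [0] * block_size
        for c, first_row in enumerate(block_row):
            sub = vec[c * block_size:(c + 1) * block_size]
            partial = circulant_block_mvm(first_row, sub)
            acc = [a + p for a, p in zip(acc, partial)]
        out.extend(acc)
    return out
```

Storing only the first row of each circulant block reduces the storage per block from n × n values to n values, which is one plausible reason such a structure lowers hardware cost; the accumulators here are ordinary Python integers, whereas a real design would need to fix an accumulator width.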
Pages: 6660-6683
Page count: 24