Performance-Driven LSTM Accelerator Hardware Using Split-Matrix-Based MVM

Cited by: 1

Authors:

Joseph, Tresa [1]

Bindiya, T. S. [1]

Affiliations:

[1] Natl Inst Technol Calicut, Dept Elect & Commun Engn, Kattangal 673601, Kerala, India

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2023 / Vol. 42 / Issue 11

Keywords:

Recurrent neural network; Long short-term memory; Systolic array architecture; Parallel computing; RECURRENT NEURAL-NETWORKS;

DOI:

10.1007/s00034-023-02412-4

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808 ; 0809 ;

Abstract:

This paper proposes a new hardware approach for accelerating matrix-vector multiplication (MVM) that employs a systolic array architecture and parallel data processing units, which is particularly useful in multiplication-intensive applications such as neural networks. The hardware complexity of the parallel computations is reduced by a technique called the split-matrix approach, in which larger matrices are split into smaller matrices. The proposed architecture uses an 8-bit fixed-point representation and treats the matrices as circulant. The resulting MVM architecture benefits from reduced implementation complexity in terms of cell area, delay, and power consumption. It achieves a 13.9% reduction in logic cell area and a 38.15% reduction in total power consumption compared with the latest baseline design. The proposed architecture also achieves a considerably improved minimum permissible clock period of 0.410 ns. A long short-term memory (LSTM) architecture built on the proposed design further demonstrates the effectiveness of the proposed MVM architecture: it provides a 37.57% reduction in cell area and a 22.86% reduction in total power compared with the latest baseline design and achieves a minimum clock period of 0.42 ns.
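The abstract does not spell out how the matrices are split, so the following is only a software sketch of the general idea under stated assumptions: the large weight matrix is taken to be a grid of small square circulant blocks, each stored as its first column, and the operands are small signed integers standing in for 8-bit fixed-point values. The function names, block layout, and example values are hypothetical and are not taken from the paper.

```python
def circulant_mvm(first_col, x):
    """Multiply the circulant matrix defined by first_col with vector x."""
    n = len(first_col)
    # Entry (i, j) of a circulant matrix equals first_col[(i - j) % n].
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def split_matrix_mvm(blocks, x, b):
    """MVM for a matrix stored as a grid of b-by-b circulant blocks.

    blocks[r][c] is the first column of the block at block-row r, block-column c
    (an assumed storage layout, chosen for illustration).
    """
    y = []
    for block_row in blocks:
        acc = [0] * b  # accumulator for this slice of the output vector
        for c, first_col in enumerate(block_row):
            partial = circulant_mvm(first_col, x[c * b:(c + 1) * b])
            acc = [a + p for a, p in zip(acc, partial)]
        y.extend(acc)
    return y


def to_dense(blocks, b):
    """Expand the block description into a full matrix (used only for checking)."""
    rows = []
    for block_row in blocks:
        for i in range(b):
            rows.append([fc[(i - j) % b] for fc in block_row for j in range(b)])
    return rows


# Hypothetical 4x4 example: a 2x2 grid of 2x2 circulant blocks, values within int8 range.
blocks = [[[1, 2], [3, -4]],
          [[5, 6], [-7, 8]]]
x = [1, -2, 3, 4]

dense = to_dense(blocks, 2)
reference = [sum(w * v for w, v in zip(row, x)) for row in dense]
result = split_matrix_mvm(blocks, x, 2)
print(result == reference)
```

Each small block product is independent of the others, which is the property a parallel or systolic implementation could exploit; storing only the first column of each circulant block also reduces the number of weights from b*b to b per block.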


Pages: 6660-6683

Page count: 24

