Maximizing System Performance by Balancing Computation Loads in LSTM Accelerators

Authors
Park, Junki [1]
Kung, Jaeha [1]
Yi, Wooseok [1]
Kim, Jae-Joon [1]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Pohang, South Korea
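Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract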
The LSTM is a popular neural network model for modeling or analyzing time-varying data. The main operation of the LSTM is matrix-vector multiplication, which becomes sparse (spMxV) because of the weight pruning that is widely accepted in deep learning. This paper presents a new sparse matrix format, named CBSR, to maximize the inference speed of the LSTM accelerator. In the CBSR format, speed-up is achieved by balancing the computation loads across the processing elements (PEs). Along with the new format, we present a simple network transformation that completely removes the hardware overhead incurred when using the CBSR format. We also perform a detailed analysis of the impact of the network size and the number of PEs, which is lacking in prior work. The simulation results show a 1.6~38% improvement in system performance compared to the well-known CSC/CSR formats. A power analysis in 65nm CMOS technology shows 9~22% energy savings.
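The load-balancing idea in the abstract can be illustrated with a small sketch. This is not the CBSR format itself, whose layout the abstract does not describe. It only assumes that each PE needs one cycle per nonzero weight, that whole rows are assigned to PEs, and that a spMxV finishes when the most heavily loaded PE finishes. The function names, the greedy heuristic, and the example row counts are all hypothetical.

```python
def pe_loads(row_nnz, assignment, num_pes):
    """Sum the nonzero counts of the rows assigned to each PE."""
    loads = [0] * num_pes
    for row, pe in enumerate(assignment):
        loads[pe] += row_nnz[row]
    return loads


def round_robin(row_nnz, num_pes):
    """Baseline (assumed): row r goes to PE r mod num_pes, ignoring sparsity."""
    return [row % num_pes for row in range(len(row_nnz))]


def greedy_balanced(row_nnz, num_pes):
    """Illustrative heuristic: heaviest rows first, each to the least-loaded PE."""
    loads = [0] * num_pes
    assignment = [0] * len(row_nnz)
    for row in sorted(range(len(row_nnz)), key=lambda r: -row_nnz[r]):
        pe = min(range(num_pes), key=lambda p: loads[p])
        assignment[row] = pe
        loads[pe] += row_nnz[row]
    return assignment


def utilization(loads):
    """Fraction of PE cycles doing useful work; 1.0 means perfectly balanced."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical nonzero counts per row of a pruned weight matrix.
row_nnz = [9, 1, 8, 2, 7, 1, 9, 3]
num_pes = 2

baseline = pe_loads(row_nnz, round_robin(row_nnz, num_pes), num_pes)
balanced = pe_loads(row_nnz, greedy_balanced(row_nnz, num_pes), num_pes)

print("round-robin loads:", baseline, "utilization:", utilization(baseline))
print("balanced loads:   ", balanced, "utilization:", utilization(balanced))
```

Under these assumptions, the latency of one spMxV is `max(loads)`, so lowering the maximum load while keeping the total number of multiplications the same is what produces the speed-up.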
Pages: 7-12
Page count: 6