A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

Times Cited: 0
Authors
Chen, Yuechen [1]
Louri, Ahmed [2]
Liu, Shanshan [3]
Lombardi, Fabrizio [4]
Affiliations
[1] Frostburg State Univ, Dept Comp Sci & Informat Technol, Frostburg, MD 21532 USA
[2] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
[3] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA USA
Keywords
Convolutional neural network; training; sparse matrix compression; memory traffic; load balancing
DOI
10.1109/TCSI.2024.3430831
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Code
0808; 0809
Abstract
Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To deploy sparse training effectively, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros, and they process the compressed matrices directly to avoid computing on zeros. We observe that the compression rate of a given format depends strongly on the sparsity of the matrix being compressed. Because the sparsity of the activation, weight, error, and gradient matrices varies throughout sparse training, no single compression format can sustain a high compression rate for the entire training run. Moreover, randomly located zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that seamlessly combines two widely used sparse matrix compression formats with a control algorithm to lower memory traffic during training. Based on this compression technique, a two-level workload balancing technique is then designed to further reduce execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. Cycle-accurate simulation results show that the proposed accelerator reduces execution time by 34% and energy consumption by 24% on average compared to existing sparse training accelerators.
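To make the compression trade-off concrete, the sketch below contrasts two sparse formats at different sparsity levels. The abstract does not name the two formats the dual-compression technique combines, so this minimal sketch assumes a CSR-like encoding and a bitmap encoding with hypothetical 16-bit values and indices; the pick_format helper is an illustrative stand-in for the control algorithm that selects the cheaper format per matrix, not the authors' design.

```python
import numpy as np

# Hypothetical storage widths (the paper's actual bit widths are not
# stated in the abstract).
VALUE_BITS = 16
INDEX_BITS = 16

def csr_size_bits(mat: np.ndarray) -> int:
    """CSR-like cost: a value and a column index per nonzero, plus one
    row pointer per row. Cheapest when the matrix is very sparse."""
    nnz = np.count_nonzero(mat)
    return nnz * (VALUE_BITS + INDEX_BITS) + (mat.shape[0] + 1) * INDEX_BITS

def bitmap_size_bits(mat: np.ndarray) -> int:
    """Bitmap cost: one presence bit per element plus a value per
    nonzero. Cheaper at moderate sparsity."""
    return mat.size + np.count_nonzero(mat) * VALUE_BITS

def pick_format(mat: np.ndarray) -> str:
    """Control step: keep whichever encoding is smaller, since the
    better choice shifts as sparsity drifts during training."""
    return "csr" if csr_size_bits(mat) < bitmap_size_bits(mat) else "bitmap"

# Sparsity changes across layers and epochs, so the winner flips.
rng = np.random.default_rng(0)
for density in (0.03, 0.3, 0.7):
    m = rng.random((64, 64)) * (rng.random((64, 64)) < density)
    print(f"density={density:.2f}: {pick_format(m)}")
```

The two-level workload balancing scheme is likewise only named, not specified, in the abstract. As a rough single-level analogue, a greedy longest-processing-time assignment spreads matrix rows across processing elements by nonzero count, so no PE sits idle while others work through dense rows:

```python
def balance_rows(mat: np.ndarray, num_pes: int = 4) -> list[list[int]]:
    """Greedy LPT sketch: give each row (heaviest first) to the
    currently least-loaded PE, costing a row by its nonzero count."""
    loads = [0] * num_pes
    buckets: list[list[int]] = [[] for _ in range(num_pes)]
    for r in sorted(range(mat.shape[0]),
                    key=lambda r: -np.count_nonzero(mat[r])):
        pe = loads.index(min(loads))
        buckets[pe].append(r)
        loads[pe] += np.count_nonzero(mat[r])
    return buckets
```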
Pages: 4638-4651
Page count: 14
Related Papers
50 records in total
  • [31] A Computationally Efficient Neural Video Compression Accelerator Based on a Sparse CNN-Transformer Hybrid Network
    Zhang, Siyu
    Mao, Wendong
    Shi, Huihong
    Wang, Zhongfeng
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024
  • [32] IMCA: An Efficient In-Memory Convolution Accelerator
    Yantir, Hasan Erdem
    Eltawil, Ahmed M.
    Salama, Khaled N.
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2021, 29 (03) : 447 - 460
  • [33] ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition
    Li, Shiyu
    Hanson, Edward
    Qian, Xuehai
    Li, Hai Helen
    Chen, Yiran
    PROCEEDINGS OF 54TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2021, 2021, : 992 - 1004
  • [34] SPEC2: SPECtral SParsE CNN Accelerator on FPGAs
    Niu, Yue
    Zeng, Hanqing
    Srivastava, Ajitesh
    Lakhotia, Kartik
    Kannan, Rajgopal
    Wang, Yanzhi
    Prasanna, Viktor
    2019 IEEE 26TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS (HIPC), 2019, : 195 - 204
  • [35] An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution
    Liu, Bing
    Zou, Danyin
    Feng, Lei
    Feng, Shou
    Fu, Ping
    Li, Junbao
    ELECTRONICS, 2019, 8 (03)
  • [36] An Area Efficient Superconducting Unary CNN Accelerator
    Gonzalez-Guerrero, Patricia
    Huch, Kylie
    Patra, Nirmalendu
    Popovici, Thom
    Michelogiannakis, George
    2023 24TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN, ISQED, 2023, : 675 - 682
  • [37] Spatula: A Hardware Accelerator for Sparse Matrix Factorization
    Feldmann, Axel
    Sanchez, Daniel
    56TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2023, 2023, : 91 - 104
  • [38] On optimal and balanced sparse matrix partitioning problems
    Grandjean, Anael
    Langguth, Johannes
    Ucar, Bora
    2012 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2012, : 257 - 265
  • [39] SCA: A Secure CNN Accelerator for Both Training and Inference
    Zhao, Lei
    Zhang, Youtao
    Yang, Jun
    PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2020
  • [40] A High Efficient Architecture for Convolution Neural Network Accelerator
    Kong Anmin
    Zhao Bin
    2019 2ND INTERNATIONAL CONFERENCE ON INTELLIGENT AUTONOMOUS SYSTEMS (ICOIAS 2019), 2019, : 131 - 134