A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

Cited by: 0
Authors:
Chen, Yuechen [1]
Louri, Ahmed [2]
Liu, Shanshan [3]
Lombardi, Fabrizio [4]
Affiliations:
[1] Frostburg State Univ, Dept Comp Sci & Informat Technol, Frostburg, MD 21532 USA
[2] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
[3] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA USA
Keywords: Convolutional neural network; training; sparse matrix compression; memory traffic; load balancing
DOI: 10.1109/TCSI.2024.3430831
CLC classification: TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject classification codes: 0808; 0809
Abstract
Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To deploy sparse training effectively, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros, and are designed to process the compressed matrices directly so that zero computations are skipped. We observe that the compression rate achieved by a given format depends strongly on the sparsity of the matrix. Because the sparsity of the activation, weight, error, and gradient matrices varies throughout sparse training, no single compression format can sustain a high compression rate for the whole training process. Moreover, randomly located zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that seamlessly combines two widely used sparse matrix compression formats with a control algorithm to lower memory traffic during training. Based on this compression technique, a two-level workload balancing technique is then designed to further reduce execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. Cycle-accurate simulation results show that the proposed accelerator reduces execution time by 34% and energy consumption by 24% on average compared to existing sparse training accelerators.
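The dual compression idea in the abstract, keeping each matrix (or tile) in whichever of two standard sparse formats is cheaper for its current sparsity, can be illustrated with a minimal sketch. The abstract does not name the two formats or the control algorithm, so the bitmap and CSR-style encodings, bit widths, tile shape, and size models below are assumptions for illustration only, not the authors' design.

```python
# Illustrative sketch, not the paper's design: per-tile selection between an
# assumed bitmap format and an assumed CSR-style format, whichever is smaller.

def count_nonzeros(tile):
    """Number of nonzero elements in a dense 2-D tile."""
    return sum(1 for row in tile for v in row if v != 0)

def bitmap_size_bits(tile, value_bits=16):
    """Bitmap format: one presence bit per element plus one value per nonzero."""
    rows, cols = len(tile), len(tile[0])
    return rows * cols + count_nonzeros(tile) * value_bits

def csr_size_bits(tile, value_bits=16, index_bits=8):
    """CSR-style format: row pointers plus a (column index, value) pair per nonzero."""
    rows = len(tile)
    nnz = count_nonzeros(tile)
    return (rows + 1) * index_bits + nnz * (index_bits + value_bits)

def choose_format(tile):
    """Control step: keep the tile in whichever format costs fewer bits."""
    sizes = {"bitmap": bitmap_size_bits(tile), "csr": csr_size_bits(tile)}
    fmt = min(sizes, key=sizes.get)
    return fmt, sizes[fmt]

if __name__ == "__main__":
    rows, cols = 8, 32
    # ~50% nonzeros: the bitmap's fixed 1-bit-per-element overhead pays off.
    dense_tile = [[(r + c) % 2 for c in range(cols)] for r in range(rows)]
    # ~5% nonzeros: per-nonzero indexing (CSR-style) becomes cheaper.
    sparse_tile = [[1 if (r * cols + c) % 20 == 0 else 0 for c in range(cols)]
                   for r in range(rows)]
    print(choose_format(dense_tile))   # -> ('bitmap', 2304)
    print(choose_format(sparse_tile))  # -> ('csr', 384)
```

The crossover point between the two encodings depends on tile shape and bit widths; a hardware control algorithm would make this decision per tile or per matrix from the measured sparsity, rather than by materializing both encodings as this sketch does.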
Pages: 4638-4651
Page count: 14
Related Papers
50 records in total
  • [1] SqueezeFlow: A Sparse CNN Accelerator Exploiting Concise Convolution Rules
    Li, Jiajun
    Jiang, Shuhao
    Gong, Shijun
    Wu, Jingya
    Yan, Junchao
    Yan, Guihai
    Li, Xiaowei
    IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (11) : 1663 - 1677
  • [2] An Efficient Sparse CNN Inference Accelerator With Balanced Intra- and Inter-PE Workload
    Guo, Jianbo
    Xu, Tongqing
    Wu, Zhenyang
    Xiao, Hao
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2024,
  • [3] MSCA: A Multi-grained Sparse Convolution Accelerator for DNN Training
    Mao, Yingchang
    Liu, Qiang
    Cheung, Ray C. C.
    2024 IEEE 35TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, ASAP 2024, 2024, : 34 - 35
  • [4] Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM
    Liao, Hui-Hsin
    Lee, Chao-Lin
    Lee, Jenq-Kuen
    Lai, Wei-Chih
    Hung, Ming-Yu
    Huang, Chung-Wen
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOP PROCEEDINGS - ICPP WORKSHOPS '21, 2021,
  • [5] An Efficient FPGA Accelerator Optimized for High Throughput Sparse CNN Inference
    Wen, Jiayu
    Ma, Yufei
    Wang, Zhongfeng
    APCCAS 2020: PROCEEDINGS OF THE 2020 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2020), 2020, : 165 - 168
  • [6] An Efficient CNN Training Accelerator Leveraging Transposable Block Sparsity
    Xu, Mingyang
    Lu, Jinming
    Wang, Zhongfeng
    Lin, Jun
    2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 230 - 233
  • [7] WinTA: An Efficient Reconfigurable CNN Training Accelerator With Decomposition Winograd
    Lu, Jinming
    Wang, Hui
    Lin, Jun
    Wang, Zhongfeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (02) : 634 - 645
  • [8] A Depthwise Separable Convolution Architecture for CNN Accelerator
    Srivastava, Harsh
    Sarawadekar, Kishor
    PROCEEDINGS OF 2020 IEEE APPLIED SIGNAL PROCESSING CONFERENCE (ASPCON 2020), 2020, : 1 - 5
  • [9] An efficient CNN accelerator for pattern-compressed sparse neural networks on FPGA
    Zhang, Yonghua
    Wang, Haojie
    Pan, Zhenhua
    NEUROCOMPUTING, 2025, 611
  • [10] Design of Power-Efficient Training Accelerator for Convolution Neural Networks
    Hong, JiUn
    Arslan, Saad
    Lee, TaeGeon
    Kim, HyungWon
    ELECTRONICS, 2021, 10 (07)