A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

Cited by: 0
Authors
Chen, Yuechen [1 ]
Louri, Ahmed [2 ]
Liu, Shanshan [3 ]
Lombardi, Fabrizio [4 ]
Affiliations
[1] Frostburg State Univ, Dept Comp Sci & Informat Technol, Frostburg, MD 21532 USA
[2] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
[3] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA USA
Keywords
Convolutional neural network; training; sparse matrix compression; memory traffic; load balancing;
DOI
10.1109/TCSI.2024.3430831
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To deploy sparse training effectively, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros, and are designed to process compressed matrices so that zero computations are avoided. We observe that the compression rate achieved by different formats is strongly affected by matrix sparsity. Given the varying levels of sparsity in the activation, weight, error, and gradient matrices throughout sparse training, it is impractical to achieve consistently high compression rates using a single compression method for the entire duration of training. Moreover, randomly distributed zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that seamlessly combines two widely used sparse matrix compression formats with a control algorithm to lower memory traffic during training. Based on this compression technique, a two-level workload balancing technique is then designed to further reduce execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. Cycle-accurate simulation results show that the proposed accelerator reduces execution time by 34% and energy consumption by 24% on average compared to existing sparse training accelerators.
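The abstract's motivating observation, that no single compression format stays optimal as sparsity changes during training, can be illustrated with a minimal sketch. The two formats below (a presence-bitmap format and a coordinate-list format) and the 16-bit value/index widths are illustrative assumptions, not the paper's actual formats; the point is only that their storage costs cross over at a sparsity threshold, which is what motivates a dual-format scheme with a control algorithm.

```python
# Illustrative sketch (hypothetical formats and bit widths, not the paper's):
# why a single sparse format cannot stay optimal as sparsity varies.
VALUE_BITS = 16  # assumed bits per stored nonzero value
INDEX_BITS = 16  # assumed bits per stored index

def bitmap_size(n_elements, n_nonzeros):
    """Bitmap format: 1 presence bit per element + the nonzero values."""
    return n_elements + n_nonzeros * VALUE_BITS

def coordinate_size(n_elements, n_nonzeros):
    """Coordinate-list format: one index + one value per nonzero."""
    return n_nonzeros * (INDEX_BITS + VALUE_BITS)

def best_format(n_elements, sparsity):
    """Pick the smaller representation for a given fraction of zeros."""
    nnz = round(n_elements * (1.0 - sparsity))
    b = bitmap_size(n_elements, nnz)
    c = coordinate_size(n_elements, nnz)
    return ("bitmap", b) if b <= c else ("coordinate", c)

if __name__ == "__main__":
    n = 4096  # elements in one matrix tile
    for sparsity in (0.5, 0.9, 0.99):
        fmt, bits = best_format(n, sparsity)
        print(f"sparsity={sparsity:.2f}: {fmt} wins at {bits} bits")
```

With these assumed widths the bitmap format wins below roughly 94% sparsity and the coordinate format wins above it, so a matrix whose sparsity drifts across that threshold during training benefits from switching formats, mirroring the dual compression idea described in the abstract.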
Pages: 4638-4651 (14 pages)