A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

Cited: 0
Authors
Chen, Yuechen [1 ]
Louri, Ahmed [2 ]
Liu, Shanshan [3 ]
Lombardi, Fabrizio [4 ]
Affiliations
[1] Frostburg State Univ, Dept Comp Sci & Informat Technol, Frostburg, MD 21532 USA
[2] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
[3] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA USA
Keywords
Convolutional neural network; training; sparse matrix compression; memory traffic; load balancing
DOI
10.1109/TCSI.2024.3430831
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To deploy sparse training effectively, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros, and are designed to process the compressed matrices directly to avoid computations on zeros. We have observed that the compression rate of a given format is strongly affected by the sparsity level of the matrix. Given the varying sparsity of the activation, weight, error, and gradient matrices throughout sparse training, no single compression format achieves consistently high compression rates for the entire training process. Moreover, randomly located zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that seamlessly combines two widely used sparse matrix compression formats with a control algorithm to lower memory traffic during training. Based on this compression technique, a two-level workload balancing technique is then designed to further reduce execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. Cycle-accurate simulation results show that the proposed accelerator reduces execution time by 34% and energy consumption by 24% on average compared to existing sparse training accelerators.
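As a minimal illustration of why a dual-format scheme can beat any single format, the Python sketch below compares the storage cost of two widely used sparse encodings, a bitmap and a COO-style index list, and picks the cheaper one per matrix. The abstract does not name the paper's two formats or its control algorithm, so the formats, byte widths, and function names here (bitmap_bytes, index_list_bytes, pick_format) are illustrative assumptions only.

    # Hypothetical sketch of density-driven format selection, in the spirit of
    # the dual-compression idea above; not the paper's actual algorithm.
    import numpy as np

    def bitmap_bytes(mat: np.ndarray, value_bytes: int = 2) -> int:
        """Bitmap format: 1 bit per element marks nonzeros; values stored densely."""
        nnz = int(np.count_nonzero(mat))
        return (mat.size + 7) // 8 + nnz * value_bytes

    def index_list_bytes(mat: np.ndarray, value_bytes: int = 2,
                         index_bytes: int = 4) -> int:
        """COO-style format: one flat index plus one value per nonzero element."""
        nnz = int(np.count_nonzero(mat))
        return nnz * (index_bytes + value_bytes)

    def pick_format(mat: np.ndarray) -> str:
        """Choose whichever format yields less off-chip traffic for this matrix."""
        return "bitmap" if bitmap_bytes(mat) <= index_list_bytes(mat) else "index_list"

    # Moderately sparse activations vs. highly sparse gradients.
    rng = np.random.default_rng(0)
    acts = rng.random((64, 64)) * (rng.random((64, 64)) < 0.5)    # ~50% sparse
    grads = rng.random((64, 64)) * (rng.random((64, 64)) < 0.02)  # ~98% sparse
    print(pick_format(acts), pick_format(grads))                  # bitmap index_list

At moderate sparsity the bitmap's fixed one-bit-per-element cost wins; at high sparsity the per-nonzero index list wins. This crossover is why matrices whose sparsity drifts during training benefit from switching formats rather than committing to one.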
Pages: 4638-4651
Page count: 14
Related Papers
50 records in total
  • [21] Efficient Layer-Wise N:M Sparse CNN Accelerator with Flexible SPEC: Sparse Processing Element Clusters
    Xie, Xiaoru
    Zhu, Mingyu
    Lu, Siyuan
    Wang, Zhongfeng
    MICROMACHINES, 2023, 14 (03)
  • [22] Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
    Liu, Zhi-Gang
    Whatmough, Paul N.
    Mattina, Matthew
    IEEE COMPUTER ARCHITECTURE LETTERS, 2020, 19 (01) : 34 - 37
  • [23] A CNN Hardware Accelerator Using Triangle-based Convolution
    Thomas, Amal K.
    Poddar, Soumyajit
    Mondal, Hemanta Kumar
    ACM JOURNAL ON EMERGING TECHNOLOGIES IN COMPUTING SYSTEMS, 2022, 18 (04)
  • [24] Configurable CNN Accelerator in Speech Processing based on Vector Convolution
    Hui, Lanqing
    Cao, Shan
    Chen, Zhiyong
    Li, Shan
    Xu, Shugong
    2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 146 - 149
  • [25] MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units
    Lee, Sunjung
    Choi, Jaewan
    Jung, Wonkyung
    Kim, Byeongho
    Park, Jaehyun
    Kim, Hweesoo
    Ahn, Jung Ho
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2022, 27 (05)
  • [26] Edge-Side Fine-Grained Sparse CNN Accelerator With Efficient Dynamic Pruning Scheme
    Wu, Bi
    Yu, Tianyang
    Chen, Ke
    Liu, Weiqiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (03) : 1285 - 1298
  • [27] An Efficient Sparse CNNs Accelerator on FPGA
    Zhang, Yonghua
    Jiang, Hongxu
    Li, Xiaobin
    Wang, Haojie
    Dong, Dong
    Cao, Yongxiang
    2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022), 2022, : 504 - 505
  • [28] Efficient Accelerator for Dilated and Transposed Convolution with Decomposition
    Chang, Kuo-Wei
    Chang, Tian-Sheuan
    2020 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2020,
  • [29] Data Stream Oriented Fine-grained Sparse CNN Accelerator with Efficient Unstructured Pruning Strategy
    Yu, Tianyang
    Wu, Bi
    Chen, Ke
    Yan, Chenggang
    Liu, Weiqiang
    PROCEEDINGS OF THE 32ND GREAT LAKES SYMPOSIUM ON VLSI 2022, GLSVLSI 2022, 2022, : 243 - 248
  • [30] A Tiny Accelerator for Mixed-Bit Sparse CNN Based on Efficient Fetch Method of SIMO SPad
    Hu, Xianghong
    Liu, Xuejiao
    Liu, Yu
    Zhang, Haowei
    Huang, Xijie
    Guan, Xihao
    Liang, Luhong
    Tsui, Chi Ying
    Xiong, Xiaoming
    Cheng, Kwang-Ting
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (08) : 3079 - 3083