A Balanced Sparse Matrix Convolution Accelerator for Efficient CNN Training

Cited by: 0
Authors:
Chen, Yuechen [1]
Louri, Ahmed [2]
Liu, Shanshan [3]
Lombardi, Fabrizio [4]
Affiliations:
[1] Frostburg State Univ, Dept Comp Sci & Informat Technol, Frostburg, MD 21532 USA
[2] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
[3] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
[4] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA USA
Keywords: Convolutional neural network; training; sparse matrix compression; memory traffic; load balancing
DOI: 10.1109/TCSI.2024.3430831
CLC classification: TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject classification codes: 0808; 0809
Abstract
Sparse Convolutional Neural Network (CNN) training is well known to be time-consuming due to significant off-chip memory traffic. To deploy sparse training effectively, existing accelerators store matrices in a compressed format to eliminate memory accesses for zeros, and are designed to process the compressed matrices directly so that zero computations are skipped. We observe that the compression rate achieved by a given format depends strongly on the sparsity of the matrix. Because the sparsity of the activation, weight, error, and gradient matrices varies throughout sparse training, no single compression format can sustain a high compression rate for the whole training process. Moreover, randomly located zeros in the matrices result in irregular computation patterns, further increasing execution time. To address these issues, we propose a balanced sparse matrix convolution accelerator design for efficient CNN training. Specifically, a dual matrix compression technique is developed that seamlessly combines two widely used sparse matrix compression formats with a control algorithm to lower memory traffic during training. Based on this compression technique, a two-level workload balancing technique is then designed to further reduce execution time and energy consumption. Finally, an accelerator is implemented to support the proposed techniques. Cycle-accurate simulation results show that the proposed accelerator reduces execution time by 34% and energy consumption by 24% on average compared to existing sparse training accelerators.
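The dual compression idea in the abstract, keeping each matrix (or tile) in whichever of two standard sparse formats is cheaper for its current sparsity, can be illustrated with a minimal sketch. The abstract does not name the two formats or the control algorithm, so the bitmap and CSR-style encodings, bit widths, tile shape, and size models below are assumptions for illustration only, not the authors' design.

```python
# Illustrative sketch, not the paper's design: per-tile selection between an
# assumed bitmap format and an assumed CSR-style format, whichever is smaller.

def count_nonzeros(tile):
    """Number of nonzero elements in a dense 2-D tile."""
    return sum(1 for row in tile for v in row if v != 0)

def bitmap_size_bits(tile, value_bits=16):
    """Bitmap format: one presence bit per element plus one value per nonzero."""
    rows, cols = len(tile), len(tile[0])
    return rows * cols + count_nonzeros(tile) * value_bits

def csr_size_bits(tile, value_bits=16, index_bits=8):
    """CSR-style format: row pointers plus a (column index, value) pair per nonzero."""
    rows = len(tile)
    nnz = count_nonzeros(tile)
    return (rows + 1) * index_bits + nnz * (index_bits + value_bits)

def choose_format(tile):
    """Control step: keep the tile in whichever format costs fewer bits."""
    sizes = {"bitmap": bitmap_size_bits(tile), "csr": csr_size_bits(tile)}
    fmt = min(sizes, key=sizes.get)
    return fmt, sizes[fmt]

if __name__ == "__main__":
    rows, cols = 8, 32
    # ~50% nonzeros: the bitmap's fixed 1-bit-per-element overhead pays off.
    dense_tile = [[(r + c) % 2 for c in range(cols)] for r in range(rows)]
    # ~5% nonzeros: per-nonzero indexing (CSR-style) becomes cheaper.
    sparse_tile = [[1 if (r * cols + c) % 20 == 0 else 0 for c in range(cols)]
                   for r in range(rows)]
    print(choose_format(dense_tile))   # -> ('bitmap', 2304)
    print(choose_format(sparse_tile))  # -> ('csr', 384)
```

The crossover point between the two encodings depends on tile shape and bit widths; a hardware control algorithm would make this decision per tile or per matrix from the measured sparsity, rather than by materializing both encodings as this sketch does.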
Pages: 4638-4651
Page count: 14
Related Papers
50 records in total
  • [1] SqueezeFlow: A Sparse CNN Accelerator Exploiting Concise Convolution Rules
    Li, Jiajun
    Jiang, Shuhao
    Gong, Shijun
    Wu, Jingya
    Yan, Junchao
    Yan, Guihai
    Li, Xiaowei
    IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (11) : 1663 - 1677
  • [2] An Efficient Sparse CNN Inference Accelerator With Balanced Intra- and Inter-PE Workload
    Guo, Jianbo
    Xu, Tongqing
    Wu, Zhenyang
    Xiao, Hao
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2024,
  • [3] MSCA: A Multi-grained Sparse Convolution Accelerator for DNN Training
    Mao, Yingchang
    Liu, Qiang
    Cheung, Ray C. C.
    2024 IEEE 35TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, ASAP 2024, 2024, : 34 - 35
  • [4] Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM
    Liao, Hui-Hsin
    Lee, Chao-Lin
    Lee, Jenq-Kuen
    Lai, Wei-Chih
    Hung, Ming-Yu
    Huang, Chung-Wen
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOP PROCEEDINGS - ICPP WORKSHOPS '21, 2021,
  • [5] An Efficient FPGA Accelerator Optimized for High Throughput Sparse CNN Inference
    Wen, Jiayu
    Ma, Yufei
    Wang, Zhongfeng
    APCCAS 2020: PROCEEDINGS OF THE 2020 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS 2020), 2020, : 165 - 168
  • [6] An Efficient CNN Training Accelerator Leveraging Transposable Block Sparsity
    Xu, Mingyang
    Lu, Jinming
    Wang, Zhongfeng
    Lin, Jun
    2022 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2022): INTELLIGENT TECHNOLOGY IN THE POST-PANDEMIC ERA, 2022, : 230 - 233
  • [7] WinTA: An Efficient Reconfigurable CNN Training Accelerator With Decomposition Winograd
    Lu, Jinming
    Wang, Hui
    Lin, Jun
    Wang, Zhongfeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (02) : 634 - 645
  • [8] A Depthwise Separable Convolution Architecture for CNN Accelerator
    Srivastava, Harsh
    Sarawadekar, Kishor
    PROCEEDINGS OF 2020 IEEE APPLIED SIGNAL PROCESSING CONFERENCE (ASPCON 2020), 2020, : 1 - 5
  • [9] An efficient CNN accelerator for pattern-compressed sparse neural networks on FPGA
    Zhang, Yonghua
    Wang, Haojie
    Pan, Zhenhua
    NEUROCOMPUTING, 2025, 611
  • [10] Design of Power-Efficient Training Accelerator for Convolution Neural Networks
    Hong, JiUn
    Arslan, Saad
    Lee, TaeGeon
    Kim, HyungWon
    ELECTRONICS, 2021, 10 (07)