Design of Processing-in-Memory With Triple Computational Path and Sparsity Handling for Energy-Efficient DNN Training

Cited by: 1
Authors
Han, Wontak [1 ]
Heo, Jaehoon [1 ]
Kim, Junsoo [1 ]
Lim, Sukbin [1 ]
Kim, Joo-Young [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 34141, South Korea
Keywords
Training; Computational modeling; Computer architecture; Deep learning; Circuits and systems; Power demand; Neurons; Accelerator architecture; machine learning; processing-in-memory architecture; bit-serial operation; inference; training; sparsity handling; SRAM; energy-efficient architecture; DEEP NEURAL-NETWORKS; SRAM; ACCELERATOR; MACRO;
DOI
10.1109/JETCAS.2022.3168852
Chinese Library Classification (CLC): TM (Electrical Engineering); TN (Electronics and Communication Technology)
Discipline code: 0808; 0809
Abstract
As machine learning (ML) and artificial intelligence (AI) have become mainstream technologies, many accelerators have been proposed to cope with their computation kernels. However, these accelerators access external memory frequently because of the large size of deep neural network (DNN) models, suffering from the von Neumann bottleneck. Moreover, as privacy concerns grow, on-device training is emerging as a solution. On-device training is challenging, however, because it must run under a limited power budget while requiring far more computation and memory access than inference. In this paper, we present T-PIM, an energy-efficient processing-in-memory (PIM) architecture that supports end-to-end on-device training. Its macro design includes an 8T-SRAM cell-based PIM block that computes the in-memory AND operation and three computational datapaths for end-to-end training. The three paths integrate arithmetic units for forward propagation, backward propagation, and gradient calculation with weight update, respectively, allowing the weight data stored in memory to remain stationary. T-PIM also supports variable bit precision to cover various ML scenarios: fully variable input bit precision with 2-, 4-, 8-, or 16-bit weight precision for forward propagation, and the same input bit precision with 16-bit weight precision for backward propagation. In addition, T-PIM implements sparsity-handling schemes that skip computation on zero input data and turn off the arithmetic units for zero weight data, reducing both unnecessary computation and leakage power. Finally, we fabricate the T-PIM chip on a 5.04 mm² die in a 28-nm CMOS logic process. It operates at 50-280 MHz with a supply voltage of 0.75-1.05 V, dissipating 5.25-51.23 mW in inference and 6.10-37.75 mW in training. As a result, it achieves 17.90-161.08 TOPS/W energy efficiency for inference with 1-bit activations and 2-bit weights, and 0.84-7.59 TOPS/W for training with 8-bit activations/errors and 16-bit weights. In conclusion, T-PIM is the first PIM chip that supports end-to-end training, demonstrating a 2.02x performance improvement over the latest PIM chip that only partially supports training.
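The bit-serial, AND-based computation with input-sparsity skipping described in the abstract can be illustrated with a short functional sketch. The Python/NumPy snippet below is a minimal, hypothetical model rather than the authors' implementation: weights stay stationary (as in T-PIM's three weight-stationary datapaths), input bit-planes are streamed one at a time, and all-zero bit-planes are skipped to mimic the input-sparsity handling. All function and variable names are illustrative only, and unsigned integers are used for simplicity.

```python
import numpy as np

def bit_serial_pim_mac(inputs, weights, in_bits=8):
    """Bit-serial, AND-style dot product (functional sketch, not the chip's RTL).

    inputs : (N,) unsigned ints < 2**in_bits, streamed one bit-plane per cycle
    weights: (N, M) unsigned ints, held stationary in the (simulated) SRAM array
    Returns the (M,) accumulated results.
    """
    acc = np.zeros(weights.shape[1], dtype=np.int64)
    for b in range(in_bits):
        # Extract the b-th input bit-plane (one bit per input element).
        bit_plane = (inputs >> b) & 1
        # Input-sparsity handling: skip the cycle if no input bit is set.
        if not bit_plane.any():
            continue
        # In-memory AND plus column adder tree: each set input bit selects its
        # weight row; the partial sums are shifted by the bit position and added.
        partial = bit_plane @ weights
        acc += partial.astype(np.int64) << b
    return acc

# Usage: 8-bit activations with 16-bit weights (one of T-PIM's supported modes).
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=64)            # 8-bit unsigned activations
W = rng.integers(0, 1 << 16, size=(64, 4))   # 16-bit unsigned weights, stationary
assert np.array_equal(bit_serial_pim_mac(x, W), x @ W)
```

In the real chip, the AND operation and adder tree are performed inside the SRAM macro, and the weight-side sparsity handling additionally power-gates idle arithmetic units to cut leakage; the sketch only models the functional behavior and the input-side skip.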
Pages: 354-366
Page count: 13
Related Papers
50 records in total
  • [41] Shin, Jaekang; Choi, Seungkyu; Ra, Jongwoo; Kim, Lee-Sup. Algorithm/Architecture Co-Design for Energy-Efficient Acceleration of Multi-Task DNN. Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC 2022), 2022: 253-258.
  • [42] Lee, Jinsu; Kang, Sanghoon; Lee, Jinmook; Shin, Dongjoo; Han, Donghyeon; Yoo, Hoi-Jun. The Hardware and Algorithm Co-Design for Energy-Efficient DNN Processor on Edge/Mobile Devices. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, 67(10): 3458-3470.
  • [43] Nguyen, Duy-Thanh; Chang, Ik-Joon. Energy-efficient DNN-training with Stretchable DRAM Refresh Controller and Critical-bit Protection. 2019 International SoC Design Conference (ISOCC), 2019: 168-169.
  • [44] Zhong, Baiqing; Wang, Mingyu; Zhang, Chuanghao; Mai, Yangzhan; Li, Xiaojie; Yu, Zhiyi. A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference. 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2023: 7-12.
  • [45] Wang, Yuanbo; Chang, Liang; Wang, Jingke; Zhao, Pan; Zeng, Jiahao; Zhao, Xin; Hao, Wuyang; Zhou, Liang; Tan, Haining; Han, Yinhe; Zhou, Jun. PIPECIM: Energy-Efficient Pipelined Computing-in-Memory Computation Engine With Sparsity-Aware Technique. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2025, 33(2): 525-536.
  • [46] Bao, Han; Qin, Yifan; Chen, Jia; Yang, Ling; Li, Jiancong; Zhou, Houji; Li, Yi; Miao, Xiangshui. Quantization and sparsity-aware processing for energy-efficient NVM-based convolutional neural networks. Frontiers in Electronics, 2022, 3.
  • [47] Lu, Yingming; Yang, Zhen; Tao, Yaoyu; Cai, Lei; Zhang, Teng; Yan, Longhao; Huang, Ru; Yang, Yuchao. Energy-Efficient Online Training with In Situ Parallel Computing on Electrochemical Memory Arrays. Advanced Intelligent Systems, 2025.
  • [48] Zhao, Zhao; Wang, Yuan; Zhang, Xinyue; Cui, Xiaoxin; Huang, Ru. An Energy-Efficient Computing-in-Memory Neuromorphic System with On-Chip Training. 2019 IEEE Biomedical Circuits and Systems Conference (BioCAS 2019), 2019.
  • [49] Wang, J.; Zhang, T.; Liu, S.; Liu, Y.; Wu, Y.; Hu, S.; Chen, T.; Liu, Y.; Yang, Y.; Huang, R. Design and Implementation of a Hybrid, ADC/DAC-Free, Input-Sparsity-Aware, Precision Reconfigurable RRAM Processing-in-Memory Chip. IEEE Journal of Solid-State Circuits, 2024, 59(2): 595-604.
  • [50] Dong, Erqiang; Guo, Hengchuan. Collaborative communication and computational design for energy-efficient edge based learning network. EURASIP Journal on Wireless Communications and Networking, 2023, 2023(1).