DualPIM: A Dual-Precision and Low-Power CNN Inference Engine Using SRAM- and eDRAM-based Processing-in-Memory Arrays

Cited by: 1
Authors
Jung, Sangwoo [1 ]
Lee, Jaehyun [1 ]
Noh, Huiseong [1 ]
Yoon, Jong-Hyeok [1 ]
Kung, Jaeha [1 ]
Affiliations
[1] DGIST, Dept EECS, Daegu, South Korea
Funding
National Research Foundation, Singapore;
Keywords
convolutional neural networks; deep learning; processing-in-memory; quantized neural networks;
DOI
10.1109/AICAS54282.2022.9869905
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, the machine learning community has focused on developing deep learning models that are not only accurate but also efficient enough to deploy on resource-limited devices. One popular approach to improving model efficiency is to aggressively quantize both features and weight parameters. However, quantization generally entails accuracy degradation; thus, additional compensation techniques are required. In this work, we present a novel network architecture, named DualNet, that leverages two separate bit-precision paths to effectively achieve high accuracy and low model complexity. On top of this new network architecture, we propose to utilize both SRAM- and eDRAM-based processing-in-memory (PIM) arrays, named DualPIM, to run each computing path in a DualNet on a dedicated PIM array. As a result, the proposed DualNet reduces energy consumption by 81% on average compared to other quantized neural networks (i.e., 4-bit and ternary), while achieving 13% higher accuracy on average.
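The dual-precision idea in the abstract, running a higher-precision quantized path alongside an aggressively quantized (ternary) path and combining their results, can be sketched in software as follows. This is a minimal illustrative sketch only: the function names, the 4-bit/ternary pairing, and the way partial sums are merged are assumptions for clarity, not the paper's actual DualNet design.

```python
# Hypothetical sketch of a dual-precision computing path. One path uses
# higher-precision (e.g., 4-bit) quantized values; the other uses ternary
# values. In a DualPIM-like engine each path could map to a dedicated PIM
# array (SRAM vs. eDRAM); here both are emulated in plain Python.

def quantize_uniform(x, bits=4):
    """Uniform symmetric quantization of a value in [-1, 1]."""
    levels = 2 ** (bits - 1) - 1            # 7 positive levels for 4-bit
    q = max(-levels, min(levels, round(x * levels)))
    return q / levels                        # dequantized representative

def quantize_ternary(x, threshold=0.3):
    """Ternary quantization: map a value to {-1, 0, +1}."""
    if x > threshold:
        return 1.0
    if x < -threshold:
        return -1.0
    return 0.0

def dual_path_dot(features, weights, hi_bits=4):
    """Dot product computed on two precision paths, then merged.

    The 50/50 merge of the two partial sums is an illustrative choice,
    not the combination rule used in the paper.
    """
    hi = sum(quantize_uniform(f, hi_bits) * quantize_uniform(w, hi_bits)
             for f, w in zip(features, weights))
    lo = sum(quantize_ternary(f) * quantize_ternary(w)
             for f, w in zip(features, weights))
    return 0.5 * (hi + lo)
```

The point of the sketch is only that the two paths operate on the same inputs at different precisions and contribute independent partial sums, which is what allows each path to be serviced by a separate memory array.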
Pages: 70-73
Page count: 4
Related Papers
1 record
  • [1] A Dual-Precision and Low-Power CNN Inference Engine Using a Heterogeneous Processing-in-Memory Architecture
    Jung, Sangwoo
    Lee, Jaehyun
    Park, Dahoon
    Lee, Youngjoo
    Yoon, Jong-Hyeok
    Kung, Jaeha
IEEE Transactions on Circuits and Systems I: Regular Papers, 2024: 1-14