Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Cited by: 8
Authors
Eckert, Charles [1]
Wang, Xiaowei [1]
Wang, Jingcheng [2]
Subramaniyan, Arun [1]
Iyer, Ravi [3]
Sylvester, Dennis [4]
Blaauw, David [5]
Das, Reetuparna [1]
Affiliations
[1] Univ Michigan, Dept Comp Sci & Engn, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[3] Intel Corp, Santa Clara, CA 95051 USA
[4] Univ Michigan, Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
10.1109/MM.2019.2908101
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
This article presents the Neural Cache architecture, which repurposes cache structures to transform them into massively parallel compute units capable of running inferences for deep neural networks. Techniques to perform in situ arithmetic in SRAM arrays, to create efficient data mappings, and to reduce data movement are proposed. Neural Cache is capable of fully executing convolutional, fully connected, and pooling layers in cache. Our experimental results show that the proposed architecture can improve efficiency over a GPU by 128× while requiring a minimal area overhead of 2%.
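To make the bit-serial, in-SRAM computation style mentioned in the abstract concrete, the following Python sketch models one such operation: operands are stored transposed, with each operand occupying one bitline (column) and its bits spread across wordlines (rows), so a single bit position of every column is processed per step and all columns advance in parallel. The function names (bit_serial_add, to_transposed_bits), the 8-bit operand width, and the NumPy modeling are illustrative assumptions made for this record, not code or parameters from the paper.

# Illustrative software model (an assumption, not the paper's hardware): bit-serial
# addition over operands stored "transposed" in an SRAM array. Each operand
# occupies one column; row i holds bit i (row 0 = LSB). One bit position of
# every column is processed per step, so an n-bit add takes n steps regardless
# of how many columns are computed in parallel.
import numpy as np

def bit_serial_add(a_bits, b_bits):
    """a_bits, b_bits: 0/1 arrays of shape (n_bits, n_columns), row 0 = LSB.
    Returns sum bits of shape (n_bits + 1, n_columns)."""
    n_bits, n_cols = a_bits.shape
    carry = np.zeros(n_cols, dtype=np.uint8)             # one carry latch per column
    out = np.zeros((n_bits + 1, n_cols), dtype=np.uint8)
    for i in range(n_bits):                              # one "cycle" per bit position
        a, b = a_bits[i], b_bits[i]
        out[i] = a ^ b ^ carry                           # sum bit for this position
        carry = (a & b) | (carry & (a ^ b))              # carry out of this position
    out[n_bits] = carry                                  # final carry-out
    return out

def to_transposed_bits(values, n_bits=8):
    """Lay integers out column-wise with bits along rows (LSB first)."""
    v = np.asarray(values, dtype=np.uint32)
    return ((v[None, :] >> np.arange(n_bits)[:, None]) & 1).astype(np.uint8)

# Usage: four additions, one per column, performed bit-serially in parallel.
s = bit_serial_add(to_transposed_bits([3, 100, 255, 42]),
                   to_transposed_bits([5, 27, 1, 200]))
print((s * (1 << np.arange(s.shape[0]))[:, None]).sum(axis=0))   # [  8 127 256 242]

The point of the model is that the step count grows with the operand bit width, not with the number of columns, which is what allows the many bitlines of cache SRAM arrays to act as a massively parallel compute unit.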
Pages: 11 - 19
Number of pages: 9
Related papers
50 records in total
  • [31] Cache-locality Based Adaptive Warp Scheduling for Neural Network Acceleration on GPGPUs
    Hu, Weiming
    Zhou, Yi
    Quan, Ying
    Wang, Yuanfeng
    Lou, Xin
    2022 IEEE 35TH INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (IEEE SOCC 2022), 2022, : 190 - 195
  • [32] GENES-IV - A BIT-SERIAL PROCESSING ELEMENT FOR A MULTIMODEL NEURAL-NETWORK ACCELERATOR
    IENNE, P
    VIREDAZ, MA
    JOURNAL OF VLSI SIGNAL PROCESSING, 1995, 9 (03): 257 - 273
  • [33] Neural Language Modeling With Implicit Cache Pointers
    Li, Ke
    Povey, Daniel
    Khudanpur, Sanjeev
    INTERSPEECH 2020, 2020, : 3625 - 3629
  • [34] Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks
    Magalhaes, Bruno R. C.
    Sterling, Thomas
    Hines, Michael
    Schurmann, Felix
    COMPUTATIONAL SCIENCE - ICCS 2019, PT III, 2019, 11538 : 421 - 434
  • [35] Cache Management in Information-Centric Networks using Convolutional Neural Network
    Chiu, Kelvin H. T.
    Zhang, Jun
    Bensaou, Brahim
    2020 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2020,
  • [36] Cache Compression with Golomb-Rice Code and Quantization for Convolutional Neural Networks
    Bae, Seung-Hwan
    Lee, Hyuk-Jae
    Kim, Hyun
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [37] Bit-serial convolution with prediction threshold for convolutional neural networks
    Hsiao, Jen-Hao
    Chin, Wen-Long
    Wu, Yu-Feng
    Chang, Deng-Kai
    Journal of the Chinese Institute of Engineers, Transactions of the Chinese Institute of Engineers, Series A, 2022, 45 (03): 266 - 272
  • [38] Bit-serial convolution with prediction threshold for convolutional neural networks
    Hsiao, Jen-Hao
    Chin, Wen-Long
    Wu, Yu-Feng
    Chang, Deng-Kai
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2022, 45 (03) : 266 - 272
  • [39] GENES IV: a bit-serial processing element for a multi-model neural-network accelerator
    Swiss Federal Inst of Technology, Lausanne, Switzerland
    J VLSI Signal Process, 1995, 9 (03): 257 - 273
  • [40] Acceleration of Deep Recurrent Neural Networks with an FPGA cluster
    Sun, Yuxi
    Ben Ahmed, Akram
    Amano, Hideharu
    PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON HIGHLY EFFICIENT ACCELERATORS AND RECONFIGURABLE TECHNOLOGIES (HEART), 2019,