Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Cited by: 8
Authors
Eckert, Charles [1 ]
Wang, Xiaowei [1 ]
Wang, Jingcheng [2 ]
Subramaniyan, Arun [1 ]
Iyer, Ravi [3 ]
Sylvester, Dennis [4 ]
Blaauw, David [5 ]
Das, Reetuparna [1 ]
Affiliations
[1] Univ Michigan, Dept Comp Sci & Engn, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[3] Intel Corp, Santa Clara, CA 95051 USA
[4] Univ Michigan, Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
10.1109/MM.2019.2908101
CLC Number (Chinese Library Classification)
TP3 [computing technology, computer technology];
Subject Classification Code
0812 ;
Abstract
This article presents the Neural Cache architecture, which repurposes cache structures into massively parallel compute units capable of running inference for deep neural networks. Techniques for in situ arithmetic in SRAM arrays, efficient data mapping, and reduced data movement are proposed. The Neural Cache architecture can fully execute convolutional, fully connected, and pooling layers in cache. Our experimental results show that the proposed architecture can improve efficiency over a GPU by 128x while requiring a minimal area overhead of 2%.
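The bit-serial compute model the abstract refers to can be illustrated in software. Operands are stored transposed, so that bit i of every word sits on the same wordline; one "cycle" then processes a single bit position across all bitlines in parallel using bitwise logic, with a carry vector held between cycles. The sketch below is a hedged simulation of this idea, not the paper's implementation; the function name and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def bit_serial_add(a, b, bits=8):
    """Simulate transposed, bit-serial addition in the style of
    in-SRAM compute: each 'cycle' handles one bit position across
    all words in parallel, via bitwise logic plus a carry vector.
    (Illustrative sketch only, not the paper's circuit.)"""
    # Transpose operands: row i holds bit i of every word (LSB first).
    a_bits = (a[None, :] >> np.arange(bits)[:, None]) & 1
    b_bits = (b[None, :] >> np.arange(bits)[:, None]) & 1
    carry = np.zeros_like(a)
    out = np.zeros_like(a)
    for i in range(bits):  # one cycle per bit position
        s = a_bits[i] ^ b_bits[i] ^ carry                          # sum bit
        carry = (a_bits[i] & b_bits[i]) | (carry & (a_bits[i] ^ b_bits[i]))
        out |= s.astype(a.dtype) << i                              # place bit i
    return out  # result wraps modulo 2**bits, as the final carry is dropped
```

Note that the loop count depends only on operand bit-width, not on the number of elements: with thousands of bitlines active at once, this is the source of the massive parallelism the abstract claims.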
Pages: 11-19
Page count: 9
Related Papers
50 records
  • [21] Data-Pattern-Driven LUT for Efficient In-Cache Computing in CNNs Acceleration
    Fei, Zhengpan
    Lyu, Mingchuan
    Kawakami, Satoshi
    Inoue, Koji
    IEEE COMPUTER ARCHITECTURE LETTERS, 2025, 24 (01) : 81 - 84
  • [22] A Dedicated Bit-serial Hardware Neuron for Massively-Parallel Neural Networks in Fast Epilepsy Diagnosis
    Kueh, Si Mon
    Kazmierski, Tom
    2017 IEEE-NIH HEALTHCARE INNOVATIONS AND POINT OF CARE TECHNOLOGIES (HI-POCT), 2017, : 105 - 108
  • [23] Bit-Serial multiplier based Neural Processing Element with Approximate adder tree
    Jo, Cheolwon
    Lee, KwangYeob
    2020 17TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC 2020), 2020, : 286 - 287
  • [24] A Comparison of Bit-Parallel and Bit-Serial Architectures for WDM Networks
    Krishna M. Sivalingam
    Photonic Network Communication, 1999, 1 : 89 - 103
  • [25] Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks
    Kim, Hyunjoon
    Yoo, Taegeun
    Kim, Tony Tae-Hyoung
    Kim, Bongjin
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2021, 56 (07) : 2221 - 2233
  • [26] Neural Network Language Model with Cache
    Soutner, Daniel
    Loose, Zdenek
    Mueller, Ludek
    Prazak, Ales
    TEXT, SPEECH AND DIALOGUE, TSD 2012, 2012, 7499 : 528 - 534
  • [27] A comparison of bit-parallel and bit-serial architectures for WDM networks
    Sivalingam, KM
    PHOTONIC NETWORK COMMUNICATIONS, 1999, 1 (01) : 89 - 103
  • [28] Estimating neural networks-based algorithm for adaptive cache replacement
    Obaidat, MS
    Khalid, H
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 1998, 28 (04): : 602 - 611
  • [29] Bit Efficient Quantization for Deep Neural Networks
    Nayak, Prateeth
    Zhang, David
    Chai, Sek
    FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019), 2019, : 52 - 56
  • [30] Graph4Cache: A Graph Neural Network Model for Cache Prefetching
    Shang, Jing
    Wu, Zhihui
    Xiao, Zhiwen
    Zhang, Yifei
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (08): : 1945 - 1956