Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

Cited by: 8
Authors
Eckert, Charles [1 ]
Wang, Xiaowei [1 ]
Wang, Jingcheng [2 ]
Subramaniyan, Arun [1 ]
Iyer, Ravi [3 ]
Sylvester, Dennis [4 ]
Blaauw, David [5 ]
Das, Reetuparna [1 ]
Affiliations
[1] Univ Michigan, Dept Comp Sci & Engn, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[3] Intel Corp, Santa Clara, CA 95051 USA
[4] Univ Michigan, Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
10.1109/MM.2019.2908101
CLC Number (Chinese Library Classification)
TP3 [computing technology, computer technology];
Subject Classification Code
0812 ;
Abstract
This article presents the Neural Cache architecture, which repurposes cache structures into massively parallel compute units capable of running inference for deep neural networks. Techniques for in situ arithmetic in SRAM arrays, efficient data mapping, and reduced data movement are proposed. The Neural Cache architecture can fully execute convolutional, fully connected, and pooling layers in cache. Our experimental results show that the proposed architecture can improve efficiency over a GPU by 128x while requiring a minimal area overhead of 2%.
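The bit-serial compute model the abstract refers to can be illustrated in software. Operands are stored transposed, so that bit i of every word sits on the same wordline; one "cycle" then processes a single bit position across all bitlines in parallel using bitwise logic, with a carry vector held between cycles. The sketch below is a hedged simulation of this idea, not the paper's implementation; the function name and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def bit_serial_add(a, b, bits=8):
    """Simulate transposed, bit-serial addition in the style of
    in-SRAM compute: each 'cycle' handles one bit position across
    all words in parallel, via bitwise logic plus a carry vector.
    (Illustrative sketch only, not the paper's circuit.)"""
    # Transpose operands: row i holds bit i of every word (LSB first).
    a_bits = (a[None, :] >> np.arange(bits)[:, None]) & 1
    b_bits = (b[None, :] >> np.arange(bits)[:, None]) & 1
    carry = np.zeros_like(a)
    out = np.zeros_like(a)
    for i in range(bits):  # one cycle per bit position
        s = a_bits[i] ^ b_bits[i] ^ carry                          # sum bit
        carry = (a_bits[i] & b_bits[i]) | (carry & (a_bits[i] ^ b_bits[i]))
        out |= s.astype(a.dtype) << i                              # place bit i
    return out  # result wraps modulo 2**bits, as the final carry is dropped
```

Note that the loop count depends only on operand bit-width, not on the number of elements: with thousands of bitlines active at once, this is the source of the massive parallelism the abstract claims.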
Pages: 11-19
Page count: 9
Related Papers
50 records
  • [21] Data-Pattern-Driven LUT for Efficient In-Cache Computing in CNNs Acceleration
    Fei, Zhengpan
    Lyu, Mingchuan
    Kawakami, Satoshi
    Inoue, Koji
    IEEE COMPUTER ARCHITECTURE LETTERS, 2025, 24 (01) : 81 - 84
  • [22] A Dedicated Bit-serial Hardware Neuron for Massively-Parallel Neural Networks in Fast Epilepsy Diagnosis
    Kueh, Si Mon
    Kazmierski, Tom
    2017 IEEE-NIH HEALTHCARE INNOVATIONS AND POINT OF CARE TECHNOLOGIES (HI-POCT), 2017, : 105 - 108
  • [23] Bit-Serial multiplier based Neural Processing Element with Approximate adder tree
    Jo, Cheolwon
    Lee, KwangYeob
    2020 17TH INTERNATIONAL SOC DESIGN CONFERENCE (ISOCC 2020), 2020, : 286 - 287
  • [24] A Comparison of Bit-Parallel and Bit-Serial Architectures for WDM Networks
    Krishna M. Sivalingam
    Photonic Network Communication, 1999, 1 : 89 - 103
  • [25] Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks
    Kim, Hyunjoon
    Yoo, Taegeun
    Kim, Tony Tae-Hyoung
    Kim, Bongjin
    IEEE JOURNAL OF SOLID-STATE CIRCUITS, 2021, 56 (07) : 2221 - 2233
  • [26] Neural Network Language Model with Cache
    Soutner, Daniel
    Loose, Zdenek
    Mueller, Ludek
    Prazak, Ales
    TEXT, SPEECH AND DIALOGUE, TSD 2012, 2012, 7499 : 528 - 534
  • [27] A comparison of bit-parallel and bit-serial architectures for WDM networks
    Sivalingam, KM
    PHOTONIC NETWORK COMMUNICATIONS, 1999, 1 (01) : 89 - 103
  • [28] Estimating neural networks-based algorithm for adaptive cache replacement
    Obaidat, MS
    Khalid, H
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 1998, 28 (04): : 602 - 611
  • [29] Bit Efficient Quantization for Deep Neural Networks
    Nayak, Prateeth
    Zhang, David
    Chai, Sek
    FIFTH WORKSHOP ON ENERGY EFFICIENT MACHINE LEARNING AND COGNITIVE COMPUTING - NEURIPS EDITION (EMC2-NIPS 2019), 2019, : 52 - 56
  • [30] Graph4Cache: A Graph Neural Network Model for Cache Prefetching
    Shang, Jing
    Wu, Zhihui
    Xiao, Zhiwen
    Zhang, Yifei
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (08): : 1945 - 1956