Efficient Neural Network Acceleration on GPGPU using Content Addressable Memory

Times cited: 0

Authors:

Imani, Mohsen [1]

Peroni, Daniel [1]

Kim, Yeseong [1]

Rahimi, Abbas [2]

Rosing, Tajana [1]

Affiliations:

[1] Univ Calif San Diego, CSE, La Jolla, CA 92093 USA

[2] Univ Calif Berkeley, EECS, Berkeley, CA 94720 USA

Source:

PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE) | 2017

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Recently, neural networks have been shown to be effective models for image processing, video segmentation, speech recognition, computer vision, and gaming. However, high computational energy and low performance are the primary bottlenecks in running neural networks. In this paper, we propose an energy- and performance-efficient network acceleration technique for the General Purpose GPU (GPGPU) architecture that uses specialized resistive nearest content addressable memory blocks, called NNCAM, by exploiting the computation locality of learning algorithms. NNCAM stores highly frequent patterns corresponding to neural network operations and searches for the most similar patterns in order to reuse their computation results. To improve NNCAM computation efficiency and accuracy, we propose layer-based associative update and selective approximation techniques. The layer-based update improves the data locality of NNCAM blocks by filling NNCAM values based on the frequent computation patterns of each neural network layer. To guarantee an appropriate level of computation accuracy while maximizing energy savings, our design adaptively allocates neural network operations to either NNCAM or the GPGPU floating point units (FPUs). The selective approximation relaxes computation on neural network layers according to its impact on accuracy. For evaluation, we integrate NNCAM blocks with the modern AMD Southern Islands GPU architecture. Our experimental evaluation shows that the enhanced GPGPU achieves 68% energy savings and a 40% speedup on four popular convolutional neural networks (CNNs), while keeping quality loss below an acceptable 2%.
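The abstract describes NNCAM as storing frequent operation patterns and returning the result of the most similar stored pattern, with operations sent to the FPUs when accuracy requires it. The Python sketch below is only a software analogy under stated assumptions, not the paper's resistive hardware design: the class name `NearestMatchTable`, the Manhattan distance between operand pairs, the fixed `threshold` that decides between reuse and exact computation, and filling the table from a profiling trace (for example, one table per layer) are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NearestMatchTable:
    """Hypothetical software model of a nearest-match lookup table.
    Illustrative only; not the NNCAM circuit described in the paper."""
    capacity: int
    threshold: float  # assumed maximum distance at which a stored result is reused
    entries: list = field(default_factory=list)  # ((a, b), a * b) pairs

    def fill_from_trace(self, trace):
        """Keep the `capacity` most frequent operand pairs from a profiling
        trace, each with its exact product (assumed per-layer fill policy)."""
        common = Counter(trace).most_common(self.capacity)
        self.entries = [((a, b), a * b) for (a, b), _ in common]

    def lookup(self, a, b):
        """Return (result, hit) for the stored pair closest to (a, b)."""
        best, best_dist = None, float("inf")
        for (sa, sb), result in self.entries:
            dist = abs(sa - a) + abs(sb - b)  # assumed Manhattan distance
            if dist < best_dist:
                best, best_dist = result, dist
        if best is not None and best_dist <= self.threshold:
            return best, True
        return None, False


def multiply(table, a, b):
    """Reuse a stored product when a close match exists; otherwise compute
    exactly (a stand-in for dispatching the operation to an FPU)."""
    result, hit = table.lookup(a, b)
    return (result, "table") if hit else (a * b, "fpu")


if __name__ == "__main__":
    trace = [(0.5, 0.25)] * 3 + [(1.0, -0.5)] * 2 + [(0.75, 0.75)]
    table = NearestMatchTable(capacity=2, threshold=0.05)
    table.fill_from_trace(trace)
    print(multiply(table, 0.51, 0.25))  # close to (0.5, 0.25): reused from table
    print(multiply(table, 0.75, 0.75))  # no close match: computed exactly
```

In this sketch a smaller `threshold` lowers the hit rate but also lowers the approximation error, which loosely mirrors the accuracy/energy trade-off the abstract attributes to adaptive allocation between NNCAM and the FPUs.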


Pages: 1026-1031

Number of pages: 6
