Efficient Neural Network Acceleration on GPGPU using Content Addressable Memory

Cited by: 0
Authors
Imani, Mohsen [1]
Peroni, Daniel [1]
Kim, Yeseong [1]
Rahimi, Abbas [2]
Rosing, Tajana [1]
Affiliations
[1] Univ Calif San Diego, CSE, La Jolla, CA 92093 USA
[2] Univ Calif Berkeley, EECS, Berkeley, CA 94720 USA
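Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract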
Recently, neural networks have been shown to be effective models for image processing, video segmentation, speech recognition, computer vision and gaming. However, high energy consumption and low performance are the primary bottlenecks in running neural networks. In this paper, we propose an energy- and performance-efficient network acceleration technique for the General Purpose GPU (GPGPU) architecture. The technique uses specialized resistive nearest content addressable memory blocks, called NNCAM, and exploits the computation locality of the learning algorithms. NNCAM stores highly frequent patterns corresponding to neural network operations and searches for the most similar patterns to reuse the computation results. To improve NNCAM computation efficiency and accuracy, we propose layer-based associative update and selective approximation techniques. The layer-based update improves the data locality of NNCAM blocks by filling NNCAM values based on the frequent computation patterns of each neural network layer. To guarantee an appropriate level of computation accuracy while maximizing energy savings, our design adaptively allocates the neural network operations to either NNCAM or the GPGPU floating point units (FPUs). The selective approximation relaxes computation on neural network layers according to each layer's impact on accuracy. In our evaluation, we integrate NNCAM blocks with the modern AMD Southern Islands GPU architecture. Our experimental evaluation shows that the enhanced GPGPU can achieve 68% energy savings and a 40% speedup on four popular convolutional neural networks (CNNs), while keeping quality loss below 2%.
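The lookup-and-reuse idea in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design: the class `ToyNNCAM`, the helpers `pack` and `multiply`, the 4-bit operand packing, the use of Hamming distance as the similarity measure, and the `max_distance` threshold are all assumptions made for illustration. The sketch stores a few precomputed results keyed by operand bit patterns, reuses the result of the nearest stored pattern when it is close enough, and otherwise falls back to exact computation (standing in for the FPU path).

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bit patterns."""
    return bin(a ^ b).count("1")


def pack(a: int, b: int) -> int:
    """Pack two 4-bit operands into one 8-bit search key (assumed layout)."""
    return ((a & 0xF) << 4) | (b & 0xF)


class ToyNNCAM:
    """Hypothetical software stand-in for a nearest-match CAM block."""

    def __init__(self, entries: dict):
        # entries maps a stored bit pattern to its precomputed result
        self.entries = dict(entries)

    def nearest(self, key: int):
        """Return (distance, result) for the stored pattern closest to key."""
        pattern = min(self.entries, key=lambda p: hamming(p, key))
        return hamming(pattern, key), self.entries[pattern]


def multiply(a: int, b: int, cam: ToyNNCAM, max_distance: int):
    """Reuse a stored product if a close pattern exists, else compute exactly.

    max_distance plays the role of a per-layer approximation knob
    (an assumption): 0 allows only exact hits, larger values allow
    more approximate reuse.
    """
    distance, cached = cam.nearest(pack(a, b))
    if distance <= max_distance:
        return cached, "cam"      # approximate or exact reuse
    return a * b, "fpu"           # exact fallback path


# Fill the table with "frequent" operand pairs (chosen arbitrarily here).
cam = ToyNNCAM({pack(3, 5): 15, pack(7, 7): 49})

print(multiply(3, 5, cam, max_distance=1))   # exact hit, reused
print(multiply(3, 4, cam, max_distance=1))   # near hit, approximate reuse
print(multiply(3, 4, cam, max_distance=0))   # tolerance 0, exact fallback
print(multiply(12, 2, cam, max_distance=1))  # no close pattern, exact fallback
```

In this toy setting, raising `max_distance` sends more operations to the table at the cost of accuracy, which mirrors in spirit the trade-off between energy savings and quality loss that the abstract describes.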
Pages: 1026-1031
Page count: 6