Pod-racing: bulk-bitwise to floating-point compute in racetrack memory for machine learning at the edge

Cited by: 3
Authors
Ollivier, Sebastien [1 ]
Zhang, Xinyi [1 ]
Tang, Yue [1 ]
Choudhuri, Chayanika [1 ]
Hu, Jingtong [1 ]
Jones, Alex K. [1 ]
Affiliations
[1] Univ Pittsburgh, Pittsburgh, PA 15260 USA
Funding
U.S. National Science Foundation;
Keywords
Floating-point arithmetic; Nanowires; Field programmable gate arrays; Random access memory; Convolutional neural networks; Compute-in-memory (CIM); Edge computing; Machine learning;
DOI
10.1109/MM.2022.3195761
Chinese Library Classification (CLC)
TP3 [Computing technology and computer technology];
Subject classification code
0812;
Abstract
Convolutional neural networks (CNNs) have become a ubiquitous algorithm with growing applications in mobile and edge settings. We describe a compute-in-memory (CIM) technique called POD-RACING that uses racetrack memory (RM) to accelerate CNNs for edge systems. Using transverse read, a technique that can determine the number of "1"s stored in multiple adjacent domains, POD-RACING can efficiently implement multi-operand bulk-bitwise and addition computations as well as two-operand multiplication. We discuss how POD-RACING can implement both variable-precision integer and floating-point arithmetic using digital CIM. This allows both CNN inference and on-device training without expensive data movement to the cloud. Based on these functions, we demonstrate the implementation of several CNNs with backpropagation using RM CIM and compare them to state-of-the-art implementations of CNN inference and training. During training, POD-RACING improves efficiency by 2x, reduces energy consumption by ≥27%, and increases throughput by ≥18% versus a state-of-the-art field-programmable gate array accelerator.
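To illustrate the style of computation the abstract describes, below is a minimal Python sketch (not the authors' implementation) of how a transverse-read primitive that reports the number of "1"s in adjacent domains could realize multi-operand addition: each bit position of the operands is striped across a nanowire, a single transverse read yields that column's popcount, and the weighted counts are combined. The names transverse_read and multi_operand_add, and the host-side combination of the counts, are illustrative assumptions; in POD-RACING the combination is performed in memory.

# Minimal sketch (assumed emulation, not the paper's circuit): a transverse
# read (TR) returns the number of '1' domains along a nanowire segment, and
# one TR per bit position turns multi-operand addition into weighted popcounts.

from typing import List

def transverse_read(domains: List[int]) -> int:
    """Emulate a TR: count the '1' domains in an adjacent segment of a nanowire."""
    return sum(domains)

def multi_operand_add(operands: List[int], width: int = 8) -> int:
    """Add several unsigned integers using only per-bit-position TR counts.

    Bit i of every operand is assumed to sit in adjacent domains of nanowire i,
    so one TR per bit position gives that column's popcount; the popcounts are
    then shifted and summed (done on the host here for clarity).
    """
    total = 0
    for bit in range(width):
        column = [(op >> bit) & 1 for op in operands]  # domains of nanowire `bit`
        ones = transverse_read(column)                 # one TR yields the column popcount
        total += ones << bit                           # weight the count by its bit position
    return total

if __name__ == "__main__":
    vals = [23, 7, 250, 96]
    assert multi_operand_add(vals) == sum(vals)
    print("multi-operand add via TR popcounts:", multi_operand_add(vals))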
Pages: 9-16
Page count: 8