RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge

Cited: 2
Authors
Krishna, Adithya [1 ,2 ]
Rohit Nudurupati, Srikanth [3 ]
Chandana, D. G. [3 ]
Dwivedi, Pritesh [3 ]
van Schaik, Andre [2 ]
Mehendale, Mahesh [3 ]
Thakur, Chetan Singh [3 ]
Affiliations
[1] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
[2] Western Sydney Univ, MARCS Inst, Int Ctr Neuromorph Syst, Penrith, NSW 2751, Australia
[3] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Issue 14
Keywords
Standards; Field programmable gate arrays; Random access memory; Hardware; Costs; Convolution; System-on-chip; Convolutional neural networks (CNNs); deep learning; hardware acceleration; sparse processing; DEEP NEURAL-NETWORKS; FLEXIBLE ACCELERATOR; EFFICIENT; PROCESSOR; CNN;
DOI
10.1109/JIOT.2024.3386832
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Deep neural network (DNN)-based inference at the edge is challenging, as these compute- and data-intensive algorithms must be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity in both activations and weights, inherent to DNNs, is a key knob to leverage. In this article, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and number of channels). RAMAN can also be configured to trade accuracy against power/latency using techniques deployed at compile time and run time. We present the salient features of the architecture, provide implementation results, and compare them with the state of the art. RAMAN employs a novel dataflow, inspired by Gustavson's algorithm, with optimal input activation (IA) and output activation (OA) reuse that minimizes memory accesses and the overall data-movement cost. The dataflow allows RAMAN to reduce partial sums (Psums) locally within the processing-element array, eliminating Psum write-back traffic. Additionally, we propose a method to reduce peak activation memory by overlapping IA and OA in the same memory space, which can cut storage requirements by up to 50%. RAMAN was implemented on a low-power, resource-constrained Efinix Ti60 FPGA, utilizing 37.2K LUTs and 8.6K registers. By leveraging both weight and activation sparsity, RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W.
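The Gustavson-style dataflow and the IA/OA memory overlap described in the abstract can both be illustrated with short sketches. The first snippet below is a minimal Python model of Gustavson's row-by-row sparse matrix multiply, not RAMAN's hardware implementation; the CSR-like (column, value) format and the function name `gustavson_spgemm` are illustrative assumptions. It shows why this dataflow keeps partial sums local: each output row is fully accumulated in a small scratchpad and written back exactly once, the property the abstract credits for eliminating Psum write-back traffic.

```python
from collections import defaultdict

def gustavson_spgemm(w_rows, ia_rows):
    """Row-by-row sparse matrix multiply (Gustavson's algorithm).

    w_rows / ia_rows: one list per matrix row, each holding only the
    nonzeros as (column, value) pairs -- a toy CSR-like format.
    """
    oa_rows = []
    for w_row in w_rows:                      # produce one OA row at a time
        psum = defaultdict(float)             # local accumulator: Psums stay inside the "PE array"
        for k, w_val in w_row:                # each nonzero weight selects ...
            for j, ia_val in ia_rows[k]:      # ... a whole IA row, reused in full
                psum[j] += w_val * ia_val
        oa_rows.append(sorted(psum.items()))  # single write-back per output row
    return oa_rows

# Toy example: a 2x3 sparse weight matrix times a 3x2 sparse IA matrix.
W  = [[(0, 2.0)],             # row 0: W[0,0] = 2
      [(1, 3.0), (2, 1.0)]]   # row 1: W[1,1] = 3, W[1,2] = 1
IA = [[(1, 4.0)],             # row 0: IA[0,1] = 4
      [(0, 5.0)],             # row 1: IA[1,0] = 5
      [(0, 1.0), (1, 2.0)]]   # row 2: IA[2,0] = 1, IA[2,1] = 2
print(gustavson_spgemm(W, IA))  # [[(1, 8.0)], [(0, 16.0), (1, 2.0)]]
```

The second sketch shows where the up-to-50% activation-memory saving can come from, under the simplifying assumption that output row r depends only on input row r (as in a pointwise 1x1 convolution); the function name and buffer layout are hypothetical, and the general case needs scheduling so that an IA element is overwritten only after its last use.

```python
def pointwise_in_place(buffer, row_fn):
    """Apply a row-wise layer in place: each IA row is read once, then
    its buffer slot is immediately reused for the corresponding OA row,
    so peak storage approaches max(IA, OA) rather than IA + OA."""
    for r in range(len(buffer)):
        buffer[r] = row_fn(buffer[r])  # OA overwrites the consumed IA row
    return buffer

acts = [[1, 2], [3, 4]]
print(pointwise_in_place(acts, lambda row: [2 * x for x in row]))
# [[2, 4], [6, 8]] -- one activation buffer held throughout
```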
Pages: 24831-24845
Page count: 15
Related Papers
50 records in total
  • [41] MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
    Carnelos, Matteo
    Pasti, Francesco
    Bellotto, Nicola
    INTERNET OF THINGS, 2025, 30
  • [42] MARLIN: A Co-Design Methodology for Approximate ReconfigurabLe Inference of Neural Networks at the Edge
    Guella, Flavia
    Valpreda, Emanuele
    Caon, Michele
    Masera, Guido
    Martina, Maurizio
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71(5): 2105-2118
  • [43] REACT: A Heterogeneous Reconfigurable Neural Network Accelerator with Software-Configurable NoCs for Training and Inference on Wearables
    Upadhyay, Mohit
    Juneja, Rohan
    Wang, Bo
    Zhou, Jun
    Wong, Weng-Fai
    Peh, Li-Shiuan
    PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022, 2022: 1291-1296
  • [44] Fuzzy Logic Based Hardware Accelerator with Partially Reconfigurable Defuzzification Stage for Image Edge Detection
    Kurdi, A. H.
    Grantner, J. L.
    Abdel-Qader, I. M.
    INTERNATIONAL JOURNAL OF RECONFIGURABLE COMPUTING, 2017, 2017
  • [45] Unlocking Edge Intelligence Through Tiny Machine Learning (TinyML)
    Zaidi, Syed Ali Raza
    Hayajneh, Ali M.
    Hafeez, Maryam
    Ahmed, Q. Z.
    IEEE ACCESS, 2022, 10: 100867-100877
  • [46] Low Complexity Reconfigurable-Scalable Architecture Design Methodology for Deep Neural Network Inference Accelerator
    Nimbekar, Anagha
    Vatti, Chandrasekhara Srinivas
    Dinesh, Y. V. Sai
    Singh, Sunidhi
    Gupta, Tarun
    Chandrapu, Ramesh Reddy
    Acharyya, Amit
    2022 IEEE 35TH INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (IEEE SOCC 2022), 2022: 83-88
  • [47] ReCompAc: Reconfigurable Compute Accelerator
    Duric, Milovan
    Palomar, Oscar
    Smith, Aaron
    2013 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2013
  • [48] A RECONFIGURABLE ACCELERATOR FOR QUANTUM COMPUTATIONS
    Zampetakis, Michali
    Samoladas, Vasilis
    Dollas, Apostolos
    2008 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, VOLS 1 AND 2, 2008: 622-625
  • [49] A Reconfigurable Accelerator for Morphological Operations
    Tekleyohannes, Menbere Kina
    Weis, Christian
    Wehn, Norbert
    Klein, Martin
    Siegrist, Michael
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018: 186-193
  • [50] A 1036 TOp/s/W, 12.2 mW, 2.72 μJ/Inference All Digital TNN Accelerator in 22 nm FDX Technology for TinyML Applications
    Scherer, Moritz
    Di Mauro, Alfio
    Rutishauser, Georg
    Fischer, Tim
    Benini, Luca
    IEEE SYMPOSIUM ON LOW-POWER AND HIGH-SPEED CHIPS AND SYSTEMS (2022 IEEE COOL CHIPS 25), 2022