RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge

Cited: 2
Authors
Krishna, Adithya [1 ,2 ]
Rohit Nudurupati, Srikanth [3 ]
Chandana, D. G. [3 ]
Dwivedi, Pritesh [3 ]
van Schaik, Andre [2 ]
Mehendale, Mahesh [3 ]
Thakur, Chetan Singh [3 ]
Affiliations
[1] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
[2] Western Sydney Univ, MARCS Inst, Int Ctr Neuromorph Syst, Penrith, NSW 2751, Australia
[3] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Issue 14
Keywords
Standards; Field programmable gate arrays; Random access memory; Hardware; Costs; Convolution; System-on-chip; Convolutional neural networks (CNNs); deep learning; hardware acceleration; sparse processing; DEEP NEURAL-NETWORKS; FLEXIBLE ACCELERATOR; EFFICIENT; PROCESSOR; CNN;
DOI
10.1109/JIOT.2024.3386832
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Deep neural network (DNN)-based inference at the edge is challenging, as these compute- and data-intensive algorithms must be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity in both activations and weights, inherent to DNNs, is a key knob to leverage. In this article, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and number of channels). RAMAN can also be configured to trade accuracy against power/latency using techniques deployed at compile time and run time. We present the salient features of the architecture, provide implementation results, and compare them with the state of the art. RAMAN employs a novel dataflow, inspired by Gustavson's algorithm, with optimal input activation (IA) and output activation (OA) reuse that minimizes memory accesses and the overall data-movement cost. The dataflow allows RAMAN to reduce partial sums (Psums) locally within the processing-element array, eliminating Psum write-back traffic. Additionally, we propose a method to reduce peak activation memory by overlapping IA and OA in the same memory space, which can cut storage requirements by up to 50%. RAMAN was implemented on a low-power, resource-constrained Efinix Ti60 FPGA, utilizing 37.2K LUTs and 8.6K registers. By leveraging both weight and activation sparsity, RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W.
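The Gustavson-style dataflow and the IA/OA memory overlap described in the abstract can both be illustrated with short sketches. The first snippet below is a minimal Python model of Gustavson's row-by-row sparse matrix multiply, not RAMAN's hardware implementation; the CSR-like (column, value) format and the function name `gustavson_spgemm` are illustrative assumptions. It shows why this dataflow keeps partial sums local: each output row is fully accumulated in a small scratchpad and written back exactly once, the property the abstract credits for eliminating Psum write-back traffic.

```python
from collections import defaultdict

def gustavson_spgemm(w_rows, ia_rows):
    """Row-by-row sparse matrix multiply (Gustavson's algorithm).

    w_rows / ia_rows: one list per matrix row, each holding only the
    nonzeros as (column, value) pairs -- a toy CSR-like format.
    """
    oa_rows = []
    for w_row in w_rows:                      # produce one OA row at a time
        psum = defaultdict(float)             # local accumulator: Psums stay inside the "PE array"
        for k, w_val in w_row:                # each nonzero weight selects ...
            for j, ia_val in ia_rows[k]:      # ... a whole IA row, reused in full
                psum[j] += w_val * ia_val
        oa_rows.append(sorted(psum.items()))  # single write-back per output row
    return oa_rows

# Toy example: a 2x3 sparse weight matrix times a 3x2 sparse IA matrix.
W  = [[(0, 2.0)],             # row 0: W[0,0] = 2
      [(1, 3.0), (2, 1.0)]]   # row 1: W[1,1] = 3, W[1,2] = 1
IA = [[(1, 4.0)],             # row 0: IA[0,1] = 4
      [(0, 5.0)],             # row 1: IA[1,0] = 5
      [(0, 1.0), (1, 2.0)]]   # row 2: IA[2,0] = 1, IA[2,1] = 2
print(gustavson_spgemm(W, IA))  # [[(1, 8.0)], [(0, 16.0), (1, 2.0)]]
```

The second sketch shows where the up-to-50% activation-memory saving can come from, under the simplifying assumption that output row r depends only on input row r (as in a pointwise 1x1 convolution); the function name and buffer layout are hypothetical, and the general case needs scheduling so that an IA element is overwritten only after its last use.

```python
def pointwise_in_place(buffer, row_fn):
    """Apply a row-wise layer in place: each IA row is read once, then
    its buffer slot is immediately reused for the corresponding OA row,
    so peak storage approaches max(IA, OA) rather than IA + OA."""
    for r in range(len(buffer)):
        buffer[r] = row_fn(buffer[r])  # OA overwrites the consumed IA row
    return buffer

acts = [[1, 2], [3, 4]]
print(pointwise_in_place(acts, lambda row: [2 * x for x in row]))
# [[2, 4], [6, 8]] -- one activation buffer held throughout
```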
Pages: 24831-24845
Page count: 15
Related Papers
50 records in total
  • [41] MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
    Carnelos, Matteo
    Pasti, Francesco
    Bellotto, Nicola
    INTERNET OF THINGS, 2025, 30
  • [42] MARLIN: A Co-Design Methodology for Approximate ReconfigurabLe Inference of Neural Networks at the Edge
    Guella, Flavia
    Valpreda, Emanuele
    Caon, Michele
    Masera, Guido
    Martina, Maurizio
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71(5): 2105-2118
  • [43] REACT: A Heterogeneous Reconfigurable Neural Network Accelerator with Software-Configurable NoCs for Training and Inference on Wearables
    Upadhyay, Mohit
    Juneja, Rohan
    Wang, Bo
    Zhou, Jun
    Wong, Weng-Fai
    Peh, Li-Shiuan
    PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022, 2022: 1291-1296
  • [44] Fuzzy Logic Based Hardware Accelerator with Partially Reconfigurable Defuzzification Stage for Image Edge Detection
    Kurdi, A. H.
    Grantner, J. L.
    Abdel-Qader, I. M.
    INTERNATIONAL JOURNAL OF RECONFIGURABLE COMPUTING, 2017, 2017
  • [45] Unlocking Edge Intelligence Through Tiny Machine Learning (TinyML)
    Zaidi, Syed Ali Raza
    Hayajneh, Ali M.
    Hafeez, Maryam
    Ahmed, Q. Z.
    IEEE ACCESS, 2022, 10: 100867-100877
  • [46] Low Complexity Reconfigurable-Scalable Architecture Design Methodology for Deep Neural Network Inference Accelerator
    Nimbekar, Anagha
    Vatti, Chandrasekhara Srinivas
    Dinesh, Y. V. Sai
    Singh, Sunidhi
    Gupta, Tarun
    Chandrapu, Ramesh Reddy
    Acharyya, Amit
    2022 IEEE 35TH INTERNATIONAL SYSTEM-ON-CHIP CONFERENCE (IEEE SOCC 2022), 2022: 83-88
  • [47] ReCompAc: Reconfigurable Compute Accelerator
    Duric, Milovan
    Palomar, Oscar
    Smith, Aaron
    2013 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2013
  • [48] A RECONFIGURABLE ACCELERATOR FOR QUANTUM COMPUTATIONS
    Zampetakis, Michali
    Samoladas, Vasilis
    Dollas, Apostolos
    2008 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, VOLS 1 AND 2, 2008: 622-625
  • [49] A Reconfigurable Accelerator for Morphological Operations
    Tekleyohannes, Menbere Kina
    Weis, Christian
    Wehn, Norbert
    Klein, Martin
    Siegrist, Michael
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018: 186-193
  • [50] A 1036 TOp/s/W, 12.2 mW, 2.72 μJ/Inference All Digital TNN Accelerator in 22 nm FDX Technology for TinyML Applications
    Scherer, Moritz
    Di Mauro, Alfio
    Rutishauser, Georg
    Fischer, Tim
    Benini, Luca
    IEEE SYMPOSIUM ON LOW-POWER AND HIGH-SPEED CHIPS AND SYSTEMS (2022 IEEE COOL CHIPS 25), 2022