Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial

Cited by: 3
Authors
Mao, Wendong [1 ]
Wang, Meiqi [1 ]
Xie, Xiaoru [2 ]
Wu, Xiao [2 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Integrated Circuits, Shenzhen Campus, Shenzhen 518107, Guangdong, Peoples R China
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210008, Peoples R China
Keywords
Hardware acceleration; sparsity; CNN; transformer; tutorial; deep learning; FLEXIBLE ACCELERATOR; NEURAL-NETWORKS; EFFICIENT; ARCHITECTURE;
DOI
10.1109/TCSII.2023.3344681
CLC classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline codes
0808; 0809;
Abstract
Deep neural networks (DNNs) are widely used in many fields, such as artificial-intelligence-generated content (AIGC) and robotics. To support these tasks efficiently, model pruning techniques have been developed to compress computation- and memory-intensive DNNs. However, directly executing these sparse models on a common hardware accelerator can cause significant under-utilization, since the invalid data produced by sparse patterns leads to unnecessary computations and irregular memory accesses. This brief analyzes the critical issues in accelerating sparse models and provides an overview of typical hardware designs for various sparse DNNs, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformers. Following the overview, we give a practical guideline for designing efficient accelerators for sparse DNNs, with qualitative metrics to evaluate hardware overhead in different cases. In addition, we highlight potential opportunities for hardware/software/algorithm co-optimization from the perspective of sparse DNN implementation, and provide insights into recent design trends for efficient implementation of Transformers with sparse attention, which facilitates large language model (LLM) deployment with high throughput and energy efficiency.
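To make the under-utilization issue concrete, below is a minimal, self-contained Python sketch (illustrative only, not one of the accelerator designs surveyed in the tutorial) contrasting a dense matrix-vector product with a compressed-sparse-row (CSR) variant: skipping pruned zero weights removes redundant multiply-accumulates, but activation fetches become index-driven, i.e., the irregular memory accesses that a plain dense datapath cannot exploit.

    # Illustrative sketch (assumption: CSR storage, not the paper's method):
    # dense vs. zero-skipping matrix-vector product for a pruned weight matrix.

    def dense_matvec(W, x):
        # Dense loop: every weight is fetched and multiplied, including the
        # zeros introduced by pruning, so utilization drops with sparsity.
        return [sum(W[r][c] * x[c] for c in range(len(x))) for r in range(len(W))]

    def csr_compress(W):
        # Keep only nonzero values plus their column indices (CSR format).
        values, col_idx, row_ptr = [], [], [0]
        for row in W:
            for c, w in enumerate(row):
                if w != 0:
                    values.append(w)
                    col_idx.append(c)
            row_ptr.append(len(values))
        return values, col_idx, row_ptr

    def csr_matvec(values, col_idx, row_ptr, x):
        # Sparse loop: zero weights are skipped (fewer MACs), but x[col_idx[i]]
        # is an irregular, index-driven memory access.
        y = []
        for r in range(len(row_ptr) - 1):
            acc = 0
            for i in range(row_ptr[r], row_ptr[r + 1]):
                acc += values[i] * x[col_idx[i]]
            y.append(acc)
        return y

    if __name__ == "__main__":
        W = [[0, 2, 0, 0],   # a pruned 4x4 weight matrix (75% zeros)
             [0, 0, 0, 3],
             [1, 0, 0, 0],
             [0, 0, 4, 0]]
        x = [1.0, 2.0, 3.0, 4.0]
        assert dense_matvec(W, x) == csr_matvec(*csr_compress(W), x)

Both loops compute the same result; the sparse version performs only 4 multiply-accumulates instead of 16, which is exactly the gap a sparsity-aware accelerator tries to capture in hardware.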
Pages: 1708-1714
Number of pages: 7