A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm

Cited by: 13
|
Authors
Keller, Ben [1 ]
Venkatesan, Rangharajan [1 ]
Dai, Steve [1 ]
Tell, Stephen G. [2 ]
Zimmer, Brian [1 ]
Sakr, Charbel [1 ]
Dally, William J. [1 ]
Gray, C. Thomas [2 ]
Khailany, Brucek [3 ]
Affiliations
[1] NVIDIA Inc, Santa Clara, CA 95051 USA
[2] NVIDIA Inc, Durham, NC 27713 USA
[3] NVIDIA Inc, Austin, TX 78717 USA
Keywords
Transformers; Quantization (signal); Arithmetic; Task analysis; Costs; Deep learning; Optimization; Accuracy-efficiency trade-off; BERT; deep neural network (DNN) inference accelerator; quantization; transformers; EFFICIENT
DOI
10.1109/JSSC.2023.3234893
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Code
0808; 0809
Abstract
The energy efficiency of deep neural network (DNN) inference can be improved with custom accelerators. DNN inference accelerators often employ specialized hardware techniques to improve energy efficiency, but many of these techniques result in catastrophic accuracy loss on transformer-based DNNs, which have become ubiquitous for natural language processing (NLP) tasks. This article presents a DNN accelerator designed for efficient execution of transformers. The proposed accelerator implements per-vector scaled quantization (VSQ), which employs an independent scale factor for each 64-element vector to enable the use of 4-bit arithmetic with little accuracy loss and low energy overhead. Using a multilevel dataflow to maximize reuse, the 5-nm prototype achieves 95.6 tera-operations per second per Watt (TOPS/W) at 0.46 V on a 4-bit benchmarking layer with VSQ. At a nominal voltage of 0.67 V, the accelerator achieves 1734 inferences/s/W (38.7 TOPS/W) with only 0.7% accuracy loss on BERT-Base and 4714 inferences/s/W (38.6 TOPS/W) with 0.15% accuracy loss on ResNet-50 by using quantization-aware fine-tuning to recover accuracy, demonstrating a practical accelerator design for energy-efficient DNN inference.
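To illustrate the per-vector scaled quantization (VSQ) idea summarized in the abstract, the following NumPy sketch assigns an independent scale factor to each 64-element vector and rounds the scaled values into a signed 4-bit integer range. This is a minimal illustration under stated assumptions (max-based scale selection, floating-point scale factors, hypothetical function names), not the paper's hardware implementation, which realizes the scheme in dedicated datapath logic.

```python
import numpy as np

def vsq_quantize(weights, vector_size=64, bits=4):
    """Sketch of per-vector scaled quantization (VSQ): each contiguous
    64-element vector gets its own scale factor so its values fit into
    a signed 4-bit integer range."""
    qmax = 2 ** (bits - 1) - 1                     # +7 for signed 4-bit
    flat = weights.reshape(-1, vector_size)
    # Independent scale factor per vector (assumed max-abs calibration).
    scales = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(flat / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def vsq_dequantize(q, scales, shape):
    """Reconstruct an approximation of the original tensor."""
    return (q.astype(np.float32) * scales).reshape(shape)

# Example: quantize a random weight tensor and measure the reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
q, s = vsq_quantize(w)
w_hat = vsq_dequantize(q, s, w.shape)
print("mean abs error:", np.mean(np.abs(w - w_hat)))
```

Because each 64-element vector is scaled independently, a single outlier only degrades the resolution of its own vector rather than of the whole tensor, which is why 4-bit arithmetic can retain accuracy with low overhead.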
Pages: 1129-1141
Number of pages: 13