Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference

Cited by: 0
Authors
Choi, Seokhyeon [1 ]
Shim, Kyuhong [1 ]
Choi, Jungwook [2 ]
Sung, Wonyong [1 ]
Shim, Byonghyo [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
[2] Hanyang Univ, Dept Elect Engn, Seoul, South Korea
Keywords
Matrix multiplication; Implementation; Deep neural networks; Inference;
DOI
10.1007/s11265-022-01782-3
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline code: 0812
Abstract
Efficient implementation of deep neural networks (DNNs) on CPU-based systems is critical owing to the proliferation of applications in embedded and Internet of Things systems. Nowadays, most CPUs are equipped with single instruction multiple data (SIMD) instructions, which are used to implement an efficient general matrix multiply (GEMM) library for accelerating DNN inference. Quantized neural networks are actively investigated to simplify DNN computation and memory requirements; however, current CPU libraries do not efficiently support arithmetic operations below eight bits. Hence, we developed TernGEMM, a GEMM library composed of SIMD instructions for DNNs with ternary weights and sub-8-bit activations. TernGEMM is implemented using simple logical operations that replace the long-latency multiply-add operation. Instead of fixing the accumulation bit precision at 32 bits, TernGEMM accumulates the partial sums in a bit-incremental manner to exploit parallelism in 8-bit and 16-bit SIMD instructions. Furthermore, we propose different tile sizes for TernGEMM to better support the diverse dimensions of DNNs. Compared with a state-of-the-art reduced-precision DNN GEMM library, GEMMLowp, TernGEMM achieves 1.785× to 4.147× speedups for ResNet50, MobileNet-V2, and EfficientNet-B0, as evaluated on both Intel and ARM CPUs.
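The core observation behind a ternary-weight GEMM can be illustrated with a small sketch: because every weight is -1, 0, or +1, each multiply-accumulate collapses into an add, a subtract, or a skip, so no general multiplier is needed. The plain-Python model below is only illustrative (the function name `ternary_matmul` is ours, not part of the TernGEMM API); the actual library realizes this selection with SIMD logical operations and bit-incremental partial-sum accumulation rather than a scalar loop.

```python
def ternary_matmul(A, W):
    """Multiply an integer activation matrix A (m x k) by a ternary
    weight matrix W (k x n) whose entries are in {-1, 0, +1}.

    Each product A[i][p] * W[p][j] reduces to an addition (+1),
    a subtraction (-1), or nothing (0) -- the property that lets a
    ternary GEMM replace multiply-add with cheaper operations.
    Illustrative sketch only, not the TernGEMM implementation.
    """
    m, k = len(A), len(A[0])
    n = len(W[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                w = W[p][j]
                if w == 1:        # +1 weight: accumulate activation
                    acc += A[i][p]
                elif w == -1:     # -1 weight: subtract activation
                    acc -= A[i][p]
                # w == 0: skip entirely
            C[i][j] = acc
    return C
```

In a vectorized version, the +1 and -1 positions of each weight column would be stored as two bitmasks, and the select/add/subtract step would be performed lane-wise with SIMD logical and add instructions.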
Pages: 929 - 943
Number of pages: 15
Related Papers
4 records in total
  • [1] Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference
    Seokhyeon Choi
    Kyuhong Shim
    Jungwook Choi
    Wonyong Sung
    Byonghyo Shim
    Journal of Signal Processing Systems, 2022, 94 : 929 - 943
  • [2] TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference
    Choi, Seokhyeon
    Shim, Kyuhong
    Choi, Jungwook
    Sung, Wonyong
    Shim, Byonghyo
    2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021), 2021, : 111 - 116
  • [3] A portable and high-performance general matrix-multiply (GEMM) library for GPUs and single-chip CPU/GPU systems
    Garg, Rahul
    Hendren, Laurie
    2014 22ND EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2014), 2014, : 681 - 689
  • [4] The Weight Consistency Matrix Framework for General Non-Binary LDPC Code Optimization: Applications in Flash Memories
    Hareedy, Ahmed
    Lanka, Chinmayi
    Schoeny, Clayton
    Dolecek, Lara
    2016 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2016, : 2709 - 2713