Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference

Cited by: 0
Authors
Choi, Seokhyeon [1 ]
Shim, Kyuhong [1 ]
Choi, Jungwook [2 ]
Sung, Wonyong [1 ]
Shim, Byonghyo [1 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
[2] Hanyang Univ, Dept Elect Engn, Seoul, South Korea
Keywords
Matrix multiplication; Implementation; Deep neural networks; Inference;
DOI
10.1007/s11265-022-01782-3
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline code: 0812
Abstract
Efficient implementation of deep neural networks (DNNs) on CPU-based systems is critical owing to the proliferation of applications in embedded and Internet of Things systems. Nowadays, most CPUs are equipped with single instruction multiple data (SIMD) instructions, which are used to implement an efficient general matrix multiply (GEMM) library for accelerating DNN inference. Quantized neural networks are actively investigated to simplify DNN computation and memory requirements; however, current CPU libraries do not efficiently support arithmetic operations below eight bits. Hence, we developed TernGEMM, a GEMM library composed of SIMD instructions for DNNs with ternary weights and sub-8-bit activations. TernGEMM is implemented using simple logical operations that replace the long-latency multiply-add operation. Instead of fixing the accumulation bit precision at 32 bits, TernGEMM accumulates the partial sums in a bit-incremental manner to exploit parallelism in 8-bit and 16-bit SIMD instructions. Furthermore, we propose different tile sizes for TernGEMM to better support the diverse dimensions of DNNs. Compared with a state-of-the-art reduced-precision DNN GEMM library, GEMMLowp, TernGEMM achieves 1.785× to 4.147× speedups for ResNet50, MobileNet-V2, and EfficientNet-B0, as evaluated on both Intel and ARM CPUs.
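The core observation behind a ternary-weight GEMM can be illustrated with a small sketch: because every weight is -1, 0, or +1, each multiply-accumulate collapses into an add, a subtract, or a skip, so no general multiplier is needed. The plain-Python model below is only illustrative (the function name `ternary_matmul` is ours, not part of the TernGEMM API); the actual library realizes this selection with SIMD logical operations and bit-incremental partial-sum accumulation rather than a scalar loop.

```python
def ternary_matmul(A, W):
    """Multiply an integer activation matrix A (m x k) by a ternary
    weight matrix W (k x n) whose entries are in {-1, 0, +1}.

    Each product A[i][p] * W[p][j] reduces to an addition (+1),
    a subtraction (-1), or nothing (0) -- the property that lets a
    ternary GEMM replace multiply-add with cheaper operations.
    Illustrative sketch only, not the TernGEMM implementation.
    """
    m, k = len(A), len(A[0])
    n = len(W[0])
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                w = W[p][j]
                if w == 1:        # +1 weight: accumulate activation
                    acc += A[i][p]
                elif w == -1:     # -1 weight: subtract activation
                    acc -= A[i][p]
                # w == 0: skip entirely
            C[i][j] = acc
    return C
```

In a vectorized version, the +1 and -1 positions of each weight column would be stored as two bitmasks, and the select/add/subtract step would be performed lane-wise with SIMD logical and add instructions.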
Pages: 929 - 943
Number of pages: 15
Related Papers
4 records in total
  • [1] Optimization of General Matrix Multiply Library for Ternary Weight for Fast DNN Inference
    Seokhyeon Choi
    Kyuhong Shim
    Jungwook Choi
    Wonyong Sung
    Byonghyo Shim
    Journal of Signal Processing Systems, 2022, 94 : 929 - 943
  • [2] TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference
    Choi, Seokhyeon
    Shim, Kyuhong
    Choi, Jungwook
    Sung, Wonyong
    Shim, Byonghyo
    2021 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS 2021), 2021, : 111 - 116
  • [3] A portable and high-performance general matrix-multiply (GEMM) library for GPUs and single-chip CPU/GPU systems
    Garg, Rahul
    Hendren, Laurie
    2014 22ND EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2014), 2014, : 681 - 689
  • [4] The Weight Consistency Matrix Framework for General Non-Binary LDPC Code Optimization: Applications in Flash Memories
    Hareedy, Ahmed
    Lanka, Chinmayi
    Schoeny, Clayton
    Dolecek, Lara
    2016 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2016, : 2709 - 2713