Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU

Cited by: 0
Authors
Dong X. [1 ,2 ]
Liu L. [1 ]
Li J. [1 ]
Feng X.-B. [1 ,2 ]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2020, Vol. 31, No. 9
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Code generation; Convolution; GPU; Neural networks; Performance optimization; Sparse;
DOI
10.13328/j.cnki.jos.006051
Abstract
In recent years, deep convolutional neural networks have shown dominant capability in a wide range of tasks and have been deployed in applications including object detection, autonomous driving, and machine translation. However, these models carry huge numbers of parameters and impose a heavy computational burden. Neural network pruning can identify and remove parameters that contribute little to accuracy, reducing the number of parameters and the theoretical computational requirement, and thus offering an opportunity to accelerate neural network models. However, the pruned sparse models are hard to execute efficiently on GPUs, and their performance often fails to match that of well-optimized dense counterparts. This study designs a sparsity-aware code generation method that produces efficient GPU code for the sparse convolutions in pruned neural networks. First, a template is designed for convolution operators with several optimizations targeting the GPU architecture. Through compilation and analysis, the operator template is transformed into an intermediate-representation template, which serves as the input to the proposed algorithm for generating sparse convolution code according to the specific sparse convolution parameters. Moreover, to improve memory throughput, data access and data placement are optimized based on the memory-access characteristics of neural networks. Finally, because the location information of the nonzero parameters is encoded implicitly in the generated code, the index structures for the sparse parameters can be eliminated, reducing the memory footprint during execution. Experiments demonstrate that the proposed sparse code generation method improves the performance of sparse convolutional neural networks compared with existing methods. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
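As a rough illustration of the idea summarized above (not the paper's actual generated code), the following CUDA sketch shows how a code generator could bake the nonzero weights of a pruned 3x3 filter directly into the kernel as constants, so that no index structure for the sparse parameters is read at run time. All names, weight values, and layout assumptions (single input channel, stride 1, no padding) are hypothetical.

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical output of a sparsity-aware code generator: the surviving
// weights of one pruned 3x3 filter are embedded as immediates, and their
// positions (dy, dx) are encoded implicitly in the hard-coded offsets.
__global__ void sparse_conv3x3_generated(const float* __restrict__ in,
                                         float* __restrict__ out,
                                         int H, int W) {
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // output row
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // output column
    int outH = H - 2, outW = W - 2;                  // "valid" convolution
    if (y >= outH || x >= outW) return;

    // Only the nonzero weights remain; zeros were removed by pruning.
    float acc = 0.0f;
    acc +=  0.42f * in[(y + 0) * W + (x + 1)];       // weight at (0, 1)
    acc += -0.17f * in[(y + 1) * W + (x + 0)];       // weight at (1, 0)
    acc +=  0.93f * in[(y + 2) * W + (x + 2)];       // weight at (2, 2)
    out[y * outW + x] = acc;
}

int main() {
    const int H = 8, W = 8, outH = H - 2, outW = W - 2;
    std::vector<float> h_in(H * W, 1.0f), h_out(outH * outW, 0.0f);

    float *d_in, *d_out;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, h_out.size() * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((outW + block.x - 1) / block.x, (outH + block.y - 1) / block.y);
    sparse_conv3x3_generated<<<grid, block>>>(d_in, d_out, H, W);
    cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);

    // With an all-ones input, each output equals 0.42 - 0.17 + 0.93 = 1.18.
    printf("out[0] = %f\n", h_out[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Because the weight values and their offsets are compile-time constants in the generated kernel, only the input feature map is read from memory; this is why the sparse index structures, and the memory traffic they would cause, can be eliminated.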
Pages: 2944-2964
Page count: 20