Performance Optimizing Method for Sparse Convolutional Neural Networks on GPU

Cited by: 0
Authors
Dong X. [1 ,2 ]
Liu L. [1 ]
Li J. [1 ]
Feng X.-B. [1 ,2 ]
Affiliations
[1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing
[2] University of Chinese Academy of Sciences, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2020, Vol. 31, No. 9
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Code generation; Convolution; GPU; Neural networks; Performance optimization; Sparse;
DOI
10.13328/j.cnki.jos.006051
Abstract
In recent years, having demonstrated dominant performance on a wide range of tasks, deep convolutional neural networks have been deployed in applications such as object detection, autonomous driving, and machine translation. These models, however, carry enormous numbers of parameters and impose a heavy computational burden. Neural network pruning can identify and remove parameters that contribute little to accuracy, reducing both the parameter count and the theoretical amount of computation, and thus offering an opportunity to accelerate neural network models. However, the resulting pruned sparse models are difficult to execute efficiently on GPUs, and their performance often fails to match that of their well-optimized dense counterparts. This study designs a sparsity-aware code generating method that produces efficient GPU code for the sparse convolutions in pruned neural networks. First, a template is designed for convolution operators with several optimizations targeting the GPU architecture. Through compilation and analysis, the operator template is transformed into an intermediate representation template, which serves as the input to the proposed algorithm for generating sparse convolution code according to the specific sparse convolution parameters. Moreover, to improve memory throughput, data access and data placement are optimized based on the memory access characteristics of neural networks. Finally, because the location information of the non-zero weights can be encoded implicitly into the generated code, the index structure for the sparse parameters can be eliminated, reducing the memory footprint during execution. Experiments demonstrate that the proposed sparse code generating method improves the performance of sparse convolutional neural networks compared with existing methods. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
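To illustrate the idea sketched in the abstract (the abstract itself contains no code), the following is a minimal CUDA sketch of what a sparsity-specialized kernel could look like: the generator bakes the surviving weight values and their positions into the emitted code as literals, so no index or value arrays for the sparse weights are read at run time. The kernel name, the single 3x3 filter with two surviving weights, and all numeric values are illustrative assumptions, not the paper's actual generated code or template.

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical generator output for one pruned 3x3 filter whose only
// surviving weights sit at offsets (r=0, s=1) and (r=2, s=2).
__global__ void sparse_conv_generated(const float* __restrict__ in,
                                      float* __restrict__ out,
                                      int H, int W)
{
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    if (y >= H - 2 || x >= W - 2) return;           // valid 3x3 convolution

    // Weight values and positions are constants emitted by the generator,
    // so the kernel issues no loads for weight indices or weight values.
    float acc = 0.0f;
    acc +=  0.37f * in[(y + 0) * W + (x + 1)];  // weight at (r=0, s=1)
    acc += -0.82f * in[(y + 2) * W + (x + 2)];  // weight at (r=2, s=2)
    out[y * (W - 2) + x] = acc;
}

int main() {
    const int H = 8, W = 8;
    float h_in[H * W];
    for (int i = 0; i < H * W; ++i) h_in[i] = 1.0f;   // all-ones input image

    float *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, (H - 2) * (W - 2) * sizeof(float));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    dim3 block(8, 8), grid(1, 1);
    sparse_conv_generated<<<grid, block>>>(d_in, d_out, H, W);

    float h_out[(H - 2) * (W - 2)];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);  // expected 0.37 - 0.82 = -0.45

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Because the non-zero weight positions are compiled into the instruction stream, the kernel needs neither a CSR-style index structure nor a separate weight array in memory, which is the memory-footprint reduction the abstract refers to; the paper's actual method additionally applies GPU-oriented template optimizations and data placement tuning not shown in this sketch.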
Pages: 2944-2964 (20 pages)