Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Cited by: 44
Authors
Yang, Dingqing [1]
Ghasemazar, Amin [1]
Ren, Xiaowei [1]
Golub, Maximilian [1,2]
Lemieux, Guy [1]
Lis, Mieszko [1]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
[2] Microsoft Corp, Redmond, WA 98052 USA
Source
2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 2020) | 2020
Funding
Natural Sciences and Engineering Research Council of Canada;
DOI
10.1109/MICRO50266.2020.00064
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are optimized for the access patterns during inference, however, they do not efficiently support the emerging sparse training techniques. In this paper, we demonstrate (a) that accelerating sparse training requires a co-design approach where algorithms are adapted to suit the constraints of hardware, and (b) that hardware for sparse DNN training must tackle constraints that do not arise in inference accelerators. As proof of concept, we adapt a sparse training algorithm to be amenable to hardware acceleration; we then develop dataflow, data layout, and load-balancing techniques to accelerate it. The resulting system is a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without the conventional pipeline of first training, then pruning, and finally retraining a dense model. Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26x less energy and offers up to 4x speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
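To make the abstract's notion of sparse training concrete, the sketch below shows a generic magnitude-based sparse training loop in PyTorch: weights are masked to a target density during training itself, so no dense train-then-prune-then-retrain pipeline is needed. This is an illustrative assumption, not the specific algorithm, dataflow, or load-balancing scheme co-designed in the paper; the helper name topk_mask and the density parameter are hypothetical.

```python
import torch
import torch.nn as nn

# Hedged sketch of magnitude-based sparse training (NOT the paper's exact
# algorithm): weights are re-pruned after every optimizer step, so the model
# is trained sparse from the start rather than densely trained, pruned, and
# retrained.

def topk_mask(w: torch.Tensor, density: float) -> torch.Tensor:
    """Boolean mask keeping the `density` fraction of largest-magnitude weights."""
    k = max(1, int(density * w.numel()))
    # k-th largest magnitude == (numel - k + 1)-th smallest
    threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
    return w.abs() >= threshold

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
density = 0.1  # keep ~10% of weights, i.e. prune by an order of magnitude

for step in range(1000):
    x = torch.randn(64, 784)             # stand-in for real training data
    y = torch.randint(0, 10, (64,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Re-impose sparsity after the dense update. An accelerator like the one
    # described would instead keep tensors sparse end to end, which is where
    # its dataflow, layout, and load-balancing techniques come in.
    with torch.no_grad():
        for m in model:
            if isinstance(m, nn.Linear):
                m.weight.mul_(topk_mask(m.weight, density))
```

In software this masking only zeroes values; the hardware contribution described in the abstract is exploiting those zeros, with sparse layouts and balanced work distribution, to save the reported energy and time.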
Pages: 711-724
Number of pages: 14