SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs

Times cited: 32

Authors:

Lai, Yi-Hsiang [1]

Rong, Hongbo [2]

Zheng, Size [3]

Zhang, Weihao [4]

Cui, Xiuping [3]

Jia, Yunshan [3]

Wang, Jie [5]

Sullivan, Brendan [1]

Zhang, Zhiru [1]

Liang, Yun [3]

Zhang, Youhui [4]

Cong, Jason [5]

George, Nithin [2]

Alvarez, Jose [2]

Hughes, Christopher [2]

Dubey, Pradeep [2]

Affiliations:

[1] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14853 USA

[2] Intel, San Jose, CA USA

[3] Peking Univ, Beijing, Peoples R China

[4] Tsinghua Univ, Beijing, Peoples R China

[5] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA

Source:

2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD) | 2020

Funding:

U.S. National Science Foundation (NSF);

Keywords:

DSL; FPGA; Systolic Array; Space-Time Transformation; URE; HIGH-LEVEL SYNTHESIS; LANGUAGE; COMPILER;

DOI:

10.1145/3400302.3415644

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Systolic algorithms are one of the killer applications on spatial architectures such as FPGAs and CGRAs. However, designing and implementing a high-performance systolic array for a given algorithm with the traditional RTL-based methodology requires a tremendous amount of human effort. On the other hand, existing high-level synthesis (HLS) tools either (1) force programmers into "micro-coding", where too many optimizations must be carried out through tedious code restructuring and the insertion of vendor-specific pragmas, or (2) give them too little control over a push-button compilation flow to achieve high-quality results. To tackle these challenges, we introduce SuSy, a programming framework composed of a domain-specific language (DSL) and a compilation flow that enables programmers to productively build high-performance systolic arrays on FPGAs. With SuSy, programmers express the design functionality as uniform recurrence equations (UREs), which can describe algorithms from a wide spectrum of applications as long as the underlying computation has a uniform dependence structure. The URE description in SuSy is followed by a set of decoupled spatial mapping primitives that specify how to map the equations onto a spatial architecture. More concretely, programmers can apply space-time transformations and several other memory and I/O optimizations to productively build a highly efficient systolic architecture. Experimental results show that SuSy can describe various algorithms with UREs and generate high-performance systolic arrays through spatial optimizations. For instance, the SGEMM benchmark written in SuSy can approach the performance of a manual design optimized by experts while using 30× fewer lines of code.
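To make the split between "what to compute" (UREs) and "how to map it" (a space-time transformation) concrete, here is a minimal Python sketch for matrix multiply. It is not SuSy's DSL or API. The function names `gemm_ures` and `space_time`, the chosen propagation directions, and the schedule time = i + j + k are all hypothetical choices made for illustration. Each recurrence reads only a neighbour at a constant offset in the (i, j, k) iteration space, which is the uniform dependence structure UREs require. The mapping places one processing element (PE) at each (i, j) and is valid because every dependence vector advances the time step by exactly 1.

```python
"""Hypothetical sketch: GEMM as uniform recurrence equations (UREs) plus an
assumed space-time mapping. Illustrative only; this is not SuSy syntax."""
from itertools import product


def space_time(i, j, k):
    """Assumed mapping: PE coordinates (i, j) and time step i + j + k."""
    return (i, j), i + j + k


def gemm_ures(A, B):
    """Evaluate C = A x B by executing the UREs in space-time order.

    UREs over the iteration space (i, j, k):
      a(i, j, k) = a(i, j-1, k)             # A values flow along j
      b(i, j, k) = b(i-1, j, k)             # B values flow along i
      c(i, j, k) = c(i, j, k-1) + a * b     # partial sums flow along k
    Dependence vectors (0,1,0), (1,0,0), (0,0,1) each add 1 to i + j + k,
    so visiting points in increasing time order respects every dependence.
    """
    I, K, J = len(A), len(A[0]), len(B[0])
    a, b, c = {}, {}, {}
    points = sorted(product(range(I), range(J), range(K)),
                    key=lambda p: space_time(*p)[1])
    for i, j, k in points:
        # Boundary points read inputs; interior points read a neighbour.
        a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
        b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
        prev = 0 if k == 0 else c[i, j, k - 1]
        c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    # The final partial sum at k = K - 1 is the output element C[i][j].
    return [[c[i, j, K - 1] for j in range(J)] for i in range(I)]


if __name__ == "__main__":
    print(gemm_ures([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Keeping `space_time` separate from the recurrences mirrors the decoupling the abstract describes: a different legal schedule could replace it without touching the equations. A real flow would also have to check that the new schedule is legal and then lower it to hardware.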


Pages: 9
