SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs

Cited by: 32
Authors
Lai, Yi-Hsiang [1]
Rong, Hongbo [2]
Zheng, Size [3]
Zhang, Weihao [4]
Cui, Xiuping [3]
Jia, Yunshan [3]
Wang, Jie [5]
Sullivan, Brendan [1]
Zhang, Zhiru [1]
Liang, Yun [3]
Zhang, Youhui [4]
Cong, Jason [5]
George, Nithin [2]
Alvarez, Jose [2]
Hughes, Christopher [2]
Dubey, Pradeep [2]
Affiliations
[1] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14853 USA
[2] Intel, San Jose, CA USA
[3] Peking Univ, Beijing, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
[5] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding
U.S. National Science Foundation
Keywords
DSL; FPGA; Systolic Array; Space-Time Transformation; URE; High-Level Synthesis; Language; Compiler
DOI
10.1145/3400302.3415644
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Systolic algorithms are one of the killer applications on spatial architectures such as FPGAs and CGRAs. However, it requires a tremendous amount of human effort to design and implement a high-performance systolic array for a given algorithm using the traditional RTL-based methodology. On the other hand, existing high-level synthesis (HLS) tools either (1) force the programmers to do "micro-coding" where too many optimizations must be carried out through tedious code restructuring and insertion of vendor-specific pragmas, or (2) give them too little control to influence a push-button compilation flow to achieve high quality of results. To tackle these challenges, we introduce SuSy, a programming framework composed of a domain-specific language (DSL) and a compilation flow that enables programmers to productively build high-performance systolic arrays on FPGAs. With SuSy, programmers express the design functionality in the form of uniform recurrence equations (UREs), which can describe algorithms from a wide spectrum of applications as long as the underlying computation has a uniform dependence structure. The URE description in SuSy is followed by a set of decoupled spatial mapping primitives that specify how to map the equations to a spatial architecture. More concretely, programmers can apply space-time transformations and several other memory and I/O optimizations to build a highly efficient systolic architecture productively. Experimental results show that SuSy can describe various algorithms with UREs and generate high-performance systolic arrays by spatial optimizations. For instance, the SGEMM benchmark written in SuSy can approach the performance of the manual design optimized by experts, while using 30x fewer lines of code.
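As a rough illustration of the URE idea described in the abstract, the sketch below writes matrix multiplication as three uniform recurrences and pairs it with one possible space-time mapping. It is plain Python, not SuSy syntax; the function names (`gemm_ure`, `space_time`, `schedule_is_valid`) and the specific mapping are assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch only: plain Python, NOT SuSy's DSL.
# All names here are hypothetical and chosen for this example.

# Each recurrence below reads from a point at a constant offset,
# which is what makes the dependence structure "uniform".
DEPENDENCES = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]


def gemm_ure(A, B):
    """Compute A @ B using recurrences over the iteration space (i, j, k).

    a(i, j, k) = a(i, j-1, k)                      # A values pass along j
    b(i, j, k) = b(i-1, j, k)                      # B values pass along i
    c(i, j, k) = c(i, j, k-1) + a(i,j,k)*b(i,j,k)  # sums accumulate along k
    """
    n, m, p = len(A), len(B), len(B[0])
    a, b, c = {}, {}, {}
    for i in range(n):
        for j in range(p):
            for k in range(m):
                # Boundary points load inputs; interior points reuse neighbors.
                a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
                b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
                prev = 0 if k == 0 else c[i, j, k - 1]
                c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    return [[c[i, j, m - 1] for j in range(p)] for i in range(n)]


def space_time(i, j, k):
    """Assumed mapping: processing element (i, j), time step i + j + k."""
    return (i, j), i + j + k


def schedule_is_valid():
    """A schedule is causal if every dependence advances time by at least 1."""
    _, t0 = space_time(0, 0, 0)
    return all(space_time(*d)[1] - t0 >= 1 for d in DEPENDENCES)


if __name__ == "__main__":
    print(gemm_ure([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(schedule_is_valid())
```

Under this assumed mapping, the `a` and `b` dependences connect neighboring processing elements while the `c` dependence stays within one element, which is the nearest-neighbor communication pattern typical of a systolic array.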
Pages: 9