SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs

Cited by: 32
Authors
Lai, Yi-Hsiang [1]
Rong, Hongbo [2]
Zheng, Size [3]
Zhang, Weihao [4]
Cui, Xiuping [3]
Jia, Yunshan [3]
Wang, Jie [5]
Sullivan, Brendan [1]
Zhang, Zhiru [1]
Liang, Yun [3]
Zhang, Youhui [4]
Cong, Jason [5]
George, Nithin [2]
Alvarez, Jose [2]
Hughes, Christopher [2]
Dubey, Pradeep [2]
Affiliations
[1] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14853 USA
[2] Intel, San Jose, CA USA
[3] Peking Univ, Beijing, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
[5] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding
National Science Foundation (United States);
Keywords
DSL; FPGA; Systolic Array; Space-Time Transformation; URE; HIGH-LEVEL SYNTHESIS; LANGUAGE; COMPILER;
DOI
10.1145/3400302.3415644
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Systolic algorithms are one of the killer applications on spatial architectures such as FPGAs and CGRAs. However, it requires a tremendous amount of human effort to design and implement a high-performance systolic array for a given algorithm using the traditional RTL-based methodology. On the other hand, existing high-level synthesis (HLS) tools either (1) force the programmers to do "micro-coding" where too many optimizations must be carried out through tedious code restructuring and insertion of vendor-specific pragmas, or (2) give them too little control to influence a push-button compilation flow to achieve high quality of results. To tackle these challenges, we introduce SuSy, a programming framework composed of a domain-specific language (DSL) and a compilation flow that enables programmers to productively build high-performance systolic arrays on FPGAs. With SuSy, programmers express the design functionality in the form of uniform recurrence equations (UREs), which can describe algorithms from a wide spectrum of applications as long as the underlying computation has a uniform dependence structure. The URE description in SuSy is followed by a set of decoupled spatial mapping primitives that specify how to map the equations to a spatial architecture. More concretely, programmers can apply space-time transformations and several other memory and I/O optimizations to build a highly efficient systolic architecture productively. Experimental results show that SuSy can describe various algorithms with UREs and generate high-performance systolic arrays by spatial optimizations. For instance, the SGEMM benchmark written in SuSy can approach the performance of the manual design optimized by experts, while using 30x fewer lines of code.
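To make the two ideas in the abstract concrete, the following is a minimal Python sketch of matrix multiplication written first as uniform recurrence equations and then executed under a space-time mapping. It is not SuSy's DSL syntax and not the authors' implementation: the function names, the schedule t = i + j + k, and the output-stationary placement of iteration (i, j, k) on processing element (i, j) are all assumptions chosen for illustration.

```python
from itertools import product

def matmul_ure(a, b):
    """Evaluate C = A x B through three uniform recurrence equations."""
    I, K, J = len(a), len(b), len(b[0])
    A, B, C = {}, {}, {}
    for i, j, k in product(range(I), range(J), range(K)):
        # Every value depends only on a neighbour at a constant offset,
        # which is what makes the dependence structure "uniform".
        A[i, j, k] = A[i, j - 1, k] if j > 0 else a[i][k]
        B[i, j, k] = B[i - 1, j, k] if i > 0 else b[k][j]
        prev = C[i, j, k - 1] if k > 0 else 0
        C[i, j, k] = prev + A[i, j, k] * B[i, j, k]
    return [[C[i, j, K - 1] for j in range(J)] for i in range(I)]

# Dependence vectors of the A, B, and C equations above.
DEPENDENCES = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]
# Assumed schedule vector: iteration (i, j, k) runs at time i + j + k.
SCHEDULE = (1, 1, 1)

def schedule_is_valid(schedule=SCHEDULE, deps=DEPENDENCES):
    """A schedule is legal if every dependence advances time by at least 1."""
    return all(sum(s * d for s, d in zip(schedule, dep)) > 0 for dep in deps)

def matmul_systolic(a, b):
    """Run the same equations step by step on an I x J grid of PEs."""
    I, K, J = len(a), len(b), len(b[0])
    a_reg = [[0] * J for _ in range(I)]  # value each PE forwards to the right
    b_reg = [[0] * J for _ in range(I)]  # value each PE forwards downward
    acc = [[0] * J for _ in range(I)]    # stationary partial sums of C
    for t in range(I + J + K - 2):
        # Double-buffer the registers so every PE reads last step's values.
        new_a = [row[:] for row in a_reg]
        new_b = [row[:] for row in b_reg]
        for i in range(I):
            for j in range(J):
                k = t - i - j  # the iteration this PE executes at time t
                if 0 <= k < K:
                    x = a_reg[i][j - 1] if j > 0 else a[i][k]
                    y = b_reg[i - 1][j] if i > 0 else b[k][j]
                    acc[i][j] += x * y
                    new_a[i][j], new_b[i][j] = x, y
        a_reg, b_reg = new_a, new_b
    return acc
```

Both functions compute the same product; the second one only changes when and where each iteration runs, which is the separation between functionality and spatial mapping that the abstract describes.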
Pages: 9
Related papers
50 in total
  • [31] High-Performance Reconfigurable Computer Systems Based on Virtex FPGAs
    Dordopulo, Alexey I.
    Levin, Ilya I.
    Doronchenko, Yuri I.
    Raskladkin, Maxim K.
    PARALLEL COMPUTING TECHNOLOGIES (PACT 2015), 2015, 9251 : 349 - 362
  • [32] Efficient Compilation of CUDA Kernels for High-Performance Computing on FPGAs
    Papakonstantinou, Alexandros
    Gururaj, Karthik
    Stratton, John A.
    Chen, Deming
    Cong, Jason
    Hwu, Wen-Mei W.
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2013, 13 (02)
  • [33] Mocarabe: High-Performance Time-Multiplexed Overlays for FPGAs
    Tombs, Frederick
    Mellat, Alireza
    Kapre, Nachiket
    2021 IEEE 29TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2021), 2021, : 115 - 123
  • [34] High-Performance Mixed-Precision Linear Solver for FPGAs
    Sun, Junqing
    Peterson, Gregory D.
    Storaasli, Olaf O.
    IEEE TRANSACTIONS ON COMPUTERS, 2008, 57 (12) : 1614 - 1623
  • [35] HIGH-PERFORMANCE FLOATING-POINT IMPLEMENTATION USING FPGAS
    Parker, Michael
    MILCOM 2009 - 2009 IEEE MILITARY COMMUNICATIONS CONFERENCE, VOLS 1-4, 2009, : 323 - 327
  • [36] A Multicore Architecture for High-Performance Scientific Computing using FPGAs
    Cobos Carrascosa, J. P.
    Aparicio del Moral, B.
    Ramos, J. L.
    Lopez Jimenez, A. C.
    del Toro Iniesta, J. C.
    2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC), 2014, : 223 - 228
  • [37] High-Performance High-Order Stencil Computation on FPGAs Using OpenCL
    Zohouri, Hamid Reza
    Podobas, Artur
    Matsuoka, Satoshi
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 123 - 130
  • [38] Evaluating High-Level Design Strategies on FPGAs for High-Performance Computing
    Podobas, Artur
    Zohouri, Hamid Reza
    Maruyama, Naoya
    Matsuoka, Satoshi
    2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017,
  • [39] Evaluating High-Level Design Strategies on FPGAs for High-Performance Computing
    Podobas, Artur
    Zohouri, Hamid Reza
    Maruyama, Naoya
    Matsuoka, Satoshi
    2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017,
  • [40] High-performance Systolic Array Montgomery Multiplier for SIKE
    Ni, Ziying
    Kundi, Dur-E-Shahwar
    O'Neill, Maire
    Liu, Weiqiang
    2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,