SuSy: A Programming Model for Productive Construction of High-Performance Systolic Arrays on FPGAs

Cited by: 32
Authors
Lai, Yi-Hsiang [1]
Rong, Hongbo [2]
Zheng, Size [3]
Zhang, Weihao [4]
Cui, Xiuping [3]
Jia, Yunshan [3]
Wang, Jie [5]
Sullivan, Brendan [1]
Zhang, Zhiru [1]
Liang, Yun [3]
Zhang, Youhui [4]
Cong, Jason [5]
George, Nithin [2]
Alvarez, Jose [2]
Hughes, Christopher [2]
Dubey, Pradeep [2]
Affiliations
[1] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14853 USA
[2] Intel, San Jose, CA USA
[3] Peking Univ, Beijing, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
[5] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding
U.S. National Science Foundation
Keywords
DSL; FPGA; Systolic Array; Space-Time Transformation; URE; HIGH-LEVEL SYNTHESIS; LANGUAGE; COMPILER;
DOI
10.1145/3400302.3415644
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Systolic algorithms are one of the killer applications on spatial architectures such as FPGAs and CGRAs. However, it requires a tremendous amount of human effort to design and implement a high-performance systolic array for a given algorithm using the traditional RTL-based methodology. On the other hand, existing high-level synthesis (HLS) tools either (1) force the programmers to do "micro-coding" where too many optimizations must be carried out through tedious code restructuring and insertion of vendor-specific pragmas, or (2) give them too little control to influence a push-button compilation flow to achieve high quality of results. To tackle these challenges, we introduce SuSy, a programming framework composed of a domain-specific language (DSL) and a compilation flow that enables programmers to productively build high-performance systolic arrays on FPGAs. With SuSy, programmers express the design functionality in the form of uniform recurrence equations (UREs), which can describe algorithms from a wide spectrum of applications as long as the underlying computation has a uniform dependence structure. The URE description in SuSy is followed by a set of decoupled spatial mapping primitives that specify how to map the equations to a spatial architecture. More concretely, programmers can apply space-time transformations and several other memory and I/O optimizations to build a highly efficient systolic architecture productively. Experimental results show that SuSy can describe various algorithms with UREs and generate high-performance systolic arrays by spatial optimizations. For instance, the SGEMM benchmark written in SuSy can approach the performance of the manual design optimized by experts, while using 30x fewer lines of code.
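To make the abstract's two central ideas concrete, the following is a minimal sketch in plain Python, not SuSy syntax. It writes matrix multiplication as uniform recurrence equations over an iteration space (i, j, k), then applies one possible space-time transformation: processing element (i, j), time step t = i + j + k. The function names and the choice of an output-stationary mapping are assumptions made for illustration and are not taken from the paper.

```python
def matmul_reference(A, B):
    """Plain triple-loop matrix product, used only as a correctness check."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_ure(A, B):
    """Evaluate the UREs directly. Every dependence is a constant offset:
         a(i,j,k) = a(i,j-1,k)                      (A[i][k] at j == 0)
         b(i,j,k) = b(i-1,j,k)                      (B[k][j] at i == 0)
         c(i,j,k) = c(i,j,k-1) + a(i,j,k)*b(i,j,k)  (starts from 0 at k == 0)
    """
    n, m, p = len(A), len(B), len(B[0])
    a, b, c = {}, {}, {}
    for i in range(n):
        for j in range(p):
            for k in range(m):
                a[i, j, k] = A[i][k] if j == 0 else a[i, j - 1, k]
                b[i, j, k] = B[k][j] if i == 0 else b[i - 1, j, k]
                prev = c[i, j, k - 1] if k > 0 else 0
                c[i, j, k] = prev + a[i, j, k] * b[i, j, k]
    return [[c[i, j, m - 1] for j in range(p)] for i in range(n)]


def matmul_systolic(A, B):
    """Simulate the same UREs after a space-time mapping (assumed here):
    space = (i, j), time t = i + j + k. Each dependence then spans exactly
    one time step, so a PE only reads its left/upper neighbour's register."""
    n, m, p = len(A), len(B), len(B[0])
    a_reg = [[0] * p for _ in range(n)]   # a-value held by each PE
    b_reg = [[0] * p for _ in range(n)]   # b-value held by each PE
    c_acc = [[0] * p for _ in range(n)]   # stationary accumulator per PE
    for t in range(n + p + m - 2):        # last step is (n-1)+(p-1)+(m-1)
        a_next = [row[:] for row in a_reg]
        b_next = [row[:] for row in b_reg]
        for i in range(n):
            for j in range(p):
                k = t - i - j             # the iteration this PE runs at time t
                if 0 <= k < m:
                    a = A[i][k] if j == 0 else a_reg[i][j - 1]
                    b = B[k][j] if i == 0 else b_reg[i - 1][j]
                    c_acc[i][j] += a * b
                    a_next[i][j] = a
                    b_next[i][j] = b
        a_reg, b_reg = a_next, b_next     # all registers update together
    return c_acc


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul_systolic(A, B) == matmul_ure(A, B) == matmul_reference(A, B))
```

The mapping works because each dependence vector, (0, 1, 0), (1, 0, 0) and (0, 0, 1), advances the schedule t = i + j + k by exactly one step, so values move only between neighbouring processing elements. Other projections of the same equations would give different array shapes.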
Pages: 9
Related papers
50 in total
  • [21] Productive high-performance software for OpenCL devices
    Melonakos, John
    Yalamanchili, Pavan
    McClanahan, Chris
    Arshad, Umar
    Landes, Michael
    Jamboti, Shivapriya
    Joshi, Abhijit
    Mohammed, Shehzan
    Spafford, Kyle
    Venugopalakrishnan, Vishwanath
    Malcolm, James
    MODELING AND SIMULATION FOR DEFENSE SYSTEMS AND APPLICATIONS VIII, 2013, 8752
  • [22] HPIPM: a high-performance quadratic programming framework for model predictive control
    Frison, Gianluca
    Diehl, Moritz
    IFAC PAPERSONLINE, 2020, 53 (02): : 6563 - 6569
  • [23] Generic programming and high-performance libraries
    Gregor, D
    Järvi, J
    Kulkarni, M
    Lumsdaine, A
    Musser, D
    Schupp, S
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2005, 33 (2-3) : 145 - 164
  • [24] MaMR: High-performance MapReduce programming model for material cloud applications
    Jing, Weipeng
    Tong, Danyu
    Wang, Yangang
    Wang, Jingyuan
    Liu, Yaqiu
    Zhao, Peng
    COMPUTER PHYSICS COMMUNICATIONS, 2017, 211 : 79 - 87
  • [25] Generic Programming and High-Performance Libraries
    Douglas Gregor
    Jaakko Järvi
    Mayuresh Kulkarni
    Andrew Lumsdaine
    David Musser
    Sibylle Schupp
    International Journal of Parallel Programming, 2005, 33 : 145 - 164
  • [26] Programming Models for High-Performance Computing
    Snir, Marc
    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013), 2013, : 1 - 1
  • [27] DaCO: A High-Performance Token Dataflow Coprocessor Overlay for FPGAs
    Siddhartha
    Kapre, Nachiket
    2018 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (FPT 2018), 2018, : 161 - 168
  • [28] A High-Performance Routing Engine for Large-Scale FPGAs
    Martin, Timothy
    Maarouf, Dani
    Grewal, Gary
    Areibi, Shawki
    2024 34TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL 2024, 2024, : 53 - 59
  • [29] Integrating FPGAs in High-Performance Computing: The Architecture and Implementation Perspective
    Woods, Nathan
    FPGA 2007: FIFTEENTH ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2007, : 132 - 132
  • [30] Comparing FPGAs and GPUs for high-performance image processing applications
    Kelmelis, Eric J.
    Ortiz, Fernando E.
    Curt, Petersen F.
    Bodnar, Michael R.
    Spagnoli, Kyle E.
    Paolini, Aaron L.
    Price, Daniel K.
    VISUAL INFORMATION PROCESSING XIX, 2010, 7701