Extending High-Level Synthesis for Task-Parallel Programs

Cited by: 0
Authors
Chi, Yuze [1 ]
Guo, Licheng [1 ]
Lau, Jason [1 ]
Choi, Young-kyu [1 ,2 ]
Wang, Jie [1 ]
Cong, Jason [1 ]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Inha Univ, Incheon, South Korea
Keywords
DOI
10.1109/FCCM51124.2021.00032
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular in recent years for field-programmable gate array (FPGA) accelerators in many application domains, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools do support task-parallel programs, productivity is greatly limited (1) in the code development cycle due to poor programmability, (2) in the correctness verification cycle due to restricted software simulation, and (3) in the QoR tuning cycle due to slow code generation. Such limited productivity often defeats the purpose of HLS and hinders programmers from adopting it for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, unconstrained software simulation, and fast hierarchical code generation to overcome these limitations and demonstrate how task-parallel programs can be productively supported in HLS. Experimental results on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves programmability. The correctness verification and iterative QoR tuning cycles are shortened by 3.2x and 6.8x, respectively. Our work is open-source at https://github.com/UCLA-VAST/tapa/.
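As a concrete illustration of the extended HLS C++ interface described in the abstract, the sketch below shows a minimal task-parallel kernel loosely modeled on the examples in the linked repository: two leaf tasks run in parallel and communicate through a stream, while an upper-level task declares the channel and invokes them. The task and function names (Mmap2Stream, Stream2Mmap, VecCopy) are illustrative, and the exact API signatures may differ from the released library.

    #include <cstdint>
    #include <tapa.h>  // assumed header from the linked repository

    // Leaf task: reads n elements from DRAM and pushes them into a stream.
    // (Illustrative sketch; names and signatures are not taken from the paper.)
    void Mmap2Stream(tapa::mmap<const float> mem, uint64_t n,
                     tapa::ostream<float>& data) {
      for (uint64_t i = 0; i < n; ++i) data << mem[i];
    }

    // Leaf task: pops n elements from the stream and writes them back to DRAM.
    void Stream2Mmap(tapa::istream<float>& data, tapa::mmap<float> mem,
                     uint64_t n) {
      for (uint64_t i = 0; i < n; ++i) data >> mem[i];
    }

    // Upper-level task: declares the communication channel and invokes both
    // children, which run in parallel and communicate at a fine-grained level.
    void VecCopy(tapa::mmap<const float> in, tapa::mmap<float> out,
                 uint64_t n) {
      tapa::stream<float> data("data");
      tapa::task()
          .invoke(Mmap2Stream, in, n, data)
          .invoke(Stream2Mmap, data, out, n);
    }

The same source is intended to serve both software simulation and hardware code generation by the framework's toolchain, which is what the abstract credits for the shortened correctness verification and QoR tuning cycles.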
Pages: 204-213
Number of pages: 10