Extending High-Level Synthesis for Task-Parallel Programs

Cited by: 0
Authors
Chi, Yuze [1 ]
Guo, Licheng [1 ]
Lau, Jason [1 ]
Choi, Young-kyu [1 ,2 ]
Wang, Jie [1 ]
Cong, Jason [1 ]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Inha Univ, Incheon, South Korea
Keywords
DOI
10.1109/FCCM51124.2021.00032
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular in recent years for field-programmable gate array (FPGA) accelerators in many application domains, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools do support task-parallel programs, productivity is greatly limited (1) in the code development cycle due to poor programmability, (2) in the correctness verification cycle due to restricted software simulation, and (3) in the QoR tuning cycle due to slow code generation. Such limited productivity often defeats the purpose of HLS and hinders programmers from adopting it for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, unconstrained software simulation, and fast hierarchical code generation to overcome these limitations and demonstrate how task-parallel programs can be productively supported in HLS. Experimental results on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves programmability. The correctness verification and iterative QoR tuning cycles are shortened by 3.2x and 6.8x, respectively. Our work is open-source at https://github.com/UCLA-VAST/tapa/.
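As a concrete illustration of the extended HLS C++ interface described in the abstract, the sketch below shows a minimal task-parallel kernel loosely modeled on the examples in the linked repository: two leaf tasks run in parallel and communicate through a stream, while an upper-level task declares the channel and invokes them. The task and function names (Mmap2Stream, Stream2Mmap, VecCopy) are illustrative, and the exact API signatures may differ from the released library.

    #include <cstdint>
    #include <tapa.h>  // assumed header from the linked repository

    // Leaf task: reads n elements from DRAM and pushes them into a stream.
    // (Illustrative sketch; names and signatures are not taken from the paper.)
    void Mmap2Stream(tapa::mmap<const float> mem, uint64_t n,
                     tapa::ostream<float>& data) {
      for (uint64_t i = 0; i < n; ++i) data << mem[i];
    }

    // Leaf task: pops n elements from the stream and writes them back to DRAM.
    void Stream2Mmap(tapa::istream<float>& data, tapa::mmap<float> mem,
                     uint64_t n) {
      for (uint64_t i = 0; i < n; ++i) data >> mem[i];
    }

    // Upper-level task: declares the communication channel and invokes both
    // children, which run in parallel and communicate at a fine-grained level.
    void VecCopy(tapa::mmap<const float> in, tapa::mmap<float> out,
                 uint64_t n) {
      tapa::stream<float> data("data");
      tapa::task()
          .invoke(Mmap2Stream, in, n, data)
          .invoke(Stream2Mmap, data, out, n);
    }

The same source is intended to serve both software simulation and hardware code generation by the framework's toolchain, which is what the abstract credits for the shortened correctness verification and QoR tuning cycles.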
Pages: 204-213
Number of pages: 10