Loop Parallelization Techniques for FPGA Accelerator Synthesis

Times cited: 4
Authors
Reiche, Oliver [1]
Ozkan, M. Akif [1]
Hannig, Frank [1]
Teich, Juergen [1]
Schmid, Moritz [2]
Affiliations
[1] Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Department of Computer Science, Hardware/Software Co-Design, Cauerstr. 11, D-91054 Erlangen, Germany
[2] Siemens Healthcare GmbH, Advanced Therapies Business Unit, R&D, Forchheim, Germany
Source
Journal of Signal Processing Systems for Signal, Image, and Video Technology | 2018 / Vol. 90 / No. 1
Keywords
Altera OpenCL; Vivado HLS; Vectorization; Loop coarsening; Loop tiling; FLOW;
DOI
10.1007/s11265-017-1229-7
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). Their support for Data-Level Parallelism (DLP), one of the key advantages of Field-Programmable Gate Arrays (FPGAs), is, in contrast, very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows processing multiple pixels in parallel, whereby only the kernel operator is replicated within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL. Furthermore, we compare our implementations to the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework HIPAcc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Moreover, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.
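To make the difference between the two techniques concrete, here is a minimal C sketch. It is not the code HIPAcc generates; the 3-tap horizontal sum filter, the sizes `W`, `H`, `V`, and the function names (`kernel_op`, `process_region`, `filter_tiled`, `filter_coarsened`) are all hypothetical. Tiling replicates the whole loop nest per image region (simulated here by running the regions sequentially on a shared buffer, where hardware would need glue logic to distribute the stream), while coarsening keeps one loop nest and replicates only the kernel operator `V` times per iteration.

```c
#include <assert.h>

/* Hypothetical sizes for this sketch; not taken from the paper. */
#define W 10  /* image width in pixels (deliberately not a multiple of V) */
#define H 4   /* image height in pixels */
#define V 4   /* coarsening factor: pixels produced per loop iteration */

/* Kernel operator: 3-tap horizontal sum with clamp-to-edge borders. */
static int kernel_op(const int *in, int y, int x) {
    int xl = x > 0 ? x - 1 : 0;
    int xr = x < W - 1 ? x + 1 : W - 1;
    return in[y * W + xl] + in[y * W + x] + in[y * W + xr];
}

/* Reference version: one output pixel per innermost iteration. */
void filter_baseline(const int *in, int *out) {
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            out[y * W + x] = kernel_op(in, y, x);
}

/* Loop tiling: rows [y0, y1) form one region. In hardware, each region would
   be handled by its own replicated accelerator instance; this sketch runs the
   regions one after another. Because the filter reads only its own row, strip
   boundaries need no halo rows (a 2-D window would need overlapping rows). */
static void process_region(const int *in, int *out, int y0, int y1) {
    for (int y = y0; y < y1; ++y)
        for (int x = 0; x < W; ++x)
            out[y * W + x] = kernel_op(in, y, x);
}

void filter_tiled(const int *in, int *out, int tiles) {
    assert(tiles > 0 && tiles <= H);
    for (int t = 0; t < tiles; ++t)
        process_region(in, out, t * H / tiles, (t + 1) * H / tiles);
}

/* Loop coarsening: a single loop nest whose x loop advances by V, so only
   kernel_op is replicated (V copies once the inner v loop is unrolled, e.g.
   via `#pragma HLS UNROLL` in Vivado HLS). A scalar tail covers W % V. */
void filter_coarsened(const int *in, int *out) {
    for (int y = 0; y < H; ++y) {
        int x = 0;
        for (; x + V <= W; x += V)
            for (int v = 0; v < V; ++v)
                out[y * W + x + v] = kernel_op(in, y, x + v);
        for (; x < W; ++x)
            out[y * W + x] = kernel_op(in, y, x);
    }
}
```

All three functions compute the same output for the same input; they differ only in how a synthesis tool would map them to hardware. The tiled variant multiplies entire loop nests (and their control and buffering), whereas the coarsened variant shares one loop nest and widens only the datapath, which is the property the abstract credits for coarsening's better scalability.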
Pages: 3-27
Page count: 25