Optimized unrolling of nested loops

Cited by: 23
Authors
Sarkar, V [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
loop transformations; loop unrolling; unroll-and-jam; unroll factors
DOI
10.1023/A:1012246031671
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Loop unrolling is a well-known loop transformation that has been used in optimizing compilers for over three decades. In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared to past work, the contributions of our work include (i) a more detailed cost model that includes register locality, instruction-level parallelism and instruction-cache considerations; (ii) a new code generation algorithm that generates more compact code than the unroll-and-jam transformation; and (iii) a new algorithm for efficiently enumerating feasible unroll vectors. Our experimental results confirm the wide applicability of our approach by showing a 2.2x speedup on matrix multiply, and an average 1.08x speedup on seven of the SPEC95fp benchmarks (with a 1.2x speedup for two benchmarks). Larger performance improvements can be expected on processors that have larger numbers of registers and larger degrees of instruction-level parallelism than the processor used for our measurements (PowerPC 604).
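For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of classic unroll-and-jam applied to matrix multiply. It is not the paper's code generation algorithm or cost model. The function names, the fixed unroll factor of 2 for both outer loops, and the restriction to even `n` are assumptions made for illustration only.

```python
def matmul_baseline(a, b, n):
    """Reference triple loop: c[i][j] += a[i][k] * b[k][j]."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_unroll_and_jam_2x2(a, b, n):
    """Unroll the i and j loops by 2 and jam the copies into one k loop.

    Assumption for this sketch: n is even, so no remainder loops are needed.
    """
    if n % 2 != 0:
        raise ValueError("this sketch assumes n is a multiple of 2")
    c = [[0.0] * n for _ in range(n)]
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            # Four scalar accumulators stand in for registers.
            c00 = c01 = c10 = c11 = 0.0
            for k in range(n):
                # Each loaded value is reused twice in the jammed body.
                a0 = a[i][k]
                a1 = a[i + 1][k]
                b0 = b[k][j]
                b1 = b[k][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            c[i][j], c[i][j + 1] = c00, c01
            c[i + 1][j], c[i + 1][j + 1] = c10, c11
    return c
```

In the jammed inner loop, four loads feed four multiply-adds, whereas the baseline body needs two operand loads per multiply-add. This reuse is the kind of register locality that the abstract's cost model is said to account for. Larger unroll factors increase reuse but also raise register pressure and code size, which is why choosing the factors is treated as an optimization problem.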
Pages: 545-581
Number of pages: 37