Optimized unrolling of nested loops

Cited by: 23
Authors
Sarkar, V [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
loop transformations; loop unrolling; unroll-and-jam; unroll factors
DOI
10.1023/A:1012246031671
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Loop unrolling is a well-known loop transformation that has been used in optimizing compilers for over three decades. In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared to past work, the contributions of our work include (i) a more detailed cost model that includes register locality, instruction-level parallelism and instruction-cache considerations; (ii) a new code generation algorithm that generates more compact code than the unroll-and-jam transformation; and (iii) a new algorithm for efficiently enumerating feasible unroll vectors. Our experimental results confirm the wide applicability of our approach by showing a 2.2x speedup on matrix multiply, and an average 1.08x speedup on seven of the SPEC95fp benchmarks (with a 1.2x speedup for two benchmarks). Larger performance improvements can be expected on processors that have larger numbers of registers and larger degrees of instruction-level parallelism than the processor used for our measurements (PowerPC 604).
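For readers unfamiliar with the unroll-and-jam transformation that the abstract uses as its baseline, the following is a minimal sketch applied to matrix multiply. It illustrates the classic transformation only, not the code generation algorithm proposed in the paper. The function names and the unroll factor of 2 on the outer loop are assumptions chosen for illustration.

```python
def matmul(a, b, n):
    """Reference triple loop: c = a * b for n x n matrices (lists of lists)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_unroll_and_jam(a, b, n):
    """Same product with the outer i loop unrolled by 2 and the copies jammed.

    The unroll factor (2) is an illustrative assumption, not a value
    selected by any cost model.
    """
    c = [[0.0] * n for _ in range(n)]
    i = 0
    # Main loop: handle rows i and i+1 together.
    while i + 1 < n:
        for j in range(n):
            s0 = 0.0
            s1 = 0.0
            for k in range(n):
                bkj = b[k][j]          # loaded once, reused for both rows
                s0 += a[i][k] * bkj
                s1 += a[i + 1][k] * bkj
            c[i][j] = s0
            c[i + 1][j] = s1
        i += 2
    # Remainder loop: at most one leftover row when n is odd.
    for r in range(i, n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[r][k] * b[k][j]
            c[r][j] = s
    return c
```

The jammed inner loop reuses each `b[k][j]` value for two rows, which is the kind of register locality the abstract's cost model takes into account. The separate remainder loop is one source of the code growth that a more compact code generation scheme would aim to reduce.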
Pages: 545-581
Page count: 37
Related papers
  • [1] Optimized unrolling of nested loops
    IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598
    Int J Parallel Program, 5 (545-581):
  • [2] Optimized unrolling of nested loops
    Sarkar, Vivek
    Proceedings of the International Conference on Supercomputing, 2000, : 153 - 166
  • [3] Optimized Unrolling of Nested Loops
    Vivek Sarkar
    International Journal of Parallel Programming, 2001, 29 : 545 - 581
  • [4] A method for estimating optimal unrolling times for nested loops
    Koseki, A
    Komatsu, H
    Fukuzawa, Y
    THIRD INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS, AND NETWORKS, PROCEEDINGS (I-SPAN '97), 1997, : 376 - 382
  • [5] UNROLLING LOOPS IN FORTRAN
    DONGARRA, JJ
    HINDS, AR
    SOFTWARE-PRACTICE & EXPERIENCE, 1979, 9 (03): : 219 - 226
  • [6] Unrolling Loops Containing Task Parallelism
    Ferrer, Roger
    Duran, Alejandro
    Martorell, Xavier
    Ayguade, Eduard
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, 2010, 5898 : 416 - 423
  • [7] PSDSE: Particle Swarm Driven Design Space Exploration of Architecture and Unrolling Factors for Nested Loops in High Level Synthesis
    Mishra, Vipul Kumar
    Sengupta, Anirban
    2014 FIFTH INTERNATIONAL SYMPOSIUM ON ELECTRONIC SYSTEM DESIGN (ISED), 2014, : 10 - 14
  • [8] Graph Signal Restoration Using Nested Deep Algorithm Unrolling
    Nagahama, Masatoshi
    Yamada, Koki
    Tanaka, Yuichi
    Chan, Stanley H.
    Eldar, Yonina C.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2022, 70 : 3296 - 3311
  • [9] Unrolling loops with indeterminate loop counts in system level pipelines
    Guo, H
    Parameswaran, S
    PROCEEDINGS OF THE ASP-DAC '98 - ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE 1998 WITH EDA TECHNO FAIR '98, 1998, : 99 - 104
  • [10] Extracting parallelism in nested loops
    Song, WB
    Park, DS
    Kim, BS
    Kong, YH
    TWENTIETH ANNUAL INTERNATIONAL COMPUTER SOFTWARE & APPLICATIONS CONFERENCE (COMPSAC'96), PROCEEDINGS, 1996, 20 : 41 - 47