Optimized unrolling of nested loops

Cited by: 23
Authors
Sarkar, V [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
loop transformations; loop unrolling; unroll-and-jam; unroll factors
DOI
10.1023/A:1012246031671
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Loop unrolling is a well-known loop transformation that has been used in optimizing compilers for over three decades. In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared to past work, the contributions of our work include (i) a more detailed cost model that includes register locality, instruction-level parallelism and instruction-cache considerations; (ii) a new code generation algorithm that generates more compact code than the unroll-and-jam transformation; and (iii) a new algorithm for efficiently enumerating feasible unroll vectors. Our experimental results confirm the wide applicability of our approach by showing a 2.2× speedup on matrix multiply, and an average 1.08× speedup on seven of the SPEC95fp benchmarks (with a 1.2× speedup for two benchmarks). Larger performance improvements can be expected on processors that have larger numbers of registers and larger degrees of instruction-level parallelism than the processor used for our measurements (PowerPC 604).
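For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of the classic unroll-and-jam transformation applied to matrix multiply. It is not the paper's code generation algorithm or cost model. The function names, the choice of unroll factor 2 for both the i and j loops, and the use of Python (which shows the shape of the transformed loop nest, not its performance) are all assumptions made for illustration.

```python
def matmul_baseline(a, b, n):
    """Reference triple loop: c[i][j] += a[i][k] * b[k][j]."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_unroll_and_jam(a, b, n):
    """Unroll the i and j loops by 2 and jam the four copies into one k loop.

    The unroll factors (2, 2) are an illustrative choice, not a value
    selected by any cost model.
    """
    c = [[0] * n for _ in range(n)]
    n2 = n - n % 2  # largest even bound; leftover iterations run below

    for i in range(0, n2, 2):
        for j in range(0, n2, 2):
            # Four accumulators stand in for values held in registers.
            c00 = c01 = c10 = c11 = 0
            for k in range(n):
                # Each loaded value is reused by two multiply-adds.
                a0, a1 = a[i][k], a[i + 1][k]
                b0, b1 = b[k][j], b[k][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            c[i][j], c[i][j + 1] = c00, c01
            c[i + 1][j], c[i + 1][j + 1] = c10, c11

    # Remainder iterations: the last row and column when n is odd.
    for i in range(n):
        for j in range(n):
            if i >= n2 or j >= n2:
                for k in range(n):
                    c[i][j] += a[i][k] * b[k][j]
    return c
```

In the jammed inner loop, four element loads feed four multiply-adds, whereas four iterations of the baseline inner loop would load eight elements. This kind of register reuse is what the abstract refers to as register locality. The separate remainder loops also hint at why unrolled code can grow large, which is the code-size concern the abstract raises.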
Pages: 545-581
Page count: 37
Related papers
50 in total
  • [31] A CONTROL STRUCTURE FOR A VARIABLE NUMBER OF NESTED LOOPS
    MCKENZIE, BJ
    TAKAOKA, T
    COMPUTER JOURNAL, 1983, 26 (03): : 282 - 283
  • [32] Composable, Sound Transformations of Nested Recursion and Loops
    Sundararajah, Kirshanthan
    Kulkarni, Milind
    PROCEEDINGS OF THE 40TH ACM SIGPLAN CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '19), 2019, : 902 - 917
  • [33] A CONTROL STRUCTURE FOR A VARIABLE NUMBER OF NESTED LOOPS
    SKORDALAKIS, E
    PAPAKONSTANTINOU, G
    COMPUTER JOURNAL, 1982, 25 (01): : 48 - 51
  • [34] Loop striping: Maximize parallelism for nested loops
    Xue, Chun
    Shao, Zili
    Liu, Meilin
    Qiu, Meikang
    Sha, Edwin H-M.
    EMBEDDED AND UBIQUITOUS COMPUTING, PROCEEDINGS, 2006, 4096 : 405 - 414
  • [35] Precise Data Locality Optimization of Nested Loops
    Vincent Loechner
    Benoît Meister
    Philippe Clauss
    The Journal of Supercomputing, 2002, 21 : 37 - 76
  • [36] GROUPING IN NESTED LOOPS FOR PARALLEL EXECUTION ON MULTICOMPUTERS
    KING, CT
    NI, LM
    PROCEEDINGS OF THE 1989 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, VOL 2: SOFTWARE, 1989, : 31 - 38
  • [37] Exploitation of parallelism to nested loops with dependence cycles
    Chang, WL
    Chu, CP
    Ho, M
    JOURNAL OF SYSTEMS ARCHITECTURE, 2004, 50 (12) : 729 - 742
  • [38] Exact analysis of the cache behavior of nested loops
    Chatterjee, S
    Parker, E
    Hanlon, PJ
    Lebeck, AR
    ACM SIGPLAN NOTICES, 2001, 36 (05) : 286 - 297
  • [39] Nested loops optimization for multiprocessor architecture design
    Leonardi, A
    Passos, NL
    Sha, EHM
    1998 MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, PROCEEDINGS, 1999, : 415 - 418
  • [40] Precise data locality optimization of nested loops
    Loechner, V
    Meister, B
    Clauss, P
    JOURNAL OF SUPERCOMPUTING, 2002, 21 (01): : 37 - 76