A Transformation Framework for Optimizing Task-Parallel Programs

Cited by: 21
Authors
Nandivada, V. Krishna [1 ]
Shirako, Jun [2 ]
Zhao, Jisheng [2 ]
Sarkar, Vivek [2 ]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Chennai 600036, Tamil Nadu, India
[2] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
Source
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS | 2013, Vol. 35, No. 1
Funding
National Science Foundation (USA);
Keywords
Algorithms; Performance; Experimentation; EFFICIENT;
DOI
10.1145/2450136.2450138
CLC Classification
TP31 [Computer Software];
Subject Classification
081202; 0835;
Abstract
Task parallelism has increasingly become a trend with programming models such as OpenMP 3.0, Cilk, Java Concurrency, X10, Chapel and Habanero-Java (HJ) to address the requirements of multicore programmers. While task parallelism increases productivity by allowing the programmer to express multiple levels of parallelism, it can also lead to performance degradation due to increased overheads. In this article, we introduce a transformation framework for optimizing task-parallel programs with a focus on task creation and task termination operations. These operations can appear explicitly in constructs such as async, finish in X10 and HJ, task, taskwait in OpenMP 3.0, and spawn, sync in Cilk, or implicitly in composite code statements such as foreach and ateach loops in X10, forall and foreach loops in HJ, and parallel loops in OpenMP. Our framework includes a definition of data dependence in task-parallel programs, a happens-before analysis algorithm, and a range of program transformations for optimizing task parallelism. Broadly, our transformations cover three different but interrelated optimizations: (1) finish-elimination, (2) forall-coarsening, and (3) loop-chunking. Finish-elimination removes redundant task termination operations, forall-coarsening replaces expensive task creation and termination operations with more efficient synchronization operations, and loop-chunking extracts useful parallelism from ideal parallelism. All three optimizations are specified in an iterative transformation framework that applies a sequence of relevant transformations until a fixed point is reached. Further, we discuss the impact of exception semantics on the specified transformations, and extend them to handle task-parallel programs with precise exception semantics.
Experimental results were obtained for a collection of task-parallel benchmarks on three multicore platforms: a dual-socket 128-thread (16-core) Niagara T2 system, a quad-socket 16-core Intel Xeon SMP, and a quad-socket 32-core Power7 SMP. We have observed that the proposed optimizations interact with each other in a synergistic way, and result in an overall geometric average performance improvement between 6.28x and 10.30x, measured across all three platforms for the benchmarks studied.
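The loop-chunking idea from the abstract can be illustrated with a minimal sketch in plain Java. The paper works with HJ/X10 constructs (forall, async, finish), which are not available here, so java.util.concurrent threads stand in: rather than spawning one task per loop iteration ("ideal parallelism"), the loop is split into a small number of chunks, each handled by one task ("useful parallelism"), cutting task-creation and termination overhead. The class and method names are illustrative, not from the paper.

```java
// Hedged sketch of loop-chunking: one task per chunk instead of one per
// iteration. A single join-all barrier plays the role of HJ's 'finish'.
public class LoopChunkingSketch {

    public static long chunkedSum(long[] a, int numChunks) throws InterruptedException {
        long[] partial = new long[numChunks];          // one slot per chunk, no sharing
        Thread[] workers = new Thread[numChunks];
        int chunkSize = (a.length + numChunks - 1) / numChunks;

        for (int c = 0; c < numChunks; c++) {
            final int id = c;
            final int lo = c * chunkSize;
            final int hi = Math.min(a.length, lo + chunkSize);
            workers[c] = new Thread(() -> {            // one task per chunk, not per element
                long s = 0;
                for (int i = lo; i < hi; i++) s += a[i];
                partial[id] = s;
            });
            workers[c].start();
        }
        for (Thread t : workers) t.join();             // single termination point ('finish')

        long total = 0;
        for (long p : partial) total += p;             // sequential reduction of chunk results
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] a = new long[100];
        for (int i = 0; i < a.length; i++) a[i] = i + 1;
        System.out.println(chunkedSum(a, 4));          // prints 5050
    }
}
```

The unchunked version would create 100 threads and join each of them; chunking bounds task count by the chunk parameter regardless of trip count, which is the overhead reduction the paper's transformation automates.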
Pages: 48
Related Papers
50 records
  • [1] Performance modelling for task-parallel programs
    Kühnemann, M
    Rauber, T
    Rünger, G
    PERFORMANCE ANALYSIS AND GRID COMPUTING, 2004, : 77 - 91
  • [2] TProf: An energy profiler for task-parallel programs
    Manousakis, Ioannis
    Zakkak, Foivos S.
    Pratikakis, Polyvios
    Nikolopoulos, Dimitrios S.
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2015, 5 : 1 - 13
  • [3] An Efficient Task-Parallel Pipeline Programming Framework
    Chiu, Cheng-Hsiang
    Xiong, Zhicheng
    Guo, Zizheng
    Huang, Tsung-Wei
    Lin, Yibo
    THE PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION, HPC ASIA 2024, 2024, : 95 - 106
  • [4] Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs
    Papakonstantinou, Nikolaos
    Zakkak, Foivos S.
    Pratikakis, Polyvios
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, : 933 - 942
  • [5] Extracting SIMD Parallelism from Recursive Task-Parallel Programs
    Ren, Bin
    Balakrishna, Shruthi
    Jo, Youngjoon
    Krishnamoorthy, Sriram
    Agrawal, Kunal
    Kulkarni, Milind
    ACM TRANSACTIONS ON PARALLEL COMPUTING, 2019, 6 (04)
  • [6] Extending High-Level Synthesis for Task-Parallel Programs
    Chi, Yuze
    Guo, Licheng
    Lau, Jason
    Choi, Young-kyu
    Wang, Jie
    Cong, Jason
    2021 IEEE 29TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2021), 2021, : 204 - 213
  • [7] Global Dead-Block Management for Task-Parallel Programs
    Manivannan, Madhavan
    Pericas, Miquel
    Papaefstathiou, Vassilis
    Stenstrom, Per
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2018, 15 (03)
  • [8] Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks
    Emami, Mahyar
    Bezati, Endri
    Janneck, Jorn W.
    Larus, James R.
    PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022, : 398 - 411
  • [9] Dynamic Determinacy Race Detection for Task-Parallel Programs with Promises
    Jin, Feiyang
    Yu, Lechen
    Cogumbreiro, Tiago
    Shirako, Jun
    Sarkar, Vivek
    Leibniz International Proceedings in Informatics, LIPIcs, 2023, 263
  • [10] Model-checking task-parallel programs for data-race
    Radha Nakade
    Eric Mercer
    Peter Aldous
    Kyle Storey
    Benjamin Ogles
    Joshua Hooker
    Sheridan Jacob Powell
    Jay McCarthy
    Innovations in Systems and Software Engineering, 2019, 15 : 289 - 306