A Transformation Framework for Optimizing Task-Parallel Programs

Cited by: 21
Authors
Nandivada, V. Krishna [1 ]
Shirako, Jun [2 ]
Zhao, Jisheng [2 ]
Sarkar, Vivek [2 ]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Chennai 600036, Tamil Nadu, India
[2] Rice Univ, Dept Comp Sci, Houston, TX 77251 USA
Source
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS | 2013, Vol. 35, No. 1
Funding
National Science Foundation (USA);
Keywords
Algorithms; Performance; Experimentation; EFFICIENT;
DOI
10.1145/2450136.2450138
CLC Classification
TP31 [Computer Software];
Subject Classification
081202; 0835;
Abstract
Task parallelism has increasingly become a trend with programming models such as OpenMP 3.0, Cilk, Java Concurrency, X10, Chapel and Habanero-Java (HJ) to address the requirements of multicore programmers. While task parallelism increases productivity by allowing the programmer to express multiple levels of parallelism, it can also lead to performance degradation due to increased overheads. In this article, we introduce a transformation framework for optimizing task-parallel programs with a focus on task creation and task termination operations. These operations can appear explicitly in constructs such as async, finish in X10 and HJ, task, taskwait in OpenMP 3.0, and spawn, sync in Cilk, or implicitly in composite code statements such as foreach and ateach loops in X10, forall and foreach loops in HJ, and parallel loops in OpenMP. Our framework includes a definition of data dependence in task-parallel programs, a happens-before analysis algorithm, and a range of program transformations for optimizing task parallelism. Broadly, our transformations cover three different but interrelated optimizations: (1) finish-elimination, (2) forall-coarsening, and (3) loop-chunking. Finish-elimination removes redundant task termination operations, forall-coarsening replaces expensive task creation and termination operations with more efficient synchronization operations, and loop-chunking extracts useful parallelism from ideal parallelism. All three optimizations are specified in an iterative transformation framework that applies a sequence of relevant transformations until a fixed point is reached. Further, we discuss the impact of exception semantics on the specified transformations, and extend them to handle task-parallel programs with precise exception semantics.
Experimental results were obtained for a collection of task-parallel benchmarks on three multicore platforms: a dual-socket 128-thread (16-core) Niagara T2 system, a quad-socket 16-core Intel Xeon SMP, and a quad-socket 32-core Power7 SMP. We have observed that the proposed optimizations interact with each other in a synergistic way, and result in an overall geometric average performance improvement between 6.28x and 10.30x, measured across all three platforms for the benchmarks studied.
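The loop-chunking idea from the abstract can be illustrated with a minimal sketch in plain Java. The paper works with HJ/X10 constructs (forall, async, finish), which are not available here, so java.util.concurrent threads stand in: rather than spawning one task per loop iteration ("ideal parallelism"), the loop is split into a small number of chunks, each handled by one task ("useful parallelism"), cutting task-creation and termination overhead. The class and method names are illustrative, not from the paper.

```java
// Hedged sketch of loop-chunking: one task per chunk instead of one per
// iteration. A single join-all barrier plays the role of HJ's 'finish'.
public class LoopChunkingSketch {

    public static long chunkedSum(long[] a, int numChunks) throws InterruptedException {
        long[] partial = new long[numChunks];          // one slot per chunk, no sharing
        Thread[] workers = new Thread[numChunks];
        int chunkSize = (a.length + numChunks - 1) / numChunks;

        for (int c = 0; c < numChunks; c++) {
            final int id = c;
            final int lo = c * chunkSize;
            final int hi = Math.min(a.length, lo + chunkSize);
            workers[c] = new Thread(() -> {            // one task per chunk, not per element
                long s = 0;
                for (int i = lo; i < hi; i++) s += a[i];
                partial[id] = s;
            });
            workers[c].start();
        }
        for (Thread t : workers) t.join();             // single termination point ('finish')

        long total = 0;
        for (long p : partial) total += p;             // sequential reduction of chunk results
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] a = new long[100];
        for (int i = 0; i < a.length; i++) a[i] = i + 1;
        System.out.println(chunkedSum(a, 4));          // prints 5050
    }
}
```

The unchunked version would create 100 threads and join each of them; chunking bounds task count by the chunk parameter regardless of trip count, which is the overhead reduction the paper's transformation automates.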
Pages: 48
Related Papers
50 records
  • [1] Performance modelling for task-parallel programs
    Kühnemann, M
    Rauber, T
    Rünger, G
    PERFORMANCE ANALYSIS AND GRID COMPUTING, 2004, : 77 - 91
  • [2] TProf: An energy profiler for task-parallel programs
    Manousakis, Ioannis
    Zakkak, Foivos S.
    Pratikakis, Polyvios
    Nikolopoulos, Dimitrios S.
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2015, 5 : 1 - 13
  • [3] An Efficient Task-Parallel Pipeline Programming Framework
    Chiu, Cheng-Hsiang
    Xiong, Zhicheng
    Guo, Zizheng
    Huang, Tsung-Wei
    Lin, Yibo
    THE PROCEEDINGS OF INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING IN ASIA-PACIFIC REGION, HPC ASIA 2024, 2024, : 95 - 106
  • [4] Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs
    Papakonstantinou, Nikolaos
    Zakkak, Foivos S.
    Pratikakis, Polyvios
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, : 933 - 942
  • [5] Extracting SIMD Parallelism from Recursive Task-Parallel Programs
    Ren, Bin
    Balakrishna, Shruthi
    Jo, Youngjoon
    Krishnamoorthy, Sriram
    Agrawal, Kunal
    Kulkarni, Milind
    ACM TRANSACTIONS ON PARALLEL COMPUTING, 2019, 6 (04)
  • [6] Extending High-Level Synthesis for Task-Parallel Programs
    Chi, Yuze
    Guo, Licheng
    Lau, Jason
    Choi, Young-kyu
    Wang, Jie
    Cong, Jason
    2021 IEEE 29TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2021), 2021, : 204 - 213
  • [7] Global Dead-Block Management for Task-Parallel Programs
    Manivannan, Madhavan
    Pericas, Miquel
    Papaefstathiou, Vassilis
    Stenstrom, Per
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2018, 15 (03)
  • [8] Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks
    Emami, Mahyar
    Bezati, Endri
    Janneck, Jorn W.
    Larus, James R.
    PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022, : 398 - 411
  • [9] Dynamic Determinacy Race Detection for Task-Parallel Programs with Promises
    Jin, Feiyang
    Yu, Lechen
    Cogumbreiro, Tiago
    Shirako, Jun
    Sarkar, Vivek
    Leibniz International Proceedings in Informatics, LIPIcs, 2023, 263
  • [10] Model-checking task-parallel programs for data-race
    Radha Nakade
    Eric Mercer
    Peter Aldous
    Kyle Storey
    Benjamin Ogles
    Joshua Hooker
    Sheridan Jacob Powell
    Jay McCarthy
    Innovations in Systems and Software Engineering, 2019, 15 : 289 - 306