Multi-tasking Execution in PGAS Language XcalableMP and Communication Optimization on Many-core Clusters

Cited by: 4
Authors
Tsugane, Keisuke [1]
Lee, Jinpil [2]
Murai, Hitoshi [2]
Sato, Mitsuhisa [1,2]
Affiliations
[1] University of Tsukuba, Graduate School of Systems and Information Engineering, Ibaraki, Japan
[2] RIKEN Advanced Institute for Computational Science, Kobe, Hyogo, Japan
Keywords
Task Parallelism; Many-core cluster; PGAS; XcalableMP
DOI
10.1145/3149457.3154482
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Large-scale clusters based on many-core processors such as Intel Xeon Phi have recently been deployed. Multi-tasking execution using task dependencies in OpenMP 4.0 is a promising candidate for facilitating the parallelization of such many-core processors, because it enables users to avoid global synchronization through fine-grained task-to-task synchronization based on user-specified data dependencies. Recently, the partitioned global address space (PGAS) model has emerged as a usable distributed-memory programming model. In this paper, we propose a multi-tasking execution model in the PGAS language XcalableMP (XMP) for many-core clusters. The model provides a method to describe interactions between tasks based on point-to-point communications on the global address space. Each communication is executed non-collectively among nodes. We implemented the proposed execution model in XMP and designed a simple algorithm that transforms the code into MPI and OpenMP. For a preliminary evaluation, we implemented two benchmarks using our model, namely blocked Cholesky factorization and a Laplace equation solver. Most of the implementations using our model outperform the conventional barrier-based data-parallel model. To improve performance on many-core clusters, we propose a communication optimization method that dedicates a single thread to communications, avoiding performance problems related to current multi-threaded MPI execution. As a result, with this communication optimization, the performance of blocked Cholesky factorization and the Laplace equation solver improves to 138% and 119%, respectively, of the barrier-based implementations on Intel Xeon Phi KNL clusters. From the viewpoint of productivity, a program implemented with our model in XMP is almost the same as one based on the OpenMP task depend clause, because XMP, like OpenMP, enables the parallelization of serial source code with additional directives and small changes.
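To make the abstract's central mechanism concrete, the following is a minimal C sketch of OpenMP 4.0 task dependences (the `depend` clause) applied to a 1-D Jacobi iteration for the Laplace equation. It is not XMP code and not the paper's benchmark implementation: the problem size `N`, block size `BS`, the sentinel array `dep`, and the functions `init`, `sweep_block`, `jacobi_tasks`, and `jacobi_barrier` are all hypothetical. In `jacobi_tasks`, each block update is a task that waits only for the three neighboring blocks of the previous sweep, while `jacobi_barrier` stands in for the barrier-based data-parallel style and synchronizes all threads after every sweep. Compiled without `-fopenmp`, the pragmas are ignored and both functions run serially in a valid order.

```c
#include <assert.h>

#define N    64            /* interior grid points (hypothetical size) */
#define BS   16            /* block size; each block is one task */
#define NBLK (N / BS)
_Static_assert(N % BS == 0, "N must be a multiple of BS");

/* Both buffers hold the interior plus one Dirichlet boundary cell per side. */
static void init(double u[2][N + 2], double left, double right) {
    for (int s = 0; s < 2; s++) {
        for (int i = 0; i < N + 2; i++) u[s][i] = 0.0;
        u[s][0] = left;
        u[s][N + 1] = right;
    }
}

/* One Jacobi sweep over block b: dst[i] = (src[i-1] + src[i+1]) / 2. */
static void sweep_block(const double *src, double *dst, int b) {
    int lo = 1 + b * BS;
    for (int i = lo; i < lo + BS; i++)
        dst[i] = 0.5 * (src[i - 1] + src[i + 1]);
}

/* Barrier-based version: with OpenMP, the implicit barrier at the end of
   each parallel loop makes every thread wait for the whole sweep. */
const double *jacobi_barrier(double u[2][N + 2], int iters,
                             double left, double right) {
    assert(iters >= 0);
    init(u, left, right);
    for (int t = 0; t < iters; t++) {
        const double *src = u[t % 2];
        double *dst = u[(t + 1) % 2];
        #pragma omp parallel for
        for (int b = 0; b < NBLK; b++)
            sweep_block(src, dst, b);
    }
    return u[iters % 2];
}

/* Task-based version: each block update waits only for the tasks that
   produce its three input blocks (and for earlier readers of its output
   block), so there is no global barrier between sweeps. */
const double *jacobi_tasks(double u[2][N + 2], int iters,
                           double left, double right) {
    /* dep[s][b + 1] names block b of buffer s; indices 0 and NBLK + 1 are
       never written and stand for the fixed boundary cells. */
    char dep[2][NBLK + 2];
    (void)dep; /* only referenced by the depend clauses */
    assert(iters >= 0);
    init(u, left, right);
    #pragma omp parallel
    #pragma omp single
    for (int t = 0; t < iters; t++) {
        int s = t % 2, d = 1 - s;
        for (int b = 0; b < NBLK; b++) {
            #pragma omp task firstprivate(s, d, b) shared(u, dep) \
                depend(in: dep[s][b], dep[s][b + 1], dep[s][b + 2]) \
                depend(out: dep[d][b + 1])
            sweep_block(u[s], u[d], b);
        }
    }
    /* The implicit barriers ending single and parallel wait for all tasks. */
    return u[iters % 2];
}
```

Because the dependences only allow independent block updates to overlap, both versions should produce bit-identical results. The `dep` array is never read; it only supplies one distinct address per block and buffer, a common idiom for naming task dependences independently of the data layout.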
Pages: 75-85
Number of pages: 11