ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures

Times cited: 44

Authors:

Broquedis, Francois [1]

Furmento, Nathalie [1]

Goglin, Brice [1]

Wacrenier, Pierre-Andre [1]

Namyst, Raymond [1]

Affiliation:

[1] Univ Bordeaux, LaBRI, INRIA Bordeaux Sud Ouest, F-33405 Talence, France

Source:

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING | 2010 / Vol. 38 / No. 5-6

Keywords:

OpenMP; Memory; NUMA; Hierarchical Thread Scheduling; Multi-Core; PERFORMANCE

DOI:

10.1007/s10766-010-0136-3

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data across the underlying non-uniform architecture so as to avoid remote memory access penalties. Directive-based programming languages such as OpenMP can greatly help to perform such a distribution by providing programmers with an easy way to structure the parallelism of their application and to transmit this information to the runtime system. Our runtime, which is based on a multi-level thread scheduler combined with a NUMA-aware memory manager, converts this information into scheduling hints related to thread-memory affinity. These hints enable dynamic load distribution guided by application structure and hardware topology, thus helping to achieve performance portability. Several experiments show that mixed solutions (migrating both threads and data) outperform work-stealing-based balancing strategies and next-touch-based data distribution policies. These techniques also provide insight into further optimizations.
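To make the thread/data placement problem in the abstract concrete, here is a minimal C sketch of the generic OpenMP "first-touch" idiom. It is not ForestGOMP's interface, and the helper names `alloc_first_touch` and `scaled_sum` are hypothetical. It assumes an operating system that places each page on the NUMA node of the thread that first writes it (the default policy on Linux), and that threads stay pinned between the two loops. The array is initialized with the same static schedule as the compute loop, so each thread typically reads memory on its own node rather than paying a remote-access penalty.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of ForestGOMP): allocate with malloc, which
 * for large blocks typically does not touch the pages, then let each OpenMP
 * thread write its own static chunk first. Under a first-touch policy the
 * OS places each page on the NUMA node of the thread that first writes it. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = (double)i;          /* first write decides page placement */

    return a;
}

/* Compute loop with the same static schedule, iteration count and thread
 * count, so each thread typically reads the chunk it initialized, i.e.
 * memory local to its NUMA node. */
double scaled_sum(const double *a, size_t n, double factor)
{
    double sum = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += factor * a[i];

    return sum;
}
```

Compile with `-fopenmp` (GCC or Clang) and pin threads, for example with `OMP_PROC_BIND=true`; without `-fopenmp` the pragmas are ignored and the code runs serially with the same result. A static placement like this cannot adapt when threads are rebalanced or the access pattern changes at run time, which is the situation the dynamic thread- and data-migration strategies discussed in the abstract address.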


Pages: 418-439

Number of pages: 22
