ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures

Cited by: 44
Authors
Broquedis, Francois [1]
Furmento, Nathalie [1]
Goglin, Brice [1]
Wacrenier, Pierre-Andre [1]
Namyst, Raymond [1]
Affiliations
[1] Univ Bordeaux, LaBRI, INRIA Bordeaux Sud Ouest, F-33405 Talence, France
Keywords
OpenMP; Memory; NUMA; Hierarchical Thread Scheduling; Multi-Core; Performance
DOI
10.1007/s10766-010-0136-3
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-uniform architecture so as to avoid remote memory access penalties. Directive-based programming languages such as OpenMP can greatly help to perform such a distribution by providing programmers with an easy way to structure the parallelism of their application and to transmit this information to the runtime system. Our runtime, which is based on a multi-level thread scheduler combined with a NUMA-aware memory manager, converts this information into scheduling hints related to thread-memory affinity issues. These hints enable dynamic load distribution guided by application structure and hardware topology, thus helping to achieve performance portability. Several experiments show that mixed solutions (migrating both threads and data) outperform work-stealing-based balancing strategies and next-touch-based data distribution policies. These techniques provide insights about additional optimizations.
Pages: 418-439
Page count: 22
Related papers
50 in total
  • [41] Scheduling dynamic OpenMP applications over multicore architectures
    Broquedis, Francois
    Diakhate, Francois
    Thibault, Samuel
    Aumage, Olivier
    Namyst, Raymond
    Wacrenier, Pierre-Andre
    OPENMP IN A NEW ERA OF PARALLELISM, PROCEEDINGS, 2008, 5004 : 170 - 180
  • [42] A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
    Ayguade, Eduard
    Badia, Rosa M.
    Cabrera, Daniel
    Duran, Alejandro
    Gonzalez, Marc
    Igual, Francisco
    Jimenez, Daniel
    Labarta, Jesus
    Martorell, Xavier
    Mayo, Rafael
    Perez, Josep M.
    Quintana-Orti, Enrique S.
    EVOLVING OPENMP IN AN AGE OF EXTREME PARALLELISM, 2009, 5568 : 154 - +
  • [43] Efficient Execution of OpenMP on GPUs
    Huber, Joseph
    Cornelius, Melanie
    Georgakoudis, Giorgis
    Tian, Shilei
    Diaz, Jose M. Monsalve
    Dinel, Kuter
    Chapman, Barbara
    Doerfert, Johannes
    CGO '22: PROCEEDINGS OF THE 2022 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO), 2022, : 41 - 52
  • [44] An efficient synchronization model for OpenMP
    Garcia Lopez, F. C.
    Frias Arrocha, N. L.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2006, 66 (11) : 1359 - 1365
  • [45] Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC
    Zaourar, Lilia
    Benazouz, Mohamed
    Mouhagir, Ayoub
    Falquez, Carlos
    Portero, Antoni
    Ho, Nam
    Suarez, Estela
    Petrakis, Polydoros
    Marazakis, Manolis
    Sgherzi, Francesco
    Fernandez, Ivan
    Dolbeau, Romain
    Pleiter, Dirk
    ARCHITECTURE OF COMPUTING SYSTEMS, ARCS 2024, 2024, 14842 : 251 - 265
  • [46] Multilevel parallelism optimization of stencil computations on SIMDlized NUMA architectures
    Zhang, Kaifang
    Su, Huayou
    Dou, Yong
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (11) : 13584 - 13600
  • [47] Performance analysis of four parallel programming models on NUMA architectures
    Mohamed, AS
    Cantonnet, F
    PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 2003, : 119 - 125
  • [48] Speculative Synchronization for Coherence-free Embedded NUMA Architectures
    Papagiannopoulou, Dimitra
    Moreshet, Tali
    Marongiu, Andrea
    Benini, Luca
    Herlihy, Maurice
    Bahar, R. Iris
    2014 INTERNATIONAL CONFERENCE ON EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION (SAMOS XIV), 2014, : 99 - 106
  • [49] Optimising MPI tree-based communication for NUMA architectures
    Karlsson, Christer
    Chen, Zizhong
    International Journal of Autonomous and Adaptive Communications Systems, 2015, 8 (04) : 407 - 423
  • [50] Resolving Load Balancing Issues in BWA on NUMA Multicore Architectures
    Herzeel, Charlotte
    Ashby, Thomas J.
    Costanza, Pascal
    De Meuter, Wolfgang
    PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2013), PT II, 2014, 8385 : 227 - 236