Porting, monitoring and tuning UPC on NUMA architectures

Cited by: 0

Authors:

Mohamed, AS [1]

Affiliation:

[1] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA

Source:

PDPTA'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS 1-4 | 2003

Keywords:

parallel C; P-threads; optimization; memory consistency;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

In this work we report on our experience in porting the NAS NPB benchmark using the recently developed GCC-SGI UPC compiler on the Origin 3800 NUMA machine. The SGI NUMA environment has provided new opportunities for UPC. For example, by coupling Unix P-threads with standard UPC threads, one is able to code solutions to problems using pipelining, divide-and-conquer, and speculative parallelization styles. This task-level parallelism was not previously possible in UPC, which relies mainly on distributed shared memory fine-grain data parallelism. This has led to multiple threads per processor and has provided further opportunities for optimization through load balancing. The SGI CC-NUMA environment also provided memory consistency optimizations that mask the latency of remote accesses, convert aggregate accesses into more efficient bulk operations, and cache data locally. UPC allows programmers to specify memory accesses with "relaxed" consistency semantics. These explicit consistency "hints" are exploited very effectively by the CC-NUMA environment to hide latency and further reduce coherence overheads by, for example, allowing two or more processors to modify their local copies of shared data concurrently and merging the modifications at synchronization points. This characteristic alleviates the effect of false sharing. Yet another opportunity, made possible by the spectrum of performance analysis and profiling tools within the SGI NUMA environment, is the development of a new monitoring and tuning strategy that aims at improving the efficiency of parallel UPC applications. We are able to project the physically monitored parameters back to the data structures and high-level program constructs within the UPC source code. This increases a programmer's ability to effectively understand, develop, and optimize UPC programs, enabling an exact analysis of a program's data and code layouts. Using this visualized information, programmers are able to detect communication, data/thread layout, and I/O bottlenecks and further optimize UPC programs with better data and thread layouts, potentially resulting in significant performance improvements.
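The idea mentioned in the abstract, letting each processor modify a local copy and merging the modifications at a synchronization point, can be illustrated with a minimal sketch. This is plain C with POSIX threads, not UPC, and it is not code from the paper; the names `slice_t`, `sum_slice`, and `parallel_sum` are hypothetical and chosen only for illustration. Each thread accumulates into a private variable and touches the shared result once, under a lock, which is one common way to avoid false sharing on a frequently updated shared location.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* illustrative upper bound, not from the paper */

/* Hypothetical per-thread work descriptor. */
typedef struct {
    const int *data;        /* shared input, read-only */
    size_t begin, end;      /* half-open slice [begin, end) for this thread */
    long *shared_sum;       /* shared result, written only at the merge */
    pthread_mutex_t *lock;  /* protects *shared_sum */
} slice_t;

static void *sum_slice(void *arg) {
    slice_t *s = arg;
    long local = 0;  /* private copy: updates cause no coherence traffic */
    for (size_t i = s->begin; i < s->end; i++)
        local += s->data[i];
    pthread_mutex_lock(s->lock);   /* synchronization point: merge once */
    *s->shared_sum += local;
    pthread_mutex_unlock(s->lock);
    return NULL;
}

/* Sums data[0..n) using nthreads threads.
 * Returns 0 on success and stores the sum in *out; returns -1 on error. */
int parallel_sum(const int *data, size_t n, int nthreads, long *out) {
    if (nthreads < 1 || nthreads > MAX_THREADS || out == NULL)
        return -1;

    pthread_t tid[MAX_THREADS];
    slice_t work[MAX_THREADS];
    pthread_mutex_t lock;
    long total = 0;
    int started = 0;

    if (pthread_mutex_init(&lock, NULL) != 0)
        return -1;

    /* Ceiling division so every element lands in exactly one slice. */
    size_t chunk = (n + (size_t)nthreads - 1) / (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        size_t b = (size_t)t * chunk;
        if (b > n) b = n;
        size_t e = b + chunk;
        if (e > n) e = n;
        work[t] = (slice_t){ data, b, e, &total, &lock };
        if (pthread_create(&tid[t], NULL, sum_slice, &work[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    pthread_mutex_destroy(&lock);

    if (started != nthreads)
        return -1;
    *out = total;
    return 0;
}
```

A caller would pass an array, its length, and a thread count, for example `parallel_sum(v, 10, 4, &result)`, and link with `-pthread`. In UPC itself the analogous effect would come from relaxed shared accesses combined with barriers, which this sketch only approximates.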

Pages: 1518-1525

Number of pages: 8
