(Mis)Understanding the NUMA Memory System Performance of Multithreaded Workloads

Authors:

Majo, Zoltan [1]

Gross, Thomas R. [1]

Affiliation:

[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

Source:

2013 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2013) | 2013

DOI:

Not available

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

An important aspect of workload characterization is understanding memory system performance (i.e., understanding a workload's interaction with the memory system). On systems with a non-uniform memory architecture (NUMA) the performance critically depends on the distribution of data and computations. The actual memory access patterns have a large influence on performance on systems with aggressive prefetcher units. This paper describes an analysis of the memory system performance of multithreaded programs and shows that some programs are (unintentionally) structured so that they use the memory system of today's NUMA-multicores inefficiently: Programs exhibit program-level data sharing, a performance-limiting factor that makes data and computation distribution in NUMA systems difficult. Moreover, many programs have irregular memory access patterns that are hard to predict by processor prefetcher units. The memory system performance as observed for a given program on a specific platform depends also on many algorithm and implementation decisions. The paper shows that a set of simple algorithmic changes coupled with commonly available OS functionality suffice to eliminate data sharing and to regularize the memory access patterns for a subset of the PARSEC parallel benchmarks. These simple source-level changes result in performance improvements of up to 3.1X, but more importantly, they lead to a fairer and more accurate performance evaluation on NUMA-multicore systems. They also illustrate the importance of carefully considering all details of algorithms and architectures to avoid drawing incorrect conclusions.

Pages: 11 - 22

Page count: 12

