(Mis)Understanding the NUMA Memory System Performance of Multithreaded Workloads

Cited by: 0
Authors
Majo, Zoltan [1]
Gross, Thomas R. [1]
Affiliation
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
Source
2013 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2013) | 2013
Keywords
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
An important aspect of workload characterization is understanding memory system performance (i.e., understanding a workload's interaction with the memory system). On systems with a non-uniform memory architecture (NUMA) the performance critically depends on the distribution of data and computations. The actual memory access patterns have a large influence on performance on systems with aggressive prefetcher units. This paper describes an analysis of the memory system performance of multithreaded programs and shows that some programs are (unintentionally) structured so that they use the memory system of today's NUMA-multicores inefficiently: Programs exhibit program-level data sharing, a performance-limiting factor that makes data and computation distribution in NUMA systems difficult. Moreover, many programs have irregular memory access patterns that are hard to predict by processor prefetcher units. The memory system performance as observed for a given program on a specific platform depends also on many algorithm and implementation decisions. The paper shows that a set of simple algorithmic changes coupled with commonly available OS functionality suffice to eliminate data sharing and to regularize the memory access patterns for a subset of the PARSEC parallel benchmarks. These simple source-level changes result in performance improvements of up to 3.1X, but more importantly, they lead to a fairer and more accurate performance evaluation on NUMA-multicore systems. They also illustrate the importance of carefully considering all details of algorithms and architectures to avoid drawing incorrect conclusions.
Pages: 11-22
Page count: 12