Predicting inter-thread cache contention on a chip multi-processor architecture

Cited by: 168
Authors
Chandra, D [1]
Guo, F [1]
Kim, S [1]
Solihin, Y [1]
Affiliation
[1] N Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA
DOI
10.1109/HPCA.2005.27
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper studies the impact of L2 cache sharing on threads that simultaneously share the cache, on a Chip Multi-Processor (CMP) architecture. Cache sharing impacts threads non-uniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as sub-optimal throughput, cache thrashing, and thread starvation for threads that fail to occupy sufficient cache space to make good progress. Unfortunately, there is no existing model that allows extensive investigation of the impact of cache sharing. To allow such a study, we propose three performance models that predict the impact of cache sharing on co-scheduled threads. The input to our models is the isolated L2 cache stack distance or circular sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the models is the number of extra L2 cache misses for each thread due to cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation that implements a dual-core CMP architecture, on fourteen pairs of mostly SPEC benchmarks. The most accurate model, the Inductive Probability model, achieves an average error of only 3.9%. Finally, to demonstrate the usefulness and practicality of the model, a case study that details the relationship between an application's temporal reuse behavior and its cache sharing impact is presented.
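The stack distance profile named in the abstract as a model input can be illustrated with a minimal sketch. It assumes a single cache set managed with LRU replacement and an associativity of `assoc`; the function names `stack_distance_profile` and `misses_with_ways` are hypothetical and are not taken from the paper. The sketch only shows how such a profile could be collected and read. It does not implement any of the three prediction models.

```python
def stack_distance_profile(addresses, assoc):
    """Collect counters [C1, ..., CA, C_gt_A] for one LRU-managed cache set.

    counters[i] (i < assoc) counts accesses that hit at LRU stack position i
    (0 = most recently used). counters[assoc] counts accesses that would miss
    in an assoc-way set, including first-time (cold) accesses.
    """
    stack = []  # LRU stack, most recently used line at index 0
    counters = [0] * (assoc + 1)
    for addr in addresses:
        if addr in stack:
            pos = stack.index(addr)  # 0-based stack distance
            counters[min(pos, assoc)] += 1
            stack.pop(pos)
        else:
            counters[assoc] += 1  # cold miss
        stack.insert(0, addr)  # accessed line becomes most recently used
    return counters


def misses_with_ways(counters, ways):
    """Misses the same trace would incur if only `ways` ways were available."""
    return sum(counters[ways:])


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "b", "a"]  # illustrative trace, not real data
    profile = stack_distance_profile(trace, assoc=4)
    print(profile)
    print(misses_with_ways(profile, 4), misses_with_ways(profile, 2))
```

Reading the profile this way shows why it is a convenient input: under LRU, the miss count for any smaller effective share of the set can be read directly from the tail of the counters, without replaying the trace.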
Pages: 340-351
Page count: 12