Selective Caching: Avoiding Performance Valleys in Massively Parallel Architectures

Cited by: 0
Authors
Jadidi, Amin [1 ]
Kandemir, Mahmut T. [2 ]
Das, Chita R. [2 ]
Affiliations
[1] Cadence Design Syst, San Jose, CA 95134 USA
[2] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
Keywords
DOI
10.1109/PDP50117.2020.00051
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline Code
0812;
Abstract
Emerging general-purpose graphics processing units (GPGPUs) use a memory hierarchy very similar to that of modern multi-core processors: they typically have multiple levels of on-chip caches and a DDR-like off-chip main memory. In such massively parallel architectures, caches are expected to reduce the average data-access latency by reducing the number of off-chip memory accesses; however, our extensive experimental studies confirm that not all applications utilize the on-chip caches efficiently. Even though GPGPUs are adopted to run a wide range of general-purpose applications, conventional cache management policies cannot achieve optimal performance across the differing memory characteristics of these applications. This paper first investigates the underlying reasons for the inefficiency of common cache management policies in GPGPUs. To address and resolve those issues, we then propose (i) a characterization mechanism that analyzes each kernel at runtime, and (ii) a selective caching policy that manages the flow of cache accesses. Evaluation results on the studied platform show that our proposed dynamically reconfigurable cache hierarchy improves system performance by up to 105% (27% on average) over a wide range of modern GPGPU applications, which is within 10% of the optimal improvement.
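The abstract's core idea, characterizing each kernel's memory behavior at runtime and then caching or bypassing accordingly, can be illustrated with a minimal sketch. This is not the paper's actual mechanism; the reuse metric, the `should_cache` name, and the 0.5 threshold are all illustrative assumptions.

```python
# Hypothetical sketch of selective caching: measure a kernel's data reuse,
# then route its accesses through the cache only if reuse is high enough.
# Metric and threshold are illustrative, not taken from the paper.
from collections import Counter

def reuse_ratio(addresses):
    """Fraction of accesses that revisit an address seen earlier."""
    if not addresses:
        return 0.0
    counts = Counter(addresses)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(addresses)

def should_cache(addresses, threshold=0.5):
    """Selective caching decision: cache only kernels with enough reuse."""
    return reuse_ratio(addresses) >= threshold

# A streaming kernel touches each address once, so caching buys nothing;
# a stencil-like kernel revisits a small working set, so caching helps.
streaming = list(range(100))          # 100 unique addresses, no reuse
stencil = [i % 10 for i in range(100)]  # 10 addresses, heavy reuse
print(should_cache(streaming))  # False -> bypass the cache
print(should_cache(stencil))    # True  -> use the cache
```

In the paper's setting this decision would be made in hardware per kernel, with streaming (low-reuse) workloads bypassing the L1 so they do not evict data that high-reuse workloads benefit from.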
Pages: 290-298
Page count: 9
Related Papers
50 items in total
  • [1] Classification of Massively Parallel Computer Architectures
    Shami, Muhammad Ali
    Hemani, Ahmed
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 344 - 351
  • [2] OPTICAL INTERCONNECTIONS FOR MASSIVELY PARALLEL ARCHITECTURES
    GUHA, A
    BRISTOW, J
    SULLIVAN, C
    HUSAIN, A
    APPLIED OPTICS, 1990, 29 (08): : 1077 - 1093
  • [3] Performance analysis of massively parallel embedded hardware architectures for retinal image processing
    Nieto, Alejandro
    Brea, Victor
    Vilarino, David L.
    Osorio, Roberto R.
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2011,
  • [4] Performance analysis of massively parallel embedded hardware architectures for retinal image processing
    Alejandro Nieto
    Victor Brea
    David L Vilariño
    Roberto R Osorio
    EURASIP Journal on Image and Video Processing, 2011
  • [5] High performance domain decomposition methods on massively parallel architectures with freefem++
    Jolivet, P.
    Dolean, V.
    Hecht, F.
    Nataf, F.
    Prud'Homme, C.
    Spillane, N.
    JOURNAL OF NUMERICAL MATHEMATICS, 2012, 20 (3-4) : 287 - 302
  • [6] Vortex Methods for Massively Parallel Computer Architectures
    Chatelain, Philippe
    Curioni, Alessandro
    Bergdorf, Michael
    Rossinelli, Diego
    Andreoni, Wanda
    Koumoutsakos, Petros
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2008, 2008, 5336 : 479 - +
  • [7] PFFT: AN EXTENSION OF FFTW TO MASSIVELY PARALLEL ARCHITECTURES
    Pippig, Michael
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2013, 35 (03): : C213 - C236
  • [8] Analytical approach to massively parallel architectures for nanotechnologies
    Jäger, B
    Niemann, JC
    Rückert, U
    16TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURE AND PROCESSORS, PROCEEDINGS, 2005, : 268 - 275
  • [9] A multiprocessor cache for massively parallel SoC architectures
    Niemann, Jorg-Christian
    LiB, Christian
    Porrmann, Mario
    Rueckert, Ulrich
    ARCHITECTURE OF COMPUTING SYSTEMS - ARCS 2007, PROCEEDINGS, 2007, 4415 : 83 - +
  • [10] COMPILING FOR MASSIVELY-PARALLEL ARCHITECTURES - A PERSPECTIVE
    FEAUTRIER, P
    MICROPROCESSING AND MICROPROGRAMMING, 1995, 41 (5-6): : 425 - 439