Selective Caching: Avoiding Performance Valleys in Massively Parallel Architectures

Cited by: 0

Authors:

Jadidi, Amin [1]

Kandemir, Mahmut T. [2]

Das, Chita R. [2]

Affiliations:

[1] Cadence Design Systems, San Jose, CA 95134, USA

[2] Pennsylvania State University, Department of Computer Science and Engineering, University Park, PA 16802, USA

Source:

2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020) | 2020

Keywords:

DOI:

10.1109/PDP50117.2020.00051

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Emerging general-purpose graphics processing units (GPGPUs) use a memory hierarchy very similar to that of modern multi-core processors: they typically have multiple levels of on-chip caches and a DDR-like off-chip main memory. In such massively parallel architectures, caches are expected to reduce the average data access latency by reducing the number of off-chip memory accesses; however, our extensive experimental studies confirm that not all applications utilize the on-chip caches efficiently. Even though GPGPUs are adopted to run a wide range of general-purpose applications, conventional cache management policies cannot achieve optimal performance across the different memory characteristics of these applications. This paper first investigates the underlying reasons for the inefficiency of common cache management policies in GPGPUs. To address those issues, we then propose (i) a characterization mechanism to analyze each kernel at runtime, and (ii) a selective caching policy to manage the flow of cache accesses. Evaluation results on the studied platform show that our proposed dynamically reconfigurable cache hierarchy improves system performance by up to 105% (27% on average) over a wide range of modern GPGPU applications, which is within 10% of the optimal improvement.


Pages: 290-298

Page count: 9
