DyCache: Dynamic Multi-Grain Cache Management for Irregular Memory Accesses on GPU

Cited by: 6
Authors
Guo, Hui [1 ]
Huang, Libo [2 ]
Lu, Yashuai [4 ]
Ma, Sheng [2 ]
Wang, Zhiying [3 ]
Affiliations
[1] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha 410073, Hunan, Peoples R China
[3] Natl Univ Def Technol, Comp Engn, Dept Comp, Changsha 410073, Hunan, Peoples R China
[4] Space Engn Univ, Beijing 101416, Peoples R China
Source
IEEE ACCESS | 2018, Vol. 6
Keywords
Accelerator architectures; cache memory; fine-grain cache management; GPGPU computing; irregular memory access; memory divergence; memory management;
DOI
10.1109/ACCESS.2018.2818193
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
GPUs use a wide cache line (128B) in the on-chip cache to provide high bandwidth and efficient memory accesses for applications with regularly organized data structures. However, emerging applications exhibit many irregular control flows and memory access patterns. Irregular memory accesses generate many fine-grain requests to the L1 data cache. This mismatch between fine-grain data accesses and the coarse-grain cache design makes the on-chip memory space more constrained; as a result, the frequency of cache-line replacement increases and the L1 data cache is utilized inefficiently. Fine-grain cache management has been proposed to improve the efficiency of data-array utilization. Unlike previous static fine-grain cache management schemes, we propose a dynamic multi-grain cache management scheme, called DyCache, to resolve the inefficient use of the L1 data cache. By monitoring an application's memory access pattern, DyCache dynamically adjusts the cache management granularity to improve GPU performance for applications with irregular memory accesses without impacting the performance of regular applications. Our experiments demonstrate that DyCache achieves a 40% geometric-mean IPC improvement for applications with irregular memory accesses over the baseline cache (128B), while not degrading performance for applications with regular memory accesses.
Pages: 38881-38891
Page count: 11