DyCache: Dynamic Multi-Grain Cache Management for Irregular Memory Accesses on GPU

Cited by: 6
Authors
Guo, Hui [1 ]
Huang, Libo [2 ]
Lu, Yashuai [4 ]
Ma, Sheng [2 ]
Wang, Zhiying [3 ]
Affiliations
[1] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Sch Comp, Changsha 410073, Hunan, Peoples R China
[3] Natl Univ Def Technol, Comp Engn, Dept Comp, Changsha 410073, Hunan, Peoples R China
[4] Space Engn Univ, Beijing 101416, Peoples R China
Source
IEEE ACCESS | 2018, Vol. 6
Keywords
Accelerator architectures; cache memory; fine-grain cache management; GPGPU computing; irregular memory access; memory divergence; memory management;
DOI
10.1109/ACCESS.2018.2818193
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
GPUs use a wide cache line (128B) in the on-chip cache to provide high bandwidth and efficient memory accesses for applications with regularly organized data structures. However, emerging applications exhibit many irregular control flows and memory access patterns. Irregular memory accesses generate many fine-grain requests to the L1 data cache. This mismatch between fine-grain data accesses and the coarse-grain cache design makes the on-chip memory space more constrained; as a result, the frequency of cache-line replacement increases and the L1 data cache is utilized inefficiently. Fine-grain cache management has been proposed to improve the efficiency of data-array utilization. Unlike previous static fine-grain cache management schemes, we propose a dynamic multi-grain cache management scheme, called DyCache, to resolve the inefficient use of the L1 data cache. By monitoring an application's memory access pattern, DyCache dynamically adjusts the cache management granularity to improve GPU performance for applications with irregular memory accesses without impacting the performance of regular applications. Our experiments demonstrate that DyCache achieves a 40% geometric-mean IPC improvement for applications with irregular memory accesses over the baseline cache (128B), while not degrading performance for applications with regular memory accesses.
Pages: 38881-38891
Page count: 11