Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors

Authors
Jeong, Jong Hyun [1]
Yoon, Myung Kuk [2]
Oh, Yunho [1]
Koo, Gunjae [1]
Affiliations
[1] Korea Univ, Seoul, South Korea
[2] Ewha Womans Univ, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
GPU Architecture; Memory System; Memory Controller; CACHE MANAGEMENT; SUITE;
DOI
10.1145/3605573.3605645
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The performance of a GPU's external memory is becoming more critical because a modern GPU runs thousands of concurrent threads that demand a huge volume of data. To use the resources of the memory hierarchy more efficiently, GPUs employ memory coalescing to reduce the number of demand requests generated by a group of threads (i.e., a warp). However, memory coalescing does not work well for applications with irregular memory access patterns, so a single warp can generate multiple memory transactions. Because these requests are serviced by different hierarchy levels and/or memory partitions, multiple outstanding requests from a single warp exhibit divergent fetch latencies. Since the execution time of a load warp is determined by its slowest memory transaction, memory latency divergence within a warp is a critical performance factor for load warps. In this paper, we propose a warp-aware memory controller scheme, called Warped-MC, to mitigate memory latency divergence. Through in-depth analysis, we show that memory latency divergence within a warp is mainly caused by GPU memory controllers. While the conventional FR-FCFS memory controller can maximize the effective bandwidth of DRAM channels, its scheduling scheme can exacerbate the memory latency divergence of a warp. Warped-MC employs a warp-aware scheduling scheme to alleviate memory latency divergence and thereby tackles the long tail of load warp execution time, improving the performance of memory-intensive applications. We implement Warped-MC on GPGPU-Sim configured with a modern GPU architecture, and our evaluation shows that Warped-MC improves the performance of memory-intensive applications by 8.9% on average and by up to 45.8%.
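The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the Warped-MC algorithm, which the abstract does not specify. It assumes a single DRAM bank, hypothetical service costs (`ROW_HIT`, `ROW_MISS`), and a made-up `warp_grouped` policy that simply finishes the oldest warp first. It compares that policy with a simplified FR-FCFS and reports, per warp, the stall time (completion of the slowest request) and the latency divergence (slowest minus fastest).

```python
from collections import defaultdict

# Hypothetical service costs in arbitrary time units (assumptions, not from the paper).
ROW_HIT, ROW_MISS = 1, 3


def simulate(requests, pick):
    """Serve (warp_id, row) requests on one bank; return {warp_id: [completion times]}."""
    pending = list(requests)  # arrival order, oldest first
    open_row, now = None, 0
    done = defaultdict(list)
    while pending:
        req = pick(pending, open_row)
        pending.remove(req)
        warp, row = req
        now += ROW_HIT if row == open_row else ROW_MISS
        open_row = row
        done[warp].append(now)
    return dict(done)


def fr_fcfs(pending, open_row):
    """Simplified FR-FCFS: oldest row-hit request first, otherwise the oldest request."""
    for req in pending:
        if req[1] == open_row:
            return req
    return pending[0]


def warp_grouped(pending, open_row):
    """Toy warp-aware policy (an assumption): drain the oldest warp, row hits first."""
    oldest_warp = pending[0][0]
    same_warp = [r for r in pending if r[0] == oldest_warp]
    for req in same_warp:
        if req[1] == open_row:
            return req
    return same_warp[0]


def warp_stats(done):
    """Per warp: (stall time = slowest request, divergence = slowest - fastest)."""
    return {w: (max(t), max(t) - min(t)) for w, t in done.items()}


# Warp 0 touches rows A and B; warp 1 sends three requests to row A.
requests = [(0, "A"), (0, "B"), (1, "A"), (1, "A"), (1, "A")]
print("FR-FCFS     :", warp_stats(simulate(requests, fr_fcfs)))
print("warp-grouped:", warp_stats(simulate(requests, warp_grouped)))
```

In this toy trace, the simplified FR-FCFS finishes all requests in 9 time units but delays warp 0's second request behind warp 1's row hits, giving warp 0 a divergence of 6. The warp-grouped policy cuts warp 0's divergence to 3 at the cost of extra row misses (11 time units in total). This mirrors the tension between bandwidth and latency divergence that the abstract describes.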
Pages: 546-555
Number of pages: 10