GPU Performance Optimization via Intergroup Cache Cooperation

Authors
Wang, Guosheng [1]
Du, Yajuan [1, 2]
Huang, Weiming [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430070, Peoples R China
[2] Wuhan Univ Technol, Shenzhen Res Inst, Shenzhen 518000, Peoples R China
Keywords
Integrated circuits; Design automation; Graphics processing units; Computer architecture; Bidirectional control; Bandwidth; Benchmark testing; System-on-chip; Optimization; Cache; cooperation; GPU; hit ratio
DOI
10.1109/TCAD.2024.3443707
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern GPUs integrate a multilevel cache hierarchy to provide high bandwidth and mitigate the memory wall problem. However, the on-chip cache still falls far short of delivering optimal performance. In this article, we investigate the existing cache architecture and find that cache utilization is imbalanced and that serious data duplication exists among L1 cache groups. To exploit the duplicate data, we propose an intergroup cache cooperation (ICC) method that establishes cooperation across L1 cache groups. According to the cooperation scope, we design two schemes: adjacent cache cooperation (ICC-AGC) and multiple cache cooperation (ICC-MGC). In ICC-AGC, we design an adjacent cooperative directory table to detect duplicate data and integrate a lightweight network for communication. In ICC-MGC, a bidirectional ring network connects multiple groups. We also present a two-way sending mechanism and a dynamic sending mechanism to balance the overhead and efficiency of request probing and sending. Evaluation results show that the two proposed ICC methods reduce the average traffic to the L2 cache by 10% and 20%, respectively, and improve overall GPU performance by 19% and 49% on average, respectively, compared with existing work.
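The abstract names a cooperative directory and a bidirectional ring between L1 cache groups but gives no implementation detail. The following is only a minimal illustrative sketch, not the authors' design: it assumes a hypothetical directory that maps a cache-line address to the group holding a copy, and it routes a missed request the shorter way around the ring. The names `ring_route`, `serve_l1_miss`, and `directory` are invented for illustration, and the paper's two-way and dynamic sending mechanisms are not modeled here.

```python
from typing import Dict, Tuple


def ring_route(src: int, dst: int, num_groups: int) -> Tuple[str, int]:
    """Pick the shorter direction around a bidirectional ring.

    Returns (direction, hops), where direction is "cw" (clockwise)
    or "ccw" (counterclockwise). Ties go clockwise.
    """
    cw = (dst - src) % num_groups   # hops when travelling clockwise
    ccw = (src - dst) % num_groups  # hops when travelling counterclockwise
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)


def serve_l1_miss(addr: int, src: int, directory: Dict[int, int],
                  num_groups: int) -> Tuple[str, int]:
    """Decide where an L1 miss in group `src` is served.

    `directory` is a hypothetical table: line address -> group id holding
    a duplicate copy. If no other group holds the line, fall back to L2.
    """
    owner = directory.get(addr)
    if owner is None or owner == src:
        return ("L2", 0)  # no remote copy known: go to the L2 cache
    direction, hops = ring_route(src, owner, num_groups)
    return (f"group{owner}:{direction}", hops)
```

For example, with eight groups, a miss in group 0 on a line that the directory places in group 6 would travel two hops counterclockwise rather than six hops clockwise, while a line absent from the directory goes to L2 as usual.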
Pages: 4142 - 4153
Page count: 12
Related papers
50 in total
  • [21] Cache Coherence for GPU Architectures
    Singh, Inderpreet
    Shriraman, Arrvindh
    Fung, Wilson W. L.
    O'Connor, Mike
    Aamodt, Tor M.
    19TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA2013), 2013, : 578 - 590
  • [22] CACHE COHERENCE FOR GPU ARCHITECTURES
    Singh, Inderpreet
    Shriraman, Arrvindh
    Fung, Wilson W. L.
    O'Connor, Mike
    Aamodt, Tor M.
    IEEE MICRO, 2014, 34 (03) : 69 - 79
  • [23] Performance Optimization Strategies of High Performance Computing on GPU
    Ma, Anguo
    Cai, Jing
    Cheng, Yu
    Ni, Xiaoqiang
    Tang, Yuxing
    Xing, Zuocheng
    ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2009, 5737 : 150 - 164
  • [24] INTERGROUP AND INTRAGROUP COMPETITION AND COOPERATION
    GOLDMAN, M
    STOCKBAUER, JW
    MCAULIFFE, TG
    JOURNAL OF EXPERIMENTAL SOCIAL PSYCHOLOGY, 1977, 13 (01) : 81 - 88
  • [25] SOCIAL DIFFERENTIATION IN INTERGROUP COOPERATION
    BECK, D
    INTERNATIONAL JOURNAL OF SMALL GROUP RESEARCH, 1987, 3 (02) : 221 - 223
  • [26] SOCIAL DIFFERENTIATION IN INTERGROUP COOPERATION
    BECK, D
    INTERNATIONAL JOURNAL OF SMALL GROUP RESEARCH, 1988, 4 (01) : 3 - 29
  • [27] Performance Evaluations of Document-Oriented Databases using GPU and Cache Structure
    Morishima, Shin
    Matsutani, Hiroki
    2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 3, 2015, : 108 - 115
  • [28] RDGC: A REUSE DISTANCE-BASED APPROACH TO GPU CACHE PERFORMANCE ANALYSIS
    Kiani, Mohsen
    Rajabzadeh, Amir
    COMPUTING AND INFORMATICS, 2019, 38 (02) : 421 - 453
  • [29] Performance Drawbacks for Matrix Multiplication using Set Associative Cache in GPU devices
    Djinevski, Leonid
    Arsenovski, Sime
    Ristov, Sasko
    Gusev, Marjan
    2013 36TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2013, : 193 - 198
  • [30] GPU performance optimization targeting OpenCL model
    Chen, Gang
    Wu, Baifeng
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2011, 23 (04) : 571 - 581