GPU Performance Optimization via Intergroup Cache Cooperation

Authors
Wang, Guosheng [1]
Du, Yajuan [1, 2]
Huang, Weiming [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430070, Peoples R China
[2] Wuhan Univ Technol, Shenzhen Res Inst, Shenzhen 518000, Peoples R China
Keywords
Integrated circuits; Design automation; Graphics processing units; Computer architecture; Bidirectional control; Bandwidth; Benchmark testing; System-on-chip; Optimization; Cache; cooperation; GPU; hit ratio;
DOI
10.1109/TCAD.2024.3443707
CLC number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Modern GPUs integrate a multilevel cache hierarchy to provide high bandwidth and mitigate the memory wall problem. However, the on-chip cache still falls far short of delivering optimal performance. In this article, we investigate the existing cache architecture and find that cache utilization is imbalanced and that there is serious data duplication among L1 cache groups. To exploit the duplicated data, we propose an intergroup cache cooperation (ICC) method that establishes cooperation across L1 cache groups. According to the cooperation scope, we design two schemes: adjacent cache cooperation (ICC-AGC) and multiple cache cooperation (ICC-MGC). In ICC-AGC, we design an adjacent cooperative directory table to detect duplicate data and integrate a lightweight network for communication. In ICC-MGC, a bidirectional ring network is designed to connect multiple groups. We also present a two-way sending mechanism and a dynamic sending mechanism to balance the overhead and efficiency of request probing and sending. Evaluation results show that the two proposed ICC methods reduce the average traffic to the L2 cache by 10% and 20%, respectively, and improve overall GPU performance by 19% and 49% on average, respectively, compared with existing work.
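The abstract does not say how the directory table or the ring probing is implemented. As a rough illustration only, the Python sketch below models L1 cache groups as nodes on a ring and shows the two lookup styles the abstract names: an adjacent lookup that consults a directory of which groups hold a line (in the spirit of ICC-AGC), and a probe that travels in both directions around a bidirectional ring up to a hop limit (in the spirit of ICC-MGC with two-way sending). The class `CooperativeDirectory`, its methods, and the `max_hops` parameter are hypothetical names chosen for this sketch, not the authors' design, and the dynamic sending mechanism is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CooperativeDirectory:
    """Hypothetical directory of which L1 cache groups hold each line address.

    Groups are numbered 0..num_groups-1 and arranged on a ring. The directory
    is updated on every L1 fill and eviction (an assumption of this sketch).
    """

    num_groups: int
    holders: dict[int, set[int]] = field(default_factory=dict)

    def fill(self, gid: int, addr: int) -> None:
        # Record that group `gid` now caches line `addr`.
        self.holders.setdefault(addr, set()).add(gid)

    def evict(self, gid: int, addr: int) -> None:
        # Forget group `gid`'s copy of `addr` (no-op if it had none).
        self.holders.get(addr, set()).discard(gid)

    def holds(self, gid: int, addr: int) -> bool:
        # Stands in for the tag check a probe would perform at group `gid`.
        return gid in self.holders.get(addr, set())

    def adjacent_holder(self, gid: int, addr: int) -> int | None:
        """AGC-style lookup: after a local miss, check only the two ring neighbours."""
        for nb in ((gid - 1) % self.num_groups, (gid + 1) % self.num_groups):
            if self.holds(nb, addr):
                return nb
        return None  # no neighbour holds the line: fetch from L2

    def ring_probe(self, gid: int, addr: int, max_hops: int) -> tuple[int | None, int]:
        """MGC-style lookup with two-way sending on a bidirectional ring.

        Probes leave in both directions at once, so the first hit found is the
        nearest holder. Returns (holder, hops), or (None, 0) on a miss.
        """
        # Every group is at most num_groups // 2 hops away in one direction.
        reach = min(max_hops, self.num_groups // 2)
        for hop in range(1, reach + 1):
            for step in (hop, -hop):
                g = (gid + step) % self.num_groups
                if self.holds(g, addr):
                    return g, hop
        return None, 0  # no group within reach holds the line: fetch from L2


if __name__ == "__main__":
    d = CooperativeDirectory(num_groups=8)
    d.fill(3, 0x40)
    d.fill(6, 0x80)
    print(d.adjacent_holder(2, 0x40))  # 3
    print(d.ring_probe(0, 0x80, 4))    # (6, 2)
    print(d.ring_probe(0, 0x40, 2))    # (None, 0)
```

A single shared table keeps the sketch short; it says nothing about how the paper sizes or distributes its directory, or how probes are scheduled on the real interconnect.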
Pages: 4142-4153
Number of pages: 12