GPU Performance Optimization via Intergroup Cache Cooperation

Times cited: 0
Authors
Wang, Guosheng [1]
Du, Yajuan [1, 2]
Huang, Weiming [1]
Affiliations
[1] Wuhan University of Technology, School of Computer Science and Technology, Wuhan 430070, People's Republic of China
[2] Wuhan University of Technology, Shenzhen Research Institute, Shenzhen 518000, People's Republic of China
Keywords
Integrated circuits; Design automation; Graphics processing units; Computer architecture; Bidirectional control; Bandwidth; Benchmark testing; System-on-chip; Optimization; Cache; cooperation; GPU; hit ratio;
DOI
10.1109/TCAD.2024.3443707
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Modern GPUs integrate a multilevel cache hierarchy to provide high bandwidth and mitigate the memory wall problem. However, on-chip caches still fall far short of delivering optimal performance. In this article, we investigate the existing cache architecture and find that cache utilization is imbalanced and that serious data duplication exists among L1 cache groups. To exploit the duplicate data, we propose an intergroup cache cooperation (ICC) method that establishes cooperation across L1 cache groups. According to the cooperation scope, we design two schemes: adjacent cache cooperation (ICC-AGC) and multiple cache cooperation (ICC-MGC). In ICC-AGC, we design an adjacent cooperative directory table to detect duplicate data and integrate a lightweight network for communication. In ICC-MGC, a bidirectional ring network connects multiple groups. We also present a two-way sending mechanism and a dynamic sending mechanism to balance the overhead and efficiency of request probing and sending. Evaluation results show that, compared with existing work, the two proposed ICC methods reduce average traffic to the L2 cache by 10% and 20%, respectively, and improve overall GPU performance by 19% and 49% on average, respectively.
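The idea of probing remote L1 groups over a bidirectional ring, as in ICC-MGC, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' design: the names `CacheGroup`, `ring_hops`, and `two_way_probe`, the per-group set of resident line addresses, and the `max_hops` probing limit are all hypothetical simplifications, and the paper's actual two-way and dynamic sending mechanisms are not reproduced. It assumes N groups on a ring and, on a local miss, probes neighbors in both directions at increasing hop distance, falling back to L2 when no copy is found within the limit. On a bidirectional ring the worst-case distance between two groups is floor(N/2) hops rather than N-1 on a unidirectional one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CacheGroup:
    """Hypothetical stand-in for one L1 cache group: only the set of resident line addresses."""
    gid: int
    lines: set[int] = field(default_factory=set)


def ring_hops(src: int, dst: int, n: int) -> int:
    """Shortest hop count between two nodes on a bidirectional ring of n nodes."""
    d = abs(src - dst) % n
    return min(d, n - d)


def two_way_probe(groups: list[CacheGroup], src: int, addr: int,
                  max_hops: int) -> tuple[int, int] | None:
    """Called after a local miss in group `src` (assumption of this toy model).

    Probes the clockwise and counter-clockwise neighbors at distance 1, 2, ...
    up to max_hops (and never past half the ring). Returns (group_id, hops)
    of the nearest remote copy, or None, meaning the request goes to L2.
    """
    n = len(groups)
    for hop in range(1, min(max_hops, n // 2) + 1):
        for step in (+hop, -hop):  # both ring directions at the same distance
            gid = (src + step) % n
            if addr in groups[gid].lines:
                return gid, hop
    return None


if __name__ == "__main__":
    groups = [CacheGroup(i) for i in range(8)]
    groups[6].lines.add(0x40)  # only group 6 holds line 0x40
    print(two_way_probe(groups, 0, 0x40, max_hops=4))  # (6, 2)
    print(two_way_probe(groups, 0, 0x40, max_hops=1))  # None -> fetch from L2
```

With eight groups and a copy resident only in group 6, a miss in group 0 finds it two hops away in the counter-clockwise direction; shrinking `max_hops` to 1 sends the request to L2 instead. Bounding the search in this way is one simple knob for trading probe traffic against remote hits, the kind of overhead-versus-efficiency balance the abstract attributes to its sending mechanisms.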
Pages: 4142-4153
Number of pages: 12