GPU Performance Optimization via Intergroup Cache Cooperation

Cited by: 0

Authors:

Wang, Guosheng [1]

Du, Yajuan [1,2]

Huang, Weiming [1]

Affiliations:

[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430070, Peoples R China

[2] Wuhan Univ Technol, Shenzhen Res Inst, Shenzhen 518000, Peoples R China

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2024, Vol. 43, No. 11

Keywords:

Integrated circuits; Design automation; Graphics processing units; Computer architecture; Bidirectional control; Bandwidth; Benchmark testing; System-on-chip; Optimization; Cache; cooperation; GPU; hit ratio;

DOI:

10.1109/TCAD.2024.3443707

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Modern GPUs integrate a multilevel cache hierarchy to provide high bandwidth and mitigate the memory wall problem. However, the benefit of the on-chip cache is still far from optimal. In this article, we investigate the existing cache architecture and find that cache utilization is imbalanced and that serious data duplication exists among L1 cache groups. To exploit the duplicate data, we propose an intergroup cache cooperation (ICC) method that establishes cooperation across L1 cache groups. According to the cooperation scope, we design two schemes: adjacent cache cooperation (ICC-AGC) and multiple cache cooperation (ICC-MGC). In ICC-AGC, we design an adjacent cooperative directory table to detect duplicate data and integrate a lightweight network for communication. In ICC-MGC, a bidirectional ring network connects multiple groups. We also present a two-way sending mechanism and a dynamic sending mechanism to balance the overhead and efficiency of request probing and sending. Evaluation results show that the two proposed ICC methods reduce the average traffic to the L2 cache by 10% and 20%, respectively, and improve overall GPU performance by 19% and 49% on average, respectively, compared with existing work.
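The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general idea of probing other L1 cache groups over a bidirectional ring before falling back to L2. Everything in it is an assumption made for illustration and not the authors' design: the names `ring_probe` and `access`, the `max_hops` parameter, and the modelling of each cache group as a plain set of addresses.

```python
from typing import List, Optional, Set, Tuple


def ring_probe(groups: List[Set[int]], src: int, addr: int,
               max_hops: int) -> Optional[Tuple[int, int]]:
    """Probe both ring directions from group `src` for `addr`.

    Hypothetical model: `groups[i]` is the set of addresses cached in
    L1 group i. Returns (group_index, hops) for the nearest remote
    group holding `addr`, or None if no group within `max_hops` has it.
    """
    n = len(groups)
    for hops in range(1, max_hops + 1):
        # Two-way probing: check the group `hops` steps away in each direction.
        for g in ((src + hops) % n, (src - hops) % n):
            if g != src and addr in groups[g]:
                return g, hops
    return None


def access(groups: List[Set[int]], src: int, addr: int,
           max_hops: int = 2) -> str:
    """Classify an access as a local hit, a remote hit, or an L2 access."""
    if addr in groups[src]:
        return "local_hit"
    if ring_probe(groups, src, addr, max_hops) is not None:
        return "remote_hit"
    return "l2_access"


if __name__ == "__main__":
    # Four hypothetical L1 groups, each caching one address.
    groups = [{1}, {2}, {3}, {4}]
    print(access(groups, 0, 1))              # address cached locally
    print(access(groups, 0, 4, max_hops=1))  # cached in a ring neighbour
    print(access(groups, 0, 3, max_hops=1))  # too far away, goes to L2
```

In this toy model, a larger `max_hops` turns more L2 accesses into remote hits at the cost of more probe messages, which is the kind of overhead-versus-efficiency trade-off the abstract says the sending mechanisms are meant to balance.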

Pages: 4142 - 4153

Page count: 12

50 entries in total

[1] Improving GPU Cache Hierarchy Performance with a Fetch and Replacement Cache
Candel, Francisco
Petit, Salvador
Valero, Alejandro
Sahuquillo, Julio
EURO-PAR 2018: PARALLEL PROCESSING, 2018, 11014 : 235 - 248
[2] Cache Reuse Aware Replacement Policy for Improving GPU Cache Performance
Son, Dong Oh
Kim, Gwang Bok
Kim, Jong Myon
Kim, Cheol Hong
IT CONVERGENCE AND SECURITY 2017, VOL 2, 2018, 450 : 127 - 133
[3] Cache performance and algorithm optimization
Qiao, XZ
HIGH PERFORMANCE COMPUTING ON THE INFORMATION SUPERHIGHWAY - HPC ASIA '97, PROCEEDINGS, 1997, : 12 - 17
[4] Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU
Liu, Li
Yang, Guangwen
High Technology Letters, 2013, 19 (04) : 339 - 345
[5] Performance Evaluation and Optimization on GPU
Gan, Xinbiao
Shen, Li
Tan, Quanyuan
Liu, Cong
Wang, Zhiying
ADVANCED RESEARCH ON INFORMATION SCIENCE, AUTOMATION AND MATERIAL SYSTEM, PTS 1-6, 2011, 219-220 : 1445 - +
[6] Impact of L2 Cache Locking on GPU Performance
Picchi, John
Zhang, Wei
IEEE SOUTHEASTCON 2015, 2015,
[7] High-Performance with an In-GPU Graph Database Cache
Morishima, Shin
Matsutani, Hiroki
IT PROFESSIONAL, 2017, 19 (06) : 58 - 64
[8] Understanding and Optimizing GPU Cache Memory Performance for Compute Workloads
Choo, Kyoshin
Panlener, William
Jang, Byunghyun
2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC), 2014, : 189 - 196
[9] Evolving intergroup cooperation
Bausch, Andrew W.
COMPUTATIONAL AND MATHEMATICAL ORGANIZATION THEORY, 2014, 20 (04) : 369 - 393
[10] Reducing intergroup bias: Elements of intergroup cooperation
Gaertner, SL
Dovidio, JF
Rust, MC
Nier, JA
Banker, BS
Ward, CM
Mottola, GR
Houlette, M
JOURNAL OF PERSONALITY AND SOCIAL PSYCHOLOGY, 1999, 76 (03) : 388 - 402