Optimizing GPU Cache Policies for MI Workloads

Cited by: 0

Authors:

Alsop, Johnathan [1]

Sinclair, Matthew D. [1,2]

Bharadwaj, Srikant [1]

Dutu, Alexandru [1]

Gutierrez, Anthony [1]

Kayiran, Onur [1]

LeBeane, Michael [1]

Potter, Brandon [1]

Puthoor, Sooraj [1,2]

Zhang, Xianwei [1]

Yeh, Tsung Tai [3]

Beckmann, Bradford M. [1]

Affiliations:

[1] AMD Res, Urbana, IL 61801 USA

[2] Univ Wisconsin, Madison, WI 53706 USA

[3] Purdue Univ, W Lafayette, IN 47907 USA

Source:

PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2019) | 2019

Keywords:

execution-driven simulation; GPU caching; machine intelligence; machine learning

DOI:

Not available

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification code:

081203; 0835

Abstract:

In recent years, machine intelligence (MI) applications have emerged as a major driver for the computing industry. Optimizing these workloads is important, but complicated. As memory demands grow and data movement overheads increasingly limit performance, determining the best GPU caching policy to use for a diverse range of MI workloads represents one important challenge. To study this, we evaluate 17 MI applications and characterize their behavior using a range of GPU caching strategies. In our evaluations, we find that the choice of caching policy in GPU caches involves multiple performance trade-offs and interactions, and there is no one-size-fits-all GPU caching policy for MI workloads. Based on detailed simulation results, we motivate and evaluate a set of cache optimizations that consistently match the performance of the best static GPU caching policies.


Pages: 243-248

Page count: 6
