Optimizing GPU Cache Policies for MI Workloads

Authors
Alsop, Johnathan [1]
Sinclair, Matthew D. [1,2]
Bharadwaj, Srikant [1]
Dutu, Alexandru [1]
Gutierrez, Anthony [1]
Kayiran, Onur [1]
LeBeane, Michael [1]
Potter, Brandon [1]
Puthoor, Sooraj [1,2]
Zhang, Xianwei [1]
Yeh, Tsung Tai [3]
Beckmann, Bradford M. [1]
Affiliations
[1] AMD Research, Urbana, IL 61801, USA
[2] University of Wisconsin, Madison, WI 53706, USA
[3] Purdue University, West Lafayette, IN 47907, USA
Keywords
execution-driven simulation; GPU caching; machine intelligence; machine learning
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In recent years, machine intelligence (MI) applications have emerged as a major driver for the computing industry. Optimizing these workloads is important but complicated. As memory demands grow and data movement overheads increasingly limit performance, one important challenge is determining the best GPU caching policy for a diverse range of MI workloads. To study this, we evaluate 17 MI applications and characterize their behavior under a range of GPU caching strategies. We find that the choice of GPU caching policy involves multiple performance trade-offs and interactions, and that no one-size-fits-all GPU caching policy exists for MI workloads. Based on detailed simulation results, we motivate and evaluate a set of cache optimizations that consistently match the performance of the best static GPU caching policies.
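The abstract's central finding, that no single static caching policy suits every workload, can be illustrated with a toy model. The Python sketch below is not the authors' simulator (the keywords indicate they used execution-driven simulation) and does not reproduce any policy from the paper. The fully associative LRU cache, the "bypass" policy, the per-access `candidate` flag, and both traces are hypothetical, chosen only to show how the better static policy can flip between two access patterns.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that holds up to `capacity` block addresses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, addr, allocate):
        """Return True on a hit; on a miss, insert `addr` only if `allocate` is True."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # mark as most recently used
            return True
        if allocate:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the least recently used block
            self.blocks[addr] = None
        return False


def count_hits(trace, capacity, bypass):
    """Replay (addr, candidate) pairs under one static policy and count hits.

    bypass=False: allocate on every miss ("cache everything").
    bypass=True:  never allocate on misses flagged as bypass candidates.
    """
    cache = LRUCache(capacity)
    hits = 0
    for addr, candidate in trace:
        hits += cache.access(addr, allocate=not (bypass and candidate))
    return hits


def reuse_plus_stream(rounds=10):
    """Hypothetical workload A: a 4-block hot set interleaved with 4 never-reused blocks."""
    trace = []
    for r in range(rounds):
        trace += [(a, False) for a in range(4)]
        trace += [(1000 + 4 * r + i, True) for i in range(4)]
    return trace


def produce_then_consume(rounds=10):
    """Hypothetical workload B: 4 flagged blocks written, then read back immediately."""
    trace = []
    for r in range(rounds):
        buf = [(2000 + 4 * r + i, True) for i in range(4)]
        trace += buf + buf
    return trace


if __name__ == "__main__":
    for name, trace in [("A", reuse_plus_stream()), ("B", produce_then_consume())]:
        print(name, "cache-all:", count_hits(trace, 4, bypass=False),
              "bypass:", count_hits(trace, 4, bypass=True))
```

With a 4-block cache, workload A favors bypassing (0 hits when caching everything vs. 36 with bypass), because the never-reused blocks would otherwise evict the hot set each round. Workload B favors caching everything (40 vs. 0), because the flagged blocks are reused right after they are first touched.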
Pages: 243-248
Number of pages: 6