LARA: Locality-aware resource allocation to improve GPU memory-access time

Cited by: 0

Authors:

Hossein Bitalebi

Farshad Safaei

Affiliation:

[1] Shahid Beheshti University, Faculty of Computer Science and Engineering

Source:

The Journal of Supercomputing | 2021 / Vol. 77

Keywords:

Cache contention; Memory divergence; Graphics Processing Unit (GPU); GPU-NoC; Interconnection network; Locality; Memory; Priority; Row access; Stall time

DOI:

Not available

Abstract:

Memory access, a primary performance bottleneck for any processing unit, also plays a significant role in GPU performance. Beyond the inherently challenging stages of the GPU's memory access path, low locality among requests considerably increases memory access delay. Despite their immense processing power, GPUs cannot reach their maximum throughput because of memory access bottlenecks. Memory divergence and poor locality among requests that miss in the L1 cache significantly increase last-level cache contention and main-memory row-switching overhead. In addition, the interconnection network routes request packets without regard to their locality properties, and such routing considerably disrupts the locality among the requests.

Pages: 14438-14460

Page count: 22
