BARM: A Batch-Aware Resource Manager for Boosting Multiple Neural Networks Inference on GPUs With Memory Oversubscription

Cited by: 1
Authors
Qiu, Zhao-Wei [1]
Liu, Kun-Sheng [1]
Chen, Ya-Shu [1]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Elect Engn, Taipei 10607, Taiwan
Keywords
Graphics processing units; Neural networks; Memory management; Resource management; Kernel; Registers; Random access memory; Memory oversubscription; Memory thrashing
DOI
10.1109/TPDS.2022.3199806
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline Classification Code
081202
Abstract
Modern intelligent devices usually execute multiple neural networks to improve service quality. However, system performance degrades significantly when the working set exceeds the physical memory capacity, a phenomenon called memory oversubscription. To support the execution of multiple independent neural networks under limited physical memory, this article explores resource management on GPUs with unified virtual memory and demand paging. We first analyze how the assignment of streaming multiprocessors (SMs) to simultaneously executing neural networks relates to the page fault overhead incurred by memory thrashing. To boost performance by reducing the page fault penalty, we propose a batch-aware resource management approach, BARM, comprising (1) batch-aware SM resource allocation to increase the batch size and (2) thrashing-preventing memory allocation to eliminate run-time thrashing. The proposed method was evaluated on a series of workloads and reduces response latency significantly compared with a state-of-the-art page fault prefetcher and batch-aware TLP management. The framework was also implemented on a real platform and evaluated through a case study, with impressive results.
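For context, the sketch below illustrates the stock CUDA unified-memory and demand-paging setting the abstract describes: a managed allocation deliberately larger than device memory, so GPU accesses fault pages in on demand, and evictions under pressure produce the thrashing that BARM is designed to prevent. This is a minimal illustration of the baseline mechanism under stated assumptions, not the authors' BARM policy; it assumes a Pascal-or-later GPU on Linux (where cudaMallocManaged allocations may exceed device memory) and uses only standard CUDA runtime calls.

// Minimal sketch (not the authors' BARM): oversubscribing GPU memory with
// CUDA unified memory, so kernel accesses are served by demand paging.
// Assumes a Pascal-or-later GPU on Linux; device 0 is used throughout.
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride kernel; the first touch of a non-resident page raises a GPU
// page fault that the driver services by migrating the page to the device.
__global__ void touch(float *data, size_t n) {
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x)
        data[i] += 1.0f;
}

int main() {
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);

    // Oversubscribe: request ~1.5x the device's total memory. With
    // cudaMallocManaged this succeeds; pages migrate on first touch.
    size_t n = (size_t)(totalB * 1.5) / sizeof(float);
    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "managed allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < n; ++i) data[i] = 0.0f;  // first touch on the host

    // Optional hint: prefetch part of the working set to the GPU up front,
    // reducing initial fault traffic (what a page fault prefetcher automates).
    cudaMemPrefetchAsync(data, freeB / 2, 0 /*device*/, 0 /*stream*/);

    // The buffer exceeds device memory, so faulting in the excess pages
    // forces evictions; under contention this becomes memory thrashing,
    // visible as long stalls at the synchronize below.
    touch<<<256, 256>>>(data, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}

In this setting, kernels launched concurrently by multiple networks would compete for the same resident pages; BARM's thrashing-preventing memory allocation targets exactly that contention, while its batch-aware SM allocation controls how many thread blocks of each network touch pages at once.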
Pages: 4612-4624
Page count: 13