Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures

Cited by: 0
Authors
Angeles Navarro
Antonio Vilches
Francisco Corbera
Rafael Asenjo
Affiliations
[1] University of Malaga,Department of Computer Architecture
[2] Universidad de Málaga,Andalucía Tech, Department of Computer Architecture
Keywords
Heterogeneous computing; Dynamic scheduling; Adaptive partitioning; Task parallelism; Oversubscription; Synchronization
DOI
Not available
Abstract
This paper explores the possibility of efficiently executing a single application on multicore CPUs and multiple GPU accelerators simultaneously under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel_for template so that it can be exploited on heterogeneous architectures. Because the computing resources are asymmetric, we propose a dynamic scheduling strategy coupled with an adaptive partitioning scheme that resizes chunks to prevent underutilization and load imbalance of CPUs and GPUs. We also address the underutilization of the CPU core on which a host thread runs. To solve it, we propose two approaches: (1) a collaborative host thread strategy, in which the host thread carries out useful chunk processing instead of busy-waiting for the GPU to complete; and (2) a host thread blocking strategy combined with oversubscription, which delegates to the OS the duty of scheduling threads onto available CPU cores in order to guarantee that all cores are doing useful work. Using two benchmarks, we evaluate the overhead introduced by our scheduling and partitioning algorithms and find it to be negligible. We also evaluate the efficiency of the proposed strategies and find that allowing oversubscription controlled by the OS can be beneficial in certain scenarios.
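The idea of dynamic scheduling with adaptive chunk resizing can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ChunkScheduler`, `worker`, and `heterogeneous_parallel_for` are hypothetical, plain Python threads stand in for CPU cores and GPU host threads, and the resizing rule (pick the next chunk so that it should take roughly `target_seconds` at the throughput just measured) is an assumed heuristic chosen only to show the mechanism.

```python
import threading
import time


class ChunkScheduler:
    """Hands out chunks of the iteration space [0, n) on demand."""

    def __init__(self, n):
        self._next = 0
        self._n = n
        self._lock = threading.Lock()

    def take(self, size):
        """Return a half-open range (begin, end), or None when exhausted."""
        with self._lock:
            if self._next >= self._n:
                return None
            begin = self._next
            end = min(begin + max(1, size), self._n)
            self._next = end
            return begin, end


def worker(name, scheduler, body, initial_chunk, target_seconds, log):
    """Pull chunks until none remain, resizing from measured throughput."""
    chunk = initial_chunk
    while True:
        span = scheduler.take(chunk)
        if span is None:
            return
        begin, end = span
        start = time.perf_counter()
        for i in range(begin, end):
            body(i)
        elapsed = time.perf_counter() - start
        log.append((name, begin, end))
        # Assumed heuristic: size the next chunk so that it should take
        # about target_seconds at the throughput observed for this chunk.
        if elapsed > 0:
            throughput = (end - begin) / elapsed
            chunk = max(1, int(throughput * target_seconds))


def heterogeneous_parallel_for(n, body, initial_chunks, target_seconds=0.001):
    """Run body(i) for i in [0, n) using one thread per named device.

    initial_chunks maps a (hypothetical) device name to its first chunk size.
    Returns a log of (device, begin, end) tuples, one per processed chunk.
    """
    scheduler = ChunkScheduler(n)
    log = []
    threads = [
        threading.Thread(
            target=worker,
            args=(name, scheduler, body, size, target_seconds, log),
        )
        for name, size in initial_chunks.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    n = 10_000
    out = [0] * n

    def square(i):
        out[i] = i * i

    log = heterogeneous_parallel_for(
        n, square, {"cpu0": 16, "cpu1": 16, "gpu0": 256}
    )
    print(len(log), "chunks processed")
```

Because every worker asks for work only when it is idle, a faster device naturally claims more of the iteration space, and the per-device chunk size drifts toward whatever keeps that device busy for a comparable interval. Under CPython's global interpreter lock the threads do not run the loop body in parallel, so the sketch shows the scheduling logic only, not a speedup.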
Pages: 756-771
Page count: 15
Related papers
50 in total
  • [21] A Heterogeneous System Based on Latent Semantic Analysis Using GPU and Multi-CPU
    Leon-Paredes, Gabriel A.
    Barbosa-Santillan, Liliana I.
    Sanchez-Escobar, Juan J.
    SCIENTIFIC PROGRAMMING, 2017, 2017
  • [22] Dynamic and thermodynamic crossover scenarios in the Kob-Andersen mixture: Insights from multi-CPU and multi-GPU simulations
    Coslovich, Daniele
    Ozawa, Misaki
    Kob, Walter
    EUROPEAN PHYSICAL JOURNAL E, 2018, 41 (05):
  • [23] Dynamic and thermodynamic crossover scenarios in the Kob-Andersen mixture: Insights from multi-CPU and multi-GPU simulations
    Daniele Coslovich
    Misaki Ozawa
    Walter Kob
    The European Physical Journal E, 2018, 41
  • [24] Scalable multi-node multi-GPU Louvain community detection algorithm for heterogeneous architectures
    Bhowmick, Anwesha
    Vadhiyar, Sathish
    Varun, P. V.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (17):
  • [25] Scalable multi-node multi-GPU Louvain community detection algorithm for heterogeneous architectures
    Bhowmick, Anwesha
    Vadhiyar, Sathish
    Varun, P.V.
    Concurrency and Computation: Practice and Experience, 2022, 34 (17)
  • [26] Task-Based Conjugate Gradient: From Multi-GPU Towards Heterogeneous Architectures
    Agullo, E.
    Giraud, L.
    Guermouche, A.
    Nakov, S.
    Roman, J.
    EURO-PAR 2016: PARALLEL PROCESSING WORKSHOPS, 2017, 10104 : 69 - 82
  • [27] PowerCoord: A Coordinated Power Capping Controller for Multi-CPU/GPU Servers
    Azimi, Reza
    Jing, Chao
    Reda, Sherief
    2018 NINTH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2018,
  • [28] Compiler Provenance Recovery for Multi-CPU Architectures Using a Centrifuge Mechanism
    Otsubo, Yuhei
    Otsuka, Akira
    Mimura, Mamoru
    IEEE ACCESS, 2024, 12 : 34477 - 34488
  • [29] Benchmarking multi-GPU applications on modern multi-GPU integrated systems
    Bernaschi, Massimo
    Agostini, Elena
    Rossetti, Davide
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (14):
  • [30] Simulating cortical networks on heterogeneous multi-GPU systems
    Nere, Andrew
    Franey, Sean
    Hashmi, Atif
    Lipasti, Mikko
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (07) : 953 - 971