Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures

Authors
Angeles Navarro
Antonio Vilches
Francisco Corbera
Rafael Asenjo
Affiliations
[1] University of Malaga, Department of Computer Architecture
[2] Universidad de Málaga, Andalucía Tech, Department of Computer Architecture
Keywords
Heterogeneous computing; Dynamic scheduling; Adaptive partitioning; Task parallelism; Oversubscription; Synchronization
Abstract
This paper explores the possibility of efficiently executing a single application on multicore CPUs and multiple GPU accelerators simultaneously under a task-parallel programming paradigm. In particular, we address the challenge of extending a parallel_for template so that it can be exploited on heterogeneous architectures. Because the computing resources are asymmetric, we propose a dynamic scheduling strategy coupled with an adaptive partitioning scheme that resizes chunks to prevent underutilization and load imbalance across CPUs and GPUs. We also address the underutilization of the CPU core on which a host thread runs. To solve it, we propose two approaches: (1) a collaborative host thread strategy, in which the host thread carries out useful chunk processing instead of busy-waiting for the GPU to complete; and (2) a host thread blocking strategy combined with oversubscription, which delegates to the OS the task of scheduling threads onto available CPU cores in order to guarantee that all cores are doing useful work. Using two benchmarks, we evaluate the overhead introduced by our scheduling and partitioning algorithms and find it to be negligible. We also evaluate the efficiency of the proposed strategies and find that allowing OS-controlled oversubscription can be beneficial in certain scenarios.
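The idea of dynamic scheduling with adaptively resized chunks can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the scheme from the paper: the sizing rule (half of a device's throughput-proportional share of the remaining iterations), the `min_chunk` floor, the device names, and the relative speeds are all hypothetical choices made for illustration.

```python
import heapq


def next_chunk(remaining, speed, total_speed, min_chunk=8):
    """Pick the next chunk size for one device.

    Assumed rule (hypothetical): hand out half of the device's
    throughput-proportional share of what is left, so that faster
    devices get larger chunks and chunks shrink as work runs out.
    """
    share = speed / total_speed
    size = int(remaining * share / 2)
    return min(remaining, max(min_chunk, size))


def simulate(n_iters, speeds, min_chunk=8):
    """Simulate devices pulling chunks from a shared iteration space.

    speeds maps a device name to its throughput in iterations per time
    unit. Returns the list of (device, begin, end) chunks in the order
    they were handed out, and the finish time of each device.
    """
    total_speed = sum(speeds.values())
    # Min-heap of (time at which the device becomes idle, device name).
    ready = [(0.0, name) for name in sorted(speeds)]
    heapq.heapify(ready)
    next_iter = 0
    chunks = []
    finish = {name: 0.0 for name in speeds}
    while next_iter < n_iters:
        now, name = heapq.heappop(ready)  # earliest idle device asks for work
        size = next_chunk(n_iters - next_iter, speeds[name], total_speed, min_chunk)
        begin, end = next_iter, next_iter + size
        next_iter = end
        done = now + size / speeds[name]
        chunks.append((name, begin, end))
        finish[name] = done
        heapq.heappush(ready, (done, name))
    return chunks, finish


if __name__ == "__main__":
    # Hypothetical machine: three CPU cores and one GPU that is 9x faster.
    devices = {"cpu0": 1.0, "cpu1": 1.0, "cpu2": 1.0, "gpu0": 9.0}
    chunks, finish = simulate(10_000, devices)
    print("chunks handed out:", len(chunks))
    print("makespan:", max(finish.values()))
```

Because every device requests a new chunk as soon as it becomes idle, no device waits while work remains, and the shrinking chunk sizes keep the final imbalance small. A real implementation would measure throughput at run time rather than take it as an input.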
Pages: 756-771
Page count: 15