Performance Improvement of CUDA Applications by Reducing CPU-GPU Data Transfer Overhead

Cited by: 0
Authors
Sunitha, N. V. [1]
Raju, K. [1]
Chiplunkar, Niranjan N. [1]
Affiliations
[1] NMAMIT, Dept CSE, Nitte, India
Keywords
Heterogeneous system; CUDA; Kernel; Stream;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In a CPU-GPU-based heterogeneous computing system, the input data to be processed by the kernel resides in host memory. The host and device memory address spaces are separate, so the device cannot directly access host memory. In the CUDA programming model, data is therefore moved between host memory and device memory, and this transfer is time-consuming. The communication overhead can be hidden by overlapping data transfer with kernel execution, and CUDA streams provide a means of doing so. In this paper we explore the effects of overlapping data transfer and kernel execution on the overall execution time of some CUDA applications. The results show that using the different levels of concurrency supported by streams enhances the performance of CUDA applications.
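The overlap described in the abstract can be sketched roughly as follows. This is an illustrative example only, not the code evaluated in the paper: the kernel `scale_kernel`, the helper `run_overlapped`, the array size, and the choice of four streams are all assumptions made for the sketch. The input is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued into its own stream, so that a copy for one chunk may run while the kernel for another chunk executes.

```cuda
// Illustrative sketch only: not the code evaluated in the paper.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each element of its chunk by `factor`.
__global__ void scale_kernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Processes n floats in num_streams chunks, one stream per chunk.
// Returns the number of wrong results, or -1 on a CUDA error.
// Assumes n is divisible by num_streams.
int run_overlapped(int n, int num_streams) {
    const int chunk = n / num_streams;
    const size_t chunk_bytes = chunk * sizeof(float);
    float *h_data = nullptr, *d_data = nullptr;

    // Pinned (page-locked) host memory is needed for cudaMemcpyAsync
    // to overlap with kernel execution.
    if (cudaMallocHost((void **)&h_data, n * sizeof(float)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h_data);
        return -1;
    }
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    std::vector<cudaStream_t> streams(num_streams);
    for (int s = 0; s < num_streams; ++s) cudaStreamCreate(&streams[s]);

    // Work within one stream runs in issue order; work in different
    // streams is allowed to overlap.
    const int threads = 256;
    const int blocks = (chunk + threads - 1) / threads;
    for (int s = 0; s < num_streams; ++s) {
        const int off = s * chunk;
        cudaMemcpyAsync(d_data + off, h_data + off, chunk_bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale_kernel<<<blocks, threads, 0, streams[s]>>>(d_data + off,
                                                         chunk, 2.0f);
        cudaMemcpyAsync(h_data + off, d_data + off, chunk_bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams, then pick up any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaGetLastError();

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_data[i] != 2.0f) ++mismatches;

    for (int s = 0; s < num_streams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return err == cudaSuccess ? mismatches : -1;
}

int main() {
    const int mismatches = run_overlapped(1 << 22, 4);
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Whether the copies and kernels actually overlap depends on the device (for example, how many copy engines it has) and on the relative cost of transfer and compute, so any speedup over a single synchronous copy-compute-copy sequence should be measured rather than assumed.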
Pages: 211-215
Page count: 5
Related papers
50 in total
  • [31] A Simulation Framework for Scheduling Performance Evaluation on CPU-GPU Heterogeneous System
    Vella, Flavio
    Neri, Igor
    Gervasi, Osvaldo
    Tasso, Sergio
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2012, PT IV, 2012, 7336 : 457 - 469
  • [32] High Performance Graph Analytics with Productivity on Hybrid CPU-GPU Platforms
    Yang, Haoduo
    Su, Huayou
    Lan, Qiang
    Wen, Mei
    Zhang, Chunyuan
    2018 2ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPILATION, COMPUTING AND COMMUNICATIONS (HP3C 2018), 2018, : 17 - 21
  • [33] FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation
    Kim, Taeyoon
    Park, ChanHo
    Mukimbekov, Mansur
    Hong, Heelim
    Kim, Minseok
    Jin, Ze
    Kim, Changdae
    Shin, Ji-Yong
    Jeon, Myeongjae
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 17 (04): : 863 - 876
  • [34] THE PROBLEM OF EVALUATING CPU-GPU SYSTEMS WITH 3D VISUALIZATION APPLICATIONS
    Verdu, Javier
    Pajuelo, Alex
    Valero, Mateo
    IEEE MICRO, 2012, 32 (06) : 17 - 27
  • [35] High performance computing of stiff bubble collapse on CPU-GPU heterogeneous platform
    Dubois, Remy
    da Silva, Eric Goncalves
    Parnaudeau, Philippe
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2021, 99 : 246 - 256
  • [36] Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems
    Hu, Yichang
    Lu, Lu
    JOURNAL OF SUPERCOMPUTING, 2021, 77 (12): : 13739 - 13756
  • [37] Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU-GPU clusters
    Feichtinger, Christian
    Habich, Johannes
    Koestler, Harald
    Ruede, Ulrich
    Aoki, Takayuki
    PARALLEL COMPUTING, 2015, 46 : 1 - 13
  • [38] Design of a simulation model for high performance LINPACK in hybrid CPU-GPU systems
    Yichang Hu
    Lu Lu
    The Journal of Supercomputing, 2021, 77 : 13739 - 13756
  • [39] High Performance FFT Based Poisson Solver on a CPU-GPU Heterogeneous Platform
    Wu, Jing
    JaJa, Joseph
    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 115 - 125
  • [40] High performance computing of stiff bubble collapse on CPU-GPU heterogeneous platform
    Dubois, Remy
    Goncalves da Silva, Eric
    Parnaudeau, Philippe
    Computers and Mathematics with Applications, 2021, 99 : 246 - 256