Performance Improvement of CUDA Applications by Reducing CPU-GPU Data Transfer Overhead

Authors
Sunitha, N. V. [1]
Raju, K. [1]
Chiplunkar, Niranjan N. [1]
Affiliations
[1] NMAMIT, Dept CSE, Nitte, India
Keywords
Heterogeneous system; CUDA; Kernel; Stream
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In a CPU-GPU heterogeneous computing system, the input data to be processed by the kernel resides in host memory. The host and device memory address spaces are separate, so the device cannot directly access host memory. In the CUDA programming model, data is therefore moved between host memory and device memory, and this transfer is time-consuming. The communication overhead can be hidden by overlapping data transfer with kernel execution, and CUDA streams provide a means of doing so. In this paper we explore the effect of overlapping data transfer and kernel execution on the overall execution time of some CUDA applications. The results show that using the different levels of concurrency supported by streams improves the performance of CUDA applications.
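The overlap described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the kernel `scale`, the function `run_overlapped`, and the even chunking of the input across streams are all assumptions made for illustration. The sketch splits an array into chunks and issues each chunk's host-to-device copy, kernel launch, and device-to-host copy into its own stream, so that one chunk's transfer may overlap another chunk's kernel on hardware that supports it. Host memory is allocated as pinned memory with `cudaMallocHost`, which asynchronous copies need in order to overlap with execution.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element of a chunk.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Processes n floats in num_streams chunks, one stream per chunk.
// Returns the number of wrong results, or -1 on a CUDA error.
int run_overlapped(int n, int num_streams) {
    float *h = nullptr;  // pinned host buffer
    float *d = nullptr;  // device buffer
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMallocHost((void **)&h, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) {
        cudaFreeHost(h);
        return -1;
    }
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    std::vector<cudaStream_t> streams(num_streams);
    for (int s = 0; s < num_streams; ++s) cudaStreamCreate(&streams[s]);

    int chunk = (n + num_streams - 1) / num_streams;  // ceiling division
    for (int s = 0; s < num_streams; ++s) {
        int offset = s * chunk;
        if (offset >= n) break;
        int count = (n - offset < chunk) ? (n - offset) : chunk;
        size_t chunk_bytes = static_cast<size_t>(count) * sizeof(float);

        // Operations in one stream run in order; operations in
        // different streams are allowed to overlap.
        cudaMemcpyAsync(d + offset, h + offset, chunk_bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(count + 255) / 256, 256, 0, streams[s]>>>(d + offset, count);
        cudaMemcpyAsync(h + offset, d + offset, chunk_bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams, then check for any error raised along the way.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaGetLastError();

    int wrong = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i)
            if (h[i] != 2.0f * static_cast<float>(i)) ++wrong;
    }

    for (int s = 0; s < num_streams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return (err == cudaSuccess) ? wrong : -1;
}
```

Calling `run_overlapped(1 << 20, 4)` from a `main` function should return 0 on a working CUDA device. How much overlap actually occurs depends on the device's copy engines and on the relative cost of the kernel and the transfers, so any speedup should be measured rather than assumed.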
Pages: 211-215
Number of pages: 5
Related papers
  • [1] Boosting CUDA Applications with CPU-GPU Hybrid Computing
    Lee, Changmin
    Ro, Won Woo
    Gaudiot, Jean-Luc
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2014, 42 (02) : 384 - 404
  • [2] Reducing CPU-GPU Interferences to Improve CPU Performance in Heterogeneous Architectures
    Wen H.
    Zhang W.
    Journal of Computing Science and Engineering, 2020, 16 (04) : 131 - 145
  • [3] Performance models for CPU-GPU data transfers
    van Werkhoven, B.
    Maassen, J.
    Seinstra, F. J.
    Bal, H. E.
    2014 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2014, : 11 - 20
  • [4] Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications
    Tallada, Marc Gonzalez
    Morancho, Enric
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2023, 37 (05) : 626 - 646
  • [5] BigKernel - High Performance CPU-GPU Communication Pipelining for Big Data-style Applications
    Mokhtari, Reza
    Stumm, Michael
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [6] Heterogeneous CPU-GPU Execution of Stencil Applications
    Siklosi, Balint
    Reguly, Istvan Z.
    Mudalige, Gihan R.
    PROCEEDINGS OF 2018 IEEE/ACM INTERNATIONAL WORKSHOP ON PERFORMANCE, PORTABILITY AND PRODUCTIVITY IN HPC (P3HPC 2018), 2018, : 71 - 80
  • [7] Design of a Hybrid MPI-CUDA Benchmark Suite for CPU-GPU Clusters
    Agarwal, Tejaswi
    Becchi, Michela
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT'14), 2014, : 505 - 506
  • [8] Evaluation of NDVI and NDWI parameters in CPU-GPU Heterogeneous Platforms based CUDA
    Guerrouj, Fatima Zahra
    Latif, Rachid
    Saddik, Amine
    PROCEEDINGS OF 2020 5TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS (CLOUDTECH'20), 2020, : 74 - 79
  • [9] A CPU-GPU Data Transfer Optimization Approach Based On Code Migration and Merging
    Fu, Cong
    Zhai, Yanlong
    Wang, Zhenhua
    2017 16TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND APPLICATIONS TO BUSINESS, ENGINEERING AND SCIENCE (DCABES), 2017, : 23 - 26