Performance Improvement of CUDA Applications by Reducing CPU-GPU Data Transfer Overhead

Cited by: 0
Authors
Sunitha, N. V. [1]
Raju, K. [1]
Chiplunkar, Niranjan N. [1]
Affiliation
[1] NMAMIT, Dept CSE, Nitte, India
Keywords
Heterogeneous system; CUDA; Kernel; Stream
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In a CPU-GPU heterogeneous computing system, the input data to be processed by a kernel resides in host memory. Because the host and the device have separate memory address spaces, the device cannot directly access host memory. In the CUDA programming model, the data is therefore moved between host memory and device memory, and this transfer is time-consuming. The communication overhead can be hidden by overlapping data transfer with kernel execution, and CUDA streams provide a means of doing so. In this paper we explore the effect of overlapping data transfer and kernel execution on the overall execution time of some CUDA applications. The results show that using the different levels of concurrency supported by streams improves the performance of these applications.
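The overlap described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the kernel `scale`, the helper `runOverlapped`, and the chunking scheme are hypothetical names and choices made for illustration only. The sketch assumes a device that supports concurrent copy and execution, and it uses pinned host memory because `cudaMemcpyAsync` only overlaps with kernels when the host buffer is page-locked.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical example kernel: multiplies each element by a factor in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Splits n floats into nStreams chunks and issues copy-in, kernel, and
// copy-out for each chunk on its own stream, so that transfers for one
// chunk may overlap with computation on another.
// Simplification (assumption): n must be divisible by nStreams.
// Returns true if every element was processed correctly.
bool runOverlapped(int n, int nStreams) {
    if (n <= 0 || nStreams <= 0 || n % nStreams != 0) return false;
    const int chunk = n / nStreams;
    const size_t chunkBytes = static_cast<size_t>(chunk) * sizeof(float);
    const size_t totalBytes = static_cast<size_t>(n) * sizeof(float);

    float *hData = nullptr;
    float *dData = nullptr;
    // Pinned (page-locked) host memory allows truly asynchronous copies.
    if (cudaMallocHost((void **)&hData, totalBytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&dData, totalBytes) != cudaSuccess) {
        cudaFreeHost(hData);
        return false;
    }
    for (int i = 0; i < n; ++i) hData[i] = 1.0f;

    std::vector<cudaStream_t> streams(nStreams);
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    const int threads = 256;
    const int blocks = (chunk + threads - 1) / threads;
    for (int s = 0; s < nStreams; ++s) {
        const int offset = s * chunk;
        // Operations within one stream run in order; different streams may overlap.
        cudaMemcpyAsync(dData + offset, hData + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<blocks, threads, 0, streams[s]>>>(dData + offset, chunk, 2.0f);
        cudaMemcpyAsync(hData + offset, dData + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams to finish, then check the results on the host.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) {
        if (hData[i] != 2.0f) ok = false;
    }

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(dData);
    cudaFreeHost(hData);
    return ok;
}
```

Calling `runOverlapped(1 << 20, 4)` processes about one million floats in four chunks. Whether the copies and kernels actually overlap, and by how much, depends on the hardware and on the ratio of transfer time to compute time, so any speedup should be measured (for example with `cudaEvent_t` timers) rather than assumed.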
Pages: 211-215
Page count: 5