Performance Improvement of CUDA Applications by Reducing CPU-GPU Data Transfer Overhead

Cited by: 0
Authors
Sunitha, N. V. [1]
Raju, K. [1]
Chiplunkar, Niranjan N. [1]
Affiliations
[1] NMAMIT, Dept CSE, Nitte, India
Keywords
Heterogeneous system; CUDA; Kernel; Stream
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In a CPU-GPU heterogeneous computing system, the input data to be processed by the kernel resides in host memory. Because the host and the device have separate memory address spaces, the device cannot directly access host memory. In the CUDA programming model, data is therefore moved between host memory and device memory. This data transfer is time-consuming. The communication overhead can be hidden by overlapping data transfer with kernel execution, and CUDA streams provide a means of doing so. In this paper, we explore the effects of overlapping data transfer and kernel execution on the overall execution time of some CUDA applications. The results show that using the different levels of concurrency supported by streams improves the performance of these CUDA applications.
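The overlap described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the kernel `scale`, the helper `run_overlapped`, the chunking scheme, and the block size of 256 are assumptions chosen for illustration. The input is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued into its own stream, so that on hardware with copy engines the transfer of one chunk may proceed while another chunk is being processed.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element of one chunk.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Processes n floats in nStreams chunks, one stream per chunk.
// Assumes n is divisible by nStreams. Returns the maximum absolute
// error against the expected result, or -1.0f on a CUDA error.
float run_overlapped(int n, int nStreams) {
    const int chunk = n / nStreams;
    const size_t chunkBytes = chunk * sizeof(float);
    float *h = nullptr, *d = nullptr;

    // Pinned (page-locked) host memory lets cudaMemcpyAsync run
    // asynchronously with respect to the host and other streams.
    if (cudaMallocHost((void **)&h, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h);
        return -1.0f;
    }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    std::vector<cudaStream_t> streams(nStreams);
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Operations within one stream run in issue order; operations in
    // different streams are allowed to overlap.
    for (int s = 0; s < nStreams; ++s) {
        const int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk);
        cudaMemcpyAsync(h + off, d + off, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams to finish before reading the results.
    const cudaError_t err = cudaDeviceSynchronize();

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) maxErr = fmaxf(maxErr, fabsf(h[i] - 2.0f));

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return err == cudaSuccess ? maxErr : -1.0f;
}

int main() {
    const float maxErr = run_overlapped(1 << 20, 4);
    printf("max error: %f\n", maxErr);
    return maxErr == 0.0f ? 0 : 1;
}
```

Whether the copies and kernels actually overlap depends on the device (for example, the number of copy engines) and on the relative cost of transfer and computation, so any speedup should be measured rather than assumed.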
Pages: 211-215
Page count: 5
Related papers
50 in total
  • [41] A high-performance dynamic scheduling for sparse matrix-based applications on heterogeneous CPU-GPU environment
    Shokrani Baigi, Ahmad
    Savadi, Abdorreza
    Naghibzadeh, Mahmoud
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (17): : 25071 - 25098
  • [42] Transparent CPU-GPU Collaboration for Data-Parallel Kernels on Heterogeneous Systems
    Lee, Janghaeng
    Samadi, Mehrzad
    Park, Yongjun
    Mahlke, Scott
    2013 22ND INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES (PACT), 2013, : 245 - 255
  • [43] Improving Mobile Gaming Performance through Cooperative CPU-GPU Thermal Management
    Prakash, Alok
    Amrouch, Hussam
    Shafique, Muhammad
    Mitra, Tulika
    Henkel, Joerg
    2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2016,
  • [44] GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding
    Zhu, Zhaocheng
    Xu, Shizhen
    Qu, Meng
    Tang, Jian
    WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 2494 - 2504
  • [45] Feedback Control Optimization for Performance and Energy Efficiency on CPU-GPU Heterogeneous Systems
    Lin, Feng-Sheng
    Liu, Po-Ting
    Li, Ming-Hua
    Hsiung, Pao-Ann
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2016, 2016, 10048 : 388 - 404
  • [46] Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models
    Tsuzuku, Kazuki
    Endo, Toshio
    SMARTGREENS 2015 PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON SMART CITIES AND GREEN ICT SYSTEMS, 2015, : 226 - 233
  • [47] Deep learning based data prefetching in CPU-GPU unified virtual memory
    Long, Xinjian
    Gong, Xiangyang
    Zhang, Bo
    Zhou, Huiyang
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2023, 174 : 19 - 31
  • [48] Resource Scheduling Strategy for Performance Optimization Based on Heterogeneous CPU-GPU Platform
    Fang, Juan
    Zhou, Kuan
    Zhang, Mengyuan
    Xiang, Wei
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (01): : 1621 - 1635
  • [49] P4GPU: Acceleration of Programmable Data Plane Using a CPU-GPU Heterogeneous Architecture
    Li, Peilong
    Luo, Yan
    2016 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING (HPSR), 2016, : 168 - 175
  • [50] Performance Measurement of Applications with GPU Acceleration using CUDA
    Mayanglambam, Shangkar
    Malony, Allen D.
    Sottile, Matthew J.
    PARALLEL COMPUTING: FROM MULTICORES AND GPU'S TO PETASCALE, 2010, 19 : 341 - 348