Performance Improvement of CUDA Applications by Reducing CPU-GPU Data Transfer Overhead

Cited by: 0

Authors:

Sunitha, N., V [1]

Raju, K. [1]

Chiplunkar, Niranjan N. [1]

Affiliation:

[1] NMAMIT, Dept CSE, Nitte, India

Source:

PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICICCT) | 2017

Keywords:

Heterogeneous system; CUDA; Kernel; Stream;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In a CPU-GPU heterogeneous computing system, the input data to be processed by a kernel resides in host memory. Because the host and the device have separate memory address spaces, the device cannot directly access host memory. In the CUDA programming model, data is therefore moved between host memory and device memory, and this transfer is time-consuming. The communication overhead can be hidden by overlapping data transfer with kernel execution, and CUDA streams provide a means of doing so. In this paper we explore the effects of overlapping data transfer and kernel execution on the overall execution time of some CUDA applications. The results show that using the different levels of concurrency supported by streams improves the performance of these CUDA applications.
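The overlap described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the kernel `scaleKernel`, the helper `scaleWithStreams`, the 256-thread block size, and the chunking scheme are all hypothetical choices made for illustration. The sketch assumes pinned host memory (needed for `cudaMemcpyAsync` to overlap with kernel execution) and a device that supports concurrent copy and execution.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: doubles each element of one chunk.
__global__ void scaleKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: splits n floats into nStreams chunks and issues
// copy-in, kernel, and copy-out for each chunk on its own stream, so the
// transfer of one chunk may overlap with the kernel of another.
// Returns the number of mismatched results, or -1 on any CUDA error.
int scaleWithStreams(int n, int nStreams) {
    if (n <= 0 || nStreams <= 0) return -1;
    float *h = nullptr, *d = nullptr;
    size_t bytes = (size_t)n * sizeof(float);
    // Pinned (page-locked) host memory allows truly asynchronous copies.
    if (cudaMallocHost(&h, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d, bytes) != cudaSuccess) { cudaFreeHost(h); return -1; }
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    std::vector<cudaStream_t> streams(nStreams);
    for (int k = 0; k < nStreams; ++k) cudaStreamCreate(&streams[k]);

    int chunk = (n + nStreams - 1) / nStreams;  // ceiling division
    for (int k = 0; k < nStreams; ++k) {
        int offset = k * chunk;
        if (offset >= n) break;
        int len = (n - offset < chunk) ? (n - offset) : chunk;
        size_t chunkBytes = (size_t)len * sizeof(float);
        // Operations within one stream run in order; different streams
        // are free to run concurrently.
        cudaMemcpyAsync(d + offset, h + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[k]);
        scaleKernel<<<(len + 255) / 256, 256, 0, streams[k]>>>(d + offset, len);
        cudaMemcpyAsync(h + offset, d + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[k]);
    }
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t syncErr = cudaDeviceSynchronize();  // wait for all streams
    bool ok = (launchErr == cudaSuccess && syncErr == cudaSuccess);

    // Verify on the host: every element should have been doubled.
    int mismatches = 0;
    if (ok) {
        for (int i = 0; i < n; ++i)
            if (h[i] != 2.0f * (float)i) ++mismatches;
    }

    for (int k = 0; k < nStreams; ++k) cudaStreamDestroy(streams[k]);
    cudaFree(d);
    cudaFreeHost(h);
    return ok ? mismatches : -1;
}
```

Calling `scaleWithStreams(1 << 20, 4)` from a host `main` and comparing its wall-clock time against `scaleWithStreams(1 << 20, 1)` gives a rough indication of how much transfer time is hidden. The actual gain depends on the device's copy engines and on the ratio of transfer time to kernel time.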

Pages: 211-215

Number of pages: 5
