Data-Oriented Runtime Scheduling Framework on Multi-GPUs

Cited: 0
Authors
Li, Tao [1 ,2 ]
Zhao, Kezhao [1 ]
Dong, Qiankun [1 ]
Leng, Jiabing [1 ]
Yang, Yulu [1 ]
Ma, Wenjing [3 ]
Affiliations
[1] Nankai Univ, Coll Comp & Control Engn, Tianjin, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Software, Lab Parallel Software & Comp Sci, State Key Lab Comp Sci, Beijing, Peoples R China
Source
2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016
Funding
National Natural Science Foundation of China (NSFC)
Keywords
GPU; Heterogeneous system; Data-oriented DAG; task scheduling; TASK; FACTORIZATION; SYSTEM;
DOI
10.1109/TrustCom.2016.207
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
GPU has been generally accepted as an efficient accelerator in the field of high performance computing (HPC). On some heterogeneous systems, multiple GPUs are installed on each computing node, and to complicate matters further, these GPUs may have different architectures. Efficiently scheduling tasks and data on such heterogeneous systems is therefore a challenge. In this paper, we present DoSFoG, a data-oriented runtime scheduling framework for heterogeneous systems equipped with multiple GPUs. In DoSFoG, data blocks, instead of tasks, are taken as the scheduling units. It uses a data-oriented directed acyclic graph (DoDAG) as the representation of an application, which is proved to be equivalent to a task DAG. A runtime scheduling framework is designed on top of the DoDAG. In addition, a hierarchical storage structure is carefully designed around the various levels of memory in the system: page-locked host memory and a soft cache in GPU device memory are used to speed up data transfers. DoSFoG is evaluated with different applications on a system equipped with different GPUs. The results show that DoSFoG achieves high data locality, scalability, load balance, and performance improvement for large data sizes.
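To make the abstract's central idea concrete, the toy Python sketch below takes data blocks, rather than tasks, as the scheduling units and runs a greedy, locality-aware dispatch loop in the spirit of a DoDAG. All names here (DataBlock, Task, build_dodag, schedule) are invented for illustration and are not the paper's API; the paper's actual scheduler, memory hierarchy, and soft cache are not reproduced.

    # Illustrative sketch only: invented names, not the paper's implementation.
    from dataclasses import dataclass, field

    @dataclass(eq=False)                     # identity hash, so blocks can key dicts/sets
    class DataBlock:
        name: str
        producer: object = None              # task that writes this block, if any
        consumers: list = field(default_factory=list)

    @dataclass(eq=False)
    class Task:
        name: str
        inputs: list
        outputs: list

    def build_dodag(tasks):
        # In a DoDAG the data blocks are the nodes; each task induces edges
        # from its input blocks to its output blocks.
        for t in tasks:
            for b in t.outputs:
                b.producer = t
            for b in t.inputs:
                b.consumers.append(t)

    def schedule(tasks, devices, location):
        # Greedy runtime loop: a task is ready once all of its input blocks
        # exist; it runs on the device already holding the most of its
        # inputs (data locality), with ties broken by device order.
        produced = {b for t in tasks for b in t.inputs if b.producer is None}
        pending = list(tasks)
        while pending:
            ready = [t for t in pending if all(b in produced for b in t.inputs)]
            if not ready:
                raise RuntimeError("cycle or missing input block")
            for t in ready:
                dev = max(devices,
                          key=lambda d: sum(location.get(b) == d for b in t.inputs))
                for b in t.inputs + t.outputs:
                    location[b] = dev        # inputs migrate; outputs are created here
                produced.update(t.outputs)
                pending.remove(t)
                print(f"{t.name} -> {dev}")

    # Toy example: two tasks sharing block A.
    A, B, C, D = (DataBlock(n) for n in "ABCD")
    t1 = Task("t1", inputs=[A, B], outputs=[C])
    t2 = Task("t2", inputs=[A, C], outputs=[D])
    build_dodag([t1, t2])
    schedule([t1, t2], devices=["gpu0", "gpu1"], location={A: "gpu0", B: "gpu0"})

In a real multi-GPU runtime, the migration step would be an asynchronous copy staged through page-locked host buffers, which is where the soft cache in device memory described in the abstract would come into play.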
Pages: 1311 - 1318
Page count: 8
Related Papers
50 records in total
  • [21] Data-oriented parsing
    Klein, D
    COMPUTATIONAL LINGUISTICS, 2004, 30 (02) : 240 - 244
  • [22] A Theoretical Approach to the Data-Oriented Scheduling Strategies across Multiple Clouds
    Ma, Yongzheng
    Nan, Kai
    2013 IEEE SIXTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2013), 2013, : 942 - 943
  • [23] Training Deep Nets with Progressive Batch Normalization on Multi-GPUs
    Qin, Lianke
    Gong, Yifan
    Tang, Tianqi
    Wang, Yutian
    Jin, Jiangming
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2019, 47 (03) : 373 - 387
  • [24] Adaptive optimization modeling of preconditioned conjugate gradient on Multi-GPUs
    Gao, J.
    Wang, Y.
    Wang, J.
    Liang, R.
    ACM TRANSACTIONS ON PARALLEL COMPUTING, 2016, 3 (03) : 1 - 33
  • [25] ZMCintegral: A package for multi-dimensional Monte Carlo integration on multi-GPUs
    Wu, Hong-Zhong
    Zhang, Jun-Jie
    Pang, Long-Gang
    Wang, Qun
    COMPUTER PHYSICS COMMUNICATIONS, 2020, 248
  • [26] Design of a Data-Oriented GPC
    Guan, Zhe
    Wakitani, Shin
    Yamamoto, Toru
    2013 INTERNATIONAL CONFERENCE ON ADVANCED MECHATRONIC SYSTEMS (ICAMECHS), 2013, : 555 - 558
  • [27] A Study of Graph Analytics for Massive Datasets on Distributed Multi-GPUs
    Jatala, Vishwesh
    Dathathri, Roshan
    Gill, Gurbinder
    Hoang, Loc
    Nandivada, V. Krishna
    Pingali, Keshav
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM IPDPS 2020, 2020, : 84 - 94
  • [29] A versatile tomographic forward- and backprojection approach on Multi-GPUs
    Fehringer, Andreas
    Lasser, Tobias
    Zanette, Irene
    Noel, Peter B.
    Pfeiffer, Franz
    MEDICAL IMAGING 2014: IMAGE PROCESSING, 2014, 9034
  • [30] cuFastTucker: A Novel Sparse FastTucker Decomposition For HHLST on Multi-GPUs
    Li, Zixuan
    Hu, Yikun
    Li, Mengquan
    Yang, Wangdong
    Li, Kenli
    ACM TRANSACTIONS ON PARALLEL COMPUTING, 2024, 11 (02)