Performance Characterization of Python Runtimes for Multi-device Task Parallel Programming
Cited by: 0
Authors:
Ruys, William [1]
Lee, Hochan [1]
You, Bozhi [1]
Talati, Shreya [1]
Park, Jaeyoung [1]
Almgren-Bell, James [1]
Yan, Yineng [1]
Fernando, Milinda [1]
Erez, Mattan [1]
Gligoric, Milos [1]
Burtscher, Martin [2]
Rossbach, Christopher J. [1]
Pingali, Keshav [1]
Biros, George [1]
Affiliations:
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Texas State Univ, San Marcos, TX USA
Keywords:
GPU tasking systems;
HPC in Python;
GPU programming in Python;
Global Interpreter Lock;
Task parallel programming;
DOI:
10.1007/s10766-025-00788-1
Chinese Library Classification number:
TP301 [Theory and methods];
Subject classification code:
081202;
Abstract:
Modern Python programs in high-performance computing call into compiled libraries and kernels for performance-critical tasks. However, effectively parallelizing these finer-grained, and often dynamic, kernels across modern heterogeneous platforms remains a challenge. This paper designs and optimizes a multi-threaded runtime for Python tasks on single-node multi-GPU systems, including tasks that use resources across multiple devices. We perform an experimental study that examines the impact of Python's Global Interpreter Lock (GIL) on runtime performance and the potential gains in a GIL-less future under PEP 703. This work explores tasks with variants for different device sets, introducing new programming abstractions and runtime mechanisms to simplify their management and enhance portability. Our experimental analysis, using task graphs from synthetic and real applications, shows at least a 3× (and up to 6×) performance improvement over its predecessor in scenarios with high GIL contention. Our implementation of multi-device tasks achieves 8× less overhead per task relative to a multi-process alternative using Ray.
Pages: 24