PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters

Cited by: 0
Authors
Ma, Kaihao [1]
Cai, Zhenkun [1]
Yan, Xiao [2]
Zhang, Yang [3]
Liu, Zhi [1]
Feng, Yihui [3]
Li, Chao [3]
Lin, Wei [3]
Cheng, James [1]
Affiliations
[1] Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
[2] Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
[3] Alibaba Group, Beijing, China
Keywords
Graphics processing unit
DOI
Not available
Related papers
50 in total
  • [1] PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters
    Ma, Kaihao
    Cai, Zhenkun
    Yan, Xiao
    Zhang, Yang
    Liu, Zhi
    Feng, Yihui
    Li, Chao
    Lin, Wei
    Cheng, James
    PARALLEL COMPUTING, 2024, 120
  • [2] KubeSphere: An Approach to Multi-Tenant Fair Scheduling for Kubernetes Clusters
    Beltre, Angel
    Saha, Pankaj
    Govindaraju, Madhusudhan
    2019 3RD IEEE INTERNATIONAL CONFERENCE ON CLOUD AND FOG COMPUTING TECHNOLOGIES AND APPLICATIONS (IEEE CLOUD SUMMIT 2019), 2019 : 14 - 20
  • [3] Astraea: A Fair Deep Learning Scheduler for Multi-Tenant GPU Clusters
    Ye, Zhisheng
    Sun, Peng
    Gao, Wei
    Zhang, Tianwei
    Wang, Xiaolin
    Yan, Shengen
    Luo, Yingwei
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 2781 - 2793
  • [4] Elastic Deep Learning in Multi-Tenant GPU Clusters
    Wu, Yidi
    Ma, Kaihao
    Yan, Xiao
    Liu, Zhi
    Cai, Zhenkun
    Huang, Yuzhen
    Cheng, James
    Yuan, Han
    Yu, Fan
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (01) : 144 - 158
  • [5] Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
    Luo, Yizhou
    Wang, Qiang
    Shi, Shaohuai
    Lai, Jiaxin
    Qi, Shuhan
    Zhang, Jiajia
    Wang, Xuan
    2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024
  • [6] MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant GPU Clusters
    Li, Baolin
    Patel, Tirthak
    Samsi, Siddharth
    Gadepally, Vijay
    Tiwari, Devesh
    PROCEEDINGS OF THE 13TH SYMPOSIUM ON CLOUD COMPUTING, SOCC 2022, 2022 : 173 - 189
  • [7] On Scheduling Ring-All-Reduce Learning Jobs in Multi-Tenant GPU Clusters with Communication Contention
    Yu, Menglu
    Ji, Bo
    Rajan, Hridesh
    Liu, Jia
    PROCEEDINGS OF THE 2022 THE TWENTY-THIRD INTERNATIONAL SYMPOSIUM ON THEORY, ALGORITHMIC FOUNDATIONS, AND PROTOCOL DESIGN FOR MOBILE NETWORKS AND MOBILE COMPUTING, MOBIHOC 2022, 2022 : 21 - 30
  • [8] Daphne: A Flexible and Hybrid Scheduling Framework in Multi-Tenant Clusters
    Xia, Yiqian
    Ren, Rui
    Cai, Hongming
    Vasilakos, Athanasios V.
    Lv, Zheng
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2018, 15 (01) : 330 - 343
  • [9] Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters
    Mohan, Jayashree
    Phanishayee, Amar
    Kulkarni, Janardhan
    Chidambaram, Vijay
    PROCEEDINGS OF THE 16TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, OSDI 2022, 2022 : 579 - 596
  • [10] MTFT: Multi-Tenant Fair Throttling
    Song, Ilhan
    Lee, Sang-Won
    2023 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, BIGCOMP, 2023 : 304 - 307