Exploring Plan-Based Scheduling for Large-Scale Computing Systems

Cited by: 9
Authors
Zheng, Xingwu [1]
Zhou, Zhou [2]
Yang, Xu [2]
Lan, Zhiling [2]
Wang, Jia [1]
Affiliations
[1] IIT, Dept Elect & Comp Engn, Chicago, IL 60616 USA
[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Keywords
Plan-based scheduling; Simulated annealing algorithm; Optimization
DOI
10.1109/CLUSTER.2016.43
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As HPC systems scale toward exascale, it becomes critical to manage the underlying resources more effectively. Almost all existing resource management systems schedule jobs in a queuing fashion and therefore make isolated scheduling decisions that can compromise system performance, even with backfilling. Plan-based schedulers have the potential to generate better job schedules by producing an execution plan for all waiting jobs, but they have not received enough attention. In this paper, we present a novel plan-based scheduling system that uses simulated annealing as its optimization engine to support effective resource management on HPC systems. Extensive trace-based simulations with workload traces collected from a wide range of production supercomputers show that, compared with a queue-based scheduling system using FCFS with EASY backfilling, our plan-based scheduling system can reduce job wait time by 40% and job response time by 30% while slightly improving system utilization. Moreover, our plan-based system can run online, solving the scheduling problem at each scheduling iteration within one second, which makes it practical for production HPC systems.
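The idea of planning all waiting jobs with simulated annealing can be illustrated with a minimal sketch. This is not the authors' implementation: the job model (node count plus exact runtime estimate), the assumption that every job is already waiting at time 0, the average-wait objective, the swap neighborhood, and all names and parameters (`plan_start_times`, `average_wait`, `anneal`, `t0`, `cooling`) are hypothetical choices made only for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Job = Tuple[int, int]  # (nodes requested, estimated runtime); illustrative model


def plan_start_times(order: Sequence[int], jobs: Sequence[Job],
                     total_nodes: int) -> Dict[int, int]:
    """Build an execution plan: place jobs in the given order, each at the
    earliest time at which enough nodes stay free for its whole runtime."""
    placed: List[Tuple[int, int, int]] = []  # (start, end, nodes)
    starts: Dict[int, int] = {}

    def used_at(x: int) -> int:
        return sum(n for s, e, n in placed if s <= x < e)

    for j in order:
        nodes, runtime = jobs[j]
        if nodes > total_nodes:
            raise ValueError(f"job {j} requests more nodes than the machine has")
        # Node usage only drops when a placed job ends, so time 0 and the
        # end times of placed jobs are the only useful candidate starts.
        for t in sorted({0} | {e for _, e, _ in placed}):
            # Usage only rises at starts of placed jobs inside the window.
            checks = [t] + [s for s, _, _ in placed if t < s < t + runtime]
            if all(used_at(x) + nodes <= total_nodes for x in checks):
                placed.append((t, t + runtime, nodes))
                starts[j] = t
                break
    return starts


def average_wait(order: Sequence[int], jobs: Sequence[Job],
                 total_nodes: int) -> float:
    """Mean start time of the plan; all jobs are assumed to arrive at time 0."""
    starts = plan_start_times(order, jobs, total_nodes)
    return sum(starts.values()) / len(starts)


def anneal(jobs: Sequence[Job], total_nodes: int, steps: int = 2000,
           t0: float = 100.0, cooling: float = 0.995,
           seed: int = 0) -> Tuple[List[int], float]:
    """Simulated annealing over job orderings, starting from FCFS order."""
    rng = random.Random(seed)
    order = list(range(len(jobs)))
    cost = average_wait(order, jobs, total_nodes)
    best_order, best_cost = order[:], cost
    if len(jobs) < 2:
        return best_order, best_cost
    temp = t0
    for _ in range(steps):
        i, k = rng.sample(range(len(jobs)), 2)
        order[i], order[k] = order[k], order[i]  # neighbor: swap two jobs
        new_cost = average_wait(order, jobs, total_nodes)
        # Always accept improvements; accept worse plans with a probability
        # that shrinks as the temperature cools.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
        else:
            order[i], order[k] = order[k], order[i]  # reject: undo the swap
        temp *= cooling
    return best_order, best_cost
```

For example, calling `anneal([(4, 100), (1, 10), (1, 10), (1, 10)], total_nodes=4)` searches for an ordering in which the three small jobs are not stuck behind the one job that needs the whole machine. A production scheduler would also have to handle job arrival times, inaccurate runtime estimates, and fairness, none of which this sketch models.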
Pages: 259-268
Number of pages: 10