Speculative Scheduling for Stochastic HPC Applications

Times cited: 7
Authors
Gainaru, Ana [1]
Pallez, Guillaume [2]
Sun, Hongyang [1]
Raghavan, Padma [1]
Affiliations
[1] Vanderbilt Univ, 221 Kirkland Hall, Nashville, TN 37235 USA
[2] Univ Bordeaux, INRIA, Talence, France
Keywords
Scheduling algorithm; HPC runtime; stochastic applications; performance; tasks
DOI
10.1145/3337821.3337890
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Emerging fields are developing a growing number of large-scale applications with heterogeneous, dynamic and data-intensive requirements that place a high emphasis on productivity and are therefore not tuned to run efficiently on today's high-performance computing (HPC) systems. Some of these applications, such as neuroscience workloads and those that use adaptive numerical algorithms, develop modeling and simulation workflows with stochastic execution times and unpredictable resource requirements. Deploying them on current HPC systems with existing resource management solutions can result in a loss of efficiency for users and a decrease in effective system utilization for platform providers. In this paper, we consider the current HPC scheduling model and describe the challenge it poses for stochastic applications due to the strict requirements of its job deployment policies. To address this challenge, we present speculative scheduling techniques that adapt the resource requirements of a stochastic application on the fly, based on its past execution behavior rather than on estimates provided by the user. We focus on improving overall system utilization and application response time without disrupting the current HPC scheduling model or the application development process. Our solution can operate alongside existing HPC batch schedulers without interfering with their usage modes. We show that speculative scheduling can improve system utilization and average application response time by 25-30% compared to the classical HPC approach.
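The abstract does not spell out the scheduling algorithm, so the following is only a minimal Python sketch of the general idea of deriving resource requests from past execution behavior instead of a user estimate. It rests on hypothetical assumptions not stated in the abstract: a job that exceeds its requested walltime is killed and resubmitted from scratch with the next, longer request; every submission is charged its full requested time; and past run times are representative of future ones. Under those assumptions it builds an empirical distribution from the run history and uses a dynamic program to pick the increasing sequence of requests with the lowest expected charged time. The names `empirical_distribution`, `speculative_requests`, and `average_cost` are illustrative and not taken from the paper.

```python
from collections import Counter


def empirical_distribution(history):
    """Turn past run times into sorted support values and their probabilities."""
    counts = Counter(history)
    values = sorted(counts)
    probs = [counts[v] / len(history) for v in values]
    return values, probs


def speculative_requests(history):
    """Choose increasing walltime requests that minimize expected reserved time.

    Hypothetical cost model: a job that exceeds its request is killed and
    resubmitted from scratch with the next, longer request, and every
    submission is charged its full requested walltime.
    """
    values, probs = empirical_distribution(history)
    n = len(values)
    # survival[i] = P(runtime >= values[i]); survival[n] = 0.
    survival = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        survival[i] = survival[i + 1] + probs[i]
    # best[i]: minimal expected cost still to pay once runtime >= values[i]
    # is known (each cost weighted by the probability of reaching that state).
    best = [0.0] * (n + 1)
    choice = [n] * n
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        for j in range(i, n):
            # Request values[j]; it is paid whenever runtime >= values[i].
            cost = values[j] * survival[i] + best[j + 1]
            if cost < best[i]:
                best[i], choice[i] = cost, j
    # Walk the recorded choices to recover the request sequence.
    requests, i = [], 0
    while i < n:
        requests.append(values[choice[i]])
        i = choice[i] + 1
    return requests, best[0]


def average_cost(requests, history):
    """Replay past runs against a request sequence; return mean charged time."""
    total = 0.0
    for runtime in history:
        for request in requests:
            total += request
            if runtime <= request:
                break
    return total / len(history)


if __name__ == "__main__":
    history = [10, 10, 10, 100]
    requests, expected = speculative_requests(history)
    print(requests, expected)             # [10, 100] 35.0
    print(average_cost([100], history))   # 100.0 (single worst-case request)
```

For a job that usually finishes in 10 time units but occasionally needs 100, the sketch requests 10 first and falls back to 100, for an expected charge of 35 per run, versus 100 when always requesting the worst case. When observed run times are close together (for example 90 and 100), it reverts to a single request of the maximum. Queue wait times and checkpointing are ignored here, which is a simplification of this sketch rather than a property of the paper's method.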
Pages: 10