Optimizing Checkpoint Intervals for Reduced Energy Use in Exascale Systems

Cited by: 0
Authors
Dauwe, Daniel [1]
Jhaveri, Rohan [1]
Pasricha, Sudeep [1,2]
Maciejewski, Anthony A. [1]
Siegel, Howard Jay [1,2]
Affiliations
[1] Colorado State Univ, Dept Elect & Comp Engn, Ft Collins, CO 80523 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
exascale resilience; checkpoint restart; multilevel checkpointing; fault tolerance; HPC energy efficiency; PERFORMANCE;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In today's high performance computing (HPC) systems, the probability that an application experiences a failure has increased significantly as the number of system nodes has grown. Exascale-sized systems are expected to operate with a mean time between failures (MTBF) of as little as a few minutes, causing frequent interruptions in application execution as well as substantially greater energy costs in a system that will already consume large amounts of energy. State-of-the-art HPC resilience techniques proposed for these future systems complicate the energy problem further, because the overhead of these techniques also increases energy use. While prior work has attempted to analyze and improve the energy use of systems utilizing resilience techniques, our work offers a new approach: optimizing checkpoint interval lengths so that a system designer is free to choose between intervals that optimize for application performance efficiency or for energy use, in both a traditional checkpoint and a multilevel checkpoint approach to resilience. We create a set of equations able to optimize for either performance efficiency or energy use, demonstrate that distinct intervals exist when optimizing for one metric or the other, and examine the sensitivity of this phenomenon to changes in several system parameters and application characteristics.
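The abstract does not reproduce the paper's equations, so the following is only an illustrative sketch of why a performance-optimal and an energy-optimal checkpoint interval can differ. It uses the well-known first-order (Young-style) overhead approximation for single-level checkpointing, not the authors' model. All names and numbers are hypothetical: `ckpt_cost_s` is the time to write one checkpoint, `mtbf_s` is the mean time between failures, and `p_compute_w` and `p_ckpt_w` are assumed average power draws while computing and while checkpointing. Restart cost and multilevel checkpointing are ignored.

```python
import math


def time_overhead(tau, ckpt_cost_s, mtbf_s):
    """First-order fraction of time lost: checkpoint writes plus expected rework."""
    return ckpt_cost_s / tau + tau / (2.0 * mtbf_s)


def energy_overhead(tau, ckpt_cost_s, mtbf_s, p_compute_w, p_ckpt_w):
    """Same first-order model, with each term weighted by an assumed power draw."""
    return (ckpt_cost_s * p_ckpt_w) / tau + (tau / (2.0 * mtbf_s)) * p_compute_w


def time_optimal_interval(ckpt_cost_s, mtbf_s):
    """Minimizer of time_overhead: Young's approximation sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s)


def energy_optimal_interval(ckpt_cost_s, mtbf_s, p_compute_w, p_ckpt_w):
    """Minimizer of energy_overhead under the assumed two-power-level model."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s * p_ckpt_w / p_compute_w)


if __name__ == "__main__":
    # Hypothetical parameters: 10 s checkpoints, 10 min MTBF,
    # 200 W per node while computing, 100 W while checkpointing.
    c, m, pc, pk = 10.0, 600.0, 200.0, 100.0
    tau_t = time_optimal_interval(c, m)
    tau_e = energy_optimal_interval(c, m, pc, pk)
    print(f"time-optimal interval:   {tau_t:.1f} s")
    print(f"energy-optimal interval: {tau_e:.1f} s")
```

Under these assumptions the two optima coincide only when checkpointing and computing draw the same power. When checkpointing draws less power than computing, the energy-optimal interval is shorter than the time-optimal one. The first-order approximation is reasonable only when the interval is small relative to the MTBF, so it should be read as a qualitative illustration.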
Pages: 8
Related papers
50 in total
  • [1] OPTIMIZING COMFORT AND ENERGY USE IN REHEAT SYSTEMS
    Hutzel, William J.
    Odukomaiya, Oluwaseun
    ES2009: PROCEEDINGS OF THE ASME 3RD INTERNATIONAL CONFERENCE ON ENERGY SUSTAINABILITY, VOL 2, 2009, : 261 - 270
  • [2] Updating the Energy Model for Future Exascale Systems
    Kogge, Peter M.
    HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2015, 2015, 9137 : 323 - 339
  • [3] Energy Efficient Runtime Framework for Exascale Systems
    Mhedheb, Yousri
    Streit, Achim
    HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2016 INTERNATIONAL WORKSHOPS, 2016, 9945 : 32 - 44
  • [4] INEDIBLE RENDERING SYSTEMS WITH REDUCED ENERGY USE
    PROKOP, WH
    JOURNAL OF THE AMERICAN OIL CHEMISTS SOCIETY, 1984, 61 (04) : 693 - 693
  • [5] On the Use of Commodity Ethernet Technology in Exascale HPC Systems
    Benito, Mariano
    Vallejo, Enrique
    Beivide, Ramon
    2015 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2015, : 254 - 263
  • [6] INTELLIGENT APPROACHES FOR MODELING AND OPTIMIZING HVAC SYSTEMS' ENERGY USE
    Tesiero, Raymond C., III
    Nassif, Nabil
    Gokaraju, Balakrishna
    Doss, Daniel Adrian
    PROCEEDINGS OF THE ASME 11TH INTERNATIONAL CONFERENCE ON ENERGY SUSTAINABILITY, 2017, 2017,
  • [7] On the energy footprint of I/O management in Exascale HPC systems
    Dorier, Matthieu
    Yildiz, Orcun
    Ibrahim, Shadi
    Orgerie, Anne-Cecile
    Antoniu, Gabriel
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 62 : 17 - 28
  • [8] Optimizing binary decision systems by manipulating transmission intervals
    Lexa, MA
    Johnson, DH
    SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 2, PROCEEDINGS, 2003, : 339 - 342
  • [9] Optimizing checkpoint intervals for real-time multi-tasks with arbitrary periods
    Kwak S.W.
    Yang J.-M.
    Transactions of the Korean Institute of Electrical Engineers, 2011, 60 (01) : 193 - 200
  • [10] Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems
    Alfian Amrizal, Muhammad
    Uno, Atsuya
    Sato, Yukinori
    Takizawa, Hiroyuki
    Kobayashi, Hiroaki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (12) : 2749 - 2760