Optimizing Checkpoint Intervals for Reduced Energy Use in Exascale Systems

Authors
Dauwe, Daniel [1]
Jhaveri, Rohan [1]
Pasricha, Sudeep [1,2]
Maciejewski, Anthony A. [1]
Siegel, Howard Jay [1,2]
Affiliations
[1] Colorado State Univ, Dept Elect & Comp Engn, Ft Collins, CO 80523 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
exascale resilience; checkpoint restart; multilevel checkpointing; fault tolerance; HPC energy efficiency; PERFORMANCE;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In today's high-performance computing (HPC) systems, the probability of an application experiencing a failure has increased significantly as the number of system nodes has grown. Exascale-sized systems are expected to operate with a mean time between failures (MTBF) of as little as a few minutes, causing frequent interruptions in application execution as well as substantially greater energy costs in a system that will already consume large amounts of energy. State-of-the-art HPC resilience techniques proposed for these future systems complicate the energy problem further, because the overhead of using them also increases energy use. While prior work has attempted to analyze and improve the energy use of systems that employ resilience techniques, our work offers a new approach: optimizing checkpoint interval lengths so that a system designer is free to choose between intervals that optimize for application performance efficiency or for energy use, in both traditional and multilevel checkpointing approaches to resilience. We create a set of equations able to optimize for either performance efficiency or energy use, demonstrate that distinct intervals exist when optimizing for one metric or the other, and examine the sensitivity of this phenomenon to changes in several system parameters and application characteristics.
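
The idea that time-optimal and energy-optimal checkpoint intervals can differ can be illustrated with a toy first-order model. The sketch below is not the authors' set of equations, which this record does not reproduce. It assumes the classic Young approximation for the time-optimal interval, sqrt(2 * C * MTBF), where C is the time to write one checkpoint, and it assumes a hypothetical two-level power model in which the system draws `p_compute_w` watts while computing and `p_ckpt_w` watts while checkpointing. Under those assumptions, the energy overhead per second of useful work is roughly p_ckpt * C / tau + p_compute * tau / (2 * MTBF), which is minimized at sqrt(2 * C * MTBF * p_ckpt / p_compute). All function and parameter names are illustrative.

```python
import math


def time_optimal_interval(ckpt_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the interval that minimizes
    wasted time: tau = sqrt(2 * C * MTBF). Valid only when C << tau << MTBF."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s)


def energy_optimal_interval(ckpt_cost_s: float, mtbf_s: float,
                            p_compute_w: float, p_ckpt_w: float) -> float:
    """Interval minimizing the toy energy overhead
    p_ckpt * C / tau + p_compute * tau / (2 * MTBF).
    The two-level power model is an assumption made for illustration."""
    power_ratio = p_ckpt_w / p_compute_w
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s * power_ratio)


if __name__ == "__main__":
    # Hypothetical numbers: 60 s checkpoint, 3000 s MTBF,
    # checkpointing draws a quarter of the compute power.
    tau_time = time_optimal_interval(60.0, 3000.0)
    tau_energy = energy_optimal_interval(60.0, 3000.0, 400.0, 100.0)
    print(f"time-optimal interval:   {tau_time:.1f} s")
    print(f"energy-optimal interval: {tau_energy:.1f} s")
```

In this toy model the two intervals coincide only when checkpointing and computing draw the same power. When checkpointing is cheaper in power than computing, the energy-optimal interval is shorter than the time-optimal one. The first-order approximation degrades when the MTBF approaches the checkpoint cost, which is exactly the regime the abstract anticipates for exascale systems, so the numbers above are indicative only.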
Pages: 8