Optimizing Checkpoint Intervals for Reduced Energy Use in Exascale Systems

Authors
Dauwe, Daniel [1]
Jhaveri, Rohan [1]
Pasricha, Sudeep [1, 2]
Maciejewski, Anthony A. [1]
Siegel, Howard Jay [1, 2]
Affiliations
[1] Colorado State Univ, Dept Elect & Comp Engn, Ft Collins, CO 80523 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
exascale resilience; checkpoint restart; multilevel checkpointing; fault tolerance; HPC energy efficiency; performance
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In today's high performance computing (HPC) systems, the probability that an application experiences a failure has risen significantly as the number of system nodes has grown. Exascale-sized systems are expected to operate with a mean time between failures (MTBF) of as little as a few minutes, causing frequent interruptions in application execution and substantially greater energy costs in systems that will already consume large amounts of energy. State-of-the-art HPC resilience techniques proposed for these future systems complicate the energy problem further, because the overhead of using them also increases energy use. While prior work has attempted to analyze and improve the energy use of systems that employ resilience techniques, our work offers a new approach: optimizing checkpoint interval lengths so that a system designer is free to choose between intervals that optimize for application performance efficiency or for energy use, under both a traditional checkpoint and a multilevel checkpoint approach to resilience. We create a set of equations able to optimize for either performance efficiency or energy use, demonstrate that distinct intervals exist when optimizing for one metric or the other, and examine the sensitivity of this phenomenon to changes in several system parameters and application characteristics.
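The abstract's claim that the time-optimal and energy-optimal checkpoint intervals can differ can be illustrated with a minimal sketch. This is not the set of equations from the paper, which this record does not reproduce. It assumes the classic first-order model behind Young's approximation, in which the wasted fraction of time is roughly checkpoint_cost/interval + interval/(2 * MTBF), and it weights the two terms by a hypothetical checkpoint power and compute power to get an energy analogue. All function and parameter names are illustrative, and the approximation is only reasonable when the interval and checkpoint cost are small relative to the MTBF.

```python
import math


def time_optimal_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation: sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def energy_optimal_interval(checkpoint_cost_s, mtbf_s,
                            checkpoint_power_w, compute_power_w):
    """Energy analogue under the assumed model (not the paper's equations).

    Minimizes P_ckpt * C / tau + P_comp * tau / (2 * MTBF) over tau,
    which gives sqrt(2 * C * MTBF * P_ckpt / P_comp).
    """
    ratio = checkpoint_power_w / compute_power_w
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s * ratio)


def wasted_time_fraction(interval_s, checkpoint_cost_s, mtbf_s):
    """Checkpoint overhead plus expected rework, per unit of useful time."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


def wasted_power_w(interval_s, checkpoint_cost_s, mtbf_s,
                   checkpoint_power_w, compute_power_w):
    """Average power spent on checkpointing and on recomputing lost work."""
    return (checkpoint_power_w * checkpoint_cost_s / interval_s
            + compute_power_w * interval_s / (2.0 * mtbf_s))


if __name__ == "__main__":
    # Hypothetical values: 60 s checkpoint, 1 h MTBF, and checkpointing
    # that draws half the power of computation (I/O-bound phase).
    cost, mtbf, p_ckpt, p_comp = 60.0, 3600.0, 100.0, 200.0
    tau_t = time_optimal_interval(cost, mtbf)
    tau_e = energy_optimal_interval(cost, mtbf, p_ckpt, p_comp)
    print(f"time-optimal interval:   {tau_t:.1f} s")
    print(f"energy-optimal interval: {tau_e:.1f} s")
```

Under these assumptions the two intervals coincide only when checkpoint power equals compute power. If checkpointing draws less power than computation, the energy-optimal interval is shorter, since checkpoints are cheap in energy relative to the recomputation they avoid.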
Pages: 8