The case for lifetime reliability-aware microprocessors

Cited by: 0
Authors
Srinivasan, J [1]
Adve, SV [1]
Bose, P [1]
Rivers, JA [1]
Affiliation
[1] Univ Illinois, Dept Comp Sci, Champaign, IL 60680 USA
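Keywords
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract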
Ensuring long processor lifetimes by limiting failures due to wear-out related hard errors is a critical requirement for all microprocessor manufacturers. We observe that continuous device scaling and increasing temperatures are making lifetime reliability targets even harder to meet. However, current methodologies for qualifying lifetime reliability are overly conservative since they assume worst-case operating conditions. This paper makes the case that the continued use of such methodologies will significantly and unnecessarily constrain performance. Instead, lifetime reliability awareness at the microarchitectural design stage can mitigate this problem by designing processors that dynamically adapt in response to the observed usage to meet a reliability target. We make two specific contributions. First, we describe an architecture-level model and its implementation, called RAMP, that can dynamically track lifetime reliability, responding to changes in application behavior. RAMP is based on state-of-the-art device models for different wear-out mechanisms. Second, we propose dynamic reliability management (DRM), a technique where the processor can respond to changing application behavior to maintain its lifetime reliability target. In contrast to current worst-case behavior based reliability qualification methodologies, DRM allows processors to be qualified for reliability at lower (but more likely) operating points than the worst case. Using RAMP, we show that this can save cost and/or improve performance, that dynamic voltage scaling is an effective response technique for DRM, and that dynamic thermal management neither subsumes nor is subsumed by DRM.
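The DRM idea in the abstract can be illustrated with a toy control loop. The sketch below is not RAMP and is not the authors' implementation. It assumes a single Arrhenius-style temperature dependence for the failure rate, and the activation energy, the reference temperature, the level names, and the temperature offsets per level are all hypothetical values chosen for illustration. At each interval the controller picks the fastest setting that keeps the running-average failure rate at or below the target, so cool phases bank reliability budget that hot phases can spend later.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def relative_failure_rate(temp_k, ref_temp_k=355.0, activation_ev=0.9):
    """Failure rate relative to the rate at ref_temp_k.

    Assumption: a plain Arrhenius form; ref_temp_k and activation_ev
    are illustrative values, not taken from the paper.
    """
    exponent = (activation_ev / BOLTZMANN_EV) * (1.0 / ref_temp_k - 1.0 / temp_k)
    return math.exp(exponent)


class ToyDRM:
    """Hypothetical controller that tracks a running-average failure rate."""

    def __init__(self, levels, target=1.0):
        # levels: non-empty list of (name, temperature offset in K),
        # ordered from fastest to slowest setting (hypothetical values).
        self.levels = levels
        self.target = target
        self.rate_sum = 0.0
        self.intervals = 0

    def average_rate(self):
        return self.rate_sum / self.intervals if self.intervals else 0.0

    def step(self, base_temp_k):
        """Choose a level for one interval and account for its failure rate.

        base_temp_k is the temperature the workload would reach at the
        fastest level. If no level meets the target, the slowest is used.
        """
        for name, offset_k in self.levels:
            rate = relative_failure_rate(base_temp_k + offset_k)
            new_average = (self.rate_sum + rate) / (self.intervals + 1)
            if new_average <= self.target:
                break
        self.rate_sum += rate
        self.intervals += 1
        return name


if __name__ == "__main__":
    levels = [("high", 0.0), ("mid", -5.0), ("low", -10.0)]
    drm = ToyDRM(levels)
    # Four cool intervals followed by one hot interval.
    for temp_k in [340.0, 340.0, 340.0, 340.0, 370.0]:
        print(temp_k, drm.step(temp_k), round(drm.average_rate(), 3))
```

In this toy setup a hot interval that arrives first is forced to the slowest level, while the same hot interval after several cool ones can run at the fastest level, which mirrors the contrast the abstract draws with worst-case qualification.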
Pages: 276-287
Page count: 12