The case for lifetime reliability-aware microprocessors

Authors
Srinivasan, J [1 ]
Adve, SV [1 ]
Bose, P [1 ]
Rivers, JA [1 ]
Affiliation
[1] Univ Illinois, Dept Comp Sci, Champaign, IL 60680 USA
Keywords
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Ensuring long processor lifetimes by limiting failures due to wear-out-related hard errors is a critical requirement for all microprocessor manufacturers. We observe that continuous device scaling and increasing temperatures are making lifetime reliability targets even harder to meet. However, current methodologies for qualifying lifetime reliability are overly conservative, since they assume worst-case operating conditions. This paper makes the case that the continued use of such methodologies will significantly and unnecessarily constrain performance. Instead, lifetime reliability awareness at the microarchitectural design stage can mitigate this problem by designing processors that dynamically adapt in response to the observed usage to meet a reliability target. We make two specific contributions. First, we describe an architecture-level model and its implementation, called RAMP, that can dynamically track lifetime reliability, responding to changes in application behavior. RAMP is based on state-of-the-art device models for different wear-out mechanisms. Second, we propose dynamic reliability management (DRM), a technique where the processor can respond to changing application behavior to maintain its lifetime reliability target. In contrast to current reliability qualification methodologies based on worst-case behavior, DRM allows processors to be qualified for reliability at lower (but more likely) operating points than the worst case. Using RAMP, we show that this can save cost and/or improve performance, that dynamic voltage scaling is an effective response technique for DRM, and that dynamic thermal management neither subsumes nor is subsumed by DRM.
Pages: 276-287
Page count: 12