System-level dynamic thermal management for high-performance microprocessors

Cited by: 56
Authors
Kumar, Amit [1]
Shang, Li [2]
Peh, Li-Shiuan [1]
Jha, Niraj K. [1]
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
[2] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada
Funding
U.S. National Science Foundation
Keywords
dynamic thermal management; hybrid hardware-software management; thermal model
DOI
10.1109/TCAD.2007.907062
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Thermal issues are fast becoming major design constraints in high-performance systems. Temperature variations adversely affect system reliability and lead to worst-case design. Researchers have recently proposed dynamic thermal-management (DTM) techniques that target average-case design and tackle the temperature issue at runtime. However, past work on DTM has studied individual techniques in isolation: it does not consider a system-level approach that uses hardware and software support synergistically, and as a result it incurs a significant execution-time overhead. In this paper, we propose HybDTM, a system-level framework for fine-grained, coordinated thermal management that combines hardware techniques (such as clock gating) with software techniques (such as thermal-aware process scheduling) to leverage the advantages of both. We show that while hardware techniques can be used reactively to manage the overall temperature during thermal emergencies, software techniques can be used proactively on top of them, with operating system (OS) support, to balance the overall thermal profile with minimal overhead. To evaluate the proposed hybrid-DTM policy, we develop a novel regression-based thermal model that provides fast and accurate temperature estimates for runtime thermal characterization of all applications running on the system. The model uses the hardware performance counters available in modern high-performance processors, and it is trained at runtime using the thermal sensors. Validated against actual temperature measurements from these online thermal sensors, the model has an average estimation error of less than 5%. We also study system-level DTM issues that jointly consider the processor and memory, and show how a unified DTM approach can benefit from global knowledge of individual system components.
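To make the idea of a regression-based thermal model concrete, here is a minimal sketch, assuming the model is an ordinary least-squares linear fit of measured temperature on per-interval performance-counter rates. The abstract says only "regression-based", so the linear form, the feature set (e.g. instructions per cycle and cache misses per cycle), the sample values, and the function names `fit_thermal_model`, `predict`, and `mean_relative_error` are all hypothetical, not the authors' implementation.

```python
from typing import List, Sequence


def fit_thermal_model(samples: Sequence[Sequence[float]],
                      temps: Sequence[float]) -> List[float]:
    """Hypothetical model: temp ~= w0 + sum_i w_i * x_i, fit by least squares.

    Solves the normal equations (X^T X) w = X^T y with Gaussian
    elimination and partial pivoting. Returns [w0, w1, ..., wk].
    """
    rows = [[1.0, *x] for x in samples]  # prepend a bias column
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, temps)) for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, n):
            f = a[k][col] / a[col][col]
            for c in range(col, n):
                a[k][c] -= f * a[col][c]
            b[k] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated temperature for one interval's counter rates x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def mean_relative_error(w: Sequence[float],
                        samples: Sequence[Sequence[float]],
                        temps: Sequence[float]) -> float:
    """Average |estimate - measured| / measured over the samples."""
    errs = [abs(predict(w, x) - t) / t for x, t in zip(samples, temps)]
    return sum(errs) / len(errs)


# Hypothetical training data: (IPC, cache misses per cycle) -> sensor reading in C.
samples = [(1.0, 0.1), (2.0, 0.2), (0.5, 0.4), (1.5, 0.0)]
temps = [50.5, 61.0, 47.0, 55.0]
w = fit_thermal_model(samples, temps)
print("weights:", w, "avg relative error:", mean_relative_error(w, samples, temps))
```

In a runtime setting, one would refit periodically as new sensor readings arrive, then use `predict` to estimate each application's temperature from its counters alone. The sub-5% error reported above refers to the authors' model, not this sketch.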
We evaluate the proposed methodology on a desktop system with an Intel Pentium-4 processor and a modified Linux OS, running a number of SPEC2000 benchmarks in both uniprocessor and simultaneous multithreaded environments. The results show that the proposed technique successfully manages the overall temperature with an average execution-time overhead of only 10.4% (20.1% maximum) relative to running without any DTM, compared with 23.9% (46% maximum) for purely hardware-based DTM. Our system, including the thermal-aware OS, the built-in runtime thermal-characterization model, and the interface to the underlying Pentium-4 hardware, is ready for release.
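The hybrid policy summarized in the abstract (reactive hardware response to emergencies, proactive software balancing otherwise) can be sketched as one decision per control interval. This is a minimal sketch under stated assumptions: the thresholds `emergency_c` and `target_c`, the choice to have the scheduler slow down processes whose predicted temperature is too high, and the names `DtmDecision` and `hybrid_dtm_step` are hypothetical; the abstract does not specify how HybDTM's scheduler acts on hot processes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DtmDecision:
    clock_gate: bool  # reactive hardware response requested this interval
    throttled: List[str] = field(default_factory=list)  # processes the OS would slow down


def hybrid_dtm_step(sensor_temp: float,
                    predicted: Dict[str, float],
                    emergency_c: float = 85.0,  # hypothetical emergency threshold
                    target_c: float = 80.0) -> DtmDecision:  # hypothetical software target
    """One control interval of a hypothetical hybrid DTM policy.

    - Hardware (reactive): if the measured temperature reaches the
      emergency threshold, request clock gating.
    - Software (proactive): otherwise, list the processes whose
      model-predicted temperature exceeds the target, hottest first,
      so the scheduler could shrink their time slices.
    """
    if sensor_temp >= emergency_c:
        return DtmDecision(clock_gate=True)
    hot = [p for p, t in predicted.items() if t > target_c]
    hot.sort(key=lambda p: predicted[p], reverse=True)
    return DtmDecision(clock_gate=False, throttled=hot)


# Example: no emergency, so two processes are flagged for proactive throttling.
decision = hybrid_dtm_step(70.0, {"gcc": 82.0, "mcf": 78.0, "art": 84.5})
print(decision)
```

Checking the sensor first keeps the hardware path as a safety net, while the software layer tries to keep the processor below the emergency threshold so that the more expensive hardware response fires rarely; the overhead figures above (10.4% hybrid versus 23.9% purely hardware) are the authors' measurements, not outputs of this sketch.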
Pages: 96-108
Number of pages: 13