EENet: Energy Efficient Neural Networks with Run-time Power Management

Times cited: 0
Authors
Li, Xiangjie [1]
Shen, Yingtao [1]
Zou, An [1]
Ma, Yehan [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Keywords
Neural Networks; Early Exit; Energy Efficiency; Inference Time; Feedback Control
DOI
10.1109/DAC56929.2023.10247701
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning approaches, such as convolutional neural networks (CNNs), have achieved tremendous success in a wide range of applications. However, a major challenge in deploying deep learning models on resource-constrained systems is their high energy cost. As a dynamic inference approach, early exit adds exit layers to a network so that inference can terminate early with an accurate result and save energy. Current early-exit methods regulate energy through passive decision-making, which cannot adapt to the ongoing inference status, varying inference workloads, or timing constraints, let alone guide the configuration of the computing platform as inference proceeds to save further energy. In this paper, we propose Energy Efficient Neural Networks (EENet), which adds a plug-in module to state-of-the-art networks to incorporate run-time power management. Within each inference, we predict where the network will exit and adjust the computing configuration (i.e., frequency and voltage) accordingly over a small timescale. Across multiple inferences over a large timescale, we provide frequency and voltage calibration advice based on the inference workloads and timing constraints. Finally, the dynamic voltage and frequency scaling (DVFS) governor configures voltage and frequency to execute the network according to the prediction and calibration. Extensive experimental results demonstrate that EENet achieves up to 63.8% energy savings compared with classic deep learning networks and 21.5% energy savings compared with early exit under state-of-the-art exiting strategies, together with improved timing performance.
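The two mechanisms named in the abstract, early exit and frequency selection under a timing constraint, can be illustrated with a minimal sketch. This is not the authors' implementation: the confidence-threshold exit rule, the cycle-count workload model, and the names `early_exit_inference` and `pick_frequency` are all assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_inference(exit_logits, threshold=0.9):
    """Return (exit_index, predicted_class) for one input.

    exit_logits holds the logits produced at each exit, in network order.
    Assumed rule (hypothetical): stop at the first exit whose top softmax
    probability reaches the threshold; the last exit always answers.
    """
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return i, probs.index(confidence)


def pick_frequency(remaining_cycles, time_budget_s, freqs_hz):
    """Pick the lowest frequency that finishes the remaining work in time.

    Assumed model (hypothetical): run time = remaining_cycles / frequency.
    Falls back to the highest frequency if no option meets the budget.
    """
    for f in sorted(freqs_hz):
        if remaining_cycles / f <= time_budget_s:
            return f
    return max(freqs_hz)
```

In this sketch, a predicted early exit corresponds to a smaller `remaining_cycles`, which lets `pick_frequency` choose a lower frequency while still meeting the time budget. A real governor would also have to account for voltage levels and switching overhead, which are omitted here.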
Pages: 6