Improving the learning process of deep reinforcement learning agents operating in collective heating environments

Cited by: 0
Authors
Jacobs, Stef [1, 2]
Ghane, Sara [3]
Houben, Pieter Jan [2]
Kabbara, Zakarya [1]
Huybrechts, Thomas [3]
Hellinckx, Peter [2]
Verhaert, Ivan [1]
Affiliations
[1] Univ Antwerp, Fac Appl Engn Electromech Engn Technol, EMIB, Groenenborgerlaan 171, B-2020 Antwerp, Belgium
[2] Univ Antwerp, Fac Appl Engn Elect ICT, M4S, Groenenborgerlaan 171, B-2020 Antwerp, Belgium
[3] Univ Antwerp, IMEC, IDLab, Fac Appl Engn, Sint Pietersvliet 7, B-2000 Antwerp, Belgium
Keywords
Reinforcement learning; Thermal inertia; Control strategy; Discount factor; Learning rate schedule; Collective heating; PPO; BUILDING ENERGY; SYSTEMS;
DOI
10.1016/j.apenergy.2025.125420
Chinese Library Classification
TE [Petroleum and Natural Gas Industry]; TK [Energy and Power Engineering];
Discipline classification codes
0807 ; 0820 ;
Abstract
Deep reinforcement learning (DRL) can be used to optimise the performance of Collective Heating Systems (CHS) by reducing operational costs while ensuring thermal comfort. However, heating systems often respond slowly to control inputs because of thermal inertia, which delays the effects of actions such as adapting temperature set points. This delayed feedback complicates the learning process for DRL agents, as it becomes more difficult to associate specific control actions with their outcomes. To address this challenge, this study evaluates four hyperparameter schemes during training. The focus lies on schemes that vary the learning rate (the rate at which weights in neural networks are adapted) and/or the discount factor (the importance the DRL agent attaches to future rewards). In this respect, we introduce the GALER approach, which combines a progressive increase of the discount factor with a reduction of the learning rate throughout the training process. The effectiveness of the four learning schemes is evaluated using the actor-critic Proximal Policy Optimization (PPO) algorithm for three types of CHS, with a multi-objective reward function balancing thermal comfort against energy use or operational costs. The results demonstrate that energy-based reward functions offer limited scope for optimisation, while the GALER scheme yields the highest potential for price-based optimisation across all considered concepts. GALER achieved a 3%-15% performance improvement over the other successful training schemes. DRL agents trained with GALER schemes strategically anticipate high-price periods by lowering the supply temperature, and vice versa. This research highlights the advantage of varying both learning rates and discount factors when training DRL agents to operate in complex multi-objective environments with slow responsiveness.
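The abstract describes GALER only qualitatively: the discount factor rises and the learning rate falls as training progresses. The following is a minimal sketch of that idea, not the authors' implementation. It assumes linear interpolation over training progress; the function names (`linear_schedule`, `galer_like_schedule`, `effective_horizon`) and all start and end values are hypothetical and are not taken from the paper.

```python
def linear_schedule(start: float, end: float, progress: float) -> float:
    """Linearly interpolate from `start` (progress=0) to `end` (progress=1)."""
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return start + (end - start) * progress


def galer_like_schedule(
    progress: float,
    gamma_start: float = 0.90,  # hypothetical value, not from the paper
    gamma_end: float = 0.99,    # hypothetical value, not from the paper
    lr_start: float = 3e-4,     # hypothetical value, not from the paper
    lr_end: float = 3e-5,       # hypothetical value, not from the paper
) -> tuple[float, float]:
    """Return (discount factor, learning rate) for a training progress in [0, 1].

    The discount factor increases while the learning rate decreases,
    mirroring the qualitative description in the abstract.
    """
    gamma = linear_schedule(gamma_start, gamma_end, progress)
    lr = linear_schedule(lr_start, lr_end, progress)
    return gamma, lr


def effective_horizon(gamma: float) -> float:
    """Rough number of future steps that matter, 1 / (1 - gamma), for gamma < 1."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    for step in range(0, 11, 2):
        progress = step / 10
        gamma, lr = galer_like_schedule(progress)
        print(f"progress={progress:.1f} gamma={gamma:.3f} "
              f"lr={lr:.2e} horizon={effective_horizon(gamma):.1f}")
```

Under these assumed values, the agent initially weighs only a short window of future rewards and takes large update steps, then gradually looks further ahead while updating more cautiously. This is one plausible reading of why such a scheme could help when thermal inertia delays the effect of an action.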
Pages: 15