NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Cited by: 0

Authors:

Paczolay, Gabor [1]

Harmati, Istvan [1]

Affiliations:

[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary

Source:

ACTA POLYTECHNICA HUNGARICA | 2024 / Vol. 21 / No. 11

Keywords:

reinforcement learning; DQN; NPV; NPV-DQN;

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Subject classification code:

08;

Abstract:

The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are valued at the present time step. In this paper, a system is used in which Q-values are estimated with two distinct discount factors. These estimates can later be merged into one network to make the computations more efficient. The decision of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous the maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
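The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it mentions, not the paper's actual method. `discounted_return` is the standard discounted sum of rewards. The selection rule is an assumption made for illustration: "unambiguity" is read here as the largest softmax weight of the short-term Q-values, and the names `short_term_probability`, `select_action`, and `temperature` are hypothetical.

```python
import math
import random


def discounted_return(rewards, gamma):
    """Standard discounted return: sum of gamma**t * r_t over the reward sequence."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def short_term_probability(q_short, temperature=1.0):
    """ASSUMPTION: score how unambiguous the short-term maximum is.

    Returns the largest softmax weight of the short-term Q-values.
    This is an illustrative choice, not the formula from the paper.
    """
    m = max(q_short)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp((q - m) / temperature) for q in q_short]
    return max(exps) / sum(exps)


def select_action(q_short, q_long, rng=random):
    """Pick the greedy action of the short-term estimate with probability p,
    otherwise the greedy action of the long-term estimate (hypothetical rule)."""
    p = short_term_probability(q_short)
    q = q_short if rng.random() < p else q_long
    return max(range(len(q)), key=q.__getitem__)


# The same reward sequence valued under a small and a large discount factor.
rewards = [1.0, 1.0, 1.0]
short_value = discounted_return(rewards, 0.5)
long_value = discounted_return(rewards, 1.0)
```

With a small discount factor the later rewards contribute little to the return, while a discount factor of 1 counts every reward equally; this is the sense in which the two estimates are "short-term" and "long-term".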

Pages: 175-190

Page count: 16
