NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Authors
Paczolay, Gabor [1]
Harmati, Istvan [1]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary
Keywords
reinforcement learning; DQN; NPV; NPV-DQN
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are valued at the present time step. In this paper, a system is used in which Q-values are estimated under two distinct discount factors. These estimates can later be merged into a single network to make the computation more efficient. The choice of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous the maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
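
The selection idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formula for the "relative value of the maximum", so the softmax-based score and the names `discounted_return`, `short_term_confidence`, and `select_action` are hypothetical.

```python
import math
import random


def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k: how much a reward sequence is worth now."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def short_term_confidence(q_short):
    """ASSUMPTION: score in [0, 1] for how unambiguous the best short-term
    action is. 0 when all Q-values are equal, near 1 when one dominates."""
    n = len(q_short)
    if n < 2:
        return 1.0
    p_max = max(softmax(q_short))
    return (p_max - 1.0 / n) / (1.0 - 1.0 / n)


def select_action(q_short, q_long, rng=random):
    """Use the short-term estimate with probability equal to its confidence,
    otherwise fall back to the long-term estimate. Returns (action, used_short)."""
    use_short = rng.random() < short_term_confidence(q_short)
    q = q_short if use_short else q_long
    action = max(range(len(q)), key=q.__getitem__)
    return action, use_short


if __name__ == "__main__":
    # A smaller discount factor values the same rewards less.
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
    print(discounted_return([1.0, 1.0, 1.0], 0.99))
    # Ambiguous short-term values: the long-term estimate decides.
    print(select_action([1.0, 1.0, 1.0], [0.0, 5.0, 2.0]))
```

Here `q_short` and `q_long` stand for the Q-value vectors produced under the smaller and the larger discount factor for one state. In a merged network they could be two output heads, which is again an assumption about the architecture.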
Pages: 175-190
Page count: 16