NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Cited by: 0
Authors
Paczolay, Gabor [1]
Harmati, Istvan [1]
Affiliation
[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary
Keywords
reinforcement learning; DQN; NPV; NPV-DQN;
DOI
Not available
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are worth at the present time step. In this paper, a system is used that estimates Q values based on two distinct discount factors. These estimates can later be merged into one network to make the computations more efficient. The choice of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous the maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
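The abstract does not state the exact selection rule, so the following Python sketch only illustrates the two ideas it describes, under stated assumptions. `discounted_return` shows how the discount factor gamma sets the present value of a delayed reward. `max_confidence` and `select_action` are hypothetical helpers: they assume that "how unambiguous the maximum is" can be measured as the softmax probability of the greedy action of the short-term Q estimate, and that this probability is used directly as the chance of acting on the short-term estimate instead of the long-term one. The softmax measure, the `temperature` parameter and the example discount factors are illustrative choices, not taken from the paper.

```python
import math
import random
from typing import Sequence, Tuple


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t: what the reward sequence is worth at t = 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def max_confidence(q_values: Sequence[float], temperature: float = 1.0) -> float:
    """ASSUMPTION: softmax probability of the greedy action as a measure of
    how unambiguous the maximum is. Near 1 when one action clearly dominates,
    equal to 1/len(q_values) when all actions tie."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    return max(exps) / sum(exps)


def select_action(
    q_short: Sequence[float],
    q_long: Sequence[float],
    rng: random.Random,
    temperature: float = 1.0,
) -> Tuple[int, str]:
    """Hypothetical rule: act greedily on the short-term estimate with
    probability max_confidence(q_short), otherwise act greedily on the
    long-term estimate. Returns (action index, which estimate was used)."""
    p_short = max_confidence(q_short, temperature)
    if rng.random() < p_short:
        q, source = q_short, "short"
    else:
        q, source = q_long, "long"
    action = max(range(len(q)), key=lambda a: q[a])
    return action, source


if __name__ == "__main__":
    rewards = [0.0, 0.0, 0.0, 10.0]  # a single reward three steps ahead
    print(discounted_return(rewards, 0.5))   # short horizon: 1.25
    print(discounted_return(rewards, 0.99))  # long horizon: about 9.70
    print(select_action([5.0, 0.0], [0.0, 3.0], random.Random(0)))
```

With a reward of 10 arriving three steps ahead, gamma = 0.5 values it at 1.25 now while gamma = 0.99 values it at about 9.70, which is why the two estimates can disagree. Under the assumed rule, a clear short-term maximum such as Q = [5, 0] gives a selection probability of about 0.99 for the short-term estimate, while a tie gives 0.5.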
Pages: 175-190
Page count: 16
Related papers
  • [31] Trajectory Tracking Control of Variable Sweep Aircraft Based on Reinforcement Learning
    Cao, Rui
    Lu, Kelin
    BIOMIMETICS, 2024, 9 (05)
  • [32] Value-Based Reinforcement Learning for Selective Disassembly Sequence Optimization Problems Demonstrating and Comparing a Proposed Model
    Qin, Shujin
    Bi, Zhiliang
    Wang, Jiacun
    Liu, Shixin
    Guo, Xiwang
    Zhao, Ziyan
    Qi, Liang
    IEEE SYSTEMS MAN AND CYBERNETICS MAGAZINE, 2024, 10 (02): 24 - 31
  • [33] How pupil responses track value-based decision-making during and after reinforcement learning
    Van Slooten, Joanne C.
    Jahfari, Sara
    Knapen, Tomas
    Theeuwes, Jan
    PLOS COMPUTATIONAL BIOLOGY, 2018, 14 (11)
  • [34] Improving performance of WSNs in IoT applications by transmission power control and adaptive learning rates in reinforcement learning
    Chaukiyal, Arunita
    TELECOMMUNICATION SYSTEMS, 2024, 87 (03): 575 - 591
  • [35] Reinforcement Learning Based Variable Speed Limit Control for Mixed Traffic Flows
    Vrbanic, Filip
    Ivanjko, Edouard
    Mandzuka, Sadko
    Miletic, Mladen
    2021 29TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2021: 560 - 565
  • [36] Assessment of reinforcement learning applications for industrial control based on complexity measures
    Grothoff, Julian
    Camargo Torres, Nicolas
    Kleinert, Tobias
    AT-AUTOMATISIERUNGSTECHNIK, 2022, 70 (01): 53 - 66
  • [37] The role of reinforcement learning and value-based decision-making frameworks in understanding food choice and eating behaviors
    Pearce, Alaina L.
    Fuchs, Bari A.
    Keller, Kathleen L.
    FRONTIERS IN NUTRITION, 2022, 9
  • [38] Value-based multi-agent deep reinforcement learning for collaborative computation offloading in internet of things networks
    Li, Han
    Meng, Shunmei
    Shang, Jing
    Huang, Anqi
    Cai, Zhicheng
    WIRELESS NETWORKS, 2024, 30 (08): 6915 - 6928
  • [39] Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles
    Han, Songyang
    Wang, He
    Su, Sanbao
    Shi, Yuanyuan
    Miao, Fei
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022: 8765 - 8771
  • [40] Dopamine-Mediated Reinforcement Learning Signals in the Striatum and Ventromedial Prefrontal Cortex Underlie Value-Based Choices
    Jocham, Gerhard
    Klein, Tilmann A.
    Ullsperger, Markus
    JOURNAL OF NEUROSCIENCE, 2011, 31 (05): 1606 - 1613