NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Authors
Paczolay, Gabor [1]
Harmati, Istvan [1]
Affiliation
[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary
Keywords
reinforcement learning; DQN; NPV; NPV-DQN
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are valued at the present time step. In this paper, a system is used in which Q-values are estimated with two distinct discount factors. These estimates can later be merged into one network to make the computation more efficient. The choice of which network to use is based on the relative value of the short-term network's maximum: the more unambiguous the maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
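The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything concrete here is an assumption: the "relative value of the maximum" is approximated by the softmax probability of the greedy action under the short-term estimate, and the names `short_term_confidence`, `select_action`, and `temperature` are hypothetical, not taken from the paper.

```python
import numpy as np


def short_term_confidence(q_short, temperature=1.0):
    """Softmax probability of the greedy action under the short-term estimate.

    Assumption: this stands in for the "relative value of the maximum";
    the paper's actual measure may differ.
    """
    q = np.asarray(q_short, dtype=float)
    z = (q - q.max()) / temperature      # shift by the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs.max())


def select_action(q_short, q_long, rng, temperature=1.0):
    """Pick the short-term estimate with probability equal to its confidence,
    otherwise fall back to the long-term estimate, then act greedily."""
    p_short = short_term_confidence(q_short, temperature)
    use_short = bool(rng.random() < p_short)
    q = q_short if use_short else q_long
    return int(np.argmax(q)), use_short, p_short


# Hypothetical Q-values for three actions under a small and a large discount factor.
rng = np.random.default_rng(0)
q_short = [10.0, 0.0, 0.0]   # clear maximum: the short-term estimate is very likely chosen
q_long = [0.0, 5.0, 0.0]
action, use_short, p_short = select_action(q_short, q_long, rng)
```

With a flat short-term estimate such as `[1.0, 1.0, 1.0]` the confidence drops to 1/3 for three actions, so under this assumed rule the long-term estimate would be consulted more often.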
Pages: 175-190
Number of pages: 16