NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications

Cited by: 0

Authors:

Paczolay, Gabor [1]

Harmati, Istvan [1]

Affiliations:

[1] Budapest Univ Technol & Econ, Dept Control Engn, Magyar tudosok krt 2,1 bldg, H-1117 Budapest, Hungary

Source:

ACTA POLYTECHNICA HUNGARICA | 2024, Vol. 21, No. 11

Keywords:

reinforcement learning; DQN; NPV; NPV-DQN

DOI:

Not available

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Subject classification code:

08

Abstract:

The discount factor plays an important role in reinforcement learning algorithms: it determines how much future rewards are valued at the present time step. This paper uses a system with Q-value estimates based on two distinct discount factors. These estimates can later be merged into one network to make the computations more efficient. The decision of which network to use is based on the relative value of the maximum of the short-term network: the more unambiguous the maximum is, the higher the probability of selecting that network. The system is then benchmarked on a cartpole and a gridworld environment.
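The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption: the function names `short_term_selection_prob` and `select_action` are hypothetical, and "unambiguity" is modeled as the gap between the best and second-best short-term Q-value, normalized by the spread of the Q-values. The paper's actual measure may differ.

```python
import numpy as np


def short_term_selection_prob(q_short):
    """Hypothetical 'unambiguity' score in [0, 1] for the short-term Q-values.

    Assumption (not from the paper): the score is the gap between the best
    and second-best Q-value divided by the full spread of the Q-values.
    """
    q = np.sort(np.asarray(q_short, dtype=float))
    if q.size < 2:
        return 1.0  # a single action is trivially unambiguous
    spread = q[-1] - q[0]
    if spread == 0.0:
        return 0.0  # all actions look identical: no clear maximum
    return float((q[-1] - q[-2]) / spread)


def select_action(q_short, q_long, rng):
    """Pick the short-term estimate with probability equal to the score,
    otherwise fall back to the long-term estimate; then act greedily."""
    p = short_term_selection_prob(q_short)
    use_short = bool(rng.random() < p)
    q = q_short if use_short else q_long
    return int(np.argmax(q)), use_short


rng = np.random.default_rng(0)
# Clear short-term maximum: the short-term estimate decides.
action_clear, used_short_clear = select_action([0.0, 0.0, 1.0], [1.0, 0.0, 0.0], rng)
# Flat short-term values: the long-term estimate decides.
action_flat, used_short_flat = select_action([1.0, 1.0, 1.0], [1.0, 0.0, 0.0], rng)
```

In this sketch a sharply peaked short-term estimate is trusted directly, while a flat one defers to the long-term estimate. The two Q-value vectors could come from two separate networks or from two output heads of one merged network, as the abstract suggests.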

Pages: 175-190

Page count: 16

50 items in total

[31] Trajectory Tracking Control of Variable Sweep Aircraft Based on Reinforcement Learning
Cao, Rui
Lu, Kelin
BIOMIMETICS, 2024, 9 (05)
[32] Value-Based Reinforcement Learning for Selective Disassembly Sequence Optimization Problems Demonstrating and Comparing a Proposed Model
Qin, Shujin
Bi, Zhiliang
Wang, Jiacun
Liu, Shixin
Guo, Xiwang
Zhao, Ziyan
Qi, Liang
IEEE SYSTEMS MAN AND CYBERNETICS MAGAZINE, 2024, 10 (02) : 24 - 31
[33] How pupil responses track value-based decision-making during and after reinforcement learning
Van Slooten, Joanne C.
Jahfari, Sara
Knapen, Tomas
Theeuwes, Jan
PLOS COMPUTATIONAL BIOLOGY, 2018, 14 (11)
[34] Improving performance of WSNs in IoT applications by transmission power control and adaptive learning rates in reinforcement learning
Chaukiyal, Arunita
TELECOMMUNICATION SYSTEMS, 2024, 87 (03) : 575 - 591
[35] Reinforcement Learning Based Variable Speed Limit Control for Mixed Traffic Flows
Vrbanic, Filip
Ivanjko, Edouard
Mandzuka, Sadko
Miletic, Mladen
2021 29TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2021 : 560 - 565
[36] Assessment of reinforcement learning applications for industrial control based on complexity measures
Grothoff, Julian
Camargo Torres, Nicolas
Kleinert, Tobias
AT-AUTOMATISIERUNGSTECHNIK, 2022, 70 (01) : 53 - 66
[37] The role of reinforcement learning and value-based decision-making frameworks in understanding food choice and eating behaviors
Pearce, Alaina L. L.
Fuchs, Bari A. A.
Keller, Kathleen L. L.
FRONTIERS IN NUTRITION, 2022, 9
[38] Value-based multi-agent deep reinforcement learning for collaborative computation offloading in internet of things networks
Li, Han
Meng, Shunmei
Shang, Jing
Huang, Anqi
Cai, Zhicheng
WIRELESS NETWORKS, 2024, 30 (08) : 6915 - 6928
[39] Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles
Han, Songyang
Wang, He
Su, Sanbao
Shi, Yuanyuan
Miao, Fei
2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022 : 8765 - 8771
[40] Dopamine-Mediated Reinforcement Learning Signals in the Striatum and Ventromedial Prefrontal Cortex Underlie Value-Based Choices
Jocham, Gerhard
Klein, Tilmann A.
Ullsperger, Markus
JOURNAL OF NEUROSCIENCE, 2011, 31 (05) : 1606 - 1613
