The impact of environmental stochasticity on value-based multiobjective reinforcement learning

Cited by: 12
Authors
Vamplew, Peter [1 ]
Foale, Cameron [1 ]
Dazeley, Richard [2 ]
Affiliations
[1] Federat Univ, Ballarat, Vic, Australia
[2] Deakin Univ, Geelong, Vic, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2022, Vol. 34, Issue 03
Keywords
Multiobjective reinforcement learning; Multiobjective MDPs; Stochastic MDPs
DOI
10.1007/s00521-021-05859-1
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
A common approach to address multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multiobjective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim that may arise in some applications, maximising SER subject to constraints on the variation in return, and show that this may require different solutions from those for ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multiobjective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. These include a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based, model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions.
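To make the setup described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of value-based multiobjective Q-learning: each state-action pair stores a vector of Q-values, one per objective, and actions are selected by scalarising that vector, here with an assumed linear weighting. All sizes, weights, hyperparameters, and function names are illustrative assumptions.

```python
import numpy as np

N_STATES, N_ACTIONS, N_OBJECTIVES = 5, 2, 2      # illustrative problem sizes
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1           # assumed hyperparameters
WEIGHTS = np.array([0.7, 0.3])                   # assumed preference weights

# Vector-valued Q-table: one Q estimate per objective for each (s, a) pair.
Q = np.zeros((N_STATES, N_ACTIONS, N_OBJECTIVES))
rng = np.random.default_rng(0)

def select_action(state):
    """Epsilon-greedy over the linearly scalarised Q-vectors."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state] @ WEIGHTS))    # scalarise, then argmax

def update(state, action, reward_vec, next_state, done):
    """Componentwise Q-learning update, bootstrapping from the action that
    is greedy under the scalarised Q-vector of the next state."""
    greedy_next = int(np.argmax(Q[next_state] @ WEIGHTS))
    target = np.asarray(reward_vec, dtype=float)
    if not done:
        target = target + GAMMA * Q[next_state, greedy_next]
    Q[state, action] += ALPHA * (target - Q[state, action])
```

Note that with a linear scalarisation function, as in this sketch, ESR and SER coincide by linearity of expectation; the distinction the paper analyses, between applying the scalarisation inside or outside the expectation, becomes material for nonlinear scalarisation functions, where stochastic rewards or transitions drive the two criteria apart.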
Pages: 1783-1799
Page count: 17
Related Papers (50 records in total; items [31]-[40] shown)
  • [31] Paczolay, Gabor; Harmati, Istvan. NPV-DQN: Improving Value-based Reinforcement Learning, by Variable Discount Factor, with Control Applications. ACTA POLYTECHNICA HUNGARICA, 2024, 21 (11): 175-190.
  • [32] Van Slooten, Joanne C.; Jahfari, Sara; Knapen, Tomas; Theeuwes, Jan. How pupil responses track value-based decision-making during and after reinforcement learning. PLOS COMPUTATIONAL BIOLOGY, 2018, 14 (11).
  • [33] Findik, Yasin; Robinette, Paul; Jerath, Kshitij; Ahmadzadeh, S. Reza. Impact of Relational Networks in Multi-Agent Learning: A Value-Based Factorization View. 2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 4447-4454.
  • [34] Xiong, Zheng; Luo, Biao; Wang, Bing-Chuan; Xu, Xiaodong; Huang, Tingwen. Multiobjective Battery Charging Strategy Based on Deep Reinforcement Learning. IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, 2024, 10 (03): 6893-6903.
  • [35] Pearce, Alaina L.; Fuchs, Bari A.; Keller, Kathleen L. The role of reinforcement learning and value-based decision-making frameworks in understanding food choice and eating behaviors. FRONTIERS IN NUTRITION, 2022, 9.
  • [36] Li, Han; Meng, Shunmei; Shang, Jing; Huang, Anqi; Cai, Zhicheng. Value-based multi-agent deep reinforcement learning for collaborative computation offloading in internet of things networks. WIRELESS NETWORKS, 2024, 30 (08): 6915-6928.
  • [37] Han, Songyang; Wang, He; Su, Sanbao; Shi, Yuanyuan; Miao, Fei. Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022: 8765-8771.
  • [38] Jocham, Gerhard; Klein, Tilmann A.; Ullsperger, Markus. Dopamine-Mediated Reinforcement Learning Signals in the Striatum and Ventromedial Prefrontal Cortex Underlie Value-Based Choices. JOURNAL OF NEUROSCIENCE, 2011, 31 (05): 1606-1613.
  • [39] Crowson, Matthew G.; Chan, Timothy C. Y. Machine Learning as a Catalyst for Value-Based Health Care. JOURNAL OF MEDICAL SYSTEMS, 2020, 44 (09).