The impact of environmental stochasticity on value-based multiobjective reinforcement learning

Cited by: 12
Authors
Vamplew, Peter [1 ]
Foale, Cameron [1 ]
Dazeley, Richard [2 ]
Affiliations
[1] Federat Univ, Ballarat, Vic, Australia
[2] Deakin Univ, Geelong, Vic, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2022 / Vol. 34 / Issue 03
Keywords
Multiobjective reinforcement learning; Multiobjective MDPs; Stochastic MDPs;
DOI
10.1007/s00521-021-05859-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A common approach to addressing multiobjective problems using reinforcement learning methods is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism that is often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multiobjective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we desire to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim which may arise in some applications, that of maximising SER subject to satisfying constraints on the variation in return, and show that this may require different solutions from ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multiobjective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes (MOMDPs) with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning (MORL) in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions.
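The distinction between ESR and SER drawn in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the nonlinear scalarisation function (the minimum over objectives), the two hypothetical policies, and their return distributions. ESR applies the scalarisation to each sampled vector return and then averages, whereas SER averages the vector returns first and then scalarises.

```python
from typing import Callable, List, Sequence, Tuple

# One outcome of following a policy: (probability, vector return).
Outcome = Tuple[float, Tuple[float, ...]]


def scalarise(ret: Sequence[float]) -> float:
    """Hypothetical nonlinear scalarisation: the worst objective value."""
    return min(ret)


def esr(outcomes: List[Outcome], f: Callable = scalarise) -> float:
    """Expected Scalarised Return: E[f(R)]."""
    return sum(p * f(r) for p, r in outcomes)


def ser(outcomes: List[Outcome], f: Callable = scalarise) -> float:
    """Scalarised Expected Return: f(E[R])."""
    n_objectives = len(outcomes[0][1])
    expected = tuple(
        sum(p * r[i] for p, r in outcomes) for i in range(n_objectives)
    )
    return f(expected)


# Hypothetical policy A: always returns (1, 1).
policy_a = [(1.0, (1.0, 1.0))]
# Hypothetical policy B: returns (3, 0) or (0, 3) with equal probability.
policy_b = [(0.5, (3.0, 0.0)), (0.5, (0.0, 3.0))]

for name, policy in (("A", policy_a), ("B", policy_b)):
    print(name, "ESR =", esr(policy), "SER =", ser(policy))
```

Under these assumed numbers, policy B has the higher SER while policy A has the higher ESR, so the two criteria rank the policies differently once returns are stochastic and the scalarisation is nonlinear.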
Pages: 1783 - 1799
Page count: 17