The impact of environmental stochasticity on value-based multiobjective reinforcement learning

Cited: 12
Authors
Vamplew, Peter [1 ]
Foale, Cameron [1 ]
Dazeley, Richard [2 ]
Affiliations
[1] Federation University Australia, Ballarat, VIC, Australia
[2] Deakin University, Geelong, VIC, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2022, Vol. 34, Issue 3
Keywords
Multiobjective reinforcement learning; Multiobjective MDPs; Stochastic MDPs
DOI
10.1007/s00521-021-05859-1
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
A common approach to addressing multiobjective problems with reinforcement learning is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism, often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multiobjective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether the aim is to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim that may arise in some applications: maximising SER subject to constraints on the variation in return. We show that this may require different solutions from those for ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multiobjective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics, including a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions.
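For context, the ESR and SER criteria named in the abstract are conventionally defined as follows in the MORL literature (standard background, not quoted from this paper): writing the vector-valued return as R = Σ_t γ^t r_t and the scalarisation function as f, an ESR-optimal policy maximises E[f(R)], whereas an SER-optimal policy maximises f(E[R]). For linear f the two criteria coincide, since expectation commutes with linear maps; they diverge under nonlinear scalarisation combined with stochastic rewards or transitions, which is the setting the paper analyses. The sketch below illustrates the kind of vector-valued Q-learning with scalarised action selection the abstract describes; it is a minimal illustration assuming tabular states and linear scalarisation, and all names (q_table, weights, select_action, update) are hypothetical rather than taken from the paper.

    import numpy as np

    def select_action(q_table, state, weights, epsilon=0.1,
                      rng=np.random.default_rng(0)):
        # Epsilon-greedy selection over linearly scalarised vector Q-values.
        # q_table: (n_states, n_actions, n_objectives); weights: (n_objectives,)
        if rng.random() < epsilon:
            return int(rng.integers(q_table.shape[1]))
        return int(np.argmax(q_table[state] @ weights))  # scalar utility per action

    def update(q_table, state, action, reward_vec, next_state, weights,
               alpha=0.1, gamma=0.95):
        # Q-learning update applied componentwise to the Q-vector; the
        # bootstrap action is chosen greedily under the scalarised values.
        greedy_next = int(np.argmax(q_table[next_state] @ weights))
        target = reward_vec + gamma * q_table[next_state, greedy_next]
        q_table[state, action] += alpha * (target - q_table[state, action])

Initialising q_table = np.zeros((n_states, n_actions, n_objectives)) and supplying vector rewards of shape (n_objectives,) is enough to run both functions.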
Pages: 1783-1799
Page count: 17