Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

Cited by: 0
Authors
Ramesh, Shyam Sundhar [1]
Sessa, Pier Giuseppe [2]
Hu, Yifan [3]
Krause, Andreas [2]
Bogunovic, Ilija [1]
Affiliations
[1] UCL, London, England
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
MARKOV DECISION-PROCESSES;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition, and the deviation of real-world deployment dynamics from the training environment. To address these challenges, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that uses Gaussian processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics, leveraging access to a generative model (i.e., a simulator). We further establish the statistical sample complexity of the proposed method for different uncertainty sets. These complexity bounds are independent of the number of states and extend beyond linear dynamics, ensuring the effectiveness of our approach in identifying near-optimal distributionally robust policies. The proposed method can also be combined with model-free distributionally robust reinforcement learning methods to obtain a near-optimal robust policy. Experimental results demonstrate the robustness of our algorithm to distributional shifts and its superior sample efficiency.
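As a rough illustration of what a Kullback-Leibler uncertainty set means in practice, the sketch below computes a worst-case expected value over all distributions within a KL radius of an empirical nominal distribution. It uses the standard dual form of the KL-constrained problem and a plain grid search over the dual variable. It is not the authors' implementation: the function name kl_robust_expectation, the radius rho, the grid of dual values, and the synthetic samples are all assumptions made for this example.

```python
import numpy as np


def kl_robust_expectation(values, rho, lambdas=None):
    """Approximate inf_{Q : KL(Q || P) <= rho} E_Q[V] for empirical P.

    Uses the dual form
        sup_{lam > 0}  -lam * log E_P[exp(-V / lam)] - lam * rho,
    maximised by grid search over lam (an illustrative choice, not tuned).
    """
    values = np.asarray(values, dtype=float)
    if lambdas is None:
        # Hypothetical default grid for the dual variable.
        lambdas = np.logspace(-3, 3, 400)
    best = -np.inf
    for lam in lambdas:
        z = -values / lam
        m = z.max()
        # Numerically stable log of the sample mean of exp(z).
        log_mean_exp = m + np.log(np.mean(np.exp(z - m)))
        best = max(best, -lam * log_mean_exp - lam * rho)
    return best


# Synthetic "next-state values" standing in for samples from a nominal model.
rng = np.random.default_rng(0)
next_state_values = rng.normal(loc=1.0, scale=0.5, size=1000)

nominal = next_state_values.mean()
robust_small = kl_robust_expectation(next_state_values, rho=0.1)
robust_large = kl_robust_expectation(next_state_values, rho=0.5)
print(f"nominal={nominal:.3f} rho=0.1: {robust_small:.3f} rho=0.5: {robust_large:.3f}")
```

Every value on the grid gives a lower bound on the true worst case, so the result is conservative; a larger rho admits more adversarial distributions and can only lower the robust value.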
Pages: 42
Related papers
50 in total
  • [21] Model-Based Reinforcement Learning Exploiting State-Action Equivalence
    Asadi, Mahsa
    Talebi, Mohammad Sadegh
    Bourel, Hippolyte
    Maillard, Odalric-Ambrym
    ASIAN CONFERENCE ON MACHINE LEARNING, VOL 101, 2019, 101 : 204 - 219
  • [22] Efficient state synchronisation in model-based testing through reinforcement learning
    Turker, Uraz Cengiz
    Hierons, Robert M.
    Mousavi, Mohammad Reza
    Tyukin, Ivan Y.
    2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING ASE 2021, 2021, : 368 - 380
  • [23] Learning to Paint With Model-based Deep Reinforcement Learning
    Huang, Zhewei
    Heng, Wen
    Zhou, Shuchang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 8708 - 8717
  • [24] Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables
    Xu, Mengdi
    Huang, Peide
    Niu, Yaru
    Kumar, Visak
    Qiu, Jielin
    Fang, Chao
    Lee, Kuan-Hui
    Qi, Xuewei
    Lam, Henry
    Li, Bo
    Zhao, Ding
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [25] Incremental model-based reinforcement learning with model constraint
    Yang, Zhiyou
    Fu, Mingsheng
    Qu, Hong
    Li, Fan
    Shi, Shuqing
    Hu, Wang
    NEURAL NETWORKS, 2025, 185
  • [26] Objective Mismatch in Model-based Reinforcement Learning
    Lambert, Nathan
    Amos, Brandon
    Yadan, Omry
    Calandra, Roberto
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 761 - 770
  • [27] Model-based reinforcement learning with dimension reduction
    Tangkaratt, Voot
    Morimoto, Jun
    Sugiyama, Masashi
    NEURAL NETWORKS, 2016, 84 : 1 - 16
  • [28] On Effective Scheduling of Model-based Reinforcement Learning
    Lai, Hang
    Shen, Jian
    Zhang, Weinan
    Huang, Yimin
    Zhang, Xing
    Tang, Ruiming
    Yu, Yong
    Li, Zhenguo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [29] Transferring Instances for Model-Based Reinforcement Learning
    Taylor, Matthew E.
    Jong, Nicholas K.
    Stone, Peter
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART II, PROCEEDINGS, 2008, 5212 : 488 - 505
  • [30] MOReL: Model-Based Offline Reinforcement Learning
    Kidambi, Rahul
    Rajeswaran, Aravind
    Netrapalli, Praneeth
    Joachims, Thorsten
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33