Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Cited: 0
Authors
Paul, Supratik [1 ]
Chatzilygeroudis, Konstantinos [2 ,3 ,4 ]
Ciosek, Kamil [1 ]
Mouret, Jean-Baptiste [2 ,3 ,4 ]
Osborne, Michael A. [5 ]
Whiteson, Shimon [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
[2] INRIA, Paris, France
[3] Univ Lorraine, Nancy, France
[4] CNRS, Paris, France
[5] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
European Research Council
Keywords
Reinforcement Learning; Bayesian Optimisation; Bayesian Quadrature; Significant rare events; Environment variables;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
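The alternating scheme the abstract describes can be illustrated with a minimal, hypothetical sketch (not the paper's implementation): an inner Bayesian-quadrature step estimates a policy's expected return over the distribution of the environment variable, and an outer Bayesian-optimisation step proposes the next policy parameters. The toy simulator, the Gaussian environment-variable distribution, the RBF kernel, and the UCB acquisition rule below are all illustrative assumptions; the rare-event penalty mimics a significant rare event that random sampling of the environment variable would often miss.

```python
import numpy as np

# Hypothetical simulator: return depends on a policy parameter theta and an
# environment variable zeta that is random on hardware but settable in
# simulation. The penalty for zeta > 1.5 mimics a significant rare event.
def simulate(theta, zeta):
    return -(theta - 1.0) ** 2 - 5.0 * (zeta > 1.5) * theta ** 2

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x, y, xs, noise=1e-4):
    # Standard zero-mean GP regression: posterior mean and variance at xs.
    K = rbf(x, x) + noise * np.eye(len(x))
    Ks = rbf(x, xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(xs, xs) - Ks.T @ Kinv @ Ks)
    return mu, np.maximum(var, 0.0)

# Quadrature step: fixed nodes and normalised weights approximating the
# expectation over p(zeta), here assumed zeta ~ N(0, 1).
zeta_nodes = np.linspace(-3.0, 3.0, 25)
p = np.exp(-0.5 * zeta_nodes ** 2)
w = p / p.sum()

theta_grid = np.linspace(-2.0, 2.0, 101)
thetas, returns = [], []
theta = 0.0

for it in range(15):
    # Quadrature: weighted average of simulated returns over zeta nodes
    # gives an estimate of the expected return of the current policy.
    expected = float(np.sum(w * simulate(theta, zeta_nodes)))
    thetas.append(theta)
    returns.append(expected)

    # Optimisation: fit a GP from theta to expected return and pick the
    # next theta by an upper-confidence-bound acquisition.
    mu, var = gp_posterior(np.array(thetas), np.array(returns), theta_grid)
    theta = float(theta_grid[np.argmax(mu + 2.0 * np.sqrt(var))])

best = thetas[int(np.argmax(returns))]
```

Because the rare event only penalises large `theta`, the loop settles on a policy slightly below the naive optimum `theta = 1`, which is the kind of robustness trade-off the abstract refers to.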
Pages: 31
Related Papers
50 results in total; items [21]-[30] shown
  • [21] Du, Xinqi; Chen, Hechang; Wang, Che; Xing, Yongheng; Yang, Jielong; Yu, Philip S.; Chang, Yi; He, Lifang. Robust multi-agent reinforcement learning via Bayesian distributional value estimation. PATTERN RECOGNITION, 2024, 145
  • [22] Hishinuma, Toru; Senda, Kei. An Approximate Bayesian Reinforcement Learning Approach Using Robust Control Policy and Tree Search. TWENTY-EIGHTH INTERNATIONAL CONFERENCE ON AUTOMATED PLANNING AND SCHEDULING (ICAPS 2018), 2018: 417-421
  • [23] Pinto, Lerrel; Davidson, James; Sukthankar, Rahul; Gupta, Abhinav. Robust Adversarial Reinforcement Learning. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [24] Kretchmar, RM; Young, PM; Anderson, CW; Hittle, DC; Anderson, ML; Tu, J; Delnero, CC. Robust reinforcement learning control. PROCEEDINGS OF THE 2001 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 2001: 902-907
  • [25] Li, BH; Wu, QH; Wang, PY; Zhou, XX. Dynamic quadrature booster control using reinforcement learning. UKACC INTERNATIONAL CONFERENCE ON CONTROL '98, VOLS I&II, 1998: 993-998
  • [26] Xia, Zhongpu; Zhao, Dongbin. Online Reinforcement Learning by Bayesian Inference. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
  • [27] Katt, Sammie; Oliehoek, Frans A.; Amato, Christopher. Bayesian Reinforcement Learning in Factored POMDPs. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019: 7-15
  • [28] Barrett, Enda; Duggan, Jim; Howley, Enda. A parallel framework for Bayesian reinforcement learning. CONNECTION SCIENCE, 2014, 26 (01): 7-23
  • [29] Lepora, Nathan F.; Martinez-Hernandez, Uriel; Pezzulo, Giovanni; Prescott, Tony J. Active Bayesian perception and reinforcement learning. 2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013: 4735-4740
  • [30] Kang, Pyungwon; Tobler, Philippe N.; Dayan, Peter. Bayesian reinforcement learning: A basic overview. NEUROBIOLOGY OF LEARNING AND MEMORY, 2024, 211