Robust Reinforcement Learning with Bayesian Optimisation and Quadrature

Cited: 0
Authors
Paul, Supratik [1 ]
Chatzilygeroudis, Konstantinos [2 ,3 ,4 ]
Ciosek, Kamil [1 ]
Mouret, Jean-Baptiste [2 ,3 ,4 ]
Osborne, Michael A. [5 ]
Whiteson, Shimon [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Wolfson Bldg,Parks Rd, Oxford OX1 3QD, England
[2] INRIA, Paris, France
[3] Univ Lorraine, Nancy, France
[4] CNRS, Paris, France
[5] Univ Oxford, Dept Engn Sci, Oxford, England
Funding
European Research Council
Keywords
Reinforcement Learning; Bayesian Optimisation; Bayesian Quadrature; Significant rare events; Environment variables;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This article considers the problem of finding a robust policy while taking into account the impact of environment variables. We present alternating optimisation and quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. We also present transferable ALOQ (TALOQ), for settings where simulator inaccuracies lead to difficulty in transferring the learnt policy to the physical system. We show that our algorithms are robust to the presence of significant rare events, which may not be observable under random sampling but play a substantial role in determining the optimal policy. Experimental results across different domains show that our algorithms learn robust policies efficiently.
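The alternating scheme the abstract describes can be illustrated with a minimal, hypothetical sketch (not the paper's implementation): an inner Bayesian-quadrature step estimates a policy's expected return over the distribution of the environment variable, and an outer Bayesian-optimisation step proposes the next policy parameters. The toy simulator, the Gaussian environment-variable distribution, the RBF kernel, and the UCB acquisition rule below are all illustrative assumptions; the rare-event penalty mimics a significant rare event that random sampling of the environment variable would often miss.

```python
import numpy as np

# Hypothetical simulator: return depends on a policy parameter theta and an
# environment variable zeta that is random on hardware but settable in
# simulation. The penalty for zeta > 1.5 mimics a significant rare event.
def simulate(theta, zeta):
    return -(theta - 1.0) ** 2 - 5.0 * (zeta > 1.5) * theta ** 2

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x, y, xs, noise=1e-4):
    # Standard zero-mean GP regression: posterior mean and variance at xs.
    K = rbf(x, x) + noise * np.eye(len(x))
    Ks = rbf(x, xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(rbf(xs, xs) - Ks.T @ Kinv @ Ks)
    return mu, np.maximum(var, 0.0)

# Quadrature step: fixed nodes and normalised weights approximating the
# expectation over p(zeta), here assumed zeta ~ N(0, 1).
zeta_nodes = np.linspace(-3.0, 3.0, 25)
p = np.exp(-0.5 * zeta_nodes ** 2)
w = p / p.sum()

theta_grid = np.linspace(-2.0, 2.0, 101)
thetas, returns = [], []
theta = 0.0

for it in range(15):
    # Quadrature: weighted average of simulated returns over zeta nodes
    # gives an estimate of the expected return of the current policy.
    expected = float(np.sum(w * simulate(theta, zeta_nodes)))
    thetas.append(theta)
    returns.append(expected)

    # Optimisation: fit a GP from theta to expected return and pick the
    # next theta by an upper-confidence-bound acquisition.
    mu, var = gp_posterior(np.array(thetas), np.array(returns), theta_grid)
    theta = float(theta_grid[np.argmax(mu + 2.0 * np.sqrt(var))])

best = thetas[int(np.argmax(returns))]
```

Because the rare event only penalises large `theta`, the loop settles on a policy slightly below the naive optimum `theta = 1`, which is the kind of robustness trade-off the abstract refers to.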
Pages: 31
Related Papers
50 results in total; items [21]-[30] shown
  • [21] Du, Xinqi; Chen, Hechang; Wang, Che; Xing, Yongheng; Yang, Jielong; Yu, Philip S.; Chang, Yi; He, Lifang. Robust multi-agent reinforcement learning via Bayesian distributional value estimation. PATTERN RECOGNITION, 2024, 145
  • [22] Hishinuma, Toru; Senda, Kei. An Approximate Bayesian Reinforcement Learning Approach Using Robust Control Policy and Tree Search. TWENTY-EIGHTH INTERNATIONAL CONFERENCE ON AUTOMATED PLANNING AND SCHEDULING (ICAPS 2018), 2018: 417-421
  • [23] Pinto, Lerrel; Davidson, James; Sukthankar, Rahul; Gupta, Abhinav. Robust Adversarial Reinforcement Learning. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [24] Kretchmar, RM; Young, PM; Anderson, CW; Hittle, DC; Anderson, ML; Tu, J; Delnero, CC. Robust reinforcement learning control. PROCEEDINGS OF THE 2001 AMERICAN CONTROL CONFERENCE, VOLS 1-6, 2001: 902-907
  • [25] Li, BH; Wu, QH; Wang, PY; Zhou, XX. Dynamic quadrature booster control using reinforcement learning. UKACC INTERNATIONAL CONFERENCE ON CONTROL '98, VOLS I&II, 1998: 993-998
  • [26] Xia, Zhongpu; Zhao, Dongbin. Online Reinforcement Learning by Bayesian Inference. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
  • [27] Katt, Sammie; Oliehoek, Frans A.; Amato, Christopher. Bayesian Reinforcement Learning in Factored POMDPs. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019: 7-15
  • [28] Barrett, Enda; Duggan, Jim; Howley, Enda. A parallel framework for Bayesian reinforcement learning. CONNECTION SCIENCE, 2014, 26 (01): 7-23
  • [29] Lepora, Nathan F.; Martinez-Hernandez, Uriel; Pezzulo, Giovanni; Prescott, Tony J. Active Bayesian perception and reinforcement learning. 2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013: 4735-4740
  • [30] Kang, Pyungwon; Tobler, Philippe N.; Dayan, Peter. Bayesian reinforcement learning: A basic overview. NEUROBIOLOGY OF LEARNING AND MEMORY, 2024, 211