Bayes-adaptive hierarchical MDPs

Cited by: 2

Authors:

Ngo Anh Vien [1]

Lee, SeungGwan [2]

Chung, TaeChoong [3]

Affiliations:

[1] Univ Stuttgart, Machine Learning & Robot Lab, D-70174 Stuttgart, Germany

[2] Kyung Hee Univ, Coll Liberal Arts, 1 Seocheon Dong, Yongin 446701, Gyeonggi Do, South Korea

[3] Kyung Hee Univ, Dept Comp Engn, Yongin 446701, Gyeonggi Do, South Korea

Source:

APPLIED INTELLIGENCE | 2016 / Vol. 45 / No. 1

Funding:

National Research Foundation, Singapore;

Keywords:

Reinforcement learning; Bayesian reinforcement learning; Hierarchical reinforcement learning; MDP; POMDP; POSMDP; Monte-Carlo tree search; Hierarchical Monte-Carlo planning; POLICY GRADIENT SMDP; RESOURCE-ALLOCATION;

DOI:

10.1007/s10489-015-0742-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make sequential decisions that optimize a particular performance measure. To achieve this, the agent has to choose between 1) exploiting previously acquired knowledge, which might leave it at a local optimum, and 2) exploring to gather new knowledge that is expected to improve its current performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for its ability to trade off exploitation and exploration optimally via belief planning, i.e., by solving a partially observable Markov decision process (POMDP). However, solving this POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL, resulting in a hierarchical POMDP formulation that can be solved online using a hierarchical sample-based planning solver; and 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro actions to accelerate learning. In the experiments, we demonstrate that the proposed approach scales up to much larger problems. In addition, the agent is able to discover useful subgoals that speed up Bayesian reinforcement learning.
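A minimal sketch of the belief state that Bayesian model-based RL plans over may help make "belief planning" concrete. It assumes a discrete MDP whose unknown transition rows P(· | s, a) each carry an independent Dirichlet prior, the standard Bayes-adaptive MDP construction; the class `DirichletTransitionBelief` and its methods are hypothetical names for illustration, and this is not the paper's hierarchical solver or its subgoal discovery method. Under this assumption the hidden part of the POMDP state is the unknown model, the belief is a table of pseudo-counts, and each observed transition updates it with a single increment.

```python
import numpy as np


class DirichletTransitionBelief:
    """Hypothetical belief over an unknown transition model P(s' | s, a).

    Assumes a discrete MDP in which each row P(. | s, a) has an independent
    Dirichlet prior, so the whole posterior is a table of pseudo-counts.
    """

    def __init__(self, n_states: int, n_actions: int, prior: float = 1.0):
        # counts[s, a, s_next] are Dirichlet pseudo-counts; prior=1.0 gives a
        # uniform prior over next states for every (s, a) pair.
        self.counts = np.full((n_states, n_actions, n_states), prior, dtype=float)

    def update(self, s: int, a: int, s_next: int) -> None:
        # Dirichlet-categorical conjugacy: observing s -> s_next under a
        # adds one count to the matching entry.
        self.counts[s, a, s_next] += 1.0

    def posterior_mean(self, s: int, a: int) -> np.ndarray:
        # Expected next-state distribution under the current belief.
        row = self.counts[s, a]
        return row / row.sum()

    def sample_model(self, rng: np.random.Generator) -> np.ndarray:
        # Draw one complete transition model from the posterior.
        n_states, n_actions, _ = self.counts.shape
        model = np.empty_like(self.counts)
        for s in range(n_states):
            for a in range(n_actions):
                model[s, a] = rng.dirichlet(self.counts[s, a])
        return model


if __name__ == "__main__":
    belief = DirichletTransitionBelief(n_states=3, n_actions=2)
    belief.update(0, 1, 2)
    belief.update(0, 1, 2)
    # Counts for (s=0, a=1) are now [1, 1, 3], so the mean is [0.2, 0.2, 0.6].
    print(belief.posterior_mean(0, 1))
    model = belief.sample_model(np.random.default_rng(0))
    print(model.shape)  # (3, 2, 3)
```

Because the counts fully summarize the posterior, the augmented state (s, counts) is what a Bayes-adaptive planner searches over, and the number of reachable augmented states grows with the planning horizon. `sample_model` draws a complete model from the posterior, the kind of draw that root-sampling Monte-Carlo planners such as BAMCP make at the start of each simulation.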


Pages: 112-126

Page count: 15