Piecewise linear value function approximation for factored MDPs

Cited by: 0
Authors
Poupart, P [1 ]
Boutilier, C [1 ]
Patrascu, R [1 ]
Schuurmans, D [1 ]
Affiliation
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3H5, Canada
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
A number of proposals have been put forth in recent years for the solution of Markov decision processes (MDPs) whose state (and sometimes action) spaces are factored. One recent class of methods involves linear value function approximation, where the optimal value function is assumed to be a linear combination of some set of basis functions, with the aim of finding suitable weights. While sophisticated techniques have been developed for finding the best approximation within this constrained space, few methods have been proposed for choosing a suitable basis set, or for modifying it if solution quality is found wanting. We propose a general framework, and specific proposals, that address both of these questions. In particular, we examine weakly coupled MDPs, where a number of subtasks can be viewed independently modulo resource constraints. We then describe methods for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques. We argue that this architecture is suitable for many types of MDPs whose combinatorics are determined largely by the existence of multiple conflicting objectives.
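To make the contrast between the two approximation schemes concrete, here is a minimal, hypothetical Python sketch (not the authors' code; the subtask value functions, the resource test, and all weights are invented for illustration). A plain linear approximation fixes one weight vector over all states; the piecewise linear architecture instead lets a small decision tree over a shared resource variable select a different weight vector in each leaf, so the subtask value functions are combined differently in different regions of the state space.

    # Hypothetical sketch of a piecewise linear value function for a weakly
    # coupled MDP: each subtask i has a local value function V_i over its own
    # variables, and a small (greedily grown) decision tree over a shared
    # resource variable selects the weight vector used to combine them. The
    # result is linear within each leaf but piecewise linear overall.

    def v1(s):                      # subtask 1 value function (invented)
        return 2.0 * s["x1"]

    def v2(s):                      # subtask 2 value function (invented)
        return 1.0 * s["x2"]

    SUBTASKS = [v1, v2]

    # Tree leaves as (test, weight vector); the first passing test applies.
    LEAVES = [
        (lambda s: s["resource"] >= 5, [0.9, 0.1]),  # resource plentiful
        (lambda s: True,               [0.3, 0.7]),  # resource scarce
    ]

    def value(s):
        # V(s) = sum_i w_i(leaf(s)) * V_i(s): linear inside each leaf.
        for test, weights in LEAVES:
            if test(s):
                return sum(w * v(s) for w, v in zip(weights, SUBTASKS))

    print(value({"x1": 3, "x2": 4, "resource": 6}))  # 0.9*6.0 + 0.1*4.0 = 5.8
    print(value({"x1": 3, "x2": 4, "resource": 2}))  # 0.3*6.0 + 0.7*4.0 = 4.6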
Pages: 292-299
Page count: 8
Related Papers
50 records in total
  • [1] Direct value-approximation for factored MDPs
    Schuurmans, D
    Patrascu, R
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1579 - 1586
  • [2] Basis Refinement Strategies for Linear Value Function Approximation in MDPs
    Comanici, Gheorghe
    Precup, Doina
    Panangaden, Prakash
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [3] Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions
    Deng, Zihao
    Devic, Siddartha
    Juba, Brendan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [4] Efficient approximate linear programming for factored MDPs
    Chen, Feng
    Cheng, Qiang
    Dong, Jianwu
    Yu, Zhaofei
    Wang, Guojun
    Xu, Wenli
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2015, 63 : 101 - 121
  • [5] Refined Regret for Adversarial MDPs with Linear Function Approximation
    Dai, Yan
    Luo, Haipeng
    Wei, Chen-Yu
    Zimmert, Julian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [6] An Analysis of Laplacian Methods for Value Function Approximation in MDPs
    Petrik, Marek
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2574 - 2579
  • [7] Pseudo-MDPs and Factored Linear Action Models
    Yao, Hengshuai
    Szepesvari, Csaba
    Pires, Bernardo Avila
    Zhang, Xinhua
    2014 IEEE SYMPOSIUM ON ADAPTIVE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING (ADPRL), 2014, : 189 - 197
  • [8] Computing factored value functions for policies in structured MDPs
    Koller, D
    Parr, R
    IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2, 1999, : 1332 - 1339
  • [9] Online Learning in MDPs with Linear Function Approximation and Bandit Feedback
    Neu, Gergely
    Olkhovskaya, Julia
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34