Efficient exploration through active learning for value function approximation in reinforcement learning

Cited by: 15

Authors:

Akiyama, Takayuki [1]

Hachiya, Hirotaka [1]

Sugiyama, Masashi [1,2]

Affiliations:

[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan

[2] Japan Sci & Technol Agcy, PRESTO, Tokyo, Japan

Source:

NEURAL NETWORKS | 2010 / Vol. 23 / Issue 05

Keywords:

Reinforcement learning; Markov decision process; Least-squares policy iteration; Active learning; Batting robot; REGRESSION;

DOI:

10.1016/j.neunet.2009.12.010

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot. (C) 2010 Elsevier Ltd. All rights reserved.

Pages: 639-648

Number of pages: 10

50 records in total

[31] Ensemble Methods for Reinforcement Learning with Function Approximation
Fausser, Stefan
Schwenker, Friedhelm
MULTIPLE CLASSIFIER SYSTEMS, 2011, 6713 : 56 - 65
[32] Distributional reinforcement learning with linear function approximation
Bellemare, Marc G.
Le Roux, Nicolas
Castro, Pablo Samuel
Moitra, Subhodeep
22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
[33] Reinforcement learning with function approximation converges to a region
Gordon, GJ
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 1040 - 1046
[34] Parallel reinforcement learning with linear function approximation
Grounds, Matthew
Kudenko, Daniel
ADAPTIVE AGENTS AND MULTI-AGENT SYSTEMS, 2008, 4865 : 60 - 74
[35] Safe Reinforcement Learning with Linear Function Approximation
Amani, Sanae
Thrampoulidis, Christos
Yang, Lin F.
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[36] Autonomous exploration through deep reinforcement learning
Yan, Xiangda
Huang, Jie
He, Keyan
Hong, Huajie
Xu, Dasheng
INDUSTRIAL ROBOT-THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH AND APPLICATION, 2023, 50 (05) : 793 - 803
[37] Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration
Li, Tingguang
Pan, Jin
Zhu, Delong
Meng, Max Q. -H.
2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018 : 648 - 653
[38] Sample efficient reinforcement learning with active learning for molecular design
Dodds, Michael
Guo, Jeff
Loehr, Thomas
Tibo, Alessandro
Engkvist, Ola
Janet, Jon Paul
CHEMICAL SCIENCE, 2024, 15 (11) : 4146 - 4160
[39] Object Learning Through Active Exploration
Ivaldi, Serena
Sao Mai Nguyen
Lyubova, Natalia
Droniou, Alain
Padois, Vincent
Filliat, David
Oudeyer, Pierre-Yves
Sigaud, Olivier
IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2014, 6 (01) : 56 - 72
[40] A Novel Artificial Hydrocarbon Networks Based Value Function Approximation in Hierarchical Reinforcement Learning
Ponce, Hiram
ADVANCES IN SOFT COMPUTING, MICAI 2016, PT II, 2017, 10062 : 211 - 225