Efficient exploration through active learning for value function approximation in reinforcement learning

Cited by: 15
Authors
Akiyama, Takayuki [1]
Hachiya, Hirotaka [1]
Sugiyama, Masashi [1,2]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan
[2] Japan Sci & Technol Agcy, PRESTO, Tokyo, Japan
Keywords
Reinforcement learning; Markov decision process; Least-squares policy iteration; Active learning; Batting robot; Regression
DOI
10.1016/j.neunet.2009.12.010
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot. (C) 2010 Elsevier Ltd. All rights reserved.
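The link the abstract draws is that LSPI fits the action-value function with a linear model estimated by least squares (the LSTD-Q step), which is what makes standard active-learning criteria for linear regression applicable. Below is a minimal sketch of that least-squares step, assuming a generic feature map phi and evaluation policy pi; the function name lstdq and the ridge term are illustrative assumptions, not the paper's code, and the paper's active-learning criterion for choosing the sampling policy is not shown.

```python
import numpy as np

def lstdq(transitions, phi, pi, gamma=0.95, reg=1e-6):
    """Least-squares fixed-point estimate of Q under policy pi (LSTD-Q sketch).

    transitions: list of (s, a, r, s_next) tuples
    phi(s, a):   feature vector of a state-action pair (1-D ndarray)
    pi(s):       action selected by the policy being evaluated
    """
    k = phi(transitions[0][0], transitions[0][1]).shape[0]
    A = reg * np.eye(k)  # small ridge term for numerical stability (assumption)
    b = np.zeros(k)
    for s, a, r, s_next in transitions:
        f = phi(s, a)
        f_next = phi(s_next, pi(s_next))
        A += np.outer(f, f - gamma * f_next)  # A += phi (phi - gamma * phi')^T
        b += r * f                            # b += r * phi
    return np.linalg.solve(A, b)              # weights w with Q(s, a) ~ phi(s, a) @ w
```

In LSPI this solve alternates with greedy policy improvement; the paper's contribution concerns how to choose the sampling policy that generates the transitions so that this linear estimator is accurate with few costly reward samples.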
Pages: 639 - 648
Page count: 10
Related papers
50 items in total
  • [21] How Active is Active Learning: Value Function Method Versus an Approximation Method
    Amman, Hans M.
    Tucci, Marco P.
    COMPUTATIONAL ECONOMICS, 2020, 56 (03) : 675 - 693
  • [23] Rethinking Value Function Learning for Generalization in Reinforcement Learning
    Moon, Seungyong
    Lee, JunYeong
    Song, Hyun Oh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [24] Explicit Planning for Efficient Exploration in Reinforcement Learning
    Zhang, Liangpeng
    Tang, Ke
    Yao, Xin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [25] Online Support Vector Regression based Value Function Approximation for Reinforcement Learning
    Lee, Dong-Hyun
    Quang, Vo Van
    Jo, Sungho
    Lee, Ju-Jang
    ISIE: 2009 IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS, 2009: 449+
  • [26] Uncertainty Propagation for Efficient Exploration in Reinforcement Learning
    Hans, Alexander
    Udluft, Steffen
    ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 : 361 - 366
  • [27] The Value Function Polytope in Reinforcement Learning
    Dadashi, Robert
    Taiga, Adrien Ali
    Le Roux, Nicolas
    Schuurmans, Dale
    Bellemare, Marc G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [28] Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints
    Wang, Tianhao
    Zhou, Dongruo
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [29] Multiagent reinforcement learning using function approximation
    Abul, O
    Polat, F
    Alhajj, R
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2000, 30 (04): 485 - 497
  • [30] Resilient Multiagent Reinforcement Learning With Function Approximation
    Ye, Lintao
    Figura, Martin
    Lin, Yixuan
    Pal, Mainak
    Das, Pranoy
    Liu, Ji
    Gupta, Vijay
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (12) : 8497 - 8512