Efficient exploration through active learning for value function approximation in reinforcement learning

Times cited: 15
Authors
Akiyama, Takayuki [1]
Hachiya, Hirotaka [1]
Sugiyama, Masashi [1, 2]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan
[2] Japan Sci & Technol Agcy, PRESTO, Tokyo, Japan
Keywords
Reinforcement learning; Markov decision process; Least-squares policy iteration; Active learning; Batting robot; Regression
DOI
10.1016/j.neunet.2009.12.010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot. (C) 2010 Elsevier Ltd. All rights reserved.
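The abstract's central observation, that LSPI turns value-function fitting into a linear regression so that active-learning criteria can rank sampling policies before costly rewards are observed, can be illustrated with a small sketch. This is not the authors' API algorithm. It assumes a simplified Bellman-residual regression in which the immediate reward is regressed on psi(s, a) = phi(s, a) - gamma * phi(s', pi(s')), ignores the importance weighting needed when the sampling and evaluation policies differ, and substitutes the classic A-optimal design score trace((Psi^T Psi)^{-1}) for the paper's own criterion. All function names, the two candidate policies, and the random features are hypothetical placeholders.

```python
# Hypothetical sketch: simplified setting, NOT the paper's API criterion.
import numpy as np


def bellman_features(phi_sa, phi_next, gamma):
    # psi(s, a) = phi(s, a) - gamma * phi(s', pi(s')).
    # Under Q(s, a) = theta . phi(s, a), the Bellman equation makes the
    # immediate reward approximately linear in psi, so fitting theta is
    # an ordinary linear regression with rewards as targets.
    return phi_sa - gamma * phi_next


def a_optimal_score(psi, ridge=1e-6):
    # trace((Psi^T Psi)^{-1}) is proportional to the summed variance of the
    # least-squares estimate of theta under i.i.d. equal-variance noise.
    # It depends on features only, so no rewards are needed to compute it.
    k = psi.shape[1]
    gram = psi.T @ psi + ridge * np.eye(k)  # tiny ridge keeps inv() stable
    return float(np.trace(np.linalg.inv(gram)))


def pick_sampling_policy(candidates, gamma=0.9):
    # candidates: {name: (phi_sa, phi_next)} gathered by running each
    # candidate sampling policy WITHOUT observing rewards (hypothetical setup).
    scores = {
        name: a_optimal_score(bellman_features(p, q, gamma))
        for name, (p, q) in candidates.items()
    }
    best = min(scores, key=scores.get)  # lowest estimator variance wins
    return best, scores


def fit_theta(psi, rewards):
    # Plain least-squares fit of theta once rewards have been paid for.
    theta, *_ = np.linalg.lstsq(psi, rewards, rcond=None)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, gamma = 100, 3, 0.9
    candidates = {
        # Explores widely: feature vectors point in many directions.
        "broad": (rng.normal(size=(n, k)), rng.normal(size=(n, k))),
        # Keeps revisiting nearly the same state-action pairs.
        "narrow": (1 + 0.01 * rng.normal(size=(n, k)),
                   1 + 0.01 * rng.normal(size=(n, k))),
    }
    best, scores = pick_sampling_policy(candidates, gamma)
    print(best, scores)

    # Only now collect rewards, synthesized here from a made-up true theta.
    psi = bellman_features(*candidates[best], gamma)
    true_theta = np.array([1.0, -2.0, 0.5])
    rewards = psi @ true_theta
    print(fit_theta(psi, rewards))
```

In this sketch the policy comparison runs entirely on reward-free rollouts, and rewards are gathered only for the chosen policy; that ordering is what makes a variance-based criterion attractive when rewards are expensive to sample.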
Pages: 639-648
Number of pages: 10
Related papers (50 in total)
  • [31] Ensemble Methods for Reinforcement Learning with Function Approximation
    Fausser, Stefan
    Schwenker, Friedhelm
    MULTIPLE CLASSIFIER SYSTEMS, 2011, 6713 : 56 - 65
  • [32] Distributional reinforcement learning with linear function approximation
    Bellemare, Marc G.
    Le Roux, Nicolas
    Castro, Pablo Samuel
    Moitra, Subhodeep
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [33] Reinforcement learning with function approximation converges to a region
    Gordon, GJ
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 13, 2001, 13 : 1040 - 1046
  • [34] Parallel reinforcement learning with linear function approximation
    Grounds, Matthew
    Kudenko, Daniel
    ADAPTIVE AGENTS AND MULTI-AGENT SYSTEMS, 2008, 4865 : 60 - 74
  • [35] Safe Reinforcement Learning with Linear Function Approximation
    Amani, Sanae
    Thrampoulidis, Christos
    Yang, Lin F.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [36] Autonomous exploration through deep reinforcement learning
    Yan, Xiangda
    Huang, Jie
    He, Keyan
    Hong, Huajie
    Xu, Dasheng
    INDUSTRIAL ROBOT-THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH AND APPLICATION, 2023, 50 (05): 793 - 803
  • [37] Learning to Interrupt: A Hierarchical Deep Reinforcement Learning Framework for Efficient Exploration
    Li, Tingguang
    Pan, Jin
    Zhu, Delong
    Meng, Max Q. -H.
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO), 2018: 648 - 653
  • [38] Sample efficient reinforcement learning with active learning for molecular design
    Dodds, Michael
    Guo, Jeff
    Loehr, Thomas
    Tibo, Alessandro
    Engkvist, Ola
    Janet, Jon Paul
    CHEMICAL SCIENCE, 2024, 15 (11) : 4146 - 4160
  • [39] Object Learning Through Active Exploration
    Ivaldi, Serena
    Sao Mai Nguyen
    Lyubova, Natalia
    Droniou, Alain
    Padois, Vincent
    Filliat, David
    Oudeyer, Pierre-Yves
    Sigaud, Olivier
    IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2014, 6 (01) : 56 - 72
  • [40] A Novel Artificial Hydrocarbon Networks Based Value Function Approximation in Hierarchical Reinforcement Learning
    Ponce, Hiram
    ADVANCES IN SOFT COMPUTING, MICAI 2016, PT II, 2017, 10062 : 211 - 225