Efficient exploration through active learning for value function approximation in reinforcement learning

Cited by: 15
Authors
Akiyama, Takayuki [1]
Hachiya, Hirotaka [1]
Sugiyama, Masashi [1,2]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan
[2] Japan Sci & Technol Agcy, PRESTO, Tokyo, Japan
Keywords
Reinforcement learning; Markov decision process; Least-squares policy iteration; Active learning; Batting robot; Regression
DOI
10.1016/j.neunet.2009.12.010
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the cost of sampling immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot. (C) 2010 Elsevier Ltd. All rights reserved.
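
Since this record contains only the abstract, the following is an illustrative sketch (not the authors' code) of the regression step the abstract alludes to: in LSPI, policy evaluation reduces to a linear least-squares problem (LSTD-Q) that fits Q(s, a) = phi(s, a)^T w from sampled transitions, and it is this linear-regression structure that lets statistical active learning guide the choice of sampling policy. All names and data below are assumptions made for illustration.

import numpy as np

# LSTD-Q sketch: policy evaluation in LSPI as linear least squares.
# Model Q(s, a) = phi(s, a)^T w. The projected Bellman fixed point
# gives A w = b with
#   A = Phi^T (Phi - gamma * Phi'),   b = Phi^T r,
# where Phi' holds next-state features under the evaluated policy.
def lstdq(phi, phi_next, rewards, gamma=0.95, ridge=1e-6):
    """phi: (T, d) features of visited state-action pairs;
    phi_next: (T, d) next-state features under the evaluated policy;
    rewards: (T,) immediate rewards. Returns the weight vector w."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # A small ridge term keeps A invertible for short or poorly excited batches.
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)

# Toy illustration with purely synthetic features and rewards.
rng = np.random.default_rng(0)
T, d = 200, 8
phi = rng.normal(size=(T, d))
phi_next = rng.normal(size=(T, d))
rewards = rng.normal(size=T)
print("estimated weights:", lstdq(phi, phi_next, rewards))

Because w is a linear least-squares estimate, its variance depends on which state-action pairs the sampling policy visits; per the abstract, active policy iteration exploits exactly this structure to choose sampling policies before costly rewards are collected.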
Pages: 639-648
Page count: 10
Related papers (50 in total)
  • [41] Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning
    Rozada, Sergio
    Paternain, Santiago
    Marques, Antonio
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 1634 - 1649
  • [42] Dynamic Spectrum Anti-Jamming With Reinforcement Learning Based on Value Function Approximation
    Zhu, Xinyu
    Huang, Yang
    Wang, Shaoyu
    Wu, Qihui
    Ge, Xiaohu
    Liu, Yuan
    Gao, Zhen
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2023, 12 (02) : 386 - 390
  • [43] The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
    Winnicki, Anna
    Lubars, Joseph
    Livesay, Michael
    Srikant, R.
    OPERATIONS RESEARCH, 2025, 73 (01)
  • [44] Adaptive importance sampling for value function approximation in off-policy reinforcement learning
    Hachiya, Hirotaka
    Akiyama, Takayuki
    Sugiyama, Masashi
    Peters, Jan
    NEURAL NETWORKS, 2009, 22 (10) : 1399 - 1410
  • [45] Model-Free Active Exploration in Reinforcement Learning
    Russo, Alessio
    Proutiere, Alexandre
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [46] Ramp Metering for a Distant Downstream Bottleneck Using Reinforcement Learning with Value Function Approximation
    Zhou, Yue
    Ozbay, Kaan
    Kachroo, Pushkin
    Zuo, Fan
    JOURNAL OF ADVANCED TRANSPORTATION, 2020, 2020
  • [47] Active exploration is important for reinforcement learning of interval timing
    Shouno, Osamu
    Tsujino, Hiroshi
    BMC NEUROSCIENCE, 12 (Suppl 1)
  • [48] A Clustering-Based Graph Laplacian Framework for Value Function Approximation in Reinforcement Learning
    Xu, Xin
    Huang, Zhenhua
    Graves, Daniel
    Pedrycz, Witold
    IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (12) : 2613 - 2625
  • [49] Restricted gradient-descent algorithm for value-function approximation in reinforcement learning
    Salles Barreto, Andre da Motta
    Anderson, Charles W.
    ARTIFICIAL INTELLIGENCE, 2008, 172 (4-5) : 454 - 482
  • [50] Efficient Exploration in Resource-Restricted Reinforcement Learning
    Wang, Zhihai
    Pan, Taoxing
    Zhou, Qi
    Wang, Jie
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 10279 - 10287