Efficient exploration through active learning for value function approximation in reinforcement learning

Cited by: 15
Authors:
Akiyama, Takayuki [1 ]
Hachiya, Hirotaka [1 ]
Sugiyama, Masashi [1 ,2 ]
Affiliations:
[1] Tokyo Inst Technol, Dept Comp Sci, Meguro Ku, Tokyo 1528552, Japan
[2] Japan Sci & Technol Agcy, PRESTO, Tokyo, Japan
Keywords:
Reinforcement learning; Markov decision process; Least-squares policy iteration; Active learning; Batting robot; REGRESSION;
DOI:
10.1016/j.neunet.2009.12.010
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot. (C) 2010 Elsevier Ltd. All rights reserved.
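The abstract notes that LSPI reduces value function approximation to linear regression, which is what lets statistical active learning apply. As a rough illustration of that least-squares core (a minimal sketch, not the authors' code; the function name `lstd_q` and the ridge term `reg` are assumptions for this example), the LSTD-Q step inside policy iteration fits weights w with Q(s, a) ≈ w · φ(s, a) by solving a linear system accumulated from sampled transitions:

```python
import numpy as np

def lstd_q(samples, phi, gamma=0.9, reg=1e-6):
    """LSTD-Q sketch: fit weights w so that Q(s, a) ≈ w · phi(s, a).

    samples: iterable of (s, a, r, s_next, a_next) transitions
    phi:     feature map, phi(s, a) -> 1-D numpy array
    reg:     small ridge term for numerical stability (an assumption
             added here, not part of the original formulation)
    """
    d = phi(samples[0][0], samples[0][1]).shape[0]
    A = reg * np.eye(d)
    b = np.zeros(d)
    for s, a, r, s_next, a_next in samples:
        f, f_next = phi(s, a), phi(s_next, a_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate TD regression matrix
        b += r * f                            # accumulate reward-weighted features
    return np.linalg.solve(A, b)

# Sanity check: a single self-looping state with reward 1 has
# Q = 1 / (1 - gamma) = 10 for gamma = 0.9.
w = lstd_q([(0, 0, 1.0, 0, 0)], lambda s, a: np.array([1.0]))
print(w)  # close to [10.]
```

Because the fitted weights come from an ordinary least-squares problem, the active learning idea in the paper amounts to choosing a sampling policy that makes this regression estimate as accurate as possible per sampled reward.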
Pages: 639-648
Page count: 10
Related Papers (showing 10 of 50):
  • [1] Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning
    Akiyama, Takayuki
    Hachiya, Hirotaka
    Sugiyama, Masashi
    21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 980 - 985
  • [2] Randomized Exploration for Reinforcement Learning with General Value Function Approximation
    Ishfaq, Haque
    Cui, Qiwen
    Viet Nguyen
    Ayoub, Alex
    Yang, Zhuoran
    Wang, Zhaoran
    Precup, Doina
    Yang, Lin F.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [3] Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
    Tan, Tian
    Xiong, Zhihan
    Dwaracherla, Vikranth R.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5948 - 5955
  • [4] AN ACTIVE EXPLORATION METHOD FOR DATA EFFICIENT REINFORCEMENT LEARNING
    Zhao, Dongfang
    Liu, Jiafeng
    Wu, Rui
    Cheng, Dansong
    Tang, Xianglong
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2019, 29 (02) : 351 - 362
  • [5] CBR for state value function approximation in reinforcement learning
    Gabel, T
    Riedmiller, M
    CASE-BASED REASONING RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2005, 3620 : 206 - 221
  • [6] Provably Efficient Reinforcement Learning with Linear Function Approximation
    Jin, Chi
    Yang, Zhuoran
    Wang, Zhaoran
    Jordan, Michael I.
    MATHEMATICS OF OPERATIONS RESEARCH, 2023, 48 (03) : 1496 - 1521
  • [7] Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
    Foster, Dylan J.
    Krishnamurthy, Akshay
    Simchi-Levi, David
    Xu, Yunzong
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [8] Distributed Value Function Approximation for Collaborative Multiagent Reinforcement Learning
    Stankovic, Milos S.
    Beko, Marko
    Stankovic, Srdjan S.
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2021, 8 (03): : 1270 - 1280
  • [9] A grey approximation approach to state value function in reinforcement learning
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lee, Guar-Yuan
    2007 IEEE INTERNATIONAL CONFERENCE ON INTEGRATION TECHNOLOGY, PROCEEDINGS, 2007, : 379 - +
  • [10] A Multiplicative Value Function for Safe and Efficient Reinforcement Learning
    Buhrer, Nick
    Zhang, Zhejun
    Liniger, Alexander
    Yu, Fisher
    Van Gool, Luc
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5582 - 5589