A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Cited by: 4
Authors
Badnava, Babak [1 ]
Esmaeili, Mona [2 ]
Mozayani, Nasser [3 ]
Zarkesh-Ha, Payman [2 ]
Affiliations
[1] Univ Kansas, Lawrence, KS 66045 USA
[2] Univ New Mexico, Albuquerque, NM 87131 USA
[3] Iran Univ Sci & Technol, Tehran 16846, Iran
Keywords
Potential-based Reward Shaping; Reinforcement Learning; Reward Shaping; Knowledge Extraction;
DOI
10.1109/CCWC57344.2023.10099211
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Potential-based reward shaping (PBRS) is a family of techniques that aims to speed up a reinforcement learning agent's learning by extracting and exploiting extra knowledge while the agent performs a task. Transfer learning involves two steps: extracting knowledge from previously learned tasks and transferring that knowledge to a target task. The latter step is well covered in the literature, with various methods proposed for it, while the former has been explored far less. The type of knowledge that is transferred matters greatly and can lead to considerable improvement. One source of knowledge that neither the transfer learning nor the PBRS literature has addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that extracts knowledge from the learning process, specifically from episodes' cumulative rewards. The proposed method is evaluated in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.
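For context, PBRS augments the environment reward r with a shaping term F(s, s') = gamma * Phi(s') - Phi(s), a form due to Ng et al. (1999) that provably leaves the optimal policy unchanged. The Python sketch below is a minimal illustration of this mechanism in a tabular setting; the potential derived from past episodes' cumulative rewards (visit_returns, update_potentials, potential) is a hypothetical stand-in for the kind of knowledge extraction the abstract describes, not the authors' exact formulation.

from collections import defaultdict

GAMMA = 0.99  # discount factor used by both the learner and the shaping term

def shaped_reward(r, phi_s, phi_s_next):
    # Standard PBRS: r + F(s, s'), with F(s, s') = GAMMA * Phi(s') - Phi(s).
    return r + GAMMA * phi_s_next - phi_s

# Hypothetical potential: the average cumulative reward of past episodes that
# visited a state, so states seen in high-return episodes get higher potential.
visit_returns = defaultdict(list)  # state -> returns of episodes that visited it

def update_potentials(episode_states, episode_return):
    # Call once per finished episode with its visited states and total return.
    for s in set(episode_states):
        visit_returns[s].append(episode_return)

def potential(s):
    # Average stored return for s; zero for states never visited before.
    returns = visit_returns[s]
    return sum(returns) / len(returns) if returns else 0.0

# Example: record one episode's return, then shape a later transition's reward.
update_potentials(["s0", "s1"], episode_return=12.0)
r_shaped = shaped_reward(r=1.0, phi_s=potential("s0"), phi_s_next=potential("s1"))

With a fixed Phi, shaping of this form is guaranteed not to change the optimal policy; when Phi is updated during learning, as sketched here, the dynamic-PBRS analysis of Devlin and Kudenko (2012) applies, and only learning speed is affected.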
Pages: 630 - 635
Page count: 6
Related papers
50 items in total
  • [31] Reinforcement online learning to rank with unbiased reward shaping
    Zhuang, Shengyao
    Qiao, Zhihao
    Zuccon, Guido
    INFORMATION RETRIEVAL JOURNAL, 2022, 25(04): 386-413
  • [33] Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
    Jiang, Yuqian
    Bharadwaj, Suda
    Wu, Bo
    Shah, Rishi
    Topcu, Ufuk
    Stone, Peter
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 7995-8003
  • [34] Expressing Arbitrary Reward Functions as Potential-Based Advice
    Harutyunyan, Anna
    Devlin, Sam
    Vrancx, Peter
    Nowe, Ann
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015: 2652-2658
  • [35] Multi-Agent Meta-Reinforcement Learning with Coordination and Reward Shaping for Traffic Signal Control
    Du, Xin
    Wang, Jiahai
    Chen, Siyuan
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2023, PT II, 2023, 13936: 349-360
  • [36] Direct reward and indirect reward in multi-agent reinforcement learning
    Ohta, M
    ROBOCUP 2002: ROBOT SOCCER WORLD CUP VI, 2003, 2752: 359-366
  • [38] Adaptively Shaping Reinforcement Learning Agents via Human Reward
    Yu, Chao
    Wang, Dongxu
    Yang, Tianpei
    Zhu, Wenxuan
    Li, Yuchen
    Ge, Hongwei
    Ren, Jiankang
    PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2018, 11012: 85-97
  • [39] Generalized Maximum Entropy Reinforcement Learning via Reward Shaping
    Tao, F.
    Wu, M.
    Cao, Y.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5(04): 1563-1572
  • [40] Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping
    Miranda, Victor R. F.
    Neto, Armando A.
    Freitas, Gustavo M.
    Mozelli, Leonardo A.
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024, 71(06): 6013-6020