A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Cited by: 4
|
Authors
Badnava, Babak [1 ]
Esmaeili, Mona [2 ]
Mozayani, Nasser [3 ]
Zarkesh-Ha, Payman [2 ]
Affiliations
[1] Univ Kansas, Lawrence, KS 66045 USA
[2] Univ New Mexico, Albuquerque, NM 87131 USA
[3] Iran Univ Sci & Technol, Tehran 16846, Iran
Keywords
Potential-based Reward Shaping; Reinforcement Learning; Reward Shaping; Knowledge Extraction
DOI
10.1109/CCWC57344.2023.10099211
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Potential-based reward shaping (PBRS) is a family of methods that aims to speed up a reinforcement learning agent's learning by extracting and exploiting extra knowledge while the agent performs a task. The transfer learning process has two steps: extracting knowledge from previously learned tasks, and transferring that knowledge for use in a target task. The latter step is well discussed in the literature, with various methods proposed for it, while the former has been explored less. The type of knowledge transferred therefore matters greatly and can lead to considerable improvement. One source of knowledge that neither the transfer learning nor the potential-based reward shaping literature has addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that extracts knowledge from the learning process, specifically from episodes' cumulative rewards. The proposed method is evaluated in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.
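For context, the standard potential-based shaping term (Ng et al., 1999) augments the environment reward with γΦ(s′) − Φ(s), which preserves the optimal policy for any potential function Φ. A minimal Python sketch follows; the `potential` argument here is a generic placeholder, not the paper's specific potential derived from episode cumulative rewards:

```python
# Sketch of the standard potential-based reward shaping (PBRS) term:
#   F(s, s') = gamma * Phi(s') - Phi(s)
# The potential function passed in is a placeholder for illustration only.

def shaped_reward(reward, s, s_next, potential, gamma=0.99):
    """Return the environment reward plus the PBRS shaping term."""
    return reward + gamma * potential(s_next) - potential(s)

# Example with a toy identity potential over integer states:
print(shaped_reward(1.0, 0, 1, potential=lambda s: float(s), gamma=0.5))  # 1.5
```

Because the shaping term is a telescoping difference of potentials, its discounted sum along any trajectory depends only on the start and end states, which is why PBRS leaves the optimal policy unchanged.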
Pages: 630-635
Page count: 6