A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Cited by: 4
Authors
Badnava, Babak [1 ]
Esmaeili, Mona [2 ]
Mozayani, Nasser [3 ]
Zarkesh-Ha, Payman [2 ]
Affiliations
[1] Univ Kansas, Lawrence, KS 66045 USA
[2] Univ New Mexico, Albuquerque, NM 87131 USA
[3] Iran Univ Sci & Technol, Tehran 16846, Iran
Keywords
Potential-based Reward Shaping; Reinforcement Learning; Reward Shaping; Knowledge Extraction;
DOI
10.1109/CCWC57344.2023.10099211
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Potential-based reward shaping (PBRS) is a family of machine learning methods that aims to improve the learning speed of a reinforcement learning agent by extracting and utilizing extra knowledge while the agent performs a task. Transfer learning consists of two steps: extracting knowledge from previously learned tasks and transferring that knowledge for use in a target task. The latter step is well covered in the literature, with various methods proposed for it, whereas the former has been explored far less. Consequently, the type of knowledge that is transferred matters a great deal and can lead to considerable improvement. One subject that neither the transfer learning nor the potential-based reward shaping literature has addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that extracts this knowledge, specifically from the cumulative rewards of past episodes. The proposed method was evaluated in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.
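
For context, PBRS augments the environment reward r with a shaping term F(s, s') = γΦ(s') − Φ(s) (Ng et al., 1999), a form known to leave the optimal policy unchanged. The following Python sketch shows where that term enters a tabular Q-learning loop on a toy chain MDP. It is a minimal illustration only: the potential function below is a hypothetical placeholder, since this abstract does not spell out the paper's potential derived from episodes' cumulative rewards.

    import random
    from collections import defaultdict

    GAMMA = 0.99   # discount factor
    ALPHA = 0.1    # learning rate
    N_STATES = 10  # toy chain MDP: start at state 0, goal at state 9

    def potential(state):
        # Hypothetical placeholder Phi(s). The paper instead builds Phi from
        # knowledge gathered during learning (episode cumulative rewards).
        return state / (N_STATES - 1)

    def shaped_reward(r, s, s_next, done):
        # PBRS (Ng et al., 1999): r' = r + gamma * Phi(s') - Phi(s).
        # Taking Phi(terminal) = 0 keeps the shaping policy-invariant.
        phi_next = 0.0 if done else potential(s_next)
        return r + GAMMA * phi_next - potential(s)

    def step(s, a):
        # Chain dynamics: action 1 moves right, action 0 moves left;
        # reaching the last state ends the episode with reward 1.
        s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
        done = s_next == N_STATES - 1
        return s_next, (1.0 if done else 0.0), done

    Q = defaultdict(float)
    for episode in range(200):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < 0.1:
                a = random.choice([0, 1])
            else:
                a = max((0, 1), key=lambda act: Q[(s, act)])
            s_next, r, done = step(s, a)
            r = shaped_reward(r, s, s_next, done)  # shaping term added here
            target = r if done else r + GAMMA * max(Q[(s_next, 0)], Q[(s_next, 1)])
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s_next

Because the shaping term telescopes along any trajectory, the shaped and unshaped problems share the same optimal policy; the potential only redistributes reward to guide exploration, which is why the choice of Φ governs how much speed-up shaping delivers.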
Pages: 630 - 635
Page count: 6
Related papers
50 records in total
  • [21] Reward Shaping for Reinforcement Learning by Emotion Expressions
    Hwang, K. S.
    Ling, J. L.
    Chen, Yu-Ying
    Wang, Wei-Han
    2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 1288 - 1293
  • [22] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 653 - 659
  • [23] Subgoal-Based Reward Shaping to Improve Efficiency in Reinforcement Learning
    Okudo, Takato
    Yamada, Seiji
    IEEE ACCESS, 2021, 9 : 97557 - 97568
  • [24] FTPSG: Feature mixture transformer and potential-based subgoal generation for hierarchical multi-agent reinforcement learning
    Nicholaus, Isack Thomas
    Kang, Dae-Ki
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 270
  • [25] Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
    Yang, Yulong
    Cao, Weihua
    Guo, Linwei
    Gan, Chao
    Wu, Min
2023 IEEE 6TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS, 2023
  • [26] Tactical Reward Shaping for Large-Scale Combat by Multi-Agent Reinforcement Learning
    Duo, Nanxun
    Wang, Qinzhao
    Lyu, Qiang
    Wang, Wei
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2024, 35 (06) : 1516 - 1529
  • [28] Using Natural Language for Reward Shaping in Reinforcement Learning
    Goyal, Prasoon
    Niekum, Scott
    Mooney, Raymond J.
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2385 - 2391
  • [29] Barrier Functions Inspired Reward Shaping for Reinforcement Learning
    Nilaksh
    Ranjan, Abhishek
    Agrawal, Shreenabh
    Jain, Aayush
    Jagtap, Pushpak
    Kolathaya, Shishir
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024, : 10807 - 10813
  • [30] Theoretical and Empirical Analysis of Reward Shaping in Reinforcement Learning
    Grzes, Marek
    Kudenko, Daniel
    EIGHTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2009, : 337 - 344