A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Cited by: 4
Authors
Badnava, Babak [1 ]
Esmaeili, Mona [2 ]
Mozayani, Nasser [3 ]
Zarkesh-Ha, Payman [2 ]
Affiliations
[1] Univ Kansas, Lawrence, KS 66045 USA
[2] Univ New Mexico, Albuquerque, NM 87131 USA
[3] Iran Univ Sci & Technol, Tehran 16846, Iran
Keywords
Potential-based Reward Shaping; Reinforcement Learning; Reward Shaping; Knowledge Extraction;
DOI
10.1109/CCWC57344.2023.10099211
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Potential-based reward shaping (PBRS) is a category of reinforcement learning methods that aims to improve the learning speed of an agent by extracting and exploiting extra knowledge while it performs a task. The transfer learning process has two steps: extracting knowledge from previously learned tasks and transferring that knowledge for use in a target task. The latter step is well covered in the literature, with various methods proposed for it, while the former has been explored less. With this in mind, the type of knowledge that is transferred is very important and can lead to considerable improvement. One source of knowledge that neither the transfer learning nor the potential-based reward shaping literature has addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that extracts knowledge from the learning process, specifically from the cumulative rewards of past episodes. The proposed method is evaluated in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.
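The shaping mechanism the abstract builds on is the standard potential-based formulation of Ng et al. (1999), in which a potential function Φ over states augments the environment reward while provably preserving the optimal policy. A minimal sketch follows; the distance-to-goal potential used here is an illustrative placeholder, not the episodic-cumulative-reward potential proposed in the paper:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99, terminal=False):
    """Potential-based reward shaping: r' = r + F(s, s'),
    where F(s, s') = gamma * phi(s') - phi(s).

    An additive term of this form is known to leave the optimal
    policy of the original MDP unchanged.
    """
    # Convention: the potential of a terminal state is taken as 0,
    # so the shaping terms telescope to -phi(s0) over an episode.
    phi_next = 0.0 if terminal else phi(s_next)
    return r + gamma * phi_next - phi(s)


# Illustrative potential: negative distance to the goal on a 1-D chain
# of states 0..10 with the goal at state 10 (hypothetical example).
def phi(state):
    return -(10 - state)  # closer to the goal => higher potential
```

With this potential, a step toward the goal (e.g. state 3 to state 4, with gamma = 1) earns a shaping bonus of +1 on top of the environment reward, nudging the agent along without altering which policy is optimal.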
Pages: 630-635
Page count: 6
Related Papers
50 records
  • [41] Reward Shaping from Hybrid Systems Models in Reinforcement Learning
    Qian, Marian
    Mitsch, Stefan
    NASA FORMAL METHODS, NFM 2023, 2023, 13903 : 122 - 139
  • [42] Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping
    Zhang, Daniel
    Bailey, Colleen P.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS II, 2020, 11413
  • [43] Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning
    De Lellis, Francesco
    Coraggio, Marco
    Russo, Giovanni
    Musolesi, Mirco
    di Bernardo, Mario
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2024, 32 (06) : 2102 - 2113
  • [44] Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping
    Brys, Tim
    Harutyunyan, Anna
    Vrancx, Peter
    Taylor, Matthew E.
    Kudenko, Daniel
    Nowe, Ann
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 2315 - 2322
  • [45] Graph convolutional recurrent networks for reward shaping in reinforcement learning
    Sami, Hani
    Bentahar, Jamal
    Mourad, Azzam
    Otrok, Hadi
    Damiani, Ernesto
    INFORMATION SCIENCES, 2022, 608 : 63 - 80
  • [46] Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning
    Ding, Hongyu
    Tang, Yuanze
    Wu, Qing
    Wang, Bo
    Chen, Chunlin
    Wang, Zhi
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2023, 10 (12) : 2233 - 2247
  • [47] Adaptive reward shaping based reinforcement learning for docking control of autonomous underwater vehicles
    Chu, Shuguang
    Lin, Mingwei
    Li, Dejun
    Lin, Ri
    Xiao, Sa
    OCEAN ENGINEERING, 2025, 318
  • [49] Distributed Control using Reinforcement Learning with Temporal-Logic-Based Reward Shaping
    Zhang, Ningyuan
    Liu, Wenliang
    Belta, Calin
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 168, 2022, 168
  • [50] Funnel-Based Reward Shaping for Signal Temporal Logic Tasks in Reinforcement Learning
    Saxena, Naman
    Gorantla, Sandeep
    Jagtap, Pushpak
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (02) : 1373 - 1379