A new Potential-Based Reward Shaping for Reinforcement Learning Agent

Cited by: 4

Authors:

Badnava, Babak [1]

Esmaeili, Mona [2]

Mozayani, Nasser [3]

Zarkesh-Ha, Payman [2]

Affiliations:

[1] Univ Kansas, Lawrence, KS 66045 USA

[2] Univ New Mexico, Albuquerque, NM 87131 USA

[3] Iran Univ Sci & Technol, Tehran 16846, Iran

Source:

2023 IEEE 13TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE, CCWC | 2023

Keywords:

Potential-based Reward Shaping; Reinforcement Learning; Reward Shaping; Knowledge Extraction;

DOI:

10.1109/CCWC57344.2023.10099211

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Potential-based reward shaping (PBRS) is a category of machine learning methods that aims to improve the learning speed of a reinforcement learning agent by extracting and utilizing extra knowledge while performing a task. Transfer learning involves two steps: extracting knowledge from previously learned tasks and transferring that knowledge for use in a target task. The latter step is well discussed in the literature, with various methods proposed for it, while the former has been explored less. The type of knowledge that is transferred is therefore very important and can lead to considerable improvement. In the literature on both transfer learning and potential-based reward shaping, a subject that has never been addressed is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that attempts to extract knowledge from the learning process, specifically from episodes' cumulative rewards. The proposed method was evaluated in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.

Pages: 630-635

Page count: 6

