Skill Reward for Safe Deep Reinforcement Learning

Cited by: 0
Authors
Cheng, Jiangchang [1]
Yu, Fumin [1]
Zhang, Hongliang [1]
Dai, Yinglong [2, 3]
Affiliations
[1] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
[2] Natl Univ Def Technol, Coll Liberal Arts & Sci, Changsha 410073, Peoples R China
[3] Hunan Prov Key Lab Intelligent Comp & Language In, Changsha 410081, Peoples R China
Source
UBIQUITOUS SECURITY | 2022 / Vol. 1557
Keywords
Reinforcement learning; Deep reinforcement learning; Reward shaping; Skill reward; Safe agent; LEVEL;
DOI
10.1007/978-981-19-0468-4_15
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Reinforcement learning enables an agent to interact with an environment and learn from experience to maximize the cumulative reward of a specific task, yielding a capable agent for decision optimization problems. This process closely resembles human learning, which also proceeds through interaction with the environment. However, the behavior of an agent trained with deep reinforcement learning is often unpredictable, and the agent sometimes produces strange and unsafe behavior. To make the agent's behavior and decision process explainable and controllable, this paper proposes a skill reward method that constrains the agent to learn controllable and safe behaviors. When the agent completes a specific skill while interacting with the environment, it receives a reward designed from prior knowledge, which helps the learning process converge quickly. The skill reward can be embedded into existing reinforcement learning algorithms. In this work, we embed the skill reward into the asynchronous advantage actor-critic (A3C) algorithm and test the method in an Atari 2600 environment (Breakout-v4). The experiments demonstrate the effectiveness of the skill reward embedding method.
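The idea of embedding a skill reward into an existing algorithm can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no code, so the toy corridor environment (a stand-in for Breakout-v4), the `SkillReward` and `SkillRewardWrapper` names, the "move right" skill, and the bonus value are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical skill detector signature: (state, action, next_state) -> bool.
SkillDetector = Callable[[int, int, int], bool]


@dataclass
class SkillReward:
    """A skill defined from prior knowledge, plus the bonus paid when it is completed."""
    detector: SkillDetector
    bonus: float


class ToyCorridorEnv:
    """Toy 1-D corridor (assumption: a stand-in for an Atari environment)."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves right, any other action moves left; position is clamped.
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # sparse task reward at the goal only
        return self.pos, reward, done


class SkillRewardWrapper:
    """Adds skill bonuses to the environment reward without changing the learner."""

    def __init__(self, env: ToyCorridorEnv, skills: List[SkillReward]) -> None:
        self.env = env
        self.skills = skills
        self._state = 0

    def reset(self) -> int:
        self._state = self.env.reset()
        return self._state

    def step(self, action: int) -> Tuple[int, float, bool]:
        next_state, reward, done = self.env.step(action)
        # Sum the bonuses of every skill completed by this transition.
        bonus = sum(
            s.bonus for s in self.skills
            if s.detector(self._state, action, next_state)
        )
        self._state = next_state
        return next_state, reward + bonus, done


def run_episode(env, policy: Callable[[int], int], max_steps: int = 20) -> float:
    """Roll out one episode and return the total (possibly shaped) reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total


# Hypothetical skill: making progress toward the goal earns a small bonus.
move_right = SkillReward(detector=lambda s, a, ns: ns > s, bonus=0.25)

plain_return = run_episode(ToyCorridorEnv(), lambda s: 1)
shaped_return = run_episode(SkillRewardWrapper(ToyCorridorEnv(), [move_right]), lambda s: 1)
```

Because the wrapper only changes the reward signal returned by `step`, a learner such as A3C could in principle consume the shaped reward without modification, which is one plausible reading of "embedding" the skill reward into an existing algorithm.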
Pages: 203 - 213
Page count: 11