Skill Reward for Safe Deep Reinforcement Learning

Cited by: 0
Authors
Cheng, Jiangchang [1]
Yu, Fumin [1]
Zhang, Hongliang [1]
Dai, Yinglong [2, 3]
Affiliations
[1] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
[2] Natl Univ Def Technol, Coll Liberal Arts & Sci, Changsha 410073, Peoples R China
[3] Hunan Prov Key Lab Intelligent Comp & Language In, Changsha 410081, Peoples R China
Source
UBIQUITOUS SECURITY | 2022 / Vol. 1557
Keywords
Reinforcement learning; Deep reinforcement learning; Reward shaping; Skill reward; Safe agent; LEVEL;
DOI
10.1007/978-981-19-0468-4_15
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Reinforcement learning enables an agent to interact with its environment and learn from experience so as to maximize the cumulative reward of a specific task, producing a capable agent for decision-optimization problems. This process closely resembles the way humans learn, namely through interaction with the environment. However, the behavior of an agent trained with deep reinforcement learning is often unpredictable, and the agent sometimes produces strange or unsafe behavior. To make the agent's behavior and decision process explainable and controllable, this paper proposes a skill reward method that constrains the agent to learn controllable and safe behaviors. When the agent completes a specific skill while interacting with the environment, it receives a reward designed from prior knowledge; shaping the rewards obtained during exploration in this way helps the learning process converge quickly. The skill reward can be embedded into existing reinforcement learning algorithms. In this work, we embed the skill reward into the asynchronous advantage actor-critic (A3C) algorithm and test the method in an Atari 2600 environment (Breakout-v4). The experiments demonstrate the effectiveness of the skill reward embedding method.
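As a rough illustration of how a skill reward might be layered on top of the environment reward before an A3C-style update, the sketch below adds a bonus whenever a detected skill is completed, then computes the bootstrapped n-step returns and advantages that an actor-critic update consumes. The abstract does not specify the paper's skills or bonus values, so everything here is an assumption: the `Skill` type, the helpers `skill_reward`, `n_step_returns`, and `advantages`, and the example skills `paddle_hit` (small bonus) and `life_lost` (penalty) are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Skill:
    """A hypothetical skill: a detector over one transition plus a reward bonus."""
    name: str
    detect: Callable[[Dict[str, bool]], bool]  # True if the skill was completed in this step
    bonus: float                               # extra reward chosen from prior knowledge (may be negative)


def skill_reward(env_reward: float, transition: Dict[str, bool], skills: Sequence[Skill]) -> float:
    """Environment reward plus the bonus of every skill detected in this transition."""
    return env_reward + sum(s.bonus for s in skills if s.detect(transition))


def n_step_returns(rewards: List[float], bootstrap_value: float, gamma: float = 0.99) -> List[float]:
    """Discounted returns R_t = r_t + gamma * R_{t+1}, bootstrapped with V(s_T) as in A3C."""
    returns: List[float] = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage estimates A_t = R_t - V(s_t) used by the actor's policy-gradient term."""
    return [R - v for R, v in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical Breakout skills: reward returning the ball, penalize losing a life.
    skills = [
        Skill("paddle_hit", lambda t: t["paddle_hit"], 0.1),
        Skill("life_lost", lambda t: t["life_lost"], -1.0),
    ]
    # A made-up three-step rollout segment: (environment reward, detected events).
    rollout = [
        (0.0, {"paddle_hit": True, "life_lost": False}),
        (1.0, {"paddle_hit": False, "life_lost": False}),
        (0.0, {"paddle_hit": False, "life_lost": True}),
    ]
    shaped = [skill_reward(r, t, skills) for r, t in rollout]
    returns = n_step_returns(shaped, bootstrap_value=0.0, gamma=0.5)
    print("shaped rewards:", shaped)
    print("returns:", [round(x, 2) for x in returns])
```

Keeping the skill bonus separate from the return computation reflects the abstract's claim that the skill reward can be embedded into existing algorithms: only the reward fed to the learner changes. In a real Breakout-v4 setup, the event flags would have to be derived from frames or emulator state, which this sketch does not show.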
Pages: 203-213
Number of pages: 11