Skill Reward for Safe Deep Reinforcement Learning

Cited by: 0
|
Authors
Cheng, Jiangchang [1 ]
Yu, Fumin [1 ]
Zhang, Hongliang [1 ]
Dai, Yinglong [2 ,3 ]
Affiliations
[1] Hunan Normal Univ, Coll Informat Sci & Engn, Changsha 410081, Peoples R China
[2] Natl Univ Def Technol, Coll Liberal Arts & Sci, Changsha 410073, Peoples R China
[3] Hunan Prov Key Lab Intelligent Comp & Language In, Changsha 410081, Peoples R China
Source
UBIQUITOUS SECURITY | 2022, Vol. 1557
Keywords
Reinforcement learning; Deep reinforcement learning; Reward shaping; Skill reward; Safe agent; LEVEL;
DOI
10.1007/978-981-19-0468-4_15
CLC Classification Code
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Reinforcement learning enables an agent to interact with an environment and learn from experience to maximize the cumulative reward of a specific task, yielding a powerful agent for decision-optimization problems. This process closely resembles human learning, that is, learning from interaction with the environment. However, the behavior of an agent trained with deep reinforcement learning is often unpredictable, and the agent sometimes produces strange and unsafe behavior. To make the agent's behavior and decision process explainable and controllable, this paper proposes a skill reward method that constrains the agent to learn controllable and safe behaviors. When the agent completes a specific skill while interacting with the environment, it receives a reward designed from prior knowledge, which guides exploration and makes the learning process converge quickly. The skill reward can be embedded into existing reinforcement learning algorithms. In this work, we embed the skill reward into the asynchronous advantage actor-critic (A3C) algorithm and evaluate the method on an Atari 2600 environment (Breakout-v4). The experiments demonstrate the effectiveness of the skill reward embedding method.
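The abstract describes the mechanism only at a high level; the sketch below illustrates one plausible way to add a hand-designed skill reward on top of the environment reward in a Gym-style setup such as Breakout-v4, so that any existing algorithm (e.g., A3C) can consume the shaped reward unchanged. It is a minimal sketch under stated assumptions: the wrapper class SkillRewardWrapper, the predicate kept_ball_in_play, the bonus value, and the use of the "ale.lives" info field are illustrative choices, not the authors' actual design.

```python
# Minimal sketch (assumptions labeled in comments): shape the task reward by
# adding a prior-knowledge "skill bonus" whenever a hand-specified skill is
# detected in a transition. Uses the classic 4-tuple Gym step API, matching
# the Breakout-v4 environment named in the abstract.
import gym


class SkillRewardWrapper(gym.Wrapper):
    def __init__(self, env, skill_fn, skill_bonus=0.5):
        super().__init__(env)
        self.skill_fn = skill_fn        # predicate: did the agent perform the skill?
        self.skill_bonus = skill_bonus  # extra reward granted when the skill is completed (illustrative value)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Reward shaping: add the skill bonus on top of the environment reward
        # whenever the skill predicate fires for this transition.
        if self.skill_fn(obs, action, info):
            reward += self.skill_bonus
        return obs, reward, done, info


# Hypothetical skill "keep the ball in play", approximated here by not having
# lost a life on this step (the "ale.lives" info key is an assumption about
# the Atari wrapper in use).
def kept_ball_in_play(obs, action, info):
    return info.get("ale.lives", 5) == 5


# The wrapped environment can then be handed to any actor-critic training loop.
env = SkillRewardWrapper(gym.make("Breakout-v4"), skill_fn=kept_ball_in_play)
```

Because the shaping is confined to the environment wrapper, the learning algorithm itself needs no modification, which is consistent with the abstract's claim that the skill reward can be embedded into existing reinforcement learning algorithms.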
Pages: 203 - 213
Number of pages: 11
Related Papers
50 records in total
  • [1] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 653 - 659
  • [2] Independent Skill Transfer for Deep Reinforcement Learning
    Tian, Qiangxing
    Wang, Guanchu
    Liu, Jinxin
    Wang, Donglin
    Kang, Yachen
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2901 - 2907
  • [3] Safe reward-based deep reinforcement learning control for an electro-hydraulic servo system
    Wu, Minling
    Liu, Lijun
    Yu, Zhen
    Li, Weizhou
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2022, 32 (13) : 7646 - 7662
  • [4] Probabilistic Guarantees for Safe Deep Reinforcement Learning
    Bacci, Edoardo
    Parker, David
    FORMAL MODELING AND ANALYSIS OF TIMED SYSTEMS, FORMATS 2020, 2020, 12288 : 231 - 248
  • [5] Variance aware reward smoothing for deep reinforcement learning
    Dong, Yunlong
    Zhang, Shengjun
    Liu, Xing
    Zhang, Yu
    Shen, Tan
    NEUROCOMPUTING, 2021, 458 : 327 - 335
  • [6] Deep reinforcement learning with reward design for quantum control
    Yu, H.
    Zhao, X.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2024, 5 (03): 1087 - 1101
  • [7] Reward Space Noise for Exploration in Deep Reinforcement Learning
    Sun, Chuxiong
    Wang, Rui
    Li, Qian
    Hu, Xiaohui
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2021, 35 (10)
  • [8] Deep Reinforcement Learning for Video Summarization with Semantic Reward
    Sun, Haoran
    Zhu, Xiaolong
    Zhou, Conghua
    2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY, AND SECURITY COMPANION, QRS-C, 2022, : 754 - 755
  • [9] Learning to Operate Distribution Networks With Safe Deep Reinforcement Learning
    Li, Hepeng
    He, Haibo
    IEEE TRANSACTIONS ON SMART GRID, 2022, 13 (03) : 1860 - 1872
  • [10] Nonparametric Bayesian Reward Segmentation for Skill Discovery Using Inverse Reinforcement Learning
    Ranchod, Pravesh
    Rosman, Benjamin
    Konidaris, George
    2015 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2015, : 471 - 477