Guided Soft Actor Critic: A Guided Deep Reinforcement Learning Approach for Partially Observable Markov Decision Processes

Cited by: 8
Authors:
Haklidir, Mehmet [1 ,2 ]
Temeltas, Hakan [1 ]
Affiliations:
[1] Istanbul Tech Univ, Dept Control & Automat Engn, TR-34467 Istanbul, Turkey
[2] TUBITAK Informat & Informat Secur Res Ctr, Informat Technol Inst, TR-41470 Kocaeli, Turkey
Funding:
National Natural Science Foundation of China;
Keywords:
Reinforcement learning; Markov processes; Task analysis; Training; Taxonomy; Supervised learning; Licenses; Deep reinforcement learning; guided policy search; POMDP;
DOI:
10.1109/ACCESS.2021.3131772
Chinese Library Classification:
TP [Automation Technology, Computer Technology];
Discipline Code:
0812;
Abstract:
Most real-world problems are essentially partially observable, and the environment model is unknown. There is therefore a significant need for reinforcement learning approaches that solve such problems, in which the agent perceives the state of the environment partially and noisily. Guided reinforcement learning methods address this issue by providing additional state knowledge to reinforcement learning algorithms during the learning process, allowing them to solve a partially observable Markov decision process (POMDP) more effectively. However, such guided approaches are relatively rare in the literature, and most existing ones are model-based, meaning that they first require learning an appropriate model of the environment. In this paper, we propose a novel model-free approach that combines the soft actor-critic method with the supervised learning concept to solve real-world problems formulated as POMDPs. In experiments performed on OpenAI Gym, an open-source simulation platform, our guided soft actor-critic approach outperformed the other baseline algorithms, achieving 7~20% higher maximum average return on five partially observable tasks constructed from continuous control problems and simulated in MuJoCo.
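For concreteness, below is a minimal PyTorch sketch of the kind of guided actor update the abstract describes: a standard soft actor-critic objective augmented with a supervised term that pulls the partially observing policy toward actions suggested by a guide with access to fuller state information. The class names, network sizes, mixing weight `beta`, and the source of `guide_actions` are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: GaussianActor, QCritic, and `beta` are
# hypothetical stand-ins for whatever architecture the paper uses.
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Gaussian policy conditioned on the partial observation."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        return torch.distributions.Normal(self.mu(h), std)

class QCritic(nn.Module):
    """State-action value estimate from the partial observation."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.q(torch.cat([obs, act], dim=-1)).squeeze(-1)

def guided_actor_loss(actor, critic, obs, guide_actions, alpha=0.2, beta=0.5):
    """SAC actor objective plus a supervised imitation term toward the
    guide's actions (the 'guided' part); beta trades off the two."""
    dist = actor(obs)
    action = dist.rsample()                 # reparameterized sample
    log_prob = dist.log_prob(action).sum(-1)
    sac_term = (alpha * log_prob - critic(obs, action)).mean()
    guide_term = ((dist.mean - guide_actions) ** 2).mean()
    return sac_term + beta * guide_term

# Toy usage: one gradient step on random data.
obs_dim, act_dim, batch = 8, 2, 32
actor, critic = GaussianActor(obs_dim, act_dim), QCritic(obs_dim, act_dim)
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
obs = torch.randn(batch, obs_dim)
guide_actions = torch.randn(batch, act_dim)  # would come from the guide
loss = guided_actor_loss(actor, critic, obs, guide_actions)
opt.zero_grad(); loss.backward(); opt.step()
```

Consistent with the guided policy search idea listed in the keywords, the guide would use privileged state information during training only; at test time the actor acts from partial observations alone.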
Pages: 159672-159683
Number of pages: 12
Related Papers
50 records
  • [1] A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
    Le, Tuyen P.
    Ngo Anh Vien
    Chung, Taechoong
    IEEE ACCESS, 2018, 6 : 49089 - 49102
  • [2] Consolidated actor-critic model for partially-observable Markov decision processes
    Elhanany, I.
    Niedzwiedz, C.
    Liu, Z.
    Livingston, S.
    ELECTRONICS LETTERS, 2008, 44 (22) : 1317 - U41
  • [3] Reinforcement learning algorithm for partially observable Markov decision processes
    Wang, Xue-Ning
    He, Han-Gen
    Xu, Xin
Kongzhi yu Juece/Control and Decision, 2004, 19 (11): 1263 - 1266
  • [4] Fuzzy Reinforcement Learning Control for Decentralized Partially Observable Markov Decision Processes
    Sharma, Rajneesh
    Spaan, Matthijs T. J.
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 1422 - 1429
  • [5] Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes
    Guo, Hongyi
    Cai, Qi
    Zhang, Yufeng
    Yang, Zhuoran
    Wang, Zhaoran
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [6] A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
    Ross, Stephane
    Pineau, Joelle
    Chaib-draa, Brahim
    Kreitmann, Pierre
    JOURNAL OF MACHINE LEARNING RESEARCH, 2011, 12 : 1729 - 1770
  • [7] Active learning in partially observable Markov decision processes
    Jaulmes, R
    Pineau, J
    Precup, D
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 601 - 608
  • [8] A pulse neural network reinforcement learning algorithm for partially observable Markov decision processes
    Takita, Koichiro
    Hagiwara, Masafumi
Systems and Computers in Japan, 2005, 36 (03): 42 - 52
  • [9] Mixed reinforcement learning for partially observable Markov decision process
    Dung, Le Tien
    Komeda, Takashi
    Takagi, Motoki
2007 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION, 2007: 436+
  • [10] Averaged Soft Actor-Critic for Deep Reinforcement Learning
    Ding, Feng
    Ma, Guanfeng
    Chen, Zhikui
    Gao, Jing
    Li, Peng
    COMPLEXITY, 2021, 2021