Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models

Times cited: 0

Authors:

Ghadirzadeh, Ali [1]

Poklukar, Petra [2]

Arndt, Karol [3]

Finn, Chelsea [1]

Kyrki, Ville [3]

Kragic, Danica [2]

Bjorkman, Marten [2]

Affiliations:

[1] Stanford Univ, Stanford, CA 94305 USA

[2] KTH Royal Inst Technol, Stockholm, Sweden

[3] Aalto Univ, Espoo, Finland

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2022 / Vol. 23

Keywords:

reinforcement learning; policy search; robot learning; deep generative models; representation learning; primitives

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

We present a data-efficient framework for solving sequential decision-making problems that exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem because it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models that allow us to predict the performance of RL policy training before the actual training on a physical robot. We experimentally determine which characteristics of generative models have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared with two state-of-the-art RL methods, GenRL is the only method that can safely and efficiently solve these robotics tasks.
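The two-part decomposition described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the linear-Gaussian sub-policy, the fixed random tanh decoder standing in for the trained generative model, the plain REINFORCE update (the paper's actual policy-search algorithm may differ), and all names and sizes (`GaussianSubPolicy`, `FrozenDecoder`, `reinforce_step`, `HORIZON`, and so on) are hypothetical. The point it shows is structural: exploration and gradient updates touch only the sub-policy over the latent variable z, while the decoder maps each sampled z to a bounded sequence of motor actions and is never modified.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical sizes, chosen only for illustration.
STATE_DIM = 3    # dimension of the system state s
LATENT_DIM = 2   # dimension of the action latent variable z
HORIZON = 5      # number of motor-action steps decoded from one z
ACTION_DIM = 2   # dimension of each motor command


class GaussianSubPolicy:
    """Part (i): a distribution over z given s (assumed linear-Gaussian)."""

    def __init__(self, std: float = 0.1, seed: int = 0) -> None:
        init = random.Random(seed)
        self.weights = [[init.uniform(-0.5, 0.5) for _ in range(STATE_DIM)]
                        for _ in range(LATENT_DIM)]
        self.std = std
        self.rng = random.Random(seed + 1)

    def mean(self, state: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, state)) for row in self.weights]

    def sample(self, state: Sequence[float]) -> List[float]:
        return [self.rng.gauss(m, self.std) for m in self.mean(state)]


class FrozenDecoder:
    """Part (ii): maps z to HORIZON motor actions.

    Hypothetical stand-in for a generative model trained without supervision
    on valid action sequences: a fixed random linear map plus tanh, so every
    decoded command stays in [-1, 1].
    """

    def __init__(self, seed: int = 42) -> None:
        init = random.Random(seed)
        self.weights = [[init.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
                        for _ in range(HORIZON * ACTION_DIM)]

    def decode(self, z: Sequence[float]) -> List[List[float]]:
        flat = [math.tanh(sum(w * zi for w, zi in zip(row, z)))
                for row in self.weights]
        return [flat[t * ACTION_DIM:(t + 1) * ACTION_DIM] for t in range(HORIZON)]


def reinforce_step(policy: GaussianSubPolicy, decoder: FrozenDecoder,
                   state: Sequence[float],
                   reward_fn: Callable[[List[List[float]]], float],
                   lr: float = 0.05, n_samples: int = 16) -> float:
    """One score-function (REINFORCE) update of the sub-policy only.

    Exploration happens in the low-dimensional latent space; the decoder is
    never modified. Returns the mean reward of the sampled batch.
    """
    mu = policy.mean(state)
    samples = []
    for _ in range(n_samples):
        z = policy.sample(state)
        samples.append((z, reward_fn(decoder.decode(z))))
    baseline = sum(r for _, r in samples) / n_samples  # variance-reducing baseline
    grad = [[0.0] * STATE_DIM for _ in range(LATENT_DIM)]
    for z, r in samples:
        for i in range(LATENT_DIM):
            # d/dW_ij log N(z_i; mu_i, std^2) = (z_i - mu_i) / std^2 * s_j
            coeff = (r - baseline) * (z[i] - mu[i]) / policy.std ** 2
            for j in range(STATE_DIM):
                grad[i][j] += coeff * state[j] / n_samples
    for i in range(LATENT_DIM):
        for j in range(STATE_DIM):
            policy.weights[i][j] += lr * grad[i][j]  # gradient ascent on reward
    return baseline


if __name__ == "__main__":
    # Hypothetical task: the final motor command should approach (0.5, 0.5).
    def reward(actions: List[List[float]]) -> float:
        return -sum((a - 0.5) ** 2 for a in actions[-1])

    pi, dec, s = GaussianSubPolicy(), FrozenDecoder(), [0.2, -0.1, 0.5]
    for _ in range(100):
        avg = reinforce_step(pi, dec, s, reward)
    print(f"mean reward after training: {avg:.3f}")
```

Because the decoder squashes every output through tanh, any latent sample the sub-policy explores still yields bounded commands, which loosely mirrors the abstract's claim that prior knowledge of valid action sequences supports safe exploration; in the sketch, "valid" just means "within [-1, 1]".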

Pages: 37