Masked and Inverse Dynamics Modeling for Data-Efficient Reinforcement Learning

Times Cited: 0
Authors
Lee, Young Jae [1 ]
Kim, Jaehoon [1 ]
Park, Young Joon [2 ]
Kwak, Mingu [3 ]
Kim, Seoung Bum [1 ]
Affiliations
[1] Korea University, Department of Industrial and Management Engineering, Seoul 02841, South Korea
[2] LG AI Research, Seoul 07796, South Korea
[3] Georgia Institute of Technology, School of Industrial and Systems Engineering, Atlanta, GA 30332 USA
Funding
National Research Foundation of Singapore;
Keywords
Data models; Data augmentation; Transformers; Task analysis; Representation learning; Predictive models; Inverse problems; Data-efficient reinforcement learning; inverse dynamics modeling; masked modeling; self-supervised multitask learning; transformer;
DOI
10.1109/TNNLS.2024.3439261
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In pixel-based deep reinforcement learning (DRL), learning representations of states that change because of an agent's action or interaction with the environment poses a critical challenge in improving data efficiency. Recent data-efficient DRL studies have integrated DRL with self-supervised learning (SSL) and data augmentation to learn state representations from given interactions. However, some methods have difficulty explicitly capturing evolving state representations or selecting data augmentations that provide appropriate reward signals. Our goal is to explicitly learn the inherent dynamics that change with an agent's intervention and interaction with the environment. We propose masked and inverse dynamics modeling (MIND), which uses masking augmentation and fewer hyperparameters to learn agent-controllable representations in changing states. Our method consists of self-supervised multitask learning that leverages a transformer architecture to capture the spatiotemporal information underlying highly correlated consecutive frames. MIND performs self-supervised multitask learning with two tasks: masked modeling and inverse dynamics modeling. Masked modeling learns the static visual representations required for control within a state, and inverse dynamics modeling learns the rapidly evolving state representations driven by agent intervention. By integrating inverse dynamics modeling as a complementary component to masked modeling, our method effectively learns evolving state representations. We evaluate our method on discrete and continuous control environments with limited interactions. MIND outperforms previous methods across benchmarks and significantly improves data efficiency. The code is available at https://github.com/dudwojae/MIND.
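
To make the two-task objective concrete, below is a minimal sketch of how masked modeling and inverse dynamics modeling can be combined into a single self-supervised multitask loss over a short window of consecutive frames. It assumes discrete actions and PyTorch; all module names, dimensions, and the frame-level masking scheme here are illustrative assumptions, not the authors' implementation (see https://github.com/dudwojae/MIND for the official code).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MINDSketch(nn.Module):
        """Illustrative multitask loss: masked modeling + inverse dynamics."""

        def __init__(self, n_actions: int, dim: int = 128):
            super().__init__()
            # Shared convolutional encoder: one embedding per frame.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.LazyLinear(dim),
            )
            # Transformer over the sequence of frame embeddings captures
            # spatiotemporal structure across highly correlated frames.
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                               batch_first=True)
            self.temporal = nn.TransformerEncoder(layer, num_layers=2)
            self.mask_token = nn.Parameter(torch.zeros(dim))
            # Task 1 head: reconstruct the embeddings of masked frames.
            self.masked_head = nn.Linear(dim, dim)
            # Task 2 head: predict a_t from the pair (s_t, s_{t+1}).
            self.inverse_head = nn.Linear(2 * dim, n_actions)

        def forward(self, frames, actions, mask_ratio: float = 0.5):
            # frames: (B, T, 1, H, W) grayscale frames; actions: (B, T-1).
            B, T = frames.shape[:2]
            z = self.encoder(frames.flatten(0, 1)).view(B, T, -1)

            # Masked modeling: hide random frames, then predict their
            # (stop-gradient) embeddings from the visible context.
            mask = torch.rand(B, T, device=z.device) < mask_ratio
            z_in = torch.where(mask.unsqueeze(-1), self.mask_token, z)
            h = self.temporal(z_in)
            masked_loss = F.mse_loss(self.masked_head(h)[mask],
                                     z.detach()[mask])

            # Inverse dynamics: classify the action taken between
            # consecutive (contextualized) state embeddings.
            pairs = torch.cat([h[:, :-1], h[:, 1:]], dim=-1)
            logits = self.inverse_head(pairs)
            inverse_loss = F.cross_entropy(logits.flatten(0, 1),
                                           actions.flatten())

            # Joint self-supervised multitask objective.
            return masked_loss + inverse_loss

    # Example shapes (hypothetical): windows of 4 stacked 84x84 frames,
    # 6 discrete actions, batch of 8.
    model = MINDSketch(n_actions=6)
    frames = torch.rand(8, 4, 1, 84, 84)
    actions = torch.randint(0, 6, (8, 3))
    loss = model(frames, actions)

In practice such a loss would be optimized jointly with the RL objective as an auxiliary term; the classification head for inverse dynamics only fits discrete action spaces, and a regression head would be the natural substitute for continuous control.
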
Pages: 14