DynaSTI: Dynamics modeling with sequential temporal information for reinforcement learning in Atari

Cited by: 1

Authors:

Kim, Jaehoon [1]

Lee, Young Jae [1]

Kwak, Mingu [2]

Park, Young Joon [3]

Kim, Seoung Bum [1]

Affiliations:

[1] Korea Univ, Sch Ind Management Engn, 145 Anam Ro, Seoul 02841, South Korea

[2] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA USA

[3] LG AI Res, Seoul, South Korea

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 299

Funding:

National Research Foundation, Singapore;

Keywords:

Atari; Dynamics modeling; Hierarchical structure; Self-supervised learning; Reinforcement learning;

DOI:

10.1016/j.knosys.2024.112103

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods have combined self-supervised learning or data augmentation to improve sample efficiency. While understanding the temporal information dynamics of the environment is important for effective learning, many methods do not consider these factors. To address the sample efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI) that incorporates environmental dynamics and leverages the correlation among trajectories to improve sample efficiency. DynaSTI uses an effective learning strategy for state representation as an auxiliary task, using gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, enhancing the learning of environmental dynamics compared to using each model separately. The hierarchical structure of DynaSTI enhances the stability of inverse dynamics modeling during training by using inputs derived from forward dynamics modeling, which focuses on feature extraction related to controllable state. This approach effectively filters out noisy information. Consequently, using denoised inputs from forward dynamics modeling results in improved stability when training inverse dynamics modeling, rather than using inputs directly from the encoder. We demonstrate the effectiveness of DynaSTI through experiments on the Atari game benchmark, limiting the environment interactions to 100k steps. Our extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods in terms of statistically reliable metrics and nearing human-level performance.


Pages: 12
