DynaSTI: Dynamics modeling with sequential temporal information for reinforcement learning in Atari

Cited by: 1
Authors
Kim, Jaehoon [1 ]
Lee, Young Jae [1 ]
Kwak, Mingu [2 ]
Park, Young Joon [3 ]
Kim, Seoung Bum [1 ]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA USA
[3] LG AI Res, Seoul, South Korea
Funding
National Research Foundation, Singapore
Keywords
Atari; Dynamics modeling; Hierarchical structure; Self-supervised learning; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112103
CLC classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods have combined self-supervised learning or data augmentation to improve sample efficiency. While understanding the temporal information dynamics of the environment is important for effective learning, many methods do not consider these factors. To address the sample efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI) that incorporates environmental dynamics and leverages the correlation among trajectories to improve sample efficiency. DynaSTI uses an effective learning strategy for state representation as an auxiliary task, using gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, enhancing the learning of environmental dynamics compared to using each model separately. The hierarchical structure of DynaSTI enhances the stability of inverse dynamics modeling during training by using inputs derived from forward dynamics modeling, which focuses on feature extraction related to controllable state. This approach effectively filters out noisy information. Consequently, using denoised inputs from forward dynamics modeling results in improved stability when training inverse dynamics modeling, rather than using inputs directly from the encoder. We demonstrate the effectiveness of DynaSTI through experiments on the Atari game benchmark, limiting the environment interactions to 100k steps. Our extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods in terms of statistically reliable metrics and nearing human-level performance.
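The pipeline the abstract describes (encoder → GRU temporal aggregation → forward dynamics head, whose features then feed the inverse dynamics head instead of raw encoder output) can be sketched in miniature. This is an illustrative NumPy toy under stated assumptions, not the authors' implementation: all dimensions, parameter names (`W_enc`, `W_fwd`, `W_inv`), and the single-step rollout are hypothetical, and a linear map stands in for the CNN encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, N_ACTIONS = 16, 8, 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# Encoder: observation -> latent state (stand-in for the CNN encoder).
W_enc = init((OBS_DIM, LATENT_DIM))

# GRU parameters: one (W, U, b) triple per gate (update z, reset r, candidate n).
W = {g: init((LATENT_DIM, LATENT_DIM)) for g in "zrn"}
U = {g: init((LATENT_DIM, LATENT_DIM)) for g in "zrn"}
b = {g: np.zeros(LATENT_DIM) for g in "zrn"}

def gru_step(x, h):
    """One GRU step capturing sequential temporal information."""
    z = sigmoid(x @ W["z"] + h @ U["z"] + b["z"])        # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"] + b["r"])        # reset gate
    n = np.tanh(x @ W["n"] + (r * h) @ U["n"] + b["n"])  # candidate state
    return (1 - z) * n + z * h

# Forward dynamics head: (temporal feature, action) -> predicted next latent.
W_fwd = init((LATENT_DIM + N_ACTIONS, LATENT_DIM))

# Inverse dynamics head: hierarchically fed with forward-model features
# (controllable-state focused, noise filtered) rather than raw encoder output.
W_inv = init((2 * LATENT_DIM, N_ACTIONS))

def rollout(observations, last_action):
    h = np.zeros(LATENT_DIM)
    feats = []
    for obs in observations:
        h = gru_step(obs @ W_enc, h)   # aggregate temporal information
        feats.append(h)
    a_onehot = np.eye(N_ACTIONS)[last_action]
    # Forward model: predict the next latent state from feature + action.
    pred_next = np.tanh(np.concatenate([feats[-1], a_onehot]) @ W_fwd)
    # Inverse model: infer the action from consecutive forward-path features.
    inv_logits = np.concatenate([feats[-2], feats[-1]]) @ W_inv
    return pred_next, inv_logits

obs_seq = rng.normal(size=(5, OBS_DIM))
pred_next, inv_logits = rollout(obs_seq, last_action=1)
```

In a real auxiliary-task setup, `pred_next` would be regressed against the encoder's next-state latent and `inv_logits` trained with cross-entropy against the taken action, with both losses backpropagated alongside the RL objective.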
Pages: 12