DynaSTI: Dynamics modeling with sequential temporal information for reinforcement learning in Atari

Cited by: 1
Authors
Kim, Jaehoon [1]
Lee, Young Jae [1]
Kwak, Mingu [2]
Park, Young Joon [3]
Kim, Seoung Bum [1]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA USA
[3] LG AI Res, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Atari; Dynamics modeling; Hierarchical structure; Self-supervised learning; Reinforcement learning;
DOI
10.1016/j.knosys.2024.112103
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) has shown remarkable capabilities in solving sequential decision-making problems. However, DRL requires extensive interactions with image-based environments. Existing methods combine self-supervised learning or data augmentation to improve sample efficiency, but many do not exploit the temporal dynamics of the environment, even though understanding these dynamics is important for effective learning. To address this sample-efficiency problem, we propose dynamics modeling with sequential temporal information (DynaSTI), which incorporates environmental dynamics and leverages the correlation among trajectories. DynaSTI uses an effective state-representation learning strategy as an auxiliary task, employing gated recurrent units to capture temporal information. It also integrates forward and inverse dynamics modeling in a hierarchical configuration, which improves the learning of environmental dynamics compared with using either model alone. In this hierarchy, the inverse dynamics model receives inputs derived from the forward dynamics model, which focuses on extracting features of the controllable state and thereby filters out noisy information. Training the inverse dynamics model on these denoised forward-model inputs is more stable than training it directly on encoder outputs. We demonstrate the effectiveness of DynaSTI on the Atari game benchmark, limiting environment interaction to 100k steps. Our extensive experiments confirm that DynaSTI significantly improves the sample efficiency of DRL, outperforming comparison methods on statistically reliable metrics and approaching human-level performance.
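The hierarchical coupling described in the abstract (encoder, GRU over latent states, then an inverse dynamics model fed by forward-dynamics features rather than raw encoder outputs) can be sketched as follows. This is a minimal NumPy illustration of the data flow only, not the authors' implementation: every layer, dimension, and variable name here is an assumption, and the actual DynaSTI uses trained convolutional encoders and self-supervised losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Random affine layer (stand-in for a trained layer), returned as a closure."""
    s = 1.0 / np.sqrt(in_dim)
    W = rng.uniform(-s, s, (in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda x: x @ W + b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell, used to aggregate sequential temporal information."""
    def __init__(self, in_dim, hid_dim):
        self.zg = dense(in_dim + hid_dim, hid_dim)  # update gate
        self.rg = dense(in_dim + hid_dim, hid_dim)  # reset gate
        self.hg = dense(in_dim + hid_dim, hid_dim)  # candidate state

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.zg(xh))
        r = sigmoid(self.rg(xh))
        h_tilde = np.tanh(self.hg(np.concatenate([x, r * h])))
        return (1.0 - z) * h + z * h_tilde

latent, hidden, n_actions = 8, 16, 4          # illustrative sizes
encoder = dense(32, latent)                   # stand-in for the image encoder
gru = GRUCell(latent, hidden)
fwd_trunk = dense(hidden, hidden)             # forward-dynamics feature extractor
fwd_head = dense(hidden + n_actions, latent)  # predicts the next latent state
inv_head = dense(2 * hidden, n_actions)       # inverse model: infers the action

# One transition (o_t, a_t, o_{t+1}) with dummy observations.
o_t, o_t1 = rng.normal(size=32), rng.normal(size=32)
a_onehot = np.eye(n_actions)[2]
h0 = np.zeros(hidden)

h_t = gru(encoder(o_t), h0)    # temporal summary up to step t
h_t1 = gru(encoder(o_t1), h_t)

# Forward dynamics: features of the controllable state, then next-latent prediction.
f_t = np.tanh(fwd_trunk(h_t))
f_t1 = np.tanh(fwd_trunk(h_t1))
next_latent_pred = fwd_head(np.concatenate([f_t, a_onehot]))

# Hierarchical inverse dynamics: consumes the denoised forward-model features
# (f_t, f_t1) instead of raw encoder outputs, which is the paper's stability claim.
action_logits = inv_head(np.concatenate([f_t, f_t1]))
```

In a real training loop the forward model would be fit with a next-state prediction loss and the inverse model with an action-classification loss, with both gradients shaping the shared encoder and GRU.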
Pages: 12
Related papers (50 total)
  • [41] Sequential and Dynamic constraint Contrastive Learning for Reinforcement Learning
    Shen, Weijie
    Yuan, Lei
    Huang, Junfu
    Gao, Songyi
    Huang, Yuyang
    Yu, Yang
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [42] A neural network for temporal sequential information
    Tijsseling, AG
    Berthouze, L
    8TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING, VOLS 1-3, PROCEEDING, 2001, : 1449 - 1454
  • [43] Source tasks selection for transfer deep reinforcement learning: a case of study on Atari games
    García-Ramírez, Jesús
    Morales, Eduardo F.
    Escalante, Hugo Jair
    NEURAL COMPUTING AND APPLICATIONS, 2023, 35 : 18099 - 18111
  • [44] Temporal dynamics of information use in learning and retention of predator-related information in tadpoles
    Ferrari, Maud C. O.
    Chivers, Douglas P.
    ANIMAL COGNITION, 2013, 16 : 667 - 676
  • [45] Evolutionary dynamics on sequential temporal networks
    Sheng, Anzhi
    Li, Aming
    Wang, Long
    PLOS COMPUTATIONAL BIOLOGY, 2023, 19 (08)
  • [47] Sequential and Temporal Dynamics of Online Opinion
    Godes, David
    Silva, Jose C.
    MARKETING SCIENCE, 2012, 31 (03) : 448 - 473
  • [48] Reinforcement learning in information searching
    Cen, Yonghua
    Gan, Liren
    Bai, Chen
    INFORMATION RESEARCH-AN INTERNATIONAL ELECTRONIC JOURNAL, 2013, 18 (01):
  • [49] Reinforcement Learning for Information Retrieval
    Kuhnle, Alexander
    Aroca-Ouellette, Miguel
    Basu, Anindya
    Sensoy, Murat
    Reid, John
    Zhang, Dell
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2669 - 2672
  • [50] Vehicle emission control on road with temporal traffic information using deep reinforcement learning
    Xu, Zhenyi
    Cao, Yang
    Kang, Yu
    Zhao, Zhenyi
    IFAC PAPERSONLINE, 2020, 53 (02): : 14960 - 14965