A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Cited by: 0
Authors
Fujimoto, Scott [1 ]
Meger, David [1 ]
Precup, Doina [1 ]
Affiliations
[1] McGill Univ, Mila, Montreal, PQ, Canada
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
ENVIRONMENT;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Marginalized importance sampling (MIS), which measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution, is a promising approach for off-policy evaluation. However, current state-of-the-art MIS methods rely on complex optimization tricks and succeed mostly on simple toy problems. We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy. The successor representation can be trained through deep reinforcement learning methodology and decouples the reward optimization from the dynamics of the environment, making the resulting algorithm stable and applicable to high-dimensional domains. We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
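To make the abstract's central observation concrete, here is a brief sketch in standard notation (the symbols phi, psi, d, and xi below are illustrative choices, not necessarily the paper's own). The successor representation of a feature map phi under the target policy pi is

    \psi^{\pi}(s,a) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \phi(s_t, a_t) \,\middle|\, s_0 = s,\, a_0 = a \right],

and it satisfies the occupancy identity

    (1-\gamma)\, \mathbb{E}_{s_0 \sim d_0,\, a_0 \sim \pi}\left[ \psi^{\pi}(s_0, a_0) \right] = \mathbb{E}_{(s,a) \sim d^{\pi}}\left[ \phi(s,a) \right],

where d^pi is the discounted state-action occupancy of pi. The MIS estimator of the policy's value then reweights rewards sampled from the data distribution d^D:

    \hat{J}(\pi) = \mathbb{E}_{(s,a) \sim d^{D}}\left[ w(s,a)\, r(s,a) \right], \qquad w(s,a) = \frac{d^{\pi}(s,a)}{d^{D}(s,a)}.

Roughly, if the density ratio is parameterized linearly in the same features, w(s,a) = \phi(s,a)^{\top} \xi, then the otherwise hard-to-estimate expectation under d^pi reduces, via the identity above, to a linear function of the successor representation, which can itself be learned with standard temporal-difference methods; this is the sense in which the reward-related optimization is decoupled from the dynamics of the environment.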
Pages: 12