A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Cited by: 0
Authors
Fujimoto, Scott [1 ]
Meger, David [1 ]
Precup, Doina [1 ]
Affiliations
[1] McGill Univ, Mila, Montreal, PQ, Canada
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021 / Vol. 139
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
ENVIRONMENT;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Marginalized importance sampling (MIS), which measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution, is a promising approach for off-policy evaluation. However, current state-of-the-art MIS methods rely on complex optimization tricks and succeed mostly on simple toy problems. We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy. The successor representation can be trained through deep reinforcement learning methodology and decouples the reward optimization from the dynamics of the environment, making the resulting algorithm stable and applicable to high-dimensional domains. We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
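To make the abstract's central observation concrete, here is a brief sketch in standard notation (the symbols phi, psi, d, and xi below are illustrative choices, not necessarily the paper's own). The successor representation of a feature map phi under the target policy pi is

    \psi^{\pi}(s,a) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \phi(s_t, a_t) \,\middle|\, s_0 = s,\, a_0 = a \right],

and it satisfies the occupancy identity

    (1-\gamma)\, \mathbb{E}_{s_0 \sim d_0,\, a_0 \sim \pi}\left[ \psi^{\pi}(s_0, a_0) \right] = \mathbb{E}_{(s,a) \sim d^{\pi}}\left[ \phi(s,a) \right],

where d^pi is the discounted state-action occupancy of pi. The MIS estimator of the policy's value then reweights rewards sampled from the data distribution d^D:

    \hat{J}(\pi) = \mathbb{E}_{(s,a) \sim d^{D}}\left[ w(s,a)\, r(s,a) \right], \qquad w(s,a) = \frac{d^{\pi}(s,a)}{d^{D}(s,a)}.

Roughly, if the density ratio is parameterized linearly in the same features, w(s,a) = \phi(s,a)^{\top} \xi, then the otherwise hard-to-estimate expectation under d^pi reduces, via the identity above, to a linear function of the successor representation, which can itself be learned with standard temporal-difference methods; this is the sense in which the reward-related optimization is decoupled from the dynamics of the environment.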
Pages: 12