A DATASET PERSPECTIVE ON OFFLINE REINFORCEMENT LEARNING

Cited by: 0
Authors
Schweighofer, Kajetan [1 ,2 ]
Radler, Andreas [1 ,2 ]
Dinu, Marius-Constantin [1 ,2 ,4 ]
Hofmarcher, Markus [1 ,2 ]
Patil, Vihang [1 ,2 ]
Bitto-Nemling, Angela [1 ,2 ,3 ]
Eghbal-zadeh, Hamid [1 ,2 ,3 ]
Hochreiter, Sepp [1 ,2 ]
Affiliations
[1] Johannes Kepler Univ Linz, ELLIS Unit Linz, Inst Machine Learning, Linz, Austria
[2] Johannes Kepler Univ Linz, Inst Machine Learning, LIT AI Lab, Linz, Austria
[3] IARAI, Vienna, Austria
[4] Dynatrace Res, Linz, Austria
Funding
EU Horizon 2020
Keywords
CONCEPT DRIFT
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The application of Reinforcement Learning (RL) in real-world environments can be expensive or risky due to sub-optimal policies during training. Offline RL avoids this problem, since interactions with the environment are prohibited: policies are learned from a given dataset, which solely determines their performance. Despite this, how dataset characteristics influence Offline RL algorithms has hardly been investigated. The dataset characteristics are determined by the behavioral policy that samples the dataset. Therefore, we characterize behavioral policies as exploratory if they yield high expected information in their interaction with the Markov Decision Process (MDP), and as exploitative if they have high expected return. We implement two corresponding empirical measures for datasets sampled by a behavioral policy in deterministic MDPs. The first empirical measure, SACo, is defined by the normalized number of unique state-action pairs and captures exploration. The second empirical measure, TQ, is defined by the normalized average trajectory return and captures exploitation. Empirical evaluations show the effectiveness of TQ and SACo. In large-scale experiments using our proposed measures, we show that the unconstrained off-policy Deep Q-Network family requires datasets with high SACo to find a good policy. Furthermore, the experiments show that policy constraint algorithms perform well on datasets with high TQ and SACo. Finally, purely dataset-constrained Behavioral Cloning performs competitively with the best Offline RL algorithms on datasets with high TQ.
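The abstract defines TQ as the normalized average trajectory return and SACo as the normalized count of unique state-action pairs in a dataset. The following Python sketch illustrates how such measures could be computed from a trajectory dataset; the normalization references (a random-policy return, a reference return, and a reference count of unique pairs) and the function names are illustrative assumptions, not the paper's exact definitions.

import numpy as np

def tq(trajectory_returns, random_return, reference_return):
    # Trajectory Quality (TQ): average per-trajectory return, normalized.
    # The normalization bounds (random_return, reference_return) are
    # illustrative assumptions; the paper defines its own references.
    avg_return = float(np.mean(trajectory_returns))
    return (avg_return - random_return) / (reference_return - random_return)

def saco(states, actions, reference_unique_pairs):
    # State-Action Coverage (SACo): number of unique state-action pairs,
    # normalized by a reference count (an illustrative assumption).
    pairs = set()
    for s, a in zip(states, actions):
        pairs.add((tuple(np.ravel(s)), tuple(np.ravel(a))))
    return len(pairs) / reference_unique_pairs

# Hypothetical usage on a tiny dataset of two trajectories:
returns = [12.0, 18.0]                    # per-trajectory returns
states = [[0, 0], [0, 1], [1, 1]]         # visited states
actions = [[1], [0], [1]]                 # actions taken in those states
print(tq(returns, random_return=0.0, reference_return=20.0))   # 0.75
print(saco(states, actions, reference_unique_pairs=10))        # 0.3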
Pages: 48
Related Papers
50 records in total
  • [1] An Optimistic Perspective on Offline Reinforcement Learning
    Agarwal, Rishabh
    Schuurmans, Dale
    Norouzi, Mohammad
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [2] Measuring Data Quality for Dataset Selection in Offline Reinforcement Learning
    Swazinna, Phillip
    Udluft, Steffen
    Runkler, Thomas
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [3] Mild evaluation policy via dataset constraint for offline reinforcement learning
    Li, Xue
    Ling, Xinghong
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 274
  • [4] Offline Reinforcement Learning with Pseudometric Learning
    Dadashi, Robert
    Rezaeifar, Shideh
    Vieillard, Nino
    Hussenot, Leonard
    Pietquin, Olivier
    Geist, Matthieu
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Benchmarking Offline Reinforcement Learning
    Tittaferrante, Andrew
    Yassine, Abdulsalam
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 259 - 263
  • [6] Federated Offline Reinforcement Learning
    Zhou, Doudou
    Zhang, Yufeng
    Sonabend-W, Aaron
    Wang, Zhaoran
    Lu, Junwei
    Cai, Tianxi
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (548) : 3152 - 3163
  • [7] Distributed Offline Reinforcement Learning
    Heredia, Paulo
    George, Jemin
    Mou, Shaoshuai
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 4621 - 4626
  • [8] Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks
    Qu, Yun
    Wang, Boyuan
    Shao, Jianzhun
    Jiang, Yuhang
    Chen, Chen
    Ye, Zhenbin
    Liu, Lin
    Yang, Junfeng
    Lai, Lin
    Qin, Hongyang
    Deng, Minwen
    Zhuo, Juchao
    Ye, Deheng
    Fu, Qiang
    Yang, Wei
    Yang, Guang
    Huang, Lanxiao
    Ji, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
    Zhang, Yinmin
    Liu, Jie
    Li, Chuming
    Niu, Yazhe
    Yang, Yaodong
    Liu, Yu
    Ouyang, Wanli
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 15, 2024, : 16908 - 16916
  • [10] Learning Behavior of Offline Reinforcement Learning Agents
    Shukla, Indu
    Dozier, Haley R.
    Henslee, Althea C.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS VI, 2024, 13051