Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions

Cited by: 0
Authors
Chen, Xiong-Hui [1]
Luo, Fan-Ming [1]
Yu, Yang [1]
Li, Qingyang [2]
Qin, Zhiwei [2]
Shang, Wenjie [3]
Ye, Jieping [3]
Institutions
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] DiDi Labs, Mountain View, CA 94043 USA
[3] DiDi Chuxing, Beijing 300450, Peoples R China
Funding
U.S. National Science Foundation
Keywords
Adaptation models; Uncertainty; Predictive models; Behavioral sciences; Extrapolation; Trajectory; Reinforcement learning; Adaptable policy learning; meta learning; model-based reinforcement learning; offline reinforcement learning;
DOI
10.1109/TPAMI.2023.3317131
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In reinforcement learning, a promising direction for avoiding the cost of online trial and error is learning from an offline dataset. Current offline reinforcement learning methods commonly constrain the learned policy to the in-support regions of the offline dataset in order to ensure the robustness of the outcome policies. Such constraints, however, also limit the potential of the outcome policies. In this paper, to unlock the potential of offline policy learning, we investigate decision-making problems in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). Instead of learning within in-support regions, this approach learns an adaptable policy that can adapt its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning and ensemble model learning techniques. We conduct experiments on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieves better performance than state-of-the-art algorithms.
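The abstract describes the core idea only in prose: train on an ensemble of learned dynamics models, and condition the policy on a recurrent context so it can adapt when the dynamics it encounters at deployment drift out of the dataset's support. The toy sketch below illustrates that mechanism under heavy simplification; all names (`EnsembleDynamics`, `AdaptablePolicy`, `rollout`), the linear toy dynamics, and the residual-tracking context update are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class EnsembleDynamics:
    """Stand-in for an ensemble of learned dynamics models.

    Each member scales the transition with a different coefficient,
    emulating the disagreement a real learned ensemble would exhibit
    in out-of-support regions.
    """
    def __init__(self, n_models=3):
        self.coefs = [1.0 + 0.1 * k for k in range(n_models)]

    def step(self, state, action, model_idx):
        return self.coefs[model_idx] * state + action

class AdaptablePolicy:
    """Policy conditioned on a recurrent context summarizing past transitions."""
    def __init__(self, dim=2):
        self.context = np.zeros(dim)

    def update_context(self, state, action, next_state):
        # Recurrent (RNN-like) update: the context tracks the residual
        # between the observed transition and nominal (state + action) dynamics.
        residual = next_state - (state + action)
        self.context = 0.9 * self.context + 0.1 * residual

    def act(self, state):
        # Drive the state toward the origin, correcting for the
        # dynamics shift inferred from the context.
        return -(state + self.context)

def rollout(policy, dynamics, state, model_idx, steps=10):
    """Roll the adaptable policy in one ensemble member's dynamics."""
    for _ in range(steps):
        action = policy.act(state)
        next_state = dynamics.step(state, action, model_idx)
        policy.update_context(state, action, next_state)
        state = next_state
    return state

dyn = EnsembleDynamics()
pol = AdaptablePolicy()
# Deploy in a "shifted" ensemble member (coefficient 1.2) that the nominal
# policy was not tuned for; the recurrent context absorbs the shift and
# keeps the state near the goal (the origin).
final_state = rollout(pol, dyn, np.array([1.0, 1.0]), model_idx=2)
```

Varying `model_idx` plays the role of sampling different ensemble members during training, which is what forces the context-conditioned policy to adapt rather than memorize one dynamics model.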
Pages: 15260-15274 (15 pages)