Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions

Times cited: 0
Authors
Chen, Xiong-Hui [1]
Luo, Fan-Ming [1]
Yu, Yang [1]
Li, Qingyang [2]
Qin, Zhiwei [2]
Shang, Wenjie [3]
Ye, Jieping [3]
Affiliations
[1] Nanjing University, National Key Laboratory for Novel Software Technology, Nanjing 210023, Jiangsu, People's Republic of China
[2] DiDi Labs, Mountain View, CA 94043, USA
[3] DiDi Chuxing, Beijing 300450, People's Republic of China
Funding
US National Science Foundation;
Keywords
Adaptation models; Uncertainty; Predictive models; Behavioral sciences; Extrapolation; Trajectory; Reinforcement learning; Adaptable policy learning; meta learning; model-based reinforcement learning; offline reinforcement learning;
DOI
10.1109/TPAMI.2023.3317131
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In reinforcement learning, a promising direction to avoid online trial-and-error costs is learning from an offline dataset. Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies. Such constraints, however, also limit the potential of the outcome policies. In this paper, to release the potential of offline policy learning, we investigate the decision-making problems in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). By this approach, instead of learning in in-support regions, we learn an adaptable policy that can adapt its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning techniques and ensemble model learning techniques. We conduct experiments on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than SOTA algorithms.
Pages: 15260-15274
Page count: 15
Related papers
50 in total
  • [1] Offline Model-based Adaptable Policy Learning
    Chen, Xiong-Hui
    Yu, Yang
    Li, Qingyang
    Luo, Fan-Ming
    Qin, Zhiwei
    Shang, Wenjie
    Ye, Jieping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [2] Comparing Robust Decision-Making and Dynamic Adaptive Policy Pathways for model-based decision support under deep uncertainty
    Kwakkel, Jan H.
    Haasnoot, Marjolijn
    Walker, Warren E.
    ENVIRONMENTAL MODELLING & SOFTWARE, 2016, 86 : 168 - 183
  • [3] Decision-making in a model-based design process
    Schade, Jutta
    Olofsson, Thomas
    Schreyer, Marcus
    CONSTRUCTION MANAGEMENT AND ECONOMICS, 2011, 29 (04) : 371 - 382
  • [4] Reduced Model-Based Decision-Making in Schizophrenia
    Culbreth, Adam J.
    Westbrook, Andrew
    Daw, Nathaniel D.
    Botvinick, Matthew
    Barch, Deanna M.
    JOURNAL OF ABNORMAL PSYCHOLOGY, 2016, 125 (06) : 777 - 787
  • [5] Using Autonomous Planning Agents to Provide Model-based Decision-making Support
    Hess, Traci J.
    Rees, Loren P.
    Rakes, Terry R.
    JOURNAL OF DECISION SYSTEMS, 2005, 14 (03) : 261 - 278
  • [6] Model-Based Support for Decision-Making in Architecture Evolution of Complex Software Systems
    Plakidas, Konstantinos
    Schall, Daniel
    Zdun, Uwe
    ECSA 2018: PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON SOFTWARE ARCHITECTURE: COMPANION PROCEEDINGS, 2018,
  • [7] Online model-based reinforcement learning for decision-making in long distance routes
    Alcaraz, Juan J.
    Losilla, Fernando
    Caballero-Arnaldos, Luis
    TRANSPORTATION RESEARCH PART E-LOGISTICS AND TRANSPORTATION REVIEW, 2022, 164
  • [8] Reduced model-based decision-making in gambling disorder
    Wyckmans, Florent
    Otto, A. Ross
    Sebold, Miriam
    Daw, Nathaniel
    Bechara, Antoine
    Saeremans, Melanie
    Kornreich, Charles
    Chatard, Armand
    Jaafari, Nemat
    Noel, Xavier
    SCIENTIFIC REPORTS, 2019, 9 (1)
  • [9] Reduced model-based decision-making in gambling disorder
    Florent Wyckmans
    A. Ross Otto
    Miriam Sebold
    Nathaniel Daw
    Antoine Bechara
    Mélanie Saeremans
    Charles Kornreich
    Armand Chatard
    Nemat Jaafari
    Xavier Noël
    Scientific Reports, 9
  • [10] Generative Model-Based Testing on Decision-Making Policies
    Li, Zhuo
    Wu, Xiongfei
    Zhu, Derui
    Cheng, Mingfei
    Chen, Siyuan
    Zhang, Fuyuan
    Xie, Xiaofei
    Ma, Lei
    Zhao, Jianjun
    2023 38TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE, 2023, : 243 - 254