Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions

Cited by: 0
Authors
Chen, Xiong-Hui [1]
Luo, Fan-Ming [1]
Yu, Yang [1]
Li, Qingyang [2]
Qin, Zhiwei [2]
Shang, Wenjie [3]
Ye, Jieping [3]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] DiDi Labs, Mountain View, CA 94043 USA
[3] DiDi Chuxing, Beijing 300450, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Adaptation models; Uncertainty; Predictive models; Behavioral sciences; Extrapolation; Trajectory; Reinforcement learning; Adaptable policy learning; meta learning; model-based reinforcement learning; offline reinforcement learning;
DOI
10.1109/TPAMI.2023.3317131
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In reinforcement learning, a promising direction for avoiding the cost of online trial and error is to learn from an offline dataset. Current offline reinforcement learning methods commonly restrict learning to the part of the policy space that stays within the in-support regions of the offline dataset, in order to ensure the robustness of the resulting policies. Such constraints, however, also limit the potential of those policies. In this paper, to release the potential of offline policy learning, we investigate decision-making in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). With this approach, instead of learning in in-support regions, we learn an adaptable policy that can adapt its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE based on meta-learning and ensemble model learning techniques. We conduct experiments on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieves better performance than state-of-the-art (SOTA) algorithms.
Pages: 15260-15274
Page count: 15