Parallel-mentoring for Offline Model-based Optimization

Citations: 0
Authors:
Chen, Can [1 ,2 ]
Beckham, Christopher [2 ,3 ]
Liu, Zixuan [4 ]
Liu, Xue [1 ,2 ]
Pal, Christopher [2 ,3 ]
Affiliations:
[1] McGill Univ, Montreal, PQ, Canada
[2] MILA Quebec AI Inst, Montreal, PQ, Canada
[3] Polytech Montreal, Montreal, PQ, Canada
[4] Univ Washington, Seattle, WA 98195 USA
Keywords:
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs span a variety of domains, including materials, robots, and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent on it to obtain new designs. However, this often yields poor designs because the proxy is inaccurate for out-of-distribution designs. Recent studies indicate that (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring, a novel and effective method that facilitates mentoring among parallel proxies, creating a more robust ensemble that mitigates the out-of-distribution issue. We focus on the three-proxy case, and our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises because the consensus may be incorrect. To alleviate this, we introduce an adaptive soft-labeling module whose soft labels are initialized as the consensus labels. Based on bi-level optimization, this module fine-tunes the proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor the proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
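To make the two modules more concrete, the sketch below illustrates, under simplifying assumptions of my own, how voting-based pairwise supervision and an alternating approximation of the bi-level soft-label update could look in PyTorch. The proxy architecture, the binary cross-entropy pairwise loss, the single-step inner update, and the outer objective (pulling the soft labels toward the fine-tuned proxies' average predictions) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F


def pairwise_logits(scores: torch.Tensor) -> torch.Tensor:
    # Logit that design i ranks above design j, modelled as the score gap s_i - s_j.
    return scores.unsqueeze(1) - scores.unsqueeze(0)  # shape (n, n)


def consensus_labels(proxies, x: torch.Tensor) -> torch.Tensor:
    # Voting-based pairwise supervision: each proxy casts a hard comparison label
    # for every design pair; majority voting over the three proxies gives the
    # consensus labels that initialise the soft labels.
    with torch.no_grad():
        votes = torch.stack(
            [(pairwise_logits(p(x).squeeze(-1)) > 0).float() for p in proxies]
        )  # (num_proxies, n, n)
    return (votes.mean(dim=0) > 0.5).float()  # (n, n)


def parallel_mentoring_step(proxies, x, lr_inner=1e-3, lr_outer=1e-2, inner_steps=1):
    # One round of mutual mentoring on a batch of candidate designs x.
    soft = consensus_labels(proxies, x).clone().requires_grad_(True)

    # Inner level (approximated by a few SGD steps): fine-tune each proxy so its
    # pairwise predictions match the current soft labels.
    for p in proxies:
        opt = torch.optim.SGD(p.parameters(), lr=lr_inner)
        for _ in range(inner_steps):
            loss = F.binary_cross_entropy_with_logits(
                pairwise_logits(p(x).squeeze(-1)), soft.detach()
            )
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Outer level (a hypothetical surrogate objective, used here only for
    # illustration): move the soft labels toward the fine-tuned proxies' average
    # pairwise predictions, so a possibly incorrect consensus is softened rather
    # than trusted blindly.
    with torch.no_grad():
        avg_pred = torch.stack(
            [torch.sigmoid(pairwise_logits(p(x).squeeze(-1))) for p in proxies]
        ).mean(dim=0)
    outer_loss = F.mse_loss(soft, avg_pred)
    (grad,) = torch.autograd.grad(outer_loss, soft)
    return (soft - lr_outer * grad).detach().clamp(0.0, 1.0)

A hypothetical usage would pass three small MLPs (each mapping a design batch of shape (n, d) to scores of shape (n, 1)) as proxies; the returned (n, n) tensor is the refined soft-label matrix for the next mentoring round.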
Pages: 18