Parallel-mentoring for Offline Model-based Optimization

Citations: 0
Authors:
Chen, Can [1 ,2 ]
Beckham, Christopher [2 ,3 ]
Liu, Zixuan [4 ]
Liu, Xue [1 ,2 ]
Pal, Christopher [2 ,3 ]
Affiliations:
[1] McGill Univ, Montreal, PQ, Canada
[2] MILA Quebec AI Inst, Montreal, PQ, Canada
[3] Polytech Montreal, Montreal, PQ, Canada
[4] Univ Washington, Seattle, WA 98195 USA
Keywords:
DOI: Not available
CLC Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs span a variety of domains, including materials, robots, and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent on it to obtain new designs. However, this often yields poor designs because the proxy is inaccurate for out-of-distribution designs. Recent studies indicate that (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring, a novel and effective method that facilitates mentoring among parallel proxies, creating a more robust ensemble that mitigates the out-of-distribution issue. We focus on the three-proxy case, and our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises because the consensus may be incorrect. To alleviate this, we introduce an adaptive soft-labeling module whose soft labels are initialized as the consensus labels. Based on bi-level optimization, this module fine-tunes the proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor the proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
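To make the two modules more concrete, the sketch below illustrates, under simplifying assumptions of my own, how voting-based pairwise supervision and an alternating approximation of the bi-level soft-label update could look in PyTorch. The proxy architecture, the binary cross-entropy pairwise loss, the single-step inner update, and the outer objective (pulling the soft labels toward the fine-tuned proxies' average predictions) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F


def pairwise_logits(scores: torch.Tensor) -> torch.Tensor:
    # Logit that design i ranks above design j, modelled as the score gap s_i - s_j.
    return scores.unsqueeze(1) - scores.unsqueeze(0)  # shape (n, n)


def consensus_labels(proxies, x: torch.Tensor) -> torch.Tensor:
    # Voting-based pairwise supervision: each proxy casts a hard comparison label
    # for every design pair; majority voting over the three proxies gives the
    # consensus labels that initialise the soft labels.
    with torch.no_grad():
        votes = torch.stack(
            [(pairwise_logits(p(x).squeeze(-1)) > 0).float() for p in proxies]
        )  # (num_proxies, n, n)
    return (votes.mean(dim=0) > 0.5).float()  # (n, n)


def parallel_mentoring_step(proxies, x, lr_inner=1e-3, lr_outer=1e-2, inner_steps=1):
    # One round of mutual mentoring on a batch of candidate designs x.
    soft = consensus_labels(proxies, x).clone().requires_grad_(True)

    # Inner level (approximated by a few SGD steps): fine-tune each proxy so its
    # pairwise predictions match the current soft labels.
    for p in proxies:
        opt = torch.optim.SGD(p.parameters(), lr=lr_inner)
        for _ in range(inner_steps):
            loss = F.binary_cross_entropy_with_logits(
                pairwise_logits(p(x).squeeze(-1)), soft.detach()
            )
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Outer level (a hypothetical surrogate objective, used here only for
    # illustration): move the soft labels toward the fine-tuned proxies' average
    # pairwise predictions, so a possibly incorrect consensus is softened rather
    # than trusted blindly.
    with torch.no_grad():
        avg_pred = torch.stack(
            [torch.sigmoid(pairwise_logits(p(x).squeeze(-1))) for p in proxies]
        ).mean(dim=0)
    outer_loss = F.mse_loss(soft, avg_pred)
    (grad,) = torch.autograd.grad(outer_loss, soft)
    return (soft - lr_outer * grad).detach().clamp(0.0, 1.0)

A hypothetical usage would pass three small MLPs (each mapping a design batch of shape (n, d) to scores of shape (n, 1)) as proxies; the returned (n, n) tensor is the refined soft-label matrix for the next mentoring round.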
Pages: 18