Parallel-mentoring for Offline Model-based Optimization

Cited by: 0

Authors:

Chen, Can [1,2]

Beckham, Christopher [2,3]

Liu, Zixuan [4]

Liu, Xue [1,2]

Pal, Christopher [2,3]

Affiliations:

[1] McGill Univ, Montreal, PQ, Canada

[2] MILA Quebec AI Inst, Montreal, PQ, Canada

[3] Polytech Montreal, Montreal, PQ, Canada

[4] Univ Washington, Seattle, WA 98195 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often results in poor designs due to the proxy inaccuracies for out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring as an effective and novel method that facilitates mentoring among parallel proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case and our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises due to possible incorrect consensus. To alleviate this, we introduce an adaptive soft-labeling module with soft-labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
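The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pairwise_labels` and `consensus_labels`, the label convention (1 when design i is scored above design j), and the example scores are all assumptions made for illustration. It shows only how pairwise comparison labels from three proxies could be combined by majority voting; the adaptive soft-labeling module is not covered.

```python
import numpy as np

def pairwise_labels(scores):
    """Hypothetical helper: L[i, j] = 1 if design i is scored above design j, else 0."""
    s = np.asarray(scores, dtype=float)
    return (s[:, None] > s[None, :]).astype(int)

def consensus_labels(proxy_scores):
    """Hypothetical helper: majority vote over the pairwise labels of an odd number of proxies."""
    votes = np.stack([pairwise_labels(s) for s in proxy_scores])  # shape (P, n, n)
    # A pair gets label 1 when more than half of the proxies vote 1.
    return (votes.sum(axis=0) * 2 > votes.shape[0]).astype(int)

# Made-up scores from three proxies for three candidate designs (illustration only).
proxy_scores = [
    [0.9, 0.5, 0.1],  # proxy 1 ranks design 0 > 1 > 2
    [0.8, 0.6, 0.2],  # proxy 2 agrees with proxy 1
    [0.3, 0.7, 0.4],  # proxy 3 ranks design 1 > 2 > 0
]
labels = consensus_labels(proxy_scores)
print(labels)
```

In this example the third proxy is outvoted on every pair involving design 0, so the consensus labels follow the ranking shared by the first two proxies.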

Pages: 18

[31] Population model-based optimization
Chen, Xi
Zhou, Enlu
JOURNAL OF GLOBAL OPTIMIZATION, 2015, 63 (01) : 125 - 148
[32] Population model-based optimization
Xi Chen
Enlu Zhou
Journal of Global Optimization, 2015, 63 : 125 - 148
[33] MODEL-BASED EVOLUTIONARY OPTIMIZATION
Wang, Yongqiang
Fu, Michael C.
Marcus, Steven I.
PROCEEDINGS OF THE 2010 WINTER SIMULATION CONFERENCE, 2010 : 1199 - 1210
[34] Model-Based Optimization for Robotics
Mombaur, Katja
Kheddar, Abderrahmane
Harada, Kensuke
Buschmann, Thomas
Atkeson, Chris
IEEE ROBOTICS & AUTOMATION MAGAZINE, 2014, 21 (03) : 24 - 161
[35] Model-Based Offline Reinforcement Learning for Autonomous Delivery of Guidewire
Li, Hao
Zhou, Xiao-Hu
Xie, Xiao-Liang
Liu, Shi-Qi
Feng, Zhen-Qiu
Gui, Mei-Jiang
Xiang, Tian-Yu
Huang, De-Xing
Hou, Zeng-Guang
IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2024, 6 (03) : 1054 - 1062
[36] Bayesian Model-Based Offline Reinforcement Learning for Product Allocation
Jenkins, Porter
Wei, Hua
Jenkins, J. Stockton
Li, Zhenhui
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022 : 12531 - 12537
[37] A MODEL-BASED APPROACH FOR DESCRIBING OFFLINE NAVIGATION OF WEB APPLICATIONS
Albertos-Marco, Felix
Penichet, Victor M. R.
Gallud, Jose A.
Winckler, Marco
JOURNAL OF WEB ENGINEERING, 2017, 16 (1-2) : 1 - 38
[38] SETTLING THE SAMPLE COMPLEXITY OF MODEL-BASED OFFLINE REINFORCEMENT LEARNING
Li, Gen
Shi, Laixi
Chen, Yuxin
Chi, Yuejie
Wei, Yuting
ANNALS OF STATISTICS, 2024, 52 (01) : 233 - 260
[39] Discriminator-Guided Model-Based Offline Imitation Learning
Zhang, Wenjia
Xu, Haoran
Niu, Haoyi
Cheng, Peng
Li, Ming
Zhang, Heming
Zhou, Guyue
Zhan, Xianyuan
CONFERENCE ON ROBOT LEARNING, VOL 205, 2022, 205 : 1266 - 1276
[40] Bidirectional Learning for Offline Model-based Biological Sequence Design
Chen, Can
Zhang, Yingxue
Liu, Xue
Coates, Mark
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
