Parallel-mentoring for Offline Model-based Optimization

Cited by: 0
Authors
Chen, Can [1 ,2 ]
Beckham, Christopher [2 ,3 ]
Liu, Zixuan [4 ]
Liu, Xue [1 ,2 ]
Pal, Christopher [2 ,3 ]
Affiliations
[1] McGill Univ, Montreal, PQ, Canada
[2] MILA Quebec AI Inst, Montreal, PQ, Canada
[3] Polytech Montreal, Montreal, PQ, Canada
[4] Univ Washington, Seattle, WA 98195 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs span a variety of domains, including materials, robots, and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often yields poor designs because the proxy is inaccurate on out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring, a novel and effective method that facilitates mentoring among parallel proxies, creating a more robust ensemble that mitigates the out-of-distribution issue. We focus on the three-proxy case, where our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises when the consensus is incorrect. To alleviate this, we introduce an adaptive soft-labeling module with soft labels initialized as the consensus labels. Based on bi-level optimization, this module fine-tunes the proxies at the inner level and learns more accurate labels at the outer level to adaptively mentor the proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
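The voting-based pairwise supervision described in the abstract can be illustrated concretely. Below is a minimal PyTorch sketch, not the authors' released implementation, assuming three proxies that each score the same batch of candidate designs; the function names (pairwise_comparisons, consensus_labels, mentoring_loss) and the Bradley-Terry-style loss form are illustrative assumptions, and the adaptive bi-level soft-label refinement is omitted.

# A minimal sketch, assuming three proxies that each output a score vector
# for the same batch of candidate designs. Function names are hypothetical;
# the adaptive (bi-level) soft-label update from the paper is omitted.
import torch
import torch.nn.functional as F

def pairwise_comparisons(scores: torch.Tensor) -> torch.Tensor:
    # (n,) scores -> (n, n) matrix; entry (i, j) is 1.0 if design i is
    # scored above design j by this proxy, else 0.0.
    return (scores.unsqueeze(1) > scores.unsqueeze(0)).float()

def consensus_labels(score_list) -> torch.Tensor:
    # Majority vote over the proxies' pairwise labels: with three proxies,
    # pair (i, j) is labeled 1 when at least two proxies rank i above j.
    votes = torch.stack([pairwise_comparisons(s) for s in score_list])
    return (votes.mean(dim=0) > 0.5).float()

def mentoring_loss(scores: torch.Tensor, soft_labels: torch.Tensor) -> torch.Tensor:
    # Score differences act as pairwise logits, pushed toward the (soft)
    # consensus labels so each proxy is mentored by the ensemble's ranking.
    logits = scores.unsqueeze(1) - scores.unsqueeze(0)
    return F.binary_cross_entropy_with_logits(logits, soft_labels)

# Illustrative usage with random scores standing in for proxy outputs.
torch.manual_seed(0)
proxy_scores = [torch.randn(8, requires_grad=True) for _ in range(3)]
labels = consensus_labels([s.detach() for s in proxy_scores])  # soft-label init
loss = sum(mentoring_loss(s, labels) for s in proxy_scores)
loss.backward()

In the paper, the consensus labels only initialize the soft labels; the adaptive soft-labeling module then refines them at the outer level of a bi-level optimization while the proxies are fine-tuned at the inner level.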
Pages: 18