Parallel-mentoring for Offline Model-based Optimization

Cited by: 0
Authors
Chen, Can [1 ,2 ]
Beckham, Christopher [2 ,3 ]
Liu, Zixuan [4 ]
Liu, Xue [1 ,2 ]
Pal, Christopher [2 ,3 ]
Affiliations
[1] McGill Univ, Montreal, PQ, Canada
[2] MILA Quebec AI Inst, Montreal, PQ, Canada
[3] Polytech Montreal, Montreal, PQ, Canada
[4] Univ Washington, Seattle, WA 98195 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs span a variety of domains, including materials, robots, and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often yields poor designs because the proxy is inaccurate on out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring, a novel and effective method that facilitates mentoring among parallel proxies, creating a more robust ensemble that mitigates the out-of-distribution issue. We focus on the three-proxy case, where our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises when the consensus is incorrect. To alleviate this, we introduce an adaptive soft-labeling module with soft labels initialized as the consensus labels. Based on bi-level optimization, this module fine-tunes the proxies at the inner level and learns more accurate labels at the outer level to adaptively mentor the proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
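The voting-based pairwise supervision described in the abstract can be illustrated concretely. Below is a minimal PyTorch sketch, not the authors' released implementation, assuming three proxies that each score the same batch of candidate designs; the function names (pairwise_comparisons, consensus_labels, mentoring_loss) and the Bradley-Terry-style loss form are illustrative assumptions, and the adaptive bi-level soft-label refinement is omitted.

# A minimal sketch, assuming three proxies that each output a score vector
# for the same batch of candidate designs. Function names are hypothetical;
# the adaptive (bi-level) soft-label update from the paper is omitted.
import torch
import torch.nn.functional as F

def pairwise_comparisons(scores: torch.Tensor) -> torch.Tensor:
    # (n,) scores -> (n, n) matrix; entry (i, j) is 1.0 if design i is
    # scored above design j by this proxy, else 0.0.
    return (scores.unsqueeze(1) > scores.unsqueeze(0)).float()

def consensus_labels(score_list) -> torch.Tensor:
    # Majority vote over the proxies' pairwise labels: with three proxies,
    # pair (i, j) is labeled 1 when at least two proxies rank i above j.
    votes = torch.stack([pairwise_comparisons(s) for s in score_list])
    return (votes.mean(dim=0) > 0.5).float()

def mentoring_loss(scores: torch.Tensor, soft_labels: torch.Tensor) -> torch.Tensor:
    # Score differences act as pairwise logits, pushed toward the (soft)
    # consensus labels so each proxy is mentored by the ensemble's ranking.
    logits = scores.unsqueeze(1) - scores.unsqueeze(0)
    return F.binary_cross_entropy_with_logits(logits, soft_labels)

# Illustrative usage with random scores standing in for proxy outputs.
torch.manual_seed(0)
proxy_scores = [torch.randn(8, requires_grad=True) for _ in range(3)]
labels = consensus_labels([s.detach() for s in proxy_scores])  # soft-label init
loss = sum(mentoring_loss(s, labels) for s in proxy_scores)
loss.backward()

In the paper, the consensus labels only initialize the soft labels; the adaptive soft-labeling module then refines them at the outer level of a bi-level optimization while the proxies are fine-tuned at the inner level.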
Pages: 18