Hyperparameter Tuning in Offline Reinforcement Learning

Cited by: 0
Authors
Tittaferrante, Andrew [1]
Yassine, Abdulsalam [2 ]
Affiliations
[1] Lakehead Univ, Elect & Comp Engn, Thunder Bay, ON, Canada
[2] Lakehead Univ, Software Engn, Thunder Bay, ON, Canada
Keywords
Deep Learning; Reinforcement Learning; Offline Reinforcement Learning;
DOI
10.1109/ICMLA55696.2022.00101
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a reliable hyperparameter tuning scheme for offline reinforcement learning. We demonstrate the scheme on the simplest antmaze environment from D4RL, the standard offline benchmark dataset. The usual approach to policy evaluation in offline reinforcement learning involves online evaluation, i.e., cherry-picking the best performance on the test environment. To mitigate this cherry-picking, we propose an ad-hoc online evaluation metric, which we name "median-median-return". This metric enables more reliable reporting of results because it represents the expected performance of the learned policy, taking the median online evaluation performance across both epochs and training runs. To demonstrate our scheme, we employ IQL, a recent state-of-the-art algorithm, and perform a thorough hyperparameter search based on the proposed metric. The tuned architectures enjoy notably stronger cherry-picked performance, and the best models surpass the reported state-of-the-art performance on average.
Pages: 585-590 (6 pages)
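The "median-median-return" metric described in the abstract can be sketched as follows. This is a minimal illustration with made-up evaluation returns, assuming (as the name suggests, though the abstract does not spell out the order of operations) a nested median: first over evaluation epochs within each training run, then over runs.

```python
import numpy as np

# Hypothetical evaluation returns: rows = training runs (seeds),
# columns = periodic online-evaluation epochs.
returns = np.array([
    [0.2, 0.5, 0.7, 0.6],   # run 1
    [0.1, 0.4, 0.6, 0.8],   # run 2
    [0.3, 0.6, 0.5, 0.7],   # run 3
])

# Median over epochs within each run, then median of those
# per-run medians across runs.
per_run_median = np.median(returns, axis=1)
median_median_return = float(np.median(per_run_median))

print(median_median_return)  # prints 0.55
```

Because the median is robust to outliers in both directions, a single lucky evaluation epoch (the cherry-picked maximum) or one unusually good seed cannot dominate the reported number, which is the property the abstract appeals to.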