Distributional Pareto-Optimal Multi-Objective Reinforcement Learning

Cited by: 0

Authors:

Cai, Xin-Qiang [1,2]

Zhang, Pushi [2]

Zhao, Li [2]

Bian, Jiang [2]

Sugiyama, Masashi [1,3]

Llorens, Ashley J. [2]

Affiliations:

[1] Univ Tokyo, Tokyo, Japan

[2] Microsoft Res Asia, Beijing, Peoples R China

[3] RIKEN AIP, Tokyo, Japan

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023

Keywords:

STOCHASTIC-DOMINANCE;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-objective reinforcement learning (MORL) has been proposed to learn control policies over multiple competing objectives for each possible preference over returns. However, current MORL algorithms fail to account for distributional preferences over the multivariate returns, which are particularly important in real-world scenarios such as autonomous driving. To address this issue, we extend the concept of Pareto-optimality in MORL into distributional Pareto-optimality, which captures the optimality of return distributions rather than of their expectations. Our proposed method, called Distributional Pareto-Optimal Multi-Objective Reinforcement Learning (DPMORL), is capable of learning distributional Pareto-optimal policies that balance multiple objectives while considering the return uncertainty. We evaluated our method on several benchmark problems and demonstrated its effectiveness in discovering distributional Pareto-optimal policies and satisfying diverse distributional preferences compared to existing MORL methods.
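
The distinction the abstract draws between comparing expectations and comparing return distributions can be illustrated with a small sketch. This is not the DPMORL algorithm; the two policies, their sampled returns, and the worst-case criterion below are all hypothetical, chosen only to show how two policies that tie on expected return can still be separated by a preference over the distribution.

```python
from statistics import mean

def dominates(a, b):
    """True if return vector a Pareto-dominates b (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def expected_return(samples):
    """Per-objective mean over a list of sampled return vectors."""
    return tuple(mean(obj) for obj in zip(*samples))

def worst_case(samples):
    """Per-objective minimum: one hypothetical risk-averse distributional preference."""
    return tuple(min(obj) for obj in zip(*samples))

# Hypothetical two-objective returns sampled from two policies.
safe_policy = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]
risky_policy = [(10.0, 10.0), (0.0, 0.0), (10.0, 10.0), (0.0, 0.0)]

mean_safe = expected_return(safe_policy)
mean_risky = expected_return(risky_policy)

# Under expectations alone, neither policy dominates the other.
expectations_tie = (not dominates(mean_safe, mean_risky)
                    and not dominates(mean_risky, mean_safe))

# A preference that looks at the distribution (here, its worst case) separates them.
prefers_safe = dominates(worst_case(safe_policy), worst_case(risky_policy))
```

Here both policies have the same expected return vector, so an expectation-based Pareto comparison cannot rank them, while the assumed worst-case criterion prefers the lower-variance policy.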

Pages: 21
