Learning and Selection of Pareto Optimal Policies Matching User Preferences

Cited by: 0
Authors
Tamura, Akinori [1 ]
Arai, Sachiyo [1 ]
Affiliations
[1] Chiba Univ, Dept Global & Environm Studies, Chiba 2638522, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Multi-objective reinforcement learning; Pareto front; preference;
DOI
10.1109/ACCESS.2024.3428411
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Reinforcement learning, which is attracting attention as a method for optimizing sequential decision-making, primarily focuses on scenarios with a single objective. In contrast, real-world decision-making involves multiple objectives, often with trade-offs among them. Multi-Objective Reinforcement Learning (MORL) addresses sequential decision-making with multiple objectives. Previous research in MORL has mainly focused on learning the set of Pareto optimal policies, known as the Pareto front. However, these studies do not discuss how to select policies from the Pareto front that align with user preferences, and the utility values of the policies in the Pareto front do not necessarily reflect those preferences. Therefore, an evaluation metric that captures the relationship between user preferences and policies is needed to select the policy that matches user preferences. We introduce an MORL method that incorporates a "Mismatch" metric to evaluate the similarity between preferences and policies. The proposed method treats the Mismatch metric as a penalty term in the evaluation function and learns the Pareto front using MORL with an evolutionary algorithm. Users can then specify preference weight parameters and select, from the learned Pareto front, the policies that minimize Mismatch and thus align with their preferences. We verified the performance of the proposed method through computer experiments in a multi-objective reinforcement learning benchmark environment.
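As a rough, non-authoritative illustration of the selection step described in the abstract (choosing a policy from an already-learned Pareto front given preference weights), the sketch below assumes a Mismatch defined as the cosine distance between the user's preference weight vector and a policy's normalized expected-return vector. The policy names, return values, and this particular definition of Mismatch are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mismatch(preference, returns):
    """Illustrative Mismatch: cosine distance between the user's preference
    weights and a policy's expected-return vector (both L2-normalized).
    The paper's exact Mismatch definition may differ; this is only a sketch."""
    p = np.asarray(preference, dtype=float)
    r = np.asarray(returns, dtype=float)
    p = p / np.linalg.norm(p)
    r = r / np.linalg.norm(r)
    return 1.0 - float(np.dot(p, r))  # 0.0 means the directions match exactly

def select_policy(pareto_front, preference):
    """Pick the policy on the learned Pareto front whose return vector
    minimizes the mismatch with the given preference weights."""
    scores = [mismatch(preference, returns) for _, returns in pareto_front]
    best = int(np.argmin(scores))
    return pareto_front[best], scores[best]

# Hypothetical Pareto front: (policy name, expected return per objective)
pareto_front = [
    ("pi_1", [9.0, 1.0]),
    ("pi_2", [6.0, 5.0]),
    ("pi_3", [2.0, 8.5]),
]

# User preference weights over the two objectives (an even trade-off here)
(policy, returns), score = select_policy(pareto_front, preference=[0.5, 0.5])
print(policy, returns, round(score, 3))  # pi_2 is closest to an even trade-off
```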
Pages: 97280 - 97297
Number of pages: 18
Related Papers
50 records in total
  • [1] Learning Optimal Subsets with Implicit User Preferences
    Guo, Yunsong
    Gomes, Carla
    21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 1052 - 1057
  • [2] Stable and Pareto optimal group activity selection from ordinal preferences
    Andreas Darmann
    International Journal of Game Theory, 2018, 47 : 1183 - 1209
  • [3] Stable and Pareto optimal group activity selection from ordinal preferences
    Darmann, Andreas
    INTERNATIONAL JOURNAL OF GAME THEORY, 2018, 47 (04) : 1183 - 1209
  • [4] Dynamic Pareto Optimal Matching
    Fleischer, Rudolf
    Wang, Yihui
    ISISE 2008: INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING, VOL 2, 2008, : 797 - 802
  • [5] Reinforcement Learning of Pareto-Optimal Multiobjective Policies Using Steering
    Vamplew, Peter
    Issabekov, Rustam
    Dazeley, Richard
    Foale, Cameron
    AI 2015: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2015, 9457 : 596 - 608
  • [6] Optimal lossy matching by Pareto fronts
    Allen, Jeffery C.
    Arceo, Diana
    Hansen, Peder
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2008, 55 (06) : 497 - 501
  • [7] Pareto Optimal Allocation under Uncertain Preferences
    Aziz, Haris
    de Haan, Ronald
    Rastegari, Baharak
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 77 - 83
  • [8] Pareto Optimal Allocation under Uncertain Preferences
    Aziz, Haris
    de Haan, Ronald
    Rastegari, Baharak
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1472 - 1474
  • [9] PARETO OPTIMAL POLICIES FOR HARVESTING WITH MULTIPLE OBJECTIVES
    MENDELSSOHN, R
    MATHEMATICAL BIOSCIENCES, 1980, 51 (3-4) : 213 - 224
  • [10] Stochastic Pareto-optimal reinsurance policies
    Zeng, Xudong
    Luo, Shangzhen
    INSURANCE MATHEMATICS & ECONOMICS, 2013, 53 (03): : 671 - 677