Learning and Selection of Pareto Optimal Policies Matching User Preferences

Cited by: 0
Authors
Tamura, Akinori [1 ]
Arai, Sachiyo [1 ]
Affiliations
[1] Chiba Univ, Dept Global & Environm Studies, Chiba 2638522, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Multi-objective reinforcement learning; Pareto front; preference;
DOI
10.1109/ACCESS.2024.3428411
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Reinforcement learning, which is attracting attention as a method for optimizing sequential decision-making, primarily focuses on scenarios with a single objective. In contrast, real-world decision-making involves multiple objectives, often with trade-offs among them. Multi-Objective Reinforcement Learning (MORL) addresses sequential decision-making with multiple objectives. Previous research in MORL has mainly focused on learning the set of Pareto optimal policies, known as the Pareto front. However, these studies do not discuss how to select policies from the Pareto front that align with user preferences, and the utility values of the policies in the Pareto front do not necessarily reflect those preferences. Therefore, an evaluation metric that captures the relationship between user preferences and policies is needed to select the policy that matches user preferences. We introduce an MORL method that incorporates a "Mismatch" metric to evaluate the similarity between preferences and policies. The proposed method treats the Mismatch metric as a penalty term in the evaluation function and learns the Pareto front using MORL with an evolutionary algorithm. Users can then specify preference weight parameters and select, from the learned Pareto front, the policies that minimize Mismatch and thus align with their preferences. We verified the performance of the proposed method through computer experiments in a multi-objective reinforcement learning benchmark environment.
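As a rough, non-authoritative illustration of the selection step described in the abstract (choosing a policy from an already-learned Pareto front given preference weights), the sketch below assumes a Mismatch defined as the cosine distance between the user's preference weight vector and a policy's normalized expected-return vector. The policy names, return values, and this particular definition of Mismatch are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mismatch(preference, returns):
    """Illustrative Mismatch: cosine distance between the user's preference
    weights and a policy's expected-return vector (both L2-normalized).
    The paper's exact Mismatch definition may differ; this is only a sketch."""
    p = np.asarray(preference, dtype=float)
    r = np.asarray(returns, dtype=float)
    p = p / np.linalg.norm(p)
    r = r / np.linalg.norm(r)
    return 1.0 - float(np.dot(p, r))  # 0.0 means the directions match exactly

def select_policy(pareto_front, preference):
    """Pick the policy on the learned Pareto front whose return vector
    minimizes the mismatch with the given preference weights."""
    scores = [mismatch(preference, returns) for _, returns in pareto_front]
    best = int(np.argmin(scores))
    return pareto_front[best], scores[best]

# Hypothetical Pareto front: (policy name, expected return per objective)
pareto_front = [
    ("pi_1", [9.0, 1.0]),
    ("pi_2", [6.0, 5.0]),
    ("pi_3", [2.0, 8.5]),
]

# User preference weights over the two objectives (an even trade-off here)
(policy, returns), score = select_policy(pareto_front, preference=[0.5, 0.5])
print(policy, returns, round(score, 3))  # pi_2 is closest to an even trade-off
```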
Pages: 97280 - 97297
Number of pages: 18
Related Papers
50 records in total
  • [1] Learning Optimal Subsets with Implicit User Preferences
    Guo, Yunsong
    Gomes, Carla
    21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, 2009, : 1052 - 1057
  • [2] Stable and Pareto optimal group activity selection from ordinal preferences
    Andreas Darmann
    International Journal of Game Theory, 2018, 47 : 1183 - 1209
  • [3] Stable and Pareto optimal group activity selection from ordinal preferences
    Darmann, Andreas
    INTERNATIONAL JOURNAL OF GAME THEORY, 2018, 47 (04) : 1183 - 1209
  • [4] Dynamic Pareto Optimal Matching
    Fleischer, Rudolf
    Wang, Yihui
    ISISE 2008: INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING, VOL 2, 2008, : 797 - 802
  • [5] Reinforcement Learning of Pareto-Optimal Multiobjective Policies Using Steering
    Vamplew, Peter
    Issabekov, Rustam
    Dazeley, Richard
    Foale, Cameron
    AI 2015: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2015, 9457 : 596 - 608
  • [6] Optimal lossy matching by Pareto fronts
    Allen, Jeffery C.
    Arceo, Diana
    Hansen, Peder
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2008, 55 (06) : 497 - 501
  • [7] Pareto Optimal Allocation under Uncertain Preferences
    Aziz, Haris
    de Haan, Ronald
    Rastegari, Baharak
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 77 - 83
  • [8] Pareto Optimal Allocation under Uncertain Preferences
    Aziz, Haris
    de Haan, Ronald
    Rastegari, Baharak
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 1472 - 1474
  • [9] PARETO OPTIMAL POLICIES FOR HARVESTING WITH MULTIPLE OBJECTIVES
    MENDELSSOHN, R
    MATHEMATICAL BIOSCIENCES, 1980, 51 (3-4) : 213 - 224
  • [10] Stochastic Pareto-optimal reinsurance policies
    Zeng, Xudong
    Luo, Shangzhen
    INSURANCE MATHEMATICS & ECONOMICS, 2013, 53 (03): : 671 - 677