Learning and Selection of Pareto Optimal Policies Matching User Preferences

Cited by: 0
Authors
Tamura, Akinori [1 ]
Arai, Sachiyo [1 ]
Affiliations
[1] Chiba Univ, Dept Global & Environm Studies, Chiba 2638522, Japan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Multi-objective reinforcement learning; Pareto front; preference
DOI
10.1109/ACCESS.2024.3428411
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Reinforcement learning, which is attracting attention as a method for optimizing sequential decision-making, primarily focuses on scenarios with a single objective. In contrast, real-world decision-making involves multiple objectives, often with trade-offs among them. Multi-Objective Reinforcement Learning (MORL) addresses sequential decision-making with multiple objectives. Previous research in MORL has mainly focused on learning the set of Pareto optimal policies, known as the Pareto front. However, these studies do not discuss how to select policies from the Pareto front that align with user preferences, and the utility values of the policies in the Pareto front do not necessarily reflect those preferences. An evaluation metric that captures the relationship between user preferences and policies is therefore needed to select the policy that best matches the user's preferences. We introduce a MORL method that incorporates a "Mismatch" metric to evaluate the similarity between preferences and policies. The proposed method treats the Mismatch metric as a penalty term in the evaluation function and learns the Pareto front using MORL with an evolutionary algorithm. Users can specify preference weight parameters and select, from the learned Pareto front, the policies that minimize Mismatch and thus align with their preferences. We verified the performance of the proposed method by computer experiments in a multi-objective reinforcement learning benchmark environment.
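The selection step described in the abstract can be illustrated with a minimal Python sketch. This is a reading of the abstract only, not the paper's implementation: since the exact Mismatch definition is not given here, a cosine-distance stand-in between the user's preference-weight vector and each policy's multi-objective return vector is assumed, and the names mismatch, select_policy, and pareto_front are hypothetical.

import numpy as np

def mismatch(preference, returns):
    # Assumed stand-in for the paper's Mismatch metric: 1 - cosine similarity
    # between the preference weights and a policy's multi-objective return vector.
    p = np.asarray(preference, dtype=float)
    r = np.asarray(returns, dtype=float)
    return 1.0 - np.dot(p, r) / (np.linalg.norm(p) * np.linalg.norm(r) + 1e-12)

def select_policy(pareto_front, preference):
    # Pick the Pareto-front policy whose return vector minimizes the mismatch
    # with the user's preference weights, as the abstract describes.
    scores = [mismatch(preference, returns) for returns, _policy in pareto_front]
    return pareto_front[int(np.argmin(scores))][1]

# Hypothetical learned Pareto front as (return_vector, policy_id) pairs.
pareto_front = [([0.9, 0.1], "policy_A"), ([0.5, 0.5], "policy_B"), ([0.1, 0.9], "policy_C")]
print(select_policy(pareto_front, preference=[0.4, 0.6]))  # -> policy_B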
Pages: 97280-97297
Page count: 18
    2021 IEEE 8TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2021,