Distributional Pareto-Optimal Multi-Objective Reinforcement Learning

Citations: 0
Authors
Cai, Xin-Qiang [1, 2]
Zhang, Pushi [2 ]
Zhao, Li [2 ]
Bian, Jiang [2 ]
Sugiyama, Masashi [1, 3]
Llorens, Ashley J. [2 ]
Affiliations
[1] University of Tokyo, Tokyo, Japan
[2] Microsoft Research Asia, Beijing, People's Republic of China
[3] RIKEN AIP, Tokyo, Japan
Keywords
Stochastic dominance
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-objective reinforcement learning (MORL) has been proposed to learn control policies over multiple competing objectives for every possible preference over returns. However, current MORL algorithms fail to account for distributional preferences over the multivariate returns, which are particularly important in real-world scenarios such as autonomous driving. To address this issue, we extend the concept of Pareto-optimality in MORL to distributional Pareto-optimality, which captures the optimality of return distributions rather than only their expectations. Our proposed method, Distributional Pareto-Optimal Multi-Objective Reinforcement Learning (DPMORL), learns distributional Pareto-optimal policies that balance multiple objectives while accounting for return uncertainty. We evaluated our method on several benchmark problems and demonstrated that, compared to existing MORL methods, it is effective at discovering distributional Pareto-optimal policies and at satisfying diverse distributional preferences.
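The gap between expectations and return distributions can be illustrated with a small sketch. It is not the DPMORL algorithm; the two policies, their sampled returns, and the helper names (`expected_return`, `pareto_dominates`, `expected_utility`) are hypothetical, and the worst-objective utility is only one assumed example of a distributional preference.

```python
from statistics import mean

def expected_return(samples):
    """Per-objective mean of a list of sampled return vectors."""
    return tuple(mean(obj) for obj in zip(*samples))

def pareto_dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def expected_utility(samples, utility):
    """Mean utility over sampled return vectors (depends on the whole distribution)."""
    return mean(utility(r) for r in samples)

# Hypothetical sampled returns (objective 1, objective 2) for two policies.
steady = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]
risky = [(10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (0.0, 0.0)]

mean_steady = expected_return(steady)
mean_risky = expected_return(risky)

# Both policies have the same expected return vector, so an
# expectation-based Pareto comparison cannot separate them.
print(mean_steady, mean_risky)
print(pareto_dominates(mean_steady, mean_risky))

# Assumed risk-sensitive preference: the value of the worst objective per episode.
u_steady = expected_utility(steady, min)
u_risky = expected_utility(risky, min)
print(u_steady, u_risky)
```

Under this assumed preference the steady policy is preferred even though the expected returns are identical, which is the kind of distinction that a purely expectation-based notion of Pareto-optimality cannot express.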
Pages: 21