Continuous-Time Fitted Value Iteration for Robust Policies

Cited by: 3
Authors
Lutter, Michael [1 ]
Belousov, Boris [1 ]
Mannor, Shie [2 ,3 ]
Fox, Dieter [4 ]
Garg, Animesh [5 ]
Peters, Jan [1 ]
Affiliations
[1] Tech Univ Darmstadt, Comp Sci Dept, Intelligent Autonomous Syst Grp, D-64289 Darmstadt, Germany
[2] Technion Israel Inst Technol, Andrew & Erna Viterbi Fac Elect & Comp Engn, IL-3200003 Haifa, Israel
[3] NVIDIA, IL-6121002 Tel Aviv, Israel
[4] Univ Washington, Allen Sch Comp Sci & Engn, NVIDIA, Seattle, WA 98195 USA
[5] Univ Toronto, Comp Sci Dept, NVIDIA, Toronto, ON M5S 1A4, Canada
Keywords
Mathematical models; Optimization; Differential equations; Robots; Heuristic algorithms; Reinforcement learning; Costs; Value iteration; continuous control; dynamic programming; adversarial reinforcement learning
DOI
10.1109/TPAMI.2022.3215769
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Solving the Hamilton-Jacobi-Bellman equation is important in many domains, including control, robotics, and economics. Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important, as it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary controlling the environment and minimizing the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper, we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems to derive the optimal policy and optimal adversary in closed form. This analytic expression simplifies the differential equations and enables us to solve for the optimal value function using value iteration for continuous actions and states, as well as for the adversarial case. Notably, the resulting algorithms do not require discretization of states or actions. We apply the resulting algorithms to the Furuta pendulum and cartpole and show that both algorithms obtain the optimal policy. The robustness Sim2Real experiments on the physical systems show that the policies successfully achieve the task in the real world. When changing the masses of the pendulum, we observe that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi.
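The core idea of the abstract — control-affine dynamics plus a reward that separates into state and action terms admit a closed-form greedy action, so value iteration can be run without discretizing the action space — can be illustrated with a minimal 1D sketch. This is a hypothetical toy example, not the paper's implementation: the dynamics x_dot = a*x + b*u, the quadratic costs, the grid-based value representation, and all constants below are illustrative assumptions. For reward r(x, u) = -q*x^2 - c*u^2, maximizing the Hamiltonian over u gives the closed-form action u* = (b / (2c)) * dV/dx.

```python
import numpy as np

# Hypothetical 1D control-affine system (illustration only, not from the paper):
#   dynamics: x_dot = a*x + b*u
#   reward:   r(x, u) = -q*x**2 - c*u**2   (separable in state and action)
# The HJB maximization over u then has the closed form u* = (b / (2*c)) * dV/dx,
# so no action discretization or inner optimization loop is needed.
a, b, q, c = -0.2, 1.0, 1.0, 0.1
dt, gamma = 0.01, 0.999
xs = np.linspace(-2.0, 2.0, 201)       # state grid (a stand-in for a function approximator)
V = np.zeros_like(xs)                  # current value estimate

for _ in range(2000):                  # fitted value-iteration sweeps
    dVdx = np.gradient(V, xs)          # finite-difference value gradient
    u = (b / (2.0 * c)) * dVdx         # closed-form optimal action from the current V
    x_next = xs + (a * xs + b * u) * dt          # Euler step of the dynamics
    reward = (-q * xs**2 - c * u**2) * dt
    V = reward + gamma * np.interp(x_next, xs, V)  # bootstrapped value target
```

After the sweeps, the fitted value function peaks at the origin, where the state cost vanishes, and the closed-form policy drives the state toward it. The paper replaces the grid with a neural value function and handles the adversarial (rFVI) case analogously, with a closed-form optimal adversary.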
Pages: 5534-5548
Page count: 15