Approximate dynamic programming with Gaussian processes

Cited by: 19
Authors
Deisenroth, Marc P. [1 ,2 ]
Peters, Jan [2 ]
Rasmussen, Carl E. [1 ,2 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] Max Planck Inst Biol Cybernet, Tübingen, Germany
DOI
10.1109/ACC.2008.4587201
Chinese Library Classification (CLC)
TP [automation technology; computer technology];
Discipline code
0812 ;
Abstract
In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and depends strongly on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP yields an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing-up, a complex nonlinear control problem.
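The core idea in the abstract — running the Bellman recursion backward while modeling the value function with a Gaussian process over a finite support set of states — can be sketched as follows. This is a minimal, hand-rolled illustration, not the paper's method: the 1-D dynamics, costs, grids, and RBF-kernel hyperparameters are all invented for the example, the dynamics are deterministic, and the paper's second stage (two independently trained GPs plus a binary classifier for the global policy) is omitted.

```python
import numpy as np

def gp_posterior_mean(X, y, Xq, ell=0.5, sf2=1.0, noise=1e-4):
    """Posterior mean of GP regression with an RBF kernel (hand-rolled sketch)."""
    def k(A, B):
        d = A[:, None] - B[None, :]
        return sf2 * np.exp(-0.5 * (d / ell) ** 2)
    K = k(X, X) + noise * np.eye(len(X))   # kernel matrix with jitter
    alpha = np.linalg.solve(K, y)          # K^{-1} y
    return k(Xq, X) @ alpha

# Toy 1-D problem; dynamics, costs, and grids are illustrative assumptions.
states = np.linspace(-1.0, 1.0, 15)        # finite support set of states
actions = np.linspace(-0.5, 0.5, 5)        # finite control candidates
f = lambda x, u: np.clip(0.9 * x + u, -1.0, 1.0)   # assumed known dynamics
cost = lambda x, u: x ** 2 + 0.1 * u ** 2          # stage cost

V = states ** 2                            # terminal values V_N on the support set
for _ in range(10):                        # backward Bellman recursion
    # Model V_{k+1} with a GP trained on (states, V); query its mean at successors.
    Q = np.array([[cost(x, u)
                   + gp_posterior_mean(states, V, np.array([f(x, u)]))[0]
                   for u in actions] for x in states])
    V = Q.min(axis=1)                      # V_k(x) = min_u Q_k(x, u)

pi = actions[Q.argmin(axis=1)]             # greedy state feedback on the support set
```

In GPDP proper, the GP machinery also handles expectations under stochastic dynamics and hyperparameter training; everything here is deterministic and fixed for brevity.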
Pages: 4480 / +
Number of pages: 2
Related papers
50 records in total
  • [41] Transformation to approximate independence for locally stationary Gaussian processes
    Guinness, Joseph
    Stein, Michael L.
    JOURNAL OF TIME SERIES ANALYSIS, 2013, 34 (05) : 574 - 590
  • [42] Approximate Inference Turns Deep Networks into Gaussian Processes
    Khan, Mohammad Emtiyaz
    Immer, Alexander
    Abedi, Ehsan
    Korzepa, Maciej
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [43] Approximate Dynamic Programming of Continuous Annealing process
    Zhang, Yingwei
    Guo, Chao
    Chen, Xue
    Teng, Yongdong
    2009 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS (ICAL 2009), VOLS 1-3, 2009, : 353 - 358
  • [44] Markdown Optimization via Approximate Dynamic Programming
    Cosgun, Ozlem
    Kula, Ufuk
    Kahraman, Cengiz
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2013, 6 (01) : 64 - 78
  • [45] The RBF neural network in approximate dynamic programming
    Ster, B
    Dobnikar, A
    ARTIFICIAL NEURAL NETS AND GENETIC ALGORITHMS, 1999, : 161 - 165
  • [46] Empirical Policy Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 6573 - 6578
  • [47] Sampled fictitious play for approximate dynamic programming
    Epelman, Marina
    Ghate, Archis
    Smith, Robert L.
    COMPUTERS & OPERATIONS RESEARCH, 2011, 38 (12) : 1705 - 1718
  • [48] Intelligent Questionnaires Using Approximate Dynamic Programming
    Logé F.
    Le Pennec E.
    Amadou-Boubacar H.
    i-com, 2021, 19 (03) : 227 - 237
  • [49] Approximate Dynamic Programming via Penalty Functions
    Beuchat, Paul N.
    Lygeros, John
    IFAC PAPERSONLINE, 2017, 50 (01): : 11814 - 11821
  • [50] An approximate dynamic programming approach for collaborative caching
    Yang, Xinan
    Thomos, Nikolaos
    ENGINEERING OPTIMIZATION, 2021, 53 (06) : 1005 - 1023