Backtracking for More Efficient Large Scale Dynamic Programming

Cited by: 1
Authors
Tripp, Charles [1]
Shachter, Ross [2]
Affiliations
[1] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Management Sci & Engn, Stanford, CA 94305 USA
Keywords
dynamic programming; reinforcement learning; Q-learning; experience replay; backtracking
DOI
10.1109/ICMLA.2012.63
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning algorithms are widely used to generate policies for complex Markov decision processes. We introduce backtracking, a modification to reinforcement learning algorithms that can significantly improve their performance, particularly for off-line policy generation. Backtracking waits to perform update calculations until the successor's value has been updated, allowing immediate reuse of update calculations. We demonstrate the effectiveness of backtracking on two benchmark processes using both Q-learning and real-time dynamic programming.
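The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: replaying a recorded trajectory in reverse so that each state's update already sees its successor's freshly updated value. It is not the authors' algorithm. The chain environment, the function names (`q_update`, `walk_right`, `learn`), and the parameter values are all assumptions made for illustration.

```python
from typing import List, Tuple

# (state, action, reward, next_state, done); a hypothetical transition record
Transition = Tuple[int, int, float, int, bool]


def q_update(q: List[List[float]], t: Transition, alpha: float, gamma: float) -> None:
    """Apply one standard tabular Q-learning update for a single transition."""
    s, a, r, s2, done = t
    target = r if done else r + gamma * max(q[s2])
    q[s][a] += alpha * (target - q[s][a])


def walk_right(n_states: int) -> List[Transition]:
    """Record one episode on an assumed deterministic chain: always move right
    (action 1); the only reward is 1.0 on entering the last state."""
    traj = []
    for s in range(n_states - 1):
        s2 = s + 1
        done = s2 == n_states - 1
        traj.append((s, 1, 1.0 if done else 0.0, s2, done))
    return traj


def learn(traj: List[Transition], n_states: int, backward: bool,
          alpha: float = 1.0, gamma: float = 0.9) -> List[List[float]]:
    """Replay the trajectory once, either in the order experienced or reversed."""
    q = [[0.0, 0.0] for _ in range(n_states)]
    order = reversed(traj) if backward else traj
    for t in order:
        q_update(q, t, alpha, gamma)
    return q


traj = walk_right(5)
forward = learn(traj, 5, backward=False)
backward = learn(traj, 5, backward=True)
print("forward  Q(s, right):", [row[1] for row in forward])
print("backward Q(s, right):", [row[1] for row in backward])
```

In this toy setting, one forward-order pass changes only the value of the final step, while one reverse-order pass carries the terminal reward back to the start state, discounted once per step. That is the kind of reuse of update calculations the abstract refers to, under the assumptions stated above.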
Pages: 338-343
Number of pages: 6