24 references in total
- [1] ZHAO F, GAO W, LIU T, et al., Policy iteration and event-triggered robust adaptive dynamic programming for large-scale systems, IFAC-PapersOnLine, 54, 14, pp. 376-381, (2021)
- [2] ZHU Zhibin, WANG Fuyong, YIN Yanhui, et al., Consensus of discrete-time multi-agent system based on Q-learning, Control Theory & Applications, 38, 7, pp. 997-1005, (2021)
- [3] FENG Y, ZHANG M, GUO W, et al., Adaptive optimal control of space tether system for payload capture via policy iteration, Transactions of Nanjing University of Aeronautics and Astronautics, 38, 4, pp. 560-570, (2021)
- [4] QIN Zhihui, LI Ning, LIU Xiaotong, et al., Overview of research on model-free reinforcement learning, Computer Science, 48, 3, pp. 180-187, (2021)
- [5] ZOU Wei, GE Ling, LIU Yubiao, Reinforcement Learning, (2020)
- [6] HAMADOUCHE M, DEZAN C, ESPES D, et al., Comparison of value iteration, policy iteration and Q-learning for solving decision-making problems, 2021 International Conference on Unmanned Aircraft Systems, pp. 101-110, (2021)
- [7] BERTSEKAS D P., Dynamic Programming: Deterministic and Stochastic Models, (1987)
- [8] WATKINS C J C H, DAYAN P., Q-learning, Machine Learning, 8, 3, pp. 279-292, (1992)
- [9] BRADTKE S J, YDSTIE B E, BARTO A G., Adaptive linear quadratic control using policy iteration, Proceedings of 1994 American Control Conference, pp. 3475-3479, (1994)
- [10] SINGH S P, JAAKKOLA T, JORDAN M I., Reinforcement learning with soft state aggregation, Neural Information Processing Systems, pp. 361-368, (1994)