Policy iteration optimal control with adaptive adjustment of window length

Cited: 0
Authors
Fang X. [1]
Luan X.-L. [1]
Liu F. [1]
Affiliation
[1] Key Laboratory for Advanced Process Control of Light Industry, Ministry of Education, Institute of Automation, Jiangnan University, Wuxi, Jiangsu
Funding
National Natural Science Foundation of China
Keywords
adaptive adjustment of window length; influence function; optimal control; policy iteration
DOI
10.7641/CTA.2023.21013
Abstract
In optimal control problems with unknown system model parameters, the accuracy of value-function estimation determines how quickly policy iteration converges to the optimal control policy. To improve both the accuracy and the speed of value-function estimation, this paper proposes a policy iteration optimal control algorithm with adaptive window-length adjustment. By making full use of historical sample data within a time window, an influence function is used to construct a quantitative relationship between the window length and the estimation performance of the value function, and the window length is adaptively adjusted according to how strongly it affects that performance. Finally, the proposed method is applied to a continuous fermentation process. Simulation results show that the proposed method accelerates convergence to the optimal control policy, overcomes the effect of parameter changes or external disturbances on control performance, and improves control accuracy. © 2024 South China University of Technology. All rights reserved.
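The paper's exact algorithm is not given in this record, but the core idea described in the abstract — adjusting the data-window length based on how much individual samples influence the value-function estimate — can be sketched as follows. This is a minimal illustration, not the authors' method: the least-squares value-weight fit, the leave-one-out influence measure, and the grow/shrink rule (names `estimate_value_weights`, `adapt_window`, and the threshold `tol`) are all hypothetical simplifications.

```python
import numpy as np

def estimate_value_weights(features, targets):
    """Least-squares fit of linear value-function weights over a data window.

    features: (n, d) array of basis-function values phi(x_k)
    targets:  (n,) array of observed returns / TD targets
    """
    w, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return w

def adapt_window(features, targets, n, n_min=10, n_max=200, tol=1e-3):
    """Hypothetical window-length rule inspired by influence functions.

    Measures the influence of the oldest sample in the current window by
    comparing the weights fitted with and without it (a leave-one-out
    proxy for an influence function). If the oldest sample noticeably
    changes the estimate, the system has likely changed, so shrink the
    window; otherwise old data is still informative, so grow it.
    """
    phi, y = features[-n:], targets[-n:]
    w_full = estimate_value_weights(phi, y)
    w_drop = estimate_value_weights(phi[1:], y[1:])   # drop oldest sample
    influence = np.linalg.norm(w_full - w_drop)
    if influence > tol:
        return max(n_min, n - 1)   # old data disagrees: shrink window
    return min(n_max, n + 1)       # old data consistent: grow window
```

Under this sketch, stationary data lets the window grow (more samples, lower estimation variance), while a parameter change or disturbance makes old samples influential outliers and the window contracts toward recent data, matching the adaptive behavior the abstract describes.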
Pages: 745–750 (5 pages)